New link accuracy chart #1478
Conversation
Test: test_2_rounds_1k_duckdb (percentage change: -8.6%)
Test: test_2_rounds_1k_sqlite (percentage change: +2.2%)
Click here for vega lite time series charts
To do:
That is one beautiful chart! Just in case this is of any relevance to the error you're seeing, here is an error I was getting with a valid spec (i.e. working in the vega-lite editor, not working with altair saving to png).
This is really great - I love it! 😍
The error is
referring to splink/splink/files/chart_defs/accuracy_chart.json, lines 38 to 113 in 2385ca1,
where the "metric_point" param is defined and used (in the transform -> filter) exclusively within the point layer of the chart 🤷
Does it save out and load correctly if, rather than rendering in jlab, you do a
Is it feasible to add Matthews' correlation coefficient (MCC)? I recommend MCC as the standard binary classification metric because when MCC is high, each of the four basic rates of the confusion matrix is high, without exception.
Background context: my feature request for MCC in fastLink.
Update: in case it could help, MCC is implemented in Python in the function
@aalexandersson I have added MCC ("phi") as an option (see docstring). Example:
linker.accuracy_chart_from_labels_column("ground_truth", add_metrics=["phi", "f1", "accuracy", "specificity"])
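For readers unfamiliar with the metric being added here, the following is a minimal standalone sketch of the phi coefficient (MCC) computed from confusion-matrix counts. It is not Splink's implementation (Splink computes it in SQL over the truth space table); the function name and signature are illustrative only.

```python
import math

def phi_coefficient(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient (phi) from confusion-matrix counts.

    Illustrative sketch only, not Splink's SQL implementation. Returns 0.0
    when any marginal total is zero, to sidestep the division-by-zero edge
    cases mentioned later in this thread.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# A classifier with no errors scores 1.0; one that is always wrong scores -1.0
print(phi_coefficient(tp=50, tn=50, fp=0, fn=0))   # 1.0
print(phi_coefficient(tp=0, tn=0, fp=5, fn=5))     # -1.0
```

Unlike F1, phi is symmetric in the positive and negative classes, which is why it only scores highly when all four confusion-matrix rates are good.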
This is great, thanks a lot for adding phi (MCC)! A suggestion for very minor labelling improvements in the chart, for consistency:
Thank you for spotting the copy-paste error. That's nudged me to tidy up the labels for consistency and ease of editing in future.
Thank you, the new chart looks great to me. I wonder, though, how best to use this new chart in a
In the example, where does "ground_truth" come from? How do I create the ground truth? The new linker function seems similar to truth_space_table_from_labels_column(), but it begs the same question.
Update: The documentation in splink/linker.py is useful, but it would be nice to have a worked example of "ground truth" somewhere, since accuracy (linkage quality) metrics are critical.
Update 2: Where will the new chart fit in Robin's demo workflow? Is that demo example with variable
Is there a reason why this was closed? 🤔
Sorry! Must have clicked something by accident!
@aalexandersson yes, to use these charts you need labelled data from clerical review. You can't produce these charts without clerical review or some other source of labels. Specifically, you need a dataset that has pairwise unique IDs and the clerical score, like:
The purpose of the new beta clerical labelling tool is to make it easier to produce data in this format.
Occasionally, you may have fully labelled data. This is most commonly the case if you've generated synthetic data to benchmark/test matching algorithms. In this case, the input dataset contains 'the answer', i.e. the cluster to which the record belongs exists in the source dataset. Here's an example where the cluster is in a column called 'group':
In that case, it would be silly/inefficient to need to convert this into a large list of pairwise labels, so we allow the accuracy charts to run off this column (referred to as a labels column), e.g. the
In the demo workflow, the labelled data is produced between the step
i.e. it is assumed that the user has used the labelling dashboard to produce the pairwise clerical labels, and then read them back in.
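To make the distinction concrete, here is a hedged sketch of what expanding a labels column into pairwise clerical labels would look like. The column names (`unique_id`, `group`, `unique_id_l`, `unique_id_r`, `clerical_match_score`) follow Splink's usual conventions but the data and the expansion code are illustrative, not part of the library:

```python
import itertools
import pandas as pd

# Toy synthetic dataset where the fully-labelled cluster lives in 'group'
df = pd.DataFrame({
    "unique_id": [1, 2, 3, 4],
    "group": ["a", "a", "b", "b"],
})

# Expanding the labels column into pairwise clerical labels: every pair of
# records in the same cluster is a match (score 1.0), every other pair 0.0
pairs = [
    {
        "unique_id_l": left.unique_id,
        "unique_id_r": right.unique_id,
        "clerical_match_score": 1.0 if left.group == right.group else 0.0,
    }
    for left, right in itertools.combinations(df.itertuples(index=False), 2)
]
labels = pd.DataFrame(pairs)
print(labels)
```

With n records this produces n(n-1)/2 rows, which is exactly why it is more efficient to let the accuracy charts run directly off the labels column instead.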
This is fantastic! Just one small issue and I'm happy to approve - #1478 (comment)
The following clauses are failing in Postgres at present:
I haven't looked into why yet.
If we want some further documentation around what these charts show and how to interpret them, this document may be useful for pinching content for a topic guide or some other documentation.
Have you had a chance to look at why this is failing in Postgres?
allowed,
)
# Silently filter out invalid entries (except case errors, e.g. ["NPV", "F1"])
Do we want to throw a warning to the user that an entry is invalid? Happy to go with the current version if you're content.
Yeah, happy for it to quietly ignore stupid entries
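The behaviour being discussed (tolerate case errors, quietly drop anything else) could be sketched as follows. This is a hypothetical helper, not the actual Splink code; the function name, the `VALID_METRICS` set, and the optional warning are all assumptions for illustration:

```python
import warnings

# Hypothetical whitelist of recognised metric names (lowercase canonical form)
VALID_METRICS = {
    "precision", "recall", "specificity", "npv", "accuracy",
    "f1", "f2", "f0_5", "p4", "phi",
}

def filter_metrics(add_metrics, warn=False):
    """Keep valid entries, forgiving case errors like "NPV" or "F1".

    Invalid entries are dropped silently by default; pass warn=True to
    surface them to the user instead (the alternative raised above).
    """
    cleaned, rejected = [], []
    for metric in add_metrics:
        if metric.lower() in VALID_METRICS:
            cleaned.append(metric.lower())
        else:
            rejected.append(metric)
    if warn and rejected:
        warnings.warn(f"Ignoring invalid metrics: {rejected}")
    return cleaned

print(filter_metrics(["NPV", "F1", "banana"]))  # ['npv', 'f1']
```

Keeping the warning behind a flag preserves the "quietly ignore stupid entries" default agreed in the thread while leaving a hook for stricter validation later.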
Can you add the new charts as integration tests to the full_example scripts?
The only test that was failing was SQLite, for some reason (see 5463e41 for the fix). All (strictly necessary) changes made now 🤞
CI test (SQLite) failing with the following error:
SQLite can be annoying: "The math functions shown below are part of the SQLite amalgamation source file but are only active if the amalgamation is compiled using the -DSQLITE_ENABLE_MATH_FUNCTIONS compile-time option." I would just register sqrt in Python as a SQLite UDF to make the tests pass.
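Registering sqrt as a user-defined function with the standard-library `sqlite3` module is a one-liner; a minimal sketch of the suggestion:

```python
import math
import sqlite3

con = sqlite3.connect(":memory:")

# Stock SQLite builds often lack the math functions (they need the
# -DSQLITE_ENABLE_MATH_FUNCTIONS compile-time option), so expose Python's
# math.sqrt to SQL as a scalar UDF taking one argument:
con.create_function("sqrt", 1, math.sqrt)

result = con.execute("SELECT sqrt(16.0)").fetchone()[0]
print(result)  # 4.0
```

On builds where sqrt is already compiled in, `create_function` simply overrides it, so registering unconditionally is safe.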
Thanks Robin, already done that in c21ceb3. Postgres tests now passing after covering off some division-by-zero issues not picked up by the local tests 🤔
Thanks Sam, this is great!
Type of PR
Is your Pull Request linked to an existing Issue or Pull Request?
Closes #1467
Give a brief description for the solution you have provided
New chart created using linker.accuracy_chart_from_labels_column("cluster"):
- New SQL pipeline: truth_space_table_from_labels_with_predictions_sqls()
- New metrics:
  - specificity (= tn_rate, aka selectivity)
  - npv (negative predictive value)
  - accuracy
  - F2
  - F0_5
  - P4
- Demo example (duckdb/accuracy_analysis_from_labels_column.ipynb)
- Real example with courts data
PR Checklist