
New link accuracy chart #1478

Merged · samnlindsay merged 22 commits into master from accuracy_chart on Aug 10, 2023

Conversation

@samnlindsay (Contributor) commented Jul 27, 2023

Type of PR

  • BUG
  • FEAT
  • MAINT
  • DOC

Is your Pull Request linked to an existing Issue or Pull Request?

Closes #1467

Give a brief description for the solution you have provided

New chart created using linker.accuracy_chart_from_labels_column("cluster"):

  • Added additional quantities to truth_space_table_from_labels_with_predictions_sqls() (formulas sketched just after this list):
    • specificity (=tn_rate, aka selectivity)
    • npv (negative predictive value)
    • accuracy
    • F2 ($F_2$ - favours recall)
    • F0_5 ($F_{0.5}$ - favours precision)
    • P4 ($P_4$ - harmonic mean of precision, recall, NPV and accuracy, like an extension of $F_1$)
  • New chart plotting precision, recall, F1, P4 and accuracy (with just precision and recall highlighted initially) vs match weight threshold.
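
For reference, the standard confusion-matrix definitions behind these quantities, as a minimal Python sketch; the function name and plain-count inputs are illustrative only, and the real implementation builds the equivalent SQL expressions in truth_space_table_from_labels_with_predictions_sqls():

    def extra_metrics(tp, tn, fp, fn):
        # Standard confusion-matrix definitions; denominators assumed non-zero here.
        precision = tp / (tp + fp)                      # PPV
        recall = tp / (tp + fn)                         # TPR
        specificity = tn / (tn + fp)                    # TNR, aka selectivity
        npv = tn / (tn + fn)                            # negative predictive value
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        f2 = 5 * precision * recall / (4 * precision + recall)          # favours recall
        f0_5 = 1.25 * precision * recall / (0.25 * precision + recall)  # favours precision
        p4 = 4 / (1 / precision + 1 / recall + 1 / specificity + 1 / npv)
        return {
            "specificity": specificity, "npv": npv, "accuracy": accuracy,
            "f1": f1, "f2": f2, "f0_5": f0_5, "p4": p4,
        }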

Demo example (duckdb/accuracy_analysis_from_labels_column.ipynb)

[screenshot: demo accuracy chart]

Real example with courts data

[screenshot: accuracy chart for the courts data, 2023-07-27]

PR Checklist

  • Added documentation for changes
  • Added feature to example notebooks and/or tutorial in splink_demos (if appropriate)
  • Added tests (if appropriate)
  • Made changes based off the latest version of Splink
  • Run the linter

github-actions bot commented Jul 27, 2023

Test: test_2_rounds_1k_duckdb

Percentage change: -8.6%

|      | date       | time     | stats_mean | stats_min | commit_info_branch | commit_info_id | machine_info_cpu_brand_raw | machine_info_cpu_hz_actual_friendly | commit_hash |
|------|------------|----------|------------|-----------|--------------------|----------------|----------------------------|-------------------------------------|-------------|
| 849  | 2022-07-12 | 18:40:05 | 1.89098    | 1.87463   | splink3            | c334bb9        | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 2.7934 GHz | c334bb9 |
| 1929 | 2023-08-09 | 18:50:44 | 1.73164    | 1.71324   | (detached head)    | 4bdc459        | Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz | 2.0952 GHz | 4bdc459 |

Test: test_2_rounds_1k_sqlite

Percentage change: 2.2%

|      | date       | time     | stats_mean | stats_min | commit_info_branch | commit_info_id | machine_info_cpu_brand_raw | machine_info_cpu_hz_actual_friendly | commit_hash |
|------|------------|----------|------------|-----------|--------------------|----------------|----------------------------|-------------------------------------|-------------|
| 851  | 2022-07-12 | 18:40:05 | 4.32179    | 4.25898   | splink3            | c334bb9        | Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz | 2.7934 GHz | c334bb9 |
| 1931 | 2023-08-09 | 18:50:44 | 4.47221    | 4.35404   | (detached head)    | 4bdc459        | Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz | 2.0952 GHz | 4bdc459 |


@samnlindsay (Contributor, Author) commented Jul 27, 2023

To do:

  • Fix error when running locally (no problem in Vega Editor)
  • Allow user to specify metrics up front (e.g. linker.accuracy_chart_from_labels_column("cluster", metrics=["F1", "F2"]))
  • Decide on appropriate text/naming (I just threw some words down)
  • Add documentation (incl. explainers for the different metrics)
  • Add to demo notebooks etc.

@samnlindsay changed the title from "[WIP] New chart" to "[WIP] New link accuracy chart" on Jul 27, 2023
@RobinL (Member) commented Jul 27, 2023

That is one beautiful chart! Just in case this is of any relevance to the error you're seeing, here is an error I was getting with a valid spec (i.e. working in the vega-lite editor, not working with altair saving to png).

@RossKen (Contributor) commented Jul 27, 2023

This is really great - I love it! 😍

@samnlindsay (Contributor, Author) commented Jul 27, 2023

The error is

Error rendering output item using 'jupyter-vega-renderer'
Duplicate signal name: "metric_point_tuple"

referring to

{
  "mark": {"type": "point"},
  "params": [
    {
      "name": "metric_point",
      "select": {"type": "point", "fields": ["score"]},
      "value": [{"score": "precision"}, {"score": "recall"}],
      "bind": "legend"
    },
    {
      "name": "point_selection",
      "select": {
        "type": "point",
        "fields": ["truth_threshold"],
        "encodings": ["x"],
        "nearest": true,
        "on": "mouseover"
      }
    }
  ],
  "transform": [
    {"filter": {"param": "metric_point", "empty": true}}
  ],
  "encoding": {
    "opacity": {
      "condition": {"param": "point_selection", "value": 1, "empty": false},
      "value": 0
    },
    "x": {"field": "truth_threshold", "type": "quantitative"},
    "y": {"field": "y", "type": "quantitative"},
    "color": {
      "field": "score",
      "type": "nominal",
      "sort": ["precision", "recall", "F1", "P4", "accuracy"]
    }
  }
},

where the "metric_point" param is defined and used (in the transform -> filter) exclusively within the point layer of the chart 🤷

@RobinL (Member) commented Jul 27, 2023

Does it save out and load correctly if, rather than rendering in jlab, you do a chart.save("chart.png") or chart.save("chart.html")?

@aalexandersson (Contributor) commented Jul 29, 2023

Is it feasible to add Matthews' correlation coefficient (MCC)? I recommend MCC as the standard binary classification metric because when MCC is high, each of the four basic rates of a confusion matrix is high, without exception.

Background context: My feature request for MCC in fastLink
New reference: Chicco, D. and Jurman, G. 2023. The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification. BioData Mining 16 (4), 1-23. URL: https://doi.org/10.1186/s13040-023-00322-4

Update: In case it could help, MCC is implemented in Python in the function matthews_corrcoef() in the library sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html
Here is a blog: https://www.statology.org/matthews-correlation-coefficient-python/
Also, I too really, really like this chart!
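
For reference, a minimal usage sketch of the sklearn function mentioned above; the label and prediction arrays are toy values purely for illustration:

    from sklearn.metrics import matthews_corrcoef

    # MCC ranges from -1 (total disagreement) to +1 (perfect prediction); 0 is chance level
    y_true = [1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 0, 1, 1]
    print(matthews_corrcoef(y_true, y_pred))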

@samnlindsay changed the title from "[WIP] New link accuracy chart" to "New link accuracy chart" on Jul 31, 2023
@samnlindsay (Contributor, Author) commented:

@aalexandersson I have added MCC ("phi") as an option (see docstring):

[screenshot: docstring for the new add_metrics option, 2023-07-31]

Example

linker.accuracy_chart_from_labels_column("ground_truth", add_metrics=["phi", "f1", "accuracy", "specificity"])

produces the following:
[screenshot: accuracy chart with phi, F1, accuracy and specificity added, 2023-07-31]

@aalexandersson (Contributor) commented:

This is great, thanks a lot for adding phi (MCC)!

Suggestion for very minor labeling improvements in the graph, to be consistent:

  1. Add "(MCC)" as in "phi (MCC) = 0.856". -- Alternatively, remove "(TPR)" in "Recall (TPR) = 0.922"
  2. Metric: "Precision (PPV)" instead of "Precision".

@samnlindsay (Contributor, Author) commented:

Thank you for spotting the copy-paste error. That's nudged me to tidy up the labels for consistency and ease of editing in future.

mapping = {
    'precision': 'Precision (PPV)',
    'recall': 'Recall (TPR)',
    'specificity': 'Specificity (TNR)',
    'accuracy': 'Accuracy',
    'npv': 'NPV',
    'f1': 'F1',
    'f2': 'F2',
    'f0_5': 'F0.5',
    'p4': 'P4',
    'phi': '\u03C6 (MCC)',
}

@aalexandersson (Contributor) commented Jul 31, 2023

Thank you, the new chart looks great to me. I wonder, though, how best to use this new chart in a splink workflow -- with and without a clerical review?

In the example, where does "ground_truth" come from? How do I create the ground truth?

linker.accuracy_chart_from_labels_column("ground_truth", add_metrics=["phi", "f1", "accuracy", "specificity"])

The new linker function seems similar to truth_space_table_from_labels_column() but it begs the same question.

Update: The documentation in splink/linker.py is useful but it would be nice to have a worked example of "ground truth" somewhere since accuracy (linkage quality) metrics are critical.

Update 2: Where will the new chart fit in Robin's demo workflow? Is that demo example with variable truth_status an example of "ground truth" or something else?

@RobinL RobinL closed this Jul 31, 2023
@RossKen (Contributor) commented Jul 31, 2023

Is there a reason why this was closed? 🤔

@RobinL RobinL reopened this Aug 1, 2023
@RobinL (Member) commented Aug 1, 2023

Sorry! Must have clicked something by accident!

@RobinL (Member) commented Aug 1, 2023

@aalexandersson yes, to use these charts you need labelled data from clerical review. You can't produce these charts without clerical review or some other source of labels

Specifically, you need a dataset that has pairwise unique IDs and the clerical score like:

| source_dataset_l | unique_id_l | source_dataset_r | unique_id_r | clerical_match_score |
|------------------|-------------|------------------|-------------|----------------------|
| df_1             | 1           | df_2             | 2           | 0.99                 |
| df_1             | 1           | df_2             | 3           | 0.2                  |

The purpose of the new beta clerical labelling tool is to more easily produce data in this format.

Occasionally, you may have fully labelled data. This is most commonly the case if you've generated synthetic data to benchmark/test matching algorithms.

In this case, the input dataset contains 'the answer', i.e. the cluster to which each record belongs is recorded in the source dataset. Here's an example where the cluster is in a column called 'group':
https://github.com/moj-analytical-services/splink/blob/master/tests/datasets/fake_1000_from_splink_demos.csv

In that case, it would be silly/inefficient to need to convert this into a large list of pairwise labels, so we allow the accuracy charts to run off this column (referred to as a labels column), e.g. via the roc_chart_from_labels_column function.

In the demo workflow, the labelled data is produced between the steps
'Write a labelling dashboard for a random sample of input records'
and
'Read in the labels and perform various types of accuracy analysis',

i.e. it is assumed that the user has used the labelling dashboard to produce the pairwise clerical labels, and then read them back in.
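
To make the two routes concrete, a minimal sketch; the pandas DataFrame construction and the trained linker object are assumptions for illustration, and only the column names and the *_from_labels_column functions come from this thread:

    import pandas as pd

    # Route 1: pairwise clerical labels, e.g. produced via the labelling dashboard
    pairwise_labels = pd.DataFrame(
        {
            "source_dataset_l": ["df_1", "df_1"],
            "unique_id_l": [1, 1],
            "source_dataset_r": ["df_2", "df_2"],
            "unique_id_r": [2, 3],
            "clerical_match_score": [0.99, 0.2],
        }
    )

    # Route 2: a fully labelled input dataset with a ground-truth column
    # (e.g. the "group" column in fake_1000_from_splink_demos.csv), which the
    # labels-column functions can use directly:
    # linker.accuracy_chart_from_labels_column("group")
    # linker.roc_chart_from_labels_column("group")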

[inline review thread on splink/accuracy.py (outdated, resolved)]
@ThomasHepworth (Contributor) left a comment:

This is fantastic! Just one small issue and I'm happy to approve - #1478 (comment)

@ThomasHepworth (Contributor) commented:

The following clauses are failing in Postgres at present:

cast(TN as float)/(TN+FN) as npv,

(2 * precision * recall)/(precision + recall) as f1

(5 * precision * recall)/(4 * precision + recall) as f2

(1.25 * precision * recall)/(0.25 * precision + recall) as f0_5

4/((1./precision)+(1./recall)+(1./specificity)+(1./npv)) as p4

cast((TP*TN)-(FP*FN) as float)/sqrt((TP+FP)*P*N*(TN+FN)) as phi

I haven't looked into why yet.

@samnlindsay (Contributor, Author) commented:

If we want some further documentation around what these charts show and how to interpret them, this document may be useful for pinching content for a topic guide or some other documentation.

@ThomasHepworth (Contributor) commented:

Have you had a chance to look at why this is failing in postgres?

Inline review comment on the line:

# Silently filter out invalid entries (except case errors - e.g. ["NPV", "F1"])
Contributor:

Do we want to throw a warning to the user that an entry is invalid? Happy to go with the current version if you're content.

Contributor Author:

Yeah, happy for it to quietly ignore stupid entries

@ThomasHepworth (Contributor) commented:

Can you add the new charts as integration tests to the full_example scripts?

@samnlindsay (Contributor, Author) commented:

The only test that was failing was SQLite for some reason (see 5463e41 for fix)

All (strictly necessary) changes made now 🤞

@samnlindsay (Contributor, Author) commented:

CI test (SQLite) failing with the following error: sqlite3.OperationalError: no such function: sqrt

Erm...

@RobinL (Member) commented Aug 9, 2023

CI test (SQLite) failing with the following error: sqlite3.OperationalError: no such function: sqrt

Erm...

SQLite can be annoying: "The math functions shown below are part of the SQLite amalgamation source file but are only active if the amalgamation is compiled using the -DSQLITE_ENABLE_MATH_FUNCTIONS compile-time option."

I would just register sqrt in Python as a SQLite UDF to make the tests pass.
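
For reference, a minimal sketch of registering sqrt as a SQLite user-defined function with the standard library sqlite3 module; the in-memory connection is illustrative and is not how Splink's SQLite linker wires up its connection:

    import math
    import sqlite3

    con = sqlite3.connect(":memory:")

    # Make sqrt available to SQL even when SQLite was built without
    # -DSQLITE_ENABLE_MATH_FUNCTIONS
    con.create_function("sqrt", 1, math.sqrt)

    print(con.execute("select sqrt(16.0)").fetchone())  # (4.0,)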

@samnlindsay (Contributor, Author) commented:

Thanks Robin, already done that in c21ceb3

Postgres tests now passing after covering off some division by zero issues not picked up by the local tests 🤔

@ThomasHepworth (Contributor) left a comment:

Thanks Sam, this is great!

@samnlindsay merged commit d64dcd5 into master on Aug 10, 2023
11 checks passed
@samnlindsay deleted the accuracy_chart branch on August 10, 2023 at 10:55
Development

Successfully merging this pull request may close these issues.

[FEAT] Additional QA chart to help select optimal match weight threshold
5 participants