
Threshold selection tool #2003

Merged: 14 commits merged into master on Mar 13, 2024
Conversation

@samnlindsay (Contributor) commented Feb 27, 2024

Type of PR

  • BUG
  • FEAT
  • MAINT
  • DOC

Is your Pull Request linked to an existing Issue or Pull Request?

Relevant to #1564 in that it ties together several different charts into a single tool.

Creating draft PR while I gather feedback and then I'll add docs etc.

Do we want to replace the individual charts that this combines, or add this as another function alongside them? Keeping all of them duplicates a lot of code and makes them harder to maintain, especially if some aren't being used.

@ThomasHepworth FYI

Give a brief description for the solution you have provided

Combined interactive chart to:

  • relate match weight threshold to a probability
  • show the confusion matrix as a function of this threshold
  • show the impact of threshold selection on performance metrics to aid threshold selection

Screenshot (using pairwise_labels.ipynb)

image
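For orientation, here is a hedged sketch of how the combined tool might be called from a notebook. The method name follows the `threshold_selection_tool_from_labels_table` suggestion that comes up later in this thread, and the `add_metrics` argument mirrors the existing `accuracy_chart_from_labels_column`; neither signature is confirmed here.

```py
# Sketch only: `linker` is assumed to be a trained Splink Linker and "labels"
# a labels table already registered with it. The method name and add_metrics
# argument are taken from later discussion in this thread, not a confirmed API.
chart = linker.threshold_selection_tool_from_labels_table(
    "labels",            # registered table of clerical labels
    add_metrics=["f1"],  # assumed: extra metrics plotted alongside precision/recall
)
chart  # in a notebook this renders the combined interactive chart
```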

PR Checklist

  • Added documentation for changes
  • Added feature to example notebooks or tutorial (if appropriate)
  • Added tests (if appropriate)
  • Updated CHANGELOG.md (if appropriate)
  • Made changes based off the latest version of Splink
  • Run the linter

@ThomasHepworth (Contributor)

Yeah, I really like this; it's essentially the direction I was going in.

I was leaning more towards simply combining the confusion matrix with the accuracy chart to reduce the amount of information on a user's screen:
[screenshot]
(Ignore the lack of data in the accuracy chart; it was built with minimal data to keep the JSON simple in the editor.)

I can see the merit of showing the sigmoid'd match weight chart, so I am very much indifferent on this one.

I'll leave some additional thoughts tomorrow morning when I'm feeling a tad better.

@RobinL (Member) commented Feb 27, 2024

I don't feel at all strongly about it, but I think I agree with Tom that the sigmoid doesn't add that much, especially if we used the waterfall chart's trick of having two axes (in this case on the x-axis) that map between match weight and match probability on the main precision-recall chart.

image

@samnlindsay (Contributor, Author) commented Feb 27, 2024

Thanks both. I agree that from a purely information perspective the sigmoid doesn't add anything, but I think it helps make the information more accessible by facilitating different approaches to it. Whether you want to wiggle your cursor up and down the match probability axis or left and right across the match weight axis, it works equally well, while also literally drawing your attention to how your preferred measure maps onto the other.

For example:

  • No intuition, naive approach - "match probability = 0.5 is the most sensible threshold right?".
  • Partially informed approach - "The data linking team said cluster_medium was a threshold of 0.99. Is that really that different from 0.95?"
    • If what you know is probabilities, it's easier to navigate these charts using the 0-1 y-axis of the sigmoid.
  • Maximum insight in minimal time approach - "What's the best this model can do? [goes to intersection of precision and recall curves - sees the sigmoid curve highlighted] Wow, that's pretty low!"
    • The visual element of seeing the probability close to 0 rather than 1 (for me anyway) makes the information much easier to digest than having to read off a weird probability scale like on the waterfall chart, or reading a value off a tooltip.

I've never liked that axis on the waterfall chart, but that's a rare example of something that only works in a match weight context, so the funky axis is the only way to visually cater for match probability.
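For readers following the axis discussion, here is a minimal sketch of the match weight to match probability mapping being referred to, assuming Splink's usual definition of match weight as log2 of the Bayes factor (the "sigmoid" is then the inverse-odds transform):

```py
import math

def match_weight_to_probability(w: float) -> float:
    # The 'sigmoid' curve discussed above: p = 2**w / (1 + 2**w)
    return 2**w / (1 + 2**w)

def probability_to_match_weight(p: float) -> float:
    # Inverse mapping: match weight is log2 of the odds p / (1 - p)
    return math.log2(p / (1 - p))

# A match weight of 0 maps to probability 0.5; a probability of 0.99 maps to
# a match weight of roughly 6.6, which is why the two axes can be shown together.
```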

@samnlindsay (Contributor, Author) commented Feb 27, 2024

For comparison, here's how it might look without the sigmoid

image

@aalexandersson (Contributor) commented Feb 28, 2024

Is "score" a match probability? If yes, then it seems possible to optionally add back the sigmoid but still only use two charts (that is the confusion matrix plus one combined chart)!?

In the "Score" chart, the y-axis begins at 0.5. How does it look if it begins at 0? For titles, I suggest the y-axis title "Match probability (Score)" and the x-axis title "Match weight". Then, what does it look like to add (overlay) the sigmoid curve on this chart?

@RobinL (Member) commented Feb 28, 2024

I think the y-axis 'score' refers to the precision or recall score (just 'score' because the axis is shared by both metrics). But I can see that's a potential source of confusion and perhaps we could find a better name ('performance metric score', perhaps?).

@aalexandersson (Contributor)

The name 'performance metric score' is more general than 'precision or recall score', so it is better in that sense. My suggestion is to overlay (superimpose) the sigmoid so that the result is one combined chart with two chart areas, not three. This way, we do not require a separate chart area for the sigmoid.

@samnlindsay (Contributor, Author) commented Feb 28, 2024

I experimented with being able to zoom in on the y ("Score") axis because the [0,1] range was too big to be useful, but I gave up because it was too difficult to constrain it the way I wanted (with 1 always the maximum).

The reason for the compromise shown here is that if precision/recall/F1 are below 0.5, your model isn't worth pursuing. The area of interest is loosely "close to 1" on the score axis. In the example shown, the ideal range would be something like [0.85,1], and a [0,1] range would contain a lot of whitespace and make it difficult to see the detail where it's needed.

Here's an example showing a less ambiguous y-axis title and with the full [0,1] range:

image
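On the difficulty of constraining the Score axis so that 1 is always the maximum: below is a minimal sketch (written as Python dicts, assuming the chart is a raw Vega-Lite spec that can be edited before rendering) of two ways a y-encoding could be constrained. The field name `score` is an assumption here, and none of this is the PR's actual spec.

```py
# Hypothetical y-encodings for the "Performance metric value" axis, assuming the
# chart spec is edited directly as a dict before rendering.
y_fixed_range = {
    "field": "score",  # assumed field name
    "type": "quantitative",
    "title": "Performance metric value",
    "scale": {"domain": [0.5, 1], "clamp": True},  # the [0.5, 1] compromise above
}
y_pinned_max = {
    "field": "score",
    "type": "quantitative",
    "title": "Performance metric value",
    "scale": {"domainMax": 1},  # pin 1 as the maximum, let the minimum adapt to the data
}
```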

@aalexandersson (Contributor) commented Feb 29, 2024

Well, I think this latest combined chart is easier to interpret thanks to the full [0,1] range on the y-axis. For completeness, could you add back the sigmoid chart? That is, I would like to see the first chart, but modified to use the full [0,1] range on the y-axis as in the latest chart.

@samnlindsay (Contributor, Author)

Following this discussion I have edited the chart as follows:
image

Given the "action" is all at the top end of the "Performance metric value" scale, I've left it at [0.5,1] to show more detail in the data.

@samnlindsay samnlindsay marked this pull request as ready for review February 29, 2024 18:20
@RobinL (Member) commented Mar 2, 2024

Should we remove linker.confusion_matrix_from_labels_table? It seems superfluous to have both.

Given the functionality is so similar, I wouldn't be worried about breaking backwards compatibility.

@samnlindsay (Contributor, Author) commented Mar 4, 2024

Yes, I was thinking about that in #2010.

@RobinL (Member) commented Mar 5, 2024

@samnlindsay I edited the above because you referenced 2020, but I think you probably meant 2010.

@samnlindsay (Contributor, Author)

Yes, sorry, I was on my phone.

@RobinL (Member) commented Mar 5, 2024

No worries, just didn't want to edit without you knowing in case I did it wrong!

@RobinL (Member) left a review comment:

The new code looks good to me.

Please can we remove linker.confusion_matrix_from_labels_table and associated code/docs as part of this PR?

splink/linker.py: review suggestions (outdated, resolved)
samnlindsay and others added 4 commits March 11, 2024 16:30
Co-authored-by: Robin Linacre <robin.linacre@digital.justice.gov.uk>
Co-authored-by: Robin Linacre <robin.linacre@digital.justice.gov.uk>
@samnlindsay samnlindsay requested a review from RobinL March 11, 2024 21:54
@samnlindsay (Contributor, Author)

confusion_matrix_from_labels_XXX functions and references to them have been removed.

The one exception is the blog post announcing the confusion matrix chart as a new feature. The image shown is unchanged but it now links to the new threshold selection tool notebook instead of the confusion matrix one.

@samnlindsay samnlindsay merged commit 8b063f0 into master Mar 13, 2024
13 checks passed
@samnlindsay samnlindsay deleted the threshold_selector branch March 13, 2024 15:23
```py
linker.confusion_matrix_from_labels_table("labels")
```

```py
linker.accuracy_chart_from_labels_column("ground_truth", add_metrics=["f1"])
```
Review comment (Contributor):

I think this may need to be changed to linker.threshold_selection_tool_from_labels_table(...)

@ThomasHepworth (Contributor)

Sorry, a quick question on this: should we set a default value for the match threshold selection?

Initially, the chart looks like this for me: the confusion matrix doesn't appear until you hover over one of the graphs.

@aalexandersson (Contributor)

I definitely would like to see the confusion matrix by default. It does not help to hover over one of the graphs when the output is PDF :-)
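One way to address the hover-only confusion matrix would be to give the chart's selection parameter an initial value, so the threshold-dependent views render before any interaction. The sketch below is illustrative only; the parameter and field names are hypothetical and this is not the PR's actual spec.

```py
# Hypothetical Vega-Lite selection parameter (as a Python dict) with a default
# value, so the confusion matrix is populated before the user hovers.
threshold_param = {
    "name": "threshold",
    "select": {"type": "point", "on": "mouseover", "encodings": ["x"]},
    "value": [{"match_weight": 0}],  # start with the selection at match weight 0
}
```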
