
Threshold selection tool #2003

Merged: 14 commits merged into master on Mar 13, 2024
Conversation

@samnlindsay (Contributor) commented Feb 27, 2024

Type of PR

  • BUG
  • FEAT
  • MAINT
  • DOC

Is your Pull Request linked to an existing Issue or Pull Request?

Relevant to #1564 in that it ties together several different charts into a single tool.

Creating draft PR while I gather feedback and then I'll add docs etc.

Do we want to replace the individual charts that this combines, or add this as another function alongside them? Keeping all of them duplicates a lot of code and makes them harder to maintain, especially if some aren't being used.

@ThomasHepworth FYI

Give a brief description for the solution you have provided

Combined interactive chart to:

  • relate match weight threshold to a probability
  • show the confusion matrix as a function of this threshold
  • show the impact of threshold selection on performance metrics to aid threshold selection

Screenshot (using pairwise_labels.ipynb)

image
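For orientation, here is a hedged sketch of how the combined tool might be called from a notebook. The method name follows the `threshold_selection_tool_from_labels_table` suggestion that comes up later in this thread, and the `add_metrics` argument mirrors the existing `accuracy_chart_from_labels_column`; neither signature is confirmed here.

```py
# Sketch only: `linker` is assumed to be a trained Splink Linker and "labels"
# a labels table already registered with it. The method name and add_metrics
# argument are taken from later discussion in this thread, not a confirmed API.
chart = linker.threshold_selection_tool_from_labels_table(
    "labels",            # registered table of clerical labels
    add_metrics=["f1"],  # assumed: extra metrics plotted alongside precision/recall
)
chart  # in a notebook this renders the combined interactive chart
```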

PR Checklist

  • Added documentation for changes
  • Added feature to example notebooks or tutorial (if appropriate)
  • Added tests (if appropriate)
  • Updated CHANGELOG.md (if appropriate)
  • Made changes based off the latest version of Splink
  • Run the linter

@ThomasHepworth (Contributor)

Yeah, I really like this; it's essentially the direction I was going in.

I was leaning more towards simply combining the confusion matrix with the accuracy chart to reduce the amount of information on a user's screen:
[screenshot]
(Ignore the lack of data in the accuracy chart; it was built with minimal data to keep the JSON simple in the editor.)

I can see the merit of showing the sigmoid'd match weight chart, so I am very much indifferent on this one.

I'll leave some additional thoughts tomorrow morning when I'm feeling a tad better.

@RobinL (Member) commented Feb 27, 2024

I don't feel at all strongly about it, but I think I agree with Tom that the sigmoid doesn't add that much, especially if we used the waterfall chart's trick of having two axes (in this case on the x-axis) that map between match weight and match probability on the main precision-recall chart.

image

@samnlindsay (Contributor, Author) commented Feb 27, 2024

Thanks both. I agree that from a purely information perspective the sigmoid doesn't add anything, but I think it helps make the information more accessible by facilitating different approaches to it. Whether you want to wiggle your cursor up and down the match probability axis or left and right across the match weight axis, it works equally well, while also literally drawing your attention to how your preferred measure maps onto the other.

For example:

  • No intuition, naive approach - "match probability = 0.5 is the most sensible threshold right?".
  • Partially informed approach - "The data linking team said cluster_medium was a threshold of 0.99. Is that really that different from 0.95?"
    • If what you know is probabilities, it's easier to navigate these charts using the 0-1 y-axis of the sigmoid.
  • Maximum insight in minimal time approach - "What's the best this model can do? [goes to intersection of precision and recall curves - sees the sigmoid curve highlighted] Wow, that's pretty low!"
    • The visual element of seeing the probability close to 0 rather than 1 (for me anyway) makes the information much easier to digest than having to read off a weird probability scale like on the waterfall chart, or reading a value off a tooltip.

I've never liked that axis on the waterfall chart, but that's a rare example of something that only works in a match weight context, so the funky axis is the only way to visually cater for match probability.
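For readers following the axis discussion, here is a minimal sketch of the match weight to match probability mapping being referred to, assuming Splink's usual definition of match weight as log2 of the Bayes factor (the "sigmoid" is then the inverse-odds transform):

```py
import math

def match_weight_to_probability(w: float) -> float:
    # The 'sigmoid' curve discussed above: p = 2**w / (1 + 2**w)
    return 2**w / (1 + 2**w)

def probability_to_match_weight(p: float) -> float:
    # Inverse mapping: match weight is log2 of the odds p / (1 - p)
    return math.log2(p / (1 - p))

# A match weight of 0 maps to probability 0.5; a probability of 0.99 maps to
# a match weight of roughly 6.6, which is why the two axes can be shown together.
```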

@samnlindsay (Contributor, Author) commented Feb 27, 2024

For comparison, here's how it might look without the sigmoid

image

@aalexandersson (Contributor) commented Feb 28, 2024

Is "score" a match probability? If yes, then it seems possible to optionally add back the sigmoid but still only use two charts (that is the confusion matrix plus one combined chart)!?

In the "Score" chart, the y-axis begins at 0.5. How does it look if it begins at 0? For titles, I suggest the y-axis title "Match probability (Score)" and the x-axis title "Match weight". Then, what does it look like to add (overlay) the sigmoid curve on this chart?

@RobinL (Member) commented Feb 28, 2024

I think the y-axis 'score' refers to the precision or recall score (just 'score' because the axis is shared by both metrics). But I can see that's a potential source of confusion and perhaps we could find a better name ('performance metric score', perhaps?).

@aalexandersson (Contributor)

The name 'performance metric score' is more general than 'precision or recall score', so it is better in that sense. My suggestion is to overlay (superimpose) the sigmoid so that the result is one combined chart with two chart areas, not three. This way, we do not require a separate chart area for the sigmoid.

@samnlindsay (Contributor, Author) commented Feb 28, 2024

I experimented with being able to zoom in on the y ("Score") axis because the [0,1] range was too big to be useful, but I gave up because it was too difficult to constrain it the way I wanted (with 1 always the maximum).

The reason for the compromise shown here is that if precision/recall/F1 are below 0.5, your model isn't worth pursuing. The area of interest is loosely "close to 1" on the score axis. In the example shown, the ideal range would be something like [0.85,1], and a [0,1] range would contain a lot of whitespace and make it difficult to see the detail where it's needed.

Here's an example showing a less ambiguous y-axis title and with the full [0,1] range:

image
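On the difficulty of constraining the Score axis so that 1 is always the maximum: below is a minimal sketch (written as Python dicts, assuming the chart is a raw Vega-Lite spec that can be edited before rendering) of two ways a y-encoding could be constrained. The field name `score` is an assumption here, and none of this is the PR's actual spec.

```py
# Hypothetical y-encodings for the "Performance metric value" axis, assuming the
# chart spec is edited directly as a dict before rendering.
y_fixed_range = {
    "field": "score",  # assumed field name
    "type": "quantitative",
    "title": "Performance metric value",
    "scale": {"domain": [0.5, 1], "clamp": True},  # the [0.5, 1] compromise above
}
y_pinned_max = {
    "field": "score",
    "type": "quantitative",
    "title": "Performance metric value",
    "scale": {"domainMax": 1},  # pin 1 as the maximum, let the minimum adapt to the data
}
```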

@aalexandersson (Contributor) commented Feb 29, 2024

Well, I think this latest combined chart is easier to interpret thanks to the full [0,1] range on the y-axis. For completeness, could you add back the sigmoid chart? That is, I would like to see the first chart, but modified to use the full [0,1] range on the y-axis as in the latest chart.

@samnlindsay (Contributor, Author)

Following this discussion I have edited the chart as follows:
image

Given the "action" is all at the top end of the "Performance metric value" scale, I've left it at [0.5,1] to show more detail in the data.

@samnlindsay samnlindsay marked this pull request as ready for review February 29, 2024 18:20
@RobinL (Member) commented Mar 2, 2024

Should we remove linker.confusion_matrix_from_labels_table? It seems superfluous to have both.

Given the functionality is so similar, I wouldn't be worried about breaking backwards compatibility.

@samnlindsay (Contributor, Author) commented Mar 4, 2024

Yes, I was thinking about that in #2010.

@RobinL (Member) commented Mar 5, 2024

@samnlindsay I edited the above because you referenced 2020, but I think you probably meant 2010.

@samnlindsay (Contributor, Author)

Yes, sorry, I was on my phone.

@RobinL (Member) commented Mar 5, 2024

No worries, just didn't want to edit without you knowing in case I did it wrong!

@RobinL (Member) left a review comment:

The new code looks good to me.

Please can we remove linker.confusion_matrix_from_labels_table and associated code/docs as part of this PR?

splink/linker.py: review suggestions (outdated, resolved)
samnlindsay and others added 4 commits March 11, 2024 16:30
Co-authored-by: Robin Linacre <robin.linacre@digital.justice.gov.uk>
Co-authored-by: Robin Linacre <robin.linacre@digital.justice.gov.uk>
@samnlindsay samnlindsay requested a review from RobinL March 11, 2024 21:54
@samnlindsay (Contributor, Author)

confusion_matrix_from_labels_XXX functions and references to them have been removed.

The one exception is the blog post announcing the confusion matrix chart as a new feature. The image shown is unchanged but it now links to the new threshold selection tool notebook instead of the confusion matrix one.

@samnlindsay samnlindsay merged commit 8b063f0 into master Mar 13, 2024
13 checks passed
@samnlindsay samnlindsay deleted the threshold_selector branch March 13, 2024 15:23
```py
linker.confusion_matrix_from_labels_table("labels")
```

```py
linker.accuracy_chart_from_labels_column("ground_truth", add_metrics=["f1"])
```
Review comment (Contributor):

I think this may need to be changed to linker.threshold_selection_tool_from_labels_table(...)

@ThomasHepworth (Contributor)

Sorry, a quick question on this: should we set a default value for the match threshold selection?

Initially, the chart looks like this for me: the confusion matrix doesn't appear until you hover over one of the graphs.

@aalexandersson (Contributor)

I definitely would like to see the confusion matrix by default. It does not help to hover over one of the graphs when the output is PDF :-)
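One way to address the hover-only confusion matrix would be to give the chart's selection parameter an initial value, so the threshold-dependent views render before any interaction. The sketch below is illustrative only; the parameter and field names are hypothetical and this is not the PR's actual spec.

```py
# Hypothetical Vega-Lite selection parameter (as a Python dict) with a default
# value, so the confusion matrix is populated before the user hovers.
threshold_param = {
    "name": "threshold",
    "select": {"type": "point", "on": "mouseover", "encodings": ["x"]},
    "value": [{"match_weight": 0}],  # start with the selection at match weight 0
}
```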
