
Use a threshold for prediction for pairs #131

Closed
1 task
wdevazelhes opened this issue Nov 13, 2018 · 5 comments · Fixed by #168

wdevazelhes commented Nov 13, 2018

A pairs predictor should be able to predict a binary label when given a pair, like a classifier, when the predict function is called. This could be done by comparing the pair's similarity score to a threshold.

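The idea could be sketched roughly as follows (a minimal sketch: the class name, the `threshold_` attribute, and the negative-euclidean-distance score are all illustrative assumptions, not the metric-learn API):

```python
import numpy as np

# Illustrative sketch only: names and the scoring rule are assumptions,
# not the actual metric-learn API.
class ThresholdedPairsPredictor:
    def __init__(self, threshold=0.0):
        self.threshold_ = threshold

    def decision_function(self, pairs):
        # Toy similarity score: negative euclidean distance between the
        # two points of each pair (higher = more similar).
        pairs = np.asarray(pairs, dtype=float)
        return -np.linalg.norm(pairs[:, 0] - pairs[:, 1], axis=1)

    def predict(self, pairs):
        # +1 for pairs scored as "similar", -1 otherwise.
        scores = self.decision_function(pairs)
        return np.where(scores >= self.threshold_, 1, -1)
```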


bellet commented Nov 13, 2018

This threshold could be tuned (e.g. on the training set) so as to achieve a given level of precision.
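One way to pick such a threshold, sketched under the assumptions that higher scores mean "more similar" and positive pairs are labeled +1 (the helper name is made up): among thresholds that reach the requested precision on the given scores, keep the lowest one, since it maximizes recall.

```python
import numpy as np

def threshold_for_precision(scores, y_true, min_precision):
    # Hypothetical helper: find the lowest threshold whose predicted
    # positives (score >= threshold) reach `min_precision`.
    order = np.argsort(scores)[::-1]          # most confident first
    scores, y_true = scores[order], y_true[order]
    best = None
    for i, t in enumerate(scores):
        # At threshold t, the predicted positives are the first i+1 pairs.
        precision = np.mean(y_true[: i + 1] == 1)
        if precision >= min_precision:
            best = t                          # lower t -> higher recall
    return best
```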

@bellet bellet added this to the v0.5.0 milestone Dec 20, 2018

wdevazelhes commented Jan 9, 2019

1/ There's this recent PR in scikit-learn that is related: scikit-learn/scikit-learn#10117

They use a meta-estimator that takes the estimator we want to threshold as an argument. It seems pretty close to what we want: it allows specifying a precision level, avoiding a refit of the model if needed, etc.

However, it has no option for setting the threshold that simply maximizes accuracy. Indeed, they wouldn't need one if every binary classifier in scikit-learn optimized a cost function whose accuracy is maximized at a known threshold (e.g. 0.5), i.e. if accuracy were the high-level metric optimized by default.

I'll try to investigate that, but maybe you already know the answer?

If we find a case in scikit-learn where one would want to choose a threshold that optimizes accuracy, we could mention it in the PR so it gets added there. Otherwise, I think we would need to implement it in metric-learn.

What is more, maybe it would be good to have this accuracy-maximizing thresholding by default in pairs metric learners, so that they directly have a predict function and we don't need a meta-estimator for it. For more sophisticated selections (e.g. targeting a precision level), they could then use the scikit-learn meta-estimator, which should be compatible.
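The accuracy-maximizing thresholding mentioned above could look like this (a sketch, not the eventual metric-learn implementation; labels are assumed to be in {-1, +1} and the function name is made up):

```python
import numpy as np

def threshold_for_accuracy(scores, y_true):
    # Try every observed score as a candidate cut-off and keep the one
    # giving the highest accuracy on (scores, y_true).
    best_t, best_acc = None, -1.0
    for t in np.unique(scores):
        acc = np.mean(np.where(scores >= t, 1, -1) == y_true)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t
```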

2/ Also related: the class sklearn.calibration.CalibratedClassifierCV calibrates predictions, providing a good predict_proba to estimators that lack one or have a poor one (I don't think we need to put it in the code, but we could mention in the docs that it can be used):

https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html#sklearn.calibration.CalibratedClassifierCV

The only problem is that I could only use it with a preprocessor: at fit time it calls check_array, which doesn't accept 3D arrays. I guess I could file an issue in scikit-learn about that, since it's a bit odd; GridSearchCV, for instance, is also a meta-estimator and doesn't do that.
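One workaround could be sketched like this (an assumption for illustration, not the eventual solution: it sidesteps the 3D-array issue entirely by hand-crafting a 2D feature per pair, here the absolute difference of the two points, and calibrating a plain scikit-learn classifier on it):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

rng = np.random.RandomState(42)
pairs = rng.randn(200, 2, 5)            # (n_pairs, 2, n_features): 3D
y = rng.choice([-1, 1], size=200)

# check_array would reject the 3D `pairs`, so build a 2D representation
# (absolute difference of the two points of each pair) instead.
X = np.abs(pairs[:, 0] - pairs[:, 1])

# CalibratedClassifierCV supplies predict_proba to LinearSVC, which
# only has a decision_function.
clf = CalibratedClassifierCV(LinearSVC(), cv=3).fit(X, y)
proba = clf.predict_proba(X)            # one probability row per pair
```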

@PGryllos

@wdevazelhes in the current implementation of scikit-learn/scikit-learn#10117, you can tune the threshold either for the optimal ROC curve point or for fbeta, where you get to choose the parameters. So with beta == 1 you are optimizing for f1. So far I didn't see any reason to add accuracy as well, as f1 is usually more useful, but it can easily be added; I plan to make some progress next week.
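The fbeta-style tuning with beta == 1 (i.e. F1) could be sketched with precision_recall_curve (the helper name is made up; labels assumed in {0, 1} with 1 as the positive class):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def threshold_for_f1(scores, y_true):
    # precision_recall_curve evaluates every candidate threshold; pick
    # the one with the best F1.
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    # The final (precision, recall) point has no threshold: drop it.
    return thresholds[np.argmax(f1[:-1])]
```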

@wdevazelhes

@PGryllos thanks for your comment, and great PR in scikit-learn! I would say that accuracy is the simpler/more natural default in our case for providing a predict function (in our setting of metric learning on pairs, there is no natural threshold, so if we don't set one we cannot predict). But what do you think @bellet? Maybe the f1 score would be a more natural default?

@wdevazelhes

I just raised an issue in scikit-learn about using CalibratedClassifierCV, see scikit-learn/scikit-learn#13077
