Use a threshold for prediction for pairs #131
This threshold could be tuned (e.g. on the training set) to achieve a given level of precision.
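As a rough illustration of tuning the threshold for a target precision, here is a minimal NumPy sketch. It assumes each pair has a score where higher means "more similar" (e.g. a negated distance) and a binary label; the name threshold_for_precision is hypothetical and not part of metric-learn or scikit-learn.

```python
import numpy as np


def threshold_for_precision(scores, labels, min_precision):
    """Return the lowest threshold whose precision is >= min_precision.

    A pair is predicted positive when its score is >= the threshold.
    Picking the lowest admissible threshold keeps recall as high as possible.
    Returns None if no candidate threshold reaches the requested precision.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Candidate thresholds are the observed scores, in ascending order.
    for t in np.unique(scores):
        predicted_pos = scores >= t  # never empty: t is one of the scores
        true_pos = np.sum(predicted_pos & labels)
        precision = true_pos / predicted_pos.sum()
        if precision >= min_precision:
            return float(t)
    return None


threshold = threshold_for_precision(
    scores=[0.1, 0.4, 0.35, 0.8], labels=[0, 0, 1, 1], min_precision=0.6
)
```

In practice the threshold would be chosen on training or validation pairs and then reused unchanged at prediction time.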
1/ There's this recent PR in scikit-learn that is related: scikit-learn/scikit-learn#10117. They use a meta-estimator that takes as an argument the estimator we want to threshold. It seems pretty close to what we want to do: it allows specifying a precision level, it can avoid refitting the model if needed, etc. However, it does not have an option for setting a threshold that just optimizes accuracy. Indeed, they wouldn't need it if all binary classifiers in scikit-learn optimized a cost function that gives the highest accuracy at a known threshold (e.g. 0.5), which would mean that accuracy is the high-level metric optimized by default. I'll try to investigate that, but maybe you already know the answer? If we find a case in scikit-learn where we want to choose a threshold to optimize accuracy, we could mention it in the PR so that it gets added there. Otherwise I think we would need to implement it in metric-learn.
What is more, maybe it would be good to have this (the accuracy-maximizing thresholding) by default in pairs metric learners, so that they directly have a predict function and we don't need a meta-estimator for it? For more sophisticated selections (with precision, etc.), users would then use the scikit-learn meta-estimator, which should be compatible?
2/ Also related, there's this class (from the module sklearn.calibration) that allows calibrating the predictions. It provides a good ... The only problem is that I could only use it with a preprocessor: at fit time they call check_array, which doesn't accept 3D arrays. I guess I could file an issue in scikit-learn about that, because it's a bit odd: GridSearchCV, for instance, doesn't do that and is still a meta-estimator.
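The accuracy-maximizing thresholding mentioned in point 1/ could look roughly like the sketch below. It is only an assumption about how such a default might work, not code from metric-learn or from the scikit-learn PR: it takes precomputed pair distances, labels in {+1, -1} (+1 for similar pairs), and the hypothetical function name best_accuracy_threshold.

```python
import numpy as np


def best_accuracy_threshold(distances, y):
    """Pick the distance threshold that maximizes accuracy on labeled pairs.

    A pair is predicted similar (+1) when its distance is <= the threshold,
    and dissimilar (-1) otherwise. Ties keep the smallest threshold.
    """
    distances = np.asarray(distances, dtype=float)
    y = np.asarray(y)
    # -inf covers the "predict everything dissimilar" case.
    candidates = np.concatenate(([-np.inf], np.unique(distances)))
    best_t, best_acc = -np.inf, -1.0
    for t in candidates:
        pred = np.where(distances <= t, 1, -1)
        acc = np.mean(pred == y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return float(best_t), float(best_acc)


t, acc = best_accuracy_threshold([0.2, 0.5, 1.5, 2.0], [1, 1, -1, -1])
```

The loop is quadratic in the number of pairs; sorting once and using cumulative counts would be the obvious optimization if this ever became a default.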
@wdevazelhes in the current implementation of scikit-learn/scikit-learn#10117 you can tune the threshold either for the optimal point on the ROC curve or for fbeta, where you get to choose the parameters. So with beta == 1 you are optimising for f1. So far I haven't seen any reason to add accuracy as well, since f1 is usually more useful, but it can easily be added. I plan to make some progress next week.
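For readers unfamiliar with fbeta-based tuning, the idea can be sketched in plain NumPy as follows. This is not the implementation from the scikit-learn PR; the function name best_fbeta_threshold and the convention "predict positive when score >= threshold" are assumptions made for the example.

```python
import numpy as np


def best_fbeta_threshold(scores, labels, beta=1.0):
    """Return (threshold, F-beta) maximizing F-beta over observed scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    b2 = beta ** 2
    best_t, best_f = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        # F-beta written in terms of counts:
        # (1 + b^2) * tp / ((1 + b^2) * tp + b^2 * fn + fp)
        denom = (1 + b2) * tp + b2 * fn + fp
        f = (1 + b2) * tp / denom if denom > 0 else 0.0
        if f > best_f:
            best_t, best_f = float(t), float(f)
    return best_t, best_f


t, f1 = best_fbeta_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1], beta=1.0)
```

With beta == 1 this reduces to F1; a larger beta weights recall more heavily, a smaller one weights precision more heavily.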
@PGryllos thanks for your comment, and great PR in scikit-learn! I would say that accuracy would be the simplest and most natural choice in our case, as a default function for having a ...
I just raised an issue in scikit-learn for using the ...
A pairs predictor should be able to predict a binary result when given a pair, like a classifier, when we call the predict function. This could be done by using a threshold and comparing the similarity score for the pair to this threshold.
TODO:
Also: replace score_pairs by a better name like compute_distances (see comment [MRG] Refactor the metric() method #152 (comment))
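To make the proposed predict behaviour concrete, here is a minimal sketch. The class EuclideanPairsPredictor is a hypothetical stand-in, not a metric-learn estimator: it uses a plain Euclidean distance instead of a learned metric, assumes pairs come as an array of shape (n_pairs, 2, n_features), and assumes the +1 (similar) / -1 (dissimilar) label convention.

```python
import numpy as np


class EuclideanPairsPredictor:
    """Hypothetical pairs predictor: thresholded Euclidean distance."""

    def __init__(self, threshold):
        self.threshold = threshold

    def compute_distances(self, pairs):
        # pairs has shape (n_pairs, 2, n_features).
        pairs = np.asarray(pairs, dtype=float)
        return np.linalg.norm(pairs[:, 0] - pairs[:, 1], axis=1)

    def predict(self, pairs):
        # +1 when the two points are close enough, -1 otherwise.
        distances = self.compute_distances(pairs)
        return np.where(distances <= self.threshold, 1, -1)


pairs = [[[0.0, 0.0], [0.0, 1.0]],   # distance 1
         [[0.0, 0.0], [3.0, 4.0]]]   # distance 5
predictions = EuclideanPairsPredictor(threshold=2.0).predict(pairs)
```

A real implementation would compute distances with the learned metric and set the threshold during fit or in a separate calibration step.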