[MRG+1] Threshold for pairs learners #168

Merged
43 commits merged on Apr 15, 2019
Changes from 1 commit
Commits (43)
676ab86
add some tests for testing that different scores work using the scori…
Feb 4, 2019
cc1c3e6
ENH: Add tests and basic threshold implementation
Feb 5, 2019
f95c456
Add support for LSML and more generally quadruplets
Feb 6, 2019
9ffe8f7
Make CalibratedClassifierCV work (for preprocessor case) thanks to cl…
Feb 6, 2019
3354fb1
Fix some tests and PEP8 errors
Feb 7, 2019
12cb5f1
change the sign in decision function
Feb 19, 2019
dd8113e
Add docstring for threshold_ and classes_ in the base _PairsClassifie…
Feb 19, 2019
1c8cd29
remove quadruplets from the test with scikit learn custom scorings
Feb 19, 2019
d12729a
Remove argument y in quadruplets learners and lsml
Feb 20, 2019
dc9e21d
FIX fix docstrings of decision functions
Feb 20, 2019
402729f
FIX the threshold by taking the opposite (to be adapted to the decisi…
Feb 20, 2019
aaac3de
Fix tests to have no y for quadruplets' estimator fit
Feb 21, 2019
e5b1e47
Remove isin to be compatible with old numpy versions
Feb 21, 2019
a0cb3ca
Fix threshold so that it has a positive value and add small test
Feb 21, 2019
8d5fc50
Fix threshold for itml
Feb 21, 2019
0f14b25
FEAT: Add calibrate_threshold and tests
Mar 4, 2019
a6458a2
MAINT: remove starred syntax for compatibility with older versions of…
Mar 5, 2019
fada5cc
Remove debugging prints and make tests for ITML pass, while waiting f…
Mar 5, 2019
32a4889
FIX: from __future__ import division to pass tests for python 2.7
Mar 5, 2019
5cf71b9
Add some documentation for calibration
Mar 11, 2019
c2bc693
DOC: fix style
Mar 11, 2019
e96ee00
Merge branch 'master' into feat/add_threshold
Mar 21, 2019
3ed3430
Address most comments from aurelien's reviews
Mar 21, 2019
69c6945
Remove classes_ attribute and test for CalibratedClassifierCV
Mar 21, 2019
bc39392
Rename make_args_inc_quadruplets into remove_y_quadruplets
Mar 21, 2019
facc546
TST: Fix remaining threshold into min_rate
Mar 21, 2019
f0ca65e
Remove default_threshold and put calibrate_threshold instead
Mar 21, 2019
a6ec283
Use calibrate_threshold for ITML, and remove description
Mar 21, 2019
49fbbd7
ENH: use calibrate_threshold by default and display its parameters fr…
Mar 21, 2019
960b174
Add a small test to test automatic calibration
Mar 21, 2019
c91acf7
Update documentation of the default threshold
Mar 21, 2019
a742186
Inverse sense for threshold comparison to be more intuitive
Mar 21, 2019
9ec1ead
Address remaining review comments
Mar 21, 2019
986fed3
MAINT: Rename threshold_params into calibration_params
Mar 26, 2019
3f5d6d1
TST: Add test for extreme cases
Mar 26, 2019
7b5e4dd
MAINT: rename threshold_params into calibration_params
Mar 26, 2019
a3ec02c
MAINT: rename threshold_params into calibration_params
Mar 26, 2019
ccc66eb
FIX: Make tests work, and add the right threshold (mean between lowes…
Mar 27, 2019
6dff15b
Merge branch 'master' into feat/add_threshold
Mar 27, 2019
719d018
Go back to previous version of finding the threshold
Apr 2, 2019
551d161
Extract method for validating calibration parameters
Apr 2, 2019
594c485
Validate calibration params before fit
Apr 2, 2019
14713c6
Address https://github.com/metric-learn/metric-learn/pull/168#discuss…
Apr 2, 2019
71 changes: 55 additions & 16 deletions metric_learn/base_metric.py
@@ -1,7 +1,8 @@
from sklearn.base import BaseEstimator
from sklearn.metrics.ranking import _binary_clf_curve
from sklearn.utils.extmath import stable_cumsum
from sklearn.utils.validation import _is_arraylike, check_is_fitted
from sklearn.metrics import roc_auc_score, precision_recall_curve, roc_curve
from sklearn.metrics import roc_auc_score, roc_curve
import numpy as np
from abc import ABCMeta, abstractmethod
import six
@@ -490,20 +491,39 @@ def calibrate_threshold(self, pairs_valid, y_valid, strategy='accuracy',
scores_sorted = scores[scores_sorted_idces]
# true labels ordered by decision_function value: (higher first)
y_ordered = y_valid[scores_sorted_idces]
# we need to add a threshold that will reject all points
scores_sorted = np.concatenate([[scores_sorted[0] + 1], scores_sorted])

# finds the threshold that maximizes the accuracy:
cum_tp = stable_cumsum(y_ordered == 1) # cumulative number of true
# positives
# we need to add the point where all samples are rejected:
cum_tp = np.concatenate([[0.], cum_tp])
cum_tn_inverted = stable_cumsum(y_ordered[::-1] == -1)
cum_tn = np.concatenate([[0], cum_tn_inverted[:-1]])[::-1]
cum_tn = np.concatenate([[0.], cum_tn_inverted])[::-1]
Member Author: We now need to include the last element of cum_tn_inverted, since it corresponds to the threshold at which all samples are rejected.
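
For illustration, a standalone sketch of this cumulative-sum logic (not code from this PR; np.cumsum stands in for stable_cumsum and the data is made up):

```python
import numpy as np

def accuracies_per_candidate_threshold(scores, y):
    """Accuracy obtained when accepting the k highest-scoring pairs,
    for every k = 0 .. n, mirroring the cumulative-sum logic above."""
    order = np.argsort(-scores)                 # higher scores first
    y_ordered = y[order]
    n = len(y)
    # cum_tp[k]: positives among the k accepted (highest-scoring) pairs
    cum_tp = np.concatenate([[0.], np.cumsum(y_ordered == 1)])
    # cum_tn[k]: negatives among the n - k rejected (lowest-scoring) pairs;
    # keeping the last entry of the reversed cumsum covers the
    # "reject everything" candidate (k = 0)
    cum_tn_inverted = np.cumsum(y_ordered[::-1] == -1)
    cum_tn = np.concatenate([[0.], cum_tn_inverted])[::-1]
    return (cum_tp + cum_tn) / n

scores = np.array([3., 2., 1., 0.])
y = np.array([1, -1, -1, -1])
print(accuracies_per_candidate_threshold(scores, y))
# [0.75 1.   0.75 0.5  0.25] -> best: accept only the top-scoring pair
```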

cum_accuracy = (cum_tp + cum_tn) / n_samples
imax = np.argmax(cum_accuracy)
# note: we want a positive threshold (distance), so we take - threshold
self.threshold_ = - scores_sorted[imax]
if imax == len(scores_sorted): # if the best is to accept all points
# we set the threshold to (minus) [the lowest score - 1]
self.threshold_ = - (scores_sorted[imax] - 1)
else:
# otherwise, we set the threshold to the mean between the lowest
# accepted score and the highest accepted score
self.threshold_ = - np.mean(scores_sorted[imax: imax + 2])
# note: if the best is to reject all points it's already one of the
# thresholds (scores_sorted[0] + 1)
return self

if strategy == 'f_beta':
precision, recall, thresholds = precision_recall_curve(
fps, tps, thresholds = _binary_clf_curve(
y_valid, self.decision_function(pairs_valid), pos_label=1)

Member Author: We need to take part of the code (but not all) from precision_recall_curve here, because precision_recall_curve contains a section that drops points once the maximum recall is reached. That was fine when we set the threshold to the exact value of the last accepted point, but it does not work with the new computation of the threshold, because we need all the points in order to place the threshold midway between two of them. (Otherwise the algorithm can mistake a point for the last one in the list and set the threshold to that value - 1, whereas it should be the mean between that value and the next one.) I can elaborate on this if needed.
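
As a rough illustration of the difference (a sketch with made-up data, not part of this diff; the sklearn.metrics.ranking import path matches the scikit-learn versions this PR targets, newer releases expose the helper as sklearn.metrics._ranking._binary_clf_curve):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve
from sklearn.metrics.ranking import _binary_clf_curve  # private helper

y_true = np.array([1, 1, -1, -1])
scores = np.array([0.9, 0.8, 0.7, 0.1])

# precision_recall_curve drops the points that come after full recall
# is first reached, so the lowest thresholds disappear:
_, _, pr_thresholds = precision_recall_curve(y_true, scores)
print(pr_thresholds)              # e.g. [0.8 0.9]

# _binary_clf_curve keeps one point per distinct score value, which is
# needed to place the threshold midway between two consecutive scores:
fps, tps, thresholds = _binary_clf_curve(y_true, scores, pos_label=1)
print(thresholds)                 # [0.9 0.8 0.7 0.1]
```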

precision = tps / (tps + fps)
precision[np.isnan(precision)] = 0
recall = tps / tps[-1]

# here the thresholds are decreasing
# We ignore the warnings here, in the same taste as
# https://github.com/scikit-learn/scikit-learn/blob/62d205980446a1abc1065
# f4332fd74eee57fcf73/sklearn/metrics/classification.py#L1284
@@ -516,26 +536,45 @@ def calibrate_threshold(self, pairs_valid, y_valid, strategy='accuracy',
f_beta[np.isnan(f_beta)] = 0.
Member Author: We need to set NaNs to zero, otherwise they will be considered higher than the other values (also discussed in https://github.com/scikit-learn/scikit-learn/pull/10117/files#r262115773).

Member: Again, it would probably be good to mention this in a comment in the code.

Member Author: Agreed, done.
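
A minimal illustration of the NaN issue discussed in this thread (np.argmax treats a NaN entry as the maximum):

```python
import numpy as np

f_beta = np.array([0.5, np.nan, 0.7])
print(np.argmax(f_beta))   # 1 -> the NaN entry wins the argmax
f_beta[np.isnan(f_beta)] = 0.
print(np.argmax(f_beta))   # 2 -> the actual best f_beta value
```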

imax = np.argmax(f_beta)
# note: we want a positive threshold (distance), so we take - threshold
self.threshold_ = - thresholds[imax]
if imax == len(thresholds): # the best is to accept all points
# we set the threshold to (minus) [the lowest score - 1]
self.threshold_ = - (thresholds[imax] - 1)
else:
# otherwise, we set the threshold to the mean between the lowest
# accepted score and the highest rejected score
self.threshold_ = - np.mean(thresholds[imax: imax + 2])
# Note: we don't need to deal with rejecting all points (i.e. threshold =
# max_scores + 1), since this can never happen to be optimal
# (see a more detailed discussion in test_calibrate_threshold_extreme)
return self

fpr, tpr, thresholds = roc_curve(y_valid,
Member Author: Note that there is a particular corner case that is not explicitly coded here but comes from scikit-learn; we should decide whether it is what we want. In roc_curve, an additional point is added at (0, 0) (0 false positives and 0 true positives) if needed (i.e. if such a point is not already present), with an artificial threshold of thresholds[0] + 1 (see https://github.com/scikit-learn/scikit-learn/blob/62d205980446a1abc1065f4332fd74eee57fcf73/sklearn/metrics/ranking.py#L637-L642). This point represents the case where we reject all samples; the +1 makes sense since it rejects every sample from the training set. Is that what we want? It kind of makes sense, since it covers the case where we want to reject everything, but the +1 seems a bit arbitrary.
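
A quick way to observe this behaviour (a sketch with made-up data; the exact value of the artificial threshold depends on the scikit-learn version, recent releases use np.inf instead of max + 1):

```python
import numpy as np
from sklearn.metrics import roc_curve

# The highest-scoring pair is a negative one, so the curve does not start
# at (0, 0) naturally and roc_curve prepends the artificial point.
y_true = np.array([-1, 1, 1, -1])
scores = np.array([0.9, 0.8, 0.7, 0.1])

fpr, tpr, thresholds = roc_curve(y_true, scores, pos_label=1,
                                 drop_intermediate=False)
print(fpr[0], tpr[0])    # 0.0 0.0 -> the "reject everything" point
print(thresholds[0])     # 1.9 here, i.e. max(scores) + 1
```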

Member Author: Note also that some points can be dropped by roc_curve (see these lines of code: https://github.com/scikit-learn/scikit-learn/blob/62d205980446a1abc1065f4332fd74eee57fcf73/sklearn/metrics/ranking.py#L628-L635, and scikit-learn/scikit-learn#5237). This does not change the result, since it only removes suboptimal points (it keeps the optimal points, i.e. those with the highest tpr for a given tnr and the highest tnr for a given tpr). We could have avoided this deletion by setting drop_intermediate=False, but we do not need to, since the deletion is harmless.

Member: I would say it is fine this way. It does not seem very important anyway, so going with the sklearn approach is fine.

self.decision_function(pairs_valid),
pos_label=1)
pos_label=1, drop_intermediate=False)
Member Author: We need drop_intermediate=False here for a problem similar to the one above: drop_intermediate=True worked fine when we set the threshold to the exact value of a point, but if some points are removed, the mean between one point and the next can be wrong (because the "next" point is not the actual next point: the actual next one would have been dropped). drop_intermediate=False ensures that no point is dropped.
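
A small comparison illustrating the effect on the number of candidate thresholds (made-up data, not part of this diff):

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([-1, 1, -1, 1, 1, -1, -1, 1])
scores = np.array([0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55])

_, _, kept = roc_curve(y_true, scores, drop_intermediate=False)
_, _, pruned = roc_curve(y_true, scores)   # drop_intermediate=True (default)
print(len(kept), len(pruned))   # e.g. 9 7 -> two collinear points were dropped
```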

# here the thresholds are decreasing
fpr, tpr, thresholds = fpr, tpr, thresholds

if strategy == 'max_tpr':
indices = np.where(1 - fpr >= min_rate)[0]
max_tpr_index = np.argmax(tpr[indices])
# note: we want a positive threshold (distance), so we take - threshold
self.threshold_ = - thresholds[indices[max_tpr_index]]
if strategy in ['max_tpr', 'max_tnr']:
if strategy == 'max_tpr':
indices = np.where(1 - fpr >= min_rate)[0]
imax = np.argmax(tpr[indices])

if strategy == 'max_tnr':
indices = np.where(tpr >= min_rate)[0]
max_tnr_index = np.argmax(1 - fpr[indices])
if strategy == 'max_tnr':
indices = np.where(tpr >= min_rate)[0]
imax = np.argmax(1 - fpr[indices])

imax_valid = indices[imax]
# note: we want a positive threshold (distance), so we take - threshold
self.threshold_ = - thresholds[indices[max_tnr_index]]
return self
if indices[imax] == len(thresholds): # we want to accept everything
self.threshold_ = - (thresholds[imax_valid] - 1)
elif indices[imax] == 0: # we want to reject everything
# thanks to roc_curve, the first point should be always max_threshold
# + 1 (we should always go through the "if" statement in roc_curve),
# see: https://github.com/scikit-learn/scikit-learn/pull/13523
self.threshold_ = - (thresholds[imax_valid])
else:
self.threshold_ = - np.mean(thresholds[imax_valid: imax_valid + 2])
return self


class _QuadrupletsClassifierMixin(BaseMetricLearner):
29 changes: 16 additions & 13 deletions test/test_pairs_classifiers.py
@@ -3,6 +3,8 @@
from functools import partial

import pytest
from numpy.testing import assert_array_equal

from metric_learn.base_metric import _PairsClassifierMixin, MahalanobisMixin
from sklearn.exceptions import NotFittedError
from sklearn.metrics import (f1_score, accuracy_score, fbeta_score,
@@ -351,46 +353,47 @@ def fit(self, pairs, y, calibration_params=None):
return self

def decision_function(self, pairs):
return np.arange(7)
return np.arange(pairs.shape[0], dtype=float)

rng = np.random.RandomState(42)
pairs = rng.randn(7, 2, 5) # the info in X is not used, it's just for the
# API

y = [1, 1, 1, -1, -1, -1, -1]
y = [1., 1., 1., -1., -1., -1., -1.]
mock_clf = MockBadPairsClassifier()
# case of bad scoring with more negative than positives. In
# this case, when:
# optimizing for accuracy we should reject all points
mock_clf.fit(pairs, y, calibration_params={'strategy': 'accuracy'})
assert (mock_clf.predict(pairs) == - np.ones(7)).all()
assert_array_equal(mock_clf.predict(pairs), - np.ones(7))

# optimizing for max_tpr we should accept all points if min_rate == 0. (
# because by convention then tnr=0/0=0)
mock_clf.fit(pairs, y, calibration_params={'strategy': 'max_tpr',
'min_rate': 0.})
assert (mock_clf.predict(pairs) == np.ones(7)).all()
assert_array_equal(mock_clf.predict(pairs), np.ones(7))
# optimizing for max_tnr we should reject all points if min_rate = 0. (
# because by convention then tpr=0/0=0)
mock_clf.fit(pairs, y, calibration_params={'strategy': 'max_tnr',
'min_rate': 0.})
assert (mock_clf.predict(pairs) == - np.ones(7)).all()
assert_array_equal(mock_clf.predict(pairs), - np.ones(7))

y = [1, 1, 1, 1, -1, -1, -1]
y = [1., 1., 1., 1., -1., -1., -1.]
# case of bad scoring with more positives than negatives. In
# this case, when:
# optimizing for accuracy we should accept all points
mock_clf.fit(pairs, y, calibration_params={'strategy': 'accuracy'})
assert (mock_clf.predict(pairs) == np.ones(7)).all()
assert_array_equal(mock_clf.predict(pairs), np.ones(7))
# optimizing for max_tpr we should accept all points if min_rate == 0. (
# because by convention then tnr=0/0=0)
mock_clf.fit(pairs, y, calibration_params={'strategy': 'max_tpr',
'min_rate': 0.})
assert (mock_clf.predict(pairs) == np.ones(7)).all()
assert_array_equal(mock_clf.predict(pairs), np.ones(7))
# optimizing for max_tnr we should reject all points if min_rate = 0. (
# because by convention then tpr=0/0=0)
mock_clf.fit(pairs, y, calibration_params={'strategy': 'max_tnr',
'min_rate': 0.})
assert (mock_clf.predict(pairs) == - np.ones(7)).all()
assert_array_equal(mock_clf.predict(pairs), - np.ones(7))

# Note: we'll never find a case where we would reject all points for
# maximizing tpr (we can always accept more points), and accept all
@@ -399,10 +402,10 @@ def decision_function(self, pairs):
# case of alternated scores: for optimizing the f_1 score we should accept
# all points (because this way we have max recall (1) and max precision (
# here: 0.5))
y = [1, -1, 1, -1, 1, -1]
mock_clf.fit(pairs, y, calibration_params={'strategy': 'f_beta',
'beta': 1.})
assert (mock_clf.predict(pairs) == - np.ones(7)).all()
y = [1., -1., 1., -1., 1., -1.]
mock_clf.fit(pairs[:6], y, calibration_params={'strategy': 'f_beta',
'beta': 1.})
assert_array_equal(mock_clf.predict(pairs[:6]), np.ones(6))

# Note: for optimizing f_1 score, we will never find an optimal case where we
# reject all points because in this case we would have 0 precision (by