[MRG+1] Threshold for pairs learners #168

Merged
merged 43 commits into from
Apr 15, 2019
Changes from 38 commits
Commits
43 commits
676ab86
add some tests for testing that different scores work using the scori…
Feb 4, 2019
cc1c3e6
ENH: Add tests and basic threshold implementation
Feb 5, 2019
f95c456
Add support for LSML and more generally quadruplets
Feb 6, 2019
9ffe8f7
Make CalibratedClassifierCV work (for preprocessor case) thanks to cl…
Feb 6, 2019
3354fb1
Fix some tests and PEP8 errors
Feb 7, 2019
12cb5f1
change the sign in decision function
Feb 19, 2019
dd8113e
Add docstring for threshold_ and classes_ in the base _PairsClassifie…
Feb 19, 2019
1c8cd29
remove quadruplets from the test with scikit learn custom scorings
Feb 19, 2019
d12729a
Remove argument y in quadruplets learners and lsml
Feb 20, 2019
dc9e21d
FIX fix docstrings of decision functions
Feb 20, 2019
402729f
FIX the threshold by taking the opposite (to be adapted to the decisi…
Feb 20, 2019
aaac3de
Fix tests to have no y for quadruplets' estimator fit
Feb 21, 2019
e5b1e47
Remove isin to be compatible with old numpy versions
Feb 21, 2019
a0cb3ca
Fix threshold so that it has a positive value and add small test
Feb 21, 2019
8d5fc50
Fix threshold for itml
Feb 21, 2019
0f14b25
FEAT: Add calibrate_threshold and tests
Mar 4, 2019
a6458a2
MAINT: remove starred syntax for compatibility with older versions of…
Mar 5, 2019
fada5cc
Remove debugging prints and make tests for ITML pass, while waiting f…
Mar 5, 2019
32a4889
FIX: from __future__ import division to pass tests for python 2.7
Mar 5, 2019
5cf71b9
Add some documentation for calibration
Mar 11, 2019
c2bc693
DOC: fix style
Mar 11, 2019
e96ee00
Merge branch 'master' into feat/add_threshold
Mar 21, 2019
3ed3430
Address most comments from aurelien's reviews
Mar 21, 2019
69c6945
Remove classes_ attribute and test for CalibratedClassifierCV
Mar 21, 2019
bc39392
Rename make_args_inc_quadruplets into remove_y_quadruplets
Mar 21, 2019
facc546
TST: Fix remaining threshold into min_rate
Mar 21, 2019
f0ca65e
Remove default_threshold and put calibrate_threshold instead
Mar 21, 2019
a6ec283
Use calibrate_threshold for ITML, and remove description
Mar 21, 2019
49fbbd7
ENH: use calibrate_threshold by default and display its parameters fr…
Mar 21, 2019
960b174
Add a small test to test automatic calibration
Mar 21, 2019
c91acf7
Update documentation of the default threshold
Mar 21, 2019
a742186
Inverse sense for threshold comparison to be more intuitive
Mar 21, 2019
9ec1ead
Address remaining review comments
Mar 21, 2019
986fed3
MAINT: Rename threshold_params into calibration_params
Mar 26, 2019
3f5d6d1
TST: Add test for extreme cases
Mar 26, 2019
7b5e4dd
MAINT: rename threshold_params into calibration_params
Mar 26, 2019
a3ec02c
MAINT: rename threshold_params into calibration_params
Mar 26, 2019
ccc66eb
FIX: Make tests work, and add the right threshold (mean between lowes…
Mar 27, 2019
6dff15b
Merge branch 'master' into feat/add_threshold
Mar 27, 2019
719d018
Go back to previous version of finding the threshold
Apr 2, 2019
551d161
Extract method for validating calibration parameters
Apr 2, 2019
594c485
Validate calibration params before fit
Apr 2, 2019
14713c6
Address https://github.com/metric-learn/metric-learn/pull/168#discuss…
Apr 2, 2019

117 changes: 83 additions & 34 deletions doc/weakly_supervised.rst
@@ -148,8 +148,47 @@ tuples you're working with (pairs, triplets...). See the docstring of the
`score` method of the estimator you use.


Learning on pairs
=================

Some metric learning algorithms learn on pairs of samples. In this case, one
should provide the algorithm with ``n_samples`` pairs of points, with a
corresponding target containing ``n_samples`` values, each either +1 or -1.
These values indicate whether each pair is made of similar points (+1) or of
dissimilar points (-1).
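
As a minimal sketch of this input format, pairs can be written as a nested
list of shape ``(n_samples, 2, n_features)`` alongside a list of +1/-1 labels.
The helper ``check_pairs`` below is hypothetical: it only illustrates the
expected layout and is not metric-learn's own input validation.

```python
# Hypothetical helper (not part of metric-learn) that checks the layout
# described above: n_samples pairs, each made of 2 points with the same
# number of features, and one +1/-1 label per pair.
def check_pairs(pairs, y):
    if len(pairs) != len(y):
        raise ValueError("pairs and y must have the same length (n_samples)")
    n_features = len(pairs[0][0])
    for pair in pairs:
        if len(pair) != 2:
            raise ValueError("each pair must contain exactly 2 points")
        if any(len(point) != n_features for point in pair):
            raise ValueError("all points must have the same number of features")
    if any(label not in (1, -1) for label in y):
        raise ValueError("labels must be +1 (similar) or -1 (dissimilar)")
    return len(pairs), n_features


# Toy data: 3 pairs of 2-dimensional points.
pairs = [[[1.2, 3.8], [1.3, 3.6]],   # two nearby points, labeled similar (+1)
         [[4.5, 2.3], [0.1, 9.4]],   # two distant points, labeled dissimilar (-1)
         [[2.0, 2.1], [2.2, 1.9]]]   # two nearby points, labeled similar (+1)
y = [1, -1, 1]

n_samples, n_features = check_pairs(pairs, y)  # (3, 2)
```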


.. _calibration:

Thresholding
------------
In order to predict whether a new pair represents similar or dissimilar
samples, we need to set a distance threshold, so that points closer (in the
learned space) than this threshold are predicted as similar, and points farther
away are predicted as dissimilar. Several methods are available to set this
threshold.
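
The prediction rule can be sketched in plain Python as below. This is an
illustration, not metric-learn's implementation: it assumes the learned metric
is a Mahalanobis distance given by a linear transformation ``L`` (so the
distance between two points is the Euclidean distance between their images
under ``L``), and the names ``learned_distance`` and ``predict_pairs`` are
hypothetical. A pair whose distance equals the threshold exactly is predicted
as similar here, which is an arbitrary choice.

```python
import math


def learned_distance(L, a, b):
    """Euclidean distance between L a and L b, i.e. a Mahalanobis distance
    with matrix M = L^T L (L is assumed to be already learned)."""
    diff = [ai - bi for ai, bi in zip(a, b)]
    projected = [sum(r * d for r, d in zip(row, diff)) for row in L]
    return math.sqrt(sum(p * p for p in projected))


def predict_pairs(L, pairs, threshold):
    """+1 (similar) if the learned distance is at most `threshold`,
    -1 (dissimilar) otherwise."""
    return [1 if learned_distance(L, a, b) <= threshold else -1
            for a, b in pairs]


# Hypothetical learned transformation that ignores the second feature.
L = [[1.0, 0.0],
     [0.0, 0.0]]
pairs = [[[1.0, 5.0], [1.5, -3.0]],   # distance 0.5 in the learned space
         [[0.0, 2.0], [3.0, 2.0]]]    # distance 3.0 in the learned space
print(predict_pairs(L, pairs, threshold=1.0))  # [1, -1]
```

Moving the threshold changes the predictions without retraining, which is why
it can be set or calibrated separately from the metric itself.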

- **At fit time**: The threshold is set with `calibrate_threshold` (see
  below) on the training set. You can specify the calibration parameters
  directly in the `fit` method with the `calibration_params` parameter (see
  the documentation of the `fit` method of any metric learner that learns on
  pairs of points for more information). Calibrating on the training set may
  cause some overfitting; to avoid this, calibrate the threshold after
  fitting, on a validation set.

- **Manual**: calling `set_threshold` will set the threshold to a
  particular value.

- **Calibration**: calling `calibrate_threshold` will calibrate the
  threshold to optimize a chosen score on a validation set, the score being
  one of the classical scores for classification (accuracy, F1 score, ...).
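
The calibration idea can be sketched as follows, for the accuracy score only.
This is a simplified illustration, not metric-learn's `calibrate_threshold`
(which also supports other scores): the function
``calibrate_threshold_accuracy``, its candidate grid, and the toy validation
data are hypothetical, and the distances are assumed to be already computed
with the learned metric.

```python
def calibrate_threshold_accuracy(distances, y):
    """Return (threshold, accuracy) maximizing the accuracy of the rule
    "similar (+1) iff distance <= threshold" on validation data.

    Candidates are every observed distance, plus one value below the
    smallest distance (which predicts every pair as dissimilar)."""
    candidates = sorted(set(distances))
    candidates.insert(0, candidates[0] - 1.0)
    best_threshold, best_accuracy = None, -1.0
    for t in candidates:
        predictions = [1 if d <= t else -1 for d in distances]
        n_correct = sum(p == label for p, label in zip(predictions, y))
        accuracy = n_correct / float(len(y))
        # Strict ">" keeps the smallest threshold among ties.
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


# Hypothetical validation set: learned distances of 6 pairs and their labels.
distances = [0.2, 0.5, 0.9, 1.1, 1.8, 2.5]
y_valid = [1, 1, -1, 1, -1, -1]
threshold, accuracy = calibrate_threshold_accuracy(distances, y_valid)
# -> threshold 0.5, accuracy 5/6
```

Because the validation pairs are not used to learn the metric, a threshold
chosen this way is less prone to the overfitting mentioned above for
calibration at fit time.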


See also: `sklearn.calibration`.
Member

later (when fixing #173) this could be a good place for short note on the use of CalibratedClassifierCV

Member Author

I agree



Algorithms
==========

ITML
----
@@ -192,39 +231,6 @@ programming.
.. [2] Adapted from Matlab code at http://www.cs.utexas.edu/users/pjain/itml/


SDML
----

@@ -343,3 +349,46 @@ method. However, it is one of the earliest and a still often cited technique.
-with-side-information.pdf>`_ Xing, Jordan, Russell, Ng.
.. [2] Adapted from Matlab code `here <http://www.cs.cmu
.edu/%7Eepxing/papers/Old_papers/code_Metric_online.tar.gz>`_.

Learning on quadruplets
=======================

A type of information even weaker than pairs is information about relative
comparisons between pairs. The user should provide the algorithm with
quadruplets of points, where, in each quadruplet, the first two points are
closer to each other than the last two points are. No target vector (``y``)
is needed, since the supervision is already given by the order of the points
in each quadruplet.
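
To make this supervision concrete, here is a hypothetical helper (not part of
metric-learn) that measures the fraction of quadruplets a given distance
respects. Plain Euclidean distance stands in for a learned metric. A learner
on quadruplets such as LSML can be seen as trying to make the learned distance
agree with these constraints, although LSML actually minimizes a
squared-residual loss rather than this count.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def fraction_satisfied(quadruplets, dist=euclidean):
    """Fraction of quadruplets (x1, x2, x3, x4) with d(x1, x2) < d(x3, x4),
    i.e. for which the distance agrees with the supervision."""
    satisfied = sum(dist(x1, x2) < dist(x3, x4)
                    for x1, x2, x3, x4 in quadruplets)
    return satisfied / float(len(quadruplets))


quadruplets = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [3.0, 0.0]],  # 1.0 < 3.0: satisfied
    [[0.0, 0.0], [0.0, 2.0], [5.0, 5.0], [5.0, 6.0]],  # 2.0 < 1.0: violated
]
print(fraction_satisfied(quadruplets))  # 0.5
```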

Algorithms
==========

LSML
----

`LSML`: Metric Learning from Relative Comparisons by Minimizing Squared
Residual

.. topic:: Example Code:

::

    from metric_learn import LSML

    quadruplets = [[[1.2, 7.5], [1.3, 1.5], [6.4, 2.6], [6.2, 9.7]],
                   [[1.3, 4.5], [3.2, 4.6], [6.2, 5.5], [5.4, 5.4]],
                   [[3.2, 7.5], [3.3, 1.5], [8.4, 2.6], [8.2, 9.7]],
                   [[3.3, 4.5], [5.2, 4.6], [8.2, 5.5], [7.4, 5.4]]]

    # LSML tries to learn a metric under which, in each quadruplet, the
    # first two points are closer to each other than the last two points.
    # No target y is passed: the order of the points is the supervision.

    lsml = LSML()
    lsml.fit(quadruplets)

.. topic:: References:

.. [1] Liu et al.
"Metric Learning from Relative Comparisons by Minimizing Squared
Residual". ICDM 2012. http://www.cs.ucla.edu/~weiwang/paper/ICDM12.pdf

.. [2] Adapted from https://gist.github.com/kcarnold/5439917