
[MRG+2] Neighborhood Components Analysis #10058

Merged Feb 28, 2019 (89 commits).

Commits
849a8d8
first commit
Oct 27, 2017
04222de
minor corrections in docstring
Oct 27, 2017
34c5457
remove comment
Oct 27, 2017
89f68ee
Add verbose during iterations
Oct 30, 2017
42e078a
Update code according to code review:
Oct 31, 2017
4c7c0d4
Remove _make_masks and use OneHotEncoder instead
Oct 31, 2017
4c81a16
precise that distances are squared
Oct 31, 2017
824e940
remove useless None
Oct 31, 2017
d4294ac
simplify tests
Oct 31, 2017
296e295
ensure min samples = 2 to make check_fit2d_1sample pass
Nov 2, 2017
616f9a2
Do not precompute pairwise differences
Nov 7, 2017
12cf3a9
add example
Nov 14, 2017
7b37e8d
reorganize transposes
Nov 14, 2017
48cab11
simplify gradient
Nov 14, 2017
47928aa
Fixes according to code review
Nov 22, 2017
4612e5f
Retrieving LMNN documentation in order to adapt it to NCA
Dec 13, 2017
27ab46b
Adapt documentation to Neighborhood Components Analysis
Dec 29, 2017
44e19d6
fix pep8 errors
Jan 3, 2018
dcb1a8a
fix flake8 error
Jan 3, 2018
6ba1692
fix encoding error
Jan 3, 2018
03b126b
changes according to review https://github.com/scikit-learn/scikit-le…
Jan 15, 2018
8b5646c
correct objective function doc
Jan 15, 2018
a7f6458
Merge branch 'master' into nca
May 28, 2018
9a09e29
Add batch computations of loss and gradient.
Jun 5, 2018
7721221
Update documentation.
Jun 5, 2018
d5de730
Merge branch 'master' into nca
Jun 5, 2018
173a966
FIX: import scipy.misc.logsumexp for older versions of scipy, and sci…
Jun 6, 2018
2cd3bf6
FIX: remove newly introduced keepdims for logsumexp
Jun 7, 2018
c50c841
FIX: remove unused old masks and use the new mask instead
Jun 7, 2018
094aa97
FIX: fix doctest CI fail by putting ellipsis
Jun 20, 2018
e6daf4e
FIX: fix doctest CI fail by putting ellipsis, this time in rst file
Jun 20, 2018
e160a6e
FIX: fix doctest CI fail by putting ellipsis, this time in rst file
Jun 20, 2018
fbc679b
Updates to be coherent with latest changes from pr #8602 (commits htt…
Jun 22, 2018
1e93e82
Merge branch 'nca_feat/comments_changes' into nca
Jun 22, 2018
92faf4f
ENH: Add warm_start feature from LMNN (PR #8602)
Jun 22, 2018
b172898
FIX: rename remaining old n_features_out to n_components
Jun 22, 2018
816f3de
FIX: Update doc like in commit https://github.com/scikit-learn/scikit…
Jun 22, 2018
85b2cdd
FIX: make test_warm_start_effectiveness_work
Jun 22, 2018
4ed68dd
ENH: Add possible LDA initialization
Jun 22, 2018
1f9c208
ENH: add 'auto' initialization
Jun 25, 2018
b0a96f9
Merge branch 'master' into nca
Jun 25, 2018
e050128
FIX test appropriate message depending on init
Jun 25, 2018
ead9850
FIX import name with relative path
Jun 25, 2018
a807df2
FIX simplify test and check almost equal to pass tests on linux 32 bits
Jun 25, 2018
e00d4a1
FIX Move LDA import inside NCA class to avoid circular dependencies
Jun 26, 2018
aa90c9b
DOC add what s new entry
Jun 28, 2018
85bd54f
MAINT simplify gradient testing
Jun 29, 2018
aa9ace7
TST FIX be more tolerant on decimals for older versions of numerical …
Jun 29, 2018
cc07261
STY fix continuation lines, removing backslashes
Jun 29, 2018
16cf04d
FIX: fix logsumexp import
Jul 15, 2018
8c7af3c
TST: simplify verbose testing with pytest capsys
Jul 23, 2018
8ce872f
Merge branch 'master' into nca
Jul 23, 2018
27f2b5c
TST: check more explicitely verbose
Aug 1, 2018
85f8d21
FIX: remove non-ASCII character
Aug 1, 2018
396f30f
ENH: simplify gradient expression
Aug 17, 2018
8830373
MAINT: address review https://github.com/scikit-learn/scikit-learn/pu…
Nov 29, 2018
16b022a
Merge branch 'master' into nca
Nov 29, 2018
ded5ecb
DOC: Add what's new entry
Nov 29, 2018
648ed5f
Merge branch 'master' into nca
Dec 6, 2018
589f57d
FIX: try raw string to pass flake8 (cf. https://github.com/iodide-pro…
Dec 6, 2018
600adf2
FIX: try the exact syntax that passed the linter
Dec 6, 2018
d274c4a
TST: give some tolerance for test_toy_example_collapse_points
Dec 6, 2018
2dbf064
relaunch travis
Dec 7, 2018
e17003e
FIX: use checked_random_state instead of np.random
Dec 12, 2018
32118aa
FIX: delete iterate.dat
Dec 12, 2018
5c2154f
Merge branch 'master' into nca
Dec 12, 2018
cf55015
FIX: Fix dealing with the case of LinearDiscriminantAnalysis initiali…
Dec 12, 2018
44839a0
Address reviews https://github.com/scikit-learn/scikit-learn/pull/100…
Jan 18, 2019
822620d
STY: fix PEP8 line too long error
Jan 18, 2019
41d3cef
Fix doctest
Jan 18, 2019
faa84fc
FIX: remove deprecated assert_true
Jan 22, 2019
db2950a
TST fix assertion always true in tests
Jan 22, 2019
f16770c
TST: fix PEP8 indent error
Jan 22, 2019
4f7375e
Merge branch 'master' into nca
Jan 22, 2019
49189c6
API: remove the possibility to store the opt_result (see https://gith…
Jan 22, 2019
0fda2ca
Merge branch 'master' into nca
Feb 25, 2019
f015bad
Move examples up in documentation and add NCA to manifold examples
Feb 25, 2019
0e5d5b3
STY: fix pep8 errors
Feb 25, 2019
77dc953
adress gael's review except https://github.com/scikit-learn/scikit-le…
Feb 26, 2019
a653189
Address aurelien's review
Feb 26, 2019
be9b1e1
Simplify test about auto init even more
Feb 26, 2019
2b1c8f2
Fix doc and replace embedding by projection for consistency
Feb 26, 2019
af14e5d
Address Gael's review
Feb 26, 2019
3a78d1a
few nitpicks and make some links in the doc work
Feb 27, 2019
58d169c
Address alex's review
Feb 27, 2019
fbd28e1
Adress Alex's review
Feb 28, 2019
8d65ebc
Add authors in test too
Feb 28, 2019
ed0d23a
add check_scalar to utils
Feb 28, 2019
6dbef86
MajorFeature > API
jnothman Feb 28, 2019
doc/modules/classes.rst (2 changes: 2 additions & 0 deletions)

@@ -1170,6 +1170,7 @@ Model validation
neighbors.RadiusNeighborsRegressor
neighbors.NearestCentroid
neighbors.NearestNeighbors
neighbors.NeighborhoodComponentsAnalysis

.. autosummary::
:toctree: generated/
@@ -1432,6 +1433,7 @@ Low-level methods
utils.assert_all_finite
utils.check_X_y
utils.check_array
utils.check_scalar
utils.check_consistent_length
utils.check_random_state
utils.class_weight.compute_class_weight
doc/modules/decomposition.rst (4 changes: 4 additions & 0 deletions)

@@ -953,3 +953,7 @@ when data can be fetched sequentially.
* `"Stochastic Variational Inference"
<http://www.columbia.edu/~jwp2128/Papers/HoffmanBleiWangPaisley2013.pdf>`_
M. Hoffman, D. Blei, C. Wang, J. Paisley, 2013


See also :ref:`nca_dim_reduction` for dimensionality reduction with
Neighborhood Components Analysis.
doc/modules/neighbors.rst (214 changes: 214 additions & 0 deletions)

@@ -510,3 +510,217 @@ the model from 0.81 to 0.82.

* :ref:`sphx_glr_auto_examples_neighbors_plot_nearest_centroid.py`: an example of
classification using nearest centroid with different shrink thresholds.


.. _nca:

Neighborhood Components Analysis
================================

.. sectionauthor:: William de Vazelhes <william.de-vazelhes@inria.fr>

Neighborhood Components Analysis (NCA, :class:`NeighborhoodComponentsAnalysis`)
is a distance metric learning algorithm which aims to improve the accuracy of
nearest neighbors classification compared to the standard Euclidean distance.
The algorithm directly maximizes a stochastic variant of the leave-one-out
k-nearest neighbors (KNN) score on the training set. It can also learn a
low-dimensional linear projection of data that can be used for data
visualization and fast classification.

.. |nca_illustration_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_001.png
:target: ../auto_examples/neighbors/plot_nca_illustration.html
:scale: 50

.. |nca_illustration_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_002.png
:target: ../auto_examples/neighbors/plot_nca_illustration.html
:scale: 50

.. centered:: |nca_illustration_1| |nca_illustration_2|

In the illustration above, we consider some points from a randomly generated
dataset. We focus on the stochastic KNN classification of point no. 3. The
thickness of a link between sample 3 and another point decreases with their
distance, and can be seen as the relative weight (or probability) that a
stochastic nearest neighbor prediction rule would assign to that point. In
the original space, sample 3 has many stochastic neighbors from various
classes, so the correct class is not very likely. However, in the projected
space learned by NCA, the only stochastic neighbors with non-negligible weight
are from the same class as sample 3, guaranteeing that it will be well
classified. See the :ref:`mathematical formulation <nca_mathematical_formulation>`
for more details.


Classification
--------------

Combined with a nearest neighbors classifier (:class:`KNeighborsClassifier`),
NCA is attractive for classification because it can naturally handle
multi-class problems without any increase in the model size, and does not
introduce additional parameters that require fine-tuning by the user.

NCA classification has been shown to work well in practice for data sets of
varying size and difficulty. In contrast to related methods such as Linear
Discriminant Analysis, NCA does not make any assumptions about the class
distributions. The nearest neighbor classification can naturally produce highly
irregular decision boundaries.

To use this model for classification, one needs to combine a
:class:`NeighborhoodComponentsAnalysis` instance that learns the optimal
transformation with a :class:`KNeighborsClassifier` instance that performs the
classification in the projected space. Here is an example using the two
classes:

>>> from sklearn.neighbors import (NeighborhoodComponentsAnalysis,
... KNeighborsClassifier)
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import Pipeline
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
... stratify=y, test_size=0.7, random_state=42)
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> nca_pipe = Pipeline([('nca', nca), ('knn', knn)])
>>> nca_pipe.fit(X_train, y_train) # doctest: +ELLIPSIS
Pipeline(...)
>>> print(nca_pipe.score(X_test, y_test)) # doctest: +ELLIPSIS
0.96190476...

.. |nca_classification_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_001.png
:target: ../auto_examples/neighbors/plot_nca_classification.html
:scale: 50

.. |nca_classification_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_002.png
:target: ../auto_examples/neighbors/plot_nca_classification.html
:scale: 50

.. centered:: |nca_classification_1| |nca_classification_2|

The plots show decision boundaries for nearest neighbors classification and
Neighborhood Components Analysis classification on the iris dataset, when
training and scoring on only two features, for visualisation purposes.

.. _nca_dim_reduction:

Dimensionality reduction
------------------------

NCA can be used to perform supervised dimensionality reduction. The input data
are projected onto a linear subspace consisting of the directions which
maximize the NCA objective. The desired dimensionality can be set using the
parameter ``n_components``. For instance, the following figure shows a
comparison of dimensionality reduction with Principal Component Analysis
(:class:`sklearn.decomposition.PCA`), Linear Discriminant Analysis
(:class:`sklearn.discriminant_analysis.LinearDiscriminantAnalysis`) and
Neighborhood Components Analysis (:class:`NeighborhoodComponentsAnalysis`) on
the Digits dataset, a dataset with size :math:`n_{samples} = 1797` and
:math:`n_{features} = 64`. The data set is split into a training and a test set
of equal size, then standardized. For evaluation, the 3-nearest-neighbor
classification accuracy is computed on the 2-dimensional projected points found
by each method. Each data sample belongs to one of 10 classes. A code sketch
follows the figure.

.. |nca_dim_reduction_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_001.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. |nca_dim_reduction_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_002.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. |nca_dim_reduction_3| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_003.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. centered:: |nca_dim_reduction_1| |nca_dim_reduction_2| |nca_dim_reduction_3|
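
The evaluation described above can be sketched as follows (a minimal sketch of
the NCA part only; standardization and the PCA/LDA baselines shown in the
figure are omitted for brevity, so the exact accuracy will differ):

>>> from sklearn.datasets import load_digits
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.neighbors import (NeighborhoodComponentsAnalysis,
...                                KNeighborsClassifier)
>>> X, y = load_digits(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
...                                                     test_size=0.5,
...                                                     random_state=42)
>>> nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=42)
>>> nca.fit(X_train, y_train)  # doctest: +ELLIPSIS
NeighborhoodComponentsAnalysis(...)
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> knn.fit(nca.transform(X_train), y_train)  # doctest: +ELLIPSIS
KNeighborsClassifier(...)
>>> acc = knn.score(nca.transform(X_test), y_test)  # accuracy in 2D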


.. topic:: Examples:

* :ref:`sphx_glr_auto_examples_neighbors_plot_nca_classification.py`
* :ref:`sphx_glr_auto_examples_neighbors_plot_nca_dim_reduction.py`
* :ref:`sphx_glr_auto_examples_manifold_plot_lle_digits.py`

.. _nca_mathematical_formulation:

Mathematical formulation
------------------------

The goal of NCA is to learn an optimal linear transformation matrix of size
``(n_components, n_features)``, which maximises the sum over all samples
:math:`i` of the probability :math:`p_i` that :math:`i` is correctly
classified, i.e.:

.. math::

\underset{L}{\arg\max} \sum\limits_{i=0}^{N - 1} p_{i}

with :math:`N` = ``n_samples`` and :math:`p_i` the probability of sample
:math:`i` being correctly classified according to a stochastic nearest
neighbors rule in the learned embedded space:

.. math::

p_{i}=\sum\limits_{j \in C_i}{p_{i j}}

where :math:`C_i` is the set of points in the same class as sample :math:`i`,
and :math:`p_{i j}` is the softmax over Euclidean distances in the embedded
space:

.. math::

p_{i j} = \frac{\exp(-||L x_i - L x_j||^2)}{\sum\limits_{k \ne i}
\exp(-||L x_i - L x_k||^2)} , \quad p_{i i} = 0
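
For concreteness, the objective can be computed naively with NumPy as below (a
sketch only: it materialises the full ``n_samples ** 2`` distance matrix, and
unlike the actual implementation it does not work in log-space with
``logsumexp`` for numerical stability):

import numpy as np

def nca_objective(L, X, y):
    """Sum over all samples i of p_i, for a fixed transformation L."""
    X_emb = X @ L.T                    # embedded points, rows are L x_i
    diff = X_emb[:, None, :] - X_emb[None, :, :]
    d2 = (diff ** 2).sum(-1)           # ||L x_i - L x_j||^2 for all pairs
    np.fill_diagonal(d2, np.inf)       # enforce p_ii = 0
    p = np.exp(-d2)
    p /= p.sum(axis=1, keepdims=True)  # softmax over distances: p_ij
    same_class = y[:, None] == y[None, :]
    return (p * same_class).sum()      # sum_i p_i, p_i = sum_{j in C_i} p_ij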


Mahalanobis distance
^^^^^^^^^^^^^^^^^^^^

NCA can be seen as learning a (squared) Mahalanobis distance metric:

.. math::

||L(x_i - x_j)||^2 = (x_i - x_j)^T M (x_i - x_j),

where :math:`M = L^T L` is a symmetric positive semi-definite matrix of size
``(n_features, n_features)``.
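
The identity is easy to verify numerically (a sketch with random data):

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> L = rng.randn(2, 4)  # shape (n_components, n_features)
>>> M = L.T @ L          # symmetric positive semi-definite
>>> d = rng.randn(4) - rng.randn(4)  # a difference x_i - x_j
>>> np.allclose(((L @ d) ** 2).sum(), d @ M @ d)
True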


Implementation
--------------

This implementation follows what is explained in the original paper [1]_. For
the optimisation method, it currently uses scipy's L-BFGS-B with a full
gradient computation at each iteration, to avoid having to tune a learning
rate and to provide stable learning.
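
Schematically, the optimisation amounts to a call like the following (a
sketch, where ``nca_loss_grad`` is a hypothetical function returning the
negative objective and its gradient with respect to the flattened ``L``; the
actual internal callable differs):

from scipy.optimize import minimize

# L is optimised as a flat vector; each evaluation returns the loss and the
# full gradient over all samples, so no learning rate needs tuning.
# nca_loss_grad, L_init and max_iter are hypothetical stand-ins.
res = minimize(nca_loss_grad, L_init.ravel(), args=(X, y), jac=True,
               method='L-BFGS-B', options={'maxiter': max_iter})
L_opt = res.x.reshape(n_components, n_features)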

See the examples below and the docstring of
:meth:`NeighborhoodComponentsAnalysis.fit` for further information.

Complexity
----------

Training
^^^^^^^^
NCA stores a matrix of pairwise distances, taking ``O(n_samples ** 2)``
memory. Time complexity depends on the number of iterations performed by the
optimisation algorithm, which can be capped with the ``max_iter`` argument.
For each iteration, the time complexity is
``O(n_components * n_samples * min(n_samples, n_features))``.


Transform
^^^^^^^^^
Here the ``transform`` operation returns :math:`LX^T`, therefore its time
complexity is ``O(n_components * n_features * n_samples_test)``. There is no
added space complexity in the operation.
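
After fitting, the learned matrix :math:`L` is available as the
``components_`` attribute of shape ``(n_components, n_features)``, and
``transform`` reduces to a single matrix product; continuing with the fitted
``nca`` and ``X_test`` from the classification example above:

>>> import numpy as np
>>> X_embedded = nca.transform(X_test)
>>> np.allclose(X_embedded, X_test @ nca.components_.T)
True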


.. topic:: References:

Review comment (Member):

I think that these need either to be in a "topic" or as footnotes: currently
they do not render right
(https://48180-843222-gh.circle-artifacts.com/0/doc/modules/neighbors.html#transform).
This is because the indentation is not correct.

You could remove the "topic" block and add the following:

___________

**References**

where the '___________' inserts an hrule.

Reply (Contributor, Author):

Thanks, I went for fixing the indentation instead; it should work like the end
of this section:
https://github.com/scikit-learn/scikit-learn/blob/master/doc/modules/decomposition.rst#truncated-singular-value-decomposition-and-latent-semantic-analysis
I'll try building the doc locally (faster than CircleCI) to check whether it
works.

Reply (Contributor, Author):

I just saw it and it works :)

.. [1] `"Neighbourhood Components Analysis". Advances in Neural Information"
<http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf>`_,
J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov, Advances in
Neural Information Processing Systems, Vol. 17, May 2005, pp. 513-520.

.. [2] `Wikipedia entry on Neighborhood Components Analysis
<https://en.wikipedia.org/wiki/Neighbourhood_components_analysis>`_
doc/modules/neural_networks_supervised.rst (2 changes: 1 addition & 1 deletion)

@@ -152,7 +152,7 @@ indices where the value is `1` represents the assigned classes of that sample::
>>> clf.predict([[0., 0.]])
array([[0, 1]])

See the examples below and the doc string of
See the examples below and the docstring of
:meth:`MLPClassifier.fit` for further information.

.. topic:: Examples:
doc/modules/sgd.rst (2 changes: 1 addition & 1 deletion)

@@ -154,7 +154,7 @@ one-vs-all classification.

:class:`SGDClassifier` supports both weighted classes and weighted
instances via the fit parameters ``class_weight`` and ``sample_weight``. See
the examples below and the doc string of :meth:`SGDClassifier.fit` for
the examples below and the docstring of :meth:`SGDClassifier.fit` for
further information.

.. topic:: Examples:
doc/whats_new/v0.21.rst (10 changes: 8 additions & 2 deletions)

@@ -82,7 +82,7 @@ Support for Python 3.4 and below has been officially dropped.
- |Fix| Fixed a bug in :class:`decomposition.NMF` where `init = 'nndsvd'`,
`init = 'nndsvda'`, and `init = 'nndsvdar'` are allowed when
`n_components < n_features` instead of
`n_components <= min(n_samples, n_features)`.
`n_components <= min(n_samples, n_features)`.
:issue:`11650` by :user:`Hossein Pourbozorg <hossein-pourbozorg>` and
:user:`Zijie (ZJ) Poh <zjpoh>`.

@@ -167,7 +167,7 @@ Support for Python 3.4 and below has been officially dropped.

- |Fix| Fixed a bug in :class:`linear_model.LassoLarsIC`, where user input
``copy_X=False`` at instance creation would be overridden by default
parameter value ``copy_X=True`` in ``fit``.
parameter value ``copy_X=True`` in ``fit``.
:issue:`12972` by :user:`Lucio Fernandez-Arjona <luk-f-a>`

:mod:`sklearn.manifold`
@@ -244,6 +244,12 @@ Support for Python 3.4 and below has been officially dropped.
when called before ``fit`` :issue:`12279` by :user:`Krishna Sangeeth
<whiletruelearn>`.

- |MajorFeature| A metric learning algorithm:
:class:`neighbors.NeighborhoodComponentsAnalysis`, which implements the
Neighborhood Components Analysis algorithm described in Goldberger et al.
(2005). :issue:`10058` by :user:`William de Vazelhes
<wdevazelhes>` and :user:`John Chiotellis <johny-c>`.

:mod:`sklearn.neural_network`
.............................
