# [MRG+2] Neighborhood Components Analysis #10058

Merged
merged 89 commits into from Feb 28, 2019
+1,650 −21

#### Just for now

@@ -1170,6 +1170,7 @@ Model validation
   neighbors.RadiusNeighborsRegressor
   neighbors.NearestCentroid
   neighbors.NearestNeighbors
   neighbors.NeighborhoodComponentsAnalysis

.. autosummary::
   :toctree: generated/

@@ -953,3 +953,7 @@ when data can be fetched sequentially.
* "Stochastic Variational Inference" _
  M. Hoffman, D. Blei, C. Wang, J. Paisley, 2013

See also :ref:`nca_dim_reduction` for dimensionality reduction with
Neighborhood Components Analysis.

@@ -510,3 +510,217 @@ the model from 0.81 to 0.82.

  * :ref:`sphx_glr_auto_examples_neighbors_plot_nearest_centroid.py`: an example of
    classification using nearest centroid with different shrink thresholds.

.. _nca:

Neighborhood Components Analysis
================================

.. sectionauthor:: William de Vazelhes <william.de-vazelhes@inria.fr>

Neighborhood Components Analysis (NCA, :class:`NeighborhoodComponentsAnalysis`)
is a distance metric learning algorithm which aims to improve the accuracy of
nearest neighbors classification compared to the standard Euclidean distance.
The algorithm directly maximizes a stochastic variant of the leave-one-out
k-nearest neighbors (KNN) score on the training set. It can also learn a
low-dimensional linear projection of data that can be used for data
visualization and fast classification.

.. |nca_illustration_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_001.png
   :target: ../auto_examples/neighbors/plot_nca_illustration.html
   :scale: 50

.. |nca_illustration_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_002.png
   :target: ../auto_examples/neighbors/plot_nca_illustration.html
   :scale: 50

.. centered:: |nca_illustration_1| |nca_illustration_2|

In the figure above, we consider some points from a randomly generated
dataset. We focus on the stochastic KNN classification of point no. 3. The
thickness of a link between sample 3 and another point depends on their
distance (the closer the point, the thicker the link), and can be seen as the
relative weight (or probability) that a stochastic nearest neighbor prediction
rule would assign to this point. In the original space, sample 3 has many
stochastic neighbors from various classes, so the right class is not very
likely. However, in the projected space learned by NCA, the only stochastic
neighbors with non-negligible weight are from the same class as sample 3,
guaranteeing that the latter will be well classified. See the
:ref:`mathematical formulation <nca_mathematical_formulation>` for more
details.


Classification
--------------

Combined with a nearest neighbors classifier (:class:`KNeighborsClassifier`),
NCA is attractive for classification because it can naturally handle
multi-class problems without any increase in the model size, and does not
introduce additional parameters that require fine-tuning by the user.

NCA classification has been shown to work well in practice for data sets of
varying size and difficulty. In contrast to related methods such as Linear
Discriminant Analysis, NCA does not make any assumptions about the class
distributions. The nearest neighbor classification can naturally produce highly
irregular decision boundaries.

To use this model for classification, one needs to combine a
:class:`NeighborhoodComponentsAnalysis` instance that learns the optimal
transformation with a :class:`KNeighborsClassifier` instance that performs the
classification in the projected space. Here is an example using the two
classes:

>>> from sklearn.neighbors import (NeighborhoodComponentsAnalysis,
...                                KNeighborsClassifier)
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import Pipeline
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
...     stratify=y, test_size=0.7, random_state=42)
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> nca_pipe = Pipeline([('nca', nca), ('knn', knn)])
>>> nca_pipe.fit(X_train, y_train) # doctest: +ELLIPSIS
Pipeline(...)
>>> print(nca_pipe.score(X_test, y_test)) # doctest: +ELLIPSIS
0.96190476...


.. |nca_classification_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_001.png
   :target: ../auto_examples/neighbors/plot_nca_classification.html
   :scale: 50

.. |nca_classification_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_002.png
   :target: ../auto_examples/neighbors/plot_nca_classification.html
   :scale: 50

.. centered:: |nca_classification_1| |nca_classification_2|

The plot shows decision boundaries for Nearest Neighbor Classification and
Neighborhood Components Analysis classification on the iris dataset, when
training and scoring on only two features, for visualisation purposes.

.. _nca_dim_reduction:

Dimensionality reduction
------------------------

NCA can be used to perform supervised dimensionality reduction. The input data
are projected onto a linear subspace consisting of the directions which
maximize the NCA objective. The desired dimensionality can be set using the
parameter ``n_components``. For instance, the following figure shows a
comparison of dimensionality reduction with Principal Component Analysis
(:class:`sklearn.decomposition.PCA`), Linear Discriminant Analysis
(:class:`sklearn.discriminant_analysis.LinearDiscriminantAnalysis`) and
Neighborhood Components Analysis (:class:`NeighborhoodComponentsAnalysis`) on
the Digits dataset, a dataset with size :math:`n_{samples} = 1797` and
:math:`n_{features} = 64`. The data set is split into a training and a test set
of equal size, then standardized. For evaluation the 3-nearest neighbor
classification accuracy is computed on the 2-dimensional projected points found
by each method. Each data sample belongs to one of 10 classes.

.. |nca_dim_reduction_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_001.png
   :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
   :width: 32%

.. |nca_dim_reduction_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_002.png
   :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
   :width: 32%

.. |nca_dim_reduction_3| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_003.png
   :target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
   :width: 32%

.. centered:: |nca_dim_reduction_1| |nca_dim_reduction_2| |nca_dim_reduction_3|
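
The NCA branch of this comparison can be sketched roughly as follows. This is
a minimal illustration rather than the code of the linked example: the split
seed ``random_state=0`` and the use of ``make_pipeline`` are assumptions made
here for brevity.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import (KNeighborsClassifier,
                               NeighborhoodComponentsAnalysis)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # 1797 samples, 64 features, 10 classes

# Training and test sets of equal size; random_state=0 is an arbitrary choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Standardize the data, then learn a 2-dimensional NCA projection.
reducer = make_pipeline(
    StandardScaler(),
    NeighborhoodComponentsAnalysis(n_components=2, random_state=0))
reducer.fit(X_train, y_train)

X_train_2d = reducer.transform(X_train)
X_test_2d = reducer.transform(X_test)

# Score a 3-nearest neighbor classifier on the projected points.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train_2d, y_train)
accuracy = knn.score(X_test_2d, y_test)
print(X_test_2d.shape, accuracy)
```

Swapping the NCA step for ``PCA(n_components=2)`` or
``LinearDiscriminantAnalysis(n_components=2)`` would give the other two panels
of the comparison under the same assumptions.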

.. topic:: Examples:

 * :ref:`sphx_glr_auto_examples_neighbors_plot_nca_classification.py`
 * :ref:`sphx_glr_auto_examples_neighbors_plot_nca_dim_reduction.py`
 * :ref:`sphx_glr_auto_examples_manifold_plot_lle_digits.py`

.. _nca_mathematical_formulation:

Mathematical formulation
------------------------

The goal of NCA is to learn an optimal linear transformation matrix :math:`L`
of size ``(n_components, n_features)``, which maximises the sum over all
samples :math:`i` of the probability :math:`p_i` that :math:`i` is correctly
classified, i.e.:

.. math::

  \underset{L}{\arg\max} \sum\limits_{i=0}^{N - 1} p_{i}

with :math:`N` = ``n_samples`` and :math:`p_i` the probability of sample
:math:`i` being correctly classified according to a stochastic nearest
neighbors rule in the learned embedded space:

.. math::

  p_{i}=\sum\limits_{j \in C_i}{p_{i j}}

where :math:`C_i` is the set of points in the same class as sample :math:`i`,
and :math:`p_{i j}` is the softmax over negative squared Euclidean distances
in the embedded space:

.. math::

  p_{i j} = \frac{\exp(-||L x_i - L x_j||^2)}{\sum\limits_{k \ne
            i} {\exp(-||L x_i - L x_k||^2)}} , \quad p_{i i} = 0


Mahalanobis distance
^^^^^^^^^^^^^^^^^^^^

NCA can be seen as learning a (squared) Mahalanobis distance metric:

.. math::

  || L(x_i - x_j)||^2 = (x_i - x_j)^TM(x_i - x_j),

where :math:`M = L^T L` is a symmetric positive semi-definite matrix of size
``(n_features, n_features)``.
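
The quantities in this section can be checked numerically with a few lines of
NumPy. The sketch below is only illustrative: the toy data, the labels and the
random matrix ``L`` are assumptions, and no optimisation is performed.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(8, 4)                      # toy data: 8 samples, 4 features
y = np.array([0, 0, 0, 1, 1, 1, 2, 2])   # toy labels
L = rng.randn(2, 4)                      # shape (n_components, n_features)

# Squared Euclidean distances between the embedded points L x_i.
X_embedded = X @ L.T
diff = X_embedded[:, None, :] - X_embedded[None, :, :]
sq_dist = (diff ** 2).sum(axis=-1)

# Softmax over negative squared distances, with p_ii = 0.
logits = -sq_dist
np.fill_diagonal(logits, -np.inf)
logits -= logits.max(axis=1, keepdims=True)   # for numerical stability
p_ij = np.exp(logits)
p_ij /= p_ij.sum(axis=1, keepdims=True)

# p_i sums p_ij over the points j in the same class as i.
same_class = y[:, None] == y[None, :]
p_i = (p_ij * same_class).sum(axis=1)
objective = p_i.sum()                    # the quantity NCA maximizes over L

# Mahalanobis view: ||L (x_i - x_j)||^2 == (x_i - x_j)^T M (x_i - x_j).
M = L.T @ L
d = X[0] - X[1]
mahalanobis_sq = d @ M @ d
print(objective, np.isclose(mahalanobis_sq, sq_dist[0, 1]))
```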

Implementation
--------------

This implementation follows what is explained in the original paper [1]_. For
the optimisation method, it currently uses scipy's L-BFGS-B with a full
gradient computation at each iteration, which avoids having to tune a learning
rate and provides stable learning.

See the examples above and the docstring of
:meth:`NeighborhoodComponentsAnalysis.fit` for further information.
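
To make the optimisation scheme concrete, here is a small self-contained
sketch of maximising the NCA objective with scipy's L-BFGS-B and a full
gradient at each iteration. It is not the scikit-learn implementation: the
helper name ``nca_loss_grad``, the toy data and the random initialisation are
all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import check_grad, minimize


def nca_loss_grad(flat_L, X, same_class, n_components):
    """Return the negated NCA objective and its gradient w.r.t. L."""
    L = flat_L.reshape(n_components, X.shape[1])
    X_embedded = X @ L.T
    diff = X_embedded[:, None, :] - X_embedded[None, :, :]
    logits = -(diff ** 2).sum(axis=-1)
    np.fill_diagonal(logits, -np.inf)            # enforces p_ii = 0
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p_ij = np.exp(logits)
    p_ij /= p_ij.sum(axis=1, keepdims=True)

    masked_p_ij = p_ij * same_class
    p_i = masked_p_ij.sum(axis=1, keepdims=True)
    objective = p_i.sum()

    # Full gradient of the objective, written as 2 * (X L^T)^T W X.
    W = masked_p_ij - p_ij * p_i
    W_sym = W + W.T
    np.fill_diagonal(W_sym, -W.sum(axis=0))
    grad = 2 * X_embedded.T @ W_sym @ X

    # L-BFGS-B minimizes, so both values are negated.
    return -objective, -grad.ravel()


rng = np.random.RandomState(0)
X = rng.randn(30, 3)                    # toy data (assumption)
y = rng.randint(0, 3, size=30)          # toy labels (assumption)
same_class = y[:, None] == y[None, :]
n_components = 2
L0 = rng.randn(n_components, X.shape[1]).ravel()
args = (X, same_class, n_components)

# Compare the analytic gradient with a finite-difference approximation.
grad_error = check_grad(lambda v: nca_loss_grad(v, *args)[0],
                        lambda v: nca_loss_grad(v, *args)[1], L0)

initial_loss = nca_loss_grad(L0, *args)[0]
result = minimize(nca_loss_grad, L0, args=args, jac=True,
                  method="L-BFGS-B", options={"maxiter": 50})
L_opt = result.x.reshape(n_components, X.shape[1])
print(grad_error, -initial_loss, -result.fun)
```

Because L-BFGS-B chooses its own step sizes through a line search, no learning
rate appears anywhere in the sketch.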

Complexity
----------

Training
^^^^^^^^
NCA stores a matrix of pairwise distances, taking ``n_samples ** 2`` memory.
Time complexity depends on the number of iterations done by the optimisation
algorithm. However, one can set the maximum number of iterations with the
argument ``max_iter``. For each iteration, time complexity is
``O(n_components x n_samples x min(n_samples, n_features))``.

Transform
^^^^^^^^^
Here the ``transform`` operation returns :math:`X L^T` (one row
:math:`L x_i` per sample), therefore its time complexity equals
``n_components * n_features * n_samples_test``. There is no added space
complexity in the operation.
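
The following sketch checks that ``transform`` amounts to a single matrix
product with the learned matrix, which is stored in the ``components_``
attribute. The iris data and ``n_components=2`` are arbitrary choices for the
illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NeighborhoodComponentsAnalysis

X, y = load_iris(return_X_y=True)
nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=0)
nca.fit(X, y)

# components_ holds the learned matrix L, of shape (n_components, n_features).
L = nca.components_
X_embedded = nca.transform(X)

# The same result as one matrix product: each row of the output is L x_i.
manual = X @ L.T
print(L.shape, X_embedded.shape, np.allclose(X_embedded, manual))
```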

.. topic:: References:

#### GaelVaroquaux Feb 26, 2019

Member

I think that these need to be either in a "topic" or in footnotes: currently, they do not render right
https://48180-843222-gh.circle-artifacts.com/0/doc/modules/neighbors.html#transform

This is because the indentation is not correct.

You could remove the "topic" block, and add the following:

___________

**References**



Where the '__________' inserts an hrule.

#### wdevazelhes Feb 26, 2019

Author Contributor

Thanks, I went for fixing the indentation, it should work like at the end of this section: https://github.com/scikit-learn/scikit-learn/blob/master/doc/modules/decomposition.rst#truncated-singular-value-decomposition-and-latent-semantic-analysis
I'll try to build the doc locally to be faster than circleci to check if it works

#### wdevazelhes Feb 27, 2019

Author Contributor

I just saw it and it works :)

    .. [1] `"Neighbourhood Components Analysis"
      <http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf>`_,
      J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov, Advances in
      Neural Information Processing Systems, Vol. 17, May 2005, pp. 513-520.

    .. [2] `Wikipedia entry on Neighborhood Components Analysis
      <https://en.wikipedia.org/wiki/Neighbourhood_components_analysis>`_

@@ -152,7 +152,7 @@ indices where the value is 1 represents the assigned classes of that sample::
    >>> clf.predict([[0., 0.]])
    array([[0, 1]])

See the examples below and the doc string of
See the examples below and the docstring of
:meth:`MLPClassifier.fit` for further information.

.. topic:: Examples:

@@ -154,7 +154,7 @@ one-vs-all classification.
:class:`SGDClassifier` supports both weighted classes and weighted
instances via the fit parameters class_weight and sample_weight. See
the examples below and the doc string of :meth:`SGDClassifier.fit` for
the examples below and the docstring of :meth:`SGDClassifier.fit` for
further information.

.. topic:: Examples:

@@ -82,7 +82,7 @@ Support for Python 3.4 and below has been officially dropped.
- |Fix| Fixed a bug in :class:`decomposition.NMF` where init = 'nndsvd',
  init = 'nndsvda', and init = 'nndsvdar' are allowed when
  n_components < n_features instead of
  n_components <= min(n_samples, n_features).
  n_components <= min(n_samples, n_features).
  :issue:`11650` by :user:Hossein Pourbozorg  and
  :user:Zijie (ZJ) Poh .

@@ -167,7 +167,7 @@ Support for Python 3.4 and below has been officially dropped.
- |Fix| Fixed a bug in :class:`linear_model.LassoLarsIC`, where user input
  copy_X=False at instance creation would be overridden by default
  parameter value copy_X=True in fit.
  parameter value copy_X=True in fit.
  :issue:`12972` by :user:Lucio Fernandez-Arjona 

:mod:`sklearn.manifold`

@@ -244,6 +244,12 @@ Support for Python 3.4 and below has been officially dropped.
  when called before fit :issue:`12279` by :user:Krishna Sangeeth .

- |MajorFeature| A metric learning algorithm:
  :class:`neighbors.NeighborhoodComponentsAnalysis`, which implements the
  Neighborhood Components Analysis algorithm described in Goldberger et al.
  (2005). :issue:`10058` by :user:William de Vazelhes  and
  :user:John Chiotellis .

:mod:`sklearn.neural_network`
.............................