FEA Neighborhood Components Analysis (#10058)

wdevazelhes authored and jnothman committed Feb 28, 2019
1 parent 1f75ffa commit d31b67f23d3d785b17261ada293710380683bc82
@@ -1169,6 +1169,7 @@ Model validation

.. autosummary::
:toctree: generated/
@@ -1431,6 +1432,7 @@ Low-level methods
@@ -957,3 +957,7 @@ when data can be fetched sequentially.
* "Stochastic Variational Inference",
M. Hoffman, D. Blei, C. Wang, J. Paisley, 2013

See also :ref:`nca_dim_reduction` for dimensionality reduction with
Neighborhood Components Analysis.
@@ -510,3 +510,217 @@ the model from 0.81 to 0.82.

* :ref:``: an example of
classification using nearest centroid with different shrink thresholds.

.. _nca:

Neighborhood Components Analysis

.. sectionauthor:: William de Vazelhes <>

Neighborhood Components Analysis (NCA, :class:`NeighborhoodComponentsAnalysis`)
is a distance metric learning algorithm which aims to improve the accuracy of
nearest neighbors classification compared to the standard Euclidean distance.
The algorithm directly maximizes a stochastic variant of the leave-one-out
k-nearest neighbors (KNN) score on the training set. It can also learn a
low-dimensional linear projection of data that can be used for data
visualization and fast classification.

.. |nca_illustration_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_001.png
:target: ../auto_examples/neighbors/plot_nca_illustration.html
:scale: 50

.. |nca_illustration_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_002.png
:target: ../auto_examples/neighbors/plot_nca_illustration.html
:scale: 50

.. centered:: |nca_illustration_1| |nca_illustration_2|

In the figure above, we consider some points from a randomly generated
dataset and focus on the stochastic KNN classification of point no. 3. The
thickness of a link between sample 3 and another point reflects their
proximity: the closer the two points, the thicker the link. It can be seen as
the relative weight (or probability) that a stochastic nearest neighbor
prediction rule would assign to this point. In the original space, sample 3 has
many stochastic neighbors from various classes, so the right class is not very
likely. However, in the projected space learned by NCA, the only stochastic
neighbors with non-negligible weight are from the same class as sample 3, so
sample 3 is very likely to be classified correctly.
See the :ref:`mathematical formulation <nca_mathematical_formulation>`
for more details.
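The stochastic neighbor weights described above can be sketched in a few lines
of NumPy. This is only an illustration, not the code used to produce the
figure: the helper name ``stochastic_neighbor_weights`` and the random toy data
are assumptions made for this example.

```python
import numpy as np


def stochastic_neighbor_weights(X, i):
    """Hypothetical helper: softmax over negative squared distances to X[i]."""
    sq_dist = np.sum((X - X[i]) ** 2, axis=1)
    sq_dist[i] = np.inf  # a point is never its own stochastic neighbor
    logits = -sq_dist
    logits -= logits[np.isfinite(logits)].max()  # for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Toy data (assumption): 9 random 2-D points in 3 classes.
rng = np.random.RandomState(0)
X = rng.randn(9, 2)
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

w = stochastic_neighbor_weights(X, 3)
# Probability that sample 3 is assigned its own class by the stochastic rule.
p_correct = w[y == y[3]].sum()
```

Applying the same helper to the points after projection would give the weights
in the learned space.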


Combined with a nearest neighbors classifier (:class:`KNeighborsClassifier`),
NCA is attractive for classification because it can naturally handle
multi-class problems without any increase in the model size, and does not
introduce additional parameters that require fine-tuning by the user.

NCA classification has been shown to work well in practice for data sets of
varying size and difficulty. In contrast to related methods such as Linear
Discriminant Analysis, NCA does not make any assumptions about the class
distributions. The nearest neighbor classification can naturally produce highly
irregular decision boundaries.

To use this model for classification, one needs to combine a
:class:`NeighborhoodComponentsAnalysis` instance that learns the optimal
transformation with a :class:`KNeighborsClassifier` instance that performs the
classification in the projected space. Here is an example using the two
classes:

>>> from sklearn.neighbors import (NeighborhoodComponentsAnalysis,
... KNeighborsClassifier)
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.pipeline import Pipeline
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
... stratify=y, test_size=0.7, random_state=42)
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> nca_pipe = Pipeline([('nca', nca), ('knn', knn)])
>>> nca_pipe.fit(X_train, y_train) # doctest: +ELLIPSIS
Pipeline(...)
>>> print(nca_pipe.score(X_test, y_test)) # doctest: +ELLIPSIS
.. |nca_classification_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_001.png
:target: ../auto_examples/neighbors/plot_nca_classification.html
:scale: 50

.. |nca_classification_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_002.png
:target: ../auto_examples/neighbors/plot_nca_classification.html
:scale: 50

.. centered:: |nca_classification_1| |nca_classification_2|

The plot shows decision boundaries for Nearest Neighbor Classification and
Neighborhood Components Analysis classification on the iris dataset, when
training and scoring on only two features, for visualisation purposes.

.. _nca_dim_reduction:

Dimensionality reduction

NCA can be used to perform supervised dimensionality reduction. The input data
are projected onto a linear subspace consisting of the directions which
maximize the NCA objective. The desired dimensionality can be set using the
parameter ``n_components``. For instance, the following figure shows a
comparison of dimensionality reduction with Principal Component Analysis
(:class:`sklearn.decomposition.PCA`), Linear Discriminant Analysis
(:class:`sklearn.discriminant_analysis.LinearDiscriminantAnalysis`) and
Neighborhood Components Analysis (:class:`NeighborhoodComponentsAnalysis`) on
the Digits dataset, a dataset with size :math:`n_{samples} = 1797` and
:math:`n_{features} = 64`. The data set is split into a training and a test set
of equal size, then standardized. For evaluation the 3-nearest neighbor
classification accuracy is computed on the 2-dimensional projected points found
by each method. Each data sample belongs to one of 10 classes.
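A minimal sketch of the NCA part of this comparison is given below. It is not
the script behind the figure: the ``random_state`` values and variable names
are assumptions, and the resulting accuracy will depend on them.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import (KNeighborsClassifier,
                               NeighborhoodComponentsAnalysis)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
# Split into training and test sets of equal size.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)

# Standardize, then learn a 2-dimensional supervised projection.
reducer = make_pipeline(
    StandardScaler(),
    NeighborhoodComponentsAnalysis(n_components=2, random_state=0))
reducer.fit(X_train, y_train)

# Evaluate a 3-nearest neighbor classifier on the projected points.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(reducer.transform(X_train), y_train)
X_test_2d = reducer.transform(X_test)
accuracy = knn.score(X_test_2d, y_test)
```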

.. |nca_dim_reduction_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_001.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. |nca_dim_reduction_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_002.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. |nca_dim_reduction_3| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_003.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. centered:: |nca_dim_reduction_1| |nca_dim_reduction_2| |nca_dim_reduction_3|

.. topic:: Examples:

* :ref:``
* :ref:``
* :ref:``

.. _nca_mathematical_formulation:

Mathematical formulation

The goal of NCA is to learn an optimal linear transformation matrix of size
``(n_components, n_features)``, which maximises the sum over all samples
:math:`i` of the probability :math:`p_i` that :math:`i` is correctly
classified, i.e.:

.. math::

  \underset{L}{\arg\max} \sum\limits_{i=0}^{N - 1} p_{i}

with :math:`N` = ``n_samples`` and :math:`p_i` the probability of sample
:math:`i` being correctly classified according to a stochastic nearest
neighbors rule in the learned embedded space:

.. math::

  p_{i}=\sum\limits_{j \in C_i}{p_{i j}}

where :math:`C_i` is the set of points in the same class as sample :math:`i`,
and :math:`p_{i j}` is the softmax over the negative squared Euclidean
distances in the embedded space:

.. math::

  p_{i j} = \frac{\exp(-||L x_i - L x_j||^2)}{\sum\limits_{k \ne
            i} {\exp(-||L x_i - L x_k||^2)}} , \quad p_{i i} = 0

Mahalanobis distance

NCA can be seen as learning a (squared) Mahalanobis distance metric:

.. math::

  || L(x_i - x_j)||^2 = (x_i - x_j)^TM(x_i - x_j),

where :math:`M = L^T L` is a symmetric positive semi-definite matrix of size
``(n_features, n_features)``.
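The identity above can be checked numerically on a fitted model. The sketch
below assumes the learned transformation :math:`L` is read from the
``components_`` attribute; the dataset and the pair of samples are arbitrary
choices made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NeighborhoodComponentsAnalysis

X, y = load_iris(return_X_y=True)
nca = NeighborhoodComponentsAnalysis(random_state=0).fit(X, y)

L = nca.components_  # shape (n_components, n_features)
M = L.T @ L          # shape (n_features, n_features)

# Squared distance between two samples, computed both ways.
diff = X[0] - X[1]
dist_embedded = np.sum((L @ diff) ** 2)  # ||L (x_i - x_j)||^2
dist_mahalanobis = diff @ M @ diff       # (x_i - x_j)^T M (x_i - x_j)
```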


This implementation follows what is explained in the original paper [1]_. For
the optimisation method, it currently uses scipy's L-BFGS-B with a full
gradient computation at each iteration, which avoids having to tune a learning
rate and provides stable learning.

See the examples below and the docstring of
:meth:`` for further information.


NCA stores a matrix of pairwise distances, taking ``n_samples ** 2`` memory.
Time complexity depends on the number of iterations done by the optimisation
algorithm. However, one can set the maximum number of iterations with the
argument ``max_iter``. For each iteration, time complexity is
``O(n_components x n_samples x min(n_samples, n_features))``.

Here the ``transform`` operation returns :math:`LX^T`, therefore its time
complexity equals ``n_components * n_features * n_samples_test``. There is no
added space complexity in the operation.
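The ``transform`` step can be reproduced with a single matrix product. The
sketch below assumes the learned matrix is available as ``components_``; note
that the returned array is laid out with one row per sample, i.e. it has shape
``(n_samples_test, n_components)``.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=0)
nca.fit(X_train, y_train)

embedded = nca.transform(X_test)
# Same result by hand: one row per test sample, one column per component.
manual = X_test @ nca.components_.T
```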

.. topic:: References:

.. [1] "Neighbourhood Components Analysis",
   J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov, Advances in
   Neural Information Processing Systems, Vol. 17, May 2005, pp. 513-520.

.. [2] Wikipedia entry on Neighborhood Components Analysis
@@ -152,7 +152,7 @@ indices where the value is `1` represents the assigned classes of that sample::
>>> clf.predict([[0., 0.]])
array([[0, 1]])

-See the examples below and the doc string of
+See the examples below and the docstring of
:meth:`` for further information.

.. topic:: Examples:
@@ -154,7 +154,7 @@ one-vs-all classification.

:class:`SGDClassifier` supports both weighted classes and weighted
instances via the fit parameters ``class_weight`` and ``sample_weight``. See
-the examples below and the doc string of :meth:`` for
+the examples below and the docstring of :meth:`` for
further information.

.. topic:: Examples:
@@ -287,6 +287,12 @@ Support for Python 3.4 and below has been officially dropped.

- |MajorFeature| A metric learning algorithm:
:class:`neighbors.NeighborhoodComponentsAnalysis`, which implements the
Neighborhood Components Analysis algorithm described in Goldberger et al.
(2005). :issue:`10058` by :user:`William de Vazelhes
<wdevazelhes>` and :user:`John Chiotellis <johny-c>`.

- |API| Methods in :class:`neighbors.NearestNeighbors` :