[MRG+2] Neighborhood Components Analysis #10058

Merged: 89 commits merged on Feb 28, 2019
Changes from 1 commit

89 commits
849a8d8
first commit
wdevazelhes Oct 27, 2017
04222de
minor corrections in docstring
wdevazelhes Oct 27, 2017
34c5457
remove comment
wdevazelhes Oct 27, 2017
89f68ee
Add verbose during iterations
wdevazelhes Oct 30, 2017
42e078a
Update code according to code review:
wdevazelhes Oct 31, 2017
4c7c0d4
Remove _make_masks and use OneHotEncoder instead
wdevazelhes Oct 31, 2017
4c81a16
precise that distances are squared
wdevazelhes Oct 31, 2017
824e940
remove useless None
wdevazelhes Oct 31, 2017
d4294ac
simplify tests
wdevazelhes Oct 31, 2017
296e295
ensure min samples = 2 to make check_fit2d_1sample pass
wdevazelhes Nov 2, 2017
616f9a2
Do not precompute pairwise differences
wdevazelhes Nov 7, 2017
12cf3a9
add example
wdevazelhes Nov 14, 2017
7b37e8d
reorganize transposes
wdevazelhes Nov 14, 2017
48cab11
simplify gradient
wdevazelhes Nov 14, 2017
47928aa
Fixes according to code review
wdevazelhes Nov 22, 2017
4612e5f
Retrieving LMNN documentation in order to adapt it to NCA
wdevazelhes Dec 13, 2017
27ab46b
Adapt documentation to Neighborhood Components Analysis
wdevazelhes Dec 29, 2017
44e19d6
fix pep8 errors
wdevazelhes Jan 3, 2018
dcb1a8a
fix flake8 error
wdevazelhes Jan 3, 2018
6ba1692
fix encoding error
wdevazelhes Jan 3, 2018
03b126b
changes according to review https://github.com/scikit-learn/scikit-le…
wdevazelhes Jan 15, 2018
8b5646c
correct objective function doc
wdevazelhes Jan 15, 2018
a7f6458
Merge branch 'master' into nca
wdevazelhes May 28, 2018
9a09e29
Add batch computations of loss and gradient.
wdevazelhes Jun 5, 2018
7721221
Update documentation.
wdevazelhes Jun 5, 2018
d5de730
Merge branch 'master' into nca
wdevazelhes Jun 5, 2018
173a966
FIX: import scipy.misc.logsumexp for older versions of scipy, and sci…
wdevazelhes Jun 6, 2018
2cd3bf6
FIX: remove newly introduced keepdims for logsumexp
wdevazelhes Jun 7, 2018
c50c841
FIX: remove unused old masks and use the new mask instead
wdevazelhes Jun 7, 2018
094aa97
FIX: fix doctest CI fail by putting ellipsis
wdevazelhes Jun 20, 2018
e6daf4e
FIX: fix doctest CI fail by putting ellipsis, this time in rst file
wdevazelhes Jun 20, 2018
e160a6e
FIX: fix doctest CI fail by putting ellipsis, this time in rst file
wdevazelhes Jun 20, 2018
fbc679b
Updates to be coherent with latest changes from pr #8602 (commits htt…
wdevazelhes Jun 22, 2018
1e93e82
Merge branch 'nca_feat/comments_changes' into nca
wdevazelhes Jun 22, 2018
92faf4f
ENH: Add warm_start feature from LMNN (PR #8602)
wdevazelhes Jun 22, 2018
b172898
FIX: rename remaining old n_features_out to n_components
wdevazelhes Jun 22, 2018
816f3de
FIX: Update doc like in commit https://github.com/scikit-learn/scikit…
wdevazelhes Jun 22, 2018
85b2cdd
FIX: make test_warm_start_effectiveness_work
wdevazelhes Jun 22, 2018
4ed68dd
ENH: Add possible LDA initialization
wdevazelhes Jun 22, 2018
1f9c208
ENH: add 'auto' initialization
wdevazelhes Jun 25, 2018
b0a96f9
Merge branch 'master' into nca
wdevazelhes Jun 25, 2018
e050128
FIX test appropriate message depending on init
wdevazelhes Jun 25, 2018
ead9850
FIX import name with relative path
wdevazelhes Jun 25, 2018
a807df2
FIX simplify test and check almost equal to pass tests on linux 32 bits
wdevazelhes Jun 25, 2018
e00d4a1
FIX Move LDA import inside NCA class to avoid circular dependencies
wdevazelhes Jun 26, 2018
aa90c9b
DOC add what s new entry
wdevazelhes Jun 28, 2018
85bd54f
MAINT simplify gradient testing
wdevazelhes Jun 29, 2018
aa9ace7
TST FIX be more tolerant on decimals for older versions of numerical …
wdevazelhes Jun 29, 2018
cc07261
STY fix continuation lines, removing backslashes
wdevazelhes Jun 29, 2018
16cf04d
FIX: fix logsumexp import
wdevazelhes Jul 15, 2018
8c7af3c
TST: simplify verbose testing with pytest capsys
wdevazelhes Jul 23, 2018
8ce872f
Merge branch 'master' into nca
wdevazelhes Jul 23, 2018
27f2b5c
TST: check more explicitely verbose
wdevazelhes Aug 1, 2018
85f8d21
FIX: remove non-ASCII character
wdevazelhes Aug 1, 2018
396f30f
ENH: simplify gradient expression
wdevazelhes Aug 17, 2018
8830373
MAINT: address review https://github.com/scikit-learn/scikit-learn/pu…
wdevazelhes Nov 29, 2018
16b022a
Merge branch 'master' into nca
wdevazelhes Nov 29, 2018
ded5ecb
DOC: Add what's new entry
wdevazelhes Nov 29, 2018
648ed5f
Merge branch 'master' into nca
wdevazelhes Dec 6, 2018
589f57d
FIX: try raw string to pass flake8 (cf. https://github.com/iodide-pro…
wdevazelhes Dec 6, 2018
600adf2
FIX: try the exact syntax that passed the linter
wdevazelhes Dec 6, 2018
d274c4a
TST: give some tolerance for test_toy_example_collapse_points
wdevazelhes Dec 6, 2018
2dbf064
relaunch travis
wdevazelhes Dec 7, 2018
e17003e
FIX: use checked_random_state instead of np.random
wdevazelhes Dec 12, 2018
32118aa
FIX: delete iterate.dat
wdevazelhes Dec 12, 2018
5c2154f
Merge branch 'master' into nca
wdevazelhes Dec 12, 2018
cf55015
FIX: Fix dealing with the case of LinearDiscriminantAnalysis initiali…
wdevazelhes Dec 12, 2018
44839a0
Address reviews https://github.com/scikit-learn/scikit-learn/pull/100…
wdevazelhes Jan 18, 2019
822620d
STY: fix PEP8 line too long error
wdevazelhes Jan 18, 2019
41d3cef
Fix doctest
wdevazelhes Jan 18, 2019
faa84fc
FIX: remove deprecated assert_true
wdevazelhes Jan 22, 2019
db2950a
TST fix assertion always true in tests
wdevazelhes Jan 22, 2019
f16770c
TST: fix PEP8 indent error
wdevazelhes Jan 22, 2019
4f7375e
Merge branch 'master' into nca
wdevazelhes Jan 22, 2019
49189c6
API: remove the possibility to store the opt_result (see https://gith…
wdevazelhes Jan 22, 2019
0fda2ca
Merge branch 'master' into nca
wdevazelhes Feb 25, 2019
f015bad
Move examples up in documentation and add NCA to manifold examples
wdevazelhes Feb 25, 2019
0e5d5b3
STY: fix pep8 errors
wdevazelhes Feb 25, 2019
77dc953
adress gael's review except https://github.com/scikit-learn/scikit-le…
wdevazelhes Feb 26, 2019
a653189
Address aurelien's review
wdevazelhes Feb 26, 2019
be9b1e1
Simplify test about auto init even more
wdevazelhes Feb 26, 2019
2b1c8f2
Fix doc and replace embedding by projection for consistency
wdevazelhes Feb 26, 2019
af14e5d
Address Gael's review
wdevazelhes Feb 26, 2019
3a78d1a
few nitpicks and make some links in the doc work
wdevazelhes Feb 27, 2019
58d169c
Address alex's review
wdevazelhes Feb 27, 2019
fbd28e1
Adress Alex's review
wdevazelhes Feb 28, 2019
8d65ebc
Add authors in test too
wdevazelhes Feb 28, 2019
ed0d23a
add check_scalar to utils
wdevazelhes Feb 28, 2019
6dbef86
MajorFeature > API
jnothman Feb 28, 2019

Retrieving LMNN documentation in order to adapt it to NCA

wdevazelhes committed Dec 13, 2017
commit 4612e5f049d5f327b8c97b58aa9df6103c6ed796
@@ -514,3 +514,229 @@ the model from 0.81 to 0.82.

* :ref:`sphx_glr_auto_examples_neighbors_plot_nearest_centroid.py`: an example of
classification using nearest centroid with different shrink thresholds.


This conversation was marked as resolved by GaelVaroquaux

jnothman (Member), Nov 18, 2018:
rm extra blank line

wdevazelhes (Author), Nov 29, 2018:
done


.. _nca:

Neighborhood Components Analysis
================================

.. sectionauthor:: William de Vazelhes <william.de-vazelhes@inria.fr>

Neighborhood Components Analysis (NCA,
:class:`NeighborhoodComponentsAnalysis`) is a distance metric learning
algorithm which aims to improve the accuracy of nearest neighbors
classification compared to the standard Euclidean distance.

.. |nca_illustration_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_001.png
:target: ../auto_examples/neighbors/plot_nca_illustration.html
:scale: 50

.. |nca_illustration_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_illustration_002.png
:target: ../auto_examples/neighbors/plot_nca_illustration.html
:scale: 50

.. centered:: |nca_illustration_1| |nca_illustration_2|

The algorithm directly maximizes a stochastic variant of the
leave-one-out k-nearest neighbors (KNN) score on the training set. It can also
learn a low-dimensional linear embedding of labeled data that can be
used for data visualization and fast classification. The resulting
classification model is non-parametric, making no assumptions about the
shape of the class distributions or the boundaries between them, and it has
been shown to work well in practice both for metric learning and for linear
dimensionality reduction.

In the above figure, we consider some points from a randomly generated
dataset. We focus on the stochastic KNN classification of point no. 3. In the
original space, it has many stochastic neighbors from various classes; the
thickness of each bond represents the softmax weight of that neighbor, hence
its contribution to the prediction of the class of point 3. In the embedding
space, however, the only stochastic neighbors with non-negligible weight are
from the same class as sample 3, guaranteeing that it will be well classified.

jnothman (Member), Jan 10, 2019:
drop blank

wdevazelhes (Author), Jan 18, 2019:
Done


This conversation was marked as resolved by GaelVaroquaux

jnothman (Member), Nov 18, 2018:
too many blank

wdevazelhes (Author), Nov 29, 2018:
done


Classification
--------------

Combined with a nearest neighbors classifier (:class:`KNeighborsClassifier`),
this method is attractive for classification because it can naturally

bellet (Contributor), Jan 9, 2018:
this method --> NCA

wdevazelhes (Author), Jan 15, 2018:
Done.

handle multi-class problems without any increase in the model size, and only
a single parameter (``n_neighbors``) has to be selected by the user before
training.

Neighborhood Components Analysis classification has been shown to work well in
practice for data sets of varying size and difficulty. In contrast to
related methods such as Linear Discriminant Analysis, NCA does not make any
assumptions about the class distributions. The nearest neighbor classification
can naturally produce highly irregular decision boundaries.

To use this model for classification, one needs to combine a
:class:`NeighborhoodComponentsAnalysis`
instance that learns the optimal transformation with a :class:`KNeighborsClassifier`
instance that performs the classification in the embedded space. Here is an
example using the two classes:

>>> from sklearn.neighbors import NeighborhoodComponentsAnalysis
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
... stratify=y, test_size=0.7, random_state=42)
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> nca.fit(X_train, y_train) # doctest: +ELLIPSIS
NeighborhoodComponentsAnalysis(...)
>>> # Apply the learned transformation when using KNeighborsClassifier
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> knn.fit(nca.transform(X_train), y_train) # doctest: +ELLIPSIS
KNeighborsClassifier(...)
>>> print(knn.score(nca.transform(X_test), y_test))
0.961904761905

jnothman (Member), Jan 10, 2019:
I don't think we have to illustrate both alternatives. This is standard Pipeline usage. I'm happy to only present the Pipeline version.

wdevazelhes (Author), Jan 18, 2019:
Agreed, I'll just delete this part:

    >>> from sklearn.neighbors import NeighborhoodComponentsAnalysis
    >>> from sklearn.neighbors import KNeighborsClassifier
    >>> from sklearn.datasets import load_iris
    >>> from sklearn.model_selection import train_test_split
    >>> X, y = load_iris(return_X_y=True)
    >>> X_train, X_test, y_train, y_test = train_test_split(X, y,
    ... stratify=y, test_size=0.7, random_state=42)
    >>> nca = NeighborhoodComponentsAnalysis(random_state=42)
    >>> nca.fit(X_train, y_train) # doctest: +ELLIPSIS
    NeighborhoodComponentsAnalysis(...)
    >>> # Apply the learned transformation when using KNeighborsClassifier
    >>> knn = KNeighborsClassifier(n_neighbors=3)
    >>> knn.fit(nca.transform(X_train), y_train) # doctest: +ELLIPSIS
    KNeighborsClassifier(...)
    >>> print(knn.score(nca.transform(X_test), y_test)) # doctest: +ELLIPSIS
    0.96190476...

Alternatively, one can create a :class:`sklearn.pipeline.Pipeline` instance
that automatically applies the transformation when fitting or predicting:

>>> from sklearn.pipeline import Pipeline
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> nca_pipe = Pipeline([('nca', nca), ('knn', knn)])
>>> nca_pipe.fit(X_train, y_train) # doctest: +ELLIPSIS
Pipeline(...)
>>> print(nca_pipe.score(X_test, y_test))
0.961904761905

.. |nca_classification_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_001.png
:target: ../auto_examples/neighbors/plot_nca_classification.html
:scale: 50

.. |nca_classification_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_classification_002.png
:target: ../auto_examples/neighbors/plot_nca_classification.html
:scale: 50

.. centered:: |nca_classification_1| |nca_classification_2|


The plot shows decision boundaries for nearest neighbors classification,
with and without the transformation learned by Neighborhood Components
Analysis.


Dimensionality reduction
------------------------

:class:`NeighborhoodComponentsAnalysis` can be used to perform supervised
dimensionality reduction. The input data are projected onto a linear subspace
consisting of the directions which minimize the NCA objective. The desired
dimensionality can be set using the parameter ``n_features_out``.
For instance, the following shows a comparison of dimensionality reduction
with Principal Component Analysis (:class:`sklearn.decomposition.PCA`),
Linear Discriminant Analysis (:class:`sklearn.discriminant_analysis.LinearDiscriminantAnalysis`)
and Neighborhood Component Analysis (:class:`NeighborhoodComponentsAnalysis`)
on the Digits dataset, a dataset with size
:math:`n_{samples} = 1797` and :math:`n_{features} = 64`.
The data set is split into a training and a test set of equal size, then a
:class:`sklearn.preprocessing.StandardScaler` is fitted on the training set
and applied to both sets. For evaluation, the 3-nearest neighbor
classification accuracy is computed on the 2-dimensional embedding found by
each method. Each data sample belongs to one of 10 classes.

.. |nca_dim_reduction_1| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_001.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. |nca_dim_reduction_2| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_002.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. |nca_dim_reduction_3| image:: ../auto_examples/neighbors/images/sphx_glr_plot_nca_dim_reduction_003.png
:target: ../auto_examples/neighbors/plot_nca_dim_reduction.html
:width: 32%

.. centered:: |nca_dim_reduction_1| |nca_dim_reduction_2| |nca_dim_reduction_3|
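
A condensed version of this comparison can be written in a few lines. The
following is only a sketch: it follows the parameter naming used in this
version of the documentation (``n_features_out``), and the exact accuracies
will differ slightly from those in the figures above::

    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neighbors import NeighborhoodComponentsAnalysis
    from sklearn.preprocessing import StandardScaler

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, test_size=0.5, random_state=42)

    # scale the data using statistics of the training set only
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    models = [('PCA', PCA(n_components=2)),
              ('LDA', LinearDiscriminantAnalysis(n_components=2)),
              ('NCA', NeighborhoodComponentsAnalysis(n_features_out=2,
                                                     random_state=42))]

    for name, model in models:
        model.fit(X_train, y_train)
        # 3-nearest neighbors accuracy in the 2-dimensional embedding
        knn = KNeighborsClassifier(n_neighbors=3)
        knn.fit(model.transform(X_train), y_train)
        print(name, knn.score(model.transform(X_test), y_test))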


Mathematical formulation
------------------------

NCA learns a linear transformation matrix :math:`L` of
size ``(n_features_out, n_features)``. NCA maximizes on average the
probability :math:`p_i` of sample :math:`i` being correctly classified
(i.e. assigned to its class :math:`C_i`), where :math:`p_i` is a weighted sum
over all other samples of class :math:`C_i`, with weights related to their
distance to sample :math:`i`.

The contribution of sample :math:`i` to the cost function is therefore the
probability of sample :math:`i` being classified as :math:`C_i`:

.. math::

  p_{i} = \sum\nolimits_{j \in C_i}{p_{i j}}

where :math:`C_i` is the set of points in the same class as sample :math:`i`,
and :math:`p_{i j}` is the softmax over Euclidean distances in the
transformed space:

.. math::

  p_{i j} = \frac{\exp(-||L x_i - L x_j||^2)}{\sum\nolimits_{k \ne i} \exp(-||L x_i - L x_k||^2)} , p_{i i} = 0

bellet (Contributor), Jan 9, 2018:
could add \quad before p_{i i} = 0 to improve readability

wdevazelhes (Author), Jan 15, 2018:
Done.
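
To make these quantities concrete, here is a small NumPy sketch computing
:math:`p_{ij}` and :math:`p_i` for a random candidate transformation ``L``
on placeholder data (illustrative only, not the library's internal code)::

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(20, 5)              # 20 samples, 5 features
    y = rng.randint(0, 3, size=20)    # 3 classes
    L = rng.randn(2, 5)               # candidate (n_features_out, n_features) transformation

    X_embedded = X.dot(L.T)
    diff = X_embedded[:, np.newaxis] - X_embedded      # all pairwise differences
    sq_dist = (diff ** 2).sum(axis=2)                  # squared Euclidean distances

    p = np.exp(-sq_dist)
    np.fill_diagonal(p, 0.0)                           # enforces p_ii = 0
    p /= p.sum(axis=1, keepdims=True)                  # softmax over distances: p_ij

    same_class = y[:, np.newaxis] == y[np.newaxis, :]
    p_i = (p * same_class).sum(axis=1)                 # probability of correct classification
    print(p_i.sum())   # up to a constant factor, the objective NCA maximizes over L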

Mahalanobis distance
^^^^^^^^^^^^^^^^^^^^

NCA can be seen as learning a (squared) Mahalanobis distance metric:

.. math::

  ||L(x_i - x_j)||^2 = (x_i - x_j)^T M (x_i - x_j),

where :math:`M = L^T L` is a symmetric positive semi-definite matrix of size
``(n_features, n_features)``.
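
This identity can be checked numerically in a few lines of NumPy (a sketch
with arbitrary placeholder vectors)::

    import numpy as np

    rng = np.random.RandomState(0)
    L = rng.randn(2, 5)                      # (n_features_out, n_features)
    x_i, x_j = rng.randn(5), rng.randn(5)

    M = L.T.dot(L)                           # (n_features, n_features), PSD by construction
    d_transformed = np.sum(L.dot(x_i - x_j) ** 2)
    d_mahalanobis = (x_i - x_j).dot(M).dot(x_i - x_j)
    print(np.allclose(d_transformed, d_mahalanobis))   # True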


Implementation
--------------

This implementation follows what is explained in the paper. For the

bellet (Contributor), Jan 9, 2018:
in the original paper

wdevazelhes (Author), Jan 15, 2018:
Done.

optimisation method, it currently uses scipy's L-BFGS-B optimizer with a full
gradient computation at each iteration, to avoid having to tune a learning
rate and to provide stable learning.

See the examples below and the doc string of
:meth:`NeighborhoodComponentsAnalysis.fit`
for further information.
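
Schematically, the optimization boils down to a standard
``scipy.optimize.minimize`` call. The following toy sketch re-implements the
objective from the formulas above on random data and, for brevity, lets scipy
approximate the gradient numerically; the actual implementation supplies the
analytic gradient and works in log-space (``logsumexp``) for numerical
stability::

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.RandomState(0)
    X = rng.randn(30, 4)
    y = rng.randint(0, 3, size=30)
    n_features_out = 2

    def negative_nca_objective(L_flat):
        # negative sum of the p_i defined above, as a function of the flattened L
        L = L_flat.reshape(n_features_out, X.shape[1])
        X_embedded = X.dot(L.T)
        sq_dist = ((X_embedded[:, np.newaxis] - X_embedded) ** 2).sum(axis=2)
        p = np.exp(-sq_dist)
        np.fill_diagonal(p, 0.0)
        # small constant guards against underflow in this naive version
        p /= p.sum(axis=1, keepdims=True) + 1e-10
        return -(p * (y[:, np.newaxis] == y)).sum()

    L0 = rng.randn(n_features_out, X.shape[1])
    result = minimize(negative_nca_objective, L0.ravel(), method='L-BFGS-B')
    L_opt = result.x.reshape(n_features_out, X.shape[1])
    print(-result.fun)   # value of the maximized objective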

Complexity
----------

All pairwise differences are needed to compute the cost function at each
iteration, so the complexity is :math:`O(d \cdot n^2 \cdot i)`, with :math:`d`
the dimension of the input space, :math:`n` the number of samples, and
:math:`i` the number of iterations.
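
For a rough sense of scale: on the Digits data used above (:math:`n = 1797`,
:math:`d = 64`), a single evaluation of the cost function already involves on
the order of :math:`1797^2 \times 64 \approx 2 \times 10^8` pairwise-difference
terms. This is also why the PR adds batch computation of the loss and gradient
(see the commit history above).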


.. topic:: Examples:

GaelVaroquaux (Member), Feb 25, 2019:
Nitpick: I think that I would like the examples listed earlier in the documentation, before mathematical and implementation details, as most end users do not need to understand the mathematical aspects.

wdevazelhes (Author), Feb 25, 2019:
I agree, done

* :ref:`sphx_glr_auto_examples_neighbors_plot_nca_classification.py`
* :ref:`sphx_glr_auto_examples_neighbors_plot_nca_dim_reduction.py`


.. topic:: References:

GaelVaroquaux (Member), Feb 26, 2019:
I think that these need either to be in a "topic", or in as footnotes: currently, they do not render right
https://48180-843222-gh.circle-artifacts.com/0/doc/modules/neighbors.html#transform

This is because the indentation is not correct.

You could remove the "topic" block, and add the following:

___________

**References**

Where the '__________' inserts an hrule.

wdevazelhes (Author), Feb 26, 2019:
Thanks, I went for fixing the indentation, it should work like at the end of this section: https://github.com/scikit-learn/scikit-learn/blob/master/doc/modules/decomposition.rst#truncated-singular-value-decomposition-and-latent-semantic-analysis
I'll try to build the doc locally to be faster than circleci to check if it works

wdevazelhes (Author), Feb 27, 2019:
I just saw it and it works :)

* | `"Neighbourhood Components Analysis". Advances in Neural Information"
<http://www.cs.nyu.edu/~roweis/papers/ncanips.pdf>`_,
| J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov, Advances in
| Neural Information Processing Systems, Vol. 17, May 2005, pp. 513-520.

* `Wikipedia entry on Neighborhood Components Analysis
<https://en.wikipedia.org/wiki/Neighbourhood_components_analysis>`_
@@ -0,0 +1,87 @@
"""
============================================================================
Comparing Nearest Neighbors and Neighborhood Components Analysis

bellet (Contributor), Feb 25, 2019:
use with and without Neighborhood Components Analysis to avoid the confusion that NCA is itself a classifier

wdevazelhes (Author), Feb 26, 2019:
You're right, thanks

============================================================================
An example comparing nearest neighbors classification with and without
Neighborhood Components Analysis.
It will plot the decision boundaries for each class determined by a simple
Nearest Neighbors classifier against the decision boundaries determined by a
Neighborhood Components Analysis classifier. The latter aims to find a distance

bellet (Contributor), Feb 25, 2019:
again as above. maybe:

It will plot the class decision boundaries given by a Nearest Neighbors classifier when using the Euclidean distance on the original features, versus using the Euclidean distance after the transformation learned by Neighborhood Components Analysis. The latter aims to find a linear transformation that maximises the (stochastic) nearest neighbor classification accuracy on the training set.

wdevazelhes (Author), Feb 26, 2019:
Thanks, that was confusing indeed

metric that maximizes the nearest neighbor classification accuracy on a given
training set.
"""

# License: BSD 3 clause

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier, \
    NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline


print(__doc__)

n_neighbors = 1

dataset = datasets.load_iris()
X, y = dataset.data, dataset.target

# we only take the first two features. We could avoid this ugly
# slicing by using a two-dim datasets

agramfort (Member), Feb 27, 2019:
two-dim datasets -> two-dim dataset

wdevazelhes (Author), Feb 27, 2019:
Thanks, done

agramfort (Member), Feb 27, 2019:
not done yet

wdevazelhes (Author), Feb 28, 2019:
Oh, sorry, I must have changed a file in auto-examples, I'll double check your other comments to make sure I didn't do it elsewhere

X = X[:, [0, 2]]

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, stratify=y, test_size=0.7, random_state=42)

h = .01 # step size in the mesh

# Create color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

names = ['K-Nearest Neighbors', 'Neighborhood Components Analysis']

classifiers = [Pipeline([('scaler', StandardScaler()),
                         ('knn', KNeighborsClassifier(n_neighbors=n_neighbors))
                         ]),
               Pipeline([('scaler', StandardScaler()),
                         ('nca', NeighborhoodComponentsAnalysis()),
                         ('knn', KNeighborsClassifier(n_neighbors=n_neighbors))
                         ])
               ]

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

for name, clf in zip(names, classifiers):

    clf.fit(X_train, y_train)
    score = clf.score(X_test, y_test)

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.figure()
    plt.pcolormesh(xx, yy, Z, cmap=cmap_light, alpha=.8)

    # Plot also the training and testing points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold, edgecolor='k', s=20)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.title("{} (k = {})".format(name, n_neighbors))
    plt.text(0.9, 0.1, '{:.2f}'.format(score), size=15,
             ha='center', va='center', transform=plt.gca().transAxes)

plt.show()
@@ -77,9 +77,10 @@
 # Make a list of the methods to be compared
 dim_reduction_methods = [('PCA', pca), ('LDA', lda), ('NCA', nca)]

-plt.figure()
+# plt.figure()
 for i, (name, model) in enumerate(dim_reduction_methods):
-    plt.subplot(1, 3, i + 1)
+    plt.figure()
+    # plt.subplot(1, 3, i + 1, aspect=1)

     # Fit the method's model
     model.fit(X_train, y_train)
@@ -94,7 +95,7 @@
     X_embedded = model.transform(X)

     # Plot the embedding and show the evaluation score
-    plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y)
+    plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y, s=30, cmap='Set1')
     plt.title("{}, KNN (k={})\nTest accuracy = {:.2f}".format(name,
                                                               n_neighbors,
                                                               acc_knn))