# [MRG+2] Neighborhood Components Analysis #10058

Merged 89 commits on Feb 28, 2019
+1,664 −23


#### Just for now

Address reviews #10058 (review) and #10058 (review)

wdevazelhes committed Jan 18, 2019
commit 44839a022ba6c1810f2c20299081541dce63e869
@@ -537,9 +537,8 @@ data visualization and fast classification.

.. centered:: |nca_illustration_1| |nca_illustration_2|

In the above illustrating figure, we consider some points from a randomly
-generated dataset. We focus on the stochastic KNN classification of point 3,
+generated dataset. We focus on the stochastic KNN classification of point no. 3,
the thickness of a bond representing a softmax distance hence the weight of the

#### bellet Feb 25, 2019

Contributor

maybe clarify this sentence a bit, because at this point the user does not have much context on the method to rely on

proposition:
"The thickness of a link between sample 3 and another point is proportional to their distance, and can be seen as the relative weight (or probability) that a stochastic nearest neighbor prediction rule would assign to this point."

it is possible to refer to the mathematical formulation section for details, but maybe it is better not to do this to avoid confusing users

#### wdevazelhes Feb 26, 2019

Author Contributor

Yes, I agree, I'll go for your proposition, it's clearer.
I'll add a link to the mathematical formulation; readers could leave it aside and read the examples first, depending on what they prefer.

neighbor vote in the classification. In the original space, sample 3 has many
stochastic neighbors from various classes, so the right class is not very
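The softmax weighting of stochastic neighbors described above can be sketched in a few lines of NumPy (a toy illustration with a made-up 5-point dataset, not the library code):

```python
import numpy as np
from scipy.special import logsumexp

# Hypothetical toy data: 5 points in 2-D.
rng = np.random.RandomState(42)
X = rng.randn(5, 2)

# Squared Euclidean distances between all pairs of points.
d = ((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2).sum(axis=2)

# A point is never its own stochastic neighbor: exclude the diagonal
# before normalizing.
np.fill_diagonal(d, np.inf)

# Softmax over negative distances: row i holds the probability that each
# other point is picked as the stochastic nearest neighbor of point i --
# the "thickness of a bond" in the figure.
p = np.exp(-d - logsumexp(-d, axis=1)[:, np.newaxis])

print(p.sum(axis=1))  # each row sums to 1
```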
@@ -568,26 +567,6 @@ transformation with a :class:`KNeighborsClassifier` instance that performs the
classification in the embedding space. Here is an example using the two
classes:

>>> from sklearn.neighbors import NeighborhoodComponentsAnalysis
>>> from sklearn.neighbors import KNeighborsClassifier
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y,
... stratify=y, test_size=0.7, random_state=42)
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> nca.fit(X_train, y_train) # doctest: +ELLIPSIS
NeighborhoodComponentsAnalysis(...)
>>> # Apply the learned transformation when using KNeighborsClassifier
>>> knn = KNeighborsClassifier(n_neighbors=3)
>>> knn.fit(nca.transform(X_train), y_train) # doctest: +ELLIPSIS
KNeighborsClassifier(...)
>>> print(knn.score(nca.transform(X_test), y_test)) # doctest: +ELLIPSIS
0.96190476...
Alternatively, one can create a :class:`sklearn.pipeline.Pipeline` instance
that automatically applies the transformation when fitting or predicting:

>>> from sklearn.pipeline import Pipeline
>>> nca = NeighborhoodComponentsAnalysis(random_state=42)
>>> knn = KNeighborsClassifier(n_neighbors=3)
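The pipeline example is truncated by the hunk boundary; a self-contained sketch of the same pattern (using iris data as a stand-in for the `X`, `y` above):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import (KNeighborsClassifier,
                               NeighborhoodComponentsAnalysis)
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.7, random_state=42)

# Chaining NCA and KNN: fit() learns the transformation and then fits the
# classifier on the transformed training data; score() transforms the
# test data automatically before predicting.
nca_pipe = Pipeline([
    ('nca', NeighborhoodComponentsAnalysis(random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=3)),
])
nca_pipe.fit(X_train, y_train)
print(nca_pipe.score(X_test, y_test))
```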
@@ -21,8 +21,8 @@
 from sklearn import datasets
 from sklearn.model_selection import train_test_split
 from sklearn.preprocessing import StandardScaler
-from sklearn.neighbors import KNeighborsClassifier, \
-    NeighborhoodComponentsAnalysis
+from sklearn.neighbors import (KNeighborsClassifier,
+                               NeighborhoodComponentsAnalysis)
 from sklearn.pipeline import Pipeline

@@ -35,8 +35,8 @@
 from sklearn.model_selection import train_test_split
 from sklearn.decomposition import PCA
 from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
-from sklearn.neighbors import KNeighborsClassifier, \
-    NeighborhoodComponentsAnalysis
+from sklearn.neighbors import (KNeighborsClassifier,
+                               NeighborhoodComponentsAnalysis)
 from sklearn.pipeline import make_pipeline
 from sklearn.preprocessing import StandardScaler
@@ -12,7 +12,7 @@
import sys
import time
from scipy.optimize import minimize
-from ..utils.fixes import logsumexp
+from ..utils.extmath import softmax
from ..metrics import pairwise_distances
from ..base import BaseEstimator, TransformerMixin
from ..preprocessing import LabelEncoder
@@ -27,18 +27,25 @@
class NeighborhoodComponentsAnalysis(BaseEstimator, TransformerMixin):
"""Neighborhood Components Analysis
Neighborhood Component Analysis (NCA) is a machine learning algorithm for
metric learning. It learns a linear transformation in a supervised fashion
to improve the classification accuracy of a stochastic nearest neighbors
rule in the transformed space.
Read more in the :ref:`User Guide <NeighborhoodComponentsAnalysis>`.
Parameters
----------
n_components : int, optional (default=None)
Preferred dimensionality of the embedding.
-If None it is inferred from ``init``.
+If None it will be set to ``n_features``.
init : string or numpy array, optional (default='auto')
Initialization of the linear transformation. Possible options are
'auto', 'pca', 'lda', 'identity', 'random', and a numpy array of shape
(n_features_a, n_features_b).
-'auto':
+'auto'
Depending on ``n_components``, the most reasonable initialization
will be chosen among the following ones. First, we try to use
'lda', as it uses labels information: if ``n_components <=
@@ -47,29 +54,29 @@ class NeighborhoodComponentsAnalysis(BaseEstimator, TransformerMixin):
if ``n_components < min(n_features, n_samples)``, ``init = 'pca'``.
Otherwise, we just use 'identity'.
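The 'auto' rule described above can be sketched as a small helper (a paraphrase of the documented behavior, not the actual `_initialize` code):

```python
def auto_init(n_components, n_features, n_samples, n_classes):
    """Sketch of the 'auto' initialization choice described above."""
    if n_components is None:
        # As discussed in this thread: None means n_features.
        n_components = n_features
    if n_components <= n_classes - 1:
        return 'lda'        # uses label information
    if n_components < min(n_features, n_samples):
        return 'pca'
    return 'identity'

print(auto_init(None, 4, 100, 3))  # identity when no reduction is possible
```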

#### jnothman Jan 15, 2019

Member

so identity is used if init='auto' and n_components=None?

#### wdevazelhes Jan 18, 2019

Author Contributor

Good question, I did not think this through... Looking at the code of `_initialize`, not always: the only case where it would not be identity is if `n_classes - 1 >= n_features`, in which case the init would be 'lda', which I think is a reasonable init in this case. In fact the reasoning in the code is just that if `n_components` is None, we set `n_components` to `n_features` and then choose the appropriate auto init. Therefore I'll indeed replace this:

```
n_components : int, optional (default=None)
    Preferred dimensionality of the embedding.
    If None it is inferred from ``init``.
```

by this

```
n_components : int, optional (default=None)
    Preferred dimensionality of the embedding.
    If None it will be set to ``n_features``.
```

Tell me if you think of something else

#### GaelVaroquaux Feb 25, 2019

Member

This looks good. We should test this in the tests, though.

#### agramfort Feb 27, 2019

Member

the above docstring can be simplified to be clearer.

```
If n_components < n_features, 'lda' init is used.
If XXX then XXX is used.
...
```

currently it's a bit convoluted

-pca:
-    ``n_components`` many principal components of the inputs passed
+'pca'
+    ``n_components`` principal components of the inputs passed
to :meth:`fit` will be used to initialize the transformation.

#### agramfort Nov 17, 2017

Member

all params are not indented the same way

#### wdevazelhes Nov 22, 2017

Author Contributor

This is because `pca`, `identity`, `random` and `numpy array` are not arguments but possible choices for the argument `init`. I took the syntax from LMNN. Should I write it another way?

-(See :class:`~sklearn.decomposition.PCA`)
+(See :class:`PCA`)
-lda:
-    ``min(n_components, n_classes)`` many most discriminative
+'lda'
+    ``min(n_components, n_classes)`` most discriminative
components of the inputs passed to :meth:`fit` will be used to
initialize the transformation. (If ``n_components > n_classes``,
the rest of the components will be zero.) (See
-:class:`~sklearn.discriminant_analysis.LinearDiscriminantAnalysis`)
+:class:`LinearDiscriminantAnalysis`)
-identity:
+'identity'
If ``n_components`` is strictly smaller than the
dimensionality of the inputs passed to :meth:`fit`, the identity
matrix will be truncated to the first ``n_components`` rows.
-random:
+'random'
The initial transformation will be a random array of shape
(n_components, n_features). Each value is sampled from the
standard normal distribution.
-numpy array:
+numpy array
n_features_b must match the dimensionality of the inputs passed to
:meth:`fit` and n_features_a must be less than or equal to that.
If ``n_components`` is not None, n_features_a must match it.
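Passing a numpy array as `init` can be sketched as follows (a toy example with a random 2 x 4 matrix on iris, so that n_features_a = 2 is at most n_features_b = 4 and matches `n_components`):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NeighborhoodComponentsAnalysis

X, y = load_iris(return_X_y=True)

# A custom (n_components, n_features) initialization: 2 x 4 here, since
# iris has 4 features and we want a 2-D embedding.
init = np.random.RandomState(0).randn(2, X.shape[1])

nca = NeighborhoodComponentsAnalysis(n_components=2, init=init,
                                     random_state=0)
nca.fit(X, y)
print(nca.components_.shape)  # the learned transformation is 2 x 4
```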
@@ -162,13 +169,6 @@ class NeighborhoodComponentsAnalysis(BaseEstimator, TransformerMixin):
>>> print(knn.score(nca.transform(X_test), y_test)) # doctest: +ELLIPSIS
0.961904...
Notes
-----
Neighborhood Component Analysis (NCA) is a machine learning algorithm for
metric learning. It learns a linear transformation in a supervised fashion
to improve the classification accuracy of a stochastic nearest neighbors
rule in the transformed space.
References
----------
.. [1] J. Goldberger, G. Hinton, S. Roweis, R. Salakhutdinov.
@@ -184,8 +184,6 @@ class NeighborhoodComponentsAnalysis(BaseEstimator, TransformerMixin):
def __init__(self, n_components=None, init='auto', warm_start=False,
max_iter=50, tol=1e-5, callback=None, store_opt_result=False,
verbose=0, random_state=None):

# Parameters
self.n_components = n_components
self.init = init
self.warm_start = warm_start
@@ -215,26 +213,26 @@ def fit(self, X, y):

# Verify inputs X and y and NCA parameters, and transform a copy if
# needed
-X_valid, y_valid, init = self._validate_params(X, y)
+X, y, init = self._validate_params(X, y)

# Initialize the random generator
self.random_state_ = check_random_state(self.random_state)

# Measure the total training time
t_train = time.time()

-# Compute mask that stays fixed during optimization:
-mask = y_valid[:, np.newaxis] == y_valid[np.newaxis, :]
+# Compute a mask that stays fixed during optimization:
+same_class_mask = y[:, np.newaxis] == y[np.newaxis, :]
# (n_samples, n_samples)

# Initialize the transformation
-transformation = self._initialize(X_valid, y_valid, init)
+transformation = self._initialize(X, y, init)

# Create a dictionary of parameters to be passed to the optimizer
disp = self.verbose - 2 if self.verbose > 1 else -1
optimizer_params = {'method': 'L-BFGS-B',
'jac': True,
'x0': transformation,
'tol': self.tol,
@@ -247,7 +245,7 @@ def fit(self, X, y):
opt_result = minimize(**optimizer_params)

# Reshape the solution found by the optimizer
-self.components_ = opt_result.x.reshape(-1, X_valid.shape[1])
+self.components_ = opt_result.x.reshape(-1, X.shape[1])

# Stop timer
t_train = time.time() - t_train
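The optimizer call pattern used in `fit` (L-BFGS-B with `jac=True`, so a single callable returns both loss and gradient) can be illustrated on a toy quadratic; the objective here is made up for the sketch:

```python
import numpy as np
from scipy.optimize import minimize

def loss_grad(x):
    # Toy objective with a known minimum at x = 3; with jac=True,
    # L-BFGS-B reuses one evaluation for both loss and gradient.
    loss = np.sum((x - 3.0) ** 2)
    grad = 2.0 * (x - 3.0)
    return loss, grad

opt_result = minimize(loss_grad, x0=np.zeros(4), method='L-BFGS-B',
                      jac=True, tol=1e-5)
print(np.round(opt_result.x, 3))  # approximately [3. 3. 3. 3.]
```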
@@ -305,10 +303,10 @@ def _validate_params(self, X, y):
Returns
-------
-X_valid : array, shape (n_samples, n_features)
+X : array, shape (n_samples, n_features)
The validated training samples.
-y_valid : array, shape (n_samples,)
+y : array, shape (n_samples,)
The validated training labels, encoded to be integers in
the range(0, n_classes).
@@ -326,9 +324,9 @@ def _validate_params(self, X, y):
"""

# Validate the inputs X and y, and converts y to numerical classes.
-X_valid, y_valid = check_X_y(X, y, ensure_min_samples=2)
-check_classification_targets(y_valid)
-y_valid = LabelEncoder().fit_transform(y_valid)
+X, y = check_X_y(X, y, ensure_min_samples=2)
+check_classification_targets(y)
+y = LabelEncoder().fit_transform(y)

# Check the preferred embedding dimensionality
if self.n_components is not None:
@@ -366,12 +364,12 @@ def _validate_params(self, X, y):
init = check_array(init)

# Assert that init.shape[1] = X.shape[1]
-if init.shape[1] != X_valid.shape[1]:
+if init.shape[1] != X.shape[1]:
raise ValueError(
'The input dimensionality ({}) of the given '
'linear transformation `init` must match the '
'dimensionality of the given inputs `X` ({}).'
-.format(init.shape[1], X_valid.shape[1]))
+.format(init.shape[1], X.shape[1]))

# Assert that init.shape[0] <= init.shape[1]
if init.shape[0] > init.shape[1]:
@@ -397,7 +395,7 @@ def _validate_params(self, X, y):
"`init` must be 'auto', 'pca', 'lda', 'identity', 'random' "
"or a numpy array of shape (n_components, n_features).")

-return X_valid, y_valid, init
+return X, y, init

def _initialize(self, X, y, init):
"""Initialize the transformation.
@@ -423,7 +421,6 @@ def _initialize(self, X, y, init):
transformation = init
if self.warm_start and hasattr(self, 'components_'):
transformation = self.components_

elif isinstance(init, np.ndarray):
pass
else:
@@ -479,19 +476,19 @@ def _callback(self, transformation):

self.n_iter_ += 1

"""Compute the loss and the loss gradient w.r.t. ``transformation``.
Parameters
----------
-transformation : array, shape (n_components, n_features)
-    The linear transformation on which to compute loss and evaluate
+transformation : array, shape (n_components * n_features,)
+    The raveled linear transformation on which to compute loss and evaluate
X : array, shape (n_samples, n_features)
The training samples.
-mask : array, shape (n_samples, n_samples)
+same_class_mask : array, shape (n_samples, n_samples)
A mask where ``mask[i, j] == 1`` if ``X[i]`` and ``X[j]`` belong
to the same class, and ``0`` otherwise.
# Compute softmax distances
p_ij = pairwise_distances(X_embedded, squared=True)
np.fill_diagonal(p_ij, np.inf)
-p_ij = np.exp(-p_ij - logsumexp(-p_ij, axis=1)[:, np.newaxis])
-# (n_samples, n_samples)
+p_ij = softmax(-p_ij)  # (n_samples, n_samples)
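This change swaps the explicit exp/logsumexp normalization for a softmax utility. As a sanity check, the two formulations can be compared on toy distances (a sketch using `scipy.special.softmax`, which differs from the internal `sklearn.utils.extmath.softmax` only in interface):

```python
import numpy as np
from scipy.special import logsumexp, softmax

rng = np.random.RandomState(0)
d = rng.rand(4, 4) * 5.0          # toy squared pairwise distances
np.fill_diagonal(d, np.inf)       # a point is not its own neighbor

# Explicit normalization, as in the old code path:
p_old = np.exp(-d - logsumexp(-d, axis=1)[:, np.newaxis])

# Row-wise softmax of the negative distances, as in the new code path:
p_new = softmax(-d, axis=1)

print(np.allclose(p_old, p_new))  # the two formulations agree
```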

# Compute loss
p = np.sum(masked_p_ij, axis=1, keepdims=True) # (n_samples, 1)
loss = np.sum(p)

# Compute gradient of loss w.r.t. `transform`
weighted_p_ij = masked_p_ij - p_ij * p
weighted_p_ij_sym = weighted_p_ij + weighted_p_ij.T
-np.fill_diagonal(weighted_p_ij_sym, - weighted_p_ij.sum(axis=0))
+np.fill_diagonal(weighted_p_ij_sym, -weighted_p_ij.sum(axis=0))
# time complexity of the gradient: O(n_components x n_samples x (

#### GaelVaroquaux Feb 25, 2019

Member

I think that the empty line above is spurious.

#### wdevazelhes Feb 25, 2019

Author Contributor

yes indeed, thanks

# n_samples + n_features))

if self.verbose:
@@ -572,7 +569,7 @@ def _check_scalar(x, name, target_type, min_val=None, max_val=None):
The minimum valid value the parameter can take. If None (default) it
is implied that the parameter does not have a lower bound.
-max_val: float or int, optional (default=None)
+max_val : float or int, optional (default=None)
The maximum valid value the parameter can take. If None (default) it
is implied that the parameter does not have an upper bound.