
[MRG+2] Neighborhood Components Analysis #10058

Merged
merged 89 commits into scikit-learn:master on Feb 28, 2019

Conversation

@wdevazelhes
Contributor

wdevazelhes commented Nov 2, 2017

Hi, this PR is an implementation of the Neighborhood Components Analysis algorithm (NCA), a popular supervised distance metric learning algorithm. Like LMNN (cf. PR #8602), this algorithm takes a labeled dataset as input, instead of the similar/dissimilar pairs used by most metric learning algorithms, and learns a linear transformation of the space. However, NCA and LMNN have different objective functions: NCA tries to maximise the probability that every sample is correctly classified under a stochastic nearest neighbors rule, and therefore does not need to fix a set of target neighbors in advance.
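For reference, the objective described above can be written as follows (standard NCA notation, added here for clarity, not quoted from the PR's documentation):

$$
p_{ij} = \frac{\exp(-\lVert L x_i - L x_j \rVert^2)}{\sum_{k \neq i} \exp(-\lVert L x_i - L x_k \rVert^2)}, \qquad p_{ii} = 0, \qquad
\max_{L} \; \sum_i p_i \quad \text{where} \quad p_i = \sum_{j:\, y_j = y_i} p_{ij}
$$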

There have been several previous attempts to implement NCA (two PRs: #5276 (closed) and #4789 (not closed)). I created a fresh PR for the sake of clarity. This code is intended to be as similar to LMNN as possible, which should make it easy to factor out the parts of the code shared by both algorithms.

At the time of writing, this algorithm uses scipy's L-BFGS-B solver to solve the optimisation problem, like LMNN. This has the big advantage of avoiding the need to tune a learning rate parameter.
I benchmarked this implementation against the one in the metric-learn package (https://github.com/all-umass/metric-learn): the one in this PR has the advantage of scaling to large datasets (metric-learn's NCA raises a MemoryError on datasets as large as faces or digits), with no significant loss in performance on small datasets.
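For illustration, solving the problem with L-BFGS-B amounts to something like the following (a minimal sketch, not the actual code of this PR; nca_loss_grad is a hypothetical function returning the loss and its gradient with respect to the raveled transformation):

import numpy as np
from scipy.optimize import minimize

# Hypothetical sketch: optimise the raveled linear transformation with L-BFGS-B.
# nca_loss_grad(flat_transformation, X, y) is assumed to return (loss, gradient),
# with the gradient raveled to match the parameter vector.
n_features = X.shape[1]
transformation_init = np.eye(n_features).ravel()  # identity initialisation

result = minimize(fun=nca_loss_grad, x0=transformation_init, args=(X, y),
                  jac=True,  # fun returns the gradient as well
                  method='L-BFGS-B', options={'maxiter': 50})
transformation = result.x.reshape(-1, n_features)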

The remaining tasks are the following:

  • More detailed benchmark of performance against reference implementations
    (for instance metric-learn's) (coming soon)
  • Add an example
  • Benchmark the algorithm accuracy on several datasets
  • Documentation

What is more, some improvements could also be made at a later stage:

  • Make algorithmic improvements (such as ignoring samples whose softmax weight is nearly zero)
  • Add the possibility to choose another solver such as SGD to scale to large
    datasets (probably in another PR?)
  • Add rules of thumb to choose the solver etc. by default (idem?)
  • Make it possible to pass sparse matrices as input?
  • Add support for multilabel classification, which is a case where NCA could be really useful (cf. discussion with @bellet and @GaelVaroquaux). Metric learning is a good fit for these problems: for instance, NCA would fit a single matrix to satisfy the multilabel constraints, which is an advantage over one-vs-rest algorithms (it can do better when labels are correlated, for instance).

Feedback is welcome!

@wdevazelhes

Contributor Author

wdevazelhes commented Nov 2, 2017

I benchmarked this implementation against the one in the metric-learn package (https://github.com/all-umass/metric-learn): the one in this PR has the advantage of scaling to large datasets (metric-learn's NCA raises a MemoryError on datasets as large as faces or digits), with no significant loss in performance on small datasets.

Here is a snippet that shows it (on my machine, which has 7.7 GB of memory):

from sklearn.datasets import load_digits
from metric_learn import NCA
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.utils.testing import assert_raises
digits = load_digits()
X, y = digits.data, digits.target
nca_ml = NCA()
assert_raises(MemoryError, nca_ml.fit, X, y)
nca_sk = NeighborhoodComponentsAnalysis()
nca_sk.fit(X, y)  # does not raise any error
Do not precompute pairwise differences
Indeed, they do not add a significant speedup but have a high memory cost.
transformation = transformation.reshape(-1, X.shape[1])
loss = 0
gradient = np.zeros(transformation.shape)
X_embedded = transformation.dot(X.T).T


@johny-c

johny-c Nov 11, 2017

With np.dot(X, transformation.T) you save a transpose.


@wdevazelhes

wdevazelhes Nov 15, 2017

Author Contributor

True, I will change it.

@johny-c


johny-c commented Nov 11, 2017

Hi @wdevazelhes, since the code is very similar to LMNN, I took the liberty of adding just a couple of comments; by no means a full review. @jnothman, I hope I didn't break any contribution guidelines.

diff_embedded[~ci, :]))
p_i = np.sum(p_i_j) # probability of x_i to be correctly
# classified
gradient += 2 * (p_i * (sum_ci.T + sum_not_ci.T) - sum_ci.T)


@johny-c

johny-c Nov 11, 2017

You could just add the matrices without transposing, and transpose the gradient once before returning the unraveled gradient.


@johny-c

johny-c Nov 11, 2017

I remembered I had seen a more efficient computation of the gradient (see last equation in slide 12 here: http://bengio.abracadoudou.com/lce/slides/roweis.pdf ).
This would amount to:

p_i_j = exp_dist_embedded[ci]
p_i_k = exp_dist_embedded[~ci]
diffs = X[i, :] - X
diff_ci = diffs[ci, :]
diff_not_ci = diffs[~ci, :]
sum_ci = diff_ci.T.dot(p_i_j[:, np.newaxis] * diff_ci)
sum_not_ci = diff_not_ci.T.dot(p_i_k[:, np.newaxis] * diff_not_ci)
p_i = np.sum(p_i_j)
gradient += p_i * sum_not_ci + (1 - p_i) * sum_ci

And multiplying by the transformation after the for-loop:
gradient = 2*np.dot(transformation, gradient)


@wdevazelhes

wdevazelhes Nov 15, 2017

Author Contributor

You could just add the matrices without transposing, and transpose the gradient once before returning the unraveled gradient.

True, I will change it.

I remembered I had seen a more efficient computation of the gradient (see last equation in slide 12 here: http://bengio.abracadoudou.com/lce/slides/roweis.pdf ).
This would amount to:
[...]
And multiplying by the transformation after the for-loop:
gradient = 2*np.dot(transformation, gradient)

Since I already computed the embedded differences for the softmax before, wouldn't it be more efficient to reuse them?
I agree that the expression with (1 - p_i) is clearer, though.

@codecov


codecov bot commented Nov 15, 2017

Codecov Report

Merging #10058 into master will increase coverage by 0.01%.
The diff coverage is 99.65%.


@@            Coverage Diff             @@
##           master   #10058      +/-   ##
==========================================
+ Coverage   96.19%   96.21%   +0.01%     
==========================================
  Files         336      338       +2     
  Lines       62740    63025     +285     
==========================================
+ Hits        60354    60638     +284     
- Misses       2386     2387       +1
Impacted Files Coverage Δ
sklearn/neighbors/__init__.py 100% <100%> (ø) ⬆️
sklearn/neighbors/tests/test_nca.py 100% <100%> (ø)
sklearn/neighbors/nca.py 99.29% <99.29%> (ø)
sklearn/decomposition/tests/test_pca.py 100% <0%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@wdevazelhes

Contributor Author

wdevazelhes commented Nov 15, 2017

Thanks for your comments @johny-c ! I have modified the code according to them.

@wdevazelhes

Contributor Author

wdevazelhes commented Nov 15, 2017

I also added an example: plot_nca_dim_reduction.py (in commit 12cf3a9)

@wdevazelhes

Contributor Author

wdevazelhes commented Nov 15, 2017

I benchmarked this PR's implementation of NCA against metric-learn's: I plotted the training curves (objective function vs. time), for the same initialisation (identity), on the breast_cancer dataset, on my machine which has 7.7 GB of memory and 4 cores.
Since metric-learn's optimisation is a variant of stochastic gradient descent, it needs a learning rate to be tuned (contrary to this PR's implementation), so I plotted the curves for several learning rates.

At some points, metric-learn's NCA training is interrupted prematurely: this is due to numerical instabilities, and this warning is thrown:

RuntimeWarning: invalid value encountered in true_divide
  softmax /= softmax.sum()

[Figures: training curves (objective function vs. time) for this PR's NCA and metric-learn's NCA, for several learning rates]

# Plot the embedding and show the evaluation score
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y)
plt.title("{}, KNN (k={})".format(name, n_neighbors))
plt.text(0.9, 0.1, '{:.2f}'.format(acc_knn), size=15,


@bellet

bellet Nov 16, 2017

Contributor

The accuracy is a bit misplaced when running the example on my laptop. It is probably easier to put "Test accuracy = x" in the title (after a line break).
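For instance, something along these lines (a hypothetical sketch of the suggested change, reusing the variables from the snippet above):

plt.title("{}, KNN (k={})\nTest accuracy = {:.2f}".format(name, n_neighbors, acc_knn))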


@wdevazelhes

wdevazelhes Nov 22, 2017

Author Contributor

Done

import time
from scipy.misc import logsumexp
from scipy.optimize import minimize
from sklearn.preprocessing import OneHotEncoder


@agramfort

agramfort Nov 17, 2017

Member

use relative import
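For example (a sketch, assuming the module lives at sklearn/neighbors/nca.py):

# relative import within the scikit-learn package, instead of the absolute one
from ..preprocessing import OneHotEncoder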


@wdevazelhes

wdevazelhes Nov 22, 2017

Author Contributor

Done

pca:
``n_features_out`` many principal components of the inputs passed
to :meth:`fit` will be used to initialize the transformation.


@agramfort

agramfort Nov 17, 2017

Member

all params are not indented the same way


@wdevazelhes

wdevazelhes Nov 22, 2017

Author Contributor

This is because pca, identity, random and numpy array are not parameters themselves but possible choices for the init parameter. I took the syntax from LMNN. Should I write it in another way?
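For reference, the layout in question is the definition-list style below, where each possible choice is an indented term under the init parameter (a rough sketch; only the pca entry is quoted from the actual docstring, the other entries are elided):

init : string or numpy array
    Initialization of the linear transformation. Possible options are
    'pca', 'identity', 'random' and a numpy array of shape
    (n_features_out, n_features).

    pca:
        ``n_features_out`` many principal components of the inputs passed
        to :meth:`fit` will be used to initialize the transformation.

    identity:
        ...

    random:
        ...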

----------
transformation : array, shape (n_features_out, n_features)
The linear transformation on which to compute loss and evaluate
gradient


@agramfort

agramfort Nov 17, 2017

Member

insert empty line


@wdevazelhes

wdevazelhes Nov 22, 2017

Author Contributor

Done

X_embedded = np.dot(X, transformation.T)

# for every sample x_i, compute its contribution to loss and gradient
for i in range(X.shape[0]):


@agramfort

agramfort Nov 17, 2017

Member

looping over samples seems a problem. Is it possible to vectorize?


@bellet

bellet Nov 20, 2017

Contributor

It should be possible, but one should then try to avoid an O(n_samples * n_samples * n_features) memory complexity.
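For what it is worth, one possible way to vectorize the loss while keeping memory at O(n_samples * n_samples) rather than O(n_samples * n_samples * n_features) is to work with pairwise squared distances computed via the Gram trick. A minimal sketch (illustrative only, not the code of this PR; same_class_mask is a hypothetical precomputed boolean matrix with True where y[i] == y[j]):

import numpy as np
from scipy.special import logsumexp

def nca_loss_vectorized(transformation, X, same_class_mask):
    # transformation: array of shape (n_components, n_features)
    # X: array of shape (n_samples, n_features)
    # same_class_mask: bool array of shape (n_samples, n_samples)
    X_embedded = np.dot(X, transformation.T)
    # pairwise squared Euclidean distances via the Gram trick: O(n_samples**2) memory
    sq_norms = np.einsum('ij,ij->i', X_embedded, X_embedded)
    dist = (sq_norms[:, np.newaxis] + sq_norms[np.newaxis, :]
            - 2 * np.dot(X_embedded, X_embedded.T))
    np.fill_diagonal(dist, np.inf)  # enforce p_ii = 0
    # row-wise softmax of the negative distances, computed in log space for stability
    log_p_ij = -dist - logsumexp(-dist, axis=1, keepdims=True)
    p_ij = np.exp(log_p_ij)
    p = np.sum(p_ij * same_class_mask, axis=1)  # p_i for every sample
    return -np.sum(p)  # negative expected number of correctly classified samples

The gradient can be accumulated in the same spirit from the weighted differences, at the cost of either keeping the loop over samples or allocating a few extra n_samples-by-n_samples temporaries.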

@agramfort

Member

agramfort commented Nov 20, 2017

@GaelVaroquaux GaelVaroquaux requested a review from agramfort Feb 27, 2019

@GaelVaroquaux

Member

GaelVaroquaux commented Feb 27, 2019

This looks ready to merge to me. @agramfort: can you check a second time?

Parameters
----------
transformation : array, shape(n_components, n_features)


@agramfort

agramfort Feb 27, 2019

Member

is it raveled here or not?


@wdevazelhes

wdevazelhes Feb 28, 2019

Author Contributor

Absolutely, it should be raveled. Good catch, I'll change it.

@@ -0,0 +1,511 @@
import pytest


@agramfort

agramfort Feb 27, 2019

Member

I would add here a header with the license and authors at the top of the file, like we do in other places.


@wdevazelhes

wdevazelhes Feb 28, 2019

Author Contributor

That's right, it's missing, but did you mean in the main file nca.py? I looked at two or three test files and they didn't seem to have an author/licence section.

@agramfort

Member

agramfort commented Feb 27, 2019

ok did one (last?) round.

more tomorrow!

@agramfort

Member

agramfort commented Feb 28, 2019

@@ -3,7 +3,9 @@
Neighborhood Component Analysis
"""

# License: BSD 3 Clause
# Authors: William de Vazelhes <wdevazelhes@gmail.com>
# John Chiotellis <ioannis.chiotellis@in.tum.de>


@wdevazelhes

wdevazelhes Feb 28, 2019

Author Contributor

@johny-c I put the email I found on your webpage, but feel free to put another one here ;)

@wdevazelhes

Contributor Author

wdevazelhes commented Feb 28, 2019

It just remains to settle on #10058 (comment), to check #10058 (comment), and maybe to wait for @johny-c to add his email (otherwise he could do it afterwards)? Apart from that, I addressed all your comments, so it's ready @agramfort @GaelVaroquaux!

"""

# Authors: William de Vazelhes <wdevazelhes@gmail.com>
# John Chiotellis <ioannis.chiotellis@in.tum.de>


@wdevazelhes

wdevazelhes Feb 28, 2019

Author Contributor

@johny-c same remark here too

@bthirion bthirion closed this Feb 28, 2019

@bthirion bthirion reopened this Feb 28, 2019

Sprint Paris 2019 automation moved this from Needs review to To do Feb 28, 2019

@agramfort agramfort changed the title [MRG+1] Neighborhood Components Analysis [MRG+2] Neighborhood Components Analysis Feb 28, 2019

@bellet

bellet approved these changes Feb 28, 2019

Contributor

bellet left a comment

LGTM too :-)

@GaelVaroquaux

Member

GaelVaroquaux commented Feb 28, 2019

LGTM. +1 for merge once the CI has run.

@jnothman jnothman moved this from To do to Needs review in Sprint Paris 2019 Feb 28, 2019

@jnothman jnothman moved this from Needs review to Done in Sprint Paris 2019 Feb 28, 2019

@jnothman

Member

jnothman commented Feb 28, 2019

Let's do it

@jnothman jnothman merged commit d31b67f into scikit-learn:master Feb 28, 2019

2 of 7 checks passed

LGTM analysis: Python - Running analyses for revisions
ci/circleci: doc - Your tests are queued behind your running builds
ci/circleci: doc-min-dependencies - Your tests are queued behind your running builds
ci/circleci: lint - Your tests are queued behind your running builds
continuous-integration/travis-ci/pr - The Travis CI build is in progress
LGTM analysis: C/C++ - No code changes detected
LGTM analysis: JavaScript - No code changes detected
@jnothman

Member

jnothman commented Feb 28, 2019

Congratulations @wdevazelhes!!!! This has been a long time coming...

@wdevazelhes

Contributor Author

wdevazelhes commented Feb 28, 2019

That's great! Thanks a lot for your reviews and comments @jnothman @agramfort @GaelVaroquaux @bellet, and congrats to @johny-c too! I'm excited to work on improvements later on, and also to port the changes from this PR to LMNN so that it can be merged; or maybe we'll first see how NCA fares for a while and then decide what to do for LMNN.

@bellet

Contributor

bellet commented Feb 28, 2019

Congrats @wdevazelhes !

Indeed, core devs should decide whether they also want to include LMNN (#8602). I think it would be good as LMNN is also a popular method and there is quite a lot of code that could be shared with NCA.

@johny-c


johny-c commented Feb 28, 2019

Hi all! @wdevazelhes, congrats and thanks for including me as an author. Regarding LMNN, as far as I remember, what is missing is some testing. I will have a couple of free weeks in March, so I could have another look at it.

Kiku-git added a commit to Kiku-git/scikit-learn that referenced this pull request Mar 4, 2019
