
FEAT Large Margin Nearest Neighbor implementation #8602

Open · wants to merge 201 commits into base: main

Conversation

@johny-c johny-c commented Mar 17, 2017

Reference Issue

What does this implement/fix? Explain your changes.

Hi! This is an implementation of the Large Margin Nearest Neighbor algorithm that follows the original Matlab implementation by the author (K. Weinberger); see also my docstrings. As far as I know there is no prior implementation of this version of the algorithm in Python, and no version at all in scikit-learn. The implementation lives in a single file (sklearn/neighbors/lmnn.py). It could go directly into sklearn/neighbors/classification.py, but I thought that would make things too cluttered. It is basically a subclass of KNeighborsClassifier with a lot of added functionality. I also added an example in examples/neighbors/plot_lmnn_classification.py, which is basically a copy of the simple nearest neighbors example in examples/neighbors/plot_classification.py.

Any other comments?

I ran all the scikit-learn tests I could find and the code now passes all of them. There is still a TODO in the code which I think is a minor issue (the algorithm works anyway): during fitting, I need to pass a possibly changed attribute (n_neighbors) to the superclass and then call the superclass's fit method. I would appreciate any comments and/or help to finalize this. Thanks!
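
For readers skimming the thread, a minimal usage sketch of the estimator as described above; the import path and parameter names are taken from this PR's description and example scripts and may not match the final API:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors.lmnn import LargeMarginNearestNeighbor  # module path proposed in this PR

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Learn a linear transformation that pulls each point's same-class neighbors
# closer while pushing differently labeled "impostors" away by a large margin.
lmnn = LargeMarginNearestNeighbor(n_neighbors=3, max_iter=50, random_state=42)
lmnn.fit(X_train, y_train)

# Since the estimator subclasses KNeighborsClassifier, it can classify
# directly in the learned metric space.
print(lmnn.score(X_test, y_test))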

@johny-c johny-c changed the title to [MRG] Large Margin Nearest Neighbor implementation Mar 17, 2017

johny-c commented Mar 18, 2017

Hi all! I've been trying to nail down why some Travis tests fail, and I think it is a bug in the fmin_l_bfgs_b implementation of an old SciPy version. I've opened an issue there (scipy/scipy#7196).

Apart from that, the rest of my pull request was passing all other tests. On my last commit, one AppVeyor test is now failing on something completely unrelated to my code, namely sklearn.decomposition.tests.test_pca.test_singular_values (https://ci.appveyor.com/project/raghavrv/scikit-learn/build/1.0.217/job/iittcikl6hic3bdg). Am I missing something?


johny-c commented Mar 18, 2017

From the last commit and builds, I can confirm that the failing Travis tests are isolated and caused by the fmin_l_bfgs_b implementation in this old SciPy version. I guess this issue is not going to be dealt with by the SciPy devs, since the newer versions work fine. Is there a standard way of dealing with such issues? Can I do something? I would appreciate any comments and/or help.

@jnothman (Member)

Sorry I don't have much time to handle this. Usually we would backport functionality from a later version for users that have an older supported version by adding something in sklearn.utils.fixes. You might want to look at old scipy documentation, e.g. https://docs.scipy.org/doc/scipy-0.9.0/reference/generated/scipy.optimize.fmin_l_bfgs_b.html.
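
For illustration, such a backport usually takes the form of a version-gated wrapper in sklearn/utils/fixes.py. The wrapper name, version threshold and keyword handling below are hypothetical; the actual workaround depends on what scipy/scipy#7196 concludes.

import scipy
from scipy.optimize import fmin_l_bfgs_b

_SCIPY_VERSION = tuple(int(p) for p in scipy.__version__.split('.')[:2])

def _fmin_l_bfgs_b_compat(func, x0, **kwargs):
    # Delegate to scipy, adjusting arguments on old releases where the
    # behaviour differs (hypothetical example: drop a keyword that old
    # versions do not support and fall back to the default behaviour).
    if _SCIPY_VERSION < (0, 18):
        kwargs.pop('callback', None)
    return fmin_l_bfgs_b(func, x0, **kwargs)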

@jnothman jnothman left a comment


This looks like an interesting contribution, although I don't know a great deal about its use. Sorry that the core devs are currently quite backlogged and short of time.

sklearn/neighbors/lmnn.py (two review comments, outdated and resolved)
@glemaitre (Member)

I used this package when playing with metric learning.
I don't know whether it would be interesting to actually have a module in sklearn?


jnothman commented Mar 19, 2017 via email


GaelVaroquaux commented Mar 19, 2017 via email

@cmarmo cmarmo left a comment


Hi @johny-c and happy new year to you too! Thanks a lot for your effort!
Your PR is labeled for review now. Let's wait one or two days before pinging.
In the meantime I have checked the examples (nice figures indeed!): they throw some warnings. Also, we are moving examples (one step at a time) to a notebook style to make them more readable (see for example this one); if you have some time to reorganize them, I think this could improve understanding.
Thanks again for your patience!

Comment on lines 6 to 7
An example comparing nearest neighbors classification with and without
Large Margin Nearest Neighbor.
Member

What about
"This example compares nearest neighbors...."

Author

Done.

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, cmap=cmap_light, alpha=.8)
Member

Checking the rendered example, this line produces a DeprecationWarning:

/home/circleci/project/examples/neighbors/plot_lmnn_classification.py:75: MatplotlibDeprecationWarning: shading='flat' when X and Y have the same dimensions as C is deprecated since 3.3.  Either specify the corners of the quadrilaterals with X and Y, or pass shading='auto', 'nearest' or 'gouraud', or set rcParams['pcolor.shading'].  This will become an error two minor releases later.
  plt.pcolormesh(xx, yy, Z, cmap=cmap_light, alpha=.8)

Do you mind fixing it? Thanks!

Author

Done.
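
For the record, the fix likely amounts to passing an explicit shading mode, which silences this deprecation warning on matplotlib >= 3.3. A self-contained illustration with synthetic data (not the example's own variables):

import numpy as np
import matplotlib.pyplot as plt

xx, yy = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
Z = np.sin(np.pi * xx) * np.cos(np.pi * yy)

# shading='auto' lets matplotlib pick the right mode when X, Y and C have
# the same shape, instead of the deprecated implicit 'flat' behaviour.
plt.pcolormesh(xx, yy, Z, alpha=.8, shading='auto')
plt.show()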

Dimensionality Reduction with Large Margin Nearest Neighbor
===========================================================

Sample usage of Large Margin Nearest Neighbor for dimensionality reduction.
Member

Perhaps this sentence has become redundant now?

Author

Removed.

Comment on lines 73 to 75
lmnn = LargeMarginNearestNeighbor(n_neighbors=n_neighbors, n_components=2,
                                  max_iter=20, verbose=1,
                                  random_state=random_state)
Member

The rendered example shows a ConvergenceWarning. Do you think it would be possible to fix it without increasing the execution time too much?

Author

When running the file locally, I get the following error that seems to be coming from joblib:

  File "plot_lmnn_dim_reduction.py", line 53, in <module>
    faces = datasets.fetch_olivetti_faces()
  File "/work/chiotell/projects/scikit-learn/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/work/chiotell/projects/scikit-learn/sklearn/datasets/_olivetti_faces.py", line 120, in fetch_olivetti_faces
    faces = joblib.load(filepath)
  File "/home/johny/soft/miniconda3/envs/scikit/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 585, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/johny/soft/miniconda3/envs/scikit/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/home/johny/soft/miniconda3/envs/scikit/lib/python3.6/pickle.py", line 1050, in load
    dispatch[key[0]](self)
  File "/home/johny/soft/miniconda3/envs/scikit/lib/python3.6/pickle.py", line 1338, in load_global
    klass = self.find_class(module, name)
  File "/home/johny/soft/miniconda3/envs/scikit/lib/python3.6/pickle.py", line 1388, in find_class
    __import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.externals.joblib.numpy_pickle'

I tried with multiple versions of joblib from 0.12.2 to 1.0 and the same error appears. Any ideas?

Member

I can't reproduce the error with either Python 3.9 or 3.6 ... have you tried cleaning and reinstalling? joblib is no longer in externals ...

Author

I found it. The problem was with my scikit_learn_data directory. For some reason (probably because it was created by an old scikit-learn version) it was causing pickle to look for sklearn.externals.joblib instead of joblib. I deleted its contents and now it works. I also removed the ConvergenceWarning by passing a larger tolerance parameter.
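
For reference, the change described above presumably amounts to something like the snippet below, i.e. the quoted call from the example plus a larger tol so the optimizer converges within max_iter=20 iterations. The exact tol value and the placeholder variables are assumptions:

from sklearn.neighbors.lmnn import LargeMarginNearestNeighbor  # module path proposed in this PR

n_neighbors, random_state = 3, 0  # placeholder values; the example defines its own
lmnn = LargeMarginNearestNeighbor(n_neighbors=n_neighbors, n_components=2,
                                  max_iter=20, tol=1e-3, verbose=1,
                                  random_state=random_state)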

Large Margin Nearest Neighbor Illustration
==========================================

An example illustrating the goal of learning a distance metric that maximizes
Member

What about
"This example illustrates...."

Author

Done.


# Annotate the impostors (1, 4, 5, 7)
imp_centroid = (X[1] + X[4] + X[5] + X[7]) / 4
imp_arrow_dict = dict(facecolor='black', color='gray', arrowstyle='->',
Member

A UserWarning is thrown by matplotlib (see the rendered example). Do you mind fixing it? Thanks!

Author

Fixed.

@agramfort (Member)

@bellet and @wdevazelhes any opinion on this one? As authors of metric_learn I am wondering about the pros and cons (if any) of the proposed implementation. API-wise, does it lead to comparable usage?

@wdevazelhes (Contributor)

@bellet and @wdevazelhes any opinion on this one? As authors of metric_learn I am wondering about the pros and cons (if any) of the proposed implementation. API-wise, does it lead to comparable usage?

@agramfort yes, API-wise the general parameters (n_components, max_iter, tol, verbose, random_state) match the general API from metric-learn (and the one from sklearn.neighbors.NeighborhoodComponentsAnalysis).

There are just some parameters specific to LMNN that don't match metric-learn's LMNN implementation (n_neighbors vs k, and push_loss_weight vs regularization). However, I find the names in this PR (n_neighbors and push_loss_weight) more explicit, so maybe we could change them in metric-learn after this PR is merged, to ensure the same naming in both.

As a matter of fact, I think this implementation has a lot of cool features that are not yet in metric-learn's implementation, for instance the optimization with scipy.optimize (in metric-learn we do a manual gradient descent) and a chunked version for finding impostors. So I'm thinking it would be cool to integrate it in metric-learn too, what do you think @johny-c @bellet @perimosocordiae ?

Also just one remark: in metric-learn (and in sklearn.neighbors.NeighborhoodComponentsAnalysis) we added an "auto" initialization that can use LDA when possible; it seemed to work well in practice. @johny-c maybe it could be reused here too?
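
To make the naming differences concrete, here is a side-by-side sketch of the two constructors. The metric-learn names (k, regularization) and this PR's names (n_neighbors, push_loss_weight) come from the comment above; the values and the import path of the PR's estimator are assumptions:

from metric_learn import LMNN
from sklearn.neighbors.lmnn import LargeMarginNearestNeighbor  # assumed module path

# metric-learn's naming: k neighbors, `regularization` balancing pull vs push.
ml_lmnn = LMNN(k=3, regularization=0.5, max_iter=1000)

# This PR's naming: n_neighbors, with push_loss_weight playing the analogous role.
sk_lmnn = LargeMarginNearestNeighbor(n_neighbors=3, push_loss_weight=0.5,
                                     max_iter=50, random_state=0)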


johny-c commented Jan 19, 2021

Hi @wdevazelhes , thanks for the comments ;) Sure, integrating into metric-learn would be fine by me. I guess the PR could get merged here first, because it has already been reviewed by a bunch of people, and then it would be a simple copy into metric-learn.
Regarding LDA initialization, yes, it would be easy to add as an option. I'm not sure about an auto option though. I would keep PCA as the default.
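
For illustration, an init option along those lines could look roughly like the hypothetical helper below, mirroring how NeighborhoodComponentsAnalysis builds its LDA initialization; this is not code from the PR:

from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def _init_transformation(X, y, n_components, init="pca"):
    # Return an (n_components, n_features) matrix to initialize the linear map.
    if init == "pca":
        # Default: principal directions of the training data.
        return PCA(n_components=n_components).fit(X).components_
    if init == "lda":
        # LDA yields at most n_classes - 1 discriminative directions.
        lda = LinearDiscriminantAnalysis(n_components=n_components).fit(X, y)
        return lda.scalings_.T[:n_components]
    raise ValueError("Unknown init: %r" % init)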


bellet commented Jan 19, 2021

+1 for also integrating this implementation in metric-learn, especially if we can show some clear performance improvement (which I think is the case). Our current implementation is actually difficult to maintain and we do have plans to refactor it (see scikit-learn-contrib/metric-learn#210). So @johny-c it would be great if you have a bit of time at some point to contribute your implementation there as well :-) This can be done now in parallel to this (integration to metric-learn should be fast), or after this has been merged as you suggested if you prefer to do it this way.


johny-c commented Jan 19, 2021

Hi @bellet , sure but the next 2-3 weeks are a bit busy for me. I can open a PR in metric-learn afterwards. If in the meantime, it gets approval for merge here, it would help so we know how/when to finalize it.


bellet commented Jan 19, 2021

Hi @bellet , sure but the next 2-3 weeks are a bit busy for me. I can open a PR in metric-learn afterwards. If in the meantime, it gets approval for merge here, it would help so we know how/when to finalize it.

That would be great, thanks!

Base automatically changed from master to main January 22, 2021 10:49
@cmarmo cmarmo removed this from the 1.0 milestone Jun 3, 2021

cmarmo commented Jun 3, 2021

Hi @johny-c I saw that you submitted your PR to metric-learn: I think this is the right move ... as apparently the review load for 1.0 is too high. Thank you very much for your patience and your work!

@cmarmo cmarmo added the labels "Needs Decision - Close" (Requires decision for closing) and "Move to scikit-learn-extra" (This PR should be moved to the scikit-learn-extras repository) Dec 13, 2021
@glemaitre glemaitre requested review from glemaitre and removed request for glemaitre July 29, 2022 13:20
@glemaitre glemaitre self-assigned this Jul 29, 2022
@glemaitre glemaitre changed the title [MRG] Large Margin Nearest Neighbor implementation FEAT Large Margin Nearest Neighbor implementation Jul 29, 2022
@glemaitre (Member)

I wanted to give a hand by bringing this PR to main. I assume it is still worth having in scikit-learn. On the way, I wanted to:

  • use the new parameter validation framework
  • update the docstring
  • improve code linked to the work done for the pairwise computation
  • refactor the code with NCA (this would be done later, once this PR is merged into its own branch and we have strong tests)

I was pretty unfamiliar with the original paper, and I saw that, as mentioned earlier, we do not use the hinge loss but instead the squared hinge loss, and use L-BFGS on the resulting differentiable loss. I could not find original code or a paper deriving this solver, so it is a bit difficult to review.

I wanted to write a couple of tests to ensure that the minimization goes as expected. The first thing I wanted to do was to compare the squared hinge loss minimization vs. the hinge loss minimization as shown in the JMLR paper. I quickly tried the metric-learn implementation on the illustration example and I got quite different results.

With metric-learn, I get the following results:

[figures: embeddings obtained with metric-learn]

Basically, I set a very small tolerance and let the algorithm converge (tol=1e-12).
On the other hand, the current implementation gives the following (with a low tolerance as well):

[figure: embedding obtained with this PR's implementation]

I was wondering if @bellet or @johny-c have any insight regarding the difference in the solutions found. Do you think that it is only due to the loss used?

@glemaitre (Member)

Thinking a bit more about the problem, using the squared hinge should have more impact on the "push" loss. The quadratic part will penalize impostors more, and I assume the solution will therefore try harder to increase the distance between the in-class nearest neighbors and the impostors.
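
For concreteness, a plain NumPy illustration of the two penalties under discussion (not the PR's code); violation is the amount by which an impostor enters the margin, with positive values meaning a violation:

import numpy as np

def hinge(violation):
    # Penalty used in the original JMLR formulation: linear in the violation.
    return np.maximum(violation, 0.0)

def squared_hinge(violation):
    # Penalty used in this PR: differentiable everywhere (L-BFGS friendly)
    # and quadratic, so large violations dominate the push term.
    return np.maximum(violation, 0.0) ** 2

v = np.array([-1.0, 0.5, 2.0])
print(hinge(v))          # [0.   0.5  2. ]
print(squared_hinge(v))  # [0.   0.25 4. ]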

@glemaitre (Member)

While I am not really against using the hinge loss, I still think that we should actually implement both losses and then pick the most appropriate solver.

@glemaitre glemaitre removed their assignment Nov 24, 2022
Labels: module:neighbors, module:utils, Move to scikit-learn-extra (This PR should be moved to the scikit-learn-extras repository), Needs Decision - Close (Requires decision for closing), New Feature
Projects: Sprint Paris 2019 (In progress)