
[MRG] Export notebook to gallery #180

Conversation

wdevazelhes
Member

@wdevazelhes wdevazelhes commented Mar 8, 2019

Fixes #141 #153

Hi, I've just converted @bhargavvader's notebook from #27 into a sphinx-gallery file (with this snippet: https://gist.github.com/chsasank/7218ca16f8d022e02a9c0deb94a310fe). This way, it will appear nicely in the documentation, and it will also let us check that every algorithm works fine.
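(For reference, a sphinx-gallery example is just an ordinary Python script: the module docstring becomes the rendered page header, and comment blocks introduced by a line of `#` characters become text cells between code cells. A minimal hypothetical sketch, not the actual converted notebook, would look like this:)

```python
"""
Metric learning walkthrough (sketch)
====================================

The module docstring is rendered as the header of the gallery page.
"""
# hypothetical file name: examples/plot_metric_learning_sketch.py

from sklearn.datasets import load_iris
from metric_learn import NCA

X, y = load_iris(return_X_y=True)

###############################################################################
# Comment blocks like this one are rendered as text cells; the code below is
# executed and rendered as a code cell, with its output captured.

nca = NCA()
X_embedded = nca.fit_transform(X, y)
print(X_embedded.shape)
```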
There are a few things to change to make the PR mergeable (to compile the doc, you need sphinx-gallery):

  • As discussed with @bellet, the iris dataset may not be the most expressive dataset for metric learning; we might want to find a dataset where the classes are more mixed and where metric learning gives a clearly advantageous separation
  • Some parts seem to be broken (see the "broken" logo); I need to see why
  • On my computer, the outline of the notebook appears in the left sidebar; we might not want that (we might want to see only two tabs on the left sidebar, one per example in metric-learn/examples, rather than tens of tabs)
  • Some examples don't seem to work very well in terms of separation; I need to see why

@bellet
Member

bellet commented Mar 13, 2019

I assume that in this PR you will also be fixing #141

@bhargavvader
Contributor

Glad to see this going into documentation soon 💃

@bellet
Member

bellet commented Mar 14, 2019

And this fixes #153

@wdevazelhes
Member Author

I assume that in this PR you will also be fixing #141

And this fixes #153

Yes indeed, thanks, I forgot to add them to the PR description.
Done now.

@wdevazelhes
Member Author

Regarding the dataset, should I go for the faces dataset, for instance? (Probably the supervised version, since we use the _Supervised algorithms here.)

@wdevazelhes
Member Author

wdevazelhes commented Mar 14, 2019

I just ran another make html (after merging with master and cleaning my folder), and now there's no "broken" message anymore. However, no image is printed, only the sphinx-gallery default thumbnail. I don't know what we would want to appear here anyway: a concatenation of all the printed images? I think only the first image should appear (see: https://sphinx-gallery.readthedocs.io/en/latest/tutorials/index.html#notebook-styled-examples and https://sphinx-gallery.readthedocs.io/en/latest/tutorials/plot_notebook.html#sphx-glr-tutorials-plot-notebook-py)
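(If we later want a real figure as the thumbnail, sphinx-gallery also lets an example choose which of its generated figures to use, via a configuration comment; as far as I remember the option is spelled like this:)

```python
# Placed anywhere in the example script, this sphinx-gallery configuration
# comment selects the second generated figure as the gallery thumbnail:
# sphinx_gallery_thumbnail_number = 2
```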

@bellet
Member

bellet commented Mar 14, 2019

Regarding the dataset, should I go for the faces dataset for instance ? (Probably the supervised version since we use _Supervised algorithms here)

Yes. Digits could also be an option. Note that you will have to use dimensionality reduction (say t-SNE) to visualize things in 2D.

Otherwise we could work with a 2D or 3D dataset, but there are not so many nice things to show in that case.
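Concretely, that workflow might look roughly like the following sketch (assuming the digits dataset and metric-learn's LMNN transformer; the estimator and parameters are illustrative, not a prescription):

```python
from metric_learn import LMNN
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

X, y = load_digits(return_X_y=True)

# Learn a Mahalanobis metric from the class labels, then embed the data.
lmnn = LMNN()
X_lmnn = lmnn.fit_transform(X, y)

# t-SNE to 2D, on the raw data and on the metric-learned embedding.
for title, data in [("raw", X), ("after LMNN", X_lmnn)]:
    emb = TSNE(n_components=2).fit_transform(data)
    plt.figure()
    plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=10)
    plt.title("t-SNE (%s)" % title)
plt.show()
```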

@wdevazelhes
Member Author

Yes. Digits could also be an option. Note that you will have to use dimensionality reduction (say t-SNE) to visualize things in 2D.

Otherwise we could work with a 2D or 3D dataset, but there are not so many nice things to show in that case.

Alright, I tried to run the example with digits, but there is a conditioning problem with SDML, so I think we'll need to wait for #162 to be merged before finishing this one.

@wdevazelhes
Member Author

Here is a first result of plotting with the faces dataset (with the digits dataset, the points are already well separated by t-SNE on the raw data):

t-SNE on the raw dataset: [image]
t-SNE after LMNN: [image]
t-SNE after SDML: [image]
t-SNE after LSML: [image]
t-SNE after NCA: [image]
t-SNE after LFDA: [image]

RCA failed (I still need to understand why)

The improvement is not very clear, except maybe for LMNN; for LFDA, for instance, it's actually worse... though it could work with appropriate tuning.
I took faces because it's one dataset where we had a significant improvement in the NCA PR (see this figure: https://user-images.githubusercontent.com/31916524/41354636-c759ddce-6f1f-11e8-8ac9-9a9fd36b8af2.png), but I'll also try with balance: it was also good, and it's now easy to fetch with scikit-learn's openml fetcher.
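(Fetching it through the OpenML fetcher would be something like the sketch below; the exact dataset name on OpenML is an assumption to double-check:)

```python
from sklearn.datasets import fetch_openml

# "balance-scale" is the assumed OpenML name of the balance dataset.
balance = fetch_openml("balance-scale", as_frame=False)
X, y = balance.data, balance.target
print(X.shape, y.shape)
```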

@wdevazelhes
Member Author

I tried with the balance dataset, and reducing the dimension gives somewhat weird results (balance essentially defines a grid of regularly spaced points, which makes it feel like an artificial dataset); but then, running the default version of the algorithms didn't work very well either.
I also tried with the isolet dataset and it didn't work much better.
So in the end I think the best option for now is the faces dataset.
Maybe I can re-run the faces dataset, but this time grid-searching for the best parameters.
I'll also try to find another dataset, hoping it works better.

@bellet
Member

bellet commented Apr 4, 2019

An option would be to add some noise dimensions (this is a bit artificial, but will definitely work).

Otherwise you have to look for harder datasets: if the t-SNE visualization is already very good on the original representation, it is hard to obtain clear visual improvements.

@wdevazelhes
Member Author

wdevazelhes commented Apr 5, 2019

That's right, it works really well with noise indeed.

Here are the images with 5 columns of Gaussian noise N(0, 10) added to a make_blobs dataset (the legend with sepal etc. is wrong, but the image is right: these are the results after applying t-SNE):
t-SNE on the original space: [image]
t-SNE after LMNN: [image]
t-SNE after ITML: [image]
t-SNE after SDML: [image]
t-SNE after LSML: [image]
t-SNE after NCA: [image]
t-SNE after LFDA: [image]
t-SNE after RCA: [image]

So maybe we can use this, with a quick intro paragraph saying something like "let's say we have a noisy dataset: let's take iris and add some columns of noise, etc."?

Otherwise, do you know of any dataset that already contains a lot of noise?

Also, LSML and SDML don't work so well, so as soon as we settle on the dataset I'll try to fine-tune them, and if they still don't work maybe we can remove them?
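The noise augmentation described above is roughly the following (a sketch, not the exact code from the example; I'm assuming N(0, 10) denotes a standard deviation of 10):

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=100, centers=3, random_state=42)

# Append 5 columns of Gaussian noise so that Euclidean distances computed
# on the full input space are dominated by the uninformative features.
rng = np.random.RandomState(42)
noise = rng.normal(loc=0.0, scale=10.0, size=(X.shape[0], 5))
X_noisy = np.hstack([X, noise])
print(X_noisy.shape)  # (100, 7): 2 informative columns + 5 noise columns
```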

@wdevazelhes
Member Author

In the end I used iris with 5 columns of noise added. The last commit contains a version of the example that I think is mergeable; let me know what you think.
Here is the html folder that was generated:

html.zip

@wdevazelhes changed the title from [WIP] Export notebook to gallery to [MRG] Export notebook to gallery on Apr 10, 2019
@wdevazelhes
Member Author

Note that LSML still does not work really well, but it's still better than the initial example, so I think it's OK?

@perimosocordiae
Contributor

I haven't read all the text yet, but I agree this is a good approach and a reasonable example to use.

Member

@bellet bellet left a comment

I think it looks pretty good. As discussed, it would be great to use make_classification from sklearn. This natively provides noisy dimensions, but also several clusters per class (which could be nice to illustrate the difference between methods based on local constraints, like LMNN, and those based on global ones, like ITML), and many other options (class separation, label noise, etc.).
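A sketch of such a make_classification setup (the parameter values are illustrative, not the ones used in the final example):

```python
from sklearn.datasets import make_classification

# Noisy dimensions, two clusters per class, label noise and a controllable
# class separation are all available directly as arguments.
X, y = make_classification(
    n_samples=100,
    n_classes=3,
    n_clusters_per_class=2,
    n_informative=3,
    n_features=8,        # the 5 extra features carry no class information
    n_redundant=0,
    class_sep=1.0,
    flip_y=0.01,         # a small amount of label noise
    random_state=42,
)
print(X.shape, y.shape)
```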

@perimosocordiae perimosocordiae added this to the v0.5.0 milestone May 10, 2019
@wdevazelhes
Member Author

I just pushed an example that uses make_classification, which I think looks pretty good now.
I just needed to scale the noise components of the dataset by hand (to mix the initial points more while still keeping them easy to fit for metric learning), because I don't think that's an argument of make_classification. I also tried adding many more noise components instead, but it does not seem as effective: I need to add several hundred of them to see the effect, which doesn't seem like a good idea given that we have 100 samples (I could add more samples, but then the example takes longer to run).

@bellet I also wrote something about the fact that some algorithms don't try to cluster all points from the same class into one single cluster, while others implicitly do, which can be seen quite clearly from these examples (since every class has a distribution with 2 clusters: n_clusters_per_class=2), as we discussed. I only wrote a paragraph because I think the effect is quite visible in the examples, so maybe there's no need to show the two figures comparing the algorithms on a dataset with 1 cluster per class and a dataset with several clusters per class, as mentioned?
Or maybe there is a need, because it's good to have multi-cluster cases where algorithms that enforce mono-clustering fail clearly? But then I don't know exactly how to tweak the dataset to do that (since currently these algorithms mostly manage to group the data into single clusters); maybe with fewer dimensions, to give the algorithms fewer degrees of freedom?

If I understood correctly, the algorithms that don't cluster similar points into a unique cluster are LMNN, NCA, and LFDA, is that right?

@wdevazelhes
Member Author

If these changes are fine I think we are good to merge

@bellet
Member

bellet commented May 17, 2019

I just had a quick look and it looks good! I have a few nitpicky suggestions on the wording and presentation; I will write a review for that asap.

Member

@bellet bellet left a comment

Some improvements.

Maybe we should add the missing algorithms? (e.g. MMC, MLKR)

visualisation which can help understand which algorithm might be best
suited for you.

Of course, depending on the data set and the constraints your results
Member

I would remove this paragraph

Member Author

Agreed, done

~~~~~~~~~~~~~~~~~~~~~~

This is a small walkthrough which illustrates all the Metric Learning
algorithms implemented in metric-learn, and also does a quick
Member

maybe rather "with some visualizations to provide intuitions into what they are designed to achieve".

Member Author

Agreed, done

~~~~~~~~~~~~~~~~~~~~~~

This is a small walkthrough which illustrates all the Metric Learning
algorithms implemented in metric-learn, and also does a quick
Member

also I would mention that this is done on synthetic data

Member Author

Agreed, done

"""

# License: BSD 3 clause
# Authors: Bhargav Srinivasa Desikan <bhargavvader@gmail.com>
Member

you should add yourself as an author

Member Author

Agreed, done

# Loading our data-set and setting up plotting
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# We will be using the IRIS data-set to illustrate the plotting. You can
Member

this is not iris anymore. update and link to make_classification documentation

Member Author

Thanks, I forgot to update this indeed

Member Author

done

Algorithms walkthrough
~~~~~~~~~~~~~~~~~~~~~~

This is a small walkthrough which illustrates all the Metric Learning
Member

not all actually

Member Author

Agreed, I will add MMC, but I'm not sure about MLKR; I think I'll just mention it when describing NCA, since the cost function is very similar, except that MLKR uses soft nearest neighbors for regression, as I understand it.

Member

True, MLKR is for regression; it is a good idea to mention it there as proposed.
Maybe change "all the metric learning algorithms" to "most metric learning algorithms".

Member Author

In the end, I went for "most metric learning algorithms", because we don't talk about Covariance (maybe we could?). And I added MLKR with a make_regression task, because I found the results were pretty cool too! Even if it breaks the outline a bit, since it sits between the supervised algorithms and the "constraints" section... Tell me what you think?
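A rough sketch of what that MLKR addition could look like (assuming metric-learn's MLKR transformer and an illustrative make_regression setup, not the exact code from the example):

```python
from metric_learn import MLKR
from sklearn.datasets import make_regression
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# A regression task with a few informative features plus noise dimensions.
X, y = make_regression(n_samples=100, n_features=7, n_informative=3,
                       noise=5.0, random_state=42)

# MLKR learns a metric under which soft nearest-neighbor regression is accurate.
X_mlkr = MLKR().fit_transform(X, y)

emb = TSNE(n_components=2, random_state=42).fit_transform(X_mlkr)
plt.scatter(emb[:, 0], emb[:, 1], c=y)  # color encodes the regression target
plt.title("t-SNE after MLKR")
plt.show()
```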

#
# Implements an efficient sparse metric learning algorithm in high
# dimensional space via an :math:`l_1`-penalised log-determinant
# regularization. Compare to the most existing distance metric learning
Member

compared

Member Author

Indeed, done

# Neighborhood Components Analysis
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# NCA is an extrememly popular metric-learning algorithm, and one of the
Member

extremely
maybe remove last part of the sentence

Member Author

Thanks, done for "extremely".
For the last part, is it because it could make people who don't know the algorithm think that, being one of the first ones, it might not be cutting edge or might be outdated?
Done

Member

Just that it is maybe not such a relevant thing to mention here; you could also argue this for LMNN, and others like MMC.

Member Author

That's right, they are also among the first ones indeed.



######################################################################
# Manual Constraints
Member

We do not explicitly say what is meant by "constraint" before, so I think this part is quite confusing.

Also, I think we want to insist on the fact that many metric learning algorithms only need the weak supervision given by constraints, not labels; in many applications, this is easier to obtain.
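To make "constraints" concrete, here is a sketch of the kind of weak supervision meant here, assuming the pairs-based API used in this library (pairs of points with a +1/-1 similarity label; the toy data and estimator choice are illustrative):

```python
import numpy as np
from metric_learn import ITML

# Pairs of points, shape (n_constraints, 2, n_features), with one label per
# pair: +1 if the two points should end up close, -1 if they should be far.
pairs = np.array([
    [[1.2, 0.5], [1.0, 0.6]],   # similar pair
    [[1.2, 0.5], [5.0, 4.0]],   # dissimilar pair
    [[4.8, 4.1], [5.0, 4.0]],   # similar pair
    [[0.9, 0.4], [4.9, 3.8]],   # dissimilar pair
])
y_pairs = np.array([1, -1, 1, -1])

itml = ITML()
itml.fit(pairs, y_pairs)  # only pair constraints are needed, no class labels

# The learned metric can then embed new points like any transformer.
X_new = np.array([[1.1, 0.5], [5.1, 4.2]])
print(itml.transform(X_new))
```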

Member Author

I agree, I'll reformulate this paragraph, let me know what you think

Member Author

I think we can also delete the num_constraints=200 in all *_Supervised algorithms, what do you think?

Member

I guess so, unless it increases the computation time too much.

Member Author

That's right, I tried and it took almost the same time, so I removed them.

# it's worth one's while to poke around in the constraints.py file to see
# how exactly this is going on.
#
# This brings us to the end of this tutorial! Have fun Metric Learning :)
Member

Maybe also add that metric-learn is compatible with sklearn, so we can easily do model selection, cross-validation, scoring, etc., and refer to the doc for more details.
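For instance, since the supervised metric learners act as scikit-learn transformers, a cross-validated pipeline along these lines should work (a sketch with illustrative parameters, not the snippet added to the example):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from metric_learn import NCA

X, y = make_classification(n_samples=100, n_features=8, n_informative=3,
                           n_redundant=0, random_state=42)

# Learn a metric, classify with k-NN in the learned space, and evaluate the
# whole thing with ordinary scikit-learn cross-validation.
clf = Pipeline([("nca", NCA()), ("knn", KNeighborsClassifier(n_neighbors=3))])
print(cross_val_score(clf, X, y, cv=3))
```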

Member Author

I agree, done

@wdevazelhes
Member Author

@bellet Thanks for your review, I addressed your comments

Member

@bellet bellet left a comment

Some additional small comments for improvement.

Very nice example!!! Looking forward to merging it

# distance in the input space, in which the contribution of the noisy
# features is high. So even if points from the same class are close to
# each other in some subspace of the input space, this is not the case in the
# total input space.
Member

maybe "this is not the case when considering all dimensions of the input space"

Member Author

Agreed, done

# good literature review of Metric Learning.
#
# We will briefly explain the metric-learning algorithms implemented by
# metric-learn, before providing some examples for it's usage, and also
Member

its

Member Author

Thanks, done

#
# Basically, we learn this distance:
# :math:`D(x,y)=\sqrt{(x-y)\,M^{-1}(x-y)}`. And we learn this distance by
# learning a Matrix :math:`M`, based on certain constraints.
Member

  • there is repetition here "we learn this distance"
  • add very quick explanation of what is meant by "constraint" (can be inspired from What is metric learning? page). For instance something like, "we learn the parameters :math:M of this distance to satisfy certain constraints on the distance between points, for example requiring that points of the same class are close together and points of different class are far away."
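Spelled out, one common formulation (consistent with the :math: snippet quoted above, up to whether one parametrizes by M or its inverse; the pairwise bounds are the ITML-style example of a constraint) is:

```latex
\[
  d_M(x, x') = \sqrt{(x - x')^\top M \, (x - x')}, \qquad M \succeq 0,
\]
\[
  d_M(x_i, x_j) \le u \ \text{for similar pairs}, \qquad
  d_M(x_i, x_j) \ge \ell \ \text{for dissimilar pairs}.
\]
```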

Member Author

Agreed, done

#
# We will briefly explain the metric-learning algorithms implemented by
# metric-learn, before providing some examples for it's usage, and also
# discuss how to go about doing manual constraints.
Member

Remove "manual constraints" and replace with something like "discuss how to perform metric learning with weaker supervision than class labels"

Member Author

Agreed, done

plot_tsne(X_rca, Y)

######################################################################
# Metric Learning for Kernel Regression
Member

Regression example: Metric Learning for Kernel Regression

Member Author

done

# going to go ahead and assume that two points labelled the same will be
# closer than two points in different labels.
#
# Do keep in mind that we are doing this method because we know the labels
Member

move this above, right after the sentence saying that we are going to create constraints from the labels

Member Author

done



######################################################################
# Using our constraints, let's now train ITML again. We should keep in
Member

not sure the last sentence is needed (already said that before)

######################################################################
# And that's the result of ITML after being trained on our manual
# constraints! A bit different from our old result but not too different.
# We can also notice that it might be better to rely on the randomised
Member

this last sentence is not very clear. i would remove it

Member Author

Agreed, done

# also compatible with scikit-learn, since their input dataset format described
# above allows to be sliced along the first dimension when doing
# cross-validations (see also this :ref:`section <sklearn_compat_ws>`). See
# also some :ref:`use cases <use_cases>` where you could use scikit-learn
Member

where you could combine metric learning with scikit-learn estimators

# pipeline or cross-validation procedure. And weakly-supervised estimators are
# also compatible with scikit-learn, since their input dataset format described
# above allows to be sliced along the first dimension when doing
# cross-validations (see also this :ref:`section <sklearn_compat_ws>`). See
Member

avoid repetition of "see"

@wdevazelhes
Member Author

Thanks for the review @bellet, I addressed all your comments

@bellet
Member

bellet commented Jun 5, 2019

Thanks! I have pushed a few small updates myself to avoid another reviewing cycle. I think we can merge once CI passes

@bellet bellet merged commit fbd92ff into scikit-learn-contrib:master Jun 5, 2019
Development

Successfully merging this pull request may close these issues.

Update metric_plotting.ipynb with the new API