
[MRG] Export notebook to gallery #180

Conversation

wdevazelhes
Member

@wdevazelhes wdevazelhes commented Mar 8, 2019

Fixes #141 #153

Hi, I've just converted @bhargavvader's notebook from #27 into a sphinx-gallery file (with this snippet: https://gist.github.com/chsasank/7218ca16f8d022e02a9c0deb94a310fe). This way, it will appear nicely in the documentation, and it will also let us check that every algorithm works fine.
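(For reference, a sphinx-gallery example is just an ordinary Python script: the module docstring becomes the rendered page header, and comment blocks introduced by a line of `#` characters become text cells between code cells. A minimal hypothetical sketch, not the actual converted notebook, would look like this:)

```python
"""
Metric learning walkthrough (sketch)
====================================

The module docstring is rendered as the header of the gallery page.
"""
# hypothetical file name: examples/plot_metric_learning_sketch.py

from sklearn.datasets import load_iris
from metric_learn import NCA

X, y = load_iris(return_X_y=True)

###############################################################################
# Comment blocks like this one are rendered as text cells; the code below is
# executed and rendered as a code cell, with its output captured.

nca = NCA()
X_embedded = nca.fit_transform(X, y)
print(X_embedded.shape)
```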
There are a few things to change to make the PR mergeable (to compile the doc, you need sphinx-gallery):

  • As discussed with @bellet, the iris dataset may not be the most expressive dataset for metric learning; we might want to find a dataset where the classes are more mixed and where metric learning gives a clearly advantageous separation
  • Some parts seem to be broken (see the "broken" logo); I need to see why
  • On my computer, the outline of the notebook appears in the left sidebar; we might not want that (we might want to see only two tabs on the left sidebar, one per example in metric-learn/examples, rather than tens of tabs)
  • Some examples don't seem to work very well in terms of separation; I need to see why

@bellet
Member

bellet commented Mar 13, 2019

I assume that in this PR you will also be fixing #141

@bhargavvader
Contributor

Glad to see this going into documentation soon 💃

@bellet
Member

bellet commented Mar 14, 2019

And this fixes #153

@wdevazelhes
Member Author

I assume that in this PR you will also be fixing #141

And this fixes #153

Yes indeed, thanks, I forgot to add them to the PR description.
Done now.

@wdevazelhes
Member Author

Regarding the dataset, should I go for the faces dataset, for instance? (Probably the supervised version, since we use the _Supervised algorithms here.)

@wdevazelhes
Member Author

wdevazelhes commented Mar 14, 2019

I just ran another make html (after merging with master and cleaning my folder), and now there's no "broken" message anymore. However, no image is printed, only the sphinx-gallery default thumbnail. I don't know what we would want to appear here anyway: a concatenation of all the printed images? I think only the first image should appear (see: https://sphinx-gallery.readthedocs.io/en/latest/tutorials/index.html#notebook-styled-examples and https://sphinx-gallery.readthedocs.io/en/latest/tutorials/plot_notebook.html#sphx-glr-tutorials-plot-notebook-py)
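(If we later want a real figure as the thumbnail, sphinx-gallery also lets an example choose which of its generated figures to use, via a configuration comment; as far as I remember the option is spelled like this:)

```python
# Placed anywhere in the example script, this sphinx-gallery configuration
# comment selects the second generated figure as the gallery thumbnail:
# sphinx_gallery_thumbnail_number = 2
```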

@bellet
Member

bellet commented Mar 14, 2019

Regarding the dataset, should I go for the faces dataset for instance ? (Probably the supervised version since we use _Supervised algorithms here)

Yes. Digits could also be an option. Note that you will have to use dimensionality reduction (say t-SNE) to visualize things in 2D.

Otherwise we could work with a 2D or 3D dataset, but there are not so many nice things to show in that case.
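Concretely, that workflow might look roughly like the following sketch (assuming the digits dataset and metric-learn's LMNN transformer; the estimator and parameters are illustrative, not a prescription):

```python
from metric_learn import LMNN
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

X, y = load_digits(return_X_y=True)

# Learn a Mahalanobis metric from the class labels, then embed the data.
lmnn = LMNN()
X_lmnn = lmnn.fit_transform(X, y)

# t-SNE to 2D, on the raw data and on the metric-learned embedding.
for title, data in [("raw", X), ("after LMNN", X_lmnn)]:
    emb = TSNE(n_components=2).fit_transform(data)
    plt.figure()
    plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=10)
    plt.title("t-SNE (%s)" % title)
plt.show()
```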

@wdevazelhes
Member Author

Yes. Digits could also be an option. Note that you will have to use dimensionality reduction (say t-SNE) to visualize things in 2D.

Otherwise we could work with a 2D or 3D dataset, but there are not so many nice things to show in that case.

Alright, I tried to run the example with digits, but there is a conditioning problem with SDML, so I think we'll need to wait for #162 to be merged before finishing this one.

@wdevazelhes
Member Author

Here is a first result of plotting with the faces dataset (with the digits dataset, the points are already well separated by t-SNE on the raw data):

t-SNE on the raw dataset: [image]
t-SNE after LMNN: [image]
t-SNE after SDML: [image]
t-SNE after LSML: [image]
t-SNE after NCA: [image]
t-SNE after LFDA: [image]

RCA failed (I still need to understand why)

The improvement is not very clear, except maybe for LMNN; for LFDA, for instance, it's actually worse... though it could work with appropriate tuning.
I took faces because it's one dataset where we had a significant improvement in the NCA PR (see this figure: https://user-images.githubusercontent.com/31916524/41354636-c759ddce-6f1f-11e8-8ac9-9a9fd36b8af2.png), but I'll also try with balance: it was also good, and it's now easy to fetch with scikit-learn's openml fetcher.
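(Fetching it through the OpenML fetcher would be something like the sketch below; the exact dataset name on OpenML is an assumption to double-check:)

```python
from sklearn.datasets import fetch_openml

# "balance-scale" is the assumed OpenML name of the balance dataset.
balance = fetch_openml("balance-scale", as_frame=False)
X, y = balance.data, balance.target
print(X.shape, y.shape)
```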

@wdevazelhes
Member Author

I tried with the balance dataset, and reducing the dimension gives somewhat weird results (balance essentially defines a grid of regularly spaced points, which makes it feel like an artificial dataset); but then, running the default version of the algorithms didn't work very well either.
I also tried with the isolet dataset and it didn't work much better.
So in the end I think the best option for now is the faces dataset.
Maybe I can re-run the faces dataset, but this time grid-searching for the best parameters.
I'll also try to find another dataset, hoping it works better.

@bellet
Member

bellet commented Apr 4, 2019

An option would be to add some noise dimensions (this is a bit artificial, but will definitely work).

Otherwise you have to look for harder datasets: if the t-SNE visualization is already very good on the original representation, it is hard to obtain clear visual improvements.

@wdevazelhes
Member Author

wdevazelhes commented Apr 5, 2019

That's right, it works really well with noise indeed.

Here are the images with 5 columns of Gaussian noise N(0, 10) added to a make_blobs dataset (the legend with sepal etc. is wrong, but the image is right: these are the results after applying t-SNE):
t-SNE on the original space: [image]
t-SNE after LMNN: [image]
t-SNE after ITML: [image]
t-SNE after SDML: [image]
t-SNE after LSML: [image]
t-SNE after NCA: [image]
t-SNE after LFDA: [image]
t-SNE after RCA: [image]

So maybe we can use this, with a quick intro paragraph saying something like "let's say we have a noisy dataset: let's take iris and add some columns of noise, etc."?

Otherwise, do you know of any dataset that already contains a lot of noise?

Also, LSML and SDML don't work so well, so as soon as we settle on the dataset I'll try to fine-tune them, and if they still don't work maybe we can remove them?
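The noise augmentation described above is roughly the following (a sketch, not the exact code from the example; I'm assuming N(0, 10) denotes a standard deviation of 10):

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=100, centers=3, random_state=42)

# Append 5 columns of Gaussian noise so that Euclidean distances computed
# on the full input space are dominated by the uninformative features.
rng = np.random.RandomState(42)
noise = rng.normal(loc=0.0, scale=10.0, size=(X.shape[0], 5))
X_noisy = np.hstack([X, noise])
print(X_noisy.shape)  # (100, 7): 2 informative columns + 5 noise columns
```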

@wdevazelhes
Member Author

In the end I used iris with 5 columns of noise added. The last commit contains a version of the example that I think is mergeable; let me know what you think.
Here is the html folder that was generated:

html.zip

@wdevazelhes changed the title from [WIP] Export notebook to gallery to [MRG] Export notebook to gallery on Apr 10, 2019
@wdevazelhes
Member Author

Note that LSML still does not work really well, but it's still better than the initial example, so I think it's OK?

@perimosocordiae
Contributor

I haven't read all the text yet, but I agree this is a good approach and a reasonable example to use.

Member

@bellet bellet left a comment

I think it looks pretty good. As discussed, it would be great to use make_classification from sklearn. This natively provides noisy dimensions, but also several clusters per class (which could be nice to illustrate the difference between methods based on local constraints, like LMNN, and those based on global ones, like ITML), and many other options (class separation, label noise, etc.).
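A sketch of such a make_classification setup (the parameter values are illustrative, not the ones used in the final example):

```python
from sklearn.datasets import make_classification

# Noisy dimensions, two clusters per class, label noise and a controllable
# class separation are all available directly as arguments.
X, y = make_classification(
    n_samples=100,
    n_classes=3,
    n_clusters_per_class=2,
    n_informative=3,
    n_features=8,        # the 5 extra features carry no class information
    n_redundant=0,
    class_sep=1.0,
    flip_y=0.01,         # a small amount of label noise
    random_state=42,
)
print(X.shape, y.shape)
```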

@perimosocordiae perimosocordiae added this to the v0.5.0 milestone May 10, 2019
@wdevazelhes
Member Author

I just pushed an example that uses make_classification, which I think looks pretty good now.
I just needed to scale the noise components of the dataset by hand (to mix the initial points more while still keeping them easy to fit for metric learning), because I don't think that's an argument of make_classification. I also tried adding many more noise components instead, but it does not seem as effective: I need to add several hundred of them to see the effect, which doesn't seem like a good idea given that we have 100 samples (I could add more samples, but then the example takes longer to run).

@bellet I also wrote something about the fact that some algorithms don't try to cluster all points from the same class into one single cluster, while others implicitly do, which can be seen quite clearly from these examples (since every class has a distribution with 2 clusters: n_clusters_per_class=2), as we discussed. I only wrote a paragraph because I think the effect is quite visible in the examples, so maybe there's no need to show the two figures comparing the algorithms on a dataset with 1 cluster per class and a dataset with several clusters per class, as mentioned?
Or maybe there is a need, because it's good to have multi-cluster cases where algorithms that enforce mono-clustering fail clearly? But then I don't know exactly how to tweak the dataset to do that (since currently these algorithms mostly manage to group the data into single clusters); maybe with fewer dimensions, to give the algorithms fewer degrees of freedom?

If I understood correctly, the algorithms that don't cluster similar points into a unique cluster are LMNN, NCA, and LFDA, is that right?

@wdevazelhes
Member Author

If these changes are fine I think we are good to merge

@bellet
Member

bellet commented May 17, 2019

I just had a quick look and it looks good! I have a few nitpicky suggestions on the wording and presentation; I will write a review for that asap.

Member

@bellet bellet left a comment

Some improvements.

Maybe we should add the missing algorithms? (e.g. MMC, MLKR)

visualisation which can help understand which algorithm might be best
suited for you.

Of course, depending on the data set and the constraints your results
Member

I would remove this paragraph

Member Author

Agreed, done

~~~~~~~~~~~~~~~~~~~~~~

This is a small walkthrough which illustrates all the Metric Learning
algorithms implemented in metric-learn, and also does a quick
Member

maybe rather "with some visualizations to provide intuitions into what they are designed to achieve".

Member Author

Agreed, done

~~~~~~~~~~~~~~~~~~~~~~

This is a small walkthrough which illustrates all the Metric Learning
algorithms implemented in metric-learn, and also does a quick
Member

also I would mention that this is done on synthetic data

Member Author

Agreed, done

"""

# License: BSD 3 clause
# Authors: Bhargav Srinivasa Desikan <bhargavvader@gmail.com>
Member

you should add yourself as an author

Member Author

Agreed, done

# Loading our data-set and setting up plotting
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# We will be using the IRIS data-set to illustrate the plotting. You can
Member

this is not iris anymore. update and link to make_classification documentation

Member Author

Thanks, I forgot to update this indeed

Member Author

done

Algorithms walkthrough
~~~~~~~~~~~~~~~~~~~~~~

This is a small walkthrough which illustrates all the Metric Learning
Member

not all actually

Member Author

Agreed, I will add MMC, but I'm not sure about MLKR; I think I'll just mention it when describing NCA, since the cost function is very similar, except that MLKR uses soft nearest neighbors for regression, as I understand it.

Member

True, MLKR is for regression; it is a good idea to mention it there as proposed.
Maybe change "all the metric learning algorithms" to "most metric learning algorithms".

Member Author

In the end, I went for "most metric learning algorithms", because we don't talk about Covariance (maybe we could?). And I added MLKR with a make_regression task, because I found the results were pretty cool too! Even if it breaks the outline a bit, since it sits between the supervised algorithms and the "constraints" section... Tell me what you think?
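A rough sketch of what that MLKR addition could look like (assuming metric-learn's MLKR transformer and an illustrative make_regression setup, not the exact code from the example):

```python
from metric_learn import MLKR
from sklearn.datasets import make_regression
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# A regression task with a few informative features plus noise dimensions.
X, y = make_regression(n_samples=100, n_features=7, n_informative=3,
                       noise=5.0, random_state=42)

# MLKR learns a metric under which soft nearest-neighbor regression is accurate.
X_mlkr = MLKR().fit_transform(X, y)

emb = TSNE(n_components=2, random_state=42).fit_transform(X_mlkr)
plt.scatter(emb[:, 0], emb[:, 1], c=y)  # color encodes the regression target
plt.title("t-SNE after MLKR")
plt.show()
```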

#
# Implements an efficient sparse metric learning algorithm in high
# dimensional space via an :math:`l_1`-penalised log-determinant
# regularization. Compare to the most existing distance metric learning
Member

compared

Member Author

Indeed, done

# Neighborhood Components Analysis
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# NCA is an extrememly popular metric-learning algorithm, and one of the
Member

extremely
maybe remove last part of the sentence

Member Author

Thanks, done for "extremely".
For the last part, is it because it could make people who don't know the algorithm think that, being one of the first ones, it might not be cutting edge or might be outdated?
Done

Member

Just that it is maybe not such a relevant thing to mention here; you could also argue this for LMNN, and others like MMC.

Member Author

That's right, they are also among the first ones indeed.



######################################################################
# Manual Constraints
Member

We do not explicitly say what is meant by "constraint" before, so I think this part is quite confusing.

Also, I think we want to insist on the fact that many metric learning algorithms only need the weak supervision given by constraints, not labels; in many applications, this is easier to obtain.
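To make "constraints" concrete, here is a sketch of the kind of weak supervision meant here, assuming the pairs-based API used in this library (pairs of points with a +1/-1 similarity label; the toy data and estimator choice are illustrative):

```python
import numpy as np
from metric_learn import ITML

# Pairs of points, shape (n_constraints, 2, n_features), with one label per
# pair: +1 if the two points should end up close, -1 if they should be far.
pairs = np.array([
    [[1.2, 0.5], [1.0, 0.6]],   # similar pair
    [[1.2, 0.5], [5.0, 4.0]],   # dissimilar pair
    [[4.8, 4.1], [5.0, 4.0]],   # similar pair
    [[0.9, 0.4], [4.9, 3.8]],   # dissimilar pair
])
y_pairs = np.array([1, -1, 1, -1])

itml = ITML()
itml.fit(pairs, y_pairs)  # only pair constraints are needed, no class labels

# The learned metric can then embed new points like any transformer.
X_new = np.array([[1.1, 0.5], [5.1, 4.2]])
print(itml.transform(X_new))
```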

Member Author

I agree, I'll reformulate this paragraph, let me know what you think

Member Author

I think we can also delete the num_constraints=200 in all *_Supervised algorithms, what do you think?

Member

I guess so, unless it increases the computation time too much.

Member Author

That's right, I tried and it took almost the same time, so I removed them.

# it's worth one's while to poke around in the constraints.py file to see
# how exactly this is going on.
#
# This brings us to the end of this tutorial! Have fun Metric Learning :)
Member

Maybe also add that metric-learn is compatible with sklearn, so we can easily do model selection, cross-validation, scoring, etc., and refer to the doc for more details.
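For instance, since the supervised metric learners act as scikit-learn transformers, a cross-validated pipeline along these lines should work (a sketch with illustrative parameters, not the snippet added to the example):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from metric_learn import NCA

X, y = make_classification(n_samples=100, n_features=8, n_informative=3,
                           n_redundant=0, random_state=42)

# Learn a metric, classify with k-NN in the learned space, and evaluate the
# whole thing with ordinary scikit-learn cross-validation.
clf = Pipeline([("nca", NCA()), ("knn", KNeighborsClassifier(n_neighbors=3))])
print(cross_val_score(clf, X, y, cv=3))
```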

Member Author

I agree, done

@wdevazelhes
Member Author

@bellet Thanks for your review, I addressed your comments

Member

@bellet bellet left a comment

Some additional small comments for improvement.

Very nice example!!! Looking forward to merging it

# distance in the input space, in which the contribution of the noisy
# features is high. So even if points from the same class are close to
# each other in some subspace of the input space, this is not the case in the
# total input space.
Member

maybe "this is not the case when considering all dimensions of the input space"

Member Author

Agreed, done

# good literature review of Metric Learning.
#
# We will briefly explain the metric-learning algorithms implemented by
# metric-learn, before providing some examples for it's usage, and also
Member

its

Member Author

Thanks, done

#
# Basically, we learn this distance:
# :math:`D(x,y)=\sqrt{(x-y)\,M^{-1}(x-y)}`. And we learn this distance by
# learning a Matrix :math:`M`, based on certain constraints.
Member

  • there is repetition here "we learn this distance"
  • add very quick explanation of what is meant by "constraint" (can be inspired from What is metric learning? page). For instance something like, "we learn the parameters :math:M of this distance to satisfy certain constraints on the distance between points, for example requiring that points of the same class are close together and points of different class are far away."
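Spelled out, one common formulation (consistent with the :math: snippet quoted above, up to whether one parametrizes by M or its inverse; the pairwise bounds are the ITML-style example of a constraint) is:

```latex
\[
  d_M(x, x') = \sqrt{(x - x')^\top M \, (x - x')}, \qquad M \succeq 0,
\]
\[
  d_M(x_i, x_j) \le u \ \text{for similar pairs}, \qquad
  d_M(x_i, x_j) \ge \ell \ \text{for dissimilar pairs}.
\]
```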

Member Author

Agreed, done

#
# We will briefly explain the metric-learning algorithms implemented by
# metric-learn, before providing some examples for it's usage, and also
# discuss how to go about doing manual constraints.
Member

Remove "manual constraints" and replace with something like "discuss how to perform metric learning with weaker supervision than class labels"

Member Author

Agreed, done

plot_tsne(X_rca, Y)

######################################################################
# Metric Learning for Kernel Regression
Member

Regression example: Metric Learning for Kernel Regression

Member Author

done

# going to go ahead and assume that two points labelled the same will be
# closer than two points in different labels.
#
# Do keep in mind that we are doing this method because we know the labels
Member

move this above, right after the sentence saying that we are going to create constraints from the labels

Member Author

done



######################################################################
# Using our constraints, let's now train ITML again. We should keep in
Member

not sure the last sentence is needed (already said that before)

######################################################################
# And that's the result of ITML after being trained on our manual
# constraints! A bit different from our old result but not too different.
# We can also notice that it might be better to rely on the randomised
Member

this last sentence is not very clear. i would remove it

Member Author

Agreed, done

# also compatible with scikit-learn, since their input dataset format described
# above allows to be sliced along the first dimension when doing
# cross-validations (see also this :ref:`section <sklearn_compat_ws>`). See
# also some :ref:`use cases <use_cases>` where you could use scikit-learn
Member

where you could combine metric learning with scikit-learn estimators

# pipeline or cross-validation procedure. And weakly-supervised estimators are
# also compatible with scikit-learn, since their input dataset format described
# above allows to be sliced along the first dimension when doing
# cross-validations (see also this :ref:`section <sklearn_compat_ws>`). See
Member

avoid repetition of "see"

@wdevazelhes
Member Author

Thanks for the review @bellet, I addressed all your comments

@bellet
Member

bellet commented Jun 5, 2019

Thanks! I have pushed a few small updates myself to avoid another reviewing cycle. I think we can merge once CI passes

@bellet bellet merged commit fbd92ff into scikit-learn-contrib:master Jun 5, 2019
Development

Successfully merging this pull request may close these issues.

Update metric_plotting.ipynb with the new API