[MRG] Enhance documentation #208

Merged
merged 21 commits into scikit-learn-contrib:master on Jul 3, 2019

Conversation

@wdevazelhes (Member) commented May 22, 2019

This PR enhances the documentation by fixing several reported issues about the doc and adding other improvements.

Fixes #155
Fixes #149
Fixes #150
Fixes #135

TODO:

@wdevazelhes wdevazelhes changed the title [WIP] Add link to algorithm in the title of sections [WIP] May 22, 2019
@wdevazelhes wdevazelhes changed the title [WIP] [WIP] Enhance documentation May 22, 2019
@wdevazelhes wdevazelhes added this to the v0.5.0 milestone Jun 7, 2019
@wdevazelhes (Member Author) commented Jun 11, 2019

I just made a new commit that did the following:

  • removed all the docstrings from `__init__` and put them in the class description, like in scikit-learn
  • added some templates, which makes it possible to remove all the files like metric_learn.itml.rst that share the same structure, taking inspiration from imbalanced-learn (https://imbalanced-learn.readthedocs.io/en/stable/) (see the sketch after this list)
  • added a config in the templates so that we can see, for each algorithm, in which example(s) it appears, with a thumbnail like in scikit-learn
  • added references for NCA and LSML in the docstrings (they were missing)
  • defined a link for array-like types when they are used in the documentation
  • I still have a problem with LMNN, since it is an "alias of python_LMNN" according to Sphinx. I think removing shogun LMNN (PR Error with modshogun #192) will solve this
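
To illustrate the template setup, here is a rough sketch of the relevant doc/conf.py options (illustrative values only, not the exact configuration in this PR):

# doc/conf.py (sketch): generate per-estimator pages from autosummary templates
# instead of maintaining one hand-written .rst file per algorithm.
extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'numpydoc',
    'sphinx_gallery.gen_gallery',  # provides the per-algorithm example thumbnails
]
autosummary_generate = True      # build the stub pages from the templates
templates_path = ['_templates']  # where the custom class/function templates live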

@bellet (Member) commented Jun 12, 2019

Maybe also check for broken links to fix #212?

@bellet (Member) commented Jun 12, 2019

When trying to compile the doc on my local machine it crashes (didn't use to be the case)

Unexpected failing examples:
/home/aurelien/Documents/research/github/metric-learn/examples/plot_metric_learning_examples.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/aurelien/Documents/research/github/metric-learn/examples/plot_metric_learning_examples.py", line 225, in <module>
    X_sdml = sdml.fit_transform(X, y)
  File "/home/aurelien/anaconda3/envs/env_doc/lib/python3.7/site-packages/sklearn/base.py", line 465, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "/home/aurelien/Documents/research/github/metric-learn/metric_learn/sdml.py", line 371, in fit
    return _BaseSDML._fit(self, pairs, y)
  File "/home/aurelien/Documents/research/github/metric-learn/metric_learn/sdml.py", line 125, in _fit
    raise RuntimeError(msg)
RuntimeError: There was a problem in SDML when using scikit-learn's graphical lasso solver. skggm's graphical lasso can sometimes converge on non SPD cases where scikit-learn's graphical lasso fails to converge. Try to install skggm and rerun the algorithm (see the README.md for the right version of skggm). The following error message was thrown: Non SPD result: the system is too ill-conditioned for this solver. The system is too ill-conditioned for this solver.

@wdevazelhes (Member Author):

When trying to compile the doc on my local machine it crashes (didn't use to be the case)

That's right, it worked on my machine, but only because I had skggm installed, so I didn't see it: in fact it's because I didn't update plot_metric_learning_examples to use the same priors as before. So I did it for SDML (and LSML too, since it also used 'covariance'). (I didn't update the others with a scaled identity etc. since they work without it and it would add more confusion than anything.)
It should work now.
There's just one problem: the first example (with LMNN) doesn't give results as nice as before with the 'identity' init, and I don't understand why... Using the 'random' init, it converges well though... I'll keep it with 'random', but in the meantime I'll try to run the example with the old LMNN implementation (the one from before any modification to the gradient, the loop, etc.), to see whether it worked or not.
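
For reference, a minimal sketch of the kind of change described above (the prior parameter and its 'covariance' value follow the new API discussed for this release; treat the exact call as an assumption rather than the actual diff):

from sklearn.datasets import make_classification
from metric_learn import SDML_Supervised, LSML_Supervised

X, y = make_classification(n_samples=100, n_features=5, n_informative=3,
                           n_redundant=0, random_state=42)

# Explicitly keep the 'covariance' prior that the example relied on before
sdml = SDML_Supervised(prior='covariance')
X_sdml = sdml.fit_transform(X, y)

lsml = LSML_Supervised(prior='covariance')
X_lsml = lsml.fit_transform(X, y)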

@wdevazelhes (Member Author):

I checked, and with this commit 4b889d4 for LMNN there is the same problem, so it would mean that the wrong gradient actually generated a prettier picture... Do you think that's possible? If so, I guess we can use init='random' then... (which actually looks even better than what we had with the wrong gradient, so this would mean the correct gradient is still good, and it's just the 'identity' initialization that was pretty bad)

@bellet (Member) left a comment:

Some quick comments after browsing the doc:

  • in Package Overview > Module contents, metric_learn.base_metric is missing a description
  • in the left menu Package Overview > Base classes, it says metric-learn module instead of the corresponding name (base_metric, constraints)
  • in the User Guide, shouldn't the algorithms sections for pairs and quadruplets appear as subsubsections (within pairs/quadruplets)?

@bellet (Member) commented Jun 12, 2019

Regarding LMNN, can you double check what happens just before the gradient correction? Were we using identity initialization then?
Also maybe it's good to run in verbose mode to see what happens in terms of convergence? Maybe it is a matter of learning rate?

@wdevazelhes (Member Author):

@perimosocordiae is there any reason for adding numpydoc_show_class_members = False in doc/conf.py (in d8726d3)? If we set it to True (or remove it), we get a mini-list of all the methods of each estimator, with a link and a quick description on the side, like in scikit-learn for instance. I think it's nice since it saves scrolling through all the functions to find the one we want.
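
For context, the change under discussion would look roughly like this in doc/conf.py (only the flag itself comes from the repository; the surrounding lines are illustrative):

# doc/conf.py (sketch)
extensions = [
    'sphinx.ext.autodoc',
    'numpydoc',
]
# True (or removing the line) makes numpydoc render a "Methods" table for each
# class, with links and one-line summaries; False hides it.
numpydoc_show_class_members = True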

William de Vazelhes added 2 commits June 14, 2019 14:02
@wdevazelhes (Member Author):

Some quick comments after browsing the doc:

* in Package Overview > Module contents, metric_learn.base_metric is missing a description

* in the left menu Package Overview > Base classes, it says metric-learn module instead of the corresponding name (base_metric, constraints)

* in the User Guide, shouldn't the algorithms sections for pairs and quadruplets appear as subsubsections (within pairs/quadruplets)?

Thanks for the review @bellet, I'll change that

@wdevazelhes (Member Author):

Regarding LMNN, can you double check what happens just before the gradient correction? Were we using identity initialization then?

You mean the correction in #201? Yes, it was using identity initialization (see this line: https://github.com/wdevazelhes/metric-learn/blob/69c328cf7e05d374e6d7b19c78f377765752f8c1/metric_learn/lmnn.py#L69)

Also maybe it's good to run in verbose mode to see what happens in terms of convergence? Maybe it is a matter of learning rate?

That's right, will do

@perimosocordiae (Contributor):

numpydoc_show_class_members = False

I think this was added because it made the docs look cleaner at the time. If it's nicer with them, then go ahead.

@wdevazelhes (Member Author):

I think this was added because it made the docs look cleaner at the time. If it's nicer with them, then go ahead.

I see; here is a screenshot of the list of methods it generates (the "Methods" section), I find it pretty cool:
[screenshot of the generated "Methods" section]

I'll check the whole doc at the end, to see if it has bad side effects elsewhere, but if you agree with this, let's go with this flag.

@wdevazelhes (Member Author) commented Jun 14, 2019

Here is the experiment (I noticed that LDA init seemed to work much better)

from metric_learn import LMNN
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Toy dataset: 3 well-separated classes in 5 dimensions
X, y = make_classification(n_samples=100, n_classes=3, n_clusters_per_class=2,
                           n_informative=3, class_sep=4., n_features=5,
                           n_redundant=0, shuffle=True, random_state=42)

# t-SNE embedding of the original data
tsne = TSNE(random_state=42)
X_e = tsne.fit_transform(X)
plt.figure()
plt.scatter(X_e[:, 0], X_e[:, 1], c=y)
plt.show()

# Fit LMNN, then embed the transformed data with t-SNE
lmnn = LMNN(verbose=True, random_state=42)
X_r = lmnn.fit_transform(X, y)
X_er = tsne.fit_transform(X_r)
plt.figure()
plt.scatter(X_er[:, 0], X_er[:, 1], c=y)
plt.show()
  • In this PR (with LMNN init='identity'):
    [figure generated for LMNN]

Last iterations (each printed line is: it, objective, delta_obj, total_active, learn_rate):

453 104.74567998860198 -1.3463507264878558 77 8.979681847144e-06
454 103.29731193988061 -1.4483680487213633 84 9.069478665615439e-06
455 101.94927437486115 -1.3480375650194674 78 9.160173452271594e-06
456 100.72146516279783 -1.2278092120633204 89 9.25177518679431e-06
457 99.45553175935129 -1.2659334034465388 79 9.344292938662254e-06
458 98.21052261560712 -1.24500914374417 88 9.437735868048877e-06
459 96.9294451186718 -1.2810774969353247 86 9.532113226729365e-06
460 95.78693771659128 -1.1425074020805113 90 9.62743435899666e-06
461 94.73256689403442 -1.0543708225568622 88 9.723708702586626e-06
462 93.65641352710674 -1.0761533669276844 92 9.820945789612493e-06
463 92.67390434072257 -0.9825091863841635 91 9.919155247508618e-06
464 91.7679202013785 -0.9059841393440706 96 1.0018346799983704e-05
465 90.90644559581435 -0.8614746055641547 95 1.011853026798354e-05
466 90.28811422787163 -0.6183313679427158 101 1.0219715570663375e-05
467 89.57167977951548 -0.7164344483561536 96 1.0321912726370009e-05
468 88.70463506102075 -0.867044718494725 103 1.0425131853633708e-05
469 88.13490705251617 -0.5697280085045833 104 1.0529383172170046e-05
470 87.45226005065167 -0.6826470018644955 105 5.317338501945873e-06
471 87.05830061797678 -0.3939594326748903 108 5.370511886965332e-06
472 86.7362670265745 -0.32203359140228827 110 5.424217005834985e-06
473 86.45762127369387 -0.278645752880621 111 5.478459175893335e-06
474 86.18265095892295 -0.2749703147709255 111 5.533243767652269e-06
475 85.90920964699129 -0.27344131193166277 113 5.588576205328791e-06
476 85.64365626369354 -0.2655533832977426 114 5.644461967382079e-06
477 85.37496859459176 -0.2686876691017801 115 5.7009065870559e-06
478 85.12046201301472 -0.2545065815770471 116 5.757915652926459e-06
479 84.88406025857753 -0.23640175443718192 119 5.815494809455724e-06
480 84.6738895283177 -0.2101707302598328 122 5.873649757550281e-06
481 84.49411323559426 -0.17977629272344586 122 5.9323862551257835e-06
482 84.33144166933955 -0.16267156625470136 124 5.991710117677041e-06
483 84.14423126167769 -0.18721040766186547 124 6.051627218853812e-06
484 83.99980355652416 -0.14442770515353232 126 6.11214349104235e-06
485 83.82213513522392 -0.17766842130023974 128 6.1732649259527735e-06
486 83.74329025658584 -0.07884487863807976 130 6.234997575212301e-06
487 83.70792253945561 -0.035367717130228016 128 6.297347550964424e-06
488 83.54144276494694 -0.16647977450867302 136 6.360321026474068e-06
489 83.54010926309243 -0.0013335018545035382 130 6.423924236738809e-06
490 83.31650276852642 -0.22360649456601323 137 6.488163479106197e-06
491 83.11033771624325 -0.2061650522831684 139 3.2765225569486294e-06
492 83.04959434396469 -0.060743372278565744 141 3.309287782518116e-06
493 83.0187708667051 -0.030823477259588117 144 3.342380660343297e-06
494 82.99389418609624 -0.024876680608855395 143 3.3758044669467302e-06
495 82.9668075792149 -0.027086606881340458 144 3.4095625116161974e-06
496 82.9437710921048 -0.02303648711010453 144 3.4436581367323594e-06
497 82.91799919478038 -0.02577189732441809 145 3.478094718099683e-06
498 82.89374326286816 -0.024255931912222195 145 3.51287566528068e-06
499 82.870725252266 -0.0230180106021578 146 3.548004421933487e-06
500 82.84671813111342 -0.024007121152578748 146 3.583484466152822e-06
501 82.82285292100909 -0.023865210104332846 146 3.6193193108143504e-06
502 82.79906257557955 -0.023790345429532067 146 3.655512503922494e-06
503 82.77529915537177 -0.023763420207785657 146 3.6920676289617187e-06
504 82.75152825160514 -0.023770903766632046 146 3.728988305251336e-06
505 82.72821393540934 -0.02331431619579405 147 3.7662781883038493e-06
506 82.70704708765778 -0.02116684775155875 146 3.803940970186888e-06
507 82.68915710390633 -0.017889983751459226 147 3.841980379888757e-06
508 82.66808281156278 -0.021074292343541856 146 3.880400183687645e-06
509 82.64334923240598 -0.024733579156801966 148 3.919204185524521e-06
510 82.62296232247026 -0.020386909935723452 147 3.9583962273797664e-06
511 82.6039231866985 -0.01903913577176297 149 3.997980189653564e-06
512 82.58218464403137 -0.02173854266712283 149 4.0379599915501e-06
513 82.56465829224732 -0.01752635178405626 147 4.078339591465601e-06
514 82.54993670016289 -0.01472159208442747 151 4.119122987380257e-06
515 82.5408164391982 -0.009120260964692761 148 4.16031421725406e-06
516 82.52447441361099 -0.016342025587206876 152 4.201917359426601e-06
517 82.500522281841 -0.023952131769988227 148 4.243936533020867e-06
518 82.50041743645401 -0.00010484538698563028 154 4.2863758983510754e-06
LMNN converged with objective 82.50041743645401
  • In this PR (with LMNN init='random'; in fact in this example I need to tweak the random seed to have a good init, so it works well with the random seed equal to 0):
    [figure generated for LMNN]

    Details of the cost function:

600 98.44112913540454 -0.0626524079326316 156 2.423164585391844e-06
601 98.38448259239819 -0.056646543006351635 156 2.4473962312457624e-06
602 98.329982675241 -0.05449991715718738 156 2.47187019355822e-06
603 98.26253289452787 -0.06744978071313312 156 2.496588895493802e-06
604 98.19890035393132 -0.06363254059654366 158 2.52155478444874e-06
605 98.14156901373599 -0.057331340195332814 157 2.5467703322932274e-06
606 98.09519312815529 -0.04637588558070149 162 2.57223803561616e-06
607 98.04009558727778 -0.05509754087751162 156 2.5979604159723214e-06
608 97.98092790254665 -0.059167684731121994 160 2.6239400201320447e-06
609 97.91620490669078 -0.06472299585587393 159 2.650179420333365e-06
610 97.86460165493 -0.05160325176078118 157 2.6766812145366987e-06
611 97.80899937025455 -0.05560228467544448 160 2.703448026682066e-06
612 97.7536757584408 -0.055323611813747675 157 2.7304825069488863e-06
613 97.71989306004798 -0.03378269839282666 163 2.7577873320183754e-06
614 97.64913892418582 -0.07075413586215973 159 2.785365205338559e-06
615 97.61442815497537 -0.03471076921044869 163 2.8132188573919447e-06
616 97.54834407435186 -0.06608408062351145 161 2.8413510459658643e-06
617 97.49974551550274 -0.04859855884912179 163 2.869764556425523e-06
618 97.45679086085032 -0.04295465465241932 163 2.898462201989778e-06
619 97.40832543192246 -0.048465428927855214 163 2.927446824009676e-06
620 97.36739930456083 -0.04092612736162948 163 2.9567212922497724e-06
621 97.32228644852171 -0.045112856039125404 164 2.9862885051722702e-06
622 97.2799466120978 -0.04233983642390626 167 3.016151390223993e-06
623 97.24512474281501 -0.03482186928279418 165 3.0463129041262327e-06
624 97.20412098256091 -0.041003760254099575 167 3.076776033167495e-06
625 97.16895089603821 -0.035170086522697375 169 3.10754379349917e-06
626 97.14203919611799 -0.02691169992021969 169 3.1386192314341614e-06
627 97.10535145443599 -0.03668774168200173 168 3.170005423748503e-06
628 97.07851356938411 -0.026837885051875787 170 3.2017054779859883e-06
629 97.04141583448116 -0.03709773490295731 169 3.2337225327658483e-06
630 97.01147521663955 -0.029940617841603512 171 3.266059758093507e-06
631 96.98226898374784 -0.02920623289170976 171 3.298720355674442e-06
632 96.95024766145004 -0.0320213222978083 171 3.3317075592311864e-06
633 96.9216711751098 -0.028576486340242013 171 3.3650246348234983e-06
634 96.8946971080521 -0.026974067057693674 174 3.3986748811717334e-06
635 96.87031370390602 -0.024383404146078647 175 3.432661629983451e-06
636 96.84534434454778 -0.024969359358237853 174 3.4669882462832854e-06
637 96.82049661417624 -0.024847730371547527 174 3.5016581287461183e-06
638 96.79878322495745 -0.021713389218788848 175 3.5366747100335797e-06
639 96.77303997305813 -0.025743251899314146 175 3.5720414571339157e-06
640 96.75299315449539 -0.02004681856274715 174 3.6077618717052547e-06
641 96.7387249569424 -0.0142681975529797 177 3.6438394904223074e-06
642 96.71557961557377 -0.023145341368632444 175 3.6802778853265306e-06
643 96.69598711022766 -0.01959250534611101 180 3.717080664179796e-06
644 96.68098728734618 -0.014999822881478053 176 3.754251470821594e-06
645 96.66390739818982 -0.017079889156363492 179 3.79179398552981e-06
646 96.64934371059378 -0.014563687596037767 179 3.8297119253851085e-06
647 96.64003499407758 -0.009308716516201798 181 3.8680090446389596e-06
648 96.6303106254114 -0.009724368666184091 179 3.906689135085349e-06
649 96.62299584292244 -0.007314782488961669 181 3.945756026436202e-06
650 96.60687340584624 -0.01612243707619143 180 3.985213586700564e-06
651 96.59895525614554 -0.007918149700699928 181 2.012532861283785e-06
652 96.59267680946355 -0.00627844668198918 180 2.0326581898966232e-06
653 96.58615307277594 -0.0065237366876118585 181 2.0529847717955897e-06
654 96.5801502125843 -0.006002860191642867 181 2.0735146195135454e-06
655 96.5742403319832 -0.005909880601095097 181 2.094249765708681e-06
656 96.56986005712746 -0.004380274855748212 182 2.1151922633657675e-06
657 96.56499579123039 -0.004864265897069231 181 2.136344185999425e-06
658 96.55828766599973 -0.006708125230659334 182 2.1577076278594195e-06
659 96.55326357519269 -0.005024090807040693 181 2.179284704138014e-06
660 96.55193601285254 -0.0013275623401511893 183 1.100538775589697e-06
661 96.54945167118518 -0.0024843416673547836 182 1.111544163345594e-06
662 96.54648034908634 -0.002971322098844098 184 1.1226596049790499e-06
663 96.543778101915 -0.0027022471713422647 183 1.1338862010288403e-06
664 96.54367106770049 -0.00010703421450841688 184 1.431531328798911e-07
LMNN converged with objective 96.54367106770049

  • In commit d8726d3 (the init is identity):

    Here is the figure obtained:
    [figure generated for LMNN]

    And here are the iterations:

934 450.896707625 0.236238817483 111 4.74471122352e-178
935 450.896707625 0.236238817483 111 2.37235561176e-178
936 450.896707625 0.236238817483 111 1.18617780588e-178
937 450.896707625 0.236238817483 111 5.9308890294e-179
938 450.896707625 0.236238817483 111 2.9654445147e-179
939 450.896707625 0.236238817483 111 1.48272225735e-179
940 450.896707625 0.236238817483 111 7.41361128675e-180
941 450.896707625 0.236238817483 111 3.70680564338e-180
942 450.896707625 0.236238817483 111 1.85340282169e-180
943 450.896707625 0.236238817483 111 9.26701410844e-181
944 450.896707625 0.236238817483 111 4.63350705422e-181
945 450.896707625 0.236238817483 111 2.31675352711e-181
946 450.896707625 0.236238817483 111 1.15837676356e-181
947 450.896707625 0.236238817483 111 5.79188381778e-182
948 450.896707625 0.236238817483 111 2.89594190889e-182
949 450.896707625 0.236238817483 111 1.44797095444e-182
950 450.896707625 0.236238817483 111 7.23985477222e-183
951 450.896707625 0.236238817483 111 3.61992738611e-183
952 450.896707625 0.236238817483 111 1.80996369306e-183
953 450.896707625 0.236238817483 111 9.04981846528e-184
954 450.896707625 0.236238817483 111 4.52490923264e-184
955 450.896707625 0.236238817483 111 2.26245461632e-184
956 450.896707625 0.236238817483 111 1.13122730816e-184
957 450.896707625 0.236238817483 111 5.6561365408e-185
958 450.896707625 0.236238817483 111 2.8280682704e-185
959 450.896707625 0.236238817483 111 1.4140341352e-185
960 450.896707625 0.236238817483 111 7.070170676e-186
961 450.896707625 0.236238817483 111 3.535085338e-186
962 450.896707625 0.236238817483 111 1.767542669e-186
963 450.896707625 0.236238817483 111 8.837713345e-187
964 450.896707625 0.236238817483 111 4.4188566725e-187
965 450.896707625 0.236238817483 111 2.20942833625e-187
966 450.896707625 0.236238817483 111 1.10471416812e-187
967 450.896707625 0.236238817483 111 5.52357084062e-188
968 450.896707625 0.236238817483 111 2.76178542031e-188
969 450.896707625 0.236238817483 111 1.38089271016e-188
970 450.896707625 0.236238817483 111 6.90446355078e-189
971 450.896707625 0.236238817483 111 3.45223177539e-189
972 450.896707625 0.236238817483 111 1.72611588769e-189
973 450.896707625 0.236238817483 111 8.63057943847e-190
974 450.896707625 0.236238817483 111 4.31528971924e-190
975 450.896707625 0.236238817483 111 2.15764485962e-190
976 450.896707625 0.236238817483 111 1.07882242981e-190
977 450.896707625 0.236238817483 111 5.39411214905e-191
978 450.896707625 0.236238817483 111 2.69705607452e-191
979 450.896707625 0.236238817483 111 1.34852803726e-191
980 450.896707625 0.236238817483 111 6.74264018631e-192
981 450.896707625 0.236238817483 111 3.37132009315e-192
982 450.896707625 0.236238817483 111 1.68566004658e-192
983 450.896707625 0.236238817483 111 8.42830023288e-193
984 450.896707625 0.236238817483 111 4.21415011644e-193
985 450.896707625 0.236238817483 111 2.10707505822e-193
986 450.896707625 0.236238817483 111 1.05353752911e-193
987 450.896707625 0.236238817483 111 5.26768764555e-194
988 450.896707625 0.236238817483 111 2.63384382278e-194
989 450.896707625 0.236238817483 111 1.31692191139e-194
990 450.896707625 0.236238817483 111 6.58460955694e-195
991 450.896707625 0.236238817483 111 3.29230477847e-195
992 450.896707625 0.236238817483 111 1.64615238924e-195
993 450.896707625 0.236238817483 111 8.23076194618e-196
994 450.896707625 0.236238817483 111 4.11538097309e-196
995 450.896707625 0.236238817483 111 2.05769048654e-196
996 450.896707625 0.236238817483 111 1.02884524327e-196
997 450.896707625 0.236238817483 111 5.14422621636e-197
998 450.896707625 0.236238817483 111 2.57211310818e-197
999 450.896707625 0.236238817483 111 1.28605655409e-197
LMNN didn't converge in 1000 steps.

Note that the objective function and the gradient staying constant was a known bug (fixed in #100) that had not been fixed at that time.

The old LMNN seemed to perform quite well (visually) compared to the new one, no matter the initialization, but note that the cost function value is better in this PR. I also tried fewer iterations (20) and a bigger learning rate (1e-3 instead of 1e-7) and it seems to give better visual results (though with a cost function value still as high as about 1673).

I checked the default parameters and they are the same, so it's weird that the behaviour is different... I'll try to understand why

@bellet (Member) commented Jun 14, 2019

Did you double check that the dataset is exactly the same and that target neighbors and impostors are also the same in each case? If so, the objective function is the same and the current code achieves smaller value so it works better (despite the poorer visualization). Did you try to tweak the parameters of t-SNE (e.g., perplexity) to see whether this is simply a visualization issue?
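
A quick way to check this would be something along these lines (a minimal sketch, assuming X_r and y are the LMNN-transformed data and labels from the script above; perplexity is a standard scikit-learn TSNE parameter):

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Re-embed the LMNN-transformed data with several perplexity values
for perplexity in (5, 30, 50):
    X_emb = TSNE(perplexity=perplexity, random_state=42).fit_transform(X_r)
    plt.figure()
    plt.scatter(X_emb[:, 0], X_emb[:, 1], c=y)
    plt.title("t-SNE with perplexity=%d" % perplexity)
plt.show()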

@wdevazelhes (Member Author) commented Jun 17, 2019

Did you double check that the dataset is exactly the same and that target neighbors and impostors are also the same in each case? If so, the objective function is the same and the current code achieves smaller value so it works better (despite the poorer visualization). Did you try to tweak the parameters of t-SNE (e.g., perplexity) to see whether this is simply a visualization issue?

Yes, the dataset is the same (I used a random seed, and visually it looked the same; t-SNE with a random seed too).
Yes, I tried changing the visualization a bit but it didn't improve. One thing I also noticed is that during convergence (for the case that does not work well), the number of active constraints decreases (and the early visualization is indeed nice), but at some point the number of active constraints grows again, which sounds counter-intuitive to me... Looking at the details of the objective at that point in training, the pull loss improves (decreases), whereas the push loss increases (but by less than the pull loss decreases)... So this would mean that the algorithm finds directions where it can gain more (i.e. have a lower loss) by decreasing the distance between target neighbors, even if it means creating new impostors...
Decreasing the regularization (hence the coefficient in front of the push loss) solves this problem and gives a good visualization.
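
For illustration, the kind of tweak meant here would be (a sketch only; that LMNN's regularization parameter weights the push term this way is an assumption based on the description above, and X, y are from the script above):

from metric_learn import LMNN

# Lower weight on the push loss, so newly created impostors are penalized less
# relative to the pull loss (the default value is 0.5)
lmnn = LMNN(regularization=0.3, verbose=True, random_state=42)
X_r = lmnn.fit_transform(X, y)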

@wdevazelhes (Member Author) commented Jun 17, 2019

I am not sure about this, but I think there might be a problem with the impostors.
Here is a zoomed plot of the MDS embedding of the points (maybe more stable than t-SNE) at the end of training, where the black edges are the pairs in impostors:
[zoomed MDS plot of the embedding, with black edges for impostor pairs]
Note that there are red-green couples that have no edge, meaning they are not an "impostor" pair.

I am not sure yet what the variable impostors represents (so maybe the figure above is fine), but I notice weird things: first of all, it is computed at the first iteration and not after that, and it depends on some distances to target neighbors etc. (it "looks" a bit like the operation of computing active constraints).
However, a weird thing is that in loss_grad there are then other computations based on distance comparisons with the margin, but where we only look at "impostors" (not all pairs of differently-labeled points).

If for instance I "tweak" the find_impostors method to include all actual pairs of points with different labels (and not only a subset selected by a distance comparison), for instance by making the margin huge:

margin_radii = 1 + _inplace_paired_L2(Lx[furthest_neighbors], Lx)
margin_radii *= 1000000000

It seems to work

And it kind of makes sense to me: maybe the active constraints computation in loss_grad then looks, among all global impostors (pairs of differently-labeled points), for which ones are activated?

@wdevazelhes (Member Author):

@perimosocordiae @bellet I am not very fast at understanding/parsing the LMNN code, so I wonder: should I continue, or should we focus on other things for the release?

@perimosocordiae (Contributor):

I'd really prefer that we're as confident as we can be in the correctness of our LMNN implementation before we release.

Can we compare against the scikit-learn version?

@wdevazelhes (Member Author) commented Jun 18, 2019

I'd really prefer that we're as confident as we can be in the correctness of our LMNN implementation before we release.

Can we compare against the scikit-learn version?

That makes sense.
After thinking about it, if I understand correctly, the _find_impostors function is in fact used to select only the impostors (i.e. pairs of differently-labeled points whose distance is smaller than the distance to the furthest target neighbor of one of the points in the pair, i.e. a sort of "preselection" step), in order to compute the push loss only on them and not on every differently-labeled pair. So this makes sense indeed.
Then the figure above (https://user-images.githubusercontent.com/31916524/59609842-746c2d80-9118-11e9-89e6-3eb4e509087e.png) makes sense: because the impostors are not recomputed at every step (they are in the scikit-learn PR), some differently-labeled pairs can be kept as non-impostors and the algorithm can find metrics that don't push them far away from each other. (See the sketch below for the preselection idea.)
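
A minimal, self-contained sketch of this preselection idea (illustrative only, not the actual metric-learn implementation):

import numpy as np

def preselect_impostors(Lx, y, furthest_target_dist, margin=1.0):
    """Return pairs (i, j) with different labels such that j lies within
    i's margin radius (squared distance to i's furthest target neighbor + margin)."""
    impostors = []
    for i in range(len(Lx)):
        dists = np.sum((Lx - Lx[i]) ** 2, axis=1)
        radius = furthest_target_dist[i] + margin
        for j in np.where((y != y[i]) & (dists < radius))[0]:
            impostors.append((i, j))
    return impostors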

So I guess recomputing the impostors at every step, as mentioned in the TODO at the top of lmnn.py, would be a way to solve that.

@perimosocordiae @bellet should I open a PR to do that?

@wdevazelhes (Member Author):

However, I still don't understand why the number of active constraints was growing at the end of the training...

@wdevazelhes (Member Author):

I'll update PR #223, where I'll list possible solutions to the LMNN problem, but in the meantime I won't deal with this problem in this PR, so that we can merge it as soon as possible.

@wdevazelhes (Member Author):

Some quick comments after browsing the doc:

* in Package Overview > Module contents, metric_learn.base_metric is missing a description

* in the left menu Package Overview > Base classes, it says metric-learn module instead of the corresponding name (base_metric, constraints)

* in the User Guide, shouldn't the algorithms sections for pairs and quadruplets appear as subsubsections (within pairs/quadruplets)?

@bellet I fixed the description (first bullet point) and changed the sections into subsubsections (third bullet point). For the second bullet point, I don't remember if I fixed it already; I think I did, but just to be sure: in the doc it is now written as metric_learn.base_metric, for instance. Did you mean it should be written as just base_metric? Because for all the methods etc. it's always written as metric_learn.MMC, for instance.

@wdevazelhes (Member Author) commented Jun 27, 2019

@bellet I fixed the description (first bullet point) and changed the sections into subsubsections (third bullet point). For the second bullet point, I don't remember if I fixed it already; I think I did, but just to be sure: in the doc it is now written as metric_learn.base_metric, for instance. Did you mean it should be written as just base_metric? Because for all the methods etc. it's always written as metric_learn.MMC, for instance.

I've just seen it: it's when we click on metric_learn.base_metric, right? It indeed says metric_learn at the top of the page; I'll try to fix it.

@wdevazelhes (Member Author):

Since there are a few nitpicks left to change (like making the docstrings uniform by ensuring we say (default=...) after each param, that the docstring is well indented, etc.), I'll open an issue for that, but it's not so urgent so I guess we can wait for the next release.

@bellet (Member) commented Jun 27, 2019

Is this ready for review then?

@wdevazelhes wdevazelhes changed the title [WIP] Enhance documentation [MRG] Enhance documentation Jun 27, 2019
@wdevazelhes (Member Author):

Yes! I guess it is

@bellet (Member) left a comment:

Some small changes to make, otherwise LGTM. Nice work!


All supervised algorithms are scikit-learn `Estimators`, so they are
compatible with Pipelining and scikit-learn model selection routines.
Supervised Metric Learning Algorithms are the easiest metric-learn algorithms

I prefer to avoid "easiest" to avoid giving the impression that the other algorithms are complicated to use. Maybe simply "Supervised metric learning algorithms use the same API as scikit-learn"

.org/stable/glossary.html#term-array-like>`_ objects, `X` and `y`. `X`
should be a 2D array-like of shape `(n_samples, n_features)`, where
`n_samples` is the number of points of your dataset and `n_features` is the
number of attributes of each of your points. `y` should be a 1D array-like

describing each point


.. todo:: Covariance is unsupervised, so its doc should not be here.
Here is an example of a dataset of two dogs and one
cat (the classes are 'dog' and 'cat') an animal being being represented by

double being


.. note::

If the metric learner that you use learns a Mahalanobis Matrix (like it is

maybe link to the section in "What is metric learning?"


All supervised algorithms are scikit-learn `sklearn.base.Estimators`, and
`sklearn.base.TransformerMixin` so they are compatible with Pipelining and
scikit-learn model selection routines.

maybe name a few sklearn model selection routines to make sure people see what is meant here

already in the order that points are given in the quadruplet.


The goal of weakly-supervised metric-learning algorithms is to transform

this is irrelevant here
should be replaced by a quick paragraph in the same spirit as for the section on pairs


.. _quadruplets_predicting:

Predicting

Prediction

Scoring
-------

Not only are they able to predict the label of given pairs, they can also

same as for pairs


Not only are they able to predict the label of given pairs, they can also
return a `decision_function` for a set of pairs. It is basically the "score"
which sign will be taken to find the prediction for the pair. In fact this

This score corresponds to

of quadruplets have the right predicted ordering.

>>> lsml.score(quadruplets_test)
0.5

again, would be better to have a good number here

@bellet (Member) commented Jun 28, 2019

Also, there are a number of links to doc pages of other packages (scipy, sklearn) which do not work; it would be nice to fix them.

@bellet (Member) commented Jun 28, 2019

Should we consider that the new part of the doc on constraint generation fixes #135?

@bellet (Member) commented Jul 3, 2019

For simplicity, I'll merge this and make the modifications myself in another PR.

@bellet bellet merged commit 2dc9b90 into scikit-learn-contrib:master Jul 3, 2019