
umap for supervised (metric) learning #415

Open
icarmi opened this issue Apr 26, 2020 · 4 comments

Comments


icarmi commented Apr 26, 2020

Hi,

I read through the tutorial for metric learning:
https://umap-learn.readthedocs.io/en/latest/supervised.html

I have two questions (that may be related, I'm not sure)

  1. Is there an explanation somewhere of how umap uses the labels with the distance function to perform the supervised training?

  2. Are there any hyper-parameters that are more useful in the metric learning (as opposed to unsupervised) that can help avoid over-fitting?

Thank you!

@lmcinnes (Owner)

Essentially UMAP treats the labels as a separate metric space (with a categorical distance on it) and tries to fold the data and the labels together by taking an intersection of the two simplicial sets.

There are some hyper-parameters. The main one is target_weight, which balances how much weight is given to the labels versus the data. A target_weight of 1.0 puts almost all of the weight on the labels, while a target_weight of 0.0 weights the data as heavily as possible.


icarmi commented Apr 30, 2020

Awesome, thank you!!! :)

@buhrmann

Hi @lmcinnes, just wondering if you could also explain how the transform() part of metric learning works, if it isn't already documented somewhere else. I understand (intuitively) how labels are used during training (the intersection-of-separate-graphs bit). But how does that then affect new data that doesn't have labels? I imagine new data points are embedded somehow based on their similarity to points in the trained graph (the intersected sets)?


lmcinnes commented Nov 2, 2020

@buhrmann you are essentially correct: transform() uses the learned graph of the data space (rather than the intersected graph), since we don't have labels for the new points. The assumption is that this structure is sufficient. That, of course, is not always the case, but it has been effective in several use cases.
