New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
umap for supervised (metric) learning #415
Comments
Essentially umap takes the labels as a separate metric space (with a categorical distance on it), and tries to fold the two data and labels together by performing an intersection of the simplicial sets. There are some hyper-parameters. The main one would be |
Awesome, thank you!!! :) |
Hi @lmcinnes, just wondering if you could also explain how the transform() part of metric learning works, if it's not already mentioned somewhere else. I understand (intuitively) how labels are used during training (the intersection of separate graphs bit). But how does that then affect new data which doesn't have labels? I imagine new data points are embedded somehow based on their similarity to points in the trained graph (intersected sets)? |
@buhrmann you are essentially correct; it uses the learned graph of the data space (rather than the intersected graph) since we don't have labels for the new points. The assumption is that this structure is sufficient. It, of course, is not always the case, but it has been effective for several use cases. |
Hi,
I read through the tutorial for metric learning:
https://umap-learn.readthedocs.io/en/latest/supervised.html
I have two questions (that may be related, I'm not sure)
Is there an explanation somewhere of how umap uses the labels with the distance function to perform the supervised training?
Are there any hyper-parameters that are more useful in the metric learning (as opposed to unsupervised) that can help avoid over-fitting?
Thank you!
The text was updated successfully, but these errors were encountered: