[QST] Relationship between UMAP.embedding_ and reductions returned by UMAP.transform() #5188
Comments
@viclafargue @dantegd @cjnolet what do you think about making [...] Victor shared some additional context in a different issue.
EDIT: Looks like we already reached some level of agreement in that issue. This feels like a good first issue for a new contributor, but I'm going to tag it in the other issue.
I'm fine with that. It does introduce additional overhead, which is why we made the default false to begin with. Maybe we could add a quick doc to the argument stating that it's true by default but comes with an overhead, so if the user will never expect to be doing [...]
I think it would be great to develop the docs a bit more to explain the [...]
My finding is that the value of [...]
I am really wondering why this weird blob is being produced when [...]
I am also really wondering why this blob does not occur when [...]
To replicate the figures, run the script below (requires PS
Any help would be very much appreciated, thanks!
For the sake of completeness, these are the UMAPs if I transform with the training and test data [...]
Updated script: test_hashinput.zip
Now it's even more confusing, because when [...]
After using umap-learn for some time I've written code that relies on `embedding_` == reduction from `transform()`. I just found out that without setting `hash_input=True` this will not be the case with cuML's UMAP. I was a bit surprised. I have since re-read the documentation, and while this difference is noted, it seems to me something of an unfortunate "gotcha". Perhaps I'm missing something, but it seems like the more conservative approach would be to default to the behavior of umap-learn and provide additional tuning parameters for those who want to use them. At a minimum it might be nice to have a warning here.