[BUG] different outputs for UMAP on CPU vs. GPU #5474
Comments
Inspired by #5473, I tried two things. I unfortunately still have the same issues with cuML.
@viclafargue thank you so much for your input. Indeed, increasing the number of epochs helped. The next obvious question, of course, is how to set this parameter.
But it's not clear what counts as a small or large dataset, and it's also not explained in the CPU implementation docs (https://umap-learn.readthedocs.io/en/latest/api.html). PS: explicitly declaring cupy.array and np.float32 is not needed.
Dear @viclafargue, I am still very interested in this. Is there some logic to how much this value should be increased?
I think that there is no solid rule about the best value to use. If you want to know more about this, it might be interesting to read Leland McInnes's papers. But, in general terms, the goal of dimensionality reduction is to obtain a representative output, so adjusting hyperparameters comes down to testing the trustworthiness of the embeddings for different values. cuml.metrics.trustworthiness.trustworthiness offers a GPU-accelerated calculation of that score. cc @cjnolet who might have more info about this.
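As an aside not taken from the thread, a minimal sketch of scoring an embedding with that metric might look like the following; the digits dataset and all parameter values are arbitrary assumptions.

```python
# Minimal sketch (not from the thread): fit cuML UMAP once and score the
# embedding with the GPU-accelerated trustworthiness metric mentioned above.
# The digits dataset and parameter values are arbitrary.
import numpy as np
from sklearn.datasets import load_digits
from cuml.manifold import UMAP
from cuml.metrics.trustworthiness import trustworthiness

X, _ = load_digits(return_X_y=True)
X = X.astype(np.float32)

embedding = UMAP(n_neighbors=15, n_epochs=500, random_state=42).fit_transform(X)
score = trustworthiness(X, embedding, n_neighbors=15)
print(f"trustworthiness: {score:.3f}")  # values close to 1.0 mean neighborhoods are well preserved
```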
Wouldn't it make sense to use the trustworthiness metric as a measure of "convergence", akin to the loss function during backpropagation, so that cuML UMAP runs until either the number of epochs provided is exhausted OR the trustworthiness reaches a high enough value (i.e. early stopping)?
Also, I am not sure what trustworthiness value is adequate; would, for example, 0.8 be high enough? I see very significant differences in output quality between trustworthiness of 0.8 and 0.9, i.e. I would not be able to tell from the trustworthiness alone that the quality is so different.
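Neither cuML nor umap-learn exposes an early-stopping hook like this, but the idea could be approximated externally by refitting with an increasing epoch budget until the score clears a threshold. A rough sketch, with the 0.9 threshold and the epoch schedule chosen arbitrarily:

```python
# Rough sketch of the early-stopping idea from the comment above. UMAP has no
# such callback, so this simply refits with a growing epoch budget and stops
# once the trustworthiness clears an assumed threshold.
import numpy as np
from sklearn.datasets import load_digits
from cuml.manifold import UMAP
from cuml.metrics.trustworthiness import trustworthiness

X, _ = load_digits(return_X_y=True)
X = X.astype(np.float32)

target = 0.90  # assumed "good enough" trustworthiness
for n_epochs in (200, 400, 800, 1600):
    embedding = UMAP(n_epochs=n_epochs, random_state=42).fit_transform(X)
    score = trustworthiness(X, embedding)
    print(f"n_epochs={n_epochs}: trustworthiness={score:.3f}")
    if score >= target:
        break  # stop as soon as the embedding is trustworthy enough
```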
@antortjim in general, the relationship that the trustworthiness score has to UMAP's objective is similar to the one a score like categorical accuracy has to categorical cross-entropy. Accuracy, for example, is a more general score for classifiers, while categorical cross-entropy is specific to an algorithm's objective. We can use accuracy to measure the quality of results for any categorical classifier, and we can use trustworthiness to measure the quality of any manifold learning algorithm. Trustworthiness measures the degree to which the neighborhoods in the output space preserve local neighborhood structure from the input space (by literally comparing knn results). In general, you'll see reasonably well-preserved results around 0.91 and above, while a trustworthiness of 0.8 and below might preserve some neighborhoods while others could still look like random noise.

All that being said, there is a known bug with our Laplacian eigenmaps (aka spectral) initialization solver, and the resulting embedding quality can sometimes (but not always) be improved by increasing the number of epochs. Please note that this is in addition to, and not mutually exclusive of, what Victor pointed out: the initialization and parallelization of the algorithms still causes slightly different behaviors, which can lead to slightly different results with the same parameter settings. One thing to try might be to set init="random" to ignore differences that are specific to the spectral initialization. Another common technique that can improve the quality with spectral initialization is to run the points through a PCA before calling UMAP.
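For concreteness, here is a sketch of the two workarounds mentioned above (random initialization, and PCA before UMAP); the dataset and the number of PCA components are arbitrary choices of mine, not from the thread.

```python
# Sketch of the two workarounds suggested above:
#   1) init="random" to sidestep the spectral (Laplacian eigenmaps) initialization
#   2) PCA before UMAP, which can improve quality with spectral initialization
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from cuml.manifold import UMAP

X, _ = load_digits(return_X_y=True)
X = X.astype(np.float32)

# Workaround 1: random initialization instead of the default spectral init
emb_random_init = UMAP(init="random", random_state=42).fit_transform(X)

# Workaround 2: reduce with PCA first (50 components is an arbitrary choice),
# then run UMAP with its default initialization
X_pca = PCA(n_components=50).fit_transform(X).astype(np.float32)
emb_after_pca = UMAP(random_state=42).fit_transform(X_pca)
```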
Thank you @cjnolet, and sorry for getting back here late. My take-home message then is that the trustworthiness cannot be used as a measurement of convergence, but rather as a measurement of correctness, and in order to ensure convergence I need to pass a high enough number of epochs.
Describe the bug
Similar to #5473, the results of the GPU UMAP implementation provided by cuml don't match what I expect from the CPU implementation (from umap-learn). In particular, if hash_input=True (recommended when comparing the CPU and GPU implementations, as explained in [QST] Relationship between UMAP.embedding_ and reductions returned by UMAP.transform() #5188), then the output plot is completely garbled.
However, if only new data is provided (simulated by the test data), then it seems to be OK.
Steps/Code to reproduce bug
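The original reproduction script is not included in this extract. The following is only a sketch of the kind of comparison described above (CPU vs. GPU models, embedding the train set, the test set alone, and train+test together); the digits dataset and every parameter value are assumptions.

```python
# Sketch only (not the original reproducer): fit a CPU (umap-learn) and a GPU
# (cuML) model, then embed Train, Test, and All (train + test together).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from umap import UMAP as CpuUMAP           # umap-learn
from cuml.manifold import UMAP as GpuUMAP  # cuML

X, y = load_digits(return_X_y=True)
X_train, X_test = train_test_split(X.astype(np.float32), test_size=0.25, random_state=42)

embeddings = {}
for name, Model in (("cpu", CpuUMAP), ("gpu", GpuUMAP)):
    kwargs = dict(n_neighbors=15, n_epochs=500, random_state=42)
    if name == "gpu":
        kwargs["hash_input"] = True  # cuML-only flag discussed in this report
    model = Model(**kwargs)
    embeddings[name, "Train"] = model.fit_transform(X_train)
    embeddings[name, "Test"] = model.transform(X_test)
    embeddings[name, "All"] = model.transform(np.vstack([X_train, X_test]))
# Each of these embeddings can then be scatter-plotted, colored by y, to
# reproduce the Train / Test / All panels described in this issue.
```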
Expected behavior
I expected the gpu, All plot to be organized into the same clusters seen in gpu, Train and gpu, Test; in fact, it should be identical to combining both into a single plot (just as it is for the cpu plots). Instead, the output is messed up.
This does not depend on the value of hash_input (which only affects the output of gpu, Train). A closer look at where the Test data points are projected when processed together with the Train data points (the Test only plots) shows that their positions are not the same as when they are processed alone, i.e. the 4th plot in each row should be identical (or very similar, just monocolored) to the 2nd one. This is not the case, especially for the gpu implementation.
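One way to quantify the mismatch described above, reusing the (assumed) embeddings and X_train variables from the sketch under "Steps/Code to reproduce bug", is to compare the test rows of the combined embedding against the test-only embedding directly:

```python
# Sketch: the test rows of the "All" embedding should be close to the "Test"
# embedding if transform() is consistent. Since numpy arrays were passed in,
# cuML mirrors the input type and returns numpy arrays here as well.
import numpy as np

for name in ("cpu", "gpu"):
    test_alone = embeddings[name, "Test"]
    test_in_all = embeddings[name, "All"][len(X_train):]
    drift = np.abs(test_alone - test_in_all).mean()
    print(f"{name}: mean absolute drift between Test-alone and Test-in-All = {drift:.4f}")
```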
Environment details (please complete the following information):