[BUG] different outputs for PCA on CPU vs. GPU #5473
Comments
@stephanie-fu This is related to #4560.
@lowener Could you please explain how the results are still valid? Indeed, I would expect the output plot to be the same in all three cases. Thank you!
The issue that you opened on UMAP is not related to this.
Thank you for the clarification. I am still a little confused about the output of the example above - I would expect all 3 code examples to produce an ellipse, but the GPU implementation gives something that visually looks different from a sign flip (in fact, the CPU output looks like a flipped version of the data, which seems valid). Is this the expected output?
Thank you for the reply. Yes, I imagined the root causes are probably different, and thus the solutions or interpretations are going to be different as well. I linked this there only because they are similar in the sense that both PCA and UMAP are widely used algorithms that a lot of people will try to accelerate with cuml. In doing so, they will face the conundrum of getting results that are difficult to interpret. Maybe in both cases more documentation would be beneficial. I agree with @stephanie-fu that if the only explanation here for the PCA is a sign flip, that still does not explain the loss of structure observed in the "PCA on GPU" plot. The GPU output is clearly not maximizing the amount of variance along one axis. In other words, the expected output, regardless of the sign flip, is that most of the variance lies along one axis and only a little along the other (as opposed to the round blob observed). How can we explain it? Thank you again!
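For what it's worth, a sign flip alone cannot produce that round blob; a minimal sketch (sklearn and numpy only, with the same anisotropic data shape as in the snippet below) makes this concrete:

```python
import numpy as np
from sklearn.decomposition import PCA

# Anisotropic data: far more variance along the first axis.
rng = np.random.default_rng(0)
data = rng.standard_normal((10000, 2))
data[:, 0] *= 10

proj = PCA(n_components=2).fit_transform(data)
flipped = proj * np.array([-1, 1])  # sign-flip the first component

# Per-axis variances are identical, so the elliptical shape is
# preserved; a sign flip only mirrors the orientation.
print(proj.var(axis=0))
print(flipped.var(axis=0))
```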
I jumped too fast to the sign flipping conclusion. The issue seems to be our compatibility with torch Tensor?

```python
import cupy
import torch

# Anisotropic data as a torch tensor on the GPU:
# most of the variance lies along the first axis.
data = torch.randn((10000, 2)).cuda()
data[:, 0] *= 10

# Converting to a cupy array before fitting avoids the problem.
data = cupy.array(data)
```
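A quick way to check this hypothesis (a hypothetical comparison; it assumes cuml.PCA accepts both a torch CUDA tensor and a cupy array without raising):

```python
import cupy
import torch
import cuml

torch_data = torch.randn((10000, 2)).cuda()
torch_data[:, 0] *= 10
cupy_data = cupy.array(torch_data)

# If the torch input path is the culprit, these two projections will
# differ by more than per-component sign flips.
proj_torch = cuml.PCA(n_components=2).fit_transform(torch_data)
proj_cupy = cuml.PCA(n_components=2).fit_transform(cupy_data)
```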
I confirm that making the data a cupy array solves the problem here, i.e. the conversion shown in the snippet above!
Thank you for the workaround!
Describe the bug
Using cuml.PCA with set_global_device_type set to 'CPU' and 'GPU' produces different results (with set_global_device_type('CPU') matching the output of sklearn's PCA).

Steps/Code to reproduce bug
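A minimal sketch along these lines (illustrative only; it assumes torch-generated input as in the comments above, and cuml's set_global_device_type from cuml.common.device_selection):

```python
import torch
from sklearn.decomposition import PCA as skPCA
from cuml import PCA
from cuml.common.device_selection import set_global_device_type

data = torch.randn((10000, 2))
data[:, 0] *= 10

# 1) sklearn reference on the CPU.
ref = skPCA(n_components=2).fit_transform(data.numpy())

# 2) cuml dispatched to the CPU: matches sklearn's output.
set_global_device_type('cpu')
cpu_out = PCA(n_components=2).fit_transform(data.numpy())

# 3) cuml dispatched to the GPU, fed the torch CUDA tensor:
#    this is the case that differs.
set_global_device_type('gpu')
gpu_out = PCA(n_components=2).fit_transform(data.cuda())
```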
Expected behavior
All 3 examples above are expected to have the same output.
Environment details (please complete the following information):