RAPIDS implementation of Scanpy rank_genes_groups appears incorrect #29
Comments
Possibly relevant cuML issue: rapidsai/cuml#2478
Correction: When using …

While we wait for the release of the fix for rapidsai/cuml#2478, we have a couple of options:
Hey folks,
@teju85 Thanks for the reminder; we'll check and get back to you.
This issue should now be resolved by rapidsai/cuml#3645. Will test and close.
I tried running the RAPIDS implementation of rank_genes_groups alongside the Scanpy CPU implementation on the same data matrix, but I'm getting very different results.
Here's my code for the GPU call:
And the CPU call:
When I look at the top differential genes for each cluster, the GPU and CPU outputs are disjoint. I also notice that while the CPU output is sorted by score (i.e., the top 50 differential genes have high scores and are listed in decreasing order), the GPU output appears unsorted, and some of its scores are very low. My suspicion is that the GPU output isn't actually being sorted by logistic regression coefficient, so it returns an arbitrary set of genes and their scores instead of the top N.
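Both observations above (disjoint top genes, unsorted GPU scores) can be checked numerically rather than by eye. Below is a minimal sketch, assuming each result has already been pulled into a plain dict mapping group name to `(gene_names, scores)`. In Scanpy these lists usually live under `adata.uns['rank_genes_groups']['names']` and `['scores']`, one field per group; where the RAPIDS version stores its output is an assumption here. The helper names `is_sorted_descending` and `compare_rankings` are hypothetical.

```python
import numpy as np


def is_sorted_descending(scores):
    """Return True if the scores are in non-increasing order."""
    s = np.asarray(scores, dtype=float)
    return bool(np.all(s[:-1] >= s[1:]))


def compare_rankings(cpu, gpu, n_top=50):
    """Compare two rank_genes_groups-style results group by group.

    cpu, gpu: hypothetical dicts mapping group -> (gene_names, scores),
    each list already in the order the tool reported it.
    """
    report = {}
    for group in cpu:
        cpu_names, cpu_scores = cpu[group]
        gpu_names, gpu_scores = gpu[group]
        # Only the first n_top reported genes are compared.
        cpu_top = set(list(cpu_names)[:n_top])
        gpu_top = set(list(gpu_names)[:n_top])
        shared = cpu_top & gpu_top
        union = cpu_top | gpu_top
        report[group] = {
            "overlap": len(shared),
            "jaccard": len(shared) / len(union) if union else 1.0,
            "cpu_sorted": is_sorted_descending(cpu_scores),
            "gpu_sorted": is_sorted_descending(gpu_scores),
        }
    return report
```

A Jaccard index near 0 for every group, together with `gpu_sorted == False`, would match the behavior described in this issue.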
When I scatterplot the results, the CPU results also seem to make much more sense than the GPU results.
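For reference, the expected behavior behind the suspicion that the GPU output isn't sorted by coefficient is roughly this: for each group, take that group's logistic-regression coefficients, order them from largest to smallest, and keep the first N. Below is a minimal sketch, assuming a hypothetical coefficient matrix with one row per group and one column per gene. It is not the Scanpy or RAPIDS code, only an illustration of "top N by coefficient".

```python
import numpy as np


def top_n_by_coefficient(coefs, gene_names, group_names, n_genes=50):
    """Return {group: (top gene names, their coefficients)}, largest first.

    coefs: assumed layout of shape (n_groups, n_total_genes), one row of
    one-vs-rest logistic-regression coefficients per group.
    """
    coefs = np.asarray(coefs, dtype=float)
    gene_names = np.asarray(gene_names)
    result = {}
    for group, row in zip(group_names, coefs):
        # Sorting the negated row yields indices in decreasing coefficient order.
        order = np.argsort(-row, kind="stable")[:n_genes]
        result[group] = (gene_names[order], row[order])
    return result
```

With `coefs = [[0.1, 2.0, -1.0, 0.5], [1.5, -0.2, 0.3, 0.0]]` and genes `A` to `D`, `n_genes=2` selects `B, D` for the first group and `A, C` for the second. Output built this way always has scores in decreasing order, which is the property the GPU results in this issue appear to lack.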