[BUG] Many algorithms are using 32-bit cusolver workspaces when 64-bit functions are available #2597

Open
cjnolet opened this issue Jul 23, 2020 · 7 comments
Labels
bug Something isn't working inactive-90d

Comments

@cjnolet
Member

cjnolet commented Jul 23, 2020

While continuing to investigate #2459, I've noticed that some of our cusolver calls fail when computing the workspace size. Upon further inspection, it appears this happens in some of our algorithms, specifically when the data sizes grow larger than the size of an int.

A good example: PCA was recently found to fail in the buffer size computation with a 60k x 30k input, but not with a 60k x 20k input. The computation for the workspace here would be columns^2, so 20k * 20k * 4 = 1.6B but 30k * 30k * 4 = 3.6B, which exceeds the maximum value of a 32-bit signed int (about 2.1B).
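The arithmetic above can be checked with a minimal C++ sketch. It assumes 4-byte floats and a workspace of columns^2 elements as described; the names `square_workspace_bytes` and `fits_in_int32` are hypothetical helpers made up for illustration, not cuML or cuSOLVER functions.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: bytes for an n_cols x n_cols matrix of floats,
// computed in 64-bit arithmetic so the product itself cannot overflow here.
constexpr std::int64_t square_workspace_bytes(std::int64_t n_cols) {
  return n_cols * n_cols * static_cast<std::int64_t>(sizeof(float));
}

// Hypothetical helper: true if the byte count is representable as a 32-bit signed int.
constexpr bool fits_in_int32(std::int64_t bytes) {
  return bytes >= 0 && bytes <= std::numeric_limits<std::int32_t>::max();
}

// Assumption: sizeof(float) == 4, as on the platforms CUDA targets.
static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

// 20k columns: 1.6B bytes, still within the 32-bit signed range.
static_assert(square_workspace_bytes(20000) == 1600000000LL, "20k case");
static_assert(fits_in_int32(square_workspace_bytes(20000)), "20k fits");

// 30k columns: 3.6B bytes, beyond INT32_MAX (2147483647).
static_assert(square_workspace_bytes(30000) == 3600000000LL, "30k case");
static_assert(!fits_in_int32(square_workspace_bytes(30000)), "30k does not fit");
```

The checks run at compile time, so the file compiling at all confirms that the 20k case fits in an int and the 30k case does not.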

Looking at the cusolver API docs, I see there are helper functions that can accept a 64-bit workspace. It's likely these helper functions didn't exist when these algorithms were created. I believe this 64-bit workspace will fix the current size constraints in many of our algorithms.

cjnolet added the "bug" and "? - Needs Triage" labels Jul 23, 2020
@teju85
Member

teju85 commented Jul 29, 2020

Yes, this 64-bit addressing (and a single method based on cudaDataType) is a recent addition, in CUDA 11.0. We have 2 choices:

  1. Keep using the old interface and also add the new ones for future usage purposes (inside cusparse_wrappers, I meant).
  2. Replace old with this new and also update their usages everywhere.

The latter is more work now, but will save a ton more over the long term. Thus, I'd prefer the latter approach. What say?
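To make the trade-off concrete, here is a rough C++ sketch of what moving to a 64-bit size could look like during a migration: the size query returns `std::size_t` bytes, and any caller still on an int-based path goes through a checked conversion that fails loudly instead of overflowing. Both `workspace_bytes_64` and `to_int_checked` are hypothetical stand-ins invented for this sketch; they are not cuSOLVER or RAFT functions, and the sketch assumes a platform where `std::size_t` is 64 bits wide.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for a 64-bit buffer-size query: bytes needed for an
// n x n matrix whose elements are elem_size bytes each.
// Assumption: std::size_t is 64 bits wide, so the product cannot wrap here.
inline std::size_t workspace_bytes_64(std::int64_t n, std::size_t elem_size) {
  if (n < 0) throw std::invalid_argument("n must be non-negative");
  return static_cast<std::size_t>(n) * static_cast<std::size_t>(n) * elem_size;
}

// Hypothetical shim for callers that still take an int workspace size:
// throws instead of silently truncating when the size does not fit.
inline int to_int_checked(std::size_t bytes) {
  if (bytes > static_cast<std::size_t>(std::numeric_limits<int>::max())) {
    throw std::overflow_error("workspace size does not fit in int");
  }
  return static_cast<int>(bytes);
}
```

With this shape, `to_int_checked(workspace_bytes_64(20000, sizeof(float)))` returns 1600000000, while the same call with 30000 throws `std::overflow_error`, which is easier to diagnose than a failed buffer-size call.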

@teju85
Member

teju85 commented Jul 29, 2020

Also, just remembered: now that we have migrated cusparse_wrappers.h to RAFT, I think these changes should rightfully happen in RAFT and then get propagated into cugraph/cuml, where needed.

@teju85
Member

teju85 commented Jul 29, 2020

But we still don't have 11.0 supported in cuML. It is waiting on an 11.0 build of libcumlprims, which in turn is waiting on an 11.0 conda package being made available. @dantegd am I right?

@cjnolet
Member Author

cjnolet commented Nov 5, 2020

This is also a problem for our 1M cells notebook in https://github.com/clara-parabricks/rapids-single-cell-examples. We would like to port the rank_genes_groups function to the 1M cells notebook but we cannot run the logistic regression on such a large dataset.

@github-actions

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

cjnolet removed the "? - Needs Triage" label Jun 8, 2021