
LU solve uses cuBLAS and cuSOLVER for matrices with dim > 1024 #61815

Closed · wants to merge 6 commits

Conversation

@rkansal47 (Contributor) commented Jul 18, 2021

This PR builds on #59148 and modifies the `lu_solve` routine to avoid MAGMA for `b` or `lu_data` matrices with any dimension > 1024, since MAGMA has a bug when handling such matrices (https://bitbucket.org/icl/magma/issues/19/dgesv_batched-dgetrs_batched-fails-for).
Fixes #36921
Fixes #61929
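
For context, a minimal sketch of the failure mode this PR targets (the shapes are illustrative, chosen to cross the 1024 limit; it assumes a CUDA build where `lu_solve` previously dispatched to MAGMA):

```python
import torch

A = torch.randn(1, 4, 4, dtype=torch.float64, device="cuda")
b = torch.randn(1, 4, 1025, dtype=torch.float64, device="cuda")  # last dim > 1024

LU, pivots = A.lu()                # LU-factorize A on the GPU
x = torch.lu_solve(b, LU, pivots)  # solve A x = b from the factorization

# On affected builds this check fails or the call errors out; with this PR
# the solve is routed to cuSOLVER/cuBLAS instead and the check passes.
print(torch.allclose(A @ x, b, atol=1e-8))
```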

@facebook-github-bot (Contributor):

Hi @rkansal47!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@facebook-github-bot (Contributor) commented Jul 18, 2021

💊 CI failures summary and remediations

As of commit 08f8d31 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/2)

Step: "(Optional) Merge target branch"

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .bazelrc
Auto-merging .bazelrc
CONFLICT (add/add): Merge conflict in .azure_pipelines/pytorch-tests-pipeline.yml
Auto-merging .azure_pipelines/pytorch-tests-pipeline.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/nightly-pytorch-tests-pipeline.yml
Auto-merging .azure_pipelines/nightly-pytorch-tests-pipeline.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/wheel-wait-job-template.yml
Auto-merging .azure_pipelines/job_templates/wheel-wait-job-template.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/pytorch-template-unix.yml
Auto-merging .azure_pipelines/job_templates/pytorch-template-unix.yml
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/2)

Step: "(Optional) Merge target branch"

(Same merge-conflict log and exit status as the build above.)


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

@facebook-github-bot (Contributor):

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@mrshenli added labels on Jul 19, 2021: module: cublas (Problem related to cublas support), module: cuda (Related to torch.cuda, and CUDA support in general), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
@mrshenli requested a review from ngimel on July 19, 2021 at 21:47
@ngimel (Collaborator) commented Jul 20, 2021

@IvanYashchuk, @xwang233 can you please review this PR and ping me to land it?

```diff
 #ifdef USE_CUSOLVER
-  if (batch_size == 1 && m > 512) {
+  if ((batch_size == 1 && m > 512) || (batch_size <= 8 && (m > 1024 || b1 > 1024 || b2 > 1024))) {
```
Collaborator:

I think this PR overall looks good. Since this boolean `(m > 1024 || b1 > 1024 || b2 > 1024)` is reused below, we can assign it to a variable just below the `auto b2 = ...` line.

@rkansal47 (Author):

Thanks, updated.

@IvanYashchuk (Collaborator) left a comment:

Hey, @rkansal47! Thank you for fixing this long-standing problem!
Could you please add a test to test/test_linalg.py that fails on master but passes with this PR. Add a link to the issue in the test description and use the @onlyCUDA and @skipCUDAIfNoMagma decorators.
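
As a rough illustration of what is being requested, here is a sketch of such a regression test as a method of the TestLinalg class in test/test_linalg.py (the decorators exist in torch.testing._internal.common_device_type; the method name and shapes here are illustrative, and the PR's actual test appears in a later review hunk):

```python
import torch
from torch.testing._internal.common_device_type import onlyCUDA, skipCUDAIfNoMagma, dtypes

@skipCUDAIfNoMagma
@onlyCUDA
@dtypes(torch.double)
def test_lu_solve_large_rhs(self, device, dtype):
    # Regression test for https://github.com/pytorch/pytorch/issues/36921:
    # fails on master (MAGMA path), passes with this PR (cuSOLVER/cuBLAS path).
    A = torch.randn(1, 1, 1, dtype=dtype, device=device)
    b = torch.randn(1, 1, 1025, dtype=dtype, device=device)  # last dim > 1024
    LU, pivots = A.lu()
    x = torch.lu_solve(b, LU, pivots)
    self.assertEqual(b, A.matmul(x))
```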

Comment on lines 2877 to 2879:

```cpp
auto b1 = b.size(-2);
auto b2 = b.size(-1);
bool over_magma_dim_limit = m > 1024 || b1 > 1024 || b2 > 1024; // magma implementation of LU solve cannot handle tensors with dimensions > 1024
```
Collaborator:

MAGMA fails only when b.size(-1) > 1024; all the other conditions are unnecessary.

```diff
 #ifdef USE_CUSOLVER
-  if (batch_size == 1 && m > 512) {
+  if ((batch_size == 1 && m > 512) || (batch_size <= 8 && over_magma_dim_limit)) {
```
Collaborator:

Where do the conditions `batch_size <= 8` and `batch_size > 8` come from?

@rkansal47 (Author) commented Jul 21, 2021:

They were chosen based on the benchmarks shown in #59148 (comment): cuBLAS is faster for batch sizes > 8 and cuSOLVER for batch sizes <= 8. Should I mention this somewhere in the PR?

Collaborator:

It's fine as it is, thank you for including it here.

Collaborator:

I think it'd be good to leave a comment in the code pointing this out.
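
Putting this sub-thread together, here is a hedged Python pseudocode sketch of the backend choice under discussion (the real logic is C++ in aten/src/ATen/native/cuda/BatchLinearAlgebra.cu; the cuBLAS branch for batch_size > 8 is inferred from the benchmarks cited above, not shown in the hunks here):

```python
def pick_lu_solve_backend(batch_size: int, m: int, b1: int, b2: int) -> str:
    # MAGMA's LU solve cannot handle tensors with dimensions > 1024
    over_magma_dim_limit = m > 1024 or b1 > 1024 or b2 > 1024
    # cuSOLVER: single large system, or a small batch exceeding MAGMA's limit
    if (batch_size == 1 and m > 512) or (batch_size <= 8 and over_magma_dim_limit):
        return "cusolver"
    # cuBLAS benchmarked faster than cuSOLVER for larger batches over the limit
    if batch_size > 8 and over_magma_dim_limit:
        return "cublas"
    # otherwise keep the default MAGMA path
    return "magma"
```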

@nikitaved (Collaborator) commented Jul 29, 2021:

The testing script only considers a single dtype, float64. Could you please confirm the same outcome for float32 and the complex dtypes?

@rkansal47 (Author):

@lezcano added.

@nikitaved thanks, I agree confirming the heuristics for different types is important, but is that not an issue to raise for the original PR which chose them (#59148 (comment))? The only contribution I'm proposing in this PR is directing lu_solve to not use MAGMA for large b matrices. Please let me know if you disagree.
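
(As an aside, a quick illustrative way to spot-check nikitaved's question across dtypes; this snippet is not part of the PR and assumes a CUDA build:)

```python
import torch

for dtype in (torch.float32, torch.float64, torch.complex64, torch.complex128):
    A = torch.randn(1, 4, 4, dtype=dtype, device="cuda")
    b = torch.randn(1, 4, 1025, dtype=dtype, device="cuda")  # last dim > 1024
    LU, pivots = A.lu()
    x = torch.lu_solve(b, LU, pivots)
    # loose tolerance so single-precision round-off doesn't mask real failures
    assert torch.allclose(A @ x, b, atol=1e-3), f"mismatch for {dtype}"
```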

aten/src/ATen/native/cuda/BatchLinearAlgebra.cu (outdated; resolved)
@rkansal47 (Author):
@IvanYashchuk thanks! Updated with your suggestions.

@IvanYashchuk (Collaborator) left a comment:

Thank you, @rkansal47, for this fix! I am sorry for the delay. Unfortunately, a merge conflict accumulated; I hope you don't mind that I fixed it.

Comment on lines +7491 to +7493:

```python
# this tests https://github.com/pytorch/pytorch/issues/36921
def test_lu_solve_large_matrices(self, device, dtype):
    def run_test(A_dims, b_dims):
```
Collaborator:

I think the name is a bit confusing. run_test gets A_dims = (1, 1), b_dims = (1, 1, 1025). So, effectively, the matrices are small!

@rkansal47 (Author):

Fair, how's test_lu_solve_large_b_second_dim?

@nikitaved (Collaborator) commented Jul 29, 2021:

The referenced issue actually considers large batched matrices. Why wouldn't we test all the cases? I.e., large batch small size, small batch large size?

@rkansal47 (Author):

The only remaining issue in the current master branch is with b tensors with a last dimension > 1024 due to the linked bug in MAGMA. IvanYashchuk asked for a test which fails on the master branch but passes after this PR. These other tests would pass in the master branch as well, but I can add them anyway if you'd like.

test/test_linalg.py (outdated; resolved)
@facebook-github-bot (Contributor):

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor):

@ngimel merged this pull request in 6d21e36.

alanwaketan pushed a commit that referenced this pull request Aug 17, 2021
Summary:
This PR builds off of #59148 and modifies the `lu_solve` routine to avoid MAGMA for `b` or `lu_data` matrices with any dimension > 1024, since MAGMA has a bug when dealing with such matrices (https://bitbucket.org/icl/magma/issues/19/dgesv_batched-dgetrs_batched-fails-for).
Fixes #36921
Fixes #61929

Pull Request resolved: #61815

Reviewed By: anjali411

Differential Revision: D30199618

Pulled By: ngimel

fbshipit-source-id: 06870793f697e9c35aaaa8254b8a8b1a38bd3aa9
Labels: cla signed · Merged · module: cublas · module: cuda · open source · triaged
Successfully merging this pull request may close these issues:

- Torch.linalg.solve large matrix cuda error
- Error in lu_solve for batched large matrices on the GPU