preferred blas library; cublaslt gemm implementation #122106
Conversation
🔗 Helpful Links — 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122106
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (5 Unrelated Failures) As of commit b7ebb6e with merge base cf5ca58:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Looks good. I left a few comments.
{
if (at::globalContext().blasPreferredBackend() == BlasBackend::Cublaslt) {
#ifdef USE_ROCM
// hipblaslt does not support complex gemm yet
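The guard above falls back to the plain gemm path when the Lt backend cannot handle the dtype. A minimal pure-Python sketch of that selection logic (hypothetical function and names for illustration, not the actual ATen code):

```python
def pick_gemm_backend(preferred: str, is_rocm: bool, dtype: str) -> str:
    """Decide which gemm implementation to dispatch to.

    Mirrors the guard in the diff: a cublaslt preference is honored
    unless we are on ROCm with a complex dtype, because hipblaslt
    does not support complex gemm yet.
    """
    if preferred == "cublaslt":
        if is_rocm and dtype in ("complex64", "complex128"):
            return "cublas"  # fall back: hipblaslt lacks complex support
        return "cublaslt"
    return "cublas"  # default path is unchanged by the preference


print(pick_gemm_backend("cublaslt", True, "complex64"))
print(pick_gemm_backend("cublaslt", False, "float32"))
```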
Does hipBLASLt have any other limitations for the versions we support?
hipBLASLt in ROCm 6.0 does not support complex or double types. It also supports only MI200 and MI300. I will add a TORCH_CHECK for that in the aten/src/ATen/Context.cpp setter.
Changes look good.
Could you extend the current matmul testing in test_matmul_small_brute_force_{1,2,3}d_Nd to exercise these paths, to make sure the new paths produce correct results?
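The brute-force pattern the reviewer is asking for can be sketched in pure Python (a stand-in reference, not the actual torch test; the real test would iterate torch dtypes and shapes and compare the new backend's output against a reference implementation):

```python
from itertools import product


def naive_matmul(a, b):
    """Reference O(m*k*n) matmul for 2-D lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]


def brute_force_check(impl, max_dim=3):
    """Compare `impl` against the naive reference over all small shapes,
    the same exhaustive small-shape idea as the brute-force torch tests."""
    for m, k, n in product(range(1, max_dim + 1), repeat=3):
        a = [[float(i * k + j + 1) for j in range(k)] for i in range(m)]
        b = [[float(i * n + j + 1) for j in range(n)] for i in range(k)]
        assert impl(a, b) == naive_matmul(a, b), (m, k, n)


# Trivially passes here; in the real test, `impl` would be the
# cublaslt/hipblaslt-backed matmul under comparison.
brute_force_check(naive_matmul)
```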
Also see the small point at #122106 (comment)
Force-pushed from 459f66c to a447ded.
Done.
Rebase failed due to Command
Raised by https://github.com/pytorch/pytorch/actions/runs/8760666291
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled. If you believe this is a mistake, then you can re-trigger it through pytorch-bot.
@pytorchbot merge -f "all failures are unrelated and show up on HUD too"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes broken ROCm 5.7 build caused by #122106. Pull Request resolved: #124797 Approved by: https://github.com/atalman
I'm pretty sure this broke the Windows tests.
I think the best I can do is disable this feature for the Windows platform. It should have been disabled, but I didn't do it correctly.
PR #122106 broke windows tests. The feature should have been disabled for Windows but was not disabled correctly. Pull Request resolved: #125080 Approved by: https://github.com/clee2000
Following the example of PyTorch supporting a preferred Linalg library (cusolver or magma), this PR introduces a preferred blas library selector of either cublas or cublaslt for CUDA and hipblas or hipblaslt for ROCm via normal hipification of sources. The default blas implementation remains cublas or hipblas. cublaslt or hipblaslt can be enabled using environment variable TORCH_BLAS_PREFER_CUBLASLT=1 (or TORCH_BLAS_PREFER_HIPBLASLT=1 as an alias) or by calling `torch.backends.cuda.preferred_blas_library(backend="cublaslt")` or as an alias `backend="hipblaslt"`. Pull Request resolved: pytorch#122106 Approved by: https://github.com/lezcano
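As the description notes, the Lt backend can be opted into either via the environment variable (with the ROCm-flavored name as an alias) or via the runtime setter. A pure-Python sketch of the env-var check (hypothetical helper for illustration, not PyTorch's actual parsing):

```python
import os


def blas_prefers_cublaslt(env=None) -> bool:
    """Sketch of the opt-in check: TORCH_BLAS_PREFER_CUBLASLT=1 selects
    the Lt backend, with TORCH_BLAS_PREFER_HIPBLASLT accepted as an
    alias for ROCm builds (hypothetical logic mirroring the PR text)."""
    if env is None:
        env = os.environ
    for var in ("TORCH_BLAS_PREFER_CUBLASLT", "TORCH_BLAS_PREFER_HIPBLASLT"):
        if env.get(var) == "1":
            return True
    return False


# The runtime alternative added by this PR (shown for reference only;
# requires a CUDA/ROCm build of PyTorch):
#   torch.backends.cuda.preferred_blas_library(backend="cublaslt")
print(blas_prefers_cublaslt({"TORCH_BLAS_PREFER_HIPBLASLT": "1"}))
```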
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang