
preferred blas library; cublaslt gemm implementation #122106

Closed · 15 commits

Conversation

jeffdaily
Collaborator

@jeffdaily jeffdaily commented Mar 18, 2024

Following the example of PyTorch supporting a preferred linalg library (cusolver or magma), this PR introduces a preferred BLAS library selector: either cublas or cublaslt for CUDA, and hipblas or hipblaslt for ROCm via the normal hipification of sources.

The default BLAS implementation remains cublas (or hipblas). cublaslt (or hipblaslt) can be enabled with the environment variable `TORCH_BLAS_PREFER_CUBLASLT=1` (or its alias `TORCH_BLAS_PREFER_HIPBLASLT=1`), or by calling `torch.backends.cuda.preferred_blas_library(backend="cublaslt")` (or its alias `backend="hipblaslt"`).
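To illustrate the selection behavior described above, here is a minimal standalone sketch of how the environment-variable preference could be resolved. This is not PyTorch's actual implementation; the helper name `preferred_blas_backend` and the accepted truthy values are assumptions for illustration only.

```python
import os

def preferred_blas_backend(env=os.environ):
    """Hypothetical sketch: return "cublaslt" if either preference
    variable (TORCH_BLAS_PREFER_CUBLASLT or its HIPBLASLT alias) is set
    to a truthy value, otherwise keep the default "cublas" backend."""
    for var in ("TORCH_BLAS_PREFER_CUBLASLT", "TORCH_BLAS_PREFER_HIPBLASLT"):
        if env.get(var, "0").strip() in ("1", "true", "True", "TRUE"):
            return "cublaslt"
    return "cublas"
```

In the actual PR, the same preference can also be set at runtime via `torch.backends.cuda.preferred_blas_library(backend="cublaslt")`.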

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang


pytorch-bot bot commented Mar 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/122106

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (5 Unrelated Failures)

As of commit b7ebb6e with merge base cf5ca58:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@bdhirsh bdhirsh added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Mar 18, 2024
@pytorch-bot pytorch-bot bot added the release notes: linalg_frontend release notes category label Mar 19, 2024
Collaborator

@lezcano lezcano left a comment


Looks good. I left a few comments.

aten/src/ATen/Context.cpp (outdated)
{
if (at::globalContext().blasPreferredBackend() == BlasBackend::Cublaslt) {
#ifdef USE_ROCM
// hipblaslt does not support complex gemm yet
Collaborator

Does HIPblasLT have any other limitations for the versions we support?

Collaborator Author

hipBLASLt in rocm 6.0 does not support complex or double types. It also only supports MI200 and MI300. I will add a TORCH_CHECK for that in aten/src/ATen/Context.cpp setter.
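The limitations mentioned in that reply (no complex or double types; MI200/MI300 only) can be sketched as a validation check like the TORCH_CHECK the author proposes to add. This is an illustrative Python sketch, not the actual C++ setter; the function name, the dtype strings, and the `gfx` architecture prefixes (gfx90a for MI200, gfx940/941/942 for MI300) are assumptions for illustration.

```python
# Assumed architecture names: MI200 is gfx90a; MI300 is gfx940/941/942.
SUPPORTED_ARCHES = ("gfx90a", "gfx940", "gfx941", "gfx942")

def check_hipblaslt_supported(arch, dtype):
    """Hypothetical sketch of the guard described above: reject dtypes and
    GPU architectures that hipBLASLt (as of ROCm 6.0) does not support."""
    if dtype in ("float64", "complex64", "complex128"):
        raise RuntimeError(f"hipBLASLt does not support dtype {dtype}")
    if not any(arch.startswith(a) for a in SUPPORTED_ARCHES):
        raise RuntimeError(f"hipBLASLt requires MI200/MI300, got {arch}")
```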

aten/src/ATen/cuda/CUDABlas.cpp (outdated)
aten/src/ATen/cuda/CUDABlas.cpp
test/test_linalg.py
torch/backends/cuda/__init__.py (outdated)
torch/backends/cuda/__init__.py
Collaborator

@lezcano lezcano left a comment


Changes look good.

Could you extend the current matmul testing in test_matmul_small_brute_force_{1,2,3}d_Nd to exercise these paths, to make sure the new paths output correct results?

Also see the small point at #122106 (comment)

@jeffdaily jeffdaily added the rocm This tag is for PRs from ROCm team label Apr 8, 2024
@jeffdaily
Collaborator Author

Changes look good.

Could you extend the current matmul testing in test_matmul_small_brute_force_{1,2,3}d_Nd to exercise these paths, to make sure the new paths output correct results?

Also see the small point at #122106 (comment)

Done.

@jeffdaily jeffdaily requested a review from lezcano April 8, 2024 23:09
@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/122106/head returned non-zero exit code 1

Rebasing (1/14)
Auto-merging aten/src/ATen/Context.h
Auto-merging torch/_C/__init__.pyi.in
Auto-merging torch/_dynamo/trace_rules.py
Auto-merging torch/csrc/Module.cpp
CONFLICT (content): Merge conflict in torch/csrc/Module.cpp
error: could not apply b303c18747e... add preferred blas backend selector
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply b303c18747e... add preferred blas backend selector

Raised by https://github.com/pytorch/pytorch/actions/runs/8760666291

@jeffdaily
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 19, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@pytorchmergebot
Collaborator

The merge job was canceled. If you believe this is a mistake, you can re-trigger it through pytorch-bot.

@jeffdaily
Collaborator Author

@pytorchbot merge -f "all failures are unrelated and show up on HUD too"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f only as a last resort, and instead consider -i/--ignore-current to continue the merge while ignoring current failures. That allows currently pending tests to finish and report signal before the merge.


pytorchmergebot pushed a commit that referenced this pull request Apr 24, 2024
Fixes broken ROCm 5.7 build caused by #122106.

Pull Request resolved: #124797
Approved by: https://github.com/atalman
alat-rights pushed a commit to alat-rights/pytorch that referenced this pull request Apr 26, 2024
Fixes broken ROCm 5.7 build caused by pytorch#122106.

Pull Request resolved: pytorch#124797
Approved by: https://github.com/atalman
@clee2000
Contributor

I'm pretty sure this broke the Windows tests:
https://hud.pytorch.org/pytorch/pytorch/commit/6ede882c0b1d5ccc95b0c82ca5e206eb2dfb2911
https://github.com/pytorch/pytorch/actions/runs/8850792172/job/24308901905
Can you forward-fix?

2024-04-26T17:27:26.5528977Z _______ TestLinalgCUDA.test_matmul_small_brute_force_1d_Nd_cuda_float32 _______
2024-04-26T17:27:26.5529578Z Traceback (most recent call last):
2024-04-26T17:27:26.5530400Z   File "C:\actions-runner\_work\pytorch\pytorch\test\test_linalg.py", line 4441, in test_matmul_small_brute_force_1d_Nd
2024-04-26T17:27:26.5531227Z     self.check_single_matmul(x, y)
2024-04-26T17:27:26.5532103Z   File "C:\actions-runner\_work\pytorch\pytorch\test\test_linalg.py", line 4392, in check_single_matmul
2024-04-26T17:27:26.5532920Z     ans = torch.matmul(x, y)
2024-04-26T17:27:26.5533479Z RuntimeError: at::cuda::blas::bgemm_internal_cublaslt: not implemented for float
2024-04-26T17:27:26.5533953Z 
2024-04-26T17:27:26.5534208Z To execute this test, run the following from the base repo dir:
2024-04-26T17:27:26.5535104Z      python test\test_linalg.py -k test_matmul_small_brute_force_1d_Nd_cuda_float32
2024-04-26T17:27:26.5535603Z 
2024-04-26T17:27:26.5535927Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

@jeffdaily
Collaborator Author

jeffdaily commented Apr 26, 2024

I think the best I can do is disable this feature on Windows. It should have been disabled there, but I didn't do it correctly.
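The failure above and the proposed fix amount to two guards: ignore the cublaslt preference on Windows, and fall back to cublas for dtypes the Lt path does not implement (the test failed with "not implemented for float"). Here is a hedged sketch of that dispatch logic; the function name, the platform check, and the set of Lt-supported dtypes are illustrative assumptions, not PyTorch's actual code.

```python
import sys

def effective_backend(preferred, dtype, platform=sys.platform,
                      lt_dtypes=("float16", "bfloat16")):
    """Hypothetical sketch: resolve which backend a gemm call would use.
    On Windows the cublaslt preference is ignored entirely; elsewhere it
    applies only to dtypes the Lt implementation covers (assumed set)."""
    if platform == "win32":
        return "cublas"
    if preferred == "cublaslt" and dtype not in lt_dtypes:
        return "cublas"  # fall back rather than raise "not implemented"
    return preferred
```

A fallback of this shape avoids the RuntimeError seen in the Windows log by never routing unsupported cases to the Lt path.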

jeffdaily added a commit to ROCm/pytorch that referenced this pull request Apr 26, 2024
PR pytorch#122106 broke windows tests. The feature should have been disabled
for Windows but was not disabled correctly.
carmocca pushed a commit to carmocca/pytorch that referenced this pull request Apr 29, 2024
Fixes broken ROCm 5.7 build caused by pytorch#122106.

Pull Request resolved: pytorch#124797
Approved by: https://github.com/atalman
@izaitsevfb izaitsevfb added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Apr 29, 2024
pytorchmergebot pushed a commit that referenced this pull request Apr 30, 2024
PR #122106 broke windows tests. The feature should have been disabled for Windows but was not disabled correctly.
Pull Request resolved: #125080
Approved by: https://github.com/clee2000
andoorve pushed a commit to andoorve/pytorch that referenced this pull request May 1, 2024
Following the example of PyTorch supporting a preferred Linalg library (cusolver or magma), this PR introduces a preferred blas library selector of either cublas or cublaslt for CUDA and hipblas or hipblaslt for ROCm via normal hipification of sources.

The default blas implementation remains cublas or hipblas.  cublaslt or hipblaslt can be enabled using environment variable TORCH_BLAS_PREFER_CUBLASLT=1 (or TORCH_BLAS_PREFER_HIPBLASLT=1 as an alias) or by calling `torch.backends.cuda.preferred_blas_library(backend="cublaslt")` or as an alias `backend="hipblaslt"`.

Pull Request resolved: pytorch#122106
Approved by: https://github.com/lezcano
andoorve pushed a commit to andoorve/pytorch that referenced this pull request May 1, 2024
Fixes broken ROCm 5.7 build caused by pytorch#122106.

Pull Request resolved: pytorch#124797
Approved by: https://github.com/atalman
andoorve pushed a commit to andoorve/pytorch that referenced this pull request May 1, 2024
PR pytorch#122106 broke windows tests. The feature should have been disabled for Windows but was not disabled correctly.
Pull Request resolved: pytorch#125080
Approved by: https://github.com/clee2000
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
Following the example of PyTorch supporting a preferred Linalg library (cusolver or magma), this PR introduces a preferred blas library selector of either cublas or cublaslt for CUDA and hipblas or hipblaslt for ROCm via normal hipification of sources.

The default blas implementation remains cublas or hipblas.  cublaslt or hipblaslt can be enabled using environment variable TORCH_BLAS_PREFER_CUBLASLT=1 (or TORCH_BLAS_PREFER_HIPBLASLT=1 as an alias) or by calling `torch.backends.cuda.preferred_blas_library(backend="cublaslt")` or as an alias `backend="hipblaslt"`.

Pull Request resolved: pytorch#122106
Approved by: https://github.com/lezcano
petrex pushed a commit to petrex/pytorch that referenced this pull request May 3, 2024
Fixes broken ROCm 5.7 build caused by pytorch#122106.

Pull Request resolved: pytorch#124797
Approved by: https://github.com/atalman
pytorch-bot bot pushed a commit that referenced this pull request May 3, 2024
PR #122106 broke windows tests. The feature should have been disabled for Windows but was not disabled correctly.
Pull Request resolved: #125080
Approved by: https://github.com/clee2000
Labels: ciflow/inductor, ciflow/periodic (Trigger jobs ran periodically on master), ciflow/rocm, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: dynamo, open source, release notes: linalg_frontend, rocm (PRs from the ROCm team), triaged

9 participants