[ROCm] use correct workspace for hipblaslt, silence warning #150227
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150227
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures, 2 Unrelated Failures as of commit 03e8aa0 with merge base cbc0964.
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchmergebot merge -r
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Follow-up to pytorch#145130. That PR caused a warning on ROCm the first time hipblaslt was called, regardless of workload.
Successfully rebased; the branch was updated from 7a5da19 to 03e8aa0.
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 3 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud.
@pytorchmergebot merge -i
Merge started. Your change will be merged while ignoring the following 1 check: pull / cuda12.4-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The default workspace for hipblaslt is larger than for cublas/cublaslt, which requires a slight increase in the buffer size needed. Forward fix for #150227, which broke ROCm distributed tests that were not part of the initial CI signal. Pull Request resolved: #150348. Approved by: https://github.com/jeffdaily
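To make the size mismatch concrete, here is a minimal sketch, not PyTorch's actual implementation, of how a BLASLt workspace size could be chosen with a larger default under ROCm. The byte values, the `USE_ROCM` guard, the function name, and the treatment of `CUBLASLT_WORKSPACE_SIZE` as KiB are illustrative assumptions.

```cpp
// Hedged sketch only: shows why a buffer sized for the CUDA default can be too
// small on ROCm. The defaults below are assumptions, not PyTorch's real values.
#include <cstddef>
#include <cstdlib>
#include <string>

std::size_t chosenLtWorkspaceBytes() {
  // Respect an explicit user override; treated here as a size in KiB.
  if (const char* env = std::getenv("CUBLASLT_WORKSPACE_SIZE")) {
    return static_cast<std::size_t>(std::stoull(env)) * 1024;
  }
#if defined(USE_ROCM)
  return 32ull * 1024 * 1024;  // hypothetical larger hipblaslt default (32 MiB)
#else
  return 1ull * 1024 * 1024;   // hypothetical smaller cublasLt default (1 MiB)
#endif
}
```

Any caller that pre-sizes a buffer against the smaller branch would come up short when the ROCm branch is taken, which is the gist of the forward fix above.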
@pytorchbot revert -m="Diff reverted internally" -c="ghfirst"
This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
@pytorchbot successfully started a revert job. Check the current status here.
@ethanwee1 your PR has been successfully reverted.
…150227)" This reverts commit c158eac. Reverted #150227 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](#150227 (comment)))
This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.
lgtm
@pytorchmergebot merge -i
Merge started. Your change will be merged while ignoring the following 4 checks: pull / linux-jammy-py3-clang12-executorch / build, pull / cuda12.4-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu), linux-binary-manywheel / manywheel-py3_9-cuda12_6-test / test, linux-binary-manywheel / manywheel-py3_9-cuda12_8-test / test. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: Command
Details for Dev Infra team: Raised by workflow job.
// See Note [hipblaslt handles].
// ROCm's hipblas and hipblaslt do not share handles, unlike with CUDA.
// Using getCurrentCUDABlasLtHandle is on purpose. For CUDA it's the same as
// getCurrentCUDABlasHandle, but for ROCm it's a unique handle.
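As an illustration of that distinction, here is a hedged sketch with placeholder types standing in for the real cublasHandle_t / cublasLtHandle_t / hipblasLtHandle_t, and accessor names that are not the actual PyTorch helpers:

```cpp
// Illustrative only: on CUDA the Lt handle can be the same underlying object
// as the BLAS handle, while on ROCm hipblas and hipblaslt handles are distinct.
#include <cassert>

struct BlasHandle { int id; };
struct BlasLtHandle { int id; };

static BlasHandle g_blas{0};
#if defined(USE_ROCM)
static BlasLtHandle g_blaslt{1};  // ROCm: a separately created hipblaslt handle
#endif

BlasHandle* currentBlasHandle() { return &g_blas; }

BlasLtHandle* currentBlasLtHandle() {
#if defined(USE_ROCM)
  return &g_blaslt;  // unique handle, not shared with hipblas
#else
  // CUDA: reuse the BLAS handle as the Lt handle.
  return reinterpret_cast<BlasLtHandle*>(currentBlasHandle());
#endif
}

int main() {
#if !defined(USE_ROCM)
  // On CUDA builds the two accessors refer to the same object.
  assert(static_cast<void*>(currentBlasLtHandle()) ==
         static_cast<void*>(currentBlasHandle()));
#endif
  return 0;
}
```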
That handles can be shared between cuBLAS and cuBLASLt is factually true, but I think it is not really relevant here. The cuBLAS handle here is really only used as a key to get a corresponding workspace, and as we do not expect to run a cuBLAS-API backed and cuBLASLt-API backed matmul (on the same stream) at the same time, it's safe to use the workspace that is already allocated for one for the other.
My guess is the real reason the warning shows up on ROCm but not on CUDA is that at present the default CUBLAS_WORKSPACE_CONFIG effective size is always >= the default CUBLASLT_WORKSPACE_SIZE setting. On the CUDA side the intent is to only allocate the cuBLAS workspace and reuse it for Lt, but if Lt requests a larger workspace it precludes this unification.
If you agree with this I think a clearer explanation would be along the lines of "CUDA attempts to share workspaces with the assumption that cuBLAS workspace size >= cuBLASLt workspace size, but as this assumption may not hold on ROCm, we also add a mapping for Lt handle -> workspace in addition to BLAS handle -> workspace."
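If that framing is right, the change amounts to keeping an Lt-handle → workspace mapping alongside the BLAS-handle → workspace one, so an Lt workspace larger than the BLAS default never has to be squeezed into the existing allocation. A minimal sketch of that idea, with illustrative key types, cache layout, and sizes rather than PyTorch's actual data structures:

```cpp
// Hedged sketch of "BLAS handle -> workspace" plus "Lt handle -> workspace".
// Key types, sizes, and the cache layout are assumptions for illustration.
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

using Handle = const void*;  // stand-in for cublasHandle_t / cublasLtHandle_t
using Stream = int;          // stand-in for cudaStream_t / hipStream_t
using Key = std::pair<Handle, Stream>;

struct Workspace {
  explicit Workspace(std::size_t bytes) : buffer(bytes) {}
  std::vector<char> buffer;  // stand-in for a device allocation
};

// Two caches: one keyed by the BLAS handle, one keyed by the Lt handle.
static std::map<Key, std::unique_ptr<Workspace>> blas_workspaces;
static std::map<Key, std::unique_ptr<Workspace>> blaslt_workspaces;

static Workspace& getOrCreate(std::map<Key, std::unique_ptr<Workspace>>& cache,
                              Handle handle, Stream stream, std::size_t bytes) {
  auto it = cache.find({handle, stream});
  if (it == cache.end()) {
    it = cache.emplace(Key{handle, stream},
                       std::make_unique<Workspace>(bytes)).first;
  }
  return *it->second;
}

// Hypothetical sizes: the Lt default may exceed the BLAS default (as on ROCm).
Workspace& blasWorkspace(Handle h, Stream s) {
  return getOrCreate(blas_workspaces, h, s, /*bytes=*/1 << 20);
}
Workspace& blasLtWorkspace(Handle lt, Stream s) {
  return getOrCreate(blaslt_workspaces, lt, s, /*bytes=*/32 << 20);
}
```

The exact sizes and keying here are placeholders; the design point is only that the Lt path looks up (and, on a miss, allocates) a workspace sized for Lt instead of warning that the shared one is too small.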
…150227) Follow-up to pytorch#145130. That PR caused a warning on ROCm the first time hipblaslt was called, regardless of workload. Fixes #ISSUE_NUMBER. Pull Request resolved: pytorch#150227. Approved by: https://github.com/jeffdaily. Co-authored-by: Jeff Daily <jeff.daily@amd.com>
The default workspace for hipblaslt is larger than for cublas/cublaslt, which requires a slight increase in the buffer size needed. Forward fix for pytorch#150227, which broke ROCm distributed tests that were not part of the initial CI signal. Pull Request resolved: pytorch#150348. Approved by: https://github.com/jeffdaily
…ytorch#150227)" This reverts commit c158eac. Reverted pytorch#150227 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](pytorch#150227 (comment)))
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as
This PR is no longer needed.
Follow-up to #145130. That PR caused a warning on ROCm the first time hipblaslt was called, regardless of workload.
Fixes #ISSUE_NUMBER
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd