Conversation

@BLOrange-AMD (Contributor) commented May 20, 2025:

Fixes test_cuda.py::test_cublas_workspace_explicit_allocation on gfx95

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd
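
For context, test_cuda.py::test_cublas_workspace_explicit_allocation checks that a GEMM causes the expected default BLAS workspace to be allocated, and that expected size differs by architecture. The sketch below is illustrative only and is not this PR's diff: the byte counts, the gfx95 check via gcnArchName, and the helper name are assumptions; it only shows the shape of an architecture-dependent expected-size computation that such a test would compare against.

```python
# Illustrative sketch, not the actual PyTorch code: how an expected default
# workspace size might be computed per architecture. All concrete sizes and
# the gfx95 handling below are assumptions for illustration.
import torch

def expected_default_workspace_bytes(device: int = 0) -> int:
    # Generic CUDA-style default, in the spirit of CUBLAS_WORKSPACE_CONFIG=:4096:2:16:8
    size = 4096 * 1024 * 2 + 16 * 1024 * 8
    if torch.version.hip is not None:
        # ROCm build: key the default off the gfx target reported by the device.
        arch = torch.cuda.get_device_properties(device).gcnArchName
        if arch.startswith("gfx95"):
            # Hypothetical larger default for gfx950-class GPUs (placeholder value).
            size = 128 * 1024 * 1024
    return size
```

On a gfx95 ROCm machine the old generic expectation would no longer match what the library actually allocates, which is the kind of mismatch this PR's one-line default update addresses.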

pytorch-bot (bot) commented May 20, 2025:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153988

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job

As of commit a30caff with merge base 2b43d63:

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the topic: not user facing label on May 20, 2025
@albanD requested a review from jeffdaily on May 22, 2025 17:40
@albanD added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on May 22, 2025
@jeffdaily changed the title from "Updated default workspace for gfx95" to "[ROCm] Updated default workspace for gfx95" on May 27, 2025
@pytorch-bot added the ciflow/rocm (Trigger "default" config CI on ROCm) and module: rocm (AMD GPU support for PyTorch) labels on May 27, 2025
@jeffdaily (Collaborator) commented:

Only basic CI is needed here since gfx950 is not in our public CI.

@jeffdaily (Collaborator) commented:

@pytorchbot merge

@pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on May 27, 2025
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

@jeffdaily (Collaborator) commented:

@pytorchbot merge -f "it is not possible for this PR to affect any current CI flows"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.
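
For instance, had the maintainer wanted pending jobs to keep reporting signal while ignoring the known unrelated failure, the alternative would have been to comment @pytorchbot merge -i (the --ignore-current form mentioned above) instead of forcing with -f.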

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@eellison (Contributor) commented:

Inductor HUD is broken since this PR: https://hud.pytorch.org/hud/pytorch/pytorch/main/2?per_page=50&mergeEphemeralLF=true.

Should we revert?

@jeffdaily (Collaborator) commented:

> Inductor HUD is broken since this PR: https://hud.pytorch.org/hud/pytorch/pytorch/main/2?per_page=50&mergeEphemeralLF=true.
>
> Should we revert?

@eellison I don't see how the Inductor HUD for CUDA could break with this change. The one-line change in this PR is already inside an if torch.version.hip guard, so there is no way it could apply to CUDA. The error in the log says:

      /tmp/pip-req-build-0czsr0a7/torchao/csrc/cuda/mx_kernels/mx_fp_cutlass_kernels.cu(110): error: class "cutlass::gemm::collective::CollectiveMma<cutlass::gemm::MainloopSm100TmaUmmaWarpSpecializedBlockScaled<11, 3, 2, cute::tuple<cute::_2, cute::_1, cute::_1>>, cute::tuple<cute::_128, cute::_128, cute::_128>, cute::tuple<cutlass::float_e2m1_t, cutlass::float_ue8m0_t>, cute::tuple<cute::tuple<int64_t, cute::C<1>, int64_t>, cute::Layout<cute::tuple<cute::tuple<cute::tuple<cute::_32, cute::_4>, int32_t>, cute::tuple<cute::tuple<cute::_32, cute::_4>, int32_t>, cute::tuple<cute::_1, int32_t>>, cute::tuple<cute::tuple<cute::tuple<cute::_16, cute::_4>, int32_t>, cute::tuple<cute::tuple<cute::C<0>, cute::C<1>>, cute::_512>, cute::tuple<cute::C<0>, int32_t>>>>, cute::tuple<cutlass::float_e2m1_t, cutlass::float_ue8m0_t>, cute::tuple<cute::tuple<int64_t, cute::C<1>, int64_t>, cute::Layout<cute::tuple<cute::tuple<cute::tuple<cute::_32, cute::_4>, int32_t>, cute::tuple<cute::tuple<cute::_32, cute::_4>, int32_t>, cute::tuple<cute::_1, int32_t>>, cute::tuple<cute::tuple<cute::tuple<cute::_16, cute::_4>, int32_t>, cute::tuple<cute::tuple<cute::C<0>, cute::C<1>>, cute::_512>, cute::tuple<cute::C<0>, int32_t>>>>, cute::TiledMMA<cute::MMA_Atom<cute::SM100_MMA_MXF4_SS<cutlass::float_e2m1_t, cutlass::float_e2m1_t, float, cutlass::float_ue8m0_t, 128, 128, 32, cute::UMMA::Major::K, cute::UMMA::Major::K, cute::UMMA::ScaleIn::One, cute::UMMA::ScaleIn::One>>, cute::Layout<cute::tuple<cute::_1, cute::_1, cute::_1>, cute::tuple<cute::C<0>, cute::C<0>, cute::C<0>>>, cute::tuple<cute::Underscore, cute::Underscore, cute::Underscore>>, cute::tuple<cute::SM90_TMA_LOAD, cute::SM90_TMA_LOAD>, cute::tuple<cute::ComposedLayout<cute::Swizzle<2, 4, 3>, cute::smem_ptr_flag_bits<4>, cute::Layout<cute::tuple<cute::_8, cute::_128>, cute::tuple<cute::_128, cute::_1>>>, cute::Layout<cute::tuple<cute::tuple<cute::tuple<cute::tuple<cute::_32, cute::_4>, cute::C<1>>, cute::tuple<cute::_32, cute::_2>>, cute::_1, cute::tuple<cute::_2, cute::_1>>, cute::tuple<cute::tuple<cute::tuple<cute::tuple<cute::_16, cute::_4>, cute::C<512>>, cute::tuple<cute::C<0>, cute::C<1>>>, cute::_0, cute::tuple<cute::C<2>, cute::C<512>>>>>, void, cute::identity, cute::tuple<cute::SM90_TMA_LOAD_MULTICAST, cute::SM90_TMA_LOAD_MULTICAST>, cute::tuple<cute::ComposedLayout<cute::Swizzle<2, 4, 3>, cute::smem_ptr_flag_bits<4>, cute::Layout<cute::tuple<cute::_8, cute::_128>, cute::tuple<cute::_128, cute::_1>>>, cute::Layout<cute::tuple<cute::tuple<cute::tuple<cute::tuple<cute::_32, cute::_4>, cute::C<1>>, cute::tuple<cute::_32, cute::_2>>, cute::_1, cute::tuple<cute::_2, cute::_1>>, cute::tuple<cute::tuple<cute::tuple<cute::tuple<cute::_16, cute::_4>, cute::C<512>>, cute::tuple<cute::C<0>, cute::C<1>>>, cute::_0, cute::tuple<cute::C<2>, cute::C<512>>>>>, void, cute::identity>" has no member "Sm1xxBlkScaledConfig"
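
On the torch.version.hip point above, here is a minimal sketch of such a guard (illustrative only; the arch check and the placeholder body are assumptions, not the actual diff) showing why a change nested under it cannot run on a CUDA build:

```python
# Minimal sketch of a ROCm-only guard. On CUDA builds torch.version.hip is
# None, so nothing inside this branch can execute on a CUDA worker, which is
# why the CUDA inductor jobs could not be affected by a change placed here.
import torch

if torch.version.hip is not None:  # ROCm builds only
    arch = torch.cuda.get_device_properties(0).gcnArchName
    if arch.startswith("gfx95"):
        # A gfx950-specific workspace default would be applied here
        # (placeholder for illustration).
        pass
```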

@jeffdaily (Collaborator) commented:

@eellison looks like torchao is failing to build suddenly, but not due to this PR.

iupaikov-amd pushed a commit to ROCm/pytorch that referenced this pull request Jun 4, 2025
Fixes test_cuda.py::test_cublas_workspace_explicit_allocation on gfx95

Pull Request resolved: pytorch#153988
Approved by: https://github.com/jeffdaily

Labels

ciflow/rocm (Trigger "default" config CI on ROCm), ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: rocm (AMD GPU support for PyTorch), open source, topic: not user facing, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
