Conversation

naromero77amd
Collaborator

@naromero77amd naromero77amd commented Jul 31, 2025

Fixes #159070

The TunableOp failure is due to missing rocBLAS files in our manywheels packaging. This bug has been present since roughly June 7-8. It was caused by a typo in the name of the rocBLAS environment variable that stores the list of files to package, introduced in PR #155388.
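For illustration only, here is a minimal sketch of how a misspelled variable name can silently produce an empty file list, and how a guard could make that fail loudly. The variable name `ROCBLAS_LIB_FILES`, the file names, and the helper `files_from_env` are all hypothetical; they are not taken from the actual packaging script, which this PR description does not show.

```python
from typing import List, Mapping

def files_from_env(var_name: str, env: Mapping[str, str]) -> List[str]:
    """Return the whitespace-separated file list stored in var_name.

    Raises instead of returning an empty list, so a misspelled
    variable name cannot silently drop files from the package.
    """
    files = env.get(var_name, "").split()
    if not files:
        raise RuntimeError(f"{var_name} is unset or empty; no files to package")
    return files

# Hypothetical environment; the names and values are placeholders.
env = {"ROCBLAS_LIB_FILES": "placeholder_a.dat placeholder_b.co"}

print(files_from_env("ROCBLAS_LIB_FILES", env))

try:
    # Misspelled name (missing trailing "S"): without the guard this
    # would simply yield an empty list and nothing would be copied.
    files_from_env("ROCBLAS_LIB_FILE", env)
except RuntimeError as exc:
    print(exc)
```

The design choice in this sketch is to treat an empty list as an error, on the assumption that the packaging step always expects at least one rocBLAS file.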

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang

@naromero77amd naromero77amd requested a review from a team as a code owner July 31, 2025 15:58

pytorch-bot bot commented Jul 31, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159570

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1852496 with merge base b4619f0 (image):

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Jul 31, 2025
@naromero77amd naromero77amd added the topic: not user facing topic category label Jul 31, 2025
@malfet
Contributor

malfet commented Jul 31, 2025

@naromero77amd if your PR fixes a segfault in a simple OP, it would be good to introduce a unit test of sorts.

@naromero77amd
Collaborator Author

> @naromero77amd if your PR fixes a segfault in a simple OP, it would be good to introduce a unit test of sorts.

I will talk to @jeffdaily @jithunnair-amd when they get back from vacation. The problem is that this is a packaging issue -- so the only way to catch it is to install the newly created wheel and run a unit test.
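As a rough sketch of what a post-build packaging check could look like, the snippet below opens a wheel as a zip archive and reports which required path prefixes contain no files. The prefix `torch/lib/rocblas/library/` and the file names are assumptions made for illustration, and `missing_prefixes` and `build_fake_wheel` are hypothetical helpers, not part of PyTorch CI. This only inspects wheel contents; it does not replace installing the wheel and running a TunableOp test on a ROCm machine.

```python
import io
import zipfile

def missing_prefixes(wheel_file, required_prefixes):
    """Return the required prefixes that have no file under them in the wheel."""
    with zipfile.ZipFile(wheel_file) as wheel:
        # Ignore directory entries; only real files count.
        names = [n for n in wheel.namelist() if not n.endswith("/")]
    return [p for p in required_prefixes
            if not any(n.startswith(p) for n in names)]

# Assumed location of the rocBLAS data files inside the wheel.
REQUIRED = ["torch/lib/rocblas/library/"]

def build_fake_wheel(paths):
    """Build an in-memory zip with empty files, standing in for a real wheel."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path in paths:
            zf.writestr(path, b"")
    buf.seek(0)
    return buf

good = build_fake_wheel([
    "torch/__init__.py",
    "torch/lib/rocblas/library/placeholder.dat",  # placeholder file name
])
bad = build_fake_wheel(["torch/__init__.py"])

print(missing_prefixes(good, REQUIRED))  # []
print(missing_prefixes(bad, REQUIRED))   # ['torch/lib/rocblas/library/']
```

In practice `wheel_file` could be the path to the freshly built `.whl`, and a non-empty result would fail the job.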

@naromero77amd naromero77amd added the ciflow/rocm Trigger "default" config CI on ROCm label Jul 31, 2025

pytorch-bot bot commented Jul 31, 2025

To add the ciflow label ciflow/rocm please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/rocm Trigger "default" config CI on ROCm label Jul 31, 2025
@naromero77amd
Collaborator Author

naromero77amd commented Jul 31, 2025

@malfet Can you please add the following CI workflow labels:

ciflow/binaries_wheel
ciflow/rocm
ciflow/trunk

@naromero77amd naromero77amd added the ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR label Aug 1, 2025

pytorch-bot bot commented Aug 1, 2025

To add the ciflow label ciflow/binaries_wheel please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR label Aug 1, 2025
@naromero77amd
Collaborator Author

@eqy Are you able to add ciflow labels?

@eqy
Collaborator

eqy commented Aug 1, 2025

@pytorchmergebot label ciflow/rocm ciflow/binaries_wheel

@pytorch-bot pytorch-bot bot added ciflow/binaries_wheel Trigger binary build and upload jobs for wheel on the PR ciflow/rocm Trigger "default" config CI on ROCm labels Aug 1, 2025
@naromero77amd
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 2, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@naromero77amd naromero77amd deleted the rocm_fix_rocblas_manywheels branch August 6, 2025 18:34
@atalman atalman added this to the 2.8.1 milestone Aug 12, 2025

Labels

ciflow/binaries_wheel (Trigger binary build and upload jobs for wheel on the PR)
ciflow/rocm (Trigger "default" config CI on ROCm)
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
module: rocm (AMD GPU support for Pytorch)
open source
topic: not user facing (topic category)

Development

Successfully merging this pull request may close these issues.

[ROCm][TunableOp] Simple matmul with TunableOp results in segfault

6 participants