[ROCm] Build FBGEMM_GENAI for gfx942 only #162648

jithunnair-amd · 2025-09-10T22:20:56Z

Fixes build timeouts >4h on libtorch build jobs: https://hud.pytorch.org/hud/pytorch/pytorch/75e7f49f9c70116d7c4f8f86c3d0688ade306284/1?per_page=50&name_filter=inux-binary-libtorch%20%2F%20libtorch-rocm&mergeEphemeralLF=true

Brings back code to narrow down CK compilation targets from 69a25f6#diff-ce80f3115ab2f6be5142f0678a1fc92c6b2d7727766ce44f48726c99e720f777

gfx942 supports fp8

Don't enable gfx950 for now, until more optimizations are in place as per https://github.com/pytorch/pytorch/pull/162648/files#r2369588738

Validation:
rocm6.4 and rocm6.3 libtorch builds finished within 3.9h.
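
For readers who want a concrete picture of what "narrowing down CK compilation targets" means, here is a minimal Python sketch of the idea. It is not the CMake change made in this PR. It assumes the requested GPU architectures arrive as a semicolon-separated string (the format used by the `PYTORCH_ROCM_ARCH` environment variable). The names `narrow_ck_targets` and `SUPPORTED_FBGEMM_GENAI_ARCHS` are hypothetical and exist only for illustration.

```python
# Hypothetical illustration only: the real logic lives in PyTorch's CMake files.
# The supported set below is an assumption taken from this PR's title.
SUPPORTED_FBGEMM_GENAI_ARCHS = ("gfx942",)


def narrow_ck_targets(requested, supported=SUPPORTED_FBGEMM_GENAI_ARCHS):
    """Return the requested GPU architectures that are also supported.

    `requested` is a semicolon-separated string such as "gfx90a;gfx942".
    The order of the requested architectures is preserved, and empty
    entries (for example from a trailing semicolon) are ignored.
    """
    archs = [arch.strip() for arch in requested.split(";") if arch.strip()]
    return [arch for arch in archs if arch in supported]


targets = narrow_ck_targets("gfx90a;gfx942;gfx950;gfx1100")
print(targets)  # ['gfx942']
```

If the filtered list comes back empty, a build script following this pattern would presumably skip the FBGEMM_GENAI kernels altogether rather than compile them for every requested architecture, which is where the build-time saving would come from.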

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd @danielvegamyhre (since their change had removed this snippet, causing ROCm builds to increase >4h)

pytorch-bot · 2025-09-10T22:21:00Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/162648

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 5 Cancelled Jobs, 1 Unrelated Failure

As of commit 0aac9b5 with merge base 3a7db34:

NEW FAILURE - The following job has failed:

windows-binary-wheel / wheel-py3_14t-xpu-test (gh)
Failed to resolve action download info.

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

windows-binary-wheel / wheel-py3_11-xpu-build (gh) (matched win rule in flaky-rules.json)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

danielvegamyhre · 2025-09-10T22:33:41Z

> @danielvegamyhre (since their change had removed this snippet, causing ROCm builds to increase >4h)

Can you clarify what code exactly was removed in my PR that caused build time to increase for ROCm? It's not clear to me and I'd like to understand, thanks.

jeffdaily · 2025-09-10T22:45:57Z

> > @danielvegamyhre (since their change had removed this snippet, causing ROCm builds to increase >4h)
>
> Can you clarify what code exactly was removed in my PR that caused build time to increase for ROCm? It's not clear to me and I'd like to understand, thanks.

b6d0a9e#diff-ce80f3115ab2f6be5142f0678a1fc92c6b2d7727766ce44f48726c99e720f777L277-L282

jithunnair-amd · 2025-09-15T20:21:42Z

@pytorchbot rebase

pytorchmergebot · 2025-09-15T20:23:11Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-09-15T20:23:14Z

Successfully rebased build_fbgemm_ck_only_for_gfx942 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout build_fbgemm_ck_only_for_gfx942 && git pull --rebase)

cthi · 2025-09-22T15:54:27Z

@jithunnair-amd this change looks good to me overall, just clarified what I meant before as we should only build gfx942 for now with fbgemm + ROCm.

Do you plan to merge it soon?

jithunnair-amd · 2025-09-23T02:29:25Z

> @jithunnair-amd this change looks good to me overall, just clarified what I meant before as we should only build gfx942 for now with fbgemm + ROCm.
>
> Do you plan to merge it soon?

@cthi Yes, however, the CUDA build failures were a bit baffling. I'm going to try rebasing again.

jithunnair-amd · 2025-09-23T02:30:00Z

@pytorchbot rebase

pytorchmergebot · 2025-09-23T02:31:38Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-09-23T02:31:42Z

Successfully rebased build_fbgemm_ck_only_for_gfx942 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout build_fbgemm_ck_only_for_gfx942 && git pull --rebase)

jithunnair-amd · 2025-09-23T18:53:32Z

@pytorchbot merge -f "CI failures unrelated. Merging to restore nightly libtorch builds"

pytorchmergebot · 2025-09-23T18:55:12Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Despite narrowing down the [FBGEMM_GENAI build to gfx942](#162648), the nightly builds still timed out because they [didn't get enough time to finish the post-PyTorch-build steps](https://github.com/pytorch/pytorch/actions/runs/17969771026/job/51109432897). This PR increases timeout for ROCm builds for both [libtorch](https://github.com/pytorch/pytorch/actions/runs/17969771026) and [manywheel](https://github.com/pytorch/pytorch/actions/runs/17969771041), because both of those are close to the 4hr mark currently.

This PR is a more ROCm-targeted version of #162880 (which is for release/2.9 branch).

Pull Request resolved: #163776
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>

Fixes build timeouts >4h on libtorch build jobs: https://hud.pytorch.org/hud/pytorch/pytorch/75e7f49f9c70116d7c4f8f86c3d0688ade306284/1?per_page=50&name_filter=inux-binary-libtorch%20%2F%20libtorch-rocm&mergeEphemeralLF=true

Brings back code to narrow down CK compilation targets from pytorch@69a25f6#diff-ce80f3115ab2f6be5142f0678a1fc92c6b2d7727766ce44f48726c99e720f777

gfx942 supports fp8

Don't enable gfx950 for now, until more optimizations are in place as per https://github.com/pytorch/pytorch/pull/162648/files#r2369588738

Validation:
[rocm6.4](https://github.com/pytorch/pytorch/actions/runs/17944766350/job/51028483128) and [rocm6.3](https://github.com/pytorch/pytorch/actions/runs/17944766350/job/51028483093) libtorch builds finished within 3.9h.

Pull Request resolved: pytorch#162648
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>

[ROCm] Increase binary build timeout to 5 hours (300 minutes) (#163776)

Despite narrowing down the [FBGEMM_GENAI build to gfx942](#162648), the nightly builds still timed out because they [didn't get enough time to finish the post-PyTorch-build steps](https://github.com/pytorch/pytorch/actions/runs/17969771026/job/51109432897). This PR increases timeout for ROCm builds for both [libtorch](https://github.com/pytorch/pytorch/actions/runs/17969771026) and [manywheel](https://github.com/pytorch/pytorch/actions/runs/17969771041), because both of those are close to the 4hr mark currently.

This PR is a more ROCm-targeted version of #162880 (which is for release/2.9 branch).

Pull Request resolved: #163776
Approved by: https://github.com/jeffdaily

(cherry picked from commit 0ec946a)

Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>

pytorch-bot bot added the ciflow/rocm (Trigger "default" config CI on ROCm) and module: rocm (AMD GPU support for Pytorch) labels Sep 10, 2025

jithunnair-amd requested review from cthi, danielvegamyhre and jeffdaily September 10, 2025 22:21

jithunnair-amd added the topic: not user facing (topic category) label Sep 10, 2025

danielvegamyhre reviewed Sep 10, 2025

torch/testing/_internal/common_cuda.py (outdated, resolved)

pytorchbot added the open source label Sep 10, 2025

jeffdaily approved these changes Sep 11, 2025

jeffdaily added the ciflow/trunk (Trigger trunk jobs on your pull request) label Sep 11, 2025

jithunnair-amd added the ciflow/binaries_libtorch (Trigger binary build and upload jobs for libtorch on the PR) label Sep 11, 2025

jeffdaily marked this pull request as ready for review September 11, 2025 21:48

jithunnair-amd added the ciflow/binaries_wheel (Trigger binary build and upload jobs for wheel on the PR) label Sep 12, 2025

jithunnair-amd mentioned this pull request Sep 12, 2025

[ROCm] Bump FBGEMM commit to avoid CK errors #162590

Closed

pytorchmergebot force-pushed the build_fbgemm_ck_only_for_gfx942 branch from 3e42b38 to a55349f Compare September 15, 2025 20:23

jammm mentioned this pull request Sep 18, 2025

[Issue]: Linux 'nightly' PyTorch builds broken by aotriton changes ROCm/TheRock#1408

Open

cthi reviewed Sep 22, 2025

CMakeLists.txt (outdated, resolved)

cthi reviewed Sep 22, 2025

aten/src/ATen/CMakeLists.txt (outdated, resolved)

jithunnair-amd and others added 2 commits September 23, 2025 02:31

Build fbgemm_genai for gfx942 and gfx950 only

fad54ff

revert changes to torch/testing/_internal/common_cuda.py

dd48ec5

lint

5b12488

pytorchmergebot force-pushed the build_fbgemm_ck_only_for_gfx942 branch from a55349f to 5b12488 Compare September 23, 2025 02:31

jithunnair-amd added 2 commits September 23, 2025 06:27

Remove gfx950 based on upstream review

343a3e3

Remove gfx950 from HIP clang flags based on upstream review

0aac9b5

jithunnair-amd changed the title ~~[ROCm] Build FBGEMM_GENAI for gfx942 and gfx950 only~~ [ROCm] Build FBGEMM_GENAI for gfx942 only Sep 23, 2025

pytorchmergebot added the merging label Sep 23, 2025

pytorchmergebot closed this in bcb893a Sep 23, 2025

pytorchmergebot added Merged and removed merging labels Sep 23, 2025

pytorch-auto-revert bot mentioned this pull request Sep 23, 2025

[DO NOT CLOSE] Autorevert actions shadow mode stream #163650

Open

jithunnair-amd mentioned this pull request Sep 24, 2025

[ROCm] Increase binary build timeout to 5 hours (300 minutes) #163776

Closed

araravik-psd mentioned this pull request Sep 29, 2025

[CI] Disable FBGEMM_GENAI and Flash attention on build_portable_linux_pytorch_wheels for Pytorch version 2.10 ROCm/TheRock#1619

Open

pytorchbot mentioned this pull request Oct 6, 2025

[ROCm] Increase binary build timeout to 5 hours (300 minutes) #164770

Merged
