[rocm] scaled_grouped_mm support gfx942 fp8 data type #3802
xiaobochen-amd wants to merge 11 commits into pytorch:main
Conversation
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3802
❌ 1 New Failure as of commit 0bd013d with merge base 01d3a2d.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@xiaobochen-amd can you share some perf numbers/benchmarks comparing to bf16 baseline? Microbenchmarks or e2e training in torchtitan?
Hello, here is some benchmarking data from torchtitan using @xiaobochen-amd's PR. Testing e2e performance, FP8 grouped GEMM is about ~10% behind on TPS compared to FP16, but that is something we are looking into.
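For context on what's being benchmarked: a scaled grouped GEMM multiplies per-group row slices of a stacked activation matrix against a 3-D stack of (e.g. per-expert) weights, applying FP8 dequantization scales to each group's output. Below is a minimal CPU reference of those semantics in NumPy; the function name, per-tensor scale granularity, and offsets layout are illustrative assumptions, not the actual kernel API:

```python
import numpy as np

def grouped_mm_scaled_ref(a, b, scale_a, scale_b, offs):
    """Reference semantics: out[start:end] = (a[start:end] @ b[g].T) * scale_a * scale_b[g].

    a:       (total_tokens, K)  low-precision activations (fp8 simulated as fp32 here)
    b:       (G, N, K)          low-precision weights, one matrix per group/expert
    scale_a: scalar             dequant scale for `a` (assumed per-tensor)
    scale_b: (G,)               dequant scale per group (assumed per-tensor)
    offs:    (G,)               exclusive end offset of each group's rows in `a`
    """
    outs, start = [], 0
    for g, end in enumerate(offs):
        outs.append((a[start:end] @ b[g].T) * scale_a * scale_b[g])
        start = end
    return np.concatenate(outs, axis=0)

# Tiny demo: 2 groups owning 3 and 2 tokens, K=4, N=5
rng = np.random.default_rng(0)
a = rng.standard_normal((5, 4)).astype(np.float32)
b = rng.standard_normal((2, 5, 4)).astype(np.float32)
out = grouped_mm_scaled_ref(a, b, 0.5, np.array([2.0, 4.0]), offs=[3, 5])
print(out.shape)  # (5, 5)
```

The fused kernel does the per-group matmuls and dequantization in one launch on fp8 inputs, which is where the perf comparison against a bf16/fp16 grouped GEMM comes from.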
@alex-minooka @xiaobochen-amd FYI this part of the codebase is going through a substantial refactor, so we need to pause landing this until that is complete (in 1-2 days or so). Sorry for the inconvenience. #3862
hey @xiaobochen-amd go ahead and rebase and land if you want, that refactor PR is landed now |
@danielvegamyhre Ok, I'll handle this PR soon.

Old PR: #3540