Enable CUTLASS grouped GEMM for pretraining wgrad on GB200 and H100 (resubmit) #4913
Conversation
✅ Deploy Preview for pytorch-fbgemm-docs ready!
@jiawenliu64 has exported this pull request. If you are a Meta employee, you can view the originating diff in D83001505.
Force-pushed 87dd0dd to 1355c76
Force-pushed 1355c76 to 3360001
Enable CUTLASS grouped GEMM for pretraining wgrad on GB200 and H100 (resubmit) (pytorch#4913)

Summary:
Pull Request resolved: pytorch#4913
X-link: facebookresearch/FBGEMM#1937

- Enable CUTLASS grouped GEMM for llama4x pretraining wgrad on GB200 and H100
- Optimize performance of pretraining MoE shapes on H100
- Support total_K in quantize_bench for wgrad

The FBGEMM relocation issue has been resolved for the short term, so this is a resubmit. Passed all tests in T238469849.

Reviewed By: cthi

Differential Revision: D83001505
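For context on the wgrad case: in MoE pretraining the weight-gradient GEMM contracts over the tokens routed to each expert, so each group has its own K and only the sum (total_K) is fixed up front, which is why the benchmark needs a total_K input for wgrad shapes. The sketch below is a plain-PyTorch reference for what a grouped wgrad GEMM computes; it is not the CUTLASS kernel and not FBGEMM's API, and the function name, argument names, and sizes are illustrative assumptions.

```python
# Minimal reference sketch (NOT the FBGEMM CUTLASS kernel): per-expert weight
# gradient dW_g = grad_output_g^T @ activations_g, where the contraction (K)
# dimension is that expert's token count, so K varies per group and only
# total_K = sum(k_sizes) is known ahead of time.
import torch

def grouped_wgrad_reference(grad_output: torch.Tensor,
                            activations: torch.Tensor,
                            k_sizes: torch.Tensor) -> torch.Tensor:
    """grad_output: [total_K, N], activations: [total_K, M],
    k_sizes: [G] with sum == total_K. Returns wgrads stacked as [G, N, M]."""
    wgrads = []
    start = 0
    for k_g in k_sizes.tolist():
        go_g = grad_output[start:start + k_g]     # [k_g, N] rows for this expert
        act_g = activations[start:start + k_g]    # [k_g, M] rows for this expert
        wgrads.append(go_g.t() @ act_g)           # [N, M] wgrad for this expert
        start += k_g
    return torch.stack(wgrads)

# Example: 4 experts, total_K = 256 tokens, M = 512, N = 1024 (arbitrary sizes).
k_sizes = torch.tensor([32, 96, 64, 64])
total_k = int(k_sizes.sum())
activations = torch.randn(total_k, 512)
grad_output = torch.randn(total_k, 1024)
print(grouped_wgrad_reference(grad_output, activations, k_sizes).shape)  # [4, 1024, 512]
```

A grouped GEMM kernel fuses this per-expert loop into a single launch; the CUTLASS path in this PR targets these variable-K wgrad shapes on GB200 and H100.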
Force-pushed 3360001 to 7d49746
This pull request has been merged in 10c5a3e.