
Conversation

jiawenliu64
Member

Summary:

  • Enable CUTLASS grouped GEMM for llama4x pretraining wgrad on GB200 and H100
  • Optimize performance of pretraining moe shapes on H100
  • Support total_K in quantize_bench for wgrad
  • The FBGEMM relocation issue has been resolved for the short term, so this change is resubmitted. Passed all tests in T238469849

Differential Revision: D83001505
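For context, a minimal NumPy sketch of the computation the grouped GEMM serves here. This is not FBGEMM or CUTLASS code; `grouped_gemm_wgrad` and all shapes are illustrative assumptions. In MoE pretraining, the weight gradient (wgrad) for each expert g is dW_g = X_g^T @ dY_g, where groups have varying token counts but shared model dimensions; because the per-group token count is the reduction dimension of each GEMM, it plays the role of K, which is presumably what a `total_K` parameter in quantize_bench aggregates across groups (an assumption).

```python
# Illustrative sketch only -- not FBGEMM code. Shows the per-expert
# wgrad GEMMs that a CUTLASS grouped GEMM kernel would batch together.
import numpy as np

def grouped_gemm_wgrad(x, dy, group_sizes):
    """Reference wgrad: for each expert g, dW_g = X_g^T @ dY_g.

    x:  (total_tokens, K) activations, rows grouped by expert.
    dy: (total_tokens, N) output gradients, same row grouping.
    group_sizes: tokens routed to each expert; sums to total_tokens.
    Returns a (num_experts, K, N) stack of weight gradients.
    """
    offsets = np.cumsum([0] + list(group_sizes))
    return np.stack([
        x[offsets[g]:offsets[g + 1]].T @ dy[offsets[g]:offsets[g + 1]]
        for g in range(len(group_sizes))
    ])

rng = np.random.default_rng(0)
sizes = [3, 5, 2]                      # uneven tokens per expert
x = rng.standard_normal((10, 4))       # total_tokens=10, K=4
dy = rng.standard_normal((10, 6))      # N=6
dw = grouped_gemm_wgrad(x, dy, sizes)
print(dw.shape)  # (3, 4, 6)
```

A grouped kernel launches all three uneven matmuls in one call instead of one kernel per expert, which is why the uneven group sizes (and their total along the reduction dimension) matter for benchmarking.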


netlify bot commented Sep 22, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name | Link
🔨 Latest commit | 7d49746
🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68d2c854a0154d000805640e
😎 Deploy Preview | https://deploy-preview-4913--pytorch-fbgemm-docs.netlify.app

@meta-cla meta-cla bot added the cla signed label Sep 22, 2025
@facebook-github-bot
Contributor

@jiawenliu64 has exported this pull request. If you are a Meta employee, you can view the originating diff in D83001505.


jiawenliu64 added a commit to jiawenliu64/FBGEMM that referenced this pull request Sep 23, 2025
…resubmit) (pytorch#4913)

Summary:
Pull Request resolved: pytorch#4913

X-link: facebookresearch/FBGEMM#1937

- Enable CUTLASS grouped GEMM for llama4x pretraining wgrad on GB200 and H100
- Optimize performance of pretraining moe shapes on H100
- Support total_K in quantize_bench for wgrad
- The FBGEMM relocation issue has been resolved for the short term, so this change is resubmitted. Passed all tests in T238469849

Differential Revision: D83001505

jiawenliu64 added a commit to jiawenliu64/FBGEMM that referenced this pull request Sep 23, 2025
…resubmit) (pytorch#4913)

Summary:
Pull Request resolved: pytorch#4913

X-link: facebookresearch/FBGEMM#1937

- Enable CUTLASS grouped GEMM for llama4x pretraining wgrad on GB200 and H100
- Optimize performance of pretraining moe shapes on H100
- Support total_K in quantize_bench for wgrad
- The FBGEMM relocation issue has been resolved for the short term, so this change is resubmitted. Passed all tests in T238469849

Reviewed By: cthi

Differential Revision: D83001505

@facebook-github-bot
Contributor

This pull request has been merged in 10c5a3e.
