Enable CUTLASS grouped GEMM for pretraining wgrad on GB200 and H100 (resubmit) #4913
Conversation
✅ Deploy Preview for pytorch-fbgemm-docs ready!
@jiawenliu64 has exported this pull request. If you are a Meta employee, you can view the originating diff in D83001505.
Force-pushed 87dd0dd to 1355c76
Force-pushed 1355c76 to 3360001
Enable CUTLASS grouped GEMM for pretraining wgrad on GB200 and H100 (resubmit) (pytorch#4913)

Summary:
Pull Request resolved: pytorch#4913
X-link: facebookresearch/FBGEMM#1937

- Enable CUTLASS grouped GEMM for llama4x pretraining wgrad on GB200 and H100
- Optimize performance of pretraining MoE shapes on H100
- Support total_K in quantize_bench for wgrad

The FBGEMM relocation issue has been resolved for the short term, so this is a resubmit. Passed all tests in T238469849.

Reviewed By: cthi

Differential Revision: D83001505
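For context on the wgrad case: in MoE pretraining the weight-gradient GEMM contracts over the tokens routed to each expert, so each group has its own K and only the sum (total_K) is fixed up front, which is why the benchmark needs a total_K input for wgrad shapes. The sketch below is a plain-PyTorch reference for what a grouped wgrad GEMM computes; it is not the CUTLASS kernel and not FBGEMM's API, and the function name, argument names, and sizes are illustrative assumptions.

```python
# Minimal reference sketch (NOT the FBGEMM CUTLASS kernel): per-expert weight
# gradient dW_g = grad_output_g^T @ activations_g, where the contraction (K)
# dimension is that expert's token count, so K varies per group and only
# total_K = sum(k_sizes) is known ahead of time.
import torch

def grouped_wgrad_reference(grad_output: torch.Tensor,
                            activations: torch.Tensor,
                            k_sizes: torch.Tensor) -> torch.Tensor:
    """grad_output: [total_K, N], activations: [total_K, M],
    k_sizes: [G] with sum == total_K. Returns wgrads stacked as [G, N, M]."""
    wgrads = []
    start = 0
    for k_g in k_sizes.tolist():
        go_g = grad_output[start:start + k_g]     # [k_g, N] rows for this expert
        act_g = activations[start:start + k_g]    # [k_g, M] rows for this expert
        wgrads.append(go_g.t() @ act_g)           # [N, M] wgrad for this expert
        start += k_g
    return torch.stack(wgrads)

# Example: 4 experts, total_K = 256 tokens, M = 512, N = 1024 (arbitrary sizes).
k_sizes = torch.tensor([32, 96, 64, 64])
total_k = int(k_sizes.sum())
activations = torch.randn(total_k, 512)
grad_output = torch.randn(total_k, 1024)
print(grouped_wgrad_reference(grad_output, activations, k_sizes).shape)  # [4, 1024, 512]
```

A grouped GEMM kernel fuses this per-expert loop into a single launch; the CUTLASS path in this PR targets these variable-K wgrad shapes on GB200 and H100.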
Force-pushed 3360001 to 7d49746
This pull request has been merged in 10c5a3e.