
Conversation

Contributor

@EddyLXJ commented Nov 13, 2025

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2127

Previously, if a KVZCH table enabled the PARTIAL_ROWWISE_ADAM optimizer type, PARTIAL_ROWWISE_ADAM was passed to all sharders as a fused param, which made every sharder initialize its optimizer with PARTIAL_ROWWISE_ADAM and caused an OOM issue. This diff passes the PARTIAL_ROWWISE_ADAM optimizer type only to the KVZCH TBE, avoiding the OOM issue.
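
A minimal sketch of the intent (the helper, the `kvzch_tables` set, and the `"optimizer"` key are illustrative, not the actual TorchRec fused-param plumbing):

```python
from typing import Any, Dict, Set

def fused_params_for_table(
    table_name: str,
    kvzch_tables: Set[str],            # hypothetical: names of KV-ZCH tables
    base_fused_params: Dict[str, Any],
) -> Dict[str, Any]:
    """Return the fused params a sharder should see for one table."""
    params = dict(base_fused_params)
    if table_name not in kvzch_tables:
        # Strip the KV-ZCH-only optimizer override so regular sharders keep
        # their default optimizer and don't allocate PARTIAL_ROWWISE_ADAM
        # state for every table, which is what caused the OOM.
        params.pop("optimizer", None)  # hypothetical key name
    return params
```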

Differential Revision: D86787539

Contributor

meta-codesync bot commented Nov 13, 2025

@EddyLXJ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D86787539.

meta-cla bot added the CLA Signed label Nov 13, 2025
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Nov 13, 2025
Summary:
X-link: meta-pytorch/torchrec#3544

X-link: facebookresearch/FBGEMM#2127

Previously, if a KVZCH table enabled the PARTIAL_ROWWISE_ADAM optimizer type, PARTIAL_ROWWISE_ADAM was passed to all sharders as a fused param, which made every sharder initialize its optimizer with PARTIAL_ROWWISE_ADAM and caused an OOM issue. This diff passes the PARTIAL_ROWWISE_ADAM optimizer type only to the KVZCH TBE, avoiding the OOM issue.

Differential Revision: D86787539
Summary:
X-link: pytorch/FBGEMM#5116


X-link: facebookresearch/FBGEMM#2122

For SilverTorch publish, we don't want to load optimizer state into the backend because the publish host has limited CPU memory.
So when loading the checkpoint in ST publish, we load the whole row into the state dict and save only the weight into the backend; after that, the backend holds only metaheader + weight.
For the first load, we need to set the dim to metaheader_dim + emb_dim + optimizer_state_dim, otherwise checkpoint loading throws a size-mismatch error. After the first load, we only need metaheader + weight from the backend for the state dict, so we can set the dim to metaheader_dim + emb_dim.
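
A small sketch of that dim bookkeeping, with hypothetical names (`cache_row_dim` and `first_load` are not the actual FBGEMM API):

```python
def cache_row_dim(
    metaheader_dim: int,
    emb_dim: int,
    optimizer_state_dim: int,
    first_load: bool,
) -> int:
    """Row width to expose in the state dict (hypothetical helper)."""
    if first_load:
        # Checkpoint rows still carry optimizer state, so the dim must
        # match metaheader + weight + optimizer state or checkpoint
        # loading raises a size-mismatch error.
        return metaheader_dim + emb_dim + optimizer_state_dim
    # After the first load the backend stores only metaheader + weight.
    return metaheader_dim + emb_dim
```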

Reviewed By: emlin

Differential Revision: D85830053
Summary:
X-link: pytorch/FBGEMM#5125


X-link: facebookresearch/FBGEMM#2127

Previously, if a KVZCH table enabled the PARTIAL_ROWWISE_ADAM optimizer type, PARTIAL_ROWWISE_ADAM was passed to all sharders as a fused param, which made every sharder initialize its optimizer with PARTIAL_ROWWISE_ADAM and caused an OOM issue. This diff passes the PARTIAL_ROWWISE_ADAM optimizer type only to the KVZCH TBE, avoiding the OOM issue.

Reviewed By: steven1327

Differential Revision: D86787539
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Nov 14, 2025
Summary:

X-link: meta-pytorch/torchrec#3544

X-link: facebookresearch/FBGEMM#2127

Previously, if a KVZCH table enabled the PARTIAL_ROWWISE_ADAM optimizer type, PARTIAL_ROWWISE_ADAM was passed to all sharders as a fused param, which made every sharder initialize its optimizer with PARTIAL_ROWWISE_ADAM and caused an OOM issue. This diff passes the PARTIAL_ROWWISE_ADAM optimizer type only to the KVZCH TBE, avoiding the OOM issue.

Reviewed By: steven1327

Differential Revision: D86787539
meta-codesync bot closed this in 0677e23 Nov 15, 2025
meta-codesync bot pushed a commit to pytorch/FBGEMM that referenced this pull request Nov 15, 2025
Summary:
Pull Request resolved: #5125

X-link: meta-pytorch/torchrec#3544

X-link: https://github.com/facebookresearch/FBGEMM/pull/2127

Previously, if a KVZCH table enabled the PARTIAL_ROWWISE_ADAM optimizer type, PARTIAL_ROWWISE_ADAM was passed to all sharders as a fused param, which made every sharder initialize its optimizer with PARTIAL_ROWWISE_ADAM and caused an OOM issue. This diff passes the PARTIAL_ROWWISE_ADAM optimizer type only to the KVZCH TBE, avoiding the OOM issue.

Reviewed By: steven1327

Differential Revision: D86787539

fbshipit-source-id: eeeed6a449e8ea130fc684b58e6f15e5f7418e3a