Conversation

EddyLXJ (Contributor) commented Nov 11, 2025

Summary:
X-link: meta-pytorch/torchrec#3538

X-link: https://github.com/facebookresearch/FBGEMM/pull/2122

For silvertorch publish, we don't want to load the optimizer state into the backend, because CPU memory on the publish host is limited.
So when loading the checkpoint during silvertorch publish, we load the whole row into the state dict but save only the weight into the backend; after that, the backend holds only metaheader + weight.
For the first load, we need to set the dim to metaheader_dim + emb_dim + optimizer_state_dim, otherwise checkpoint loading will throw a size-mismatch error. After the first load, we only need to read metaheader + weight from the backend for the state dict, so we can set the dim to metaheader_dim + emb_dim.
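
A minimal sketch of that dim selection (all names below, e.g. `state_dict_row_dim`, are illustrative assumptions rather than the actual FBGEMM/TorchRec API):

```python
# Minimal sketch of the dim selection described above. The names
# metaheader_dim, emb_dim, optimizer_state_dim, and state_dict_row_dim are
# illustrative assumptions, not the actual FBGEMM/TorchRec identifiers.

def state_dict_row_dim(
    metaheader_dim: int,
    emb_dim: int,
    optimizer_state_dim: int,
    first_load: bool,
) -> int:
    """Return the per-row dim exposed to the checkpoint state dict."""
    if first_load:
        # First checkpoint load during publish: the checkpoint rows still
        # contain the optimizer state, so the dim must cover
        # metaheader + weight + optimizer state, otherwise loading fails
        # with a size-mismatch error.
        return metaheader_dim + emb_dim + optimizer_state_dim
    # After the first load only metaheader + weight remain in the backend,
    # so later state_dict reads use the narrower dim.
    return metaheader_dim + emb_dim


# Example with assumed sizes: 16 metaheader, 128 embedding, 128 optimizer state.
assert state_dict_row_dim(16, 128, 128, first_load=True) == 272
assert state_dict_row_dim(16, 128, 128, first_load=False) == 144
```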

Differential Revision: D85830053


netlify bot commented Nov 11, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 01876f2 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/6913b64606ae3e0008bffaff |
| 😎 Deploy Preview | https://deploy-preview-5116--pytorch-fbgemm-docs.netlify.app |

meta-cla bot added the cla signed label Nov 11, 2025

meta-codesync bot commented Nov 11, 2025

@EddyLXJ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D85830053.

EddyLXJ added a commit to EddyLXJ/torchrec that referenced this pull request Nov 13, 2025
EddyLXJ added a commit to EddyLXJ/torchrec that referenced this pull request Nov 13, 2025
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Nov 13, 2025
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Nov 13, 2025
EddyLXJ added a commit to EddyLXJ/torchrec that referenced this pull request Nov 13, 2025
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Nov 13, 2025
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Nov 14, 2025
EddyLXJ added a commit to EddyLXJ/torchrec that referenced this pull request Nov 14, 2025
EddyLXJ added a commit to EddyLXJ/torchrec that referenced this pull request Nov 14, 2025
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Nov 14, 2025
EddyLXJ added a commit to EddyLXJ/torchrec that referenced this pull request Nov 14, 2025
EddyLXJ added a commit to EddyLXJ/torchrec that referenced this pull request Nov 14, 2025
EddyLXJ added a commit to EddyLXJ/FBGEMM-1 that referenced this pull request Nov 14, 2025
meta-codesync bot pushed a commit to meta-pytorch/torchrec that referenced this pull request Nov 15, 2025
Summary:
X-link: pytorch/FBGEMM#5116

Pull Request resolved: #3538

X-link: https://github.com/facebookresearch/FBGEMM/pull/2122

Reviewed By: emlin

Differential Revision: D85830053

fbshipit-source-id: 0eddbe9e69ea8271e8c77dc0147e87a08f0b3934
meta-codesync bot closed this in f3d282b Nov 15, 2025

meta-codesync bot commented Nov 15, 2025

This pull request has been merged in f3d282b.
