update ShardedEmbeddingBagCollection to use registered EBCs with ShardedTensors as registered modules #758
Conversation
This pull request was exported from Phabricator. Differential Revision: D40458625
update ShardedEmbeddingBagCollection to use registered EBCs with ShardedTensors as registered modules (#88026)

Summary:
X-link: pytorch/pytorch#88026
Pull Request resolved: #758

Update ShardedEmbeddingBagCollection to be composable according to https://docs.google.com/document/d/1TBJSd5zgEg6cRcXv3Okuj7bBkqQwGS2IPh4TLWNNzFI/edit. This works with DMP.

named_parameters() behavior changes: include_fused is used as a temporary flag to gate the new behavior.

Note that because ShardedTensor does not support gradients directly, this will not work for Dense compute kernels when not data parallel. That path is not used today; a TODO will be added, but it is low priority.

Differential Revision: D40458625
fbshipit-source-id: 9135216ac67c828d8532d5c251cd6b8d170c058b
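As a usage sketch of the composable path described above: the snippet below builds a toy EmbeddingBagCollection, shards it with DMP, and prints the FQNs that named_parameters() reports. It is a minimal sketch, assuming torch.distributed is already initialized (e.g. via torchrun) and a GPU per rank; the table and feature names are made up for illustration.

```python
import torch
import torchrec
from torchrec.distributed.model_parallel import DistributedModelParallel

# Toy single-table EBC built on the meta device; "t1"/"f1" are illustrative names.
ebc = torchrec.EmbeddingBagCollection(
    device=torch.device("meta"),
    tables=[
        torchrec.EmbeddingBagConfig(
            name="t1",
            embedding_dim=64,
            num_embeddings=10_000,
            feature_names=["f1"],
        )
    ],
)

# DistributedModelParallel shards the EBC; this requires an initialized
# process group (e.g. launched via torchrun) and a device per rank.
model = DistributedModelParallel(ebc, device=torch.device("cuda"))

# With the composable ShardedEmbeddingBagCollection, sharded table weights show
# up under their FQNs, and the values can be ShardedTensors for sharded kernels.
for fqn, param in model.named_parameters():
    print(fqn, type(param).__name__)
```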
update ShardedEmbeddingBagCollection to use registered EBCs with ShardedTensors as registered modules (#758) (#88026)

Summary:
X-link: meta-pytorch/torchrec#758

This PR fixes a bug in FSDP/DDP where ShardedTensors are not supported even when passed in as params to ignore. This is important for composability, because TorchRec named_parameters() will return the FQNs of ShardedTensors (as defined in the goals).

It defines the device of a ShardedTensor to be None when local_tensor() does not exist on the rank.

Update ShardedEmbeddingBagCollection to be composable according to https://docs.google.com/document/d/1TBJSd5zgEg6cRcXv3Okuj7bBkqQwGS2IPh4TLWNNzFI/edit

Differential Revision: D40458625
Pull Request resolved: #88026
Approved by: https://github.com/wanchaol, https://github.com/rohan-varma
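The fix above lets DDP accept ShardedTensor FQNs in its ignore list. Below is a minimal sketch of that flow, assuming an initialized process group and a module whose named_parameters() yields ShardedTensors (as described in the commit); it relies on DDP's private _set_params_and_buffers_to_ignore_for_model helper, so treat the wrapper name and overall recipe as illustrative assumptions rather than a stable API.

```python
import torch.nn as nn
import torch.distributed as dist
from torch.distributed._shard.sharded_tensor import ShardedTensor
from torch.nn.parallel import DistributedDataParallel as DDP


def wrap_ddp_ignoring_sharded(module: nn.Module, device_ids=None) -> DDP:
    """Illustrative helper (not part of TorchRec/PyTorch): wrap `module` in DDP
    while skipping ShardedTensor parameters, which are synchronized by their own
    sharding-aware kernels rather than by DDP allreduce."""
    assert dist.is_initialized(), "requires an initialized process group"

    sharded_fqns = [
        fqn
        for fqn, param in module.named_parameters()
        if isinstance(param, ShardedTensor)
    ]
    # Private PyTorch API: marks these FQNs so DDP excludes them from its
    # reducer buckets instead of erroring on ShardedTensor inputs.
    DDP._set_params_and_buffers_to_ignore_for_model(module, sharded_fqns)
    return DDP(module, device_ids=device_ids)
```

Per the commit, a ShardedTensor whose local_tensor() does not exist on a rank now reports its device as None, which the ignore path has to tolerate when inferring devices.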
Summary:
Update ShardedEmbeddingBagCollection to be composable according to https://docs.google.com/document/d/1TBJSd5zgEg6cRcXv3Okuj7bBkqQwGS2IPh4TLWNNzFI/edit. This works with DMP.
named_parameters() behavior changes: include_fused is used as a temporary flag to gate the new behavior.
Note that because ShardedTensor does not support gradients directly, this will not work for Dense compute kernels when not data parallel. That path is not used today; a TODO will be added, but it is low priority.
Differential Revision: D40458625
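The include_fused flag is described above only as a temporary gate on named_parameters(). Purely as a hypothetical sketch of that kind of gate (the flag name comes from the summary; the class and implementation below are invented for illustration and are not TorchRec code):

```python
from typing import Iterator, Tuple

import torch
import torch.nn as nn


class FusedAwareModule(nn.Module):
    """Hypothetical module whose named_parameters() hides fused parameters
    unless include_fused=True."""

    def __init__(self) -> None:
        super().__init__()
        self.dense = nn.Linear(8, 8)
        # Stand-in for a fused/sharded table weight that optimizers handle
        # internally and that should normally be hidden from callers.
        self.fused_weight = nn.Parameter(torch.zeros(16, 8))

    def named_parameters(
        self, prefix: str = "", recurse: bool = True, include_fused: bool = False
    ) -> Iterator[Tuple[str, nn.Parameter]]:
        for name, param in super().named_parameters(prefix=prefix, recurse=recurse):
            if not include_fused and name.endswith("fused_weight"):
                continue
            yield name, param


m = FusedAwareModule()
print([n for n, _ in m.named_parameters()])                    # dense params only
print([n for n, _ in m.named_parameters(include_fused=True)])  # also fused_weight
```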