[FSDP] Another fix for `DTensor`, `use_orig_params=True` #89845
Conversation
[ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89845
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 5ff0f38. This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 6198bcfb367bc123a32a0e131c9b56dd64421584 Pull Request resolved: #89845
@pytorchbot rebase -s
@pytorchbot successfully started a rebase job. Check the current status here.
The issue for `test_2d_parallel.py` is that `DTensor` does not support the idiom `param.data = view` where `view` is a `DTensor`. To work around this, we do not preserve the parameter variable `param` and instead create a new parameter variable altogether via `nn.Parameter(view)`. Preserving the parameter variable when unsharded was not a strict requirement -- it just made sense to do that if we are already doing that when _sharded_, where it _is_ a strict requirement to support the optimizer step. The sharded case is not an issue for 2D because sharded implies local tensor, not `DTensor`. [ghstack-poisoned]
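To make the two idioms concrete, here is a minimal sketch of the difference between swapping a parameter's storage in place and rebuilding the parameter object. The toy module, tensor names, and the plain-tensor stand-in for the unsharded view are hypothetical illustrations, not the actual FSDP code; with a real `DTensor` view, the first idiom is the one that fails.

```python
# Minimal sketch (hypothetical toy module and names) of the two ways to point a
# parameter at an unsharded view. With plain torch.Tensor both work; when the
# view is a DTensor, the `param.data = view` idiom is not supported, so this PR
# rebuilds the parameter via nn.Parameter(view) instead.
import torch
import torch.nn as nn

module = nn.Linear(4, 4, bias=False)   # hypothetical toy module
param = module.weight
view = torch.empty_like(param)          # stand-in for the unsharded flat-parameter view
                                        # (in 2D parallel this would be a DTensor)

# Idiom 1: keep the same Parameter object and swap its underlying data.
# Preserves the parameter variable, but relies on `param.data = view`,
# which DTensor does not support.
param.data = view

# Idiom 2 (the workaround in this PR): create a new Parameter wrapping the view
# and re-register it on the module, giving up on preserving the original variable.
module.weight = nn.Parameter(view)
```

The trade-off is exactly the one described above: idiom 2 does not preserve the original parameter variable, which only matters in the sharded case (where the optimizer holds references), and the sharded case never involves a `DTensor` view.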
Successfully rebased.
ghstack-source-id: 364d0f87a9b29618fa5c3ea81954d1d3d1046781 Pull Request resolved: #89845
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This pull request has been reverted by 6efedfd. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
We will re-land this once the `DTensor` / `torchgen` import issue (#90106) is resolved.
The issue for `test_2d_parallel.py` is that `DTensor` does not support the idiom `param.data = view` where `view` is a `DTensor`. To work around this, we do not preserve the parameter variable `param` and instead create a new parameter variable altogether via `nn.Parameter(view)`. Preserving the parameter variable when unsharded was not a strict requirement -- it just made sense to do that if we are already doing that when _sharded_, where it _is_ a strict requirement to support the optimizer step. The sharded case is not an issue for 2D because sharded implies local tensor, not `DTensor`. Pull Request resolved: pytorch#89845 Approved by: https://github.com/zhaojuanmao
This is a reland of #89845 with nothing changed. This should avoid the internal breakage now that `DTensor` does not import `torchgen` (#90106). Pull Request resolved: #90562 Approved by: https://github.com/fduwjj
Stack from ghstack (oldest at bottom):

[FSDP] Another fix for `DTensor`, `use_orig_params=True` #89845

The issue for `test_2d_parallel.py` is that `DTensor` does not support the idiom `param.data = view` where `view` is a `DTensor`. To work around this, we do not preserve the parameter variable `param` and instead create a new parameter variable altogether via `nn.Parameter(view)`. Preserving the parameter variable when unsharded was not a strict requirement -- it just made sense to do that if we are already doing that when _sharded_, where it _is_ a strict requirement to support the optimizer step. The sharded case is not an issue for 2D because sharded implies local tensor, not `DTensor`.