
[FSDP] Another fix for DTensor, use_orig_params=True #89845

Closed · wants to merge 2 commits

Conversation

awgu (Contributor) commented on Nov 29, 2022

Stack from ghstack (oldest at bottom):

The issue for `test_2d_parallel.py` is that `DTensor` does not support the idiom `param.data = view` where `view` is a `DTensor`. To work around this, we do not preserve the parameter variable `param` and instead create a new parameter variable altogether via `nn.Parameter(view)`. Preserving the parameter variable when unsharded was never a strict requirement; it just made sense given that we already do it when *sharded*, where it *is* a strict requirement for supporting the optimizer step. The sharded case is not an issue for 2D because sharded implies a local tensor, not a `DTensor`.
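As a rough illustration of the idiom change, here is a minimal Python sketch; it is not FSDP's actual internals, and `set_unsharded_param`, `module`, `param_name`, and `unsharded_view` are hypothetical names:

```python
import torch
import torch.nn as nn


def set_unsharded_param(
    module: nn.Module, param_name: str, unsharded_view: torch.Tensor
) -> None:
    # Old idiom: keep the existing parameter variable and swap its storage.
    # This is what fails for 2D parallelism, because DTensor does not
    # support assigning to `.data`:
    #
    #     module._parameters[param_name].data = unsharded_view
    #
    # Workaround described in this PR: do not preserve the parameter
    # variable; register a brand-new nn.Parameter wrapping the view instead.
    module._parameters[param_name] = nn.Parameter(unsharded_view)


# Hypothetical usage, with a plain tensor standing in for a DTensor view:
lin = nn.Linear(4, 4)
set_unsharded_param(lin, "weight", torch.randn(4, 4))
```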

pytorch-bot (bot) commented on Nov 29, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89845

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5ff0f38:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

awgu added a commit that referenced this pull request Nov 29, 2022
ghstack-source-id: 6198bcfb367bc123a32a0e131c9b56dd64421584
Pull Request resolved: #89845
awgu (Contributor, Author) commented on Nov 29, 2022

@pytorchbot rebase -s

awgu added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Nov 29, 2022
pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a rebase job. Check the current status here

pytorchmergebot (Collaborator) commented:

Successfully rebased gh/awgu/219/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/89845)

pytorchmergebot pushed a commit that referenced this pull request Nov 29, 2022
ghstack-source-id: 364d0f87a9b29618fa5c3ea81954d1d3d1046781
Pull Request resolved: #89845
awgu (Contributor, Author) commented on Nov 29, 2022

@pytorchbot merge

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

facebook-github-bot (Contributor) commented:

This pull request has been reverted by 6efedfd. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert, and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).

awgu (Contributor, Author) commented on Dec 3, 2022

We will re-land this once DTensor removes its dependency on torchgen.

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Pull Request resolved: pytorch#89845
Approved by: https://github.com/zhaojuanmao
pytorchmergebot pushed a commit that referenced this pull request Dec 10, 2022

This is a reland of #89845 with nothing changed. This should avoid the internal breakage now that `DTensor` does not import `torchgen` (#90106).
Pull Request resolved: #90562
Approved by: https://github.com/fduwjj
facebook-github-bot deleted the gh/awgu/219/head branch on June 8, 2023 15:28
Labels
- ciflow/trunk (Trigger trunk jobs on your pull request)
- Merged
- release notes: distributed (fsdp) (release notes category)
- Reverted

Projects
None yet

Development
Successfully merging this pull request may close these issues: none yet.

4 participants