
Not flatten states when use_orig_param is True and sharding is NO_SHARD #100189

Closed
wants to merge 1 commit

Conversation

@zhaojuanmao (Contributor) commented on Apr 27, 2023

When use_orig_params is True and the sharding strategy is NO_SHARD, parameters and their states are not flattened, so the optimizer states should not be flattened either. The added unit test fails without this fix.

Stack from ghstack (oldest at bottom):
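
For context, here is a minimal sketch (not taken from the PR) of the configuration this change targets; the module, sizes, and learning rate are placeholders:

```python
# Sketch only: assumes torch.distributed has already been initialized
# (e.g. via init_process_group) before constructing the FSDP wrapper.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

model = FSDP(
    nn.Linear(8, 8),
    sharding_strategy=ShardingStrategy.NO_SHARD,
    use_orig_params=True,
)
optim = torch.optim.Adam(model.parameters(), lr=1e-2)

# With use_orig_params=True and NO_SHARD, FSDP exposes the original,
# unflattened parameters, so the optimizer state returned by
# FSDP.optim_state_dict(model, optim) should stay keyed to those
# unflattened parameters rather than being collapsed into one flat entry.
```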

@pytorch-bot (bot) commented on Apr 27, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/100189

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 90a560f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the "release notes: distributed (fsdp)" label on Apr 27, 2023
zhaojuanmao added a commit that referenced this pull request Apr 27, 2023
ghstack-source-id: 524ba5724383bd5f4dd7e2a9a6a6fc1de6166d88
Pull Request resolved: #100189
@zhaojuanmao changed the title from "Not flatten states when use_orig_param is True and shaarding is NO_SHARD" to "Not flatten states when use_orig_param is True and sharding is NO_SHARD" on Apr 27, 2023
@awgu (Contributor) left a comment

This looks good to me. Maybe @fegin has an opinion on the unit test organization.

```python
)
optim = torch.optim.Adam(model.parameters(), lr=1e-2)

def step():
```

Contributor

nit: Curious, why do we define step() here if we only call it once?

Contributor Author

Copied from other unit tests :)

Contributor Author

I think it should be OK; it is neat to use a function.
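
As a side note on the nit above, a minimal sketch (not the PR's actual test code) of the step() helper pattern being discussed, assuming model and optim are defined as in the reviewed snippet:

```python
def step():
    # One forward/backward/update cycle; wrapping it in a helper mirrors
    # the structure of other FSDP unit tests, even if it is only called once.
    optim.zero_grad()
    loss = model(torch.randn(8, 8)).sum()
    loss.backward()
    optim.step()

step()
```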

@zhaojuanmao (Contributor Author)

@pytorchbot merge

pytorch-bot added the "ciflow/trunk" label on Apr 27, 2023
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels: ciflow/trunk, Merged, release notes: distributed (fsdp)

3 participants