[Checkpoint][Test] Add test for optimizer state_dict and resharding to 2d checkpoint test #91092

Closed
wants to merge 7 commits into from

@wz337 wz337 commented Dec 19, 2022

This PR updates the 2D checkpoint model state test to:

  1. add an optimizer state_dict test
  2. add a simple resharding test (process group change)
  3. rename the test

pytorch-bot bot commented Dec 19, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91092

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 61cfbdc:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the "release notes: distributed (fsdp)" label Dec 19, 2022
wz337 marked this pull request as ready for review December 19, 2022 17:48
wz337 requested a review from fduwjj December 19, 2022 17:48

@fduwjj fduwjj left a comment

LGTM. Unblocking; I have one small question here.

@@ -171,6 +208,29 @@ def test_2d_model_state_checkpoint(self) -> None:
            else:
                self.assertEqual(n_p1[1], n_p2[1])

        def opt_at(opt, idx):
            return list(iter(opt.state.values()))[idx]

Why do we need "iter" here?

Thanks for catching that. iter is unnecessary here.
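For readers following the exchange, here is a minimal sketch of why the iter call is redundant. It assumes only that the optimizer exposes a dict-like state attribute; FakeOptimizer, opt_at_with_iter, and the string keys are hypothetical stand-ins introduced for illustration so the snippet runs without torch, and they are not part of the PR.

```python
from collections import defaultdict


class FakeOptimizer:
    """Hypothetical stand-in for an optimizer; only the `state` mapping matters."""

    def __init__(self):
        # Assumption: per-parameter state lives in a dict-like mapping.
        # String keys are used here instead of parameters to avoid needing torch.
        self.state = defaultdict(dict)


def opt_at_with_iter(opt, idx):
    # Form quoted in the review: wraps the values view in iter() first.
    return list(iter(opt.state.values()))[idx]


def opt_at(opt, idx):
    # Simplified form: list() already consumes any iterable, so iter() adds nothing.
    return list(opt.state.values())[idx]


opt = FakeOptimizer()
opt.state["weight"] = {"step": 1, "exp_avg": [0.1, 0.2]}
opt.state["bias"] = {"step": 1, "exp_avg": [0.3]}

# Both helpers return the very same per-parameter state object at every index.
for i in range(len(opt.state)):
    assert opt_at(opt, i) is opt_at_with_iter(opt, i)
```

Since list accepts any iterable and a dict values view is already iterable, both helpers build the same list in insertion order, so dropping iter should not change what the test compares.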

wz337 commented Dec 23, 2022

@pytorchmergebot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Successfully rebased add_2d_dcp_optim_state_test onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout add_2d_dcp_optim_state_test && git pull --rebase)

wz337 commented Dec 24, 2022

@pytorchmergebot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Successfully rebased add_2d_dcp_optim_state_test onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout add_2d_dcp_optim_state_test && git pull --rebase)

wz337 commented Dec 25, 2022

@pytorchmergebot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Tried to rebase and push PR #91092, but it was already up to date

wz337 commented Dec 26, 2022

@pytorchmergebot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Successfully rebased add_2d_dcp_optim_state_test onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout add_2d_dcp_optim_state_test && git pull --rebase)

wz337 commented Dec 27, 2022

@pytorchmergebot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Successfully rebased add_2d_dcp_optim_state_test onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout add_2d_dcp_optim_state_test && git pull --rebase)

wz337 commented Dec 28, 2022

@pytorchmergebot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Tried to rebase and push PR #91092, but it was already up to date

wz337 commented Dec 28, 2022

@pytorchmergebot merge

pytorch-bot bot added the "ciflow/trunk" label (Trigger trunk jobs on your pull request) Dec 28, 2022
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@pytorchmergebot

Merge failed

Reason: The following mandatory check(s) failed (Rule Distributed):

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

wz337 commented Dec 29, 2022

@pytorchmergebot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Successfully rebased add_2d_dcp_optim_state_test onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout add_2d_dcp_optim_state_test && git pull --rebase)

wz337 commented Jan 3, 2023

@pytorchmergebot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Successfully rebased add_2d_dcp_optim_state_test onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout add_2d_dcp_optim_state_test && git pull --rebase)

wz337 commented Jan 4, 2023

@pytorchmergebot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Successfully rebased add_2d_dcp_optim_state_test onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout add_2d_dcp_optim_state_test && git pull --rebase)

wz337 commented Jan 4, 2023

@pytorchmergebot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Labels: ciflow/trunk, Merged, release notes: distributed (fsdp)