Skip to content

Conversation

awgu
Copy link
Collaborator

@awgu awgu commented Dec 14, 2022

Stack from ghstack:

In the context of hybrid sharding strategies, we only need to enforce the same process groups among the instances using a hybrid sharding strategy, not all instances. We can even mix and match the two different hybrid sharding strategies. This PR relaxes the validation to support this.

@pytorch-bot
Copy link

pytorch-bot bot commented Dec 14, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90846

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c814a10:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@awgu awgu added the topic: improvements topic category label Dec 14, 2022
In the context of hybrid sharding strategies, we only need to enforce the same process groups among the instances using a hybrid sharding strategy, not all instances. We can even mix and match the two different hybrid sharding strategies. This PR relaxes the validation to support this.

[ghstack-poisoned]
In the context of hybrid sharding strategies, we only need to enforce the same process groups among the instances using a hybrid sharding strategy, not all instances. We can even mix and match the two different hybrid sharding strategies. This PR relaxes the validation to support this.

[ghstack-poisoned]
@awgu awgu added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 14, 2022
Copy link
Contributor

@rohan-varma rohan-varma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the preemptive stamp - coming back for a full review :)

import sys
from collections import Counter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have formatting changes separated out and in a different PR? Formatting changes with logical changes make it harder for reviewers to understand the critical parts of the PR to review.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that. I separated it out. Eventually, we should try to have all developers use ufmt so that we can ufmt files right before pushing. Converging to the same formatter saves the uncertainty of how code should be formatted, and we chose ufmt since PyTorch recommends that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about how we will encourage this practice. lintrunner is today's automated tool that can run pre-commit and is enforced by PyTorch CI.

Should we work with dev infra / CI folks to add ufmt to lintrunner? Without automation, enforcement of a linting standard is prone to breaking.

In the context of hybrid sharding strategies, we only need to enforce the same process groups among the instances using a hybrid sharding strategy, not all instances. We can even mix and match the two different hybrid sharding strategies. This PR relaxes the validation to support this.

[ghstack-poisoned]
In the context of hybrid sharding strategies, we only need to enforce the same process groups among the instances using a hybrid sharding strategy, not all instances. We can even mix and match the two different hybrid sharding strategies. This PR relaxes the validation to support this.

[ghstack-poisoned]
@awgu
Copy link
Collaborator Author

awgu commented Dec 15, 2022

I will fix test failures tomorrow.

In the context of hybrid sharding strategies, we only need to enforce the same process groups among the instances using a hybrid sharding strategy, not all instances. We can even mix and match the two different hybrid sharding strategies. This PR relaxes the validation to support this.

[ghstack-poisoned]
awgu pushed a commit to awgu/pytorch that referenced this pull request Dec 15, 2022
@awgu
Copy link
Collaborator Author

awgu commented Dec 15, 2022

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (fsdp) release notes category topic: improvements topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants