
[FSDP] Allow to use TorchDispatch with FSDP #88014

Closed
wants to merge 5 commits

Conversation

fegin
Contributor

@fegin fegin commented Oct 28, 2022

Stack from ghstack (oldest at bottom):

Add _no_dispatch_record_stream to disable TorchDispatch before calling record_stream().
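
For context, a minimal sketch of what such a helper could look like, assuming a guard built on `torch._C._DisableTorchDispatch`; the `_no_dispatch` name and the exact code are illustrative assumptions, not necessarily what this PR lands:

```python
import contextlib

import torch


@contextlib.contextmanager
def _no_dispatch():
    # Assumption: temporarily turn off TorchDispatch so the ops issued inside
    # this block are not routed through any active __torch_dispatch__ handler
    # (mode or tensor subclass).
    guard = torch._C._DisableTorchDispatch()
    try:
        yield
    finally:
        del guard


def _no_dispatch_record_stream(tensor: torch.Tensor, stream: torch.cuda.Stream) -> None:
    # record_stream() is still executed; only the TorchDispatch interception is
    # bypassed, so the caching allocator still tracks the stream usage.
    with _no_dispatch():
        tensor.record_stream(stream)
```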

@pytorch-bot

pytorch-bot bot commented Oct 28, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88014

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b594741:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: distributed (fsdp) release notes category label Oct 28, 2022
fegin added a commit that referenced this pull request Oct 28, 2022
ghstack-source-id: 22c02de0773c4106e1941f21025abbd70e9a68ca
Pull Request resolved: #88014
@awgu
Contributor

awgu commented Oct 28, 2022

Is this for landing, or is this just for experimentation? I am concerned that memory logs from using this will be inaccurate because memory will be incorrectly freed early without record_stream().

Contributor

@zhaojuanmao zhaojuanmao left a comment


Looks good to me; I'll let Andrew or Rohan accept.

@fegin
Contributor Author

fegin commented Oct 28, 2022

@awgu record_stream() is still called with this PR. What _no_dispatch_record_stream() does is disable TorchDispatch before calling record_stream() and re-enable it after record_stream() returns, so the memory won't be freed early.
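
To illustrate the behavior described above, here is a hedged sketch: LoggingMode is a hypothetical __torch_dispatch__ mode used only to show that record_stream() still executes, just without being intercepted, and the inline guard stands in for the helper sketched earlier:

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode


class LoggingMode(TorchDispatchMode):
    # Hypothetical mode: log every op that TorchDispatch routes through it.
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print(f"intercepted: {func}")
        return func(*args, **(kwargs or {}))


if torch.cuda.is_available():
    t = torch.ones(4, device="cuda")
    s = torch.cuda.Stream()
    with LoggingMode():
        t.mul(2)  # routed through __torch_dispatch__ and logged

        # With the dispatch guard active, record_stream() is not routed through
        # the mode, but it still executes, so the caching allocator records the
        # stream usage and does not free the memory early.
        guard = torch._C._DisableTorchDispatch()
        t.record_stream(s)
        del guard
```

In FSDP the call site would presumably use the helper rather than managing the guard inline; the inline guard here just keeps the snippet self-contained.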

Contributor

@awgu awgu left a comment


Thanks for the explanation! I definitely misunderstood.

This looks good to me. Do you know if the Core team is planning to add support for record_stream() so that this is just a temporary fix?

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 28, 2022
@fegin
Contributor Author

fegin commented Oct 31, 2022

This looks good to me. Do you know if the Core team is planning to add support for record_stream() so that this is just a temporary fix?

I saw a TODO for TorchDispatch to support a class block-list filter. If that feature is implemented, users can just use the block list to filter out record_stream(). I'm not sure when it will be implemented, though.

@awgu
Contributor

awgu commented Oct 31, 2022

This looks good to me. Do you know if the Core team is planning to add support for record_stream() so that this is just a temporary fix?

I saw a TODO for TorchDispatch to support a class block-list filter. If that feature is implemented, users can just use the block list to filter out record_stream(). I'm not sure when it will be implemented, though.

Sounds good to me!

Also, sorry, I am landing some PRs from my stack, so there may be some rebase conflicts :(

fegin added a commit that referenced this pull request Oct 31, 2022
ghstack-source-id: 649546b87eec1a2565748efa48b41ecf4f722727
Pull Request resolved: #88014
fegin added a commit that referenced this pull request Oct 31, 2022
ghstack-source-id: 6786e65e5e34174604a12c73ac17646d59810710
Pull Request resolved: #88014
fegin added a commit that referenced this pull request Nov 3, 2022
ghstack-source-id: 7eb67be4760376db9ebb6fbcf01928fc0e92c3fd
Pull Request resolved: #88014
fegin added a commit that referenced this pull request Nov 3, 2022
ghstack-source-id: c8504963f4e2c693c3ae0c68eacaf1378a655f14
Pull Request resolved: #88014
@fegin
Contributor Author

fegin commented Nov 3, 2022

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
Add `_no_dispatch_record_stream` to disable TorchDispatch before calling `record_stream()`.
Pull Request resolved: pytorch#88014
Approved by: https://github.com/awgu
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Add `_no_dispatch_record_stream` to disable TorchDispatch before calling `record_stream()`.
Pull Request resolved: pytorch#88014
Approved by: https://github.com/awgu
@facebook-github-bot facebook-github-bot deleted the gh/fegin/37/head branch June 8, 2023 17:15