
switch dtensor and functional collective to use optree #110670

Closed (2 commits)

Conversation

wanchaol (Contributor) commented Oct 6, 2023

Stack from ghstack (oldest at bottom):

optree recently landed and provides quite good performance; conditionally import the new optree-backed pytree if optree is installed.

Some numbers from testing an MLP layer with TP + functional collectives:
before this PR: 10.390 ms
after this PR: 9.189 ms

That is roughly a 10% end-to-end CPU overhead reduction.
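For context, a minimal pure-Python sketch of what tree_flatten / tree_unflatten do (hypothetical illustration, not PyTorch's or optree's actual implementation): flatten a nested structure of containers into a flat list of leaves plus a spec, then rebuild the structure from that pair.

```python
# Hypothetical sketch of pytree flatten/unflatten semantics for dicts,
# lists, and tuples. Leaves are anything that is not one of those containers.
def tree_flatten(tree):
    leaves = []

    def spec_of(node):
        if isinstance(node, dict):
            return ("dict", list(node.keys()), [spec_of(v) for v in node.values()])
        if isinstance(node, (list, tuple)):
            return (type(node).__name__, None, [spec_of(v) for v in node])
        leaves.append(node)  # record the leaf in traversal order
        return ("leaf", None, None)

    return leaves, spec_of(tree)

def tree_unflatten(leaves, spec):
    it = iter(leaves)

    def build(s):
        kind, keys, children = s
        if kind == "leaf":
            return next(it)
        if kind == "dict":
            return {k: build(c) for k, c in zip(keys, children)}
        vals = [build(c) for c in children]
        return vals if kind == "list" else tuple(vals)

    return build(spec)

leaves, spec = tree_flatten({"a": [1, 2], "b": (3,)})
print(leaves)                        # [1, 2, 3]
print(tree_unflatten(leaves, spec))  # {'a': [1, 2], 'b': (3,)}
```

The PR's speedup comes from doing this traversal in C++ (via optree) instead of pure Python.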

pytorch-bot (bot) commented Oct 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110670

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 73754cb with merge base 2aa3064:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wanchaol added a commit that referenced this pull request Oct 6, 2023

ghstack-source-id: 28ac266eab5e4b34690bbf8aae0cfe7619451e12
Pull Request resolved: #110670
@wanchaol wanchaol added the release notes: distributed (dtensor) release notes category label Oct 6, 2023
fduwjj (Contributor) left a comment:

Wow, do we know how much perf we can get from switching to optree?

try:
    from torch.utils._cxx_pytree import tree_flatten, tree_unflatten
except ImportError:
    tree_flatten = torch.utils._pytree.tree_flatten  # type: ignore[assignment]
    tree_unflatten = torch.utils._pytree.tree_unflatten  # type: ignore[assignment]
Contributor:

Curious: why the assignment version here instead of an import like above?

wanchaol (Contributor, Author):

I followed a similar approach taken in #109684; another import would probably also work (in both cases we need to ignore the type annotation). Let me know your preference :)

try:
    from torch.utils._cxx_pytree import tree_map_only
except ImportError:
    tree_map_only = torch.utils._pytree.tree_map_only  # type: ignore[assignment]
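For readers unfamiliar with tree_map_only, here is a minimal pure-Python sketch of its behavior (a hypothetical illustration, not PyTorch's implementation): apply a function only to leaves of a given type, leaving all other leaves untouched.

```python
# Hypothetical sketch of tree_map_only(type, fn, tree) semantics over
# dicts, lists, and tuples: fn is applied only where isinstance matches.
def tree_map_only(ty, fn, tree):
    if isinstance(tree, dict):
        return {k: tree_map_only(ty, fn, v) for k, v in tree.items()}
    if isinstance(tree, (list, tuple)):
        return type(tree)(tree_map_only(ty, fn, v) for v in tree)
    return fn(tree) if isinstance(tree, ty) else tree

print(tree_map_only(int, lambda x: x * 2, {"a": [1, "s"], "b": 2.5}))
# {'a': [2, 's'], 'b': 2.5}
```

In the PR, this is how DTensor and functional collectives touch only the tensor leaves of an argument tree while passing everything else through.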
Contributor:

Do we know under what builds/setups this import won't exist? Should we issue a warning if we're in a known slow mode?

wanchaol (Contributor, Author):

I believe certain CI setups do not have access to the pip channel, only the conda channel, where optree is not yet available. @XuehaiPan might have more details.

Collaborator:

I'm making optree optional in PR #109684. We won't need these try-except blocks after PR #109684 lands.

wanchaol (Contributor, Author) commented Oct 6, 2023

> wow do we know how much perf we can get from switching to optree?

I tried it and there's not much perf gain at large scale with 2D parallelism, probably because we already hide almost all of the CPU overhead there. It could benefit smaller models more; we can try.

See the updates in the summary: local testing shows around a 10% CPU overhead reduction.
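The kind of local measurement mentioned above can be sketched like this (a hypothetical harness, not the PR's actual benchmark; `flatten_roundtrip` is a stand-in workload for a tree_flatten + tree_unflatten round trip):

```python
# Hypothetical microbenchmark harness: time many round trips over a small
# pytree-like structure and report the mean per-call cost in microseconds.
import timeit

def flatten_roundtrip(tree):
    # Stand-in workload: flatten dict-of-lists into leaves, then rebuild.
    leaves = [v for vs in tree.values() for v in vs]
    return {k: leaves[i * 8:(i + 1) * 8] for i, k in enumerate(tree)}

tree = {"w": [1.0] * 8, "b": [0.0] * 8}
elapsed = timeit.timeit(lambda: flatten_roundtrip(tree), number=10_000)
print(f"mean per call: {elapsed / 10_000 * 1e6:.2f} us")
```

Comparing the same harness with the pure-Python and C++-backed pytree would give a number comparable to the ~10% figure quoted in the summary.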

fegin (Contributor) left a comment:

Nice

wanchaol added a commit that referenced this pull request Oct 7, 2023

ghstack-source-id: cdaf58eaacd93bf32f2edc2060d0b54b68470d0d
Pull Request resolved: #110670
@wanchaol wanchaol added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 7, 2023
wanchaol (Contributor, Author) commented Oct 8, 2023

@pytorchbot merge

pytorchmergebot (Collaborator):
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced debugging: check the merge workflow status.

@facebook-github-bot facebook-github-bot deleted the gh/wanchaol/361/head branch October 11, 2023 14:23
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: distributed (dtensor) release notes category

6 participants