[pytree] Remove LeafSpec construction cost in tree_flatten #112392

peterbell10 · 2023-10-30T13:29:26Z

Stack from ghstack (oldest at bottom):

On my machine, `pytree.LeafSpec()` takes ~600ns, but since every leaf spec is the
same, we can just use a global constant.

cc @zou3519 @XuehaiPan @jon-chuang


pytorch-bot · 2023-10-30T13:29:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112392


✅ No Failures

As of commit e44f847 with merge base 29844ad:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

On my machine, `pytree.LeafSpec()` takes ~600ns but since every leaf spec is the same, we can just use a global constant. ghstack-source-id: 05cff54 Pull Request resolved: pytorch#112392

lezcano

Fine, but I reckon that we shouldn't invest that much time in optimising this implementation, as we are working towards moving to optree. See the stack #112110 and more generally the work of the author of those PRs

peterbell10 · 2023-10-30T18:24:23Z

@XuehaiPan I see that some parts of the codebase already use optree as an optional dependency. Is the plan to keep it as optional, meaning the python implementation would still be relevant?

zou3519 · 2023-10-30T18:52:20Z

In the short to medium term, both the python and C++ pytree (optree) will be relevant. Open question for the longer term (we likely will need to keep around the python pytree implementation so that Dynamo can trace through it)

peterbell10 · 2023-10-30T19:13:26Z

@pytorchbot merge

pytorchmergebot · 2023-10-30T19:15:17Z

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


peterbell10 · 2023-10-30T19:30:55Z

@pytorchbot merge

pytorchmergebot · 2023-10-30T19:32:49Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


#112393) We commonly do some variation of `tree_leaves((args, kwargs))`. This adds a new function `arg_tree_leaves(*args, **kwargs)` which takes advantage of the known structure of `args` and `kwargs` to skip their `flatten_fn`. I see ~1 us improvement per call for args + kwargs, or a 0.5 us improvement when passing just one of `args` or `kwargs`. For shallow structures, this can be proportionally quite significant. For example, the empty_strided call I've been using as a benchmark:

```
args = ((100, 100), (100, 1))
kwargs = dict(device="cuda")
```

Sees a 30% speedup from this.

Pull Request resolved: #112393
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392
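
The `arg_tree_leaves` idea from the commit message above can be sketched roughly as follows. This is a pure-Python illustration under simplifying assumptions, not the PyTorch implementation: the `tree_leaves` helper here is a hypothetical stand-in that only understands lists, tuples, and dicts. The point is that `args` is known to be a tuple and `kwargs` a dict, so the outer containers can be iterated directly rather than dispatched through a generic flatten step.

```python
from typing import Any, List


def tree_leaves(tree: Any) -> List[Any]:
    # Generic (simplified) leaf collection: dispatch on the container type.
    if isinstance(tree, (list, tuple)):
        out: List[Any] = []
        for child in tree:
            out.extend(tree_leaves(child))
        return out
    if isinstance(tree, dict):
        out = []
        for value in tree.values():
            out.extend(tree_leaves(value))
        return out
    return [tree]


def arg_tree_leaves(*args: Any, **kwargs: Any) -> List[Any]:
    # The outer structure is already known, so skip the generic dispatch
    # for the (args, kwargs) pair and walk the two containers directly.
    leaves: List[Any] = []
    for arg in args:
        leaves.extend(tree_leaves(arg))
    for value in kwargs.values():
        leaves.extend(tree_leaves(value))
    return leaves


# The benchmark arguments quoted in the commit message.
args = ((100, 100), (100, 1))
kwargs = dict(device="cuda")
fast = arg_tree_leaves(*args, **kwargs)
slow = tree_leaves((args, kwargs))
```

Both calls yield the same leaves in the same order; the specialized version just avoids building and flattening the temporary `(args, kwargs)` tuple.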

Pull Request resolved: #112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393

Wherever we discard the output of `tree_map` it's better to call `tree_map_` which doesn't unflatten the mapped results and so is a lot cheaper.

Pull Request resolved: #112417
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393, #112394
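
To illustrate the difference between the two calls, here is a simplified sketch. It assumes toy versions of `tree_leaves`, `tree_map`, and `tree_map_` that only handle lists, tuples, and dicts; these are hypothetical stand-ins, not the functions in `torch.utils._pytree`. `tree_map` rebuilds a container of the same shape around the mapped values, while `tree_map_` only visits the leaves for their side effects and hands back the original tree.

```python
from typing import Any, Callable, List


def tree_leaves(tree: Any) -> List[Any]:
    # Collect leaves from nested lists, tuples, and dicts (simplified).
    if isinstance(tree, (list, tuple)):
        return [leaf for child in tree for leaf in tree_leaves(child)]
    if isinstance(tree, dict):
        return [leaf for value in tree.values() for leaf in tree_leaves(value)]
    return [tree]


def tree_map(fn: Callable[[Any], Any], tree: Any) -> Any:
    # Applies fn to every leaf AND rebuilds the surrounding containers.
    if isinstance(tree, (list, tuple)):
        return type(tree)(tree_map(fn, child) for child in tree)
    if isinstance(tree, dict):
        return {key: tree_map(fn, value) for key, value in tree.items()}
    return fn(tree)


def tree_map_(fn: Callable[[Any], Any], tree: Any) -> Any:
    # Applies fn to every leaf for its side effect only; nothing is rebuilt.
    for leaf in tree_leaves(tree):
        fn(leaf)
    return tree


tree = {"a": [1, 2], "b": (3,)}
seen: List[int] = []
result = tree_map_(seen.append, tree)      # side effects only
doubled = tree_map(lambda x: x * 2, tree)  # new tree with mapped leaves
```

When the mapped result is thrown away, as with `seen.append` here, the rebuilding work done by `tree_map` is wasted, which is the cost the commit message refers to.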

…12392) On my machine, `pytree.LeafSpec()` takes ~600ns but since every leaf spec is the same, we can just use a global constant.

Pull Request resolved: pytorch#112392
Approved by: https://github.com/lezcano
ghstack dependencies: pytorch#112391


[pytree] Remove LeafSpec construction cost in tree_flatten

e44f847


This was referenced Oct 30, 2023

[pytree] Avoid constructing intermediate lists in tree_{flatten,leaves} #112391

Closed

[pytree] Add arg_tree_leaves to optimize flattening function arguments #112393

Closed

Use pytree.arg_tree_leaves everywhere #112394

Closed

pytorchbot added the open source label Oct 30, 2023

peterbell10 added the module: pytree label Oct 30, 2023

peterbell10 requested a review from lezcano October 30, 2023 16:21

peterbell10 marked this pull request as ready for review October 30, 2023 16:21

lezcano approved these changes Oct 30, 2023


This was referenced Oct 30, 2023

Use pytree.tree_map_ everywhere #112417

Closed

[FakeTensor] Reuse flat_args throughout FakeTensorMode.dispatch #112418

Closed

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 30, 2023

pytorchmergebot added the merging label Oct 30, 2023

pytorchmergebot removed the merging label Oct 30, 2023

peterbell10 added the topic: not user facing topic category label Oct 30, 2023

pytorchmergebot added the merging label Oct 30, 2023

pytorchmergebot added Merged and removed merging labels Oct 30, 2023

pytorchmergebot closed this in 31c0ef9 Oct 30, 2023

peterbell10 mentioned this pull request Oct 31, 2023

[fx] Cache translation_validation_enabled on ShapeEnv #112493

Closed

pytorchmergebot pushed a commit that referenced this pull request Oct 31, 2023

Use pytree.arg_tree_leaves everywhere (#112394)

66c32d0

Pull Request resolved: #112394 Approved by: https://github.com/lezcano ghstack dependencies: #112391, #112392, #112393

facebook-github-bot deleted the gh/peterbell10/647/head branch November 3, 2023 14:27
