[pytree] Add arg_tree_leaves to optimize flattening function arguments #112393

peterbell10 · 2023-10-30T13:29:31Z

Stack from ghstack (oldest at bottom):

We commonly do some variation of tree_leaves((args, kwargs)). This adds a new
function arg_tree_leaves(*args, **kwargs) which takes advantage of the known
structure of args and kwargs to skip their flatten_fn.

I see a ~1 µs improvement per call when passing both args and kwargs, or
~0.5 µs when passing just one of them. For shallow structures, this can be
proportionally quite significant. For example, the empty_strided call I've been
using as a benchmark:

args = ((100, 100), (100, 1))
kwargs = dict(device="cuda")

This call sees a 30% speedup from the change.

cc @zou3519
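To illustrate the idea described above, here is a minimal pure-Python sketch. It is not the PyTorch implementation: the helper `_leaves_into` is a hypothetical name, and the sketch only understands `tuple`, `list`, and `dict` containers, whereas the real pytree module dispatches through registered flatten functions. The point is only that `args` is known to be a tuple and `kwargs` a dict, so the outer two levels can be iterated directly instead of being flattened generically.

```python
from typing import Any, List


def _leaves_into(tree: Any, leaves: List[Any]) -> None:
    """Append the leaves of `tree` to `leaves` (hypothetical helper)."""
    if isinstance(tree, (tuple, list)):
        for child in tree:
            _leaves_into(child, leaves)
    elif isinstance(tree, dict):
        for child in tree.values():
            _leaves_into(child, leaves)
    else:
        # Anything that is not a known container is treated as a leaf.
        leaves.append(tree)


def tree_leaves(tree: Any) -> List[Any]:
    """Generic path: the root goes through the same type checks as any node."""
    leaves: List[Any] = []
    _leaves_into(tree, leaves)
    return leaves


def arg_tree_leaves(*args: Any, **kwargs: Any) -> List[Any]:
    """Fast path: iterate args and kwargs directly, skipping the generic
    handling of the outer (args, kwargs) tuple, the args tuple, and the
    kwargs dict."""
    leaves: List[Any] = []
    for a in args:
        _leaves_into(a, leaves)
    for v in kwargs.values():
        _leaves_into(v, leaves)
    return leaves


# The benchmark arguments from the description.
args = ((100, 100), (100, 1))
kwargs = dict(device="cuda")

# Both spellings produce the same flat list of leaves.
assert arg_tree_leaves(*args, **kwargs) == tree_leaves((args, kwargs))
```

In this sketch the saving is just a few type checks and one tuple allocation per call, which is consistent with the description's point that the gain matters most for shallow structures.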

pytorch-bot · 2023-10-30T13:29:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112393

✅ No Failures

As of commit ca986a8 with merge base 29844ad:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

We commonly do some variation of `tree_leaves((args, kwargs))`. This adds a new function `arg_tree_leaves(*args, **kwargs)` which takes advantage of the known structure of `args` and `kwargs` to skip their `flatten_fn`.

I see a ~1 µs improvement per call when passing both `args` and `kwargs`, or ~0.5 µs when passing just one of them. For shallow structures, this can be proportionally quite significant. For example, the empty_strided call I've been using as a benchmark:

```
args = ((100, 100), (100, 1))
kwargs = dict(device="cuda")
```

This call sees a 30% speedup from the change.

ghstack-source-id: 3f5310d
Pull Request resolved: pytorch#112393

Pull Request resolved: #112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393

Wherever we discard the output of `tree_map`, it's better to call `tree_map_`, which doesn't unflatten the mapped results and so is a lot cheaper.

Pull Request resolved: #112417
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393, #112394
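As a rough illustration of the difference between the two calls, the following pure-Python sketch contrasts a mapping function that rebuilds the container with one that only visits the leaves. It is a simplified stand-in, not the PyTorch code: it handles only `tuple`, `list`, and `dict`, the helper `_leaves_into` is a hypothetical name, and having `tree_map_` return its input unchanged is an assumption of this sketch.

```python
from typing import Any, Callable, List


def _leaves_into(tree: Any, leaves: List[Any]) -> None:
    """Append the leaves of `tree` to `leaves` (hypothetical helper)."""
    if isinstance(tree, (tuple, list)):
        for child in tree:
            _leaves_into(child, leaves)
    elif isinstance(tree, dict):
        for child in tree.values():
            _leaves_into(child, leaves)
    else:
        leaves.append(tree)


def tree_map(fn: Callable[[Any], Any], tree: Any) -> Any:
    """Apply fn to every leaf and rebuild a container of the same shape."""
    if isinstance(tree, tuple):
        return tuple(tree_map(fn, c) for c in tree)
    if isinstance(tree, list):
        return [tree_map(fn, c) for c in tree]
    if isinstance(tree, dict):
        return {k: tree_map(fn, v) for k, v in tree.items()}
    return fn(tree)


def tree_map_(fn: Callable[[Any], Any], tree: Any) -> Any:
    """Apply fn to every leaf for its side effects only.

    No new containers are built; the input tree is returned as-is
    (an assumption of this sketch).
    """
    leaves: List[Any] = []
    _leaves_into(tree, leaves)
    for leaf in leaves:
        fn(leaf)
    return tree


# Usage: collecting leaves via a side effect, where the mapped result
# would have been thrown away anyway.
seen: List[Any] = []
tree = ((1, 2), {"a": [3]})
tree_map_(seen.append, tree)
assert seen == [1, 2, 3]
```

When the caller ignores the return value, the rebuilt tuples, lists, and dicts in `tree_map` are pure overhead, which is the cost the in-place variant avoids.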

This was referenced Oct 30, 2023

[pytree] Avoid constructing intermediate lists in tree_{flatten,leaves} #112391

Closed

[pytree] Remove LeafSpec construction cost in tree_flatten #112392

Closed

Use pytree.arg_tree_leaves everywhere #112394

Closed

pytorchbot added the open source label Oct 30, 2023

peterbell10 added the module: pytree label Oct 30, 2023

lezcano approved these changes Oct 30, 2023

peterbell10 marked this pull request as ready for review October 30, 2023 17:52

This was referenced Oct 30, 2023

Use pytree.tree_map_ everywhere #112417

Closed

[FakeTensor] Reuse flat_args throughout FakeTensorMode.dispatch #112418

Closed

peterbell10 mentioned this pull request Oct 31, 2023

[fx] Cache translation_validation_enabled on ShapeEnv #112493

Closed

pytorchmergebot added the Merged label Oct 31, 2023

pytorchmergebot closed this in 046c0c6 Oct 31, 2023

pytorchmergebot pushed a commit that referenced this pull request Oct 31, 2023

Use pytree.arg_tree_leaves everywhere (#112394)

66c32d0

Pull Request resolved: #112394
Approved by: https://github.com/lezcano
ghstack dependencies: #112391, #112392, #112393

facebook-github-bot deleted the gh/peterbell10/648/head branch November 4, 2023 14:26
