Conversation

@peterbell10 peterbell10 commented Oct 30, 2023

Stack from ghstack (oldest at bottom):

This function repeatedly flattens and unflattens the `args, kwargs` pair so we
get a quite significant perf improvement from saving the `flat_args` and
operating directly on those. I see a 15% improvement in dispatch for
`empty_strided`.

[ghstack-poisoned]
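
A minimal sketch of the flatten-once pattern described above, using torch.utils._pytree; the dispatch helper and the per-leaf conversion are illustrative assumptions, not the actual FakeTensorMode.dispatch code:

import torch
import torch.utils._pytree as pytree

def dispatch(func, args, kwargs):
    # Flatten (args, kwargs) exactly once, keeping the spec needed to
    # rebuild the original structure later.
    flat_args, args_spec = pytree.tree_flatten((args, kwargs))

    # Helpers now operate on the flat list directly instead of each one
    # re-flattening, e.g. converting every tensor leaf in a single pass.
    flat_args = [a.to("meta") if isinstance(a, torch.Tensor) else a
                 for a in flat_args]

    # Unflatten only at the point the structured form is actually needed.
    args, kwargs = pytree.tree_unflatten(flat_args, args_spec)
    return func(*args, **kwargs)
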
pytorch-bot bot commented Oct 30, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112418

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2cafa24 with merge base 29f3d39:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@peterbell10 peterbell10 changed the title from "Reuse flat_args throughout FakeTensorMode.dispatch" to "[FakeTensor] Reuse flat_args throughout FakeTensorMode.dispatch" on Oct 31, 2023
@peterbell10 peterbell10 marked this pull request as ready for review October 31, 2023 15:53
@peterbell10 peterbell10 requested a review from lezcano October 31, 2023 15:53
@peterbell10 peterbell10 added the `topic: not user facing` label Oct 31, 2023
@lezcano lezcano left a comment


Two optional nits


-    def wrap_meta_outputs_with_default_device_logic(self, r, func, args, kwargs):
-        wrap = self.gen_wrap_fn(func, args, kwargs)
+    def wrap_meta_outputs_with_default_device_logic(self, r, func, flat_args, kwargs):

nit: perhaps just pass `device` rather than passing `flat_args` and `kwargs`. I found this a bit confusing.
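
A sketch of what this suggestion might look like; the name `common_device` and the wrapping body are hypothetical, and the real FakeTensor wrapping logic is more involved:

import torch
import torch.utils._pytree as pytree

def wrap_outputs(r, common_device: torch.device):
    # Taking the device directly makes the dependency explicit instead of
    # threading flat_args and kwargs through just to re-derive it.
    def wrap(t):
        if isinstance(t, torch.Tensor) and t.device.type == "meta":
            return torch.empty_like(t, device=common_device)
        return t
    return pytree.tree_map(wrap, r)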


-        return tree_map(map_out, r)
+        flat_out = [map_out(o) for o in flat_out]
+        return pytree.tree_unflatten(flat_out, out_spec)

Here perhaps we just want to `tree_map` all the transformations on `r` rather than unpacking and packing again?
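
For reference, a sketch contrasting the two alternatives discussed here (illustrative only):

import torch.utils._pytree as pytree

def wrap_via_tree_map(r, map_out):
    # The suggestion: map over the structured output directly, with no
    # explicit flatten/unflatten round trip.
    return pytree.tree_map(map_out, r)

def wrap_via_flat_list(r, map_out):
    # What the diff does: flatten once, transform the flat list, then
    # rebuild the original structure from the saved spec.
    flat_out, out_spec = pytree.tree_flatten(r)
    flat_out = [map_out(o) for o in flat_out]
    return pytree.tree_unflatten(flat_out, out_spec)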

…patch"

This function repeatedly flattens and unflattens the `args, kwargs` pair so we
get a quite significant perf improvement from saving the `flat_args` and
operating directly on those. I see a 15% improvement in dispatch for
`empty_strided`.

[ghstack-poisoned]
…patch"

This function repeatedly flattens and unflattens the `args, kwargs` pair so we
get a quite significant perf improvement from saving the `flat_args` and
operating directly on those. I see a 15% improvement in dispatch for
`empty_strided`.

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Nov 1, 2023
`ShapeEnv` has tons of functionality that is conditioned on this
`translation_validation_enabled()` check, to the point where 8% of
time in `empty_strided` is spent just in that function.

However, it doesn't really make sense for the value of
`translation_validation_enabled()` to change throughout the life of a `ShapeEnv`
so we might as well run the check once and store it in the `ShapeEnv`.
Pull Request resolved: #112493
Approved by: https://github.com/lezcano
ghstack dependencies: #112418
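
A minimal sketch of the caching idea from #112493; the stub below stands in for the real translation_validation_enabled() check, and the method name is hypothetical:

def translation_validation_enabled() -> bool:
    # Stand-in for the real check, which consults configuration and was
    # measured at ~8% of empty_strided dispatch time when called per use.
    return False

class ShapeEnv:
    def __init__(self) -> None:
        # The value cannot change over the life of a ShapeEnv, so evaluate
        # the check once and cache the result.
        self._translation_validation_enabled = translation_validation_enabled()

    def _maybe_validate(self) -> None:
        # Hot paths read a cached attribute instead of calling the
        # function every time.
        if self._translation_validation_enabled:
            pass  # run translation validation
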
@facebook-github-bot facebook-github-bot deleted the gh/peterbell10/651/head branch November 5, 2023 15:26
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Nov 7, 2023
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023
Skylion007 pushed a commit to Skylion007/pytorch that referenced this pull request Nov 14, 2023