[AOTAutograd] Use _set_grad_enabled instead of no_grad #128183
Conversation
This saves ~1us of overhead from each inductor graph call.
```diff
 t._dynamo_weak_dynamic_indices = o.dynamic_dims.copy()
 if runtime_metadata.grad_enabled_mutation is not None:
-    torch.set_grad_enabled(runtime_metadata.grad_enabled_mutation)
+    torch._C._set_grad_enabled(runtime_metadata.grad_enabled_mutation)
```
Is there really that much overhead in calling `torch.set_grad_enabled` vs. calling the C API directly?
```python
In [1]: import torch
   ...: %timeit torch.set_grad_enabled(False)
   ...: %timeit torch._C._set_grad_enabled(False)
536 ns ± 3.02 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
217 ns ± 1.28 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```
It's 2.5x slower, mostly because `torch.set_grad_enabled` is a context-manager object that pretends to be a normal function.
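For context, a simplified sketch of where that extra time goes — this mirrors the class-based structure of `torch.set_grad_enabled`, but is not the exact source:

```python
import torch

# Simplified sketch (not the exact PyTorch source): torch.set_grad_enabled
# is a class whose __init__ flips grad mode eagerly, so it works both as a
# plain call and as a context manager.
class SetGradEnabledSketch:
    def __init__(self, mode: bool) -> None:
        self.prev = torch.is_grad_enabled()  # saved for context-manager use
        self.mode = mode
        torch._C._set_grad_enabled(mode)     # the actual state flip

    def __enter__(self) -> None:
        torch._C._set_grad_enabled(self.mode)

    def __exit__(self, *exc) -> None:
        torch._C._set_grad_enabled(self.prev)

# "Function-style" use still allocates the object and runs __init__,
# which is where the extra ~300 ns per call goes:
SetGradEnabledSketch(False)

# The C binding does only the state flip:
torch._C._set_grad_enabled(True)
```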
the wonders of python
`gen_alias_from_base` spends ~0.5 us in this import statement, which is called for each view in the graph output. Pull Request resolved: #128184 Approved by: https://github.com/lezcano ghstack dependencies: #128183
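The cost is easy to reproduce with any function-local import; the module below is a placeholder, not the import the PR actually moved:

```python
import timeit

def with_local_import():
    # Even for an already-imported module, this re-runs the sys.modules
    # lookup and local name binding on every call.
    import functools  # placeholder module, not the one the PR moved
    return functools

import functools  # hoisted: resolved once at module import time

def with_hoisted_import():
    return functools

print(timeit.timeit(with_local_import, number=1_000_000))   # slower
print(timeit.timeit(with_hoisted_import, number=1_000_000)) # faster
```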
Going through the dispatcher + pybind11 + torch.ops adds about 2 us overhead per call compared to `PyArgParser`. Note that views of inputs are reconstructed by AOTAutograd before being returned to the python code, so dispatching for autograd's sake shouldn't be required here. Pull Request resolved: #128185 Approved by: https://github.com/lezcano ghstack dependencies: #128183, #128184
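A rough way to observe that gap, using `as_strided` as an example op (an assumption for illustration, not necessarily the op the PR changed; timings are machine-dependent):

```python
import torch

t = torch.randn(4, 4)
size, stride = (2, 2), (4, 1)

# torch.ops path: goes through pybind11 plus a full dispatcher round trip.
torch.ops.aten.as_strided(t, size, stride)

# Tensor-method path: bound via the PythonArgParser, skipping the
# torch.ops Python machinery.
t.as_strided(size, stride)

# In IPython, compare with:
#   %timeit torch.ops.aten.as_strided(t, size, stride)
#   %timeit t.as_strided(size, stride)
```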
This saves ~1us of overhead from each inductor graph call. Pull Request resolved: pytorch#128183 Approved by: https://github.com/lezcano
… from C++ (pytorch#128187) Marginal overhead reduction when calling through the `torch.ops` API. Pull Request resolved: pytorch#128187 Approved by: https://github.com/lezcano ghstack dependencies: pytorch#128183, pytorch#128184, pytorch#128185