[dynamo] "TorchDynamo Cache Lookup" event: use C++ api #108436
Conversation
WIP [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108436
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 1e296f3 with merge base db63bf3.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
torch/csrc/dynamo/cpp_shim.h
Outdated
```c
typedef struct _PytorchRecordFunctionState {
  void* guard;
} _PytorchRecordFunctionState;
```
Can you just declare the struct here and forward-declare the guard part in cpp_shim.cpp?

```c
struct _PytorchRecordFunctionState;
```
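Concretely, the suggestion amounts to a header along these lines — a minimal sketch, not the PR's exact code; the function names and signatures here are illustrative assumptions:

```cpp
// cpp_shim.h (sketch) -- the struct is only forward-declared, so the
// header stays free of C++ types and callers treat the state as opaque.
struct _PytorchRecordFunctionState;
typedef struct _PytorchRecordFunctionState _PytorchRecordFunctionState;

// Hypothetical signatures: enter returns an opaque handle, exit consumes it.
_PytorchRecordFunctionState* _pytorch_record_function_enter(const char* name);
void _pytorch_record_function_exit(_PytorchRecordFunctionState* state);
```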
@anijain2305 I had to change the return type to a `_PytorchRecordFunctionState*`. Could you do a sanity check?
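The implementation side would then look roughly like this — a sketch assuming the guard is an `at::RecordFunction`; the actual PR code may differ in detail:

```cpp
// cpp_shim.cpp (sketch) -- the full definition lives here, so the header
// never exposes the guard member.
#include <ATen/record_function.h>

struct _PytorchRecordFunctionState {
  at::RecordFunction guard{at::RecordScope::FUNCTION};
};

_PytorchRecordFunctionState* _pytorch_record_function_enter(const char* name) {
  auto* state = new _PytorchRecordFunctionState();
  state->guard.before(name);  // starts the profiler event
  return state;
}

void _pytorch_record_function_exit(_PytorchRecordFunctionState* state) {
  delete state;  // destroying the guard ends the profiler event
}
```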
LGTM. Just a minor comment on the shim wrapper to see if `void* guard` can be avoided in the header file.
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If your changes are user facing and intended to be a part of release notes, please use a label starting with `release notes:`. If not, please add the `topic: not user facing` label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "topic: not user facing"`. For more information, see the PyTorch AutoLabel Bot wiki. Details for Dev Infra team: Raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
**Background**: "TorchDynamo Cache Lookup" events appear in traces to indicate a dynamo cache lookup; it's useful to check when cache lookups are taking a long time. To add a profiler event, one can use the `torch.profiler.record_function` context manager, or the C++ equivalent. Previously, the Python version was used: when the profiler was enabled, callbacks for `record_function_enter` and `record_function_exit` were registered, and those were called before and after every cache lookup.

**This PR**: Instead of calling the Python bindings for `torch.profiler.record_function`, directly call the C++ implementation. This simplifies much of the C/C++ binding code. It also improves performance: previously there was substantial overhead in the "TorchDynamo Cache Lookup" event, which made the event artificially long. After this change the events appear shorter, because there is less overhead in starting and stopping the event; in other words, the profiler no longer distorts the results as much. The sketch below illustrates the resulting call pattern.
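For illustration, the cache-lookup call site wraps the lookup with the shim directly. This is a sketch, not the PR's exact code: `CacheEntry`, `ExtraState`, and `lookup_cache` are hypothetical stand-ins for dynamo internals.

```cpp
// Sketch: wrap dynamo's cache lookup in a C++-level profiler event,
// replacing the previous Python record_function_enter/exit callbacks.
static CacheEntry* profiled_lookup(ExtraState* extra, PyObject* frame) {
  _PytorchRecordFunctionState* rf =
      _pytorch_record_function_enter("TorchDynamo Cache Lookup");
  CacheEntry* cached = lookup_cache(extra, frame);
  _pytorch_record_function_exit(rf);
  return cached;
}
```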
**Performance results**: I ran the script below on a cpu-only 1.6 GHz machine. I report the median time (from 100 measurements) of a "TorchDynamo Cache Lookup" event before and after this PR. I think it is reasonable to attribute the difference to the reduction in overhead.
Benchmarking script
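The script, reproduced here with an explicit `import torch` added so it runs standalone:

```python
import torch

def fn(x, y):
    return (x * y).relu()

a, b = [torch.rand((4, 4), requires_grad=True) for _ in range(2)]

opt_fn = torch.compile(fn)

# Warm up so the compiled artifact is cached; subsequent calls hit the
# dynamo cache lookup that the profiler event measures.
opt_fn(a, b)
opt_fn(a, b)

with torch.profiler.profile() as prof:
    opt_fn(a, b)
```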
Median before PR: 198-228 us (median of 100, measured 5 times)
Median after PR: 27 us
cc @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov