[Profiler] Add speedup estimate for FP32 pattern and Extra CUDA Copy Pattern #81501
Conversation
✅ No failures (0 pending) as of commit 364e9a0. Looks good so far! There are no failures yet. This comment was automatically generated by Dr. CI; please report bugs/suggestions to the (internal) Dr. CI Users group.
@davidchencsl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Overall looks good. The only thing I would say is that `report_all_anti_patterns` should take `should_benchmark: bool = False` as an argument and plumb it through. Benchmarking can add a lot of time to the analysis, so we want users to opt into it. (At some point TorchTidy might be sophisticated enough to pick an appropriate subset to benchmark, but that's a long way off.)
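For illustration, here is a minimal sketch of how that opt-in flag could be plumbed through. The pattern class and its methods below are simplified stand-ins, not the actual torch.profiler internals:

```python
# Illustrative sketch only: the pattern class, its method names, and the
# reporting logic are simplified assumptions, not the real implementation.
from typing import List


class ExtraCUDACopyPattern:  # hypothetical stand-in for the real pattern class
    def __init__(self, prof):
        self.prof = prof

    def matched_events(self) -> List[object]:
        # The real implementation walks the profiler's event tree.
        return []

    def benchmark(self, events) -> float:
        # The real implementation times a small baseline op on this machine,
        # so the speedup estimate reflects local performance.
        return 1.0

    def report(self, events) -> str:
        return f"Found {len(events)} extra CUDA copies"


def report_all_anti_patterns(prof, should_benchmark: bool = False) -> None:
    """Match anti-patterns; benchmarking is opt-in because it adds analysis time."""
    for pattern in (ExtraCUDACopyPattern(prof),):
        events = pattern.matched_events()
        if not events:
            continue
        message = pattern.report(events)
        if should_benchmark:
            message += f" (estimated speedup: {pattern.benchmark(events):.2f}x)"
        print(message)
```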
LGTM
@pytorchbot merge

@pytorchbot successfully started a merge job. Check the current status here.

Hey @davidchencsl.
…Pattern (#81501)

Summary: The main idea is that we can run some baseline benchmarks after we are done matching the events. This gives us the ability to accurately measure the speedup, since system performance varies from machine to machine.

Pull Request resolved: #81501
Approved by: https://github.com/robieta
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/64c6387c0ff82d49a5bfdcae579b522ae830c2c8
Test plan from GitHub: I did some manual testing on all the models in torchbench, and added a simple test in test_profiler.py.
Reviewed By: robieta
Differential Revision: D37894566
Pulled By: davidchencsl
fbshipit-source-id: 3e7adcf9b647d02cfad28772cf72fe08da2c6f93
Stack from ghstack (oldest at bottom):
Summary: The main idea is that we can run some baseline benchmarks after we are done matching the events. This gives us the ability to accurately measure the speedup, since system performance varies from machine to machine.
Test Plan: I did some manual testing on all the models in torchbench, and added a simple test in test_profiler.py.
Differential Revision: D37894566
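As a rough illustration of the intended usage (a sketch only: the import path `torch.profiler._pattern_matcher` and the `should_benchmark` keyword are assumptions based on this PR's discussion, not a documented public API):

```python
# Hedged usage sketch. The _pattern_matcher module path and the
# should_benchmark keyword are assumptions from this PR's discussion.
import torch
from torch.profiler import profile, ProfilerActivity
from torch.profiler._pattern_matcher import report_all_anti_patterns  # assumed module path

model = torch.nn.Linear(512, 512).cuda()        # requires a CUDA device
inputs = torch.randn(64, 512, device="cuda")

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    model(inputs)

# With should_benchmark=True, small baseline benchmarks run on this machine
# after the events are matched, so the reported speedup estimate reflects
# local hardware rather than a hard-coded constant.
report_all_anti_patterns(prof, should_benchmark=True)
```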