Adding adaptive timing method on torch Timer #44219 #44288
Conversation
💊 CI failures summary and remediations
As of commit 980d5ee (more details on the Dr. CI page):
🕵️ 8 new failures recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
Codecov Report
@@ Coverage Diff @@
## master #44288 +/- ##
==========================================
- Coverage 69.24% 69.22% -0.02%
==========================================
Files 381 381
Lines 47573 47598 +25
==========================================
+ Hits 32943 32951 +8
- Misses 14630 14647 +17
Continue to review full report at Codecov.
@bitfort has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Can you add unit tests? https://github.com/pytorch/pytorch/blob/master/test/test_utils.py#L606
It would also be good to run some tests on workloads of various sizes and variances to see how consistent the results of adaptive_autorange are.
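To illustrate the kind of consistency check suggested above, here is a pure-Python sketch; `measure` is a hypothetical stand-in for calling `adaptive_autorange` on a real torch workload:

```python
import time
from statistics import median

def measure(work, repeats=5, number=200):
    """Time `work` `number` times per trial, for `repeats` trials; return trial medians."""
    medians = []
    for _ in range(repeats):
        samples = []
        for _ in range(number):
            t0 = time.perf_counter()
            work()
            samples.append(time.perf_counter() - t0)
        medians.append(median(samples))
    return medians

# Workloads of increasing size: the relative spread of the medians
# across repeats indicates how consistent the measurement is.
for size in (10, 100, 1000):
    meds = measure(lambda: sum(range(size)))
    spread = (max(meds) - min(meds)) / median(meds)
    print(size, round(spread, 3))
```

If the spread stays small across sizes, repeated calls agree with each other; a large spread on small workloads usually signals overhead-dominated timing.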
    raise NotImplementedError("See `Timer.blocked_autorange.`")

def blocked_autorange(self, callback=None, min_run_time=0.2):
def adaptive_autorange(self, threshold=0.1, max_run_time=10.0, callback=None,
This replicates a lot of blocked_autorange. Is it possible to factor some of the shared logic?
Refactored the logic into a common function.
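A sketch of how such a refactor can look, with hypothetical names in pure Python (the actual PyTorch helper differs): both variants share one measurement loop and differ only in their stopping predicate.

```python
import time
from statistics import median

def _measure_loop(work, should_stop, number=100):
    """Shared loop: collect per-call timings until should_stop(times) is true."""
    times = []
    while not should_stop(times):
        t0 = time.perf_counter()
        for _ in range(number):
            work()
        times.append((time.perf_counter() - t0) / number)
    return times

def blocked_autorange_like(work, min_run_time=0.2):
    # Stop once a fixed wall-clock budget has been spent (at least one sample).
    start = time.perf_counter()
    return _measure_loop(work, lambda ts: ts and time.perf_counter() - start >= min_run_time)

def adaptive_autorange_like(work, threshold=0.1, max_run_time=10.0, min_samples=4):
    # Stop once IQR / median drops below `threshold`, or on timeout.
    start = time.perf_counter()
    def should_stop(ts):
        if time.perf_counter() - start >= max_run_time:
            return True
        if len(ts) < min_samples:
            return False
        s = sorted(ts)
        iqr = s[(3 * len(s)) // 4] - s[len(s) // 4]
        return iqr / median(s) < threshold
    return _measure_loop(work, should_stop)
```

Here the fixed-budget and adaptive variants are thin wrappers around one loop, which is the shape of deduplication the review asked for.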
times = []
rounds = 0
with common.set_torch_threads(self._num_threads):
    while time.time() - start_time < max_run_time:
This will also include time spent in callback, _construct_measurement, etc.
Oh, I see, I misunderstood. I read this min_time as "get back to me with an answer within this time" as opposed to "collect this much signal".
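The distinction discussed above can be made concrete (pure-Python sketch, hypothetical helper name): budgeting only the time spent measuring, rather than wall-clock time that also covers callbacks and bookkeeping.

```python
import time

def collect_signal(work, min_run_time=0.2, callback=None):
    """Collect at least `min_run_time` seconds of *measured* time."""
    measured = 0.0
    times = []
    while measured < min_run_time:
        t0 = time.perf_counter()
        work()
        dt = time.perf_counter() - t0
        times.append(dt)
        measured += dt          # only measurement time counts toward the budget
        if callback is not None:
            callback(dt)        # callback time is deliberately excluded
    return times
```

Under this reading, min_run_time means "collect this much signal"; a `time.time() - start_time` loop condition instead answers "get back to me within this time", since it also counts callback and bookkeeping overhead.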
Added unit tests to validate this.
@bitfort has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thanks for the changes. This generally looks reasonable (modulo a few nits); however I'd still like to see some examples of how it performs in practice. (We can sync if you need help with that.)
test/test_utils.py
Outdated
)
small = timer.adaptive_autorange(min_run_time=0.1).median
timer = benchmark_utils.Timer(
    stmt="torch.sum(torch.ones((100,100)))",
Could you make this (500, 100)? Despite being 100x apart, this one only takes 1.5x as much time due to the (10, 10) case being overhead bound. And that should still leave plenty of space between medium and large.
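The "overhead bound" effect is easy to reproduce with stdlib timeit (machine-dependent numbers; this is a pure-Python analogue, not the torch workload from the test):

```python
import timeit

# A 100x larger workload takes far less than 100x the time when the
# small case is dominated by fixed per-call overhead.
t_small = timeit.timeit("sum(range(10))", number=10000)
t_large = timeit.timeit("sum(range(1000))", number=10000)
print(f"ratio: {t_large / t_small:.1f}")  # well under 100 on typical machines
```

This is why the reviewer asks for a larger tensor: widening the gap between the medium and small cases keeps the test's size ordering unambiguous.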
done.
        break
return times

def adaptive_autorange(self, threshold=0.1, max_run_time=None, callback=None, min_run_time=0.1):
max_run_time should have some reasonable default (e.g. 10 seconds) so that the default behavior can't lead to infinite loops.
done.
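The point above in sketch form (hypothetical helper, pure Python): with a finite max_run_time default, even a convergence test that never passes terminates.

```python
import time

def run_until(converged, max_run_time=10.0):
    """Loop until `converged(rounds)` is true or the time budget expires."""
    start = time.perf_counter()
    rounds = 0
    while time.perf_counter() - start < max_run_time:
        rounds += 1
        if converged(rounds):
            break
    return rounds

# A criterion that can never be satisfied still exits via the guard.
rounds = run_until(lambda r: False, max_run_time=0.05)
print(rounds)
```

With `max_run_time=None` as the default, the same loop would spin forever whenever the threshold is unreachable, which is why a finite default like 10 seconds is the safer API.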
Example usage includes:
- Small size, so it selects 1000 measurements per run.
- Larger size, so it goes down to 10 runs per measurement.
- Going to a low threshold (times out at approximately iqr/mean = 0.08882).
- Setting a higher threshold (0.5), which results in many fewer runs.
- Trying 1000 times to see if we get a warning.
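The examples above hinge on the ratio of interquartile range to median; a minimal sketch of how the threshold decides when measurement stops (hypothetical helper, simple index-based quartiles):

```python
from statistics import median

def iqr_over_median(samples):
    """Relative spread: interquartile range divided by the median."""
    s = sorted(samples)
    n = len(s)
    return (s[(3 * n) // 4] - s[n // 4]) / median(s)

noisy = [1.0, 1.1, 0.9, 1.4, 0.7]
ratio = iqr_over_median(noisy)      # ≈ 0.2 for this sample
assert ratio > 0.1                  # strict threshold (0.1): keep measuring
assert ratio < 0.5                  # loose threshold (0.5): stop now
```

A looser threshold accepts a noisier sample sooner, which is why raising it from 0.1 to 0.5 results in many fewer runs.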
LGTM. Feel free to submit once you've fixed the lint/CI issues. I think you need to rebase/merge. Pro-tip: use viable/strict or fbcode/warm rather than master and you'll be less likely to hit unrelated breakages.
Congrats on your first PR to PyTorch.
@bitfort has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@bitfort has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Rebased and posted as #44607.
Summary: Fixes #44219. Rebasing #44288 and fixing the git history. This allows users to benchmark code without having to specify how long to run the benchmark. It runs the benchmark until the variance (IQR / median) is low enough that we can be confident in the measurement. Pull Request resolved: #44607 Test Plan: There are unit tests, and we manually tested using examples posted in git. Reviewed By: robieta Differential Revision: D23671208 Pulled By: bitfort fbshipit-source-id: d63184290b88b26fb81c2452e1ae701c7d513d12
Fixes #{issue number}
This adds adaptive timing for benchmarking, which automatically determines how many times to run the benchmark in order to get an accurate measurement.