
Conversation


@bitfort bitfort commented Sep 8, 2020

Fixes #{issue number}

This adds adaptive timing for benchmarking: it automatically determines how many times to run the benchmark in order to get an accurate measurement.
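
For reviewers, here is a minimal sketch of the control flow being added (illustrative only, not the actual timer.py implementation; measure_once is a hypothetical stand-in for timing one block of runs):

import time

def adaptive_autorange(measure_once, threshold=0.1,
                       min_run_time=0.1, max_run_time=10.0):
    # measure_once() returns the wall time for one block of runs.
    start = time.time()
    times = []
    while time.time() - start < max_run_time:
        times.append(measure_once())
        n = len(times)
        if sum(times) < min_run_time or n < 4:
            continue  # keep collecting signal
        s = sorted(times)
        iqr = s[(3 * n) // 4] - s[n // 4]
        if iqr / s[n // 2] < threshold:
            break  # measurement is stable enough
    return times

The loop stops either when the interquartile range is a small fraction of the median or when the time budget runs out.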

@bitfort bitfort requested a review from robieta September 8, 2020 02:27

dr-ci bot commented Sep 8, 2020

💊 CI failures summary and remediations

As of commit 980d5ee (more details on the Dr. CI page):


  • 10/11 failures possibly* introduced in this PR
    • 2/10 non-CircleCI failure(s)
  • 1/11 broken upstream at merge base 10dd25d on Sep 07 from 10:29am to 3:52pm PDT (2 commits; 10dd25d - 5d748e6)

🕵️ 8 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/8)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/ios_definitions.py 
Auto-merging .circleci/cimodel/data/simple/ios_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py 
Auto-merging .circleci/cimodel/data/dimensions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_definitions.py 
Auto-merging .circleci/cimodel/data/binary_build_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_data.py 
Auto-merging .circleci/cimodel/data/binary_build_data.py 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test (2/8)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Sep 11 03:08:46 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function ‘int main()’:\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:22: error: expected ‘;’ before ‘}’ token\n 2 | int main() { return 0 }\n | ^~\n | ;\n" }
Sep 11 03:08:46     raise RuntimeError(err_message) 
Sep 11 03:08:46 RuntimeError: test_type_hints failed! 
Sep 11 03:08:46  
Sep 11 03:08:46 real	39m24.378s 
Sep 11 03:08:46 user	59m41.455s 
Sep 11 03:08:46 sys	1m54.741s 
Sep 11 03:08:46 + cleanup 
Sep 11 03:08:46 + retcode=1 
Sep 11 03:08:46 + set +x 
Sep 11 03:08:46 =================== sccache compilation log =================== 
Sep 11 03:08:46 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function ‘int main()’:\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:22: error: expected ‘;’ before ‘}’ token\n    2 | int main() { return 0 }\n      |                      ^~\n      |                      ;\n" } 
Sep 11 03:08:46  
Sep 11 03:08:46 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Sep 11 03:08:46 Compile requests                 0 
Sep 11 03:08:46 Compile requests executed        0 
Sep 11 03:08:46 Cache hits                       0 
Sep 11 03:08:46 Cache misses                     0 
Sep 11 03:08:46 Cache timeouts                   0 
Sep 11 03:08:46 Cache read errors                0 
Sep 11 03:08:46 Forced recaches                  0 
Sep 11 03:08:46 Cache write errors               0 

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (3/8)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/ios_definitions.py 
Auto-merging .circleci/cimodel/data/simple/ios_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py 
Auto-merging .circleci/cimodel/data/dimensions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_definitions.py 
Auto-merging .circleci/cimodel/data/binary_build_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_data.py 
Auto-merging .circleci/cimodel/data/binary_build_data.py 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_macos_10_13_py3_test (4/8)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Sep 10 19:15:36 NameError: name 'loop' is not defined
Sep 10 19:15:36 TypeError: 'bool' object is not callable 
Sep 10 19:15:36  
Sep 10 19:15:36 ====================================================================== 
Sep 10 19:15:36 ERROR [0.007s]: test_timer (__main__.TestBenchmarkUtils) 
Sep 10 19:15:36 ---------------------------------------------------------------------- 
Sep 10 19:15:36 Traceback (most recent call last): 
Sep 10 19:15:36   File "test_utils.py", line 611, in test_timer 
Sep 10 19:15:36     median = timer.blocked_autorange(min_run_time=0.1).median 
Sep 10 19:15:36   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/utils/_benchmark/utils/timer.py", line 132, in blocked_autorange 
Sep 10 19:15:36     times = self._threaded_measurement_loop(loop, min_run_time=min_run_time, max_run_time=None, callback=callback) 
Sep 10 19:15:36 NameError: name 'loop' is not defined 
Sep 10 19:15:36  
Sep 10 19:15:36 ---------------------------------------------------------------------- 
Sep 10 19:15:36 Ran 35 tests in 16.198s 
Sep 10 19:15:36  
Sep 10 19:15:36 FAILED (errors=2, skipped=3) 
Sep 10 19:15:36  
Sep 10 19:15:36 Generating XML reports... 
Sep 10 19:15:36 Generated XML report: test-reports/python-unittest/TEST-TestBenchmarkUtils-20200910191520.xml 
Sep 10 19:15:36 Generated XML report: test-reports/python-unittest/TEST-TestBottleneck-20200910191520.xml 
Sep 10 19:15:36 Generated XML report: test-reports/python-unittest/TEST-TestCheckpoint-20200910191520.xml 

See CircleCI build pytorch_linux_bionic_py3_6_clang9_test (5/8)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Sep 11 02:28:15 torch/utils/_benchmark/utils/timer.py:132: error: Argument "max_run_time" to "_threaded_measurement_loop" of "Timer" has incompatible type "None"; expected "float" [arg-type]
Sep 11 02:28:15 ====================================================================== 
Sep 11 02:28:15 FAIL [58.507s]: test_run_mypy (__main__.TestTypeHints) 
Sep 11 02:28:15 ---------------------------------------------------------------------- 
Sep 11 02:28:15 Traceback (most recent call last): 
Sep 11 02:28:15   File "test_type_hints.py", line 218, in test_run_mypy 
Sep 11 02:28:15     self.fail("mypy failed: {} {}".format(stdout, stderr)) 
Sep 11 02:28:15 AssertionError: mypy failed: torch/utils/_benchmark/utils/timer.py:90: error: Name 'number' is not defined  [name-defined] 
Sep 11 02:28:15 torch/utils/_benchmark/utils/timer.py:90: error: Name 'time_taken' is not defined  [name-defined] 
Sep 11 02:28:15 torch/utils/_benchmark/utils/timer.py:132: error: Missing positional argument "stop_hook" in call to "_threaded_measurement_loop" of "Timer"  [call-arg] 
Sep 11 02:28:15 torch/utils/_benchmark/utils/timer.py:132: error: Name 'loop' is not defined  [name-defined] 
Sep 11 02:28:15 torch/utils/_benchmark/utils/timer.py:132: error: Argument "max_run_time" to "_threaded_measurement_loop" of "Timer" has incompatible type "None"; expected "float"  [arg-type] 
Sep 11 02:28:15 Found 5 errors in 1 file (checked 1063 source files) 
Sep 11 02:28:15   
Sep 11 02:28:15  
Sep 11 02:28:15 ---------------------------------------------------------------------- 
Sep 11 02:28:15 Ran 4 tests in 95.669s 
Sep 11 02:28:15  
Sep 11 02:28:15 FAILED (failures=1) 
Sep 11 02:28:15  
Sep 11 02:28:15 Generating XML reports... 
Sep 11 02:28:15 Generated XML report: test-reports/python-unittest/TEST-TestTypeHints-20200911022639.xml 

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (6/8)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Sep 11 02:19:43 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:11:3 in
Sep 11 02:19:43     #7 0x55e8e8daf7eb in PyEval_EvalCode /tmp/build/80754af9/python_1588903631989/work/Python/ceval.c:731 
Sep 11 02:19:43     #8 0x55e8e8e2fe73 in run_mod /tmp/build/80754af9/python_1588903631989/work/Python/pythonrun.c:1025 
Sep 11 02:19:43     #9 0x55e8e8e2ff0c in PyRun_StringFlags /tmp/build/80754af9/python_1588903631989/work/Python/pythonrun.c:949 
Sep 11 02:19:43     #10 0x55e8e8e2ff6e in PyRun_SimpleStringFlags /tmp/build/80754af9/python_1588903631989/work/Python/pythonrun.c:445 
Sep 11 02:19:43     #11 0x55e8e8e33d72 in run_command /tmp/build/80754af9/python_1588903631989/work/Modules/main.c:301 
Sep 11 02:19:43     #12 0x55e8e8e33d72 in Py_Main /tmp/build/80754af9/python_1588903631989/work/Modules/main.c:749 
Sep 11 02:19:43     #13 0x55e8e8cfdf2d in main /tmp/build/80754af9/python_1588903631989/work/Programs/python.c:69 
Sep 11 02:19:43     #14 0x7f7a346f683f in __libc_start_main /build/glibc-e6zv40/glibc-2.23/csu/../csu/libc-start.c:291 
Sep 11 02:19:43     #15 0x55e8e8ddd27e in _start /home/rdonnelly/mc/conda-bld/compilers_linux-64_1534865402226/work/.build/src/glibc-2.12.2/csu/../sysdeps/x86_64/elf/start.S:103 
Sep 11 02:19:43  
Sep 11 02:19:43 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:11:3 in  
Sep 11 02:19:43 + retcode=1 
Sep 11 02:19:43 + set -e 
Sep 11 02:19:43 + return 1 
Sep 11 02:19:43 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX-* ]] 
Sep 11 02:19:43 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 == *-NO_AVX2-* ]] 
Sep 11 02:19:43 + '[' -n https://github.com/pytorch/pytorch/pull/44288 ']' 
Sep 11 02:19:43 + [[ pytorch-linux-xenial-py3-clang5-asan-test2 != *coverage* ]] 
Sep 11 02:19:43 ++ mktemp 
Sep 11 02:19:43 + DETERMINE_FROM=/tmp/tmp.b93J3jWgqf 
Sep 11 02:19:43 + file_diff_from_base /tmp/tmp.b93J3jWgqf 

See CircleCI build pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build (7/8)

Step: "(Optional) Merge target branch" (full log | diagnosis details | 🔁 rerun)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/config.yml 
Auto-merging .circleci/config.yml 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/ios_definitions.py 
Auto-merging .circleci/cimodel/data/simple/ios_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py 
Auto-merging .circleci/cimodel/data/dimensions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_definitions.py 
Auto-merging .circleci/cimodel/data/binary_build_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_data.py 
Auto-merging .circleci/cimodel/data/binary_build_data.py 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (8/8)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

NameError: name 'loop' is not defined
TypeError: 'bool' object is not callable 
 
====================================================================== 
ERROR [0.305s]: test_timer (__main__.TestBenchmarkUtils) 
---------------------------------------------------------------------- 
Traceback (most recent call last): 
  File "test_utils.py", line 611, in test_timer 
    median = timer.blocked_autorange(min_run_time=0.1).median 
  File "C:\Users\circleci\project\build\win_tmp\build\torch\utils\_benchmark\utils\timer.py", line 132, in blocked_autorange 
    times = self._threaded_measurement_loop(loop, min_run_time=min_run_time, max_run_time=None, callback=callback) 
NameError: name 'loop' is not defined 
 
---------------------------------------------------------------------- 
Ran 35 tests in 30.133s 
 
FAILED (errors=2, skipped=3) 
 
Generating XML reports... 
Generated XML report: test-reports\python-unittest\TEST-TestBenchmarkUtils-20200911021930.xml 
Generated XML report: test-reports\python-unittest\TEST-TestBottleneck-20200911021930.xml 
Generated XML report: test-reports\python-unittest\TEST-TestCheckpoint-20200911021930.xml 

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch

Since your merge base is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


Extra GitHub checks: 1 failed


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI. Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 20 times.


codecov bot commented Sep 8, 2020

Codecov Report

Merging #44288 into master will decrease coverage by 0.01%.
The diff coverage is 43.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #44288      +/-   ##
==========================================
- Coverage   69.24%   69.22%   -0.02%     
==========================================
  Files         381      381              
  Lines       47573    47598      +25     
==========================================
+ Hits        32943    32951       +8     
- Misses      14630    14647      +17     
Impacted Files                             Coverage          Δ
torch/utils/_benchmark/utils/timer.py     67.53% <37.03%>   (-17.66%) ⬇️
torch/utils/_benchmark/utils/common.py    77.68% <100.00%>  (+0.37%)  ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 10dd25d...980d5ee. Read the comment docs.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@bitfort has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@robieta robieta left a comment


Can you add unit tests? https://github.com/pytorch/pytorch/blob/master/test/test_utils.py#L606

It would also be good to run some tests on workloads of varying size and variance to see how consistent the results of adaptive_autorange are.
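
A test along those lines might look like the following (a hypothetical sketch; the exact import spelling, sizes, and assertion are assumptions, and the test actually added may differ):

import unittest

import torch.utils._benchmark as benchmark_utils

class TestAdaptiveAutorange(unittest.TestCase):
    def test_median_scales_with_workload(self):
        # Medians reported by adaptive_autorange should be ordered by workload size.
        small = benchmark_utils.Timer(
            stmt="torch.sum(torch.ones((10, 10)))",
        ).adaptive_autorange(min_run_time=0.1).median
        large = benchmark_utils.Timer(
            stmt="torch.sum(torch.ones((1000, 1000)))",
        ).adaptive_autorange(min_run_time=0.1).median
        self.assertLess(small, large)

if __name__ == "__main__":
    unittest.main()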

raise NotImplementedError("See `Timer.blocked_autorange.`")

def blocked_autorange(self, callback=None, min_run_time=0.2):
def adaptive_autorange(self, threshold=0.1, max_run_time=10.0, callback=None,

This replicates a lot of blocked_autorange. Is it possible to factor some of the shared logic?

Author

Refactored the logic into a common function.
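
For readers following the diff, the factored helper's shape can be partly inferred from the call site and the mypy output quoted in the Dr. CI comment above (stop_hook, min_run_time, max_run_time, and callback appear in the logs); the parameter names time_hook and number and the body below are an assumed sketch, not the actual timer.py code:

def _threaded_measurement_loop(self, number, time_hook, stop_hook,
                               min_run_time, max_run_time=None, callback=None):
    # Shared by blocked_autorange and adaptive_autorange.
    total_time = 0.0
    times = []
    with common.set_torch_threads(self._num_threads):
        while True:
            t = time_hook()          # run `number` iterations, return elapsed time
            times.append(t)
            total_time += t
            if callback:
                callback(number, t)
            if total_time < min_run_time:
                continue             # not enough signal collected yet
            if stop_hook(times):     # blocked: stop immediately; adaptive: IQR/median test
                break
            if max_run_time is not None and total_time > max_run_time:
                break
    return times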

times = []
rounds = 0
with common.set_torch_threads(self._num_threads):
while time.time() - start_time < max_run_time:

This will also include time spent in callback, _construct_measurement, etc.

Author

Oh, I see, I misunderstood. I read min_run_time as "get back to me with an answer within this time" as opposed to "collect this much signal".
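
Concretely, the distinction is between budgeting wall-clock elapsed time, which charges callback and bookkeeping overhead against the budget, and budgeting only the measured work. A toy illustration (do_work and on_measurement are hypothetical helpers):

import time

def collect(do_work, on_measurement, min_run_time=0.2):
    start = time.time()
    measured = 0.0
    times = []
    while measured < min_run_time:   # budget counts only the measured work
        t0 = time.time()
        do_work()                    # the benchmark body
        t = time.time() - t0
        times.append(t)
        measured += t
        on_measurement(times)        # excluded from `measured`, but still
                                     # included in time.time() - start
    return times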

Author

bitfort commented Sep 10, 2020

Added unit tests to validate this.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@bitfort has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@robieta robieta left a comment


Thanks for the changes. This generally looks reasonable (modulo a few nits); however, I'd still like to see some examples of how it performs in practice. (We can sync if you need help with that.)

)
small = timer.adaptive_autorange(min_run_time=0.1).median
timer = benchmark_utils.Timer(
stmt="torch.sum(torch.ones((100,100)))",

Could you make this (500, 100)? Despite being 100x apart, this one only takes 1.5x as much time due to the (10, 10) case being overhead bound. And that should still leave plenty of space between medium and large.

Author

done.
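
For context, the three timers under discussion end up roughly like this (a sketch; the large case's size is illustrative, not taken from the test):

import torch.utils._benchmark as benchmark_utils

# (10, 10) is overhead bound, so (100, 100) ran only ~1.5x slower despite
# having 100x the elements; (500, 100) restores a clear gap while staying
# well below the large case.
small = benchmark_utils.Timer(stmt="torch.sum(torch.ones((10, 10)))")
medium = benchmark_utils.Timer(stmt="torch.sum(torch.ones((500, 100)))")
large = benchmark_utils.Timer(stmt="torch.sum(torch.ones((2000, 2000)))")  # illustrative size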

break
return times

def adaptive_autorange(self, threshold=0.1, max_run_time=None, callback=None, min_run_time=0.1):

max_run_time should have some reasonable default (e.g. 10 seconds) so that the default behavior can't lead to infinite loops.

Author

done.
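
With that change, the signature shown earlier becomes bounded by default, roughly as follows (min_run_time taken from the earlier excerpt; the ellipsis elides the unchanged body):

def adaptive_autorange(self, threshold=0.1, max_run_time=10.0,
                       callback=None, min_run_time=0.1):
    # The measurement loop now exits once max_run_time elapses,
    # even if the IQR/median threshold is never reached.
    ...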

@bitfort bitfort requested a review from robieta September 11, 2020 00:24
Author

bitfort commented Sep 11, 2020

Example usage includes:

import torch; from torch.utils._benchmark import Timer;

Small workload, so it selects 1000 runs per measurement:

Timer(stmt='sum(x)', globals={'x': list(range(200))}).adaptive_autorange()
sum(x)
Median: 953.41 ns
IQR: 95.22 ns (911.68 to 1006.90)
569 measurements, 1000 runs per measurement, 1 thread

Larger workload, so it goes down to 10 runs per measurement:

Timer(stmt='sum(x)', globals={'x': list(range(20000))}).adaptive_autorange()
sum(x)
Median: 83.23 us
IQR: 8.27 us (81.38 to 89.65)
916 measurements, 10 runs per measurement, 1 thread

Using a low threshold, the run times out at approximately IQR/median = 0.08882:

Timer(stmt='sum(list(range(200000)))', num_threads=10).adaptive_autorange(threshold=0.01, max_run_time=30)
sum(list(range(200000)))
Median: 3.49 ms
IQR: 0.31 ms (3.34 to 3.65)
8418 measurements, 1 runs per measurement, 10 threads

Setting a higher threshold (0.5) results in many fewer runs:

Timer(stmt='sum(list(range(200000)))', num_threads=10).adaptive_autorange(threshold=0.5, max_run_time=30)
sum(list(range(200000)))
Median: 4.07 ms
IQR: 1.18 ms (3.41 to 4.59)
25 measurements, 1 runs per measurement, 10 threads
WARNING: Interquartile range is 28.9% of the median measurement.
This suggests significant environmental influence.

Try 1000 times to see if we get a warning:

sum([Timer(stmt='sum(list(range(200000)))', num_threads=10).adaptive_autorange(max_run_time=10).has_warnings for _ in range(1000)])
0
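
As a cross-check on the numbers above, the stop test is whether IQR/median falls below threshold (the helper below is an illustrative sketch, not the timer.py code):

def stable(times, threshold):
    # True once the interquartile range is a small fraction of the median.
    s = sorted(times)
    n = len(s)
    iqr = s[(3 * n) // 4] - s[n // 4]
    return iqr / s[n // 2] < threshold

# Third example: 0.31 / 3.49 ≈ 0.089, never below threshold=0.01, so the
# run stops only when max_run_time=30 is exhausted.
# Fourth example: 1.18 / 4.07 ≈ 0.29 < 0.5, so it stops after just
# 25 measurements (hence the 28.9%-of-median warning).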


@robieta robieta left a comment


LGTM. Feel free to submit once you've fixed the lint/CI issues. I think you need to rebase/merge. Pro-tip: use viable/strict or fbcode/warm rather than master and you'll be less likely to hit unrelated breakages.

Congrats on your first PR to PyTorch.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@bitfort has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@bitfort has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Author

bitfort commented Sep 13, 2020

Rebased and posted as #44607.

@bitfort bitfort closed this Sep 13, 2020
facebook-github-bot pushed a commit that referenced this pull request Sep 14, 2020
Summary:
Fixes #44219

Rebasing #44288 and fixing the git history.

This allows users to benchmark code without having to specify how long to run the benchmark. It runs the benchmark until the variance (IQR / median) is low enough that we can be confident in the measurement.

Pull Request resolved: #44607

Test Plan: There are unit tests, and we manually tested using the examples posted in the PR discussion.

Reviewed By: robieta

Differential Revision: D23671208

Pulled By: bitfort

fbshipit-source-id: d63184290b88b26fb81c2452e1ae701c7d513d12
@bitfort bitfort deleted the adaptive_timing_44219 branch September 14, 2020 04:24
xuzhao9 pushed a commit that referenced this pull request Sep 18, 2020
(Same commit summary as above.)
