
quant: switch observers to use min_max #42957

Closed
vkuzo wants to merge 6 commits into gh/vkuzo/123/base

Conversation

vkuzo (Contributor) commented on Aug 13, 2020

Summary:

Switches the observers to use the new min_max function, which computes min and
max in a single pass over the input. We see an approximately 45-50% speedup on
representative input shapes in the microbenchmarks for all observers except `HistogramObserver`.
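
As a rough illustration of the pattern (a sketch, not the observer code this PR touches): the helper names below are made up, and the public `torch.aminmax` op stands in for the fused `min_max` reduction described above.

```
import torch

def update_range_two_pass(x, min_val, max_val):
    # Old pattern: two separate full reductions over the input tensor.
    min_val = torch.min(min_val, torch.min(x))
    max_val = torch.max(max_val, torch.max(x))
    return min_val, max_val

def update_range_fused(x, min_val, max_val):
    # New pattern: one fused reduction returns both extremes, so the
    # input is read once instead of twice.
    x_min, x_max = torch.aminmax(x)
    return torch.min(min_val, x_min), torch.max(max_val, x_max)

x = torch.randn(128, 3, 32, 32)
min_val = torch.tensor(float("inf"))
max_val = torch.tensor(float("-inf"))
a = update_range_two_pass(x, min_val, max_val)
b = update_range_fused(x, min_val, max_val)
assert torch.equal(a[0], b[0]) and torch.equal(a[1], b[1])
```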

Test Plan:

CI for correctness

performance:

```
cd benchmarks/operator_benchmark
# repeat (before diff, after diff) x (cpu, cuda)
python -m pt.qobserver_test --tag_filter all --device cpu
# before, cpu:  https://our.intern.facebook.com/intern/paste/P138633280/
# before, cuda: https://our.intern.facebook.com/intern/paste/P138639473/
# after, cpu:   https://our.intern.facebook.com/intern/paste/P138635458/
# after, cuda:  https://our.intern.facebook.com/intern/paste/P138636344/
```
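
The CUDA leg of the sweep above is the same invocation with the device flag switched, e.g.:

```
python -m pt.qobserver_test --tag_filter all --device cuda
```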


Differential Revision: [D23093995](https://our.internmc.facebook.com/intern/diff/D23093995)

vkuzo added a commit that referenced this pull request Aug 13, 2020
dr-ci bot commented Aug 13, 2020

💊 CI failures summary and remediations

As of commit 6a60cbc (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


ci.pytorch.org: 1 failed



vkuzo added a commit that referenced this pull request Sep 2, 2020
vkuzo added a commit that referenced this pull request Sep 2, 2020
vkuzo added a commit that referenced this pull request Sep 5, 2020
codecov bot commented Sep 5, 2020

Codecov Report

Merging #42957 into gh/vkuzo/123/base will increase coverage by 0.03%.
The diff coverage is 67.54%.


```
@@                  Coverage Diff                  @@
##           gh/vkuzo/123/base   #42957      +/-   ##
=====================================================
+ Coverage              69.32%   69.35%   +0.03%
=====================================================
  Files                    381      381
  Lines                  47190    47323     +133
=====================================================
+ Hits                   32713    32822     +109
- Misses                 14477    14501      +24
```
| Impacted Files | Coverage Δ |
|---|---|
| torch/_classes.py | 87.50% <0.00%> (ø) |
| torch/jit/_fuser.py | 32.60% <0.00%> (ø) |
| torch/jit/_serialization.py | 85.71% <0.00%> (ø) |
| torch/jit/supported_ops.py | 0.00% <0.00%> (ø) |
| torch/jit/unsupported_tensor_ops.py | 0.00% <0.00%> (ø) |
| torch/nn/modules/_functions.py | 63.30% <0.00%> (ø) |
| torch/nn/parallel/distributed.py | 42.53% <ø> (ø) |
| torch/nn/qat/__init__.py | 100.00% <ø> (ø) |
| torch/nn/qat/modules/conv.py | 100.00% <ø> (ø) |
| torch/nn/qat/modules/linear.py | 100.00% <ø> (ø) |

... and 52 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 307f9e0...6a60cbc.

facebook-github-bot (Contributor) commented

This pull request has been merged in fd8e206.
