
quant: switch observers to use min_max #42957

Closed
vkuzo wants to merge 6 commits into gh/vkuzo/123/base

Conversation

vkuzo (Contributor) commented on Aug 13, 2020

Summary:

Switches the observers to use the new min_max function, which computes min and
max in a single pass over the input. We see an approximately 45-50% speedup on
representative input shapes in the microbenchmarks for all observers except `HistogramObserver`.
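
As a rough illustration of the pattern (a sketch, not the observer code this PR touches): the helper names below are made up, and the public `torch.aminmax` op stands in for the fused `min_max` reduction described above.

```
import torch

def update_range_two_pass(x, min_val, max_val):
    # Old pattern: two separate full reductions over the input tensor.
    min_val = torch.min(min_val, torch.min(x))
    max_val = torch.max(max_val, torch.max(x))
    return min_val, max_val

def update_range_fused(x, min_val, max_val):
    # New pattern: one fused reduction returns both extremes, so the
    # input is read once instead of twice.
    x_min, x_max = torch.aminmax(x)
    return torch.min(min_val, x_min), torch.max(max_val, x_max)

x = torch.randn(128, 3, 32, 32)
min_val = torch.tensor(float("inf"))
max_val = torch.tensor(float("-inf"))
a = update_range_two_pass(x, min_val, max_val)
b = update_range_fused(x, min_val, max_val)
assert torch.equal(a[0], b[0]) and torch.equal(a[1], b[1])
```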

Test Plan:

CI for correctness

performance:

```
cd benchmarks/operator_benchmark
# repeat (before diff, after diff) x (cpu, cuda)
python -m pt.qobserver_test --tag_filter all --device cpu
# before, cpu:  https://our.intern.facebook.com/intern/paste/P138633280/
# before, cuda: https://our.intern.facebook.com/intern/paste/P138639473/
# after, cpu:   https://our.intern.facebook.com/intern/paste/P138635458/
# after, cuda:  https://our.intern.facebook.com/intern/paste/P138636344/
```
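
The CUDA leg of the sweep above is the same invocation with the device flag switched, e.g.:

```
python -m pt.qobserver_test --tag_filter all --device cuda
```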


Differential Revision: [D23093995](https://our.internmc.facebook.com/intern/diff/D23093995)

vkuzo added a commit that referenced this pull request Aug 13, 2020
dr-ci bot commented Aug 13, 2020

💊 CI failures summary and remediations

As of commit 6a60cbc (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


ci.pytorch.org: 1 failed



vkuzo added a commit that referenced this pull request Sep 2, 2020
vkuzo added a commit that referenced this pull request Sep 2, 2020
vkuzo added a commit that referenced this pull request Sep 5, 2020
codecov bot commented Sep 5, 2020

Codecov Report

Merging #42957 into gh/vkuzo/123/base will increase coverage by 0.03%.
The diff coverage is 67.54%.


```
@@                  Coverage Diff                  @@
##           gh/vkuzo/123/base   #42957      +/-   ##
=====================================================
+ Coverage              69.32%   69.35%   +0.03%
=====================================================
  Files                    381      381
  Lines                  47190    47323     +133
=====================================================
+ Hits                   32713    32822     +109
- Misses                 14477    14501      +24
```
| Impacted Files | Coverage Δ |
|---|---|
| torch/_classes.py | 87.50% <0.00%> (ø) |
| torch/jit/_fuser.py | 32.60% <0.00%> (ø) |
| torch/jit/_serialization.py | 85.71% <0.00%> (ø) |
| torch/jit/supported_ops.py | 0.00% <0.00%> (ø) |
| torch/jit/unsupported_tensor_ops.py | 0.00% <0.00%> (ø) |
| torch/nn/modules/_functions.py | 63.30% <0.00%> (ø) |
| torch/nn/parallel/distributed.py | 42.53% <ø> (ø) |
| torch/nn/qat/__init__.py | 100.00% <ø> (ø) |
| torch/nn/qat/modules/conv.py | 100.00% <ø> (ø) |
| torch/nn/qat/modules/linear.py | 100.00% <ø> (ø) |

... and 52 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 307f9e0...6a60cbc.

facebook-github-bot (Contributor) commented

This pull request has been merged in fd8e206.
