
Conversation


supriyar commented Nov 6, 2020

Stack from ghstack:

Summary:

Fixed a bug where we were always setting the device to 'cuda' (irrespective of the device id) in the `calculate_qparams` function.

Fixes #46533

Test Plan:
python test/test_quantization.py TestObserver.test_observer_qparams_respects_device_affinity

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D24800495

Summary:
Previously the scale and zero_point were returned on the CPU even if the input tensor was on the GPU. This is because `copy_()` doesn't respect the device when copying over the tensor.

Also fixed a bug where we were always setting the device to 'cuda' (irrespective of the device id) in the `calculate_qparams` function (see the illustrative sketch below).

Test Plan:
python test/test_quantization.py TestObserver.test_observer_qparams_respects_device_affinity

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
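For reference, a minimal sketch of the device-affinity issue described above. This is illustrative only, not the actual PyTorch diff; the functions below are simplified, hypothetical stand-ins for the observer's `calculate_qparams`.

import torch

# Simplified stand-in for an observer's calculate_qparams (hypothetical,
# not the real torch.quantization implementation).
def calculate_qparams_buggy(min_val: torch.Tensor, max_val: torch.Tensor):
    # Bug: hard-coding 'cuda' drops the device index, so stats collected
    # on cuda:1 produce qparams on cuda:0.
    device = 'cuda' if min_val.is_cuda else 'cpu'
    scale = torch.ones(min_val.numel(), dtype=torch.float32, device=device)
    zero_point = torch.zeros(min_val.numel(), dtype=torch.int64, device=device)
    return scale, zero_point

def calculate_qparams_fixed(min_val: torch.Tensor, max_val: torch.Tensor):
    # Fix: take the full device (type and index) from the collected stats.
    device = min_val.device
    scale = torch.ones(min_val.numel(), dtype=torch.float32, device=device)
    zero_point = torch.zeros(min_val.numel(), dtype=torch.int64, device=device)
    return scale, zero_point

if torch.cuda.device_count() > 1:
    min_val = torch.tensor([0.0], device='cuda:1')
    max_val = torch.tensor([1.0], device='cuda:1')
    scale, _ = calculate_qparams_fixed(min_val, max_val)
    assert scale.device == min_val.device  # qparams respect device affinity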
supriyar added a commit that referenced this pull request Nov 6, 2020
ghstack-source-id: 15d3d50
Pull Request resolved: #47514

dr-ci bot commented Nov 6, 2020

💊 CI failures summary and remediations

As of commit be9df93 (more details on the Dr. CI page):


  • 2/2 failures possibly introduced in this PR
    • 2/2 non-CircleCI failure(s)

ci.pytorch.org: 1 failed


codecov.io: 1 failed




supriyar commented Nov 6, 2020

QAT Benchmark on MobilenetV2
Before change:

PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "229.25734519958496"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "522.6466655731201"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat1_obs_0_latency_ms", "unit": "ms", "value": "261.6147994995117"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat2_obs_0_bn_0_latency_ms", "unit": "ms", "value": "250.9593963623047"}
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_cuda_max_mem_mib", "unit": "MiB", "value": "246.51806640625"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_cuda_max_mem_mib", "unit": "MiB", "value": "314.82080078125"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat1_obs_0_cuda_max_mem_mib", "unit": "MiB", "value": "314.82080078125"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat2_obs_0_bn_0_cuda_max_mem_mib", "unit": "MiB", "value": "314.66650390625"}
/home/supriyar/local/fbsource/fbcode/buck-out/dev/gen/caffe2/caffe2/fb/high_perf_models/pytorch/mobilenetv2/mobilenetv2_bench#link-tree/torch/quantization/observer.py:123: UserWarning: Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch.
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "231.16016387939453"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "546.6287136077881"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat1_obs_0_latency_ms", "unit": "ms", "value": "282.5195789337158"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat2_obs_0_bn_0_latency_ms", "unit": "ms", "value": "268.3393955230713"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_cuda_max_mem_mib", "unit": "MiB", "value": "246.80126953125"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_cuda_max_mem_mib", "unit": "MiB", "value": "315.10400390625"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat1_obs_0_cuda_max_mem_mib", "unit": "MiB", "value": "315.10400390625"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat2_obs_0_bn_0_cuda_max_mem_mib", "unit": "MiB", "value": "314.94970703125"}

After change:

PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "221.08221054077148"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "506.84404373168945"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat1_obs_0_latency_ms", "unit": "ms", "value": "261.29603385925293"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat2_obs_0_bn_0_latency_ms", "unit": "ms", "value": "239.6218776702881"}
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_cuda_max_mem_mib", "unit": "MiB", "value": "246.51806640625"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_cuda_max_mem_mib", "unit": "MiB", "value": "314.82080078125"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat1_obs_0_cuda_max_mem_mib", "unit": "MiB", "value": "314.82080078125"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat2_obs_0_bn_0_cuda_max_mem_mib", "unit": "MiB", "value": "314.66650390625"}
/home/supriyar/local/fbsource/fbcode/buck-out/dev/gen/caffe2/caffe2/fb/high_perf_models/pytorch/mobilenetv2/mobilenetv2_bench#link-tree/torch/quantization/observer.py:123: UserWarning: Please use quant_min and quant_max to specify the range for observers. reduce_range will be deprecated in a future release of PyTorch.
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "214.65778350830078"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "502.2757053375244"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat1_obs_0_latency_ms", "unit": "ms", "value": "304.18944358825684"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat2_obs_0_bn_0_latency_ms", "unit": "ms", "value": "258.211612701416"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_cuda_max_mem_mib", "unit": "MiB", "value": "246.80126953125"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_cuda_max_mem_mib", "unit": "MiB", "value": "315.10400390625"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat1_obs_0_cuda_max_mem_mib", "unit": "MiB", "value": "315.10400390625"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat2_obs_0_bn_0_cuda_max_mem_mib", "unit": "MiB", "value": "314.94970703125"}

supriyar requested review from jerryzh168 and vkuzo November 6, 2020 21:14

vkuzo commented Nov 6, 2020

wow, interesting, great catch

self.min_val = self.min_val.copy_(min_val).to(min_val.device)

is the self.min_val.copy_(min_val).to(min_val.device) code actually doing CUDA -> CPU -> CUDA? If yes, is there a way to remove that?


supriyar commented Nov 6, 2020

wow, interesting, great catch

self.min_val = self.min_val.copy_(min_val).to(min_val.device)

is the self.min_val.copy_(min_val).to(min_val.device) code actually doing CUDA -> CPU -> CUDA? If yes, is there a way to remove that?

Hmm, it might. I think I can use `y = x.cuda(device=x.device)` instead to copy to the same device if the input is a CUDA tensor.

Update: It would appear that using `x.cuda` isn't scriptable and seems to map to the wrong function.
Repro:

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.x = torch.tensor(0)

    def forward(self, input):
        x = input.cuda(device=input.device)
        return x

m = M().eval()
input = torch.rand(5, device='cuda:1')
print(m(input))
script = torch.jit.script(m)
RuntimeError:

aten::cuda(Tensor(a) self) -> (Tensor(b|a)):
Keyword argument device unknown.
:
  File "/home/supriyar/pytorch/test/quantization/test_workflow_module.py", line 514
            def forward(self, input):
                x = input.cuda(device=input.device)
                    ~~~~~~~~~~ <--- HERE
                return x

I can instead do the `to` call first, before the `copy_`, to make sure we move the tensor to CUDA before doing the copy. @vkuzo thoughts on this approach?
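A rough sketch of that ordering on a toy module (illustrative only; `MinMaxBufferSketch` is a made-up name, and the real observer code also handles resizing and per-channel shapes):

import torch

class MinMaxBufferSketch(torch.nn.Module):
    # Toy observer-like module showing the proposed order of operations:
    # move the buffer with .to() first, then copy_() the new value in place.
    def __init__(self):
        super().__init__()
        self.register_buffer('min_val', torch.tensor(float('inf')))

    def update(self, x):
        new_min = torch.min(x)
        # .to() is a no-op when the buffer is already on the right device,
        # so repeated calls avoid a CUDA -> CPU -> CUDA round trip.
        self.min_val = self.min_val.to(new_min.device)
        self.min_val.copy_(new_min)

obs = MinMaxBufferSketch()
obs.update(torch.rand(5))                     # CPU input
if torch.cuda.is_available():
    obs.update(torch.rand(5, device='cuda'))  # buffer follows the input device
    print(obs.min_val.device)                 # prints cuda:0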


vkuzo commented Nov 7, 2020

I can instead do the `to` call first, before the `copy_`, to make sure we move the tensor to CUDA before doing the copy. @vkuzo thoughts on this approach?

that would make sense to me, and the performance metrics don't show a slowdown, which is good.

Doesn't have to block this bugfix, but I think in the future it might make sense to rethink how we do device management in observers. Ideally we can enforce strong assumptions that all observer buffers are on the correct device after initialization and any to calls, and then just throw an error with a stack trace when any mismatch is detected. Then we could remove all the custom code which moves tensors between devices.
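One possible shape of that stricter approach (purely a sketch of the idea, not code from this PR; `assert_same_device` is a hypothetical helper, not part of torch.quantization):

import torch

def assert_same_device(buffer: torch.Tensor, x: torch.Tensor, name: str) -> None:
    # Hypothetical helper: fail loudly when an observer buffer and the
    # observed tensor disagree on device, instead of silently moving data.
    if buffer.device != x.device:
        raise RuntimeError(
            f"Observer buffer '{name}' is on {buffer.device}, but the observed "
            f"tensor is on {x.device}; move the observer with .to() before calling it."
        )

# Usage sketch inside a hypothetical observer forward():
#   assert_same_device(self.min_val, x, 'min_val')
#   assert_same_device(self.max_val, x, 'max_val')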

supriyar added a commit that referenced this pull request Nov 7, 2020
ghstack-source-id: 2cbd6f3
Pull Request resolved: #47514
supriyar added a commit that referenced this pull request Nov 7, 2020
ghstack-source-id: 90eb81d
Pull Request resolved: #47514
vkuzo left a comment


lg, the test failure in QuantizePerChannel4d looks potentially related

supriyar added a commit that referenced this pull request Nov 10, 2020
ghstack-source-id: a3dc910
Pull Request resolved: #47514

codecov bot commented Nov 10, 2020

Codecov Report

Merging #47514 (be9df93) into gh/supriyar/206/base (8b3f1d1) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@                  Coverage Diff                  @@
##           gh/supriyar/206/base   #47514   +/-   ##
=====================================================
  Coverage                 80.78%   80.79%           
=====================================================
  Files                      1809     1809           
  Lines                    190036   190036           
=====================================================
+ Hits                     153526   153535    +9     
+ Misses                    36510    36501    -9     

@facebook-github-bot

This pull request has been merged in 6bb18b2.

facebook-github-bot deleted the gh/supriyar/206/head branch November 14, 2020 15:17