Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quantization: fix scale+zp serialization of quantized BatchNorm{2|3}d #70432

Closed
wants to merge 4 commits into from

Conversation

vkuzo
Copy link
Contributor

@vkuzo vkuzo commented Dec 27, 2021

Stack from ghstack:

Summary:

Scale and zero_point need to be buffers for serialization to work
on them properly. This PR moves them to buffers. This is BC breaking,
but the "before" state was completely broken (scale + zp were not
serialized at all) so there is no value in trying to handle it.

Test Plan:

python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm2d_serialization
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm3d_serialization

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D33330022

Summary:

Not for review yet.

Scale and zero_point need to be buffers for serialization to work
on them properly.  For now, this PR fixes the issue without dealing
with BC, just so we can run CI and see how many callsites expect
the old behavior.  CI results will determine next steps.

Test Plan:

```
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm2d_serialization
```

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
@pytorch-probot
Copy link

pytorch-probot bot commented Dec 27, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/0bee19a1b86a2c1d647b92869d8bdeace75ccd34/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

vkuzo added a commit that referenced this pull request Dec 27, 2021
Summary:

Not for review yet.

Scale and zero_point need to be buffers for serialization to work
on them properly.  For now, this PR fixes the issue without dealing
with BC, just so we can run CI and see how many callsites expect
the old behavior.  CI results will determine next steps.

Test Plan:

```
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm2d_serialization
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: cdbf2362c95f4ff33547aa84cc669cb5d1dc51b1
Pull Request resolved: #70432
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Dec 27, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 0bee19a (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda11.3-py3.7-gcc7 / test (fx2trt, 1, 1, linux.4xlarge.nvidia.gpu) (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-01-05T20:56:08.2171952Z RuntimeError: test_nn failed!
2022-01-05T20:56:06.9914071Z Generated XML report: test-reports/python-unittest/test_nn/TEST-TestModuleGlobalHooks-20220105205239.xml
2022-01-05T20:56:07.2269441Z Generated XML report: test-reports/python-unittest/test_nn/TEST-TestNN-20220105205239.xml
2022-01-05T20:56:07.4125388Z Generated XML report: test-reports/python-unittest/test_nn/TEST-TestNNDeviceTypeCUDA-20220105205239.xml
2022-01-05T20:56:07.4157585Z Generated XML report: test-reports/python-unittest/test_nn/TEST-TestNNInit-20220105205239.xml
2022-01-05T20:56:07.4162765Z Generated XML report: test-reports/python-unittest/test_nn/TEST-TestStateDictHooks-20220105205239.xml
2022-01-05T20:56:08.2164254Z Traceback (most recent call last):
2022-01-05T20:56:08.2165012Z   File "test/run_test.py", line 1096, in <module>
2022-01-05T20:56:08.2167797Z     main()
2022-01-05T20:56:08.2168451Z   File "test/run_test.py", line 1074, in main
2022-01-05T20:56:08.2171292Z     raise RuntimeError(err_message)
2022-01-05T20:56:08.2171952Z RuntimeError: test_nn failed!
2022-01-05T20:56:08.8824087Z + cleanup
2022-01-05T20:56:08.8824579Z + retcode=1
2022-01-05T20:56:08.8825041Z + set +x
2022-01-05T20:56:08.8879159Z ##[error]Process completed with exit code 1.
2022-01-05T20:56:08.8938141Z ##[group]Run # Ensure the working directory gets chowned back to the current user
2022-01-05T20:56:08.8939141Z �[36;1m# Ensure the working directory gets chowned back to the current user�[0m
2022-01-05T20:56:08.8939969Z �[36;1mdocker run --rm -v "$(pwd)":/v -w /v "${ALPINE_IMAGE}" chown -R "$(id -u):$(id -g)" .�[0m
2022-01-05T20:56:08.8953157Z shell: /usr/bin/bash -e {0}
2022-01-05T20:56:08.8953583Z env:
2022-01-05T20:56:08.8954256Z   BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.7-gcc7

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@vkuzo
Copy link
Contributor Author

vkuzo commented Dec 27, 2021

@vkuzo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

…3}d"

Summary:

Not for review yet.

Scale and zero_point need to be buffers for serialization to work
on them properly.  For now, this PR fixes the issue without dealing
with BC, just so we can run CI and see how many callsites expect
the old behavior.  CI results will determine next steps.

Test Plan:

```
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm2d_serialization
```

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D33330022](https://our.internmc.facebook.com/intern/diff/D33330022)

[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Dec 28, 2021
Summary:

Not for review yet.

Scale and zero_point need to be buffers for serialization to work
on them properly.  For now, this PR fixes the issue without dealing
with BC, just so we can run CI and see how many callsites expect
the old behavior.  CI results will determine next steps.

Test Plan:

```
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm2d_serialization
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 1308837dbdc16ababcca95f6a461372c2ae78060
Pull Request resolved: #70432
@vkuzo
Copy link
Contributor Author

vkuzo commented Dec 28, 2021

@vkuzo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vkuzo vkuzo changed the title [wip] fix scale+zp serialization of quantized BatchNorm{2|3}d quantization: fix scale+zp serialization of quantized BatchNorm{2|3}d Dec 28, 2021
…hNorm{2|3}d"

Summary:

Scale and zero_point need to be buffers for serialization to work
on them properly.  This PR moves them to buffers.  This is BC breaking,
but the "before" state was completely broken (scale + zp were not
serialized at all) so there is no value in trying to handle it.

Test Plan:

```
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm2d_serialization
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm3d_serialization
```

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D33330022](https://our.internmc.facebook.com/intern/diff/D33330022)

[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Dec 28, 2021
Summary:

Scale and zero_point need to be buffers for serialization to work
on them properly.  This PR moves them to buffers.  This is BC breaking,
but the "before" state was completely broken (scale + zp were not
serialized at all) so there is no value in trying to handle it.

Test Plan:

```
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm2d_serialization
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm3d_serialization
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 6a3424d0f001431ca38f8ce1c6766fdee6e48f02
Pull Request resolved: #70432
@vkuzo
Copy link
Contributor Author

vkuzo commented Dec 28, 2021

@vkuzo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jbschlosser jbschlosser removed their request for review December 28, 2021 15:04
@albanD albanD removed their request for review December 29, 2021 11:31
@@ -644,6 +644,50 @@ def test_batch_norm3d(self):
self.assertEqual(quant_ref.int_repr().numpy(), qy.int_repr().numpy(),
msg="BatchNorm3d module API failed")

def _test_batch_norm_serialization(self, get_model, data1, data2):
m1 = get_model()
m1.qconfig = torch.quantization.default_qconfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: torch.ao.quantization

Copy link
Contributor

@jerryzh168 jerryzh168 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

…hNorm{2|3}d"

Summary:

Scale and zero_point need to be buffers for serialization to work
on them properly.  This PR moves them to buffers.  This is BC breaking,
but the "before" state was completely broken (scale + zp were not
serialized at all) so there is no value in trying to handle it.

Test Plan:

```
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm2d_serialization
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm3d_serialization
```

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D33330022](https://our.internmc.facebook.com/intern/diff/D33330022)

[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Jan 5, 2022
Summary:

Scale and zero_point need to be buffers for serialization to work
on them properly.  This PR moves them to buffers.  This is BC breaking,
but the "before" state was completely broken (scale + zp were not
serialized at all) so there is no value in trying to handle it.

Test Plan:

```
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm2d_serialization
python test/test_quantization.py TestStaticQuantizedModule.test_batch_norm3d_serialization
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 1649ae2f68bd43b804f29e7fd30f5a132ee9773c
Pull Request resolved: #70432
@vkuzo
Copy link
Contributor Author

vkuzo commented Jan 5, 2022

@vkuzo has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

investigate reported issues with quantized BatchNorm2d
3 participants