Add native impl of gelu for QuantizedCPU #69968

Closed
Xia-Weiwen wants to merge 7 commits from the gelu_quantized_cpu branch.

Conversation

Xia-Weiwen
Collaborator

@Xia-Weiwen Xia-Weiwen commented Dec 15, 2021

Add native implementation of gelu for quantized CPU.

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo
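
For context, a rough usage sketch of what a native quantized gelu enables; the tensor values, scale, and zero point below are illustrative and not taken from this PR's tests, and the comments describe expected behavior rather than verified output:

import torch
import torch.nn.functional as F

# Build a per-tensor quantized input (scale/zero_point chosen arbitrarily).
x = torch.randn(2, 3)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=10, dtype=torch.quint8)

# With a native QuantizedCPU kernel, gelu can run directly on the quantized
# tensor instead of requiring a manual dequantize/requantize round trip.
qy = F.gelu(qx)

# Compare against the float reference; agreement should be up to quantization error.
print(qy.dequantize())
print(F.gelu(x))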

@pytorch-probot

pytorch-probot bot commented Dec 15, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/Xia-Weiwen/pytorch/blob/30f052af53bc16ae3915928327542f257443128e/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflow | Labels (bold = enabled) | Status
Triggered Workflows
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot
Contributor

facebook-github-bot commented Dec 16, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 30f052a (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@Xia-Weiwen
Collaborator Author

The failure does not seem to be caused by this patch:

20:30:48   test_fn_fwgrad_bwgrad_gradient_cuda_complex128 (__main__.TestGradientsCUDA) ... Memory exception on virtual address 0x7f586a00e000, node id 2 : Page not present
20:30:48 Address does not belong to a known buffer
20:30:48 Memory access fault by GPU node-2 (Agent handle: 0x561a5e5a6b50) on address 0x7f586a00e000. Reason: Page not present or supervisor privilege.
20:30:48 Traceback (most recent call last):
20:30:48   File "test/run_test.py", line 1068, in <module>
20:30:48     main()
20:30:48   File "test/run_test.py", line 1046, in main
20:30:48     raise RuntimeError(err_message)
20:30:48 RuntimeError: test_ops failed! Received signal: SIGIOT

@Xia-Weiwen Xia-Weiwen changed the title Add quantized gelu Add native impl of gelu for QuantizedCPU Dec 17, 2021
@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review December 17, 2021 02:43
@Xia-Weiwen Xia-Weiwen requested a review from ezyang as a code owner December 17, 2021 02:43
@mruberry mruberry added the oncall: quantization label (Quantization support in PyTorch) Dec 17, 2021
@mruberry mruberry requested a review from vkuzo December 17, 2021 14:22
@mruberry mruberry added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Dec 17, 2021
Contributor

@vkuzo vkuzo left a comment

thanks, lgtm!

Comment on lines 652 to 654
const auto kAlphaVec = Vectorized<float>(M_SQRT1_2);
const auto kOneVec = Vectorized<float>(1);
const auto kPointFiveVec = Vectorized<float>(0.5);
Contributor

nit: maybe initialize outside the loop?

float output_scale = scale;

AT_DISPATCH_QINT_TYPES(qx.scalar_type(), "qgelu", [&]() {
float inv_output_scale = 1.0 / output_scale;
Contributor

nit: init outside the macro?

dtype=torch_type)
qY_hat = op(qX)
self.assertEqual(qY.dequantize(), qY_hat.dequantize(),
msg="F.leaky_relu failed ({} vs {})".format(qY, qY_hat))
Contributor

nit: "gelu" instead of "leaky_relu"?

@facebook-github-bot
Contributor

@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@vkuzo
Contributor

vkuzo commented Dec 17, 2021

could you take a look at the minor comments, and I can import again after those?

@Xia-Weiwen
Collaborator Author

could you take a look at the minor comments, and I can import again after those?

Hi. I have modified the code according to your comments. Please take a look.

@facebook-github-bot
Contributor

@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@vkuzo
Contributor

vkuzo commented Dec 28, 2021

Hi @Xia-Weiwen , internal CI caught one more issue: "dos-newlines: You must use ONLY Unix linebreaks \n in source code." for qgelu.cpp. Any chance you could check if the newline characters in that file are something other than \n, and change it to \n?

@Xia-Weiwen
Collaborator Author

Hi @Xia-Weiwen , internal CI caught one more issue: "dos-newlines: You must use ONLY Unix linebreaks \n in source code." for qgelu.cpp. Any chance you could check if the newline characters in that file are something other than \n, and change it to \n?

Hi @vkuzo. I have removed those '\r'. Please check again. Thanks.
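
(For reference, a minimal sketch of one way to strip such carriage returns; the file path is illustrative, and dos2unix or most editors can do the same.)

# Rewrite the file with Unix "\n" line endings only.
path = "aten/src/ATen/native/quantized/cpu/qgelu.cpp"  # illustrative path
with open(path, "rb") as f:
    data = f.read()
with open(path, "wb") as f:
    f.write(data.replace(b"\r\n", b"\n"))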

@facebook-github-bot
Contributor

@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment

@vkuzo
Contributor

vkuzo commented Jan 5, 2022

any chance you could rebase and upload again? I'm seeing a rebase conflict in Meta CI.

@facebook-github-bot
Contributor

@vkuzo has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@Xia-Weiwen Xia-Weiwen deleted the gelu_quantized_cpu branch December 1, 2022 05:48
Labels
cla signed, oncall: quantization (Quantization support in PyTorch), open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)