
quantize_tensor_per_channel ARM implementation #46018

Closed · wants to merge 6 commits

Conversation

ajliu
Member

@ajliu ajliu commented Oct 8, 2020

Stack from ghstack:

## Summary:

Currently on mobile devices quantize_tensor has a vectorized implementation using ARM intrinsics; however, quantize_tensor_per_channel does not.
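For context, below is a minimal sketch of the kind of NEON inner loop such an implementation uses. The function name, memory layout, and loop structure are illustrative only, not the PR's actual code, and it assumes aarch64/ARMv8, where the round-to-nearest convert `vcvtnq_s32_f32` is available.

```cpp
// Illustrative sketch only (not the PR's code): per-channel affine
// quantization q = clamp(round(x * inv_scale[c]) + zero_point[c]) of a
// contiguous float tensor laid out as [batch][channel][elements].
#include <arm_neon.h>
#include <algorithm>
#include <cmath>
#include <cstdint>

void quantize_per_channel_sketch(
    const float* in, int8_t* out,
    const float* inv_scales, const int32_t* zero_points,
    int64_t batches, int64_t channels, int64_t elems_per_channel) {
  for (int64_t b = 0; b < batches; ++b) {
    for (int64_t c = 0; c < channels; ++c) {
      const float32x4_t vinv_scale = vdupq_n_f32(inv_scales[c]);
      const int32x4_t vzp = vdupq_n_s32(zero_points[c]);
      int64_t e = 0;
      for (; e + 8 <= elems_per_channel; e += 8) {
        float32x4_t vin0 = vld1q_f32(in);
        float32x4_t vin1 = vld1q_f32(in + 4);
        // round(x * inv_scale) + zero_point, in int32 lanes
        int32x4_t vq0 = vaddq_s32(vcvtnq_s32_f32(vmulq_f32(vin0, vinv_scale)), vzp);
        int32x4_t vq1 = vaddq_s32(vcvtnq_s32_f32(vmulq_f32(vin1, vinv_scale)), vzp);
        // saturating narrows (int32 -> int16 -> int8) clamp to [-128, 127]
        int16x8_t v16 = vcombine_s16(vqmovn_s32(vq0), vqmovn_s32(vq1));
        vst1_s8(out, vqmovn_s16(v16));
        in += 8;
        out += 8;
      }
      for (; e < elems_per_channel; ++e) {  // scalar tail
        float q = nearbyintf(*in++ * inv_scales[c]) + zero_points[c];
        *out++ = static_cast<int8_t>(std::min(127.0f, std::max(-128.0f, q)));
      }
    }
  }
}
```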

## Test Plan:

Build for ARM NEON:
```
BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_PYTORCH_MOBILE=1 ANDROID_ABI="armeabi-v7a with NEON" ./scripts/build_android.sh -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON
```

Build for ARM64:
```
BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_PYTORCH_MOBILE=1 ANDROID_ABI=arm64-v8a ./scripts/build_android.sh -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON
```

Run the test binary over adb shell with the commands below.
```
adb push build_android/bin/quantized_test /data/local/tmp
adb shell "/data/local/tmp/quantized_test"
```

Run the benchmark binary over adb shell with the commands below.
```
adb push build_android/bin/quantize_per_channel /data/local/tmp/
adb shell "/data/local/tmp/quantize_per_channel"
```

Note that the Android CPU is not frequency locked by default, which can lead to noisy benchmark results; this can be changed by running the following for every CPU.
```
adb shell "echo userspace > /sys/devices/system/cpu/${cpu}/cpufreq/scaling_governor"
adb shell "echo '2000000' > /sys/devices/system/cpu/${cpu}/cpufreq/scaling_setspeed"
```

Resulting benchmarks are located [here](https://gist.github.com/AJLiu/d1711bb6a5e93b3338eca2c14c8aec9f)
Google spreadsheet comparing results [here](https://docs.google.com/spreadsheets/d/1Ky-rEu2CqOqex2a84b67hB1VLAlfEDgAN2ZXe8IlGF8/edit?usp=sharing)

**Overall results:**
- aarch64
    - 2x slowdown on 2d tensor benchmarks
    - 4x speed up on 4d contiguous tensor benchmarks
    - 4x speed up on 4d channels last tensor benchmarks
- neon
    - 2x speed up on 2d tensor benchmarks
    - 20x speed up on 4d contiguous tensor benchmarks
    - 20x speed up on 4d channels last tensor benchmarks

I suspect that this is an overall performance boost; however, it adds a small amount of overhead that becomes very noticeable when quantizing small tensors.

Reviewers: kimishpatel

Subscribers:

Tasks: T76832258

Tags:

Differential Revision: [D24286528](https://our.internmc.facebook.com/intern/diff/D24286528)

ajliu added a commit that referenced this pull request Oct 8, 2020
@kimishpatel
Contributor

Can you post benchmarking results here?

@ajliu ajliu closed this Oct 9, 2020
@ajliu
Member Author

ajliu commented Oct 9, 2020

Closing because, after some testing, it appears that the compiler is performing these optimizations automatically.

Edit: I was wrong; evidently I was benchmarking the wrong function, because randint returns a float tensor unless otherwise specified.
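For reference, a sketch of the pitfall being described, using ATen's C++ API (the shapes here are arbitrary): unlike the Python wrapper, `at::randint` falls back to the global default dtype (float) unless a dtype is passed explicitly.

```cpp
// Sketch of the benchmarking pitfall: at::randint produces a float tensor
// by default, so a benchmark can silently exercise the float code path.
#include <ATen/ATen.h>

void randint_dtype_example() {
  auto f = at::randint(0, 100, {64, 64});            // kFloat by default
  auto c = at::randint(0, 100, {64, 64}, at::kChar); // explicitly int8
}
```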

ajliu added a commit to ajliu/pytorch that referenced this pull request Oct 13, 2020
ajliu added a commit that referenced this pull request Oct 13, 2020
@ajliu ajliu reopened this Oct 13, 2020
@codecov

codecov bot commented Oct 13, 2020

Codecov Report

Merging #46018 into gh/ajliu/2/base will increase coverage by 0.00%.
The diff coverage is n/a.

```
@@               Coverage Diff                @@
##           gh/ajliu/2/base   #46018   +/-   ##
================================================
  Coverage            60.81%   60.81%
================================================
  Files                 2748     2748
  Lines               254038   254038
================================================
+ Hits                154496   154500    +4
+ Misses               99542    99538    -4
```

ajliu added a commit that referenced this pull request Oct 14, 2020
@ajliu
Member Author

ajliu commented Oct 14, 2020

The latest commit reduces the overhead slightly. Now when compiling with NEON we see an overall improvement compared to before, but ARM64 still has a bit of a slowdown on small tensors.

New benchmark outputs: https://gist.github.com/AJLiu/814232ac153b8d8029a8d37dc074320c

Overall results:

- aarch64
    - 1.4x slowdown on 2d tensor benchmarks
    - 4x speed up on 4d contiguous tensor benchmarks
    - 4x speed up on 4d channels last tensor benchmarks
- neon
    - 3x speed up on 2d tensor benchmarks
    - 20x speed up on 4d contiguous tensor benchmarks
    - 20x speed up on 4d channels last tensor benchmarks

Contributor

@kimishpatel kimishpatel left a comment


Left a few comments. Also, can you add this test https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/test/quantized_test.cpp to the mobile tests and run it through to make sure it passes?

@dr-ci

dr-ci bot commented Oct 17, 2020

💊 CI failures summary and remediations

As of commit b8958d0 (more details on the Dr. CI page):

- 1/1 failures possibly* introduced in this PR
    - 1/1 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


ajliu added a commit that referenced this pull request Oct 17, 2020
@ajliu
Member Author

ajliu commented Oct 17, 2020

Addressed the requested changes and enabled pytorch/aten/src/ATen/test/quantized_test.cpp for mobile.

However, it looks like none of the tests in that file actually test quantize_tensor_per_channel, so I added two new unit tests (to check the contiguous and channels_last formats). Both new tests pass on both mobile architectures.
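As a rough illustration of the kind of check such a test performs (the test name, shapes, scales, and tolerance below are invented for this sketch; the actual tests live in aten/src/ATen/test/quantized_test.cpp):

```cpp
// Hypothetical sketch: quantize a channels_last tensor per channel, then
// verify the dequantized round-trip stays within half the largest quantum.
#include <gtest/gtest.h>
#include <ATen/ATen.h>

TEST(QuantizedSketch, QuantizePerChannelChannelsLast) {
  auto x = at::rand({2, 3, 4, 4}).contiguous(at::MemoryFormat::ChannelsLast);
  auto scales = at::tensor({0.1, 0.2, 0.3}, at::kDouble);
  auto zero_points = at::tensor({0, 5, 10}, at::kLong);
  // Quantize along the channel axis, then round-trip through dequantize.
  auto qx = at::quantize_per_channel(x, scales, zero_points, /*axis=*/1, at::kQInt8);
  // Per-element error is bounded by half the largest scale (0.3 / 2).
  ASSERT_TRUE(at::allclose(qx.dequantize(), x, /*rtol=*/0.0, /*atol=*/0.151));
}
```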

@kimishpatel
Contributor

> Addressed the requested changes and enabled pytorch/aten/src/ATen/test/quantized_test.cpp for mobile.
>
> However, it looks like none of the tests in that file actually test quantize_tensor_per_channel, so I added two new unit tests (to check the contiguous and channels_last formats). Both new tests pass on both mobile architectures.

Thanks for these changes. Looks good overall. I am a little perplexed as to why 2D tensors are so slow. The only thing that seems different is the new allocations for the inverse scales and zero points, and writing those values out, which may be adding extra memory overhead.
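A sketch of the kind of per-channel precomputation being referred to (names invented; the PR's actual code differs): one divide per channel is hoisted out of the hot loop, at the cost of filling two channel-sized temporaries before any element is quantized, which is pure overhead for small tensors.

```cpp
// Invented sketch of the suspected overhead: precomputing inverse scales
// and zero points into temporary buffers ahead of the vectorized loop.
#include <cstdint>
#include <vector>

void precompute_channel_params(
    const double* scales, const int64_t* zero_points, int64_t channels,
    std::vector<float>& inv_scales, std::vector<int32_t>& zps) {
  inv_scales.resize(channels);
  zps.resize(channels);
  for (int64_t c = 0; c < channels; ++c) {
    inv_scales[c] = 1.0f / static_cast<float>(scales[c]);
    zps[c] = static_cast<int32_t>(zero_points[c]);
  }
}
```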

ajliu added a commit that referenced this pull request Nov 2, 2020
Contributor

@kimishpatel kimishpatel left a comment


Looks good. Make sure CI is clean. I see some lint errors.

@ajliu
Member Author

ajliu commented Nov 2, 2020

Reran the benchmarks and the overhead is gone.

Overall results:

- aarch64
    - 1.4x speed up on 2d tensor benchmarks
    - 4x speed up on 4d contiguous tensor benchmarks
    - 4x speed up on 4d channels last tensor benchmarks
- neon
    - 3x speed up on 2d tensor benchmarks
    - 20x speed up on 4d contiguous tensor benchmarks
    - 20x speed up on 4d channels last tensor benchmarks

@ajliu
Member Author

ajliu commented Nov 2, 2020

The lint errors are unrelated to this PR; it looks like b3eb0c8 fixes them.

@kimishpatel
Contributor

Sounds good.

@facebook-github-bot
Contributor

This pull request has been merged in b0e954f.
