[Executorch][quant] Optimize per channel dequantize #5604

kimishpatel · 2024-09-24T19:25:14Z

Stack from ghstack (oldest at bottom):

When using a quantized KV cache, the dequantization routine takes a significant amount of time.
This diff vectorizes per-channel dequantization for the common case.

Differential Revision: D63338858
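
For readers without access to the diff, here is a minimal sketch of what vectorized per-channel dequantization can look like. It is not the code from this PR. It assumes int8 input, float output, a `[channels, inner]` layout with a contiguous inner dimension, and the usual affine formula `out = (q - zero_point) * scale`. The names `dequantize_channel` and `dequantize_per_channel` are hypothetical. The NEON path is guarded so the same file still compiles on targets without `<arm_neon.h>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from the PR): dequantize one contiguous channel.
// Computes out[i] = (in[i] - zero_point) * scale for i in [0, n).
inline void dequantize_channel(
    const int8_t* in, float* out, size_t n, float scale, int32_t zero_point) {
  assert(in != nullptr && out != nullptr);
  size_t i = 0;
#if defined(__ARM_NEON)
  const int32x4_t zp = vdupq_n_s32(zero_point);
  const float32x4_t s = vdupq_n_f32(scale);
  // Process 8 values per iteration: widen int8 -> int16 -> int32,
  // subtract the zero point, convert to float, multiply by the scale.
  for (; i + 8 <= n; i += 8) {
    const int16x8_t q16 = vmovl_s8(vld1_s8(in + i));
    const int32x4_t lo = vsubq_s32(vmovl_s16(vget_low_s16(q16)), zp);
    const int32x4_t hi = vsubq_s32(vmovl_s16(vget_high_s16(q16)), zp);
    vst1q_f32(out + i, vmulq_f32(vcvtq_f32_s32(lo), s));
    vst1q_f32(out + i + 4, vmulq_f32(vcvtq_f32_s32(hi), s));
  }
#endif
  // Scalar tail; also the whole loop on targets without NEON.
  for (; i < n; ++i) {
    out[i] = static_cast<float>(static_cast<int32_t>(in[i]) - zero_point) * scale;
  }
}

// Hypothetical driver: assumes the channel axis is outermost, so each
// channel's `inner` values are contiguous and share one scale/zero point.
inline void dequantize_per_channel(
    const int8_t* in, const float* scales, const int32_t* zero_points,
    float* out, size_t channels, size_t inner) {
  for (size_t c = 0; c < channels; ++c) {
    dequantize_channel(
        in + c * inner, out + c * inner, inner, scales[c], zero_points[c]);
  }
}
```

A caller would pass one scale and one zero point per channel, for example `dequantize_per_channel(q, scales, zps, out, channels, inner)`. Other layouts or dtypes would presumably fall back to a generic scalar path.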


pytorch-bot · 2024-09-24T19:25:18Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5604

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 29 New Failures

As of commit c385f92 with merge base b2517d6:

NEW FAILURES - The following jobs have failed:

Build documentation / build (buck2) / Build doc (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
Lint / lintrunner / linux-job (gh)
>>> Lint for kernels/quantized/test/test_quant_dequant_per_token.py:
pull / test-custom-ops-linux (buck2) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-custom-ops-linux (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-llama-runner-linux (bf16, buck2, portable) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-llama-runner-linux (bf16, cmake, portable) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-llama-runner-linux (fp32, buck2, portable) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-llama-runner-linux (fp32, buck2, xnnpack+custom) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-llama-runner-linux (fp32, buck2, xnnpack+custom+qe) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-llama-runner-linux (fp32, cmake, portable) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-llama-runner-linux (fp32, cmake, xnnpack+custom) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-llama-runner-linux (fp32, cmake, xnnpack+custom+qe) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-llama-runner-linux-android (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:133:5: error: unknown type name 'int32x4_t'; did you mean 'int32_t'?
pull / test-llama-runner-qnn-linux (fp32, cmake, qnn) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-llava-runner-linux / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-models-linux (buck2, mv3, portable, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-models-linux (buck2, mv3, xnnpack-quantization-delegation, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-models-linux (cmake, mv3, portable, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-models-linux (cmake, mv3, xnnpack-quantization-delegation, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-models-linux (cmake, vit, portable, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-models-linux (cmake, vit, xnnpack-delegation, linux.2xlarge, 90) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-pybind-build-linux (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-quantized-aot-lib-linux (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-selective-build-linux (buck2) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-selective-build-linux (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / test-setup-linux-gcc (cmake) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:172:14: error: ‘const Tensor’ {aka ‘const class at::Tensor’} has no member named ‘dim_order’
pull / unittest / linux / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]
pull / unittest / macos / macos-job (gh)
/Users/ec2-user/runner/_work/executorch/executorch/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:133:5: error: unknown type name 'int32x4_t'; did you mean 'int32_t'?
pull / unittest-arm (buck2) / linux-job (gh)
/pytorch/executorch/kernels/quantized/cpu/op_dequantize.cpp:573:42: error: non-constant-expression cannot be narrowed from type 'size_t' (aka 'unsigned long') to 'long' in initializer list [-Wc++11-narrowing]

This comment was automatically generated by Dr. CI and updates every 15 minutes.
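
Most of the failures above share one diagnostic: a `size_t` value used in a braced initializer where a signed 64-bit type is expected, which Clang rejects under `-Wc++11-narrowing`. The source at `op_dequantize.cpp:573` is not shown on this page, so the following is only a sketch of the general pattern and its usual fix with an explicit cast. The function `make_sizes` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example (not the code at op_dequantize.cpp:573):
// build a list of signed sizes from unsigned size_t values.
inline std::array<int64_t, 2> make_sizes(size_t rows, size_t cols) {
  // The explicit casts below assume both values fit in int64_t.
  assert(rows <= static_cast<size_t>(INT64_MAX));
  assert(cols <= static_cast<size_t>(INT64_MAX));
  // Writing `{rows, cols}` here would be a narrowing conversion inside
  // list-initialization, which is ill-formed since C++11.
  return {static_cast<int64_t>(rows), static_cast<int64_t>(cols)};
}
```

The two `int32x4_t` errors are a separate problem: that type comes from `<arm_neon.h>`, so code using it typically needs both the include and a guard such as `#if defined(__ARM_NEON)` to build on every target.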

facebook-github-bot · 2024-09-24T19:25:47Z

This pull request was exported from Phabricator. Differential Revision: D63338858

When using quantized kv cache, dequantization routine takes significantly long. This diff just vectorizes dequant per channel for common case. Differential Revision: [D63338858](https://our.internmc.facebook.com/intern/diff/D63338858/) ghstack-source-id: 244449198 Pull Request resolved: #5604

facebook-github-bot added the CLA Signed label Sep 24, 2024

This was referenced Sep 24, 2024

[ExecuTorch] Some updated to kv cache #5597

Closed

Fix dequantize per channel to handle double scale type #5524

Closed

kimishpatel mentioned this pull request Sep 24, 2024

[ExecuTorch] Add quantized kv cache to llama #5598

Closed

facebook-github-bot added the fb-exported label Sep 24, 2024

kimishpatel closed this Oct 1, 2024
