[ET-VK] Add `kInt8x4` dtype and `GPUMemoryLayout`s for packed quantized tensors #14609

pytorchbot · 2025-09-25T20:05:18Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #14329 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/329/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/329/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/329/orig
Differential Revision: D82542336
@diff-train-skip-merge

…ed tensors

Pull Request resolved: #14329

## Motivation

Lay the foundations for executing statically quantized CNNs with ET-VK. Unlike dynamic quantization, static quantization allows the output of quantized operators to stay in integer representation and be fed directly to the next quantized operator.

## Context

Typically, int8 quantized tensors can be represented by simply having the tensor use the int8 data type. While this is possible in ET-VK, in practice quantized operators expect int8 quantized tensors to be packed so that 16 8-bit values are packed into each `ivec4`, such that quantized int8 tensors load and store with a granularity of 16 elements. The reason for this is twofold:

* Support for the shader int8 / storage buffer int8 extension is not guaranteed, meaning some devices do not allow using int8 types in shaders.
* We have found that load/store from storage buffers/textures that use int8 data types sometimes results in worse memory load performance, because vectorized load/store instructions are not used.

Therefore, in ET-VK we need a way to mark that a quantized tensor should

1. Use int32 as the underlying data type for the storage buffer/texture
2. Account for the block-packing that may be used

## Changes

First, introduce the `Int8x4` dtype that can be used for packed int8 tensors. This dtype is functionally the same as `Int`, but denotes that each int32 actually contains 4 packed 8-bit values.

Second, introduce new memory layouts: `kPackedInt8_4W4C` and `kPackedInt8_4H4W`. The former will be used for convolution, while the latter will be used for matrix multiplication. See the inline comments for more details about these memory layouts.

Then, update `QuantizedConvolution.cpp` and `QuantizedLinear.cpp` to use the new data type and memory layouts for the packed int8 input tensor.

ghstack-source-id: 312106548
Differential Revision: [D82542336](https://our.internmc.facebook.com/intern/diff/D82542336/)
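To make the packing described under Context concrete, here is a minimal host-side sketch of storing four int8 values in one int32, so that an `ivec4` holds 16 values. It is not code from this PR: the helper names (`pack_int8x4`, `unpack_int8x4`, `num_ivec4_blocks`) are hypothetical, and the byte order (lane 0 in the least significant byte) is an assumption made for illustration. The actual element ordering used by `kPackedInt8_4W4C` and `kPackedInt8_4H4W` is defined by the inline comments in the PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pack four signed 8-bit values into one 32-bit word.
// Assumption: lane 0 occupies the least significant byte.
inline int32_t pack_int8x4(const std::array<int8_t, 4>& vals) {
  uint32_t packed = 0;
  for (int lane = 0; lane < 4; ++lane) {
    // Go through uint8_t first so negative values do not sign-extend
    // into the neighbouring lanes.
    const uint32_t byte = static_cast<uint8_t>(vals[lane]);
    packed |= byte << (8 * lane);
  }
  return static_cast<int32_t>(packed);
}

// Hypothetical helper: extract one signed 8-bit lane from a packed word.
inline int8_t unpack_int8x4(int32_t packed, int lane) {
  assert(lane >= 0 && lane < 4);
  const uint32_t bits = static_cast<uint32_t>(packed);
  return static_cast<int8_t>(static_cast<uint8_t>((bits >> (8 * lane)) & 0xFFu));
}

// Hypothetical helper: number of ivec4 blocks (4 x int32 = 16 x int8)
// needed to hold numel int8 values, rounding up.
inline size_t num_ivec4_blocks(size_t numel) {
  return (numel + 15) / 16;
}
```

With this scheme, a shader only ever reads and writes 32-bit integers, which avoids depending on int8 shader support; the cost is that element counts are rounded up to a multiple of 16 per `ivec4`.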

pytorch-bot · 2025-09-25T20:05:24Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14609

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 20 Pending

As of commit 0c04519 with merge base c18abc8:

NEW FAILURES - The following jobs have failed:

pull / test-qnn-wheel-packages-linux (3.10) / linux-job (gh)
RuntimeError: Command docker exec -t 96458690e6ecd1bcffc2a196de5312c5c6a34b3e8c29331f797c63c5e477f673 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.11) / linux-job (gh)
RuntimeError: Command docker exec -t e9403f6c1946a41d0a7f84ec63a3cb7f2351eb87a41741699d5837114afb799c /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.12) / linux-job (gh)
RuntimeError: Command docker exec -t f93166636e54749359a31e1e2e472c92a1f3dfbb01a6d13700dcfe7d088c73c5 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2025-09-25T20:05:56Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

pytorchbot requested review from SS-JIA, kirklandsign and larryliu0820 as code owners September 25, 2025 20:05

meta-cla bot added the CLA Signed label Sep 25, 2025

SS-JIA approved these changes Sep 25, 2025


SS-JIA merged commit 681680e into main Sep 25, 2025
124 of 132 checks passed

SS-JIA deleted the gh/SS-JIA/329/orig branch September 25, 2025 20:37
