[ET-VK] Improve q8 matmul by increasing TILE_N4 #14597

SS-JIA · 2025-09-25T15:52:23Z

Stack from ghstack (oldest at bottom):

Title says it all! I found that the latency of executing int8 matmul can be improved by increasing the output tile's N4 dimension to 2. Latency improves by about 20-25% on a Samsung Galaxy S25.

Differential Revision: D83253129
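To illustrate what a wider output tile changes, here is a minimal CPU sketch in Python. It is not the Vulkan shader from this PR. The function name `tiled_q8_matmul`, the `tile_m` parameter, and the load counter are hypothetical. It also assumes that `TILE_N4` counts groups of 4 output columns per tile. Under those assumptions, it shows that doubling the tile width along N halves how often each input value is re-read while leaving the result unchanged. Whether this reuse is what drives the measured speedup on device is not stated in the PR.

```python
def tiled_q8_matmul(a, b, tile_m=4, tile_n4=2):
    """Integer matmul computed tile by tile (illustrative sketch only).

    a: M x K list of lists of ints (int8 range assumed)
    b: K x N list of lists of ints (int8 range assumed)
    Returns (out, input_loads), where input_loads counts how many
    elements of `a` were read across all output tiles.
    """
    M, K = len(a), len(a[0])
    N = len(b[0])
    tile_n = 4 * tile_n4  # assumption: N4 counts groups of 4 columns
    out = [[0] * N for _ in range(M)]
    input_loads = 0

    for m0 in range(0, M, tile_m):
        for n0 in range(0, N, tile_n):
            # One loop body here stands in for one shader invocation
            # that owns a tile_m x tile_n block of the output.
            rows = range(m0, min(m0 + tile_m, M))
            cols = range(n0, min(n0 + tile_n, N))
            for k in range(K):
                # Input values are read once per tile, then reused
                # across every output column in the tile.
                a_vals = [a[m][k] for m in rows]
                input_loads += len(a_vals)
                for i, m in enumerate(rows):
                    for n in cols:
                        out[m][n] += a_vals[i] * b[k][n]
    return out, input_loads


if __name__ == "__main__":
    a = [[(i * 3 + k) % 7 - 3 for k in range(8)] for i in range(4)]
    b = [[(k * 5 + n) % 9 - 4 for n in range(16)] for k in range(8)]
    _, loads_n4_1 = tiled_q8_matmul(a, b, tile_n4=1)
    _, loads_n4_2 = tiled_q8_matmul(a, b, tile_n4=2)
    print(loads_n4_1, loads_n4_2)
```

In this model the number of input reads is `M * K * ceil(N / (4 * tile_n4))`, so a wider tile trades fewer input reads for more accumulators held per invocation. On a real GPU that trade-off depends on register pressure and occupancy, which this sketch does not model.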


pytorch-bot · 2025-09-25T15:52:27Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14597

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Cancelled Job, 4 Unrelated Failures

As of commit 48fa7b0 with merge base c18abc8:

NEW FAILURES - The following jobs have failed:

pull / test-qnn-wheel-packages-linux (3.10) / linux-job (gh)
RuntimeError: Command docker exec -t 7955bd31ab4f07af9dc899856d41b158f40e4bc3876be074340a39db00f2ea8f /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.11) / linux-job (gh)
RuntimeError: Command docker exec -t e41dc1c5c875af6fae4541f2695fb3f029b8cd7a1842bb9e7f0453e214f569a5 /exec failed with exit code 1
pull / test-qnn-wheel-packages-linux (3.12) / linux-job (gh)
RuntimeError: Command docker exec -t 6926347ed8e5e4c07cc28b5a0f9eeca1574e3e6844bbf04a78d0cd3056fbf371 /exec failed with exit code 1

CANCELLED JOB - The following job was cancelled. Please retry:

Check Labels (gh)

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / test-binary-size-linux-gcc / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / test-setup-linux-gcc / linux-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest / macos / macos-job (gh) (trunk failure)
exir/tests/test_quant_fusion_pass.py::TestQuantFusionPass::test_embedding_torchao
pull / unittest-editable / macos / macos-job (gh) (trunk failure)
exir/tests/test_quant_fusion_pass.py::TestQuantFusionPass::test_embedding_torchao

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-09-25T15:53:01Z

@SS-JIA has exported this pull request. If you are a Meta employee, you can view the originating diff in D83253129.

digantdesai

Review automatically exported from Phabricator review in Meta.

github-actions · 2025-09-25T15:53:07Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Title says it all! I found that the latency of executing int8 matmul can be improved by increasing the output tile's N4 dimension to 2. Latency improves by about 20-25% on a Samsung Galaxy S25.

Differential Revision: [D83253129](https://our.internmc.facebook.com/intern/diff/D83253129/)
ghstack-source-id: 312106549
Pull Request resolved: #14597

@SS-JIA

This PR was created by the merge bot to help merge the original PR into the main branch.

ghstack PR number: #14597 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/331/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/331/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/329/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/331/orig
Differential Revision: [D83253129](https://our.internmc.facebook.com/intern/diff/D83253129/)
@diff-train-skip-merge

Co-authored-by: ssjia <ssjia@devvm1479.ncg0.facebook.com>

SS-JIA mentioned this pull request Sep 17, 2025

[ET-VK] Add kInt8x4 dtype and GPUMemoryLayouts for packed quantized tensors #14329

Merged

SS-JIA mentioned this pull request Sep 25, 2025

[ET-VK] Conv2d quantize/dequantize ops for conv2d activations #14330

Merged

meta-cla bot added the CLA Signed label Sep 25, 2025

facebook-github-bot added fb-exported meta-exported labels Sep 25, 2025

digantdesai approved these changes Sep 25, 2025


facebook-github-bot merged commit 1a28dbb into gh/SS-JIA/331/base Sep 25, 2025
126 of 143 checks passed

facebook-github-bot deleted the gh/SS-JIA/331/head branch September 25, 2025 20:04

facebook-github-bot temporarily deployed to cherry-pick-bot September 25, 2025 20:04 — with GitHub Actions Inactive

pytorchbot mentioned this pull request Sep 25, 2025

[ET-VK] Improve q8 matmul by increasing TILE_N4 #14610

Merged
