[ET-VK][conv2d] Auto-route SlidingWindow conv2d to im2col/GEMM via device-aware heuristic by SS-JIA · Pull Request #20059 · pytorch/executorch

SS-JIA · 2026-06-05T00:48:58Z

Stack from ghstack (oldest at bottom):

-> [ET-VK][conv2d] Auto-route SlidingWindow conv2d to im2col/GEMM via device-aware heuristic #20059
[ETVK] Add benchmark binary + im2col/GEMM conv2d prototype #20058

Wire a routing heuristic into the production conv2d dispatch (`add_conv2d_node` in `Convolution.cpp`) so that `aten.convolution.default` automatically selects the im2col + GEMM path (`conv2d_gemm_impl`) instead of the direct sliding-window shader when it is beneficial. The new `should_use_conv2d_im2col()` helper applies preconditions (`groups == 1`, `dilation == 1`, and `Kh*Kw > 1`, since 1x1 is already handled by the optimized `conv2d_pw` path) and then selects im2col when `device_is_mali() || c_out >= 128`.

The rationale is that Mali (buffer path) benefits from im2col universally, while on Adreno (texture path) im2col only amortizes the fixed im2col-gather cost once the output channel count is large enough. Smaller `c_out` shapes stay on the direct shader, where they are faster; this also avoids the costly texture3d gather that occurs for very wide im2col matrices.
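As a rough illustration of the rule described above, here is a minimal standalone sketch. It is not the code from `Convolution.cpp`: the `ConvShape` struct, the `should_use_im2col_sketch` name, the `kIm2colCoutThreshold` constant name, and passing the Mali check in as a plain `bool` are all assumptions made so the example compiles on its own. Only the preconditions and the `device_is_mali() || c_out >= 128` condition come from the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the shape information that the real
// helper reads from the graph, the weight tensor, and the kernel params.
struct ConvShape {
  std::int64_t groups;
  std::int64_t dilation_h;
  std::int64_t dilation_w;
  std::int64_t kernel_h;
  std::int64_t kernel_w;
  std::int64_t c_out;
};

// Threshold quoted in the PR description; the constant name is illustrative.
constexpr std::int64_t kIm2colCoutThreshold = 128;

// Sketch of the routing rule: preconditions first, then the device-aware
// choice. `device_is_mali` is a plain bool here (assumption).
constexpr bool should_use_im2col_sketch(const ConvShape& s, bool device_is_mali) {
  return s.groups == 1 &&
      s.dilation_h == 1 && s.dilation_w == 1 &&
      s.kernel_h * s.kernel_w > 1 &&
      (device_is_mali || s.c_out >= kIm2colCoutThreshold);
}

// Spot checks on a 3x3 kernel around the c_out boundary.
inline void check_routing_examples() {
  const ConvShape below{1, 1, 1, 3, 3, 64};
  const ConvShape at{1, 1, 1, 3, 3, 128};
  const ConvShape grouped{2, 1, 1, 3, 3, 256};
  assert(!should_use_im2col_sketch(below, /*device_is_mali=*/false));
  assert(should_use_im2col_sketch(at, /*device_is_mali=*/false));
  assert(should_use_im2col_sketch(below, /*device_is_mali=*/true));
  assert(!should_use_im2col_sketch(grouped, /*device_is_mali=*/true));
}
```

Under these assumptions, a 3x3 convolution with `c_out = 64` stays on the direct shader on a non-Mali device and moves to im2col on Mali, while a grouped convolution never takes the im2col path.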

Differential Revision: D107595816

[ghstack-poisoned]

pytorch-bot · 2026-06-05T00:49:01Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20059

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit b361111 with merge base 502fdbe:

NEW FAILURE - The following job has failed:

pull / test-vulkan-models-linux / linux-job (gh)
RuntimeError: Command docker exec -t 3745eba5970e9e3102aac54c0bbd319a5f0ae9bd2510115b6de4e088a876c0c8 /exec failed with exit code 1

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

pull / test-models-linux (resnet18, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh) (matched linux rule in flaky-rules.json)
The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
pull / test-models-linux-basic (vit, xnnpack-quantization-delegation, cmake, linux.2xlarge, executorch-u... / linux-job (gh) (matched linux rule in flaky-rules.json)
The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2026-06-05T00:49:04Z

❌ The email address for the commit (2bf7c3d, 7e332ce, b361111) is not linked to the GitHub account, preventing the EasyCLA check. Consult this Help Article and GitHub Help to resolve. (To view the commit's email address, add .patch at the end of this PR page's URL.) For further assistance with EasyCLA, please visit our EasyCLA portal and chat with our support bot.

github-actions · 2026-06-05T00:49:48Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

SS-JIA · 2026-06-05T00:55:10Z

@claude Review this PR

claude · 2026-06-05T00:56:01Z

Claude finished @SS-JIA's task in 2m 3s

PR Review: [ET-VK][conv2d] Auto-route SlidingWindow conv2d to im2col/GEMM via device-aware heuristic

Gather context and read changed files
Analyze routing heuristic logic
Review test coverage
Post review feedback

Summary

Clean PR that wires a device-aware routing heuristic into the production conv2d dispatch so that SlidingWindow convolutions can automatically select the im2col+GEMM path when beneficial. The heuristic is simple and well-motivated, the test coverage is thorough (boundary configs, forced-path selectors, cross-device verification strategy), and the force_direct escape hatch for tests is well-designed.

Bug: Transposed convolutions may be incorrectly routed to im2col+GEMM

Convolution.cpp:527-530 — After the Pointwise and Depthwise early returns, the code unconditionally evaluates should_use_conv2d_im2col() for all remaining methods, which includes Conv2dMethod::Transposed:

const bool use_im2col = force_direct
    ? false
    : should_use_conv2d_im2col(graph, weight_data, groups_val, kernel_params);
if (use_im2col) {
    return conv2d_gemm_impl(...);  // Wrong for transposed!
}

A transposed conv2d with groups==1, dilation==1, Kh*Kw > 1, and c_out >= 128 (or on Mali) would pass all preconditions in should_use_conv2d_im2col() and incorrectly dispatch to conv2d_gemm_impl, which implements standard (non-transposed) forward convolution.

Suggested fix — guard the heuristic so it only fires for SlidingWindow:

const bool use_im2col = (!force_direct && method == Conv2dMethod::SlidingWindow)
    ? should_use_conv2d_im2col(graph, weight_data, groups_val, kernel_params)
    : false;
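To make the effect of the guard concrete, the following is a minimal sketch, not the actual dispatch code. The `Conv2dMethod` enumerator names follow the review text, but the enum layout, the `ConvPath` type, and both `route_*` functions are hypothetical. The heuristic result is passed in as a `bool` so the two variants can be compared directly.

```cpp
#include <cassert>

// Method names follow the review text; the enum layout is an assumption.
enum class Conv2dMethod { Pointwise, Depthwise, SlidingWindow, Transposed };

// Hypothetical labels: Default means "whatever non-im2col path the method
// normally takes", Im2colGemm means the im2col + GEMM path.
enum class ConvPath { Default, Im2colGemm };

// Unguarded variant: every method that reaches this point can take im2col.
constexpr ConvPath route_unguarded(
    Conv2dMethod /*method*/, bool force_direct, bool heuristic_ok) {
  return (!force_direct && heuristic_ok) ? ConvPath::Im2colGemm
                                         : ConvPath::Default;
}

// Guarded variant: only SlidingWindow is eligible for im2col.
constexpr ConvPath route_guarded(
    Conv2dMethod method, bool force_direct, bool heuristic_ok) {
  return (!force_direct && method == Conv2dMethod::SlidingWindow && heuristic_ok)
      ? ConvPath::Im2colGemm
      : ConvPath::Default;
}

inline void check_guard_examples() {
  // Without the guard, a transposed conv that satisfies the heuristic is
  // sent to the forward-convolution GEMM path.
  assert(route_unguarded(Conv2dMethod::Transposed, false, true) ==
         ConvPath::Im2colGemm);
  // With the guard, it keeps its usual path.
  assert(route_guarded(Conv2dMethod::Transposed, false, true) ==
         ConvPath::Default);
  // SlidingWindow behaviour is unchanged, including the force_direct override.
  assert(route_guarded(Conv2dMethod::SlidingWindow, false, true) ==
         ConvPath::Im2colGemm);
  assert(route_guarded(Conv2dMethod::SlidingWindow, true, true) ==
         ConvPath::Default);
}
```

The only behavioural difference between the two variants in this sketch is the `Transposed` case; `SlidingWindow` routing is identical.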

Minor observations

Redundant kernel-size check (Convolution.cpp:329-331) — The Kh*Kw <= 1 check is defensive but can never fire for SlidingWindow (since get_conv2d_method routes 1×1 to Pointwise). If you add the method == SlidingWindow guard above, this becomes even more clearly dead code. Fine to keep as a safety net, just noting it.
Magic constant — The c_out >= 128 threshold is well-justified in comments. Consider extracting it to a named constant (e.g., kIm2colCoutThreshold) for grep-ability if this threshold needs tuning in the future, but this is entirely optional.
Comment verbosity (Convolution.cpp:300-314) — The block comment on should_use_conv2d_im2col is detailed, which is appropriate for a heuristic where the "why" matters. Reads well.
Test design — The boundary test configs at c_out=64/128 combined with all three selectors ("", "direct", "im2col") is a solid strategy for catching routing regressions. The "direct" selector addition also nicely ensures Mali CI doesn't lose coverage of the direct shader path.
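The boundary-test idea from the last observation can be sketched as a small case table. This is an illustration only and does not reproduce the PR's test harness: `RoutingCase`, `expected_path`, and `boundary_cases` are hypothetical names, and the sketch assumes that the `"direct"` and `"im2col"` selectors force a path while the empty selector defers to the heuristic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical labels for the two shader paths under test.
enum class ConvPath { Direct, Im2colGemm };

// One row of the (hypothetical) test matrix.
struct RoutingCase {
  std::string selector;  // "", "direct", or "im2col"
  int c_out;
  ConvPath expected;
};

// Assumed selector semantics: "direct" and "im2col" force a path; the empty
// selector falls back to the Mali-or-c_out>=128 heuristic.
inline ConvPath expected_path(const std::string& selector, int c_out, bool is_mali) {
  if (selector == "direct") {
    return ConvPath::Direct;
  }
  if (selector == "im2col") {
    return ConvPath::Im2colGemm;
  }
  return (is_mali || c_out >= 128) ? ConvPath::Im2colGemm : ConvPath::Direct;
}

// Builds the cross product of boundary c_out values and selectors.
inline std::vector<RoutingCase> boundary_cases(bool is_mali) {
  const std::vector<int> c_outs = {64, 128};
  const std::vector<std::string> selectors = {"", "direct", "im2col"};
  std::vector<RoutingCase> cases;
  for (int c_out : c_outs) {
    for (const std::string& selector : selectors) {
      cases.push_back({selector, c_out, expected_path(selector, c_out, is_mali)});
    }
  }
  return cases;
}
```

With these assumed semantics, only the empty-selector rows depend on the device, so the forced rows keep exercising both shader paths on any device.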

Verdict

The PR is well-structured and the heuristic is sound. The transposed-conv routing issue above is the only correctness concern — it's likely not triggered by current model graphs (transposed convs with c_out≥128 and large kernels are uncommon in practice), but it's a latent bug worth fixing before landing.

Update

2bf7c3d

SS-JIA mentioned this pull request Jun 5, 2026

[ETVK] Add benchmark binary + im2col/GEMM conv2d prototype #20058

Open

meta-cla Bot added the CLA Signed label Jun 5, 2026

meta-codesync Bot added the meta-exported label Jun 5, 2026

metascroy approved these changes Jun 5, 2026

Update

7e332ce

Update

b361111

SS-JIA wants to merge 3 commits into gh/SS-JIA/557/base from gh/SS-JIA/557/head
