[ET-VK][conv2d] Auto-route SlidingWindow conv2d to im2col/GEMM via device-aware heuristic#20059
SS-JIA wants to merge 3 commits into
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20059

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit b361111 with merge base 502fdbe:

NEW FAILURE - The following job has failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
@claude Review this PR
Claude finished @SS-JIA's task in 2m 3s.

PR Review: [ET-VK][conv2d] Auto-route SlidingWindow conv2d to im2col/GEMM via device-aware heuristic
Summary

Clean PR that wires a device-aware routing heuristic into the production conv2d dispatch so that …

Bug: Transposed convolutions may be incorrectly routed to im2col+GEMM

```cpp
const bool use_im2col = force_direct
    ? false
    : should_use_conv2d_im2col(graph, weight_data, groups_val, kernel_params);
if (use_im2col) {
  return conv2d_gemm_impl(...); // Wrong for transposed!
}
```

A transposed conv2d with …

Suggested fix: guard the heuristic so it only fires for `Conv2dMethod::SlidingWindow`:

```cpp
const bool use_im2col = (!force_direct && method == Conv2dMethod::SlidingWindow)
    ? should_use_conv2d_im2col(graph, weight_data, groups_val, kernel_params)
    : false;
```

Minor observations
Verdict

The PR is well-structured and the heuristic is sound. The transposed-conv routing issue above is the only correctness concern. It is likely not triggered by current model graphs (transposed convs with c_out ≥ 128 and large kernels are uncommon in practice), but it is a latent bug worth fixing before landing.
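To make the suggested guard concrete, here is a minimal, self-contained sketch of the routing decision. It assumes a simplified, hypothetical `Conv2dMethod` enum (only `SlidingWindow`, `Transposed`, and a catch-all `Other`) and a hypothetical `heuristic_prefers_im2col()` that models only the `c_out >= 128` branch of `should_use_conv2d_im2col()`; the graph, weight, and device plumbing is omitted. The `static_assert`s check at compile time that a transposed conv with a large `c_out` is never routed to im2col + GEMM.

```cpp
#include <cstdint>

// Hypothetical stand-in for the conv2d method enum referenced in the review;
// the real enum in Convolution.cpp has its own set of members.
enum class Conv2dMethod { SlidingWindow, Transposed, Other };

// Hypothetical stand-in for should_use_conv2d_im2col(): only the
// c_out >= 128 branch is modeled (no device or shape preconditions).
constexpr bool heuristic_prefers_im2col(std::int64_t c_out) {
  return c_out >= 128;
}

// Guarded routing in the shape the review suggests: the heuristic is only
// consulted for SlidingWindow convs, and force_direct always wins.
constexpr bool routes_to_im2col_gemm(Conv2dMethod method,
                                     bool force_direct,
                                     std::int64_t c_out) {
  return (!force_direct && method == Conv2dMethod::SlidingWindow)
             ? heuristic_prefers_im2col(c_out)
             : false;
}

// Compile-time checks of the intended behavior.
static_assert(routes_to_im2col_gemm(Conv2dMethod::SlidingWindow, false, 256),
              "large SlidingWindow conv should use im2col + GEMM");
static_assert(!routes_to_im2col_gemm(Conv2dMethod::Transposed, false, 256),
              "transposed conv must never be routed to im2col + GEMM");
static_assert(!routes_to_im2col_gemm(Conv2dMethod::SlidingWindow, true, 256),
              "force_direct overrides the heuristic");
```

Checking the method before calling the heuristic, rather than adding a transposed check inside it, keeps the heuristic focused on performance and leaves correctness gating to the dispatcher.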
…vice-aware heuristic

Pull Request resolved: #20059

Wire a routing heuristic into the production conv2d dispatch (`add_conv2d_node` in `Convolution.cpp`) so that `aten.convolution.default` automatically selects the im2col + GEMM path (`conv2d_gemm_impl`) instead of the direct sliding-window shader when it is beneficial. The new `should_use_conv2d_im2col()` helper applies preconditions (`groups == 1`, `dilation == 1`, `Kh*Kw > 1`; 1x1 is already handled by the optimized `conv2d_pw` path) and then selects im2col when `device_is_mali() || c_out >= 128`. The rationale is that Mali (buffer path) benefits from im2col universally, while on Adreno (texture path) im2col only amortizes the fixed im2col-gather cost once the output channel count is large enough; smaller `c_out` shapes stay on the direct shader where they are faster, which also avoids the costly texture3d gather that occurs for very wide im2col matrices.

ghstack-source-id: 390484641
@exported-using-ghexport

Differential Revision: [D107595816](https://our.internmc.facebook.com/intern/diff/D107595816/)
…vice-aware heuristic

Pull Request resolved: #20059

Wire a routing heuristic into the production conv2d dispatch (`add_conv2d_node` in `Convolution.cpp`) so that `aten.convolution.default` automatically selects the im2col + GEMM path (`conv2d_gemm_impl`) instead of the direct sliding-window shader when it is beneficial. The new `should_use_conv2d_im2col()` helper applies preconditions (`groups == 1`, `dilation == 1`, `Kh*Kw > 1`; 1x1 is already handled by the optimized `conv2d_pw` path) and then selects im2col when `device_is_mali() || c_out >= 128`. The rationale is that Mali (buffer path) benefits from im2col universally, while on Adreno (texture path) im2col only amortizes the fixed im2col-gather cost once the output channel count is large enough; smaller `c_out` shapes stay on the direct shader where they are faster, which also avoids the costly texture3d gather that occurs for very wide im2col matrices.

ghstack-source-id: 390528476
@exported-using-ghexport

Differential Revision: [D107595816](https://our.internmc.facebook.com/intern/diff/D107595816/)
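The preconditions and threshold described in the commit message can be sketched as a pure predicate. This is an illustrative approximation, not the code in `Convolution.cpp`: the `Conv2dShape` struct and `prefers_im2col()` are hypothetical, dilation is split into separate height and width fields as an assumption, and the `is_mali` flag stands in for `device_is_mali()` so the function has no device dependency.

```cpp
#include <cstdint>

// Hypothetical, self-contained description of a conv2d call; in the real
// dispatch these values come from the graph, the weight tensor, and kernel params.
struct Conv2dShape {
  std::int64_t groups;
  std::int64_t dilation_h;
  std::int64_t dilation_w;
  std::int64_t kernel_h;
  std::int64_t kernel_w;
  std::int64_t c_out;
};

// Sketch of the routing rule from the commit message. `is_mali` replaces the
// device query so the decision can be evaluated (and tested) on the host.
constexpr bool prefers_im2col(Conv2dShape s, bool is_mali) {
  return s.groups == 1 && s.dilation_h == 1 && s.dilation_w == 1 &&
         s.kernel_h * s.kernel_w > 1 &&  // 1x1 kernels are left to conv2d_pw
         (is_mali || s.c_out >= 128);    // Mali: always; otherwise large c_out only
}

// A 3x3 conv with 64 output channels: im2col on Mali, direct shader elsewhere.
static_assert(prefers_im2col(Conv2dShape{1, 1, 1, 3, 3, 64}, true), "Mali");
static_assert(!prefers_im2col(Conv2dShape{1, 1, 1, 3, 3, 64}, false), "Adreno");
```

Passing the device check in as a parameter keeps the decision testable without a GPU; in the real dispatch the flag would come from the device query.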
Stack from ghstack (oldest at bottom):
Wire a routing heuristic into the production conv2d dispatch (`add_conv2d_node` in `Convolution.cpp`) so that `aten.convolution.default` automatically selects the im2col + GEMM path (`conv2d_gemm_impl`) instead of the direct sliding-window shader when it is beneficial. The new `should_use_conv2d_im2col()` helper applies preconditions (`groups == 1`, `dilation == 1`, `Kh*Kw > 1`; 1x1 is already handled by the optimized `conv2d_pw` path) and then selects im2col when `device_is_mali() || c_out >= 128`. The rationale is that Mali (buffer path) benefits from im2col universally, while on Adreno (texture path) im2col only amortizes the fixed im2col-gather cost once the output channel count is large enough; smaller `c_out` shapes stay on the direct shader where they are faster, which also avoids the costly texture3d gather that occurs for very wide im2col matrices.

Differential Revision: D107595816
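The point about amortizing the gather cost follows from a simple operation count. In a generic im2col + GEMM formulation (assumed here; it ignores memory bandwidth, tiling, and the texture-specific costs the description mentions), the gathered matrix has `K = c_in * kh * kw` rows and `N = out_h * out_w` columns regardless of `c_out`, while the GEMM performs `c_out * K * N` multiply-accumulates. GEMM work per gathered element is therefore exactly `c_out`, so a larger output channel count makes the fixed gather relatively cheaper. The `128` threshold itself is the PR's empirical choice and is not derived from this model; `Im2colCost` and `im2col_cost()` are hypothetical names.

```cpp
#include <cstdint>

// Back-of-envelope sizes for a generic im2col + GEMM conv2d (stride and
// padding are assumed to be already folded into out_h/out_w).
struct Im2colCost {
  std::int64_t gather_elems;  // elements written by the im2col gather
  std::int64_t gemm_macs;     // multiply-accumulates performed by the GEMM
};

constexpr Im2colCost im2col_cost(std::int64_t c_in, std::int64_t c_out,
                                 std::int64_t kernel_h, std::int64_t kernel_w,
                                 std::int64_t out_h, std::int64_t out_w) {
  // Gathered matrix: (c_in * kh * kw) rows by (out_h * out_w) columns; its size
  // does not depend on c_out. The GEMM multiplies the [c_out x K] weight by it.
  return Im2colCost{
      c_in * kernel_h * kernel_w * out_h * out_w,
      c_out * c_in * kernel_h * kernel_w * out_h * out_w};
}
```

For example, with `c_in = 64`, a 3x3 kernel, and a 56x56 output, the gather writes 576 × 3136 = 1,806,336 elements whatever `c_out` is; at `c_out = 32` the GEMM does 32 MACs per gathered element, and at `c_out = 256` it does 256.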