[Vectorize] Add StridedLoopUnroll + Versioning for 2-D strided loop nests #157749

felipealmeida · 2025-09-09T21:00:44Z

Summary

Introduce two passes that recognize a common 2-D row/column loop idiom (e.g. image/matrix kernels) and produce:

a fallback version for LoopVectorize, and
a strided version using widened ops, VP strided loads/stores, and controlled unrolling on scalable-vector targets.

What it matches

Outer canonical IV (rows) and inner canonical IV (cols), both starting at 0, step = 1, != predicate.
Inner loop: unit-stride loads/stores of uniform element size.
Outer loop: base pointers advanced by a regular (dynamic) stride SCEV.
Single store in inner body drives the producer graph.
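
For illustration, a loop nest with the shape described above might look like the following C++ sketch. The function name `avg_block` and its parameters are hypothetical and not taken from the PR. Whether the pass matches this exact source depends on how earlier passes canonicalize it, so treat it as an assumption rather than a tested input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel (not from the PR): rounded average of two 2-D blocks.
void avg_block(std::uint8_t *dst, std::ptrdiff_t dst_stride,
               const std::uint8_t *a, std::ptrdiff_t a_stride,
               const std::uint8_t *b, std::ptrdiff_t b_stride,
               std::size_t width, std::size_t height) {
  assert(dst != nullptr && a != nullptr && b != nullptr);
  // Outer IV (rows): starts at 0, step 1, `!=` exit predicate.
  for (std::size_t y = 0; y != height; ++y) {
    // Inner IV (cols): unit-stride loads and a single unit-stride store,
    // all with the same element size (1 byte here).
    for (std::size_t x = 0; x != width; ++x) {
      dst[x] = static_cast<std::uint8_t>((a[x] + b[x] + 1) >> 1);
    }
    // Base pointers advance by a loop-invariant stride known only at run time.
    dst += dst_stride;
    a += a_stride;
    b += b_stride;
  }
}
```

Calling `avg_block(dst, 16, a, 16, b, 16, 8, 8)` processes an 8x8 block stored inside rows that are 16 bytes apart, which is the case where the row stride differs from the row width.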

What it does

Function pass (StridedLoopUnrollVersioningPass):

Builds LAA with AssumptionCache; uses LoopVersioning with runtime pointer checks.
Adds guards: inner trip count (TC) divisible by the unroll factor, outer TC divisible by vscale, and alignment if required by the target.
Unrolls inner loop (heuristic 8 / elemSize), hoists invariant loads, eliminates duplicate loads.
Marks loops (alias scopes, llvm.stride.loop_idiom, etc.).
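
The guards listed above can be modelled as a scalar predicate. This is a minimal sketch, not the pass's implementation: the names `unroll_factor` and `strided_version_applies` are hypothetical, the alignment check on a single base address is an assumption about what "alignment if required by target" means, and the runtime pointer-overlap checks produced by LoopVersioning are omitted.

```cpp
#include <cassert>
#include <cstdint>

// Unroll factor from the heuristic quoted in the PR: 8 / elemSize.
inline std::uint64_t unroll_factor(std::uint64_t elem_size_bytes) {
  assert(elem_size_bytes != 0 && elem_size_bytes <= 8);
  return 8 / elem_size_bytes;
}

// Returns true when the strided version would be selected, false when
// execution would fall back to the version left for LoopVectorize.
// required_align <= 1 models a target with no alignment requirement.
inline bool strided_version_applies(std::uint64_t inner_tc,
                                    std::uint64_t outer_tc,
                                    std::uint64_t vscale,
                                    std::uint64_t elem_size_bytes,
                                    std::uintptr_t base_addr,
                                    std::uint64_t required_align) {
  assert(vscale != 0);
  const std::uint64_t unroll = unroll_factor(elem_size_bytes);
  const bool inner_ok = inner_tc % unroll == 0;  // inner TC divisible by unroll
  const bool outer_ok = outer_tc % vscale == 0;  // outer TC divisible by vscale
  const bool align_ok =
      required_align <= 1 || base_addr % required_align == 0;
  return inner_ok && outer_ok && align_ok;
}
```

Under this model, an 8x8 block of 1-byte elements passes the guards for `vscale` values 1, 2, 4, and 8, while a width of 12 fails the inner divisibility check and takes the fallback.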

Loop pass (StridedLoopUnrollPass):

On loops marked llvm.stride.loop_idiom, widens supported ops by vscale.
Lowers unit-stride memory accesses to llvm.experimental.vp.strided.{load,store}, adjusts IV increments, and cleans up dead code.
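
To make the effect of the lowering concrete, here is a scalar model of a strided load in which each lane reads 8 contiguous bytes and consecutive lanes are separated by a byte stride. The function `strided_load_u64` is hypothetical. Treating one lane as one unrolled row segment (8 / elemSize elements) is one plausible reading of the description, not something the PR text confirms, and the mask operand of the real intrinsic is ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scalar model: lane i holds the 8 bytes found at base + i * stride_bytes.
// evl plays the role of the explicit vector length (number of active lanes).
std::vector<std::uint64_t> strided_load_u64(const std::uint8_t *base,
                                            std::ptrdiff_t stride_bytes,
                                            std::size_t evl) {
  assert(base != nullptr);
  std::vector<std::uint64_t> lanes(evl);
  for (std::size_t i = 0; i != evl; ++i) {
    const std::uint8_t *row =
        base + static_cast<std::ptrdiff_t>(i) * stride_bytes;
    // memcpy avoids alignment and aliasing problems when reading 8 bytes.
    std::memcpy(&lanes[i], row, sizeof(std::uint64_t));
  }
  return lanes;
}
```

With `evl = 8` and `stride_bytes` equal to the image row stride, a single call gathers a whole 8x8 block of bytes, one row per lane.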

Why it matters

x264 calls pixel_avg on image blocks of 8x8 or 16x16 pixels. With this
loop versioning, strided loads/stores can load an entire 8x8 block at
once, depending on the vector size. On a Ventana design this gives a
considerable performance boost on the SPEC 2017 x264_r test (6%
instruction count reduction).

Feedback

We would like feedback on whether this can be improved, or whether an alternative implementation would be better.

github-actions · 2025-09-09T21:04:39Z

✅ With the latest revision this PR passed the undef deprecator.

github-actions · 2025-09-09T21:04:39Z

✅ With the latest revision this PR passed the C/C++ code formatter.

…l loops

Introduce two passes that recognize a common 2-D row/column loop idiom (e.g. image/matrix kernels) and produce:

* a **fallback** version for LoopVectorize, and
* a **strided** version using widened ops, VP strided loads/stores, and controlled unrolling on scalable-vector targets.

* Outer canonical IV (rows) and inner canonical IV (cols), both starting at 0, step = 1, `!=` predicate.
* Inner loop: **unit-stride** loads/stores of uniform element size.
* Outer loop: base pointers advanced by a **regular (dynamic) stride** SCEV.
* Single store in inner body drives the producer graph.
* Target supports scalable vectors (`TTI::supportsScalableVectors()`).

Function pass (`StridedLoopUnrollVersioningPass`):

* Builds LAA with **AssumptionCache**; uses `LoopVersioning` with runtime pointer checks.
* Adds guards: inner TC divisible by unroll, outer TC divisible by `vscale`, and alignment if required by target.
* Unrolls inner loop (heuristic `8 / elemSize`), hoists invariant loads, eliminates duplicate loads.
* Marks loops (alias scopes, `llvm.stride.loop_idiom`, etc.).

Loop pass (`StridedLoopUnrollPass`):

* On loops marked `llvm.stride.loop_idiom`, widens supported ops by `vscale`.
* Lowers unit-stride memory accesses to **`llvm.experimental.vp.strided.{load,store}`**, adjusts IV increments, and cleans up dead code.

Adds an initial test that shows the difference in code generation and serves as a regression test for the Strided Loop Unroll passes.

RISCV: Add basic test for Strided Loop Unroll passes

5ea0c55

LoopVersioning: Add option to hoist runtime checks

4eb3b59

felipealmeida force-pushed the felipe_strided_loop_unroll_upstream branch 4 times, most recently from ab898fb to ee48286 Compare September 9, 2025 22:43

felipealmeida added 2 commits September 10, 2025 09:20

[Vectorize] Update Strided Loop Unroll test with optimization

33e47f7

Adds an initial test that shows the difference in code generation and serves as a regression test for the Strided Loop Unroll passes.

felipealmeida force-pushed the felipe_strided_loop_unroll_upstream branch from ee48286 to 33e47f7 Compare September 10, 2025 12:21

mshockwave self-requested a review September 10, 2025 18:54
