[Stdlib] Implement SIMD-width-based unroll factor heuristic in elementwise/reduction loops by msaelices · Pull Request #6084 · modular/modular

msaelices · 2026-03-06T15:59:57Z

Summary

Five hot loops in algorithm/functional.mojo and algorithm/reduction.mojo hard-coded unroll_factor = 8, each with a TODO comment asking for a cost heuristic.

Replace it with max(1, min(8, simd_width // 4)), which:

- Scales unrolling to the native SIMD width of the target.
- Caps at 8, the previous hard-coded value, to avoid excessive code size on wide-SIMD targets (AVX-512, SVE).
- Clamps to at least 1 to satisfy the unroll_factor > 0 invariant.
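To see how the expression behaves at its boundaries, it can be mirrored in Python. This is a minimal sketch, not the Mojo code in the PR; the lane counts below are assumptions derived from 128-, 256- and 512-bit vector registers divided by the element size.

```python
def unroll_factor(simd_width: int) -> int:
    """Python mirror of the comptime expression max(1, min(8, simd_width // 4))."""
    return max(1, min(8, simd_width // 4))


# Assumed lane counts: register width in bits / element width in bits.
examples = {
    "float64 / 128-bit NEON (2 lanes)": 2,
    "float64 / AVX2 (4 lanes)": 4,
    "float32 / AVX2 (8 lanes)": 8,
    "float32 / AVX-512 (16 lanes)": 16,
    "int8 / AVX-512 (64 lanes)": 64,
}

for label, width in examples.items():
    print(f"{label}: unroll_factor = {unroll_factor(width)}")
```

Widths below 4 floor to 0 and are lifted to 1 by the outer `max`, while the `min` only takes effect at 36 lanes or more (for example 64 // 4 = 16 becomes 8). In practice the lower bound matters for narrow vectors, and the cap matters for narrow dtypes on wide registers.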

Affected sites: _elementwise_impl_cpu_1d, _elementwise_impl_cpu_nd, _stencil_impl_cpu (functional.mojo) and map_reduce, reduce_boolean (reduction.mojo).

Copilot

Pull request overview

This PR updates several hot-loop CPU implementations in the stdlib to choose loop unrolling based on the target’s native SIMD width rather than using a hard-coded constant, aiming to better balance throughput and code size across architectures.

Changes:

- Replaced comptime unroll_factor = 8 with a SIMD-width-based heuristic in map_reduce and reduce_boolean.
- Replaced comptime unroll_factor = 8 with the same heuristic in _elementwise_impl_cpu_1d, _elementwise_impl_cpu_nd, and _stencil_impl_cpu.
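The following hypothetical Python sketch makes the throughput versus code-size trade-off concrete by showing the general shape of an unrolled elementwise loop. It is not Mojo's `vectorize` or the stdlib's `_elementwise_impl_cpu_1d`. `elementwise_unrolled` is an illustrative name, and each inner `for` over `simd_width` elements stands in for a single SIMD operation.

```python
def elementwise_unrolled(func, data, simd_width, unroll_factor):
    """Hypothetical sketch: apply `func` to every element of `data`,
    handling `unroll_factor` SIMD-width chunks per main-loop iteration."""
    out = list(data)
    n = len(data)
    block = simd_width * unroll_factor
    i = 0
    # Main loop: each iteration covers `unroll_factor` vector-sized chunks.
    while i + block <= n:
        for u in range(unroll_factor):  # the body a compiler would unroll
            start = i + u * simd_width
            for j in range(start, start + simd_width):  # stands in for one SIMD op
                out[j] = func(data[j])
        i += block
    # Remainder: whole vectors first, then a scalar tail.
    while i + simd_width <= n:
        for j in range(i, i + simd_width):
            out[j] = func(data[j])
        i += simd_width
    while i < n:
        out[i] = func(data[i])
        i += 1
    return out


print(elementwise_unrolled(lambda x: x * 2, list(range(10)), 4, 2))
```

Each main-loop iteration covers `simd_width * unroll_factor` elements. A larger factor therefore means fewer loop-control branches and more independent vector operations in flight, at the cost of a larger loop body once the `for u` loop is fully unrolled.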

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `mojo/stdlib/std/algorithm/reduction.mojo` | Uses a SIMD-width-based unroll heuristic for `map_reduce` and `reduce_boolean` to avoid over-unrolling on wide-SIMD targets. |
| `mojo/stdlib/std/algorithm/functional.mojo` | Applies the SIMD-width-based unroll heuristic to elementwise and stencil CPU vectorized loops. |


soraros · 2026-03-07T01:35:41Z

I can't comment on whether this is a good heuristic. I think you could use `clamp`, though.

abduld · 2026-03-07T04:48:39Z

mojo/stdlib/std/algorithm/functional.mojo

    comptime assert rank == 1, "Specialization for 1D"

-    comptime unroll_factor = 8  # TODO: Comeup with a cost heuristic.
+    comptime unroll_factor = max(1, min(8, simd_width // 4))


Can you use the `clamp` function for these?
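Both reviewers suggest `clamp`. The Python check below assumes a `clamp(value, lower, upper)` helper with the usual semantics; the helper and its argument order are assumptions, not taken from the Mojo stdlib, so check the real signature before switching. Under that assumption, the rewrite preserves behavior for every width from 1 to 512 lanes:

```python
def clamp(value: int, lower: int, upper: int) -> int:
    """Hypothetical clamp helper; the (value, lower, upper) order is assumed."""
    return max(lower, min(value, upper))


def unroll_factor_minmax(simd_width: int) -> int:
    # Spelling currently in the PR.
    return max(1, min(8, simd_width // 4))


def unroll_factor_clamp(simd_width: int) -> int:
    # Spelling the reviewers suggest.
    return clamp(simd_width // 4, 1, 8)


# The two spellings agree for every width from 1 to 512 lanes.
assert all(
    unroll_factor_minmax(w) == unroll_factor_clamp(w) for w in range(1, 513)
)
```

Because `lower <= upper` here, `max(lower, min(value, upper))` is exactly the nested `max`/`min` form, so the change is purely about readability.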

msaelices · 2026-03-09T15:54:00Z

@abduld could you PTAL?

…twise/reduction loops

Five hot loops in functional.mojo and reduction.mojo had `unroll_factor = 8` with TODO comments asking for a cost heuristic. Replace the hard-coded 8 with `max(1, min(8, simd_width // 4))`:

- Scales with the native SIMD width of the dtype/target.
- Caps at 8 to avoid excessive code size on wide-SIMD targets.
- Ensures at least 1 to satisfy the unroll-factor > 0 invariant.

Typical values:

- float32 / AVX2 (width=8): heuristic → 2
- float32 / AVX512 (width=16): heuristic → 4
- float64 / AVX2 (width=4): heuristic → 1

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
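Compared with the constant it replaces, the heuristic never unrolls more. The Python sketch below tabulates both factors for power-of-two widths. The "elements per iteration" column assumes that the main loop covers `simd_width * unroll_factor` elements per iteration. That is a simplifying model, not something stated in the PR.

```python
OLD_UNROLL_FACTOR = 8  # the hard-coded value the PR replaces


def new_unroll_factor(simd_width: int) -> int:
    """Python mirror of max(1, min(8, simd_width // 4))."""
    return max(1, min(8, simd_width // 4))


for width in (1, 2, 4, 8, 16, 32, 64):
    new = new_unroll_factor(width)
    # Elements covered by one main-loop iteration under the assumed model.
    old_block = width * OLD_UNROLL_FACTOR
    new_block = width * new
    print(
        f"width={width:>2}: unroll {OLD_UNROLL_FACTOR} -> {new}, "
        f"elements/iteration {old_block} -> {new_block}"
    )
```

Under this model, every width below 32 lanes, which includes all three typical values listed above, now unrolls less than before. Widths of 32 lanes or more keep the old factor of 8.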

msaelices requested a review from a team as a code owner March 6, 2026 15:59

Copilot AI review requested due to automatic review settings March 6, 2026 15:59

github-actions bot added the mojo-stdlib and waiting-on-review labels Mar 6, 2026

Copilot started reviewing on behalf of msaelices March 6, 2026 16:00

Copilot AI reviewed Mar 6, 2026


msaelices force-pushed the fix/reduction-unroll-heuristic branch from 80cc91c to 830ed98 March 6, 2026 18:48

abduld reviewed Mar 7, 2026


msaelices force-pushed the fix/reduction-unroll-heuristic branch 2 times, most recently from 0dd7bf5 to 2bd479d March 9, 2026 15:37

msaelices requested a review from abduld March 9, 2026 15:53

msaelices force-pushed the fix/reduction-unroll-heuristic branch from 2bd479d to a9457d3 March 10, 2026 11:28

Merge branch 'main' into fix/reduction-unroll-heuristic

4fad575

msaelices wants to merge 2 commits into modular:main from msaelices:fix/reduction-unroll-heuristic


4 participants
