Add fast path for contiguous innermost-dim reduction in sum.IntList_out by pssrawat · Pull Request #18789 · pytorch/executorch

pssrawat · 2026-04-09T15:23:56Z

Summary:
Add a fast path to the portable `sum.IntList_out` kernel for the common case of reducing a contiguous tensor over its innermost (last) dimension with matching input and output dtypes. The fast path uses a tight loop over contiguous memory (`acc += row[j]`) that the compiler can auto-vectorize. It bypasses the generic `MapReduceOverDimListPlan` infrastructure, which has five layers of lambda indirection, a `get_init_index()` call per output element, and strided access patterns.
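The fast-path conditions and the loop described above can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `can_use_innermost_fast_path` and `sum_innermost_contiguous` are hypothetical names, tensors are modeled as plain size/stride vectors and raw pointers rather than ExecuTorch tensor types, and the input is viewed as an `[outer, inner]` matrix whose rows are summed. Whether the compiler actually vectorizes the `acc += row[j]` loop depends on the element type and flags: integer sums vectorize readily, while a floating-point accumulation is typically vectorized only when reassociation is permitted (for example with `-ffast-math`), because floating-point addition is not associative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical eligibility check (illustrative names, not ExecuTorch's API).
// The fast path applies only when dtypes match, exactly one dim is reduced,
// that dim is the last one, and the tensor has row-major contiguous strides.
inline bool can_use_innermost_fast_path(const std::vector<std::int64_t>& sizes,
                                        const std::vector<std::int64_t>& strides,
                                        const std::vector<std::int64_t>& dims,
                                        bool same_dtype) {
  assert(sizes.size() == strides.size());
  if (!same_dtype || sizes.empty() || dims.size() != 1) {
    return false;
  }
  const std::int64_t last = static_cast<std::int64_t>(sizes.size()) - 1;
  // Accept either the positive index of the last dim or -1.
  if (dims[0] != last && dims[0] != -1) {
    return false;
  }
  // Conservative contiguity check: each stride must equal the product of
  // the sizes of all dims to its right.
  std::int64_t expected = 1;
  for (std::int64_t d = last; d >= 0; --d) {
    if (strides[d] != expected) {
      return false;
    }
    expected *= sizes[d];
  }
  return true;
}

// Hypothetical fast path: treat the input as an [outer, inner] matrix and
// sum each contiguous row into one output element.
template <typename T>
void sum_innermost_contiguous(const T* in, T* out, std::size_t outer,
                              std::size_t inner) {
  for (std::size_t i = 0; i < outer; ++i) {
    const T* row = in + i * inner;  // row i starts i * inner elements in
    T acc = T(0);
    for (std::size_t j = 0; j < inner; ++j) {
      acc += row[j];  // unit-stride loop, a candidate for auto-vectorization
    }
    out[i] = acc;
  }
}
```

For a 2x3 float tensor `{1, 2, 3, 4, 5, 6}` with strides `{3, 1}` reduced over dim 1, the predicate returns true and the kernel writes `{6, 15}`. The contiguity check is deliberately conservative: it rejects size-1 dimensions with non-canonical strides, which a production check might accept.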

Differential Revision: D99293418


pytorch-bot · 2026-04-09T15:24:00Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18789

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

Rolling out OSDC (ARC) runners on pull workflow for PyTorch trunk commits

❌ 1 New Failure, 2 Unrelated Failures

As of commit 561b4d4 with merge base 463f07b:

NEW FAILURE - The following job has failed:

pull / unittest / macos / macos-job (gh)
backends/xnnpack/test/ops/test_conv2d.py::TestConv2d::test_fp16_conv2d

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2026-04-09T15:24:04Z

@pssrawat has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99293418.


pssrawat requested a review from manuelcandales as a code owner April 9, 2026 15:23

meta-cla bot added the CLA Signed label Apr 9, 2026

meta-codesync bot added fb-exported meta-exported labels Apr 9, 2026

pssrawat added the release notes: none label Apr 9, 2026

manuelcandales approved these changes Apr 9, 2026


meta-codesync bot merged commit 6b63d3d into pytorch:main Apr 9, 2026
172 of 186 checks passed

meta-codesync[bot] merged 1 commit into pytorch:main from pssrawat:export-D99293418
