fuse flt/map over window into stride-1 in-place loop #325
Merged
New 2-word opcode that materialises xs[idx..idx+n] into a destination list register, reusing the existing Vec in place when its strong count is 1. Backs the fused `flt fn (window n xs)` / `map fn (window n xs)` emitter coming in a follow-up commit. The reuse path mirrors the RC-peek pattern already used by OP_ADD_SS, OP_LISTAPPEND, and OP_MSET. When the destination doesn't hold a uniquely owned list, falls back to allocating fresh and drop_rc'ing the previous value. Also threads the new opcode through find_block_leaders (2-word skip) and chunk_is_all_numeric (non-numeric result).
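The RC-peek reuse described above fits in a few lines of Rust. This is a minimal sketch. It assumes std `Rc<Vec<T>>` stands in for the VM's refcounted list value. `window_view` and its `bool` return (true when the allocation was reused) are hypothetical names, not the project's dispatcher code. `Rc::get_mut` succeeds only when the strong count is 1 and there are no weak references, which is the fast-path condition. Otherwise the register gets a fresh list, and the assignment drops the register's reference to the old one.

```rust
use std::rc::Rc;

/// Hypothetical model of the OP_WINDOW_VIEW reuse arm: write xs[idx..idx+n]
/// into the destination register `dst`. Returns true when the existing Vec
/// was reused in place.
fn window_view<T: Clone>(dst: &mut Rc<Vec<T>>, xs: &[T], idx: usize, n: usize) -> bool {
    let win = &xs[idx..idx + n];
    if let Some(v) = Rc::get_mut(dst) {
        // Strong count is 1 (and no weak refs): overwrite in place,
        // keeping both the Rc allocation and the Vec's buffer.
        v.clear();
        v.extend_from_slice(win);
        true
    } else {
        // Shared: allocate fresh. Assigning drops this register's
        // reference to the previous value (the drop_rc step).
        *dst = Rc::new(win.to_vec());
        false
    }
}
```

`clear` keeps the Vec's capacity, so once the buffer has been filled, later windows of the same size reuse both the `Rc` allocation and the element buffer.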
Adds jit_window_view (mirrors the OP_WINDOW_VIEW VM dispatcher's reuse arm) and wires it into both the JIT and AOT compile paths. Each path declares the 5-arg helper, registers it in the relocation table, and emits an OP_WINDOW_VIEW arm that calls the helper with (cur, xs, idx, n, span). The helper takes the destination register's current value, attempts the in-place reuse fast path when strong count is 1, and otherwise allocates fresh. Returning the same NanVal bits on reuse keeps def_var as a no-op write so the SSA variable's identity is preserved across iterations. Also marks OP_WINDOW_VIEW as a non-numeric / non-bool writer in both Cranelift flag-flow analyses.
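Returning the same bits keeps `def_var` a no-op. The sketch below shows why by treating a list value as the raw pointer from `Rc::into_raw`. That is an assumption for illustration only. Real NaN-boxed values carry tag bits, and the helper name, signature, and `Bits` alias are hypothetical. The `span` argument is also dropped. On the unique path the returned bits equal `cur`, so writing them back to the register changes nothing. On the shared path the helper releases the register's reference and returns a different pointer.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a NanVal: just the raw Rc pointer, no tag bits.
type Bits = *const Vec<i64>;

/// Sketch of a jit_window_view-style helper (`span` omitted).
///
/// # Safety
/// `cur` must come from `Rc::into_raw` and own one strong reference,
/// which this function consumes.
unsafe fn window_view_helper(cur: Bits, xs: &[i64], idx: usize, n: usize) -> Bits {
    let win = &xs[idx..idx + n];
    // Take back the strong reference that the register owns.
    let mut rc = unsafe { Rc::from_raw(cur) };
    let reused = match Rc::get_mut(&mut rc) {
        Some(v) => {
            v.clear();
            v.extend_from_slice(win);
            true
        }
        None => false,
    };
    if reused {
        // Same allocation, so the returned bits equal `cur` and the
        // caller's write-back is a no-op.
        return Rc::into_raw(rc);
    }
    // Shared: release the register's reference (the drop_rc step) and
    // return a freshly allocated list instead.
    drop(rc);
    Rc::into_raw(Rc::new(win.to_vec()))
}
```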
Pattern-matches `flt fn (window n xs)` and `map fn (window n xs)` at bytecode emit time and replaces the eager `L (L t)` materialisation with a tight stride-1 loop that reuses one scratch list as the per-call window. The reuse hinges on the new OP_WINDOW_VIEW arm's RC=1 fast path. Why this matters: bioinformatics rerun5 hit a 5.8x slowdown on `flt all-hydro (window 15 (chars seq))` over an 11.4M-residue input, because the unfused form allocated ~170M small lists. With fusion, flt's drop iterations (predicate false) and map iterations reuse a single Vec; only flt's keep iterations force a fresh allocation, because the accumulator clone_rc's the window and bumps its RC above 1. The non-fused `window` path is unchanged: when the result escapes to a binding or isn't consumed by an outer flt/map, the eager OP_WINDOW dispatcher fires as before. The tree-walker keeps the reference implementation; the VM dispatcher and Cranelift helper handle the reuse.
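A small counting model lets you check the allocation behaviour of the fused `flt` loop. This sketch uses a hypothetical `fused_flt_window` and again swaps in std `Rc` for the VM's RC. A fresh Vec is allocated on the first iteration and on any iteration that follows a keep, because the kept clone holds the count at 2. Drop iterations overwrite the scratch list in place.

```rust
use std::rc::Rc;

/// Hypothetical model of fused `flt pred (window n xs)`. Returns the kept
/// windows plus the number of fresh Vec allocations the loop performed.
fn fused_flt_window<T: Clone>(
    n: usize,
    xs: &[T],
    pred: impl Fn(&[T]) -> bool,
) -> (Vec<Rc<Vec<T>>>, usize) {
    let mut kept: Vec<Rc<Vec<T>>> = Vec::new();
    let mut allocs = 0;
    let mut scratch: Option<Rc<Vec<T>>> = None;
    // Signed limit: an empty source or n > len means the body never runs.
    let limit = xs.len() as isize - n as isize + 1;
    for idx in 0..limit.max(0) as usize {
        let win = &xs[idx..idx + n];
        // RC=1 fast path: overwrite the scratch list in place.
        let reused = match scratch.as_mut().and_then(|rc| Rc::get_mut(rc)) {
            Some(v) => {
                v.clear();
                v.extend_from_slice(win);
                true
            }
            None => false,
        };
        if !reused {
            // First iteration, or the previous window was kept: allocate.
            scratch = Some(Rc::new(win.to_vec()));
            allocs += 1;
        }
        let cur = scratch.as_ref().expect("scratch is set above");
        if pred(cur.as_slice()) {
            // Keeping the window bumps its strong count to 2, so the next
            // iteration cannot reuse it.
            kept.push(Rc::clone(cur));
        }
    }
    (kept, allocs)
}
```

Take windows of size 2 over `[1, 2, 3, 4, 5]` and a predicate that keeps windows starting with an even number. Two windows are kept and two Vecs are allocated. A predicate that keeps everything allocates once per window, which matches the all-pass case in the regression tests.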
Pin the fused emitter's output against tree, VM, and Cranelift engines:
- flt/map over window across happy paths
- all-pass case (every iter clone_rc's, forcing fresh allocation)
- empty source / n > len(xs) edge cases (limit reg <= 0, loop body never runs)
- size-1 windows (smallest non-trivial case)
- non-bool predicate error path (matches the unfused error message)
- negative case: window result bound to a variable, fusion must not fire and the eager OP_WINDOW path produces the expected list-of-lists
Adds an examples/window-stream.ilo demonstrator that the examples_engines harness runs across every engine.
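The empty-source and n > len(xs) edge cases fall out of computing the loop limit as a signed value. The sketch below compares a fused `map` loop against an eager reference built on Rust's `slice::windows`, in the spirit of pinning fused output against a reference engine. Both function names are hypothetical. Window size 0 is deliberately not exercised, because `slice::windows` panics on it.

```rust
/// Eager reference (the unfused `L (L t)` shape): materialise every window,
/// then map. `slice::windows` panics on n == 0, so callers must pass n >= 1.
fn eager_map_window<T: Clone, U>(n: usize, xs: &[T], f: impl Fn(&[T]) -> U) -> Vec<U> {
    let windows: Vec<Vec<T>> = xs.windows(n).map(|w| w.to_vec()).collect();
    windows.iter().map(|w| f(w.as_slice())).collect()
}

/// Fused sketch: one scratch buffer, stride-1 loop, signed limit register.
fn fused_map_window<T: Clone, U>(n: usize, xs: &[T], f: impl Fn(&[T]) -> U) -> Vec<U> {
    // limit = len - n + 1; it is <= 0 for an empty source or n > len,
    // in which case the loop body never runs.
    let limit = xs.len() as isize - n as isize + 1;
    let mut out = Vec::with_capacity(limit.max(0) as usize);
    let mut scratch: Vec<T> = Vec::with_capacity(n);
    let mut idx: isize = 0;
    while idx < limit {
        let i = idx as usize;
        scratch.clear(); // keep the allocation, drop the old contents
        scratch.extend_from_slice(&xs[i..i + n]);
        out.push(f(scratch.as_slice()));
        idx += 1;
    }
    out
}
```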
danieljohnmorris added a commit that referenced this pull request May 16, 2026
eight fixes since 0.11.4, all from rerun5 personas: bare-bang silent-nil regression (#324), Cranelift AArch64 panic catch_unwind fallback (#319), multi-line body span drift (#318), HOF tree-bridge error parity on Cranelift (#321), bool-ternary brace sugar (#323), single-line body diagnostic with brace-block bodies (#322), unknown-subcommand error in multi-fn files (#320), window perf cliff fused flt/map (#325).
This was referenced May 17, 2026
Summary
Manifesto framing: idiomatic `flt all-hydro (window 15 (chars seq))` was ~6x slower than the hand-rolled imperative form on the bioinformatics rerun5 corpus (11.4M residues). Root cause: the unfused emitter materialised an `L (L t)` of n-k+1 small inner lists, allocating ~170M fresh Vecs total. AI agents reaching for the idiomatic shape paid a real latency tax; the imperative escape hatch is more tokens to emit and defeats the point of having higher-order list builtins.
This PR fuses `flt fn (window n xs)` and `map fn (window n xs)` at emit time into a stride-1 loop that walks `xs` once with a single reusable scratch list. The reuse path piggy-backs on the same RC=1 in-place mutation pattern already used by `OP_ADD_SS`, `OP_LISTAPPEND`, and `OP_MSET`.
Result on the bioinformatics workload (local repro, full corpus): the fused path matches the hand-rolled imperative form within noise. Agents can now write the idiomatic shape and pay no penalty.
Repro
Before (unfused, on main):
After (fused, this PR):
What's in the diff
Four commits, one coherent change per commit:
- Opcode, VM dispatcher arm with RC=1 in-place reuse fast path, `find_block_leaders` 2-word skip, `chunk_is_all_numeric` exclusion.
- `jit_window_view` (mirrors the VM dispatcher's reuse logic), wired into both JIT and AOT compile paths. Non-numeric / non-bool writer flag updates.
- `compile_fused_window_hof`, pattern-match in the `(Builtin::Flt, 2)` and `(Builtin::Map, 2)` arms (only when the inner call is exactly `window n xs` with no unwrap mode). Tree-walker keeps the eager path as the reference impl.
- Regression tests across tree / VM / Cranelift plus an `examples/window-stream.ilo` exercised by the `examples_engines` harness.
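The emit-time guard in the third commit fuses only when the inner call is exactly `window n xs` with no unwrap mode. It can be sketched as a match over a toy AST. The `Builtin`, `Expr`, and `Plan` types here are hypothetical simplifications, not the emitter's real data structures. The `unwrap` flag merely stands in for whatever unwrap mode the real `window` call carries.

```rust
/// Hypothetical, heavily simplified AST; the real emitter's types differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Builtin {
    Flt,
    Map,
    Window,
}

#[derive(Debug)]
enum Expr {
    Var(String),
    Int(i64),
    Call { builtin: Builtin, args: Vec<Expr>, unwrap: bool },
}

#[derive(Debug, PartialEq)]
enum Plan {
    FusedWindowHof(Builtin),
    Eager,
}

/// Decide how to emit a call to `builtin` with `args`.
fn plan_call(builtin: Builtin, args: &[Expr]) -> Plan {
    match (builtin, args.len()) {
        (Builtin::Flt, 2) | (Builtin::Map, 2) => match &args[1] {
            // Fuse only when the list argument is literally `window n xs`
            // with no unwrap mode; anything else (e.g. a bound variable)
            // stays on the eager path.
            Expr::Call { builtin: Builtin::Window, args: inner, unwrap: false }
                if inner.len() == 2 =>
            {
                Plan::FusedWindowHof(builtin)
            }
            _ => Plan::Eager,
        },
        _ => Plan::Eager,
    }
}
```

Consider a `flt`/`map` whose list argument is a plain variable, meaning the window result was bound earlier. It falls through to `Plan::Eager`, mirroring the negative regression case.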
Test plan
- `cargo test --release --features cranelift --test regression_window_fused` (11/11)
- `cargo test --release --features cranelift --test examples_engines` (1/1)
- `cargo test --release --features cranelift` (full suite green)
- `cargo fmt --check`
- `cargo clippy --release --features cranelift --all-targets -- -D warnings`
Follow-ups
None blocking. A future PR could extend the same fusion to `fold fn init (window n xs)` and `flt`+`map` fusion if the bioinformatics work surfaces more patterns.