Skip to content

perf: O(n²)→O(n) mset accumulator via RC=1 in-place HashMap mutation#249

Merged
danieljohnmorris merged 7 commits into
mainfrom
fix/rc-aware-mutation
May 13, 2026
Merged

perf: O(n²)→O(n) mset accumulator via RC=1 in-place HashMap mutation#249
danieljohnmorris merged 7 commits into
mainfrom
fix/rc-aware-mutation

Conversation

@danieljohnmorris
Copy link
Copy Markdown
Collaborator

@danieljohnmorris danieljohnmorris commented May 13, 2026

Summary

Phase 2a of the RC-aware mutation work parked under nlp-engineer's 222k-token Moby Dick OOM cluster. Makes the m = mset m k v accumulator pattern run in O(n) amortised on the VM and Cranelift backends, mirroring the OP_LISTAPPEND / OP_ADD_SS RC=1 fast paths landed in PR #232. Tree-walker explicitly deferred to its own gate.

Also fixes a pre-existing latent RC bug in jit_mset and a never-reached aliasing hazard in the new helper.

Repro

16k-key map accumulator (the nlp-engineer rerun shape; logs-forensics 58k-key host map was an OOM):

bench n:n>n;m=mmap;@i 0..n{k=fmt "key{}" i;m=mset m k i};len (mkeys m)
main>n;bench 16000
Engine Before After
tree 70s 70s (Phase 2b deferred)
vm 37s 0.02s (~1800x)
cranelift 22s 0.02s (~1100x)

5k-key wall-clock test in tests/regression_mset_accumulator.rs pins the budget so future regressions show up in CI.

What's in the diff

Per-commit:

  1. vm: RC=1 in-place mset for sole-owner accumulator, OP_MSET dispatch fast path. When a == b (the rebind shape the new peephole emits) and strong_count == 1, mutate the HashMap through the raw pointer instead of cloning the whole map. Same unsafe pattern as OP_LISTAPPEND / OP_ADD_SS from fix: at s i on text no longer allocates a Vec per call #232.

  2. vm: compiler peephole for mset self-rebind accumulator, recognise name = mset name k v in compile_stmt and emit OP_MSET with a == b. Mirrors the existing name = +=name item and name = +name suffix peepholes.

  3. jit: fix latent RC bug in jit_mset and add RC=1 fast path, pre-fix helper bit-copied map entries via m.clone() but never bumped RC for retained heap values, while HeapObj::Drop for Map decrements every value. Existing tests only used numeric values so it never manifested. Also adds the RC=1 fast path.

  4. test + example: cross-engine regression for mset accumulator, covers chains, overwrites, Text values, List values, the RC > 1 function-call-boundary case, 500-key correctness loop, 5k-key scaling budget. examples/mset-accumulator.ilo enters the engine harness via tests/examples_engines.rs.

  5. jit: only fire in-place mset when destination shares source register, review caught that the previous commit's jit_mset fast path fired whenever RC=1 regardless of caller register layout. With m2 = mset m k v and sole-owner m, both slots would alias the same Rc and writes through m2 would mutate m. Split the helper: jit_mset always clones (the pure correctness path); jit_mset_inplace mutates when RC=1, falls back to clone otherwise. The Cranelift OP_MSET compile path picks the in-place variant only when a_idx == b_idx in the bytecode, which the new peephole guarantees.

  6. test: regression for non-rebind mset preserves caller's map, pins the alias-guard contract across all three engines.

Notes on what's not in scope:

  • Tree-walker rewrite, Value::Map(HashMap) is not Rc-backed; converting to Rc<HashMap> is a separate week-scale refactor (~1300 match sites across the crate). Parked.
  • m2 = m register aliasing, the VM compiler reuses the same SSA register for an alias binding, so the rebind peephole fires even when the user "expected" two distinct copies. Same property already affects OP_LISTAPPEND from fix: at s i on text no longer allocates a Vec per call #232. Pre-existing; out of scope here.
  • jit_listappend RC=1 aliasing when a != b, the equivalent hole in the list helper from fix: at s i on text no longer allocates a Vec per call #232, surfaced by this review. Documented as a follow-up; not modified here.

Test plan

  • cargo clippy --release --features cranelift --lib --bins --tests -- -D warnings, clean
  • cargo fmt --check, clean
  • tests/regression_mset_accumulator.rs, 20/20 pass on all three engines
  • tests/examples_engines.rs, all examples (incl. new mset-accumulator.ilo) green on tree/vm/cranelift
  • Full integration suite under --no-fail-fast, all bundles green (the 7 lib vm::compile_cranelift::tests::aot_* failures are pre-existing CARGO_TARGET_DIR/libilo.a search-path mismatches; CI doesn't enable the cranelift feature for nextest so these don't run in CI)
  • 16k-key bench before/after on VM + Cranelift (above)

Follow-ups

  • m2 = m alias-then-mutate visibility (pre-existing for +=, same shape now for mset)
  • jit_listappend a != b RC=1 aliasing parity fix (mirror of this PR's helper split, against jit_listappend)
  • Tree-walker Rc conversion (Phase 2b), its own gated decision

When the OP_MSET destination and source registers are the same and the
map's strong count is 1, mutate the HashMap in place instead of cloning
the whole map on every insert. Mirrors the OP_LISTAPPEND / OP_ADD_SS
RC=1 fast paths from PR #232.

The fast path only fires when a == b, which the compiler can guarantee
via a self-rebind peephole (next commit). With it, m=mset m k v in a
loop runs in O(n) amortised instead of O(n²) HashMap clones.
Detect the rebind shape m = mset m k v in compile_stmt and emit OP_MSET
with a == b so the runtime fast path can mutate the map in place. Same
shape as the existing peepholes for += list append and += string concat.

The peephole only fires when the first arg is a Ref to the let target,
matching the natural accumulator pattern. Other mset shapes (different
target, nested calls, unwrap form) fall through to the general path.
jit_mset previously bit-copied map entries via m.clone() but never bumped
RC for retained heap values, while HeapObj::Drop for Map decrements every
value when a map is dropped. With numeric values this was harmless because
non-heap NanVals are no-ops in clone_rc/drop_rc; with Text, List, or Map
values the slow path produced use-after-free / double-decrement. Existing
tests only exercised numeric values so the UB never surfaced.

Bump RC per retained entry to match the VM OP_MSET slow path. Also add
an RC=1 fast path matching jit_listappend: when the caller's map is the
sole holder, mutate the HashMap in place and return the same NanVal so
the def_var rebind keeps the same Rc pointer at strong_count 1.
Pin correctness across tree / vm / cranelift for the mset accumulator
pattern, with deliberate coverage of non-numeric values (Text, List)
that the previous jit_mset RC bug would have corrupted. Includes:

- Basic chain and overwrite shapes with Text values
- Function-call boundary as a real RC > 1 trigger (the slow path)
- Map-of-list to exercise nested-Rc accounting
- 500-key and 5k-key loop accumulators with a wall-clock budget that
  the old O(n^2) HashMap-clone path would blow on VM and Cranelift

The example exposes the same pattern to engine harness coverage via
tests/examples_engines.rs and gives future agents an in-context
demonstration of the working accumulator idiom.

Notes on shared-map aliasing (m2 = m followed by mutation of m): the
VM compiler reuses m's register for m2, so RC stays at 1 and the fast
path fires. This is a pre-existing property of the register allocator
that also affects OP_LISTAPPEND from PR #232 and is out of scope here.
The previous commit's jit_mset RC=1 fast path fired whenever
strong_count == 1, regardless of whether the destination and source
registers were the same SSA variable. With a fresh destination
(m2 = mset m k v), both slot m and slot m2 ended up referencing the
same Rc and the caller's source map silently mutated.

Split the helper: jit_mset always clones (and now correctly bumps RC
per retained entry, fixing the pre-existing latent bug on its own);
jit_mset_inplace mutates in place when RC == 1 and falls back to the
cloning path otherwise. The Cranelift OP_MSET compile path picks
mset_inplace only when a_idx == b_idx in the bytecode — which the new
VM compiler peephole guarantees for the rebind shape.

The same a != b RC=1 aliasing hole exists in jit_listappend from
PR #232; left for a follow-up since it predates this PR.
Pin the m2 = mset m k v alias case across tree, vm, and cranelift. Pre-fix
Cranelift jit_mset would fast-path-mutate m's map in place when RC was 1,
making subsequent reads through m observe m2's insertion. The split into
jit_mset / jit_mset_inplace plus compile-time a_idx == b_idx gating fixes
this; the test pins the contract.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 76.50273% with 43 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/vm/mod.rs 69.40% 41 Missing ⚠️
src/vm/jit_cranelift.rs 94.44% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

The cross-engine regression file spawns the ilo binary, which keeps the
fast-path code paths invisible to llvm-cov's library-level instrumentation.
Codecov's patch coverage gate flagged the diff at 67% as a result.

Adds in-module tests under vm::tests and vm::jit_cranelift::tests that
exercise the same shapes directly through the in-process VM / JIT
harnesses (vm_run, jit_run), so llvm-cov sees the lines exercised:

- vm_mset_rebind_accumulator_text_values: 8-key text accumulator
- vm_mset_rebind_overwrite_same_key: drop_rc on displaced value
- vm_mset_fn_arg_forces_slow_path: RC > 1 via function-call boundary
- vm_mset_non_rebind_distinct_dest: a != b must not fast-path
- cranelift_mset_rebind_accumulator_text: jit_mset_inplace happy path
- cranelift_mset_non_rebind_does_not_alias: a != b alias guard
- cranelift_mset_overwrite_drops_old_text: displaced Text RC accounting

No production code change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant