perf: O(n²)→O(n) mset accumulator via RC=1 in-place HashMap mutation#249
Merged
Conversation
When the OP_MSET destination and source registers are the same and the map's strong count is 1, mutate the HashMap in place instead of cloning the whole map on every insert. Mirrors the OP_LISTAPPEND / OP_ADD_SS RC=1 fast paths from PR #232. The fast path only fires when a == b, which the compiler can guarantee via a self-rebind peephole (next commit). With it, m=mset m k v in a loop runs in O(n) amortised instead of O(n²) HashMap clones.
Detect the rebind shape m = mset m k v in compile_stmt and emit OP_MSET with a == b so the runtime fast path can mutate the map in place. Same shape as the existing peepholes for += list append and += string concat. The peephole only fires when the first arg is a Ref to the let target, matching the natural accumulator pattern. Other mset shapes (different target, nested calls, unwrap form) fall through to the general path.
jit_mset previously bit-copied map entries via m.clone() but never bumped RC for retained heap values, while HeapObj::Drop for Map decrements every value when a map is dropped. With numeric values this was harmless because non-heap NanVals are no-ops in clone_rc/drop_rc; with Text, List, or Map values the slow path produced use-after-free / double-decrement. Existing tests only exercised numeric values so the UB never surfaced. Bump RC per retained entry to match the VM OP_MSET slow path. Also add an RC=1 fast path matching jit_listappend: when the caller's map is the sole holder, mutate the HashMap in place and return the same NanVal so the def_var rebind keeps the same Rc pointer at strong_count 1.
Pin correctness across tree / vm / cranelift for the mset accumulator pattern, with deliberate coverage of non-numeric values (Text, List) that the previous jit_mset RC bug would have corrupted. Includes: - Basic chain and overwrite shapes with Text values - Function-call boundary as a real RC > 1 trigger (the slow path) - Map-of-list to exercise nested-Rc accounting - 500-key and 5k-key loop accumulators with a wall-clock budget that the old O(n^2) HashMap-clone path would blow on VM and Cranelift The example exposes the same pattern to engine harness coverage via tests/examples_engines.rs and gives future agents an in-context demonstration of the working accumulator idiom. Notes on shared-map aliasing (m2 = m followed by mutation of m): the VM compiler reuses m's register for m2, so RC stays at 1 and the fast path fires. This is a pre-existing property of the register allocator that also affects OP_LISTAPPEND from PR #232 and is out of scope here.
The previous commit's jit_mset RC=1 fast path fired whenever strong_count == 1, regardless of whether the destination and source registers were the same SSA variable. With a fresh destination (m2 = mset m k v), both slot m and slot m2 ended up referencing the same Rc and the caller's source map silently mutated. Split the helper: jit_mset always clones (and now correctly bumps RC per retained entry, fixing the pre-existing latent bug on its own); jit_mset_inplace mutates in place when RC == 1 and falls back to the cloning path otherwise. The Cranelift OP_MSET compile path picks mset_inplace only when a_idx == b_idx in the bytecode — which the new VM compiler peephole guarantees for the rebind shape. The same a != b RC=1 aliasing hole exists in jit_listappend from PR #232; left for a follow-up since it predates this PR.
Pin the m2 = mset m k v alias case across tree, vm, and cranelift. Pre-fix Cranelift jit_mset would fast-path-mutate m's map in place when RC was 1, making subsequent reads through m observe m2's insertion. The split into jit_mset / jit_mset_inplace plus compile-time a_idx == b_idx gating fixes this; the test pins the contract.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
The cross-engine regression file spawns the ilo binary, which keeps the fast-path code paths invisible to llvm-cov's library-level instrumentation. Codecov's patch coverage gate flagged the diff at 67% as a result. Adds in-module tests under vm::tests and vm::jit_cranelift::tests that exercise the same shapes directly through the in-process VM / JIT harnesses (vm_run, jit_run), so llvm-cov sees the lines exercised: - vm_mset_rebind_accumulator_text_values: 8-key text accumulator - vm_mset_rebind_overwrite_same_key: drop_rc on displaced value - vm_mset_fn_arg_forces_slow_path: RC > 1 via function-call boundary - vm_mset_non_rebind_distinct_dest: a != b must not fast-path - cranelift_mset_rebind_accumulator_text: jit_mset_inplace happy path - cranelift_mset_non_rebind_does_not_alias: a != b alias guard - cranelift_mset_overwrite_drops_old_text: displaced Text RC accounting No production code change.
This was referenced May 13, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 2a of the RC-aware mutation work parked under nlp-engineer's 222k-token Moby Dick OOM cluster. Makes the
m = mset m k vaccumulator pattern run in O(n) amortised on the VM and Cranelift backends, mirroring the OP_LISTAPPEND / OP_ADD_SS RC=1 fast paths landed in PR #232. Tree-walker explicitly deferred to its own gate.Also fixes a pre-existing latent RC bug in
jit_msetand a never-reached aliasing hazard in the new helper.Repro
16k-key map accumulator (the nlp-engineer rerun shape; logs-forensics 58k-key host map was an OOM):
5k-key wall-clock test in
tests/regression_mset_accumulator.rspins the budget so future regressions show up in CI.What's in the diff
Per-commit:
vm: RC=1 in-place mset for sole-owner accumulator, OP_MSET dispatch fast path. Whena == b(the rebind shape the new peephole emits) andstrong_count == 1, mutate the HashMap through the raw pointer instead of cloning the whole map. Same unsafe pattern as OP_LISTAPPEND / OP_ADD_SS from fix: at s i on text no longer allocates a Vec per call #232.vm: compiler peephole for mset self-rebind accumulator, recognisename = mset name k vincompile_stmtand emit OP_MSET witha == b. Mirrors the existingname = +=name itemandname = +name suffixpeepholes.jit: fix latent RC bug in jit_mset and add RC=1 fast path, pre-fix helper bit-copied map entries viam.clone()but never bumped RC for retained heap values, whileHeapObj::Dropfor Map decrements every value. Existing tests only used numeric values so it never manifested. Also adds the RC=1 fast path.test + example: cross-engine regression for mset accumulator, covers chains, overwrites, Text values, List values, the RC > 1 function-call-boundary case, 500-key correctness loop, 5k-key scaling budget.examples/mset-accumulator.iloenters the engine harness viatests/examples_engines.rs.jit: only fire in-place mset when destination shares source register, review caught that the previous commit'sjit_msetfast path fired whenever RC=1 regardless of caller register layout. Withm2 = mset m k vand sole-ownerm, both slots would alias the same Rc and writes throughm2would mutatem. Split the helper:jit_msetalways clones (the pure correctness path);jit_mset_inplacemutates when RC=1, falls back to clone otherwise. The Cranelift OP_MSET compile path picks the in-place variant only whena_idx == b_idxin the bytecode, which the new peephole guarantees.test: regression for non-rebind mset preserves caller's map, pins the alias-guard contract across all three engines.Notes on what's not in scope:
Value::Map(HashMap)is not Rc-backed; converting toRc<HashMap>is a separate week-scale refactor (~1300 match sites across the crate). Parked.m2 = mregister aliasing, the VM compiler reuses the same SSA register for an alias binding, so the rebind peephole fires even when the user "expected" two distinct copies. Same property already affects OP_LISTAPPEND from fix: at s i on text no longer allocates a Vec per call #232. Pre-existing; out of scope here.jit_listappendRC=1 aliasing whena != b, the equivalent hole in the list helper from fix: at s i on text no longer allocates a Vec per call #232, surfaced by this review. Documented as a follow-up; not modified here.Test plan
cargo clippy --release --features cranelift --lib --bins --tests -- -D warnings, cleancargo fmt --check, cleantests/regression_mset_accumulator.rs, 20/20 pass on all three enginestests/examples_engines.rs, all examples (incl. newmset-accumulator.ilo) green on tree/vm/cranelift--no-fail-fast, all bundles green (the 7 libvm::compile_cranelift::tests::aot_*failures are pre-existing CARGO_TARGET_DIR/libilo.a search-path mismatches; CI doesn't enable the cranelift feature for nextest so these don't run in CI)Follow-ups
m2 = malias-then-mutate visibility (pre-existing for+=, same shape now formset)jit_listappenda != bRC=1 aliasing parity fix (mirror of this PR's helper split, againstjit_listappend)