fix: at s i on text no longer allocates a Vec per call (#232)

Merged
danieljohnmorris merged 4 commits into main from fix/at-string-quadratic on May 13, 2026

@danieljohnmorris
Collaborator

Summary

Persona reports of OOMs at the 222k-token corpus scale (Moby Dick word frequency, per-char lowercasing) traced to a more mundane root cause: at s i on a text value was doing s.chars().collect::<Vec<char>>() on every call to find the i-th codepoint. A @i 0..len s{c=at s i} loop was therefore O(n^2) in time AND allocating a fresh Vec per iteration. The Vec allocator churn was the OOM-looking trigger at corpus scale.
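
For readers who want the two access patterns side by side, here is a minimal sketch. The function names at_collect and at_nth are illustrative only and do not appear in the codebase; the bodies follow the chars().collect::<Vec<char>>() and chars().nth(idx) calls named in this PR.

```rust
/// Old pattern described above: materialise every codepoint, then index.
/// Each call heap-allocates a Vec<char> as long as the whole string.
/// (Hypothetical name, for illustration only.)
fn at_collect(s: &str, i: usize) -> Option<char> {
    let chars: Vec<char> = s.chars().collect();
    chars.get(i).copied()
}

/// Allocation-free pattern: walk the iterator to the i-th codepoint.
/// Still O(i) per call, because UTF-8 is variable-width, but it never
/// touches the heap. (Hypothetical name, for illustration only.)
fn at_nth(s: &str, i: usize) -> Option<char> {
    s.chars().nth(i)
}
```

Both return the same codepoint for the same index. The difference is that at_collect copies the entire string into a fresh Vec on every call, which is where the allocator churn in a per-char loop came from.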

This is Phase 1 of a two-part fix. Phase 2 (RC-aware mutation for the +/+= accumulator pattern in concat, which is the other half of the same scaling story) is captured as a deferred entry in ilo_assessment_feedback.md under the [nlp-engineer] persona section.

Repro

go>n;s="";@k 0..100000{s=+s "A"};l=len s;n=0;@i 0..l{c=at s i;n=+n 1};n

Before: tree 5.97s, vm/cranelift ~5s, RSS climbing the whole time.
After: tree 0.58s, no allocator pressure from at. Roughly a 10x wall-clock win on the tree engine, no per-call Vec.

What's in the diff

  • add char_at_signed helper for allocation-free i-th codepoint lookup (src/builtins.rs). New pub(crate) fn char_at_signed(s: &str, raw_idx: i64) -> CharAtResult with Found(char) / OutOfRange { len } variants. Positive indices use chars().nth(idx); negative indices pay one chars().count() then chars().nth(adjusted). Allocation-free in every path. Unit tests cover ascii/unicode x positive/negative x in-range/oor.

    Deliberately NOT branching on s.is_ascii() for a constant-time ascii path here, because is_ascii itself is O(n) and would dominate per-call cost. True O(1) ascii indexing needs a cached is_ascii flag on the string value; that's tied up with the deferred Phase 2 work.

  • wire at builtin through char_at_signed in tree, vm, and cranelift jit. Three call sites in src/interpreter/mod.rs, src/vm/mod.rs (OP_AT dispatch + jit_at helper). Behaviour preserved: tree/vm still raise ILO-R009 on out-of-range; cranelift still returns nil to match its existing hd/at JIT semantics.

  • test: cross-engine regression coverage for at on text values. New tests/regression_at_string.rs pins ascii + unicode + negative-index behaviour across all three engines, plus a 50k-char scaling sanity check with a 30s wall-clock budget that will trip on any return to per-call Vec allocation. New examples/string-large-at.ilo shows the now-cheap per-char idiom.
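
A rough sketch of what the helper described in the first bullet might look like. The signature and variant names are taken from this description. The len field type (usize), the derives, and the extra chars().count() on a positive out-of-range miss are assumptions, not a copy of src/builtins.rs.

```rust
use std::convert::TryFrom;

#[derive(Debug, PartialEq)]
pub(crate) enum CharAtResult {
    Found(char),
    // Field type is an assumption; the PR only names the variant.
    OutOfRange { len: usize },
}

pub(crate) fn char_at_signed(s: &str, raw_idx: i64) -> CharAtResult {
    if raw_idx >= 0 {
        // Positive path: a single forward walk with chars().nth(idx).
        let hit = usize::try_from(raw_idx).ok().and_then(|i| s.chars().nth(i));
        return match hit {
            Some(c) => CharAtResult::Found(c),
            // Assumption: count only on a miss, to report the length.
            None => CharAtResult::OutOfRange { len: s.chars().count() },
        };
    }
    // Negative path: one chars().count(), then chars().nth(adjusted).
    let len = s.chars().count();
    let hit = usize::try_from(raw_idx.unsigned_abs())
        .ok()
        .and_then(|back| len.checked_sub(back)) // None when |idx| > len
        .and_then(|i| s.chars().nth(i));
    match hit {
        Some(c) => CharAtResult::Found(c),
        None => CharAtResult::OutOfRange { len },
    }
}
```

Under these assumptions, char_at_signed("abc", -1) yields Found('c') and char_at_signed("abc", 3) yields OutOfRange { len: 3 }. Neither path allocates; each engine then decides whether OutOfRange becomes ILO-R009 or nil.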

Test plan

  • cargo fmt
  • cargo test --release --features cranelift - full suite green
  • cargo clippy --release --features cranelift --all-targets -- -D warnings - clean
  • Existing regression_at_builtin.rs still passes (list-of-number and list-of-text cases unchanged)
  • New regression_at_string.rs passes on tree, vm, cranelift
  • examples/string-large-at.ilo asserts pass through tests/examples_engines.rs

Follow-ups

  • Phase 2: RC-aware mutation for the +/+= accumulator pattern (deferred in ilo_assessment_feedback.md). That's the other O(n^2) leg of the same NLP corpus scaling story.
  • Cached is_ascii flag on Value::Text / HeapObj::Str for true O(1) ascii indexing.
  • The other chars().collect::<Vec<char>>() call sites in interpreter and vm (slc/rev/srt on text, etc.) have the same pattern; not touched here to keep this PR tight. Worth a sweep once Phase 2 lands.
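
To make the cached is_ascii follow-up concrete, here is a hedged sketch. CachedText is a hypothetical stand-in, since the actual layout of Value::Text / HeapObj::Str is not shown in this PR; the point is only that the O(n) is_ascii scan is paid once at construction rather than on every at call.

```rust
/// Hypothetical wrapper, not the real Value::Text / HeapObj::Str.
struct CachedText {
    data: String,
    /// Computed once when the value is built, so lookups never rescan.
    is_ascii: bool,
}

impl CachedText {
    fn new(data: String) -> Self {
        let is_ascii = data.is_ascii(); // the one O(n) scan
        Self { data, is_ascii }
    }

    fn char_at(&self, i: usize) -> Option<char> {
        if self.is_ascii {
            // ASCII means one byte per codepoint, so byte index equals
            // codepoint index and the lookup is O(1).
            self.data.as_bytes().get(i).map(|&b| b as char)
        } else {
            // Fall back to the allocation-free O(i) walk.
            self.data.chars().nth(i)
        }
    }
}
```

Any mutation of the underlying string would need to keep the flag in sync, which is presumably part of why this is tied to the Phase 2 work.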

Each engine was doing s.chars().collect::<Vec<char>>() on every at-call
to find the i-th codepoint, making per-char loops over a string O(n^2)
and allocating a fresh Vec per iteration. The Vec allocator pressure
was the observable trigger behind the 222k-token OOM reports from NLP
personas. Behaviour is unchanged: tree and vm still raise ILO-R009 on
out-of-range; cranelift still returns nil to match its existing hd/at
JIT semantics.

Pins at-on-text behaviour across tree, vm, and cranelift for ascii and
unicode strings, positive and negative indices, plus out-of-range. The
scaling test runs a 50k-char per-char loop with a 30s wall-clock budget
so a regression to chars().collect-per-call shows up loudly. Example
demonstrates the now-cheap per-char idiom (upper-letter count, forward
vs negative-index roundtrip).
@codecov

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 95.23810% with 3 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines    Patch %    Lines
src/builtins.rs             94.11%     3 Missing ⚠️

CI runs cargo nextest without --release, so the 50k-char build phase
(itself quadratic, deferred Phase 2) blew the 30s wall-clock budget on
slower runners (37-52s observed across tree/vm/cranelift). The time
check was conflating build cost with at-call cost anyway; the perf
claim belongs in the PR / commit message, not the suite. Keep the
cross-engine correctness check on a smaller 2k-char loop so a future
regression that drops or doubles characters still trips.
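
The shape of that correctness check can be sketched at the Rust level. This is only an analogue: the real test drives ilo programs through the three engines, and the helper names below are hypothetical. The idea is that rebuilding a string one index at a time must reproduce it exactly, so a dropped or doubled character fails the comparison without any timing assertion.

```rust
/// Build an n-codepoint sample mixing 1-byte and 2-byte codepoints.
/// (Hypothetical helper, for illustration only.)
fn sample_text(n: usize) -> String {
    "aé".chars().cycle().take(n).collect()
}

/// Rebuild a string by looking up every codepoint index in turn,
/// mirroring a per-char `at` loop. (Hypothetical helper.)
fn rebuild_by_index(s: &str) -> String {
    let n = s.chars().count();
    (0..n).filter_map(|i| s.chars().nth(i)).collect()
}
```

With a 2k-char input, asserting rebuild_by_index(&s) == s stays fast even in an unoptimised build, which is the property the smaller loop is meant to preserve.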