fix: at s i on text no longer allocates a Vec per call by danieljohnmorris · Pull Request #232 · ilo-lang/ilo

danieljohnmorris · 2026-05-13T12:51:31Z

Summary

Persona reports of OOMs at the 222k-token corpus scale (Moby Dick word frequency, per-char lowercasing) traced to a more mundane root cause: at s i on a text value was doing s.chars().collect::<Vec<char>>() on every call to find the i-th codepoint. A @i 0..len s{c=at s i} loop was therefore O(n^2) in time and also allocated a fresh n-element Vec on every iteration. That allocator churn was what looked like an OOM at corpus scale.
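To make the difference concrete, here is a minimal sketch of the two lookup shapes. The function names at_collect, at_nth, and count_via are hypothetical and exist only for illustration; they are not the engine code. Both lookups are still O(i) per call, so the sketch only shows where the per-call allocation comes from, not a change in asymptotic time.

```rust
/// Old shape, as described above: materialise every codepoint, then index.
/// Allocates a Vec<char> holding the whole string on each call.
fn at_collect(s: &str, i: usize) -> Option<char> {
    let chars: Vec<char> = s.chars().collect();
    chars.get(i).copied()
}

/// Allocation-free shape: walk the UTF-8 iterator and stop at the i-th codepoint.
/// Still O(i) per call, but with no heap traffic.
fn at_nth(s: &str, i: usize) -> Option<char> {
    s.chars().nth(i)
}

/// Calls the lookup once per index, mirroring the per-char loop in the summary,
/// and counts how many lookups succeeded.
fn count_via(lookup: fn(&str, usize) -> Option<char>, s: &str) -> usize {
    let len = s.chars().count();
    (0..len).filter(|&i| lookup(s, i).is_some()).count()
}
```

Both functions return the same answers; only at_collect pays for a full-length Vec on every call.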

This is Phase 1 of a two-part fix. Phase 2 (RC-aware mutation for the +/+= accumulator pattern in concat, which is the other half of the same scaling story) is captured as a deferred entry in ilo_assessment_feedback.md under the [nlp-engineer] persona section.

Repro

go>n;s="";@k 0..100000{s=+s "A"};l=len s;n=0;@i 0..l{c=at s i;n=+n 1};n

Before: tree 5.97s, vm/cranelift ~5s, RSS climbing the whole time.
After: tree 0.58s, no allocator pressure from at. 10x wall-clock win, no per-call Vec.

What's in the diff

add char_at_signed helper for allocation-free i-th codepoint lookup (src/builtins.rs)
New pub(crate) fn char_at_signed(s: &str, raw_idx: i64) -> CharAtResult with Found(char) / OutOfRange { len } variants. Positive indices use chars().nth(idx); negative indices pay one chars().count() then chars().nth(adjusted). Allocation-free in every path. Unit tests cover ascii/unicode x positive/negative x in-range/out-of-range.

Deliberately NOT branching on s.is_ascii() for a constant-time ascii path here, because is_ascii itself is O(n) and would dominate per-call cost. True O(1) ascii indexing needs a cached is_ascii flag on the string value; that's tied up with the deferred Phase 2 work.

wire at builtin through char_at_signed in tree, vm, and cranelift jit
Three call sites: src/interpreter/mod.rs, plus the OP_AT dispatch and the jit_at helper in src/vm/mod.rs. Behaviour preserved: tree/vm still raise ILO-R009 on out-of-range; cranelift still returns nil to match its existing hd/at JIT semantics.

test: cross-engine regression coverage for at on text values
New tests/regression_at_string.rs pins ascii + unicode + negative-index behaviour across all three engines, plus a 50k-char scaling sanity check with a 30s wall-clock budget that will trip on any return to per-call Vec allocation. New examples/string-large-at.ilo shows the now-cheap per-char idiom.
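The helper described above could look roughly like the sketch below. It is reconstructed from the PR description only, not copied from src/builtins.rs. The following are assumptions: len is a usize counted in codepoints, a negative index counts back from the end (so -1 is the last codepoint), and the visibility modifier is left off to keep the snippet standalone.

```rust
/// Result of a signed codepoint lookup; variant names follow the PR description.
#[derive(Debug, PartialEq)]
enum CharAtResult {
    Found(char),
    /// `len` is the string length in codepoints (assumed type: usize).
    OutOfRange { len: usize },
}

fn char_at_signed(s: &str, raw_idx: i64) -> CharAtResult {
    if raw_idx >= 0 {
        // Non-negative: a single forward walk, no length computation on the hit path.
        let hit = usize::try_from(raw_idx)
            .ok()
            .and_then(|idx| s.chars().nth(idx));
        match hit {
            Some(c) => CharAtResult::Found(c),
            None => CharAtResult::OutOfRange { len: s.chars().count() },
        }
    } else {
        // Negative: one count() pass, then one nth() pass from the front.
        let len = s.chars().count();
        // unsigned_abs avoids overflow when raw_idx == i64::MIN.
        let adjusted = usize::try_from(raw_idx.unsigned_abs())
            .ok()
            .and_then(|back| len.checked_sub(back));
        match adjusted.and_then(|idx| s.chars().nth(idx)) {
            Some(c) => CharAtResult::Found(c),
            None => CharAtResult::OutOfRange { len },
        }
    }
}
```

Under these assumptions, char_at_signed("héllo", -1) yields Found('o') and char_at_signed("héllo", 5) yields OutOfRange { len: 5 }. Each engine can then map OutOfRange to its own convention (ILO-R009 in tree/vm, nil in cranelift).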

Test plan

cargo fmt
cargo test --release --features cranelift - full suite green
cargo clippy --release --features cranelift --all-targets -- -D warnings - clean
Existing regression_at_builtin.rs still passes (list-of-number and list-of-text cases unchanged)
New regression_at_string.rs passes on tree, vm, cranelift
examples/string-large-at.ilo asserts pass through tests/examples_engines.rs

Follow-ups

Phase 2: RC-aware mutation for the +/+= accumulator pattern (deferred in ilo_assessment_feedback.md). That's the other O(n^2) leg of the same NLP corpus scaling story.
Cached is_ascii flag on Value::Text / HeapObj::Str for true O(1) ascii indexing.
The other chars().collect::<Vec<char>>() call sites in interpreter and vm (slc/rev/srt on text, etc.) have the same pattern; not touched here to keep this PR tight. Worth a sweep once Phase 2 lands.
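For the cached is_ascii follow-up, a minimal sketch of the idea is below. CachedText and its fields are hypothetical stand-ins, not the real Value::Text / HeapObj::Str layout; the point is only that the O(n) is_ascii scan is paid once at construction, so each lookup can branch on a stored bool.

```rust
/// Hypothetical text value that records ASCII-ness once, at construction.
struct CachedText {
    data: String,
    is_ascii: bool,
}

impl CachedText {
    fn new(data: String) -> Self {
        // O(n) scan, paid once here instead of on every lookup.
        let is_ascii = data.is_ascii();
        CachedText { data, is_ascii }
    }

    /// i-th codepoint: O(1) for ASCII text, O(i) otherwise.
    fn char_at(&self, idx: usize) -> Option<char> {
        if self.is_ascii {
            // Every codepoint is exactly one byte, so byte offset == codepoint index.
            self.data.as_bytes().get(idx).map(|&b| b as char)
        } else {
            self.data.chars().nth(idx)
        }
    }
}
```

Any operation that mutates the string in place would need to keep the flag in sync, which is presumably why this is tied to the Phase 2 mutation work.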

Each engine was doing s.chars().collect::<Vec<char>>() on every at-call to find the i-th codepoint, making per-char loops over a string O(n^2) and allocating a fresh Vec per iteration. The Vec allocator pressure was the observable trigger behind the 222k-token OOM reports from NLP personas. Behaviour is unchanged: tree and vm still raise ILO-R009 on out-of-range; cranelift still returns nil to match its existing hd/at JIT semantics.

Pins at-on-text behaviour across tree, vm, and cranelift for ascii and unicode strings, positive and negative indices, plus out-of-range. The scaling test runs a 50k-char per-char loop with a 30s wall-clock budget so a regression to chars().collect-per-call shows up loudly. Example demonstrates the now-cheap per-char idiom (upper-letter count, forward vs negative-index roundtrip).

codecov · 2026-05-13T12:54:46Z

Codecov Report

❌ Patch coverage is 95.23810% with 3 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/builtins.rs	94.11%	3 Missing ⚠️


CI runs cargo nextest without --release, so the 50k-char build phase (itself quadratic, deferred Phase 2) blew the 30s wall-clock budget on slower runners (37-52s observed across tree/vm/cranelift). The time check was conflating build cost with at-call cost anyway; the perf claim belongs in the PR / commit message, not the suite. Keep the cross-engine correctness check on a smaller 2k-char loop so a future regression that drops or doubles characters still trips.
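The shape of such a timing-free correctness check can be sketched in plain Rust. This is an illustration only: the real test drives ilo programs through each engine, whereas roundtrips here is a hypothetical helper that takes any lookup function and rebuilds the string one index at a time.

```rust
/// Rebuilds `s` by looking up every index in turn and reports whether the
/// result matches the input. A lookup that drops, doubles, or shifts
/// characters makes this return false, with no wall-clock budget involved.
fn roundtrips(lookup: impl Fn(&str, usize) -> Option<char>, s: &str) -> bool {
    let len = s.chars().count();
    let mut rebuilt = String::with_capacity(s.len());
    for i in 0..len {
        match lookup(s, i) {
            Some(c) => rebuilt.push(c),
            // An in-range index must always resolve to a character.
            None => return false,
        }
    }
    rebuilt == s
}
```

A 2k-char input that mixes one-byte and multi-byte codepoints, such as "aé" repeated 1000 times, keeps the check cheap in debug builds while still exercising the unicode path.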

danieljohnmorris added 3 commits May 13, 2026 13:50

add char_at_signed helper for allocation-free i-th codepoint lookup

2606322

danieljohnmorris merged commit e6766a0 into main May 13, 2026
5 checks passed

danieljohnmorris deleted the fix/at-string-quadratic branch May 13, 2026 13:01

This was referenced May 13, 2026

perf: O(n²)→O(n) mset accumulator via RC=1 in-place HashMap mutation #249

Merged

fix: prevent OP_LISTAPPEND non-rebind aliasing + pin mset contract #250

Merged

fix: prevent + non-rebind aliasing on string concat (VM + Cranelift) #260

Merged

danieljohnmorris merged 4 commits into main from fix/at-string-quadratic
