builtins: rgxall1 flat-capture + ct count-by-predicate (rerun6 batch) by danieljohnmorris · Pull Request #333 · ilo-lang/ilo

danieljohnmorris · 2026-05-17T00:32:04Z

Summary

Two builtin additions from the rerun6 persona round, both manifesto-aligned token-economy wins on multi-rerun-old shapes:

rgxall1 pat s -> L t — flat first-capture-group convenience over rgxall. Closes the ext xs:L t>t;hd xs helper that the html-scraper persona kept paying ~30 tokens to re-declare across five reruns (rerun5, line 4988).
ct fn xs -> n — count-by-predicate, parallel to flt's 2/3-arg shape. Avoids the intermediate L b allocation that len (flt p xs) pays on every call. Originating ask: bioinformatics rerun6 tm=cnt has-tm seqs over 20k proteins (line 5028).

Named ct rather than the persona's wished-for cnt because cnt is reserved as the loop-continue keyword (src/parser/mod.rs:3507). Two chars beat three, no parser surgery is needed, and the loop-control reservation stays intact (regression-covered).

Repro before/after

rgxall1. html-scraper persona, "extract every match of one capture group":

-- Before (rerun5, 40 LOC; ext is the only helper that pays no semantic rent):
ext xs:L t>t;hd xs
titles=map ext (rgxall "<a class=\"titleline\">([^<]+)</a>" html)

-- After (PR):
titles=rgxall1 "<a class=\"titleline\">([^<]+)</a>" html

ct. bioinformatics persona, "how many proteins have a TM helix":

-- Before:
tm=len (flt has-tm seqs)   -- allocates L b sized to the keepers

-- After:
tm=ct has-tm seqs           -- straight counter, zero intermediate alloc

What's in the diff (per commit)

f3aed43 builtins: rgxall1 pat s for flat first-capture-group extraction — adds Builtin::Rgxall1, tree-walker arm, tree-bridge eligibility, propagate-error allow-list entry. 0 groups → flat L t of whole matches; 1 group → flat L t of capture-1 strings; 2+ groups → runtime ILO-R009 pointing at rgxall. 8 cross-engine regression tests + examples/rgxall1-flat-captures.ilo.
e86dba9 builtins: ct fn xs for count-by-predicate without the filtered-list alloc — adds Builtin::Ct with predicate signature identical to flt's, 2-arg and 3-arg closure-bind shapes. Tree-bridge wired for VM/Cranelift parity; runtime non-bool predicate errors join the propagate allow-list. No count long-form alias on purpose (caught a real false-positive that would have trampled the user-fn name in examples/unq-numbers.ilo). 10 cross-engine regression tests including a cnt-as-continue coexistence regression guard + examples/ct-count-by-predicate.ilo.
a0a1ee2 docs: sync SPEC.md, ai.txt, SKILL.md for rgxall1 and ct builtins — every keep-in-sync surface gets new rows in the builtin tables and the new names in the cheatsheet/HOF lists. Site builtins.md lands in a follow-up ilo-lang/site PR.
4f91c38 tests: tighten ct non-bool predicate assertion to anchor on "ct:" — rust-review caught that bare stderr.contains("ct") would match the substring inside "expected" and let unrelated wording satisfy the assertion. Anchor on the literal ct: prefix from the ILO-R009 message instead.

Manifesto framing

These two compose: any persona that today writes len (flt … rgxall …) or map (hd …) rgxall … is paying both a helper-decl tax and a redundant allocation. With rgxall1 + ct the canonical "scrape one HTML page, count items matching a predicate" pipeline shaves ~50 tokens and one O(n) alloc per occurrence. Because that task shape recurs across the agent population, the saving adds up to real compression.

Both ride the tree-bridge — same dispatch contract as rgxall, flt 3, grp 2. No backend codegen changes, no JIT lowering complexity, full cross-engine parity for free.

Test plan

cargo test --release --features cranelift — full suite green post-rebase
cargo clippy --features cranelift --all-targets -- -W clippy::all — clean
cargo test --features cranelift --test regression_rgxall1 — 8/8 pass on tree/VM/Cranelift
cargo test --features cranelift --test regression_ct — 10/10 pass on tree/VM/Cranelift, plus cnt-as-continue coexistence regression guard
cargo test --features cranelift --test examples_engines — both new examples run on every engine
Builtin::ALL tag round-trip preserved (entries appended after Sleep, no reordering)
Self rust-review pass complete; one false-positive test assertion caught and tightened
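
A minimal Python sketch of why appending preserves the tag round-trip, assuming tags are positional indexes into Builtin::ALL (the PR does not spell out the encoding). The list contents and the helper names `to_tag`, `from_tag`, `round_trips` and `tags_stable` are hypothetical. Only Sleep, Rgxall1 and Ct come from the PR text, and the real list is longer.

```python
# Hypothetical stand-in for Builtin::ALL. Assumption: a builtin's tag is its index.
ALL_BEFORE = ["Flt", "Rgxall", "Sleep"]
ALL_AFTER = ALL_BEFORE + ["Rgxall1", "Ct"]  # appended after Sleep, no reordering


def to_tag(all_builtins, name):
    """Map a builtin name to its positional tag."""
    return all_builtins.index(name)


def from_tag(all_builtins, tag):
    """Map a positional tag back to the builtin name."""
    return all_builtins[tag]


def round_trips(all_builtins):
    """True when every name survives name -> tag -> name."""
    return all(from_tag(all_builtins, to_tag(all_builtins, n)) == n
               for n in all_builtins)


def tags_stable(before, after):
    """True when every pre-existing builtin keeps the tag it had before."""
    return all(to_tag(before, n) == to_tag(after, n) for n in before)
```

Under that assumption, inserting new entries anywhere except the end would shift existing tags, which is the failure the "no reordering" note guards against.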

Follow-ups

PR B (ILO-W002 unused-binding warning, security-researcher rerun5 ask) branches off main after this merges, per the two-PR split agreed at the gate.
Site doc sync (ilo-lang/site builtins.md) lands in a follow-up PR against the site repo once this merges.

rgxall returns L (L t) — every match wrapped in a list of capture groups (or a 1-element whole-match list when there are no groups). The common 'extract every match of one capture group' shape pays a flatten/hd helper per use site, which the html-scraper persona has been writing as 'ext xs:L t>t;hd xs' across five reruns.

rgxall1 collapses that to a single call:

- 0 groups: flat L t of whole matches
- 1 group : flat L t of capture-1 strings (skipping non-participating captures under alternation, parallel to rgxall semantics)
- 2+ groups: runtime ILO-R009 pointing back at rgxall

Cross-engine: tree-bridge eligible (parallel to Rgxall). Multi-group error joins the propagate list so Cranelift surfaces the diagnostic instead of silently returning nil.

8 cross-engine regression tests (tree/VM/Cranelift) plus an examples/ demo for the engine harness.

Closes the originating ext-helper friction flagged in ilo_assessment_feedback.md html-scraper rerun5 (line 4988).
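
The three group-count cases can be modelled in Python as a rough sketch. This is not the ilo implementation: it uses Python's `re` module, whose regex flavour may differ from ilo's, and it stands in a `ValueError` with made-up wording for the runtime ILO-R009 diagnostic.

```python
import re


def rgxall1(pat, s):
    """Model of rgxall1: flat list of whole matches or capture-1 strings."""
    rx = re.compile(pat)
    if rx.groups >= 2:
        # Stand-in for runtime ILO-R009; the message wording is an assumption.
        raise ValueError("rgxall1: pattern has 2+ capture groups, use rgxall")
    if rx.groups == 0:
        # 0 groups: flat list of whole matches.
        return [m.group(0) for m in rx.finditer(s)]
    # 1 group: capture-1 strings, skipping matches where the group
    # did not participate (for example under alternation).
    return [m.group(1) for m in rx.finditer(s) if m.group(1) is not None]


html = '<a class="titleline">First</a> <a class="titleline">Second</a>'
titles = rgxall1(r'<a class="titleline">([^<]+)</a>', html)
```

Here `titles` is the flat list that the old `map ext (rgxall …)` pipeline produced with an extra helper declaration.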

…lloc

`len (flt pred xs)` is a five-rerun-old shape that costs a full L b allocation on every call just to ask 'how many'. Bioinformatics rerun6 wanted `tm=cnt has-tm seqs` over 20k proteins; `cnt` is reserved as the loop-continue keyword, so the builtin is `ct` instead. Two chars beats the three-char ask in token economy, with zero parser surgery.

ct fn xs      -- count xs where fn returns true               n
ct fn ctx xs  -- closure-bind variant, parallel to flt 3      n

Predicate signature is identical to flt's; predicate must return b or the runtime raises ILO-R009 on every engine (Builtin::Ct joins the tree_bridge_propagates_error allow-list so Cranelift surfaces the diagnostic in lockstep with tree and VM).

Cross-engine: rides the tree-bridge alongside Flt 3, Grp 2, Uniqby 2. Tree interpreter owns the predicate dispatch; VM and Cranelift both route through OP_CALL_BUILTIN_TREE.

No long-form alias. `count` is a common user-fn name (see examples/unq-numbers.ilo, which would otherwise re-dispatch through the alias resolver into the builtin). Users wanting a long form can keep their own count helper and call ct directly when they want the builtin.

10 cross-engine regression tests + examples/ct-count-by-predicate.ilo. Originating ask: ilo_assessment_feedback.md line 5028.
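
A small Python model of the counting behaviour described above, offered as a sketch rather than the actual runtime code. The argument order passed to the predicate in the closure-bind form, the `TypeError` standing in for ILO-R009, the message text after the `ct:` prefix, and the `has_tm` predicate are all assumptions for illustration.

```python
def ct(fn, *args):
    """Model of `ct fn xs` and `ct fn ctx xs`: count without building a list."""
    if len(args) == 1:
        (xs,) = args
        call = fn
    elif len(args) == 2:
        ctx, xs = args
        # Assumption: the bound context is passed as the second argument.
        call = lambda x: fn(x, ctx)
    else:
        raise TypeError("ct: expected 2 or 3 arguments")

    n = 0
    for x in xs:
        keep = call(x)
        if not isinstance(keep, bool):
            # Stand-in for ILO-R009; only the "ct:" prefix comes from the PR.
            raise TypeError("ct: predicate must return b")
        if keep:
            n += 1  # plain counter, no filtered list is materialised
    return n


def has_tm(seq):
    """Hypothetical predicate: treat a run of three leucines as a TM hint."""
    return "LLL" in seq


seqs = ["MKTLLLAV", "GGSG", "AALLLLK"]
tm = ct(has_tm, seqs)
```

The result matches `len` of the filtered list, but the loop keeps only an integer, which is the saving the commit message describes.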

Adds rgxall1 to the regex section and ct to the higher-order section of each of the three keep-in-sync surfaces:

- SPEC.md: builtin tables get one row each for rgxall1 and ct.
- ai.txt: builtin cheatsheet adds rgxall1 next to rgxall and ct next to flt.
- skills/ilo/SKILL.md: same long-form table plus the Text and HOF cheatsheet lines and the cross-engine HOF list (rgxall1 in Text, ct in HOF, both in the all-HOFs-work-cross-engine sentence).

Site builtins.md ships in a follow-up PR against ilo-lang/site (the site is a separate repo).

Bare "ct" matches "expected" (which contains the substring "ct") and would let unrelated error wording satisfy the assertion. Anchor on the literal "ct:" prefix that ILO-R009 emits from the runtime path, which is unambiguous.
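
The substring trap is easy to reproduce. The sketch below uses Python's `in` operator in place of Rust's `stderr.contains(...)`, and both sample stderr strings are invented for illustration. Only the `ct:` prefix and the word "expected" come from the commit message.

```python
# Invented stderr samples; only "ct:" and "expected" are taken from the PR.
stderr_unrelated = "error: expected b, got n"
stderr_from_ct = "ILO-R009 ct: predicate must return b"


def loose_check(stderr):
    """The original assertion: any occurrence of "ct" passes."""
    return "ct" in stderr


def anchored_check(stderr):
    """The tightened assertion: require the literal "ct:" prefix."""
    return "ct:" in stderr
```

`loose_check(stderr_unrelated)` is true because "expected" contains "ct", while `anchored_check` accepts only the message that actually names the builtin.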

codecov · 2026-05-17T00:33:04Z

Codecov Report

❌ Patch coverage is 71.66667% with 34 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/interpreter/mod.rs	70.66%	22 Missing ⚠️
src/verify.rs	62.50%	12 Missing ⚠️


Minor phrasing harmonisation - the three keep-in-sync surfaces had drifted slightly during the initial doc-sync commit (one used "avoids the intermediate list alloc of len(flt fn xs)", the others wrote it as "avoids len(flt fn xs)'s intermediate list alloc"). Pick the shorter form from SPEC.md for the other two surfaces. Same applies to the rgxall1 row.

fifteen fixes since 0.11.5, all from rerun5/rerun6 personas plus standing asks: ListView foundation (#334), window-text-perf reshape via ListView (#336), inner-flt predicate inlining (#340), double-minus trap ILO-P021 (#331), bare-ident bang silent-nil regression (#324), Cranelift JIT span plumbing (#335), bool-prefix ternary (#330), wh prefix-cond reparse (#332), --run-engine auto-pick main (#329), subcommand helper hyphens+non-ident (#328), runtime error spans (#335), persona-diagnostic batch 3 (#327), rgxall1+ct (#333), single-line body diagnostic (#322 carry), lambda type-var defensive test (#326), N-deep prefix arity error (#339), prefix-minus span column drift (#338), doc-sync (#337).

danieljohnmorris added 4 commits May 17, 2026 01:25

danieljohnmorris merged commit d731e47 into main May 17, 2026
4 of 5 checks passed

danieljohnmorris deleted the feature/rerun6-additions branch May 17, 2026 08:25

danieljohnmorris merged 5 commits into main from feature/rerun6-additions
