builtins: rgxall1 flat-capture + ct count-by-predicate (rerun6 batch) #333
Merged
rgxall returns L (L t) — every match wrapped in a list of capture groups
(or a 1-element whole-match list when there are no groups). The common
'extract every match of one capture group' shape pays a flatten/hd helper
per use site, which the html-scraper persona has been writing as 'ext
xs:L t>t;hd xs' across five reruns.
rgxall1 collapses that to a single call:
- 0 groups: flat L t of whole matches
- 1 group : flat L t of capture-1 strings (skipping non-participating
captures under alternation, parallel to rgxall semantics)
- 2+ groups: runtime ILO-R009 pointing back at rgxall
Cross-engine: tree-bridge eligible (parallel to Rgxall). Multi-group
error joins the propagate list so Cranelift surfaces the diagnostic
instead of silently returning nil.
8 cross-engine regression tests (tree/VM/Cranelift) plus an examples/
demo for the engine harness. Closes the originating ext-helper friction
flagged in ilo_assessment_feedback.md html-scraper rerun5 (line 4988).
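
The three-way dispatch described above can be modelled in a few lines of Rust. This is only a sketch under stated assumptions: `Match`, `rgxall1_model`, and the error wording are hypothetical names for illustration, not the contents of the actual `Builtin::Rgxall1` arm, and the matches are supplied by hand instead of coming from a regex engine.

```rust
/// One regex match, modelled without a regex engine: the whole matched text
/// plus one slot per capture group (None = the group did not participate).
/// Hypothetical type, for illustration only.
struct Match {
    whole: String,
    groups: Vec<Option<String>>,
}

/// Hypothetical model of the rgxall1 dispatch on the pattern's group count.
fn rgxall1_model(group_count: usize, matches: &[Match]) -> Result<Vec<String>, String> {
    match group_count {
        // 0 groups: flat list of whole matches.
        0 => Ok(matches.iter().map(|m| m.whole.clone()).collect()),
        // 1 group: flat list of capture-1 strings, skipping matches where
        // the group did not participate (e.g. the other alternation branch).
        1 => Ok(matches
            .iter()
            .filter_map(|m| m.groups.first().cloned().flatten())
            .collect()),
        // 2+ groups: refuse and point the caller back at rgxall.
        // The message text is an assumption, not the real ILO-R009 wording.
        n => Err(format!("rgxall1: pattern has {} capture groups; use rgxall", n)),
    }
}

fn main() {
    let matches = vec![
        Match { whole: "<a>x</a>".to_string(), groups: vec![Some("x".to_string())] },
        Match { whole: "<b/>".to_string(), groups: vec![None] },
    ];
    println!("{:?}", rgxall1_model(0, &matches));
    println!("{:?}", rgxall1_model(1, &matches));
    println!("{:?}", rgxall1_model(2, &matches));
}
```

Returning an error for 2+ groups, instead of silently picking the first group, mirrors the commit's stated goal of surfacing the diagnostic on every engine.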
…lloc

`len (flt pred xs)` is a five-rerun-old shape that costs a full L b allocation on every call just to ask 'how many'. Bioinformatics rerun6 wanted `tm=cnt has-tm seqs` over 20k proteins; `cnt` is reserved as the loop-continue keyword, so the builtin is `ct` instead. Two chars beats the three-char ask in token economy, with zero parser surgery.

  ct fn xs -> n       -- count xs where fn returns true
  ct fn ctx xs -> n   -- closure-bind variant, parallel to flt 3

Predicate signature is identical to flt's; predicate must return b or the runtime raises ILO-R009 on every engine (Builtin::Ct joins the tree_bridge_propagates_error allow-list so Cranelift surfaces the diagnostic in lockstep with tree and VM).

Cross-engine: rides the tree-bridge alongside Flt 3, Grp 2, Uniqby 2. Tree interpreter owns the predicate dispatch; VM and Cranelift both route through OP_CALL_BUILTIN_TREE.

No long-form alias. `count` is a common user-fn name (see examples/unq-numbers.ilo, which would otherwise re-dispatch through the alias resolver into the builtin). Users wanting a long form can keep their own count helper and call ct directly when they want the builtin.

10 cross-engine regression tests + examples/ct-count-by-predicate.ilo. Originating ask: ilo_assessment_feedback.md line 5028.
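
To make the allocation argument concrete, here is a minimal Rust sketch of the two shapes. The function names `count_via_filter` and `count_by`, and the toy `has_tm` predicate, are assumptions for illustration; they are not the `Builtin::Ct` implementation, which per the commit dispatches the predicate through the tree interpreter.

```rust
/// Baseline shape, analogous to `len (flt pred xs)`: build the filtered
/// list first, then take its length. Allocates a Vec sized by the hit count.
fn count_via_filter<T: Clone>(pred: impl Fn(&T) -> bool, xs: &[T]) -> usize {
    let kept: Vec<T> = xs.iter().filter(|x| pred(*x)).cloned().collect();
    kept.len()
}

/// ct-style shape: walk the list once and count the hits,
/// without materialising the intermediate list.
fn count_by<T>(pred: impl Fn(&T) -> bool, xs: &[T]) -> usize {
    xs.iter().filter(|x| pred(*x)).count()
}

/// Toy stand-in for the persona's has-tm predicate (an assumption):
/// treats a run of five leucines as a transmembrane helix.
fn has_tm(seq: &&str) -> bool {
    seq.contains("LLLLL")
}

fn main() {
    let seqs = ["MKTLLLLLAV", "MSDNE", "GALLLLLIVG"];
    // Both shapes return the same count; only the first allocates.
    println!("{}", count_via_filter(has_tm, &seqs));
    println!("{}", count_by(has_tm, &seqs));
}
```

Both functions give the same answer for any predicate; the difference is only whether the filtered list ever exists in memory.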
Adds rgxall1 to the regex section and ct to the higher-order section of each of the three keep-in-sync surfaces:
- SPEC.md: builtin tables get one row each for rgxall1 and ct.
- ai.txt: builtin cheatsheet adds rgxall1 next to rgxall and ct next to flt.
- skills/ilo/SKILL.md: same long-form table plus the Text and HOF cheatsheet lines and the cross-engine HOF list (rgxall1 in Text, ct in HOF, both in the all-HOFs-work-cross-engine sentence).

Site builtins.md ships in a follow-up PR against ilo-lang/site (the site is a separate repo).
Bare "ct" matches "expected" (which contains the substring "ct") and would let unrelated error wording satisfy the assertion. Anchor on the literal "ct:" prefix that ILO-R009 emits from the runtime path, which is unambiguous.
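
The substring pitfall is easy to reproduce in isolation. The sketch below uses hypothetical helper names (`loose_match`, `anchored_match`) and made-up stderr lines; the real ILO-R009 message wording may differ, only the `ct:` prefix is taken from the commit message.

```rust
/// Loose check: passes for any text containing "ct", including "expected".
fn loose_match(stderr: &str) -> bool {
    stderr.contains("ct")
}

/// Anchored check: passes only when the "ct:" prefix is present.
fn anchored_match(stderr: &str) -> bool {
    stderr.contains("ct:")
}

fn main() {
    // Hypothetical stderr lines, for illustration only.
    let unrelated = "error: expected a number, found text";
    let from_ct = "ILO-R009: ct: predicate must return b";

    // "ct" is a substring of "expected", so unrelated wording slips through.
    assert!(loose_match(unrelated));
    // Anchoring on "ct:" rejects the unrelated line and keeps the real one.
    assert!(!anchored_match(unrelated));
    assert!(anchored_match(from_ct));
    println!("anchored assertion distinguishes the two messages");
}
```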
Minor phrasing harmonisation: the three keep-in-sync surfaces had drifted slightly during the initial doc-sync commit (one used 'avoids the intermediate list alloc of len(flt fn xs)', the others wrote it as 'avoids len(flt fn xs)'s intermediate list alloc'). Pick the shorter form from SPEC.md for the other two surfaces. Same applies to the rgxall1 row.
danieljohnmorris added a commit that referenced this pull request on May 17, 2026
fifteen fixes since 0.11.5, all from rerun5/rerun6 personas plus standing asks: ListView foundation (#334), window-text-perf reshape via ListView (#336), inner-flt predicate inlining (#340), double-minus trap ILO-P021 (#331), bare-ident bang silent-nil regression (#324), Cranelift JIT span plumbing (#335), bool-prefix ternary (#330), wh prefix-cond reparse (#332), --run-engine auto-pick main (#329), subcommand helper hyphens+non-ident (#328), runtime error spans (#335), persona-diagnostic batch 3 (#327), rgxall1+ct (#333), single-line body diagnostic (#322 carry), lambda type-var defensive test (#326), N-deep prefix arity error (#339), prefix-minus span column drift (#338), doc-sync (#337).
Summary
Two builtin additions from the rerun6 persona round, both manifesto-aligned token-economy wins on multi-rerun-old shapes:
- `rgxall1 pat s -> L t`: flat first-capture-group convenience over `rgxall`. Closes the `ext xs:L t>t;hd xs` helper that the html-scraper persona kept paying ~30 tokens to re-declare across five reruns (rerun5, line 4988).
- `ct fn xs -> n`: count-by-predicate, parallel to `flt`'s 2/3-arg shape. Avoids the intermediate `L b` allocation that `len (flt p xs)` pays on every call. Originating ask: bioinformatics rerun6 `tm=ct has-tm seqs` over 20k proteins (line 5028).

Named `ct` rather than the persona's wished-for `cnt`: `cnt` is reserved as the loop-continue keyword (`src/parser/mod.rs:3507`). Two chars beats three, no parser surgery, and the loop-control reservation stays intact (regression-covered).

Repro before/after
rgxall1. html-scraper persona, "extract every match of one capture group":
ct. bioinformatics persona, "how many proteins have a TM helix":
What's in the diff (per commit)
- `f3aed43` builtins: rgxall1 pat s for flat first-capture-group extraction. Adds `Builtin::Rgxall1`, tree-walker arm, tree-bridge eligibility, propagate-error allow-list entry. 0 groups → flat `L t` of whole matches; 1 group → flat `L t` of capture-1 strings; 2+ groups → runtime `ILO-R009` pointing at `rgxall`. 8 cross-engine regression tests + `examples/rgxall1-flat-captures.ilo`.
- `e86dba9` builtins: ct fn xs for count-by-predicate without the filtered-list alloc. Adds `Builtin::Ct` with predicate signature identical to `flt`'s, 2-arg and 3-arg closure-bind shapes. Tree-bridge wired for VM/Cranelift parity; runtime non-bool predicate errors join the propagate allow-list. No `count` long-form alias on purpose (caught a real false-positive that would have trampled the user-fn name in `examples/unq-numbers.ilo`). 10 cross-engine regression tests including a `cnt`-as-continue coexistence regression guard + `examples/ct-count-by-predicate.ilo`.
- `a0a1ee2` docs: sync SPEC.md, ai.txt, SKILL.md for rgxall1 and ct builtins. Every keep-in-sync surface gets new rows in the builtin tables and the new names in the cheatsheet/HOF lists. Site builtins.md lands in a follow-up `ilo-lang/site` PR.
- `4f91c38` tests: tighten ct non-bool predicate assertion to anchor on "ct:". rust-review caught that bare `stderr.contains("ct")` would match the substring inside "expected" and let unrelated wording satisfy the assertion. Anchor on the literal `ct:` prefix from the ILO-R009 message instead.

Manifesto framing
These two compose: any persona that today writes `len (flt … rgxall …)` or `map (hd …) rgxall …` is paying both a helper-decl tax and a redundant allocation. With `rgxall1` + `ct` the canonical "scrape one HTML page, count items matching a predicate" pipeline shaves ~50 tokens and one O(n) alloc per occurrence. Across the agent population shape-of-task that is real compression.

Both ride the tree-bridge: same dispatch contract as `rgxall`, `flt 3`, `grp 2`. No backend codegen changes, no JIT lowering complexity, full cross-engine parity for free.

Test plan
- `cargo test --release --features cranelift`: full suite green post-rebase
- `cargo clippy --features cranelift --all-targets -- -W clippy::all`: clean
- `cargo test --features cranelift --test regression_rgxall1`: 8/8 pass on tree/VM/Cranelift
- `cargo test --features cranelift --test regression_ct`: 10/10 pass on tree/VM/Cranelift, plus `cnt`-as-continue coexistence regression guard
- `cargo test --features cranelift --test examples_engines`: both new examples run on every engine
- `Builtin::ALL` tag round-trip preserved (entries appended after `Sleep`, no reordering)

Follow-ups
- `ILO-W002` unused-binding warning, security-researcher rerun5 ask) branches off main after this merges, per the two-PR split agreed at the gate.
- `ilo-lang/site` `builtins.md`) lands in a follow-up PR against the site repo once this merges.