builtins: add ord and chr for per-char codepoint round-trip #222
Merged
Adds enum slots, name resolver, parser arity (both 1-arg), and verifier type signatures. ord is t->n (first char's Unicode codepoint), chr is n->t (codepoint to a single-char string). Round-trip test list updated. No engine code yet; that lives in later commits.
ord returns the first character's Unicode scalar as a number; errors on an empty string. chr validates the codepoint is a finite, non-negative integer in the u32 range and reachable via char::from_u32, then returns a single-character string. Round-trips chr(ord(c)) == c for any single-char ASCII string.
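A minimal Rust sketch of the tree-walker semantics above, assuming numbers are f64 and errors are plain String messages. The ord and chr functions here are illustrative stand-ins, not the PR's code. The sketch shows why char::from_u32 is the final check: a value can pass the finite/non-negative/integer/u32 tests and still be rejected because it is a surrogate or lies above U+10FFFF.

```rust
// Sketch only: f64 numbers and String errors are assumptions, not the PR's types.

/// ord: first char's Unicode scalar value as a number; errors on "".
fn ord(s: &str) -> Result<f64, String> {
    match s.chars().next() {
        Some(c) => Ok(f64::from(u32::from(c))),
        None => Err("ord: empty string".to_string()),
    }
}

/// chr: codepoint -> single-char string, applying the checks listed above.
fn chr(n: f64) -> Result<String, String> {
    // Finite (rejects NaN and +/-inf), non-negative, integral, within u32.
    if !n.is_finite() || n < 0.0 || n.fract() != 0.0 || n > f64::from(u32::MAX) {
        return Err(format!("chr: invalid codepoint {n}"));
    }
    // char::from_u32 rejects surrogates (U+D800..=U+DFFF) and values above U+10FFFF.
    char::from_u32(n as u32)
        .map(|c| c.to_string())
        .ok_or_else(|| format!("chr: {n} is not a Unicode scalar value"))
}
```

With these definitions, chr(ord("H")?)? gives back "H", and chr(55296.0) (U+D800) is an error even though 55296 fits in a u32.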
New opcode slots for the codepoint round-trip pair. The VM arms mirror the validation in the tree walker: ord requires a non-empty heap string, chr requires a finite non-negative integer codepoint that char::from_u32 accepts. Also adds the jit_ord and jit_chr extern-C helpers used by the Cranelift backends. The helpers return TAG_NIL on invalid args, matching the at/hd/padl precedent: the JIT can't raise typed errors without unwinding through Cranelift.
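The nil-sentinel convention can be sketched as an extern-C helper that validates its input and returns a tag instead of raising. Everything in this block is hypothetical: the name jit_ord_sketch, the pointer-plus-length calling convention, the TAG_NIL bit pattern, and encoding numbers as raw f64 bits are placeholders, not the PR's value representation.

```rust
/// Hypothetical nil tag: a NaN bit pattern, so it can never equal the
/// f64 bits of a real codepoint. Not the PR's actual TAG_NIL value.
const TAG_NIL: u64 = 0x7FFC_0000_0000_0001;

/// Hypothetical ord helper callable from JIT code: returns the first
/// char's codepoint as f64 bits, or TAG_NIL on any invalid input,
/// because it cannot raise a typed error across the JIT frame.
///
/// # Safety
/// If `ptr` is non-null and `len > 0`, `ptr` must point at `len` readable bytes.
pub unsafe extern "C" fn jit_ord_sketch(ptr: *const u8, len: usize) -> u64 {
    if ptr.is_null() || len == 0 {
        return TAG_NIL; // empty string: nil instead of the tree/VM error
    }
    // SAFETY: guaranteed by the caller per the contract above.
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    match std::str::from_utf8(bytes).ok().and_then(|s| s.chars().next()) {
        Some(c) => f64::from(u32::from(c)).to_bits(),
        None => TAG_NIL, // invalid UTF-8 also maps to nil
    }
}
```

The design trade-off is the one the commit names: callers of the compiled code see nil rather than an error, which is why the empty-string test is engine-split.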
Adds helper FuncIds, declarations, and emission arms for OP_ORD and OP_CHR in both the AOT object backend and the in-process JIT. The register-type classifier learns that OP_ORD always writes a number and OP_CHR always writes a non-number (string), so the surrounding type-tracking still proves register types for the boxing fast paths.
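A hypothetical sketch of what "the classifier learns" amounts to: a total mapping from opcode to the kind of value it always writes, which later passes consult before choosing an unboxed-number fast path. Op, RegType, and the Call variant are illustrative names, not the PR's enums.

```rust
/// Illustrative opcode subset (not the PR's opcode enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Ord,  // ord: string -> number
    Chr,  // chr: number -> string
    Call, // stand-in for an op whose result type is not known statically
}

/// What the classifier can prove about a destination register.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RegType {
    Number,    // always an f64: eligible for the numeric fast paths
    NonNumber, // never a number (here: a heap string)
    Unknown,   // no proof: use the generic boxed path
}

/// Result type written by each opcode, independent of its inputs.
fn result_type(op: Op) -> RegType {
    match op {
        Op::Ord => RegType::Number,
        Op::Chr => RegType::NonNumber,
        Op::Call => RegType::Unknown,
    }
}
```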
Twelve cross-engine cases plus an engine-split empty-string error test (tree+VM raise; Cranelift returns nil per existing precedent). Covers ASCII letters and digits, multi-byte UTF-8 (é = U+00E9 = 233), codepoint 0 round-trip, takes-first-char-only, and chr(ord(c)) / ord(chr(n)) round-trips. Also adds examples/ord-chr.ilo with -- run / -- out directives so the example-engine harness exercises the new builtins as a higher-level regression, and so agents have an in-context learning example.
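The multi-byte case matters because a byte-oriented ord would get é wrong. In Rust terms (utf8_vs_codepoint is an illustrative helper, not part of the PR), é is two UTF-8 bytes but a single codepoint, and only the first char is read:

```rust
/// Returns (UTF-8 byte length, raw UTF-8 bytes, first char's codepoint).
fn utf8_vs_codepoint(s: &str) -> (usize, Vec<u8>, Option<u32>) {
    (
        s.len(),                            // length in bytes, not chars
        s.as_bytes().to_vec(),              // the encoded bytes
        s.chars().next().map(u32::from),    // what ord reports
    )
}
```

For "\u{e9}" (é) this gives 2 bytes, [0xC3, 0xA9], and codepoint 233; for "Hi" it gives codepoint 72, ignoring the rest of the string.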
Summary
Adds ord c:t -> n (first char's Unicode codepoint) and chr n:n -> t (codepoint to single-char string). Closes a real-world gap surfaced in the assessment doc: every persona doing per-character text work (NLP tokenisation, classifier features, ASCII-fold lowercasing) had to hand-roll a 26-entry mset lookup map because num "H" returns a parse error rather than 72. Two opcodes for what would otherwise be ~30 lines of boilerplate per program is a clean manifesto win.

Both take the standard single argument; these are table-stakes builtins that every language ships. Pattern-matched against PR #190 (upr/lwr/cap) for full coverage across the tree walker, bytecode VM, and Cranelift (AOT + JIT).
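For illustration, here is the ASCII-fold lowercasing case written in Rust with codepoint arithmetic, the pattern ord/chr make available in place of a 26-entry lookup map. This is not ilo syntax, and Rust already ships char::to_ascii_lowercase; the point is only that one range check plus an offset of 32 replaces the table.

```rust
/// Lowercase ASCII letters via codepoint arithmetic; all other chars pass through.
fn ascii_fold_lower(s: &str) -> String {
    s.chars()
        .map(|c| {
            let n = u32::from(c); // the "ord" step
            if (65..=90).contains(&n) {
                // 'A'..='Z' (65..=90) -> 'a'..='z' (97..=122): the "chr" step.
                // n + 32 is always a valid ASCII codepoint here.
                char::from_u32(n + 32).expect("ASCII letter + 32 is valid")
            } else {
                c
            }
        })
        .collect()
}
```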
Repro
Before:
After:
What's in the diff
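- Builtin registration: enum slots, name resolver, parser arity, and verifier signatures (ord is t->n, chr is n->t).
- Tree walker: empty-string error for ord; finite, non-negative, integer, and char::from_u32 range checks for chr.
- VM: OP_ORD / OP_CHR arms, plus the jit_ord/jit_chr extern-C helpers (return TAG_NIL on invalid args, matching the at/hd/padl precedent: the JIT can't raise typed errors without unwinding).
- Cranelift: helper declarations and emission arms in compile_cranelift.rs and jit_cranelift.rs. The register-type classifier learns OP_ORD writes a number and OP_CHR writes a non-number.
- Tests: cross-engine cases including chr(ord(c)) and ord(chr(n)) round-trips. Plus examples/ord-chr.ilo with -- run / -- out directives so the example-engine harness exercises it.

Test plan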
- cargo build --features cranelift: clean
- cargo test --release --features cranelift --test regression_ord_chr: 13/13 pass
- cargo test --release --features cranelift --test examples_engines: example exercised across all engines
- cargo test --release --features cranelift: full suite green
- cargo clippy --release --features cranelift --all-targets -- -D warnings: clean
- cargo fmt: clean

Follow-ups
None planned. If real-world use surfaces a need for ord/chr over a list (e.g. map ord (spl s "") for full string-to-codepoints), the existing map builtin already covers that path.
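Assuming spl s "" splits a string into single-character strings, as the follow-up note implies, mapping ord over it yields the full codepoint list. A Rust analogue of that path and its inverse (codepoints and from_codepoints are hypothetical helper names):

```rust
/// String -> codepoints: the analogue of map ord (spl s "").
fn codepoints(s: &str) -> Vec<u32> {
    s.chars().map(u32::from).collect()
}

/// Codepoints -> string; None if any value is not a Unicode scalar value,
/// mirroring chr's char::from_u32 check.
fn from_codepoints(cps: &[u32]) -> Option<String> {
    cps.iter().map(|&n| char::from_u32(n)).collect()
}
```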