
builtins: add ord and chr for per-char codepoint round-trip #222

Merged: danieljohnmorris merged 5 commits into main from fix/ord-chr-builtins on May 13, 2026


@danieljohnmorris (Collaborator)

Summary

Adds ord c:t -> n (first char's Unicode codepoint) and chr n:n -> t (codepoint to single-char string). Closes a real-world gap surfaced in the assessment doc: every persona doing per-character text work (NLP tokenisation, classifier features, ASCII-fold lowercasing) had to hand-roll a 26-entry mset lookup map because num "H" returns a parse error rather than 72. Two opcodes for what would otherwise be ~30 lines of boilerplate per program is a clean manifesto win.
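To make the per-character use case concrete: once codepoints are available, ASCII-fold lowercasing becomes arithmetic rather than a 26-entry lookup map. The following is a minimal Rust sketch, not ilo code. `ascii_fold_lower` is a hypothetical name, and `c as u32` / `char::from_u32` stand in for `ord` / `chr`:

```rust
/// Hypothetical helper (not part of ilo): lowercase ASCII 'A'..='Z' by
/// codepoint arithmetic and leave every other character untouched.
fn ascii_fold_lower(s: &str) -> String {
    s.chars()
        .map(|c| {
            let cp = c as u32; // plays the role of `ord`
            if (65..=90).contains(&cp) {
                // 'A'..='Z' (65..=90) sit exactly 32 below 'a'..='z' (97..=122),
                // so the shifted value is always a valid char.
                char::from_u32(cp + 32).unwrap() // plays the role of `chr`
            } else {
                c
            }
        })
        .collect()
}
```

Idiomatic Rust would call `char::to_ascii_lowercase`. The arithmetic is spelled out here only to mirror what an ilo program can now do with ord and chr.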

Both take a single argument, and both are table-stakes builtins that most languages ship. Modelled on PR #190 (upr/lwr/cap), with full coverage across the tree walker, bytecode VM, and Cranelift (AOT + JIT).

Repro

Before:

$ echo 'f>n;num "H"' | ilo /dev/stdin --run-tree f
# returns Err("H") — no way to get 72 without a hand-rolled 26-entry map

After:

$ echo 'f>n;ord "H"' | ilo /dev/stdin --run-tree f
72
$ echo 'f>t;chr 72' | ilo /dev/stdin --run-tree f
H

What's in the diff

  • builtins: register ord and chr in name table and type sigs. Enum slots, name resolver, parser arity (1-arg for both), and verifier signatures (ord is t->n, chr is n->t).
  • interpreter: dispatch ord and chr in the tree walker. Validates a non-empty string for ord, and a finite, non-negative integer codepoint within the u32 / char::from_u32 range for chr.
  • vm: add OP_ORD=153 and OP_CHR=154 with VM dispatch. Same validation in the VM arms. Adds jit_ord / jit_chr extern-C helpers, which return TAG_NIL on invalid args to match the at/hd/padl precedent (the JIT can't raise typed errors without unwinding).
  • cranelift: wire ord and chr through AOT and JIT backends. Helper FuncIds, declarations, and emission arms in both compile_cranelift.rs and jit_cranelift.rs. The register-type classifier learns that OP_ORD writes a number and OP_CHR writes a non-number.
  • tests: cross-engine regression coverage for ord and chr. 12 cross-engine tests plus an engine-split empty-string error test (tree+VM raise; Cranelift returns nil), covering ASCII letters/digits, multi-byte UTF-8 (é = U+00E9 = 233), codepoint 0 round-trip, takes-first-char-only, and chr(ord(c)) / ord(chr(n)) round-trips. Plus examples/ord-chr.ilo with -- run / -- out directives so the example-engine harness exercises it.

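The register-type classifier change in the cranelift bullet can be pictured roughly as follows. Only the opcode numbers 153 and 154 come from this PR. The `u8` opcode type, `RegType`, and the straight-line `track` pass are assumptions made for illustration:

```rust
// Opcode numbers are from this PR; the u8 type is an assumption.
const OP_ORD: u8 = 153;
const OP_CHR: u8 = 154;

/// Hypothetical register-type lattice for the sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RegType {
    Number,
    NonNumber,
    Unknown,
}

/// Result kind the classifier records for an opcode's destination register.
fn written_type(op: u8) -> RegType {
    match op {
        OP_ORD => RegType::Number,    // codepoint as a number
        OP_CHR => RegType::NonNumber, // single-char string
        _ => RegType::Unknown,        // all other opcodes omitted here
    }
}

/// Walk straight-line code of (opcode, destination register) pairs and record
/// the last known type of each register. Every `dst` must be < `nregs`.
fn track(code: &[(u8, usize)], nregs: usize) -> Vec<RegType> {
    let mut regs = vec![RegType::Unknown; nregs];
    for &(op, dst) in code {
        regs[dst] = written_type(op);
    }
    regs
}
```

In this sketch, giving both opcodes a fixed result kind lets the tracker keep proving register types across ord/chr calls instead of falling back to `Unknown`.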
Test plan

  • cargo build --features cranelift clean
  • cargo test --release --features cranelift --test regression_ord_chr — 13/13 pass
  • cargo test --release --features cranelift --test examples_engines — example exercised across all engines
  • cargo test --release --features cranelift — full suite green
  • cargo clippy --release --features cranelift --all-targets -- -D warnings — clean
  • cargo fmt clean
  • CI green on push

Follow-ups

None planned. If real-world use surfaces a need for ord/chr over a list (e.g. map ord (spl s "") for full string-to-codepoints), the existing map builtin already covers that path.
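For comparison, here is the full string-to-codepoints path in Rust. This is an analogy sketch, not ilo code, and `codepoints` / `from_codepoints` are hypothetical names:

```rust
/// Rust analogue of mapping ord over every character of a string.
fn codepoints(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

/// Inverse direction, analogous to mapping chr over a list and joining.
/// Returns None if any value is not a valid Unicode scalar value.
fn from_codepoints(cps: &[u32]) -> Option<String> {
    cps.iter().map(|&cp| char::from_u32(cp)).collect()
}
```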

Commits

builtins: register ord and chr in name table and type sigs

Adds enum slots, name resolver, parser arity (both 1-arg), and verifier
type signatures. ord is t->n (first char's Unicode codepoint), chr is
n->t (codepoint to a single-char string). Round-trip test list updated.
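Here is a minimal sketch of the registration this commit describes: name resolution, 1-arg arity, and the t->n and n->t signatures. The `Builtin` and `Ty` types are hypothetical, because the PR does not show ilo's actual enum or verifier types:

```rust
/// Hypothetical types for the sketch; ilo's real builtin enum and verifier
/// types are not shown in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    Text, // `t`
    Num,  // `n`
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Builtin {
    Ord,
    Chr,
}

impl Builtin {
    /// Name resolver: map a source identifier to its builtin slot.
    fn from_name(name: &str) -> Option<Builtin> {
        match name {
            "ord" => Some(Builtin::Ord),
            "chr" => Some(Builtin::Chr),
            _ => None,
        }
    }

    /// Parser arity: both builtins take exactly one argument.
    fn arity(self) -> usize {
        match self {
            Builtin::Ord | Builtin::Chr => 1,
        }
    }

    /// Verifier signature as (parameter type, return type).
    fn sig(self) -> (Ty, Ty) {
        match self {
            Builtin::Ord => (Ty::Text, Ty::Num), // ord: t -> n
            Builtin::Chr => (Ty::Num, Ty::Text), // chr: n -> t
        }
    }
}
```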

No engine code yet; that lives in later commits.

interpreter: dispatch ord and chr in the tree walker

ord returns the first character's Unicode scalar as a number; errors
on an empty string. chr validates the codepoint is a finite,
non-negative integer in the u32 range and reachable via char::from_u32,
then returns a single-character string.
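Here is a plain-Rust sketch of the validation rules above. It assumes numbers are `f64` and uses `String` errors for brevity. ilo's real value and error types do not appear in the PR:

```rust
/// Hypothetical sketch of the tree walker's `ord` checks: numbers are f64 and
/// errors are plain Strings here, which is an assumption, not ilo's API.
fn ord(s: &str) -> Result<f64, String> {
    // Only the first character matters; anything after it is ignored.
    match s.chars().next() {
        Some(c) => Ok(c as u32 as f64),
        None => Err("ord: empty string".to_string()),
    }
}

/// Hypothetical sketch of the `chr` checks under the same assumptions.
fn chr(n: f64) -> Result<String, String> {
    // Reject NaN/infinity, negatives, fractions, and values past u32::MAX
    // before casting, because `as u32` would otherwise truncate or saturate.
    if !n.is_finite() || n < 0.0 || n.fract() != 0.0 || n > u32::MAX as f64 {
        return Err(format!("chr: invalid codepoint {n}"));
    }
    // char::from_u32 rejects surrogates (U+D800..=U+DFFF) and anything above U+10FFFF.
    match char::from_u32(n as u32) {
        Some(c) => Ok(c.to_string()),
        None => Err(format!("chr: {n} is not a Unicode scalar value")),
    }
}
```

The explicit guards matter in Rust because a float-to-integer `as` cast truncates toward zero and saturates, with NaN becoming 0. Without the guards, 65.7 would silently become "A" and NaN would become U+0000.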

Round-trips chr(ord(c)) == c for any single-char ASCII string.

vm: add OP_ORD=153 and OP_CHR=154 with VM dispatch

New opcode slots for the codepoint round-trip pair. The VM arms mirror
the validation in the tree walker: ord requires a non-empty heap
string, chr requires a finite non-negative integer codepoint that
char::from_u32 accepts.
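The JIT helpers described next cannot raise typed errors, so they signal invalid arguments with the TAG_NIL sentinel instead. The sketch below illustrates that convention and is hypothetical. The PR does not give the real jit_ord signature, value encoding, or TAG_NIL constant, so the sketch assumes the string arrives as a pointer/length pair and the number comes back as raw `f64` bits:

```rust
/// Hypothetical nil sentinel: a NaN bit pattern, which can never equal the
/// bits of a finite codepoint number. ilo's real TAG_NIL is not shown in the PR.
const TAG_NIL: u64 = 0x7FF8_0000_0000_0001;

/// Sketch of a jit_ord-style helper callable from generated code via the C ABI.
/// It reports bad input with TAG_NIL rather than raising an error.
///
/// # Safety
/// `ptr` must be valid for reads of `len` bytes whenever `len > 0`.
pub unsafe extern "C" fn jit_ord_sketch(ptr: *const u8, len: usize) -> u64 {
    if ptr.is_null() || len == 0 {
        return TAG_NIL; // empty string: tree walker and VM raise here instead
    }
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    match std::str::from_utf8(bytes).ok().and_then(|s| s.chars().next()) {
        Some(c) => (c as u32 as f64).to_bits(), // first char's codepoint
        None => TAG_NIL,                        // invalid UTF-8
    }
}
```

A jit_chr counterpart would plausibly take the same approach and return TAG_NIL whenever the codepoint fails the finite/integer checks or `char::from_u32` rejects it.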

Also adds the jit_ord and jit_chr extern-C helpers used by the
Cranelift backends. The helpers return TAG_NIL on invalid args
(matching the at/hd/padl precedent, since the JIT can't raise typed
errors without unwinding through Cranelift).

cranelift: wire ord and chr through AOT and JIT backends

Adds helper FuncIds, declarations, and emission arms for OP_ORD and
OP_CHR in both the AOT object backend and the in-process JIT. The
register-type classifier learns that OP_ORD always writes a number
and OP_CHR always writes a non-number (string), so the surrounding
type-tracking still proves register types for the boxing fast paths.

tests: cross-engine regression coverage for ord and chr

Twelve cross-engine cases plus an engine-split empty-string error
test (tree+VM raise; Cranelift returns nil per existing precedent).
Covers ASCII letters and digits, multi-byte UTF-8 (é = U+00E9 = 233),
codepoint 0 round-trip, takes-first-char-only, and chr(ord(c)) /
ord(chr(n)) round-trips.
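The properties these tests check can also be stated against Rust's own char conversions. This std-only sketch does not call ilo, and `ord_of` / `chr_of` are hypothetical helpers. The é = 233, codepoint 0, and first-char-only cases correspond to the ones listed above:

```rust
/// std-only stand-ins for the round-trip properties; they do not call ilo.
fn ord_of(s: &str) -> Option<u32> {
    s.chars().next().map(|c| c as u32) // first char only; None if empty
}

fn chr_of(n: u32) -> Option<String> {
    char::from_u32(n).map(|c| c.to_string()) // None for non-scalar values
}

/// chr(ord(c)) == c, for a one-character string `c`.
fn chr_ord_round_trips(c: &str) -> bool {
    ord_of(c).and_then(chr_of).as_deref() == Some(c)
}

/// ord(chr(n)) == n, for a valid codepoint `n`.
fn ord_chr_round_trips(n: u32) -> bool {
    chr_of(n).and_then(|s| ord_of(&s)) == Some(n)
}
```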

Also adds examples/ord-chr.ilo with -- run / -- out directives so
the example-engine harness exercises the new builtins as a higher-level
regression, and so agents have an in-context learning example.
@codecov (bot) commented May 13, 2026

Codecov Report

❌ Patch coverage is 52.60116% with 82 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

File                          Patch %   Missing lines
src/vm/mod.rs                 47.69%    34
src/verify.rs                 40.00%    18
src/interpreter/mod.rs        52.94%    16
src/vm/compile_cranelift.rs   17.64%    14


@danieljohnmorris merged commit 51b7447 into main on May 13, 2026
4 of 5 checks passed
@danieljohnmorris deleted the fix/ord-chr-builtins branch on May 13, 2026 at 12:45
