Snake_case field names in dot-access position by danieljohnmorris · Pull Request #160 · ilo-lang/ilo

danieljohnmorris · 2026-05-11T18:21:35Z

Summary

Resolves the doc-cited "real-world JSON is overwhelmingly snake_case, so dot-access is a minority path" friction. Every persona that touched real JSON in the assessment runs (stargazers_count, change_1d, people_vaccinated_per_hundred, budget_authority_amount, ema_20d_change_5d) had to fall back to the verbose jpth! per-field workaround because r.snake_case failed at the lexer.

Through the manifesto lens: every API call agents make against real-world endpoints used to require a token-expensive workaround. Now the natural dot-access form works.

Approach

Lexer-cooperative — added a post-lex merge pass that runs immediately before PR #154's ILO-L002 friendly-error scan. After consuming Dot or DotQuestion, contiguous Ident (Underscore (Ident|Number Ident?))* runs (with strict span-contiguity, no whitespace gaps) get spliced into a single Token::Ident using the original source slice. The L002 scan immediately below is byte-for-byte unchanged, so plain bindings like my_var=5 still emit the friendly underscore error — only dot-access position gets the loosened identifier rule.
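The ordering argument above can be illustrated with a minimal sketch. The `Tok` enum, the `l002_scan` function, and the error text are all hypothetical stand-ins, not the real types or message in src/lexer/mod.rs. The sketch assumes the scan flags any `Ident Underscore Ident` triple it still sees; the real ILO-L002 rules may be broader.

```rust
// Hypothetical, simplified token kinds; not the real ilo lexer types.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Underscore,
    Dot,
    Other(char),
}

/// Stand-in for the ILO-L002 scan: report the first `Ident Underscore Ident`
/// triple still present in the stream. The message text is made up.
fn l002_scan(tokens: &[Tok]) -> Result<(), String> {
    for window in tokens.windows(3) {
        if let [Tok::Ident(a), Tok::Underscore, Tok::Ident(b)] = window {
            return Err(format!("ILO-L002: underscore in identifier `{}_{}`", a, b));
        }
    }
    Ok(())
}
```

Fed the un-merged tokens of `my_var=5`, this scan returns an error; fed a stream where `r.my_var` has already been spliced into `Ident("my_var")`, it finds no `Underscore` token at all and passes. That is the sense in which the scan itself can stay unchanged.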

The merge handles three shapes uniformly inside one loop:

field_name (Ident _ Ident)
change_1d (Ident _ Number Ident)
ema_20d_change_5d (alternating Ident / Number-Ident segments)

The trailing-letter-after-Number absorb lives inside the loop so each _Number Ident? group is consumed atomically and the loop continues. Without this, alternating real-world names like ema_20d_change_5d would only stitch the first two segments.
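The loop described above might look roughly like the following sketch. It is not the code from the diff: the `Token` enum, the `Span` alias, the toy `lex` helper, and the name `merge_snake_fields` are assumptions made for illustration, and the sketch does not check contiguity between the dot and the first `Ident`, which the real pass may do.

```rust
// Hypothetical token model; the real types in src/lexer/mod.rs may differ.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Number(String),
    Underscore,
    Dot,
    DotQuestion,
    Other(String),
}

type Span = std::ops::Range<usize>;

/// Toy lexer for the demo only: `_` is always its own token and whitespace
/// is skipped, leaving a gap between neighbouring spans.
fn lex(src: &str) -> Vec<(Token, Span)> {
    let bytes = src.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let b = bytes[i];
        if b.is_ascii_whitespace() {
            i += 1;
            continue;
        }
        let tok = if b.is_ascii_alphabetic() {
            while i < bytes.len() && bytes[i].is_ascii_alphanumeric() {
                i += 1;
            }
            Token::Ident(src[start..i].to_string())
        } else if b.is_ascii_digit() {
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            Token::Number(src[start..i].to_string())
        } else if b == b'_' {
            i += 1;
            Token::Underscore
        } else if b == b'.' {
            i += 1;
            if i < bytes.len() && bytes[i] == b'?' {
                i += 1;
                Token::DotQuestion
            } else {
                Token::Dot
            }
        } else {
            // Advance by one whole character so slicing stays on a boundary.
            i += src[start..].chars().next().unwrap().len_utf8();
            Token::Other(src[start..i].to_string())
        };
        out.push((tok, start..i));
    }
    out
}

/// After `Dot` / `DotQuestion`, splice a byte-contiguous
/// `Ident (Underscore (Ident | Number Ident?))*` run into one `Ident`.
fn merge_snake_fields(src: &str, tokens: &mut Vec<(Token, Span)>) {
    let mut i = 0;
    while i + 1 < tokens.len() {
        let after_dot = matches!(tokens[i].0, Token::Dot | Token::DotQuestion);
        if !after_dot || !matches!(tokens[i + 1].0, Token::Ident(_)) {
            i += 1;
            continue;
        }
        let start = i + 1;
        let mut end = start; // index of the last token absorbed so far
        loop {
            let (us, seg) = (end + 1, end + 2);
            if seg >= tokens.len() || !matches!(tokens[us].0, Token::Underscore) {
                break;
            }
            // Strict span-contiguity: no gap on either side of the `_`.
            if tokens[end].1.end != tokens[us].1.start
                || tokens[us].1.end != tokens[seg].1.start
            {
                break;
            }
            match &tokens[seg].0 {
                Token::Ident(_) => end = seg,
                Token::Number(_) => {
                    end = seg;
                    // Absorb the trailing letters here, inside the loop, so
                    // the next `_segment` can still be picked up afterwards.
                    if let Some((Token::Ident(_), sp)) = tokens.get(seg + 1) {
                        if sp.start == tokens[seg].1.end {
                            end = seg + 1;
                        }
                    }
                }
                _ => break,
            }
        }
        if end > start {
            let span = tokens[start].1.start..tokens[end].1.end;
            let merged = (Token::Ident(src[span.clone()].to_string()), span);
            drop(tokens.splice(start..=end, std::iter::once(merged)));
        }
        i = start + 1;
    }
}
```

With this sketch, calling `lex` and then `merge_snake_fields` on `r.ema_20d_change_5d` leaves three tokens (`Ident`, `Dot`, and one merged `Ident`), while `my_var=5` is left untouched because no dot precedes the run. Moving the trailing-letter absorb out of the loop would stop the run after `ema_20d`, which is the bug the paragraph above describes.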

What's in the diff

lexer: merge snake_case field names in dot-access position — src/lexer/mod.rs post-lex pass, +64 lines. No parser changes needed; expect_ident() after a Dot now sees one merged Ident.
tests + example: pin snake_case dot-access across engines — 22 cross-engine tests covering simple, multi-segment, digit-trailing-letter, bare-digit, alternating, and real-world shapes, plus negative regressions (my_var=5 still errors with ILO-L002; r.foo bar keeps tokens separate). examples/snake-fields.ilo demonstrates the pattern with three -- run: / -- out: cases so tests/examples_engines.rs exercises each shape across every engine.

Test plan

cargo test --release --features cranelift — full suite green, +10 new regression tests + 6 new example-engine cases
cargo fmt --all -- --check clean
cargo clippy --all-targets --features cranelift -- -D warnings clean
PR #154 (Friendly errors for identifier-confusion cases): its 30 friendly-error tests still pass unchanged
r.ema_20d_change_5d shape (alternating segments) stitches correctly across tree/vm/cranelift
my_var=5 bindings still emit ILO-L002 (binding-position underscore still flagged)
r.foo bar keeps tokens separate (no false merge across whitespace)
Code-reviewed by subagent; the alternating-segment stitching bug was caught and fixed before commit

Follow-ups

r.foo._bar (leading underscore on inner field) is not handled. Deferred — the doc-cited shapes are all flat snake_case, and starting a field with _ is unusual in JSON.

Real-world JSON is overwhelmingly snake_case (stargazers_count, change_1d, people_vaccinated_per_hundred), but record.snake_case dot-access used to fail because the lexer tokenises `_` as a separate Underscore token (used as wildcard). Every persona that touched real-world JSON in the assessment runs hit this and fell back to the verbose `jpth!` per-field workaround.

Added a post-lex pass that runs immediately before the ILO-L002 friendly-error scan from PR #154. After consuming `Dot` or `DotQuestion`, contiguous `Ident (Underscore (Ident|Number Ident?))*` runs (with strict span-contiguity, no whitespace gaps) get spliced back into a single `Token::Ident` using the original source slice. The L002 scan immediately below is byte-for-byte unchanged, so plain bindings like `my_var=5` still emit the friendly underscore error.

The merge handles three shapes uniformly inside one loop:

- `field_name` (Ident _ Ident)
- `change_1d` (Ident _ Number Ident)
- `ema_20d_change_5d` (alternating Ident / Number-Ident segments)

The trailing-letter-after-Number absorb lives inside the loop so each `_Number Ident?` group is consumed atomically and the loop continues to absorb further segments. Without this, alternating real-world field names like `ema_20d_change_5d` would only stitch the first two segments and leave the rest as separate tokens.

Single-pass linear walk with in-place splice. No parser changes needed: expect_ident() after Dot now sees one merged Ident.

22 cross-engine regression tests covering:

- simple `r.field_name`
- multi-segment `r.people_vaccinated_per_hundred`
- digit-trailing-letter `r.change_1d`
- bare-digit `r.x_1`
- alternating `r.x_2y_3z`
- real-world `r.ema_20d_change_5d`
- double-question-mark `r.?field_name?`

plus negative regressions confirming `my_var=5` bindings still emit the ILO-L002 friendly error and `r.foo bar` keeps tokens separate.

examples/snake-fields.ilo demonstrates the pattern on a record with three snake_case fields (single, multi-segment, trailing-digit). Three -- run: / -- out: cases so examples_engines.rs exercises each shape across every engine. One-line header.

PR #154's 30 friendly-error tests still pass unchanged. The L002 scan is downstream of the new merge pass; merged Ident tokens never reach L002, plain Ident-Underscore-Ident bindings still do.

codecov · 2026-05-11T18:24:25Z

Codecov Report

❌ Patch coverage is 97.72727% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/lexer/mod.rs	97.72%	1 Missing ⚠️


danieljohnmorris added 2 commits May 11, 2026 19:21

danieljohnmorris merged commit d9c35b7 into main May 11, 2026
5 checks passed

danieljohnmorris deleted the fix/snake-field-access branch May 11, 2026 18:28

danieljohnmorris mentioned this pull request May 13, 2026

Accept reserved keywords as field names at dot-access #221

Merged

6 tasks
