Snake_case field names in dot-access position #160

Merged
danieljohnmorris merged 2 commits into main from fix/snake-field-access on May 11, 2026

@danieljohnmorris
Collaborator

Summary

Resolves the doc-cited "real-world JSON is overwhelmingly snake_case, so dot-access is a minority path" friction. Every persona that touched real JSON in the assessment runs (stargazers_count, change_1d, people_vaccinated_per_hundred, budget_authority_amount, ema_20d_change_5d) had to fall back to the verbose jpth! per-field workaround because r.snake_case failed at the lexer.

Through the manifesto lens: every API call agents make against real-world endpoints used to require a token-expensive workaround. Now the natural dot-access form works.

Approach

Lexer-cooperative — added a post-lex merge pass that runs immediately before PR #154's ILO-L002 friendly-error scan. After consuming Dot or DotQuestion, contiguous Ident (Underscore (Ident|Number Ident?))* runs (with strict span-contiguity, no whitespace gaps) get spliced into a single Token::Ident using the original source slice. The L002 scan immediately below is byte-for-byte unchanged, so plain bindings like my_var=5 still emit the friendly underscore error — only dot-access position gets the loosened identifier rule.

The merge handles three shapes uniformly inside one loop:

  • field_name (Ident _ Ident)
  • change_1d (Ident _ Number Ident)
  • ema_20d_change_5d (alternating Ident / Number-Ident segments)

The trailing-letter-after-Number absorb lives inside the loop so each _Number Ident? group is consumed atomically and the loop continues. Without this, alternating real-world names like ema_20d_change_5d would only stitch the first two segments.
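
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/lexer/mod.rs: the `Token` and `Spanned` types, the toy ASCII-only `lex` function, and the names `merge_snake_fields` and `merged` are all hypothetical, and it builds a new vector instead of splicing in place. It also assumes the head Ident does not have to touch the dot, which the description leaves open.

```rust
// Hypothetical, simplified token model; the real lexer's types may differ.
#[derive(Debug, Clone, PartialEq)]
enum Token { Ident(String), Number(String), Underscore, Dot, DotQuestion, Other(char) }

#[derive(Debug, Clone, PartialEq)]
struct Spanned { tok: Token, start: usize, end: usize } // byte span, end exclusive

// Toy ASCII lexer, just enough to feed the merge pass (assumption, not ilo's lexer).
fn lex(src: &str) -> Vec<Spanned> {
    let b = src.as_bytes();
    let (mut out, mut i) = (Vec::new(), 0);
    while i < b.len() {
        let start = i;
        let c = b[i];
        i += 1;
        let tok = match c {
            c if c.is_ascii_whitespace() => continue,
            c if c.is_ascii_alphabetic() => {
                while i < b.len() && b[i].is_ascii_alphanumeric() { i += 1; }
                Token::Ident(src[start..i].to_string())
            }
            c if c.is_ascii_digit() => {
                while i < b.len() && b[i].is_ascii_digit() { i += 1; }
                Token::Number(src[start..i].to_string())
            }
            b'_' => Token::Underscore,
            b'.' if b.get(i) == Some(&b'?') => { i += 1; Token::DotQuestion }
            b'.' => Token::Dot,
            other => Token::Other(other as char),
        };
        out.push(Spanned { tok, start, end: i });
    }
    out
}

// Post-lex pass: after Dot / DotQuestion, stitch a contiguous
// `Ident (Underscore (Ident | Number Ident?))*` run into one Ident.
fn merge_snake_fields(src: &str, toks: Vec<Spanned>) -> Vec<Spanned> {
    let mut out: Vec<Spanned> = Vec::with_capacity(toks.len());
    let mut i = 0;
    while i < toks.len() {
        let after_dot =
            matches!(out.last().map(|t| &t.tok), Some(Token::Dot | Token::DotQuestion));
        if !(after_dot && matches!(toks[i].tok, Token::Ident(_))) {
            out.push(toks[i].clone()); // not in dot-access position: leave untouched
            i += 1;
            continue;
        }
        let start = toks[i].start;
        let mut end = toks[i].end;
        let mut j = i + 1;
        // Each iteration absorbs one `_Ident` or `_Number Ident?` group atomically.
        while j + 1 < toks.len()
            && toks[j].tok == Token::Underscore
            && toks[j].start == end                 // no gap before `_`
            && toks[j + 1].start == toks[j].end     // no gap after `_`
        {
            match &toks[j + 1].tok {
                Token::Ident(_) => { end = toks[j + 1].end; j += 2; }
                Token::Number(_) => {
                    end = toks[j + 1].end;
                    j += 2;
                    // Trailing letters directly after the digits, e.g. the `d` in `_20d`.
                    if j < toks.len()
                        && matches!(toks[j].tok, Token::Ident(_))
                        && toks[j].start == end
                    {
                        end = toks[j].end;
                        j += 1;
                    }
                }
                _ => break,
            }
        }
        // Rebuild the name from the original source slice.
        out.push(Spanned { tok: Token::Ident(src[start..end].to_string()), start, end });
        i = j;
    }
    out
}

// Convenience wrapper: lex, merge, and drop the spans.
fn merged(src: &str) -> Vec<Token> {
    merge_snake_fields(src, lex(src)).into_iter().map(|t| t.tok).collect()
}
```

In this sketch, `merged("r.ema_20d_change_5d")` yields three tokens (`r`, `Dot`, and one `Ident` holding the full field name), while `merged("my_var=5")` keeps its `Underscore` token, so a later scan for binding-position underscores would still see it.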

What's in the diff

  1. lexer: merge snake_case field names in dot-access position (src/lexer/mod.rs post-lex pass, +64 lines). No parser changes needed; expect_ident() after a Dot now sees one merged Ident.

  2. tests + example: pin snake_case dot-access across engines — 22 cross-engine tests covering simple, multi-segment, digit-trailing-letter, bare-digit, alternating, and real-world shapes, plus negative regressions (my_var=5 still errors with ILO-L002; r.foo bar keeps tokens separate). examples/snake-fields.ilo demonstrates the pattern with three -- run: / -- out: cases so tests/examples_engines.rs exercises each shape across every engine.

Test plan

  • cargo test --release --features cranelift — full suite green, +10 new regression tests + 6 new example-engine cases
  • cargo fmt --all -- --check clean
  • cargo clippy --all-targets --features cranelift -- -D warnings clean
  • PR #154 (Friendly errors for identifier-confusion cases): its 30 friendly-error tests still pass unchanged
  • r.ema_20d_change_5d shape (alternating segments) stitches correctly across tree/vm/cranelift
  • my_var=5 bindings still emit ILO-L002 (binding-position underscore still flagged)
  • r.foo bar keeps tokens separate (no false merge across whitespace)
  • Code-reviewed by subagent; the alternating-segment stitching bug was caught and fixed before commit

Follow-ups

  • r.foo._bar (leading underscore on inner field) is not handled. Deferred — the doc-cited shapes are all flat snake_case, and starting a field with _ is unusual in JSON.

Real-world JSON is overwhelmingly snake_case (stargazers_count,
change_1d, people_vaccinated_per_hundred), but record.snake_case
dot-access used to fail because the lexer tokenises `_` as a
separate Underscore token (used as wildcard). Every persona that
touched real-world JSON in the assessment runs hit this and fell
back to the verbose `jpth!` per-field workaround.

Added a post-lex pass that runs immediately before the ILO-L002
friendly-error scan from PR #154. After consuming `Dot` or
`DotQuestion`, contiguous `Ident (Underscore (Ident|Number Ident?))*`
runs (with strict span-contiguity, no whitespace gaps) get spliced
back into a single `Token::Ident` using the original source slice.
The L002 scan immediately below is byte-for-byte unchanged, so
plain bindings like `my_var=5` still emit the friendly underscore
error.

The merge handles three shapes uniformly inside one loop:
- `field_name` (Ident _ Ident)
- `change_1d` (Ident _ Number Ident)
- `ema_20d_change_5d` (alternating Ident / Number-Ident segments)

The trailing-letter-after-Number absorb lives inside the loop so
each `_Number Ident?` group is consumed atomically and the loop
continues to absorb further segments. Without this, alternating
real-world field names like `ema_20d_change_5d` would only stitch
the first two segments and leave the rest as separate tokens.

Single-pass linear walk with in-place splice. No parser changes
needed: expect_ident() after Dot now sees one merged Ident.

22 cross-engine regression tests covering: simple `r.field_name`,
multi-segment `r.people_vaccinated_per_hundred`, digit-trailing-letter
`r.change_1d`, bare-digit `r.x_1`, alternating `r.x_2y_3z`, real-world
`r.ema_20d_change_5d`, double-question-mark `r.?field_name?`, plus
negative regressions confirming `my_var=5` bindings still emit the
ILO-L002 friendly error and `r.foo bar` keeps tokens separate.

examples/snake-fields.ilo demonstrates the pattern on a record with
three snake_case fields (single, multi-segment, trailing-digit).
Three -- run: / -- out: cases so examples_engines.rs exercises each
shape across every engine. One-line header.

PR #154's 30 friendly-error tests still pass unchanged. The L002
scan is downstream of the new merge pass; merged Ident tokens never
reach L002, plain Ident-Underscore-Ident bindings still do.
@codecov

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 97.72727% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/lexer/mod.rs | 97.72% | 1 Missing ⚠️ |


@danieljohnmorris danieljohnmorris merged commit d9c35b7 into main May 11, 2026
5 checks passed
@danieljohnmorris danieljohnmorris deleted the fix/snake-field-access branch May 11, 2026 18:28