lexer: accept camelCase at post-dot field access #220

Merged
danieljohnmorris merged 3 commits into main from fix/camelcase-dot-access on May 13, 2026

@danieljohnmorris
Collaborator

Summary

Real-world JSON from NVD, AWS, Stripe, and GitHub is overwhelmingly camelCase. record.baseSeverity used to trip ILO-L003: unexpected character at the lexer because any uppercase mid-ident was a hard error. The workaround (jpth! record "baseSeverity") is verbose and burns tokens on every field read, which fights the manifesto: agents reading JSON shouldn't pay a token tax for the fact that the source happens to capitalise.

This mirrors the snake_case post-pass that landed earlier: at post-dot field-access position (and only there), the lexer absorbs the camelCase tail into a single Ident token. Bindings still emit ILO-L003 unchanged.

Repro

Before:

$ ilo 'f j:t>R n t;r=jpar! j;r.baseSeverity' f '{"baseSeverity":"HIGH"}'
ILO-L003: unexpected token 'baseSeverity'

After:

$ ilo 'f j:t>R n t;r=jpar! j;r.baseSeverity' f '{"baseSeverity":"HIGH"}'
HIGH

Bindings still reject (unchanged):

$ ilo 'fooBar=5;fooBar'
ILO-L003: unexpected token 'fooBar'

What's in the diff

  • lexer: accept camelCase at post-dot field access - the fix itself. Two lex paths needed updating: the type-sigil branch (L/R/F/O/M/S tokens, e.g. the S in baseSeverity) and the single-uppercase-error branch (non-sigil capitals, e.g. the U in gitURL). New prev_ident_is_post_dot helper detects the position; new absorb_camel_tail does the merge and advances the logos cursor.
  • test: cross-engine regression coverage for camelCase dot-access - 20 tests across tree, VM, and Cranelift covering sigil capitals, non-sigil capitals, chained access, trailing digits, safe access, mixed camel + snake (gitURL_count), and negative regressions on bindings.
  • example: camelCase dot-access on real-world JSON shapes - picked up by tests/examples_engines.rs so every engine exercises the path through the example harness too.
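
The merge described in the first bullet can be sketched roughly as follows. This is a simplified, hypothetical model, not the code in src/lexer/mod.rs: the `Tok` enum, the `Spanned` struct, and both function signatures are assumptions made for illustration. Only the helper names `prev_ident_is_post_dot` and `absorb_camel_tail` come from this PR, and the real lexer advances a logos cursor instead of indexing a `&str`.

```rust
// Hypothetical, simplified model of the post-dot camelCase merge.
// Only the two helper names come from the PR; types and signatures are assumed.

#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Dot,           // `.`
    DotQuestion,   // `.?` (safe access)
    Ident(String), // identifier lexed so far
    Other(String), // illustrative catch-all for every other token
}

/// A token plus its byte span, so "flush against" can be checked via offsets.
#[derive(Debug, Clone, PartialEq)]
struct Spanned {
    tok: Tok,
    start: usize,
    end: usize,
}

/// True when the last token is an Ident that directly follows a Dot or
/// DotQuestion, and `cursor` sits exactly at the end of that Ident.
fn prev_ident_is_post_dot(toks: &[Spanned], cursor: usize) -> bool {
    let n = toks.len();
    if n < 2 {
        return false;
    }
    let (dot, ident) = (&toks[n - 2], &toks[n - 1]);
    matches!(ident.tok, Tok::Ident(_))
        && matches!(dot.tok, Tok::Dot | Tok::DotQuestion)
        && dot.end == ident.start
        && ident.end == cursor
}

/// Appends the `[A-Za-z0-9]+` run starting at `cursor` to the last Ident
/// and returns the new cursor position.
fn absorb_camel_tail(src: &str, toks: &mut Vec<Spanned>, cursor: usize) -> usize {
    let tail_len = src[cursor..]
        .bytes()
        .take_while(|b| b.is_ascii_alphanumeric())
        .count();
    let end = cursor + tail_len;
    if let Some(Spanned { tok: Tok::Ident(name), end: tok_end, .. }) = toks.last_mut() {
        name.push_str(&src[cursor..end]);
        *tok_end = end;
    }
    end
}
```

In this model, a caller that hits an uppercase character would first check `prev_ident_is_post_dot` and only then call `absorb_camel_tail`; when the check fails (for example on the binding `fooBar=5`, where no Dot precedes `foo`), it would fall through to the existing ILO-L003 path.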

Test plan

  • cargo build --release --features cranelift
  • cargo test --release --features cranelift (4101 passed, 0 failed, up from 4081 with 20 new)
  • cargo fmt
  • cargo clippy --release --features cranelift -- -D warnings (clean)
  • Manual repro of baseSeverity, gitURL, chained, safe, mixed, and binding-still-errors cases

Follow-ups

  • Leading-uppercase keys like record.URL and record.ID still fail at the lexer (no preceding Ident to merge into). Worth a separate fix that treats Dot+uppercase-run as a field-name Ident directly. Out of scope here to keep the change tight.
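
One possible shape for that follow-up, as a sketch only: `lex_upper_field_name` is a hypothetical helper that does not exist in the codebase, and the accepted character set (`[A-Za-z0-9_]`) is an assumption. It takes the byte offset just past a `.` or `.?` and, if an uppercase letter follows, returns the whole run as the field name together with the new cursor position.

```rust
/// Hypothetical helper (not part of this PR): scan a field name that begins
/// with an uppercase ASCII letter immediately after a `.` or `.?`.
/// Returns the name and the byte offset just past it, or None when the
/// character after the dot is not an uppercase letter.
fn lex_upper_field_name(src: &str, after_dot: usize) -> Option<(&str, usize)> {
    let rest = src.get(after_dot..)?;
    let first = rest.bytes().next()?;
    if !first.is_ascii_uppercase() {
        return None;
    }
    // Assumed character set: ASCII letters, digits, and underscore.
    let len = rest
        .bytes()
        .take_while(|b| b.is_ascii_alphanumeric() || *b == b'_')
        .count();
    Some((&rest[..len], after_dot + len))
}
```

Because the check only fires directly after a dot, an uppercase letter at binding position would still reach the existing error path in this sketch.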

Real-world JSON (NVD, AWS, Stripe, GitHub) is overwhelmingly camelCase
but the lexer rejected any uppercase mid-ident with ILO-L003, so
`record.baseSeverity` failed before reaching the parser. Mirror the
existing snake_case post-pass: when an uppercase character appears
flush against an Ident that is itself flush against a preceding Dot
or DotQuestion, absorb the camelCase tail (`[A-Za-z0-9]+`) into a
single Ident token.

Two lex paths needed updating: the type-sigil path (L/R/F/O/M/S
tokens like the S in baseSeverity) and the single-uppercase-error
path (non-sigil capitals like the U in gitURL). Bindings still emit
ILO-L003 unchanged.

Tree, VM, and Cranelift each exercise: type-sigil capital
(baseSeverity), non-sigil capital (gitURL), chained access
(baseSeverity.label), trailing digit (field2Name), safe access
(.?baseSeverity), and mixed camel + snake (gitURL_count). Negative
tests pin that fooBar=5 and fooSet=5 bindings still emit ILO-L003.

The example is picked up by tests/examples_engines.rs so every engine
exercises the new path through the example harness too.

@codecov

codecov Bot commented May 13, 2026

Codecov Report

❌ Patch coverage is 92.39130% with 7 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/lexer/mod.rs | 92.39% | 7 Missing ⚠️ |

@danieljohnmorris danieljohnmorris merged commit df0e587 into main May 13, 2026
5 checks passed
@danieljohnmorris danieljohnmorris deleted the fix/camelcase-dot-access branch May 13, 2026 12:39