lexer: accept camelCase at post-dot field access #220
Merged
Real-world JSON (NVD, AWS, Stripe, GitHub) is overwhelmingly camelCase, but the lexer rejected any uppercase mid-ident with ILO-L003, so `record.baseSeverity` failed before reaching the parser. Mirror the existing snake_case post-pass: when an uppercase character appears flush against an Ident that is itself flush against a preceding Dot or DotQuestion, absorb the camelCase tail (`[A-Za-z0-9]+`) into a single Ident token. Two lex paths needed updating: the type-sigil path (`L`/`R`/`F`/`O`/`M`/`S` tokens, like the `S` in `baseSeverity`) and the single-uppercase-error path (non-sigil capitals, like the `U` in `gitURL`). Bindings still emit ILO-L003 unchanged.
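The rule can be modelled with a small standalone lexer. This is a sketch under stated assumptions, not ilo's lexer. It uses a hand-rolled character loop instead of logos and a made-up `Tok` enum. Base identifiers are assumed to be `[a-z][a-z0-9_]*`, which stands in for the snake_case post-pass. The `L`/`R`/`F`/`O`/`M`/`S` type sigils are not modelled separately, so the sigil and non-sigil branches collapse into one uppercase check.

```rust
// Hypothetical mini-lexer modelling the post-dot camelCase rule.
// Assumptions: hand-rolled char loop (not logos), simplified token set,
// base idents are [a-z][a-z0-9_]*, and type sigils are not modelled.

#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Dot,
    DotQuestion,
    Eq,
    Int(i64),
}

fn lex(src: &str) -> Result<Vec<Tok>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut toks = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c == '.' {
            // `.?` is safe access; a bare `.` is plain field access.
            if chars.get(i + 1) == Some(&'?') {
                toks.push(Tok::DotQuestion);
                i += 2;
            } else {
                toks.push(Tok::Dot);
                i += 1;
            }
        } else if c == '=' {
            toks.push(Tok::Eq);
            i += 1;
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            toks.push(Tok::Int(text.parse::<i64>().map_err(|e| e.to_string())?));
        } else if c.is_ascii_lowercase() {
            // Field position: the ident starts flush against a preceding `.` or `.?`.
            let post_dot = i > 0 && matches!(chars[i - 1], '.' | '?');
            let start = i;
            loop {
                // Base identifier characters.
                while i < chars.len()
                    && (chars[i].is_ascii_lowercase() || chars[i].is_ascii_digit() || chars[i] == '_')
                {
                    i += 1;
                }
                match chars.get(i) {
                    // Post-dot: absorb the camelCase tail [A-Za-z0-9]+ into this ident.
                    Some(u) if u.is_ascii_uppercase() && post_dot => {
                        while i < chars.len() && chars[i].is_ascii_alphanumeric() {
                            i += 1;
                        }
                    }
                    // Anywhere else (e.g. a binding name): uppercase stays an error.
                    Some(u) if u.is_ascii_uppercase() => {
                        return Err(format!("ILO-L003: unexpected character '{u}' at {i}"));
                    }
                    _ => break,
                }
            }
            toks.push(Tok::Ident(chars[start..i].iter().collect()));
        } else {
            // Includes an uppercase letter with no ident to merge into, e.g. `record.URL`.
            return Err(format!("ILO-L003: unexpected character '{c}' at {i}"));
        }
    }
    Ok(toks)
}
```

In this model `lex("record.baseSeverity")` yields `record`, `Dot`, `baseSeverity`. `lex("fooBar=5")` still returns the ILO-L003 error, because `foo` is not flush against a dot. `record.URL` also still fails, matching the follow-up noted at the end of the description.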
The tree, VM, and Cranelift engines each exercise a type-sigil capital (`baseSeverity`), a non-sigil capital (`gitURL`), chained access (`baseSeverity.label`), a trailing digit (`field2Name`), safe access (`.?baseSeverity`), and mixed camel + snake (`gitURL_count`). Negative tests pin that the `fooBar=5` and `fooSet=5` bindings still emit ILO-L003.
The new example is picked up by `tests/examples_engines`, so every engine exercises the new path through the example harness too.
Summary
Real-world JSON from NVD, AWS, Stripe, and GitHub is overwhelmingly camelCase.
`record.baseSeverity` used to trip `ILO-L003: unexpected character` at the lexer because any uppercase mid-ident was a hard error. The workaround (`jpth! record "baseSeverity"`) is verbose and burns tokens on every field read, which fights the manifesto: agents reading JSON shouldn't pay a token tax because the source happens to capitalise.
This mirrors the snake_case post-pass that landed earlier: at post-dot field-access position (and only there), the lexer absorbs the camelCase tail into a single `Ident` token. Bindings still emit `ILO-L003` unchanged.
Repro
Before:
After:
Bindings still reject (unchanged):
What's in the diff
- `lexer: accept camelCase at post-dot field access`: the fix itself. Two lex paths needed updating: the type-sigil branch (`L`/`R`/`F`/`O`/`M`/`S` tokens, e.g. the `S` in `baseSeverity`) and the single-uppercase-error branch (non-sigil capitals, e.g. the `U` in `gitURL`). A new `prev_ident_is_post_dot` helper detects the position; a new `absorb_camel_tail` does the merge and advances the logos cursor.
- `test: cross-engine regression coverage for camelCase dot-access`: 20 tests across tree, VM, and Cranelift covering sigil capitals, non-sigil capitals, chained access, trailing digits, safe access, mixed camel + snake (`gitURL_count`), and negative regressions on bindings.
- `example: camelCase dot-access on real-world JSON shapes`: picked up by `tests/examples_engines.rs` so every engine exercises the path through the example harness too.
Test plan
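The two helpers named above might look roughly like this. The signatures are hypothetical. Tokens are plain kind-plus-byte-span records rather than logos output. `absorb_camel_tail` returns the new cursor instead of advancing the logos lexer. `handle_uppercase` is an invented name for the branch that chooses between merging and ILO-L003. The sketch mainly shows what "flush" means in terms of spans.

```rust
use std::ops::Range;

// Hypothetical token kinds; the real lexer's token enum is richer.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Ident,
    Dot,
    DotQuestion,
}

#[derive(Debug, Clone)]
struct Spanned {
    kind: Kind,
    span: Range<usize>, // byte offsets into the source
}

/// True when the last token is an Ident ending exactly at `upper_at` (the
/// uppercase char is flush) and that Ident starts exactly where a Dot or
/// DotQuestion ends (the Ident is flush against the dot).
fn prev_ident_is_post_dot(toks: &[Spanned], upper_at: usize) -> bool {
    match toks {
        [.., dot, ident] => {
            ident.kind == Kind::Ident
                && ident.span.end == upper_at
                && matches!(dot.kind, Kind::Dot | Kind::DotQuestion)
                && dot.span.end == ident.span.start
        }
        _ => false,
    }
}

/// Extends the last token over the [A-Za-z0-9]+ run starting at `cursor` and
/// returns the new cursor (stand-in for advancing the logos lexer).
fn absorb_camel_tail(src: &str, toks: &mut [Spanned], cursor: usize) -> usize {
    let len = src[cursor..]
        .bytes()
        .take_while(|b| b.is_ascii_alphanumeric())
        .count();
    let end = cursor + len;
    if let Some(last) = toks.last_mut() {
        last.span.end = end;
    }
    end
}

/// Hypothetical branch taken when the lexer meets an uppercase char at `at`:
/// merge at post-dot position, otherwise keep the ILO-L003 error.
fn handle_uppercase(src: &str, toks: &mut [Spanned], at: usize) -> Result<usize, String> {
    if prev_ident_is_post_dot(toks, at) {
        Ok(absorb_camel_tail(src, toks, at))
    } else {
        Err(format!("ILO-L003: unexpected character at {at}"))
    }
}
```

Checking spans rather than token adjacency alone is what keeps `record. baseSeverity` (with a space) out of the rule. In that case the Dot ends at byte 7 but the Ident starts at byte 8.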
- `cargo build --release --features cranelift`
- `cargo test --release --features cranelift` (4101 passed, 0 failed, up from 4081 with 20 new)
- `cargo fmt`
- `cargo clippy --release --features cranelift -- -D warnings` (clean)
- `baseSeverity`, `gitURL`, chained, safe, mixed, and binding-still-errors cases
Follow-ups
`record.URL` and `record.ID` still fail at the lexer (there is no preceding `Ident` to merge into). Worth a separate fix that treats `Dot` + uppercase-run as a field-name `Ident` directly. Out of scope here to keep the change tight.
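One way that follow-up could look, as a hypothetical sketch only (nothing here is in this PR). When the character immediately after a `Dot` or `DotQuestion` is an uppercase letter, take the whole `[A-Za-z0-9]+` run as the field name. The function name and its byte-offset interface are assumptions.

```rust
use std::ops::Range;

/// Hypothetical follow-up sketch (not in this PR). `dot_end` is the byte offset
/// just past a Dot (`.`) or DotQuestion (`.?`). If the next char is an uppercase
/// ASCII letter, return the byte range of the [A-Za-z0-9]+ run to emit directly
/// as a field-name Ident. Otherwise return None so lowercase fields keep using
/// the existing ident path (and the camelCase absorption from this PR).
fn uppercase_field_after_dot(src: &str, dot_end: usize) -> Option<Range<usize>> {
    let rest = src.get(dot_end..)?;
    if !rest.starts_with(|c: char| c.is_ascii_uppercase()) {
        return None;
    }
    let len = rest.bytes().take_while(|b| b.is_ascii_alphanumeric()).count();
    Some(dot_end..dot_end + len)
}
```

As written, this sketch would also turn a lone type-sigil letter after a dot (e.g. `.S`) into a field name. A real fix would need to decide whether that is acceptable.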