Friendly errors for identifier-confusion cases#154
Merged
Conversation
Cryptic parse and lex errors when an agent picks a natural-but-invalid identifier were costing retries on every fresh program. Four distinct code paths shared the same symptom; all of them now emit a single clear, actionable error. Reserved-keyword bindings (`var=5`, `let=5`, `fn=5`, `const=5`, `if=5`, `return=5`, `def=5`) previously produced `ILO-P009 expected expression, got KwX`. The friendly path lived only in `expect_ident` (function/param names), not at the expression-LHS entry where bindings actually parse. Factored the reserved-word message into a `reserved_keyword_message` helper and added preambles in `parse_decl` and `parse_stmt` so the friendly error fires uniformly at top-level and inside function bodies. `cnt=5` and `brk=5` parsed as valid continue/break statements followed by a stray `=`, producing `ILO-P003 expected Greater, got Eq` plus a downstream T028 cascade. Added explicit guards in `parse_stmt`: when `cnt`/`brk` is followed by `Eq`, emit a friendly "reserved for loop control" message with a rename suggestion. `rev_ps` tokenised as `Ident _ Ident` because `_` is a legitimate wildcard token elsewhere. Added a post-lex scan that detects an underscore sandwiched between two adjacent identifier tokens (no whitespace) and emits ILO-L002 with a hint to use hyphens. `isAgg` hit Logos's error fallback at `A`, producing the unhelpful `ILO-L001 unexpected token 'A'`. The lexer now detects when an `Ident` is immediately followed by a single-letter type sigil (`L`/`R`/`F`/`O`/`M`/`S`) or by a logos error char, reconstructs the would-be identifier (including hyphenated tails), and emits ILO-L003 pointing at the position of the first uppercase letter with a concrete rename suggestion. Type sigils used legitimately as type constructors still tokenise correctly because they're whitespace-separated from identifiers in every real usage.
Pins every case the friendly-error change is supposed to handle, plus negative regressions so a future refactor of lexer or parser can't silently re-introduce the cryptic errors. Positive cases cover all seven reserved keywords at both declaration scope and inside function bodies, `cnt`/`brk` at both, underscore mid-identifier, uppercase mid-identifier (including hyphenated tails like `isHello-world`). Negative regressions confirm bare `_` still works as a wildcard in destructure patterns, every type sigil still tokenises correctly in parameter and return-type positions, and identifiers that lexically start with a reserved-word prefix (`letter`, `variable`, `iffy`, `constant`, `function`) still bind as plain idents.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
This was referenced May 11, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Four cryptic error paths around identifier confusion now produce single-retry-fixable friendly errors. Each one was costing an AI agent a wasted retry on every fresh program, which violates the manifesto's only metric: total tokens from intent to working code.
The doc cited
cntcolliding with the natural "count" variable name three times in one session. That's the kind of token-bleed this PR is built for.What used to happen vs. what happens now
var=5ILO-P009 expected expression, got KwVarILO-P011 \var` is a reserved word ... use `name=expr` for bindings (e.g. `count=5`)`let=5/fn=5/const=5/if=5/return=5/def=5cnt=5ILO-P003 expected Greater, got Eq+ T028 cascadeILO-P011 \cnt` is reserved for continue (loop control); pick a different name like `count` or `c``brk=5ILO-P011 \brk` is reserved for break (loop control)...`rev_ps=5ILO-P003 expected Greater, got Underscorethenundefined variable '_'ILO-L002 underscores are not allowed in identifiers; use hyphens (e.g. \rev-ps`)`isAgg=5ILO-L001 unexpected token 'A'ILO-L003 identifiers must be lowercase ASCII; got 'isAgg' (capital 'A' at offset 2). Use lowercase, e.g. \is-agg` or `isagg``What's in the diff
parser/lexer: friendly errors for identifier-confusion cases— factored the existingexpect_identreserved-word path into areserved_keyword_messagehelper, then wired it up at the two places where binding LHSs actually parse:parse_decl(top-level) andparse_stmt(function bodies). Added explicitcnt/brkguards inparse_stmtso they don't get caught by the trailing-equals trap. On the lexer side, added a post-lex scan that detects_sandwiched between two adjacent identifier tokens (emits ILO-L002), and a check forIdentimmediately followed by a single-letter type sigil or logos error char (emits ILO-L003 with the full reconstructed identifier including hyphenated tails).tests: regression coverage for friendly identifier errors— 30 tests covering every case above at both declaration scope and inside function bodies, plus negative regressions confirming bare_still works in destructure patterns, every type sigil still tokenises in parameter and return-type positions, andletter/variable/iffy/constant/functionstill bind as plain idents.Test plan
cargo test --release --features cranelift— 3413 passed, 0 failed, 74 ignored (3383 baseline + 30 new tests)cargo fmt --all -- --checkcleanFollow-ups (not in this PR)
The assessment doc has more rough edges in the same area that would benefit from the same friendly-error treatment, but they're different code shapes and earn their own PR each. Notably: list-literal
[a b c]vs[1 2 3]consistency, inline-mode phantom verifier errors, sibling helper functions ending in bare calls slurping the next identifier.