Friendly errors for identifier-confusion cases by danieljohnmorris · Pull Request #154 · ilo-lang/ilo

danieljohnmorris · 2026-05-11T15:17:55Z

Summary

Four cryptic error paths around identifier confusion now produce single-retry-fixable friendly errors. Each one was costing an AI agent a wasted retry on every fresh program, which violates the manifesto's only metric: total tokens from intent to working code.

The assessment doc cited `cnt` colliding with the natural `count` variable name three times in one session. That's the kind of token-bleed this PR is built for.

What used to happen vs. what happens now

Input	Before	After
`var=5`	`ILO-P009 expected expression, got KwVar`	``ILO-P011 `var` is a reserved word ... use `name=expr` for bindings (e.g. `count=5`)``
`let=5` / `fn=5` / `const=5` / `if=5` / `return=5` / `def=5`	same cryptic ILO-P009 cascade	same friendly ILO-P011 with the right rename hint per keyword
`cnt=5`	`ILO-P003 expected Greater, got Eq` + T028 cascade	`` ILO-P011 `cnt` is reserved for continue (loop control); pick a different name like `count` or `c` ``
`brk=5`	same cascade shape	``ILO-P011 `brk` is reserved for break (loop control)...``
`rev_ps=5`	`ILO-P003 expected Greater, got Underscore` then `undefined variable '_'`	``ILO-L002 underscores are not allowed in identifiers; use hyphens (e.g. `rev-ps`)``
`isAgg=5`	`ILO-L001 unexpected token 'A'`	`` ILO-L003 identifiers must be lowercase ASCII; got 'isAgg' (capital 'A' at offset 2). Use lowercase, e.g. `is-agg` or `isagg` ``
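
The ILO-L003 row reports the offset of the first capital letter plus two rename suggestions. As a rough illustration of how those pieces could be derived from a reconstructed identifier, here is a minimal Rust sketch. The function name `uppercase_hint` and its return shape are assumptions made for this example, not the actual ilo lexer code.

```rust
/// Hypothetical helper, not taken from the ilo source.
/// Returns (offset of the first ASCII capital, that capital,
/// hyphenated suggestion, flat lowercase suggestion), or None
/// when the identifier contains no ASCII capital.
fn uppercase_hint(ident: &str) -> Option<(usize, char, String, String)> {
    // Locate the first uppercase ASCII letter and its byte offset.
    let (offset, ch) = ident
        .char_indices()
        .find(|(_, c)| c.is_ascii_uppercase())?;

    // Build the hyphenated form: each capital starts a new word.
    let mut kebab = String::new();
    for (i, c) in ident.char_indices() {
        if c.is_ascii_uppercase() {
            // Avoid a leading hyphen or a doubled hyphen.
            if i > 0 && !kebab.ends_with('-') {
                kebab.push('-');
            }
            kebab.push(c.to_ascii_lowercase());
        } else {
            kebab.push(c);
        }
    }

    // The flat form simply lowercases everything.
    let flat = ident.to_ascii_lowercase();
    Some((offset, ch, kebab, flat))
}
```

For `isAgg` this sketch yields offset 2, the capital `A`, and the suggestions `is-agg` and `isagg`, matching the values quoted in the table. An all-lowercase name such as `count` returns `None`.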

What's in the diff

parser/lexer: friendly errors for identifier-confusion cases — factored the existing expect_ident reserved-word path into a reserved_keyword_message helper, then wired it up at the two places where binding LHSs actually parse: parse_decl (top-level) and parse_stmt (function bodies). Added explicit cnt/brk guards in parse_stmt so they don't get caught by the trailing-equals trap. On the lexer side, added a post-lex scan that detects _ sandwiched between two adjacent identifier tokens (emits ILO-L002), and a check for Ident immediately followed by a single-letter type sigil or logos error char (emits ILO-L003 with the full reconstructed identifier including hyphenated tails).
tests: regression coverage for friendly identifier errors — 30 tests covering every case above at both declaration scope and inside function bodies, plus negative regressions confirming bare _ still works in destructure patterns, every type sigil still tokenises in parameter and return-type positions, and letter/variable/iffy/constant/function still bind as plain idents.
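
The `cnt`/`brk` guard mentioned in the first commit amounts to a one-token lookahead before the statement is treated as loop control. The sketch below shows that idea in Rust under stated assumptions: the `Tok` enum and the `loop_control_guard` function are hypothetical stand-ins, and the message is truncated to the part that both rows of the table share.

```rust
/// Hypothetical token kinds; the real ilo token enum is not shown in this PR text.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Cnt,
    Brk,
    Eq,
    Ident(String),
    Number(i64),
}

/// If `toks[pos]` is `cnt` or `brk` and the next token is `=`, return a
/// friendly message. Otherwise return None so that ordinary statement
/// parsing (including real continue/break) proceeds unchanged.
fn loop_control_guard(toks: &[Tok], pos: usize) -> Option<String> {
    let (word, meaning) = match toks.get(pos)? {
        Tok::Cnt => ("cnt", "continue"),
        Tok::Brk => ("brk", "break"),
        _ => return None,
    };
    // One token of lookahead: only a directly following `=` is a binding attempt.
    if toks.get(pos + 1) == Some(&Tok::Eq) {
        Some(format!(
            "ILO-P011 `{word}` is reserved for {meaning} (loop control)"
        ))
    } else {
        None
    }
}
```

Running the check before the loop-control branch means a bare `cnt` still parses as continue, while `cnt=5` is reported once instead of producing the stray-`=` cascade.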

Test plan

cargo test --release --features cranelift — 3413 passed, 0 failed, 74 ignored (3383 baseline + 30 new tests)
cargo fmt --all -- --check clean
Each of the four input cases above produces the friendly error rather than the cryptic cascade, at both top-level and inside function bodies
Code-reviewed by subagent; blocking and should-fix findings addressed (the reviewer caught that the initial implementation only fixed declaration scope, not function-body scope — fixed before commit)

Follow-ups (not in this PR)

The assessment doc has more rough edges in the same area that would benefit from the same friendly-error treatment, but they're different code shapes and earn their own PR each. Notably: list-literal [a b c] vs [1 2 3] consistency, inline-mode phantom verifier errors, sibling helper functions ending in bare calls slurping the next identifier.

Cryptic parse and lex errors when an agent picks a natural-but-invalid identifier were costing retries on every fresh program. Four distinct code paths shared the same symptom; all of them now emit a single clear, actionable error.

Reserved-keyword bindings (`var=5`, `let=5`, `fn=5`, `const=5`, `if=5`, `return=5`, `def=5`) previously produced `ILO-P009 expected expression, got KwX`. The friendly path lived only in `expect_ident` (function/param names), not at the expression-LHS entry where bindings actually parse. Factored the reserved-word message into a `reserved_keyword_message` helper and added preambles in `parse_decl` and `parse_stmt` so the friendly error fires uniformly at top-level and inside function bodies.

`cnt=5` and `brk=5` parsed as valid continue/break statements followed by a stray `=`, producing `ILO-P003 expected Greater, got Eq` plus a downstream T028 cascade. Added explicit guards in `parse_stmt`: when `cnt`/`brk` is followed by `Eq`, emit a friendly "reserved for loop control" message with a rename suggestion.

`rev_ps` tokenised as `Ident _ Ident` because `_` is a legitimate wildcard token elsewhere. Added a post-lex scan that detects an underscore sandwiched between two adjacent identifier tokens (no whitespace) and emits ILO-L002 with a hint to use hyphens.

`isAgg` hit Logos's error fallback at `A`, producing the unhelpful `ILO-L001 unexpected token 'A'`. The lexer now detects when an `Ident` is immediately followed by a single-letter type sigil (`L`/`R`/`F`/`O`/`M`/`S`) or by a logos error char, reconstructs the would-be identifier (including hyphenated tails), and emits ILO-L003 pointing at the position of the first uppercase letter with a concrete rename suggestion. Type sigils used legitimately as type constructors still tokenise correctly because they're whitespace-separated from identifiers in every real usage.
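
The post-lex underscore scan can be pictured as a sliding window over the token stream that also checks span adjacency. This is a minimal Rust sketch of that idea, assuming tokens are paired with byte spans. The `Tok` enum and `find_snake_case` are hypothetical names, and the real scan may report more detail than this.

```rust
use std::ops::Range;

/// Hypothetical token kinds for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Underscore,
    Eq,
    Number(i64),
}

/// Look for `Ident _ Ident` where the three byte spans touch (no whitespace).
/// Returns the reconstructed name and a hyphenated suggestion for the first
/// match, which a caller could turn into an ILO-L002 diagnostic.
fn find_snake_case(toks: &[(Tok, Range<usize>)]) -> Option<(String, String)> {
    toks.windows(3).find_map(|w| match w {
        [(Tok::Ident(a), sa), (Tok::Underscore, su), (Tok::Ident(b), sb)]
            // Adjacent spans mean the underscore was typed inside one word.
            if sa.end == su.start && su.end == sb.start =>
        {
            Some((format!("{a}_{b}"), format!("{a}-{b}")))
        }
        _ => None,
    })
}
```

The adjacency test is what keeps a bare `_` wildcard legal: an underscore separated from its neighbours by whitespace never has touching spans, so the window is skipped. This sketch only inspects one underscore at a time, so a name with several underscores would need the match to be extended.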

Pins every case the friendly-error change is supposed to handle, plus negative regressions so a future refactor of lexer or parser can't silently re-introduce the cryptic errors. Positive cases cover all seven reserved keywords at both declaration scope and inside function bodies, `cnt`/`brk` at both, underscore mid-identifier, and uppercase mid-identifier (including hyphenated tails like `isHello-world`). Negative regressions confirm bare `_` still works as a wildcard in destructure patterns, every type sigil still tokenises correctly in parameter and return-type positions, and identifiers that lexically start with a reserved-word prefix (`letter`, `variable`, `iffy`, `constant`, `function`) still bind as plain idents.
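
The prefix regressions boil down to one property: a name is rejected only on an exact keyword match. The Rust sketch below shows how such paired positive and negative cases might be written. `is_reserved` and the `RESERVED` list are assumptions for illustration (the list holds only the seven keywords named in this PR), and the real tests exercise the full lexer and parser rather than a lookup function.

```rust
/// The seven keywords named in this PR; the real reserved-word list may differ.
const RESERVED: [&str; 7] = ["var", "let", "fn", "const", "if", "return", "def"];

/// Hypothetical check: reserved only when the whole word matches a keyword.
fn is_reserved(word: &str) -> bool {
    RESERVED.contains(&word)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn keywords_are_rejected() {
        for kw in RESERVED {
            assert!(is_reserved(kw), "{kw} should be reserved");
        }
    }

    #[test]
    fn keyword_prefixed_names_still_bind() {
        // Each name starts with a keyword but is a distinct identifier.
        for name in ["letter", "variable", "iffy", "constant", "function"] {
            assert!(!is_reserved(name), "{name} should bind as a plain ident");
        }
    }
}
```

Keeping the positive and negative cases side by side makes it harder for a later change, such as a switch to prefix matching, to pass one set while silently breaking the other.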

codecov · 2026-05-11T15:21:56Z

Codecov Report

❌ Patch coverage is 88.19876% with 19 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/parser/mod.rs	84.12%	10 Missing ⚠️
src/lexer/mod.rs	90.81%	9 Missing ⚠️

danieljohnmorris added 2 commits May 11, 2026 16:17

danieljohnmorris merged commit 3407475 into main May 11, 2026
4 checks passed

danieljohnmorris deleted the fix/friendly-identifier-errors branch May 11, 2026 15:21

This was referenced May 11, 2026

Remove custom ARM64 JIT backend #158

Merged

Snake_case field names in dot-access position #160

Merged

verifier: prefer in-scope vars over builtin aliases in did-you-mean #182

Merged
