
Friendly errors for identifier-confusion cases#154

Merged
danieljohnmorris merged 2 commits into main from fix/friendly-identifier-errors on May 11, 2026


@danieljohnmorris
Collaborator

Summary

Four cryptic error paths around identifier confusion now produce single-retry-fixable friendly errors. Each one was costing an AI agent a wasted retry on every fresh program, which works against the manifesto's only metric: total tokens from intent to working code.

The assessment doc cited `cnt` colliding with the natural `count` variable name three times in one session. That's the kind of token bleed this PR is built for.

What used to happen vs. what happens now

| Input | Before | After |
|---|---|---|
| `var=5` | `ILO-P009 expected expression, got KwVar` | ILO-P011 `` `var` is a reserved word ... use `name=expr` for bindings (e.g. `count=5`) `` |
| `let=5` / `fn=5` / `const=5` / `if=5` / `return=5` / `def=5` | same cryptic ILO-P009 cascade | same friendly ILO-P011, with the right rename hint per keyword |
| `cnt=5` | `ILO-P003 expected Greater, got Eq` + T028 cascade | ILO-P011 `` `cnt` is reserved for continue (loop control); pick a different name like `count` or `c` `` |
| `brk=5` | same cascade shape | ILO-P011 `` `brk` is reserved for break (loop control)... `` |
| `rev_ps=5` | `ILO-P003 expected Greater, got Underscore`, then `undefined variable '_'` | ILO-L002 `` underscores are not allowed in identifiers; use hyphens (e.g. `rev-ps`) `` |
| `isAgg=5` | `ILO-L001 unexpected token 'A'` | ILO-L003 `` identifiers must be lowercase ASCII; got 'isAgg' (capital 'A' at offset 2). Use lowercase, e.g. `is-agg` or `isagg` `` |

What's in the diff

  1. parser/lexer: friendly errors for identifier-confusion cases. Factored the existing `expect_ident` reserved-word path into a `reserved_keyword_message` helper, then wired it up at the two places where binding LHSs actually parse: `parse_decl` (top level) and `parse_stmt` (function bodies). Added explicit `cnt`/`brk` guards in `parse_stmt` so they don't get caught by the trailing-equals trap. On the lexer side, added a post-lex scan that detects `_` sandwiched between two adjacent identifier tokens (emits ILO-L002), and a check for an `Ident` immediately followed by a single-letter type sigil or a Logos error char (emits ILO-L003 with the full reconstructed identifier, including hyphenated tails).

  2. tests: regression coverage for friendly identifier errors — 30 tests covering every case above at both declaration scope and inside function bodies, plus negative regressions confirming bare _ still works in destructure patterns, every type sigil still tokenises in parameter and return-type positions, and letter/variable/iffy/constant/function still bind as plain idents.

Test plan

  • `cargo test --release --features cranelift`: 3413 passed, 0 failed, 74 ignored (3383 baseline + 30 new tests)
  • `cargo fmt --all -- --check` clean
  • Each of the four input cases above produces the friendly error rather than the cryptic cascade, at both top-level and inside function bodies
  • Code-reviewed by a subagent; blocking and should-fix findings addressed (the reviewer caught that the initial implementation only fixed declaration scope, not function-body scope; this was fixed before commit)

Follow-ups (not in this PR)

The assessment doc has more rough edges in the same area that would benefit from the same friendly-error treatment, but they're different code shapes and earn their own PR each. Notably: list-literal [a b c] vs [1 2 3] consistency, inline-mode phantom verifier errors, sibling helper functions ending in bare calls slurping the next identifier.

Cryptic parse and lex errors when an agent picks a natural-but-invalid
identifier were costing retries on every fresh program. Four distinct
code paths shared the same symptom; all of them now emit a single
clear, actionable error.

Reserved-keyword bindings (`var=5`, `let=5`, `fn=5`, `const=5`,
`if=5`, `return=5`, `def=5`) previously produced
`ILO-P009 expected expression, got KwX`. The friendly path lived
only in `expect_ident` (function/param names), not at the
expression-LHS entry where bindings actually parse. Factored the
reserved-word message into a `reserved_keyword_message` helper and
added preambles in `parse_decl` and `parse_stmt` so the friendly
error fires uniformly at top-level and inside function bodies.
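For illustration, the preamble might look roughly like the following Rust sketch. Only the name `reserved_keyword_message` comes from this PR; the `Tok` type, the `check_binding_lhs` wrapper, the assumed `Option<String>` signature, and the single generic message are hypothetical stand-ins (the real helper gives a per-keyword rename hint).

```rust
// Hypothetical sketch: Tok, check_binding_lhs, and the message wording are
// illustrative; only the helper's name comes from the PR.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Kw(&'static str), // a reserved keyword such as "var" or "fn"
    Ident(String),
    Eq,
    Num(i64),
}

const RESERVED: &[&str] = &["var", "let", "fn", "const", "if", "return", "def"];

/// Assumed signature: Some(message) for a reserved word, None otherwise.
fn reserved_keyword_message(word: &str) -> Option<String> {
    if !RESERVED.contains(&word) {
        return None;
    }
    Some(format!(
        "ILO-P011 `{word}` is a reserved word; use `name=expr` with a non-reserved name (e.g. `count=5`)"
    ))
}

/// Preamble run before expression parsing at both the declaration and the
/// statement entry point: a keyword directly followed by `=` is an attempted
/// binding, so report it before the generic path can emit ILO-P009.
fn check_binding_lhs(toks: &[Tok]) -> Result<(), String> {
    if let [Tok::Kw(kw), Tok::Eq, ..] = toks {
        if let Some(msg) = reserved_keyword_message(kw) {
            return Err(msg);
        }
    }
    Ok(())
}

fn main() {
    let bad = [Tok::Kw("var"), Tok::Eq, Tok::Num(5)];
    let good = [Tok::Ident("count".to_string()), Tok::Eq, Tok::Num(5)];
    println!("{:?}", check_binding_lhs(&bad));
    println!("{:?}", check_binding_lhs(&good));
}
```

Calling the same check from both entry points is what keeps top-level and function-body behaviour identical.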

`cnt=5` and `brk=5` parsed as valid continue/break statements
followed by a stray `=`, producing `ILO-P003 expected Greater, got Eq`
plus a downstream T028 cascade. Added explicit guards in `parse_stmt`:
when `cnt`/`brk` is followed by `Eq`, emit a friendly "reserved for
loop control" message with a rename suggestion.
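A hypothetical sketch of that guard's decision logic. `Tok` and `loop_control_guard` are invented for illustration; the `cnt` message mirrors the example in the table above, while `brk` gets no rename hint here because the PR elides its exact wording.

```rust
// Hypothetical sketch: Tok and loop_control_guard are illustrative stand-ins
// for the guard added to parse_stmt.
#[derive(Debug, PartialEq)]
enum Tok {
    Cnt,
    Brk,
    Eq,
    Num(i64),
}

/// Returns a friendly error when `cnt`/`brk` is immediately followed by `=`.
/// Any other next token returns None, so normal continue/break parsing runs.
fn loop_control_guard(toks: &[Tok]) -> Option<String> {
    let (word, role, hint) = match toks.first()? {
        Tok::Cnt => ("cnt", "continue", " like `count` or `c`"),
        // No rename hint for brk: the real wording is elided in the PR.
        Tok::Brk => ("brk", "break", ""),
        _ => return None,
    };
    if toks.get(1) != Some(&Tok::Eq) {
        return None;
    }
    Some(format!(
        "ILO-P011 `{word}` is reserved for {role} (loop control); pick a different name{hint}"
    ))
}

fn main() {
    println!("{:?}", loop_control_guard(&[Tok::Cnt, Tok::Eq, Tok::Num(5)]));
    println!("{:?}", loop_control_guard(&[Tok::Brk]));
}
```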

`rev_ps` tokenised as `Ident _ Ident` because `_` is a legitimate
wildcard token elsewhere. Added a post-lex scan that detects an
underscore sandwiched between two adjacent identifier tokens (no
whitespace) and emits ILO-L002 with a hint to use hyphens.
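One way to express that post-lex scan, assuming tokens carry byte spans. `Kind`, `Token`, and `underscore_scan` are hypothetical names; the point is that adjacency is decided by touching spans, which is what lets a whitespace-separated `_` keep working as a wildcard.

```rust
// Hypothetical sketch of the post-lex underscore scan; Kind, Token, and
// underscore_scan are illustrative, not the lexer's real types.
#[derive(Debug, PartialEq)]
enum Kind {
    Ident(String),
    Underscore,
    Eq,
    Num,
}

/// A token with its byte span [start, end) in the source.
struct Token {
    kind: Kind,
    start: usize,
    end: usize,
}

fn tok(kind: Kind, start: usize, end: usize) -> Token {
    Token { kind, start, end }
}

/// Flags the first `Ident _ Ident` triple whose spans touch (no whitespace)
/// and suggests the hyphenated spelling of that pair.
fn underscore_scan(toks: &[Token]) -> Option<String> {
    toks.windows(3).find_map(|w| match (&w[0].kind, &w[1].kind, &w[2].kind) {
        (Kind::Ident(a), Kind::Underscore, Kind::Ident(b))
            if w[0].end == w[1].start && w[1].end == w[2].start =>
        {
            Some(format!(
                "ILO-L002 underscores are not allowed in identifiers; use hyphens (e.g. `{a}-{b}`)"
            ))
        }
        _ => None,
    })
}

fn main() {
    // rev_ps=5
    let glued = vec![
        tok(Kind::Ident("rev".into()), 0, 3),
        tok(Kind::Underscore, 3, 4),
        tok(Kind::Ident("ps".into()), 4, 6),
        tok(Kind::Eq, 6, 7),
        tok(Kind::Num, 7, 8),
    ];
    // Same three kinds, but separated by spaces: left alone.
    let spaced = vec![
        tok(Kind::Ident("a".into()), 0, 1),
        tok(Kind::Underscore, 2, 3),
        tok(Kind::Ident("b".into()), 4, 5),
    ];
    println!("{:?}", underscore_scan(&glued));
    println!("{:?}", underscore_scan(&spaced));
}
```

This version reports only the first pair, so `a_b_c` would suggest `a-b`; the real scan may reconstruct the whole run.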

`isAgg` hit Logos's error fallback at `A`, producing the unhelpful
`ILO-L001 unexpected token 'A'`. The lexer now detects when an
`Ident` is immediately followed by a single-letter type sigil
(`L`/`R`/`F`/`O`/`M`/`S`) or by a logos error char, reconstructs
the would-be identifier (including hyphenated tails), and emits
ILO-L003 pointing at the position of the first uppercase letter
with a concrete rename suggestion.
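The rename suggestion can be derived purely from the reconstructed identifier. The sketch below (the function name `lowercase_message` is hypothetical) reproduces the `isAgg` message from the table above; how the real lexer builds its hyphenated suggestion is an assumption here.

```rust
// Hypothetical sketch: lowercase_message is an illustrative name; the message
// text mirrors the example in the PR table.
/// Given the reconstructed would-be identifier, report the first capital and
/// two lowercase spellings: hyphen-split and flat.
fn lowercase_message(ident: &str) -> Option<String> {
    let (offset, cap) = ident.char_indices().find(|&(_, c)| c.is_ascii_uppercase())?;
    let mut hyphenated = String::with_capacity(ident.len() + 4);
    for (i, c) in ident.char_indices() {
        if c.is_ascii_uppercase() {
            // Start a new hyphen-separated segment before each capital,
            // unless a hyphen is already there (e.g. `is-Agg`).
            if i > 0 && !hyphenated.ends_with('-') {
                hyphenated.push('-');
            }
            hyphenated.push(c.to_ascii_lowercase());
        } else {
            hyphenated.push(c);
        }
    }
    let flat = ident.to_ascii_lowercase();
    Some(format!(
        "ILO-L003 identifiers must be lowercase ASCII; got '{ident}' (capital '{cap}' at offset {offset}). Use lowercase, e.g. `{hyphenated}` or `{flat}`"
    ))
}

fn main() {
    println!("{}", lowercase_message("isAgg").unwrap());
    println!("{}", lowercase_message("isHello-world").unwrap());
    println!("{:?}", lowercase_message("count"));
}
```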

Type sigils used legitimately as type constructors still tokenise
correctly because they're whitespace-separated from identifiers in
every real usage.
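The whitespace argument reduces to a span check: a sigil or error char triggers ILO-L003 only when it starts at exactly the byte where the preceding identifier ends. `Kind`, `Token`, and `glued_sigil` are hypothetical names, and the real lexer's token layout may differ.

```rust
// Hypothetical sketch: Kind, Token, and glued_sigil are illustrative; the
// sigil set L/R/F/O/M/S is the one listed in the PR.
#[derive(Debug, PartialEq)]
enum Kind {
    Ident,
    Sigil(char), // single-letter type sigil
    Error(char), // lexer error fallback on an unrecognised char
}

struct Token {
    kind: Kind,
    start: usize, // byte span [start, end)
    end: usize,
}

fn tok(kind: Kind, start: usize, end: usize) -> Token {
    Token { kind, start, end }
}

/// Returns the byte offset of the first sigil or error char that starts
/// exactly where an identifier ends. A sigil after whitespace never triggers.
fn glued_sigil(toks: &[Token]) -> Option<usize> {
    toks.windows(2)
        .find(|w| {
            w[0].kind == Kind::Ident
                && matches!(w[1].kind, Kind::Sigil(_) | Kind::Error(_))
                && w[0].end == w[1].start
        })
        .map(|w| w[1].start)
}

fn main() {
    // `isAgg`: identifier `is` at 0..2, error char 'A' glued on at 2..3.
    let glued = [tok(Kind::Ident, 0, 2), tok(Kind::Error('A'), 2, 3)];
    // An identifier, one space, then a sigil: legitimate type usage.
    let spaced = [tok(Kind::Ident, 0, 2), tok(Kind::Sigil('L'), 3, 4)];
    println!("{:?} {:?}", glued_sigil(&glued), glued_sigil(&spaced));
}
```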

Pins every case the friendly-error change is supposed to handle,
plus negative regressions so a future refactor of lexer or parser
can't silently re-introduce the cryptic errors.

Positive cases cover all seven reserved keywords at both declaration
scope and inside function bodies, `cnt`/`brk` at both, underscore
mid-identifier, uppercase mid-identifier (including hyphenated
tails like `isHello-world`).

Negative regressions confirm bare `_` still works as a wildcard in
destructure patterns, every type sigil still tokenises correctly in
parameter and return-type positions, and identifiers that lexically
start with a reserved-word prefix (`letter`, `variable`, `iffy`,
`constant`, `function`) still bind as plain idents.
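The `letter`/`variable`/`iffy` regressions guard a longest-match property: read the whole identifier first, then look it up in the keyword table. A minimal sketch, assuming identifiers are runs of lowercase ASCII letters, digits, and hyphens; `classify` and `Word` are hypothetical, not ilo's lexer API.

```rust
// Hypothetical sketch of longest-match-then-lookup; RESERVED, Word, and
// classify are illustrative, not ilo's lexer.
const RESERVED: &[&str] = &["var", "let", "fn", "const", "if", "return", "def"];

#[derive(Debug, PartialEq)]
enum Word<'a> {
    Keyword(&'a str),
    Ident(&'a str),
}

/// Consume the longest run of identifier characters first, then compare the
/// whole run against the keyword table, so `letter` is never split into
/// `let` + `ter`.
fn classify(src: &str) -> Option<Word<'_>> {
    let end = src
        .find(|c: char| !(c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-'))
        .unwrap_or(src.len());
    if end == 0 {
        return None;
    }
    let word = &src[..end];
    Some(if RESERVED.contains(&word) {
        Word::Keyword(word)
    } else {
        Word::Ident(word)
    })
}

fn main() {
    for src in ["let=5", "letter=5", "iffy=1", "function=2"] {
        println!("{:?}", classify(src));
    }
}
```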
danieljohnmorris merged commit 3407475 into main on May 11, 2026
4 checks passed
danieljohnmorris deleted the fix/friendly-identifier-errors branch on May 11, 2026, 15:21

codecov bot commented on May 11, 2026

Codecov Report

❌ Patch coverage is 88.19876% with 19 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/parser/mod.rs | 84.12% | 10 Missing ⚠️ |
| src/lexer/mod.rs | 90.81% | 9 Missing ⚠️ |

