Skip to content

fix: comments above bound-call lines no longer corrupt parsing#236

Merged
danieljohnmorris merged 3 commits into
mainfrom
fix/comment-parse-corrupt
May 13, 2026
Merged

fix: comments above bound-call lines no longer corrupt parsing#236
danieljohnmorris merged 3 commits into
mainfrom
fix/comment-parse-corrupt

Conversation

@danieljohnmorris
Copy link
Copy Markdown
Collaborator

Summary

An indented -- comment immediately above a paren-bound call silently corrupted parsing. The function body ended up empty and the diagnostic pointed many lines past the actual cause, typically inside a format-string {} placeholder. Manifesto-wise, comments are supposed to be free; this regression made them a 15-minute-per-occurrence tax.

Repro

Before:

$ cat repro.ilo
main>t
  -- build the thing
  m=(fmt "k={}" 1)
  m

$ ilo repro.ilo --run-tree main
{"code":"ILO-T008","message":"return type mismatch: expected t, got _", ...}

After:

$ ilo repro.ilo --run-tree main
k=1

Root cause

normalize_newlines in src/lexer/mod.rs rewrites \n+indent to ; BEFORE the logos lexer's --[^\n]* comment-skip runs. On an indented comment line, the trailing \n is also rewritten to ;, so the logos regex (which stops at \n) greedily eats the comment text plus every following statement up to the next non-indented newline. The body vanishes, the verifier reports a type mismatch with no useful position, and the ILO-P009 expected expression, got Semi cascade points at a { inside the format string several lines later.

What's in the diff

  • Commit 1 moves comment-handling into normalize_newlines itself. When it sees --, it advances past the comment content without emitting anything and leaves the trailing \n intact for the loop's existing newline handling. It also passes through "..." string literals verbatim so -- inside a string isn't mistaken for a comment, and suppresses duplicate ; emission when a comment-only line precedes an indented continuation.
  • Commit 2 adds tests/regression_comment_parse_corrupt.rs. 15 tests across tree / VM / cranelift covering: paren-bound fmt with leading comment, mid-body comment between bindings, stacked comments, comment containing {}/;/(), and dashes inside a string literal.
  • Commit 3 adds examples/comment-above-call.ilo with -- run: / -- out: assertions so tests/examples_engines.rs exercises the same cases at the higher integration layer across every engine.

Test plan

  • Original repro from ilo_assessment_feedback.md L1790 returns k=1 instead of ILO-T008
  • cargo test --release --features cranelift full suite passes (2866 lib tests, plus all integration tests, including the new 15-test regression_comment_parse_corrupt)
  • tests/examples_engines.rs passes with the new example
  • cargo fmt --check clean
  • cargo clippy --release --features cranelift --all-targets -- -D warnings clean
  • -- inside a string literal ("hello -- world") still round-trips correctly

Follow-ups

None. The fix is local to normalize_newlines and matches the logos --[^\n]* rule it's now running ahead of.

normalize_newlines rewrites indented \n to ; before the logos lexer
runs, so on an indented comment line the trailing \n turned into ;.
The logos --[^\n]*  comment-skip then matched across that synthesised
; and greedily ate every following statement up to the next
non-indented newline. An indented comment immediately above a
paren-bound call wiped out the rest of the function body and the
diagnostic landed many lines past the actual cause, typically inside
a format-string {} placeholder.

Detect -- directly in normalize_newlines, advance past comment
content without emitting any output, and leave the trailing \n
intact for the loop's existing newline handling. Also pass through
"..." string literals verbatim so -- inside a string isn't mistaken
for a comment. Suppress duplicate ; emission when a comment-only
line precedes an indented continuation.
Anchors the invariant that an indented -- comment line above any
binding (paren-bound call, plain binding, stacked comments,
comment-with-punctuation) parses identically across tree / VM /
cranelift. Adds a dashes-in-string case so the lexer pre-pass keeps
its string-aware skip honest.
Picked up by tests/examples_engines.rs across every engine. Shows
the four shapes the lexer pre-pass has to get right: paren-bound
fmt call below a comment, stacked comments before a binding,
mid-body comment between bindings, and -- inside a string literal.
@danieljohnmorris danieljohnmorris merged commit 204a7c1 into main May 13, 2026
4 checks passed
@danieljohnmorris danieljohnmorris deleted the fix/comment-parse-corrupt branch May 13, 2026 16:29
@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant