fix: comments above bound-call lines no longer corrupt parsing #236
Merged
normalize_newlines rewrites indented \n to ; before the logos lexer
runs, so on an indented comment line the trailing \n turned into ;.
The logos --[^\n]* comment-skip then matched across that synthesised
; and greedily ate every following statement up to the next
non-indented newline. An indented comment immediately above a
paren-bound call wiped out the rest of the function body and the
diagnostic landed many lines past the actual cause, typically inside
a format-string {} placeholder.
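The failure mode described above can be reproduced in miniature. The sketch below is not the project's code: `naive_normalize` and `skip_comments` are hypothetical stand-ins for the pre-pass and for the lexer's `--[^\n]*` skip rule, and the input is placeholder text rather than real ilo syntax.

```rust
/// Hypothetical, comment-unaware pre-pass: a newline followed by
/// indentation is rewritten to ';' (assumed simplification).
fn naive_normalize(src: &str) -> String {
    let mut out = String::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\n' && matches!(chars.peek(), Some(&' ') | Some(&'\t')) {
            // Consume the indentation and emit a statement separator.
            while chars.next_if(|&n| n == ' ' || n == '\t').is_some() {}
            out.push(';');
        } else {
            out.push(c);
        }
    }
    out
}

/// Stand-in for the `--[^\n]*` skip rule: drop everything from `--`
/// up to (but not including) the next real newline.
fn skip_comments(src: &str) -> String {
    let mut out = String::new();
    let mut rest = src;
    while let Some(start) = rest.find("--") {
        out.push_str(&rest[..start]);
        let after = &rest[start..];
        rest = match after.find('\n') {
            Some(nl) => &after[nl..],
            None => "",
        };
    }
    out.push_str(rest);
    out
}
```

Feeding `"f x\n  -- note\n  a = g(x)\n  b = h(a)\nmain\n"` through `naive_normalize` and then `skip_comments` leaves `"f x;\nmain\n"`: the comment's newline has already become `;`, so the skip runs on to the next non-indented newline and both body statements are lost.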
Detect -- directly in normalize_newlines, advance past comment
content without emitting any output, and leave the trailing \n
intact for the loop's existing newline handling. Also pass through
"..." string literals verbatim so -- inside a string isn't mistaken
for a comment. Suppress duplicate ; emission when a comment-only
line precedes an indented continuation.
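A minimal sketch of the three behaviours listed above (comment skip, string pass-through, duplicate-`;` suppression), assuming a character-level loop. The function name `normalize_sketch` is hypothetical, string escapes are deliberately not handled, and the real `normalize_newlines` may be structured differently.

```rust
/// Hypothetical comment- and string-aware pre-pass (illustrative only).
fn normalize_sketch(src: &str) -> String {
    let mut out = String::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // Copy "..." literals verbatim so `--` inside them survives.
            // Assumption: no escape sequences inside strings.
            '"' => {
                out.push(c);
                for s in chars.by_ref() {
                    out.push(s);
                    if s == '"' {
                        break;
                    }
                }
            }
            // Comment: skip its content, emit nothing, and leave the
            // trailing '\n' for the newline arm below.
            '-' if chars.peek() == Some(&'-') => {
                while chars.next_if(|&n| n != '\n').is_some() {}
            }
            // Newline + indentation becomes ';', unless a ';' was just
            // emitted (the comment-only-line case).
            '\n' if matches!(chars.peek(), Some(&' ') | Some(&'\t')) => {
                while chars.next_if(|&n| n == ' ' || n == '\t').is_some() {}
                if !out.ends_with(';') {
                    out.push(';');
                }
            }
            _ => out.push(c),
        }
    }
    out
}
```

On the placeholder input `"f x\n  -- note\n  a = g(x)\n  b = \"p -- q\"\nmain\n"` this yields `"f x;a = g(x);b = \"p -- q\"\nmain\n"`: the comment disappears, only one `;` separates the header from the first statement, and the dashes inside the string are untouched.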
Anchors the invariant that an indented -- comment line above any binding (paren-bound call, plain binding, stacked comments, comment-with-punctuation) parses identically across tree / VM / cranelift. Adds a dashes-in-string case so the lexer pre-pass keeps its string-aware skip honest.
Picked up by tests/examples_engines.rs across every engine. Shows the four shapes the lexer pre-pass has to get right: paren-bound fmt call below a comment, stacked comments before a binding, mid-body comment between bindings, and -- inside a string literal.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Summary
An indented -- comment immediately above a paren-bound call silently corrupted parsing. The function body ended up empty and the diagnostic pointed many lines past the actual cause, typically inside a format-string {} placeholder. Manifesto-wise, comments are supposed to be free; this regression made them a 15-minute-per-occurrence tax.
Repro
Before:
After:
Root cause
normalize_newlines in src/lexer/mod.rs rewrites \n + indent to ; BEFORE the logos lexer's --[^\n]* comment-skip runs. On an indented comment line, the trailing \n is also rewritten to ;, so the logos regex (which stops at \n) greedily eats the comment text plus every following statement up to the next non-indented newline. The body vanishes, the verifier reports a type mismatch with no useful position, and the ILO-P009 expected expression, got Semi cascade points at a { inside the format string several lines later.
What's in the diff
- normalize_newlines itself. When it sees --, it advances past the comment content without emitting anything and leaves the trailing \n intact for the loop's existing newline handling. It also passes through "..." string literals verbatim so -- inside a string isn't mistaken for a comment, and suppresses duplicate ; emission when a comment-only line precedes an indented continuation.
- tests/regression_comment_parse_corrupt.rs. 15 tests across tree / VM / cranelift covering: paren-bound fmt with leading comment, mid-body comment between bindings, stacked comments, comment containing {} / ; / (), and dashes inside a string literal.
- examples/comment-above-call.ilo with -- run: / -- out: assertions so tests/examples_engines.rs exercises the same cases at the higher integration layer across every engine.
Test plan
- ilo_assessment_feedback.md L1790 returns k=1 instead of ILO-T008
- cargo test --release --features cranelift full suite passes (2866 lib tests, plus all integration tests, including the new 15-test regression_comment_parse_corrupt)
- tests/examples_engines.rs passes with the new example
- cargo fmt --check clean
- cargo clippy --release --features cranelift --all-targets -- -D warnings clean
- -- inside a string literal ("hello -- world") still round-trips correctly
Follow-ups
None. The fix is local to normalize_newlines and matches the logos --[^\n]* rule it's now running ahead of.