fix: scan source tokens for alias hints, not raw text #253
Merged
`collect_hints` previously split the source text on non-word characters after stripping string literals, but comments (`-- ...`) were not stripped. Any alias source word appearing in a comment, in a path-bearing identifier, or in any other non-token context would fire `hint: <word> -> <short>`.

Surfaced by the streaming-tail persona on v0.11.1: every Approach B invocation got `hint: tail -> tl` because the working directory was `/tmp/ilo-persona-streaming-tail-rerun`, and that path made it into the batch script header as a comment.

Re-lex the source and scan `Token::Ident` only. Correct by construction: comments, strings, numbers, operators, and CLI argv are all outside the token stream the user wrote. As a side effect, the text-based `==` scan now also ignores comment contents via a small helper, so the equality hint is consistent with the alias hint.
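To make the difference between the two scans concrete, here is a minimal, self-contained sketch. It is not the project's code: the toy `ident_tokens` lexer stands in for the real logos-based lexer, the function names `naive_text_hints` and `collect_alias_hints` are hypothetical, double-quoted string syntax is an assumption, and the alias table holds only `tail -> tl` because that is the only mapping stated here.

```rust
/// Hypothetical alias table. Only `tail -> tl` is confirmed by the PR text.
const ALIASES: &[(&str, &str)] = &[("tail", "tl")];

/// Records a hint once per alias word, in the `hint: <word> -> <short>` form.
fn push_hint(word: &str, hints: &mut Vec<String>) {
    if let Some((long, short)) = ALIASES.iter().find(|(w, _)| *w == word) {
        let hint = format!("hint: {long} -> {short}");
        if !hints.contains(&hint) {
            hints.push(hint);
        }
    }
}

/// Old behaviour, simplified: split raw text on non-word characters.
/// (String stripping is omitted; comments are scanned, which is the bug.)
fn naive_text_hints(src: &str) -> Vec<String> {
    let mut hints = Vec::new();
    for word in src.split(|c: char| !(c.is_ascii_alphanumeric() || c == '_')) {
        push_hint(word, &mut hints);
    }
    hints
}

/// Toy lexer: yields identifier tokens only, skipping `--` line comments,
/// double-quoted strings (assumed syntax), and number literals.
fn ident_tokens(src: &str) -> Vec<String> {
    let chars: Vec<char> = src.chars().collect();
    let is_word = |c: char| c.is_ascii_alphanumeric() || c == '_';
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == '-' && chars.get(i + 1) == Some(&'-') {
            // Line comment: skip to end of line.
            while i < chars.len() && chars[i] != '\n' {
                i += 1;
            }
        } else if c == '"' {
            // String literal: skip to the closing quote, honouring `\` escapes.
            i += 1;
            while i < chars.len() && chars[i] != '"' {
                if chars[i] == '\\' {
                    i += 1;
                }
                i += 1;
            }
            i += 1; // consume the closing quote
        } else if c.is_ascii_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && is_word(chars[i]) {
                i += 1;
            }
            out.push(chars[start..i].iter().collect());
        } else if c.is_ascii_digit() {
            // Number literal: skip it whole so it never yields an identifier.
            while i < chars.len() && is_word(chars[i]) {
                i += 1;
            }
        } else {
            i += 1;
        }
    }
    out
}

/// New behaviour, sketched: only identifier tokens can fire a hint.
fn collect_alias_hints(src: &str) -> Vec<String> {
    let mut hints = Vec::new();
    for tok in ident_tokens(src) {
        push_hint(&tok, &mut hints);
    }
    hints
}

fn main() {
    // Header comment carrying the working directory, as in the report.
    let script = "-- cwd: /tmp/ilo-persona-streaming-tail-rerun\ntl xs\n";
    assert_eq!(naive_text_hints(script), ["hint: tail -> tl"]); // false positive
    assert!(collect_alias_hints(script).is_empty()); // token scan stays quiet
    assert_eq!(collect_alias_hints("tail xs"), ["hint: tail -> tl"]); // real use fires
    println!("ok");
}
```

Because the hint lookup only ever sees identifier tokens, adding another entry to the table needs no change to the scanning logic.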
Five new tests pin the lexer-token contract: alias words in comments, in string literals, and the `==` hint in comments must all return no hints; real alias use (`tail xs` as a source identifier) must still fire. The tests also iterate over a representative set of alias source words to catch regressions in the others (filter, flatmap, length, head, append).
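The `==`-in-comments case can be illustrated with a rough sketch of a stripping helper. The name `strip_string_and_comment_contents` comes from this PR, but the body below is a guess at the idea rather than the merged implementation; `has_double_equals` is a hypothetical stand-in for the equality-hint check, and double-quoted strings with backslash escapes are an assumption.

```rust
/// Sketch: drops the contents of `--` line comments and of double-quoted
/// strings (assumed syntax), keeping newlines and the quote characters.
fn strip_string_and_comment_contents(src: &str) -> String {
    let mut out = String::with_capacity(src.len());
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '-' if chars.peek() == Some(&'-') => {
                // Comment: consume up to, but not including, the newline.
                while let Some(&n) = chars.peek() {
                    if n == '\n' {
                        break;
                    }
                    chars.next();
                }
            }
            '"' => {
                out.push('"');
                while let Some(n) = chars.next() {
                    match n {
                        // Escape: skip the escaped character as well.
                        '\\' => {
                            chars.next();
                        }
                        '"' => {
                            out.push('"');
                            break;
                        }
                        _ => {}
                    }
                }
            }
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical text scan for `==` that ignores comments and strings.
fn has_double_equals(src: &str) -> bool {
    strip_string_and_comment_contents(src).contains("==")
}

fn main() {
    assert!(has_double_equals("x == y"));
    assert!(!has_double_equals("-- compare with a == b\nx = 1"));
    assert!(!has_double_equals("s = \"a == b\""));
    println!("ok");
}
```

A text scan like this is only needed where the lexer output cannot distinguish the operator; for identifiers, walking the token stream is the simpler route.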
Shows that `tail` in a comment and inside a string literal does not fire the alias hint, while the canonical `tl` form is used in source. Picked up by tests/examples_engines.rs as a multi-engine assertion.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Summary
`collect_hints` was splitting raw source text after stripping string literals, but comments (`-- ...`) were left in. Any alias source word in a comment, identifier-ish artefact, or path-bearing line could fire `hint: <word> -> <short>` even though the user never wrote that token.

Streaming-tail persona on v0.11.1 caught this every Approach B run: `hint: tail -> tl` fired because the working directory was `/tmp/ilo-persona-streaming-tail-rerun`, and the path leaked into the batch script's header comment. Cosmetic but noisy in pipelines, and exactly the kind of false positive that erodes trust in agent-facing hints.

Fix: re-lex inside `collect_hints` and scan `Token::Ident` only. Correct by construction: comments, strings, numbers, operators, and CLI argv all sit outside the token stream the user wrote. The `==` text scan now also ignores comment contents via a small helper, so both hints behave consistently.

Repro
Before: hint: `tail` → `tl` (canonical short form) printed to stderr on every run.
After: no hint. Real alias use (`tail xs` as a source identifier) still fires.

What's in the diff
- `collect_hints` re-lexes the source and walks `Token::Ident`. New helper `strip_string_and_comment_contents` keeps the `==` scan consistent (logos collapses `=` and `==` to one token, so that scan can't use lexer output).
- New tests: alias words in comments and in string literals, the `==` scan in comments, multi-word coverage across `tail` / `filter` / `flatmap` / `length` / `head` / `append`, and a positive test that real alias use still fires.
- `examples/tail-alias-comment.ilo` shows an idiomatic no-hint program with `tail` in both a comment and a string literal. Multi-engine harness runs it on every backend.

Test plan
- `cargo test --release --features cranelift`: full suite green (2888 lib + all integration)
- `cargo fmt --check` clean
- `cargo clippy --release --features cranelift -- -D warnings` clean
- `cargo test --test examples_engines` passes (including the new example)
- `tail` in comment / string / path no longer fires; `tail xs` as a real call still fires.

Follow-ups
None planned. The lexer-token approach generalises to any future builtin alias added to `ast::resolve_alias`.