fix: split -N v into subtract at fresh-expression positions #244
Merged
Six+ personas in the assessment log hit the same papercut: `-0 v`
(intending `0 - v`) lexes as `Number(-0)` followed by a stray
`Ref(v)` because Logos's `-?[0-9]+...` regex greedily consumes the
leading `-`. The natural unary-negation-via-subtract-from-zero idiom
in numerical formulas (`ab x:n>n;-0 x`) silently produces wrong
results. Same trap for `-1 cv`, `r1=-1 t2`, `v=p.1;-0 v`.
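The mis-lex described above can be reproduced with a toy lexer. This is only a minimal sketch: `Tok` and `lex_greedy` are hypothetical names, and the hand-rolled loop is a stand-in for the greedy `-?[0-9]+` rule, not the project's Logos-based lexer. It handles integers only.

```rust
/// Hypothetical three-variant token type, just enough to show the trap.
#[derive(Debug, PartialEq)]
enum Tok {
    Number(f64),
    Minus,
    Ref(String),
}

/// Toy lexer: a `-` immediately followed by a digit is glued onto the
/// number, imitating a greedy `-?[0-9]+` rule.
fn lex_greedy(src: &str) -> Vec<Tok> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let glued_minus =
            c == '-' && i + 1 < chars.len() && chars[i + 1].is_ascii_digit();
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() || glued_minus {
            // Consume the optional sign, then every following digit.
            let start = i;
            i += 1;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            out.push(Tok::Number(text.parse().expect("optional '-' then digits")));
        } else if c == '-' {
            // A `-` not followed by a digit is a standalone operator.
            out.push(Tok::Minus);
            i += 1;
        } else if c.is_alphabetic() {
            let start = i;
            while i < chars.len() && chars[i].is_alphanumeric() {
                i += 1;
            }
            out.push(Tok::Ref(chars[start..i].iter().collect()));
        } else {
            i += 1; // ignore anything else in this sketch
        }
    }
    out
}
```

Under these assumptions, `lex_greedy("-0 v")` yields a negative-zero `Number` followed by `Ref("v")`, with no `Minus` token for the parser to turn into a subtraction, while the spaced form `- 0 v` yields `Minus`, `Number(0)`, `Ref("v")`.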
Post-lex pass splits `Number(-N)` back into `Minus, Number(N)` when
the preceding token introduces a fresh expression position: start of
input, `;`, `\n`, `=`, `{`, or `(`. The parser's existing
`parse_minus` then resolves to either `Subtract` (operand follows) or
`Negate` (no operand), matching the user's intent.
The split is deliberately gated rather than blanket-applied so that
call-arg negative literals - `at xs -1`, `+a -3`, `into -3 0 10`,
`<r -0.05`, `[1 -2 3]` - keep their `Number(-N)` token. `LBracket`
is also excluded: `[-2 1 3]` (comma-free list starting with a
negative) must stay a 3-element list, otherwise the parser greedy-subtracts
`-2 1` into a 2-element list. Pinned by 10 new lexer unit tests
covering all split contexts and all keep-literal cases.
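The gated split described above can be sketched roughly as follows. The `Token` enum, `opens_fresh_expr`, and `split_glued_negatives` are hypothetical names chosen for illustration; the real pass in `src/lexer/mod.rs` may be structured differently and the real token type has more variants.

```rust
/// Hypothetical, trimmed-down token type (assumption, not the real enum).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Minus,
    Ref(String),
    Semi,
    Newline,
    Eq,
    LBrace,
    LParen,
    LBracket,
    RBracket,
}

/// True when whatever follows `prev` sits at a fresh expression position.
/// `None` stands for start of input. `LBracket` is intentionally absent,
/// so list elements keep their negative literals.
fn opens_fresh_expr(prev: Option<&Token>) -> bool {
    matches!(
        prev,
        None | Some(
            Token::Semi | Token::Newline | Token::Eq | Token::LBrace | Token::LParen
        )
    )
}

/// Post-lex pass: rewrite `Number(-N)` into `Minus, Number(N)` only at
/// fresh expression positions; every other token passes through unchanged.
fn split_glued_negatives(tokens: Vec<Token>) -> Vec<Token> {
    let mut out = Vec::with_capacity(tokens.len() + 1);
    for tok in tokens {
        match tok {
            // `is_sign_negative` also catches `-0`, which `n < 0.0` would miss.
            Token::Number(n) if n.is_sign_negative() && opens_fresh_expr(out.last()) => {
                out.push(Token::Minus);
                out.push(Token::Number(-n));
            }
            other => out.push(other),
        }
    }
    out
}
```

In this sketch the predicate inspects the last token already emitted, which matches "the preceding token" because none of the fresh-position tokens are themselves rewritten. A call-arg literal such as the `-1` in `at xs -1` follows a `Ref`, so it is left alone.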
Five fixed-shape programs exercise every split context across tree,
VM, and Cranelift:
- `ab x:n>n;-0 x` (start of input)
- `absp p:L _>n;v=p.1;-0 v` (after `;`)
- `diff a:n b:n>n;r=-a b;r` (after `=`)
- `zarm c:n>n;<c 0{-0 c}{c}` (after `{`)
- `neg n:n>n;(-0 n)` (after `(`)
Plus three keep-literal pins that guard against over-eager splitting:
`[1 -2 3]`, `[-2 1 3]` first-element, and `len [-2 1 3]`. Each
assertion runs against all three engines so a future divergence in
the parser/VM/JIT lowering surfaces immediately.
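The "every assertion runs against every engine" pattern can be sketched as a small helper. The `Engine` signature and the `divergences` function are assumptions made for illustration only; the actual test file drives the real tree, VM, and Cranelift entry points, whose APIs are not shown in this PR.

```rust
/// Hypothetical engine signature: program source plus numeric arguments in,
/// a number (or an error message) out.
type Engine = fn(&str, &[f64]) -> Result<f64, String>;

/// Run one program on every engine and collect a message for each engine
/// whose result differs from `expected`. An empty vector means all agree.
fn divergences(
    engines: &[(&str, Engine)],
    src: &str,
    args: &[f64],
    expected: f64,
) -> Vec<String> {
    let mut failures = Vec::new();
    for (name, run) in engines {
        match run(src, args) {
            Ok(got) if got == expected => {}
            Ok(got) => failures.push(format!("{name}: expected {expected}, got {got}")),
            Err(e) => failures.push(format!("{name}: error: {e}")),
        }
    }
    failures
}
```

A test would then assert that `divergences(...)` is empty for each fixed-shape program, so a failure message names the engine that drifted rather than only reporting that something broke.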
Five small functions that used to silently produce wrong results because of the glued-negative-literal trap: `ab`, `absp`, `diff`, `zarm`, `neg`. The `-- run:` / `-- out:` annotations are exercised by the examples_engines harness so the example doubles as a higher-level cross-engine regression. A future agent encountering the unary-negate-by-subtract-from-zero idiom now has an in-context working pattern to learn from.
Summary
Manifesto framing: this is a token-cost fix. Six personas in the assessment log (L705, L1252, L1465, L1503, L1545, L1650, L1809) hit the same papercut three or more times each writing numerical formulas. The natural unary-negation idiom `-0 v` (meaning `0 - v`) silently produced wrong results because Logos's `-?[0-9]+...` regex greedily consumed the leading `-`, so `-0 v` lexed as `Number(-0)` + stray `Ref(v)`. The canonical workaround `- 0 v` (with space) works but is easy to forget when transcribing maths, and the failure mode is silent. Every retry costs tokens for everyone downstream.
Repro
Before:
After:
What's in the diff
Three commits:
- `lexer: split glued negative literal back into Minus + Number` - new post-lex pass in `src/lexer/mod.rs` that rewrites `Number(-N)` into `Minus, Number(N)` only when the preceding token is one that introduces a fresh expression position: start of input, `;`, `\n`, `=`, `{`, or `(`. The parser's existing `parse_minus` resolves the rest. Call-arg negatives (`at xs -1`, `+a -3`, `into -3 0 10`, `<r -0.05`, `[1 -2 3]`) keep their `Number(-N)` token. `LBracket` is deliberately excluded so `[-2 1 3]` stays a 3-element list. 10 new lexer unit tests pin every split context and every keep-literal case.
- `test: cross-engine regression coverage for neg-literal papercut` - new `tests/regression_neg_literal_papercut.rs` exercises five fixed-shape programs (one per split context) plus three keep-literal pins, each across tree, VM, and Cranelift.
- `examples: neg-literal-papercut.ilo for the now-correct behaviour` - five small functions (`ab`, `absp`, `diff`, `zarm`, `neg`) with `-- run:` / `-- out:` annotations. Doubles as a cross-engine regression via the `examples_engines` harness and as in-context learning material for future agents.
Test plan
- `cargo test --release --features cranelift` clean across tree/VM/Cranelift
- `cargo fmt --check` clean
- `cargo clippy --release --features cranelift --all-targets -- -D warnings` clean
- `regression_neg_literal_papercut.rs`
- `examples_engines` exercises the new `neg-literal-papercut.ilo` across all engines
- `regression_negative_literal_after_op` still green (call-arg literals preserved)
Follow-ups
None. The split rule is narrow by design - any expansion (e.g. also splitting after `Comma` for function calls that use comma separators, or after prefix-binop tokens) would need its own design pass and cross-engine coverage. If a persona hits a different fresh-expression context that I missed, adding it to the `matches!` arm in `src/lexer/mod.rs:415` is a one-line change with a corresponding unit test.