fix: split `-N v` into subtract at fresh-expression positions by danieljohnmorris · Pull Request #244 · ilo-lang/ilo

danieljohnmorris · 2026-05-13T17:03:34Z

Summary

Manifesto framing: this is a token-cost fix. Six personas in the assessment log (L705, L1252, L1465, L1503, L1545, L1650, L1809) hit the same papercut three or more times each while writing numerical formulas. The natural unary-negation idiom `-0 v` (meaning `0 - v`) silently produced wrong results because Logos's `-?[0-9]+...` regex greedily consumed the leading `-`, so `-0 v` lexed as `Number(-0)` followed by a stray `Ref(v)`. The canonical workaround `- 0 v` (with a space) works but is easy to forget when transcribing maths, and the failure mode is silent. Every retry costs tokens for everyone downstream.
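To make the failure mode concrete, here is a minimal sketch of a hand-rolled lexer that mimics a `-?[0-9]+` style rule. It is not the Logos-generated lexer from this repository: the `Token` enum and the `lex_greedy` function are hypothetical names chosen for illustration, and only numbers, `-`, and identifiers are handled.

```rust
/// Hypothetical token type, for illustration only.
#[derive(Debug, PartialEq)]
enum Token {
    Number(f64),
    Minus,
    Ref(String),
}

/// Mimics a `-?[0-9]+(\.[0-9]+)?` rule: a `-` immediately followed by a
/// digit is absorbed into the number literal instead of becoming `Minus`.
fn lex_greedy(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let glued = c == '-' && chars.get(i + 1).map_or(false, |d| d.is_ascii_digit());
        if c.is_ascii_digit() || glued {
            let start = i;
            i += 1; // first digit, or the glued '-'
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            // Optional fractional part: a '.' followed by at least one digit.
            if i + 1 < chars.len() && chars[i] == '.' && chars[i + 1].is_ascii_digit() {
                i += 1;
                while i < chars.len() && chars[i].is_ascii_digit() {
                    i += 1;
                }
            }
            let text: String = chars[start..i].iter().collect();
            let n: f64 = text.parse().expect("scanned text is a valid float");
            out.push(Token::Number(n));
        } else if c == '-' {
            out.push(Token::Minus);
            i += 1;
        } else if c.is_alphabetic() {
            let start = i;
            while i < chars.len() && chars[i].is_alphanumeric() {
                i += 1;
            }
            out.push(Token::Ref(chars[start..i].iter().collect()));
        } else {
            i += 1; // skip whitespace and anything this sketch does not model
        }
    }
    out
}
```

With this sketch, `lex_greedy("-0 v")` yields a number token followed by `Ref("v")`, while `lex_greedy("- 0 v")` yields `Minus`, a number, and `Ref("v")`. Only the spaced form gives the parser a `Minus` to work with, which is the papercut described above.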

Repro

Before:

$ ilo "ab x:n>n;-0 x" ab 7
{"code":"ILO-P001","message":"unexpected token Ref(\"x\")...","severity":"error"}

After:

$ ilo "ab x:n>n;-0 x" ab 7
-7
$ ilo "ab x:n>n;-0 x" ab -7
7

What's in the diff

Three commits:

lexer: split glued negative literal back into Minus + Number - new post-lex pass in `src/lexer/mod.rs` that rewrites `Number(-N)` into `Minus, Number(N)` only when the preceding token introduces a fresh expression position: start of input, `;`, `\n`, `=`, `{`, or `(`. The parser's existing `parse_minus` resolves the rest. Call-arg negatives (`at xs -1`, `+a -3`, `into -3 0 10`, `<r -0.05`, `[1 -2 3]`) keep their `Number(-N)` token. `LBracket` is deliberately excluded so `[-2 1 3]` stays a 3-element list. 10 new lexer unit tests pin every split context and every keep-literal case.
test: cross-engine regression coverage for neg-literal papercut - new `tests/regression_neg_literal_papercut.rs` exercises five fixed-shape programs (one per split context) plus three keep-literal pins, each across tree, VM, and Cranelift.
examples: neg-literal-papercut.ilo for the now-correct behaviour - five small functions (`ab`, `absp`, `diff`, `zarm`, `neg`) with `-- run:` / `-- out:` annotations. Doubles as a cross-engine regression via the `examples_engines` harness and as in-context learning material for future agents.
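The gating rule in the first commit can be sketched roughly as follows. This is an assumption-laden illustration, not the code in `src/lexer/mod.rs`: the `Token` variants, `opens_fresh_expr`, and `split_glued_negatives` are hypothetical names, and the real token enum will differ.

```rust
/// Hypothetical token type; the real enum in the lexer will differ.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Minus,
    Ref(String),
    Semi,
    Newline,
    Eq,
    LBrace,
    LParen,
    LBracket,
    RBracket,
}

/// True when the previous token (or start of input) opens a fresh
/// expression position. `LBracket` is intentionally absent.
fn opens_fresh_expr(prev: Option<&Token>) -> bool {
    matches!(
        prev,
        None | Some(Token::Semi | Token::Newline | Token::Eq | Token::LBrace | Token::LParen)
    )
}

/// Post-lex pass: rewrite `Number(-N)` into `Minus, Number(N)` only at
/// fresh expression positions; every other token passes through untouched.
fn split_glued_negatives(tokens: Vec<Token>) -> Vec<Token> {
    let mut out = Vec::with_capacity(tokens.len());
    for tok in tokens {
        match tok {
            // `is_sign_negative` also catches `-0`, which compares equal to `0`.
            Token::Number(n) if n.is_sign_negative() && opens_fresh_expr(out.last()) => {
                out.push(Token::Minus);
                out.push(Token::Number(-n));
            }
            other => out.push(other),
        }
    }
    out
}
```

Under these assumptions, a stream starting with `Number(-0)` gains a leading `Minus`, while `LBracket, Number(-2), Number(1), Number(3), RBracket` is returned unchanged, so the list keeps its three elements.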

Test plan

`cargo test --release --features cranelift` clean across tree/VM/Cranelift
`cargo fmt --check` clean
`cargo clippy --release --features cranelift --all-targets -- -D warnings` clean
10 new lexer unit tests pin all split + keep contexts
3 new cross-engine regression tests in `regression_neg_literal_papercut.rs`
`examples_engines` exercises the new `neg-literal-papercut.ilo` across all engines
Existing `regression_negative_literal_after_op` still green (call-arg literals preserved)

Follow-ups

None. The split rule is narrow by design - any expansion (e.g. also splitting after `Comma` for function calls that use comma separators, or after prefix-binop tokens) would need its own design pass and cross-engine coverage. If a persona hits a different fresh-expression context that I missed, adding it to the `matches!` arm in `src/lexer/mod.rs:415` is a one-line change with a corresponding unit test.

Six+ personas in the assessment log hit the same papercut: `-0 v` (intending `0 - v`) lexes as `Number(-0)` followed by a stray `Ref(v)` because Logos's `-?[0-9]+...` regex greedily consumes the leading `-`. The natural unary-negation-via-subtract-from-zero idiom in numerical formulas (`ab x:n>n;-0 x`) silently produces wrong results. Same trap for `-1 cv`, `r1=-1 t2`, `v=p.1;-0 v`.

Post-lex pass splits `Number(-N)` back into `Minus, Number(N)` when the preceding token introduces a fresh expression position: start of input, `;`, `\n`, `=`, `{`, or `(`. The parser's existing `parse_minus` then resolves to either `Subtract` (operand follows) or `Negate` (no operand), matching the user's intent.

The split is deliberately gated rather than blanket-applied so that call-arg negative literals - `at xs -1`, `+a -3`, `into -3 0 10`, `<r -0.05`, `[1 -2 3]` - keep their `Number(-N)` token. `LBracket` is also excluded: `[-2 1 3]` (comma-free list starting with a negative) must stay four tokens, otherwise the parser greedy-subtracts `-2 1` into a 2-element list. Pinned by 10 new lexer unit tests covering all split contexts and all keep-literal cases.
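The `Subtract` versus `Negate` resolution might look something like the sketch below. It assumes prefix notation, and it reads "no operand" as "no second operand follows the first". The `Parser` struct, the `Expr` enum, and `parse_operand` are hypothetical, and the body of `parse_minus` here is a guess at the shape, not the repository's implementation.

```rust
/// Hypothetical token and AST types, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Minus,
    Number(f64),
    Ref(String),
    Semi,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Number(f64),
    Ref(String),
    Negate(Box<Expr>),
    Subtract(Box<Expr>, Box<Expr>),
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    /// Parses a single atom; returns `None` without advancing if none starts here.
    fn parse_operand(&mut self) -> Option<Expr> {
        let expr = match self.peek()? {
            Token::Number(n) => Expr::Number(*n),
            Token::Ref(name) => Expr::Ref(name.clone()),
            _ => return None,
        };
        self.pos += 1;
        Some(expr)
    }

    /// Assumes the cursor sits on `Minus`. Two operands give `Subtract`,
    /// a single operand gives `Negate`.
    fn parse_minus(&mut self) -> Option<Expr> {
        self.pos += 1; // consume `Minus`
        let first = self.parse_operand()?;
        match self.parse_operand() {
            Some(second) => Some(Expr::Subtract(Box::new(first), Box::new(second))),
            None => Some(Expr::Negate(Box::new(first))),
        }
    }
}
```

Under these assumptions, the token stream `Minus, Number(0), Ref(x)` produced by the split pass parses as `Subtract(0, x)`, while `Minus, Ref(x), Semi` parses as `Negate(x)`.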

Five fixed-shape programs exercise every split context across tree, VM, and Cranelift:
- `ab x:n>n;-0 x` (start of input)
- `absp p:L _>n;v=p.1;-0 v` (after `;`)
- `diff a:n b:n>n;r=-a b;r` (after `=`)
- `zarm c:n>n;<c 0{-0 c}{c}` (after `{`)
- `neg n:n>n;(-0 n)` (after `(`)
Plus three keep-literal pins that guard against over-eager splitting: `[1 -2 3]`, `[-2 1 3]` first-element, and `len [-2 1 3]`. Each assertion runs against all three engines so a future divergence in the parser/VM/JIT lowering surfaces immediately.

Five small functions that used to silently produce wrong results because of the glued-negative-literal trap: `ab`, `absp`, `diff`, `zarm`, `neg`. The `-- run:` / `-- out:` annotations are exercised by the `examples_engines` harness so the example doubles as a higher-level cross-engine regression. A future agent encountering the unary-negate-by-subtract-from-zero idiom now has an in-context working pattern to learn from.

codecov · 2026-05-13T17:06:50Z

Codecov Report

❌ Patch coverage is 98.31933% with 2 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/lexer/mod.rs	98.31%	2 Missing ⚠️

danieljohnmorris added 3 commits May 13, 2026 17:54

danieljohnmorris changed the title ~~fix: parses as subtract at fresh-expression positions~~ fix: split -N v into subtract at fresh-expression positions May 13, 2026

danieljohnmorris merged commit eee0b25 into main May 13, 2026
5 checks passed

danieljohnmorris deleted the fix/neg-literal-papercut branch May 13, 2026 17:07