error_flag_lint.rs — new abstract stack simulation pass that tracks… by navicore · Pull Request #388 · navicore/patch-seq

navicore · 2026-03-30T02:02:45Z

… Bool values from fallible operations through:

Stack ops: swap, rot, over, dup, nip, tuck, 2dup
Aux stack: >aux / aux>
Combinators: dip, keep, bi (conservatively clear tracked flags)
Control flow: if (marks as checked), match, branch joining

Integration:

seqc lint — runs after resource analysis (Phase 2b)
LSP — runs after pattern linting, shows inline diagnostics in editor

Deduplication: Skips drops within 2 lines of the fallible operation (handled by pattern linter with better precision). Only fires for genuinely
non-adjacent drops — the cases patterns can't catch.

Test coverage: 7 unit tests covering adjacent drop (suppressed), non-adjacent nip, if-checking, aux round-trip, division, and escape analysis.

False positives on examples/: zero.

…Bool values from fallible operations through: - Stack ops: swap, rot, over, dup, nip, tuck, 2dup - Aux stack: >aux / aux> - Combinators: dip, keep, bi (conservatively clear tracked flags) - Control flow: if (marks as checked), match, branch joining Integration: - seqc lint — runs after resource analysis (Phase 2b) - LSP — runs after pattern linting, shows inline diagnostics in editor Deduplication: Skips drops within 2 lines of the fallible operation (handled by pattern linter with better precision). Only fires for genuinely non-adjacent drops — the cases patterns can't catch. Test coverage: 7 unit tests covering adjacent drop (suppressed), non-adjacent nip, if-checking, aux round-trip, division, and escape analysis. False positives on examples/: zero.

claude · 2026-03-30T02:05:47Z

PR Review: `error_flag_lint.rs` — Abstract Stack Simulation for Unchecked Error Flags

This is a well-conceived static analysis pass. The architecture is sound, the conservative design choices are sensible, and the deduplication strategy (deferring to the pattern linter for adjacent drops) avoids redundant noise. Below is a detailed review across multiple dimensions.

Code Quality and Design

Strengths:

The modular structure mirrors resource_lint.rs, making it easy to understand for anyone familiar with the existing lint passes.
FlagStack::join is a clean lattice-join for branch merging: preserving flags from either branch is the correct conservative choice.
The escape analysis behavior (flags on stack at word-end = returned, not warned) is the right default to prevent false positives at word boundaries.
The #[allow(dead_code)] on ErrorFlag::id is honest — the field is there for future deduplication but unused now. Consider either wiring it up or removing it until needed to keep the struct lean.

Potential issue — stale comment in module doc:

//! - `string->int nip` (nip drops the value below, leaving unchecked Bool... wait, nip drops below top)

The inline "wait" correction was clearly a thinking-out-loud note during development. It should be cleaned up before merge — either remove the example or rewrite it accurately (nip drops the second-from-top, i.e., the Int, leaving the Bool on top, which is not an unchecked drop). The confusion here is visible in the tests themselves (test_nip_drops_flag correctly asserts no warning, but the doc says the opposite).

Potential Bugs and Logic Issues

1. rot implementation is incorrect

The stack effect of rot is ( a b c -- b c a ): the third item from top moves to the top.

"rot" => {
    let c = state.pop(); // top
    let b = state.pop(); // second
    let a = state.pop(); // third (deepest of three)
    if let Some(v) = b { state.stack.push(v); }
    if let Some(v) = c { state.stack.push(v); }
    if let Some(v) = a { state.stack.push(v); }
}

This pushes b, c, a — giving ( a b c -- b c a ) with a on top. But the conventional Forth rot puts the third item (a) on top, which matches. Actually, checking more carefully: in a Vec stack where the last element is the top, after popping c (top), b (second), a (third/bottom), the result b, c, a would mean bottom=b, middle=c, top=a. That gives ( a b c -- b c a ) which is correct for standard Forth rot. This appears fine. ✓

2. consume_inputs and fallible_op_info are not kept in sync

fallible_op_info is the source of truth for what operations are fallible, but consume_inputs is a separate match that must stay synchronized with it. Notably:

chan.make appears in consume_inputs (0 pops) but is not in fallible_op_info — it will never reach the input-consuming path since consume_inputs is only called after fallible_op_info matches.
regex.valid?, crypto.sha256, regex.match?, crypto.hmac-sha256, crypto.constant-time-eq, crypto.ed25519-verify appear in consume_inputs but not in fallible_op_info. They are dead code in this path.

The divergence creates maintenance risk. A single source-of-truth table mapping name -> (inputs, values_before_bool, description) would eliminate this.

3. os.home-dir and os.current-dir input count mismatch

In consume_inputs, os.home-dir and os.current-dir are listed as consuming 0 inputs. In fallible_op_info they return (1, ...) (one value before the Bool). These operations take no inputs and produce one value + Bool — 0 pops is correct. But worth double-checking against the actual interpreter builtins to confirm they really take no inputs.

4. if branch join when else_branch is None

let mut else_state = state.clone();
// ...
if let Some(else_stmts) = else_branch {
    self.analyze_statements(else_stmts, &mut else_state, word);
}
*state = then_state.join(&else_state);

When there is no else branch, else_state is the pre-if state (with the Bool already popped). Joining then_state with the original-sans-bool state is correct for a no-else if. ✓ But it means flags present before if (below the Bool on the stack) could be duplicated in the join. This is a conservative over-approximation rather than a bug, but worth noting.

5. match arm bindings — Variant without bindings pushes nothing

crate::ast::Pattern::Variant(_) => {
    // All fields pushed as positional values
}

The comment says "all fields pushed as positional values" but the code does nothing. If a Variant pattern does bind positional fields, the stack model will be wrong (fields won't be tracked), potentially causing phantom stack underflows downstream. This should either push Other values for each field or be explicitly documented as a known gap.

Performance Considerations

FlagStack is Cloned at every branch (state.clone() for if, per-arm for match). For typical word bodies this is negligible, but deeply nested match expressions or long words with many branches could cause quadratic clone work. Not a concern in practice but worth keeping in mind as the language grows.
The retain calls in combinator handling (dip, keep, bi, call) do a linear scan to remove flags. Given small stack depths (Forth style), this is fine.

Test Coverage

The 7 tests are well-targeted at specific code paths:

Test	Path Covered
`test_adjacent_drop_not_flagged`	Suppression within 2-line window
`test_non_adjacent_drop_flagged`	Core warning path via `nip`
`test_checked_by_if`	`Statement::If` Bool consumption
`test_aux_round_trip_drop`	`>aux`/`aux>` tracking
`test_division_checked`	`i./` with if-check
`test_nip_drops_flag`	`nip` keeps Bool on top (no warn)
`test_swap_nip_drops_flag`	`swap` + `nip` drops Bool

Missing test cases worth adding:

match arm handling (the Variant vs VariantWithBindings path)
dip/keep/bi combinator behavior (flags cleared conservatively)
call with an anonymous quotation
seq:allow(unchecked-error-flag) suppression attribute
Multiple fallible operations in sequence (does the second flag suppress the first?)
2drop / 3drop paths

Security Considerations

No direct security concerns. The analyzer operates purely on AST data; there is no I/O, no unsafe code, and no external input parsing.

The fallible_op_info table covers crypto operations (crypto.aes-gcm-decrypt, etc.), which is good — silently ignored decryption failures can be security-critical. Ensuring those are flagged is a meaningful security-adjacent benefit.

Minor / Nits

The analyze_word method calls self.diagnostics.clear() and then std::mem::take(&mut self.diagnostics). Since ErrorFlagAnalyzer holds diagnostics as a field but the public API is analyze_program returning a Vec, consider making diagnostics a local variable in analyze_word instead of member state — it would remove the need for clear() and take() and make the borrowing cleaner.
emit_warning calls fallible_op_info(&flag.operation) again to get the description string. Since ErrorFlag already stores operation, and fallible_op_info does a full match scan on every call, it would be slightly cleaner to store the description in ErrorFlag at creation time (avoiding the re-lookup).
The LintDiagnostic fields start_index: 0 and end_index: 0 are set to zero. If downstream consumers use these for source-range highlighting, they'll get incorrect positions. The line/end_line fields are populated, but the index fields may need attention depending on how the LSP and CLI renderers use them.

Summary

This is a solid, well-structured addition to the lint pipeline. The main actionable items before merge are:

Fix the stale/confusing doc comment in the module-level docstring about nip.
Reconcile fallible_op_info and consume_inputs into a single table to eliminate the sync drift and dead entries.
Clarify or implement the Variant pattern binding push in the match arm handler.
Consider removing ErrorFlag::id if it has no current use.

The deduplication boundary (within-2-lines → pattern linter, beyond → this pass) is a smart design choice. The zero false positives on the examples/ directory is a good validation signal.

Reviewed by Claude (claude-sonnet-4-6)

┌─────────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ Issue │ Fix │ ├─────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Stale nip doc comment │ Rewrote to accurate examples (swap nip, aux round-trip) │ ├─────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ fallible_op_info / consume_inputs sync │ Merged into single FallibleOpInfo struct with inputs, values_before_bool, description. Eliminated │ │ drift │ consume_inputs entirely │ ├─────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Dead entries in old consume_inputs │ Gone — single table now, no dead code │ ├─────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ ErrorFlag::id unused │ Removed field entirely │ ├─────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ emit_warning re-lookup of description │ Description now stored in ErrorFlag at creation time │ ├─────────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Variant pattern binding gap │ Documented as known limitation (matches resource_lint behavior) │ └─────────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────┘ 970 tests pass, zero warnings, zero false positives on examples.

claude · 2026-03-30T04:40:55Z

PR Review: `error_flag_lint.rs` — Error Flag Detection Pass

Nice addition. The conservative design philosophy is well-thought-out, the module-level docs are clear, and the zero-false-positive result on examples/ is a good bar to hold. A few issues worth addressing, roughly in priority order.

Bugs / Correctness

1. test_nip_drops_flag tests the opposite of what its name says

The test at line 719 is named test_nip_drops_flag but the comment inside says "nip keeps Bool on top — no warning" and assert!(diags.is_empty(), ...). The test is correct, but the name is the inverse of the behavior being tested. Rename to test_nip_keeps_flag_no_warning or similar to avoid future confusion.

2. Redundant let popped / let _ = popped in Statement::If handler (line 303–306)

let popped = state.pop();
let _ = popped;

This is just state.pop();. The intermediate binding adds noise and the comment already explains why no action is taken.

3. Spans absent (line = 0) can silently suppress warnings

In emit_warning:

if drop_line <= flag.created_line + 2 {
    return;
}

If either span is missing, line defaults to 0. Two operations both missing spans will have created_line = 0 and drop_line = 0, satisfying 0 <= 0 + 2 and silently suppressing the warning. This is probably fine in practice (nodes without spans are unusual), but worth a comment or an explicit 0 == 0 guard.

Design Observations

4. cond clears the entire stack (line 370)

state.stack.clear();

cond consumes the top Bool as a condition. Clearing the entire stack throws away tracking for anything below it, which could mask flags on the rest of the stack. A more precise model would be state.pop() (just consume the condition Bool) and leave the rest. If cond truly has unbounded effects on the rest of the stack this deserves an explicit comment explaining why.

5. diagnostics as a struct field rather than a local

self.diagnostics is used as a scratch buffer that is cleared at the start of each analyze_word call. Because emit_warning needs &mut self, this pattern propagates &mut self through the whole call chain. A diagnostics: &mut Vec<LintDiagnostic> parameter passed down from analyze_word would make the data flow explicit and avoid surprising reset semantics — but this mirrors resource_lint.rs, so keeping it consistent is reasonable if you don't want to diverge from the existing pattern.

Missing Test Coverage

The 7 tests cover the main happy/unhappy paths well. A few gaps worth filling:

seq:allow(unchecked-error-flag) suppression — no test verifies the allowed_lints early-exit actually works.
Multiple flags simultaneously — e.g., two fallible calls whose flags are both later dropped non-adjacently.
dip/call silent-clear behavior — currently there's no test that a flag on the pre-dip stack is silently cleared (i.e., no false-positive). This is the most surprising behavior and worth documenting with a test.
2drop/3drop with a mix of flags and non-flags — the loop-based implementation should work, but a test would confirm multi-drop ordering is correct.

Nit

The comment on line 721 says "nip keeps Bool on top" while the test function is named test_nip_drops_flag. These tell contradictory stories to anyone skimming the test file.

Overall the implementation is clean and the deduplication with the pattern linter (adjacency suppression) is a smart boundary. The conservative choices (unknown words don't warn, dip/call clear flags silently) are the right defaults. The issues above are mostly small — the cond stack-clear and the missing allowed_lints test are the ones I'd prioritize.

┌────────────────────────────────────────┬───────────────────────────────────────────────────────────────────┐ │ Issue │ Fix │ ├────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ test_nip_drops_flag contradictory name │ Renamed to test_nip_preserves_flag_on_top, cleaned comment │ ├────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ Redundant let popped / let _ = popped │ Simplified to state.pop() │ ├────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ cond clears entire stack │ Now pops one value (count arg) instead of clearing; added comment │ ├────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ Span-missing suppression │ Added comment documenting the edge case │ ├────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ Missing allowed_lints test │ Added test_allow_suppresses_warning │ ├────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ Missing multiple-flags test │ Added test_multiple_flags_both_dropped (two flags, both warned) │ ├────────────────────────────────────────┼───────────────────────────────────────────────────────────────────┤ │ Missing dip false-positive test │ Added test_dip_clears_flags_no_false_positive │ └────────────────────────────────────────┴───────────────────────────────────────────────────────────────────┘ 10 tests now (was 7), 973 total.

claude · 2026-03-31T01:15:44Z

Code Review: `error_flag_lint.rs` — Abstract stack simulation for unchecked error flags

Overall this is a well-designed, carefully conservative lint pass. The architecture mirrors resource_lint.rs, the doc comments are excellent, and the deduplication strategy with the pattern linter is clearly reasoned. A few issues worth addressing:

Bug: `rot` semantics appear inverted

In analyze_word_call (around line 431–443), rot is implemented as:

"rot" => {
    let c = state.pop();  // top
    let b = state.pop();  // second
    let a = state.pop();  // third (bottom of the trio)
    if let Some(v) = b { state.stack.push(v); }
    if let Some(v) = c { state.stack.push(v); }
    if let Some(v) = a { state.stack.push(v); }
}

This yields ( a b c -- b c a ), putting a (the deepest element) on top. Whether this is correct depends on Seq's rot definition. In most Forth-family languages rot is ( a b c -- b c a ), so this may actually be correct — but it's worth a comment or a test confirming the expected behaviour, since the variable naming (a = bottom, c = top) makes it easy to mis-read.

Concern: `join` with unequal-depth stacks

FlagStack::join (lines 94–129) takes the max of the two stack depths and fills shorter branches with StackVal::Other. This is safe for well-typed Seq programs (branches must have equal stack effects), but in programs with type errors the simulation could propagate a flag from a longer branch into subsequent analysis as if it belonged to a shorter branch. Since the linter runs before the type checker (Phase 2b vs Phase 3), this ordering means malformed programs could produce confusing double-diagnostics. Not necessarily a blocker, but worth a comment noting the assumption.

Inconsistent phase numbering

main.rs: // Phase 2b: Error flag tracking
diagnostics.rs: // Phase 4b: Error flag tracking

One of these is wrong. The LSP diagnostic phases don't appear to be the same sequence as the CLI lint phases — if they're independent sequences that's fine, but the comment should reflect that (e.g., "Phase 3 in LSP pipeline").

`cond` handling is very conservative (possibly too much so)

if is_checking_consumer(name) {
    state.pop(); // at minimum, the count argument
    return;
}

If cond receives quotation pairs that were pushed before the call, flags inside those quotations remain on the abstract stack after cond returns, even though cond structurally consumes them. In practice this probably doesn't cause false positives (the flags accumulate and are eventually escaped or cleared), but it could cause missed warnings if a flag is buried under cond's arguments. A comment noting this limitation would help future readers.

Test coverage gaps

The test suite has good breadth for the headline cases, but a few implemented stack ops have no direct tests:

rot — its correctness claim above makes a dedicated test important
2drop / 3drop — these pop multiple times; worth a test that verifies all flags in a 2-drop are reported
tuck / over / 2dup — lower risk since they don't drop values, but positive tests confirm tracking survives duplication

A test_rot_drops_flag and test_2drop_reports_both_flags would close the obvious gaps.

Minor: `analyze_word` mutates `self.diagnostics` as a scratch buffer

fn analyze_word(&mut self, word: &WordDef) -> Vec<LintDiagnostic> {
    self.diagnostics.clear();
    ...
    std::mem::take(&mut self.diagnostics)
}

This pattern (shared mutable scratch buffer on self) is safe for single-threaded, non-recursive use and mirrors resource_lint.rs, so it's consistent. Worth a brief comment to prevent a future refactor from accidentally making analyze_word reentrant.

Nit: adjacency threshold is implicit

if drop_line <= flag.created_line + 2 {
    return;
}

The + 2 is explained in the comment above it, but extracting it as a named constant (ADJACENT_LINE_THRESHOLD) would make it easier to tune and self-document the intent at the call site.

Summary: The core logic is sound and the conservative-by-default choices are well-justified. The rot semantics should be verified with a test, the phase numbering inconsistency should be fixed, and the test gaps are worth closing before merge. Everything else is low-priority polish.

navicore merged commit a940d24 into main Mar 31, 2026
2 checks passed

navicore deleted the phase-2-lint branch March 31, 2026 01:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

error_flag_lint.rs — new abstract stack simulation pass that tracks…#388

error_flag_lint.rs — new abstract stack simulation pass that tracks…#388
navicore merged 3 commits intomainfrom
phase-2-lint

navicore commented Mar 30, 2026

Uh oh!

claude Bot commented Mar 30, 2026

Uh oh!

claude Bot commented Mar 30, 2026

Uh oh!

claude Bot commented Mar 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

navicore commented Mar 30, 2026

Uh oh!

claude Bot commented Mar 30, 2026

PR Review: error_flag_lint.rs — Abstract Stack Simulation for Unchecked Error Flags

Code Quality and Design

Potential Bugs and Logic Issues

Performance Considerations

Test Coverage

Security Considerations

Minor / Nits

Summary

Uh oh!

claude Bot commented Mar 30, 2026

PR Review: error_flag_lint.rs — Error Flag Detection Pass

Bugs / Correctness

Design Observations

Missing Test Coverage

Nit

Uh oh!

claude Bot commented Mar 31, 2026

Code Review: error_flag_lint.rs — Abstract stack simulation for unchecked error flags

Bug: rot semantics appear inverted

Concern: join with unequal-depth stacks

Inconsistent phase numbering

cond handling is very conservative (possibly too much so)

Test coverage gaps

Minor: analyze_word mutates self.diagnostics as a scratch buffer

Nit: adjacency threshold is implicit

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

PR Review: `error_flag_lint.rs` — Abstract Stack Simulation for Unchecked Error Flags

PR Review: `error_flag_lint.rs` — Error Flag Detection Pass

Code Review: `error_flag_lint.rs` — Abstract stack simulation for unchecked error flags

Bug: `rot` semantics appear inverted

Concern: `join` with unequal-depth stacks

`cond` handling is very conservative (possibly too much so)

Minor: `analyze_word` mutates `self.diagnostics` as a scratch buffer