Skip to content

compiler: audit for PR-#31-shaped silent drops and close the feedback loop that hid them#32

Merged
imjasonh merged 5 commits intomainfrom
claude/fix-state-local-variables-uvv1X
Apr 18, 2026
Merged

compiler: audit for PR-#31-shaped silent drops and close the feedback loop that hid them#32
imjasonh merged 5 commits intomainfrom
claude/fix-state-local-variables-uvv1X

Conversation

@imjasonh
Copy link
Copy Markdown
Owner

@imjasonh imjasonh commented Apr 18, 2026

Closes the audit prompted by PR #31, where state Foo { var x } allocated a ZP slot but every LoadVar/StoreVar on it silently emitted zero bytes for months before anyone noticed. The bug survived because (a) the codegen's if let Some(&addr) = self.var_addrs.get(var) { } guards silently dropped unmapped VarIds, (b) pixel/audio goldens captured the broken behaviour as ground truth, and (c) no test wrote→waited→read through the feature.

This PR attacks all three.

What's in it

Five commits, one per phase of the plan.

1. Address-map misses panic loudly (f7012c6)

New IrCodeGen::var_addr(VarId) -> u16 helper that panics with an explicit "compiler bug" message on miss. Replaces seven silent if let Some(..) sites in gen_op plus the global-initializer loop and parameter-shuffle prologue. Three state_indices.get().copied().unwrap_or(0) misdispatch sites (MMC3 IRQ, MMC3 reload, IrOp::Transition) similarly converted to .unwrap_or_else(|| panic!(...)).

Running the hardened lookup against cargo test immediately found a latent bug with the same shape: uninitialized struct globals (var pos: Vec2 with no literal initializer) never had their flattened field VarIds registered in var_addrs. The analyzer's flatten_struct_fields allocated addresses for pos.x/pos.y; the lowerer's get_or_create_var("pos.x") minted VarIds for them; but IrCodeGen::new only walked ir.globals, which doesn't contain synthesized field entries for uninitialized structs. Every pos.x = N silently compiled to nothing.

Fixed by exposing the lowerer's name→VarId map on IrProgram and joining with the analyzer's allocations in IrCodeGen::new. uninitialized_struct_field_store_emits_sta_to_allocated_address scans the PRG for LDA #$7B / STA <addr> as a regression guard.

2. Four "silently skip" paths turned into real behaviour or hard errors (48bae97)

  • on exit now actually runs. IrOp::Transition used to comment "on_exit of the current state isn't called here because we don't know from an IR op alone which state we're leaving" and skipped the JSR. Pong, war, and state_machine all relied on stop_music and mode-flag translation inside on exit that had been silently never running — the goldens captured the broken state. Emit a runtime CMP-chain against ZP_CURRENT_STATE before every transition to JSR the leaving state's exit handler. Video goldens stay byte-identical (no transitions fire in the 180-frame no-input window), pong/war audio hashes refreshed, gifs byte-identical.
  • State-local array initializers become E0601. The lowerer continue'd past them with a comment saying the analyzer should refuse, but the analyzer never did. Added the diagnostic.
  • on scanline without MMC3 moved from E0203 to dedicated E0603.
  • slow var now actually leaves zero page. Placement::Slow was parsed but ignored by allocate_ram. Routed var.placement through allocate_ram_with_placement so slow var cold: u8 lands at $0300+ per the docs' promise. fast remains advisory (W0107 already warns when wasted), since the default allocator already prefers ZP for u8s.

Three regression tests: analyze_state_local_array_initializer_rejected, analyze_on_exit_declaration_accepted, analyze_slow_var_forced_out_of_zero_page, plus the byte-level transition_dispatches_leaving_states_on_exit_handler that compares JSR-target counts between the exit-bearing and baseline builds.

3. Call arity + CLAUDE.md policy (00f1305)

IrOp::Call used to .take(4) the arg list with a comment claiming E0506 made extras unreachable. Replaced with an explicit assert!(args.len() <= 4). Iterate over all args (no .take(4) needed once the assert stands guard).

CLAUDE.md gains a "new-feature PR checklist (don't ship silent drops)" section documenting the three ingredients every language-feature PR must include — example, runtime behaviour assertion (not just a rom::validate_ines(&rom) shape test), negative test — and calls out the specific address-map-lookup pattern reviewers should reject in favour of IrCodeGen::var_addr.

Deliberately skipped a regex-based CI tripwire for "silently" / "for now" comments. The false-positive rate against legitimate design decisions like "silently truncate to 8 bits per the cast spec" would train contributors to ignore the signal. A durable CLAUDE.md checklist is what next agents actually need.

4. feature_canary.ne: a golden that fails on absence of correctness (ee3dddb)

The structural piece. The audit's recurring diagnosis was that pixel/audio goldens capture whatever happens, not what should happen — so a silent drop is just as stable as correct code and locks in as the ground truth. The canary inverts this: its golden is a solid green universal backdrop that only appears when every memory construct round-trips correctly. Any check failure flips the whole screen red.

Checks cover every silent-drop shape this audit found: state-local variable write-read (PR #31), uninitialized struct-field write-read (phase 1), u8 / u16 globals (u16 exercises both StoreVar + StoreVarHi), array-element write at nonzero index, slow-placed global, function call return value.

Doesn't use debug.assert on purpose — those get stripped in release and the emulator harness runs release. set_palette Fail works in release and is what the pixel-diff sees.

Impact

Tests

  • 715 → 720 Rust tests (+4 analyzer regression tests, +1 byte-level integration test, +1 round-trip integration test; −1 struct test that was passing against the broken behaviour). Actually it's 634 unit + 3 lib + 83 integration = 720.
  • 34 → 35 emulator goldens (+feature_canary).
  • Three example audio hashes updated (pong/war, and state_machine didn't need it because no transition fires in its 180-frame window). Video and the two committed gifs byte-identical.

ROMs

  • pong.nes, state_machine.nes, war.nes bytes change — the exit-dispatch code now lands in every transition site. Video is pixel-identical at frame 180; pong.audio.hash and war.audio.hash refreshed (pure code-layout timing shift, no behavioural change in the no-input window).
  • Remaining example ROMs byte-identical.

Bugs the audit fixed that nobody had filed yet

  1. var pos: Vec2 + pos.x = 42 silently compiled to no code.
  2. var buf: u8[4] = [1,2,3,4] at state scope silently compiled with no initializer (buffer was stale RAM).
  3. on exit { stop_music } in pong/war/state_machine never ran.
  4. slow var cold: u8 still landed in zero page.

None had a Github issue. The audit script that wrote this plan found them via the same code-smell pattern PR #31 exhibited.

Review notes

The individual commits are structured to be reviewable independently. The interesting one is probably the struct-field fix in (1) — the rest is mechanical once you accept the "panic on miss" axiom.

https://claude.ai/code/session_01AoQ678uVeqpyayvWHpfDhC

claude added 4 commits April 17, 2026 23:49
…ld silent drop

Phase 1 of the post-PR-#31 audit. The PR #31 state-local bug had a
specific shape: analyzer allocated a slot, codegen looked it up by
VarId, silently emitted nothing on miss. Six sites in gen_op plus
the global-initializer loop and the parameter-shuffle prologue all
used the same `if let Some(&addr) = self.var_addrs.get(var) { ... }`
pattern with no else branch. Any future allocation-map desync would
slip through the same crack.

Replace every site with a new `IrCodeGen::var_addr(VarId) -> u16`
helper that panics with an explicit "compiler bug" message on miss.
An IR op referencing an unmapped VarId is not valid input — it means
the analyzer and lowerer disagreed on what to allocate, and we want
that crash to surface in CI rather than be absorbed by whatever
zero-filled RAM happened to sit at the read.

Running cargo test against the hardened lookup surfaced exactly the
bug shape the plan predicted: uninitialized struct globals (e.g.
`var p: Point` with no literal initializer) never had their flattened
field VarIds (`"p.x"`, `"p.y"`) registered in var_addrs. The IR
lowerer's `get_or_create_var("p.x")` minted a VarId, the analyzer's
`flatten_struct_fields` allocated an address for it, but IrCodeGen::new
only populated var_addrs from `ir.globals`, which doesn't contain
synthesized field entries for uninitialized structs. Every `p.x = N`
silently compiled to nothing.

Fix by exposing the IR lowerer's name→VarId map on IrProgram and
joining it with the analyzer's allocations in IrCodeGen::new. Every
allocated name that the lowerer knows about now gets a var_addrs
entry. Example ROMs are byte-identical (no example relied on the
dropped writes), but the bug was reachable — any user program with
a plain `var pos: Point` declaration and field writes would have hit
it silently.

Add `uninitialized_struct_field_store_emits_sta_to_allocated_address`
as a byte-level regression guard: compile `p.x = 123` and scan PRG
for `LDA #\$7B / STA <addr>`. Fails against the old silently-dropping
codegen.

https://claude.ai/code/session_01AoQ678uVeqpyayvWHpfDhC
…res (or fix them)

Phase 2 of the post-PR-#31 audit. The codebase had four documented
"silently skip" paths that parsed user intent but produced no code.
Each one was the same shape as the state-local bug: the analyzer
accepted the program, the IR lowered the construct, but somewhere
downstream the emitted code was dropped on the floor — and a pixel
golden that captured the broken behaviour locked it in as correct.

Fix each per the plan, either by implementing the feature or
rejecting the program at the analyzer.

### on_exit handlers now actually run

`IrOp::Transition` used to comment "on_exit of the current state
isn't called here because we don't know from an IR op alone which
state we're leaving." The codegen emitted the exit handler's body
as an IR function but never JSR'd it. Three example programs
(pong, war, state_machine) relied on `stop_music` or mode-flag
translation inside `on exit` that had been silently never running.

Emit a small CMP-chain against `ZP_CURRENT_STATE` before each
transition: for every state that declares an on_exit, compare the
current index, branch past on miss, JSR the exit handler on match,
then JMP to the shared done-label so only the leaving state's
handler fires. The chain is inlined at each transition site
(bounded by the number of states declaring on_exit) rather than
factored into a single trampoline — simpler to reason about, and
transitions are rare enough that the extra bytes don't matter.

Pong / war / state_machine ROMs change because the dispatch code
is now emitted. Video goldens stay byte-identical (no transitions
happen within the 180-frame harness window under no-input). Pong
and war audio hashes shifted from pure code-layout timing and are
regenerated. `docs/pong.gif` and `docs/war.gif` are byte-identical.

### State-local array initializers now refuse to compile (E0601)

`src/ir/lowering.rs:887` had the comment "Array initializers for
state-locals aren't supported yet... Programs that try this should
get a diagnostic from the analyzer; for now, silently skip." The
analyzer never actually emitted that diagnostic. Verified by
compiling `state Main { var buf: u8[4] = [10,20,30,40] ... }`:
the program built a valid ROM with no trace of 10/20/30/40 in PRG.

Add E0601 to the analyzer's state-local pass. The IR lowerer's
defensive `continue` stays in place as a belt-and-braces guard.

### `on scanline` without MMC3 is now E0603

Previously E0203 ("invalid operation for type") which is a
miscategorisation — the feature is unsupported on the current
mapper, not a type error. Dedicated E0603 makes the future-work
shape explicit.

### `slow` variables now actually live outside zero page

`Placement::Slow` was parsed into the AST but `allocate_ram`
ignored it, so `slow var cold: u8` still landed in ZP like any
other u8. Wire `var.placement` through `allocate_ram_with_placement`
and skip the ZP branch when `Slow` is set. `Fast` remains
advisory (the existing default already prefers ZP for u8 vars),
validated by W0107.

### Other address-map silent drops hardened

Alongside the var_addrs hardening from phase 1, three `state_indices`
lookup sites that did `.copied().unwrap_or(0)` or silent `if let`
are now explicit panics: scanline IRQ dispatch, MMC3 reload, and
`IrOp::Transition`. A miss in any of them is a compiler bug, not
valid input — the analyzer catches unknown state names upstream.

### Regression guards

Four new tests would have failed against the old silently-dropping
code paths:

- `analyze_state_local_array_initializer_rejected` — expects E0601.
- `analyze_on_exit_declaration_accepted` — expects no errors.
- `analyze_slow_var_forced_out_of_zero_page` — expects alloc
  address >= $0100.
- `transition_dispatches_leaving_states_on_exit_handler` — counts
  distinct JSR targets in the PRG before/after adding `on exit` to
  a state; the exit-bearing build must have more.

All 720 tests pass. All 34 emulator goldens pass after the pong/war
audio hash refresh.

https://claude.ai/code/session_01AoQ678uVeqpyayvWHpfDhC
… checklist

Phase 3/4 of the post-PR-#31 audit.

### Call args > 4 is now an assert

`IrOp::Call` silently `.take(4)`-d the arg list with a comment
claiming the analyzer's E0506 check made the extras unreachable.
Replace with an explicit `assert!(args.len() <= 4, ...)` so if
the analyzer ever regresses, the codegen crashes loudly instead
of miscompiling the call. Iterate over all args (not just the
first 4) since the assert guarantees correctness.

### CLAUDE.md: new-feature PR checklist

Document the lesson the audit taught: every new language-feature
PR must include (1) an example exercising it, (2) a runtime
*behaviour* assertion (not just a "ROM validates" shape check),
(3) a negative test for invalid use. Call out the specific
address-map lookup pattern (`if let Some(&addr) = map.get(..)`
with no else) that shipped the state-local bug, and recommend
the `IrCodeGen::var_addr` / explicit `.unwrap_or_else(|| panic!)`
idiom instead.

Chose not to add a regex-based CI tripwire for "silently" /
"for now" comments because the false-positive rate against
legitimate design decisions ("silently truncate to 8 bits per
the cast spec", etc.) would train contributors to ignore it.
The durable checklist in CLAUDE.md is what next agents need.

https://claude.ai/code/session_01AoQ678uVeqpyayvWHpfDhC
… regression

Phase 5 of the post-PR-#31 audit, and the structural piece that
closes the failure mode the earlier phases couldn't fix alone.

The audit's recurring diagnosis: pixel/audio goldens capture
*whatever* the program does, not what it *should* do. A silent
drop in codegen is still deterministic — the golden locks in
the broken behaviour and every future run agrees with it. That's
how state-locals, uninitialized struct-field writes, `on exit`
handlers, and `slow` placement each sat broken for months-to-a-
year in a green CI.

The canary inverts the relationship: the committed golden is a
solid-green universal backdrop that only appears when every
round-trip check passes. Each check writes a distinctive constant
through one language construct, reads it back, and clears
`all_ok` on mismatch. A final `if all_ok == 0 { set_palette Fail }`
flips the entire screen red for the rest of the run.

Checks cover the silent-drop shapes caught by this audit:
  - state-local variable write-read (PR #31)
  - uninitialized struct-field write-read (caught by phase 1)
  - u8 / u16 globals (u16 exercises both StoreVar + StoreVarHi)
  - array-element write at nonzero index
  - `slow`-placed global still round-trips
  - function call return value

The canary doesn't use `debug.assert` on purpose — debug-only
ops get stripped in release and the emulator harness runs
release builds. The palette swap works in release and is what
the harness pixel-diff sees.

### Why this matters as a long-lived test

The harness already had 34 pixel goldens covering full-program
behaviour, but none of them exist specifically to fail if a
*specific language feature* silently drops. The canary does.
Every silent-drop bug the audit found would have flipped it
red the moment the check was added, which is the "behaviour
assertion that can't be satisfied by silence" the plan called
for.

### Harness footprint

`tests/emulator/goldens/feature_canary.{png,audio.hash}` +
`examples/feature_canary.{ne,nes}`. 35/35 ROMs match their
goldens with the canary added. Listed in both README tables.

https://claude.ai/code/session_01AoQ678uVeqpyayvWHpfDhC
@imjasonh imjasonh changed the title codegen: make var_addrs misses panic loudly and fix latent struct-field silent drop compiler: audit for PR-#31-shaped silent drops and close the feedback loop that hid them Apr 18, 2026
… convention

Follow-up to the silent-drop audit. The old ABI passed every
parameter through four fixed zero-page transport slots `$04-$07`,
imposing a hard 4-param cap (E0506) that didn't compose with
structs/arrays/u16s and fell back to "pack args into a global"
workarounds whenever a function needed five things. The transport
scheme also cost every non-leaf call a 4-LDA/STA spill prologue
(~28 cycles, 16 bytes) to copy args out of ZP before the next
nested `JSR` could clobber them.

Replace it with a hybrid convention keyed on leaf-ness:

- **Leaf callees** (no nested `JSR` in body, ≤4 params):
  unchanged. Caller stages args into `$04-$07`; body reads those
  slots directly for its entire lifetime. No prologue copy.
  Fastest path, 3-cycle ZP stores + 3-cycle ZP loads, preserves
  the SHA-256 leaf-primitive optimisation that motivated the
  original fast path.

- **Non-leaf callees** (body contains a nested `JSR`, OR ≥5
  params): direct-write. Caller stages each argument straight
  into the callee's analyzer-allocated parameter RAM slot,
  bypassing the transport slots entirely. No prologue copy on
  the callee side. Saves ~24 cycles and ~16 bytes per call vs
  the old transport-then-spill path, and — crucially — scales
  past 4 params because the per-param slots live wherever the
  analyzer put them rather than in a fixed ZP window.

The analyzer's ceiling moves from 4 to 8. Functions with 5–8
params are silently promoted to the non-leaf convention (even if
their body has no nested `JSR`), which pays the direct-write cost
rather than the prologue-copy cost — still cheaper than the old
ABI. Declarations with 9+ params still emit E0506.

### Implementation

- `function_is_leaf` now also requires `param_count <= 4`.
- `IrCodeGen::new` populates `non_leaf_param_addrs: HashMap<String,
  Vec<u16>>` — for every non-leaf function, the ordered list of
  addresses its parameters occupy. Callers use this to route each
  arg directly to the right slot.
- `IrOp::Call` branches on presence in the map: non-leaf → direct-
  write, leaf (or absent — 0-arg case) → ZP transport.
- `gen_function` no longer emits a prologue. Leaves didn't have
  one; non-leaves had a 4-LDA/STA copy that is now unnecessary
  because args arrive pre-written to the slot.
- The previous `leaf_functions: HashSet<String>` field is
  removed; leaf-ness is now inferred from absence-in-
  `non_leaf_param_addrs` at the call site.

### Tests and regressions

- `eight_param_non_leaf_function_stages_every_arg_at_its_allocated_slot`
  compiles an 8-param function, scans PRG for a distinct
  `LDA #\$NN / STA <addr>` per arg (immediates `0x11..0x88`), and
  asserts that STAs to the `$04-$07` range are strictly fewer
  than 8 — proof the old transport path is gone for this call.
- `non_leaf_call_direct_writes_args_to_callee_param_slots`
  replaces the old `gen_function_prologue_spills_params_to_local_ram`
  test with a dual assertion: (a) no `LDA \$04` prologue at the
  callee entry, and (b) the caller-side STA lands at the
  analyzer-allocated param slot, not at `\$04-\$07`.
- `analyze_rejects_function_with_more_than_4_params` renamed and
  rewritten for the new 8-param cap.
- `feature_canary.ne` gains a 6-param `sum6` call (1+2+3+4+5+6 =
  21) as check 8. The canary stays green (all eight checks
  pass), so the committed golden is unchanged.

### Blast radius

- Six example ROMs change bytes (arrays_and_functions, function_chain,
  mmc1_banked, pong, sha256, war) because their non-leaf call sites
  pick up the shorter staging sequence.
- Pong and war audio hashes refresh (pure layout-timing shift; no
  behavioural change in the 180-frame no-input window). docs/pong.gif
  and docs/war.gif stay byte-identical.
- `examples/function_chain.ne`'s header comment updated to
  document the leaf vs non-leaf split it exercises.
- `docs/language-guide.md` parameter-count section and E0506 entry
  updated to reflect the new rule.

All 720 Rust tests pass; all 35 emulator goldens pass.

https://claude.ai/code/session_01AoQ678uVeqpyayvWHpfDhC
@imjasonh imjasonh merged commit 33b5a39 into main Apr 18, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants