Skip to content

analyzer+ir: fix state-local variables and overlay them across mutually-exclusive states#31

Merged
imjasonh merged 2 commits intomainfrom
claude/resolve-issue-22-0YxNw
Apr 17, 2026
Merged

analyzer+ir: fix state-local variables and overlay them across mutually-exclusive states#31
imjasonh merged 2 commits intomainfrom
claude/resolve-issue-22-0YxNw

Conversation

@imjasonh
Copy link
Copy Markdown
Owner

@imjasonh imjasonh commented Apr 17, 2026

Fixes #22.

What changed

Two things, tightly coupled: a long-latent bug fix and the overlay feature it unlocked.

1. State-local variables actually work now

Before this change, state-level var declarations (state Foo { var x: u8 = 0 }) were silently no-ops at codegen time. The analyzer allocated a ZP slot, but IrCodeGen::new only populated its var_addrs map from IR globals and scope-qualified function locals — state-locals lived in neither list, so every LoadVar / StoreVar whose VarId pointed at one resolved to no address and emitted zero bytes. Initializers at reset were similarly dropped.

The lowerer now pushes each state's locals into the IR globals table (with init_value = None) so the codegen resolves their addresses the same way it does program globals.

2. Overlay allocation for mutually-exclusive states

ZP_CURRENT_STATE keeps exactly one state active at runtime, so every state's locals are mutually exclusive with every other state's and can safely share the same bytes. The analyzer now snapshots both cursors after globals are laid out, rewinds to the snapshot before each state's locals, and advances to the running max at the end. The IR lowerer prepends each state-local's declared initializer to the owning on_enter handler (synthesizing an empty one where needed) so a freshly-entered state re-establishes its bytes before user code runs — which is what makes the overlay safe when another state has written the same ZP byte in the interim.

--memory-map now tags each allocation with its owning state ([@Title], [@Playing], ...) and counts distinct bytes rather than summed sizes so overlaid slots don't double-count. AnalysisResult.state_local_owners exposes the ownership to any tool that wants to group allocations.

3. Converted existing examples to exercise the feature

Only state_machine.ne used state-locals before this PR — which is exactly how the silent-drop bug survived so long: no example with state-locals observed the dropped writes inside the 180-frame harness window. As a follow-up commit, coin_cavern.ne and platformer.ne move their state-scoped globals onto the owning state blocks so the overlay code path sees real use.

Why the bug wasn't caught earlier

  • Goldens capture whatever happens, not what should happen. With state-local writes silently dropped, reads returned leftover bytes from IR temp slots and the behavior was stable across builds, so existing goldens locked in the broken behavior.
  • The state-machine examples didn't depend on state-local values at frame 180. state_machine.ne stays in Title forever (no auto-advance, no buttons in the harness); platformer.ne auto-advances on a global frame_tick, not on Title's state-local blink. The wrong state-local values never reached a test assertion.
  • There was no dedicated test. program_with_state_machine covered allocation and scoping but nothing wrote a value to a state-local, did a wait_frame, and then read it back to assert persistence. The new state_local_store_round_trips_through_zero_page test asserts the specific LDA #N / STA $10 byte sequences land in the ROM, which would have failed against the old silently-dropping codegen.

Measured impact

Overlay savings across the three examples that declare state-locals today:

example         naive  overlay  saved  per-state
------------------------------------------------------
coin_cavern         5        5      0  Playing=5
platformer         10        8      2  GameOver=1, Playing=8, Title=1
state_machine       5        3      2  Playing=3, Title=1, Victory=1
------------------------------------------------------
TOTAL              20       16      4

4 bytes of ZP reclaimed across the suite — meaningful on a 2 KB system but not dramatic, because the savings scale with total_state_local_bytes − max_state_local_bytes and these examples have one dominant state (Playing) with lightweight siblings. The shape where the overlay really pays is "many fat states" (Level1/Level2/Level3 each with ~10 bytes, or symmetric menu/play/score turn state) — we don't have an example like that yet, but the plumbing is in place.

Blast radius

  • Byte-level ROM bytes for coin_cavern.nes, platformer.nes, and state_machine.nes shift because their variable addresses moved.
  • Video goldens stay pixel-identical for all 34 examples.
  • Only platformer.audio.hash needed refreshing — the now-working blink counter in Title adds cycles per frame before auto-transitioning to Playing, which offsets APU register writes within each frame. docs/platformer.gif was also regenerated per the CLAUDE.md rule.

Tests

  • 5 new integration tests in tests/integration_test.rs covering: overlay at the same base address, globals don't alias state-locals, state-local writes round-trip through ZP, initializers fire on entry (not reset), synthesized on_enter for states that only have initializers.
  • All 715 existing Rust tests pass; all 34 emulator harness goldens pass.

Documentation

  • docs/language-guide.md — new "State-Local Variables and Memory Overlays" section with example.
  • CLAUDE.md — zero-page convention updated with the overlay rule and [@State] tagging.
  • docs/future-work.md — added a "State-local memory overlay follow-ups" section covering same-named locals across states, array/struct initializers on state-locals, and handler-local overlay.

https://claude.ai/code/session_015kvJu3iEFLSRJoShPBfm3X

claude added 2 commits April 17, 2026 02:20
Before this change, state-local variables (`state Foo { var x: u8 = 0 }`)
were silently no-ops: the analyzer allocated a ZP slot for them, but
the codegen's `var_addrs` map only covered IR globals and scope-qualified
function locals — so every `LoadVar` / `StoreVar` whose `VarId` pointed
at a state-local resolved to no address and emitted nothing. Existing
examples compiled and matched their goldens because none of them observed
the dropped writes within the 180-frame harness window.

The overlay changes the analyzer's state-local pass to snapshot both the
ZP and RAM cursors after the globals have been laid out, then rewind to
that snapshot before each state's locals and track the running max.
`ZP_CURRENT_STATE` keeps exactly one state active at runtime, so every
state's locals are mutually exclusive with every other state's and can
share the same bytes. The IR lowerer now pushes each state's locals into
the IR globals table (with `init_value=None`) so the codegen resolves
their addresses the same way it does program globals, and prepends the
declared initializers to each state's `on_enter` handler (synthesizing
an empty one where needed) so a freshly-entered state re-establishes its
bytes before user code runs.

`--memory-map` now tags each allocation with its owning state
(`[@title]`, `[@playing]`, ...) and counts distinct bytes rather than
summed allocation sizes so overlaid slots don't double-count. The
`AnalysisResult.state_local_owners` map exposes the ownership to any
tool that wants to group allocations the same way.

Only `state_machine.ne` and `platformer.ne` declare state-level vars,
so they're the only example ROMs whose bytes change. `platformer.ne`'s
audio golden shifts slightly (the now-working `blink` counter in Title
adds a few cycles per frame before the auto-transition to Playing, which
offsets APU register writes within each frame); its video golden and
every other example ROM stay byte-for-byte identical.

Fixes #22.

https://claude.ai/code/session_015kvJu3iEFLSRJoShPBfm3X
…latformer

Both examples declared their gameplay variables at the top level
even though every read and write happened inside one specific state.
That pattern hid the overlay feature from new users and kept the
state-local code path from being exercised outside the dedicated
`state_machine.ne` demo (which is how the "state-locals silently
drop their writes" bug survived so long).

`coin_cavern.ne`: the five Playing-only physics/position/inventory
vars (`player_x`, `player_y`, `player_vy`, `on_ground`,
`coins_left`) move onto Playing's state block. `score` stays
global because GameOver-era code could reasonably grow to read it.
The `on_enter` body loses its redundant resets — the declared
initializers on the state-locals re-run on every entry, so
retrying after `transition Title` comes back to a fresh state.

`platformer.ne`: player physics, camera, liveness, animation
phase, and the autopilot budget (`player_y`, `on_ground`,
`rise_count`, `fall_vy`, `camera_x`, `anim_tick`, `alive`,
`auto_jumps`) all move onto Playing. `frame_tick` and
`stomp_count` stay global — Title reads the former to
auto-advance, GameOver reads the latter to tally coins on the
death screen. The analyzer now overlays Title's `blink`,
Playing's eight physics vars, and GameOver's `linger` starting
at the same ZP byte (`$1A`), so the three scenes share a
9-byte window instead of each claiming their own slots.

Byte-level ROM bytes for both examples shift because variable
addresses moved. Video goldens stay pixel-identical (the harness
doesn't see Playing in coin_cavern, and the pre-transition
Title→Playing timing in platformer is preserved); the platformer
audio hash needed one more refresh because the now-slightly-shorter
reset prologue shifts APU writes within each frame.

https://claude.ai/code/session_015kvJu3iEFLSRJoShPBfm3X
@imjasonh imjasonh changed the title analyzer+ir: automatically overlay state-local variables analyzer+ir: fix state-local variables and overlay them across mutually-exclusive states Apr 17, 2026
@imjasonh imjasonh merged commit 431f144 into main Apr 17, 2026
7 checks passed
imjasonh pushed a commit that referenced this pull request Apr 18, 2026
…ld silent drop

Phase 1 of the post-PR-#31 audit. The PR #31 state-local bug had a
specific shape: analyzer allocated a slot, codegen looked it up by
VarId, silently emitted nothing on miss. Six sites in gen_op plus
the global-initializer loop and the parameter-shuffle prologue all
used the same `if let Some(&addr) = self.var_addrs.get(var) { ... }`
pattern with no else branch. Any future allocation-map desync would
slip through the same crack.

Replace every site with a new `IrCodeGen::var_addr(VarId) -> u16`
helper that panics with an explicit "compiler bug" message on miss.
An IR op referencing an unmapped VarId is not valid input — it means
the analyzer and lowerer disagreed on what to allocate, and we want
that crash to surface in CI rather than be absorbed by whatever
zero-filled RAM happened to sit at the read.

Running cargo test against the hardened lookup surfaced exactly the
bug shape the plan predicted: uninitialized struct globals (e.g.
`var p: Point` with no literal initializer) never had their flattened
field VarIds (`"p.x"`, `"p.y"`) registered in var_addrs. The IR
lowerer's `get_or_create_var("p.x")` minted a VarId, the analyzer's
`flatten_struct_fields` allocated an address for it, but IrCodeGen::new
only populated var_addrs from `ir.globals`, which doesn't contain
synthesized field entries for uninitialized structs. Every `p.x = N`
silently compiled to nothing.

Fix by exposing the IR lowerer's name→VarId map on IrProgram and
joining it with the analyzer's allocations in IrCodeGen::new. Every
allocated name that the lowerer knows about now gets a var_addrs
entry. Example ROMs are byte-identical (no example relied on the
dropped writes), but the bug was reachable — any user program with
a plain `var pos: Point` declaration and field writes would have hit
it silently.

Add `uninitialized_struct_field_store_emits_sta_to_allocated_address`
as a byte-level regression guard: compile `p.x = 123` and scan PRG
for `LDA #\$7B / STA <addr>`. Fails against the old silently-dropping
codegen.

https://claude.ai/code/session_01AoQ678uVeqpyayvWHpfDhC
imjasonh pushed a commit that referenced this pull request Apr 18, 2026
…res (or fix them)

Phase 2 of the post-PR-#31 audit. The codebase had four documented
"silently skip" paths that parsed user intent but produced no code.
Each one was the same shape as the state-local bug: the analyzer
accepted the program, the IR lowered the construct, but somewhere
downstream the emitted code was dropped on the floor — and a pixel
golden that captured the broken behaviour locked it in as correct.

Fix each per the plan, either by implementing the feature or
rejecting the program at the analyzer.

### on_exit handlers now actually run

`IrOp::Transition` used to comment "on_exit of the current state
isn't called here because we don't know from an IR op alone which
state we're leaving." The codegen emitted the exit handler's body
as an IR function but never JSR'd it. Three example programs
(pong, war, state_machine) relied on `stop_music` or mode-flag
translation inside `on exit` that had been silently never running.

Emit a small CMP-chain against `ZP_CURRENT_STATE` before each
transition: for every state that declares an on_exit, compare the
current index, branch past on miss, JSR the exit handler on match,
then JMP to the shared done-label so only the leaving state's
handler fires. The chain is inlined at each transition site
(bounded by the number of states declaring on_exit) rather than
factored into a single trampoline — simpler to reason about, and
transitions are rare enough that the extra bytes don't matter.

Pong / war / state_machine ROMs change because the dispatch code
is now emitted. Video goldens stay byte-identical (no transitions
happen within the 180-frame harness window under no-input). Pong
and war audio hashes shifted from pure code-layout timing and are
regenerated. `docs/pong.gif` and `docs/war.gif` are byte-identical.

### State-local array initializers now refuse to compile (E0601)

`src/ir/lowering.rs:887` had the comment "Array initializers for
state-locals aren't supported yet... Programs that try this should
get a diagnostic from the analyzer; for now, silently skip." The
analyzer never actually emitted that diagnostic. Verified by
compiling `state Main { var buf: u8[4] = [10,20,30,40] ... }`:
the program built a valid ROM with no trace of 10/20/30/40 in PRG.

Add E0601 to the analyzer's state-local pass. The IR lowerer's
defensive `continue` stays in place as a belt-and-braces guard.

### `on scanline` without MMC3 is now E0603

Previously E0203 ("invalid operation for type") which is a
miscategorisation — the feature is unsupported on the current
mapper, not a type error. Dedicated E0603 makes the future-work
shape explicit.

### `slow` variables now actually live outside zero page

`Placement::Slow` was parsed into the AST but `allocate_ram`
ignored it, so `slow var cold: u8` still landed in ZP like any
other u8. Wire `var.placement` through `allocate_ram_with_placement`
and skip the ZP branch when `Slow` is set. `Fast` remains
advisory (the existing default already prefers ZP for u8 vars),
validated by W0107.

### Other address-map silent drops hardened

Alongside the var_addrs hardening from phase 1, three `state_indices`
lookup sites that did `.copied().unwrap_or(0)` or silent `if let`
are now explicit panics: scanline IRQ dispatch, MMC3 reload, and
`IrOp::Transition`. A miss in any of them is a compiler bug, not
valid input — the analyzer catches unknown state names upstream.

### Regression guards

Four new tests would have failed against the old silently-dropping
code paths:

- `analyze_state_local_array_initializer_rejected` — expects E0601.
- `analyze_on_exit_declaration_accepted` — expects no errors.
- `analyze_slow_var_forced_out_of_zero_page` — expects alloc
  address >= $0100.
- `transition_dispatches_leaving_states_on_exit_handler` — counts
  distinct JSR targets in the PRG before/after adding `on exit` to
  a state; the exit-bearing build must have more.

All 720 tests pass. All 34 emulator goldens pass after the pong/war
audio hash refresh.

https://claude.ai/code/session_01AoQ678uVeqpyayvWHpfDhC
imjasonh pushed a commit that referenced this pull request Apr 18, 2026
… checklist

Phase 3/4 of the post-PR-#31 audit.

### Call args > 4 is now an assert

`IrOp::Call` silently `.take(4)`-d the arg list with a comment
claiming the analyzer's E0506 check made the extras unreachable.
Replace with an explicit `assert!(args.len() <= 4, ...)` so if
the analyzer ever regresses, the codegen crashes loudly instead
of miscompiling the call. Iterate over all args (not just the
first 4) since the assert guarantees correctness.

### CLAUDE.md: new-feature PR checklist

Document the lesson the audit taught: every new language-feature
PR must include (1) an example exercising it, (2) a runtime
*behaviour* assertion (not just a "ROM validates" shape check),
(3) a negative test for invalid use. Call out the specific
address-map lookup pattern (`if let Some(&addr) = map.get(..)`
with no else) that shipped the state-local bug, and recommend
the `IrCodeGen::var_addr` / explicit `.unwrap_or_else(|| panic!)`
idiom instead.

Chose not to add a regex-based CI tripwire for "silently" /
"for now" comments because the false-positive rate against
legitimate design decisions ("silently truncate to 8 bits per
the cast spec", etc.) would train contributors to ignore it.
The durable checklist in CLAUDE.md is what next agents need.

https://claude.ai/code/session_01AoQ678uVeqpyayvWHpfDhC
imjasonh pushed a commit that referenced this pull request Apr 18, 2026
… regression

Phase 5 of the post-PR-#31 audit, and the structural piece that
closes the failure mode the earlier phases couldn't fix alone.

The audit's recurring diagnosis: pixel/audio goldens capture
*whatever* the program does, not what it *should* do. A silent
drop in codegen is still deterministic — the golden locks in
the broken behaviour and every future run agrees with it. That's
how state-locals, uninitialized struct-field writes, `on exit`
handlers, and `slow` placement each sat broken for months-to-a-
year in a green CI.

The canary inverts the relationship: the committed golden is a
solid-green universal backdrop that only appears when every
round-trip check passes. Each check writes a distinctive constant
through one language construct, reads it back, and clears
`all_ok` on mismatch. A final `if all_ok == 0 { set_palette Fail }`
flips the entire screen red for the rest of the run.

Checks cover the silent-drop shapes caught by this audit:
  - state-local variable write-read (PR #31)
  - uninitialized struct-field write-read (caught by phase 1)
  - u8 / u16 globals (u16 exercises both StoreVar + StoreVarHi)
  - array-element write at nonzero index
  - `slow`-placed global still round-trips
  - function call return value

The canary doesn't use `debug.assert` on purpose — debug-only
ops get stripped in release and the emulator harness runs
release builds. The palette swap works in release and is what
the harness pixel-diff sees.

### Why this matters as a long-lived test

The harness already had 34 pixel goldens covering full-program
behaviour, but none of them exist specifically to fail if a
*specific language feature* silently drops. The canary does.
Every silent-drop bug the audit found would have flipped it
red the moment the check was added, which is the "behaviour
assertion that can't be satisfied by silence" the plan called
for.

### Harness footprint

`tests/emulator/goldens/feature_canary.{png,audio.hash}` +
`examples/feature_canary.{ne,nes}`. 35/35 ROMs match their
goldens with the canary added. Listed in both README tables.

https://claude.ai/code/session_01AoQ678uVeqpyayvWHpfDhC
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory Lifetimes and Overlays (Unions)

2 participants