
signed compares, metatiles + collision, register-allocator pass#36

Merged
imjasonh merged 3 commits into main from
claude/prioritize-allocator-signedness-tSyMZ
Apr 19, 2026

Conversation


@imjasonh imjasonh commented Apr 19, 2026

Works through the top three items in the priority ranking at the bottom of docs/future-work.md.

1. Signedness on Cmp16/Cmp ops (§A follow-up)

Ordering compares (<, <=, >, >=) on signed integers now use the canonical 6502 CMP / SBC / BVC / EOR #$80 overflow-correction idiom so the N flag reflects the true sign of the difference. The lowerer tracks signedness on each IR temp (analogous to the existing wide_hi map) and threads it onto a new Signedness field on CmpLt/CmpGt/CmpLtEq/CmpGtEq and their 16-bit variants.
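As a sanity sketch, this is the N-xor-V arithmetic the idiom implements, simulated in Python (helper names are hypothetical, not the compiler's code):

```python
def signed_lt_8bit(a: int, b: int) -> bool:
    """Simulate the 6502 signed-compare idiom for two i8 values given
    as raw 0..255 bytes. SEC/SBC computes a - b; V is set on signed
    overflow; BVC / EOR #$80 flips the result's sign bit when V is
    set, so N ends up reflecting the true sign of the difference."""
    diff = (a - b) & 0xFF                       # SBC result byte (carry set)
    n = diff >> 7                               # N flag: bit 7 of the result
    # V flag: operands had different signs AND the result's sign
    # differs from the minuend's sign
    v = ((a ^ b) & (a ^ diff) & 0x80) >> 7
    return bool(n ^ v)                          # true iff a < b as signed i8

def to_i8(byte: int) -> int:
    return byte - 256 if byte >= 128 else byte

# Exhaustive check against Python's native signed compare:
assert all(
    signed_lt_8bit(a, b) == (to_i8(a) < to_i8(b))
    for a in range(256) for b in range(256)
)
```

Without the correction, `$F6 < $00` (i.e. -10 < 0) would compare false, since a plain BCC/BCS path treats `$F6` as unsigned 246.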

widen() also sign-extends when the source temp is signed, via a new IrOp::SignExtend, so var w: i16 = some_i8_neg round-trips negative values instead of zero-extending to $00F6.
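The high-byte arithmetic SignExtend performs is just this (a sketch, not the actual lowering sequence):

```python
def sign_extend_i8(lo: int) -> tuple[int, int]:
    """What IrOp::SignExtend computes: the high byte is $FF when
    bit 7 of the low byte is set, $00 otherwise. (This is only the
    arithmetic; the exact 6502 instruction sequence is not shown.)"""
    hi = 0xFF if lo & 0x80 else 0x00
    return lo, hi

# -10 as i8 is $F6; widened to i16 it must become $FFF6, not $00F6
lo, hi = sign_extend_i8(0xF6)
assert (hi << 8) | lo == 0xFFF6
```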

Casts to u8/u16 strip the signed flag so expr as u16 < other_u16 stays on the unsigned path.

See examples/signed_compare.ne — four pip sprites gate on signed comparisons; three light (signed-correct) and one stays dark (would only light if the lowering regressed to unsigned).

2. Metatiles + collision (§H)

metatileset Name { metatiles: [{ id, tiles, collide }, ...] } and room Name { metatileset: M, layout: [...] } ship as a cohesive feature. Each metatile bundles 4 CHR tile indices (TL/TR/BL/BR) plus a collide flag; rooms lay them out as a 16×15 grid that the compiler expands at compile time into three PRG blobs:

  • __room_tiles_<name> — 960-byte 32×30 nametable
  • __room_attrs_<name> — 64-byte attribute table
  • __room_col_<name> — 240-byte collision bitmap (one byte per metatile)
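A rough Python model of that compile-time expansion (the data shapes and helper name are hypothetical; attribute-table packing is omitted for brevity):

```python
def expand_room(layout, metatiles):
    """Expand a 16x15 grid of metatile ids into a 32x30 nametable
    and a 240-byte collision map. Each metatile contributes its
    four CHR tiles (TL, TR, BL, BR) as a 2x2 block and one
    collision byte."""
    tiles = bytearray(960)            # 32x30 nametable
    col = bytearray(240)              # one byte per metatile
    for i, mid in enumerate(layout):
        mrow, mcol = divmod(i, 16)
        tl, tr, bl, br = metatiles[mid]["tiles"]
        base = (mrow * 2) * 32 + mcol * 2
        tiles[base], tiles[base + 1] = tl, tr
        tiles[base + 32], tiles[base + 33] = bl, br
        col[i] = 1 if metatiles[mid]["collide"] else 0
    return bytes(tiles), bytes(col)

mts = {0: {"tiles": (0, 0, 0, 0), "collide": False},
       1: {"tiles": (2, 3, 4, 5), "collide": True}}
layout = [1] + [0] * 239              # one solid metatile at the top-left
nt, col = expand_room(layout, mts)
assert len(nt) == 960 and len(col) == 240
assert (nt[0], nt[1], nt[32], nt[33]) == (2, 3, 4, 5)
assert col[0] == 1 and col[1] == 0
```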

paint_room Name reuses the existing load_background vblank-safe update machinery for the nametable blit and additionally installs the room's collision bitmap pointer into ZP_ROOM_COL_LO/ZP_ROOM_COL_HI (ZP $18/$19).

collides_at(x: u8, y: u8) -> bool JSRs into a small runtime helper that reads (room_col),Y with Y = (y & 0xF0) | (x >> 4) and returns the 0/1 byte directly. Gated on a __collides_at_used marker — programs that declare a room but never query it pay zero bytes for the subroutine.
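The index math works because metatiles are 16×16 pixels: the metatile row is `y >> 4` and the column is `x >> 4`, so `row * 16 + col` collapses to a single AND plus a shift. A minimal sketch:

```python
def collides_at(col_map: bytes, x: int, y: int) -> bool:
    """Sketch of the runtime helper's index math: with 16x16-pixel
    metatiles, row = y >> 4 and col = x >> 4, and
    row * 16 + col == (y & 0xF0) | (x >> 4), which indexes the
    240-byte collision map directly."""
    index = (y & 0xF0) | (x >> 4)
    return bool(col_map[index])

col = bytearray(240)
col[2 * 16 + 5] = 1                   # solid metatile at row 2, column 5
assert collides_at(col, x=0x5A, y=0x2C)       # pixel inside that metatile
assert not collides_at(col, x=0x5A, y=0x3C)   # one metatile row below
```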

parse_byte_array grows a [value; count] shortcut so 240-entry layouts stay readable.
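The sugar is equivalent to a simple repeat expansion (hypothetical helper name, shown only to pin down the semantics):

```python
def parse_repeat(value: int, count: int) -> list[int]:
    """[value; count] expands to `count` copies of `value`, so a
    240-entry layout need not be spelled out a byte at a time."""
    return [value] * count

assert parse_repeat(0, 240) == [0] * 240
```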

See examples/metatiles_demo.ne: a probe walks right, bounces off the right wall when collides_at fires, and lands on the left side of the playfield by frame 180.

3. Register-allocator follow-up (Code quality)

remove_dead_loads now steps past opcodes that touch neither A nor the flags an LDA sets — LDX/LDY/INX/INY/DEX/DEY and the flag ops (CLC/SEC/CLI/SEI/CLD/SED/CLV) on top of the INC/DEC/STX/STY opcodes the pass already stepped past.

The highest-leverage case is every single-tile draw. Copy propagation and dead-store elimination together leave:

LDA #<y>          ; stray producer, value never consumed
LDY oam_cursor
LDA #<y>          ; real load before STA
STA $0200,Y

The first LDA was surviving because the pass bailed on the LDY. With the step-past, it drops — one LDA gone per draw, 2 bytes each.
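A toy model of the extended pass, with hypothetical names and only the opcodes relevant here (the real pass operates on the compiler's own instruction representation):

```python
# Opcodes that touch neither A nor the N/Z flags an LDA sets, so the
# scan may step past them when hunting for an overwriting LDA.
STEP_PAST = {"LDX", "LDY", "INX", "INY", "DEX", "DEY",
             "INC", "DEC", "STX", "STY",
             "CLC", "SEC", "CLI", "SEI", "CLD", "SED", "CLV"}

def remove_dead_loads(code):
    """An LDA is dead if a later LDA overwrites A and every
    instruction in between is in STEP_PAST (none of them reads A
    or depends on the N/Z flags the first LDA set)."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if op[0] == "LDA":
            j = i + 1
            while j < len(code) and code[j][0] in STEP_PAST:
                j += 1
            if j < len(code) and code[j][0] == "LDA":
                i += 1                # drop the dead load
                continue
        out.append(op)
        i += 1
    return out

code = [("LDA", "#$40"),              # stray producer, never consumed
        ("LDY", "oam_cursor"),
        ("LDA", "#$40"),              # real load before the store
        ("STA", "$0200,Y")]
assert remove_dead_loads(code) == code[1:]
```

Stepping past the flag ops is safe because CLC/SEC/CLI/SEI/CLD/SED/CLV touch C, I, D, or V, none of which an LDA sets or the dropped load's consumers could have read.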

LDA-count reductions on the committed examples:

  example     before  after   Δ
  platformer  242     221     -21 (-8.7 %)
  war         785     754     -31 (-4.0 %)
  pong        843     827     -16 (-1.9 %)

Audio-goldens churn

The cycle savings shift the main-loop ↔ NMI boundary in audio-emitting programs, which re-times which frame each SFX trigger lands in. Six audio hashes re-baselined as a result: audio_demo, friendly_assets, noise_triangle_sfx, platformer, pong, war.

All 50 PNG goldens, the platformer/war/pong demo gifs, and every non-audio program stay byte-identical. The re-baselined audio is still sample-accurate; what changed is the first-SFX offset within the captured 132 084-sample window. This tradeoff is spelled out in docs/future-work.md's register-allocator section along with the remaining wins worth chasing next (cross-block A-tracking, X/Y allocation, spill skipping at codegen time).

Test plan

  • cargo test --all-targets — 779 pass
  • cargo fmt --check, cargo clippy --all-targets -- -D warnings clean
  • tests/emulator/run_examples.mjs — 50/50 ROMs match their (re-baselined) goldens
  • Every examples/*.nes re-committed to match its .ne source
  • docs/{platformer,war,pong}.gif regenerated (byte-identical to the pre-change gifs)

What was added / removed in docs/future-work.md

Removed from §A and §H — both features ship in this PR.
Rewrote the register-allocator section to describe what's shipped (the peephole step-past), the remaining wins, and the audio-shift constraint.
Updated priority ranking to reflect the three top items being done.

claude added 3 commits April 19, 2026 00:17
Closes the §A follow-up gap: ordering compares (`<`, `<=`, `>`, `>=`)
on signed integer types now use the canonical 6502 `CMP / SBC / BVC /
EOR #$80` overflow-correction idiom so the N flag reflects the true
sign of the difference, instead of the previous BCC/BCS-based path
that always treated `$FFxx` as greater than `$00yy`.

The same change also fixes narrow-to-wide widening: assigning a
runtime `i8` expression to an `i16` variable now sign-extends the
high byte via a new `IrOp::SignExtend` op instead of zero-extending
it, so `var w: i16 = some_i8_neg` round-trips negative values.

The lowerer tracks signedness on each IR temp (analogous to the
existing `wide_hi` map) and threads it onto the new `Signedness`
field of `CmpLt`/`CmpGt`/`CmpLtEq`/`CmpGtEq` and their 16-bit
variants. The optimizer's constant-folder uses the same flag to
fold compares correctly under either signedness. Casts to `u8`/`u16`
strip the signed flag so an explicit `as` opt-out stays unsigned.

`examples/signed_compare.ne` exercises both bit widths through the
emulator harness — the four pip sprites at the top of the screen
show three lit (signed-correct) and one dark (would only light if
the compare regressed to unsigned semantics).
…_at`

Closes §H. 2×2 metatiles and a parallel collision map are now a
first-class construct. `metatileset Name { metatiles: [{ id, tiles,
collide }, ...] }` declares a library of 2×2 tile bundles. `room Name
{ metatileset: M, layout: [...] }` lays them out on a 16×15 grid. The
compiler expands each room at compile time into:

- a 960-byte nametable (`__room_tiles_<name>`)
- a 64-byte attribute table (`__room_attrs_<name>`)
- a 240-byte collision bitmap (`__room_col_<name>`)

`paint_room Name` reuses the vblank-safe `load_background` update
machinery for the nametable blit and installs the collision bitmap
pointer into `ZP_ROOM_COL_LO`/`ZP_ROOM_COL_HI` (ZP $18/$19).
`collides_at(x, y)` JSRs into a small runtime helper that reads
`(room_col),Y` with `Y = (y & 0xF0) | (x >> 4)` and returns 0/1.
The helper links in only when the `__collides_at_used` marker is
emitted, so programs that declare a room but never query it pay
zero bytes for the subroutine.

`parse_byte_array` grows a `[value; count]` shortcut — 240-entry
`layout` arrays are unwieldy to spell out a byte at a time.

See `examples/metatiles_demo.ne` for the end-to-end flow: a probe
sprite bounces off walls via `collides_at` and lands on the left
side of the playfield at frame 180 — direct evidence that the
collision query works.

Also defers the register-allocator work from §"Code quality /
tooling" and documents the audio-goldens constraint in future-work
so the next agent sees it.
`remove_dead_loads` now scans past opcodes that touch neither A nor
the flags an LDA sets, so a redundant LDA gets caught by its
successor's overwrite even when an index load or counter bump sits
between them. The extension covers LDX/LDY/INX/INY/DEX/DEY and the
flag ops (CLC/SEC/CLI/SEI/CLD/SED/CLV) alongside the INC/DEC/STX/STY
opcodes the pass already stepped past.

The highest-leverage case is the shape every single-tile `draw`
emits. After copy propagation and dead-store elimination do their
work, the stream reads:

    LDA #<y>      ; stray producer, value never consumed
    LDY oam_cursor
    LDA #<y>      ; real load before STA
    STA $0200,Y

The first LDA was surviving because the pass bailed on the LDY.
With the step-past, it drops. One LDA gone per draw, 2 bytes each.

Measured LDA-count reduction on committed examples:

  platformer  242 → 221   (-21, -8.7 %)
  war         785 → 754   (-31, -4.0 %)
  pong        843 → 827   (-16, -1.9 %)

**Audio goldens.** The cycle savings shift the main-loop/NMI boundary
in audio-emitting programs, which re-times which frame each SFX
trigger lands in. Six audio hashes re-baseline as a result:
audio_demo, friendly_assets, noise_triangle_sfx, platformer, pong,
war. All 50 PNG goldens, the platformer/war/pong demo gifs, and
every non-audio program stay byte-identical. The re-baselined
output is still sample-accurate; what changed is the first-SFX
offset within the captured 132 084-sample window. This is the
audio-shift tradeoff documented in future-work.

Two new peephole unit tests lock in the behaviour:
- `dead_load_elim_steps_past_ldx_ldy` — the DrawSprite shape folds.
- `dead_load_elim_preserves_lda_when_used_by_shift` — a subsequent
  ASL on A keeps the LDA alive across an intervening LDY.

Also updates future-work.md to reflect the shipped change and the
remaining register-allocator wins worth chasing next.
@imjasonh imjasonh changed the title ir/codegen: signed comparison lowering for i8/i16 signed compares, metatiles + collision, register-allocator pass Apr 19, 2026
@imjasonh imjasonh merged commit 6b1cc98 into main Apr 19, 2026
7 checks passed