erspicu/AprVisual

AprVisual

A Visual6502-style switch-level NES simulator — and an honest log of what made it faster, and what didn't.

🌐 Project site & full write-up → · 📊 Community benchmark leaderboard → · ⬇ Download the benchmark →

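Chinese readers: the project site supports switching between Chinese and English (it defaults to your browser's language); just click Project site above.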


AprVisual takes Visual6502-style transistor netlists of the NES CPU (2A03) and PPU (2C02) and turns them into analyzable, verifiable, executable logic models. It simulates the chip at the level of individual transistors and wires — not opcodes — and lets the CPU's behavior emerge from the physics. The result is bit-for-bit faithful to the real silicon, and (necessarily) far slower than real time.

The real value here is the translation pipeline — silicon connectivity → graph → logic/sequencing → verifiable backends — and the honest record of which optimizations actually work on real hardware, not any single backend.

New to "switch-level simulation"? Start with the plain-language primer — it explains netlists, conduction detection, the settle queue, and the graph/BFS ideas behind all of this.
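To make the primer's vocabulary concrete, here is a minimal Rust sketch of one group-resolution step: a breadth-first search collects every node connected through conducting transistors, then the group's level is resolved. It is an illustration under simplifying assumptions (NMOS-only, ground beats power, no pull-ups, no settle queue), and every name in it (`Transistor`, `GND`, `VCC`, `collect_group`, `resolve_group`) is hypothetical rather than taken from the AprVisual source.

```rust
use std::collections::VecDeque;

/// One NMOS transistor: conducts between `c1` and `c2` while `gate` is high.
/// (Hypothetical layout for illustration; not the AprVisual data structures.)
pub struct Transistor {
    pub gate: usize,
    pub c1: usize,
    pub c2: usize,
}

pub const GND: usize = 0; // assumed node id for ground
pub const VCC: usize = 1; // assumed node id for power

/// Breadth-first search: collect every node electrically connected to `start`
/// through transistors whose gate is currently high.
/// This scans all transistors per node for brevity; a real engine would
/// index transistors by the nodes they touch.
pub fn collect_group(start: usize, state: &[bool], transistors: &[Transistor]) -> Vec<usize> {
    let mut seen = vec![false; state.len()];
    let mut queue = VecDeque::new();
    let mut group = Vec::new();
    seen[start] = true;
    queue.push_back(start);
    while let Some(node) = queue.pop_front() {
        group.push(node);
        // Supply rails end the search: we never walk "through" a rail.
        if node == GND || node == VCC {
            continue;
        }
        for t in transistors {
            if !state[t.gate] {
                continue; // transistor is off, so it does not conduct
            }
            let other = if t.c1 == node {
                t.c2
            } else if t.c2 == node {
                t.c1
            } else {
                continue;
            };
            if !seen[other] {
                seen[other] = true;
                queue.push_back(other);
            }
        }
    }
    group
}

/// Resolve a group's logic level: ground wins, then power; otherwise the
/// group keeps the stored level of its first node (a deliberate simplification).
pub fn resolve_group(group: &[usize], state: &[bool]) -> bool {
    if group.contains(&GND) {
        false
    } else if group.contains(&VCC) {
        true
    } else {
        state[group[0]]
    }
}
```

For example, with a single transistor `{ gate: 2, c1: 3, c2: GND }` and node 2 high, `collect_group(3, ...)` returns nodes 3 and 0, and `resolve_group` pulls the group low. With node 2 low, node 3 is alone in its group and keeps its stored level.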

The honest story

The original plan was a four-stage pipeline (S1 switch-level engine → S2 netlist→IR → S3 CPU proof → S4 codegen + GPU) to push the simulation toward real time. We built and verified those stages, and found a counter-intuitive result: the "obvious" abstractions (IR + codegen, or a GPU kernel) ran slower than the direct switch-level interpreter, because of code bloat, lost timing/correctness, and algorithmic redundancy in batch re-evaluation. Real time remains ~470× out of reach and is known to be unreachable via this route.

So the focus became pushing S1 — the pure switch-level engine — to its limit, in both C# and Rust, and documenting the wins and the (many) dead-ends. The recurring lesson, which independently matches what the Visual NES author found in 2017: the gains come from less work + smaller (cache-fitting) data + tighter codegen, not from cleverer data structures or algorithms.

A final, CPU-first investigation ("Escape-1") then asked whether the chip could be automatically abstracted to logic for speed, accepting behavioral (not per-node bit-exact) fidelity. It proved that ~98.9% of the chip is reducible to logic + registers (only ~1.1% is genuine analog) — yet still couldn't beat the event-driven engine, because that engine already runs at the netlist's natural minimum granularity. The full, data-backed write-up — including a reusable "which acceleration strategy fits which chip" map — is the study paper →.

Lineage

Visual6502 (chipsim.js) → MetalNES → AprVisual S1 (C# + Rust).

The site documents each step with source-line citations.

Performance

On an AMD Ryzen 7 3700X (at boost clock), benchmarking full_palette (300k master half-cycles; "hc" below is short for half-cycles):

Engine | Rate | Per frame | vs NES NTSC real time
C# (AprVisual.S1) | ~91K hc/s | ~7.9 s | ~472× too slow
Rust (rust-s1) | ~83K hc/s | ~8.6 s | ~517× too slow

Rates are top-3 means. C# now leads by ~9%, thanks to C#-only data-layout/dispatch wins that measured net-negative on Rust's LLVM codegen (see the optimization notes). Both engines produce bit-identical output, with the same checksum 0x794A43ABDF169ADA. NES NTSC real time needs 42,954,552 hc/s.
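The "too slow" factors and per-frame times above follow from simple division, which this short Rust sketch reproduces. The real-time figure is the one quoted in this README. The NTSC frame rate of roughly 60.0988 frames per second is an assumption brought in from general NES knowledge, not something this README states, and the function names are hypothetical.

```rust
/// Half-cycles per second needed for NES NTSC real time (figure quoted in this README).
pub const REALTIME_HC_PER_SEC: f64 = 42_954_552.0;

/// Assumed NTSC NES frame rate in frames per second (approximate; not from this README).
pub const NTSC_FPS: f64 = 60.0988;

/// How many times slower than real time an engine running at `rate_hc_per_sec` is.
pub fn slowdown(rate_hc_per_sec: f64) -> f64 {
    REALTIME_HC_PER_SEC / rate_hc_per_sec
}

/// Wall-clock seconds needed to simulate one NTSC frame at the given rate.
pub fn seconds_per_frame(rate_hc_per_sec: f64) -> f64 {
    let hc_per_frame = REALTIME_HC_PER_SEC / NTSC_FPS;
    hc_per_frame / rate_hc_per_sec
}
```

Plugging in the rounded rates, `slowdown(91_000.0)` is about 472 and `slowdown(83_000.0)` is about 517.5, while `seconds_per_frame` gives roughly 7.85 s and 8.61 s. These agree with the table to within the rounding of the quoted rates.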

Measurement note. The hot loop is memory-latency-bound, so throughput scales with CPU clock and is sensitive to boost/thermal state (e.g. the same engine reads ~91K at boost but a rock-steady ~76.5K pinned at base 3.6 GHz). For trustworthy A/B of a sub-1% change, lock the CPU to a fixed frequency and use interleaved-paired runs with the median — absolute single-run numbers drift too much to compare. Got a faster CPU? Run the benchmark and share your numbers.
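One way to analyze interleaved-paired runs is to take the ratio within each back-to-back pair and report the median of those ratios. The Rust sketch below shows only that analysis step. It is an assumption about how such a comparison could be scored, not the project's actual benchmarking harness, and the names `median` and `median_paired_ratio` are hypothetical.

```rust
/// Median of a slice of samples; returns None for an empty slice.
pub fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        // Even count: average the two middle values.
        (sorted[mid - 1] + sorted[mid]) / 2.0
    })
}

/// Each pair is (baseline_rate, candidate_rate) from two back-to-back runs
/// (A, B, A, B, ...), so slow drift in clock or temperature affects both
/// halves of a pair about equally.
/// Returns the median candidate/baseline ratio; above 1.0 means the candidate is faster.
pub fn median_paired_ratio(pairs: &[(f64, f64)]) -> Option<f64> {
    let ratios: Vec<f64> = pairs.iter().map(|&(base, cand)| cand / base).collect();
    median(&ratios)
}
```

Comparing within pairs cancels most of the boost/thermal drift described above, and the median keeps a single disturbed run from dominating the result.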

Run the benchmark

The easiest path is the prebuilt, self-contained package (no .NET install needed; Windows + macOS, both engines):

  1. Download AprVisualBenchMark.zip and unzip.
  2. Windows: run_csharp.bat / run_rust.bat (optional arg = half-cycle count). macOS: chmod +x *.sh then ./run_csharp.sh / ./run_rust.sh.
  3. Each run prints a performance block and writes a parseable JSON .log to log/ — upload it to the leaderboard.

Build from source

Requires the .NET 10 SDK (and Rust/cargo for the Rust engine).

dotnet build AprVisual.sln                 # C# engines
( cd experiment/rust-s1 && cargo build --release )   # Rust engine (rust-s1)

The optimized switch-level engine lives in src/AprVisual.S1/ (C#, headless console) and experiment/rust-s1/ (Rust). See src/AprVisual.Deprecated/README.md for the original layout, and MD/ for the (Traditional Chinese) design docs.

Repository layout

Path | What
src/AprVisual.S1/ | The S1 switch-level engine in C#: the golden/canonical artifact and the focus of optimization.
experiment/rust-s1/ | The Rust port of the S1 engine (bit-identical).
src/AprVisual.S2/ | The Escape-1 investigation engine (automatic logic extraction; --miter/--compile/--cones). Concluded; kept as the negative-result record.
src/AprVisual.Deprecated/ | The original WinForms engine + tooling (rendering, ROM parsing) + the S2/S3/S4 IR/codegen/GPU experiments. Reference only.
WebSite/ | The GitHub Pages project site (served at the link above).
MD/ | Design & analysis docs (Traditional Chinese).
tools/ | Helper scripts (benchmark packaging, mail, knowledge-base query).

Credits & license

AprVisual's S1 engine is an independent C#/Rust reimplementation of the MetalNES wire / group-resolution algorithm, which itself descends from Visual6502's chipsim (simulator code: MIT). The switch-level model traces to R. E. Bryant's work (MOSSIM, 1981; IEEE TC, 1984).

The bundled 2A03 / 2C02 netlist data is derived from the Visual 2A03 / Visual 2C02 / Visual6502 projects and is licensed CC-BY-NC-SA — keep the attribution and the same license if you redistribute it. The engine code is by the AprVisual author.
