The answer is: very slowly YES!
I've recorded keyboard inputs that can be replayed to complete the first level. It finishes at in-game time 316, but wall-clock time is 32 minutes. Transitions and starting the second level add another 15 minutes. 😀
Demo (speedup 50x):
demo.mp4
I took an existing minimal emulator, removed all CPU logic, and replaced it with a socket-based protocol for delegating CPU execution to Ghidra's PCode emulator (server). Everything else is still handled by the modified emulator (client), such as keyboard input and PPU logic.
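As a rough illustration of what a message exchanged between client and server could look like, here is a minimal sketch of a fixed-size frame codec in C. The layout (1-byte opcode, 16-bit little-endian address, 8-bit data) and every name in it (`emu_msg`, `emu_msg_pack`, `emu_msg_unpack`, `MSG_READ`, `MSG_WRITE`) are assumptions made for this example; they are not the actual wire format used by `NesEmu.java` and `smolnes_emuclt`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message kinds; the real protocol may define others. */
enum emu_op { MSG_READ = 1, MSG_WRITE = 2 };

/* One bus access, sized for the NES: 16-bit address, 8-bit data. */
struct emu_msg {
    uint8_t  op;
    uint16_t addr;
    uint8_t  data;
};

#define EMU_MSG_SIZE 4u

/* Serialize into a fixed 4-byte frame: op, addr low, addr high, data.
 * Returns the number of bytes written. */
size_t emu_msg_pack(const struct emu_msg *m, uint8_t out[EMU_MSG_SIZE]) {
    assert(m != NULL && out != NULL);
    out[0] = m->op;
    out[1] = (uint8_t)(m->addr & 0xff);
    out[2] = (uint8_t)(m->addr >> 8);
    out[3] = m->data;
    return EMU_MSG_SIZE;
}

/* Parse a frame. Returns 0 on success, -1 if the input is too short
 * or carries an unknown opcode. */
int emu_msg_unpack(const uint8_t *in, size_t len, struct emu_msg *m) {
    if (len < EMU_MSG_SIZE) return -1;
    if (in[0] != MSG_READ && in[0] != MSG_WRITE) return -1;
    m->op = in[0];
    m->addr = (uint16_t)(in[1] | (in[2] << 8));
    m->data = in[3];
    return 0;
}
```

Serializing byte by byte, rather than writing the struct directly to the socket, keeps the frame independent of compiler padding and host endianness, which matters when the two ends are a C program and a Java script.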
Processor module validation! Sure, Ghidra has pcodetest for this purpose, but it's hard to tell how much coverage it provides. Apparently, not enough!
Just getting the Super Mario Bros title screen to render required fixing bugs in 3 instructions. Even more were fixed while appeasing nestest.nes.
Before (some tests fail, until a crash after jumping to an invalid instruction):
After (all tests pass):
Tested with Ghidra 10.3.2, on Debian GNU/Linux 12.
To reproduce the first level run:
- Copy `Ghidra/Processors/6502/data/languages/*.slaspec` from my fork to your Ghidra installation, then run `ant -f build.xml` under `data/` to build the updated `.sla` files;
- Install GhidraNes (tested with commit `ef27b8d`);
- Load a Super Mario Bros (World) ROM (sha1 `ea343f4e445a9050d4b4fbac2c77d0693b1d0922`), and make sure it's focused in the listing (a.k.a. disassembly) window (in case you have other files open);
- Copy `./ghidra_scripts/NesEmu.java` to your project's `ghidra_scripts` directory;
- Copy `./inputs/smb.w11full.inputs` to `/tmp/smb.inputs`;
- On Ghidra's Window > Script Manager, run `NesEmu.java` (starts the server);
- Run `make && ./smolnes_emuclt $ROM` (starts the client; `$ROM` is the full path to the same ROM being disassembled in Ghidra);
- Sit back and enjoy a ~1 FPS demo.
Of course, you can remove `/tmp/smb.inputs` and play yourself.
- As seen in the demo, Ghidra is constantly re-analyzing functions, caused by frantic clearing and disassembling of instructions. Not much room to improve here, since only disassembled instructions can be executed.
- I've run into some desync between recording inputs in the standalone emulator and replaying them in Ghidra's emulator. This means that inputs likely end up being set at different instructions. Expect diffs in e.g. how many VBlank interrupts happen when comparing both CPU emulators' trace logs. Still, it wasn't bad enough to break the demo; please let me know if that's not the case for you.
- Currently, the protocol is hardcoded to NES implementation details, and would benefit from a proper TLV encoding to handle variable address / data sizes.
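To make the TLV idea concrete, here is a minimal sketch in C of a type-length-value field encoder and decoder. The field layout (1-byte tag, 2-byte little-endian length, then the value bytes) and the names `tlv_put`, `tlv_get`, `TLV_ADDR` and `TLV_DATA` are assumptions chosen for this example, not part of the current protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field tags. */
enum tlv_tag { TLV_ADDR = 1, TLV_DATA = 2 };

/* Append one field as: 1-byte tag, 2-byte little-endian length, value.
 * Returns the number of bytes written, or 0 if `out` is too small. */
size_t tlv_put(uint8_t *out, size_t cap, uint8_t tag,
               const void *val, uint16_t len) {
    assert(out != NULL && (val != NULL || len == 0));
    if (cap < 3u + (size_t)len) return 0;
    out[0] = tag;
    out[1] = (uint8_t)(len & 0xff);
    out[2] = (uint8_t)(len >> 8);
    if (len) memcpy(out + 3, val, len);
    return 3u + (size_t)len;
}

/* Read one field. On success, *val points into `in` (no copy is made).
 * Returns the number of bytes consumed, or 0 if the input is truncated. */
size_t tlv_get(const uint8_t *in, size_t avail, uint8_t *tag,
               const uint8_t **val, uint16_t *len) {
    if (avail < 3) return 0;
    uint16_t n = (uint16_t)(in[1] | (in[2] << 8));
    if (avail - 3 < n) return 0;
    *tag = in[0];
    *len = n;
    *val = in + 3;
    return 3u + (size_t)n;
}
```

With this shape, a 16-bit NES address and a 32-bit address for another architecture would both be `TLV_ADDR` fields that differ only in their length, so the receiver does not need to know the address width in advance.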
Some flamegraphs were captured with async-profiler: `./asprof -e itimer -d 30 -o flamegraph -f /tmp/out.html $GHIDRA_PID`
Surprisingly, stepping through instructions only takes about 15% of CPU time. About 50% is socket I/O (even after some quick optimizations like reusing the same buffer for payloads and buffering socket writes):
- afl_ghidra_emu has a similar socket-based architecture, but applied to fuzzing (Ghidra receives input data and sends coverage results back to AFL++);
- `deobfuscated.c` from smolnes is under LICENSE.smolnes, and was modified into files `smolnes_emuclt.c` and `smolnes_standalone.c`;
- Remaining files are under LICENSE.