An ESP32 firmware simulator driven by an agent-device-style CLI, with two backends behind one contract:

- Host-native (default): boots ESP32 / Arduino firmware compiled from source on your machine, renders the device display into an offscreen framebuffer, and drives it: screenshots, input injection (GPIO, buttons, touch), serial, and scripted scenarios. Fast, deterministic, no cross-toolchain needed.
- QEMU (optional): boots a real compiled flash image under Espressif's QEMU fork and drives it through the same CLI: serial for any image, plus screenshots and the live window for firmware built against the fork's virtual RGB panel. See "The QEMU backend" below.

It is a reusable tool, not tied to any one app. A firmware is compiled once
and is board-agnostic (it renders itself from board_caps() at runtime); a
board target selects the panel it runs on. The first onboarded firmware is
agentgauge, a Wi-Fi Claude usage-limit desk gauge, shown here on its Waveshare
ESP32-S3-Touch-AMOLED-1.8 board (waveshare_amoled_18, 480x480). Three more
targets show the breadth, all
with zero app-specific sim code: sample_gfx (a generic Arduino_GFX sketch),
and two takes on the Cheap Yellow Display (ESP32-2432S028R, 320x240) - cyd,
an Arduino_GFX touch-paint demo, and cyd_tft, a touch-button UI written
against the real TFT_eSPI library that runs unmodified. Supported display
libraries: LVGL, Arduino_GFX, and TFT_eSPI, plus a touch bus (sim_touch) for
non-LVGL sketches.

```sh
make build
./build/esprite list-targets
make screenshot TARGET=sample_gfx   # writes sample_gfx.png
make screenshot TARGET=cyd_tft      # writes cyd_tft.png
make test                           # unit + integration tests
make install PREFIX=~/.local        # optimized build onto your PATH
```

Requirements: CMake >= 3.20 and a C++17 compiler (Apple clang or gcc/clang on Linux). LVGL and ArduinoJson are fetched automatically; doctest and stb are vendored.

The waveshare_amoled_18 target runs the agentgauge firmware, which lives in a separate
checkout: without it the build skips the target (with a CMake warning) and
everything else works. Point -DAGENTGAUGE_SRC=/path/to/firmware/src (or
make build AGENTGAUGE_SRC=...) at it to enable it. Prebuilt release
binaries carry the generic targets only, for the same reason.
The waveshare_amoled_18 target boots the real agentgauge firmware, so injecting a limits snapshot
drives the genuine data path (HTTP POST to the on-device server, parsed by the
firmware's own handler) and the real UI updates:

```sh
./build/esprite snapshot \
  '{"lim":1,"s5":42,"s5r":180,"s7":10,"s7r":6000}' \
  --target waveshare_amoled_18 --shot limits.png
```

For an interactive, iOS-Simulator-style view, run `serve` with `--window`:

```sh
esprite serve --target waveshare_amoled_18 --port 8080 --window
```

This opens a native SDL2 window: the device screen, pixel-exact, inside a slim
device bezel. The board's physical buttons appear as clickable nubs on the
bezel edge at their declared positions (each target's board.cpp places
them); hover a nub for its label and keyboard shortcut, press ? for the
full key list, and press ` (backtick) for the hardware-controls panel
(battery level and charging/USB toggles, rotation) on boards that have them.
Mouse on the screen = touch (click and drag); each button also has a
board-declared key (waveshare_amoled_18: space = PRIMARY, tab = SECONDARY,
p = PWR); Esc closes an open overlay, then quits. PWR follows the
hardware's hold semantics: a quick press or click is a short press; holding
past 1.5 s emits the long-press edge (for a firmware's hold-release gesture).
Bezel chrome renders at desktop density, so --scale N enlarges the screen
without blowing up the controls. Ctrl-C stops serve cleanly.
For BLE firmwares, serve --ble-port N additionally exposes the virtual BLE
link as newline-delimited JSON on a localhost TCP socket: connecting acts as a
bonded central, lines written go to the device, and the device's lines stream
back. Any host process (a companion app via a small adapter, a script, even
nc) can drive the simulated device live (point --target at a BLE firmware
that binds the virtual link):

```sh
esprite serve --target <ble-firmware-target> --ble-port 9091 --window &
printf '{"cmd":"status"}\n' | nc 127.0.0.1 9091
```

Point a live bridge at the same port and the real data updates in the window in real time.

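
The `nc` one-liner can also be written as a small script. The sketch below connects to the BLE port, writes one JSON line, and reads one line back. The `{"cmd":"status"}` payload is taken from the example above; the helper name `ble_request` is hypothetical, and the code assumes the firmware answers with at least one newline-terminated line, which depends on the firmware in question.

```python
import json
import socket


def ble_request(message, host="127.0.0.1", port=9091, timeout=5.0):
    """Send one JSON object as a line and return the first line sent back.

    Assumes the device replies with at least one newline-terminated line;
    a timeout error is raised if nothing arrives within `timeout` seconds.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The link is newline-delimited JSON, so terminate the object with "\n".
        sock.sendall((json.dumps(message) + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            return reader.readline().rstrip("\n")
```

With `esprite serve ... --ble-port 9091` running, `ble_request({"cmd": "status"})` would return the device's first reply line as a string.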
The window is optional: it is only compiled when SDL2 is found at configure time
(`brew install sdl2` on macOS). Without SDL2, everything else builds and runs
headless, and `--window` prints a hint. `--scale N` enlarges the window N times.
## CLI

```
esprite [--target NAME] [args]

  list-targets                          list onboarded targets
  schema                                machine-readable JSON of all commands
  --version                             print name and version
  ui                                    snapshot the LVGL widget tree (refs for tap --ref)
  screenshot OUT.png [--steps N]        boot, run, write a PNG
  snapshot '' [--path P] [--shot OUT]   POST to the device webserver
  tap X Y | tap --ref eN [--shot OUT]   inject a touch
  button primary|secondary|pwr [--shot OUT]
                                        press a button; pwr-long / pwr-release
                                        inject the power button's hold-gesture edges
  battery PCT [--charging] [--no-vbus] [--shot OUT]
  rotate 0..3 [--shot OUT]              set IMU rotation quadrant
  motion [--shot OUT]                   inject one accelerometer wake nudge (needs an IMU board)
  gpio PIN LEVEL                        set a GPIO level
  ble connect|pair|disconnect|send|recv|hid
                                        drive a BLE firmware's virtual link
                                        (connect [--passkey N], send '', recv lines,
                                        hid captured keyboard reports)
  serial send 'TEXT'                    feed the device serial input
  serial expect 'REGEX'                 match against captured serial output
  logs                                  print captured serial output
  scenario FILE.json                    run a scripted scenario
  serve [--window] [--scale N]          boot and keep pumping for a live bridge;
                                        --window opens an interactive SDL window
                                        (mouse/keys drive it)
  run                                   daemon: newline-delimited JSON commands on stdin
```

Errors are structured (`{"error":{"kind":...,"message":...}}` on stderr) with
documented exit codes per kind; see `esprite schema`.
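
A caller can turn that error shape into an exception. The following is a minimal sketch, not part of esprite: the `EspriteError` class and `parse_error` helper are hypothetical names, and the code assumes the error object is printed on a single line of stderr, possibly mixed with plain-text log lines.

```python
import json


class EspriteError(Exception):
    """Carries the `kind` and `message` fields of a structured esprite error."""

    def __init__(self, kind, message):
        super().__init__(f"{kind}: {message}")
        self.kind = kind
        self.message = message


def parse_error(stderr_text):
    """Return an EspriteError for the first structured error line, or None.

    Scans line by line so that non-JSON log lines on stderr are skipped.
    """
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue
        error = payload.get("error") if isinstance(payload, dict) else None
        if isinstance(error, dict) and "kind" in error:
            return EspriteError(error["kind"], error.get("message", ""))
    return None
```

Branching on `err.kind` rather than on message text keeps a harness stable if the wording of messages changes; the authoritative list of kinds and exit codes is whatever `esprite schema` reports.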
Scenarios are ordered JSON steps, useful in CI:
```json
{
  "target": "waveshare_amoled_18",
  "steps": [
    { "cmd": "screenshot", "out": "01-waiting.png" },
    { "cmd": "snapshot", "data": {"lim":1,"s5":42,"s5r":180,"s7":10,"s7r":6000} },
    { "cmd": "screenshot", "out": "02-limits.png" }
  ]
}
```

esprite schema prints the machine-readable clispec contract (commands, args,
output fields, error kinds, exit codes). Commands emit JSON on stdout; logs go to
stderr.
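
Scenario files in the shape shown earlier can be generated rather than hand-written, for example to run the same before/after capture with different snapshot payloads. This sketch uses only the step fields that appear in the scenario example (`cmd`, `out`, `data`); the helper names are hypothetical, and any other step types should be checked against the scenario step reference.

```python
import json


def build_scenario(target, snapshot, before_png, after_png):
    """Return a scenario dict: screenshot, inject a snapshot, screenshot again."""
    return {
        "target": target,
        "steps": [
            {"cmd": "screenshot", "out": before_png},
            {"cmd": "snapshot", "data": snapshot},
            {"cmd": "screenshot", "out": after_png},
        ],
    }


def write_scenario(path, scenario):
    """Write the scenario as JSON so `esprite scenario PATH` can run it."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(scenario, handle, indent=2)
        handle.write("\n")
```

A CI job could call `write_scenario("limits.json", build_scenario(...))` and then run `esprite scenario limits.json`.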
For LVGL targets there is a snapshot-ref model like a browser page snapshot:

```sh
esprite ui --target waveshare_amoled_18
# [{"ref":"e6","type":"bar","x":36,"y":168,"w":408,"h":24,"value":42}, ...]
esprite tap --ref e6 --target waveshare_amoled_18   # tap that widget, not a pixel
```

`ui` returns the live widget tree (refs, type, coords, text, bar/arc values), so an agent reads the UI structurally instead of guessing pixels. `tap --ref` acts on a ref; `tap X Y` is the pixel fallback.

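
An agent or test would typically filter the `ui` output to pick the ref to act on. The sketch below does that over the JSON array. The `ref` and `type` keys appear in the sample output above; the `text` key name is an assumption based on the description of the tree, and `find_refs` is a hypothetical helper.

```python
import json


def find_refs(ui_json, widget_type=None, text=None):
    """Return the refs of widgets matching the given type and/or text.

    `ui_json` is the JSON array printed by `esprite ui`. Filters that are
    left as None are ignored, so calling with no filters returns every ref.
    """
    refs = []
    for widget in json.loads(ui_json):
        if widget_type is not None and widget.get("type") != widget_type:
            continue
        if text is not None and widget.get("text") != text:  # "text" key is assumed
            continue
        refs.append(widget["ref"])
    return refs
```

The returned ref can then be passed to `esprite tap --ref`.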
The run daemon is a persistent session where refs from ui stay valid across
the session (one boot per session; steps advances virtual time explicitly):

```
{"cmd":"boot","target":"waveshare_amoled_18"}
{"cmd":"snapshot","data":{"lim":1,"s5":42,"s5r":180,"s7":10,"s7r":6000}}
{"cmd":"ui"}                          # read the updated tree, get refs
{"cmd":"tap","ref":"e6"}              # act on a ref
{"cmd":"steps","n":50}                # run 50 loop() iterations
{"cmd":"screenshot","out":"out.png"}
{"cmd":"quit"}
```

Raw-GFX targets (no widget tree) return [] from ui and are driven by pixels +
screenshots.
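
A fixed daemon session can be fed from a script. This is a rough sketch under stated assumptions: the default binary path `./build/esprite` is borrowed from the build examples, the helper names are hypothetical, and the whole session is sent up front, so it cannot react to `ui` output mid-session (that would need incremental reads and writes).

```python
import json
import subprocess


def encode_session(commands):
    """Serialize commands as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(cmd, separators=(",", ":")) + "\n" for cmd in commands)


def run_session(commands, binary="./build/esprite"):
    """Feed a whole session to `esprite run`; return (stdout, stderr, exit code)."""
    proc = subprocess.run(
        [binary, "run"],
        input=encode_session(commands),
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect the exit code and stderr
    )
    return proc.stdout, proc.stderr, proc.returncode
```

For example, `run_session([{"cmd": "boot", "target": "waveshare_amoled_18"}, {"cmd": "steps", "n": 50}, {"cmd": "quit"}])` would boot once, advance 50 iterations, and exit.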
## The QEMU backend

Every target above runs host-native. The qemu_esp32c3 target instead boots a
real ESP32-C3 flash image under Espressif's QEMU fork. It sees what the
host-native backend cannot: the real RTOS scheduler, real heap pressure,
watchdogs, binary-only components, and it runs images you did not build
yourself.

```sh
make qemu-fetch       # pinned prebuilt Espressif QEMU (no source build)
ESPRITE_QEMU_IMAGE=path/to/flash.bin \
  ./build/esprite serial expect 'Hello world' --target qemu_esp32c3
make qemu-fixtures    # scripted demo images (needs docker + arduino-cli)
make qemu-test        # gated integration tests (self-skip without QEMU)
```

Tier 1 (any image): `serial`, `logs`, and headless `serve` work; every other
command degrades explicitly to unsupported, exactly like a board without a
battery rejects battery. Execution is deterministic on ESP32-C3 (icount:
same image, same serial bytes, every run); ESP32/S3 (Xtensa) run wall-clock
only in the current fork release. list-targets reports each target's
backend, and a missing emulator or image yields the backend_unavailable
error kind with the missing piece named.
Tier 2 (cooperating firmware) adds the display and input: build the firmware
against Espressif's esp_lcd_qemu_rgb component and boot it on
qemu_esp32c3_rgb (320x240 virtual RGB panel), and screenshot,
serve --shot, and the live --window work exactly as on native targets,
fed by QMP screendump. Draw full frames: the virtual panel consumes one
pending draw per host-side capture, so per-line drawing stalls headless
firmware.
Input is the same cooperation model: the firmware runs esprite's tiny
esprite_qemu_agent component (tools/qemu/esprite_qemu_agent, one task on
UART1), and tap, swipe, gpio, and button inject through it - the
fork emulates no GPIO or touch hardware, so the firmware polls the agent's
APIs (esprite_agent_touch_events, esprite_agent_gpio_events) instead of
the hardware drivers. scenario runs on qemu targets too: settle is the
portable time verb, and the pixel step (a framebuffer assertion with a
retry deadline) plus byte-exact screenshot goldens make emulator UI tests
deterministic; see scenarios/qemu_esp32c3_rgb.json for a
tap-press-and-post example against the bundled fixture.
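
The idea behind a framebuffer assertion with a retry deadline can be sketched independently of esprite. This is an illustration of the technique, not esprite's implementation: `capture` is a hypothetical callable returning a frame indexed as `frame[y][x]`, and the default deadline and interval are arbitrary.

```python
import time


def wait_for_pixel(capture, x, y, expected, deadline_s=2.0, interval_s=0.05,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll `capture()` until the pixel at (x, y) equals `expected`.

    Returns the number of captures taken on success and raises
    AssertionError once `deadline_s` has elapsed without a match.
    `clock` and `sleep` are injectable so the loop can be tested
    without real waiting.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        actual = capture()[y][x]
        if actual == expected:
            return attempts
        if clock() - start >= deadline_s:
            raise AssertionError(
                f"pixel ({x},{y}) was {actual}, expected {expected} "
                f"after {attempts} captures"
            )
        sleep(interval_s)
```

Retrying matters on an emulator because the frame may not have been drawn yet when the first capture happens; a single-shot comparison would fail intermittently under load.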
Networking closes the loop: the machine emulates an OpenCores ethernet, and
a board spec with "http": {"guest_port": N} gets user-mode networking with
a localhost port forwarded into the guest, so snapshot POSTs into the
firmware's real HTTP server (lwIP over the emulated NIC; build with
CONFIG_ETH_USE_OPENETH=y). serve prints the forwarded URL for live
bridges.
Qemu targets are data, not code: targets/qemu/*.json (key, machine, arch,
optional display dimensions, agent flag, buttons, http capability) ship
inside the binary, and ESPRITE_QEMU_BOARD=/path/to/board.json registers
your own board at runtime without a rebuild. Board-spec buttons render as
bezel nubs in --window (view-only on qemu: window clicks do not route
through the agent yet).
tools/qemu/lvgl_demo is a genuine LVGL 9 application (a two-screen device
control panel) proving the recipe end to end - same board spec as the rgb
fixture, different image (ESPRITE_QEMU_IMAGE selects the firmware):

- Registry deps: `lvgl/lvgl ^9`, `espressif/esp_lvgl_port ^2`, `espressif/esp_lcd_qemu_rgb ^1`; esprite components: `esprite_qemu_agent` (input transport) and `esp_lcd_touch_esprite` (the standard `esp_lcd_touch` driver contract over the agent, so touch reaches LVGL through a normal indev driver, no esprite-specific app code).
- Display rule: full-refresh mode with a single static full-frame buffer (the emulated panel consumes one draw per host capture; partial flushes stall). On RAM-tight chips allocate the buffer statically and wire `lv_display_create`/`set_buffers`/`set_flush_cb` yourself - `lvgl_port_add_disp` only heap-allocates - keeping `esp_lvgl_port` for task, tick, and locking.
- esprite pumps display captures around every injection, so a UI task blocked in a flush still observes taps; `scenarios/qemu_esp32c3_rgb_lvgl.json` taps its switch and slider, presses BOOT, and byte-compares the frames.

The two backends answer different questions. Host-native compiles the
firmware's source against shims: it is the fast, deterministic authority on
the app's own behavior, and it can see inside (the ui snapshot-ref model
walks the live LVGL tree). QEMU runs the real compiled image: it is the
authority on everything below the app that shims cannot reproduce, at the
cost of wall-clock timing and firmware cooperation for anything beyond
serial.
| Capability | Host-native | QEMU tier 1 (any image) | QEMU tier 2 (cooperating firmware) |
|---|---|---|---|
| App logic, UI rendering, data parsing | authoritative | runs, observable via serial | runs, observable via display |
| RTOS scheduling, heap/stack pressure, watchdogs | not visible | authoritative | authoritative |
| Toolchain/arch bugs, binary-only components, unbuildable images | no | yes | yes |
| Serial, logs | yes | yes | yes |
| Display (`screenshot`, `serve --shot`/`--window`) | yes | no | yes (`esp_lcd_qemu_rgb`) |
| Input (`tap`, `swipe`, `gpio`, `button`) | yes | no | yes (`esprite_qemu_agent`) |
| HTTP `snapshot` | yes | no | yes (openeth + port forward) |
| `ui` widget refs | yes | no | no (out of process, inherent) |
| BLE (virtual link, `--ble-port`) | yes | no | no (not emulated upstream) |
| `battery`, `rotate`, `motion` | yes | no | no |
| Time control | `steps` (exact loop iterations) | `settle` (wall-clock) | `settle` (wall-clock) |
| Determinism | fully deterministic (virtual clock) | ESP32-C3: byte-exact serial across runs; ESP32/S3: wall-clock, load-sensitive | same per architecture |
| Boot speed | milliseconds | seconds | seconds |
Tier 2 is a firmware choice, not an esprite switch: build against the
QEMU-facing components listed above and declare the matching capabilities in
the board spec. Anything a target cannot do fails as unsupported with the
missing piece named, never silently.
The firmware's own source files are compiled unchanged. Only two things are swapped: a set of host shims that stand in for the Arduino / ESP-IDF APIs, and a board layer that binds the app's hardware calls to virtual peripherals.

```
core/         virtual clock + setup()/loop() pump + target registry +
              the SimBackend seam (native vs qemu)
shims/        Arduino, ESP-IDF, and networking APIs (host-backed)
peripherals/  framebuffer + PNG screenshot, LVGL glue, Arduino_GFX shim,
              injected input bus
backends/     the QEMU backend: child process driven over QMP + stdio serial
cli/          the esprite CLI
targets/      one folder per onboarded app
```

The runtime is single-threaded and deterministic: millis() is driven by
delay() and the step count, so a given number of loop() iterations always
produces the same frame. Injected input and data are applied between steps.
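
The determinism argument can be illustrated with a toy model of a step-driven clock. This is a sketch of the idea only, not esprite's C++ core: the class, the 1 ms per-step tick, and the fake `loop` are all assumptions made for the example.

```python
class VirtualClock:
    """Toy step-driven clock: time moves only when the pump or delay() says so."""

    def __init__(self, ms_per_step=1):
        self.now_ms = 0
        self.ms_per_step = ms_per_step  # assumed tick size, chosen for the example

    def millis(self):
        return self.now_ms

    def delay(self, ms):
        # A delay advances virtual time instantly instead of sleeping.
        self.now_ms += ms

    def step(self, loop):
        # One pump iteration: run loop() once, then account for the step itself.
        loop(self)
        self.now_ms += self.ms_per_step


def run(steps):
    """Run a fake firmware loop for `steps` iterations and return its millis() trace."""
    clock = VirtualClock()
    trace = []

    def loop(clk):
        trace.append(clk.millis())
        clk.delay(10)

    for _ in range(steps):
        clock.step(loop)
    return trace
```

Because nothing reads the host's wall clock, `run(3)` yields the same trace on every machine and every run, which is the property that makes a fixed step count produce a fixed frame.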
Two guides cover adoption end to end:
- docs/onboarding.md - adding your firmware: the zero-adapter path for standard sketches, the HAL-adapter path for real products (firmware compiled once, boards as small profiles), and QEMU board specs as JSON (no esprite changes at all).
- docs/integration.md - using esprite as a test harness inside your firmware repo: the Makefile pattern, the scenario step reference, driving it from agents (`schema`, the `run` session, widget refs), and worked CI examples for both backends.

The host-native backend compiles source against shims, so some things are approximated by design: virtual time rather than RTOS scheduling; instant faked Wi-Fi and BLE with no real radio (no MTU fragmentation, advertising intervals, or bonding storage); no QSPI or panel electrical quirks; no PSRAM exhaustion; audio is silent. Neither backend substitutes for on-hardware QA of timing, radio, or panel electrical behavior - the matrix above says which backend to trust for everything else.
- LVGL is pinned to 9.5.0 for a stable core API surface. LVGL-based targets currently share one LVGL version across the build.
- The agentgauge firmware source is referenced read-only from `../agentgauge/firmware/src`; override with `-DAGENTGAUGE_SRC=...`.