Epic/05 fast debug traces#5
Merged
diegoparrilla merged 10 commits into main on May 4, 2026
…ugcap module
Reuses the existing commemul ROM3 capture pipeline (PIO + DMA +
chandler ingest) and layers a second consumer alongside the
TPROTOCOL frame parser. The debug-byte filter recognises captures
where the high byte is 0xFF (= addresses in the 256-byte window
$FBFF00..$FBFFFF), extracts the low byte as the emitted char,
and routes it to a small ring drained to the RP debug console
via DPRINTF.
Why this design (vs the abandoned v1 ROM4-upper-half approach):
- No new PIO program / SM / DMA channel — commemul already runs.
- No timing race against romemul's drive phase — ROM3 captures
are already sampled at the right moment.
- No ROM4 layout change, no __StackLimit linker change, no
ADV_LOAD_BUF relocation.
- m68k user-mode reads at $FBFF00..$FBFFFF reach the bus the same
way TPROTOCOL reads at $FB8000..$FBFFFF do (which is proven
by GEMDRIVE / Runner / chandler shipping).
Coexistence with TPROTOCOL is by design rather than by
discriminator-range reservation: TPROTOCOL frames are temporally
atomic (the m68k transmits a full magic+cmd+size+payload+checksum
burst in one go; debug emits cannot interleave), and a captured
TPROTOCOL payload word that happens to land in $FF00..$FFFF gets
fed to BOTH consumers (the chandler frame parser handles it as
part of the in-flight frame, the debug filter prints it as a few
characters of stream noise — never corruption).
New module:
* rp/src/include/debugcap.h, rp/src/debugcap.c — small SPSC ring
(256 bytes, plain BSS, no alignment requirement, no DMA, no
PIO). Functions: debugcap_emit (producer, called from the
chandler ingest filter), debugcap_drainToConsole (consumer,
DPRINTFs up to 32 bytes per call as a hex+ASCII line),
debugcap_getRingStats (used / capacity / dropped — for the
diagnostics endpoint coming in S3). Drop counter increments
when a producer-side write would clobber unread bytes.
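The ring described above can be sketched in a few lines of portable C. This is a minimal illustration, not the contents of debugcap.c: the names `sketch_ring_t`, `sketch_emit`, `sketch_drain` and `sketch_used` are hypothetical, and it assumes that a full ring refuses the new byte and counts it as dropped, which is one reading of "emits lost to ring-full".

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RING_BYTES 256u  /* matches the 256-byte ring; divides 2^32 */

/* Hypothetical single-producer / single-consumer byte ring. */
typedef struct {
    uint8_t  buf[SKETCH_RING_BYTES];
    uint32_t head;     /* total bytes accepted from the producer */
    uint32_t tail;     /* total bytes handed to the consumer */
    uint32_t dropped;  /* emits refused because the ring was full */
} sketch_ring_t;

/* Producer side: store one byte, or count a drop when no slot is free. */
static void sketch_emit(sketch_ring_t *r, uint8_t byte) {
    if (r->head - r->tail >= SKETCH_RING_BYTES) {
        r->dropped++;  /* assumption: refuse rather than clobber unread data */
        return;
    }
    r->buf[r->head % SKETCH_RING_BYTES] = byte;
    r->head++;
}

/* Consumer side: copy out up to max bytes and return how many were copied. */
static size_t sketch_drain(sketch_ring_t *r, uint8_t *out, size_t max) {
    size_t n = 0;
    while (n < max && r->tail != r->head) {
        out[n++] = r->buf[r->tail % SKETCH_RING_BYTES];
        r->tail++;
    }
    return n;
}

/* Bytes currently waiting for the consumer. */
static uint32_t sketch_used(const sketch_ring_t *r) {
    return r->head - r->tail;
}
```

Free-running 32-bit counters keep the full/empty distinction unambiguous without sacrificing a slot, as long as the capacity is a power of two.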
Modified:
* rp/src/include/emul.h, rp/src/emul.c — add
emul_enterFirmwareMode() and emul_isFirmwareMode(). Idempotent,
one-way; only a hardware reset clears the flag. Wired into
cmdRunner ([U]), cmdExit ([E]), cmdFirmware ([F] + boot
countdown). DPRINTFs the transition.
* rp/src/chandler.c — chandler_consume_rom3_sample now runs the
debug-byte filter alongside the existing TPROTOCOL feed.
Filter:
    if ((sample & 0xFF00) == 0xFF00 && emul_isFirmwareMode()) {
        debugcap_emit((uint8_t)(sample & 0xFF));
    }
Pre-firmware-mode emits drop at the handler so menu-mode
activity doesn't pollute the diagnostic stream.
* rp/src/emul.c (main poll loop) — debugcap_drainToConsole()
called after chandler_loop() in emul_start's iteration. No-op
when the ring is empty.
* rp/src/CMakeLists.txt — debugcap.c added to target_sources.
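To make the gate concrete, here is a small host-side sketch of the one-way firmware-mode latch combined with the filter. The function and variable names are stand-ins invented for illustration (the real code lives in emul.c and chandler.c), and it assumes each captured sample carries the low 16 address bits.

```c
#include <stdbool.h>
#include <stdint.h>

static bool     g_sketch_firmware_mode = false;  /* one-way latch */
static uint32_t g_sketch_emitted = 0;            /* bytes accepted by the filter */
static uint8_t  g_sketch_last_byte = 0;          /* most recent accepted byte */

/* Idempotent: calling it again changes nothing; there is no "leave" call. */
static void sketch_enter_firmware_mode(void) { g_sketch_firmware_mode = true; }
static bool sketch_is_firmware_mode(void) { return g_sketch_firmware_mode; }

/* Hypothetical stand-in for debugcap_emit. */
static void sketch_emit(uint8_t byte) {
    g_sketch_last_byte = byte;
    g_sketch_emitted++;
}

/* One captured ROM3 sample (assumed: low 16 bits of the bus address). */
static void sketch_consume_sample(uint16_t sample) {
    /* The TPROTOCOL parser would be fed the same sample here (omitted). */
    if ((sample & 0xFF00u) == 0xFF00u && sketch_is_firmware_mode()) {
        sketch_emit((uint8_t)(sample & 0xFFu));
    }
}
```

Feeding 0xFF41 before the latch is set emits nothing; after `sketch_enter_firmware_mode()` the same sample yields the byte 0x41, while a sample such as 0x8041 is always ignored by the debug filter.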
Verified on hardware: boot, no DPRINTF noise from debugcap. After
[U] the console prints "emul: firmware mode committed at <ms>".
TPROTOCOL traffic (Runner / GEMDRIVE) unaffected. Build green;
CLI tests 87/87 pass; cartridge code unchanged.
Open items for S2: retarget target/atarist/test/hello-debug/
hello-debug.c from the abandoned $FA8000 base to $FBFF00, then
end-to-end smoke (run HELLODBG.TOS, see "Hello, world!" via the
DPRINTF drain).
…te ABI
Tiny GEMDOS executable that demonstrates the public ABI: a single
byte read at $FBFF00 + c latches c into the RP-side debug ring.
One m68k cartridge cycle per byte, no envelope, no shift.
C-equivalent shape (the public contract for any external program
emitting debug bytes — Atari TOS, EmuTOS, user code, third-party
diagnostic tools):
    #define DEBUG_BASE 0xFBFF00UL
    static void debug_putc(unsigned char c) {
        (void)*(volatile char *)(DEBUG_BASE + c);
    }
m68k assembly equivalent:
    tst.b $FBFF00(c.w) ; c.w = byte to emit, 4 cycles @ 8 MHz
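A string emitter follows directly from the single-byte contract. The sketch below is host-runnable, so it does not touch address $FBFF00 itself: the bus read is injected through a callback, and `fake_bus_read` decodes the address the way the RP-side filter is described as doing (assumed: low 16 address bits, high byte 0xFF). `debug_addr`, `debug_puts_with`, `bus_read_fn` and `fake_bus_read` are hypothetical names; on the ST the callback would simply perform `(void)*(volatile char *)addr`.

```c
#include <stdint.h>

#define DEBUG_BASE 0xFBFF00UL

/* Address whose read latches byte c on the RP side. */
static unsigned long debug_addr(unsigned char c) {
    return DEBUG_BASE + c;
}

/* Injected bus access so the sketch can run off-target. */
typedef void (*bus_read_fn)(unsigned long addr);

/* Emit a NUL-terminated string, one read per byte; returns bytes emitted. */
static unsigned debug_puts_with(bus_read_fn read_byte, const char *s) {
    unsigned n = 0;
    while (*s) {
        read_byte(debug_addr((unsigned char)*s++));
        n++;
    }
    return n;
}

/* Host-side stand-in for the cartridge bus: records what the RP would decode. */
static char     g_seen[64];
static unsigned g_seen_len = 0;

static void fake_bus_read(unsigned long addr) {
    uint16_t sample = (uint16_t)(addr & 0xFFFFu);  /* assumed capture width */
    if ((sample & 0xFF00u) == 0xFF00u && g_seen_len < sizeof g_seen - 1) {
        g_seen[g_seen_len++] = (char)(sample & 0xFFu);
        g_seen[g_seen_len] = '\0';
    }
}
```

Calling `debug_puts_with(fake_bus_read, "Hello, world!\n")` performs 14 reads and leaves the same 14 characters in `g_seen`, which is the round trip the ABI promises.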
The compiler / linker chain is m68k-atari-mint-gcc against
libcmini, run inside the sidecartridge atarist toolkit docker
(`stcmd`). Output: dist/HELLODBG.TOS, ~2 KB. The libcmini choice
matters — the default mintlib build pulls in a POSIX shim that
crashes on plain TOS via Pexec; libcmini drops both the size
balloon and the MiNT-isms.
Files:
- target/atarist/test/hello-debug/hello-debug.c — the program.
Cconws("HELLODBG: Hello, world!") for visible on-screen
confirmation that main() ran, then a 1000-iteration loop of
debug_puts("Hello, world!\n") = 14 000 cartridge cycles to
drive the diagnostic stream loud and clear.
- target/atarist/test/hello-debug/Makefile — m68k-atari-mint-gcc
+ libcmini link (crt0.o, -nostdlib, -lcmini, -lgcc), modeled
after the sidecartridge atarist-toolkit-docker
C_LIBCMINI demo Makefile.
- target/atarist/test/hello-debug/build.sh — wraps stcmd around
the Makefile.
build/ and dist/ are globally gitignored.
Verified end-to-end on hardware. After
    python3 cli/sidecart.py put dist/HELLODBG.TOS /
    python3 cli/sidecart.py runner run /HELLODBG.TOS
the ST screen prints "HELLODBG: Hello, world!" and the RP debug
console emits 30+ debugcap[32] lines containing the full byte
sequence:
    debugcap.c:78:debugcap_drainToConsole(): debugcap[32]: \
      48 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 0A 48 65 ... \
      | "Hello, world!.Hello, world!.Hell"
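The hex+ASCII shape of that line can be reproduced with a small formatter. This is a sketch of the general technique rather than the code in debugcap.c: `format_hex_ascii` and `HEX_LINE_BYTES` are hypothetical names, and the exact spacing is an assumption based on the sample output above (non-printable bytes such as 0x0A shown as '.').

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define HEX_LINE_BYTES 32u  /* at most 32 bytes per line, as in the drain */

/* Write `48 65 ... | "He..."` into out. Returns the string length,
 * or -1 if out is too small. */
static int format_hex_ascii(const uint8_t *data, size_t len,
                            char *out, size_t out_size) {
    size_t pos = 0;
    if (len > HEX_LINE_BYTES) len = HEX_LINE_BYTES;

    /* Hex column: two uppercase digits and a space per byte. */
    for (size_t i = 0; i < len; i++) {
        int n = snprintf(out + pos, out_size - pos, "%02X ", data[i]);
        if (n < 0 || (size_t)n >= out_size - pos) return -1;
        pos += (size_t)n;
    }

    /* Room needed for: '|', ' ', '"', len chars, '"', NUL. */
    if (out_size - pos < len + 5) return -1;
    out[pos++] = '|';
    out[pos++] = ' ';
    out[pos++] = '"';
    for (size_t i = 0; i < len; i++) {
        /* ASCII column: printable bytes verbatim, everything else as '.'. */
        out[pos++] = (data[i] >= 0x20 && data[i] < 0x7F) ? (char)data[i] : '.';
    }
    out[pos++] = '"';
    out[pos] = '\0';
    return (int)pos;
}
```

For the three bytes 'H', 'i', '\n' this produces `48 69 0A | "Hi."`.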
Closes the smoke test for the dual-consumer ROM3 ingest. The
debugcap_drainToConsole DPRINTF path is intentional scaffolding
for S1+S2 only; subsequent stories (S4 USB CDC, S5 HTTP) replace
it with workstation-visible transports and the DPRINTF drain
will be dropped from the production path.
…s + heap cap
Adds the diagnostic surface for the fast-debug-traces feature
plus a defensive linker fix exposed by the early ring-size
experiment that landed alongside this story.
HTTP / CLI surface:
- rp/src/http_server.c — new handle_debug_status registered at
GET /api/v1/debug. Returns
{"ok":true,"firmware_mode":<bool>,
"ring_used":<bytes pending drain>,
"ring_capacity":<DEBUGCAP_RING_BYTES>,
"bytes_dropped":<cumulative emits lost to ring-full>}
No transport state; this is purely informational. The
bytes_dropped counter is the load-bearing operator signal —
when it grows, debug bursts are overrunning whatever consumer
is currently draining the ring.
- cli/sidecart.py — new cmd_debug_status, `debug` / `debug status`
subparser, dispatch. Human form + --json.
- cli/test_sidecart.py — 3 new DebugStatusTests (human, firmware
mode = yes, --json envelope). 90 / 90 tests pass.
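For readers who want to see the envelope built, the following sketch formats the four documented fields with snprintf. It is not the handler from http_server.c: `sketch_debug_status_t` and `sketch_format_debug_status` are hypothetical, and only the field names and their order come from the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the values the endpoint reports. */
typedef struct {
    bool     firmware_mode;
    uint32_t ring_used;
    uint32_t ring_capacity;
    uint32_t bytes_dropped;
} sketch_debug_status_t;

/* Render the JSON envelope. Returns its length, or -1 if out is too small. */
static int sketch_format_debug_status(const sketch_debug_status_t *st,
                                      char *out, size_t out_size) {
    int n = snprintf(out, out_size,
                     "{\"ok\":true,\"firmware_mode\":%s,"
                     "\"ring_used\":%lu,\"ring_capacity\":%lu,"
                     "\"bytes_dropped\":%lu}",
                     st->firmware_mode ? "true" : "false",
                     (unsigned long)st->ring_used,
                     (unsigned long)st->ring_capacity,
                     (unsigned long)st->bytes_dropped);
    if (n < 0 || (size_t)n >= out_size) return -1;  /* error or truncated */
    return n;
}
```

Checking the snprintf return value against the buffer size keeps a truncated, invalid JSON body from ever reaching the client.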
Linker fix (rp/src/memmap_rp.ld):
    __StackLimit = ORIGIN(RAM) + LENGTH(RAM);  /* was: ... + LENGTH(ROM_IN_RAM) */
The original Pico-SDK template lets the heap grow ORIGIN(RAM) +
LENGTH(RAM) + LENGTH(ROM_IN_RAM) high — i.e. straight through the
cartridge mirror. Once BSS+heap exceed the 192 KB RAM region,
malloc returns addresses INSIDE ROM_IN_RAM and overwrites
cartridge bytes the m68k will later read. Symptom: cold boot
fine, then the first Atari ST reset after lwIP heap usage spikes
fails to read the cartridge. Capping __StackLimit at the start
of ROM_IN_RAM converts that silent corruption into a loud
"Out of memory" panic at the source — which is exactly what
surfaced when we tried bumping DEBUGCAP_RING_BYTES from 256 →
4096: settings_init OOMed during boot, alerting us to the
budget instead of letting the cartridge mirror rot.
Ring stays at 256 B for now; bumping it requires either
reclaiming heap headroom elsewhere or finding a different
storage strategy. Out of scope for S3.
Verified on hardware: cold boot completes. After [U],
`python3 cli/sidecart.py debug status` reports
firmware_mode : yes
ring : 0 / 256 bytes
bytes_dropped : 0
After `runner run /HELLODBG.TOS` the bytes_dropped count climbs
into the 13 000s — expected for DPRINTF-only drain at this
ring size; the next stories add USB CDC and HTTP transports
that drain orders of magnitude faster.
Adds a long-lived `tail -f` shape over chunked transfer-encoding so
a workstation can watch m68k debug bytes in real time without the
DPRINTF scaffolding.
Multi-consumer cursor model: each TCP conn owns a debugcap_cursor_t
that snapshots the producer position on connect, so concurrent
consumers (HTTP tail + future USB CDC sink) don't steal bytes from
each other.
Root-cause fix: srv_poll_cb was unconditionally closing every
connection on each ~4 s poll tick, killing the debug stream
mid-flight without the chunked terminator (Python `IncompleteRead`).
Now spares HC_STREAM_DEBUG and just nudges stream_debug_drive
instead. Other streaming states (LISTING/DOWNLOAD) still get the
idle sweep; they're bounded by their producers and can't hang.
Retires the DPRINTF drainer (debugcap_drainToConsole +
g_consoleCursor) ahead of the S5 USB CDC sink; HTTP tail is now the
supported visibility path. debugcap_getRingStats keeps its envelope
shape but producer-side `dropped` is always 0; per-cursor drops live
on the debugcap_cursor_t.
Ring sizing: commemul ring 32 KB → 8 KB (~2× the largest TPROTOCOL
frame's worth of bus traffic, freeing 24 KB) and debugcap ring
256 B → 8 KB to absorb HELLODBG-style bursts without per-cursor
drops on a slow consumer.
Verified on hardware: `python3 cli/sidecart.py debug tail` stays
open indefinitely, prints HELLODBG output as it's emitted from the
m68k.
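The cursor model can be illustrated with a short, self-contained sketch. It is a simplified reading of the description above, not the shipped debugcap code: the `sketch_*` names are hypothetical, and the overwrite-then-account behaviour (producer never blocks, a lagging cursor records what it lost) is the assumption being demonstrated.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RING_BYTES 8192u  /* matches the 8 KB ring; divides 2^32 */

typedef struct {
    uint8_t  buf[SKETCH_RING_BYTES];
    uint32_t write_pos;  /* total bytes ever emitted */
} sketch_shared_ring_t;

typedef struct {
    uint32_t read_pos;   /* this consumer's position in the byte stream */
    uint32_t dropped;    /* bytes this consumer missed through overwrite */
} sketch_cursor_t;

/* Producer: never blocks, old bytes are simply overwritten. */
static void sketch_emit(sketch_shared_ring_t *r, uint8_t byte) {
    r->buf[r->write_pos % SKETCH_RING_BYTES] = byte;
    r->write_pos++;
}

/* New consumer: start at "now" so it sees only bytes emitted from here on. */
static void sketch_cursor_init_snapshot(const sketch_shared_ring_t *r,
                                        sketch_cursor_t *c) {
    c->read_pos = r->write_pos;
    c->dropped = 0;
}

/* Consumer read: account for overwritten bytes, then copy what is left. */
static size_t sketch_cursor_read(const sketch_shared_ring_t *r,
                                 sketch_cursor_t *c,
                                 uint8_t *out, size_t max) {
    uint32_t pending = r->write_pos - c->read_pos;
    if (pending > SKETCH_RING_BYTES) {
        uint32_t lost = pending - SKETCH_RING_BYTES;
        c->dropped += lost;   /* the producer lapped this cursor */
        c->read_pos += lost;  /* resume at the oldest surviving byte */
    }
    size_t n = 0;
    while (n < max && c->read_pos != r->write_pos) {
        out[n++] = r->buf[c->read_pos % SKETCH_RING_BYTES];
        c->read_pos++;
    }
    return n;
}
```

Because each consumer advances only its own `read_pos`, an HTTP tail and a second sink can read the same bytes independently, and a slow one pays only in its own `dropped` count.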
Brings the captured debug bytes out the RP2040's USB CDC interface
in parallel with the existing HTTP /api/v1/debug/log stream: one
m68k emit, two independent consumers, each with its own cursor on
the shared ring. Either (or both) can be running; neither steals
bytes from the other and the producer never blocks (lock-free
overwrite, slow-consumer wraps record per-cursor drops).
Mechanism follows md-debug-cart's CMake pattern
(pico_enable_stdio_usb flipped on so pico_stdio_usb brings up
TinyUSB + CDC + descriptors for free), then detaches the stdio
bridge at runtime via stdio_set_driver_enabled(&stdio_usb, false).
Result: DPRINTF (fprintf-to-stderr) keeps going UART-only when
_DEBUG=1, and the CDC interface is fully owned by usbcdc_drain:
byte-exact tud_cdc_write, no stdio CR/LF translation, matching the
HTTP tail's octet-stream contract.
Drain is pumped from both the menu/idle main loop and the lwIP
emul_pollTick callback so the CDC port keeps flowing during
multi-second WiFi operations. tud_cdc_connected() short-circuits
the drain when no host is attached; the producer keeps running and
the cursor lazily catches up (with drop accounting) when a terminal
finally connects.
Verified on hardware: USB CDC and `sidecart debug tail` both running
concurrently, both seeing the full HELLODBG byte stream; DPRINTF
diagnostics stay on UART, never bleed onto the CDC port.
Until now the USB CDC sink's per-cursor drop count was invisible —
bytes silently fell off the back of the debugcap ring whenever no
host was attached or the host's TX FIFO stalled, with no way to
notice except the absence of expected output.
GET /api/v1/debug now returns two new fields:
usbcdc_attached bool — tud_cdc_connected() (host has the port
open with DTR asserted)
usbcdc_dropped u32 — cumulative debug bytes lost on this
consumer's cursor since boot
Plumbed via a small usbcdc_getStats(dropped, attached) accessor that
short-circuits to (0, false) before usbcdc_init has run, so the
endpoint remains safe to hit at any point in the boot sequence.
`sidecart debug status` prints both new fields in human mode and
passes them through unchanged in --json mode. CLI test fixtures
bumped to ring_capacity=8192 (matches the post-S5 ring) and now
include the new fields; new regex assertions verify the printer.
Verified on hardware: usbcdc_dropped=0 with a host attached and
draining live; grows monotonically when HELLODBG.TOS is run with
no terminal connected to /dev/tty.usbmodem*.
Two improvements for late-attaching workstations, both on the same attach edge:
1. The freshly-attached terminal sees only bytes emitted from the
moment of attach forward — no replay of stale pre-attach
data already overwritten in the ring.
2. Bytes the producer emitted during the disconnect window are
accounted as drops on the cursor (visible via usbcdc_dropped
in /api/v1/debug). Previously this loss was silent: the drain
short-circuits on !tud_cdc_connected(), so the cursor's
read_pos stays put and `dropped` never increments — even
when the producer wraps the ring multiple times.
Mechanism: usbcdc_drain detects the false→true rising edge of
tud_cdc_connected() and calls a new debugcap_cursor_skipToNow
helper, which folds (write_pos - cur->read_pos) into cur->dropped
before snapping read_pos to write_pos. Distinct from the existing
debugcap_cursor_initSnapshot (which also zeros dropped — used at
boot only).
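The edge handling can be sketched without TinyUSB by passing the connection state and producer position in as plain arguments. Everything here is illustrative: `sketch_sink_t`, `sketch_cursor_skip_to_now` and `sketch_sink_drain` are hypothetical names, the boolean stands in for tud_cdc_connected(), and ring-capacity effects during an attached session are deliberately ignored.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t read_pos;
    uint32_t dropped;
} sketch_cursor_t;

/* Fold everything the consumer has not read into dropped, then jump to now. */
static void sketch_cursor_skip_to_now(sketch_cursor_t *c, uint32_t write_pos) {
    c->dropped += write_pos - c->read_pos;
    c->read_pos = write_pos;
}

typedef struct {
    sketch_cursor_t cur;
    bool was_connected;  /* connection state seen on the previous call */
} sketch_sink_t;

/* One drain pass. Returns how many bytes would be written to the host. */
static uint32_t sketch_sink_drain(sketch_sink_t *s, bool connected,
                                  uint32_t write_pos) {
    if (!connected) {
        s->was_connected = false;  /* short-circuit: cursor stays put */
        return 0;
    }
    if (!s->was_connected) {
        /* false -> true rising edge: account the gap, start from now. */
        s->was_connected = true;
        sketch_cursor_skip_to_now(&s->cur, write_pos);
    }
    uint32_t n = write_pos - s->cur.read_pos;
    s->cur.read_pos = write_pos;
    return n;
}
```

With no host attached the drop count stays untouched; on the first connected pass the whole backlog moves into `dropped` in one step, which matches the behaviour reported in the hardware verification.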
Verified on hardware:
- Boot with no host attached, run HELLODBG.TOS → ring fills,
usbcdc_dropped stays 0 (drain hasn't run yet, by design).
- Attach a USB serial terminal → usbcdc_dropped jumps by the
pre-attach emit count, terminal stays clean (no stale tail).
- Subsequent HELLODBG.TOS bursts print live; in-session drops
only grow if the TX FIFO stalls (S8 territory).
Pico-sdk leaves CFG_TUD_CDC_TX_BUFSIZE at the FS default (64 B),
which is far too small for the rate at which HELLODBG-style debug
bursts arrive: the producer ring wraps before USB has a chance to
drain the FIFO, and usbcdc_dropped grows during attached sessions
even with the host present and reading.
Bump the FIFO to 1024 B via -DCFG_TUD_CDC_TX_BUFSIZE=1024 in
CMakeLists.txt (pico-sdk's tusb_config.h uses #ifndef so the
compiler-flag override wins). Drain batch raised 64 → 512 in
usbcdc_drain so a single tud_cdc_write call absorbs a full HELLODBG
chunk without repeated per-loop overhead. RX/EP sizes left at the
FS defaults (64): debugcap is one-way and the FS bulk endpoint
hard-caps at 64. ~960 B extra BSS, no other cost.
Verified on hardware: HELLODBG.TOS run repeatedly with the host
attached and draining live now reports usbcdc_dropped=0 across the
entire session; the in-session loss the previous setup was
suffering from is gone.
The original "move drain to Core 1" plan was overscoped: TinyUSB
single-core-ownership rules, pico_stdio_usb's existing tud_task pump
on Core 0, and four chandler_loop spin-wait sites in HTTP handlers
all need work before a clean Core 1 split is possible.
The minimal version captures most of the practical benefit
trivially: SLEEP_LOOP_MS goes from 100 → 10, so chandler_loop and
usbcdc_drain run at ~100 Hz instead of ~10 Hz. That's 10× the
previous headroom against the 4096-sample commemul ring's ~4 ms wrap
at full m68k emit rate, and 10× lower CDC drain latency.
The full Core 1 worker plan + the parked DBGBENCH throughput
benchmark are now both in the Epic 05 backlog with their caveats, to
revisit only if 100 Hz turns out to be insufficient.
Verified on hardware: sustained HELLODBG.TOS reports
usbcdc_dropped=0; bytes appear live on both `sidecart debug tail`
and `/dev/tty.usbmodem*`. TPROTOCOL frames + runner-adv operations
show no regression in success or latency.
Final pass on the fast-debug-traces epic. No behavior changes —
documentation, doc-comment refreshes, and one stale comment fix.
Doc-comment polish:
• debugcap.h: drop the stale "DPRINTF drainer" mention from the
cursor-struct docstring (DPRINTF drain was retired in S5);
add an explicit "single-Core-0 producer + consumers, no
memory barriers" note on debugcap_emit so any future move
to a Core 1 worker doesn't accidentally race.
• usbcdc.h: file-header description tightened to reflect
current shipped behaviour — multi-consumer cursor model,
rising-edge skip-to-now, pico_stdio_usb's async_context
worker as the tud_task pump on Core 0.
• target/atarist/test/hello-debug/hello-debug.c: "Try it"
block updated to point at `sidecart debug tail` and the USB
CDC port (the original "RP debug console shows ..." line
referenced the DPRINTF drain that S5 retired).
User-facing docs:
• docs/api.md: new "Debug traces" major section between
Advanced Runner and CLI exit codes. Covers the public m68k
ABI ((void)*(volatile char *)(0xFBFF00 + c)), the
firmware-mode gate, the multi-consumer model, both
endpoints (GET /api/v1/debug + GET /api/v1/debug/log) with
curl + sidecart examples and full envelope-field tables,
the USB CDC alternative path, and an end-to-end recipe
using HELLODBG.TOS.
• README.md: new "Debug traces" subsection between the API
section and License. Covers the public ABI with paired C
+ m68k assembly examples for both single-byte emit and
full-string dumps, both transports (HTTP tail + USB CDC),
and the HELLODBG.TOS verification recipe; links to
docs/api.md#debug-traces for full details.
Smoke verified: a fresh checkout can follow the README to build
HELLODBG.TOS, flash, and see "Hello, world!" on either USB CDC
or `sidecart debug tail` — no off-doc hand-holding required.
All 91 CLI tests still pass.
No description provided.