Skip to content

fix(USB): replace 500 ms host-quiet timer with msp.frameInProgress() predicate#9

Open
snokvist wants to merge 1 commit intomasterfrom
feature/parser-state-quiet-gate
Open

fix(USB): replace 500 ms host-quiet timer with msp.frameInProgress() predicate#9
snokvist wants to merge 1 commit intomasterfrom
feature/parser-state-quiet-gate

Conversation

@snokvist
Copy link
Copy Markdown
Owner

Summary

The shared host-quiet gate in Tx_main.cpp's loop() locked permanently under sustained host writes — reproduced with continuous 25 Hz MSP_ELRS_BACKPACK_SET_PTR from the Android app. Every received byte reset last_host_rx_ms, the 500 ms window never closed, and both the sniffer and wired-CRSF drainers were starved for as long as the stream lasted. All device→host emit paths froze.

This PR replaces the time-based gate with a parser-state predicate (msp.frameInProgress()), which captures the original intent precisely and is rate-independent.

Why this is the right long-term fix

  • Encodes original intent exactly. The gate was meant to "auto-pause around request/response round-trips." A 500 ms timer is a coarse proxy that conflates mid-frame with recently-received-anything. Parser state answers the real question.
  • Strictly simpler. −53 / +35 lines in Tx_main.cpp. Removes USB_HOST_QUIET_GATE_ENABLED, last_host_rx_ms, USB_SNIFFER_QUIET_AFTER_RX_MS, and the host_rx_active shared block. One predicate read at each drainer.
  • Robust to library upgrades. The 500 ms timer originated as a workaround for pre-arduino-esp32 v2.0.17 simultaneous-TX+RX brokenness, which v2.0.17 (feat: runtime UID override (MSP_ELRS_BIND wins over MY_BINDING_PHRASE) #5) fixed. Parser-state is independent of any library-version assumption.
  • No rate cliff. Handles 1 Hz one-shots and 100 Hz streams the same way. Future emit paths just check the same predicate; the foot-gun is gone.
  • Response-side is naturally serialised. msp.sendPacket(..., &Serial) from ProcessMSPPacketFromTX runs synchronously in the same loop() iteration; drainer checks happen after that returns, with the parser already idle. No race.

Diagnostic capture (Android side, pre-fix)

```
16:04:12 rx=58B/1chunks tx=375B/25calls maxWrite=9ms
16:04:13 rx=0B/0chunks tx=360B/24calls maxWrite=3ms
... (rx=0 sustained while tx steady at 25 Hz) ...
16:04:25 rx=0B/0chunks tx=330B/22calls maxWrite=4ms ← continuous off
16:04:26 parser: frames=9 wired=7 ← recovery
16:04:27 parser: frames=53 wired=49 ← back to baseline
```

Host write path was healthy throughout (3-9 ms maxWrite); C3 simply stopped emitting on USB-IN.

What changed

File Change
lib/MSP/msp.h Add public bool frameInProgress() const.
src/Tx_main.cpp Drop timer gate; gate drainers on msp.frameInProgress(). Preserve host_in_bytes_total (used by OLED).
targets/txbp_esp.ini Drop USB_SNIFFER_QUIET_AFTER_RX_MS=500.
CLAUDE.md Update sniffer + drainer docs to reflect parser-state semantics; note prior-bug context.
docs/esp32c3_usb_backpack.md Same.

Build: ESP32C3_TX_Backpack_via_USB env compiles. Flash 50.6 % (994 214 B), RAM 15.8 % (51 804 B) — net flash unchanged.

Test plan

  • Flash on C3 SuperMini (BOOT-hold + pio run -t upload --upload-port /dev/ttyACM0).
  • Repro 1 — the original bug: Continuous 25 Hz PTR for 60 s with wired CRSF active. Android dashboard's wired RC card should keep updating; pre-fix it freezes within 1 s.
  • Repro 2 — concurrent load: PROMISCUOUS sniffer + wired CRSF + 25 Hz PTR for 60 s. No MSP desync host-side, all three streams visible.
  • One-shot under load: getVersion round-trip under all three concurrent loads. Completes within the 2 s timeout.
  • Backwards-compat: Single MSP request/response (no continuous PTR, no wired RC connected). Completes cleanly.

Companion / follow-up

  • Companion Android PR snokvist/Waybeam-backpack-android#16 ships a host-side throttle (continuous PTR 25→10 Hz when wired CRSF is active) as an interim mitigation. Drop that throttle once this firmware fix is released.
  • Coordination memory: backpack_pr8_quiet_gate_stall.md.

🤖 Generated with Claude Code

…predicate

The shared host-quiet gate in Tx_main.cpp's loop() locked permanently
under sustained host writes. Reproduced (April 2026) with continuous
25 Hz `MSP_ELRS_BACKPACK_SET_PTR` from the Android app: every received
byte reset `last_host_rx_ms`, the 500 ms window never closed, and both
the sniffer and wired-CRSF drainers were starved indefinitely. All
device->host emit paths froze for as long as the stream lasted.

The 500 ms timer was a holdover from pre-arduino-esp32 v2.0.17 days when
simultaneous TX+RX on the C3 USB-CDC could stuck the bus. v2.0.17
(#5) fixed that, but the timer outlived its purpose
and didn't capture the *actual* original intent: "don't write while a
host MSP transaction is in progress."

Replace with `msp.frameInProgress()` — true between the magic byte and
the CRC, false otherwise. Captures the intent precisely, is
rate-independent (1 Hz one-shots and 100 Hz streams behave the same),
and self-correcting for new emit paths (any future device->host channel
just checks the same predicate).

Net effect on src/Tx_main.cpp is roughly -53 / +35 lines: the
`USB_HOST_QUIET_GATE_ENABLED` macro, `last_host_rx_ms` static, sniffer's
`USB_SNIFFER_QUIET_AFTER_RX_MS` constant, the `host_rx_active`
computation block, and the wired CRSF's reuse of the same constant all
go away. `host_in_bytes_total` is preserved (used by the OLED
dashboard) under the underlying USB feature-flag.

* `lib/MSP/msp.h` — add `bool MSP::frameInProgress() const` accessor.
* `src/Tx_main.cpp` — drop timer gate, gate drainers on
  `msp.frameInProgress()`.
* `targets/txbp_esp.ini` — drop `USB_SNIFFER_QUIET_AFTER_RX_MS=500`.
* `CLAUDE.md` + `docs/esp32c3_usb_backpack.md` — reflect new gate
  semantics + record the prior-bug context.

Companion Android PR snokvist/Waybeam-backpack-android#16 ships a
host-side throttle (continuous PTR 25->10 Hz when wired CRSF is active)
as an interim mitigation; that throttle should be removed once this
firmware fix lands and a release tag is cut.

Build clean: ESP32C3_TX_Backpack_via_USB env compiles, flash 50.6%,
RAM 15.8% — net flash unchanged.

HW-test plan (to be run on flash + verify):
1. Continuous 25 Hz PTR for 60 s with wired CRSF active. Host RX should
   stay steady at ~50 frames/sec wired (vs. zero pre-fix).
2. Promiscuous sniffer + wired CRSF + 25 Hz PTR concurrent for 60 s.
   No MSP desync host-side.
3. One-shot MSP request/response (e.g. `getVersion`) under all three
   concurrent loads. Round-trip completes within timeout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant