ESP32-S3: g_nvs_config.node_id clobbered to 1 between main.c:140 and csi_collector_init + LoadProhibited panic loop #390

@proffesor-for-testing

Summary

On ESP32-S3 (v0.4.3.1, binary release_bins/esp32-csi-node.bin dated Apr 5 2026, flashed with the 8 MB layout onto a 16 MB device), after provisioning NVS with node_id=3 via firmware/esp32-csi-node/provision.py, the device:

  1. Reads NVS correctly at boot (NVS override: node_id=3).
  2. Logs the value once more, still correct (main: ESP32-S3 CSI Node (ADR-018) — Node ID: 3).
  3. Then something clobbers g_nvs_config.node_id from 3 → 1 between main.c:140 and csi_collector_init().
  4. Shortly after, core 0 panics with Guru Meditation Error: LoadProhibited.
  5. Watchdog reboots. CSI frames still reach the aggregator during each pre-panic interval, so it looks "mostly working" until you inspect the serial log or node_id in the received frames.

The same binary on two other boards (ESP32 #1 with NVS node_id=1, #2 with node_id=2) shows no visible bug, because the NVS-overridden value and the post-clobber value happen to both be small integers that don't distinguish the two paths. The same LoadProhibited panic and reboot loop is likely occurring there too, unnoticed because their node_id values can't be validated.

Repro

Hardware: ESP32-S3 (QFN56 rev v0.2, 8 MB PSRAM, 16 MB flash, Boya flash chip).

# Flash
cd firmware/esp32-csi-node
esptool --chip esp32s3 --port /dev/cu.usbmodem1301 --baud 460800 \
  write-flash --flash-mode dio --flash-size 8MB \
    0x0      release_bins/bootloader.bin \
    0x8000   release_bins/partition-table.bin \
    0xf000   release_bins/ota_data_initial.bin \
    0x20000  release_bins/esp32-csi-node.bin

# Provision with node_id=3
python3 provision.py --port /dev/cu.usbmodem1301 \
  --ssid "<SSID>" --password "<PASS>" \
  --target-ip <aggregator_ip> --target-port 5005 --node-id 3

# (Provisioning fails on the esptool step because provision.py uses 'write_flash'
#  which esptool v5.2 renamed to 'write-flash'. Workaround: generate NVS bin
#  manually and flash with system esptool. Separate cosmetic issue.)

# Reset and monitor
esptool --port /dev/cu.usbmodem1301 run
cat /dev/cu.usbmodem1301

Observed serial output

I (222) cpu_start: Multicore app
...
I (345) main_task: Calling app_main()
I (365) nvs_config: NVS override: ssid=Spiridonovi1
I (365) nvs_config: NVS override: password=***
I (365) nvs_config: NVS override: target_ip=192.168.1.95
I (365) nvs_config: NVS override: target_port=5005
I (375) nvs_config: NVS override: node_id=3             ← correct
W (375) nvs_config: wasm_verify=1 but no wasm_pubkey in NVS — uploads will be rejected
I (385) main: ESP32-S3 CSI Node (ADR-018) — Node ID: 3  ← still correct
I (415) wifi:config NVS flash: enabled
I (525) main: WiFi STA initialized, connecting to SSID: Spiridonovi1
I (565) wifi:connected with Spiridonovi1, aid = 10, channel 2, 40U, bssid = a0:ab:1b:79:f1:e6
I (1615) csi_collector: Auto-detected AP channel: 2
I (1615) csi_collector: Promiscuous mode enabled for CSI capture
I (1625) csi_collector: CSI collection initialized (node_id=1, channel=2)   ← should be 3
I (1665) csi_collector: CSI cb #1: len=128 rssi=-38 ch=2
...
I (2005) edge_proc: Edge DSP task created on Core 1 (stack=8192, priority=5)
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
(repeating — watchdog reboots, cycle repeats every ~2 s)
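
A mismatch like the one above is easy to miss by eye, so a captured serial log can be checked mechanically. The following is a minimal sketch, not part of the repository: it assumes the log lines keep exactly the wording shown in this report, and the function name scan_boot_log is hypothetical.

```python
import re

# Patterns copied from the log lines quoted in this issue; if the firmware
# changes its log wording, these need to change too.
NVS_RE = re.compile(r"nvs_config: NVS override: node_id=(\d+)")
CSI_RE = re.compile(r"csi_collector: CSI collection initialized \(node_id=(\d+)")
PANIC_RE = re.compile(r"Guru Meditation Error: Core\s+\d+ panic'ed \((\w+)\)")


def scan_boot_log(lines):
    """Summarize a serial capture: NVS node_id, collector node_id, panic count."""
    result = {"nvs_node_id": None, "csi_node_id": None, "panics": 0}
    for line in lines:
        m = NVS_RE.search(line)
        if m:
            result["nvs_node_id"] = int(m.group(1))
            continue
        m = CSI_RE.search(line)
        if m:
            result["csi_node_id"] = int(m.group(1))
            continue
        if PANIC_RE.search(line):
            result["panics"] += 1
    # "clobbered" is only meaningful when both values were actually seen.
    result["clobbered"] = (
        result["nvs_node_id"] is not None
        and result["csi_node_id"] is not None
        and result["nvs_node_id"] != result["csi_node_id"]
    )
    return result


if __name__ == "__main__":
    import sys

    # Usage: python3 scan_log.py < captured_serial.log
    print(scan_boot_log(sys.stdin))
```

Run against a capture of the output above, this should report the NVS value, the value csi_collector logged, and how many panic lines appeared, which makes it practical to compare several boards.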

Static analysis (found nothing that explains it)

main/nvs_config.c is the only writer to cfg->node_id:

  • Line 41: cfg->node_id = (uint8_t)CONFIG_CSI_NODE_ID; (Kconfig default = 1)
  • Line 144: cfg->node_id = node_val; (after successful nvs_get_u8("node_id", ...))

main.c:45 has the single definition nvs_config_t g_nvs_config;. All other files use extern. grep confirms no stray writers anywhere in main/.

Between main.c:140 ("Node ID: 3") and csi_collector.c:275 ("initialized (node_id=1)"), the only code that runs is:

  • wifi_init_sta() (main.c:76-125) — reads g_nvs_config.wifi_ssid and g_nvs_config.wifi_password, never writes back
  • stream_sender_init_with(g_nvs_config.target_ip, g_nvs_config.target_port) — takes ip+port by value, no writes

So the clobber appears to be via:

  • a buffer overrun into g_nvs_config.node_id (offset ~116 in the struct), or
  • uninitialized memory + memory corruption shared with the LoadProhibited panic, or
  • a linker/symbol aliasing issue (two TUs seeing different g_nvs_config storage)

Given the LoadProhibited panic fires ~400 ms later, I'd bet the two symptoms share one root cause.
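
To make the buffer-overrun hypothesis concrete, here is a host-side sketch using ctypes. The struct layout is an assumption for illustration only: the field names wifi_ssid, wifi_password, target_ip, target_port and node_id come from the code paths mentioned above, but the field sizes and ordering are guesses and must be checked against the real nvs_config_t. NvsConfigSketch and simulate_overrun are hypothetical names.

```python
import ctypes


class NvsConfigSketch(ctypes.Structure):
    # HYPOTHETICAL layout: sizes and ordering are guesses, not taken from
    # main/nvs_config.h. Verify against the real nvs_config_t.
    _fields_ = [
        ("wifi_ssid", ctypes.c_char * 33),
        ("wifi_password", ctypes.c_char * 65),
        ("target_ip", ctypes.c_char * 16),
        ("target_port", ctypes.c_uint16),
        ("node_id", ctypes.c_uint8),
    ]


def simulate_overrun(cfg, field_name, payload):
    """Copy payload starting at field_name without a bounds check on the
    field itself, mimicking an unchecked memcpy/strcpy in C."""
    field = getattr(type(cfg), field_name)
    # Keep the demo inside the struct so the host process stays safe.
    if field.offset + len(payload) > ctypes.sizeof(cfg):
        raise ValueError("payload would run past the end of the struct")
    ctypes.memmove(ctypes.addressof(cfg) + field.offset, payload, len(payload))


cfg = NvsConfigSketch()
cfg.node_id = 3
# 19 bytes written into a 16-byte field: the extra 3 bytes spill into
# target_port (2 bytes) and then node_id (1 byte).
simulate_overrun(cfg, "target_ip", b"A" * 18 + b"\x01")
print("node_id offset:", NvsConfigSketch.node_id.offset)
print("node_id after overrun:", cfg.node_id)
```

If the real struct resembles this, a write only a few bytes too long into a neighbouring string field would be enough to change node_id. It would also corrupt target_port on the way, so checking whether target_port is still 5005 at csi_collector_init() might help narrow down the direction of the overrun.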

Questions / asks for the firmware team

  1. Is the binary in release_bins/esp32-csi-node.bin dated Apr 5 2026 the currently shipped build? It reports app_init: App version: v0.4.3.1-esp32-3-g66e2fa083-dir. The source file main/csi_collector.c has the same mtime, so the binary appears consistent with the source, i.e. this bug is latent in master as of that commit.
  2. Is this reproducible on your bench (ESP32-S3 DevKit, any WiFi AP) if you provision with node_id != 1 and compare with the frame header buf[4] seen by the aggregator?
  3. Is there a known stack buffer overrun in the WiFi init path that would clobber BSS-adjacent globals?
  4. Has anyone traced the LoadProhibited source line (e.g. via a backtrace decoder)? That is probably the quickest way to pin this down.
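
For the aggregator-side comparison in question 2, a check along these lines may be enough. It is a sketch under stated assumptions: it takes the node_id to be the single byte at buf[4] as described above, and the watch helper additionally assumes frames arrive as UDP datagrams on the provisioned port, which this report does not confirm. All function names are hypothetical.

```python
import socket

NODE_ID_OFFSET = 4  # per this issue: node_id is read from buf[4] of the frame header


def node_id_from_frame(buf):
    """Return the node_id byte of a received frame."""
    if len(buf) <= NODE_ID_OFFSET:
        raise ValueError("frame too short to contain a node_id byte")
    return buf[NODE_ID_OFFSET]


def check_frame(buf, expected_node_id):
    """Return (matches, actual_node_id) for one frame."""
    actual = node_id_from_frame(buf)
    return actual == expected_node_id, actual


def watch(port=5005, expected_node_id=3, count=10):
    """ASSUMPTION: frames are UDP datagrams; adjust if the transport differs."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        for _ in range(count):
            data, addr = sock.recvfrom(4096)
            ok, actual = check_frame(data, expected_node_id)
            status = "ok" if ok else "MISMATCH"
            print(f"{addr[0]}: node_id={actual} expected={expected_node_id} {status}")
```

Calling watch(expected_node_id=3) on the aggregator host while the board runs would print one line per received frame. On the affected device, every line would be expected to show a mismatch.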

Related cosmetic issues found the same day

These are in firmware/esp32-csi-node/provision.py and the RuView /store wizard advertised by Cognitum seeds; not part of this bug but worth closing at the same time:

  • provision.py uses write_flash (esptool v4 syntax). esptool v5.2 renamed it to write-flash and the script fails at the flash step. It also hardcodes sys.executable -m esptool, which breaks on pip-managed Python (PEP 668 externally-managed). (Already noted in ruvnet/optimizer PR #60.)
  • Cognitum Seed /store ESP32 setup wizard: collects WiFi SSID + password but never POSTs them; suggested target_port=80 should be 5005 per ADR-073. (Also noted in PR #60.)
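
One possible shape for the provision.py fix is sketched below. It is not the change from PR #60, and both helper names are hypothetical: one picks the subcommand spelling from the esptool major version, the other prefers an esptool executable found on PATH before falling back to the current interpreter.

```python
import shutil
import sys


def flash_subcommand(esptool_version):
    """Pick the flash subcommand spelling for a version string like '5.2'."""
    major = int(esptool_version.split(".")[0].lstrip("v"))
    # This issue reports that v5.2 expects 'write-flash'; v4 used 'write_flash'.
    return "write-flash" if major >= 5 else "write_flash"


def esptool_command():
    """Prefer a system-installed esptool, then fall back to 'python -m esptool'."""
    exe = shutil.which("esptool") or shutil.which("esptool.py")
    return [exe] if exe else [sys.executable, "-m", "esptool"]


if __name__ == "__main__":
    # Example: build the argv prefix for a v5.2 install.
    print(esptool_command() + [flash_subcommand("5.2")])
```

How the version string is obtained is left open here. Parsing the output of the esptool version command is one option, but that should be verified against the installed release.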
