The original defensive copy in csi_collector_init() (line 172 of main.c)
runs AFTER wifi_init_sta() (line 147), which on some ESP32-S3 devices
corrupts g_nvs_config.node_id back to the Kconfig default of 1.
Reproduced on device 80:b5:4e:c1:be:b8 (ESP32-S3 QFN56 rev v0.2):
- NVS provisioned with node_id=5
- Release firmware (no fix): seed receives node_id=1 (clobbered)
- This patch: seed receives node_id=5 (correct)
Changes:
- Add csi_collector_set_node_id() called from main.c immediately
after nvs_config_load(), before wifi_init_sta() runs
- csi_collector_init() now detects and logs the clobber if early
capture disagrees with current g_nvs_config value
- Fallback path preserved: if set_node_id() is never called,
init() still captures from g_nvs_config (backwards compatible)
Co-Authored-By: claude-flow <ruv@ruv.net>
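
The early-capture pattern described in this commit can be sketched on a host machine roughly as follows. This is a minimal illustration, not the firmware's code: the struct layout, the `uint8_t` type for `node_id`, the `bool` return value of `csi_collector_init()`, and the `csi_collector_node_id()` accessor are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the firmware's NVS config struct (assumed layout);
 * only node_id is modeled here. */
typedef struct {
    uint8_t node_id;
} nvs_config_t;

/* 1 is the Kconfig default mentioned in the commit message. */
nvs_config_t g_nvs_config = { .node_id = 1 };

static uint8_t s_node_id;           /* module-local copy, safe from later clobbers */
static bool    s_node_id_captured;  /* true once an early capture has happened */

/* Intended to be called right after nvs_config_load(),
 * before wifi_init_sta() has a chance to run. */
void csi_collector_set_node_id(uint8_t node_id)
{
    s_node_id = node_id;
    s_node_id_captured = true;
}

/* Returns true if the global no longer matches the early capture.
 * (The return value is an assumption of this sketch.) */
bool csi_collector_init(void)
{
    if (!s_node_id_captured) {
        /* Fallback path: nobody called set_node_id(), so capture now. */
        s_node_id = g_nvs_config.node_id;
        s_node_id_captured = true;
        return false;
    }
    if (g_nvs_config.node_id != s_node_id) {
        fprintf(stderr, "node_id clobber detected: captured=%u current=%u\n",
                (unsigned)s_node_id, (unsigned)g_nvs_config.node_id);
        return true;
    }
    return false;
}

/* Hypothetical accessor: later code reads the captured value,
 * never g_nvs_config directly. */
uint8_t csi_collector_node_id(void)
{
    return s_node_id;
}
```

The design point is that everything after WiFi init reads the module-local copy, so a later overwrite of the global is logged but does not change the value in use.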
The CSI callback reads g_nvs_config.filter_mac_set and filter_mac on every invocation (100-500 Hz). If wifi_init_sta() corrupts g_nvs_config (same root cause as the node_id clobber), the callback reads garbage from the struct, leading to a Core 0 LoadProhibited panic after ~2400 callbacks (~70 seconds of operation).

Extends the early-capture pattern from the node_id fix to also copy filter_mac_set and filter_mac into module-local statics before WiFi init runs. Adds canary logging to detect filter_mac corruption.

Observed on device 80:b5:4e:c1:be:b8 via serial:
CSI cb #2400 → Guru Meditation Error: Core 0 panic'ed (LoadProhibited) → TG0WDT_SYS_RST → reboot → crash again at ~2900 callbacks

Refs #232 #375 #385 #386 #390
Co-Authored-By: Ruflo & AQE
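
A minimal host-side sketch of the defensive copy for `filter_mac`, assuming a six-byte MAC and a boolean flag. The struct type and all three function names are hypothetical; only `s_filter_mac`, `s_filter_mac_set`, and the field names come from the PR text.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the relevant part of g_nvs_config (assumed layout). */
typedef struct {
    bool    filter_mac_set;
    uint8_t filter_mac[6];
} nvs_filter_config_t;

/* Module-local statics: the callback reads these, never the global. */
static bool    s_filter_mac_set;
static uint8_t s_filter_mac[6];

/* Hypothetical helper: copy the filter settings before WiFi init runs. */
void csi_capture_filter_mac(const nvs_filter_config_t *cfg)
{
    s_filter_mac_set = cfg->filter_mac_set;
    memcpy(s_filter_mac, cfg->filter_mac, sizeof s_filter_mac);
}

/* Hypothetical hot-path check used by the CSI callback.
 * With no filter configured, every frame passes. */
bool csi_frame_passes_filter(const uint8_t src_mac[6])
{
    if (!s_filter_mac_set) {
        return true;
    }
    return memcmp(src_mac, s_filter_mac, sizeof s_filter_mac) == 0;
}

/* Hypothetical canary: true while the global still matches the copy.
 * A false result is what the canary log would report. */
bool csi_filter_mac_intact(const nvs_filter_config_t *cfg)
{
    return cfg->filter_mac_set == s_filter_mac_set &&
           memcmp(cfg->filter_mac, s_filter_mac, sizeof s_filter_mac) == 0;
}
```

Because the callback only touches the statics, corruption of the global after capture changes what the canary reports but not what the filter does.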
The WiFi driver's wDev_ProcessFiq interrupt handler crashes with LoadProhibited in cache_ll_l1_resume_icache when promiscuous mode captures MGMT+DATA frames (100-500 interrupts/sec). The high interrupt rate races with SPI flash cache operations, corrupting cache state.

Changes:
- Promiscuous filter: MGMT+DATA → MGMT-only (~10 Hz beacons)
- CSI config: disable htltf_en and stbc_htltf2_en (LLTF-only)

LLTF provides 64 subcarriers (HT20), which is sufficient for presence, breathing, and fall detection. The 10 Hz beacon rate eliminates the SPI flash cache contention that caused the crash.

Verified on device 80:b5:4e:c1:be:b8:
- Before: LoadProhibited crash at ~1600-2400 callbacks (every ~70s)
- After: 2700+ callbacks over 4.7 minutes, zero crashes

Backtrace decode confirmed crash in ESP-IDF closed-source WiFi blob:
_xt_lowint1 → wDev_ProcessFiq → spi_flash_restore_cache → cache_ll_l1_resume_icache → EXCVADDR=0x00000004 (NULL deref)

Co-Authored-By: Ruflo & AQE
The esptool 4.10.0 bundled with ESP-IDF v5.4 accepts only underscore subcommands, while standalone esptool v5 accepts both forms. The provision script used 'write-flash', which fails on the bundled esptool with "invalid choice". Changed to 'write_flash' (underscore), which works with both old and new esptool.

Co-Authored-By: Ruflo & AQE
- Add early rate gate in wifi_csi_callback at 50 Hz (defense-in-depth; does not prevent the crash alone but reduces callback execution time)
- Add null-data injection timer infrastructure (disabled: TX adds interrupt pressure that triggers the SPI cache crash, RuView#396)
- sdkconfig.defaults: add CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y
- sdkconfig.defaults: document SPIRAM XIP attempt (crashes differently)

Co-Authored-By: Ruflo & AQE
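
The 50 Hz early rate gate can be sketched as a timestamp comparison at the top of the callback. This is an illustration under assumptions: the 20000 µs value is derived here from the stated 50 Hz, and the function name and the explicit `now_us` parameter are hypothetical (on device the timestamp would typically come from `esp_timer_get_time()`).

```c
#include <stdbool.h>
#include <stdint.h>

/* 1 s / 50 Hz = 20 000 us. The value is assumed from the stated rate. */
#define CSI_MIN_PROCESS_INTERVAL_US 20000

static int64_t s_last_process_us;  /* timestamp of the last accepted callback */
static bool    s_have_last;        /* false until the first callback is accepted */

/* Hypothetical gate: returns true if this callback should be processed,
 * false if it arrived too soon after the previous accepted one. */
bool csi_rate_gate_allow(int64_t now_us)
{
    if (s_have_last &&
        (now_us - s_last_process_us) < CSI_MIN_PROCESS_INTERVAL_US) {
        return false;  /* early drop: do no further work in the callback */
    }
    s_last_process_us = now_us;
    s_have_last = true;
    return true;
}
```

Fed a 500 Hz stream of timestamps for one second, this gate accepts 50 callbacks and drops the rest. As the commit notes, that shortens time spent in the callback but does not reduce the interrupt rate itself.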
Applies @ruvnet's five review requests on PR #397 (RuView#397 comment 4289417527):

1. **Inline comment on `provision.py` `write_flash`**: ESP-IDF v5.4 bundles esptool 4.10.0 (underscore-only). #391's hyphen swap broke the documented venv flow; kept the underscore form and added a three-line comment warning future maintainers not to "re-fix" it.
2. **Correct `edge_processing.c` sample_rate** (blocking): changed hard-coded `20.0f` → `10.0f` at line 718 so `estimate_bpm_zero_crossing()` matches the MGMT-only CSI rate. Without this, breathing and heart-rate reports were 2× the true value. Added a comment tying the constant to the callback rate gate.
3. **Removed disabled probe-injection infrastructure**: dropped the forward declaration, the `CSI_PROBE_INTERVAL_MS` define, six static variables (`s_probe_timer`, `s_probe_tx_count`, `s_probe_tx_fail`, `s_ap_bssid`, `s_ap_bssid_known`), and three functions (`csi_send_probe_request`, `probe_timer_cb`, `csi_collector_start_probe_timer`). None were reachable. `csi_inject_ndp_frame()` reverted to the original ADR-029 stub. Can be revived from this commit's parent if needed.
4. **Cleaned `sdkconfig.defaults`**: removed the SPIRAM prose and commented-out `# CONFIG_SPIRAM is not set` line. Kept only the live `CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y` with a concise rationale.
5. **Bumped firmware version 0.6.1 → 0.6.2** and added four `[Unreleased]` CHANGELOG entries covering the SPI cache crash fix, the `filter_mac` / `node_id` clobber defense, the sample-rate correction, and the `write_flash` command-form revert.

Net: +39 / -128 across six files.

Validation in this devcontainer:
- Static sanity on modified C files: braces balance (csi_collector.c 59/59; edge_processing.c 96/96), zero dangling references to removed probe-injection symbols.
- Rust workspace tests and Python proof not executed here: cargo not installed and pip blocked by PEP 668.

Deferring hardware build + flash + miniterm verification to @ruvnet's COM7 per his offer in the review comment.

Co-Authored-By: claude-flow <ruv@ruv.net>
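
To see why the sample-rate constant in review item 2 matters, here is a simplified zero-crossing rate estimator. It is a sketch, not the body of `estimate_bpm_zero_crossing()`: the function name, signature, and the exact counting rule are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical simplified estimator: counts sign changes in x[0..n-1]
 * and converts them to cycles per minute using the assumed sample rate. */
float bpm_from_zero_crossings(const float *x, size_t n, float sample_rate_hz)
{
    if (n < 2 || sample_rate_hz <= 0.0f) {
        return 0.0f;
    }

    size_t crossings = 0;
    for (size_t i = 1; i < n; i++) {
        /* A crossing is a change of sign between adjacent samples. */
        if ((x[i - 1] < 0.0f) != (x[i] < 0.0f)) {
            crossings++;
        }
    }

    /* The window length in seconds depends on the assumed sample rate. */
    float duration_s = (float)n / sample_rate_hz;

    /* Each full cycle produces two zero crossings. */
    return ((float)crossings / 2.0f) / duration_s * 60.0f;
}
```

The estimate scales linearly with `sample_rate_hz`: if samples really arrive at 10 Hz but the code assumes 20 Hz, the window is treated as half as long and the reported rate doubles. A 15 breaths-per-minute signal would then be reported as 30.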
Rebases the 6 firmware-fix commits from #397 onto current main (the original PR now conflicts with main after the nvsim work was merged).
What this fixes
- `fix(firmware): move defensive node_id capture before wifi_init_sta()`: `g_nvs_config` can be clobbered by WiFi driver init on some devices (80:b5:4e:c1:be:b8), reverting node_id to Kconfig default. Capture now runs before `wifi_init_sta()`.
- `fix(firmware): defensive copy of filter_mac to prevent callback crash`: Same clobber path corrupted `filter_mac` reads in the CSI callback (100–500 Hz). Module-local statics `s_filter_mac` + `s_filter_mac_set` insulate the callback from any subsequent corruption.
- `fix(firmware): MGMT-only promiscuous filter to prevent SPI cache crash`: ESP32-S3 nodes crashed in `cache_ll_l1_resume_icache` / `wDev_ProcessFiq` after ~2400 callbacks under DATA-frame load. Narrowed to `WIFI_PROMIS_FILTER_MASK_MGMT` (~10 Hz beacons).
- `fix(provision): write-flash → write_flash for esptool v5 compat`: ESP-IDF v5.4 ships esptool 4.10 (underscore form only); standalone esptool v5 accepts both. Reverts to `write_flash` so both tools work.
- `fix(firmware): 50 Hz callback rate gate + sdkconfig extra IRAM opt`: `CSI_MIN_PROCESS_INTERVAL_US` early-drops excess callbacks; `CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y` as defense-in-depth.
- `fix(firmware): address PR #397 review feedback`: review fixes from the original PR.

Validation on COM7 (this session)
Closes the path to a v0.6.3-esp32 firmware release. Original PR #397 should be closed once this lands.