
WebSocket sensing broadcast hardcodes source="esp32", bypasses 5s stale-detection (effective_source not called) #618

@ArnonEnbar

Description


Summary

The WebSocket /ws/sensing broadcast continues to emit source: "esp32" indefinitely after the ESP32 hardware loses power or network connectivity. The UI ("Sensing" tab) then keeps showing "LIVE — ESP32 HARDWARE Connected" with cached/frozen sensor values, with no indication that the data source is offline.

AppStateInner::effective_source() already implements the correct 5-second stale-detection (returns "esp32:offline" when no UDP frame has arrived within ESP32_OFFLINE_TIMEOUT), but it is only invoked from REST endpoints (/health, etc.), never from the WS broadcast path.
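The stale check described above can be sketched as a small pure function. This is a minimal illustration, not the code at main.rs:679. The free function, its `last_frame` / `now` parameters, the `Duration` type for ESP32_OFFLINE_TIMEOUT, and treating "no frame ever received" as offline are all assumptions made for the example.

```rust
use std::time::{Duration, Instant};

// Assumed constant: the issue states a 5-second window; the real
// ESP32_OFFLINE_TIMEOUT value and type live in main.rs and are not shown here.
const ESP32_OFFLINE_TIMEOUT: Duration = Duration::from_secs(5);

/// Hypothetical free-function version of the stale check.
/// `last_frame` is when the most recent ESP32 UDP frame arrived
/// (None if none has arrived yet, which this sketch treats as offline).
fn effective_source(last_frame: Option<Instant>, now: Instant) -> &'static str {
    match last_frame {
        // Fresh: a frame arrived less than ESP32_OFFLINE_TIMEOUT ago.
        Some(t) if now.saturating_duration_since(t) < ESP32_OFFLINE_TIMEOUT => "esp32",
        // Stale (timeout elapsed) or no frame seen yet.
        _ => "esp32:offline",
    }
}
```

With a frame at t0, the sketch returns "esp32" at t0 + 1 s and "esp32:offline" from t0 + 5 s onward, which is the transition /health shows in the Verification output below.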

Reproduce

  1. Flash v0.6.4-esp32 firmware, provision WiFi + --target-ip.
  2. Start sensing-server --source esp32 --bind-addr 0.0.0.0.
  3. Open http://localhost:8080/ui/index.html → "Sensing" tab. Verify the green "LIVE — ESP32 HARDWARE" banner and live tick / amplitude data.
  4. Unplug the ESP32 (or power it off).
  5. Wait 30+ seconds.

Observed

  • UI still displays "LIVE — ESP32 HARDWARE Connected".
  • WS payload still emits "source": "esp32".
  • tick field frozen at the last received value.
  • All nodes[].amplitude, features.*, and vital_signs.* values are frozen at the last frame but re-broadcast on every broadcast cycle.
  • classification keeps emitting whatever motion/presence the cached frame implied.

Expected

  • After 5 seconds without UDP frames, the WS payload should emit "source": "esp32:offline" so the UI can switch to an offline/disconnected state, the same way the REST /health endpoint already reports "status": "degraded".

Verification

GET /health correctly reports the offline state:

{"clients":1,"source":"esp32:offline","status":"degraded","tick":44680}

WS /ws/sensing does not — it keeps emitting:

{"type":"sensing_update","source":"esp32","tick":44540, ...same frozen payload...}

Root Cause

effective_source() (main.rs:679) is correct, and is called from REST endpoints (main.rs:2167, 2735, 2789, 2810, 2820, 2844, 3472).

The two WS broadcast sites use a string literal instead:

// main.rs:3791 (edge-vitals path)
let mut update = SensingUpdate {
    msg_type: "sensing_update".to_string(),
    timestamp: chrono::Utc::now().timestamp_millis() as f64 / 1000.0,
    source: "esp32".to_string(),   // <-- bypasses effective_source()
    tick,
    ...

// main.rs:4018 (raw CSI path)
let mut update = SensingUpdate {
    msg_type: "sensing_update".to_string(),
    timestamp: chrono::Utc::now().timestamp_millis() as f64 / 1000.0,
    source: "esp32".to_string(),   // <-- same bug
    tick,
    ...

Suggested Fix

Both call sites already hold the AppStateInner guard, bound as s, in scope:

-    source: "esp32".to_string(),
+    source: s.effective_source(),

(Single-line change at each site. No new locks, no allocation churn beyond what effective_source already does.)
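A self-contained model of the patched broadcast construction might look like the following. It is a sketch under stated assumptions: the trimmed AppStateInner with a `last_esp32_frame` field, the `effective_source_at(now)` variant (which takes `now` only so the behaviour is testable; the suggested patch calls the real `s.effective_source()`), the `u64` tick type, and the shared `build_update` helper are all hypothetical. The timestamp uses std's SystemTime in place of chrono to keep the example dependency-free.

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

// Assumed 5-second window, per the issue text.
const ESP32_OFFLINE_TIMEOUT: Duration = Duration::from_secs(5);

// Hypothetical, trimmed-down stand-in for the server's AppStateInner.
struct AppStateInner {
    last_esp32_frame: Option<Instant>,
}

impl AppStateInner {
    // Explicit `now` for testability; the real method takes no arguments.
    fn effective_source_at(&self, now: Instant) -> String {
        match self.last_esp32_frame {
            Some(t) if now.saturating_duration_since(t) < ESP32_OFFLINE_TIMEOUT => {
                "esp32".to_string()
            }
            _ => "esp32:offline".to_string(),
        }
    }
}

// Only the fields visible in the issue's excerpt; the real struct has more.
#[derive(Debug)]
struct SensingUpdate {
    msg_type: String,
    timestamp: f64,
    source: String,
    tick: u64,
}

// Hypothetical shared constructor for both broadcast sites (edge-vitals, raw CSI).
fn build_update(s: &AppStateInner, tick: u64, now: Instant) -> SensingUpdate {
    SensingUpdate {
        msg_type: "sensing_update".to_string(),
        // std-only stand-in for chrono::Utc::now().timestamp_millis() as f64 / 1000.0
        timestamp: SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_millis() as f64 / 1000.0)
            .unwrap_or(0.0),
        // Was: source: "esp32".to_string()
        source: s.effective_source_at(now),
        tick,
    }
}
```

Routing both sites through one constructor like this would also keep them from drifting apart again; the issue's own proposal is the smaller one-line change at each site, which has the same effect on the emitted `source`.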

Why This Matters (Safety)

The project README and various ADRs market use-cases including:

  • Fall detection / elder-care monitoring
  • Overnight vital-sign tracking (apnea screening)
  • Presence-based safety triggers

Re-publishing the last received frame as "LIVE" is a silent-failure pattern: a deployed system whose ESP32 lost power could keep reporting "breathing normal / present" indefinitely, masking a real emergency. The 5s timeout was clearly intended to prevent exactly this; the fix only needs to extend its coverage to the WS path.

Related / Wider Scope

This patch is the minimum fix and is sufficient to let the UI flip to an offline indicator. A more complete fix could additionally:

  • Suppress WS broadcasts entirely when stale (skip the tick loop when effective_source().ends_with(":offline")), or
  • Add a stale: bool field on SensingUpdate that downstream consumers (UI, recorders, cluster aggregators) can branch on.

The two existing per-node NodeFeatureSnapshot.stale flags (main.rs:478, 511-512) are a precedent for the second approach.
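The two wider-scope options could be sketched roughly as follows. This is illustrative only: the `stale` field, `build_update`, and `maybe_broadcast` are hypothetical names, the struct keeps only the fields needed here, and the ":offline" suffix check mirrors the `effective_source().ends_with(":offline")` condition from the first bullet.

```rust
// Hypothetical extension of SensingUpdate with the proposed `stale` flag;
// only the fields needed for the illustration are kept.
#[derive(Debug)]
struct SensingUpdate {
    source: String,
    tick: u64,
    /// True when the upstream ESP32 has been silent past the timeout.
    stale: bool,
}

/// Second option: always broadcast, but mark stale payloads explicitly.
fn build_update(effective_source: String, tick: u64) -> SensingUpdate {
    let stale = effective_source.ends_with(":offline");
    SensingUpdate { source: effective_source, tick, stale }
}

/// First option: skip the broadcast entirely while the source is offline.
fn maybe_broadcast(effective_source: String, tick: u64) -> Option<SensingUpdate> {
    if effective_source.ends_with(":offline") {
        None
    } else {
        Some(build_update(effective_source, tick))
    }
}
```

The flag keeps a continuous stream that recorders and aggregators can filter, while suppression means a stale payload never reaches a consumer at all; the two are not mutually exclusive.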

#519 ("Ghost person detection, FPS infinity, skeleton flickering and jumpy vitals with ESP32-S3 multi-node setup") may be partially related — both stem from the broadcast loop not knowing the upstream sensor state has degraded.

Environment

  • Firmware: v0.6.4-esp32 (esp32-csi-node.bin SHA256 prefix 0066d74d35b0dbca…)
  • Server: main branch
  • Hardware: ESP32-S3 N16R8 (Waveshare DEV-KIT), single-node, channel 6 / 2.4 GHz
  • Host: Windows 11, sensing-server built with stable-x86_64-pc-windows-gnu, --no-default-features
  • UI: Chrome on localhost:8080/ui/index.html
