fix(align): stream JSONL + support sensing_update format (unblocks ADR-079 P8) #641

Merged
ruvnet merged 1 commit into main from fix/align-ground-truth-streaming-and-sensing-update
May 19, 2026
ruvnet commented May 19, 2026

Summary

Two real blockers found while running ADR-079 P7→P8 end-to-end for the first time against a 30-min paired session:

  1. Node/V8 string length limit (~512 MB) hit on the 750 MB CSI recording. `fs.readFileSync(_, 'utf8').split('\n')` failed with `Cannot create a string longer than 0x1fffffe8 characters`. Replaced `loadJsonl` with a streaming reader that uses a 1 MiB byte buffer and decodes line-by-line, so memory use stays bounded by the largest single record.
  2. Schema mismatch with the current sensing-server. The aligner filtered on legacy `raw_csi` / `feature` types; the live server emits a single `sensing_update` record per tick (with `nodes[].amplitude` and top-level `features`). Result: 0 frames matched every time. Added a `sensing_update` branch that projects each tick into rawCsi/features entries the existing windowing logic can consume, and updated `extractCsiMatrix` to use already-extracted amplitudes when `iqHex` is absent. `timestamp` is now accepted as either ISO string or numeric float-seconds.
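
A minimal sketch of the kind of reader item 1 describes, not the code merged in this PR. The name `streamJsonl`, its callback signature, and the `chunkBytes` parameter are assumptions for illustration; only the 1 MiB buffer size and the line-by-line decoding come from the description above.

```javascript
const fs = require('fs');

const DEFAULT_CHUNK_BYTES = 1024 * 1024; // 1 MiB read buffer

// Hypothetical helper: calls onRecord(obj) once per non-empty JSONL line.
// The file is never decoded as a whole, so no single giant string is created.
function streamJsonl(path, onRecord, chunkBytes = DEFAULT_CHUNK_BYTES) {
  const fd = fs.openSync(path, 'r');
  const chunk = Buffer.allocUnsafe(chunkBytes);
  let pending = []; // byte slices of a line that spans chunk boundaries

  const emit = () => {
    // Decode only after the full line is assembled, so a multi-byte UTF-8
    // character split across two chunks is never decoded in halves.
    const line = Buffer.concat(pending).toString('utf8').trim();
    pending = [];
    if (line) onRecord(JSON.parse(line));
  };

  try {
    let n;
    while ((n = fs.readSync(fd, chunk, 0, chunkBytes, null)) > 0) {
      const view = chunk.subarray(0, n);
      let start = 0;
      let nl;
      // 0x0a ('\n') never occurs inside a multi-byte UTF-8 sequence,
      // so splitting on the raw byte is safe.
      while ((nl = view.indexOf(0x0a, start)) !== -1) {
        pending.push(Buffer.from(view.subarray(start, nl))); // copy: chunk is reused
        emit();
        start = nl + 1;
      }
      if (start < n) pending.push(Buffer.from(view.subarray(start)));
    }
    if (pending.length) emit(); // final line without a trailing newline
  } finally {
    fs.closeSync(fd);
  }
}
```

Memory stays bounded only if the callback does not itself accumulate every raw record; filtering or projecting each record as it arrives keeps the footprint small.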

End-to-end verified: 1,077 paired samples produced at `--min-confidence 0.3 --window-frames 20`; downstream `train-wiflow-supervised.js` runs to completion.

The PCK gap that came out of this run (0% on every joint; more data and a GPU are needed) is tracked separately in #640. It is a training concern, not an aligner concern.

Test plan

  • Aligner produces 1,077 paired samples (`[56, 20]` shape) from the 30-min P7 session
  • Memory stays bounded — no V8 string limit error
  • Training script consumes the paired output successfully end-to-end
  • Reviewer: spot-check that no schema fields were dropped

🤖 Generated with claude-flow

Two blockers discovered while running ADR-079 P7→P8 end-to-end against
a 30-minute paired session (39,088 GT frames + 45,625 CSI frames):

1. `readFileSync(_, 'utf8').split('\n')` hit V8's maximum string length
   (`String::kMaxLength`, ~512 MB) on the 750 MB CSI recording. Result:
       Error: Cannot create a string longer than 0x1fffffe8 characters
   Replaced loadJsonl with a 1 MiB byte-buffer streaming reader that
   decodes line-by-line, so memory use stays bounded by the largest
   single record.

2. The sensing-server has long since switched from the legacy `raw_csi`
   / `feature` typed records to a single `sensing_update` record per
   tick (with nodes[].amplitude and top-level features). The aligner
   filtered on the old types and produced 0 frames every time. Added a
   `sensing_update` branch that projects each tick into rawCsi/features
   entries the existing windowing code can consume, and updated
   extractCsiMatrix to use already-extracted amplitudes when iqHex is
   absent. timestamp is now accepted as either ISO string (legacy) or
   numeric float-seconds (current).
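
A rough sketch of the projection described in item 2, under stated assumptions rather than the merged implementation. The helper names (`toEpochMs`, `projectRecord`, `amplitudesOf`), the `tMs` and `nodeId` fields, the `node_id` key, and the signed 8-bit interleaved I/Q layout assumed for `iqHex` are all hypothetical; only `sensing_update`, `nodes[].amplitude`, top-level `features`, and the two timestamp forms come from the text above.

```javascript
// Normalize a timestamp to epoch milliseconds.
// Numeric values are treated as float seconds (current format);
// strings are parsed as ISO 8601 (legacy format). Returns NaN if unparseable.
function toEpochMs(ts) {
  if (typeof ts === 'number') return ts * 1000;
  return Date.parse(ts);
}

// Project one JSONL record into the two streams the windowing code consumes.
// `out` is { rawCsi: [], features: [] }.
function projectRecord(rec, out) {
  const tMs = toEpochMs(rec.timestamp);
  if (!Number.isFinite(tMs)) return out; // skip records without a usable timestamp

  switch (rec.type) {
    case 'raw_csi': // legacy record, passed through with a normalized time
      out.rawCsi.push({ ...rec, tMs });
      break;
    case 'feature': // legacy record
      out.features.push({ ...rec, tMs });
      break;
    case 'sensing_update': // current format: one record per tick
      for (const node of rec.nodes || []) {
        if (Array.isArray(node.amplitude)) {
          out.rawCsi.push({ tMs, nodeId: node.node_id, amplitude: node.amplitude });
        }
      }
      if (rec.features) out.features.push({ tMs, features: rec.features });
      break;
  }
  return out;
}

// Prefer already-extracted amplitudes; fall back to decoding iqHex.
// ASSUMPTION: iqHex holds interleaved signed 8-bit I/Q pairs.
function amplitudesOf(frame) {
  if (Array.isArray(frame.amplitude)) return frame.amplitude;
  if (typeof frame.iqHex !== 'string') return null;
  const bytes = Buffer.from(frame.iqHex, 'hex');
  const amps = [];
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    amps.push(Math.hypot(bytes.readInt8(i), bytes.readInt8(i + 1)));
  }
  return amps;
}
```

Normalizing every timestamp to one numeric unit before windowing lets legacy and current recordings share the same alignment path.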

End-to-end verified: produces 1,077 paired samples at
`--min-confidence 0.3 --window-frames 20` from the full 30-min
recording; downstream `train-wiflow-supervised.js` runs to completion.
See follow-up #640 for the PCK gap (data + GPU needed) — those are
training concerns, not aligner concerns.
ruvnet merged commit ef20a72 into main May 19, 2026
13 checks passed
ruvnet deleted the fix/align-ground-truth-streaming-and-sensing-update branch May 19, 2026 18:51