fix(align): stream JSONL + support sensing_update format (unblocks ADR-079 P8) #641
Merged
Two blockers discovered while running ADR-079 P7→P8 end-to-end against
a 30-minute paired session (39,088 GT frames + 45,625 CSI frames):
1. `readFileSync(_, 'utf8').split('\n')` hit Node's maximum string
length (`buffer.constants.MAX_STRING_LENGTH`, ~512 Mi characters) on
the 750 MB CSI recording. Result:
`Error: Cannot create a string longer than 0x1fffffe8 characters`
Replaced `loadJsonl` with a 1 MiB byte-buffer streaming reader that
decodes line-by-line, so memory use stays bounded by the read buffer
plus the largest single record.
2. The sensing-server has long since switched from the legacy `raw_csi`
/ `feature` typed records to a single `sensing_update` record per
tick (with `nodes[].amplitude` and top-level `features`). The aligner
filtered on the old types and produced 0 frames every time. Added a
`sensing_update` branch that projects each tick into `rawCsi`/`features`
entries the existing windowing code can consume, and updated
`extractCsiMatrix` to use already-extracted amplitudes when `iqHex` is
absent. `timestamp` is now accepted as either an ISO string (legacy) or
numeric float-seconds (current).
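The projection in item 2 could look roughly like the sketch below. It is an illustration under assumptions, not the aligner's actual code: the function names `toEpochMs` and `projectSensingUpdate`, the `type` and `node_id` field names, and the shape of the pushed entries are all hypothetical. Only `sensing_update`, `nodes[].amplitude`, top-level `features`, and the two `timestamp` forms come from the description above.

```javascript
// Hypothetical helper: normalize either timestamp form to epoch milliseconds.
function toEpochMs(ts) {
  if (typeof ts === 'number') return ts * 1000; // float seconds (current format)
  return Date.parse(ts);                        // ISO string (legacy); NaN if unparseable
}

// Hypothetical helper: project one sensing_update tick into the two arrays
// the windowing code consumes. Returns true if the record was handled.
function projectSensingUpdate(record, out) {
  if (record.type !== 'sensing_update') return false; // assumed discriminator field
  const tMs = toEpochMs(record.timestamp);
  if (!Number.isFinite(tMs)) return false;

  for (const node of record.nodes ?? []) {
    // Nodes without an amplitude array carry nothing to align, so skip them.
    if (!Array.isArray(node.amplitude)) continue;
    // Assumed entry shape: no iqHex, amplitudes already extracted by the server.
    out.rawCsi.push({ tMs, nodeId: node.node_id, amplitude: node.amplitude });
  }
  if (record.features) {
    out.features.push({ tMs, features: record.features });
  }
  return true;
}
```

A caller would keep the legacy `raw_csi` / `feature` branches and try this one for everything else, so older recordings still load.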
End-to-end verified: produces 1,077 paired samples at
`--min-confidence 0.3 --window-frames 20` from the full 30-min
recording; downstream `train-wiflow-supervised.js` runs to completion.
See follow-up #640 for the PCK gap (data + GPU needed) — those are
training concerns, not aligner concerns.
Summary
Two real blockers found while running ADR-079 P7→P8 end-to-end for the first time against a 30-min paired session:
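1. `fs.readFileSync(_, 'utf8').split('\n')` errored with `Cannot create a string longer than 0x1fffffe8 characters`. Replaced `loadJsonl` with a 1 MiB byte-buffer streaming reader that decodes line-by-line.
2. The sensing-server now emits a single `sensing_update` record per tick instead of the legacy `raw_csi` / `feature` records, so the aligner's type filter produced 0 frames. Added a `sensing_update` branch and made `extractCsiMatrix` use already-extracted amplitudes when `iqHex` is absent.
End-to-end verified: 1,077 paired samples produced at `--min-confidence 0.3 --window-frames 20`; downstream `train-wiflow-supervised.js` runs to completion.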
The PCK gap that came out of this run (0% on every joint, more data + GPU needed) is tracked separately in #640 — those are training concerns, not aligner concerns.
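For reference, a byte-buffer streaming JSONL reader of the kind described in the first item might be sketched as follows. This is a minimal illustration, not the merged `loadJsonl`: the name `readJsonlLines` and its generator shape are assumptions, and only the 1 MiB chunk size and line-by-line decoding are taken from the summary. It never builds the whole file as one string, so it cannot hit the maximum-string-length error.

```javascript
const fs = require('node:fs');
const { StringDecoder } = require('node:string_decoder');

// Hypothetical name and signature; yields one parsed record per non-empty line.
function* readJsonlLines(path, chunkSize = 1024 * 1024) {
  const fd = fs.openSync(path, 'r');
  const buf = Buffer.allocUnsafe(chunkSize);
  // StringDecoder holds back incomplete multi-byte UTF-8 sequences that
  // straddle a chunk boundary until the remaining bytes arrive.
  const decoder = new StringDecoder('utf8');
  let pending = ''; // partial line carried over from the previous chunk
  try {
    let bytesRead;
    while ((bytesRead = fs.readSync(fd, buf, 0, chunkSize, null)) > 0) {
      pending += decoder.write(buf.subarray(0, bytesRead));
      let start = 0;
      let nl;
      while ((nl = pending.indexOf('\n', start)) !== -1) {
        const line = pending.slice(start, nl).trim(); // trim also drops '\r'
        start = nl + 1;
        if (line) yield JSON.parse(line);
      }
      pending = pending.slice(start);
    }
    // Flush any trailing record that lacks a final newline.
    const last = (pending + decoder.end()).trim();
    if (last) yield JSON.parse(last);
  } finally {
    fs.closeSync(fd); // also runs if the consumer stops iterating early
  }
}
```

Iterating with `for (const record of readJsonlLines(file))` keeps resident memory near the chunk size plus the longest line. Collecting every record into an array would still be bounded by heap size rather than string length.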
Test plan
🤖 Generated with claude-flow