ADR-079 P8 follow-up: PCK@20 3.0% → ≥35% requires more paired data + multi-room framing #645

@ruvnet

Status (after today's cog v0.0.1 ship; see PRs #642, #643, #644)

ADR-079 P7 (data collection), P8 (alignment, training, evaluation), and end-to-end cog packaging all ran today. The pipeline is validated, and a signed cog-pose-estimation@0.0.1 binary is live at gs://cognitum-apps/cogs/{arm,x86_64}/ and installed on cognitum-v0. The remaining work toward a useful model is bound by data, not code.

| Lever | Current (v0.0.1) | Target |
|---|---|---|
| Paired sample count | 1,077 | 30,000+ (multiple 30-min sessions × full-body framing) |
| Camera framing | torso-up at desk (avg n_visible 14.3/17) | full-body, varied movements, multiple rooms |
| Avg detection confidence | 0.476 | ≥ 0.7 |
| Training epochs | 400 (Candle CUDA, 2.1 s on RTX 5080) | 1000+ if needed (still seconds on the GPU) |
| PCK@20 | 3.0% | ≥ 35% |
| PCK@50 | 18.5% | ≥ 60% |
| MPJPE (normalized) | 0.093 | < 0.05 |
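To make the PCK and MPJPE rows concrete, here is a minimal sketch of the two metrics. It is not the pipeline's evaluator. It assumes keypoints are (x, y) pairs in normalized image coordinates, and that PCK@20 and PCK@50 mean a distance threshold of 0.20 and 0.50 times a per-sample scale. That scale could be a bounding-box or torso size; the pipeline's actual normalizer is not stated in this issue. Only joints marked visible are scored. The function names `pck` and `mpjpe` are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D keypoints."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def pck(
    pred: Sequence[Sequence[Point]],
    gt: Sequence[Sequence[Point]],
    visible: Sequence[Sequence[bool]],
    scale: Sequence[float],
    alpha: float,
) -> float:
    """Fraction of visible joints whose error is within alpha * per-sample scale."""
    hits = total = 0
    for p_sample, g_sample, v_sample, s in zip(pred, gt, visible, scale):
        for p, g, v in zip(p_sample, g_sample, v_sample):
            if not v:
                continue  # joints outside the camera frame are not scored
            total += 1
            if _dist(p, g) <= alpha * s:
                hits += 1
    return hits / total if total else 0.0


def mpjpe(
    pred: Sequence[Sequence[Point]],
    gt: Sequence[Sequence[Point]],
    visible: Sequence[Sequence[bool]],
) -> float:
    """Mean per-joint position error over visible joints, in normalized units."""
    errors = [
        _dist(p, g)
        for p_sample, g_sample, v_sample in zip(pred, gt, visible)
        for p, g, v in zip(p_sample, g_sample, v_sample)
        if v
    ]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    # One sample, three joints; the third is not visible and is ignored.
    pred = [[(0.5, 0.5), (0.1, 0.1), (0.9, 0.9)]]
    gt = [[(0.5, 0.6), (0.4, 0.1), (0.0, 0.0)]]
    vis = [[True, True, False]]
    print(pck(pred, gt, vis, scale=[1.0], alpha=0.2))  # 0.5
    print(pck(pred, gt, vis, scale=[1.0], alpha=0.5))  # 1.0
    print(round(mpjpe(pred, gt, vis), 6))  # 0.2
```

Masking by visibility matters for this issue: with torso-up framing, ankles and face joints are often out of frame, so whether they are scored changes the headline PCK.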

What the v0.0.1 numbers tell us

The per-joint PCK@50 ranking shows the model learning the joints that the camera framing lets it see:

r_hip       76.9%   ← excellent (right side most consistently in frame)
r_knee      35.2%
l_hip       27.3%
l_elbow     26.4%
l_wrist     24.1%
l_knee      20.8%
r_shoulder  19.9%
...
nose          5.1%   ← essentially random (face joints at desk-level zoom)
l_ankle       7.9%
r_ankle       9.3%

This asymmetry reflects the seated-at-desk camera framing rather than a model defect. CSI at 56 subcarriers × 20 frames carries enough spatial information for proximal joints that are consistently visible, but not enough for fine-grained extremities. More data won't remove that subcarrier-density bottleneck for fingertips and face joints, but multi-room, full-body data should close the visibility gap for the 11 joints that already show some signal today.

Suggested data-collection plan

  1. Record 3 × 30-min sessions with the camera moved back so the full body (head to ankles) fits in frame. Use different rooms (or different times of day in the same room) to give the model spatial diversity. Vary movements: walking patterns, arm raises, sit/stand transitions, squats, reaches, lying down.
  2. Re-run scripts/align-ground-truth.js (now safe for large inputs via the streaming loader from #641, "fix(align): stream JSONL + support sensing_update format (unblocks ADR-079 P8)") to produce a multi-session paired set.
  3. Train via the existing Candle pipeline on ruvultra's RTX 5080. Expected wall time: a few minutes at most. Scaling today's run (2.1 s for 400 epochs on 1,077 samples) linearly to 30K samples × 1000 epochs gives about 2.1 s × 27.9 × 2.5 ≈ 146 s, roughly 2.5 minutes.
  4. Re-evaluate. PCK@20 should approach the 35% target if the new framing and movement variety have the intended effect.
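Step 2 boils down to streaming two timestamped JSONL files and pairing them. The real script is JavaScript and its record format isn't shown here, so this is only a Python sketch of the idea under stated assumptions. The field names `ts` (milliseconds), `csi`, `conf`, and `keypoints` are hypothetical, as are the 50 ms skew tolerance and the helper names. Both streams are assumed sorted by time. Each CSI record is matched to the nearest camera frame with a two-pointer merge, and pairs whose mean keypoint confidence is below the 0.7 target are dropped.

```python
import json
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line, never holding the whole file."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def pair_by_timestamp(
    csi: Iterable[dict],
    cam: Iterable[dict],
    max_skew_ms: float = 50.0,  # hypothetical tolerance
    min_conf: float = 0.7,      # matches the avg-confidence target above
) -> Iterator[dict]:
    """Pair each CSI record with the nearest camera record in time.

    Both streams must be sorted by "ts". Pairs whose time skew exceeds
    max_skew_ms, or whose mean keypoint confidence is below min_conf,
    are dropped.
    """
    cam_iter = iter(cam)
    prev, nxt = None, next(cam_iter, None)
    for rec in csi:
        t = rec["ts"]
        # Advance so prev is the last frame before t and nxt the first at/after t.
        while nxt is not None and nxt["ts"] < t:
            prev, nxt = nxt, next(cam_iter, None)
        candidates = [r for r in (prev, nxt) if r is not None]
        if not candidates:
            continue
        best = min(candidates, key=lambda r: abs(r["ts"] - t))
        if abs(best["ts"] - t) > max_skew_ms:
            continue  # no camera frame close enough in time
        conf = best["conf"]
        if not conf or sum(conf) / len(conf) < min_conf:
            continue  # low-confidence ground truth is not worth training on
        yield {"ts": t, "csi": rec["csi"], "keypoints": best["keypoints"]}
```

Because both inputs are consumed lazily, the pairs can be written straight back out. For example, `pair_by_timestamp(read_jsonl(csi_file), read_jsonl(cam_file))` would iterate over two open (hypothetical) files. Memory use stays constant however long the sessions are.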

Optimizations available within the pipeline (do not require new data)

  • LoRA cross-environment fine-tuning (per ADR-079 P9). Today's encoder was randomly initialized because the HF presence encoder's MLP architecture didn't match. With multi-room data we can first train a genuinely shared encoder, then train per-room LoRA adapters on top of it.
  • Subcarrier attention weighting: already enabled (top-5 subcarriers: [33, 47, 50, 19, 16]).
  • Stoer-Wagner min-cut multi-person separation: already enabled.
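The shared-encoder-plus-per-room-adapter idea from the LoRA bullet looks roughly like this on a single linear layer. It is standard LoRA, written in plain Python for illustration, and it assumes nothing about the real encoder's architecture or the Candle implementation. The class name `LoRALinear`, the room names, and the rank are hypothetical. The shared weight W stays frozen. Each room adds only a low-rank update (alpha / rank) · B·A, and B starts at zero so a fresh adapter reproduces the shared encoder exactly.

```python
import random


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen shared linear layer plus a trainable low-rank per-room update.

    forward(x) = W x + (alpha / rank) * B (A x), where W is d_out x d_in,
    A is rank x d_in and B is d_out x rank.
    """

    def __init__(self, weight: list[list[float]], rank: int,
                 alpha: float = 1.0, seed: int = 0) -> None:
        self.weight = weight  # shared across rooms; not modified here
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # Usual LoRA init: A small random, B zero, so the adapter starts as a no-op.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self) -> int:
        """Parameters a room adapter adds: rank * (d_in + d_out)."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


if __name__ == "__main__":
    shared = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a trained shared encoder
    adapters = {room: LoRALinear(shared, rank=1, seed=i)
                for i, room in enumerate(["office", "lab"])}  # hypothetical rooms
    print(adapters["office"].forward([2.0, 3.0]))  # [2.0, 3.0]
```

The point of the design is cost. A room adapter adds rank × (d_in + d_out) parameters instead of d_in × d_out. For a 1024 × 1024 layer at rank 8, that is 16,384 instead of 1,048,576, so adding a room means fitting a small adapter rather than retraining the encoder.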

Artifacts shipped today (for context)

  • Signed v0.0.1 binaries: gs://cognitum-apps/cogs/arm/cog-pose-estimation-arm + .../x86_64/....
  • Trained model: models/wifi-densepose-pretrained.safetensorspose_v1.safetensors (507 KB) + pose_v1.onnx (12 KB).
  • Benchmark log: docs/benchmarks/pose-estimation-cog.md.
  • Live install: /var/lib/cognitum/apps/pose-estimation/ on cognitum-v0.

Acceptance criteria for closing this issue

  • A multi-room paired dataset of ≥ 30K samples with avg detection confidence ≥ 0.7 has been produced.
  • PCK@20 ≥ 35% on a held-out time window from a different session.
  • PCK@50 ≥ 60%.
  • Per-joint PCK@50 ≥ 30% for at least 13 of 17 joints (face joints can lag).
  • cog-pose-estimation@0.1.0 re-released with the new weights (no code change required: same Candle inference path, just better weights).
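The numeric criteria above can be checked mechanically after each training run. This is a hypothetical helper, not part of the pipeline: `EvalReport` and `acceptance` are illustrative names, and metrics are assumed to be fractions in [0, 1] measured on the held-out window. Multi-room coverage and the 0.1.0 release still need a manual check.

```python
from dataclasses import dataclass


@dataclass
class EvalReport:
    n_samples: int                     # paired samples in the dataset
    avg_conf: float                    # mean ground-truth detection confidence
    pck20: float                       # fractions in [0, 1], measured on a
    pck50: float                       # held-out window from another session
    per_joint_pck50: dict[str, float]  # joint name -> PCK@50


def acceptance(report: EvalReport, min_joints: int = 13) -> dict[str, bool]:
    """Evaluate the numeric closing criteria for this issue."""
    joints_ok = sum(1 for v in report.per_joint_pck50.values() if v >= 0.30)
    return {
        "dataset": report.n_samples >= 30_000 and report.avg_conf >= 0.7,
        "pck20": report.pck20 >= 0.35,
        "pck50": report.pck50 >= 0.60,
        # Up to 4 of the 17 joints (e.g. face joints) may stay below 30%.
        "per_joint": len(report.per_joint_pck50) == 17 and joints_ok >= min_joints,
    }
```

Feeding in the v0.0.1 headline numbers (1,077 samples, conf 0.476, PCK@20 0.030, PCK@50 0.185) fails every check, which matches the status table.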

🤖 Generated with claude-flow

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests