feat(cog-person-count): train count_v1.safetensors — honest v0.0.1 (ADR-103) #695

Merged: ruvnet merged 1 commit into main from feat/cog-person-count-train on May 21, 2026

@ruvnet ruvnet commented May 21, 2026

Phase 2 of ADR-103. Trained the count head on the existing 1,077 paired samples (the same data that produced pose_v1 yesterday).

Honest result

| Metric | Value |
| --- | --- |
| Best eval accuracy | 65.1% |
| Within ±1 | 100% (labels span {0,1}, trivially satisfied) |
| MAE | 0.349 |
| Class 0 (empty) accuracy | 100% (140 samples) |
| Class 1 (person present) accuracy | 0% (75 samples) |
| Confidence↔correctness Spearman | 0.023 |

The model overfit by epoch ~100. The 'best' checkpoint predicts class 0 for every eval sample, so its accuracy equals the empty-room share of the eval window (140/215 = 65.1%) rather than reflecting a real classifier. This is the same data-bound failure mode as pose_v1 (#645). v0.0.1 ships the pipeline, a working artifact, and honest numbers; usable counts wait on multi-room paired data.
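The headline numbers can be reproduced from the class counts alone. The sketch below is not the training or eval code; it assumes a degenerate predictor that always outputs class 0 (which is what the per-class accuracies in the table imply) and recomputes the three aggregate metrics.

```python
# Eval-window label counts taken from the results table.
n_empty, n_person = 140, 75
labels = [0] * n_empty + [1] * n_person

# Assumed degenerate model: always predict class 0 (empty room).
preds = [0] * len(labels)

pairs = list(zip(preds, labels))
accuracy = sum(p == y for p, y in pairs) / len(pairs)
mae = sum(abs(p - y) for p, y in pairs) / len(pairs)
within_one = sum(abs(p - y) <= 1 for p, y in pairs) / len(pairs)

print(f"accuracy={accuracy:.3f} mae={mae:.3f} within_one={within_one:.3f}")
```

With only two label values, every error has magnitude 1, so MAE is simply the error rate (75/215) and the within-±1 metric cannot fall below 100%.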

What v0.0.1 still validates

  • PyTorch → safetensors → Candle Rust loads cleanly. cog-person-count health reports backend: candle-cpu (not stub); architecture parity between train-count.py and src/inference.rs::CountNet is bit-exact.
  • ONNX export bit-clean (16 KB, opset 18, dynamic batch).
  • Training wall time: 5.6 s for 400 epochs on RTX 5080.
  • All 15 tests still pass.

Files

  • scripts/align-ground-truth.js — extended to emit n_persons_mode + n_persons_max per window. Backwards-compatible additive fields.
  • scripts/train-count.py — new. Mirrors CountNet exactly; CE+BCE+Brier loss; safetensors+ONNX export.
  • v2/.../cog/artifacts/{count_v1.safetensors, count_v1.onnx, count_train_results.json} — the artifacts.
  • v2/.../cog/README.md — Status updated with v0.0.1 numbers + honest-caveat section.
  • docs/benchmarks/person-count-cog.md — new benchmark log mirroring the pose-cog format.
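As a rough illustration of the two new per-window fields, the sketch below derives `n_persons_mode` and `n_persons_max` from a list of per-frame person counts. It is written in Python for brevity rather than in the JavaScript of the real script; the function name `summarize_window`, the input shape, and the tie-breaking rule are all assumptions, not taken from scripts/align-ground-truth.js.

```python
from collections import Counter


def summarize_window(frame_counts):
    """Return count labels for one window of per-frame person counts."""
    if not frame_counts:
        raise ValueError("window has no frames")
    tally = Counter(frame_counts)
    top = max(tally.values())
    # Assumption: ties for the mode resolve to the smaller count.
    mode = min(count for count, n in tally.items() if n == top)
    return {"n_persons_mode": mode, "n_persons_max": max(frame_counts)}


# Hypothetical window: six frames with 0, 1, or 2 people visible.
window = [0, 1, 1, 0, 1, 2]
print(summarize_window(window))
```

Because the fields are additive, a consumer that only reads the older fields is unaffected by their presence.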

🤖 Generated with claude-flow

…DR-103)

Phase 2 of ADR-103: trained count head on the existing 1,077 paired
samples (the same data that produced pose_v1 yesterday).

Honest result: 65.1% eval accuracy / 100% within ±1 / MAE 0.349 on
the held-out time-window. Per-class: 100% on "empty room" / 0% on
"1 person". The model overfit by epoch 100 (train_acc → 1.0,
eval_loss climbed 0.67 → 7.8) and the "best" checkpoint is the
snapshot that happened to predict the eval window's class
distribution (140/215 = 65.1%, matches eval_acc exactly). Confidence
head Spearman = 0.023 ⇒ uncalibrated. Same data-bound failure mode
as pose_v1 (#645), bounded by single-session training data; same
fix path (multi-room).
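For readers unfamiliar with the calibration check, here is a minimal sketch of a Spearman rank correlation between per-sample confidence and correctness. It is a plain-Python stand-in, not the evaluation code from train-count.py; the sample values are made up for illustration. A value near 0, as reported above, means confidence carries almost no information about whether a prediction is right.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: confidence per sample, and 1 if the prediction was correct.
confidence = [0.9, 0.8, 0.7, 0.6]
correct = [1, 1, 0, 0]
print(spearman(confidence, correct))
```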

What v0.0.1 still validates end-to-end:
* PyTorch → safetensors → Candle Rust loads cleanly on first try.
  `cog-person-count health` reports `backend: candle-cpu` and emits
  real per-frame predictions instead of the stub backend's hard-coded
  {1 person, 0 confidence}. Architecture parity between train-count.py
  and src/inference.rs::CountNet is bit-exact.
* ONNX export bit-clean (16 KB, opset 18, dynamic batch axis).
* Training wall time: 5.6 s for 400 epochs on RTX 5080.
* Binary size unchanged (2.36 MB stripped), model loads via mmap at
  runtime.

This commit ships:

* scripts/align-ground-truth.js: extended to emit n_persons_mode +
  n_persons_max per window so the training pipeline has count
  labels. Backwards-compatible (additive fields).
* scripts/train-count.py: new — mirrors CountNet architecture
  exactly, loads paired.jsonl, trains 400 epochs with
  CE+BCE+Brier loss, exports safetensors + ONNX + per-epoch JSON.
* v2/.../cog/artifacts/{count_v1.safetensors,count_v1.onnx,
  count_train_results.json}: the trained artifacts.
* v2/.../cog/README.md: Status table updated with the v0.0.1 numbers
  + an Honest Caveat section explaining the data-bound result.
* docs/benchmarks/person-count-cog.md: new — full v0.0.1 benchmark
  log mirroring the format docs/benchmarks/pose-estimation-cog.md
  established. Includes comparison to ADR-103 v0.1.0 acceptance
  gates and per-class breakdown.
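The commit names a CE+BCE+Brier loss without spelling out how the terms combine. One plausible reading, sketched below for a single sample in plain Python, is: cross-entropy on the count logits, binary cross-entropy on a confidence output against whether the argmax was correct, and a Brier term on the class probabilities. The term definitions, the equal weights, and the name `combined_loss` are assumptions; the real script may differ.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_loss(logits, confidence, label, w_ce=1.0, w_bce=1.0, w_brier=1.0):
    """Assumed CE + BCE + Brier composition for one sample."""
    eps = 1e-12
    probs = softmax(logits)

    # Cross-entropy on the count classes.
    ce = -math.log(max(probs[label], eps))

    # BCE: the confidence output should predict whether the argmax is correct.
    predicted = max(range(len(probs)), key=probs.__getitem__)
    target = 1.0 if predicted == label else 0.0
    c = min(max(confidence, eps), 1.0 - eps)
    bce = -(target * math.log(c) + (1.0 - target) * math.log(1.0 - c))

    # Brier: squared error between class probabilities and the one-hot label.
    brier = sum((p - (1.0 if k == label else 0.0)) ** 2 for k, p in enumerate(probs))

    return w_ce * ce + w_bce * bce + w_brier * brier


# A confident wrong prediction is penalised far more than a confident right one.
print(combined_loss([5.0, -5.0], 0.99, label=0))
print(combined_loss([5.0, -5.0], 0.99, label=1))
```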

Still pending:
* `run` subcommand wiring (long-running polling loop, same as pose)
* Cross-compile + sign + GCS upload (mirror of pose cog pipeline)
* Live install on cognitum-v0
* v0.2.0: re-train on multi-room data, LoRA per-room adapters,
  Stoer-Wagner min-cut clip in fusion stage
@ruvnet ruvnet merged commit 6b4994e into main May 21, 2026
12 checks passed
@ruvnet ruvnet deleted the feat/cog-person-count-train branch May 21, 2026 22:56
ruvnet added a commit that referenced this pull request May 21, 2026
…al (#697)

Phase 4 of ADR-103. Adds the long-running polling loop so the cog's
fourth verb (`run`) does real work, completing the ADR-100 runtime
contract end-to-end:

  cog-person-count version    → "person-count 0.3.0"
  cog-person-count manifest   → JSON skeleton
  cog-person-count health     → loads weights + 1-shot infer + emit
  cog-person-count run --config  → long-running per-frame emit  ← THIS

What ships:

* src/runtime.rs (new) — `run_loop` polls sensing_url every poll_ms,
  slides a [56, 20] CSI window, runs InferenceEngine::infer, emits
  publisher::person_count events. Same shape as
  cog-pose-estimation::runtime — fetch_frame extracts amplitudes
  from `snapshot.nodes[0].amplitude[]`, fails open on connect errors
  with a WARN log rather than crashing.
* src/lib.rs — registers the runtime module.
* src/main.rs — cmd_run now loads RunConfig from a JSON file, builds
  the InferenceEngine (with weights if cfg.model_path is set,
  otherwise auto-discover), emits a run.started event, and hands off
  to the Tokio multi-thread runtime's block_on(run_loop). Single-node
  fusion is a no-op for N=1 today; v0.2.0 will append predictions
  from sibling nodes and call fusion::fuse_confidence_weighted before
  emit.
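The polling behaviour described above can be sketched as follows. This is a Python illustration of the control flow only, not the Rust in src/runtime.rs: `fetch_frame`, `infer`, and `emit` are hypothetical callables, `max_polls` exists only so the demo terminates, and reading [56, 20] as 56 subcarriers by 20 frames is an assumption.

```python
import time
from collections import deque

SUBCARRIERS, WINDOW = 56, 20  # assumed meaning of the [56, 20] window shape


def run_loop(fetch_frame, infer, emit, max_polls, poll_ms=100):
    """Poll for frames, keep a sliding window, emit one prediction per full window."""
    window = deque(maxlen=WINDOW)  # oldest frame drops out automatically
    for _ in range(max_polls):
        try:
            frame = fetch_frame()
        except ConnectionError as err:
            # Fail open: log a warning and keep polling instead of crashing.
            emit({"level": "WARN", "msg": "sensing-server fetch failed", "error": str(err)})
            time.sleep(poll_ms / 1000)
            continue
        if len(frame) == SUBCARRIERS:
            window.append(frame)
            if len(window) == WINDOW:
                # Transpose to [subcarrier][time] before inference.
                csi = [[f[s] for f in window] for s in range(SUBCARRIERS)]
                emit({"event": "person.count", "fields": infer(csi)})
        time.sleep(poll_ms / 1000)


def demo():
    calls = {"n": 0}

    def fetch_frame():
        calls["n"] += 1
        if calls["n"] == 1:
            raise ConnectionError("actively refused")
        return [0.0] * SUBCARRIERS

    def infer(csi):
        # Stand-in for the model: report the window shape and a fixed count.
        return {"count": 0, "shape": [len(csi), len(csi[0])]}

    events = []
    run_loop(fetch_frame, infer, events.append, max_polls=22, poll_ms=0)
    return events


for event in demo():
    print(event)
```

In the demo the first poll fails and produces a warning, the next 20 frames fill the window, and each frame after that slides the window by one and triggers another emit.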

Verified locally:

  cargo check  -p cog-person-count --no-default-features   → clean
  cargo test   -p cog-person-count                          → 15/15 pass (no regressions)
  cargo build  -p cog-person-count --release                → 2.36 MB unchanged
  ./cog-person-count run --config bad-config.json:
    line 1: {"event":"run.started","fields":{"cog":"person-count",
             "sensing_url":"http://127.0.0.1:9999/...","poll_ms":100,
             "model_path":"(auto-discover)"}}
    line 2: WARN sensing-server fetch failed
            error=Connection Failed: Connect error: actively refused
    (loop alive — exits cleanly on SIGTERM, no crash, no NaN)

Also adds a "Relationship to the in-process score_to_person_count
heuristic" section to cog/README.md explaining the dual-emitter
design (sensing-server keeps emitting the PR #491 slot heuristic;
the cog runs out-of-process and emits person.count events from the
learned model). Operators choose by installing the cog or not — no
sensing-server rebuild required.

ADR-103 §"Migration" status:
  1. Land ADR + scaffold ........... done (#693, #694)
  2. Train count_v1 ................ done (#695)
  3. Cross-compile + sign + GCS .... done (#696)
  4. Server-side wiring ............ done — out-of-process design
                                      means no rewire needed; this
                                      cog is the wiring.
  5. v0.2.0 multi-room + LoRA ...... data-bound (#645)