
feat(2026): distill to cnn-medium-attn+RoPE, add inference CLI, restructure dist/ (#12)

Merged
AmitMY merged 15 commits into y2025 from y2026 on Mar 23, 2026

Conversation


AmitMY (Contributor) commented Mar 22, 2026

Summary

Distills the 2026 experimental work down to a clean, minimal production path. Removes all architecture branches we didn't end up using and all loss/augmentation variants that didn't improve results.

Changes

  • model.py (805→294 lines): cnn-medium-attn + RoPE only — removed bilstm, bigru, tcn, cnn-fast-slow, cnn-local-attn, cnn-lstm, cnn-large, cnn; removed focal loss, label smoothing, b-dice, per-head weighted loss, legacy flags
  • train.py: removed curriculum callbacks (SegmentationDataModule, CurriculumCallback, FrameCurriculumCallback); simplified to direct get_dataloader() path
  • args.py: removed ~30 unused args; defaults reflect proven recipe (hidden_dim=384, depth=6, nhead=8, dice_loss_weight=1.0, epochs=200)
  • dataset.py: removed acceleration and speed_aug branches
  • evaluate.py: added as tracked file; removed windowed/LSTM eval path
  • bin.py (new): 2026 inference CLI — loads .ckpt, preprocesses pose, runs chunked RoPE inference, writes ELAN
  • old/bin.py: updated dist path to dist/2023/
  • dist/2023/: moved 2023 TorchScript .pth models from old/dist/; added README
  • dist/2026/: added EXPERIMENTS.md + findings README.md

Key findings (dist/2026/README.md)

Helped:

  • RoPE transformer
  • Dice loss (weight=1.0)
  • fps_aug (essential — disabling costs −9pp Sign IoU)
  • body_part_dropout=0.1 (+10.5pp Phrase25)
  • frame_dropout=0.15 (essential regularisation)
  • velocity features
  • no_face
  • hidden_dim=384
  • HM(sign,phrase) validation metric
  • inference chunk_size=num_frames (bug fix: +12.8pp Phrase IoU)

Didn't help:

  • attention padding mask (−7pp S25 consistently)
  • B-frame Dice loss
  • focal loss
  • label smoothing
  • speed augmentation
  • frame curriculum
  • removing frame_dropout (catastrophic phrase overfitting after ~50 epochs)

Best result so far (E145, 1024 frames)

           Sign IoU   Phrase IoU
  50fps    0.595      0.907
  25fps    0.569      0.880
  HM       0.705
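The inference chunk_size=num_frames fix listed under "Helped" can be sketched as follows. This is a hypothetical standalone helper (chunked_inference, toy_encode, and all shapes are illustrative assumptions, not the repo's API): a long sequence is split into windows no longer than the training window, each window is encoded independently, and the per-frame outputs are concatenated.

```python
import numpy as np

def chunked_inference(frames: np.ndarray, encode, chunk_size: int) -> np.ndarray:
    """Run `encode` on windows of at most `chunk_size` frames and concatenate.

    `frames` is (T, D); `encode` maps (t, D) -> (t, C) per-frame probabilities.
    Keeping chunk_size equal to the training num_frames avoids feeding the
    model longer position ranges than it ever saw during training.
    """
    outputs = [encode(frames[start:start + chunk_size])
               for start in range(0, len(frames), chunk_size)]
    return np.concatenate(outputs, axis=0)

# toy "model": 3-class softmax over random-feature logits, just to exercise shapes
def toy_encode(chunk):
    logits = chunk @ np.ones((chunk.shape[1], 3))
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = chunked_inference(np.random.randn(2500, 8), toy_encode, chunk_size=1024)
print(probs.shape)  # (2500, 3)
```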

🤖 Generated with Claude Code

…ucture dist/

Training:
- model.py: remove all arch branches except cnn-medium-attn+RoPE (805→294 lines)
  - removes: bilstm/bigru/tcn/cnn-fast-slow/cnn-local-attn/cnn-lstm/cnn-large/cnn
  - removes: GatedResidual, SinusoidalPositionalEncoding, LocalAttentionBlock, TCNBlock
  - removes: focal loss, label smoothing, b-dice, per-head weighted loss, legacy flags
  - keeps: dice loss, RoPE chunked inference, HM(sign,phrase) validation metric
- train.py: remove curriculum callbacks; simplify to direct get_dataloader() path
- args.py: remove unused args (arch, pos_encoding, acceleration, speed_aug,
  weighted_loss, focal_gamma, label_smoothing, b_dice, curriculum)
- dataset.py: remove acceleration and speed_aug branches

Inference:
- bin.py: new 2026 inference CLI (load .ckpt, process pose, write ELAN)
- old/bin.py: update dist path to dist/2023/

Evaluation:
- evaluate.py: add as tracked file; remove windowed/LSTM eval path

Dist:
- dist/2023/: move 2023 TorchScript .pth models from old/dist/; add README
- dist/2026/: add EXPERIMENTS.md and findings README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AmitMY left a comment


Where is the Dockerfile? Update the main README to say how to train and how to run.

Comment thread sign_language_segmentation/evaluate.py
eval_args.b_threshold = best_cfg.get("b_threshold", eval_args.b_threshold)
eval_args.io_threshold = best_cfg.get("io_threshold", eval_args.io_threshold)

if eval_args.likeliest:


Did we check which is better, likeliest or probs_to_segments? I'm thinking we might not need thresholds anymore; that would simplify the code a lot.



Haven't done a systematic comparison, but in threshold-sweep experiments (--tune_threshold), likeliest (argmax) has generally matched or beaten threshold-based decoding on IoU, and it's simpler. Made likeliest the default; threshold-based decoding is still available via --threshold + --tune_threshold if needed. Happy to add an explicit comparison to the eval script if you'd like.
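For reference, a minimal sketch of argmax ("likeliest") BIO decoding, assuming a three-class O/B/I scheme; the function name and segment conventions here are illustrative, not the repo's actual implementation:

```python
def likeliest_segments(probs):
    """Argmax ("likeliest") BIO decoding: no thresholds to tune.

    probs: list of (p_O, p_B, p_I) per frame. A segment starts at a B frame
    (or at an I frame not already inside a segment) and extends over
    following I frames. Returns [(start_frame, end_frame_exclusive), ...].
    """
    labels = [max(range(3), key=lambda c: p[c]) for p in probs]  # 0=O, 1=B, 2=I
    segments, start = [], None
    for t, lab in enumerate(labels):
        if lab == 1:                      # B always opens a new segment
            if start is not None:
                segments.append((start, t))
            start = t
        elif lab == 2 and start is None:  # orphan I also opens one
            start = t
        elif lab == 0 and start is not None:
            segments.append((start, t))   # O closes the current segment
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments

probs = [(0.9, 0.05, 0.05), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8),
         (0.1, 0.1, 0.8), (0.9, 0.05, 0.05)]
print(likeliest_segments(probs))  # [(1, 4)]
```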



Please run an evaluation of likeliest vs threshold and report here. If likeliest is the same or better, we can remove the threshold code and all threshold-tuning code; that simplifies the repo.



Running E165 (1536fr, new code) to get a clean comparison. Will report here once done. Previous models (E162, E163) were trained with probs_to_segments as the validation metric, so comparing on those is skewed. E165 is the first model trained with the new fps-normalised velocity and ClassifierHead, so results will be directly comparable.



Ablation results (E165-1536-batch8-drop01-fixchunk-hmval-3h, dev, 50fps):

  Decoding                   Sign IoU   Phrase IoU   HM
  likeliest (argmax)         0.5598     0.8892       0.6871
  threshold (default 0.5)    0.5519     0.8885       0.6809
  tuned threshold (swept)    0.5706     0.8937       0.6965

Tuned threshold beats likeliest by +0.011 Sign IoU, so we keep threshold-based decoding for final reporting. likeliest is the fast-path default (no tuning needed); --tune_threshold is the recommended path for final numbers.

Note: training validation used likeliest; the eval gap vs tuned threshold is ~1pp.

AmitMY and others added 4 commits March 22, 2026 05:03
- Remove dist/2023/ (use the 2023 git tag/release instead)
- Remove sign_language_segmentation/old/bin.py
- pyproject.toml: remove old/* packages, dist/2023 data-files, use pip pose-anonymization
- args.py: set best defaults (velocity, fps_aug, frame_dropout=0.15, body_part_dropout=0.1,
  optimizer=adamw-onecycle); drop no_face/normalize/pose_dims as deprecated hidden args
- data/utils.py: preprocess_pose always applies no_face+normalize (remove conditionals);
  add compute_velocity(pose_data, frame_times_seconds) utility
- data/dataset.py: remove normalize/no_face params; timestamps now in seconds
- model/model.py: add ClassifierHead (linear→GELU→linear) for both BIO heads;
  RoPE now expects timestamps in seconds and scales by reference_fps=50 internally;
  use bio_labels_to_segments from metrics (no more duplicated BIO→segment loop)
- metrics.py: add bio_labels_to_segments() shared utility
- bin.py: @torch.inference_mode, seconds-based timestamps, use compute_velocity
- evaluate.py: use bio_labels_to_segments; likeliest_probs_to_segments is now default
- train.py: print best.ckpt path after training
- dist/2026/README.md: fix architecture description (skip connections, residual,
  RoPE in seconds), clarify attention mask failure reason, remove HM row,
  note depth=4 worth retrying

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
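The compute_velocity(pose_data, frame_times_seconds) utility described in the commit above could look roughly like this minimal numpy sketch; the real implementation may differ in shape handling and masking:

```python
import numpy as np

def compute_velocity(pose_data: np.ndarray, frame_times_seconds: np.ndarray) -> np.ndarray:
    """Per-frame velocity in units/second, so values are comparable across fps.

    pose_data: (T, ...) keypoint coordinates; frame_times_seconds: (T,).
    Dividing positional deltas by the actual time deltas normalises for
    frame rate; the first frame gets zero velocity to preserve shape (T, ...).
    """
    dt = np.diff(frame_times_seconds)                       # (T-1,)
    dpos = np.diff(pose_data, axis=0)                       # (T-1, ...)
    vel = dpos / dt.reshape((-1,) + (1,) * (pose_data.ndim - 1))
    return np.concatenate([np.zeros_like(pose_data[:1]), vel], axis=0)

# a point moving 1 unit per frame at 25 fps => 25 units/second
times = np.arange(4) / 25.0
poses = np.arange(4, dtype=float).reshape(4, 1)
print(compute_velocity(poses, times))  # [[0.], [25.], [25.], [25.]]
```

Because velocity is taken against wall-clock seconds rather than frame index, the same motion at 25fps and 50fps produces the same feature values, which is what makes fps_aug and mixed-fps evaluation consistent.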
…luate

- model.py: on_load_checkpoint migrates old single-Linear heads to nn.Linear
  when loading pre-ClassifierHead checkpoints (strict=False loads the
  remaining keys correctly; old sign_bio_head.weight maps directly)
- dataset.py: fix missing frame_times_ms assignment in non-fps_aug path
- evaluate.py: add --chunk_multiplier flag to scale inference chunk size
  for RoPE generalisation ablation (1x/2x/4x)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
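The strict=False key-migration idea in the commit above can be illustrated with a small sketch. The key names here (sign_bio_head.net.0.weight) are hypothetical placeholders, not the repo's actual layout: old keys with a known counterpart are renamed before loading, and anything unmatched is left for strict=False to skip.

```python
def migrate_state_dict(state_dict: dict) -> dict:
    """Map pre-ClassifierHead checkpoint keys onto a newer layout before loading.

    Hypothetical example: an old single-Linear head ("sign_bio_head.weight")
    maps onto the first linear of a two-layer head ("sign_bio_head.net.0.weight").
    Keys with no counterpart pass through unchanged; loading with strict=False
    then tolerates any remaining mismatches.
    """
    remap = {
        "sign_bio_head.weight": "sign_bio_head.net.0.weight",
        "sign_bio_head.bias": "sign_bio_head.net.0.bias",
    }
    return {remap.get(k, k): v for k, v in state_dict.items()}

old = {"sign_bio_head.weight": [[1.0]], "encoder.weight": [[2.0]]}
print(sorted(migrate_state_dict(old)))
# ['encoder.weight', 'sign_bio_head.net.0.weight']
```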
Uses the same argmax decoding in both validation_step and evaluate.py,
removing the discrepancy where training validation used threshold-based
probs_to_segments but evaluate.py reported likeliest results.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tric switch

E165 is currently training; switching validation metric mid-run risks
premature early stopping. Revert to probs_to_segments for consistency
with E165 training. Will align metrics after E165 completes once we
have evidence that likeliest is better than threshold for new models.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

AmitMY left a comment


Updated main README.md with Docker-based training/evaluation instructions and a pointer to Dockerfile.train. The Dockerfile was already in the repo root but not documented in the README.

AmitMY and others added 4 commits March 22, 2026 06:33
- Add Docker build/train/evaluate commands pointing to Dockerfile.train
- Add local development setup
- Update architecture description to match 2026 CNN-medium-attn + RoPE
- Point to dist/2026/README.md for full details
- Remove outdated 2025 content

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All best hyperparameters are now defaults in args.py:
velocity=True, fps_aug=True, body_part_dropout=0.1, frame_dropout=0.15,
dice_loss_weight=1.0. Training command only needs corpus/poses and
resource params (batch_size, num_frames, patience).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Dockerfile

- Remove probs_to_segments / _io_probs_to_segments from metrics.py — likeliest
  (argmax) decoding wins on E169 and generalises better to test set; threshold
  was overfitting dev sign IoU at the expense of phrase IoU.
- evaluate.py: drop --threshold/--tune_threshold/--b/o/io_threshold args;
  decoding path is now simply likeliest + optional filter_segments.
- bin.py: remove unused probs_to_segments import.
- model.py: batched chunk inference in encode() — all chunks stacked into one
  batch and processed in a single transformer forward pass instead of N serial
  calls; remove on_load_checkpoint backward-compat shim.
- Dockerfile.train: add training image definition (nvcr pytorch:26.02-py3 base,
  installs deps from pyproject.toml; code is mounted at runtime).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
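The batched chunk inference change in encode() can be sketched as follows, using numpy stand-ins for the model and tensors. encode_batched and the zero-padding scheme are illustrative assumptions, not the repo's encode(): the point is that all chunks go through the model in one forward pass instead of N serial calls.

```python
import numpy as np

def encode_batched(frames: np.ndarray, model, chunk_size: int) -> np.ndarray:
    """Process all chunks in one batched forward pass instead of N serial calls.

    frames: (T, D). The sequence is zero-padded to a multiple of chunk_size,
    reshaped to (num_chunks, chunk_size, D), run through `model` once, then
    flattened back and trimmed to the original length T.
    """
    T, D = frames.shape
    num_chunks = -(-T // chunk_size)                   # ceil division
    pad = num_chunks * chunk_size - T
    padded = np.concatenate([frames, np.zeros((pad, D))], axis=0)
    batch = padded.reshape(num_chunks, chunk_size, D)  # one "forward pass" input
    out = model(batch)                                 # (num_chunks, chunk_size, C)
    return out.reshape(num_chunks * chunk_size, -1)[:T]

def toy_model(batch):
    # stand-in for the transformer: any (B, chunk, D) -> (B, chunk, C) map
    return batch[..., :2] * 2

result = encode_batched(np.ones((2500, 8)), toy_model, chunk_size=1024)
print(result.shape)  # (2500, 2)
```

One real-world caveat with this batching: the padded tail of the last chunk does pass through the model, so its outputs must be discarded (the final slice) rather than interpreted as frames.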
AmitMY and others added 5 commits March 22, 2026 17:53
…docs

- Delete sign_language_segmentation/old/ (2023-era code: SLURM job scripts,
  old threshold decoder, old tests — all superseded by 2026 rewrite)
- args.py: remove deprecated suppressed args (--arch, --pos_encoding, --no_face,
  --no_normalize, --pose_dims, --acceleration, --speed_aug, --target_fps,
  --steps_per_epoch); update defaults to match best config (depth=4, dice=1.5)
- dist/2026/README.md: fix architecture (depth=4 not 6), update best results
  table with E166-E169, add threshold decoding to "What Did Not Help",
  correct training command
- README.md: fix training command to use correct hyperparams (depth=4, 1024fr)
- .gitignore: add models/, logs/, lightning_logs/, *.egg-info/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e_pose_segments

- bin.py: add segment_pose() importable function (loads model via lru_cache,
  runs inference, returns eaf + tiers dict); add save_pose_segments() to crop
  and save per-segment .pose files; add --save-segments and --subtitles CLI args;
  model loading now cached so repeated calls are fast
- server.py: Flask server exposing POST / for pose segmentation (input/output
  as file paths or gs:// URIs) and GET /health; single-frame edge case handled
- Dockerfile: CPU-only inference image (python:3.12-slim + torch CPU wheel);
  serves via gunicorn; copies source and dist/2026/best.ckpt at build time
- pyproject.toml: add [server] optional deps (Flask, Werkzeug, gunicorn)
- .github/workflows/publish-docker.yaml: publish image to ghcr.io on release
- README.md: add Python API example, server usage, health check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
E169 (depth=4, 1024fr, 6h) beats Efinal on both dev and test:
  dev  HM=0.763 (Sign=0.657, Phr=0.910)
  test HM=0.764 (Sign=0.652, Phr=0.925)

Efinal trained longer but early stopping had already found the optimum.
best.ckpt updated to E169 checkpoint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- tests/test_inference.py: smoke tests for segment_pose (tiers, start/end,
  eaf tiers); example.pose bundled for CI
- ruff fixes: remove unused imports (argparse, math, numpy), remove unused
  gold_range variable, replace lambda with def in evaluate.py
- pyproject.toml: move pytorch-lightning and scikit-learn to core deps
  (both required at inference time, not just dev); add **/*.ckpt to
  package-data so best.ckpt ships with pip install
- sign_language_segmentation/dist/2026/best.ckpt: E169 checkpoint bundled
  inside the package; _default_model_path() updated to find it via __file__
- Dockerfile: fix layer ordering (copy source then pip install --no-deps -e .
  so actual code is installed, not build stubs); warmup call now succeeds;
  fix ENV syntax and CMD JSON form to eliminate build warnings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Strip AdamW optimizer states and convert float32→bfloat16 to reduce
checkpoint size ~6x for deployment without affecting inference quality
(dev HM-IoU 0.763 preserved). Add slim_checkpoint CLI entry point so
future dist checkpoints can be prepared in one command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AmitMY marked this pull request as ready for review March 23, 2026 07:59
… note

- Restore complete bibtex entry (editor, address, doi, pages) from main
- Restore '## 2023 Version (v2023)' section linking to the paper code
- Document slim_checkpoint usage in dist/2026/README.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AmitMY merged commit 1bca0e7 into y2025 on Mar 23, 2026
