
WIP feat(2025): major refactor to train directly from database, new model architecture #7

Merged
AmitMY merged 17 commits into main from y2025
Mar 23, 2026

Conversation


AmitMY commented Apr 1, 2025

No description provided.

AmitMY and others added 17 commits January 6, 2025 11:00
…ucture dist/

Training:
- model.py: remove all arch branches except cnn-medium-attn+RoPE (805→294 lines)
  - removes: bilstm/bigru/tcn/cnn-fast-slow/cnn-local-attn/cnn-lstm/cnn-large/cnn
  - removes: GatedResidual, SinusoidalPositionalEncoding, LocalAttentionBlock, TCNBlock
  - removes: focal loss, label smoothing, b-dice, per-head weighted loss, legacy flags
  - keeps: dice loss, RoPE chunked inference, HM(sign,phrase) validation metric
- train.py: remove curriculum callbacks; simplify to direct get_dataloader() path
- args.py: remove unused args (arch, pos_encoding, acceleration, speed_aug,
  weighted_loss, focal_gamma, label_smoothing, b_dice, curriculum)
- dataset.py: remove acceleration and speed_aug branches

Inference:
- bin.py: new 2026 inference CLI (load .ckpt, process pose, write ELAN)
- old/bin.py: update dist path to dist/2023/

Evaluation:
- evaluate.py: add as tracked file; remove windowed/LSTM eval path

Dist:
- dist/2023/: move 2023 TorchScript .pth models from old/dist/; add README
- dist/2026/: add EXPERIMENTS.md and findings README
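The kept dice loss mentioned above is the standard soft-Dice formulation. As a reference, a minimal sketch for a per-frame binary mask (the textbook formula, not code lifted from the repository) could look like:

```python
import numpy as np

def dice_loss(probs: np.ndarray, targets: np.ndarray, eps: float = 1e-6) -> float:
    """Soft Dice loss over per-frame probabilities in [0, 1].

    probs and targets are (frames,) arrays; eps guards empty masks.
    A sketch of the standard formulation, not the repository's exact code.
    """
    intersection = (probs * targets).sum()
    return 1.0 - (2.0 * intersection + eps) / (probs.sum() + targets.sum() + eps)
```

A perfect prediction drives the loss to 0; fully disjoint masks drive it toward 1.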

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove dist/2023/ (use the 2023 git tag/release instead)
- Remove sign_language_segmentation/old/bin.py
- pyproject.toml: remove old/* packages, dist/2023 data-files, use pip pose-anonymization
- args.py: set best defaults (velocity, fps_aug, frame_dropout=0.15, body_part_dropout=0.1,
  optimizer=adamw-onecycle); drop no_face/normalize/pose_dims as deprecated hidden args
- data/utils.py: preprocess_pose always applies no_face+normalize (remove conditionals);
  add compute_velocity(pose_data, frame_times_seconds) utility
- data/dataset.py: remove normalize/no_face params; timestamps now in seconds
- model/model.py: add ClassifierHead (linear→GELU→linear) for both BIO heads;
  RoPE now expects timestamps in seconds and scales by reference_fps=50 internally;
  use bio_labels_to_segments from metrics (no more duplicated BIO→segment loop)
- metrics.py: add bio_labels_to_segments() shared utility
- bin.py: @torch.inference_mode, seconds-based timestamps, use compute_velocity
- evaluate.py: use bio_labels_to_segments; likeliest_probs_to_segments is now default
- train.py: print best.ckpt path after training
- dist/2026/README.md: fix architecture description (skip connections, residual,
  RoPE in seconds), clarify attention mask failure reason, remove HM row,
  note depth=4 worth retrying
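The shared bio_labels_to_segments() utility replaces the duplicated BIO-to-segment loop. A plausible sketch (the 0=O/1=B/2=I encoding and lenient handling of a stray I are assumptions, not taken from the repository):

```python
from typing import List, Tuple

def bio_labels_to_segments(labels: List[int]) -> List[Tuple[int, int]]:
    """Convert per-frame BIO labels (assumed 0=O, 1=B, 2=I) into
    half-open (start, end) frame spans. Hypothetical sketch."""
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label == 1:  # B: close any open segment, start a new one
            if start is not None:
                segments.append((start, i))
            start = i
        elif label == 2:  # I: continue; leniently open on a stray I
            if start is None:
                start = i
        else:  # O: close the open segment
            if start is not None:
                segments.append((start, i))
                start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments
```

Two adjacent B labels yield two segments, which is exactly the case a shared utility keeps consistent between training validation and evaluation.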

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…luate

- model.py: on_load_checkpoint migrates old single-Linear heads into the new
  ClassifierHead layout when loading pre-ClassifierHead checkpoints
  (strict=False loads the remaining keys correctly; the old
  sign_bio_head.weight maps directly)
- dataset.py: fix missing frame_times_ms assignment in non-fps_aug path
- evaluate.py: add --chunk_multiplier flag to scale inference chunk size
  for RoPE generalisation ablation (1x/2x/4x)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Uses the same argmax decoding in both validation_step and evaluate.py,
removing the discrepancy where training validation used threshold-based
probs_to_segments but evaluate.py reported likeliest results.
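The argmax ("likeliest") decoding is parameter-free, in contrast to threshold decoding, which needs per-class cutoffs tuned on dev. A minimal sketch (the O/B/I class order is an assumption for illustration):

```python
import numpy as np

def likeliest_decode(probs: np.ndarray) -> np.ndarray:
    """Argmax decoding: pick the highest-probability BIO class per frame,
    with no tunable threshold. probs has shape (frames, n_classes)."""
    return probs.argmax(axis=-1)

# Threshold decoding would instead compare e.g. probs[:, 1] > b_threshold
# per class -- the per-dataset tuning this change removes.
```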

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tric switch

E165 is currently training; switching validation metric mid-run risks
premature early stopping. Revert to probs_to_segments for consistency
with E165 training. Will align metrics after E165 completes once we
have evidence that likeliest is better than threshold for new models.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add Docker build/train/evaluate commands pointing to Dockerfile.train
- Add local development setup
- Update architecture description to match 2026 CNN-medium-attn + RoPE
- Point to dist/2026/README.md for full details
- Remove outdated 2025 content

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All best hyperparameters are now defaults in args.py:
velocity=True, fps_aug=True, body_part_dropout=0.1, frame_dropout=0.15,
dice_loss_weight=1.0. Training command only needs corpus/poses and
resource params (batch_size, num_frames, patience).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Dockerfile

- Remove probs_to_segments / _io_probs_to_segments from metrics.py — likeliest
  (argmax) decoding wins on E169 and generalises better to test set; threshold
  was overfitting dev sign IoU at the expense of phrase IoU.
- evaluate.py: drop --threshold/--tune_threshold/--b/o/io_threshold args;
  decoding path is now simply likeliest + optional filter_segments.
- bin.py: remove unused probs_to_segments import.
- model.py: batched chunk inference in encode() — all chunks stacked into one
  batch and processed in a single transformer forward pass instead of N serial
  calls; remove on_load_checkpoint backward-compat shim.
- Dockerfile.train: add training image definition (nvcr pytorch:26.02-py3 base,
  installs deps from pyproject.toml; code is mounted at runtime).
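The batched chunk inference in encode() can be pictured as follows; numpy stands in for the PyTorch transformer here, and the zero-padding of the tail chunk is an assumption:

```python
import numpy as np

def encode_chunked(model, x: np.ndarray, chunk: int) -> np.ndarray:
    """Split a long (frames, dim) sequence into equal-length chunks,
    pad the tail, run ONE batched call instead of N serial ones,
    then re-flatten and trim the padding. Hypothetical sketch."""
    frames, dim = x.shape
    pad = (-frames) % chunk
    if pad:
        x = np.concatenate([x, np.zeros((pad, dim), dtype=x.dtype)])
    batch = x.reshape(-1, chunk, dim)  # (n_chunks, chunk, dim)
    out = model(batch)                 # single forward pass over all chunks
    return out.reshape(-1, out.shape[-1])[:frames]
```

On a GPU the single stacked forward amortizes kernel-launch and memory-transfer overhead that N serial calls would pay repeatedly.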

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…docs

- Delete sign_language_segmentation/old/ (2023-era code: SLURM job scripts,
  old threshold decoder, old tests — all superseded by 2026 rewrite)
- args.py: remove deprecated suppressed args (--arch, --pos_encoding, --no_face,
  --no_normalize, --pose_dims, --acceleration, --speed_aug, --target_fps,
  --steps_per_epoch); update defaults to match best config (depth=4, dice=1.5)
- dist/2026/README.md: fix architecture (depth=4 not 6), update best results
  table with E166-E169, add threshold decoding to "What Did Not Help",
  correct training command
- README.md: fix training command to use correct hyperparams (depth=4, 1024fr)
- .gitignore: add models/, logs/, lightning_logs/, *.egg-info/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e_pose_segments

- bin.py: add segment_pose() importable function (loads model via lru_cache,
  runs inference, returns eaf + tiers dict); add save_pose_segments() to crop
  and save per-segment .pose files; add --save-segments and --subtitles CLI args;
  model loading now cached so repeated calls are fast
- server.py: Flask server exposing POST / for pose segmentation (input/output
  as file paths or gs:// URIs) and GET /health; single-frame edge case handled
- Dockerfile: CPU-only inference image (python:3.12-slim + torch CPU wheel);
  serves via gunicorn; copies source and dist/2026/best.ckpt at build time
- pyproject.toml: add [server] optional deps (Flask, Werkzeug, gunicorn)
- .github/workflows/publish-docker.yaml: publish image to ghcr.io on release
- README.md: add Python API example, server usage, health check
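The cached model loading is a one-decorator pattern; a self-contained sketch (the loader below is a counting stand-in for the real checkpoint load):

```python
from functools import lru_cache

LOAD_CALLS = {"n": 0}

def expensive_load(path: str):
    # Stand-in for the real checkpoint load; counts invocations.
    LOAD_CALLS["n"] += 1
    return {"path": path}

@lru_cache(maxsize=1)
def load_model(path: str):
    """Repeated calls with the same checkpoint path hit the cache,
    so only the first call pays the loading cost."""
    return expensive_load(path)
```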

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
E169 (depth=4, 1024fr, 6h) beats Efinal on both dev and test:
  dev  HM=0.763 (Sign=0.657, Phr=0.910)
  test HM=0.764 (Sign=0.652, Phr=0.925)

Efinal trained longer but early stopping had already found the optimum.
best.ckpt updated to E169 checkpoint.
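HM here reads as the harmonic mean of the two IoU scores (the formula is assumed from the metric's name); it reproduces the dev row above to rounding:

```python
def hm(sign_iou: float, phrase_iou: float) -> float:
    """Harmonic mean of sign and phrase IoU -- the HM(sign, phrase)
    validation metric quoted above. Formula assumed from the name."""
    return 2 * sign_iou * phrase_iou / (sign_iou + phrase_iou)

# hm(0.657, 0.910) gives ~0.763, matching the E169 dev HM.
```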

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- tests/test_inference.py: smoke tests for segment_pose (tiers, start/end,
  eaf tiers); example.pose bundled for CI
- ruff fixes: remove unused imports (argparse, math, numpy), remove unused
  gold_range variable, replace lambda with def in evaluate.py
- pyproject.toml: move pytorch-lightning and scikit-learn to core deps
  (both required at inference time, not just dev); add **/*.ckpt to
  package-data so best.ckpt ships with pip install
- sign_language_segmentation/dist/2026/best.ckpt: E169 checkpoint bundled
  inside the package; _default_model_path() updated to find it via __file__
- Dockerfile: fix layer ordering (copy source then pip install --no-deps -e .
  so actual code is installed, not build stubs); warmup call now succeeds;
  fix ENV syntax and CMD JSON form to eliminate build warnings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Strip AdamW optimizer states and convert float32→bfloat16 to reduce
checkpoint size ~6x for deployment without affecting inference quality
(dev HM-IoU 0.763 preserved). Add slim_checkpoint CLI entry point so
future dist checkpoints can be prepared in one command.
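The slimming step could look like the sketch below; the key names follow Lightning's usual checkpoint layout and are assumptions, not the repository's exact code:

```python
import torch

def slim_checkpoint(ckpt: dict) -> dict:
    """Drop optimizer/scheduler state and cast float32 weights to
    bfloat16 for a deployment-only checkpoint. Hypothetical sketch."""
    slim = {k: v for k, v in ckpt.items()
            if k not in ("optimizer_states", "lr_schedulers")}
    slim["state_dict"] = {
        name: (t.to(torch.bfloat16) if t.dtype == torch.float32 else t)
        for name, t in ckpt["state_dict"].items()
    }
    return slim
```

Optimizer state holds two extra float32 copies of every parameter under AdamW, and float32 to bfloat16 halves the rest, which is where the roughly 6x size reduction comes from.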

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… note

- Restore complete bibtex entry (editor, address, doi, pages) from main
- Restore '## 2023 Version (v2023)' section linking to the paper code
- Document slim_checkpoint usage in dist/2026/README.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AmitMY AmitMY merged commit b7c3299 into main Mar 23, 2026
ziv-lazarov-nagish added a commit that referenced this pull request Apr 20, 2026
Completes the publish pipeline on top of the pure-helpers layer:

- `publish/utils.py`: append `_eval_single`, `run_evaluation`, `check_regression`,
  `promote`.
- `publish/publish.py`: 8-step orchestrator — convert → find manifest → eval →
  regression check → save manifest → model card → upload to `weekly` branch →
  promote tag.
- `publish/__init__.py`: re-export `publish` and `main`.
- `datasets/common.py`: rename `_ensure_datasets_registered` →
  `ensure_datasets_registered` (now a public API so `run_evaluation` can call
  it without reaching into a private name).
- `pyproject.toml`: add `[hf]` (huggingface_hub>=0.20.0) and `[publish]`
  (`[hf]` + `[train]`) optional-dependency groups.
- `.env.example`: append HF section (`HF_TOKEN`, `HF_MODEL_REPO`,
  `HF_MODEL_REVISION`, `XDG_CACHE_HOME` for sparks cache).
- `.gitignore`: add `dist/`, `wandb/`, `*.log`.

Review-comment fixes shipped here:
- #1: dropped unused `os`, `datetime`, `UTC` imports from `publish.py`.
- #4: `_ensure_datasets_registered` → `ensure_datasets_registered` (public).
- #5: `class EvalArgs: pass` → `argparse.Namespace(...)`.
- #6: `--regression-threshold` default `0.02` → `0.005`.
- #7: `promote()` now raises `ValueError` on unresolved revision instead of
  silently passing the ref string through to `create_tag`.

11 new tests in `test_publish_cli.py` cover:
- `check_regression`: no_baseline (no tags / download failure), pass
  (within threshold), fail (beyond threshold).
- `promote`: tag found, branch found, unresolved raises `ValueError`.
- `publish()` integration with every HF + eval boundary mocked: skip-eval +
  no-promote, skip-eval + promote, eval + regression pass + promote, eval +
  regression fail (no promote).
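The no_baseline/pass/fail contract that these tests exercise could be sketched as follows (names and return values assumed from the test descriptions above):

```python
from typing import Optional

def check_regression(candidate: dict, baseline: Optional[dict],
                     threshold: float = 0.005) -> str:
    """Regression gate sketch: 'no_baseline' when no previous metrics
    could be fetched, 'pass' when every shared metric is within
    `threshold` of the baseline, 'fail' otherwise. Hypothetical names."""
    if baseline is None:
        return "no_baseline"
    for metric, old in baseline.items():
        new = candidate.get(metric)
        if new is not None and old - new > threshold:
            return "fail"
    return "pass"
```

Only drops beyond the threshold block promotion; improvements and within-noise changes pass.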
ziv-lazarov-nagish added a commit that referenced this pull request Apr 20, 2026
Adds the library-level functions the publish CLI will orchestrate. No CLI
entry point yet — that lands in a follow-up PR.

- `publish/utils.py`: append `_eval_single`, `run_evaluation`,
  `check_regression`, `promote`. `run_evaluation` uses `argparse.Namespace`
  instead of an ad-hoc class, and imports `ensure_datasets_registered` as
  a public name. `promote` raises `ValueError` when the revision can't be
  resolved to a commit instead of silently passing the string through.
- `datasets/common.py`: rename `_ensure_datasets_registered` →
  `ensure_datasets_registered` so external callers can depend on it.
- `pyproject.toml`: add `[hf]` optional group (`huggingface_hub>=0.20.0`).
- `.env.example`: append HF section (`HF_TOKEN`, `HF_MODEL_REPO`,
  `HF_MODEL_REVISION`, `XDG_CACHE_HOME` for sparks cache).

Review comments addressed here: #4 (private → public), #5 (Namespace),
#7 (raise ValueError).

Tests: 7 new cases in `tests/test_publish_cli.py` covering
`check_regression` (no_baseline / download-fail / pass / fail) and
`promote` (tag-hit / branch-hit / unresolved → raises). All boundaries
mocked — no network calls.
ziv-lazarov-nagish added a commit that referenced this pull request Apr 26, 2026
* feat: publish HF ops + evaluation helpers

Adds the library-level functions the publish CLI will orchestrate. No CLI
entry point yet — that lands in a follow-up PR.

- `publish/utils.py`: append `_eval_single`, `run_evaluation`,
  `check_regression`, `promote`. `run_evaluation` uses `argparse.Namespace`
  instead of an ad-hoc class, and imports `ensure_datasets_registered` as
  a public name. `promote` raises `ValueError` when the revision can't be
  resolved to a commit instead of silently passing the string through.
- `datasets/common.py`: rename `_ensure_datasets_registered` →
  `ensure_datasets_registered` so external callers can depend on it.
- `pyproject.toml`: add `[hf]` optional group (`huggingface_hub>=0.20.0`).
- `.env.example`: append HF section (`HF_TOKEN`, `HF_MODEL_REPO`,
  `HF_MODEL_REVISION`, `XDG_CACHE_HOME` for sparks cache).

Review comments addressed here: #4 (private → public), #5 (Namespace),
#7 (raise ValueError).

Tests: 7 new cases in `tests/test_publish_cli.py` covering
`check_regression` (no_baseline / download-fail / pass / fail) and
`promote` (tag-hit / branch-hit / unresolved → raises). All boundaries
mocked — no network calls.

* refactor: publish HF ops review follow-ups — narrow except, strict loading, hm_IoU in regression

- check_regression: narrow `except Exception` to HfHubHTTPError (404-only); auth/network errors now surface
- drop `strict=False` on load_from_checkpoint — silent partial-load was a landmine for ckpt key drift
- align hparam fallbacks (fps_aug, velocity) with training defaults from args.py (both True); comment the why
- quality_percentile: assert equality across all manifests; raise ValueError on mismatch (previously silent "first wins")
- add hm_IoU to regression-check metrics (hm_IoU regressions previously didn't block promotion)
- drop dangling `# TODO: add slack notifications`
- rename tests/test_publish_cli.py → tests/test_publish_hf_ops.py (tests cover HF ops, not the CLI)
- tests: replace RuntimeError mock with real HfHubHTTPError(response.status_code=404); add test_non_404_download_error_propagates

* ci: install [hf] extra so publish tests can import huggingface_hub
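The narrowed except in check_regression can be pictured as below. The HfHubHTTPError class is redefined as a minimal stand-in so the sketch is self-contained; the real one lives in huggingface_hub and carries a `response` with an HTTP status code:

```python
class HfHubHTTPError(Exception):
    """Minimal stand-in for huggingface_hub's HfHubHTTPError."""
    def __init__(self, message, response=None):
        super().__init__(message)
        self.response = response

def fetch_baseline(download):
    """404 (no previous weekly metrics) means 'no baseline yet';
    anything else -- auth failures, server errors -- must surface
    rather than be swallowed into no_baseline. Hypothetical sketch."""
    try:
        return download()
    except HfHubHTTPError as e:
        if e.response is not None and e.response.status_code == 404:
            return None
        raise
```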
ziv-lazarov-nagish added a commit that referenced this pull request Apr 26, 2026
* feat: publish CLI — evaluation, regression, promotion, orchestrator

Completes the publish pipeline on top of the pure-helpers layer:

- `publish/utils.py`: append `_eval_single`, `run_evaluation`, `check_regression`,
  `promote`.
- `publish/publish.py`: 8-step orchestrator — convert → find manifest → eval →
  regression check → save manifest → model card → upload to `weekly` branch →
  promote tag.
- `publish/__init__.py`: re-export `publish` and `main`.
- `datasets/common.py`: rename `_ensure_datasets_registered` →
  `ensure_datasets_registered` (now a public API so `run_evaluation` can call
  it without reaching into a private name).
- `pyproject.toml`: add `[hf]` (huggingface_hub>=0.20.0) and `[publish]`
  (`[hf]` + `[train]`) optional-dependency groups.
- `.env.example`: append HF section (`HF_TOKEN`, `HF_MODEL_REPO`,
  `HF_MODEL_REVISION`, `XDG_CACHE_HOME` for sparks cache).
- `.gitignore`: add `dist/`, `wandb/`, `*.log`.

Review-comment fixes shipped here:
- #1: dropped unused `os`, `datetime`, `UTC` imports from `publish.py`.
- #4: `_ensure_datasets_registered` → `ensure_datasets_registered` (public).
- #5: `class EvalArgs: pass` → `argparse.Namespace(...)`.
- #6: `--regression-threshold` default `0.02` → `0.005`.
- #7: `promote()` now raises `ValueError` on unresolved revision instead of
  silently passing the ref string through to `create_tag`.

11 new tests in `test_publish_cli.py` cover:
- `check_regression`: no_baseline (no tags / download failure), pass
  (within threshold), fail (beyond threshold).
- `promote`: tag found, branch found, unresolved raises `ValueError`.
- `publish()` integration with every HF + eval boundary mocked: skip-eval +
  no-promote, skip-eval + promote, eval + regression pass + promote, eval +
  regression fail (no promote).

* fix: pass repo_id to generate_model_card after #29 API change

* fix: split CLI-orchestrator tests back out of test_publish_hf_ops.py

Rebase side-effect: after #31 renamed test_publish_cli.py → test_publish_hf_ops.py,
git's rename detection merged #30's CLI-orchestrator additions into the HF-ops file.
Restore the intended separation — TestPublishIntegration lives in test_publish_cli.py,
TestCheckRegression and TestPromote stay in test_publish_hf_ops.py.