…ucture dist/

Training:
- model.py: remove all arch branches except cnn-medium-attn+RoPE (805→294 lines)
  - removes: bilstm/bigru/tcn/cnn-fast-slow/cnn-local-attn/cnn-lstm/cnn-large/cnn
  - removes: GatedResidual, SinusoidalPositionalEncoding, LocalAttentionBlock, TCNBlock
  - removes: focal loss, label smoothing, b-dice, per-head weighted loss, legacy flags
  - keeps: dice loss, RoPE chunked inference, HM(sign,phrase) validation metric
- train.py: remove curriculum callbacks; simplify to the direct get_dataloader() path
- args.py: remove unused args (arch, pos_encoding, acceleration, speed_aug, weighted_loss, focal_gamma, label_smoothing, b_dice, curriculum)
- dataset.py: remove acceleration and speed_aug branches

Inference:
- bin.py: new 2026 inference CLI (load .ckpt, process pose, write ELAN)
- old/bin.py: update dist path to dist/2023/

Evaluation:
- evaluate.py: add as tracked file; remove the windowed/LSTM eval path

Dist:
- dist/2023/: move 2023 TorchScript .pth models from old/dist/; add README
- dist/2026/: add EXPERIMENTS.md and findings README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
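The HM(sign, phrase) validation metric retained above is, on the usual reading, the harmonic mean of the two IoU scores. A minimal sketch under that assumption (it reproduces the HM=0.763 reported later for Sign=0.657, Phr=0.910):

```python
def harmonic_mean_iou(sign_iou: float, phrase_iou: float) -> float:
    """HM(sign, phrase): harmonic mean of the two IoU scores.

    Assumed definition, not copied from the repo; it matches the reported
    numbers, e.g. HM=0.763 for Sign=0.657, Phr=0.910.
    """
    if sign_iou + phrase_iou == 0:
        return 0.0
    return 2 * sign_iou * phrase_iou / (sign_iou + phrase_iou)
```

The harmonic mean punishes a model that trades one head's IoU for the other, which is why it works as a single early-stopping signal for a two-head model.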
- Remove dist/2023/ (use the 2023 git tag/release instead)
- Remove sign_language_segmentation/old/bin.py
- pyproject.toml: remove old/* packages and dist/2023 data-files, use pip pose-anonymization
- args.py: set best defaults (velocity, fps_aug, frame_dropout=0.15, body_part_dropout=0.1, optimizer=adamw-onecycle); drop no_face/normalize/pose_dims as deprecated hidden args
- data/utils.py: preprocess_pose always applies no_face+normalize (remove conditionals); add compute_velocity(pose_data, frame_times_seconds) utility
- data/dataset.py: remove normalize/no_face params; timestamps now in seconds
- model/model.py: add ClassifierHead (linear→GELU→linear) for both BIO heads; RoPE now expects timestamps in seconds and scales by reference_fps=50 internally; use bio_labels_to_segments from metrics (no more duplicated BIO→segment loop)
- metrics.py: add bio_labels_to_segments() shared utility
- bin.py: @torch.inference_mode, seconds-based timestamps, use compute_velocity
- evaluate.py: use bio_labels_to_segments; likeliest_probs_to_segments is now the default
- train.py: print the best.ckpt path after training
- dist/2026/README.md: fix the architecture description (skip connections, residual, RoPE in seconds), clarify the attention-mask failure reason, remove the HM row, note depth=4 is worth retrying

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
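The compute_velocity utility described above could look roughly like this. This is a minimal sketch under assumed array shapes, not the repo's implementation; what matters is that it divides by the actual per-frame dt, which keeps velocities correct under fps augmentation:

```python
import numpy as np

def compute_velocity(pose_data: np.ndarray, frame_times_seconds: np.ndarray) -> np.ndarray:
    """Finite-difference velocity per keypoint, in units per second.

    pose_data: (frames, keypoints, dims) positions.
    frame_times_seconds: (frames,) timestamps in seconds; spacing may be
    non-uniform (e.g. under fps augmentation).
    """
    dt = np.diff(frame_times_seconds)[:, None, None]   # (frames-1, 1, 1)
    velocity = np.diff(pose_data, axis=0) / dt         # (frames-1, keypoints, dims)
    # Zero-pad the first frame so the output aligns frame-for-frame with the input.
    return np.concatenate([np.zeros_like(pose_data[:1]), velocity], axis=0)
```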
…luate

- model.py: on_load_checkpoint migrates old single-Linear head weights when loading pre-ClassifierHead checkpoints (strict=False loads the remaining keys correctly; the old sign_bio_head.weight maps directly)
- dataset.py: fix missing frame_times_ms assignment in the non-fps_aug path
- evaluate.py: add --chunk_multiplier flag to scale the inference chunk size for the RoPE generalisation ablation (1x/2x/4x)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
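A common shape for that kind of checkpoint migration is a pure state-dict key rewrite before loading. The key layout below is hypothetical (a bare Linear at `sign_bio_head.weight` moving to the first layer of a Sequential-style head at `sign_bio_head.0.weight`); the repo's real hook and head names may differ:

```python
def migrate_head_keys(state_dict: dict) -> dict:
    """Rewrite pre-ClassifierHead keys so an old single-Linear head loads
    into the first linear of the new head (hypothetical layout). Unrelated
    keys pass through untouched; keys the old checkpoint lacks are left to
    strict=False loading."""
    migrated = {}
    for key, value in state_dict.items():
        head, _, param = key.partition(".")
        if head in ("sign_bio_head", "phrase_bio_head") and param in ("weight", "bias"):
            migrated[f"{head}.0.{param}"] = value
        else:
            migrated[key] = value
    return migrated
```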
Uses the same argmax decoding in both validation_step and evaluate.py, removing the discrepancy where training validation used threshold-based probs_to_segments while evaluate.py reported likeliest results.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
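A sketch of argmax ("likeliest") BIO decoding of the kind described, assuming label indices O=0, B=1, I=2 and (start, end) frame spans with `end` exclusive; the repo's actual label order and segment representation may differ:

```python
import numpy as np

def likeliest_decode(probs: np.ndarray) -> list:
    """Argmax ('likeliest') decoding: no probability thresholds at all.
    probs: (frames, 3) per-frame BIO probabilities, columns O/B/I."""
    labels = probs.argmax(axis=-1)
    segments, start = [], None
    for i, label in enumerate(labels):
        if label == 1 or (label == 2 and start is None):
            if start is not None:          # a new B closes the open segment
                segments.append((start, i))
            start = i
        elif label == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:                  # segment runs to the final frame
        segments.append((start, len(labels)))
    return segments
```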
…tric switch

E165 is currently training; switching the validation metric mid-run risks premature early stopping. Revert to probs_to_segments for consistency with E165's training. Metrics will be aligned after E165 completes, once there is evidence that likeliest beats threshold decoding for the new models.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add Docker build/train/evaluate commands pointing to Dockerfile.train
- Add local development setup
- Update the architecture description to match 2026 CNN-medium-attn + RoPE
- Point to dist/2026/README.md for full details
- Remove outdated 2025 content

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All best hyperparameters are now defaults in args.py: velocity=True, fps_aug=True, body_part_dropout=0.1, frame_dropout=0.15, dice_loss_weight=1.0. The training command only needs corpus/poses and resource params (batch_size, num_frames, patience).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Dockerfile

- Remove probs_to_segments / _io_probs_to_segments from metrics.py — likeliest (argmax) decoding wins on E169 and generalises better to the test set; threshold decoding was overfitting dev sign IoU at the expense of phrase IoU.
- evaluate.py: drop the --threshold/--tune_threshold/--b/o/io_threshold args; the decoding path is now simply likeliest + optional filter_segments.
- bin.py: remove the unused probs_to_segments import.
- model.py: batched chunk inference in encode() — all chunks are stacked into one batch and processed in a single transformer forward pass instead of N serial calls; remove the on_load_checkpoint backward-compat shim.
- Dockerfile.train: add the training image definition (nvcr pytorch:26.02-py3 base, installs deps from pyproject.toml; code is mounted at runtime).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
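The batched chunk inference change can be sketched as follows. The function name, shapes, and the right-zero-padding strategy are assumptions, not the repo's exact code; the point is that all chunks go through one forward pass:

```python
import torch

def encode_batched(transformer, frames: torch.Tensor, chunk_size: int) -> torch.Tensor:
    """Chunked inference with a single batched forward pass.

    frames: (time, dim) sequence.
    transformer: maps (batch, time, dim) -> (batch, time, dim).
    """
    time, dim = frames.shape
    pad = (-time) % chunk_size
    if pad:  # right-pad with zeros so the sequence splits evenly into chunks
        frames = torch.cat([frames, frames.new_zeros(pad, dim)])
    chunks = frames.reshape(-1, chunk_size, dim)   # (n_chunks, chunk_size, dim)
    encoded = transformer(chunks)                  # one forward, not n_chunks serial calls
    return encoded.reshape(-1, dim)[:time]         # drop the padding again
```

The output round-trips the input ordering exactly; the win is that the GPU sees one large batch instead of many small serial calls.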
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…docs

- Delete sign_language_segmentation/old/ (2023-era code: SLURM job scripts, old threshold decoder, old tests — all superseded by the 2026 rewrite)
- args.py: remove deprecated suppressed args (--arch, --pos_encoding, --no_face, --no_normalize, --pose_dims, --acceleration, --speed_aug, --target_fps, --steps_per_epoch); update defaults to match the best config (depth=4, dice=1.5)
- dist/2026/README.md: fix the architecture (depth=4, not 6), update the best-results table with E166-E169, add threshold decoding to "What Did Not Help", correct the training command
- README.md: fix the training command to use the correct hyperparams (depth=4, 1024fr)
- .gitignore: add models/, logs/, lightning_logs/, *.egg-info/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e_pose_segments

- bin.py: add segment_pose() importable function (loads the model via lru_cache, runs inference, returns eaf + tiers dict); add save_pose_segments() to crop and save per-segment .pose files; add --save-segments and --subtitles CLI args; model loading is now cached so repeated calls are fast
- server.py: Flask server exposing POST / for pose segmentation (input/output as file paths or gs:// URIs) and GET /health; single-frame edge case handled
- Dockerfile: CPU-only inference image (python:3.12-slim + torch CPU wheel); serves via gunicorn; copies source and dist/2026/best.ckpt at build time
- pyproject.toml: add [server] optional deps (Flask, Werkzeug, gunicorn)
- .github/workflows/publish-docker.yaml: publish the image to ghcr.io on release
- README.md: add Python API example, server usage, health check

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
E169 (depth=4, 1024fr, 6h) beats Efinal on both dev and test:

  dev  HM=0.763 (Sign=0.657, Phr=0.910)
  test HM=0.764 (Sign=0.652, Phr=0.925)

Efinal trained longer, but early stopping had already found the optimum. best.ckpt updated to the E169 checkpoint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- tests/test_inference.py: smoke tests for segment_pose (tiers, start/end, eaf tiers); example.pose bundled for CI
- ruff fixes: remove unused imports (argparse, math, numpy), remove the unused gold_range variable, replace lambda with def in evaluate.py
- pyproject.toml: move pytorch-lightning and scikit-learn to core deps (both are required at inference time, not just dev); add **/*.ckpt to package-data so best.ckpt ships with pip install
- sign_language_segmentation/dist/2026/best.ckpt: E169 checkpoint bundled inside the package; _default_model_path() updated to find it via __file__
- Dockerfile: fix layer ordering (copy source, then pip install --no-deps -e . so the actual code is installed, not build stubs); the warmup call now succeeds; fix ENV syntax and CMD JSON form to eliminate build warnings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Strip AdamW optimizer states and convert float32→bfloat16 to reduce checkpoint size ~6x for deployment, without affecting inference quality (dev HM-IoU 0.763 preserved). Add a slim_checkpoint CLI entry point so future dist checkpoints can be prepared in one command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
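A sketch of that slimming, assuming the usual Lightning checkpoint layout (a top-level `state_dict` plus training-only keys); the shipped CLI may differ in details:

```python
import torch

def slim_checkpoint(src: str, dst: str) -> None:
    """Drop training-only state and cast float32 weights to bfloat16."""
    ckpt = torch.load(src, map_location="cpu")
    for key in ("optimizer_states", "lr_schedulers", "callbacks"):
        ckpt.pop(key, None)   # AdamW moments alone are ~2x the weight size
    ckpt["state_dict"] = {
        k: v.to(torch.bfloat16) if v.dtype == torch.float32 else v
        for k, v in ckpt["state_dict"].items()
    }
    torch.save(ckpt, dst)
```

Dropping the two AdamW moment tensors removes roughly two-thirds of the float32 payload, and halving the remaining weights to bfloat16 accounts for the rest of the ~6x.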
… note

- Restore the complete bibtex entry (editor, address, doi, pages) from main
- Restore the '## 2023 Version (v2023)' section linking to the paper code
- Document slim_checkpoint usage in dist/2026/README.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ziv-lazarov-nagish added a commit that referenced this pull request on Apr 20, 2026
Completes the publish pipeline on top of the pure-helpers layer:

- `publish/utils.py`: append `_eval_single`, `run_evaluation`, `check_regression`, `promote`.
- `publish/publish.py`: 8-step orchestrator — convert → find manifest → eval → regression check → save manifest → model card → upload to `weekly` branch → promote tag.
- `publish/__init__.py`: re-export `publish` and `main`.
- `datasets/common.py`: rename `_ensure_datasets_registered` → `ensure_datasets_registered` (now a public API so `run_evaluation` can call it without reaching into a private name).
- `pyproject.toml`: add `[hf]` (huggingface_hub>=0.20.0) and `[publish]` (`[hf]` + `[train]`) optional-dependency groups.
- `.env.example`: append HF section (`HF_TOKEN`, `HF_MODEL_REPO`, `HF_MODEL_REVISION`, `XDG_CACHE_HOME` for sparks cache).
- `.gitignore`: add `dist/`, `wandb/`, `*.log`.

Review-comment fixes shipped here:

- #1: dropped unused `os`, `datetime`, `UTC` imports from `publish.py`.
- #4: `_ensure_datasets_registered` → `ensure_datasets_registered` (public).
- #5: `class EvalArgs: pass` → `argparse.Namespace(...)`.
- #6: `--regression-threshold` default `0.02` → `0.005`.
- #7: `promote()` now raises `ValueError` on an unresolved revision instead of silently passing the ref string through to `create_tag`.

11 new tests in `test_publish_cli.py` cover:

- `check_regression`: no_baseline (no tags / download failure), pass (within threshold), fail (beyond threshold).
- `promote`: tag found, branch found, unresolved raises `ValueError`.
- `publish()` integration with every HF + eval boundary mocked: skip-eval + no-promote, skip-eval + promote, eval + regression pass + promote, eval + regression fail (no promote).
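The regression check described above can be sketched as follows; metric names and the dict-based interface are assumptions, not the repo's signatures:

```python
def check_regression(current: dict, baseline: dict, threshold: float = 0.005) -> bool:
    """Decide whether the candidate model may be promoted.

    No baseline (first publish, or the baseline download failed with a 404)
    means there is nothing to regress against, so the check passes.
    Otherwise every baseline metric may drop at most `threshold` below its
    previous value.
    """
    if baseline is None:
        return True
    return all(current[m] >= baseline[m] - threshold for m in baseline)
```

The tightened `0.005` default means a half-point IoU drop now blocks promotion, where the old `0.02` would have let it through.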
ziv-lazarov-nagish added a commit that referenced this pull request on Apr 20, 2026
Adds the library-level functions the publish CLI will orchestrate. No CLI entry point yet — that lands in a follow-up PR.

- `publish/utils.py`: append `_eval_single`, `run_evaluation`, `check_regression`, `promote`. `run_evaluation` uses `argparse.Namespace` instead of an ad-hoc class, and imports `ensure_datasets_registered` as a public name. `promote` raises `ValueError` when the revision can't be resolved to a commit instead of silently passing the string through.
- `datasets/common.py`: rename `_ensure_datasets_registered` → `ensure_datasets_registered` so external callers can depend on it.
- `pyproject.toml`: add `[hf]` optional group (`huggingface_hub>=0.20.0`).
- `.env.example`: append HF section (`HF_TOKEN`, `HF_MODEL_REPO`, `HF_MODEL_REVISION`, `XDG_CACHE_HOME` for sparks cache).

Review comments addressed here: #4 (private → public), #5 (Namespace), #7 (raise ValueError).

Tests: 7 new cases in `tests/test_publish_cli.py` covering `check_regression` (no_baseline / download-fail / pass / fail) and `promote` (tag-hit / branch-hit / unresolved → raises). All boundaries mocked — no network calls.
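The `promote` fix can be reduced to this resolution step; `refs` here stands in for the tag/branch listing an HF repo would provide, so names and types are illustrative:

```python
def resolve_to_commit(refs: dict, revision: str) -> str:
    """Resolve a tag or branch name to a commit SHA, failing loudly when it
    cannot be resolved instead of passing the raw ref string through to tag
    creation (the old silent behaviour)."""
    if revision in refs:
        return refs[revision]
    raise ValueError(f"Cannot resolve revision {revision!r} to a commit")
```

Raising here matters because creating a tag from an unresolved ref string would otherwise pin the release to whatever that name happens to mean later, not to a fixed commit.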
ziv-lazarov-nagish added a commit that referenced this pull request on Apr 20, 2026
ziv-lazarov-nagish added a commit that referenced this pull request on Apr 23, 2026
ziv-lazarov-nagish added a commit that referenced this pull request on Apr 23, 2026
ziv-lazarov-nagish added a commit that referenced this pull request on Apr 23, 2026
ziv-lazarov-nagish added a commit that referenced this pull request on Apr 26, 2026
ziv-lazarov-nagish added a commit that referenced this pull request on Apr 26, 2026
* feat: publish HF ops + evaluation helpers

  Adds the library-level functions the publish CLI will orchestrate. No CLI entry point yet — that lands in a follow-up PR.

  - `publish/utils.py`: append `_eval_single`, `run_evaluation`, `check_regression`, `promote`. `run_evaluation` uses `argparse.Namespace` instead of an ad-hoc class, and imports `ensure_datasets_registered` as a public name. `promote` raises `ValueError` when the revision can't be resolved to a commit instead of silently passing the string through.
  - `datasets/common.py`: rename `_ensure_datasets_registered` → `ensure_datasets_registered` so external callers can depend on it.
  - `pyproject.toml`: add `[hf]` optional group (`huggingface_hub>=0.20.0`).
  - `.env.example`: append HF section (`HF_TOKEN`, `HF_MODEL_REPO`, `HF_MODEL_REVISION`, `XDG_CACHE_HOME` for sparks cache).

  Review comments addressed here: #4 (private → public), #5 (Namespace), #7 (raise ValueError).

  Tests: 7 new cases in `tests/test_publish_cli.py` covering `check_regression` (no_baseline / download-fail / pass / fail) and `promote` (tag-hit / branch-hit / unresolved → raises). All boundaries mocked — no network calls.
* refactor: publish HF ops review follow-ups — narrow except, strict loading, hm_IoU in regression

  - check_regression: narrow `except Exception` to HfHubHTTPError (404-only); auth/network errors now surface
  - drop `strict=False` on load_from_checkpoint — silent partial-load was a landmine for ckpt key drift
  - align hparam fallbacks (fps_aug, velocity) with training defaults from args.py (both True); comment the why
  - quality_percentile: assert equality across all manifests; raise ValueError on mismatch (previously a silent "first wins")
  - add hm_IoU to the regression-check metrics (hm_IoU regressions previously didn't block promotion)
  - drop the dangling `# TODO: add slack notifications`
  - rename tests/test_publish_cli.py → tests/test_publish_hf_ops.py (the tests cover HF ops, not the CLI)
  - tests: replace the RuntimeError mock with a real HfHubHTTPError (response.status_code=404); add test_non_404_download_error_propagates

* ci: install the [hf] extra so publish tests can import huggingface_hub
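The narrowed exception handling follows this pattern. To keep the sketch self-contained, `HfHubHTTPError` below is a minimal stand-in for `huggingface_hub.utils.HfHubHTTPError` (the real class also carries a `.response`), and `download` stands in for the baseline-download call:

```python
class HfHubHTTPError(Exception):
    """Minimal stand-in for huggingface_hub.utils.HfHubHTTPError."""
    def __init__(self, message, response=None):
        super().__init__(message)
        self.response = response

def download_baseline(download, repo_id: str):
    """Only a 404 (no baseline published yet) maps to 'nothing to compare
    against'; auth and network errors propagate instead of being swallowed
    by a broad `except Exception`."""
    try:
        return download(repo_id)
    except HfHubHTTPError as err:
        if err.response is not None and err.response.status_code == 404:
            return None
        raise  # 401/403/5xx should fail the pipeline loudly
```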
ziv-lazarov-nagish added a commit that referenced this pull request on Apr 26, 2026
ziv-lazarov-nagish added a commit that referenced this pull request on Apr 26, 2026
* feat: publish CLI — evaluation, regression, promotion, orchestrator

  Completes the publish pipeline on top of the pure-helpers layer:

  - `publish/utils.py`: append `_eval_single`, `run_evaluation`, `check_regression`, `promote`.
  - `publish/publish.py`: 8-step orchestrator — convert → find manifest → eval → regression check → save manifest → model card → upload to `weekly` branch → promote tag.
  - `publish/__init__.py`: re-export `publish` and `main`.
  - `datasets/common.py`: rename `_ensure_datasets_registered` → `ensure_datasets_registered` (now a public API so `run_evaluation` can call it without reaching into a private name).
  - `pyproject.toml`: add `[hf]` (huggingface_hub>=0.20.0) and `[publish]` (`[hf]` + `[train]`) optional-dependency groups.
  - `.env.example`: append HF section (`HF_TOKEN`, `HF_MODEL_REPO`, `HF_MODEL_REVISION`, `XDG_CACHE_HOME` for sparks cache).
  - `.gitignore`: add `dist/`, `wandb/`, `*.log`.

  Review-comment fixes shipped here: #1 (unused `os`, `datetime`, `UTC` imports), #4 (private → public), #5 (Namespace), #6 (`--regression-threshold` default `0.02` → `0.005`), #7 (raise ValueError on unresolved revision).

  11 new tests in `test_publish_cli.py` cover `check_regression` (no_baseline / pass / fail), `promote` (tag found / branch found / unresolved raises), and `publish()` integration with every HF + eval boundary mocked: skip-eval + no-promote, skip-eval + promote, eval + regression pass + promote, eval + regression fail (no promote).

* fix: pass repo_id to generate_model_card after the #29 API change

* fix: split CLI-orchestrator tests back out of test_publish_hf_ops.py

  Rebase side-effect: after #31 renamed test_publish_cli.py → test_publish_hf_ops.py, git's rename detection merged #30's CLI-orchestrator additions into the HF-ops file. Restore the intended separation — TestPublishIntegration lives in test_publish_cli.py, TestCheckRegression and TestPromote stay in test_publish_hf_ops.py.