feat: safetensors runtime loading + HF-aware resolver#28

Merged
ziv-lazarov-nagish merged 2 commits into nagish from feat/safetensors-runtime
Apr 26, 2026

Conversation


ziv-lazarov-nagish (Contributor) commented Apr 20, 2026

Summary

First of a 5-PR stack (layers 1–4 split PR #22 into reviewable layers; layer 5 adds --dry-run on top). This PR adds the runtime-only foundation: bin.py can now load models from safetensors directories and falls back to .ckpt for backward compatibility. No HF or publish features are advertised here — the huggingface_hub import is behind a guarded try/except ImportError, and the [hf] / [publish] optional-dependency groups land in the later stack layers.

  • bin.py: new resolve_model_path() with priority MODEL_PATH env > HF_MODEL_REPO env > baked-in sign_language_segmentation/dist/2026/. load_model() prefers model.safetensors + config.json and falls back to .ckpt.
  • pyproject.toml: add [build-system], add safetensors to base deps, swap .ckpt → .safetensors in package-data, drop slim_checkpoint script (still runnable via python -m).
  • .gitignore: ignore *.safetensors artifacts.
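The resolver-plus-fallback behaviour described above can be sketched roughly as follows. This is a minimal sketch, not the PR's actual bin.py: the helper names `_load_safetensors`/`_load_checkpoint` and the `snapshot_download` call on the HF path are assumptions; only the env-var priority, the safetensors-first preference, and the `.ckpt` fallback come from the PR description.

```python
import os
from pathlib import Path

# Baked-in default location, taken from the PR description.
DEFAULT_DIST = Path("sign_language_segmentation/dist/2026")

def resolve_model_path() -> str:
    """Priority: MODEL_PATH env > HF_MODEL_REPO env > baked-in dist directory."""
    if path := os.environ.get("MODEL_PATH"):
        return path
    if repo := os.environ.get("HF_MODEL_REPO"):
        # The real PR keeps huggingface_hub behind a guarded import; using
        # snapshot_download here is an assumption about the HF code path.
        from huggingface_hub import snapshot_download
        return snapshot_download(repo)
    return str(DEFAULT_DIST)

def load_model(model_path: str):
    """Prefer model.safetensors + config.json; fall back to a .ckpt file."""
    root = Path(model_path)
    if (root / "model.safetensors").exists() and (root / "config.json").exists():
        return _load_safetensors(root)      # hypothetical safetensors loader
    ckpt = next(root.glob("*.ckpt"), None)  # backward-compatible fallback
    if ckpt is None:
        raise FileNotFoundError(f"no model.safetensors or .ckpt under {root}")
    return _load_checkpoint(ckpt)           # hypothetical .ckpt loader
```

With this shape, existing `.ckpt`-based installs keep working unchanged, and a safetensors directory wins whenever both files are present.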

Review-comment fix shipped in this layer

  • High memory usage (#3): @lru_cache + HF_MODEL_REVISION staleness — added a revision: str = "" kwarg to load_model(...) so it's part of the cache key. segment_pose reads os.environ.get("HF_MODEL_REVISION", "") and forwards it. A mid-process env change now invalidates the cache entry instead of silently returning a stale model. Non-HF paths default to revision="" and behave exactly as before.
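The revision-in-cache-key fix can be illustrated with a toy sketch. Only the `revision=""` kwarg and the env-forwarding in `segment_pose` come from the PR; the stand-in return value and function bodies below are illustrative.

```python
import os
from functools import lru_cache

@lru_cache(maxsize=None)
def load_model(model_path: str, device: str = "cpu", revision: str = ""):
    # revision participates in the lru_cache key, so a changed
    # HF_MODEL_REVISION creates a fresh cache entry instead of
    # silently reusing a stale model.
    return {"path": model_path, "device": device, "revision": revision}  # stand-in

def segment_pose(model_path: str, device: str = "cpu"):
    # Forward the env var so mid-process changes reach the cache key.
    revision = os.environ.get("HF_MODEL_REVISION", "")
    return load_model(model_path, device, revision=revision)
```

Calling `segment_pose` twice with the same revision returns the cached object; changing HF_MODEL_REVISION between calls yields a distinct one, matching the test plan below.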

Out of scope (reverted)

  • datasets/dgs/dataset.py formatting churn — not in this PR
  • datasets/annotation_platform/sync.py --model_path default change — not in this PR

Stack

5-PR stack — layers 1–4 split #22, layer 5 adds --dry-run:

  1. feat/safetensors-runtime → nagish (this PR)
  2. feat/publish-utils-and-card → this PR (PR #29: feat: publish utils — conversion, versioning, model card)
  3. feat/publish-hf-ops-and-eval → #29 (PR #31: feat: publish HF ops + evaluation helpers)
  4. feat/publish-cli → #31 (PR #30: feat: publish CLI orchestrator)
  5. feat/publish-dry-run → #30 (PR #32: feat: --dry-run flag on publish CLI)

The stack builds toward replacing #22; #22 stays open until explicitly closed.

Test plan

  • uv run --extra dev pytest sign_language_segmentation/tests — 61 passed
  • Local round-trip: MODEL_PATH pointing at a directory with model.safetensors + config.json loads successfully (verified against sign_language_segmentation/dist/2026/)
  • load_model(dir, device, revision="x") vs load_model(dir, device, revision="y") returns distinct instances; same revision returns the cached instance (cache key includes revision as intended)

- `bin.py`: new `resolve_model_path()` picks MODEL_PATH > HF_MODEL_REPO > baked-in
  `dist/2026/`. `load_model()` prefers `model.safetensors` + `config.json` and
  falls back to `.ckpt` for backward compatibility.
- `load_model(..., revision="")` makes HF_MODEL_REVISION part of the lru_cache
  key so mid-process env changes invalidate the cached instance instead of
  silently returning a stale model.
- `pyproject.toml`: add `[build-system]`, add `safetensors` to base deps, swap
  `.ckpt` for `.safetensors` in package-data, drop `slim_checkpoint` script
  (still runnable via `python -m`).
- `.gitignore`: ignore `*.safetensors` artifacts.

The `huggingface_hub` import stays behind a guarded `try/except ImportError`;
no new runtime dep is advertised in this PR.
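A guarded import of that shape typically looks like the sketch below (the `HF_AVAILABLE` flag name is illustrative, not from the PR):

```python
# Optional dependency: the HF code path is only taken when the package
# is importable; a plain `pip install` without the later [hf] extra
# simply leaves the flag False.
try:
    from huggingface_hub import snapshot_download  # noqa: F401
    HF_AVAILABLE = True
except ImportError:
    snapshot_download = None
    HF_AVAILABLE = False
```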
AmitMY (Contributor) commented Apr 22, 2026

Generally good! But this PR, if merged to main, would break it. Can you please manually convert the current ckpt to safetensors, so we can still have the model here? (More important for main.)

@ziv-lazarov-nagish (Contributor, Author)

> Generally good! But this PR, if merged to main, would break it. Can you please manually convert the current ckpt to safetensors, so we can still have the model here? (More important for main.)

.ckpt files are still supported (see here); do you still want me to convert to safetensors?

Comment thread on pyproject.toml:

  [tool.setuptools.package-data]
 -sign_language_segmentation = ["**/*.json", "**/*.ckpt", "**/*.yaml"]
 +sign_language_segmentation = ["**/*.json", "**/*.safetensors", "**/*.yaml"]

Convert to safetensors; otherwise the ckpt file will not be installed on pip install.

package-data now selects `**/*.safetensors`, so the bundled `best.ckpt` would not have been installed on `pip install`. Convert it in-place (bf16 state dict + hparams JSON) so the baked-in default keeps working.

safetensors/ckpt loads produce identical outputs (max abs diff = 0).
ziv-lazarov-nagish merged commit 4b3d2ea into nagish on Apr 26, 2026
2 checks passed