
feat(loading): detect Git LFS pointer files and raise a clear error #13559

Open
NazarKozak wants to merge 1 commit into huggingface:main from NazarKozak:feat/lfs-pointer-detection


@NazarKozak

What does this PR do?

Detect Git LFS pointer stubs in load_state_dict before attempting to deserialize them, and raise a clear error pointing at the actual fix instead of failing with a cryptic safetensors / pickle deserialization error far away from the real cause.

Why

When a Hugging Face repository is mirrored without LFS-aware copying — for example, git clone without a subsequent git lfs pull, or gsutil rsync / aws s3 sync from a bucket that holds the original git checkout — the resulting local directory contains ~130-byte text pointer stubs in place of the real .safetensors weights:

version https://git-lfs.github.com/spec/v1
oid sha256:<hash>
size <bytes>

from_pretrained(<that-local-dir>) then fails deep inside safetensors with a confusing tensor-shape / format error, often after several minutes of partial loads, with no hint that the problem is a missing LFS pull. The existing fallback in load_state_dict does try to detect this case, but only after the failed load and only for git-style cloning (its message tells users to run git lfs install + git lfs pull, which is wrong advice for a gsutil rsync mirror).
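To make the stub format concrete, here is a small sketch that builds a pointer in the layout shown above and writes it where a weights file would normally sit. It is not part of this PR: the `make_lfs_pointer` name and the zero-filled payload are hypothetical, and the layout follows the three-line form quoted in this description.

```python
import hashlib
import os
import tempfile

LFS_SPEC_LINE = b"version https://git-lfs.github.com/spec/v1\n"


def make_lfs_pointer(payload: bytes) -> bytes:
    """Build pointer text for `payload` (hypothetical helper, illustration only)."""
    oid = hashlib.sha256(payload).hexdigest()
    return LFS_SPEC_LINE + f"oid sha256:{oid}\nsize {len(payload)}\n".encode("ascii")


payload = b"\x00" * 1024  # stand-in for real weights
pointer = make_lfs_pointer(payload)

with tempfile.TemporaryDirectory() as tmp:
    # A mirror without LFS support leaves this text where the weights should be.
    path = os.path.join(tmp, "model.safetensors")
    with open(path, "wb") as f:
        f.write(pointer)
    stub_size = os.path.getsize(path)

print(pointer.decode("ascii"))
print(f"stub size: {stub_size} bytes")
```

The file keeps its `.safetensors` name, so nothing on disk hints that it holds a short text stub rather than tensors.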

Changes

  1. New helper _is_lfs_pointer(path) in src/diffusers/utils/hub_utils.py.

    Checks os.path.getsize first and returns False right away for anything larger than a pointer stub (real weight files are MB–GB; LFS pointer files are ~130 bytes), then reads at most 64 bytes to look for the LFS marker. Returns False on any I/O error so it never gets in the way of a real load.

  2. Preemptive check at the top of load_state_dict in src/diffusers/models/model_loading_utils.py. If the resolved checkpoint file is an LFS pointer, raise OSError with a message that:

    • names the actual file path,
    • explains the two common causes (git clone without git lfs pull, and bucket-mirror tools like gsutil rsync / aws s3 sync),
    • suggests the LFS-aware alternative (huggingface-cli download <repo_id> --local-dir <dir>).

    The existing post-failure detection in the except block is kept as a safety net for any path that bypasses the new pre-check.

  3. Unit tests in tests/others/test_hub_utils.py covering five cases:

    • LFS pointer is detected (True)
    • small synthetic safetensors header is not flagged
    • large file with the LFS marker as a prefix is short-circuited to False by the size check
    • missing path returns False (no exception)
    • unrelated short JSON file is not flagged

    The helper is purely byte-oriented so tests run without GPU, network, or any model fixtures.
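The diff itself is not reproduced in this description, so the following is only a minimal sketch of a helper and pre-check that match the behavior listed above. The 1024-byte cutoff, the constant names, and the `raise_if_lfs_pointer` function are assumptions made for illustration. The demo at the end walks through the five test cases using temporary files.

```python
import os
import tempfile

# Assumed constants: the description does not state the exact marker or cutoff.
_LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
_MAX_POINTER_SIZE = 1024  # bytes; real pointers are ~130 bytes


def _is_lfs_pointer(path) -> bool:
    """Return True if `path` looks like a Git LFS pointer stub."""
    try:
        # Cheap size check first: real weight files are far larger than a stub.
        if os.path.getsize(path) > _MAX_POINTER_SIZE:
            return False
        with open(path, "rb") as f:
            head = f.read(64)  # the marker fits within 64 bytes
    except OSError:
        # Never let detection interfere with a real load.
        return False
    return head.startswith(_LFS_POINTER_PREFIX)


def raise_if_lfs_pointer(checkpoint_file) -> None:
    """Hypothetical pre-check to run before deserializing `checkpoint_file`."""
    if _is_lfs_pointer(checkpoint_file):
        raise OSError(
            f"'{checkpoint_file}' is a Git LFS pointer file, not the actual weights. "
            "This usually means the repository was cloned without `git lfs pull`, "
            "or mirrored with a tool such as `gsutil rsync` / `aws s3 sync`. "
            "Re-download with `huggingface-cli download <repo_id> --local-dir <dir>`."
        )


def _write(directory, name, data: bytes) -> str:
    """Write `data` to `directory/name` and return the full path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(data)
    return path


pointer_text = _LFS_POINTER_PREFIX + b"\noid sha256:" + b"0" * 64 + b"\nsize 123456\n"
# Synthetic safetensors-style file: 8-byte little-endian header length, then JSON.
tiny_safetensors = (2).to_bytes(8, "little") + b"{}"

message = None
with tempfile.TemporaryDirectory() as tmp:
    pointer_path = _write(tmp, "model.safetensors", pointer_text)
    results = {
        "pointer": _is_lfs_pointer(pointer_path),
        "tiny_safetensors": _is_lfs_pointer(_write(tmp, "tiny.safetensors", tiny_safetensors)),
        "large_with_marker": _is_lfs_pointer(_write(tmp, "big.bin", pointer_text + b"\0" * 2048)),
        "missing": _is_lfs_pointer(os.path.join(tmp, "does_not_exist")),
        "short_json": _is_lfs_pointer(_write(tmp, "config.json", b'{"a": 1}')),
    }
    try:
        raise_if_lfs_pointer(pointer_path)
    except OSError as err:
        message = str(err)

print(results)
print(message)
```

Checking the size before opening the file keeps the cost for real checkpoints to a single stat call, and catching `OSError` covers the missing-path case without special handling.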

Why not extend to all safetensors.torch.load_file sites at once

There are a handful of other places (loaders/lora_base.py, loaders/unet.py, loaders/textual_inversion.py, hooks/group_offloading.py, utils/state_dict_utils.py) that call safetensors directly without going through load_state_dict. Happy to extend the pre-check to those sites in a follow-up if maintainers prefer; this PR keeps scope to the central loader path that covers the from_pretrained flow most users hit first.
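One possible shape for such a follow-up is a small wrapper shared by the direct call sites. This is a sketch only, not something this PR implements: `reject_lfs_pointers`, `looks_like_lfs_pointer`, and the toy `read_bytes` loader are all hypothetical names, and the detection here is a simplified stand-in for the PR's helper.

```python
import functools
import os
import tempfile

LFS_MARKER = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(path) -> bool:
    """Simplified stand-in for the PR's helper (assumption, not the PR code)."""
    try:
        with open(path, "rb") as f:
            return f.read(64).startswith(LFS_MARKER)
    except OSError:
        return False


def reject_lfs_pointers(load_fn):
    """Wrap a `load_fn(path, ...)` so pointer stubs fail fast with OSError."""

    @functools.wraps(load_fn)
    def wrapper(path, *args, **kwargs):
        if looks_like_lfs_pointer(path):
            raise OSError(f"'{path}' is a Git LFS pointer file, not real weights.")
        return load_fn(path, *args, **kwargs)

    return wrapper


@reject_lfs_pointers
def read_bytes(path):
    """Toy loader standing in for a direct safetensors call."""
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.bin")
    stub = os.path.join(tmp, "stub.bin")
    with open(good, "wb") as f:
        f.write(b"\x01\x02\x03")
    with open(stub, "wb") as f:
        f.write(LFS_MARKER + b"\noid sha256:" + b"0" * 64 + b"\nsize 3\n")

    loaded = read_bytes(good)  # passes through to the wrapped loader
    try:
        read_bytes(stub)
        stub_rejected = False
    except OSError:
        stub_rejected = True

print(loaded, stub_rejected)
```

A wrapper keeps each call site to a one-line change, at the cost of one extra file open per load.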

Behavioral change

  • For valid weights: no behavioral change. The only added cost is one os.path.getsize call, which short-circuits the check immediately.
  • For LFS pointer files: an immediate, clear OSError instead of a delayed, cryptic deserialization error. No previously-loadable file becomes unloadable.

Fixes # (no related issue — surfaced repeatedly when mirroring HF repos to non-LFS-aware buckets)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you read our philosophy doc (important for complex PRs)?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? (no user-facing docs to update; helper is internal)
  • Did you write any new necessary tests? (5 new unit tests in tests/others/test_hub_utils.py)

Who can review?

General functionalities: @sayakpaul @yiyixuxu @DN6

github-actions bot added the models, tests, utils, and size/M (PR with diff < 200 LOC) labels on Apr 25, 2026.