feat(loading): detect Git LFS pointer files and raise a clear error #13559
Open
NazarKozak wants to merge 1 commit into huggingface:main
What does this PR do?
Detect Git LFS pointer stubs in `load_state_dict` before attempting to deserialize them, and raise a clear error that points at the actual fix, instead of failing with a cryptic safetensors / pickle deserialization error far from the real cause.
Why
When a Hugging Face repository is mirrored without LFS-aware copying (for example, `git clone` without a subsequent `git lfs pull`, or `gsutil rsync` / `aws s3 sync` from a bucket that holds the original git checkout), the resulting local directory contains ~130-byte text pointer stubs in place of the real `.safetensors` weights.
`from_pretrained(<that-local-dir>)` then fails deep inside safetensors with a confusing tensor-shape / format error, often after several minutes of partial loads, with no hint that the problem is a missing LFS pull. The existing fallback in `load_state_dict` does try to detect this case, but only after the failed load and only for git-style cloning: its message tells users to run `git lfs install` + `git lfs pull`, which is wrong advice for a `gsutil rsync` mirror.
Changes
- New helper `_is_lfs_pointer(path)` in `src/diffusers/utils/hub_utils.py`. Reads at most 64 bytes, after a single `os.path.getsize` short-circuit (real weight files are MB–GB; LFS pointer files are ~130 bytes). Returns `False` for any kind of I/O error so it never gets in the way of a real load.
- Preemptive check at the top of `load_state_dict` in `src/diffusers/models/model_loading_utils.py`. If the resolved checkpoint file is an LFS pointer, raise `OSError` with a message that covers the likely causes (`git clone` without `git lfs pull`, and bucket-mirror tools like `gsutil rsync` / `aws s3 sync`) and the fix (`huggingface-cli download <repo_id> --local-dir <dir>`). The existing post-failure detection in the `except` block is kept as a safety net for any path that bypasses the new pre-check.
- Unit tests in `tests/others/test_hub_utils.py` covering five cases, among them an LFS pointer file (`True`), a file rejected as `False` by the size check, and a case that returns `False` (no exception). The helper is purely byte-oriented, so the tests run without GPU, network, or any model fixtures.
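The helper and pre-check described above could look roughly like the sketch below. This is not the code from the PR: the names `is_lfs_pointer`, `raise_if_lfs_pointer`, and `MAX_POINTER_SIZE`, the 1024-byte cutoff, and the error wording are assumptions made for illustration. The only fixed piece is the `version https://git-lfs.github.com/spec/v1` first line, which the Git LFS pointer format defines.

```python
import os

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

# Assumed cutoff for illustration: pointer stubs are ~130 bytes, real weights are MB-GB.
MAX_POINTER_SIZE = 1024


def is_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer stub, False otherwise."""
    try:
        # Cheap size check first, so large weight files are never opened here.
        if os.path.getsize(path) > MAX_POINTER_SIZE:
            return False
        with open(path, "rb") as f:
            head = f.read(64)  # read at most 64 bytes
    except OSError:
        # Any I/O error means "not a pointer"; the real loader reports the real problem.
        return False
    return head.startswith(LFS_POINTER_PREFIX)


def raise_if_lfs_pointer(checkpoint_file):
    """Hypothetical pre-check to call before deserializing a checkpoint file."""
    if is_lfs_pointer(checkpoint_file):
        raise OSError(
            f"{checkpoint_file} is a Git LFS pointer file, not the actual weights. "
            "This usually happens after `git clone` without `git lfs pull`, or after "
            "mirroring a git checkout with tools like `gsutil rsync` / `aws s3 sync`. "
            "Re-download the real files, e.g. with "
            "`huggingface-cli download <repo_id> --local-dir <dir>`."
        )
```

Checking the size before opening the file keeps the cost for real checkpoints to a single `stat` call, and swallowing `OSError` in the helper leaves missing or unreadable files to the normal loading path.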
Why not extend to all `safetensors.torch.load_file` sites at once
There are a handful of other places (`loaders/lora_base.py`, `loaders/unet.py`, `loaders/textual_inversion.py`, `hooks/group_offloading.py`, `utils/state_dict_utils.py`) that call safetensors directly without going through `load_state_dict`. Happy to extend the pre-check to those sites in a follow-up if maintainers prefer; this PR keeps scope to the central loader path that covers the `from_pretrained` flow most users hit first.
Behavioral change
- Real weight files: no change (`os.path.getsize` short-circuits the check immediately).
- LFS pointer files: a clear `OSError` instead of a delayed, cryptic deserialization error. No previously-loadable file becomes unloadable.
Fixes # (no related issue; surfaced repeatedly when mirroring HF repos to non-LFS-aware buckets)
Before submitting
- Tests: `tests/others/test_hub_utils.py`
Who can review?
General functionalities: @sayakpaul @yiyixuxu @DN6