
Add disable_mmap kwarg to from_pretrained with hf-mount auto-detection #45547

Open

rtrompier wants to merge 3 commits into huggingface:main from rtrompier:disable-mmap-with-hf-mount-autodetect

Conversation

@rtrompier
Contributor

What

Adds a new disable_mmap kwarg to PreTrainedModel.from_pretrained (and plumbs it through load_state_dict / LoadStateDictConfig / _load_pretrained_model). When enabled, safetensors checkpoints are read fully into memory and parsed via safetensors.torch.load(bytes) instead of being memory-mapped with safe_open.

Why

  • Loading a safetensors model from a FUSE filesystem can deadlock the Python process when mmap + kernel readahead + the parallel state-dict loader trigger many concurrent page-faults. We observed this in production on HF Spaces/Endpoints, which expose Hub repos via a FUSE mount named hf-mount: captured kernel stacks showed Python hanging on page-faults with a saturated FUSE queue.
  • Bypassing mmap (reading the file straight into RAM) side-steps the deadlock with a modest RAM cost.
  • diffusers already ships the exact same kwarg (disable_mmap in diffusers/models/model_loading_utils.py); adding it to transformers makes the two libraries consistent and unblocks users who currently hit TypeError: ...__init__() got an unexpected keyword argument 'disable_mmap' when they try to pass it through.

How

  • New _is_on_hf_mount(path) helper parses /proc/mounts on Linux and returns True if path resolves under a mountpoint whose device string is hf-mount (returns False on non-Linux).
  • load_state_dict(...) gains a disable_mmap: bool | None = None kwarg. When None, it auto-detects via _is_on_hf_mount. When truthy (and map_location != "meta"), the safetensors file is loaded with safetensors.torch.load(open(path, "rb").read()).
  • LoadStateDictConfig gets a matching disable_mmap field; from_pretrained forwards its new kwarg into it; the multi-shard safetensors branch in _load_pretrained_model mirrors the same no-mmap path per file.
  • Auto-detection is a no-op on every platform except Linux with an hf-mount FUSE mount, so there's no behavior change for existing users.

Tests

Adds DisableMmapLoadingTest in tests/utils/test_modeling_utils.py covering:

  • _is_on_hf_mount matches when /proc/mounts lists hf-mount for the path.
  • _is_on_hf_mount does not match for a regular filesystem.
  • _is_on_hf_mount short-circuits to False on non-Linux platforms.
  • load_state_dict(..., disable_mmap=True) returns tensors equal to the mmap path.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


sounds great ty for adding

Comment thread: src/transformers/modeling_utils.py (outdated)
Comment thread: tests/utils/test_modeling_utils.py (outdated)
@ArthurZucker
Collaborator

@bot /style

@github-actions
Contributor

github-actions Bot commented Apr 21, 2026

Style fix bot fixed some files and pushed the changes.

@rtrompier rtrompier marked this pull request as ready for review April 21, 2026 09:43

3 participants