Prefetch mmap'd weight blobs to eliminate page fault bottleneck by pytorchbot · Pull Request #18356 · pytorch/executorch

pytorchbot · 2026-03-20T01:13:38Z

Weight loading via update_constants_from_blob achieved only
0.3-0.4 GB/s (vs. 8 GB/s hardware capability) because memcpy from
mmap'd pages triggers synchronous page faults: each 16 KB page traps
into the kernel and blocks on NVMe I/O.

Call madvise(MADV_WILLNEED) on the weights blob
early in Metal backend init, before writing/dlopen'ing the .so file.
The kernel prefaults pages asynchronously during the ~200ms of other
init work. By the time memcpy runs, pages are already resident and
throughput reaches 5-8 GB/s.

Metal init time: ~25s -> ~9s (2.7x faster) on int4 Voxtral model.
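
For readers unfamiliar with the pattern, the prefetch described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `MappedBlob`, `map_blob`, `prefetch_blob`, and `copy_weights` are hypothetical names, and only the POSIX calls (`open`, `fstat`, `mmap`, `madvise`, `munmap`) are real APIs.

```cpp
#include <sys/mman.h>   // mmap, madvise, munmap
#include <sys/stat.h>   // fstat
#include <fcntl.h>      // open
#include <unistd.h>     // close, write, unlink

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical RAII holder for a read-only mapping of a weights file.
struct MappedBlob {
  void* data = nullptr;
  std::size_t size = 0;

  MappedBlob() = default;
  MappedBlob(const MappedBlob&) = delete;
  MappedBlob& operator=(const MappedBlob&) = delete;
  ~MappedBlob() {
    if (data != nullptr) munmap(data, size);
  }
};

// Map the whole file read-only. No pages are read from disk yet;
// they are faulted in on first touch unless prefetched.
bool map_blob(const char* path, MappedBlob& out) {
  assert(out.data == nullptr);  // sketch assumes an empty holder
  int fd = open(path, O_RDONLY);
  if (fd < 0) return false;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return false;
  }
  const std::size_t len = static_cast<std::size_t>(st.st_size);
  void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  if (p == MAP_FAILED) return false;
  out.data = p;
  out.size = len;
  return true;
}

// Hint that the whole mapping will be needed soon so the kernel can
// start readahead in the background. The call is advisory: it returns
// without waiting for the I/O, and the kernel may ignore it.
bool prefetch_blob(const MappedBlob& blob) {
  assert(blob.data != nullptr && blob.size > 0);
  return madvise(blob.data, blob.size, MADV_WILLNEED) == 0;
}

// Later in init: copy the weights out of the mapping. Pages that are
// already resident are copied without blocking on disk reads.
void copy_weights(void* dst, const MappedBlob& blob) {
  std::memcpy(dst, blob.data, blob.size);
}
```

In this sketch the intended order is `map_blob`, then `prefetch_blob` as early as possible, then the unrelated init work, and only then `copy_weights`. Because the hint is advisory, a failed `madvise` call can be treated as non-fatal: the copy still works, just at the slower fault-driven rate.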

(cherry picked from commit b7ca1a4)

pytorch-bot · 2026-03-20T01:13:43Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18356

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 122 Pending

As of commit e3f3328 with merge base 8c0a60b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorchbot requested review from cccclai and shoumikhin as code owners March 20, 2026 01:13

pytorchbot mentioned this pull request Mar 20, 2026

[v1.2.0] Release Schedule and Tracker #17016

Closed

meta-cla Bot added the CLA Signed label Mar 20, 2026

pytorchbot mentioned this pull request Mar 20, 2026

Prefetch mmap'd weight blobs to eliminate page fault bottleneck #18236

Merged

manuelcandales approved these changes Mar 20, 2026

manuelcandales merged commit 2bba833 into release/1.2 Mar 20, 2026
201 of 204 checks passed

manuelcandales deleted the cherry-pick-18236-by-pytorch_bot_bot_ branch March 20, 2026 01:47
