Add MmapUseMlockIgnoreErrors as default load mode when creating creat…#17613
Add MmapUseMlockIgnoreErrors as default load mode when creating creat…#17613
Conversation
…e_text_llm_runner Users can still override the default mode
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17613
Note: Links to docs will display an error until the docs builds have been completed. ❌ 7 New Failures, 1 Unrelated FailureAs of commit 496a321 with merge base 9a58ce8 ( NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
There was a problem hiding this comment.
Pull request overview
This PR adds a new optional parameter load_mode to the create_text_llm_runner factory functions, with a default value of Module::LoadMode::MmapUseMlockIgnoreErrors (previously, the load mode was hardcoded to File). This change allows users to customize the memory loading strategy while providing a more efficient default for LLM workloads.
Changes:
- Added
load_modeparameter to both overloads ofcreate_text_llm_runnerwith default valueMmapUseMlockIgnoreErrors - Updated function implementation to use the new parameter when constructing
Moduleinstances - Removed hardcoded
Module::LoadMode::Fileusage in module construction
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| extension/llm/runner/llm_runner_helper.h | Added load_mode parameter with default value to both create_text_llm_runner function declarations |
| extension/llm/runner/llm_runner_helper.cpp | Implemented parameter propagation and replaced hardcoded LoadMode::File with configurable load_mode parameter |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| float temperature = -1.0f, | ||
| const std::string& method_name = "forward"); | ||
| const std::string& method_name = "forward", | ||
| Module::LoadMode load_mode = Module::LoadMode::MmapUseMlockIgnoreErrors); |
There was a problem hiding this comment.
Why ignore errors variant over MmapUseMlock?
There was a problem hiding this comment.
The codebase only uses MmapUseMlockIgnoreErrors over MmapUseMlock for LLM runners. I feel that is the right default.
- MmapUseMlock: mlock failure → logs error, unmaps the pages, returns Error::NotSupported.
The model fails to load entirely. (Stricter!) - MmapUseMlockIgnoreErrors: mlock failure → logs at Debug level and continues. The model
loads via normal mmap, just without pages pinned in RAM.
For LLM runners, a hard failure is almost never the right behavior. Failing to load the model at all will be
bad UX / functionality experience for the End User.
By using mmap-based loading in our LLM runners, we avoid loading the entire model into RAM
upfront, which reduces peak memory usage and OOM risk. The MmapUseMlockIgnoreErrors variant
additionally attempts to pin pages in memory for better inference latency, but gracefully
falls back to standard mmap if the system can't support it, giving us the best of both
worlds without hard failures.
There was a problem hiding this comment.
Realistically, I think that we're unlikely to actually be able to lock the entire LLM PTE on most systems, so using the base mmap might be better? The mlock ignore errors variant seems functionally fine, though. It'll just fall through to mmap pretty much 100% of the time.
Added load_mode parameter to create_text_llm_runner function for specifying loading strategy of the model file.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Updated documentation for load_mode parameter to clarify its default behavior.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…e_text_llm_runner
Users can still override the default mode