fix: VLLM_SKIP_PROFILE_RUN patch for Lunar Lake iGPU profile_run() hang #340
MegaStood wants to merge 1 commit into intel:main from
Conversation
16aa9cc to e874953
…Lake iGPU

vLLM's XPU worker runs a dummy forward pass (profile_run()) during startup to measure peak GPU memory for KV cache sizing. On Lunar Lake's Xe2 iGPU, this forward pass hangs indefinitely for MoE models (gpt-oss-20b, GLM-4.7).

This patch adds VLLM_SKIP_PROFILE_RUN=1 environment variable support to _determine_available_memory_default() in xpu_worker.py. When set, it:
- Skips profile_run() entirely
- Estimates peak memory as memory_allocated() * 1.2
- Prints memory profiling analysis for debugging

Tested on: MSI Claw 8 AI+ (Core Ultra 7 258V, Arc 140V, 32GB LPDDR5x)
Models verified: gpt-oss-20b (MXFP4), Qwen3.5-4B, Qwen3-8B
Related: vllm-project/vllm#30359
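The gating logic described above can be sketched roughly as follows. This is a hypothetical simplification, not the actual patch: the function signature, the `allocated_bytes`/`total_bytes` parameters, and the `run_profile` callback are placeholders standing in for vLLM's internal XPU worker state.

```python
import os

def determine_available_memory(allocated_bytes: int,
                               total_bytes: int,
                               run_profile) -> int:
    """Return bytes available for the KV cache.

    Sketch of the patched behavior: when VLLM_SKIP_PROFILE_RUN=1 is
    set, skip the dummy forward pass and estimate peak memory as
    1.2x the currently allocated memory instead.
    """
    if os.environ.get("VLLM_SKIP_PROFILE_RUN") == "1":
        # Skip profile_run() entirely; use a conservative estimate.
        peak = int(allocated_bytes * 1.2)
        # Print a memory profiling summary for debugging.
        print(f"[skip-profile] allocated={allocated_bytes} "
              f"estimated_peak={peak}")
    else:
        # Normal path: run the dummy forward pass to measure true peak.
        peak = run_profile()
    return total_bytes - peak
```

With the env var set, the profiling callback is never invoked, so a model whose dummy forward pass would hang can still complete startup.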
Author
please check and review.
Bug
During startup, vLLM's XPU worker calls profile_run(), a dummy forward pass that measures peak GPU memory for KV cache sizing. On the Lunar Lake Xe2 iGPU (Arc 140V), this hangs indefinitely for MoE models (gpt-oss-20b, GLM-4.7-flash), blocking server startup entirely.
Related upstream issue: vllm-project/vllm#30359
Fix
Adds a vllm_xpu_worker_skip_profile.patch for vllm/v1/worker/xpu_worker.py that introduces VLLM_SKIP_PROFILE_RUN=1 environment variable support:
- Skips profile_run() entirely when set
- Estimates peak memory as memory_allocated() × 1.2 (conservative)

Also updates lunar_lake_serve.sh to set the env var automatically.

Impact

The memory estimate is conservative, so the KV cache may be sized smaller than optimal, but the server starts and runs correctly.
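The serve-script change might look like the snippet below. This is illustrative only: the actual contents of lunar_lake_serve.sh are not shown in this PR, and the commented serve invocation is a placeholder.

```shell
# Skip the hanging profile_run() on Lunar Lake iGPU before launching.
export VLLM_SKIP_PROFILE_RUN=1
# then launch the server, e.g.:
#   vllm serve <model> ...   (actual flags depend on the setup)
echo "VLLM_SKIP_PROFILE_RUN=$VLLM_SKIP_PROFILE_RUN"
```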
Tested On
MSI Claw 8 AI+ (Core Ultra 7 258V, Arc 140V, 32GB LPDDR5x); the memory_allocated() × 1.2 estimate matches the actual peak closely.