Closed
Description
Reproduction Steps
Environment (vLLM nightly wheel):
uv pip install vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
Command:
vllm serve INC4AI/Qwen3.5-397B-A17B-int4-mixed-AutoRound --port 7777 --host localhost --trust-remote-code --dtype bfloat16 --tensor_parallel_size 4 --max-model-len 4096 --max-num-seqs 64 --gpu-memory-utilization 0.8 --reasoning-parser qwen3 --enable-prefix-caching --language-model-only
Model: https://huggingface.co/INC4AI/Qwen3.5-397B-A17B-int4-mixed-AutoRound
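One way to check how this int4 checkpoint names its layer-0 MoE expert tensors is to read its model.safetensors.index.json, the index file of a sharded Hugging Face checkpoint, whose "weight_map" maps each tensor name to the shard that stores it. The sketch below assumes the checkpoint is sharded with that usual index file and that a local copy is available; the helper names load_index and expert_tensor_names are hypothetical, chosen only for illustration.

```python
import json
import re
from pathlib import Path


def load_index(path: str) -> dict:
    # Hypothetical helper: parse a local model.safetensors.index.json file.
    return json.loads(Path(path).read_text())


def expert_tensor_names(index: dict, layer: int = 0) -> list[str]:
    """Return the sorted checkpoint tensor names for one layer's MoE experts.

    `index` is the parsed index file; its "weight_map" maps tensor name -> shard.
    """
    # Match "layers.<layer>.mlp.experts." at the start of the name or after a dot,
    # so any prefix is accepted while "layers.10" is not mistaken for "layers.0".
    pattern = re.compile(rf"(^|\.)layers\.{layer}\.mlp\.experts\.")
    return sorted(name for name in index["weight_map"] if pattern.search(name))
```

Usage: expert_tensor_names(load_index("/path/to/snapshot/model.safetensors.index.json"), layer=0) lists the stored expert tensor names (for example whether they end in qweight, scales, and so on), which can then be compared with the parameter names the model expects.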
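Problem Description
Weight loading fails in load_fused_expert_weights (vllm/model_executor/models/qwen3_5.py, line 348) with KeyError: 'layers.0.mlp.experts.w2_weight.0.qweight', meaning the model has no parameter registered under that name for the quantized (qweight) down-projection (w2) expert weight of layer 0. Full traceback from the TP rank 3 worker: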
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] Traceback (most recent call last):
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 754, in worker_main
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] worker = WorkerProc(*args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] return func(*args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 580, in __init__
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] self.worker.load_model()
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 324, in load_model
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] self.model_runner.load_model(eep_scale_up=eep_scale_up)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] return func(*args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4197, in load_model
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] self.model = model_loader.load_model(
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] return func(*args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 62, in load_model
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] self.load_weights(model, model_config)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] return func(*args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 290, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] loaded_weights = model.load_weights(self.get_all_weights(model_config, model))
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 747, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] return original_load_weights(self, weights, *args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 344, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] autoloaded_weights = set(self._load_module("", self.module, weights))
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 292, in _load_module
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] yield from self._load_module(
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 265, in _load_module
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] loaded_params = module_load_weights(weights)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 604, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] return loader.load_weights(weights)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/reload/torchao_decorator.py", line 50, in patched_model_load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] return original_load_weights(self, weights, *args, **kwargs)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 344, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] autoloaded_weights = set(self._load_module("", self.module, weights))
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 292, in _load_module
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] yield from self._load_module(
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 265, in _load_module
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] loaded_params = module_load_weights(weights)
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 465, in load_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] success = self.load_fused_expert_weights(
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 348, in load_fused_expert_weights
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] param = params_dict[name]
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] ~~~~~~~~~~~^^^^^^
[VLLM LOG] (Worker_TP3 pid=51793) ERROR 02-25 02:56:51 [multiproc_executor.py:783] KeyError: 'layers.0.mlp.experts.w2_weight.0.qweight'
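The failing frame is an ordinary dictionary lookup (param = params_dict[name]), so the KeyError only reports that the remapped name is absent, not which names are registered. A minimal diagnostic sketch of the same lookup pattern, using Python's difflib to list the closest registered names on a miss, is shown here. lookup_param is a hypothetical helper, not part of vLLM, and any parameter names passed to it in examples are illustrative assumptions rather than vLLM's actual names.

```python
import difflib


def lookup_param(params_dict: dict, name: str, n: int = 3):
    """Hypothetical helper (not part of vLLM): dict lookup with a readable miss."""
    try:
        return params_dict[name]
    except KeyError:
        # Rank registered names by similarity to the missing one; cutoff=0.0
        # keeps the n best candidates even if none is very similar.
        close = difflib.get_close_matches(name, params_dict.keys(), n=n, cutoff=0.0)
        raise KeyError(f"{name!r} not found; closest registered names: {close}") from None
```

Design note: the helper re-raises KeyError, so callers that already handle the plain lookup failure keep working; only the error message gains the candidate list.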