System Info
Who can help?
(zhihu_0210_bak_0403) root@nb-zhanlun-zl-0914-1-0:/home# python collect_env.py
Collecting environment information...
==============================
OS : Ubuntu 22.04.4 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version : Could not collect
CMake version : version 3.28.3
Libc version : glibc-2.35
==============================
PyTorch Info
PyTorch version : 2.8.0+cu128
Is debug build : False
CUDA used to build PyTorch : 12.8
ROCM used to build PyTorch : N/A
==============================
Python Environment
Python version : 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform : Linux-5.4.0-42-generic-x86_64-with-glibc2.35
==============================
CUDA / GPU Info
Is CUDA available : True
CUDA runtime version : 12.4.99
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration :
GPU 0: NVIDIA A800-SXM4-80GB
GPU 1: NVIDIA A800-SXM4-80GB
GPU 2: NVIDIA A800-SXM4-80GB
GPU 3: NVIDIA A800-SXM4-80GB
GPU 4: NVIDIA A800-SXM4-80GB
GPU 5: NVIDIA A800-SXM4-80GB
GPU 6: NVIDIA A800-SXM4-80GB
GPU 7: NVIDIA A800-SXM4-80GB
Information
Tasks
Reproduction
An error occurs when deploying, with vLLM, a Qwen3-Next-80B-A3B-Thinking checkpoint that was re-saved via transformers' `save_pretrained`. Deploying the original model directly works fine.
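For reference, the `save_pretrained` round trip the report refers to can be sketched on a tiny stand-in model. The GPT-2 config below is purely illustrative (the actual report uses Qwen3-Next-80B-A3B-Thinking, which is far too large to reproduce here), and the directory name is an assumption:

```python
import os
import tempfile

import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny stand-in model; the report's actual model is Qwen3-Next-80B-A3B-Thinking.
config = GPT2Config(n_layer=1, n_head=2, n_embd=16, vocab_size=100)
model = GPT2LMHeadModel(config)

with tempfile.TemporaryDirectory() as save_dir:
    # Re-save the checkpoint with save_pretrained, as the report describes.
    model.save_pretrained(save_dir)
    assert os.path.exists(os.path.join(save_dir, "config.json"))

    # Reload and verify the weights round-trip intact.
    reloaded = GPT2LMHeadModel.from_pretrained(save_dir)
    for (n1, p1), (n2, p2) in zip(
        model.named_parameters(), reloaded.named_parameters()
    ):
        assert n1 == n2 and torch.equal(p1, p2)
```

The re-saved directory would then be passed to vLLM (e.g. `vllm serve <save_dir>`), which is the step that fails for the re-saved Qwen3-Next checkpoint but not for the original one.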
Expected behavior
The model should deploy normally; instead, it errors out.