System Info
Who can help?
(zhihu_0210_bak_0403) root@nb-zhanlun-zl-0914-1-0:/home# python collect_env.py
Collecting environment information...
==============================
OS : Ubuntu 22.04.4 LTS (x86_64)
GCC version : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version : Could not collect
CMake version : version 3.28.3
Libc version : glibc-2.35
==============================
PyTorch Info
PyTorch version : 2.8.0+cu128
Is debug build : False
CUDA used to build PyTorch : 12.8
ROCM used to build PyTorch : N/A
==============================
Python Environment
Python version : 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform : Linux-5.4.0-42-generic-x86_64-with-glibc2.35
==============================
CUDA / GPU Info
Is CUDA available : True
CUDA runtime version : 12.4.99
CUDA_MODULE_LOADING set to : LAZY
GPU models and configuration :
GPU 0: NVIDIA A800-SXM4-80GB
GPU 1: NVIDIA A800-SXM4-80GB
GPU 2: NVIDIA A800-SXM4-80GB
GPU 3: NVIDIA A800-SXM4-80GB
GPU 4: NVIDIA A800-SXM4-80GB
GPU 5: NVIDIA A800-SXM4-80GB
GPU 6: NVIDIA A800-SXM4-80GB
GPU 7: NVIDIA A800-SXM4-80GB
Information
Tasks
Reproduction
An error occurs when deploying, with vLLM, a Qwen3-Next-80B-A3B-Thinking checkpoint that was re-saved via transformers' `save_pretrained`. Deploying the original model directly works fine.
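For reference, the `save_pretrained` round trip the report refers to can be sketched on a tiny stand-in model. The GPT-2 config below is purely illustrative (the actual report uses Qwen3-Next-80B-A3B-Thinking, which is far too large to reproduce here), and the directory name is an assumption:

```python
import os
import tempfile

import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny stand-in model; the report's actual model is Qwen3-Next-80B-A3B-Thinking.
config = GPT2Config(n_layer=1, n_head=2, n_embd=16, vocab_size=100)
model = GPT2LMHeadModel(config)

with tempfile.TemporaryDirectory() as save_dir:
    # Re-save the checkpoint with save_pretrained, as the report describes.
    model.save_pretrained(save_dir)
    assert os.path.exists(os.path.join(save_dir, "config.json"))

    # Reload and verify the weights round-trip intact.
    reloaded = GPT2LMHeadModel.from_pretrained(save_dir)
    for (n1, p1), (n2, p2) in zip(
        model.named_parameters(), reloaded.named_parameters()
    ):
        assert n1 == n2 and torch.equal(p1, p2)
```

The re-saved directory would then be passed to vLLM (e.g. `vllm serve <save_dir>`), which is the step that fails for the re-saved Qwen3-Next checkpoint but not for the original one.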
Expected behavior
The model should deploy normally; instead, it errors out.