System Info
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.
- `transformers` version: 4.57.0.dev0
- Platform: Linux-6.8.0-79-generic-x86_64-with-glibc2.39
- Python version: 3.12.11
- Huggingface_hub version: 0.35.0
- Safetensors version: 0.6.2
- Accelerate version: 1.10.1
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.8.0+cu128 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: NVIDIA GeForce RTX 4090
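
Since Qwen3-Next support only exists in very recent dev builds, a quick sanity check that the installed version actually registers the architecture may help. A minimal sketch (it assumes the model type string is `qwen3_next`; that string is not confirmed anywhere in this report):

```python
# Minimal sketch: verify this transformers build knows the Qwen3-Next
# architecture. The model type string "qwen3_next" is an assumption.
import transformers
from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES

print("transformers version:", transformers.__version__)
print("qwen3_next registered:", "qwen3_next" in CONFIG_MAPPING_NAMES)
```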
Who can help?
@ArthurZucker @Cyrilvallez @gante
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
1. Running `test.py` as given on the Qwen official website:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Next-80B-A3B-Instruct"
custom_cache_path = "/mnt/m2_4/models/Qwen3_Next_80B_A3B"

# load the tokenizer and the model from the local directory
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=custom_cache_path)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",
    device_map="auto",
    cache_dir=custom_cache_path,
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
```
2. The error log is as follows; loading fails on the "mtp.***" tensor weights:

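The log itself did not make it into the report, but since the failure points at the "mtp.***" tensors, a quick way to see which checkpoint keys are involved is to read the safetensors index. A minimal sketch for this (it assumes the repo ships a sharded `model.safetensors.index.json`, as large Hub checkpoints normally do):

```python
# Minimal sketch: list the checkpoint tensor names containing "mtp" so the
# failing weights can be matched against what transformers expects.
# Assumes a sharded checkpoint with a model.safetensors.index.json file.
import json

from huggingface_hub import hf_hub_download

index_path = hf_hub_download(
    "Qwen/Qwen3-Next-80B-A3B-Instruct",
    "model.safetensors.index.json",
    cache_dir="/mnt/m2_4/models/Qwen3_Next_80B_A3B",
)
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

mtp_keys = [name for name in weight_map if "mtp" in name]
print(f"{len(mtp_keys)} mtp tensors in the checkpoint, first few:")
print("\n".join(mtp_keys[:10]))
```

If these names are present in the checkpoint but transformers reports them as unexpected, the MTP (multi-token prediction) weights are probably not mapped by this dev build yet.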
Expected behavior
The model should load successfully and generate a correct answer.