After expanding the vocabulary of the Qwen3-8B model, the predictions still contained think content, and the generation would truncate at the newly added tokens. #10332

J123ingjing · 2026-03-31T03:49:29Z

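I expanded the Qwen3-8B vocabulary with `tokenizer.add_special_tokens({"additional_special_tokens": all_new_tokens})`, then ran SFT with LoRA, also training `embed_tokens` and `lm_head`, using the template `qwen_nothink`. When I ran eval after fine-tuning, the predictions still contained think content, and the thinking was cut off at the newly added tokens. With `skip_special_tokens: false`, I can see that after the cutoff the model keeps outputting `<end_of_text>`.

Looking at `train/sft/workflow.py`, I found that `additional_special_tokens_id` is added to `eos_token_id`. I removed that, ran SFT again, and then tried inference/chat. Now the model cannot handle the newly added special tokens: generation no longer cuts off, but the model thinks incoherently, and inside the think content the new tokens are not output correctly; other tokens are output in their place. Which step went wrong?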
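The truncation symptom described in the post is what one would expect if every additional special token is treated as a stop token. The following is a minimal sketch of that stopping rule only, not the actual code in `train/sft/workflow.py`. `StubTokenizer`, `build_stop_ids`, `truncate_at_stop`, the token strings, and all ids are hypothetical and invented for illustration; the stub only mimics the attribute names `eos_token_id`, `additional_special_tokens_ids`, and `convert_tokens_to_ids` that a Hugging Face tokenizer is assumed to expose.

```python
from typing import Dict, Iterable, List


class StubTokenizer:
    """Hypothetical stand-in for a real tokenizer; all ids are invented."""

    def __init__(self, eos_token_id: int, special_tokens: Dict[str, int]):
        self.eos_token_id = eos_token_id
        self._special = dict(special_tokens)  # token string -> token id

    @property
    def additional_special_tokens_ids(self) -> List[int]:
        return list(self._special.values())

    def convert_tokens_to_ids(self, token: str) -> int:
        return self._special[token]


def build_stop_ids(tokenizer, keep_generating: Iterable[str] = ()) -> List[int]:
    """EOS plus additional special tokens, minus tokens the model must emit."""
    excluded = {tokenizer.convert_tokens_to_ids(t) for t in keep_generating}
    stop_ids = [tokenizer.eos_token_id]
    for token_id in tokenizer.additional_special_tokens_ids:
        if token_id not in excluded and token_id not in stop_ids:
            stop_ids.append(token_id)
    return stop_ids


def truncate_at_stop(generated_ids: List[int], stop_ids: List[int]) -> List[int]:
    """Mimic generation halting at the first stop id (the stop id is kept)."""
    stop = set(stop_ids)
    for position, token_id in enumerate(generated_ids):
        if token_id in stop:
            return list(generated_ids[: position + 1])
    return list(generated_ids)


# Invented vocabulary: 2 is EOS, 3 is a chat end marker, 100/101 are new tokens.
tokenizer = StubTokenizer(
    eos_token_id=2,
    special_tokens={"<im_end>": 3, "<new_a>": 100, "<new_b>": 101},
)
sequence = [10, 11, 100, 12, 101, 13, 2]

# Every additional special token is a stop id: output ends at the first new token.
default_stops = build_stop_ids(tokenizer)
print(truncate_at_stop(sequence, default_stops))

# New tokens excluded from the stop ids: the full sequence is produced.
filtered_stops = build_stop_ids(tokenizer, keep_generating=["<new_a>", "<new_b>"])
print(truncate_at_stop(sequence, filtered_stops))
```

This sketch only covers the cutoff. It says nothing about the second symptom in the post (other tokens appearing where the new tokens should be), which would need to be checked separately, for example by confirming that the resized `embed_tokens` and `lm_head` weights were actually trained, saved, and loaded at inference time.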

Replies: 0 comments
