After expanding the vocabulary of the Qwen3-8B model, the predictions still contained think content, and generation would truncate at the newly added tokens. #10332
Unanswered
J123ingjing asked this question in Q&A
Replies: 0 comments
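I expanded the Qwen3-8B vocabulary with `tokenizer.add_special_tokens({"additional_special_tokens": all_new_tokens})`, then ran SFT with LoRA plus `embed_tokens` and `lm_head`, using `template: qwen_nothink`.

When I ran eval after fine-tuning, the predictions still contained think content, and the think output broke off at the newly added tokens. With `skip_special_tokens: false`, I saw that after the break the model kept outputting `<end_of_text>`.

Looking at `train/sft/workflow.py`, I found that `additional_special_tokens_id` is added to `eos_token_id`. I removed that and ran SFT again. In inference/chat the model now cannot handle the newly added special tokens: generation no longer breaks off, but the think content is erratic, and the new tokens are not output correctly inside it. Other words appear at those positions instead.

Which step went wrong?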
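To make the truncation symptom concrete, here is a minimal toy sketch in plain Python. It is not the actual `workflow.py` code. It only assumes, as described above, that the IDs of the additional special tokens end up in the set of stop IDs alongside the EOS ID. All names (`build_stop_ids`, `truncate_at_stop`) and token IDs are hypothetical and chosen for illustration.

```python
from typing import Iterable, List, Set


def build_stop_ids(eos_id: int,
                   additional_special_ids: Iterable[int],
                   include_additional: bool) -> Set[int]:
    """Return the set of token IDs that end generation (hypothetical helper)."""
    stop_ids = {eos_id}
    if include_additional:
        # Assumed behaviour: newly added special tokens are treated as EOS too.
        stop_ids.update(additional_special_ids)
    return stop_ids


def truncate_at_stop(token_ids: List[int], stop_ids: Set[int]) -> List[int]:
    """Keep tokens up to, but not including, the first stop ID."""
    kept: List[int] = []
    for token_id in token_ids:
        if token_id in stop_ids:
            break
        kept.append(token_id)
    return kept


# Made-up IDs: 0 is EOS, 100 and 101 stand in for newly added tokens.
EOS_ID = 0
NEW_TOKEN_IDS = [100, 101]
sampled = [5, 6, 100, 7, 8, 0]

with_new = truncate_at_stop(sampled, build_stop_ids(EOS_ID, NEW_TOKEN_IDS, True))
without_new = truncate_at_stop(sampled, build_stop_ids(EOS_ID, NEW_TOKEN_IDS, False))

print(with_new)     # [5, 6]
print(without_new)  # [5, 6, 100, 7, 8]
```

Under this assumption, the output is cut off as soon as a new token is sampled, which would match the break at newly added tokens. The sketch says nothing about why the think content is still produced, or why the new tokens come out as other words once they are removed from the stop set.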