How to set the beta parameter for SimPO #4414
Labels
solved
This problem has already been solved
Comments
My main goal is to address the fact that during SimPO training, both 'rewards/chosen' and 'rewards/rejected' are negative, and both keep decreasing.
@hiyouga Can pref_beta be set to a value greater than 1? The SimPO paper uses 2 and 2.5.
Yes, it can.
Thanks!
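For readers following this thread, here is a minimal sketch of how beta enters the SimPO objective as formulated in the SimPO paper: the reward is beta times the length-averaged log-probability of a response, and the loss is the negative log-sigmoid of the reward gap minus the target margin gamma. The function name `simpo_rewards_and_loss` and the toy per-token log-probabilities are hypothetical and only for illustration; this is not taken from the LLaMA-Factory source.

```python
import math


def simpo_rewards_and_loss(chosen_token_logps, rejected_token_logps, beta=2.0, gamma=1.0):
    """Toy SimPO computation for one preference pair (illustrative only).

    chosen_token_logps / rejected_token_logps: per-token log-probabilities
    of each response under the policy (hypothetical inputs).
    """
    # Length-normalized (average) log-probability of each response.
    avg_chosen = sum(chosen_token_logps) / len(chosen_token_logps)
    avg_rejected = sum(rejected_token_logps) / len(rejected_token_logps)

    # SimPO rewards: beta times the average log-probability.
    # Log-probabilities are <= 0, so for beta > 0 these are never positive.
    reward_chosen = beta * avg_chosen
    reward_rejected = beta * avg_rejected

    # Loss = -log(sigmoid(reward gap - gamma)), written as a stable softplus.
    margin = reward_chosen - reward_rejected - gamma
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return reward_chosen, reward_rejected, loss


# Toy pair: the chosen response is more likely per token than the rejected one.
r_chosen, r_rejected, loss = simpo_rewards_and_loss(
    [-0.2, -0.4], [-1.0, -1.4], beta=2.0, gamma=1.0
)
print(r_chosen, r_rejected, loss)
```

Under this formulation, negative values for both rewards are expected rather than a sign of a bug, because each reward is a scaled log-probability. The loss depends only on the gap between the two rewards, and beta values above 1 simply scale that gap before the margin is subtracted.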
Reminder
System Info
As far as I can see, the only one of these parameters that can be set through the command-line arguments is gamma.
Reproduction
llamafactory-cli train qwen_full_simpo.yaml
Expected behavior
No response
Others
No response