See the `pref_ftx` parameter.
```python
pref_ftx: float = field(
    default=0.0,
    metadata={"help": "The supervised fine-tuning loss coefficient in DPO training."},
)
```

Then if I want to train with DPO and KTO jointly, how should I adjust this? @hiyouga
Reminder
System Info
no
Reproduction
no
Expected behavior
The project currently integrates the DPO, PPO, KTO, and SFT training methods. Could a feature be added to combine them, e.g. $L = \alpha \cdot L_{SFT} + \beta \cdot L_{DPO}$, where $\alpha$ and $\beta$ are hyperparameters?
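The combination requested above is a plain weighted sum of the per-batch losses. A minimal sketch of that idea (the function name and the plain-float signature are illustrative, not part of the project; in an actual trainer these would be `torch.Tensor` losses combined before `backward()`):

```python
def combined_loss(sft_loss: float, dpo_loss: float,
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted sum of two training objectives: L = alpha * L_SFT + beta * L_DPO.

    alpha and beta are hyperparameters; setting one of them to 0.0
    recovers pure SFT or pure DPO training.
    """
    return alpha * sft_loss + beta * dpo_loss
```

The same expression works unchanged on autograd tensors, since gradients distribute over the weighted sum.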
Others
No response