See the `pref_ftx` parameter.
```python
pref_ftx: float = field(
    default=0.0,
    metadata={"help": "The supervised fine-tuning loss coefficient in DPO training."},
)
```

Then if I want to train with DPO and KTO jointly, how should I adjust this? @hiyouga
Reminder
System Info
no
Reproduction
no
Expected behavior
The project currently integrates the DPO, PPO, KTO, and SFT training methods. Could a feature be added to combine them, e.g. $L = \alpha \cdot L_{SFT} + \beta \cdot L_{DPO}$, where $\alpha$ and $\beta$ are hyperparameters?
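The combination requested above is a plain weighted sum of the per-batch losses. A minimal sketch of that idea (the function name and the plain-float signature are illustrative, not part of the project; in an actual trainer these would be `torch.Tensor` losses combined before `backward()`):

```python
def combined_loss(sft_loss: float, dpo_loss: float,
                  alpha: float = 1.0, beta: float = 1.0) -> float:
    """Weighted sum of two training objectives: L = alpha * L_SFT + beta * L_DPO.

    alpha and beta are hyperparameters; setting one of them to 0.0
    recovers pure SFT or pure DPO training.
    """
    return alpha * sft_loss + beta * dpo_loss
```

The same expression works unchanged on autograd tensors, since gradients distribute over the weighted sum.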
Others
No response