I understand that by default `ppo_trainer` makes a copy of the initial model in order to implement the KL term in the RLHF objective. However, this is very memory inefficient on GPUs. Is there any workaround for this issue?

Edit: one potential workaround I can think of is to use the same base model but train two different adapters, one for the policy model and one for the reference model, through PEFT. Is there a standard way of doing this?
Hi @zyzhang1130
Thanks for the issue!
Indeed, currently the best way to do this is to use PEFT adapters. Please see https://huggingface.co/blog/trl-peft for more details (we now also support 4-bit models).
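For reference, here is a minimal sketch of that adapter-based setup. The checkpoint name, LoRA hyperparameters, and batch size are placeholders, and the exact argument names may differ between TRL versions:

```python
# Share the frozen base model between policy and reference via a LoRA adapter.
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "facebook/opt-350m"  # placeholder checkpoint

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Only the LoRA weights are trainable; the frozen base weights double as the reference model.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_name,
    peft_config=lora_config,
    load_in_4bit=True,  # optional: quantize the base weights to save more memory
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

ppo_config = PPOConfig(model_name=model_name, batch_size=16)

# With a PEFT model, ref_model=None tells the trainer to disable the adapter
# whenever it needs reference logits for the KL penalty, so no second copy
# of the model is kept in GPU memory.
ppo_trainer = PPOTrainer(
    config=ppo_config,
    model=model,
    ref_model=None,
    tokenizer=tokenizer,
)
```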