I understand that by default `ppo_trainer` makes a copy of the initial model in order to implement the KL term in the RLHF objective. However, this is very memory inefficient on GPUs. Is there any workaround for this issue?

Edit: one potential workaround I can think of is to use the same base model but train two different adapters, one for the policy model and one for the reference model, through PEFT. Is there a standard way of doing this?
Hi @zyzhang1130
Thanks for the issue!
Indeed, currently the best way to do this is to use PEFT adapters. Please see https://huggingface.co/blog/trl-peft for more details (we now also support 4-bit models).
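For reference, here is a minimal sketch of that adapter-based setup. The checkpoint name, LoRA hyperparameters, and batch size are placeholders, and the exact argument names may differ between TRL versions:

```python
# Share the frozen base model between policy and reference via a LoRA adapter.
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "facebook/opt-350m"  # placeholder checkpoint

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Only the LoRA weights are trainable; the frozen base weights double as the reference model.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_name,
    peft_config=lora_config,
    load_in_4bit=True,  # optional: quantize the base weights to save more memory
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

ppo_config = PPOConfig(model_name=model_name, batch_size=16)

# With a PEFT model, ref_model=None tells the trainer to disable the adapter
# whenever it needs reference logits for the KL penalty, so no second copy
# of the model is kept in GPU memory.
ppo_trainer = PPOTrainer(
    config=ppo_config,
    model=model,
    ref_model=None,
    tokenizer=tokenizer,
)
```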