
Workaround due to memory constraint #1444

Closed
zyzhang1130 opened this issue Mar 19, 2024 · 1 comment


zyzhang1130 commented Mar 19, 2024

I understand that by default ppo_trainer makes a copy of the initial model in order to implement the KL term in the RLHF objective. However, this is very memory-inefficient on GPUs. Is there any workaround for this issue?

Edit: one potential workaround I can think of is to share the same base model and train separate adapters for the policy and the reference model through PEFT. I wonder if there is a standard way of doing this?

younesbelkada (Contributor) commented

Hi @zyzhang1130
Thanks for the issue! Indeed, the best way to do this currently is to use PEFT adapters; please see https://huggingface.co/blog/trl-peft for more details (we now also support 4-bit models).
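For reference, here is a minimal sketch of that setup (the model name, LoRA hyperparameters, and PPO config values below are illustrative placeholders, not tuned recommendations). When the policy is a PEFT model and `ref_model=None` is passed, `PPOTrainer` computes the reference logits by temporarily disabling the adapter, so the frozen base weights act as the reference model without keeping a second copy in GPU memory. Loading the base model in 4-bit via bitsandbytes would reduce memory further, as the blog post describes.

```python
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_id = "gpt2"  # placeholder; swap in your own base model

# Illustrative LoRA hyperparameters.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Passing peft_config wraps the base model with trainable LoRA adapters;
# the underlying base weights stay frozen and are loaded only once.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_id,
    peft_config=lora_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

ppo_config = PPOConfig(batch_size=8, mini_batch_size=2)

# With a PEFT model, ref_model=None tells PPOTrainer to obtain reference
# logits by temporarily disabling the adapter, instead of holding a full
# second copy of the model in GPU memory.
ppo_trainer = PPOTrainer(
    config=ppo_config,
    model=model,
    ref_model=None,
    tokenizer=tokenizer,
)
```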
