Resources needed for full fine-tuning of a 7B model with DPO #798
Comments
Memory requirements double.
DPO needs to load the model twice (policy plus reference), so both GPU memory and CPU memory usage roughly double. With your previous resources you can only do LoRA training: LoRA freezes the base model, so the model only needs to be loaded once. There is currently no implementation here that avoids loading the model twice for DPO, but the author of DPO says it should be possible: https://github.com/eric-mitchell/direct-preference-optimization/issues/29. A sketch of the single-load LoRA route follows below.
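A minimal sketch of that single-load LoRA route, assuming a recent TRL version and its PEFT integration (this is not this repo's code; the model id and dataset are placeholders). Passing `ref_model=None` together with a `peft_config` lets TRL compute reference log-probs by temporarily disabling the adapter, so a second full copy of the 7B weights is never materialized:

```python
# Sketch only: DPO training with a LoRA adapter via TRL, loading the base model once.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen1.5-7B"  # placeholder model id
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Placeholder preference dataset with prompt/chosen/rejected columns.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

args = DPOConfig(output_dir="dpo-lora-7b", per_device_train_batch_size=1)

# ref_model=None + peft_config: the frozen base model (adapter disabled)
# serves as the reference, so only one 7B model is held in memory.
trainer = DPOTrainer(
    model=model,
    ref_model=None,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```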
I haven't seen any related new PR in this repo. Is there a code implementation in other recent work that only loads the model once during training?
TRL has a `precompute_ref_log_probs` parameter you can look at: https://github.com/huggingface/trl/blob/main/trl/trainer/dpo_trainer.py#L123
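A minimal sketch of how that option might be used, assuming a recent TRL release where `precompute_ref_log_probs` is set on `DPOConfig` (model id and dataset below are placeholders). The reference model is still loaded for a one-off pass that caches its log-probs, but it is not needed for forward passes during the training loop itself:

```python
# Sketch only: full-parameter DPO with reference log-probs precomputed up front.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen1.5-7B"  # placeholder model id
policy = AutoModelForCausalLM.from_pretrained(model_name)
ref = AutoModelForCausalLM.from_pretrained(model_name)  # used only for the precompute pass
tokenizer = AutoTokenizer.from_pretrained(model_name)

train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")  # placeholder

args = DPOConfig(
    output_dir="dpo-full-7b",
    per_device_train_batch_size=1,
    # Cache reference log-probs once instead of running the reference
    # model's forward pass at every training step.
    precompute_ref_log_probs=True,
)

trainer = DPOTrainer(
    model=policy,
    ref_model=ref,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```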
How much resource is needed for full fine-tuning of a 7B model with DPO? On 8x A800 with 400 GB of RAM I get an error; it looks like the CPU memory is exhausted.