Description
Hi, thank you for your great work! I'd like to reproduce the full-parameter DPO fine-tuning. However, I only have 10 × NVIDIA A40 GPUs (46 GB of memory each).
I tried the command
CUDA_VISIBLE_DEVICES=2,3,4,5,6,7,8,9 ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --main_process_port 6000 scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml
and it reported an OOM error, even when I set the batch size to 1.
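One thing I was planning to try next (not verified on my side) is enabling CPU offload in the ZeRO-3 accelerate config, trading speed for memory. A minimal sketch of the relevant part of recipes/accelerate_configs/deepspeed_zero3.yaml, assuming the standard accelerate DeepSpeed keys:

```yaml
# Sketch only: the offload_* values are my assumption; all other keys
# stay as in the repo's existing deepspeed_zero3.yaml.
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: cpu    # push optimizer states to host RAM
  offload_param_device: cpu        # push ZeRO-3 partitioned params to host RAM
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
mixed_precision: bf16
num_processes: 8
```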
I don't mind if the program runs a bit slower (e.g., using a smaller batch size and more gradient accumulation steps, along the lines of the sketch below). However, I don't know whether there is a way to get the full-parameter DPO run to fit.
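For concreteness, the kind of overrides I had in mind for recipes/zephyr-7b-beta/dpo/config_full.yaml look roughly like this (the key names follow the standard TrainingArguments fields; the values are only illustrative, not the recipe defaults):

```yaml
# Illustrative memory-related settings; adjust gradient_accumulation_steps
# to keep the effective batch size close to the original recipe.
per_device_train_batch_size: 1     # smallest possible micro-batch per GPU
per_device_eval_batch_size: 1
gradient_accumulation_steps: 16    # recover a larger effective batch size
gradient_checkpointing: true       # recompute activations instead of storing them
bf16: true
```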
Can you help me, please?
Also, I'm wondering how large the performance gap is between LoRA and full-parameter fine-tuning.