Description
Hi, thank you for your great work! I'd like to reproduce the full-parameter DPO fine-tuning. However, I only have 10 × NVIDIA A40 GPUs (46 GB of memory each).
I tried the command
CUDA_VISIBLE_DEVICES=2,3,4,5,6,7,8,9 ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --main_process_port 6000 scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml
and it reported an OOM error, even when I set the batch size to 1.
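One thing I was planning to try next (not verified on my side) is enabling CPU offload in the ZeRO-3 accelerate config, trading speed for memory. A minimal sketch of the relevant part of recipes/accelerate_configs/deepspeed_zero3.yaml, assuming the standard accelerate DeepSpeed keys:

```yaml
# Sketch only: the offload_* values are my assumption; all other keys
# stay as in the repo's existing deepspeed_zero3.yaml.
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: cpu    # push optimizer states to host RAM
  offload_param_device: cpu        # push ZeRO-3 partitioned params to host RAM
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
mixed_precision: bf16
num_processes: 8
```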
I don't mind if the program runs a bit slower (e.g., using a smaller batch size and more gradient accumulation steps, along the lines of the sketch below). However, I don't know whether there is a way to get the full-parameter DPO run to fit.
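For concreteness, the kind of overrides I had in mind for recipes/zephyr-7b-beta/dpo/config_full.yaml look roughly like this (the key names follow the standard TrainingArguments fields; the values are only illustrative, not the recipe defaults):

```yaml
# Illustrative memory-related settings; adjust gradient_accumulation_steps
# to keep the effective batch size close to the original recipe.
per_device_train_batch_size: 1     # smallest possible micro-batch per GPU
per_device_eval_batch_size: 1
gradient_accumulation_steps: 16    # recover a larger effective batch size
gradient_checkpointing: true       # recompute activations instead of storing them
bf16: true
```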
Can you help me, please?
Also, I'm wondering how large the performance gap is between LoRA and full-parameter fine-tuning.