
How to perform full-parameter fine-tuning without A100 GPUs #22

@ChenDRAG

Description


Hi, thank you for your great work! I'd like to reproduce the full-parameter fine-tuning setup for DPO training. However, I only have 10 × NVIDIA A40 GPUs (about 46 GB of memory each).

I tried the command

CUDA_VISIBLE_DEVICES=2,3,4,5,6,7,8,9 ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --main_process_port 6000 scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml

and it reported an OOM error, even when I set the batch size to 1.
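
For reference, this is the kind of CPU-offload variant of recipes/accelerate_configs/deepspeed_zero3.yaml I was thinking of trying, so that ZeRO-3 pushes the optimizer state and parameters to host memory. This is only a sketch and assumes the stock file follows Accelerate's standard DeepSpeed config schema; the offload_* keys and num_processes are the parts I changed:

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  # Assumption: offloading optimizer state and parameters to CPU RAM
  # trades speed for GPU memory, which is the constraint on ~46 GB cards.
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: true
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8  # matches the 8 GPUs in CUDA_VISIBLE_DEVICES above
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false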

I don't mind if the program runs a bit slower (e.g., using a smaller batch size and more gradient accumulation steps). However, I don't know whether there is a way to successfully run the full-parameter DPO code on this hardware.
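
Concretely, these are the overrides I was planning to try in recipes/zephyr-7b-beta/dpo/config_full.yaml (assuming it exposes the standard transformers TrainingArguments field names; the exact keys and the recipe's original global batch size may differ):

# Memory-oriented overrides (sketch only); scale gradient_accumulation_steps up
# so the effective global batch size stays roughly the same as in the recipe.
per_device_train_batch_size: 1
per_device_eval_batch_size: 1
gradient_accumulation_steps: 16
gradient_checkpointing: true
bf16: true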

Can you help me, please?

Also, I'm wondering how large the performance gap is between LoRA and full-parameter fine-tuning.
