
How to run QLoRA training with ZeRO-3 on two or more GPUs? #42

Open
Di-Zayn opened this issue Nov 20, 2023 · 4 comments



Di-Zayn commented Nov 20, 2023

I added a 4-bit load to the command from "LoRA training with ZeRO-3 on two or more GPUs" to combine QLoRA with ZeRO-3, but the program fails with the following error:
RuntimeError: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param.<locals>.all_gather_coalesced.<locals>.<genexpr> at 0x7f2ec8daf900>
The command is:
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --num_processes=2 scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_lora.yaml --load_in_4bit=true
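
(For context, a minimal sketch of what --load_in_4bit=true implies on the transformers side; the exact kwargs used by scripts/run_sft.py may differ, and the model name is just the recipe's base model:)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch of what --load_in_4bit=true roughly maps to; the exact kwargs
# used by scripts/run_sft.py may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 quantization, as in QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",         # zephyr-7b-beta's base model
    quantization_config=bnb_config,
)
```

With zero3_init_flag enabled, from_pretrained runs under deepspeed.zero.Init, which shards parameters as they are created; bitsandbytes' 4-bit parameters don't go through that path cleanly, which is consistent with the all_gather_coalesced error above.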

@alvarobartt (Member) commented

Hi @Di-Zayn, note that you will also need to modify the DeepSpeed ZeRO-3 configuration: the one shared in the repo is suited for a VM with 8 x A100 80GB GPUs, so you may need to add the flags required to load and train in lower precision.

Anyway, I'm not sure how to fine-tune with NF4 in that setup, but maybe https://www.deepspeed.ai/tutorials/MoQ-tutorial/#deepspeed-configuration-file is worth checking?
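
(For reference, the kind of adjustment being described: a sketch of an accelerate + ZeRO-3 config trimmed down for 2 GPUs with bf16; the field values are illustrative, and the authoritative file is recipes/accelerate_configs/deepspeed_zero3.yaml in the repo:)

```yaml
# Illustrative accelerate config for DeepSpeed ZeRO-3 on 2 GPUs; compare
# against recipes/accelerate_configs/deepspeed_zero3.yaml in the repo.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3
  zero3_init_flag: true          # shard parameters at model-load time
  zero3_save_16bit_model: true
  offload_optimizer_device: none
  offload_param_device: none
mixed_precision: bf16            # train in lower precision
num_machines: 1
num_processes: 2                 # one process per GPU
```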


laphang commented May 9, 2024

I'm getting this issue as well (trying QLoRA with ZeRO-3 on 4 GPUs, same error message). @Di-Zayn, were you able to solve it?

@Serega6678 (Contributor) commented

I had similar problems, so I switched to the multi_gpu config, set num_processes to use just 2 GPUs, and everything worked fine: https://github.com/huggingface/alignment-handbook/blob/main/recipes/accelerate_configs/multi_gpu.yaml
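
(The launch then looks roughly like the original command with the config file swapped, e.g.:)

```sh
ACCELERATE_LOG_LEVEL=info accelerate launch \
  --config_file recipes/accelerate_configs/multi_gpu.yaml \
  --num_processes=2 \
  scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_lora.yaml \
  --load_in_4bit=true
```

With the multi_gpu config (plain DDP), each GPU keeps a full copy of the quantized model, so there is no ZeRO-3 parameter sharding to conflict with bitsandbytes.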

However, with the ZeRO config the starting loss was around 1.7, versus 1.4 with the multi_gpu config, whether using 1 or 2 GPUs.

I never experimented further with ZeRO, since I got the results I needed with the multi_gpu config.


laphang commented May 17, 2024

I was keen on sharding the model across GPUs to be able to train larger models.

As an aside, the latest FSDP and QLoRA examples are working for me, and that covers my use case: commit 606d2e9
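
(For anyone else, a sketch of the general shape of an accelerate FSDP config for QLoRA; the keys below are standard accelerate FSDP options, but check the file added in commit 606d2e9 for the exact values the handbook uses:)

```yaml
# Illustrative accelerate FSDP config for QLoRA; the file added in commit
# 606d2e9 is the authoritative version.
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true   # helps when loading quantized weights
  fsdp_sharding_strategy: FULL_SHARD     # shard params, grads, optimizer state
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_use_orig_params: false
mixed_precision: bf16
num_machines: 1
num_processes: 2
```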
