
How to run QLoRA training with ZeRO-3 on two or more GPUs? #42

Open
Di-Zayn opened this issue Nov 20, 2023 · 4 comments



Di-Zayn commented Nov 20, 2023

I added a 4-bit load to the command from "LoRA training with ZeRO-3 on two or more GPUs" to combine QLoRA with ZeRO-3, but the program fails with the following error:
RuntimeError: expected there to be only one unique element in <generator object Init._convert_to_deepspeed_param.<locals>.all_gather_coalesced.<locals>.<genexpr> at 0x7f2ec8daf900>
The command is:
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --num_processes=2 scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_lora.yaml --load_in_4bit=true
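
(For context, a minimal sketch of what --load_in_4bit=true implies on the transformers side; the exact kwargs used by scripts/run_sft.py may differ, and the model name is just the recipe's base model:)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch of what --load_in_4bit=true roughly maps to; the exact kwargs
# used by scripts/run_sft.py may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 quantization, as in QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",         # zephyr-7b-beta's base model
    quantization_config=bnb_config,
)
```

With zero3_init_flag enabled, from_pretrained runs under deepspeed.zero.Init, which shards parameters as they are created; bitsandbytes' 4-bit parameters don't go through that path cleanly, which is consistent with the all_gather_coalesced error above.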

@alvarobartt (Member) commented

Hi @Di-Zayn, note that you will also need to modify the DeepSpeed ZeRO-3 configuration: the one shared in the repo is suited for a VM with 8 x A100 80GB GPUs, so you may need to add the flags required to load and train in lower precision.

Anyway, I'm not sure how to fine-tune with NF4 in that setup, but maybe https://www.deepspeed.ai/tutorials/MoQ-tutorial/#deepspeed-configuration-file is worth checking?
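
(For reference, the kind of adjustment being described: a sketch of an accelerate + ZeRO-3 config trimmed down for 2 GPUs with bf16; the field values are illustrative, and the authoritative file is recipes/accelerate_configs/deepspeed_zero3.yaml in the repo:)

```yaml
# Illustrative accelerate config for DeepSpeed ZeRO-3 on 2 GPUs; compare
# against recipes/accelerate_configs/deepspeed_zero3.yaml in the repo.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  zero_stage: 3
  zero3_init_flag: true          # shard parameters at model-load time
  zero3_save_16bit_model: true
  offload_optimizer_device: none
  offload_param_device: none
mixed_precision: bf16            # train in lower precision
num_machines: 1
num_processes: 2                 # one process per GPU
```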


laphang commented May 9, 2024

I'm getting this issue as well (trying QLoRA with ZeRO-3 on 4 GPUs, same error message). @Di-Zayn, were you able to solve it?

@Serega6678 (Contributor) commented

I had similar problems, so I switched to the multi_gpu config, set num_processes to use just 2 GPUs, and everything worked fine: https://github.com/huggingface/alignment-handbook/blob/main/recipes/accelerate_configs/multi_gpu.yaml
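
(The launch then looks roughly like the original command with the config file swapped, e.g.:)

```sh
ACCELERATE_LOG_LEVEL=info accelerate launch \
  --config_file recipes/accelerate_configs/multi_gpu.yaml \
  --num_processes=2 \
  scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_lora.yaml \
  --load_in_4bit=true
```

With the multi_gpu config (plain DDP), each GPU keeps a full copy of the quantized model, so there is no ZeRO-3 parameter sharding to conflict with bitsandbytes.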

However, with the ZeRO config the starting loss was around 1.7, versus 1.4 with the multi_gpu config, whether using 1 or 2 GPUs.

I never experimented further with ZeRO, since I got the results I needed with the multi_gpu config.


laphang commented May 17, 2024

I was keen on sharding the model across GPUs to be able to train larger models.

As an aside, the latest FSDP and QLoRA examples are working for me, and that covers my use case: commit 606d2e9
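
(For anyone else, a sketch of the general shape of an accelerate FSDP config for QLoRA; the keys below are standard accelerate FSDP options, but check the file added in commit 606d2e9 for the exact values the handbook uses:)

```yaml
# Illustrative accelerate FSDP config for QLoRA; the file added in commit
# 606d2e9 is the authoritative version.
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true   # helps when loading quantized weights
  fsdp_sharding_strategy: FULL_SHARD     # shard params, grads, optimizer state
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_use_orig_params: false
mixed_precision: bf16
num_machines: 1
num_processes: 2
```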
