Fine-tuning 176B BLOOM with LoRA #43

drxmy · 2022-12-27T05:55:15Z

The paper says that only 350 GB of VRAM is needed to train the 175B GPT-3 with rank = 4. Can you elaborate on how this is done? For example, do you use Megatron-DeepSpeed?

In my experiment with bloom-3b, fine-tuning all parameters needs 29 GB. With LoRA under different experiment settings, the trainable parameter count ranges from 10M down to 0.8M, but every setting still needs around 20 GB of VRAM. I find this a little weird.

edwardjhu · 2022-12-27T11:48:12Z

Hi! We had a proprietary setup. Are you using Adam and have you made sure to not pass the non-trainable parameters to the optimizer?
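
As a rough illustration of why the optimizer question matters, the sketch below estimates only the parameter-related memory: stored weights, gradients, and Adam state. The byte sizes (2-byte weights and gradients, two 4-byte Adam moments per trainable parameter) and the round 3e9 parameter count for bloom-3b are assumptions, and the function name `param_memory_gb` is hypothetical. Activations, buffers, and framework overhead are left out, so the results are not expected to match the measurements reported in this thread.

```python
def param_memory_gb(total_params, trainable_params,
                    weight_bytes=2, grad_bytes=2, adam_state_bytes=8):
    """Rough memory (decimal GB) for weights, gradients, and Adam state.

    Assumed sizes: 2-byte weights and gradients, and two 4-byte Adam
    moments (8 bytes) per trainable parameter. Activations are ignored.
    """
    weights = total_params * weight_bytes        # every parameter is stored
    grads = trainable_params * grad_bytes        # only trainable parameters get gradients
    adam = trainable_params * adam_state_bytes   # optimizer state for trainable parameters only
    return (weights + grads + adam) / 1e9


BLOOM_3B = 3e9  # assumed round parameter count

full_ft = param_memory_gb(BLOOM_3B, BLOOM_3B)   # all parameters trainable
lora_10m = param_memory_gb(BLOOM_3B, 10e6)      # 10M trainable parameters
lora_08m = param_memory_gb(BLOOM_3B, 0.8e6)     # 0.8M trainable parameters
frozen_175b = param_memory_gb(175e9, 0)         # 2-byte weights of a 175B model alone

print(f"full fine-tuning: {full_ft:.2f} GB")
print(f"LoRA, 10M:        {lora_10m:.2f} GB")
print(f"LoRA, 0.8M:       {lora_08m:.2f} GB")
print(f"175B weights:     {frozen_175b:.2f} GB")
```

Under these assumptions the two LoRA settings differ by well under 0.1 GB, because the frozen weights dominate the estimate. That is consistent with memory use barely changing between 10M and 0.8M trainable parameters, and whatever remains of the measured total would then come from activations and other overhead. The 2-byte weights of a 175B model come to 350 GB in this estimate, which happens to match the figure quoted from the paper, although the thread does not confirm that this is how the figure was derived.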

drxmy · 2023-01-03T07:10:50Z

I used AdamW with the Hugging Face transformers Trainer class. It printed a trainable parameter count, and the number was much smaller with LoRA.

aegisgpt · 2023-03-07T09:11:23Z

Hello, could you tell me how you used LoRA to fine-tune Bloom-3B? I ran into the problem that Bloom-3B has no v_proj or q_proj in the base model. Thanks a lot!

zsc · 2023-03-20T04:03:18Z

@aegisgpt

having no v_proj and q_proj in the base model

According to https://huggingface.co/smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM/blob/main/adapter_config.json , you need to target query_key_value instead for BLOOM models. Let me know if that solves your problem.
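
One way to check which layer names a model exposes before choosing LoRA targets is to look at the last component of each module name. The sketch below does this on a hand-written list of names in the style of Hugging Face BLOOM checkpoints. The list is illustrative and not read from a real model, and the helper `leaf_module_names` is hypothetical. With a loaded PyTorch model, one could pass `(name for name, _ in model.named_modules())` instead.

```python
from collections import Counter


def leaf_module_names(module_names):
    """Count the last dotted component of each non-empty module name."""
    return Counter(name.rsplit(".", 1)[-1] for name in module_names if name)


# Illustrative names only (assumed BLOOM-style layout, two transformer blocks).
sample_names = [
    "transformer.h.0.self_attention.query_key_value",
    "transformer.h.0.self_attention.dense",
    "transformer.h.0.mlp.dense_h_to_4h",
    "transformer.h.0.mlp.dense_4h_to_h",
    "transformer.h.1.self_attention.query_key_value",
    "transformer.h.1.self_attention.dense",
    "transformer.h.1.mlp.dense_h_to_4h",
    "transformer.h.1.mlp.dense_4h_to_h",
]

counts = leaf_module_names(sample_names)

# Report whether each candidate target name occurs among the module names.
for wanted in ("q_proj", "v_proj", "query_key_value"):
    status = "present" if wanted in counts else "missing"
    print(f"{wanted}: {status}")
```

On a real model the iterator also yields container modules, so the counts will include entries such as block indices. Only the names of the attention projection layers matter when picking targets.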

aegisgpt · 2023-03-21T08:07:26Z

Hey @zsc, many thanks! I tried it and it worked! Do you mind sharing where I can find more detailed documentation for LoRA online, especially with regard to configurations for the various types of GPT models?

zsc · 2023-03-21T09:34:49Z

This may be useful: https://github.com/huggingface/peft/blob/main/src/peft/mapping.py

aegisgpt · 2023-03-21T11:49:21Z

Thank you! That helps!
