Fine-tuning 176B BLOOM with LoRA #43

Open
drxmy opened this issue Dec 27, 2022 · 7 comments

drxmy commented Dec 27, 2022

The paper says that it only needs 350GB of VRAM to train 175B GPT-3 with rank = 4. Can you elaborate more on how this is done? For example, do you use Megatron-DeepSpeed?

In my experiment with BLOOM-3B, fine-tuning all parameters needs 29GB. After using LoRA with different experiment settings, the trainable parameters range from 10M down to 0.8M, but they all need around 20GB of VRAM. I find this a little bit weird.

@edwardjhu (Collaborator)

Hi! We had a proprietary setup. Are you using Adam and have you made sure to not pass the non-trainable parameters to the optimizer?
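For context: Adam-style optimizers keep extra state per parameter they manage, so the usual fix is to hand the optimizer only the LoRA weights. A minimal sketch of that check, with a toy stand-in model (all names and values here are illustrative, not from this thread):

```python
import torch
from torch import nn

# Toy stand-in for a LoRA-wrapped network: a frozen "base" layer and a
# trainable "adapter" layer.
model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16))
for p in model[0].parameters():
    p.requires_grad = False  # frozen base weights

# Hand AdamW only the trainable parameters. Adam keeps two moment
# buffers per parameter it updates, so letting frozen weights acquire
# gradients and optimizer state erases LoRA's memory savings.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

# Sanity check: how much is the optimizer actually training?
n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in model.parameters())
print(f"trainable: {n_trainable:,} / total: {n_total:,}")
```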

drxmy (Author) commented Jan 3, 2023

I used AdamW with the transformers Trainer class (Hugging Face). It printed a trainable parameter count, and the number was much smaller with LoRA.
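For reference, this is roughly how such a count is surfaced today by Hugging Face's peft library (a later convenience, not necessarily what was used here; the model name and LoRA settings below are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder configuration for illustration only.
base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")
lora_config = LoraConfig(
    r=4,
    target_modules=["query_key_value"],  # BLOOM's fused QKV projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)

# Prints trainable vs. total parameter counts; with LoRA the trainable
# share is typically a fraction of a percent.
model.print_trainable_parameters()
```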

aegisgpt commented Mar 7, 2023

The paper says that it only needs 350GB of VRAM to train 175B GPT-3 with rank = 4. Can you elaborate more on how this is done? For example, do you use Megatron-DeepSpeed?

In my experiment with BLOOM-3B, fine-tuning all parameters needs 29GB. After using LoRA with different experiment settings, the trainable parameters range from 10M down to 0.8M, but they all need around 20GB of VRAM. I find this a little bit weird.

Hello, can I check with you how to use LoRA to fine-tune BLOOM-3B? I ran into the issue of BLOOM-3B having no v_proj and q_proj in the base model. Thanks a lot!

zsc commented Mar 20, 2023

@aegisgpt

having no v_proj and q_proj in the base model

Per https://huggingface.co/smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM/blob/main/adapter_config.json , you need to change the target modules to query_key_value for BLOOM models. Let me know if that solves your problem.
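For anyone hitting the same error, one way to find valid target-module names is to list the model's linear layers (a hedged sketch; the model name is just an example):

```python
from torch import nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-3b")

# Print every linear layer's name; these are the candidates for LoRA's
# target_modules. BLOOM exposes one fused "query_key_value" projection
# per block instead of the separate q_proj/v_proj layers that
# LLaMA-style checkpoints have.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        print(name)
```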

@aegisgpt

Hey @zsc, many thanks! I tried it and it worked! Do you mind sharing where I can find more detailed documentation for LoRA online, especially with regard to configurations for the various types of GPT models?

zsc commented Mar 21, 2023

This may be useful: https://github.com/huggingface/peft/blob/main/src/peft/mapping.py
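That file keeps a per-architecture table of default LoRA target modules; an abridged sketch of its shape (entries paraphrased from the peft source, not the complete mapping):

```python
# Abridged; see peft's mapping.py for the authoritative table.
TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING = {
    "bloom": ["query_key_value"],
    "gpt2": ["c_attn"],
    "gpt_neox": ["query_key_value"],
    "llama": ["q_proj", "v_proj"],
    "opt": ["q_proj", "v_proj"],
}
```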

@aegisgpt

Thank you! That helps!
