Fine-tuning 176B BLOOM with LoRA #43
Comments
Hi! We had a proprietary setup. Are you using Adam and have you made sure to not pass the non-trainable parameters to the optimizer?
I used AdamW with the Trainer class from Hugging Face transformers. It printed the trainable parameter count, and the number was much smaller with LoRA.
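For reference, the trainable parameter count mentioned above can be checked by hand. This is a minimal sketch, assuming the standard LoRA factorization in which each adapted weight of shape `d_out x d_in` gains two matrices, `A` (`rank x d_in`) and `B` (`d_out x rank`). The function name and the example shape (30 layers, one fused 2560 -> 7680 projection per layer) are assumptions for illustration, not values taken from this thread.

```python
def lora_param_count(d_in, d_out, rank, num_adapted_matrices=1):
    """Extra trainable parameters introduced by LoRA.

    Each adapted weight gets A (rank x d_in) and B (d_out x rank),
    so it contributes rank * (d_in + d_out) parameters.
    """
    return num_adapted_matrices * rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical model shape (an assumption, not read from any checkpoint):
    # 30 transformer layers, one adapted 2560 -> 7680 projection per layer.
    for rank in (1, 4, 8, 32):
        count = lora_param_count(2560, 7680, rank, num_adapted_matrices=30)
        print(f"rank={rank:>2}: {count:,} trainable LoRA parameters")
```

The count grows linearly with the rank and with the number of adapted matrices, so it stays small compared with the size of the base model.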
Hello, may I ask how to use LoRA to fine-tune BLOOM-3B? I ran into the problem that BLOOM-3B has no v_proj and q_proj modules in the base model. Thanks a lot!
According to https://huggingface.co/smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM/blob/main/adapter_config.json , you need to change to
Hey @zsc, many thanks! I tried it and it worked! Do you mind sharing where I can find more detailed documentation for LoRA online, especially with regard to configurations for the various types of GPT models?
This may be useful: https://github.com/huggingface/peft/blob/main/src/peft/mapping.py
Thank you! That helps!
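To illustrate the kind of lookup the linked mapping file provides, here is a minimal sketch of a per-architecture table of LoRA target modules. The dictionary and the helper `lora_target_modules` are hypothetical names, and the entries are recalled from PEFT's defaults rather than copied from the file, so treat them as assumptions and check them against the link above. It is not a reconstruction of the truncated comment earlier in the thread.

```python
# Illustrative excerpt only. Module names are assumptions recalled from
# PEFT's default mapping; verify them against the linked file before use.
LORA_TARGET_MODULES = {
    "bloom": ["query_key_value"],     # fused Q/K/V projection
    "gpt_neox": ["query_key_value"],  # fused Q/K/V projection
    "gpt2": ["c_attn"],               # fused attention projection
    "opt": ["q_proj", "v_proj"],      # separate query and value projections
    "llama": ["q_proj", "v_proj"],
}


def lora_target_modules(model_type):
    """Return the assumed default LoRA target modules for a model type."""
    try:
        return LORA_TARGET_MODULES[model_type]
    except KeyError:
        raise ValueError(
            f"No default target modules known for {model_type!r}; "
            "inspect the model's module names and set them explicitly."
        ) from None


if __name__ == "__main__":
    print(lora_target_modules("bloom"))
```

Architectures with a fused attention projection expose a single module name, which would explain why `q_proj` and `v_proj` are not found in such a base model.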
The paper says that it needs only 350 GB of VRAM to train the 175B GPT-3 with rank = 4. Can you elaborate on how this is done? For example, do you use Megatron-DeepSpeed?
In my experiments with bloom-3b, fine-tuning all parameters needs 29 GB. With LoRA under different experimental settings, the number of trainable parameters ranges from 10M down to 0.8M, but every setting still needs around 20 GB of VRAM. I find this a little weird.
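A rough back-of-the-envelope estimate may help frame both numbers. This is a minimal sketch, assuming 16-bit weights and gradients (2 bytes each) and two 32-bit Adam moment buffers (8 bytes) per trainable parameter. The function name, the byte sizes, and the trainable parameter count used in the example are assumptions, and activations, master weights, and framework overhead are deliberately left out.

```python
GB = 10**9  # decimal gigabytes


def estimate_training_memory_gb(total_params, trainable_params,
                                weight_bytes=2, grad_bytes=2,
                                adam_state_bytes=8):
    """Estimate memory for weights, gradients, and Adam states only.

    Activations are NOT included. They depend on batch size and sequence
    length and are largely unaffected by LoRA.
    """
    weights = total_params * weight_bytes
    gradients = trainable_params * grad_bytes
    optimizer = trainable_params * adam_state_bytes
    return {
        "weights_gb": weights / GB,
        "gradients_gb": gradients / GB,
        "optimizer_gb": optimizer / GB,
        "total_gb": (weights + gradients + optimizer) / GB,
    }


if __name__ == "__main__":
    # 175B frozen parameters; roughly 19M trainable LoRA parameters is an
    # assumed figure for a small rank, used here only for illustration.
    estimate = estimate_training_memory_gb(175_000_000_000, 18_874_368)
    for name, value in estimate.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the frozen 16-bit weights of a 175B model alone occupy 350 GB, and the LoRA gradients and optimizer states add well under 1 GB. The same reasoning suggests that for a 3B model the part LoRA removes is only the gradient and optimizer memory, while weights and activations remain, which may be why usage stays near 20 GB regardless of the number of trainable parameters.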