What is the recommended GPU setup for fine-tuning? #23

I run into an OOM error with the default setup on 8x A100 using the train.sh script. Could you please share the GPU requirements for fine-tuning?

Comments
We were able to train the 7b 64k model on an 8x A100 node -- all other models unfortunately require a multi-node setup. We used 64 GPUs, but I expect 16 would suffice for the other models (7b 128k, 13b 64k, 13b 128k).
Thanks a lot. To confirm, are those A100s 40GB or 80GB for the 7b 64k fine-tuning?
It is 8x 80GB for the 64k context size.
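For reference, a quick sanity check before launching can confirm the node actually exposes that much GPU memory. This is a minimal sketch, not part of the repo; it only assumes PyTorch with CUDA available:

```python
# Minimal sketch (not from the repo): verify the node exposes the expected
# number of GPUs and roughly how much memory each one has before training.
import torch

def check_gpus(required_count: int = 8) -> None:
    count = torch.cuda.device_count()
    print(f"visible GPUs: {count}")
    for i in range(count):
        props = torch.cuda.get_device_properties(i)
        gib = props.total_memory / 1024**3
        print(f"  cuda:{i} {props.name} {gib:.0f} GiB")
    if count < required_count:
        raise RuntimeError(f"need {required_count} GPUs, found {count}")

if __name__ == "__main__":
    check_gpus()
```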
Could you please clarify whether this discussion is about full-parameter tuning or LoRA-based tuning? @bloc97
I ran finetune.py on 2x A100 GPUs, and both GPUs initially loaded up to 14GB/80GB. After processing the first batch, memory usage went up to 77GB/80GB, and then it ran OOM when starting the second batch.
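To narrow down where the jump from ~14GB to ~77GB happens (for example, optimizer state being allocated on the first step), logging the peak allocated memory around each step can help. This is a hedged sketch; `log_peak_memory` is a hypothetical helper you would call from the training loop, not something provided by finetune.py:

```python
# Hypothetical helper (not part of finetune.py): print and reset the peak
# allocated GPU memory after each training step to see where usage spikes.
import torch

def log_peak_memory(step: int) -> None:
    for i in range(torch.cuda.device_count()):
        peak_gib = torch.cuda.max_memory_allocated(i) / 1024**3
        print(f"step {step} cuda:{i} peak allocated: {peak_gib:.1f} GiB")
        torch.cuda.reset_peak_memory_stats(i)
```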
Can this configuration train with a total batch size of 64 (batch_size=1, num_processes=8, gradient_accumulate_every=8)? @bloc97
Yes, and if you enable more modern attention-partitioning schemes such as RingAttention you can train with even longer context.
ok, thank you!
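For reference, the effective batch size in that configuration is batch_size × num_processes × gradient_accumulate_every = 1 × 8 × 8 = 64. Below is a minimal gradient-accumulation sketch of that idea; the names `model`, `optimizer`, and `data_loader` are assumptions for illustration, not the repo's actual finetune.py:

```python
# Minimal gradient-accumulation sketch (illustrative only; `model`,
# `optimizer`, and `data_loader` are assumed names, not from finetune.py).
import torch

batch_size = 1                  # per-GPU micro-batch
num_processes = 8               # data-parallel workers (one per GPU)
gradient_accumulate_every = 8   # micro-batches accumulated per optimizer step

effective_batch = batch_size * num_processes * gradient_accumulate_every
assert effective_batch == 64

def train_epoch(model, optimizer, data_loader):
    optimizer.zero_grad()
    for i, (inputs, labels) in enumerate(data_loader):
        # Scale the loss so gradients average over the accumulated micro-batches.
        loss = model(inputs, labels) / gradient_accumulate_every
        loss.backward()
        if (i + 1) % gradient_accumulate_every == 0:
            optimizer.step()
            optimizer.zero_grad()
```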