A few questions on training #21
I am currently doing a distributed training run and will be open-sourcing all of the weights for a PaLM-rlhf-pytorch model. I will open a PR when training finishes. To answer these questions:
I will update as training progresses.
Best,
Enrico
Hi Enrico, Thanks.
TPUs are already pretty efficient at scale. You can benchmark Flash Attention vs JAX if you are interested in exact speed comparisons.
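For context, a minimal micro-benchmark sketch in PyTorch of a fused (flash-style) attention kernel against a naive softmax(QKᵀ)V baseline. The shapes, iteration count, and the naive baseline are illustrative assumptions, not measurements from this thread; a JAX comparison would follow the same pattern, using `jax.block_until_ready()` in place of `torch.cuda.synchronize()`.

```python
# Sketch: time PyTorch's fused scaled_dot_product_attention against a naive
# implementation. Shapes are placeholders; the fused (flash) kernel is only
# selected on CUDA with fp16/bf16 inputs.
import time
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
b, h, n, d = 4, 16, 2048, 64  # batch, heads, sequence length, head dim

q, k, v = (torch.randn(b, h, n, d, device=device, dtype=dtype) for _ in range(3))

def naive_attention(q, k, v):
    # materializes the full (n x n) attention matrix
    scores = (q @ k.transpose(-2, -1)) / (d ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

def bench(fn, iters=20):
    for _ in range(3):          # warmup
        fn(q, k, v)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(q, k, v)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print(f"fused SDPA : {bench(F.scaled_dot_product_attention) * 1e3:.2f} ms")
print(f"naive      : {bench(naive_attention) * 1e3:.2f} ms")
```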
Hi, I've been planning to train this model. I have a TPU pod (v3-128) through TRC, which should equate to roughly 5 TB of RAM and 2 TB of HBM, and I had a few questions about how to begin training the model.
Thanks for all of your implementations, they have been really helpful to learn from.
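For anyone landing here with the same "how do I begin training" question, a minimal sketch of a first pretraining step with this repo's PaLM class, roughly following the README. The constructor arguments (`num_tokens`, `dim`, `depth`), the optimizer, and the toy batch are assumptions that may differ between versions; running on a TPU pod would additionally require torch_xla or a JAX port, which is not shown here.

```python
import torch
from palm_rlhf_pytorch import PaLM

# Placeholder hyperparameters, not a recommended configuration.
palm = PaLM(
    num_tokens = 20000,  # vocabulary size
    dim = 512,
    depth = 12
)

optimizer = torch.optim.Adam(palm.parameters(), lr = 3e-4)

# Toy batch of random token ids; real pretraining would stream tokenized text.
seq = torch.randint(0, 20000, (4, 1024))

loss = palm(seq, return_loss = True)  # next-token cross-entropy loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```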