How did you determine the size of the TinyLlama model? #33

Closed
dtxwhzw opened this issue Sep 19, 2023 · 2 comments


dtxwhzw commented Sep 19, 2023

Were there any trade-offs or considerations you made when deciding on the model's size? What criteria did you use to select the specific number of layers, attention heads, embedding size, etc. in your model?

jzhang38 (Owner) commented

I started with Pythia-1.4B / GPT-3 XL's model config.
[image: Pythia-1.4B / GPT-3 XL model configuration]

I reduced the head dim from 128 to 64. (I believe this is a safe step because: 1. Falcon does the same: https://huggingface.co/tiiuae/falcon-7b. 2. If you look at the GPT-3 family's shapes, GPT-3 2.7B used an even smaller head dim than GPT-3 XL.)
[image: table of GPT-3 family model shapes]
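
For concreteness, here is the head-dim arithmetic as a quick sketch (Pythia-1.4B's hidden size and head count are from its published config; the TinyLlama numbers follow from the comment above):

```python
# Hidden size stays fixed; halving the head dim doubles the head count.
hidden_size = 2048                                     # Pythia-1.4B / GPT-3 XL d_model

pythia_head_dim = 128
pythia_n_heads = hidden_size // pythia_head_dim        # 16 heads

tinyllama_head_dim = 64
tinyllama_n_heads = hidden_size // tinyllama_head_dim  # 32 heads

assert pythia_n_heads * pythia_head_dim == hidden_size
assert tinyllama_n_heads * tinyllama_head_dim == hidden_size
print(pythia_n_heads, tinyllama_n_heads)               # 16 32
```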

We use GQA and set the number of query groups to 4.
To fit the model with a 16K per-GPU batch size, we reduced the number of layers from 24 to 22. A larger batch size leads to higher MFU/throughput, and that matters because at the end of the day, your x-axis is the money you spent training the model and your y-axis is the model's performance/the impact you made.
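
Putting that together, a minimal sketch of the resulting shape using Hugging Face's `LlamaConfig` (the vocab and intermediate sizes are assumptions taken from the released TinyLlama-1.1B checkpoint, not stated above; the parameter count is a rough estimate that ignores norms and rotary embeddings):

```python
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=32000,        # assumed from the released TinyLlama-1.1B checkpoint
    hidden_size=2048,        # kept from Pythia-1.4B / GPT-3 XL
    intermediate_size=5632,  # assumed from the released checkpoint
    num_hidden_layers=22,    # reduced from 24 to fit the 16K per-GPU batch
    num_attention_heads=32,  # head dim = 2048 / 32 = 64
    num_key_value_heads=4,   # GQA with 4 query groups
)

# Rough parameter count: embeddings + untied LM head, plus per-layer
# attention (Q, K, V, O) and SwiGLU MLP (gate, up, down) weights.
d, f = config.hidden_size, config.intermediate_size
kv = config.num_key_value_heads * (d // config.num_attention_heads)  # total K/V width
attn = d * d + 2 * d * kv + d * d
mlp = 3 * d * f
total = 2 * config.vocab_size * d + config.num_hidden_layers * (attn + mlp)
print(f"~{total / 1e9:.2f}B parameters")  # ~1.10B
```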


leng-yue commented Jan 3, 2024

Thanks for the clarification. Is there any comparison between different numbers of query groups for 1B models, e.g., inference speed and accuracy?
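
No such ablation is given in the thread, but the inference-speed side is largely driven by the KV cache, which shrinks linearly with the number of query groups. A back-of-the-envelope sketch (fp16 cache, batch size 1, 2048-token context, TinyLlama-like shape; all numbers are assumed for illustration):

```python
# KV-cache footprint vs. number of query groups.
hidden_size, n_heads, n_layers, seq_len = 2048, 32, 22, 2048
head_dim, bytes_per_elem = hidden_size // n_heads, 2  # fp16

for n_kv_heads in (32, 8, 4, 1):  # MHA -> GQA -> MQA
    # 2x for keys and values, cached at every layer.
    cache = 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem
    print(f"{n_kv_heads:2d} query groups -> {cache / 2**20:4.0f} MiB KV cache")
```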
