Were there any trade-offs or considerations you made when deciding on the model's size? Or, what criteria did you use to select the specific number of layers, attention heads, embedding size, etc. in your model?
I reduced the head dim from 128 to 64. I believe this is a safe step because: 1. Falcon does the same (https://huggingface.co/tiiuae/falcon-7b). 2. If you look at the shapes within the GPT-3 family, GPT-3 2.7B used an even smaller head dim than GPT-3 XL.
We use GQA and set the number of query groups to 4.
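For readers less familiar with GQA, here is a minimal, hypothetical PyTorch sketch of attention with head_dim = 64 and 4 key/value groups, as described above. The hidden size and query-head count (2048 and 32) are illustrative assumptions, not values quoted in this thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQAttention(nn.Module):
    """Grouped-query attention sketch: many query heads share a few K/V heads."""
    def __init__(self, hidden_size=2048, n_q_heads=32, n_kv_heads=4, head_dim=64):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0
        self.n_q_heads, self.n_kv_heads, self.head_dim = n_q_heads, n_kv_heads, head_dim
        self.q_proj = nn.Linear(hidden_size, n_q_heads * head_dim, bias=False)
        self.k_proj = nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
        self.v_proj = nn.Linear(hidden_size, n_kv_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(n_q_heads * head_dim, hidden_size, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_q_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Expand K/V so each group of 32 / 4 = 8 query heads shares one K/V head.
        k = k.repeat_interleave(self.n_q_heads // self.n_kv_heads, dim=1)
        v = v.repeat_interleave(self.n_q_heads // self.n_kv_heads, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```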
To fit the model with a 16K per-GPU batch size, we reduced the number of layers from 24 to 22. A larger batch size leads to higher MFU/throughput, and at the end of the day your x-axis is the money you spent training the model and your y-axis is the model's performance/the impact you made.
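Putting those choices together, the resulting model shape looks roughly like the sketch below. The layer count, KV-group count, and head dim are the values stated above; the hidden size and query-head count are assumptions for illustration.

```python
# Rough decoder shape implied by the answers above.
# num_hidden_layers, num_key_value_heads, head_dim: stated in this thread.
# hidden_size, num_attention_heads: assumed for illustration (2048 / 64 = 32 heads).
model_shape = dict(
    num_hidden_layers=22,     # reduced from 24 to fit a 16K per-GPU batch size
    head_dim=64,              # reduced from 128
    num_key_value_heads=4,    # GQA with 4 query groups
    num_attention_heads=32,   # assumed
    hidden_size=2048,         # assumed
)
```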