
performance benchmark #17

Closed
touhi99 opened this issue May 8, 2023 · 4 comments

Comments

@touhi99

touhi99 commented May 8, 2023

Hi,

is there a possibility to add performance benchmarks for the open-sourced LLMs?

@LudwigStumpp
Collaborator

Hi,

do you mean adding the performance results directly to the table?

For now, we have listed some external resources under the Evals on open LLMs section, which cover the performance of the models on various benchmarks. Do you think this is enough?

@touhi99
Author

touhi99 commented May 10, 2023

> Hi,
>
> do you mean adding the performance results directly to the table?
>
> For now, we have listed some external resources under the Evals on open LLMs section, which cover the performance of the models on various benchmarks. Do you think this is enough?

Yes, that would give a rough idea of which model to choose among the many. I will have a look at the Evals, thanks.

Besides performance, the GPU/hardware requirements would personally also be an interesting benchmark for estimating a solution. If I propose an LLM-based solution, what would the minimum hardware requirements be for training/fine-tuning/inference? So far many models are coming in and out, but I haven't found any reliable data. For example: model X needs at least Y GB of GPU RAM for inference.

@LudwigStumpp
Collaborator

LudwigStumpp commented May 10, 2023

@touhi99 Great points you're raising, thanks for that!

Here are some remarks from my side, to keep the discussion going and to find a suitable spot for the information you requested:

Adding evals results right inside the table

  • there are many different benchmarks, which would require us to add many additional columns to the table
  • furthermore, one row inside the table currently covers all available variations of a model, and each variation performs differently on the eval benchmarks. Adding evals to the table would therefore require us to split these rows apart. I currently think this is not a direction we want to go, but I will keep it in the back of my head and discuss it with @eugeneyan and @Muhtasham
  • for identifying well-performing models, I created the LLM-Leaderboard, which covers both open and closed models

GPU memory requirements

  • Roughly speaking, the memory requirements to load a model depend on two things:
    • the number of parameters
    • the precision used for these parameters (float32, float16, bfloat16, int8)
  • while the number of parameters is fixed, the precision you use to load the model generally is not
  • you can easily calculate the memory requirements yourself (see the sketch below). For simplicity, assume 1 Giga (G) ~= 1 Billion (B). Example:
    • 7B model
    • multiply by the bytes per parameter:
      • ×1 (e.g. int8) = 7 GB
      • ×2 (e.g. float16) = 14 GB
      • ×4 (e.g. float32) = 28 GB
    • and you get a rough estimate of the GPU memory requirements
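To make the rule of thumb concrete, here is a minimal Python sketch of the calculation above. The function name and the bytes-per-parameter table are my own, chosen just for illustration:

```python
# Rough bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"int8": 1, "float16": 2, "bfloat16": 2, "float32": 4}

def inference_memory_gb(params_billion: float, precision: str) -> float:
    """Rough GPU memory (GB) needed just to load the model weights.

    Uses the simplification 1 Giga ~= 1 Billion from above.
    """
    return params_billion * BYTES_PER_PARAM[precision]

# Example: a 7B model at different precisions
for precision in ("int8", "float16", "float32"):
    print(f"{precision}: {inference_memory_gb(7, precision):.0f} GB")
# int8: 7 GB, float16: 14 GB, float32: 28 GB
```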

EDIT: This is a little naive, as one also needs to account for:

  • gradients for backprop (~ assume the same size as the model params)
  • first- and second-order momentum terms of the Adam optimizer (~ assume 2 times the size of the model params)
  • feature maps in the forward pass (depends on the architecture, ignore for now)
  • batch size (affects gradient and feature-map storage, but ignore for now)

Taking the above into account, we can get a very naive estimate for fine-tuning with:
MODEL_SIZE [Billion] * PRECISION [Bytes] * 4 (model weights + gradients + Adam moments)

So for our 7B model above:

  • float32: 7 * 4 * 4 = 112 GB
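Extending the sketch above (again, the names are mine, and the factor of 4 ignores activations and batch size, as noted):

```python
BYTES_PER_PARAM = {"int8": 1, "float16": 2, "bfloat16": 2, "float32": 4}

def finetune_memory_gb(params_billion: float, precision: str) -> float:
    """Very naive fine-tuning estimate, ignoring activations and batch size.

    Factor of 4 = weights (1x) + gradients (1x) + Adam moments (2x).
    """
    return params_billion * BYTES_PER_PARAM[precision] * 4

print(finetune_memory_gb(7, "float32"))  # 112.0 GB, matching the estimate above
```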

More on this topic:

@martinezpl

For anyone interested in this topic:

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
