Hi InternLM team, thank you for this open source contribution! InternLM looks like a really strong 7B model.
I think the research community would greatly benefit from learning about the training details of InternLM. Are you open to sharing the token budget and global batch size used for this model?
In the README I see this comment which suggests a token budget over 1T tokens:
It leverages trillions of high-quality tokens for training to establish a powerful knowledge base.
And in the training performance README I see that the max performance was achieved at 16k tokens per GPU. If this was used across 1024 GPUs for pretraining it would imply a global batch size of 16M tokens which is larger than I've seen before (especially for 7B models).
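The arithmetic behind that estimate can be sketched as follows (the 16k-tokens-per-GPU and 1024-GPU figures come from this thread; the cluster size is my assumption about the pretraining setup):

```python
# Back-of-the-envelope check of the implied global batch size.
tokens_per_gpu = 16 * 1024  # 16k tokens per GPU at peak throughput
num_gpus = 1024             # assumed pretraining cluster size

global_batch_tokens = tokens_per_gpu * num_gpus
print(f"{global_batch_tokens / 2**20:.0f}M tokens")  # -> 16M tokens
```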
Thank you again!
Hi, thank you for your interest in our project. As you mentioned, a global batch of 16M tokens is indeed quite large. We commonly use configurations such as 512 GPUs with a global batch size of 4M, or 1024 GPUs with a global batch size of 8M, for training. We have included information in our README regarding performance testing under certain global batch sizes.
As the number of GPUs increases, the batch size per GPU gradually decreases, so communication overhead accounts for a larger share of each step. This inevitably reduces TGS and TFLOPS. Additionally, enabling options such as pack_sample_into_one, bfloat16/float16, and reduce_bucket_size can cause fluctuations in TGS, and the network status of the cluster can also introduce minor disturbances. We are continuously working to reduce the communication overhead at small per-GPU batch sizes and will keep you updated on any new developments.
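The effect described above can be illustrated numerically. Assuming a fixed global batch (the 4M-token figure quoted earlier; the GPU counts are illustrative), the per-GPU batch halves each time the cluster doubles, which is why communication becomes a larger fraction of step time:

```python
# Per-GPU token count for a fixed global batch as the cluster scales out.
global_batch = 4 * 2**20  # fixed 4M-token global batch (example figure)

for gpus in (512, 1024, 2048):
    per_gpu = global_batch // gpus
    print(f"{gpus} GPUs -> {per_gpu} tokens/GPU")
# -> 512 GPUs -> 8192 tokens/GPU
# -> 1024 GPUs -> 4096 tokens/GPU
# -> 2048 GPUs -> 2048 tokens/GPU
```

Smaller per-GPU batches mean less compute per communication round, so the all-reduce cost is amortized over less work.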