
Question about the calculation of train_batch_size #3876

Closed Answered by ShadenSmith
formath asked this question in Q&A

Hi @formath, great question 😸. In both cases the last term represents the degree of data parallelism used in training (since each data-parallel replica has its own data pipeline contributing to the train_batch_size).

With pure data parallelism, that's simply the number of GPUs. If we are also training with pipeline (model) parallelism, the resulting degree of data parallelism is the number of GPUs divided by the number of pipeline stages.
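The arithmetic above can be sketched in a few lines. This is an illustrative sketch, not the DeepSpeed implementation; the function and parameter names below are assumptions chosen for clarity.

```python
# Sketch of the train_batch_size arithmetic described above.
# Names (data_parallel_degree, train_batch_size, etc.) are illustrative
# assumptions, not part of the DeepSpeed API.

def data_parallel_degree(num_gpus: int, pipeline_stages: int = 1) -> int:
    """Data-parallel replicas: GPUs divided evenly across pipeline stages."""
    assert num_gpus % pipeline_stages == 0, "GPUs must divide evenly into stages"
    return num_gpus // pipeline_stages

def train_batch_size(micro_batch_per_gpu: int,
                     grad_accum_steps: int,
                     num_gpus: int,
                     pipeline_stages: int = 1) -> int:
    """Global batch = micro-batch * accumulation steps * data-parallel degree."""
    dp = data_parallel_degree(num_gpus, pipeline_stages)
    return micro_batch_per_gpu * grad_accum_steps * dp

# Pure data parallelism on 8 GPUs: dp degree is 8.
print(train_batch_size(4, 2, num_gpus=8))                     # 4 * 2 * 8 = 64

# Adding 4 pipeline stages: dp degree drops to 8 / 4 = 2.
print(train_batch_size(4, 2, num_gpus=8, pipeline_stages=4))  # 4 * 2 * 2 = 16
```

Note how adding pipeline stages shrinks the effective global batch for the same per-GPU settings, since fewer independent data pipelines are feeding the model.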

Answer selected by formath