
Optimizing efficiency using deepspeed #1939

Answered by tjruwase
base-y asked this question in Q&A


The ZeRO optimizations in DeepSpeed are most helpful when:

  1. The model is too large to train with data parallelism alone.
  2. Larger batch sizes can improve compute efficiency without hurting model performance.

ZeRO's memory savings are a trade-off against increased communication, and both the savings and the communication overhead grow with the ZeRO stage. That overhead can hurt the throughput of smaller models like t5-base, which don't gain much from the memory savings. In such cases, it is probably better to disable ZeRO by setting the stage to 0. You might find the Flops Profiler or Autotuner helpful for your investigation.
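
For concreteness, here is a minimal sketch of a DeepSpeed config that disables ZeRO (stage 0) and turns on the Flops Profiler. The batch size, fp16, and profiler values are illustrative assumptions, not a recommendation; check the DeepSpeed config documentation for the authoritative schema.

```python
# Minimal sketch of a DeepSpeed config (values below are placeholder assumptions).
ds_config = {
    "train_micro_batch_size_per_gpu": 8,   # placeholder batch size
    "fp16": {"enabled": True},             # illustrative; use whatever precision you train with

    # Stage 0 disables ZeRO partitioning; stages 1-3 trade memory savings
    # for increasing communication overhead.
    "zero_optimization": {"stage": 0},

    # Optional: profile FLOPs and per-module latency to compare stages.
    "flops_profiler": {
        "enabled": True,
        "profile_step": 5,   # which training step to profile (assumed value)
    },
}

# The dict can be written out as ds_config.json and passed to the launcher via
# --deepspeed_config, or (in recent DeepSpeed versions) passed directly, e.g.
# deepspeed.initialize(model=model, config=ds_config, ...).
```

To compare stages for a model like t5-base, you could run the same training script once with `"stage": 0` and once with `"stage": 2` and compare the profiler's throughput numbers.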
