Benchmark Performance for Baseline vs Pipeline-1 #11

vibhatha · 2020-02-26T00:04:10Z

With the speed benchmarks, the pipeline-1 benchmark time is higher than that of baseline benchmarks. Is there a clear reason why there is a significant overhead with pipeline-1 with respect to baseline experiments?

What I understood from the script was the baseline runs in one GPU core. Is this right?
And pipeline also runs in one GPU core? Is this right?

sublee · 2020-02-26T08:15:19Z

Both pipeline-1 and baseline benchmarks run on a single GPU. Unlike baseline, pipeline-1 includes checkpointing which has an overhead. This overhead is worth since there is actual pipeline parallelism, but pipeline-1 does not perform any parallelism.

vibhatha · 2020-02-28T16:02:08Z

About checkpointing, if I understood right, the overhead comes with re-running the forward at the end of a micro-batch. Correct me if I am wrong.

For checkpointing, there are a couple of modes,

when I try to use 'never' it runs out of memory.

I tried 'except_last', it works fine and in the 'always' option, the performance is not as much as 'except_last'.

So what I used in running pipeline-1 is the except_last, so I get minimum overhead in re-running the forward? Am I right?

And also, if I do use 'always' option, it re-runs for each micro-batch?
And if I use 'except_last' it runs for the last micro-batch?

In addition, what is the difference between 'never' and 'except_last'?

sublee · 2020-03-02T06:35:34Z

There seems misunderstanding in 'except_last'. When you choose 'except_last', torchgpipe reruns every micro-batch but not the last one. For example, let's assume that we use 8 micro-batches. Then the each option involves rerunning n micro-batches:

'never': 0
'always': 8
'except_last': 7

vibhatha · 2020-03-02T12:54:14Z

I understand. I will test this for smaller batch sizes. Thanks for the clarification of this point.

sublee · 2020-03-02T13:18:15Z

You are welcome. I close this issue.

sublee self-assigned this Feb 26, 2020

sublee added the question Further information is requested label Feb 26, 2020

sublee closed this as completed Mar 2, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Benchmark Performance for Baseline vs Pipeline-1 #11

Benchmark Performance for Baseline vs Pipeline-1 #11

vibhatha commented Feb 26, 2020

sublee commented Feb 26, 2020

vibhatha commented Feb 28, 2020 •

edited

Loading

sublee commented Mar 2, 2020

vibhatha commented Mar 2, 2020 •

edited

Loading

sublee commented Mar 2, 2020

Benchmark Performance for Baseline vs Pipeline-1 #11

Benchmark Performance for Baseline vs Pipeline-1 #11

Comments

vibhatha commented Feb 26, 2020

sublee commented Feb 26, 2020

vibhatha commented Feb 28, 2020 • edited Loading

sublee commented Mar 2, 2020

vibhatha commented Mar 2, 2020 • edited Loading

sublee commented Mar 2, 2020

vibhatha commented Feb 28, 2020 •

edited

Loading

vibhatha commented Mar 2, 2020 •

edited

Loading