Benchmarks: Add benchmark: Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582)

**Description**
Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark
yukirora committed Dec 7, 2023
1 parent 254ea7f commit dd5a632
Showing 14 changed files with 2,425 additions and 9 deletions.
1 change: 1 addition & 0 deletions dockerfile/cuda11.1.1.dockerfile
@@ -41,6 +41,7 @@ RUN apt-get update && \
libtinfo5 \
libtool \
lshw \
+python3-mpi4py \
net-tools \
openssh-client \
openssh-server \
1 change: 1 addition & 0 deletions dockerfile/cuda12.2.dockerfile
@@ -41,6 +41,7 @@ RUN apt-get update && \
libtinfo5 \
libtool \
lshw \
+python3-mpi4py \
net-tools \
openssh-client \
openssh-server \
3 changes: 2 additions & 1 deletion dockerfile/rocm5.0.x.dockerfile
@@ -41,6 +41,7 @@ RUN apt-get update && \
libtinfo5 \
libtool \
lshw \
+python3-mpi4py \
net-tools \
numactl \
openssh-client \
@@ -136,7 +137,7 @@ RUN echo PATH="$PATH" > /etc/environment && \
WORKDIR ${SB_HOME}

ADD third_party third_party
-RUN make -C third_party rocm -o rocm_hipblaslt
+RUN make -C third_party rocm -o rocm_hipblaslt -o megatron_deepspeed -o megatron_lm

ADD . .
RUN python3 -m pip install --upgrade setuptools==65.7 && \
3 changes: 2 additions & 1 deletion dockerfile/rocm5.1.x.dockerfile
@@ -41,6 +41,7 @@ RUN apt-get update && \
libtinfo5 \
libtool \
lshw \
+python3-mpi4py \
net-tools \
numactl \
openssh-client \
@@ -141,7 +142,7 @@ RUN echo PATH="$PATH" > /etc/environment && \
WORKDIR ${SB_HOME}

ADD third_party third_party
-RUN make ROCBLAS_BRANCH=release/rocm-rel-5.1 -C third_party rocm -o rocm_hipblaslt
+RUN make ROCBLAS_BRANCH=release/rocm-rel-5.1 -C third_party rocm -o rocm_hipblaslt -o megatron_deepspeed -o megatron_lm

ADD . .
RUN python3 -m pip install --no-cache-dir .[amdworker] && \
23 changes: 22 additions & 1 deletion docs/user-tutorial/benchmarks/model-benchmarks.md
@@ -37,8 +37,29 @@ For inference, supported percentiles include
| Name | Unit | Description |
|-----------------------------------------------------------------------------------------|------------------------|------------------------------------------------------------------------------|
| model-benchmarks/pytorch-${model_name}/${precision}_train_step_time | time (ms) | The average training step time with fp32/fp16 precision. |
-| model-benchmarks/pytorch-${model_name}/${precision}_train_throughput                    | throughput (samples/s) | The average training throughput with fp32/fp16 precision.                     |
+| model-benchmarks/pytorch-${model_name}/${precision}_train_throughput                    | throughput (samples/s) | The average training throughput per GPU with fp32/fp16 precision.             |
| model-benchmarks/pytorch-${model_name}/${precision}_inference_step_time | time (ms) | The average inference step time with fp32/fp16 precision. |
| model-benchmarks/pytorch-${model_name}/${precision}_inference_throughput | throughput (samples/s) | The average inference throughput with fp32/fp16 precision. |
| model-benchmarks/pytorch-${model_name}/${precision}_inference_step_time\_${percentile} | time (ms) | The n<sup>th</sup> percentile inference step time with fp32/fp16 precision. |
| model-benchmarks/pytorch-${model_name}/${precision}_inference_throughput\_${percentile} | throughput (samples/s) | The n<sup>th</sup> percentile inference throughput with fp32/fp16 precision. |
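
Because the training throughput above is reported per GPU, the aggregate throughput of a multi-GPU run is obtained by scaling it by the number of workers. A minimal illustration, with placeholder numbers rather than measured results:

```python
# Per-GPU vs. aggregate training throughput for the pytorch-${model_name}
# metrics above (placeholder values, purely illustrative).
per_gpu_throughput = 1200.0   # e.g. fp16_train_throughput reported for one GPU, samples/s
num_gpus = 8                  # GPUs participating in the distributed run

aggregate_throughput = per_gpu_throughput * num_gpus
print(f'{aggregate_throughput:.0f} samples/s across the whole job')  # 9600 samples/s
```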


## Megatron Model benchmarks

### `megatron-gpt`

#### Introduction

Run GPT pretraining tasks in float32, float16, or bfloat16 precision with [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) or [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed).

Tip: `batch_size` in this benchmark is the global batch size; the batch size on each GPU instance is `micro_batch_size`.
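
To make that convention concrete, here is a small sketch of how the global batch size typically decomposes in Megatron-style training; the variable names and numbers are illustrative, not taken from this benchmark's code:

```python
# Megatron-style batch-size decomposition (illustrative, assumed convention):
# global batch = micro batch per GPU x data-parallel replicas x gradient-accumulation steps.
micro_batch_size = 2         # samples per GPU per forward/backward pass (micro_batch_size)
data_parallel_size = 8       # number of data-parallel replicas
grad_accum_steps = 16        # micro-batches accumulated before each optimizer step

global_batch_size = micro_batch_size * data_parallel_size * grad_accum_steps
print(global_batch_size)     # 256 -- this is what `batch_size` refers to here
```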

#### Metrics
| Name | Unit | Description |
|---------------------------------------------------|------------------------|---------------------------------------------------------|
| megatron-gpt/${precision}_train_step_time | time (ms) | The average training step time per iteration. |
| megatron-gpt/${precision}_train_throughput | throughput (samples/s) | The average training throughput per iteration. |
| megatron-gpt/${precision}_train_tflops             | tflops/s               | The average training throughput in TFLOPS per iteration. |
| megatron-gpt/${precision}_train_mem_allocated | GB | The average GPU memory allocated per iteration. |
| megatron-gpt/${precision}_train_max_mem_allocated | GB | The average maximum GPU memory allocated per iteration. |
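
As a rough consistency check between these metrics (an assumed relationship, not lifted from the benchmark's implementation), the samples/s figure should approximately follow from the step time and the global batch size:

```python
# Relating megatron-gpt step time to throughput (illustrative numbers;
# the exact accounting inside Megatron-LM/Megatron-DeepSpeed may differ slightly).
global_batch_size = 256       # the `batch_size` argument, see the tip above
train_step_time_ms = 512.0    # e.g. float16_train_step_time

train_throughput = global_batch_size / (train_step_time_ms / 1000.0)
print(f'{train_throughput:.1f} samples/s')  # 500.0 samples/s
```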

1 change: 1 addition & 0 deletions setup.py
@@ -177,6 +177,7 @@ def run(self):
'xlrd>=2.0.1',
'xlsxwriter>=1.3.8',
'xmltodict>=0.12.0',
+'types-requests',
],
extras_require=(
lambda x: {
3 changes: 2 additions & 1 deletion superbench/benchmarks/model_benchmarks/__init__.py
@@ -8,5 +8,6 @@
from superbench.benchmarks.model_benchmarks.pytorch_gpt2 import PytorchGPT2
from superbench.benchmarks.model_benchmarks.pytorch_cnn import PytorchCNN
from superbench.benchmarks.model_benchmarks.pytorch_lstm import PytorchLSTM
+from superbench.benchmarks.model_benchmarks.megatron_gpt3 import MegatronGPT

-__all__ = ['ModelBenchmark', 'PytorchBERT', 'PytorchGPT2', 'PytorchCNN', 'PytorchLSTM']
+__all__ = ['ModelBenchmark', 'PytorchBERT', 'PytorchGPT2', 'PytorchCNN', 'PytorchLSTM', 'MegatronGPT']
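
Once exported here, the new benchmark can presumably be driven through SuperBench's existing registry helpers, as the other model benchmarks are. A minimal sketch, assuming the `megatron-gpt` name documented above and the standard `BenchmarkRegistry` API; any benchmark-specific parameters should be taken from the benchmark's own help output rather than from this example:

```python
# Sketch of launching the newly registered benchmark via SuperBench's
# BenchmarkRegistry (assumed usage, not code from this commit).
from superbench.benchmarks import BenchmarkRegistry, Platform

context = BenchmarkRegistry.create_benchmark_context('megatron-gpt', platform=Platform.CUDA)
benchmark = BenchmarkRegistry.launch_benchmark(context)
if benchmark:
    print(benchmark.name, benchmark.return_code)
    print(benchmark.result)  # parsed metrics such as <precision>_train_step_time
```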