
Machine learning benchmarks

Collection of various machine learning benchmarks together with Slurm scripts for CSC's supercomputers.

The benchmarks themselves (Python code) can be found in the benchmarks directory. Main run scripts are in the root directory as *.sh files. The Slurm settings have been separated into their own scripts in the slurm directory.

Typical usage is to first select a benchmark (e.g., PyTorch synthetic) and then the appropriate Slurm settings (e.g., 4 GPUs on Mahti, single node, no MPI). The command would then be:

sbatch slurm/mahti-gpu4.sh pytorch-synthetic.sh

Available run scripts

Slurm run scripts can be found in the slurm directory. They are named [puhti|mahti]-[cpu|gpu]N.sh, where N is the number of CPUs or GPUs reserved.

All scripts are single-node, single MPI task unless the name ends with -mpi.sh. Scripts with the -mpi.sh suffix launch a separate MPI task for each GPU, assuming 4 GPUs per node. For example, mahti-gpu8-mpi.sh reserves two nodes with 4 GPUs (and thus 4 MPI tasks) each, giving a total of 8 GPUs (and 8 MPI tasks).
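
When a benchmark is launched with one task per GPU, each Python process can work out its place in the job from the environment that Slurm sets up. A minimal sketch of that idea (illustrative only, not the repository's actual code) using the standard SLURM_* variables:

import os

# With the *-mpi.sh scripts, srun starts one task per GPU and sets these variables.
rank = int(os.environ.get("SLURM_PROCID", 0))        # global rank across all nodes
world_size = int(os.environ.get("SLURM_NTASKS", 1))  # total number of tasks (= GPUs)
local_rank = int(os.environ.get("SLURM_LOCALID", 0)) # rank within the node, used to pick a GPU

print(f"task {rank}/{world_size}, local GPU {local_rank}")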

Available benchmarks

Benchmark              Script name                Data
PyTorch synthetic      pytorch-synthetic.sh       synthetic
PyTorch DDP            pytorch-ddp.sh             synthetic/ImageNet
PyTorch DDP Lightning  pytorch-ddp-lightning.sh   synthetic/ImageNet
PyTorch DeepSpeed      pytorch-deepspeed.sh       synthetic/ImageNet
run_clm                pytorch-clm.sh             WikiText-2
TensorFlow CNN         tensorflow-cnn.sh          synthetic/ImageNet

The different benchmarks are described below in more detail.

PyTorch synthetic

Originally based on Horovod's example script of the same name. Note that the original script used a single fixed random batch which was fed to the network again and again. Some systems and setups are able to optimize for this scenario, giving very unrealistic results. We have modified the script to generate a new random batch each time.

Runs the "resnet50" model by default, but also supports "inception_v3" and other models from torchvision.models.
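
In essence the benchmark times a training loop over random data. A minimal sketch of the idea (illustrative only, not the repository's actual code):

import torch
import torchvision.models as models

model = models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

for step in range(100):
    # Generate a fresh random batch on every step; reusing one fixed batch lets some
    # setups optimize the work away and produces unrealistically good numbers.
    images = torch.randn(32, 3, 224, 224, device="cuda")
    labels = torch.randint(0, 1000, (32,), device="cuda")
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()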

Run example with single GPU:

sbatch slurm/mahti-gpu1.sh pytorch-synthetic.sh

Run example with 4 GPUs. Note that you can also add arguments to be passed to the Python script:

sbatch slurm/mahti-gpu4.sh pytorch-synthetic.sh --batch-size=32

Using 8 GPUs (i.e., 2 nodes) with Horovod and MPI (not supported in newer PyTorch installations):

sbatch slurm/mahti-gpu8-mpi.sh pytorch-synthetic.sh

PyTorch DDP

PyTorch benchmark using Distributed Data Parallel for handling multiple GPUs.

[Figure: PyTorch DDP results chart]
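
The multi-GPU handling uses PyTorch's DistributedDataParallel. A minimal sketch of the standard DDP setup (illustrative only, not the repository's actual code; it assumes the launcher provides the usual RANK, WORLD_SIZE, MASTER_ADDR and LOCAL_RANK variables, e.g. via torchrun, or derives them from the SLURM_* variables as sketched earlier):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel
import torchvision.models as models

dist.init_process_group(backend="nccl")   # one process per GPU joins the group
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = models.resnet50().cuda()
model = DistributedDataParallel(model, device_ids=[local_rank])
# ...train as usual; DDP averages gradients across all processes on backward()...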

Run example with 4 GPUs on Puhti using synthetic data:

sbatch slurm/puhti-gpu4.sh pytorch-ddp.sh

Run example with 8 GPUs (on 2 nodes) using real ImageNet data:

sbatch slurm/puhti-gpu8.sh pytorch-ddp.sh --data

Run example with 8 GPUs (2 nodes) with fp16:

sbatch slurm/puhti-gpu8.sh pytorch-ddp.sh --fp16
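
How --fp16 is implemented is up to the benchmark script; a typical mixed-precision training step with torch.cuda.amp looks roughly like this (illustrative sketch only):

import torch
import torchvision.models as models

model = models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    images = torch.randn(32, 3, 224, 224, device="cuda")
    labels = torch.randint(0, 1000, (32,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in float16 where safe
        loss = criterion(model(images), labels)
    scaler.scale(loss).backward()     # scale the loss to avoid float16 gradient underflow
    scaler.step(optimizer)
    scaler.update()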

PyTorch DDP with Lightning

PyTorch Lightning example using DDP. Runs the "resnet50" model by default, but also supports "inception_v3" and other models from torchvision.models.

[Figure: PyTorch DDP Lightning results chart]

DDP in Lightning (as of PyTorch 1.13) needs to be run as a single task per GPU:

sbatch slurm/puhti-gpu4-mpi.sh pytorch-ddp-lightning.sh  # single node
sbatch slurm/puhti-gpu8-mpi.sh pytorch-ddp-lightning.sh  # two nodes

The script supports the --data option to use real ImageNet data instead of synthetic data, and --fp16 to enable 16-bit precision for some operations.
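
In Lightning terms, one task per GPU corresponds to a Trainer configured roughly as follows (illustrative sketch; LitClassifier stands in for whatever LightningModule the actual benchmark defines):

import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,        # GPUs per node, matching the Slurm reservation
    num_nodes=2,      # e.g. puhti-gpu8-mpi.sh reserves two nodes
    strategy="ddp",   # one process per GPU, as launched by the *-mpi.sh scripts
    precision=16,     # roughly what the --fp16 option enables
)
# trainer.fit(LitClassifier(), train_dataloader)  # hypothetical module and dataloader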

PyTorch DeepSpeed

[Figure: PyTorch DeepSpeed results chart]

DeepSpeed example, 4 GPUs with synthetic data (note: one node = one task):

sbatch slurm/puhti-gpu4.sh pytorch-deepspeed.sh

8 GPUs, 2 nodes with ImageNet data (note: one GPU = one task):

sbatch slurm/puhti-gpu8-mpi.sh pytorch-deepspeed.sh --data
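
For reference, the core of a DeepSpeed training setup looks roughly like this (illustrative sketch; ds_config.json is a hypothetical config file name, not something shipped in this repository):

import torch
import deepspeed
import torchvision.models as models

model = models.resnet50()

# deepspeed.initialize wraps the model according to the JSON config
# (batch size, fp16, ZeRO stage, ...) and returns an engine to train with.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)

for step in range(100):
    images = torch.randn(32, 3, 224, 224, device=model_engine.device)
    labels = torch.randint(0, 1000, (32,), device=model_engine.device)
    loss = torch.nn.functional.cross_entropy(model_engine(images), labels)
    model_engine.backward(loss)   # DeepSpeed handles loss scaling and gradient sync
    model_engine.step()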

run_clm

Fine-tunes a GPT-like model on WikiText-2, using run_clm.py directly from the Hugging Face language modeling examples.

[Figure: PyTorch run_clm results chart]
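
Conceptually the script loads WikiText-2 and a pretrained causal language model and fine-tunes it. A much-simplified sketch of what run_clm.py does ("gpt2" is just an illustrative model choice):

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
tokenizer = AutoTokenizer.from_pretrained("gpt2")      # illustrative model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")
# run_clm.py then tokenizes the text, groups it into fixed-length blocks,
# and fine-tunes the model with the Hugging Face Trainer.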

Run example with a full node of GPUs (in this case 8 GPUs on LUMI):

sbatch slurm/lumi-gpu8.sh pytorch-clm.sh

Run example with two full nodes of GPUs (in this case 16 GPUs on LUMI):

sbatch slurm/lumi-gpu16.sh pytorch-clm.sh

TensorFlow CNN

Uses tf_cnn_benchmarks.py directly from TensorFlow's GitHub (included here as a git submodule).

Run example:

sbatch slurm/mahti-gpu1.sh tensorflow-cnn.sh

Horovod:

sbatch slurm/mahti-gpu8-mpi.sh tensorflow-cnn.sh

With real data:

sbatch slurm/mahti-gpu1.sh tensorflow-cnn.sh --data

Horovod with real data:

sbatch slurm/mahti-gpu8-mpi.sh tensorflow-cnn.sh --data
