Distributed deep learning benchmark

Extensible distributed deep learning benchmark, currently containing the following:

Implementations:

  • PyTorch (1 GPU, baseline)
  • Horovod
  • GPipe
  • PipeDream

Datasets:

  • MNIST
  • CIFAR-10
  • ImageNet

Models:

  • ResNet-18
  • ResNet-50
  • ResNet-152
  • VGG-11
  • VGG-16
  • MobileNet v2

Usage

The scripts to run the benchmarks can be found in /run. To get a summary of the models (using torchsummary), see /summary; to run the actual benchmarks, see /run. These scripts include the installation of all required software and are configured for SURF's LISA cluster. See the scripts for more information on how to adapt them to other environments.

More info:

run/run/run.sh -h
run/run/README.md
run/summary/README.md
README_PIPEDREAM.md
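
For example, assuming the repository lives in ~ (see Installation below), the options of the main run script can be listed with:

cd ~/DDLBench
bash run/run/run.sh -h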

Installation

Download the CIFAR-10 dataset:

cd DDLBench/benchmark/cifar10
wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
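
If the extracted Python batches are needed rather than the archive itself (check the run scripts for which form they expect), the file can be unpacked in place:

tar -xzf cifar-10-python.tar.gz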

Download the PyTorch build for PipeDream:

cd DDLBench
wget https://surfdrive.surf.nl/files/index.php/s/42ofDsL5Rty4zL8/download
unzip torch_pipedream.zip

See DDLBench/run/run for the remaining installation information. If a script in /run crashes during the installation phase, remember to delete the Python environment in ~/.envs/<name_of_env>, as it may be corrupted and will not work until deleted.
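
For example, a corrupted environment can be removed with:

rm -rf ~/.envs/<name_of_env>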

If using these benchmarks on the LISA cluster, get environment-modules-lisa from the internal repository and extract it in ~.

Make sure this repository is located in ~; otherwise, change the paths in all installation and Python scripts.

Datasets

The benchmarks are sorted per dataset in /benchmark. Do not change the names of any files if you want to make use of the provided run scripts, as the dataset and implementation names are used to locate the benchmark files. The implementations differ only slightly between datasets (e.g. cifar10_horovod.py vs mnist_horovod.py) because they all use the same image classification logic. Other benchmarks can easily be added by creating new folders in /benchmark and adding the corresponding code to the run scripts, similar to what is already there.

The MNIST and CIFAR-10 datasets are included in the /benchmark folder, while the ImageNet data is present on LISA / Cartesius and copied by the run scripts. If this is not the case for your system, change the data-copying paths. It is also possible to use synthetic datasets for MNIST, CIFAR-10 and ImageNet, which are generated automatically by /benchmark/generate_synthetic_data.py; this is the default option in the run scripts.
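
As an illustration of what such a synthetic dataset amounts to, the sketch below builds a random CIFAR-10-like dataset with plain PyTorch; the shapes, sample count and batch size are assumptions for illustration, not the actual logic of generate_synthetic_data.py:

import torch
from torch.utils.data import TensorDataset, DataLoader

# Random CIFAR-10-like data: 3x32x32 images, 10 classes (illustrative values only).
num_samples, num_classes = 1024, 10
images = torch.randn(num_samples, 3, 32, 32)
labels = torch.randint(0, num_classes, (num_samples,))

dataset = TensorDataset(images, labels)
loader = DataLoader(dataset, batch_size=128, shuffle=True)

for batch_images, batch_labels in loader:
    pass  # feed the synthetic batches to the benchmark's training loop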

Networks

The previously mentioned models are supported for all datasets and implementations, with some exceptions due either to hardware limitations (ResNet-152 with PyTorch and Horovod), GPipe limitations (MobileNet v2 for ImageNet / Highres), or PipeDream limitations (ResNet-152 for all datasets, and most networks with Highres). For ImageNet, torchvision models are used. For CIFAR-10 and MNIST, the networks have been slightly modified from the pytorch-cifar GitHub repository.
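
For instance, the ImageNet benchmarks can obtain their model directly from torchvision; a minimal sketch (the choice of ResNet-50 here is just an example):

import torchvision.models as models

# ResNet-50 with random weights and the default 1000 ImageNet classes.
model = models.resnet50(num_classes=1000)
print(sum(p.numel() for p in model.parameters()))  # rough parameter count, cf. /summary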
