Distributed deep learning benchmark

Extensible distributed deep learning benchmark, currently containing the following:

Implementations:

  • PyTorch (1 GPU, baseline)
  • Horovod
  • GPipe
  • PipeDream

Datasets:

  • MNIST
  • CIFAR-10
  • ImageNet

Models:

  • ResNet-18
  • ResNet-50
  • ResNet-152
  • VGG-11
  • VGG-16
  • MobileNet v2

Usage

The scripts to run the benchmarks can be found in /run. To get a summary of the models (using torchsummary), see /summary; to run the actual benchmarks, see /run. These scripts include the installation of all required software and are configured for SURF's LISA cluster. See the scripts for more information on how to adapt them to other environments.

More info:

run/run/run.sh -h
run/run/README.md
run/summary/README.md
README_PIPEDREAM.md
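
For example, assuming the repository lives in ~ (see Installation below), the options of the main run script can be listed with:

cd ~/DDLBench
bash run/run/run.sh -h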

Installation

Download the CIFAR-10 dataset:

cd DDLBench/benchmark/cifar10
wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
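
If the extracted Python batches are needed rather than the archive itself (check the run scripts for which form they expect), the file can be unpacked in place:

tar -xzf cifar-10-python.tar.gz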

Download the PyTorch build for PipeDream:

cd DDLBench
wget https://surfdrive.surf.nl/files/index.php/s/42ofDsL5Rty4zL8/download
unzip torch_pipedream.zip

See DDLBench/run/run for the remaining installation information. If a script in /run crashes during the installation phase, remember to delete the Python environment in ~/.envs/<name_of_env>, as it may be corrupted and will not work until deleted.
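
For example, a corrupted environment can be removed with:

rm -rf ~/.envs/<name_of_env>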

If using these benchmarks on the LISA cluster, get environment-modules-lisa from the internal repository and extract it in ~.

Make sure this repository is located in ~; otherwise, change the paths in all installation and Python scripts.

Datasets

The benchmarks are sorted per dataset in /benchmark. Do not change the names of any files if you want to make use of the provided run scripts, as the dataset and implementation names are used to locate the benchmark files. The implementations differ only slightly between datasets (e.g. cifar10_horovod.py vs mnist_horovod.py) because they all use the same image classification logic. Other benchmarks can easily be added by creating new folders in /benchmark and adding the corresponding code to the run scripts, similar to what is already there.

The MNIST and CIFAR-10 datasets are included in the /benchmark folder, while the ImageNet data is present on LISA / Cartesius and copied by the run scripts. If this is not the case for your system, change the data-copying paths. It is also possible to use synthetic datasets for MNIST, CIFAR-10 and ImageNet, which are generated automatically by /benchmark/generate_synthetic_data.py; this is the default option in the run scripts.
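
As an illustration of what such a synthetic dataset amounts to, the sketch below builds a random CIFAR-10-like dataset with plain PyTorch; the shapes, sample count and batch size are assumptions for illustration, not the actual logic of generate_synthetic_data.py:

import torch
from torch.utils.data import TensorDataset, DataLoader

# Random CIFAR-10-like data: 3x32x32 images, 10 classes (illustrative values only).
num_samples, num_classes = 1024, 10
images = torch.randn(num_samples, 3, 32, 32)
labels = torch.randint(0, num_classes, (num_samples,))

dataset = TensorDataset(images, labels)
loader = DataLoader(dataset, batch_size=128, shuffle=True)

for batch_images, batch_labels in loader:
    pass  # feed the synthetic batches to the benchmark's training loop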

Networks

The previously mentioned models are supported for all datasets and implementations, with some exceptions due either to hardware limitations (ResNet-152 with PyTorch and Horovod), GPipe limitations (MobileNet v2 for ImageNet / Highres), or PipeDream limitations (ResNet-152 for all datasets, and most networks with Highres). For ImageNet, torchvision models are used. For CIFAR-10 and MNIST, the networks have been slightly modified from the pytorch-cifar GitHub repository.
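
For instance, the ImageNet benchmarks can obtain their model directly from torchvision; a minimal sketch (the choice of ResNet-50 here is just an example):

import torchvision.models as models

# ResNet-50 with random weights and the default 1000 ImageNet classes.
model = models.resnet50(num_classes=1000)
print(sum(p.numel() for p in model.parameters()))  # rough parameter count, cf. /summary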
