
ddl-benchmarks: Benchmarks for Distributed Deep Learning.

Introduction

This repository contains a set of benchmarking scripts for evaluating the training performance of popular distributed deep learning methods, focusing mainly on system-level optimization algorithms for synchronous stochastic gradient descent with data parallelism. Currently, it covers:

system architectures

optimization algorithms

  • Wait-free backpropagation (WFBP), also known as pipelining backward computation with gradient communication; it is a default feature in current deep learning frameworks (a sketch follows this list).
  • Tensor fusion, which has been integrated into Horovod with a hand-crafted threshold that decides when to fuse tensors; MG-WFBP instead decides which tensors to fuse dynamically.
  • Tensor partitioning and priority scheduling, as proposed in ByteScheduler.
  • Gradient compression with quantization (e.g., signSGD) and sparsification (e.g., TopK-SGD). These methods are included in the code but excluded from our paper, which focuses on system-level optimization methods (see the Top-K sketch below).
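
To make the pipelining idea concrete, here is a minimal sketch of WFBP using PyTorch autograd hooks and torch.distributed. It assumes the default process group is already initialized; real frameworks add gradient bucketing, and none of these names come from this repository.

# Minimal WFBP sketch: all-reduce each gradient asynchronously as soon as
# backpropagation produces it, so communication of later layers overlaps
# with the backward computation of earlier layers.
import torch
import torch.distributed as dist

def attach_wfbp_hooks(model, handles):
    def hook(grad):
        # fires during backward() the moment this gradient is ready
        handles.append(dist.all_reduce(grad, async_op=True))
        return grad
    for p in model.parameters():
        if p.requires_grad:
            p.register_hook(hook)

# usage: handles = []; attach_wfbp_hooks(model, handles); loss.backward()
# for h in handles: h.wait()                     # drain communication
# for p in model.parameters(): p.grad /= dist.get_world_size()  # average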
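
Similarly, the idea behind Top-K sparsification (TopK-SGD) is to transmit only the k largest-magnitude gradient entries. A minimal sketch, with illustrative function names that are not this repository's API:

import torch

def topk_compress(grad, ratio=0.001):
    # keep the k entries with the largest magnitude; only the
    # (values, indices) pairs need to be communicated
    flat = grad.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices

def topk_decompress(values, indices, grad):
    # rebuild a dense gradient, zero everywhere except the kept entries
    flat = torch.zeros_like(grad).flatten()
    flat[indices] = values
    return flat.view_as(grad)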

deep neural networks

Installation

Prerequisites

Get the code

$ git clone https://github.com/HKBU-HPML/ddl-benchmarks.git
$ cd ddl-benchmarks
$ pip install -r requirements.txt

Configure the cluster settings

Before running the scripts, carefully edit the configuration files in the configs directory.

  • configs/cluster*: configure the host files for MPI
  • configs/envs.conf: configure the cluster environments
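
For reference, an MPI host file usually lists one node per line with a slot count (Open MPI syntax). The hostnames below are placeholders, and the exact keys expected in envs.conf are specific to this repository:

# hypothetical configs/cluster file: 2 nodes with 4 GPUs (slots) each
gpu-node-01 slots=4
gpu-node-02 slots=4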

Create a log folder, e.g.,

$ mkdir -p logs/pcie

Run benchmarks

  • The batch mode:
$ python benchmarks.py
  • The individual mode, e.g.,
$ cd horovod
$ dnn=resnet50 bs=64 nworkers=64 ./horovod_mpi_cj.sh
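
Scripts such as horovod_mpi_cj.sh launch a Horovod training job under MPI. The following is a minimal sketch of one data-parallel training step in PyTorch with Horovod (synthetic data, matching dnn=resnet50 and bs=64 above); it is illustrative, not the repository's benchmark code, and requires a GPU:

import torch
import torch.nn.functional as F
import horovod.torch as hvd
from torchvision import models

hvd.init()                                   # one process per GPU under MPI
torch.cuda.set_device(hvd.local_rank())

model = models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())
# wrap the optimizer so gradients are all-reduced across workers
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
# start all workers from identical weights
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

data = torch.randn(64, 3, 224, 224).cuda()   # synthetic batch, bs=64
target = torch.randint(0, 1000, (64,)).cuda()
optimizer.zero_grad()
loss = F.cross_entropy(model(data), target)
loss.backward()                              # gradient all-reduce overlaps here
optimizer.step()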

Paper

If you use this repository in your paper, please cite our work:

@article{shi2020ddlsurvey,
    author = {Shi, Shaohuai and Tang, Zhenheng and Chu, Xiaowen and Liu, Chengjian and Wang, Wei and Li, Bo},
    title = {Communication-Efficient Distributed Deep Learning: Survey, Evaluation, and Challenges},
    journal = {arXiv},
    year = {2020}
}
