gTop-k S-SGD

Introduction

This repository contains the code for the gTop-k S-SGD (Synchronous Stochastic Gradient Descent) papers that appeared at ICDCS 2019 (an empirical study) and IJCAI 2019 (a theoretical study). gTop-k S-SGD is a communication-efficient distributed training algorithm for deep learning. The key idea of gTop-k is that each worker sends and receives only the top-k gradient elements (k can be as small as 0.1% of the gradient dimension d, i.e., k = 0.001d) over a tree structure (recursive doubling), so the communication complexity is O(k logP), where P is the number of workers. The convergence of gTop-k S-SGD is provable under some weak analytical assumptions. The communication complexity of gTop-k, compared with traditional ring-based all-reduce (Dense) and Top-k sparsification, is as follows:

S-SGD	Complexity	Time Cost
Dense	O(d)	2(P-1)\alpha + 2\frac{P-1}{P}d\beta
Top-k	O(kP)	\alpha\log P + 2(P-1)k\beta
gTop-k	O(k logP)	\alpha\log P + 2k\beta\log P
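
To get a feel for the Time Cost column, the three formulas can be evaluated directly. The sketch below assumes the usual alpha-beta communication model, in which \alpha is the latency of one message and \beta is the transfer time per gradient element. The README does not define these symbols, so that reading, the base-2 logarithm, and every numeric value here (model size, worker count, network parameters) are illustrative assumptions, not measurements from the papers.

```python
import math


def dense_time(d, P, alpha, beta):
    """Ring all-reduce (Dense): 2(P-1)*alpha + 2*(P-1)/P * d * beta."""
    return 2 * (P - 1) * alpha + 2 * (P - 1) / P * d * beta


def topk_time(k, P, alpha, beta):
    """Top-k sparsification: alpha*log(P) + 2(P-1)*k*beta."""
    return alpha * math.log2(P) + 2 * (P - 1) * k * beta


def gtopk_time(k, P, alpha, beta):
    """gTop-k: alpha*log(P) + 2*k*log(P)*beta."""
    return alpha * math.log2(P) + 2 * k * math.log2(P) * beta


# Hypothetical setting: 25M parameters, k = 0.001d, 32 workers.
d = 25_000_000
k = d // 1000
P = 32
alpha = 5e-4   # assumed latency per message, in seconds
beta = 3.2e-8  # assumed seconds per 4-byte element on a 1 Gbit/s link

costs = {
    "Dense": dense_time(d, P, alpha, beta),
    "Top-k": topk_time(k, P, alpha, beta),
    "gTop-k": gtopk_time(k, P, alpha, beta),
}
for name, seconds in costs.items():
    print(f"{name:7s} {seconds:.4f} s per iteration (communication only)")
```

Under these assumed values the ordering follows the complexity column: the Top-k cost grows linearly with P, while the gTop-k cost grows only with log P.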

For more details about the algorithm, please refer to our papers.
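
The idea described in the introduction can be illustrated with a small single-process simulation. This is a minimal sketch, not the repository's implementation: the function names are hypothetical, workers are simulated as list entries instead of MPI ranks, the worker count is assumed to be a power of two, and the exchange pattern is a plain recursive-doubling pairing. It also ignores everything around the communication step, such as how dropped gradient entries are handled during training.

```python
import heapq
import random


def top_k(sparse, k):
    """Keep the k entries with the largest absolute value.

    `sparse` maps gradient index -> value. Ties are broken by the lower
    index so that every worker makes the same choice.
    """
    kept = heapq.nsmallest(k, sparse.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return dict(kept)


def merge(a, b):
    """Add two sparse gradients entry by entry."""
    out = dict(a)
    for idx, val in b.items():
        out[idx] = out.get(idx, 0.0) + val
    return out


def gtopk_allreduce(grads, k):
    """Simulate a gTop-k style reduction over P = len(grads) workers."""
    P = len(grads)
    if P == 0 or P & (P - 1):
        raise ValueError("this sketch assumes a power-of-two number of workers")
    # Each worker first selects its local top-k entries.
    state = [top_k(dict(enumerate(g)), k) for g in grads]
    # log2(P) rounds: worker i pairs with worker i XOR step, both add the
    # two k-sparse vectors and keep only the top-k of the sum.
    step = 1
    while step < P:
        state = [top_k(merge(state[i], state[i ^ step]), k) for i in range(P)]
        step *= 2
    return state


random.seed(0)
workers = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
result = gtopk_allreduce(workers, k=10)
print(sorted(result[0]))  # indices of the globally selected entries
```

Each message in this scheme carries at most k index-value pairs and there are log P rounds, which is where the O(k logP) communication complexity comes from. After the last round every simulated worker holds the same k-sparse vector.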

Installation

Prerequisites

Python 2 or 3
PyTorch-0.4.+
OpenMPI-3.1.+
Horovod-0.14.+ (optional; only needed to run the dense version)
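
Before installing, it can help to check which prerequisites are already present. The snippet below is a rough sketch under stated assumptions: it only tests whether the `torch` and `horovod` Python modules can be found and whether an `mpirun` launcher is on the PATH. It does not verify version numbers, and the helper name `missing_modules` is hypothetical.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = missing_modules(["torch"])
optional = missing_modules(["horovod"])  # only needed for the dense version
mpirun = shutil.which("mpirun")          # launcher shipped with OpenMPI

print("missing required modules:", required or "none")
print("missing optional modules:", optional or "none")
print("mpirun found at:", mpirun or "not found")
```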

Quick Start

git clone https://github.com/hclhkbu/gtopkssgd.git
cd gtopkssgd
pip install -r requirements.txt
dnn=resnet20 nworkers=4 ./gtopk_mpi.sh

Assuming you have 4 GPUs on a single node and everything is set up correctly, this launches 4 workers on that node, training the ResNet-20 model on the CIFAR-10 dataset with the gTop-k S-SGD algorithm.
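
To run several configurations in a row, the quick-start command can be wrapped in a small Python helper. This is a sketch that assumes `gtopk_mpi.sh` reads `dnn` and `nworkers` from its environment, as the command above suggests; the helper names `launch_env` and `run_gtopk` are hypothetical and not part of the repository.

```python
import os
import subprocess


def launch_env(dnn, nworkers, base_env=None):
    """Build the environment for one run, mirroring `dnn=... nworkers=...`."""
    env = dict(os.environ if base_env is None else base_env)
    env["dnn"] = dnn
    env["nworkers"] = str(nworkers)
    return env


def run_gtopk(dnn="resnet20", nworkers=4, script="./gtopk_mpi.sh"):
    """Run the launcher script from the repository root; raise if it fails."""
    return subprocess.run([script], env=launch_env(dnn, nworkers), check=True)


# Example (run from inside the cloned gtopkssgd directory):
#     for n in (2, 4):
#         run_gtopk("resnet20", n)
```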

Papers

S. Shi, Q. Wang, K. Zhao, Z. Tang, Y. Wang, X. Huang, and X.-W. Chu, “A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks,” IEEE ICDCS 2019, Dallas, Texas, USA, July 2019.
S. Shi, K. Zhao, Q. Wang, Z. Tang, and X.-W. Chu, “A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification,” IJCAI 2019, Macau, P.R.C., August 2019.

Referenced Models

Deep speech: https://github.com/SeanNaren/deepspeech.pytorch
PyTorch examples: https://github.com/pytorch/examples

