This repository contains the code of the gTop-k S-SGD (Synchronous Stochastic Gradient Descent) papers that appeared at ICDCS 2019 (an empirical study) and IJCAI 2019 (a theoretical study). gTop-k S-SGD is a communication-efficient distributed training algorithm for deep learning. The key idea of gTop-k is that each worker only sends/receives the top-k gradient values (k can be as small as 0.1% of the gradient dimension d, i.e., k = 0.001d) using a tree structure (recursive doubling), so that the communication complexity is O(k log P), where P is the number of workers. The convergence of gTop-k S-SGD is provable under some weak analytical assumptions. The communication complexity, compared with traditional ring-based all-reduce (Dense) and Top-k sparsification, is shown as follows:
| Algorithm | Complexity | Communication time |
|-----------|------------|---------------------|
| gTop-k    | O(k log P) | α log P + 2k log P β |
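Here α can be read as the per-message startup latency and β as the transfer time per element (the usual latency-bandwidth cost model): gTop-k needs about log₂ P communication rounds, and each round exchanges roughly 2k numbers (k values plus k indices), which gives the α log P + 2k log P β time above. As a rough point of comparison (a textbook estimate, not a number taken from the papers), a ring-based all-reduce on the dense gradient of dimension d costs about 2(P−1)α + 2((P−1)/P)dβ, so with k = 0.001d the sparsified scheme moves far less data per iteration.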
For more details about the algorithm, please refer to our papers.
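To make the idea concrete, below is a minimal, illustrative sketch of a gTop-k-style sparse all-reduce written with mpi4py and NumPy. It is not the repository's implementation (which is built on PyTorch and launched via `gtopk_mpi.sh`); the function names `local_topk` and `gtopk_allreduce` are hypothetical, the worker count is assumed to be a power of two, and it uses a butterfly-style exchange rather than the exact tree schedule described in the papers.

```python
# Illustrative sketch of the gTop-k idea with mpi4py + NumPy (hypothetical names,
# not the repository's API). Assumes the number of workers P is a power of two.
import numpy as np
from mpi4py import MPI


def local_topk(grad, k):
    """Return indices and values of the k largest-magnitude entries of grad."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx.astype(np.int64), grad[idx]


def gtopk_allreduce(grad, k, comm=MPI.COMM_WORLD):
    """Merge local top-k gradients across P workers in ~log2(P) rounds.

    In each round, paired workers exchange their current sparse sets (k indices
    plus k values, i.e. ~2k numbers per message), add entries that share an
    index, and keep only the top-k of the merged result, so messages stay small.
    """
    rank, size = comm.Get_rank(), comm.Get_size()
    idx, val = local_topk(grad, k)
    step = 1
    while step < size:                       # log2(P) rounds in total
        partner = rank ^ step                # butterfly pairing at distance `step`
        peer_idx, peer_val = comm.sendrecv((idx, val), dest=partner, source=partner)
        merged = {}
        for i, v in zip(np.concatenate([idx, peer_idx]),
                        np.concatenate([val, peer_val])):
            merged[int(i)] = merged.get(int(i), 0.0) + float(v)
        top = sorted(merged.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
        idx = np.array([i for i, _ in top], dtype=np.int64)
        val = np.array([v for _, v in top], dtype=grad.dtype)
        step *= 2
    dense = np.zeros_like(grad)
    dense[idx] = val                         # each worker now holds the gTop-k sparse gradient
    return dense


if __name__ == "__main__":
    # Tiny usage example: run with `mpiexec -n 4 python gtopk_sketch.py`.
    d, k = 1000, 10
    rng = np.random.default_rng(MPI.COMM_WORLD.Get_rank())
    g = rng.standard_normal(d)
    print(MPI.COMM_WORLD.Get_rank(), np.count_nonzero(gtopk_allreduce(g, k)))
```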
```bash
git clone https://github.com/hclhkbu/gtopkssgd.git
cd gtopkssgd
pip install -r requirements.txt
dnn=resnet20 nworkers=4 ./gtopk_mpi.sh
```
Assuming you have 4 GPUs on a single node and everything works correctly, you will see 4 workers running on the node, training the ResNet-20 model on the CIFAR-10 dataset with the gTop-k S-SGD algorithm.
- S. Shi, Q. Wang, K. Zhao, Z. Tang, Y. Wang, X. Huang, and X.-W. Chu, “A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks,” IEEE ICDCS 2019, Dallas, Texas, USA, July 2019. PDF
- S. Shi, K. Zhao, Q. Wang, Z. Tang, and X.-W. Chu, “A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification,” IJCAI 2019, Macau, P.R.C., August 2019. PDF