An implementation of Redundancy-Infused SGD (RI-SGD) for speeding up distributed SGD.

This is a preliminary implementation of the paper:

Haddadpour, F., Kamani, M.M., Mahdavi, M., & Cadambe, V. "Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization." International Conference on Machine Learning. 2019.

You can download each dataset and convert it to TFRecords using:

```
python generate_cifar_tfrecords.py --data-dir=./cifar10 --dataset cifar10
```
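The same script should handle other CIFAR variants; for example, assuming `cifar100` is an accepted `--dataset` value (check `generate_cifar_tfrecords.py` to confirm):

```
# Assumption: cifar100 is a supported --dataset value; verify in generate_cifar_tfrecords.py
python generate_cifar_tfrecords.py --data-dir=./cifar100 --dataset cifar100
```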

Then you can run RI-SGD (here with 25% redundancy and a synchronization period of 50 steps) using:

```
python main.py --data-dir=./cifar10 \
               --num-gpus=8 \
               --train-steps=45000 \
               --variable-strategy GPU \
               --job-dir=./log/ri-sgd/cifar10-ri-redun25-step50 \
               --run-type multi \
               --redundancy=0.25 \
               --sync-step=50 \
               --dataset cifar10 \
               --eval-batch-size=128
```
To run the fully synchronous SGD baseline for comparison:

```
python main.py --data-dir=./cifar10 \
               --num-gpus=8 \
               --train-steps=45000 \
               --variable-strategy GPU \
               --job-dir=./log/ri-sgd/cifar10-ri-sync \
               --run-type sync \
               --redundancy=0.0 \
               --dataset cifar10 \
               --eval-batch-size=128
```

where `--redundancy` corresponds to $\mu$ in the paper and `--sync-step` corresponds to $\tau$ in the paper.
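
As a rough illustration of what these two parameters control, here is a minimal, self-contained NumPy sketch of the RI-SGD scheme on a toy least-squares problem. This is **not** the repository's implementation (see `main.py` and `model_base_.py` for that); the worker count, batch size, and learning rate below are arbitrary choices for the toy problem.

```python
# Sketch of RI-SGD: K workers each hold their own data shard plus a mu
# fraction of redundant samples from the other shards, take local SGD
# steps, and average their models every tau steps.
import numpy as np

rng = np.random.default_rng(0)
K, tau, mu, lr, steps = 4, 50, 0.25, 0.1, 500  # toy hyperparameters

# Toy least-squares problem: minimize ||X w - y||^2, split across workers.
X = rng.normal(size=(4096, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=4096)

shards = np.array_split(np.arange(len(X)), K)
# Redundancy: each worker also samples a mu fraction of the other shards.
worker_idx = []
for k in range(K):
    others = np.concatenate([shards[j] for j in range(K) if j != k])
    extra = rng.choice(others, size=int(mu * len(others)), replace=False)
    worker_idx.append(np.concatenate([shards[k], extra]))

w = [np.zeros(10) for _ in range(K)]
for t in range(steps):
    for k in range(K):  # one local SGD step on worker k's (redundant) shard
        batch = rng.choice(worker_idx[k], size=32)
        grad = 2 * X[batch].T @ (X[batch] @ w[k] - y[batch]) / len(batch)
        w[k] -= lr * grad
    if (t + 1) % tau == 0:  # communicate: average models every tau steps
        avg = np.mean(w, axis=0)
        w = [avg.copy() for _ in range(K)]

print("error:", np.linalg.norm(np.mean(w, axis=0) - w_true))
```

With `mu = 0.0` and averaging after every step, the sketch degenerates to fully synchronous SGD, which is the comparison the second command above runs.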
