This repository contains code for the paper Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training (ICLR 2021).
- tensorflow >= 1.14
- python >= 3.6
Download the MNIST and CIFAR-10 datasets and put them in your preferred location (e.g., ./data).
To measure the effect of data parallelism for the Simple-CNN / MNIST / SGD workload, run:
$ python main.py --nruns 100 --path_data PATH_TO_DATA --datasource mnist --arch simple-cnn --target_sparsity 0.9 --batch_size A_BATCH_SIZE --train_iterations 40000 --optimizer sgd --decay_type constant --mparams_kind learning_rate --goal_error 0.02 --check_interval 100
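Measuring the effect of data parallelism amounts to repeating this run over a range of batch sizes. A minimal shell sketch of such a sweep is given below; the batch-size values are illustrative placeholders, not the grid used in the paper, and the same pattern applies to the ResNet-8 workload that follows.

$ for BS in 32 64 128 256 512; do \
    python main.py --nruns 100 --path_data PATH_TO_DATA --datasource mnist --arch simple-cnn \
      --target_sparsity 0.9 --batch_size $BS --train_iterations 40000 --optimizer sgd \
      --decay_type constant --mparams_kind learning_rate --goal_error 0.02 --check_interval 100; \
  done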
To measure the effect of data parallelism for the ResNet-8 / CIFAR-10 / Nesterov workload, run:
$ python main.py --nruns 100 --path_data PATH_TO_DATA --datasource cifar-10 --arch resnet-8 --target_sparsity 0.9 --batch_size A_BATCH_SIZE --train_iterations 100000 --optimizer nesterov --decay_type linear --mparams_kind learning_rate momentum decay_steps end_learning_rate_factor --goal_error 0.4 --check_interval 100
If you want to measure the Lipschitz constant of gradients, run with the check_lipschitz flag on.
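For example, assuming main.py exposes this as a boolean command-line flag named --check_lipschitz (check main.py for the exact name and default), the Simple-CNN / MNIST / SGD command above becomes:

$ python main.py --nruns 100 --path_data PATH_TO_DATA --datasource mnist --arch simple-cnn --target_sparsity 0.9 --batch_size A_BATCH_SIZE --train_iterations 40000 --optimizer sgd --decay_type constant --mparams_kind learning_rate --goal_error 0.02 --check_interval 100 --check_lipschitz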
See main.py for other available options.
If you use this code for your work, please cite the following:
@inproceedings{lee2021dps,
  title={Understanding the effects of data parallelism and sparsity on neural network training},
  author={Lee, Namhoon and Ajanthan, Thalaiyasingam and Torr, Philip HS and Jaggi, Martin},
  booktitle={ICLR},
  year={2021},
}
This project is licensed under the MIT License. See the LICENSE file for details.