DEFT is an abbreviation of 'Distributed Execution of Fragmented Top-k'. DEFT partitions the gradient sparsification task into sub-tasks and distributes them to workers. Benchmarks include: 1) image classification using CNNs, 2) language modelling using LSTMs, and 3) recommendation using NCF.
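The snippet below is a minimal sketch of the fragmented top-k idea, not the repository's implementation: gradient tensors (here, per-layer gradients) are partitioned across workers, and each worker runs top-k selection only on the fragments assigned to it. The helper names (`assign_layers`, `local_topk`, `k_ratio`) are hypothetical.

```python
# Illustrative sketch of fragmented top-k selection (not the repository's code).
import torch

def assign_layers(num_layers, world_size):
    # Round-robin assignment of layer indices to workers (one possible partitioning).
    return {r: [i for i in range(num_layers) if i % world_size == r]
            for r in range(world_size)}

def local_topk(grads, my_layers, k_ratio=0.01):
    # Each worker sparsifies only the layers assigned to it.
    selected = {}
    for i in my_layers:
        g = grads[i].flatten()
        k = max(1, int(g.numel() * k_ratio))
        _, indices = torch.topk(g.abs(), k)
        selected[i] = (indices, g[indices])
    return selected

# Example: 4 layers, 2 workers; worker 0 sparsifies only its assigned layers.
grads = [torch.randn(8, 8) for _ in range(4)]
assignment = assign_layers(len(grads), world_size=2)
print(local_topk(grads, assignment[0]))
```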
To install the necessary dependencies, create a Conda environment from `environment.yml` by running the following commands. Note: check the compatibility of each package version, such as `cudatoolkit` and `cudnn`, with your device (e.g., the NVIDIA Tesla V100 GPU is compatible).
$ conda env create --file environment.yml
$ conda activate deft
$ python -m spacy download en
$ conda deactivate
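As an optional sanity check of the GPU setup (this command is an assumption on our part, not part of the repository's scripts), you can verify that PyTorch sees the device, CUDA, and cuDNN:

$ python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda, torch.backends.cudnn.version())"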
The scripts to run the code are written for the SLURM workload manager. The source code supports multi-node, multi-GPU distributed training. In `run.sh`, you can specify the model, dataset, reducer, and world_size.
- If you use SLURM, use `pararun` and modify it for your configuration. The script `pararun` executes `run.sh` in parallel. The script `run.sh` includes the setup for distributed training.
- If you do not use SLURM, you do not need `pararun`. Instead, run `run.sh` on each of your nodes; the PyTorch rendezvous mechanism then connects the processes (see the sketch after this list).
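As a rough illustration of the rendezvous step (a sketch assuming the standard `env://` initialization; the actual wiring lives in `run.sh` and the training code), each process is given the master address/port, its rank, and the world size, and joins the process group via `torch.distributed`:

```python
# Sketch of PyTorch rendezvous via environment variables (an assumption for
# illustration; not necessarily how run.sh configures it).
import os
import torch.distributed as dist

def init_distributed(hostip, port, rank, world_size):
    os.environ["MASTER_ADDR"] = hostip      # address of the rank-0 node
    os.environ["MASTER_PORT"] = str(port)
    # All processes block here until world_size processes have joined.
    dist.init_process_group(backend="nccl", init_method="env://",
                            rank=rank, world_size=world_size)
```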
- To download the datasets, use the following command.
$ ./prepare_dataset.sh
- If you use SLURM, use the following command.
$ sbatch pararun
- If you do not use SLURM, use the following command on each node.
$ hostip=<ip> port=<port> mpirun -np <world_size> run.sh
If you use this code, please cite the following [Paper]:
- DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification. Daegun Yoon, Sangyoon Oh. ICPP 2023, Aug. 2023.
@inproceedings{yoon2023deft,
title={DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification},
author={Yoon, Daegun and Oh, Sangyoon},
booktitle={Proceedings of the 52nd International Conference on Parallel Processing},
pages={746--755},
year={2023}
}
If you have any questions about this project, please contact me via one of the following: