Skip to content

LGrCo/L-GreCo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

L-GreCo: AN EFFICIENT AND GENERAL FRAMEWORK FOR LAYERWISE-ADAPTIVE GRADIENT COMPRESSION

Data-parallel distributed training of deep neural networks (DNN) has gained very widespread adoption, but can still experience communication bottlenecks due to gradient transmission. To address this issue, entire families of lossy gradient compression mechanisms have been developed, including quantization, sparsification, and low-rank approximation, some of which are seeing significant practical adoption. Despite this progress, almost all known compression schemes apply compression uniformly across DNN layers, although layers are heterogeneous in terms of parameter count and their impact on model accuracy. In this work, we provide a general framework for adapting the degree of compression across the model’s layers dynamically during training, significantly improving the overall compression without sacrificing accuracy. Our framework, called L-GreCo, is based on an efficient adaptive algorithm, which automatically picks the optimal compression parameters for model layers guaranteeing the best compression ratio while respecting a theoretically-justified error constraint. Our extensive experimental study over image classification and language modeling tasks shows that L-GreCo is effective across all three compression families, and achieves up to 2.5× training speedup and up to 5× compression improvement over efficient implementations of standard approaches while recovering full accuracy. Moreover, we show that L-GreCo is complementary to existing adaptive algorithms improving their compression ratio by 50% and practical throughput by 66%.

For ResNet50 and Transformer-XL experiments we used https://github.com/NVIDIA/DeepLearningExamples, for Transformer-LM experiments we used https://github.com/facebookresearch/fairseq, and for ResNet18 experiment we used the ResNet18 model provided https://github.com/epfml/powersgd.

You can find all hooks in a folder named 'hooks', and also you can use 'lgreco.sh' file in each experiment in order to run the task with lgreco. ResNet18 is a natural entery point.

About

AN EFFICIENT AND GENERAL FRAMEWORK FOR LAYERWISE-ADAPTIVE GRADIENT COMPRESSION

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published