Model Compression and Acceleration Progress

Repository to track the progress in model compression and acceleration

Low-rank approximation

  • T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor (CVPR 2019) paper
  • MUSCO: Multi-Stage COmpression of neural networks (ICCVW 2019) paper | code (PyTorch)
  • Efficient Neural Network Compression (CVPR 2019) paper | code (Caffe)
  • Adaptive Mixture of Low-Rank Factorizations for Compact Neural Modeling (ICLR 2019) paper | code (PyTorch)
  • Extreme Network Compression via Filter Group Approximation (ECCV 2018) paper
  • Ultimate tensorization: compressing convolutional and FC layers alike (NIPS 2016 workshop) paper | code (TensorFlow) | code (MATLAB, Theano + Lasagne)
  • Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications (ICLR 2016) paper
  • Accelerating Very Deep Convolutional Networks for Classification and Detection (IEEE TPAMI 2016) paper
  • Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition (ICLR 2015) paper | code (Caffe)
  • Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation (NIPS 2014) paper
  • Speeding up Convolutional Neural Networks with Low Rank Expansions (2014) paper
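
As a rough illustration of the idea these papers share (not the method of any single one), a dense m × n weight matrix can be replaced by two factors of shapes m × r and r × n. The parameter count then drops from m·n to r·(m + n). The function names below are hypothetical and exist only for this sketch.

```python
def dense_params(m: int, n: int) -> int:
    """Parameters in a dense m x n weight matrix."""
    return m * n


def low_rank_params(m: int, n: int, r: int) -> int:
    """Parameters in a rank-r factorization: (m x r) times (r x n)."""
    return r * (m + n)


def compression_ratio(m: int, n: int, r: int) -> float:
    """How many times fewer parameters the factorized layer stores."""
    return dense_params(m, n) / low_rank_params(m, n, r)


def max_useful_rank(m: int, n: int) -> int:
    """Largest rank r for which r * (m + n) < m * n still holds."""
    return (m * n - 1) // (m + n)


if __name__ == "__main__":
    # Example: a 1024 x 1024 fully connected layer factorized at rank 64.
    print(compression_ratio(1024, 1024, 64))
    print(max_useful_rank(1024, 1024))
```

Parameter count is only a proxy: the actual speed-up and accuracy loss depend on the decomposition, the chosen ranks, and any fine-tuning, which is what the papers above study.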

Pruning & Sparsification



Knowledge distillation


  • Learning Efficient Detector with Semi-supervised Adaptive Distillation (arxiv 2019) paper | code (Caffe)
  • Model compression via distillation and quantization (ICLR 2018) paper | code (PyTorch)
  • Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks (ICLR 2018 workshop) paper
  • Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks (BMVC 2018) paper
  • Net2Net: Accelerating Learning via Knowledge Transfer (ICLR 2016) paper
  • Distilling the Knowledge in a Neural Network (NIPS 2014) paper
  • FitNets: Hints for Thin Deep Nets (2014) paper | code (Theano + Pylearn2)
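
For orientation, here is a minimal sketch of a soft-target distillation loss in the spirit of "Distilling the Knowledge in a Neural Network": teacher and student logits are softened with a temperature, and the student is penalized for diverging from the teacher. It uses the KL divergence, which differs from the soft-target cross-entropy only by a term that does not depend on the student. The function names and the default temperature are assumptions for illustration, and the usual hard-label term is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.5, 1.5, 0.2]
    print(distillation_loss(student, teacher))
```

In practice this term is usually combined with a standard cross-entropy loss on the ground-truth labels, weighted by a hyperparameter.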


TensorFlow implementation of three papers, results for CIFAR-10

Quantization

  • Bayesian Bits: Unifying Quantization and Pruning (2020) paper
  • Up or Down? Adaptive Rounding for Post-Training Quantization (2020) paper
  • Gradient $\ell_1$ Regularization for Quantization Robustness (ICLR 2020) paper
  • Training Binary Neural Networks with Real-to-Binary Convolutions (ICLR 2020) paper | code (coming soon)
  • Data-Free Quantization Through Weight Equalization and Bias Correction (ICCV 2019) paper | code (PyTorch)
  • XNOR-Net++ (2019) paper
  • Matrix and tensor decompositions for training binary neural networks (2019) paper
  • XNOR-Net (ECCV 2016) paper | code (PyTorch)
  • Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks (2019) paper | code (TensorFlow)
  • Relaxed Quantization for Discretized Neural Networks (ICLR 2019) paper
  • Training and Inference with Integers in Deep Neural Networks (ICLR 2018) paper | code (TensorFlow)
  • Training Quantized Nets: A Deeper Understanding (NeurIPS 2017) paper
  • Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference (2017) paper
  • Deep Learning with Limited Numerical Precision (2015) paper
  • Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation (2013) paper
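
Several of the entries above build on uniform affine quantization, where a real value x is approximated as scale · (q − zero_point) with q an unsigned integer. The sketch below shows that basic mapping for a list of floats. It is a simplified illustration under assumed names and defaults (8 bits, range taken from min/max), not the scheme of any specific paper in the list.

```python
def quantization_params(x_min, x_max, num_bits=8):
    """Choose scale and zero point so that [x_min, x_max] maps onto [0, 2^b - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to contain zero so that 0.0 is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # guard against an all-zero range
    zero_point = round(qmin - x_min / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2^b - 1] by rounding and clamping."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate float values."""
    return [scale * (q - zero_point) for q in q_values]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
    scale, zp = quantization_params(min(weights), max(weights))
    q = quantize(weights, scale, zp)
    print(q)
    print(dequantize(q, scale, zp))
```

For values inside the calibrated range, the round-trip error stays within one quantization step (scale). Values outside the range are clamped, which is one reason range selection and rounding strategy get so much attention in the papers above.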

Architecture search

  • MobileNets
  • EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (ICML 2019) paper | code and pretrained models (TensorFlow)
  • MnasNet: Platform-Aware Neural Architecture Search for Mobile (CVPR 2019) paper | code (TensorFlow)
  • MorphNet: Fast & Simple Resource-Constrained Learning of Deep Network Structure (CVPR 2018) paper | code (TensorFlow)
  • ShuffleNets
    • ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design (ECCV 2018) paper
    • ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (CVPR 2018) paper
  • Multi-Fiber Networks for Video Recognition (ECCV 2018) paper | code (PyTorch)
  • IGCVs
    • IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks (BMVC 2018) paper | code and pretrained models (MXNet)
    • IGCV2: Interleaved Structured Sparse Convolutional Neural Networks (CVPR 2018) paper
    • Interleaved Group Convolutions for Deep Neural Networks (ICCV 2017) paper

PhD theses and overviews

  • Quantizing deep convolutional networks for efficient inference: A whitepaper (2018) paper
  • Algorithms for speeding up convolutional neural networks (2018) thesis
  • Model Compression and Acceleration for Deep Neural Networks: The Principles, Progress, and Challenges (2018) paper
  • Efficient methods and hardware for deep learning (2017) thesis

Frameworks

  • MUSCO - framework for model compression using tensor decompositions (PyTorch, TensorFlow)
  • AIMET - AI Model Efficiency Toolkit (PyTorch, TensorFlow)
  • Distiller - package for compression using pruning and low-precision arithmetic (PyTorch)
  • MorphNet - framework for neural networks architecture learning (TensorFlow)
  • Mayo - deep learning framework with fine- and coarse-grained pruning, network slimming, and quantization methods
  • PocketFlow - framework for model pruning, sparsification, quantization (TensorFlow implementation)
  • Keras compressor - compression using low-rank approximations: SVD for matrices, Tucker for tensors
  • Caffe compressor - K-means based quantization
  • gemmlowp - Building a quantization paradigm from first principles (C++)
  • NNI - framework for feature engineering, NAS, hyperparameter tuning, and model compression

Comparison of different approaches

Please see comparative_results.pdf

Similar repos

