
Awesome Model Quantization

This repo collects papers, documentation, and code about model quantization for anyone who wants to do research on it. We are continuously improving the project. Pull requests for works (papers, repositories) missed by this repo are welcome.

Keywords

low-bit: low-bit quantization | binarization: binarized networks | hardware: hardware deployment | other: other related methods

Statistics: 🔥 highly cited | ⭐ code is available with more than 50 stars
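To make the two main keywords concrete, here is a minimal, illustrative NumPy sketch (not taken from any listed paper) of the core operations: uniform symmetric low-bit quantization, and binarization with a per-tensor scaling factor in the style of XNOR-Net and BinaryConnect. Function names and the simplified max-based scale are assumptions for illustration only.

```python
import numpy as np

def uniform_quantize(x, num_bits=4):
    """Uniform symmetric quantization: snap values to a (2^b - 1)-level grid."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for 4-bit signed
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale                          # dequantized values

def binarize(w):
    """Binarize weights to +/- alpha, with alpha = mean(|W|) (XNOR-Net style)."""
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

w = np.array([0.7, -0.3, 0.05, -0.9])
print(uniform_quantize(w))   # values snapped to a 4-bit grid
print(binarize(w))           # only the sign survives, scaled by alpha
```

In real training pipelines (e.g. the straight-through estimator used by many of the papers below), these operations run in the forward pass while gradients flow through as if the quantizer were the identity.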

Papers

2020

  • [arxiv] Training with Quantization Noise for Extreme Model Compression. [low-bit] [torch]
  • [CVPR] [47:fire:] GhostNet: More Features from Cheap Operations. [tensorflow & torch] [low-bit] [1.2k:star:]
  • [CVPR] Forward and Backward Information Retention for Accurate Binary Neural Networks. [binarization] [torch] [105:star:]
  • [CVPR] Rotation Consistent Margin Loss for Efficient Low-Bit Face Recognition. [low-bit]
  • [ECCV] Learning Architectures for Binary Networks. [torch] [binarization]
  • [NeurIPS] Rotated Binary Neural Network. [torch] [binarization]
  • [IJCV] Binarized Neural Architecture Search for Efficient Object Recognition. [binarization]
  • [ECCV] PROFIT: A Novel Training Method for sub-4-bit MobileNet Models. [low-bit]
  • [CVPR] BiDet: An Efficient Binarized Object Detector. [torch] [binarization] [112:star:]
  • [NeurIPS] Searching for Low-Bit Weights in Quantized Neural Networks. [torch] [low-bit]
  • [ISCAS] MuBiNN: Multi-Level Binarized Recurrent Neural Network for EEG Signal Classification. [binarization]
  • [arxiv] Binarized Graph Neural Network. [binarization]
  • [ACL] End to End Binarized Neural Networks for Text Classification. [binarization]
  • [arxiv] Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs. [code] [binarization]
  • [SysML] Riptide: Fast End-to-End Binarized Neural Networks. [tensorflow] [low-bit] [129:star:]
  • [IJCAI] Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models. [binarization]
  • [PR] [23:fire:] Binary neural networks: A survey. [binarization]
  • [ICLR] DMS: Differentiable Dimension Search for Binary Neural Networks. [binarization]
  • [DATE] BNNsplit: Binarized Neural Networks for embedded distributed FPGA-based computing systems. [binarization]
  • [ECCV] ProxyBNN: Learning Binarized Neural Networks via Proxy Matrices. [binarization]
  • [ECCV] ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions. [binarization] [torch] [108:star:]
  • [arxiv] Understanding Learning Dynamics of Binary Neural Networks via Information Bottleneck. [binarization]
  • [NN] Training high-performance and large-scale deep neural networks with full 8-bit integers. [low-bit]
  • [ICLR] [19:fire:] Training Binary Neural Networks with Real-to-Binary Convolutions. [binarization] [code is coming] [re-implement]
  • [paper] Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation. [binarization]
  • [IEEE Trans. Magn] SIMBA: A Skyrmionic In-Memory Binary Neural Network Accelerator. [binarization]
  • [arxiv] RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks. [binarization] [low-bit]
  • [TVLSI] Phoenix: A Low-Precision Floating-Point Quantization Oriented Architecture for Convolutional Neural Networks. [low-bit]
  • [DATE] OrthrusPE: Runtime Reconfigurable Processing Elements for Binary Neural Networks. [binarization]
  • [WACV] MoBiNet: A Mobile Binary Network for Image Classification. [binarization]
  • [arxiv] MeliusNet: Can Binary Neural Networks Achieve MobileNet-level Accuracy? [binarization] [code] [192:star:]
  • [CVPR] Low-Bit Quantization Needs Good Distribution. [low-bit]
  • [IEEE TCS.I] IMAC: In-Memory Multi-Bit Multiplication and ACcumulation in 6T SRAM Array. [low-bit]
  • [arxiv] How Does Batch Normalization Help Binary Training? [binarization]
  • [arxiv] Distillation Guided Residual Learning for Binary Convolutional Neural Networks. [binarization]
  • [IEEE Trans. Electron Devices] Design of High Robustness BNN Inference Accelerator Based on Binary Memristors. [binarization] [hardware]
  • [Pattern Recognition Letters] Controlling information capacity of binary neural network. [binarization]
  • [MLST] Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML. [hardware] [binarization] [low-bit]
  • [ISQED] BNN Pruning: Pruning Binary Neural Network Guided by Weight Flipping Frequency. [binarization] [torch]
  • [ICLR] BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations. [binarization] [torch]
  • [ICET] An Energy-Efficient Bagged Binary Neural Network Accelerator. [binarization] [hardware]
  • [IEEE Access] An Energy-Efficient and High Throughput in-Memory Computing Bit-Cell With Excellent Robustness Under Process Variations for Binary Neural Network. [binarization] [hardware]
  • [IEEE TCS.II] A Resource-Efficient Inference Accelerator for Binary Convolutional Neural Networks. [hardware]
  • [COOL CHIPS] A Novel In-DRAM Accelerator Architecture for Binary Neural Network. [hardware]
  • [ICASSP] Balanced Binary Neural Networks with Gated Residual. [binarization]
  • [ICML] Training Binary Neural Networks through Learning with Noisy Supervision. [binarization]
  • [IJCAI] CP-NAS: Child-Parent Neural Architecture Search for Binary Neural Networks. [binarization]
  • [CoRR] Training Binary Neural Networks using the Bayesian Learning Rule. [binarization]
  • [Neurocomputing] Eye localization based on weight binarization cascade convolution neural network. [binarization]
  • [DATE] PhoneBit: Efficient GPU-Accelerated Binary Neural Network Inference Engine for Mobile Phones. [binarization] [hardware]

2019

  • [IJCAI] Binarized Collaborative Filtering with Distilling Graph Convolutional Networks. [binarization]
  • [TMM] Compact Hash Code Learning With Binary Deep Neural Network. [binarization]
  • [CVPR] [53:fire:] Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation. [binarization]
  • [NeurIPS] Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization. [binarization] [tensorflow]
  • [MDPI Electronics] A Review of Binarized Neural Networks. [binarization]
  • [AAAI] [31:fire:] Projection Convolutional Neural Networks for 1-bit CNNs via Discrete Back Propagation. [binarization]
  • [IEEE J. Emerg. Sel. Topics Circuits Syst.] Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine. [hardware]
  • [ISOCC] Dual Path Binary Neural Network. [binarization]
  • [paper] [43:fire:] BNN+: Improved Binary Network Training. [binarization]
  • [NeurIPS] [43:fire:] Regularized Binary Network Training. [binarization]
  • [BMVC] [32:fire:] XNOR-Net++: Improved Binary Neural Networks. [binarization]
  • [IEEE TCS.I] Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays. [hardware]
  • [APCCAS] Using Neuroevolved Binary Neural Networks to solve reinforcement learning environments. [binarization] [code]
  • [ICIP] Training Accurate Binary Neural Networks from Scratch. [binarization] [code] [192:star:]
  • [arxiv] Self-Binarizing Networks. [binarization]
  • [CVPR] SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization. [low-bit]
  • [IEEE TCS.I] Recursive Binary Neural Network Training Model for Efficient Usage of On-Chip Memory. [binarization]
  • [CVPR] [48:fire:] Quantization Networks. [binarization] [torch] [82:star:]
  • [RoEduNet] PXNOR: Perturbative Binary Neural Network. [binarization] [code]
  • [ICLR] [37:fire:] ProxQuant: Quantized Neural Networks via Proximal Operators. [binarization] [low-bit] [torch]
  • [NeurIPS] MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization. [low-bit] [torch]
  • [CVPR] Learning Channel-Wise Interactions for Binary Convolutional Neural Networks. [binarization]
  • [CoRR] Improved training of binary networks for human pose estimation and image recognition. [binarization]
  • [CVPR] Fully Quantized Network for Object Detection. [low-bit]
  • [IEEE JETC] [128:fire:] Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices. [hardware]
  • [ICCV] [55:fire:] Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks. [low-bit]
  • [TMM] [45:fire:] Deep Binary Reconstruction for Cross-Modal Hashing. [binarization]
  • [CVPR] [31:fire:] Circulant Binary Convolutional Networks: Enhancing the Performance of 1-bit DCNNs with Circulant Back Propagation. [binarization]
  • [ICUS] Balanced Circulant Binary Convolutional Networks. [binarization]
  • [IEEE J. Solid-State Circuits] An Energy-Efficient Reconfigurable Processor for Binary-and Ternary-Weight Neural Networks With Flexible Data Bit Width. [binarization] [low-bit]
  • [BMVC] Accurate and Compact Convolutional Neural Networks with Trained Binarization. [binarization]
  • [ICLR] An Empirical Study of Binary Neural Networks' Optimisation. [binarization]
  • [VLSI-SoC] A Product Engine for Energy-Efficient Execution of Binary Neural Networks Using Resistive Memories. [binarization] [hardware]
  • [arxiv] Towards Unified INT8 Training for Convolutional Neural Network. [low-bit]
  • [arxiv] daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices. [binarization] [hardware] [code]
  • [CVPR] [36:fire:] Regularizing Activation Distribution for Training Binarized Deep Networks. [binarization]
  • [FPGA] Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA. [binarization] [hardware]
  • [CVPR] A Main/Subsidiary Network Framework for Simplifying Binary Neural Network. [binarization]
  • [CoRR] Matrix and tensor decompositions for training binary neural networks. [binarization]
  • [CoRR] Back to Simplicity: How to Train Accurate BNNs from Scratch? [binarization] [code] [193:star:]
  • [AAAI] Efficient Quantization for Neural Networks with Binary Weights and Low Bitwidth Activations. [low-bit] [binarization]
  • [ICCV] Bayesian Optimized 1-Bit CNNs. [binarization]
  • [IJCAI] Binarized Neural Networks for Resource-Efficient Hashing with Minimizing Quantization Loss. [binarization]
  • [CoRR] Binarized Neural Architecture Search. [binarization]
  • [ICCV] Searching for Accurate Binary Neural Architectures. [binarization]
  • [CoRR] RBCN: Rectified Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs. [binarization]
  • [CVPR] Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit? [binarization]
  • [CoRR] TentacleNet: A Pseudo-Ensemble Template for Accurate Binary Convolutional Neural Networks. [binarization]
  • [GLSVLSI] Binarized Depthwise Separable Neural Network for Object Tracking in FPGA. [binarization] [hardware]

2018

  • [ECCV] [202:fire:] LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks. [low-bit] [tensorflow] [188:star:]
  • [ECCV] [145:fire:] Bi-Real Net: Enhancing the Performance of 1-bit CNNs With Improved Representational Capability and Advanced Training Algorithm. [binarization] [torch] [120:star:]
  • [NeurIPS] [150:fire:] Training Deep Neural Networks with 8-bit Floating Point Numbers. [low-bit]
  • [NeurIPS] [91:fire:] Scalable methods for 8-bit training of neural networks. [low-bit] [torch]
  • [ICLR] [65:fire:] Loss-aware Weight Quantization of Deep Networks. [low-bit] [code]
  • [CVPR] [63:fire:] Two-Step Quantization for Low-bit Neural Networks. [low-bit]
  • [ICLR] [201:fire:] PACT: Parameterized Clipping Activation for Quantized Neural Networks. [low-bit]
  • [AAAI] From Hashing to CNNs: Training BinaryWeight Networks via Hashing. [binarization]
  • [TRETS] [50:fire:] FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks. [low-bit]
  • [MM] BitStream: Efficient Computing Architecture for Real-Time Low-Power Inference of Binary Neural Networks on CPUs. [binarization]
  • [TCAD] XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference. [hardware]
  • [arxiv] Training Competitive Binary Neural Networks from Scratch. [binarization] [code] [192:star:]
  • [ECCV] Training Binary Weight Networks via Semi-Binary Decomposition. [binarization]
  • [CVPR] [67:fire:] SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks. [low-bit] [code]
  • [FCCM] ReBNet: Residual Binarized Neural Network. [binarization] [tensorflow]
  • [CVPR] [630:fire:] Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [low-bit]
  • [ICLR] [230:fire:] Model compression via distillation and quantization. [low-bit] [torch] [284:star:]
  • [FPL] FBNA: A Fully Binarized Neural Network Accelerator. [hardware]
  • [IEEE J. Solid-State Circuits] [66:fire:] BRein Memory: A Single-Chip Binary/Ternary Reconfigurable in-Memory Deep Neural Network Accelerator Achieving 1.4 TOPS at 0.6 W. [hardware] [low-bit] [binarization]
  • [Res Math Sci] Blended coarse gradient descent for full quantization of deep neural networks. [low-bit] [binarization]
  • [IPDPS] BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU. [binarization]
  • [IJCNN] Analysis and Implementation of Simple Dynamic Binary Neural Networks. [binarization]
  • [TVLSI] An Energy-Efficient Architecture for Binary Weight Convolutional Neural Networks. [binarization]
  • [NCA] [88:fire:] A survey of FPGA-based accelerators for convolutional neural networks. [hardware]
  • [AAAI] [136:fire:] Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM. [low-bit] [homepage]
  • [ICLR] [168:fire:] WRPN: Wide Reduced-Precision Networks. [low-bit]
  • [ICLR] [141:fire:] Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy. [low-bit]
  • [ECCV] [47:fire:] TBN: Convolutional Neural Network with Ternary Inputs and Binary Weights. [binarization] [low-bit] [torch]
  • [CVPR] Modulated convolutional networks. [binarization]
  • [CoRR] LightNN: Filling the Gap between Conventional Deep Neural Networks and Binarized Networks. [binarization]
  • [IJCAI] Deterministic Binary Filters for Convolutional Neural Networks. [binarization]
  • [CoRR] BinaryRelax: A Relaxation Approach For Training Deep Neural Networks With Quantized Weights. [binarization]
  • [CAAI] Fast object detection based on binary deep convolution neural networks. [binarization]

2017

  • [ICLR] [119:fire:] Loss-aware Binarization of Deep Networks. [binarization] [code]
  • [ICLR] [222:fire:] Soft Weight-Sharing for Neural Network Compression. [other]
  • [ICLR] [637:fire:] Trained Ternary Quantization. [low-bit] [torch] [90:star:]
  • [NeurIPS] [293:fire:] Towards Accurate Binary Convolutional Neural Network. [binarization] [tensorflow]
  • [arxiv] [71:fire:] Ternary Neural Networks with Fine-Grained Quantization. [low-bit]
  • [arxiv] ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks. [low-bit] [code] [53:star:]
  • [ICCV] [55:fire:] Performance Guaranteed Network Acceleration via High-Order Residual Quantization. [low-bit]
  • [IPDPSW] On-Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA. [hardware]
  • [ICLR] [554:fire:] Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights. [low-bit] [torch] [144:star:]
  • [Neurocomputing] [126:fire:] FP-BNN: Binarized neural network on FPGA. [hardware]
  • [CVPR] [251:fire:] Deep Learning with Low Precision by Half-wave Gaussian Quantization. [low-bit] [code] [118:star:]
  • [MWSCAS] Deep learning binary neural network on an FPGA. [hardware] [binarization]
  • [JETC] A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks. [hardware] [binarization]
  • [FPGA] [463:fire:] FINN: A Framework for Fast, Scalable Binarized Neural Network Inference. [hardware] [binarization]
  • [CVPR] [156:fire:] Local Binary Convolutional Neural Networks. [binarization] [torch] [94:star:]
  • [ICCV] [130:fire:] Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources. [binarization] [homepage] [torch] [207:star:]
  • [CoRR] BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet. [binarization] [code]
  • [InterSpeech] Binary Deep Neural Networks for Speech Recognition. [binarization]

2016

  • [CoRR] [1k:fire:] DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [low-bit] [code] [5.8k:star:]
  • [NeurIPS] [572:fire:] Ternary weight networks. [low-bit] [code] [61:star:]
  • [ECCV] [2.7k:fire:] XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. [binarization] [torch] [787:star:]
  • [NeurIPS] [1.7k:fire:] Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. [binarization] [torch] [239:star:]

2015

  • [NeurIPS] [1.8k:fire:] BinaryConnect: Training Deep Neural Networks with binary weights during propagations. [binarization] [code] [330:star:]
  • [ICML] [191:fire:] Bitwise Neural Networks. [binarization]

Codes and Docs

  • [code] [doc] ZF-Net: An Open Source FPGA CNN Library.
  • [doc] Accelerating CNN inference on FPGAs: A Survey.
  • [code] Different quantization methods implemented in PyTorch.
  • [In Chinese] Quantization Methods.
  • [In Chinese] Run BNN in FPGA.
  • [In Chinese] An Overview of Deep Compression Approaches.
  • [In Chinese] Neural Network Binarization for Embedded Deep Learning (Part 3): FPGA Implementation.

Our Team

Our team is part of the DIG group of the State Key Laboratory of Software Development Environment (SKLSDE), supervised by Prof. Xianglong Liu. The main research goal of our team is compressing and accelerating models in multiple scenarios.

Members

Ruihao Gong

Ruihao Gong is currently a third-year graduate student at Beihang University under the supervision of Prof. Xianglong Liu. Since 2017, he has worked on building computer vision systems and on model quantization as an intern at SenseTime Research, where he enjoyed working with talented researchers and grew a lot with the help of Fengwei Yu, Wei Wu, and Junjie Yan. Early in the internship, he was independently responsible for developing the intelligent video analysis system SenseVideo. Later, he began research on model quantization, which can speed up the inference and even the training of neural networks on edge devices. He is now devoted to further improving the accuracy of extremely low-bit models and the automatic deployment of quantized models.

Haotong Qin

I am a Ph.D. student (Sep 2019 - ) in the State Key Laboratory of Software Development Environment (SKLSDE) and the ShenYuan Honors College at Beihang University, supervised by Prof. Wei Li and Prof. Xianglong Liu. I received a B.Eng. degree in computer science and engineering from Beihang University. I was a research intern (Jun 2020 - Aug 2020) in the WeiXin Group at Tencent. During my undergraduate study, I interned in the Speech group at Microsoft Research Asia (MSRA), supervised by Dr. Wenping Hu. I am interested in deep learning, computer vision, and model compression. My research goal is to enable state-of-the-art neural network models to be deployed successfully on resource-limited hardware, which includes compressing and accelerating models for multiple tasks, as well as flexible and efficient deployment on multiple hardware platforms.

Xiangguo Zhang

Xiangguo Zhang is a second-year graduate student in the School of Computer Science at Beihang University, under the guidance of Prof. Xianglong Liu. He received a bachelor's degree from Shandong University in 2019 and entered Beihang University in the same year. Currently, he is interested in computer vision and post-training quantization.

Yifu Ding

Yifu Ding is a senior student in the School of Computer Science and Engineering at Beihang University. She is in the State Key Laboratory of Software Development Environment (SKLSDE), under the supervision of Prof. Xianglong Liu. Currently, she is interested in computer vision and model quantization. She believes that highly compressed neural network models can be deployed on resource-constrained devices, and that among all compression methods, quantization is a promising one.

Qinghua Yan

I am a senior student in the Sino-French Engineer School at Beihang University. I have just started research on model compression in the State Key Laboratory of Software Development Environment (SKLSDE), under the supervision of Prof. Xianglong Liu. I have great enthusiasm for deep learning and model quantization, and I really enjoy working with my talented teammates.

Xiuying Wei

Xiuying Wei is a first-year graduate student at Beihang University under the supervision of Prof. Xianglong Liu. She received a bachelor's degree from Shandong University in 2020. Currently, she is interested in model quantization. She believes that quantization can make models faster and more robust, enabling deep learning systems to run on low-power devices and opening up more opportunities in the future.

Publications
