Support Quantization Aware Fine-tuning in all models (pytorch) #10639

Closed
sai-prasanna opened this issue Mar 11, 2021 · 7 comments

Comments

@sai-prasanna

🚀 Feature request

PyTorch supports mimicking quantization errors while training models (quantization-aware training).
Here is the tutorial on this. For our NLP transformers, it requires a "fake quantization" operation to be applied to the embeddings. I found this repository that converts BERT to support this.
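For reference, a minimal sketch of PyTorch's eager-mode QAT workflow on a toy module (the model, data, and hyperparameters below are placeholders for illustration, not a proposed transformers API):

```python
import torch
from torch import nn

# Toy module standing in for a transformer head (hypothetical, for illustration only).
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # fp32 -> int8 boundary
        self.linear = nn.Linear(16, 2)
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> fp32 boundary

    def forward(self, x):
        return self.dequant(self.linear(self.quant(x)))

model = TinyModel().train()
# Attach a QAT qconfig: prepare_qat swaps in FakeQuantize observers for weights/activations.
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)

# Fine-tune as usual; quantization noise is simulated in every forward pass.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(10):
    x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Convert to a real int8 model for inference once fine-tuning is done.
model.eval()
quantized = torch.quantization.convert(model)
```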

Motivation

I think quantization-aware fine-tuning (if it works) will help a lot of use cases where dynamic quantization alone doesn't suffice to maintain the performance of the quantized model. Supporting it out of the box will remove the duplication of model code in end use cases.

Your contribution

I can work on this ASAP. I would appreciate initial thoughts on what the MVP for it would be, any thoughts on the API (should we take in a "qat" boolean in the config?), any pitfalls that I should be aware of, etc.

@LysandreJik
Member

Hello! Would I-BERT, available on master and contributed by @kssteven418, be of interest?

@sai-prasanna
Author

@LysandreJik, thanks for the useful reference. I gather the I-BERT model manually re-implements the architectural components (kernels, int8 layer norm, etc.) to make quantization work for BERT. If I am not wrong, their objective is to train BERT as much as possible in int8. QAT in PyTorch takes the approach of training the model fully in floating point while incorporating noise into the gradients that mimics the noise due to quantization. So it basically hands the "optimizing for quantization error" part to gradient descent, foregoing any need to alter the architecture or the fp32/fp16 training regime.

This approach would be broader and would apply to all architectures without re-implementation. Maybe we can have a "qat" flag in the config that can be used to perform fake quantization and dequantization (which introduces quantization noise into parts of the gradients); a rough sketch of that operation follows below.
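To illustrate what that fake quantize-dequantize could look like, here is a rough sketch of symmetric per-tensor fake quantization with a straight-through estimator (the function name and default bit width are just for illustration; this is not the exact Q8BERT or torch.quantization code):

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric per-tensor fake quantize-dequantize with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1                        # 127 for int8
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = (x / scale).round().clamp(-qmax, qmax) * scale  # quantize, then dequantize
    # Straight-through estimator: the forward pass sees the quantized value,
    # while the backward pass treats quantization as the identity.
    return x + (x_q - x).detach()
```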

@LysandreJik
Member

Do you have an idea of the changes required for that? Could you do a PoC and show us, so that we can discuss it?

@sai-prasanna
Author

@LysandreJik Can you take a look at this implementation? It's a working QAT BERT fine-tuning implementation. The process is described in this paper, Q8BERT: Quantized 8Bit BERT.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@pie3636
Contributor

pie3636 commented May 25, 2021

This is a feature I'd like to see as well, as dynamic quantization leads to a huge accuracy drop in my use case. My understanding is that a possible implementation of QAT could also easily be expanded to support static quantization.
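For comparison, a sketch of the eager-mode post-training static quantization flow that such an implementation could be expanded toward (reusing the hypothetical TinyModel from the QAT example above; the calibration data is random purely for illustration):

```python
import torch

# Reuse the hypothetical TinyModel from the QAT example above.
model = TinyModel().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
torch.quantization.prepare(model, inplace=True)        # insert observers

# Calibrate on a few representative batches instead of fine-tuning.
with torch.no_grad():
    for _ in range(10):
        model(torch.randn(8, 16))

quantized = torch.quantization.convert(model)          # real int8 weights and ops
```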

@rohanshingade

@sai-prasanna Is it possible to load BERT-base (FP32 model) weights into Q8BERT?
