
Quantization on trained model #6

Open
shon-otmazgin opened this issue Mar 31, 2021 · 10 comments
Labels: question (Further information is requested)

Comments

@shon-otmazgin

❓ Questions and Help

Hello,
Great paper, kudos!
After reading it, I was wondering: is it possible to apply these quantization methods to an already-trained model from Hugging Face Transformers, or do we have to re-train the model and use I-BERT?

shon-otmazgin added the question label on Mar 31, 2021
@kssteven418
Owner

Thanks for your interest!

First of all, HF and fairseq (the current repo) are two different implementations of I-BERT and are independent of each other. You can use either one.
Either way, you start with a pre-trained RoBERTa model (we do not currently support other models, but you can easily implement it for whatever model you target by referring to our implementation!), which you then have to finetune on your target downstream task (e.g., MNLI, SQuAD, etc.). After that, you can quantize the model and recover accuracy via quantization-aware retraining. That is to say, there are no checkpoints provided for quantized models.
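On the HF side, the two-stage flow looks roughly like the sketch below. This is only a sketch: it assumes the kssteven/ibert-roberta-base checkpoint and the quant_mode flag in IBertConfig, and the training loops themselves are left to your own setup.

```python
# Sketch of the two-stage flow described above, assuming the HF I-BERT
# integration (IBertConfig / IBertForSequenceClassification and quant_mode).
from transformers import IBertConfig, IBertForSequenceClassification

# Stage 1: full-precision finetuning on the downstream task.
config = IBertConfig.from_pretrained("kssteven/ibert-roberta-base", quant_mode=False)
model = IBertForSequenceClassification.from_pretrained(
    "kssteven/ibert-roberta-base", config=config
)
# ... finetune `model` on your task (e.g., MNLI) and save the checkpoint ...

# Stage 2: reload the finetuned checkpoint with quantization enabled and
# continue training (quantization-aware retraining) to recover accuracy.
config = IBertConfig.from_pretrained("path/to/finetuned-checkpoint", quant_mode=True)
model = IBertForSequenceClassification.from_pretrained(
    "path/to/finetuned-checkpoint", config=config
)
# ... run the same training loop again (QAT) and evaluate ...
```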

Hope this answers your question, and please let me know if it doesn't.

@shon-otmazgin
Author

Hello @kssteven418,
OK. I want to finetune it on my custom task. Is that possible, or are only the GLUE datasets supported?

@kssteven418
Copy link
Owner

It is not restricted to specific tasks, so you can finetune it on your own task.

@shon-otmazgin
Author

shon-otmazgin commented Mar 31, 2021

Let me rephrase my question.
Basically, what I am trying to do is quantize each layer in my pretrained model with your suggested quant modules.
For simplicity, let's try to quantize only the nn.Linear layers for now.

@kssteven418 can you give me a hint where to look, and how to convert these layers? Let's ignore quantization-aware finetuning for now; I want to see how much accuracy degrades as inference speed increases.

My task is fast coreference resolution, and combining it with quantization might make it practical to use.

Thanks !

@bdalal

bdalal commented Mar 31, 2021

You can use it on any model. I'm currently evaluating the quantized modules applied to DistilBERT from HF, and so far it seems to be working. You essentially need to replace the various layers with their QAT counterparts and then make sure that your activations are correctly requantized where needed (the details are in the paper and the I-BERT code). A rough sketch of the nn.Linear part of that swap is below.
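This is only a sketch of the idea, using the QuantLinear module and its set_param() method from this repo (with the same bit-width arguments used later in this thread); swap_linear_layers is a hypothetical helper, not part of the codebase.

```python
import torch.nn as nn
from fairseq.quantization.utils.quant_modules import QuantLinear

def swap_linear_layers(module, weight_bit=8, bias_bit=8):
    """Recursively replace every nn.Linear in `module` with a QuantLinear
    initialized from the original fp32 weights (hypothetical helper)."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            qlinear = QuantLinear(weight_bit=weight_bit, bias_bit=bias_bit,
                                  quant_mode='symmetric')
            qlinear.set_param(child)       # copy weight/bias from the fp32 layer
            setattr(module, name, qlinear)
        else:
            swap_linear_layers(child, weight_bit, bias_bit)

# swap_linear_layers(model)
# Note: QuantLinear.forward also expects an activation scaling factor, so the
# surrounding modules have to pass one along (see the discussion below).
```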

@shon-otmazgin
Author

shon-otmazgin commented Mar 31, 2021

@bdalal here is my example:

from fairseq.quantization.utils.quant_modules import QuantLinear

linear = model.layer1  # an existing nn.Linear layer in my model

qlinear = QuantLinear(weight_bit=8, bias_bit=8, quant_mode='symmetric')
qlinear.set_param(linear)  # copy the fp32 weights and bias into the quantized layer

Now I have a QuantLinear. What I can't understand is that when calling forward I need to pass prev_act_scaling_factor.

@kssteven418 what does it do, and what should I pass there?

@bdalal

bdalal commented Apr 1, 2021

You'd need to start with the embedding layer. The way I did it in HF was to pull the DistilBERT code in next to their I-BERT code and then replace every Embedding, Linear, Softmax, GELU, and LayerNorm layer with its corresponding quantized module. Not sure if this helps. I'd suggest looking at their HF code, because it's much easier to understand how the QAT works there. Roughly, the modules chain together as in the sketch below.
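A very rough sketch of that chaining, assuming QuantAct returns an (activation, scaling_factor) pair and QuantLinear.forward accepts that scale as prev_act_scaling_factor; double-check the exact constructor arguments and forward signatures in quant_modules.py.

```python
import torch
import torch.nn as nn
from fairseq.quantization.utils.quant_modules import QuantAct, QuantLinear

linear = nn.Linear(768, 768)   # stand-in for a layer from the model
x = torch.randn(1, 768)        # stand-in activation

act_quant = QuantAct(8)        # 8-bit activation quantizer (args may differ)
qlinear = QuantLinear(weight_bit=8, bias_bit=8, quant_mode='symmetric')
qlinear.set_param(linear)      # copy fp32 weight/bias, as in the snippet above

x, act_scaling_factor = act_quant(x)                    # quantized activation + its scale
x, out_scaling_factor = qlinear(x, act_scaling_factor)  # that scale is what forward
                                                        # expects as prev_act_scaling_factor
```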

@shon-otmazgin
Author

@bdalal Can you share your DistilBERT implementation?

@bdalal

bdalal commented Apr 3, 2021

I'll be pushing it to github early next week and I'll share the link once I do.

@bdalal

bdalal commented Apr 7, 2021

@shon-otmazgin I've pushed my implementation. You can find it here

There's some instability during training but I haven't gotten around to troubleshooting it.
