
Quantization on trained model #6

Open
shon-otmazgin opened this issue Mar 31, 2021 · 10 comments
Labels: question (Further information is requested)

Comments

@shon-otmazgin

❓ Questions and Help

Hello,
Great paper, kudos!
After reading it, I was wondering: is it possible to apply these quantization methods to an already-trained model from Hugging Face Transformers, or do we have to re-train the model and use I-BERT?

shon-otmazgin added the question label on Mar 31, 2021
@kssteven418
Owner

Thanks for your interest!

First of all, HF and fairseq (the current repo) are two different implementations of I-BERT and are independent of each other. You can use either one.
Either way, you start with a pre-trained RoBERTa model (we do not currently support other models, but you can easily implement it for whatever model you target by referring to our implementation!), which you then have to finetune on your target downstream task (e.g., MNLI, SQuAD, etc.). After that, you can quantize the model and recover accuracy via quantization-aware retraining. That is to say, there are no checkpoints provided for quantized models.
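On the HF side, the two-stage flow looks roughly like the sketch below. This is only a sketch: it assumes the kssteven/ibert-roberta-base checkpoint and the quant_mode flag in IBertConfig, and the training loops themselves are left to your own setup.

```python
# Sketch of the two-stage flow described above, assuming the HF I-BERT
# integration (IBertConfig / IBertForSequenceClassification and quant_mode).
from transformers import IBertConfig, IBertForSequenceClassification

# Stage 1: full-precision finetuning on the downstream task.
config = IBertConfig.from_pretrained("kssteven/ibert-roberta-base", quant_mode=False)
model = IBertForSequenceClassification.from_pretrained(
    "kssteven/ibert-roberta-base", config=config
)
# ... finetune `model` on your task (e.g., MNLI) and save the checkpoint ...

# Stage 2: reload the finetuned checkpoint with quantization enabled and
# continue training (quantization-aware retraining) to recover accuracy.
config = IBertConfig.from_pretrained("path/to/finetuned-checkpoint", quant_mode=True)
model = IBertForSequenceClassification.from_pretrained(
    "path/to/finetuned-checkpoint", config=config
)
# ... run the same training loop again (QAT) and evaluate ...
```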

Hope this answers your question, and please let me know if it doesn't.

@shon-otmazgin
Author

Hello @kssteven418,
OK. I want to finetune it on my custom task. Is that possible, or are only the GLUE datasets supported?

@kssteven418
Copy link
Owner

It is not restricted to specific tasks, so you can finetune it on your own task.

@shon-otmazgin
Author

shon-otmazgin commented Mar 31, 2021

Let me rephrase my question.
Basically, what I am trying to do is quantize each layer in my pretrained model with your suggested quant modules.
For simplicity, let's try to quantize only the nn.Linear layers for now.

@kssteven418 can you give me a hint where to look, and how to convert these layers? Let's ignore quantization-aware finetuning for now; I want to see how much accuracy degrades as inference speed increases.

My task is fast coreference resolution, and combining it with quantization might make it practical to use.

Thanks !

@bdalal

bdalal commented Mar 31, 2021

You can use it on any model. I'm currently evaluating the quantized modules applied to DistilBERT from HF, and so far it seems to be working. You essentially need to replace the various layers with their QAT counterparts and then make sure that your activations are correctly requantized where needed (the details are in the paper and the I-BERT code). A rough sketch of the nn.Linear part of that swap is below.
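This is only a sketch of the idea, using the QuantLinear module and its set_param() method from this repo (with the same bit-width arguments used later in this thread); swap_linear_layers is a hypothetical helper, not part of the codebase.

```python
import torch.nn as nn
from fairseq.quantization.utils.quant_modules import QuantLinear

def swap_linear_layers(module, weight_bit=8, bias_bit=8):
    """Recursively replace every nn.Linear in `module` with a QuantLinear
    initialized from the original fp32 weights (hypothetical helper)."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            qlinear = QuantLinear(weight_bit=weight_bit, bias_bit=bias_bit,
                                  quant_mode='symmetric')
            qlinear.set_param(child)       # copy weight/bias from the fp32 layer
            setattr(module, name, qlinear)
        else:
            swap_linear_layers(child, weight_bit, bias_bit)

# swap_linear_layers(model)
# Note: QuantLinear.forward also expects an activation scaling factor, so the
# surrounding modules have to pass one along (see the discussion below).
```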

@shon-otmazgin
Author

shon-otmazgin commented Mar 31, 2021

@bdalal here is my example:

from fairseq.quantization.utils.quant_modules import QuantLinear

linear = model.layer1  # an existing nn.Linear layer in my model

qlinear = QuantLinear(weight_bit=8, bias_bit=8, quant_mode='symmetric')
qlinear.set_param(linear)  # copy the fp32 weights and bias into the quantized layer

Now I have a QuantLinear. What I can't understand is that when calling forward I need to pass prev_act_scaling_factor.

@kssteven418 what does it do, and what should I pass there?

@bdalal

bdalal commented Apr 1, 2021

You'd need to start with the embedding layer. The way I did it in HF was to pull the DistilBERT code in next to their I-BERT code and then replace every Embedding, Linear, Softmax, GELU, and LayerNorm layer with its corresponding quantized module. Not sure if this helps. I'd suggest looking at their HF code, because it's much easier to understand how the QAT works there. Roughly, the modules chain together as in the sketch below.
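A very rough sketch of that chaining, assuming QuantAct returns an (activation, scaling_factor) pair and QuantLinear.forward accepts that scale as prev_act_scaling_factor; double-check the exact constructor arguments and forward signatures in quant_modules.py.

```python
import torch
import torch.nn as nn
from fairseq.quantization.utils.quant_modules import QuantAct, QuantLinear

linear = nn.Linear(768, 768)   # stand-in for a layer from the model
x = torch.randn(1, 768)        # stand-in activation

act_quant = QuantAct(8)        # 8-bit activation quantizer (args may differ)
qlinear = QuantLinear(weight_bit=8, bias_bit=8, quant_mode='symmetric')
qlinear.set_param(linear)      # copy fp32 weight/bias, as in the snippet above

x, act_scaling_factor = act_quant(x)                    # quantized activation + its scale
x, out_scaling_factor = qlinear(x, act_scaling_factor)  # that scale is what forward
                                                        # expects as prev_act_scaling_factor
```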

@shon-otmazgin
Author

@bdalal Can you share your DistilBERT implementation?

@bdalal

bdalal commented Apr 3, 2021

I'll be pushing it to github early next week and I'll share the link once I do.

@bdalal

bdalal commented Apr 7, 2021

@shon-otmazgin I've pushed my implementation. You can find it here

There's some instability during training but I haven't gotten around to troubleshooting it.
