
Static quantization issue #24

Closed
v1nc3nt27 opened this issue Oct 6, 2021 · 4 comments
v1nc3nt27 commented Oct 6, 2021

Hello,

thanks a lot for your code and examples! I'm trying to get static quantization working in the example code, but I always get:

NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::linear' is only available for these backends: [QuantizedCPU, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, Tracer, Autocast, Batched, VmapMode].

Could you please give me a hint on how to get this running? From what I have found out, we need to add Quant() and DeQuant() layers at the beginning and end of the BERT model, is that correct? If so, is there already a class that I can use?

@echarlaix
Collaborator

We have recently enabled the PyTorch FX graph mode for quantization aware training and post-training quantization, so there is no need to add QuantStub and DeQuantStub to the model anymore. If you are interested in using the PyTorch eager mode for post-training quantization and in how the model was modified by the Intel Neural Compressor team, you can take a look at: https://github.com/intel/neural-compressor/blob/master/examples/pytorch/eager/language_translation/ptq/transformers/modeling_bert.py
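
For illustration, here is a minimal sketch of FX graph mode post-training static quantization in plain PyTorch (not taken from this repository; `model`, `calibration_dataloader`, and the qconfig choice are placeholders, and newer PyTorch versions also expect an `example_inputs` argument in `prepare_fx`):

```python
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

model.eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}

# Insert observers via graph rewriting; no QuantStub/DeQuantStub needed in the model code.
prepared_model = prepare_fx(model, qconfig_dict)

# Calibrate with a few representative batches.
with torch.no_grad():
    for batch in calibration_dataloader:
        prepared_model(**batch)

# Replace observed modules with quantized ones.
quantized_model = convert_fx(prepared_model)
```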

@v1nc3nt27
Author

Thanks a lot @echarlaix. Works like a charm with torch FX graph mode. However, when running the test (https://github.com/huggingface/optimum/blob/main/tests/intel/test_lpot.py) for static quantization and changing the batch size, I always run into an error if the number of samples divided by the batch size has a remainder, so that the last batch has fewer samples than the actual batch size (e.g. with batch size 16):

[ERROR] Unexpected exception RuntimeError("shape '[16, 128, 12, 64]' is invalid for input of size 786432") happened during tuning.

Is there some way to circumvent this?

@echarlaix
Collaborator

Our tracing of the model with torch fx currently does not support dynamic input shapes; we are working towards it. In the meantime, a simple fix could be to set dataloader_drop_last of the TrainingArguments to True.
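
For example (a sketch, not from the original thread; the output_dir and batch size values are placeholders):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./quantization_output",   # placeholder path
    per_device_eval_batch_size=16,
    dataloader_drop_last=True,  # drop the incomplete last batch so every batch keeps the traced static shape
)
```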

@v1nc3nt27
Author

That would definitely be a good solution for training. Since I do not want to lose any samples at inference, I just duplicated the last sample to fill up the batch and removed it again afterwards :).
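
Roughly like this (a sketch of the workaround, not the exact code used in the thread; the `pad_batch` helper and the dict-of-tensors batch layout are assumptions):

```python
import torch

def pad_batch(batch, batch_size):
    """Repeat the last sample until the batch has `batch_size` rows."""
    n_missing = batch_size - batch["input_ids"].shape[0]
    if n_missing <= 0:
        return batch, 0
    padded = {
        # duplicate the last row n_missing times along the batch dimension
        k: torch.cat([v, v[-1:].repeat(n_missing, *[1] * (v.dim() - 1))], dim=0)
        for k, v in batch.items()
    }
    return padded, n_missing

# After inference, drop the duplicated rows again, e.g.:
# outputs = outputs[: outputs.shape[0] - n_missing]
```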

I'll close the issue since my initial problem is fixed with torch fx. Thanks for your help!
