Static quantization issue #24
We have recently enabled the PyTorch FX graph mode for quantization-aware training and post-training quantization, so there is no need to add QuantStub and DeQuantStub to the model anymore. If you are interested in using the PyTorch eager mode for post-training quantization, and in how the model was modified by the Intel Neural Compressor team, you can take a look at: https://github.com/intel/neural-compressor/blob/master/examples/pytorch/eager/language_translation/ptq/transformers/modeling_bert.py
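To illustrate the eager-mode requirement mentioned above, here is a minimal sketch of PyTorch eager-mode post-training static quantization. The `TinyModel` class and its layer sizes are illustrative, not taken from the linked BERT example; the point is where `QuantStub`/`DeQuantStub` sit relative to the quantized op:

```python
# Minimal sketch: eager-mode post-training static quantization in PyTorch.
# Without QuantStub/DeQuantStub, the converted model would feed a plain CPU
# float tensor into quantized::linear and raise the NotImplementedError below.
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # QuantStub/DeQuantStub mark where tensors convert between
        # float and quantized representations in eager mode.
        self.quant = torch.quantization.QuantStub()
        self.linear = nn.Linear(4, 2)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)       # float -> quantized
        x = self.linear(x)      # runs as quantized::linear after convert()
        return self.dequant(x)  # quantized -> float

model = TinyModel().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)
prepared(torch.randn(8, 4))                     # calibration pass (observers record ranges)
quantized = torch.quantization.convert(prepared)
out = quantized(torch.randn(8, 4))              # runs on the QuantizedCPU backend
```

With FX graph mode (`torch.quantization.quantize_fx`), these stubs are inserted automatically during tracing, which is why they are no longer needed.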
Thanks a lot @echarlaix. Works like a charm with torch FX graph mode. However, when running the static quantization test (https://github.com/huggingface/optimum/blob/main/tests/intel/test_lpot.py) and changing the batch size, I always run into an error whenever the number of samples divided by the batch size leaves a remainder, so that the last batch has fewer samples than the actual batch size (e.g. with batch size 16):
Is there some way to circumvent this?
Currently our tracing of the model with torch FX does not support dynamic input shapes, and we are working towards it. In the meantime, a simple fix could be to set `dataloader_drop_last` of the `TrainingArguments` to `True`.
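For context, `TrainingArguments(dataloader_drop_last=True)` ultimately sets `drop_last=True` on the underlying `torch.utils.data.DataLoader`, so the effect can be sketched with plain PyTorch (the dataset of 35 samples and batch size 16 below are illustrative):

```python
# Sketch of what dataloader_drop_last=True does: the incomplete final
# batch (35 % 16 = 3 samples here) is dropped, so every batch the traced
# model sees has exactly the shape it was traced with.
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(35, dtype=torch.float32).unsqueeze(1))
loader = DataLoader(ds, batch_size=16, drop_last=True)
batches = [b[0].shape[0] for b in loader]  # [16, 16] -- the 3-sample remainder is gone
```

This avoids the shape mismatch but silently skips the trailing samples, which matters at inference time (see the workaround below).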
That would definitely be a good solution for training. Since I do not want to lose any samples at inference, I just duplicated the last sample to fill up the batch and later removed the duplicates again :). I'll close the issue since my initial problem is fixed with torch FX. Thanks for your help!
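The duplicate-and-trim workaround described above can be sketched as follows. `pad_batch` is a hypothetical helper, and the doubling lambda stands in for the actual quantized model call:

```python
# Sketch: pad the last batch by repeating its final sample so it matches
# the fixed batch size the traced model expects, then drop the padded
# outputs so no real sample is lost.
def pad_batch(batch, batch_size):
    """Duplicate the last sample until the batch is full; return (padded, original length)."""
    n = len(batch)
    if n < batch_size:
        batch = batch + [batch[-1]] * (batch_size - n)
    return batch, n

samples = list(range(35))       # 35 samples, batch size 16 -> last batch has 3
outputs = []
for i in range(0, len(samples), 16):
    padded, n = pad_batch(samples[i:i + 16], 16)
    preds = [x * 2 for x in padded]  # stand-in for the quantized model forward pass
    outputs.extend(preds[:n])        # keep only predictions for real samples
```

Every batch handed to the model has length 16, yet `outputs` contains exactly one prediction per original sample.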
Hello,
thanks a lot for your code and examples! I'm trying to get static quantization working in the example code, but I always get
NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::linear' is only available for these backends: [QuantizedCPU, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, Tracer, Autocast, Batched, VmapMode].
Could you please give me a hint on how to get this running? From what I have found out, we need to add QuantStub() and DeQuantStub() layers at the beginning and end of the BERT model? If so, is there already a class that I can use?