
NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend. 'quantized::linear' is only available for these backends: [QuantizedCPU, QuantizedCUDA, BackendSelect, Python,....] #128578

Open · yangyyt opened this issue Jun 13, 2024 · 1 comment
Labels: oncall: quantization (Quantization support in PyTorch)

yangyyt commented Jun 13, 2024

🐛 Describe the bug

After QAT training, inference fails with the following error:
NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::linear' is only available for these backends: [QuantizedCPU, QuantizedCUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

Training code:
model.modulexx.ar_predict_layer.qconfig = torch.ao.quantization.get_default_qat_qconfig('x86') [and set quant and dequant]
model_quant = torch.ao.quantization.prepare_qat(model, inplace=True)
model.train()
....
save: torch.save(torch.ao.quantization.convert(model.cpu().eval(), inplace=False).state_dict(), filename)

Inference code:
state_dict = torch.load(filename)
model.ar_predict_layer.qconfig = torch.ao.quantization.get_default_qat_qconfig('x86')
model_fp32_prepared = torch.quantization.prepare_qat(model_ar, inplace=True)
model_int8 = torch.quantization.convert(model_fp32_prepared.eval(), inplace=False)
model_int8.load_state_dict(state_dict['model'])
model_ar = model_int8
model_ar.eval()
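
For reference, a minimal sketch that reproduces this class of error (the toy module, names, and sizes below are placeholders, not the real model; only the submodule's qconfig is set and no QuantStub is inserted, so the converted linear receives a plain fp32 tensor):

```python
import torch
import torch.ao.quantization as tq

class M(torch.nn.Module):
    """Toy stand-in for the real model (names and sizes are placeholders)."""
    def __init__(self):
        super().__init__()
        self.ar_predict_layer = torch.nn.Linear(8, 8)

    def forward(self, x):
        # No QuantStub here, so x stays a plain fp32 (CPU backend) tensor.
        return self.ar_predict_layer(x)

m = M()
m.ar_predict_layer.qconfig = tq.get_default_qat_qconfig('x86')
m_prepared = tq.prepare_qat(m.train(), inplace=False)
m_int8 = tq.convert(m_prepared.eval(), inplace=False)
m_int8(torch.randn(1, 8))
# -> NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend ...
```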

Versions

torch version: 2.1.2+cu121
python: 3.10

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel @msaroufim

soulitzer added the oncall: quantization label Jun 13, 2024
Xia-Weiwen (Collaborator) commented:

Hi @yangyyt The problem you hit is that the quantized model expects quantized tensors as inputs but received non-quantized (fp32) inputs. This is probably because QuantStub and DeQuantStub were not inserted correctly in the model. You may refer to this tutorial for more details and examples.
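
For example, the eager-mode flow looks roughly like this (a minimal sketch under the assumption that ar_predict_layer is the only quantized submodule; MyModel and the layer sizes are placeholders, not your actual model):

```python
import torch
import torch.ao.quantization as tq

class MyModel(torch.nn.Module):
    """Placeholder model: wrap the quantized region with QuantStub/DeQuantStub."""
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()        # converts the incoming fp32 tensor to a quantized tensor
        self.ar_predict_layer = torch.nn.Linear(8, 8)
        self.dequant = tq.DeQuantStub()    # converts the quantized output back to fp32

    def forward(self, x):
        x = self.quant(x)                  # fp32 -> quantized
        x = self.ar_predict_layer(x)       # runs quantized::linear after convert()
        return self.dequant(x)             # quantized -> fp32

model = MyModel()
qconfig = tq.get_default_qat_qconfig('x86')
# The stubs need a qconfig too, otherwise they are not converted to real
# Quantize/DeQuantize modules and the input stays fp32.
model.ar_predict_layer.qconfig = qconfig
model.quant.qconfig = qconfig
model.dequant.qconfig = qconfig

model_prepared = tq.prepare_qat(model.train(), inplace=False)
# ... run QAT training with model_prepared ...
model_int8 = tq.convert(model_prepared.eval(), inplace=False)
out = model_int8(torch.randn(1, 8))        # fp32 in, fp32 out; the linear now runs on QuantizedCPU
```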
