
NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend. 'quantized::linear' is only available for these backends: [QuantizedCPU, QuantizedCUDA, BackendSelect, Python,....] #128578

Open · yangyyt opened this issue Jun 13, 2024 · 1 comment
Labels: oncall: quantization (Quantization support in PyTorch)

yangyyt commented Jun 13, 2024

🐛 Describe the bug

After QAT training, inference fails with the following error:
NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'quantized::linear' is only available for these backends: [QuantizedCPU, QuantizedCUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

Training code:
model.modulexx.ar_predict_layer.qconfig = torch.ao.quantization.get_default_qat_qconfig('x86') [and set quant and dequant]
model_quant = torch.ao.quantization.prepare_qat(model, inplace=True)
model.train()
....
save: torch.save(torch.ao.quantization.convert(model.cpu().eval(), inplace=False).state_dict(), filename)

Inference code:
state_dict = torch.load(filename)
model.ar_predict_layer.qconfig = torch.ao.quantization.get_default_qat_qconfig('x86')
model_fp32_prepared = torch.quantization.prepare_qat(model_ar, inplace=True)
model_int8 = torch.quantization.convert(model_fp32_prepared.eval(), inplace=False)
model_int8.load_state_dict(state_dict['model'])
model_ar = model_int8
model_ar.eval()
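
For reference, a minimal sketch that reproduces this class of error (the toy module, names, and sizes below are placeholders, not the real model; only the submodule's qconfig is set and no QuantStub is inserted, so the converted linear receives a plain fp32 tensor):

```python
import torch
import torch.ao.quantization as tq

class M(torch.nn.Module):
    """Toy stand-in for the real model (names and sizes are placeholders)."""
    def __init__(self):
        super().__init__()
        self.ar_predict_layer = torch.nn.Linear(8, 8)

    def forward(self, x):
        # No QuantStub here, so x stays a plain fp32 (CPU backend) tensor.
        return self.ar_predict_layer(x)

m = M()
m.ar_predict_layer.qconfig = tq.get_default_qat_qconfig('x86')
m_prepared = tq.prepare_qat(m.train(), inplace=False)
m_int8 = tq.convert(m_prepared.eval(), inplace=False)
m_int8(torch.randn(1, 8))
# -> NotImplementedError: Could not run 'quantized::linear' with arguments from the 'CPU' backend ...
```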

Versions

torch version: 2.1.2+cu121
python: 3.10

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel @msaroufim

soulitzer added the oncall: quantization label Jun 13, 2024
Xia-Weiwen (Collaborator) commented:

Hi @yangyyt The problem you hit is that the quantized model expects quantized tensors as inputs but received non-quantized (fp32) inputs. This is probably because QuantStub and DeQuantStub were not inserted correctly in the model. You may refer to this tutorial for more details and examples.
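
For example, the eager-mode flow looks roughly like this (a minimal sketch under the assumption that ar_predict_layer is the only quantized submodule; MyModel and the layer sizes are placeholders, not your actual model):

```python
import torch
import torch.ao.quantization as tq

class MyModel(torch.nn.Module):
    """Placeholder model: wrap the quantized region with QuantStub/DeQuantStub."""
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()        # converts the incoming fp32 tensor to a quantized tensor
        self.ar_predict_layer = torch.nn.Linear(8, 8)
        self.dequant = tq.DeQuantStub()    # converts the quantized output back to fp32

    def forward(self, x):
        x = self.quant(x)                  # fp32 -> quantized
        x = self.ar_predict_layer(x)       # runs quantized::linear after convert()
        return self.dequant(x)             # quantized -> fp32

model = MyModel()
qconfig = tq.get_default_qat_qconfig('x86')
# The stubs need a qconfig too, otherwise they are not converted to real
# Quantize/DeQuantize modules and the input stays fp32.
model.ar_predict_layer.qconfig = qconfig
model.quant.qconfig = qconfig
model.dequant.qconfig = qconfig

model_prepared = tq.prepare_qat(model.train(), inplace=False)
# ... run QAT training with model_prepared ...
model_int8 = tq.convert(model_prepared.eval(), inplace=False)
out = model_int8(torch.randn(1, 8))        # fp32 in, fp32 out; the linear now runs on QuantizedCPU
```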
