After calling `sim.export(path=config.logdir, filename_prefix='quant_model')`, I get 3 files: quant_model.encodings, quant_model.encodings.yaml, and quant_model.onnx.
However, when quant_model.onnx is visualized in Netron, it shows up as an FP32 ONNX model. Its file size is also the same as the original FP32 model's.
How can I get a real INT8 ONNX model that contains quantized ops and has a smaller file size than the original FP32 model?
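For context, this is roughly the flow I'm following (a sketch, not my exact script: `model`, `calib_loader`, and the input shape are placeholders, and the exact constructor/export signatures vary across AIMET versions):

```python
import torch
from aimet_torch.quantsim import QuantizationSimModel

# Placeholder input shape for illustration only.
dummy_input = torch.randn(1, 3, 224, 224)

sim = QuantizationSimModel(model, dummy_input=dummy_input)

def forward_pass(model, _):
    # Run a handful of calibration batches so AIMET can observe
    # activation ranges and compute quantization encodings.
    model.eval()
    with torch.no_grad():
        for i, (images, _) in enumerate(calib_loader):
            model(images)
            if i >= 10:
                break

sim.compute_encodings(forward_pass_callback=forward_pass,
                      forward_pass_callback_args=None)

# Produces quant_model.onnx plus quant_model.encodings /
# quant_model.encodings.yaml holding the quantization parameters.
sim.export(path=config.logdir, filename_prefix='quant_model')
```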
Hi @JiliangNi, AIMET offers quantization simulation: it simulates quantization noise by quantizing and then dequantizing each tensor, so the exported ONNX graph and weights intentionally stay FP32. To get actual INT8 execution, you will have to run your model on quantization-capable hardware.
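To make the quantize-dequantize idea concrete, here is a toy sketch (plain NumPy, not AIMET internals) of what the simulation does to a tensor: values are snapped to an INT8 grid but come back out as FP32, which is why the exported model's dtype and file size don't change.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    # Asymmetric uniform quantization over the observed range.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = np.round(qmin - x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), qmin, qmax)  # snap to INT8 grid
    return ((q - zero_point) * scale).astype(np.float32)       # dequantize back to FP32

x = np.random.randn(4).astype(np.float32)
print(x)                  # original FP32 values
print(fake_quantize(x))   # FP32 values carrying simulated INT8 rounding noise
```

The scale/offset parameters that define this grid are what land in the exported .encodings files; a quantized runtime consumes them to execute the model truly in INT8.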