Description
Hi Team,
I am trying to quantize a BERT Seq2Seq model (an ONNX model) with Intel Neural Compressor.
The process gets stuck after the FP32 baseline step, and no quantization happens. I even relaxed the relative accuracy criterion up to 0.80 (80%). Please guide.
Stuck at this:
2022-11-09 06:56:20 [INFO] Get FP32 model baseline.
Accuracy is : 50.2
mean: 16.99 ms, std: 2.8 ms, min: 12.7 ms, max: 33.83 ms, median: 16.31 ms, 95p: 21.73 ms, 99p: 26.9 ms
2022-11-09 06:56:58 [INFO] Save tuning history to /home/jupyter/spell_projects/nc_workspace/2022-11-09_06-56-07/./history.snapshot.
2022-11-09 06:56:58 [INFO] FP32 baseline is: [Accuracy: 50.2000, Duration (seconds): 38.4878]
No update after this.
YAML Property File
version: 1.0

model:                                    # mandatory. used to specify model specific information.
  name: bert
  framework: onnxrt_integerops            # mandatory. possible values are tensorflow, mxnet, pytorch, pytorch_ipex, onnxrt_integerops and onnxrt_qlinearops.

quantization:
  approach: post_training_dynamic_quant   # optional. default value is post_training_static_quant.

tuning:
  accuracy_criterion:
    relative: 0.80                        # optional. the criterion defaults to relative; the other option is absolute. this setting allows up to 80% relative accuracy loss.
  exit_policy:
    timeout: 0                            # optional. tuning timeout (seconds). default value is 0, which means early stop. combine with the max_trials field to decide when to exit.
    max_trials: 10
  random_seed: 5271
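For reference, this is how I understand the relative criterion is applied (a minimal sketch of the arithmetic; the formula is my assumption about Neural Compressor's behaviour, not taken from its source):

# Assumed interpretation of accuracy_criterion.relative for a higher-is-better metric.
baseline_accuracy = 50.2                          # FP32 baseline from the log above
relative_loss_allowed = 0.80                      # value set in the YAML
acceptable_accuracy = baseline_accuracy * (1 - relative_loss_allowed)
print(acceptable_accuracy)                        # 10.04, so almost any candidate should pass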
import os

import onnx
import transformers
from neural_compressor.experimental import Quantization, common

def inc_quantize(MODEL_CONFIG):
    # Load the exported FP32 ONNX model.
    ONNX_MODEL_FILE = os.path.join(MODEL_CONFIG['ONNX_MODEL_PATH'], MODEL_CONFIG['ONNX_MODEL_FILE_NAME'])
    model = onnx.load(ONNX_MODEL_FILE)

    # Build the quantizer from the YAML config; override the relative accuracy criterion.
    quantizer = Quantization(MODEL_CONFIG['QUANT_CONFIG'])
    quantizer.cfg.tuning.accuracy_criterion.relative = 0.8
    quantizer.model = common.Model(model)
    quantizer.tokenizer = transformers.AutoTokenizer.from_pretrained(
        MODEL_CONFIG['BERT_MODEL_TYPE'])
    quantizer.eval_func = run_evaluation_benchmark
    q_model = quantizer()

    # Save the quantized model next to the FP32 one.
    QUANTIZED_SAVE_PATH = os.path.join(MODEL_CONFIG['ONNX_MODEL_PATH'], MODEL_CONFIG['ONNX_QUANTIZED_FILE_NAME'])
    q_model.save(QUANTIZED_SAVE_PATH)
    print("QUANTIZED MODEL Saved at: ", QUANTIZED_SAVE_PATH)