Add quantized yolov4 model #521
Conversation
Signed-off-by: Xinyu Ye <xinyu.ye@intel.com>
Force-pushed from 1a0c129 to c41941f
Hi @jcwchen, I have tested it in my local Linux env with the command
Hi @XinyuYe-Intel,
No, I use a Xeon Gold 6248 processor. I checked /proc/cpuinfo; 'avx512_vnni' is absent.
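A quick way to reproduce this check on a Linux machine is to look for the flag in /proc/cpuinfo, for example with the short snippet below (illustrative only, not part of the PR):

```python
# Check whether the CPU advertises the avx512_vnni flag (Linux only).
with open("/proc/cpuinfo") as f:
    flags = f.read()
print("avx512_vnni" in flags)
```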
That's probably why the current CI fails: I believe most GitHub Actions machines do have VNNI support (although some of them do not). It's an existing issue that the CI in the ONNX Model Zoo shows different ORT behavior with or without VNNI support (#522), so sometimes the CI will fail. I will try to prioritize solving it, since this inconsistent CI is really confusing. Still, I believe all outputs of existing int8 models in the ONNX Model Zoo were produced by ORT with VNNI support. (@mengniwang95 please correct me if I am wrong. Thanks!) It seems to me that we should make all outputs of int8 models be produced with VNNI support for consistency. If I understand correctly, could you please produce this output on a machine with VNNI support? Thank you.
Hi @jcwchen, existing int8 models are all generated with VNNI support.
Sure, I'll reproduce it. Thanks for your help!
Signed-off-by: Xinyu Ye <xinyu.ye@intel.com>
Signed-off-by: Xinyu Ye <xinyu.ye@intel.com>
Sorry for getting back to you late. I just merged my PR to improve the CIs: #526. Ideally the CI should be consistent now (it skips the ORT test if the CI machine doesn't have VNNI support). I think the Windows CI failed because it has VNNI support and the inferred result is different from yours. To confirm: did you produce the output.pb on a machine with VNNI support? If so, there might be another issue causing this behavior difference...
No problem. I followed the advice of @mengniwang95: I produced yolov4-int8.onnx from yolov4.onnx on a VNNI-supported Linux machine, and produced the test_data_set on a Linux machine without VNNI support; that step doesn't involve *.pb.
Thanks for the context! Could you please regenerate the test_data_set on a Linux machine with VNNI support? Then it should pass the CIs.
Sure, I'll try it.
Signed-off-by: Xinyu Ye <xinyu.ye@intel.com>
Thanks for updating the output.pb! But the updated one is still not reproducible on the CI machine, which has avx512 support, and the difference is not trivial... I am trying to figure out the root cause of this behavior difference -- did you produce the output.pb with the latest ONNX Runtime (1.11) and an avx512 machine? The only reason I can think of is that the GitHub Actions machines only have avx512f support and do not have avx512_vnni support, but in the past the CI didn't encounter such a significant result difference with int8 test data...
Yes, on the avx512_vnni-supported machine, I produced the yolov4 int8 model with onnx 1.11.0 and onnxruntime 1.10.0.
YOLOv4
Description
YOLOv4 optimizes the speed and accuracy of object detection. It is twice as fast as EfficientDet. It improves YOLOv3's AP and FPS by 10% and 12%, respectively, with an mAP50 of 52.32 on the COCO 2017 dataset and 41.7 FPS on a Tesla V100.
Model
Source
Tensorflow YOLOv4 => ONNX YOLOv4
Inference
Conversion
A tutorial for the conversion process can be found in the conversion notebook.
Validation of the converted model and a graph representation of it can be found in the validation notebook.
Running inference
A tutorial for running inference using onnxruntime can be found in the inference notebook.
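For a quick smoke test outside the notebook, a minimal onnxruntime sketch along these lines should work (the model filename and the random input below are assumptions; real usage needs the preprocessing and postprocessing described later):

```python
import numpy as np
import onnxruntime as ort

# Load the converted model (path is an assumption; adjust to where yolov4.onnx lives).
sess = ort.InferenceSession("yolov4.onnx")
input_name = sess.get_inputs()[0].name

# Dummy preprocessed batch: one 416x416 RGB image with values in [0, 1].
image_data = np.random.rand(1, 416, 416, 3).astype(np.float32)

# The model returns the three detection feature maps described under "Output of model".
outputs = sess.run(None, {input_name: image_data})
for out in outputs:
    print(out.shape)
```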
Input to model
This model expects an input shape of (1, 416, 416, 3), where the dimensions are (batch_size, height, width, channels).
Preprocessing steps
Preprocessing is done along the lines of the sketch below. For more information and the full example, please visit the inference notebook.
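A rough sketch of the usual letterbox-style preprocessing for this model (resize while preserving aspect ratio, pad to 416x416, scale pixels to [0, 1]); the function name and the use of OpenCV here are assumptions, so treat the notebook as the reference:

```python
import cv2
import numpy as np

def image_preprocess(image, target_size=(416, 416)):
    """Letterbox-resize an HxWx3 image and scale pixel values to [0, 1]."""
    ih, iw = target_size
    h, w, _ = image.shape

    # Resize while keeping the original aspect ratio.
    scale = min(iw / w, ih / h)
    nw, nh = int(scale * w), int(scale * h)
    resized = cv2.resize(image, (nw, nh))

    # Pad the remaining border with a constant gray value.
    padded = np.full((ih, iw, 3), 128.0, dtype=np.float32)
    dw, dh = (iw - nw) // 2, (ih - nh) // 2
    padded[dh:dh + nh, dw:dw + nw, :] = resized

    # Scale to [0, 1] and add the batch dimension expected by the model.
    return padded[np.newaxis, ...] / 255.0
```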
Output of model
Output shape:
(1, 52, 52, 3, 85) (the shape of the finest-grained of the three output layers)
There are 3 output layers. For each layer, there are 255 values per grid cell: 85 values per anchor, times 3 anchors.
The 85 values of each anchor consist of 4 box coordinates describing the predicted bounding box (x, y, h, w), 1 object confidence, and 80 class confidences. Here is the class list.
Postprocessing steps
The postprocessing steps are modified from the hunglc007/tensorflow-yolov4-tflite repository.
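The full decode (anchor scaling, grid offsets, non-maximum suppression) lives in that repository and the inference notebook; as a simplified, hedged sketch of only the confidence-filtering step, assuming the per-anchor confidences are already in [0, 1]:

```python
import numpy as np

def filter_predictions(feature_map, conf_threshold=0.25):
    """Flatten one (1, S, S, 3, 85) output layer and keep confident anchors.

    The returned box coordinates are still in the network's raw (x, y, h, w)
    form; the real postprocessing also decodes them against the anchors and
    grid offsets before running non-maximum suppression.
    """
    preds = feature_map.reshape(-1, 85)
    boxes = preds[:, :4]
    objectness = preds[:, 4:5]
    class_probs = preds[:, 5:]

    scores = objectness * class_probs        # per-class confidence
    best_class = scores.argmax(axis=-1)
    best_score = scores.max(axis=-1)

    keep = best_score > conf_threshold
    return boxes[keep], best_class[keep], best_score[keep]
```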
Dataset
Pretrained yolov4 weights can be downloaded here.
Validation accuracy
YOLOv4:
mAP50 on the COCO 2017 dataset is 0.5733, based on the original TensorFlow model.
YOLOv4-int8:
mAP50 on the COCO 2017 dataset is 0.570; the metric is COCO box mAP@[IoU=0.50:0.95 | area=large | maxDets=100].
Quantization
YOLOv4-int8 is obtained by quantizing the YOLOv4 model. We use Intel® Neural Compressor with the onnxruntime backend to perform quantization. View the instructions to understand how to use Intel® Neural Compressor for quantization.
Environment
onnx: 1.9.0
onnxruntime: 1.10.0
Prepare model
Model quantize
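The actual YOLOv4-int8 model was produced with Intel® Neural Compressor, so follow the linked instructions for the real recipe. Purely as an illustration of what static INT8 quantization of an ONNX model involves, here is a sketch using onnxruntime's built-in quantize_static API instead; the paths, the input name, and the random calibration data are all assumptions:

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, quantize_static

class DummyCalibrationReader(CalibrationDataReader):
    """Feeds a handful of dummy preprocessed images for calibration.

    A real run would feed letterbox-preprocessed COCO images instead.
    """

    def __init__(self, input_name, num_samples=8):
        self.samples = iter(
            {input_name: np.random.rand(1, 416, 416, 3).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        return next(self.samples, None)

quantize_static(
    "yolov4.onnx",                         # FP32 model path (assumption)
    "yolov4-int8.onnx",                    # quantized output path (assumption)
    DummyCalibrationReader("input_1:0"),   # model input name (assumption)
)
```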
Publication/Attribution
Alexey Bochkovskiy, Chien-Yao Wang, Hong-Yuan Mark Liao. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv:2004.10934, 2020.
References
This model is directly converted from hunglc007/tensorflow-yolov4-tflite.
Intel® Neural Compressor
Contributors
License
MIT License