
YOLOv7/v8 performance drop with new OpenVINO version #43

Open · maxsitt opened this issue Mar 20, 2023 · 12 comments

maxsitt commented Mar 20, 2023

Since the migration to OpenVINO 2022.1.0, I'm seeing performance drops (reduced inference speed) for the YOLOv7-tiny and YOLOv8n models:

| Model | OpenVINO version | OAK fps | .blob size (KB) |
|---|---|---|---|
| YOLOv7-tiny | 2021.4.2 | 38 | 11807 |
| YOLOv7-tiny | 2022.1.0 | 32 | 11809 |
| YOLOv8n | 2021.4.2 | 34 | 5938 |
| YOLOv8n | 2022.1.0 | 32 | 5938 |

All models were converted from the same .pt weights and use 4 shaves. The only differences I can find in the output are the size and structure of the OpenVINO .xml file and, for YOLOv7-tiny, a slightly different .blob file size.

I also tried using the blobconverter with the simplified .onnx model (from the Luxonis tools output) and the same settings, which leads to the same performance drop for YOLOv8n. YOLOv7-tiny was not converted properly by the blobconverter (it throws `[error] Mask is not defined for output layer with width '6'.`).
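For reference, an equivalent conversion with the blobconverter Python package would look roughly like this (a minimal sketch; the ONNX file name is an assumption):

```python
import blobconverter

# Hedged sketch: convert the simplified ONNX export to a 4-shave blob,
# mirroring the settings used on the BlobConverter website.
blob_path = blobconverter.from_onnx(
    model="yolov7-tiny-simplified.onnx",  # assumed filename
    data_type="FP16",
    shaves=4,
    optimizer_params=[
        "--scale=255",
        "--reverse_input_channels",
    ],
)
print(blob_path)
```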

For YOLOv5n and YOLOv6n, model performance is exactly the same with both OpenVINO versions.

I've read the OpenVINO API 2.0 Transition Guide but can't find any hints as to what could be causing the reduced inference speed.

I can only assume that it has something to do with the YOLOv7- (YOLOv8-) specific model structure and the different behaviour of OpenVINO 2022.1.0 during conversion.

@tersekmatija (Collaborator)

Thanks for reporting that, @maxsitt, we can certainly investigate this further.

When using blobconverter for YoloV7 conversion, did you use the correct model optimizer parameters?

Regarding the performance change - there are several reasons that could cause it. Certain operations in the 22.1 release were updated to Opset8 (for example, OV defines Softmax for Opset1 and Opset8, and it's similar for a variety of other operations, such as ReduceMax, MaxPool, ...). The opset is related to the implementation of the operation in the underlying MX plugin, and when executing the model on the device it tells it which implementation to use. It could certainly happen that some operations in the new opset are slightly slower. Unfortunately, we don't have much control over that, so if the performance drop is an issue, I'd suggest relying on 2021.4.2 or using YoloV6n. We could in theory enable OV version selection in the tools itself.
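If you want to see which ops changed opsets between the two exports, one quick check (a minimal diagnostic sketch; the IR file names are assumptions) is to diff the `version` attribute of the layers in the two .xml files:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def opset_histogram(ir_xml_path):
    """Count (layer type, opset version) pairs in an OpenVINO IR .xml file."""
    root = ET.parse(ir_xml_path).getroot()
    return Counter(
        (layer.get("type", ""), layer.get("version", ""))
        for layer in root.iter("layer")
    )

# Assumed file names: IRs produced by the two OpenVINO versions
old = opset_histogram("yolov7-tiny_2021.4.2.xml")
new = opset_histogram("yolov7-tiny_2022.1.0.xml")
for key in sorted(set(old) | set(new)):
    if old.get(key, 0) != new.get(key, 0):
        print(key, old.get(key, 0), "->", new.get(key, 0))
```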

I have two follow-up questions:
1/ Would choosing the version be desired?
2/ How are you currently measuring the FPS? Can you share the blobs and the measuring script? I think it would be better to compare the raw latency directly (see the sketch below) to verify it's not a measuring/script error or post-processing overhead.

We'll do some investigation on our end as well, but preparing an MRE (code, blobs) would help us if our findings don't match yours.
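For reference, measuring raw round-trip latency could look something like this (a minimal sketch using the DepthAI v2 API; the blob path and the 416x416 input size are assumptions):

```python
import time
import numpy as np
import depthai as dai

BLOB = "yolov7-tiny_openvino_2022.1_4shave.blob"  # assumed filename
W = H = 416                                       # assumed input size

pipeline = dai.Pipeline()

# Feed frames from the host instead of the camera to isolate inference time
xin = pipeline.create(dai.node.XLinkIn)
xin.setStreamName("in")

nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath(BLOB)
xin.out.link(nn.input)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("out")
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q_in = device.getInputQueue("in")
    q_out = device.getOutputQueue("out", maxSize=1, blocking=True)

    # A single dummy frame is enough to time the NN round trip
    frame = dai.ImgFrame()
    frame.setType(dai.ImgFrame.Type.BGR888p)
    frame.setWidth(W)
    frame.setHeight(H)
    frame.setData(np.zeros(3 * W * H, dtype=np.uint8))

    latencies = []
    for _ in range(100):
        t0 = time.monotonic()
        q_in.send(frame)
        q_out.get()  # blocks until inference finishes
        latencies.append(time.monotonic() - t0)

    median = sorted(latencies)[len(latencies) // 2]
    print(f"median round-trip latency: {1000 * median:.1f} ms")
```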

@tersekmatija (Collaborator)

3/ Are you using the same DepthAI version when benchmarking, and if so, which one?

@maxsitt (Author) commented Mar 20, 2023

Thanks for your quick reply @tersekmatija!

For the YOLOv7 conversion with the BlobConverter website, I used the following model optimizer parameters:

`--data_type=FP16 --scale=255 --reverse_input_channels`

I also tried adding `--output=output,278,347,416`, which still threw the same error (`Mask is not defined for output layer with width '6'.`). Regarding this output: I took it from looking at the simplified ONNX model with Netron:

[Screenshot: YOLOv7-tiny ONNX model outputs in Netron]

It is not clear to me why the outputs are named this way. I thought they would be named the same as for the other YOLO versions. Looking at the export_yolov7.py script, shouldn't the outputs have different names?

Regarding your questions:
1/ Choosing the OpenVINO version would be a good workaround for the moment, especially since certain models converted with version 2021.4.2 could potentially run faster on the OAK.

2/ I attached the YOLOv7-tiny .blobs and the measuring script together with the simplified ONNX model, converted with the Luxonis tools both before and after OpenVINO 2022.1.0 was implemented as the default version.

3/ I'm using depthai version 2.20.2.0 on a Raspberry Pi Zero 2 W, connected via SSH. This is also why I'm printing fps to the console for measuring, as showing the frames via X11 forwarding caps the inference speed at ~10 fps.

Regarding YOLOv8n: I measured fps again with the different models, and now the speed seems to be the same for both OpenVINO versions. I don't know why it was different before, even though I tested multiple times. Maybe the way I'm measuring fps is not precise enough; it would be great if you could propose a more accurate way of comparing the raw latency!
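One way to reduce the noise in such fps prints is a rolling window over timestamps (a minimal sketch in plain Python; this is not the script from the attached MRE):

```python
import time

class FPSCounter:
    """Rolling FPS over a fixed time window, less noisy than per-frame deltas."""

    def __init__(self, window_s=5.0):
        self.window_s = window_s
        self.timestamps = []

    def tick(self):
        """Call once per received frame."""
        now = time.monotonic()
        self.timestamps.append(now)
        # Drop ticks that have fallen out of the window
        cutoff = now - self.window_s
        while self.timestamps and self.timestamps[0] < cutoff:
            self.timestamps.pop(0)

    def fps(self):
        if len(self.timestamps) < 2:
            return 0.0
        span = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / span if span > 0 else 0.0
```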

MRE_fps_measurement.zip

YOLOv7-tiny_ONNX.zip

@tersekmatija (Collaborator)

Hey, yes, this could indeed affect the performance.

So the outputs should be named output1_yolov7, output2_yolov7, and output3_yolov7. If you take a look at netron.app and upload the ONNX, you'll see them on the sigmoid layers towards the end. The output layer is the default one, which we prune and do not use for decoding. Since you are exporting the model with this layer, it could result in the throughput drop.

Do you mind exporting the model with the --output flag set to the above and re-computing the throughput?
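With the blobconverter package, the re-export would just add the pruning flag (a minimal sketch mirroring the earlier one; the file name is an assumption):

```python
import blobconverter

# Hedged sketch: export only the three sigmoid outputs used for decoding,
# pruning the default "output" head.
blob_path = blobconverter.from_onnx(
    model="yolov7-tiny-simplified.onnx",  # assumed filename
    data_type="FP16",
    shaves=4,
    optimizer_params=[
        "--scale=255",
        "--reverse_input_channels",
        "--output=output1_yolov7,output2_yolov7,output3_yolov7",
    ],
)
```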

@maxsitt (Author) commented Mar 20, 2023

Ok sorry, now I can see the actual outputs. With this, the conversion works properly with the BlobConverter.

The speed difference between the converted YOLOv7-tiny models remains: 2021.4.2 = 38 fps; 2022.1.0 = 33 fps.

@tersekmatija (Collaborator)

Ok, so to confirm: there is no more `Mask is not defined` error, and both 2021.4.2 and 2022.1 give you the same results now, but the FPS difference between 2021.4.2 and 2022.1 remains? Only for V7, or for V8 as well?

@maxsitt (Author) commented Mar 20, 2023

Yes, only for YOLOv7-tiny.
I tested v8 before at 34 fps (2021.4) and 32 fps (2022.1) respectively, but now both are running at 34 fps, so maybe this was just a measurement error.

@tersekmatija (Collaborator)

Ok, thanks. We will investigate V7. CC @HonzaCuhel

@maxsitt (Author) commented Mar 20, 2023

Not sure what's going on, but now the YOLOv8n models differ in speed again (2021.4.2 = 34 fps; 2022.1.0 = 32 fps).

I attached the MRE + models just to be sure; maybe you could also take a look at this.

MRE_fps_measurement_YOLOv8.zip

@HonzaCuhel (Contributor)

Hi @maxsitt,

first of all, let me apologize for the delayed reply, and thank you for reporting! We have investigated this issue thoroughly and you are right: there is a performance drop with the new OpenVINO version, not only for YoloV7 and V8 but also for V6 R3. We found that there are differences in the ops used by these two versions of OpenVINO, and we have created a GitHub issue in the OpenVINO repo (link) to clarify the problem. For now, we have been able to solve the issue for YoloV6 R3 & V8 by using the --use_legacy_frontend flag during the conversion to IR. This update will be deployed as soon as possible. The performance of YoloV7 will still be the same even after this update; nevertheless, I would recommend using V6 (or V8), as V6 is faster than V7. If there are any updates, I'll let you know.

Best
Jan
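For anyone landing here, passing the flag through the blobconverter package could look like this (a minimal sketch; the file name and `version` string are assumptions, and `--use_legacy_frontend` is the Model Optimizer flag mentioned above):

```python
import blobconverter

# Hedged sketch: convert with OpenVINO 2022.1 but force the legacy
# Model Optimizer frontend, which resolved the drop for YoloV6 R3 / V8.
blob_path = blobconverter.from_onnx(
    model="yolov6n-simplified.onnx",  # assumed filename
    data_type="FP16",
    shaves=4,
    version="2022.1",
    optimizer_params=[
        "--scale=255",
        "--reverse_input_channels",
        "--use_legacy_frontend",
    ],
)
```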

@HonzaCuhel (Contributor)

Just deployed the new version of the tools.

@maxsitt (Author) commented Apr 20, 2023

Hi Jan,

thanks a lot for looking into this! Being able to select the Legacy Front-End flag is a great fix. I'm very interested in the possible explanations for this performance drop; let's see what the OpenVINO developers respond.
