TensorRT speedup for EfficientDet models. The repo is based on https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch.
Speeding up the EfficientDet models with TensorRT involves five main steps:

1. Modify the original code to support TensorRT speedup (refer to link and link).
2. Convert the PyTorch model to an ONNX file:
   ```
   python torch2onnx.py
   ```
3. Visualize the ONNX file with netron:
   ```
   netron efficientdet-d0.onnx
   ```
4. Convert the ONNX file to a TensorRT engine:
   ```
   bash onnx2trt.sh
   ```
5. Run inference with the TensorRT engine:
   ```
   python trt.py
   ```
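The PyTorch-to-ONNX step (step 2) can be sketched with `torch.onnx.export`. The tiny stand-in module and the file name `model.onnx` below are placeholders, since loading the real EfficientDet-D0 is repo-specific; `torch2onnx.py` does the equivalent with the actual model.

```python
# Minimal sketch of ONNX export, assuming a tiny stand-in module
# instead of the real EfficientDet-D0 network.
import torch
import torch.nn as nn

# Placeholder model; the repo's torch2onnx.py loads EfficientDet-D0 here.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU()).eval()

# Dummy input matching the 512x512 input size used in this repo.
dummy = torch.randn(1, 3, 512, 512)

torch.onnx.export(
    model,
    dummy,
    "model.onnx",               # placeholder output path
    input_names=["input"],
    output_names=["output"],
    opset_version=11,           # an opset version TensorRT parsers commonly support
)
```

Fixed input shapes (as above) keep the exported graph simple for TensorRT; dynamic axes can be declared via the `dynamic_axes` argument if variable batch sizes are needed.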
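The contents of `onnx2trt.sh` (step 4) are not shown here; a typical conversion uses TensorRT's `trtexec` tool. The file names below are assumed from the steps above, and the repo's script may use different flags.

```shell
# Hypothetical trtexec invocation; check onnx2trt.sh for the exact flags used.
trtexec --onnx=efficientdet-d0.onnx \
        --saveEngine=efficientdet-d0.trt \
        --fp16   # build a half-precision engine; drop this flag for FP32
```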
On a single RTX 3090 GPU:
| Model | Input size | Inference latency (before) | Inference latency (after) |
|---|---|---|---|
| D0 | 128 | 35 ms | 9 ms |
| D0 | 512 | 39 ms | 25 ms |
Note: The current codebase implements EfficientDet-D0 with input size 512 for TensorRT speedup. Other models or input sizes can be supported by modifying the related code. Feel free to ask questions.