
Quantization model deploy on GPU #318

Closed
MyraBaba opened this issue Oct 3, 2022 · 16 comments


MyraBaba commented Oct 3, 2022

I get the error below. The model was downloaded from your exported ones. The same error occurs with the engine cache enabled or disabled:

```
python3 infer.py --model yolov7-w6-end2end-ort-nms.onnx --image /data/Videos/hiv00131/hiv00131/00000480.png --device gpu --use_trt True
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(540)::CreateTrtEngineFromOnnx Detect serialized TensorRT Engine file in yolotrt.cache, will load it directly.
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(106)::LoadTrtCache Build TensorRT Engine from cache file: yolotrt.cache with shape range information as below,
[INFO] fastdeploy/backends/tensorrt/trt_backend.cc(109)::LoadTrtCache Input name: images, shape=[1, 3, 640, 640], min=[1, 3, 640, 640], max=[1, 3, 640, 640]

[INFO] fastdeploy/fastdeploy_runtime.cc(270)::Init Runtime initialized with Backend::TRT in device Device::GPU.
[ERROR] fastdeploy/backends/tensorrt/trt_backend.cc(384)::AllocateOutputsBuffer Cannot find output: num_dets of tensorrt network from the original model.
Aborted (core dumped)
```

DefTruth (Collaborator) commented Oct 4, 2022

Please check the docs of YOLOv7End2EndORT: that class does not support the TRT backend. To use the TRT::EfficientNMS_TRT op, you should use YOLOv7End2EndTRT instead. Please refer to:

Export yolov7 with TRT NMS (leave max-wh as None):

```bash
wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt

# Export an ONNX file with TRT_NMS (tip: this corresponds to the YOLOv7 release v0.1 code)
python export.py --weights yolov7.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640
# The command for other models is similar: replace yolov7.pt with yolov7x.pt, yolov7-d6.pt, yolov7-w6.pt, ...
# With YOLOv7End2EndTRT you only need to provide the ONNX file; there is no need to pass a trt file, it is converted automatically during inference
```

Or download one from vision/detection/yolov7end2end_trt.
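
For reference, a minimal Python sketch of loading such an export with FastDeploy, based on the FastDeploy examples at the time; the file names are placeholders taken from this thread, and the exact API may differ between releases:

```python
# Minimal sketch: deploy the TRT-NMS YOLOv7 export with FastDeploy's
# TensorRT backend. File names are placeholders; adjust to your own export.
import cv2
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu(0)
option.use_trt_backend()  # YOLOv7End2EndTRT requires the TensorRT backend

model = fd.vision.detection.YOLOv7End2EndTRT(
    "yolov7-w6-end2end-trt-nms.onnx", runtime_option=option)

im = cv2.imread("00000480.png")
result = model.predict(im)
print(result)
```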

DefTruth self-assigned this on Oct 4, 2022
MyraBaba (Author) commented Oct 4, 2022

@DefTruth Thanks, I see the problem now.

Another interesting thing: I quantized the ONNX export of the small-object-detection / VisDrone Paddle model, and it quantizes to QUInt8 and runs inference successfully.

The problem is that the quantized ONNX is much slower than the original fp32, about 10 times slower.

Meanwhile I will give an int8 chip (Hailo-8) a shot. To export ONNX for it, the converter asks for calibration images, as with
weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams

How can I quantize with calibration? weights=https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams

Best

jiangjiajun (Collaborator) commented

Do you run inference on CPU or GPU with the quantized ONNX model?

MyraBaba (Author) commented

@jiangjiajun Inferencing on GPU (an RTX 2080 Ti).

jiangjiajun (Collaborator) commented

Which tool are you using to quantize your ONNX model?

MyraBaba (Author) commented

@jiangjiajun Using onnxruntime:


```python
from onnxruntime.quantization import quantize_dynamic, QuantType

model_fp32 = '/Users/tulpar/Project/devPaddleDetection/sliced_visdrone.onnx'
model_quant = '/Users/tulpar/Project/devPaddleDetection/88quant_sliced_visdrone.onnx'

# Dynamic quantization: weights are converted to QUInt8 offline,
# activations are quantized on the fly at inference time.
# quantized_model = quantize_dynamic(model_fp32, model_quant)
_quantized_model = quantize_dynamic(model_fp32, model_quant, weight_type=QuantType.QUInt8)
```

jiangjiajun (Collaborator) commented

Models produced by this quantization tool (dynamic quantization) are not supported by TensorRT at the moment. Refer to this doc: https://onnxruntime.ai/docs/performance/quantization.html#quantization-on-gpu — quantization on GPU requires static quantization in QDQ format with calibration data.
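
To connect this with the calibration question above, here is a minimal sketch of static (calibrated) quantization with onnxruntime, which produces the QDQ-format model that GPU execution providers expect. The input name `image`, the preprocessing, and all paths are illustrative assumptions, not taken from this thread:

```python
# Sketch: onnxruntime static quantization with a calibration data reader.
# Assumes a single float32 input named "image" of shape 1x3x640x640.
import glob

import cv2
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)


def load_and_preprocess(path, shape):
    # Minimal stand-in preprocessing: resize, BGR->RGB, scale to [0, 1], NCHW.
    img = cv2.imread(path)
    img = cv2.resize(img, (shape[3], shape[2]))
    img = img[:, :, ::-1].astype(np.float32) / 255.0
    return np.ascontiguousarray(img.transpose(2, 0, 1)[None])


class ImageCalibrationReader(CalibrationDataReader):
    """Feeds preprocessed calibration images to the quantizer one at a time."""

    def __init__(self, image_dir, input_name="image", shape=(1, 3, 640, 640)):
        self.paths = iter(glob.glob(f"{image_dir}/*.png"))
        self.input_name = input_name
        self.shape = shape

    def get_next(self):
        path = next(self.paths, None)
        if path is None:
            return None  # tells the calibrator the data is exhausted
        return {self.input_name: load_and_preprocess(path, self.shape)}


quantize_static(
    "sliced_visdrone.onnx",
    "sliced_visdrone_int8.onnx",
    calibration_data_reader=ImageCalibrationReader("./calib_images"),
    quant_format=QuantFormat.QDQ,     # QDQ is the format TensorRT consumes
    activation_type=QuantType.QInt8,  # symmetric int8 rather than QUInt8
    weight_type=QuantType.QInt8,
    per_channel=True,
)
```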

jiangjiajun changed the title from "Cannot find output: num_dets of tensorrt network from the original model." to "Quantization model deploy on GPU" on Oct 10, 2022
yunyaoXYY (Collaborator) commented

Hi, FastDeploy provides tools to quantize models, which suit deployment on FastDeploy better. See the current tutorials here: https://github.com/PaddlePaddle/FastDeploy/tree/develop/tools/quantization. We will also release examples of how to deploy INT8 models (the YOLO series) on FastDeploy in two days.

Which model do you want to quantize and deploy on FastDeploy? We would be glad to support you.
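
As a rough sketch of that deployment path, the following follows the FastDeploy Python API as of this thread; the model, params, config, and cache file names are placeholders:

```python
# Sketch: run a PP-YOLOE (optionally quantized) Paddle model with
# FastDeploy's TensorRT backend. All file paths are placeholders.
import cv2
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu(0)
option.use_trt_backend()
option.set_trt_cache_file("ppyoloe.trt")  # cache the built engine to skip rebuilds

model = fd.vision.detection.PPYOLOE(
    "model.pdmodel", "model.pdiparams", "infer_cfg.yml",
    runtime_option=option)

im = cv2.imread("test.jpg")
result = model.predict(im)
vis = fd.vision.vis_detection(im, result, score_threshold=0.35)
cv2.imwrite("vis.jpg", vis)
```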

MyraBaba (Author) commented

@jiangjiajun

The model is: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_80e_sliced_visdrone_640_025.pdparams

We are in a competition for a huge object-detection project using exactly this kind of model, but we need to achieve at least 100 FPS on a Xavier NX at 640x480 px.

The accuracy of the model above is perfect; we need to speed it up to reach that performance.

This is why I tried to quantize to int8.

I tried the C++ inference:

```
./main --model_dir=/data/dProjects/devPaddleDetection/output_inference/ppyoloe_crn_l_80e_sliced_visdrone_640_025 --video_file=/data/RTSP_oz/20221010_14_30:00_rtsp029.mp4 --device=gpu --run_mode=trt_int8 --batch_size=8 --output_dir=/data/int8.mp4
```

But not much speedup: still about 40 ms on the RTX 2080 Ti.

yunyaoXYY (Collaborator) commented

We have quantized ppyoloe_crn_l_300e_coco, and it works well on FastDeploy.
Maybe we can help you. How about joining our Slack channel for further support? Link: https://fastdeployworkspace.slack.com/ssb/redirect

MyraBaba (Author) commented

@yunyaoXYY
I will join.

Any speed improvement?

MyraBaba (Author) commented

@yunyaoXYY Did you try ppyoloe_crn_l_80e_sliced_visdrone_640_025? Any speedup?

MyraBaba (Author) commented

@yunyaoXYY I need an invitation to join Slack.

yunyaoXYY (Collaborator) commented

Hi, please try this:

https://join.slack.com/t/fastdeployworkspace/shared_invite/zt-1hm4rrdqs-RZEm6_EAanuwEVZ8EJsG~g


MyraBaba commented Oct 11, 2022 via email

jiangjiajun (Collaborator) commented

Since this issue has not been updated for a year, it will be closed. If needed, it can be updated and reopened.
