Slow batch inference as compared to insightface's single predictions #3

zeahmd · 2022-02-28T11:25:15Z

Batch inference with batch_face takes more time than five single passes with insightface!

```python
import numpy as np
from insightface.app import FaceAnalysis
import cv2
import matplotlib.pyplot as plt
import time
from batch_face import RetinaFace

# Load the test image, convert BGR -> RGB, and resize it to 640x640.
img = cv2.imread('/home/zeeshan/Downloads/avengers.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640))
plt.imshow(img)

# insightface: five sequential single-image detections.
model = FaceAnalysis(allowed_modules=['detection'],
                     providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
model.prepare(ctx_id=0)
tik = time.time()
for _ in range(5):
    faces = model.get(img)
print(f"time taken: {time.time()-tik}")

# batch_face: one batched detection over five copies of the same image.
detector = RetinaFace(gpu_id=0)
tik = time.time()
faces = detector.detect([img, img, img, img, img])
print(f"time taken: {time.time()-tik}")
```

I even tried repeating the same img object inside the batch list before computing the result. Even in that case, the total time for the individual insightface calls is much lower than the time for the single batched call. @elliottzheng, could you please have a look at this?

elliottzheng · 2022-02-28T12:12:23Z

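Hi, I ran your code with this image (https://user-images.githubusercontent.com/22427645/155980559-e432cf6b-0bd1-46e2-8137-b5abc8417e3e.jpg), and I got: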

Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/elliott/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/elliott/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/elliott/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/elliott/.insightface/models/buffalo_l/genderage.onnx genderage
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/elliott/.insightface/models/buffalo_l/w600k_r50.onnx recognition
set det-size: (640, 640)
time taken: 0.24384212493896484
time taken: 0.04523015022277832

I get similar results with different images.
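
To put the two totals on a per-image basis, here is a small arithmetic sketch. It only restates the two `time taken` figures from the log above (five `model.get` calls versus one `detect` call on five images); the variable names are illustrative. Either figure may include one-time start-up cost (the log shows `cudnn_conv_algo_search: EXHAUSTIVE`, for example), so the ratio is a rough indication rather than a steady-state comparison.

```python
# Totals copied from the "time taken" lines in the log above.
insightface_total = 0.24384212493896484  # five sequential model.get(img) calls
batch_face_total = 0.04523015022277832   # one detector.detect([...]) call on five images
n_images = 5

# Average cost per image for each approach.
per_image_insightface = insightface_total / n_images
per_image_batch_face = batch_face_total / n_images

# How many times faster the batched call was in this single run.
speedup = insightface_total / batch_face_total

print(f"insightface: {per_image_insightface * 1000:.1f} ms/image")
print(f"batch_face:  {per_image_batch_face * 1000:.1f} ms/image")
print(f"ratio:       {speedup:.1f}x")
```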

I am running torch 1.10 on an NVIDIA Tesla V100.
I have installed both onnxruntime and onnxruntime-gpu for insightface.
Could you share the output you get?
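
Since the snippet in this thread times each library once, right after initialization, the numbers can vary a lot between machines. Below is a minimal sketch of a timing helper that runs a few untimed warm-up calls and then reports the median of several timed runs. The `benchmark` function and its parameters are hypothetical and not part of insightface or batch_face. The optional `sync` hook is an assumption for GPU code (for example, `torch.cuda.synchronize`); it may be unnecessary if the call already returns results on the CPU.

```python
import time
from statistics import median


def benchmark(fn, warmup=3, repeats=10, sync=None):
    """Return the median wall-clock seconds of fn() over `repeats` timed runs.

    fn:      zero-argument callable to time.
    warmup:  number of untimed calls made first (absorbs one-time setup cost).
    sync:    optional zero-argument callable invoked before and after each
             timed run, e.g. a GPU synchronization function.
    """
    for _ in range(warmup):
        fn()

    timings = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # make sure earlier asynchronous work has finished
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the timed work itself to finish
        timings.append(time.perf_counter() - start)
    return median(timings)


if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs without any GPU libraries.
    seconds = benchmark(lambda: sum(range(100_000)))
    print(f"median time: {seconds:.6f} s")
```

With the objects from the original snippet, this could be called as `benchmark(lambda: [model.get(img) for _ in range(5)])` and `benchmark(lambda: detector.detect([img] * 5))`, so that both libraries are measured under the same conditions.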

zeahmd · 2022-02-28T12:47:07Z

Hello, I'm sharing the notebook I used, which includes the versions of all packages in its output. It also contains the time taken by each package. You can run this code and compare the results.

