Slow batch inference as compared to insightface's single predictions #3

Open
zeahmd opened this issue Feb 28, 2022 · 2 comments

zeahmd commented Feb 28, 2022

Batch inference takes more time than insightface's single-image passes.

```python
import numpy as np
from insightface.app import FaceAnalysis
import cv2
import matplotlib.pyplot as plt
import time
from batch_face import RetinaFace

# Load the test image, convert it to RGB and resize it to 640x640.
img = cv2.imread('/home/zeeshan/Downloads/avengers.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640))
plt.imshow(img)

# insightface: detection only, five sequential single-image calls.
model = FaceAnalysis(allowed_modules=['detection'],
                     providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
model.prepare(ctx_id=0)
tik = time.time()
faces = model.get(img)
faces = model.get(img)
faces = model.get(img)
faces = model.get(img)
faces = model.get(img)
print(f"time taken: {time.time()-tik}")

# batch_face: one call on a batch of five copies of the same image.
detector = RetinaFace(gpu_id=0)
tik = time.time()
faces = detector.detect([img, img, img, img, img])
print(f"time taken: {time.time()-tik}")
```

I also tried repeating the same img object inside the batch array and computing the result. Even in that case, the total time for running the images individually with insightface is much less than the batch time. @elliottzheng, could you please have a look at this?

Owner

elliottzheng commented Feb 28, 2022

Hi, I ran your code with this image and got:

Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/elliott/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/elliott/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/elliott/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/elliott/.insightface/models/buffalo_l/genderage.onnx genderage
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/elliott/.insightface/models/buffalo_l/w600k_r50.onnx recognition
set det-size: (640, 640)
time taken: 0.24384212493896484
time taken: 0.04523015022277832
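To make the two numbers easier to compare, here is a small sketch that turns them into per-image latency and a speedup ratio. It assumes only what the script shows: the first `time taken` line is the five sequential insightface calls and the second is the single batch_face call on five images. The variable names are illustrative, and the insightface figure may include one-off start-up cost from its first call.

```python
# Timings copied from the log above; the script prints insightface first,
# then batch_face.
insightface_total = 0.24384212493896484  # five sequential model.get(img) calls
batch_face_total = 0.04523015022277832   # one detector.detect([img] * 5) call
n_images = 5

# Average latency per image for each approach.
per_image_insightface = insightface_total / n_images
per_image_batch_face = batch_face_total / n_images

# Ratio of total times for the same five images.
speedup = insightface_total / batch_face_total

print(f"insightface: {per_image_insightface * 1000:.1f} ms per image")
print(f"batch_face:  {per_image_batch_face * 1000:.1f} ms per image")
print(f"batch_face is about {speedup:.1f}x faster on this run")
```

On this run the batch call comes out roughly five times faster, which is the opposite of what was reported in the issue.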

I get similar results with different images.

I am running torch 1.10 on an NVIDIA Tesla V100.
I have installed both onnxruntime and onnxruntime-gpu for insightface.
Could you share your outputs?
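Since results can differ between machines, a comparison that is less sensitive to start-up cost might help. The sketch below is one possible timing helper, not part of batch_face or insightface: `benchmark` and its `warmup` and `repeats` parameters are hypothetical names. It assumes the first calls to a GPU model may include one-off initialisation (the log above shows `cudnn_conv_algo_search` set to `EXHAUSTIVE`), so it runs a few untimed warm-up calls and reports the median of several timed runs.

```python
import statistics
import time


def benchmark(fn, warmup=2, repeats=10):
    """Return the median wall-clock time of fn() in seconds, after warm-up calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as CUDA/cuDNN initialisation
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical usage with the objects from the script in this issue:
# single = benchmark(lambda: [model.get(img) for _ in range(5)])
# batch = benchmark(lambda: detector.detect([img] * 5))
# print(f"single: {single:.4f}s  batch: {batch:.4f}s")
```

Using the median rather than a single measurement keeps one slow outlier from dominating the comparison.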

Author

zeahmd commented Feb 28, 2022 via email
