Slow batch inference as compared to insightface's single predictions #3
Hi, I ran your code with this image, and I get the output quoted in full in the reply below.
I get similar results with different images. I am running torch 1.10 on an NVIDIA Tesla V100.
Hello, I'm sharing the notebook that I used. Its output includes the
versions of all packages as well as the time taken by each package.
You can try this code and compare the results.
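For anyone comparing environments without the notebook, here is a minimal sketch that prints the installed versions of the packages involved. The distribution names in the list (`batch-face`, `onnxruntime-gpu`, and so on) are assumptions about how the packages were installed with pip; adjust them to match your setup.

```python
from importlib import metadata


def report_versions(packages):
    """Return {distribution name: installed version, or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Assumed pip distribution names; not taken from the notebook itself.
    names = ["torch", "onnxruntime", "onnxruntime-gpu", "insightface", "batch-face"]
    for name, version in report_versions(names).items():
        print(f"{name}: {version}")
```

Posting this output next to the timings makes it easier to tell whether a difference comes from the library or from the environment.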
…On Mon, Feb 28, 2022 at 5:12 PM Elliott Zheng ***@***.***> wrote:
Hi, I ran your code with this image
<https://user-images.githubusercontent.com/22427645/155980559-e432cf6b-0bd1-46e2-8137-b5abc8417e3e.jpg>,
and I get
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/v-zhenyi/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/v-zhenyi/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/v-zhenyi/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/v-zhenyi/.insightface/models/buffalo_l/genderage.onnx genderage
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CUDAExecutionProvider': {'do_copy_in_default_stream': '1', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'cudnn_conv_use_max_workspace': '0', 'gpu_mem_limit': '18446744073709551615', 'cudnn_conv_algo_search': 'EXHAUSTIVE', 'gpu_external_alloc': '0', 'device_id': '0'}}
model ignore: /home/v-zhenyi/.insightface/models/buffalo_l/w600k_r50.onnx recognition
set det-size: (640, 640)
time taken: 0.24384212493896484
time taken: 0.04523015022277832
I get similar results with different images.
I am running with torch 1.10 on NVIDIA Tesla V100.
I have installed the onnxruntime and onnxruntime-gpu for insightface.
I would like to know your outputs.
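To make the two "time taken" figures above easier to compare, here is a small worked calculation. It assumes the first figure is the five sequential insightface calls and the second is the single batch_face call on five images, which is the order in which the posted script prints them.

```python
N_IMAGES = 5

# Totals copied from the two "time taken" lines in the log above.
sequential_total = 0.24384212493896484  # assumed: 5 x model.get(img)
batched_total = 0.04523015022277832     # assumed: detector.detect([img] * 5)

per_image_sequential = sequential_total / N_IMAGES
per_image_batched = batched_total / N_IMAGES
speedup = sequential_total / batched_total

print(f"sequential: {per_image_sequential * 1000:.1f} ms per image")
print(f"batched:    {per_image_batched * 1000:.1f} ms per image")
print(f"batched call is about {speedup:.1f}x faster in this run")
```

Note that the first figure includes the very first inference call after model setup, so part of the gap in this run may be one-time warm-up cost rather than steady-state speed.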
Batch inference takes more time than insightface's single-image passes.
```python
import time

import cv2
import matplotlib.pyplot as plt
import numpy as np
from batch_face import RetinaFace
from insightface.app import FaceAnalysis

# Load the test image, convert BGR -> RGB, and resize it to 640x640.
img = cv2.imread('/home/zeeshan/Downloads/avengers.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640))
plt.imshow(img)

# insightface: detection module only, five sequential single-image calls.
model = FaceAnalysis(allowed_modules=['detection'],
                     providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
model.prepare(ctx_id=0)
tik = time.time()
for _ in range(5):
    faces = model.get(img)
print(f"time taken: {time.time()-tik}")

# batch_face: one batched call on five copies of the same image.
detector = RetinaFace(gpu_id=0)
tik = time.time()
faces = detector.detect([img, img, img, img, img])
print(f"time taken: {time.time()-tik}")
```
I have also tried repeating the same img object inside the batch list before computing the result. Even then, the total time for five individual insightface calls is much lower than the time for the single batched call. @elliottzheng, could you please have a look at this?
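One thing that may affect a comparison like this is that each library is timed exactly once, including its first call on the GPU. A minimal sketch of a fairer timing loop follows, assuming a few untimed warm-up runs and a median over repeated runs. The `benchmark` helper and its `warmup`, `repeats`, and `sync` parameters are hypothetical names introduced here, not part of insightface or batch_face.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=10, sync=None):
    """Return the median wall-clock seconds of fn() over `repeats` timed runs.

    `sync` is an optional callable run after every fn() call so that
    asynchronous GPU work has finished before the clock is read.
    """
    def run_once():
        fn()
        if sync is not None:
            sync()

    # Untimed warm-up runs absorb one-time costs such as lazy initialization.
    for _ in range(warmup):
        run_once()

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical usage with the objects from the script above:
#   benchmark(lambda: [model.get(img) for _ in range(5)])
#   benchmark(lambda: detector.detect([img] * 5), sync=torch.cuda.synchronize)
```

Passing `torch.cuda.synchronize` as `sync` is only relevant if the detector runs on PyTorch with CUDA, since kernels are launched asynchronously there. Whether warm-up explains the gap reported here is not something this sketch can confirm; it only removes one possible source of noise.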