
YOLOv5 pruned_quant-aggressive_94 exception #226

Closed
SkalskiP opened this issue Dec 16, 2021 · 20 comments

@SkalskiP
Contributor

Describe the bug
I was trying to run the demo code with the YOLOv5 pruned_quant-aggressive_94 model on a g4dn.2xlarge instance and encountered this exception.

Stack trace

  | 2021-12-16T15:36:11.889+01:00 | Overwriting original model shape (640, 640) to (800, 800)
  | 2021-12-16T15:36:11.889+01:00 | Original model path: /mnt/pylot/unleash_models/yolov5_optimised/yolov5-s/pruned_quant-aggressive_94.onnx, new temporary model saved to /tmp/tmpd8kad_7r
  | 2021-12-16T15:36:11.890+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized) (system=avx512, binary=avx512)
  | 2021-12-16T15:36:13.559+01:00 | DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized)
  | 2021-12-16T15:36:13.559+01:00 | Date: 12-16-2021 @ 14:36:13 UTC
  | 2021-12-16T15:36:13.559+01:00 | OS: Linux ip-10-0-2-22.ap-southeast-2.compute.internal 4.14.173-137.229.amzn2.x86_64 #1 SMP Wed Apr 1 18:06:08 UTC 2020
  | 2021-12-16T15:36:13.559+01:00 | Arch: x86_64
  | 2021-12-16T15:36:13.559+01:00 | CPU: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
  | 2021-12-16T15:36:13.559+01:00 | Vendor: GenuineIntel
  | 2021-12-16T15:36:13.559+01:00 | Cores/sockets/threads: [4, 1, 8]
  | 2021-12-16T15:36:13.559+01:00 | Available cores/sockets/threads: [4, 1, 8]
  | 2021-12-16T15:36:13.559+01:00 | L1 cache size data/instruction: 32k/32k
  | 2021-12-16T15:36:13.559+01:00 | L2 cache size: 1Mb
  | 2021-12-16T15:36:13.559+01:00 | L3 cache size: 35.75Mb
  | 2021-12-16T15:36:13.559+01:00 | Total memory: 30.9605G
  | 2021-12-16T15:36:13.559+01:00 | Free memory: 14.6592G
  | 2021-12-16T15:36:13.559+01:00 | Assertion at ./src/include/wand/jit/pooling/common.hpp:239
  | 2021-12-16T15:36:13.559+01:00 | Backtrace:
  | 2021-12-16T15:36:13.560+01:00 | 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 1# wand::detail::assert_fail(char const*, char const*, int) in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 2# 0x00007F4B71E55271 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 3# 0x00007F4B71E55125 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 4# 0x00007F4B71E554FD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 5# 0x00007F4B71E5A4E0 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 6# 0x00007F4B71E5A89A in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 7# 0x00007F4B71E5CDE8 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 8# 0x00007F4B7101F93B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 9# 0x00007F4B7101FAF9 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 10# 0x00007F4B7101B9D5 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 11# 0x00007F4B71042618 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 12# 0x00007F4B71042C91 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 13# 0x00007F4B71070667 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 14# 0x00007F4B70BFA76B in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 15# 0x00007F4B70BEA8FC in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 16# 0x00007F4B70BD7A4F in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 17# 0x00007F4B71156499 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 18# 0x00007F4B70C0A3EF in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 19# 0x00007F4B70C28DCD in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 20# 0x00007F4B70C28EF3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 21# 0x00007F4B70C295B3 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 22# 0x00007F4B71FB8E10 in /opt/venv/lib/python3.8/site-packages/deepsparse/avx512/libonnxruntime.so.1.8.0
  | 2021-12-16T15:36:13.560+01:00 | 23# 0x00007F4CFA2C06DB in /lib/x86_64-linux-gnu/libpthread.so.0
  | 2021-12-16T15:36:13.560+01:00 | Please email a copy of this stack trace and any additional information to: support@neuralmagic.com

Environment

  1. Ubuntu 18.04
  2. Python 3.8
  3. ML framework version(s)
torch @ https://download.pytorch.org/whl/cu110/torch-1.7.1%2Bcu110-cp38-cp38-linux_x86_64.whl
torchvision @ https://download.pytorch.org/whl/cu110/torchvision-0.8.2%2Bcu110-cp38-cp38-linux_x86_64.whl
  4. Other Python package versions
sparseml==0.9.0
sparsezoo==0.9.0
numpy==1.21.4
onnx==1.9.0
onnxruntime==1.7.0

Is there any chance you could help me debug this issue?

@SkalskiP SkalskiP added the bug Something isn't working label Dec 16, 2021
@mgoin
Member

mgoin commented Dec 16, 2021

Hi @SkalskiP, thanks for reporting this issue. Could you help us with a few more details?

Which script are you using? I tried to replicate your situation using deepsparse/examples/ultralytics-yolo/benchmark.py with the pruned and pruned-quant models on an AVX-512 machine. Also, are you using the models straight from the SparseZoo? I passed the Zoo model stubs directly to the script.

Unfortunately, I checked this on 0.9.1 and the nightly build and couldn't replicate it.

(user) ➜  ultralytics-yolo git:(9fe31a4) ✗ python benchmark.py zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96 --image-shape 800 800 -b1   
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96 downloaded to /home/mgoin/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
Overwriting original model shape (640, 640) to [800, 800]
Original model path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96, new temporary model saved to /tmp/tmp0qqk_iqq
Compiling deepsparse model for /tmp/tmp0qqk_iqq
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized) (system=avx512, binary=avx512)
Engine info: deepsparse.engine.Engine:
        onnx_file_path: /tmp/tmp0qqk_iqq
        batch_size: 1
        num_cores: 18
        num_sockets: 0
        scheduler: Scheduler.default
        cpu_avx_type: avx512
        cpu_vnni: False
Loading dataset
Running for 25 warmup iterations and 80 benchmarking iterations
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:13<00:00,  5.88it/s]
Benchmarking complete. End-to-end results:
BenchmarkResults:
        items_per_second: 6.7487256951254855
        ms_per_batch: 148.17612171173096
        batch_times_mean: 0.14817612171173095
        batch_times_median: 0.029736638069152832
        batch_times_std: 0.2139884281148565
End-to-end per image time: 148.17612171173096ms
(user) ➜  ultralytics-yolo git:(9fe31a4) ✗ python benchmark.py zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 --image-shape 800 800 -q -b1
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 downloaded to /home/mgoin/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
Overwriting original model shape (640, 640) to [800, 800]
Original model path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94, new temporary model saved to /tmp/tmpm203a5ew
Compiling deepsparse model for /tmp/tmpm203a5ew
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.9.1 (afc7e831) (release) (optimized) (system=avx512, binary=avx512)
WARNING: Generating emulated code for quantized operations since no VNNI instructions were detected. Quantization (INT8) performance is greatly degraded. Set NM_FAST_VNNI_EMULATION=1 to increase performance at the expense of accuracy.
Engine info: deepsparse.engine.Engine:
        onnx_file_path: /tmp/tmpm203a5ew
        batch_size: 1
        num_cores: 18
        num_sockets: 0
        scheduler: Scheduler.default
        cpu_avx_type: avx512
        cpu_vnni: False
WARNING: VNNI instructions not detected, quantization speedup not well supported
Loading dataset
Running for 25 warmup iterations and 80 benchmarking iterations
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:12<00:00,  6.36it/s]
Benchmarking complete. End-to-end results:
BenchmarkResults:
        items_per_second: 9.745719406011945
        ms_per_batch: 102.60915160179138
        batch_times_mean: 0.10260915160179138
        batch_times_median: 0.0395512580871582
        batch_times_std: 0.13218825771218534
End-to-end per image time: 102.60915160179138ms
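
(As an aside, the NM_FAST_VNNI_EMULATION flag mentioned in the warning above would presumably be set as an environment variable before launching the run. A sketch of what that could look like, trading some accuracy for faster INT8 emulation on non-VNNI hardware:)

NM_FAST_VNNI_EMULATION=1 python benchmark.py zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 --image-shape 800 800 -q -b1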

@SkalskiP
Contributor Author

Hello @mgoin, I'm actually running a demo version of my app:

from deepsparse import compile_model

from src.deepsparse_utils import modify_yolo_onnx_input_shape, postprocess_nms

import numpy as np
import cv2

from typing import Tuple

MODEL_PATH = 'data/model.onnx'
IMAGE_PATH = 'data/image-1.jpeg'
BATCH_SIZE = 1
INPUT_RESOLUTION = (800, 800)


def preprocess_image(image: np.ndarray, image_size: Tuple[int, int] = (640, 640)) -> np.ndarray:
    # Resize to the model's input resolution
    image_resized = cv2.resize(image, image_size)
    # BGR -> RGB and HWC -> CHW
    image_transposed = image_resized[:, :, ::-1].transpose(2, 0, 1)
    # Add a batch dimension
    image_wrapped = image_transposed[np.newaxis, ...]
    return np.ascontiguousarray(image_wrapped)


model_path, _ = modify_yolo_onnx_input_shape(MODEL_PATH, INPUT_RESOLUTION)
engine = compile_model(model_path, batch_size=BATCH_SIZE)
print(f"Engine info: {engine}")

image = cv2.imread(IMAGE_PATH)
batch = preprocess_image(image=image, image_size=INPUT_RESOLUTION)
outputs = engine.run([batch])[0]
results = postprocess_nms(outputs)[0]
print(results)

@SkalskiP
Contributor Author

SkalskiP commented Dec 16, 2021

Is it possible that the presence of other Python dependencies is causing this problem? I have quite a few other packages installed in this Python environment.

I am trying to put together some sort of reproduction path for you outside of my application, but so far I have not been able to isolate the problem. Any suggestions are welcome.

@prasanth-pivotchain

Hi @mgoin, I am facing a similar issue

DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0.20211208 (811251bd) (release) (optimized) (system=avx2, binary=avx2)
WARNING: Running quantized (INT8) operations with only AVX2 instructions available. Sparse optimizations disabled and performance is greatly degraded.
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0.20211208 (811251bd) (release) (optimized)
Date: 12-16-2021 @ 11:55:29 IST
OS: Linux support-ThinkPad-E490 5.11.0-40-generic #44~20.04.2-Ubuntu SMP Tue Oct 26 18:07:44 UTC 2021
Arch: x86_64
CPU: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz
Vendor: GenuineIntel
Cores/sockets/threads: [4, 1, 8]
Available cores/sockets/threads: [4, 1, 8]
L1 cache size data/instruction: 32k/32k
L2 cache size: 0.25Mb
L3 cache size: 6Mb
Total memory: 17.3526G
Free memory: 0.371181G

Assertion at ./src/include/wand/jit/pooling/common.hpp:239

Backtrace:
 0# wand::detail::abort_prefix(std::ostream&, char const*, char const*, int, bool, bool, unsigned long) in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
 1# wand::detail::assert_fail(char const*, char const*, int) in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
 2# 0x00007FCD9390154F in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
 3# 0x00007FCD93900E53 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
 4# 0x00007FCD93901405 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
 5# 0x00007FCD93901A32 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
 6# 0x00007FCD93905330 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
 7# 0x00007FCD939056FA in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
 8# 0x00007FCD93905CC8 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
 9# 0x00007FCD929F7A49 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
10# 0x00007FCD929FB675 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
11# 0x00007FCD929FD98F in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
12# 0x00007FCD929FDF44 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
13# 0x00007FCD92682EDF in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
14# 0x00007FCD92677E9C in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
15# 0x00007FCD9266525F in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
16# 0x00007FCD92B989A9 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
17# 0x00007FCD92697EBF in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
18# 0x00007FCD926B317D in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
19# 0x00007FCD926B32A3 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
20# 0x00007FCD926B3913 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
21# 0x00007FCD93A61130 in /home/support/anaconda3/envs/sparsify/lib/python3.6/site-packages/deepsparse/avx2/libonnxruntime.so.1.8.0
22# 0x00007FCDFAA5E609 in /lib/x86_64-linux-gnu/libpthread.so.0
23# clone in /lib/x86_64-linux-gnu/libc.so.6

I have deepsparse-nightly==0.10.0.20211208 installed. The engine runs fine when using the original YOLO image size; however, it fails with the above stack trace when I modify the model input size using deepsparse_utils.modify_yolo_onnx_input_shape. Please help.
Thank you.

@SkalskiP
Contributor Author

I tested it, and it looks like it is the same issue as in @prasanth-pivotchain's case. When I use the default model without changing the input size, the code runs correctly.

@SkalskiP
Contributor Author

Hello @mgoin, we spent time today creating a path of reproduction for you. Here is a small repo with a Docker image and a simple Python script; everything is described in the README.md: https://github.com/SkalskiP/deepsparse-yolov5-minimal-example. It fails when we run it on a g4dn.xlarge AWS instance with a stock Ubuntu image, running as the ubuntu user.

@mgoin
Member

mgoin commented Dec 17, 2021

Hi there @SkalskiP, thanks so much for all of the details and help! I was able to successfully recreate the error and we are working on a fix. I will keep you posted when we have an update available on the nightly build to test. Thanks again.

@SkalskiP
Contributor Author

Great to hear from you again! I'm very glad you were able to reproduce the bug; we put a lot of energy into reproducing it ourselves. I know it's a little early, but do you happen to have an ETA for this fix?

@mgoin
Member

mgoin commented Dec 17, 2021

Thanks for joining the community and for your effort 🙂
We should have a fix available on the nightly build next week; we will stay in touch.

@SkalskiP
Contributor Author

My pleasure. A few months ago I heard about what you are doing, and I must say I am very impressed. And since I was assigned the task of optimizing our jobs (using DeepSparse, among other things), I'm constantly learning something new. Fascinating subject. 🧐 Let me know when it's ready, here or on Slack, and I'll be happy to test the fix on our infrastructure.

@mgoin
Member

mgoin commented Dec 27, 2021

Hi @SkalskiP and @prasanth-pivotchain, thanks for waiting over the holidays! I'm happy to share a fix for this issue on our latest nightly build, which can be found here: https://pypi.org/project/deepsparse-nightly/0.10.0.20211227/
This will also be available in our upcoming stable release near the end of January.
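
Installing that exact build should just be a pinned pip install (a sketch; you may want to remove the stable deepsparse package first so the two builds don't conflict):

pip uninstall -y deepsparse
pip install deepsparse-nightly==0.10.0.20211227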

Here is how I validated the fix on this build:

python deepsparse/examples/ultralytics-yolo/benchmark.py zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96 --image-shape 800 800 -b1 -c4
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96 downloaded to /home/mgoin/.cache/sparsezoo/c13e55cb-dd6c-4492-a079-8986af0b65e6/model.onnx
Overwriting original model shape (640, 640) to [800, 800]
Original model path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned-aggressive_96, new temporary model saved to /tmp/tmpq4hr2i4x
Compiling deepsparse model for /tmp/tmpq4hr2i4x
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0.20211227 (15dfc2ae) (release) (optimized) (system=avx512, binary=avx512)
Engine info: deepsparse.engine.Engine:
        onnx_file_path: /tmp/tmpq4hr2i4x
        batch_size: 1
        num_cores: 4
        num_sockets: 0
        scheduler: Scheduler.default
        cpu_avx_type: avx512
        cpu_vnni: False
Loading dataset
Running for 25 warmup iterations and 80 benchmarking iterations
Benchmarking complete. End-to-end results:
BenchmarkResults:
        items_per_second: 14.499520412304115
        ms_per_batch: 68.96779835224152
        batch_times_mean: 0.06896779835224151
        batch_times_median: 0.06240344047546387
        batch_times_std: 0.030852884253567997
End-to-end per image time: 68.96779835224152ms

@SkalskiP
Contributor Author

Hi @mgoin, I'm off until January 4th or 5th, but I will certainly test whether it has solved the problems we originally encountered. I'll let you know in about a week or so. Thank you 🙏

@prasanth-pivotchain

Hi @mgoin, thank you for your quick response. However, the issue still persists in my case. I am going to wait for the hotfix release.

@mgoin
Member

mgoin commented Jan 4, 2022

Sorry to hear that @prasanth-pivotchain, is it the same issue as before? We might need more information to address your problem in the next release if it is still present.

I noticed your problem occurs on an AVX2 machine, so I tried running a similar setup on the new build with an image shape of 800x800:

python deepsparse/examples/ultralytics-yolo/benchmark.py zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 --image-shape 800 800 -b1 -c4 --quantized 
Neural Magic: Using env variable NM_ARCH=avx2 for avx_type
model with stub zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 downloaded to /home/mgoin/.cache/sparsezoo/aabc828b-c199-4766-95e1-53f2abd0fdd3/model.onnx
Overwriting original model shape (640, 640) to [800, 800]
Original model path: zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94, new temporary model saved to /tmp/tmp0fz6onmy
Compiling deepsparse model for /tmp/tmp0fz6onmy
DeepSparse Engine, Copyright 2021-present / Neuralmagic, Inc. version: 0.10.0.20220103 (2a9eaab9) (release) (optimized) (system=avx512, binary=avx2)
WARNING: Running quantized (INT8) operations with only AVX2 instructions available. Sparse optimizations disabled and performance is greatly degraded.
Engine info: deepsparse.engine.Engine:
        onnx_file_path: /tmp/tmp0fz6onmy
        batch_size: 1
        num_cores: 4
        num_sockets: 0
        scheduler: Scheduler.default
        cpu_avx_type: avx2
        cpu_vnni: False
WARNING: VNNI instructions not detected, quantization speedup not well supported
Loading dataset
Running for 25 warmup iterations and 80 benchmarking iterations
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:18<00:00,  4.42it/s]
Benchmarking complete. End-to-end results:
BenchmarkResults:
        items_per_second: 5.831224391728527
        ms_per_batch: 171.4905709028244
        batch_times_mean: 0.1714905709028244
        batch_times_median: 0.17201220989227295
        batch_times_std: 0.0017102459872419056
End-to-end per image time: 171.4905709028244ms

It seems to run fine for the sparse FP32 and sparse INT8 yolov5s, so I'm not sure what else to try. Did you use a different image shape perhaps?

@SkalskiP
Contributor Author

SkalskiP commented Jan 4, 2022

Hi, @mgoin. I'll be back at work in 2 days. I'll let you know what I find ;)

@prasanth-pivotchain

Yes @mgoin, I am using image shape (320, 320) on an AVX2-only machine. I will make a clean install and post the stack trace for you.

@mgoin
Member

mgoin commented Jan 5, 2022

@prasanth-pivotchain I was able to recreate a failure using image shape (320, 320). Thanks for the additional info; that should be enough to start working on a fix!
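
For reference, the failing case presumably looks like the earlier invocation with the shape swapped, e.g. (a sketch, forcing the AVX2 code path as above):

NM_ARCH=avx2 python deepsparse/examples/ultralytics-yolo/benchmark.py zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 --image-shape 320 320 -b1 -c4 --quantized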

@mgoin
Member

mgoin commented Jan 6, 2022

Hi there @prasanth-pivotchain, we have an updated nightly build that should address your issue, if you'd like to try it. Let me know if you get the chance; otherwise it will be in our next release (0.10) in a few weeks. Thanks for your help finding and reporting this!
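
If you want to try it, upgrading the nightly package should pull in the new build (a sketch, assuming deepsparse-nightly is already installed):

pip install --upgrade deepsparse-nightly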

@prasanth-pivotchain

Hi @mgoin, everything is working as expected. Thanks for the help.

@jeanniefinks
Member

Hello @prasanth-pivotchain

I am going to go ahead and close this thread out with your confirmation. But if you have more comments, please re-open and we'd love to chat. Lastly, if you have not starred our deepsparse repo already, and you feel inclined, please do! Thank you in advance for your support! https://github.com/neuralmagic/deepsparse/

Best, Jeannie / Neural Magic
