
Xavier NX and Yolov4-Tiny #1

Closed
MuhammadAsadJaved opened this issue Oct 12, 2020 · 39 comments
Labels: question (Further information is requested)

Comments

@MuhammadAsadJaved
Contributor

Hi,
Thanks for the great work. I have a few questions about the project.

1 - Are you using a YOLOv4 + DeepSort tracker? What is the speed on Jetson Nano?
2 - Can we run it on Jetson Xavier NX as well, with TensorRT support?
3 - Can we use another YOLOv4 model trained on different classes, and track more than one class at the same time, e.g., person and car?
4 - Are you planning to update the project to run with YOLOv4-Tiny + DeepSort?

@GeekAlexis
Owner

GeekAlexis commented Oct 12, 2020

@MuhammadAsadJaved
1 - I actually use YOLOv4 + Deep SORT + optical flow to make the tracker faster. The feature extractor in Deep SORT is also swapped to OSNet. I wouldn't recommend running on Jetson Nano. I expect the speed to be around 5 FPS. You may want to increase detector_frame_skip if you have to.
2 - Yes, you can look at the performance section in the README for speed on Xavier NX. This repo only supports the TensorRT backend, so yes.
3 - Yes, please refer to the usage section in the README. You need to train both YOLOv4 and a ReID model for person and car, and convert them to ONNX.
4 - YOLOv4-tiny should already be supported, but I don't have time to train one for now. You just need to change LAYER_FACTORS here to [32, 16] and set ANCHORS according to your darknet cfg file (see the sketch below).
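
For reference, here is a hedged sketch of what a tiny variant could look like. Only LAYER_FACTORS and ANCHORS are names confirmed above; the class layout, paths, and other attributes are assumptions that may differ from the actual model definitions in this repo, and the anchors/masks below come from the stock yolov4-tiny.cfg, so always copy them from your own cfg:

```python
class YOLOv4Tiny(YOLO):
    # Hypothetical subclass; paths, class count, and input shape are
    # placeholders -- point them at your own engine/ONNX and cfg values.
    ENGINE_PATH = 'yolov4-tiny.trt'
    MODEL_PATH = 'yolov4-tiny.onnx'
    NUM_CLASSES = 1
    INPUT_SHAPE = (3, 416, 416)

    # Tiny has two detection heads, at strides 32 and 16 (in that order).
    LAYER_FACTORS = [32, 16]

    # Anchors grouped per head, following the masks in the darknet cfg:
    # the first [yolo] layer uses mask 3,4,5 and the second uses mask 1,2,3
    # of "10,14, 23,27, 37,58, 81,82, 135,169, 344,319".
    ANCHORS = [[81, 82, 135, 169, 344, 319],
               [23, 27, 37, 58, 81, 82]]
```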

@MuhammadAsadJaved
Contributor Author

OK. Thank you very much.

@MuhammadAsadJaved
Contributor Author

MuhammadAsadJaved commented Oct 12, 2020

Hi, I have an error while building the YOLOv4 TensorRT plugin.

Here are my environment details.

Ubuntu : 16.04
GPU : GEFORCE GTX 1080 Ti
Cuda and CuDNN : 10.0 and 7.4.1
OpenCV : 3.4.4
TensorFlow : 1.10.0
TensorRT : 5.0.2.6
(screenshot attached: 2020-10-12 18-36-45)

@GeekAlexis
Owner

GeekAlexis commented Oct 12, 2020

Hi, please install TensorRT 7. You can try my script install_tensorrt.sh to install CUDA 10.2, cuDNN 7, and TensorRT 7 from scratch. You need to set the OS variable in the script to ubuntu1604. I haven't really tested on 16.04; let me know if it works.

@MuhammadAsadJaved
Contributor Author

OK. Let me give it a try. Thank you. @GeekAlexis

@xhzzc1994

> OK. Let me give it a try. Thank you. @GeekAlexis

@MuhammadAsadJaved
Hi, I also have this problem under Ubuntu 16.04. Have you solved it? Is it related to the Ubuntu version?
Here are my environment details.
Ubuntu : 16.04
GPU : GEFORCE GTX 1660 Ti
Cuda and CuDNN : 10.0 and 7.6.4
OpenCV : 4.4.0
TensorFlow : 1.15.2
TensorRT : 7.0.0.11

@GeekAlexis
Thanks for your great work.
I also tested it on an NVIDIA Jetson AGX Xavier following your steps exactly, but the result was not good: only a few FPS. Is there anything I missed in the running process?

@GeekAlexis
Owner

GeekAlexis commented Oct 13, 2020

@xhzzc1994 What is your Jetpack version on AGX Xavier? Did you run sudo jetson_clocks? Roughly how many objects are you tracking? It is expected to have lower FPS if there are too many objects. Also, FPS will be lower when you run for the first time because Numba needs to compile and TensorRT engines are built, but these will be cached afterward. Display or video writing has some extra overhead too.
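
To illustrate the first-run cost, here is a small standalone Numba example (not this repo's actual kernels, just the general effect): the first call pays the JIT compilation, later calls run the compiled code, and cache=True persists it across program runs.

```python
import time
import numpy as np
from numba import njit

@njit(cache=True)  # cache=True persists the compiled function across runs
def box_iou(a, b):
    # IoU of two [x1, y1, x2, y2] boxes.
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

a = np.array([0., 0., 10., 10.])
b = np.array([5., 5., 15., 15.])

t0 = time.perf_counter()
box_iou(a, b)   # first call: JIT compilation happens here
t1 = time.perf_counter()
box_iou(a, b)   # later calls: already compiled, much faster
t2 = time.perf_counter()
print(f"first call {t1 - t0:.3f}s, second call {t2 - t1:.6f}s")
```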

@GeekAlexis
Owner

GeekAlexis commented Oct 13, 2020

@xhzzc1994 @MuhammadAsadJaved For Ubuntu 16.04, it appears the error might be related to dusty-nv/jetson-inference#281 (comment). install_tensorrt.sh uses NVIDIA's cuda-repo and machine-learning-repo Debian packages, so reinstalling with the script might fix the issue.

@MuhammadAsadJaved
Contributor Author

@GeekAlexis Actually, I cannot reinstall the CUDA, cuDNN, etc. packages because the existing ones are configured for a lot of running projects.

@GeekAlexis
Owner

GeekAlexis commented Oct 13, 2020

@MuhammadAsadJaved In that case, it would be better to use an NVIDIA Docker container, but I don't have time to investigate that for now. If upgrading TensorRT is fine, you can try installing only TensorRT 7 using NVIDIA's cuda-repo and machine-learning-repo.

@MuhammadAsadJaved
Contributor Author

MuhammadAsadJaved commented Oct 13, 2020 via email

GeekAlexis added the question (Further information is requested) label on Oct 19, 2020
@darbyyyy

@MuhammadAsadJaved
FYI, I've tested the code on Xavier NX.
The FPS depends on the number of objects.
The worst case was a Shibuya crossing video, since there were lots of pedestrians: 11-12 FPS.
On a typical video, it scores 25-32 FPS.

@MuhammadAsadJaved
Contributor Author

@darbyyyy Thank you so much for your update.

I will also try it on a Xavier NX now. Do you have any suggestions about the installation process?

@MuhammadAsadJaved
Contributor Author

MuhammadAsadJaved commented Oct 20, 2020

@darbyyyy For me, on Xavier NX the average speed is 6 FPS when there are 15-20 detected objects at the same time. May I ask how many objects were in the scene for your 25-32 FPS result?

  • Is it possible to upload a video and share the link, so that I can check the speed on the same video and we can compare results more directly?

@GeekAlexis
Owner

GeekAlexis commented Oct 20, 2020

@MuhammadAsadJaved Keep in mind that running for the first time will be slow due to Numba compilation. Also, I got my results without using the --gui or --output_uri options. These have some extra overhead that can slow down processing by 2-5 FPS. You should get something close to 30 FPS with 20 objects.

@darbyyyy

@MuhammadAsadJaved
11-12 FPS on the video below:
https://www.videezy.com/abstract/40906-tokyo-japan-shibuya-area
On videos with 10-20 objects, the FPS was higher than 20.
I'm sorry to say that the videos I used are confidential.

@MuhammadAsadJaved
Contributor Author

MuhammadAsadJaved commented Oct 22, 2020

@darbyyyy @GeekAlexis
Hi, on my Xavier NX the speed is much slower than yours.
I tried 3 different videos with 4-5 and 10-20 objects, each no more than 10 s long. Can you try these videos, please? I have attached a Google Drive link.
What is the possible reason for the slow speed?

I got these results

Performance

| Video | Command | FPS |
| --- | --- | --- |
| nyc1.mp4 | `python3 app.py --input_uri nyc1.mp4 --mot --gui` | 7 |
| nyc1.mp4 | `python3 app.py --input_uri nyc1.mp4 --mot` | 11 |
| cycle1.mp4 | `python3 app.py --input_uri cycle1.mp4 --mot --gui` | 9 |
| cycle1.mp4 | `python3 app.py --input_uri cycle1.mp4 --mot` | 11 |
| cycle2.mp4 | `python3 app.py --input_uri cycle2.mp4 --mot --gui` | 10 |
| cycle2.mp4 | `python3 app.py --input_uri cycle2.mp4 --mot` | 12 |

https://drive.google.com/drive/folders/1CtNqQm3RzWafPg_qzBk-XuQX82wuAm5A?usp=sharing

@GeekAlexis
Owner

GeekAlexis commented Oct 22, 2020

@MuhammadAsadJaved This is weird. Are all packages correctly installed? Without display, I got 23 FPS on nyc1.mp4, 30 FPS on cycle1.mp4, and 34 FPS on cycle2.mp4. FYI, cycle1 and cycle2 are really distorted and blurry which makes camera motion hard to estimate. The tracker will perform better on these if you decrease detector_frame_skip to 1.
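
To make the detector_frame_skip trade-off concrete, here is a minimal, repo-agnostic sketch of the pattern (the detector/tracker objects and method names are hypothetical, not this repo's actual API): the detector and ReID run only every N frames, and tracks are propagated with optical flow in between.

```python
DETECTOR_FRAME_SKIP = 5  # higher = faster but less accurate


def run(frames, detector, tracker):
    """Yield tracks per frame; detector and tracker are hypothetical objects."""
    for frame_id, frame in enumerate(frames):
        if frame_id % DETECTOR_FRAME_SKIP == 0:
            # Expensive path: full detection plus ReID association.
            detections = detector.detect(frame)
            tracker.update(frame, detections)
        else:
            # Cheap path: advance existing tracks with optical flow and
            # Kalman prediction only, no detector inference.
            tracker.track(frame)
        yield tracker.tracks
```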

@MuhammadAsadJaved
Contributor Author

MuhammadAsadJaved commented Oct 22, 2020

@GeekAlexis I just installed using scripts/install_jetson.sh, and all packages were installed successfully. Do I need to install any other packages as well?

  • I see it needs OpenCV with GStreamer. How do I install it with GStreamer? Are there any links or suggestions?

Update: I might have built OpenCV without GStreamer support. That must affect the speed; let me verify it.

@GeekAlexis
Owner

GeekAlexis commented Oct 22, 2020

@MuhammadAsadJaved OpenCV 4.1.1 from JetPack 4.4 should already support GStreamer. Did you reinstall OpenCV by any chance? GStreamer is used to accelerate video resizing, so it does have some impact.
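
If you want to double-check whether your installed cv2 build has GStreamer enabled before rebuilding anything, OpenCV's build summary shows it (a quick generic check, not specific to this repo):

```python
import cv2

# Print the Video I/O section of OpenCV's build summary;
# look for a "GStreamer: YES" line in the output.
for line in cv2.getBuildInformation().splitlines():
    if "GStreamer" in line or "Video I/O" in line:
        print(line.strip())
```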

@MuhammadAsadJaved
Contributor Author

> @MuhammadAsadJaved OpenCV 4.1.1 from JetPack 4.4 should already support GStreamer. Did you reinstall OpenCV by any chance?

Yes, I rebuilt OpenCV for some other projects. I will rebuild it, then verify the results again and update you. Thank you for your time.

@darbyyyy

darbyyyy commented Oct 22, 2020

@MuhammadAsadJaved Here are my results. I tried over an SSH connection, so I disabled the GUI option.
With --input_uri, --output_uri (saving the output), and --mot:
cycle1: 22
cycle2: 27
nyc1: 20
With --input_uri and --mot:
cycle1: 25
cycle2: 30
nyc1: 22

@MuhammadAsadJaved
Contributor Author

@darbyyyy Thank you for your efforts.

@MuhammadAsadJaved
Contributor Author

MuhammadAsadJaved commented Oct 22, 2020

@GeekAlexis
Just verified: I am using OpenCV 3.4.0, built with GStreamer.

cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_CUDA=ON -D CUDA_ARCH_BIN="7.2" -D CUDA_ARCH_PTX="" -D WITH_CUBLAS=ON -D ENABLE_FAST_MATH=ON -D CUDA_FAST_MATH=ON -D ENABLE_NEON=ON -D WITH_GSTREAMER=ON -D WITH_LIBV4L=ON -D BUILD_opencv_python2=ON -D BUILD_opencv_python3=ON -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF -D WITH_QT=ON -D WITH_OPENGL=OFF -D CUDA_NVCC_FLAGS="--expt-relaxed-constexpr" -D WITH_TBB=ON ..

I also checked the other libraries and versions, and they are the same as mentioned in the project. What is the possible reason for the slow speed?

@darbyyyy

darbyyyy commented Oct 22, 2020

@MuhammadAsadJaved
Did you check the power mode?

  1. Connect the Xavier NX to a power supply.
  2. Select the 15W 6-core option.
    Reference: https://jkjung-avt.github.io/setting-up-xavier-nx/

@xhzzc1994

@GeekAlexis
Hi, I have an error while using scripts/install_opencv.sh to install OpenCV with GStreamer:

CMake Error at modules/core/CMakeLists.txt:40 (message):
CUDA: OpenCV requires enabled 'cudev' module from 'opencv_contrib'
repository: https://github.com/opencv/opencv_contrib

-- Configuring incomplete, errors occurred!

(screenshot attached: 2020-10-22 15-55-16)

Here are my environment details.
Ubuntu : 18.04
GPU : GEFORCE GTX 2070
Cuda and CuDNN : 10.2 and 7.6.5
TensorFlow : 1.15.2
TensorRT : 7.1.3.4

@GeekAlexis
Owner

GeekAlexis commented Oct 22, 2020

@xhzzc1994 The opencv_contrib path seems to be incorrect in the script; I just pushed a fix. You can try again after pulling, or change it yourself:

-D OPENCV_EXTRA_MODULES_PATH='../opencv_contrib/modules'

Please open another issue if you encounter further problems.

@GeekAlexis
Owner

GeekAlexis commented Oct 22, 2020

@MuhammadAsadJaved OpenCV is mostly used in flow.py and a few other places for resizing. I'm not sure if 4.1.1 and 3.4 have any performance gap. Can you run with the verbose option -v (e.g. `python3 app.py --input_uri nyc1.mp4 --mot -v`) and report the timing results?

@MuhammadAsadJaved
Contributor Author

MuhammadAsadJaved commented Oct 22, 2020

> @MuhammadAsadJaved This is weird. Are all packages correctly installed? Without display, I got 23 FPS on nyc1.mp4, 30 FPS on cycle1.mp4, and 34 FPS on cycle2.mp4. FYI, cycle1 and cycle2 are really distorted and blurry which makes camera motion hard to estimate. The tracker will perform better on these if you decrease detector_frame_skip to 1.

@GeekAlexis
Update: I just verified the results on another Xavier NX device. Now the results are almost the same as yours. This device uses the OpenCV 4.1.1 installed by default with JetPack 4.4.

Results without display:
nyc1.mp4 = 21
cycle1.mp4 = 16 on the first run, 25 on the second run
cycle2.mp4 = 28
Just a little different from yours.

So most probably it's the OpenCV difference.

@darbyyyy Yes, I am using the 15W 6-core mode.

Thank you for your time and cooperation. I think we should close this issue now, so that other problems don't get mixed in here.

@GeekAlexis It would be nice if you could add other YOLO models like v3 and Tiny-v3 to this repo. It would help a lot of people; I know many are looking for this kind of project. Thank you.

@GeekAlexis
Owner

@MuhammadAsadJaved Thanks. I believe reversing the order of LAYER_FACTORS should work for YOLOv3. The TensorRT conversions for v3 and v4 are exactly the same. I will consider training a v3 model if time permits.

@MuhammadAsadJaved
Contributor Author

> @MuhammadAsadJaved Thanks. I believe reversing the order of LAYER_FACTORS should work for YOLOv3. The TensorRT conversions for v3 and v4 are exactly the same. I will consider training a v3 model if time permits.

Can you use YOLO models trained for two classes, person and car? If yes, I can share the .weights or .onnx with you to save time. I have trained models for v3, Tiny-v3, and Tiny-v4.

@GeekAlexis
Owner

@MuhammadAsadJaved That'd be great if you are open to sharing your .onnx models. Two classes are fine. I can disable the car class until a ReID model gets trained. BTW, what dimensions are these models?

@MuhammadAsadJaved
Contributor Author

@GeekAlexis I have sent you a personal email. We can discuss in detail.

@Myron1996

Myron1996 commented Jun 11, 2021

Hi @MuhammadAsadJaved @GeekAlexis @darbyyyy,

I am adding to this issue because I am getting only 3 FPS on Xavier NX. I have provided all the necessary details about my current environment setup on the Xavier below. It would be extremely helpful to receive any kind of advice to improve the performance.

Benchmarking on an 11-second video

  • 3 FPS (GSTREAMER = True - video saving after inference)
  • 10 FPS (GSTREAMER = True - video saving after inference)
  • 16 FPS (GSTREAMER = True - video not saving after inference)
  • 8 FPS (GSTREAMER = False - video not saving after inference)
  • 6 FPS (GSTREAMER = False - video saving after inference)

Jetson Board

NVIDIA Jetson Xavier NX (Developer Kit Version)
 L4T 32.5.1 [ JetPack 4.5.1 ]
   Ubuntu 18.04.5 LTS
   Kernel Version: 4.9.201-tegra
 CUDA 10.2.89
   CUDA Architecture: 7.2
 OpenCV version: 4.5.3-pre
   OpenCV Cuda: NO
 CUDNN: 8.0.0.180
 TensorRT: 7.1.3.0
 Vision Works: 1.6.0.501
 VPI: ii libnvvpi1 1.0.15 arm64 NVIDIA Vision Programming Interface library
 Vulcan: 1.2.70

Video Properties

(video attachment)

Power Mode

15W 6CORE

OpenCV Version - Built from source

tdc@tdc:~/jetsonUtilities$ pkg-config --modversion opencv4
4.5.3
tdc@tdc:~/jetsonUtilities$ python3 -c "import cv2; print(cv2.__version__)"
4.5.3-pre

Logs

**WITH_GSTREAMER = True - Video Saving after inference**

```

tdc@tdc:~/Downloads/FastMOT$ python3 app.py --input_uri c1.mp4 -o ./output.mp4 --mot
Opening in BLOCKING MODE
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 260 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 260 
[ WARN:0] global /home/tdc/opencv_build/opencv/modules/videoio/src/cap_gstreamer.cpp (1044) open OpenCV | GStreamer warning: unable to query duration of stream
[ WARN:0] global /home/tdc/opencv_build/opencv/modules/videoio/src/cap_gstreamer.cpp (1081) open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1
2021-06-11 00:59:46 [    INFO] 1280x720 stream @ 30 FPS
2021-06-11 00:59:46 [    INFO] Loading detector model...
Framerate set to : 30 at NvxVideoEncoderSetParameterNvMMLiteOpen : Block : BlockType = 4 
===== NVMEDIA: NVENC =====
NvMMLiteBlockCreate : Block : BlockType = 4 
H264: Profile = 66, Level = 40 
2021-06-11 00:59:50 [    INFO] Building engine with batch size: 1
2021-06-11 00:59:50 [    INFO] This may take a while...
2021-06-11 01:10:32 [    INFO] Completed creating engine
2021-06-11 01:10:36 [    INFO] Loading feature extractor model...
2021-06-11 01:10:36 [    INFO] Building engine with batch size: 16
2021-06-11 01:10:36 [    INFO] This may take a while...
2021-06-11 01:14:50 [    INFO] Completed creating engine
2021-06-11 01:14:51 [    INFO] Starting video capture...
NVMEDIA_ENC: bBlitMode is set to TRUE 
2021-06-11 01:15:50 [    INFO] Found:        person   1 at ( 521, 448)
2021-06-11 01:15:50 [    INFO] Found:        person   2 at ( 291, 232)
2021-06-11 01:15:50 [    INFO] Found:        person   3 at ( 605,  98)
2021-06-11 01:15:50 [    INFO] Found:        person   4 at ( 585,  87)
2021-06-11 01:15:50 [    INFO] Found:        person   5 at ( 535, 126)
2021-06-11 01:16:08 [    INFO] Found:        person   7 at ( 544, 108)
2021-06-11 01:16:08 [    INFO] Found:        person   8 at ( 342, 670)
2021-06-11 01:16:12 [    INFO] Lost:         person   3 at ( 551, 108)
2021-06-11 01:16:15 [    INFO] Average FPS: 3

```

**WITH_GSTREAMER = True - Video Not Saving after inference**

```
tdc@tdc:~/Downloads/FastMOT$ python3 app.py --input_uri c1.mp4  --mot
Opening in BLOCKING MODE
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 260 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 260 
[ WARN:0] global /home/tdc/opencv_build/opencv/modules/videoio/src/cap_gstreamer.cpp (1044) open OpenCV | GStreamer warning: unable to query duration of stream
[ WARN:0] global /home/tdc/opencv_build/opencv/modules/videoio/src/cap_gstreamer.cpp (1081) open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1
2021-06-11 09:41:14 [    INFO] 1280x720 stream @ 30 FPS
2021-06-11 09:41:14 [    INFO] Loading detector model...
2021-06-11 09:41:20 [    INFO] Loading feature extractor model...
2021-06-11 09:41:20 [    INFO] Starting video capture...
2021-06-11 09:41:21 [    INFO] Found:        person   1 at ( 521, 448)
2021-06-11 09:41:21 [    INFO] Found:        person   2 at ( 291, 232)
2021-06-11 09:41:21 [    INFO] Found:        person   3 at ( 605,  98)
2021-06-11 09:41:21 [    INFO] Found:        person   4 at ( 585,  87)
2021-06-11 09:41:21 [    INFO] Found:        person   5 at ( 535, 126)
2021-06-11 09:41:32 [    INFO] Found:        person   7 at ( 544, 108)
2021-06-11 09:41:32 [    INFO] Found:        person   8 at ( 342, 670)
2021-06-11 09:41:35 [    INFO] Lost:         person   3 at ( 551, 108)
2021-06-11 09:41:38 [    INFO] Average FPS: 16

```

**WITH_GSTREAMER = False - Video Not Saving after inference**

```
tdc@tdc:~/Downloads/FastMOT$ python3 app.py --input_uri c1.mp4  --mot
2021-06-11 10:04:07 [    INFO] 1920x1080 stream @ 30 FPS
2021-06-11 10:04:07 [    INFO] Loading detector model...
2021-06-11 10:04:11 [    INFO] Loading feature extractor model...
2021-06-11 10:04:11 [    INFO] Starting video capture...
2021-06-11 10:04:13 [    INFO] Found:        person   1 at ( 520, 447)
2021-06-11 10:04:13 [    INFO] Found:        person   2 at ( 291, 233)
2021-06-11 10:04:13 [    INFO] Found:        person   3 at ( 585,  86)
2021-06-11 10:04:13 [    INFO] Found:        person   4 at ( 606,  99)
2021-06-11 10:04:13 [    INFO] Found:        person   5 at ( 536, 127)
2021-06-11 10:04:13 [    INFO] Found:        person   6 at ( 555,  67)
2021-06-11 10:04:17 [    INFO] Found:        person   8 at ( 530,  95)
2021-06-11 10:04:20 [    INFO] Lost:         person   3 at ( 494,  62)
2021-06-11 10:04:21 [    INFO] Lost:         person   8 at ( 489,  98)
2021-06-11 10:04:36 [    INFO] Found:        person   9 at ( 544, 108)
2021-06-11 10:04:36 [    INFO] Found:        person  10 at ( 342, 670)
2021-06-11 10:04:42 [    INFO] Lost:         person   4 at ( 556, 108)
2021-06-11 10:04:46 [    INFO] Average FPS: 8

```

@GeekAlexis
Owner

@Myron1996 I have said this many times: since you reinstalled OpenCV, you have to make sure it has all the optimizations turned on, which I cannot help you with. GStreamer is critical on Jetson, so it must be turned on. This repo only guarantees maximum performance with the default OpenCV from JetPack. You might want to reflash your Jetson as a last resort.

FYI, the 2-core power mode is slightly faster, but that's not the real cause in your case.

@rafaelbate

> @MuhammadAsadJaved
> 1 - I actually use YOLOv4 + Deep SORT + optical flow to make the tracker faster. The feature extractor in Deep SORT is also swapped to OSNet. I wouldn't recommend running on Jetson Nano. I expect the speed to be around 5 FPS. You may want to increase detector_frame_skip if you have to.
> 2 - Yes, you can look at the performance section in the README for speed on Xavier NX. This repo only supports the TensorRT backend, so yes.
> 3 - Yes, please refer to the usage section in the README. You need to train both YOLOv4 and a ReID model for person and car, and convert them to ONNX.
> 4 - YOLOv4-tiny should already be supported, but I don't have time to train one for now. You just need to change LAYER_FACTORS here to [32, 16] and set ANCHORS according to your darknet cfg file.

Hello @GeekAlexis, if I understand correctly, if I train a ReID model using fast-reid, then when I use FastMOT, the feature extractor in Deep SORT will be swapped to the ReID model I trained, right?

Also, do you happen to know if I can train a custom ReID model in fast-reid? (I have a dataset containing images of fruits.)

Thank you!

@GeekAlexis
Owner

@rafaelbate Correct, but I'm not sure how well ReID works on fruits in practice. FYI, you need instance IDs on the fruits to train ReID.

@rafaelbate

> @rafaelbate Correct, but I'm not sure how well ReID works on fruits in practice. FYI, you need instance IDs on the fruits to train ReID.

Thanks for your fast reply and clarification @GeekAlexis! By "instance IDs on fruits", you mean each individual fruit must have a specific ID that identifies it across its appearances in multiple photos?

@GeekAlexis
Owner

@rafaelbate Yes
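
To make "instance IDs" concrete, here is a hedged, framework-agnostic sketch of what such a labeled crop folder might look like and how it parses into (image, ID) pairs; the file-naming scheme and the parse_dataset helper are made up for illustration and are not fast-reid's API:

```python
from pathlib import Path

# Hypothetical layout: crops named <instance_id>_<shot_id>.jpg, e.g.
# 0001_0001.jpg and 0001_0002.jpg are two photos of the same apple,
# while 0002_0001.jpg is a different apple.
def parse_dataset(root):
    samples = []
    for img_path in sorted(Path(root).glob('*.jpg')):
        instance_id = int(img_path.stem.split('_')[0])
        samples.append((str(img_path), instance_id))
    return samples

# Every (image, instance_id) pair is what a ReID loss (ID classification or
# triplet loss) needs in order to pull crops of the same fruit together and
# push different fruits apart in feature space.
```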
