GPU-CPU sync issues in real time object detection project #28

shuuchen · 2018-02-06T11:27:32Z

Hi there,

I have run into a problem when using the ZED camera in a real-time object detection project.
In the project, images are read in and processed with deep neural networks in real time.

The project runs on an NVIDIA Jetson TX2 board connected to a ZED camera.
It uses CUDA 8.0, cuDNN 5.1, and TensorFlow 1.3.

When I run it, a sync problem occurs and the following error message is shown:

Aborted. GPU sync failed.
Killed

I would like to know how to use the ZED camera in such projects without running into these GPU-CPU sync issues.

The main code is as follows:

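import sys

# pyzed 2.x module layout, matching the aliases used below
import pyzed.camera as zcam
import pyzed.core as core
import pyzed.defines as sl
import pyzed.types as tp

# Create a PyZEDCamera object
zed = zcam.PyZEDCamera()

# Create a PyInitParameters object and set configuration parameters
init_params = zcam.PyInitParameters(svo_real_time_mode=False)
init_params.camera_resolution = sl.PyRESOLUTION.PyRESOLUTION_VGA
init_params.camera_fps = 1  # the "ZED (Init)" log further down shows VGA@15 was used

# Open the camera
err = zed.open(init_params)
if err != tp.PyERROR_CODE.PySUCCESS:
    print("-----------Cannot open camera---------")
    print(err)
    sys.exit(1)

img = core.PyMat()
runtime_params = zcam.PyRuntimeParameters()  # create once, reuse for every grab

while True:
    err = zed.grab(runtime_params)
    if err == tp.PyERROR_CODE.PySUCCESS:
        zed.retrieve_image(img, sl.PyVIEW.PyVIEW_LEFT)
        # Keep `img` as the PyMat: overwriting it with the ndarray would make
        # the next retrieve_image() call fail.
        frame = img.get_data()   # convert to ndarray
        frame = frame[:, :, :3]  # keep the first three channels
        print(frame.shape)

        # object detection
        ...  # (object detection code elided in the original post)
    else:
        break

zed.close()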

Thanks very much.

adujardin · 2018-02-06T13:15:05Z

Hi,
From what I've read, this issue seems to come from TensorFlow, possibly when it uses too much GPU memory, but I'm not very familiar with it. Could you monitor the GPU RAM usage when running your code?

Also, if you don't use depth, you can disable it to consume less memory:

    init_params.depth_mode = sl.PyDEPTH_MODE.PyDEPTH_MODE_NONE
    init_params.depth_stabilization = False

shuuchen · 2018-02-07T04:58:09Z

@adujardin
Thanks for your reply.
I monitored GPU RAM usage and got the following messages.

Error message

...
2018-02-07 04:49:46.403798: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-02-07 04:49:46.403940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 4.35GiB
2018-02-07 04:49:46.404005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2018-02-07 04:49:46.404036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2018-02-07 04:49:46.404068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2018-02-07 04:49:59.685627: E tensorflow/stream_executor/cuda/cuda_driver.cc:1098] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED :: No stack trace available
2018-02-07 04:49:59.685749: F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed
Aborted (core dumped)

The corresponding monitor message

...
RAM 2980/7851MB (lfb 256x4MB) cpu [100%@1997,off,off,0%@1996,1%@1997,2%@1996] EMC 4%@1600 APE 150 GR3D 0%@114
RAM 2984/7851MB (lfb 256x4MB) cpu [29%@2034,off,off,2%@2036,72%@2035,1%@2034] EMC 5%@1600 APE 150 GR3D 0%@114
RAM 2984/7851MB (lfb 256x4MB) cpu [0%@1998,off,off,0%@1996,100%@1996,0%@1999] EMC 5%@1600 APE 150 GR3D 0%@114
RAM 2985/7851MB (lfb 256x4MB) cpu [0%@2036,off,off,0%@2035,100%@2035,1%@2035] EMC 4%@1600 APE 150 GR3D 0%@114
RAM 2985/7851MB (lfb 256x4MB) cpu [0%@1997,off,off,0%@1996,100%@1997,1%@1996] EMC 4%@1600 APE 150 GR3D 0%@114
RAM 2999/7851MB (lfb 256x4MB) cpu [0%@2034,off,off,0%@2035,100%@2035,1%@2034] EMC 4%@1600 APE 150 GR3D 0%@114
RAM 3050/7851MB (lfb 256x4MB) cpu [26%@1380,off,off,8%@1373,14%@1349,41%@1349] EMC 5%@1600 APE 150 GR3D 30%@114
RAM 3214/7851MB (lfb 256x4MB) cpu [4%@2011,off,off,11%@2034,63%@2036,9%@2034] EMC 6%@1600 APE 150 GR3D 16%@114
RAM 3489/7851MB (lfb 256x4MB) cpu [0%@1996,off,off,1%@1995,87%@1997,1%@1998] EMC 20%@665 APE 150 GR3D 99%@114
RAM 3501/7851MB (lfb 256x4MB) cpu [34%@2035,off,off,12%@1997,38%@2034,56%@1999] EMC 5%@1600 APE 150 GR3D 0%@522
RAM 2625/7851MB (lfb 256x4MB) cpu [100%@1999,off,off,10%@2034,11%@1999,100%@2035] EMC 4%@1600 APE 150 GR3D 0%@114
RAM 2625/7851MB (lfb 256x4MB) cpu [70%@2035,off,off,7%@2034,1%@2034,100%@2035] EMC 4%@1600 APE 150 GR3D 0%@114
...

A clear peak appears in GPU usage (GR3D 99%@114), while RAM rises to 3501 MB.
Should I close the camera while the DNN is processing the images?
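To compare these monitor logs more easily, the lines can be parsed into numbers. Below is a minimal sketch of a hypothetical helper (not part of tegrastats or the ZED SDK) that assumes exactly the line format posted above. On the TX2 the `RAM` field is memory shared by the CPU and GPU, and `GR3D` reports GPU load and clock rather than GPU memory.

```python
import re
from typing import NamedTuple, Optional

# Patterns for the fields that appear in the tegrastats lines posted above.
_RAM = re.compile(r"RAM (\d+)/(\d+)MB \(lfb (\d+)x(\d+)MB\)")
_GR3D = re.compile(r"GR3D (\d+)%@(\d+)")


class TegraSample(NamedTuple):
    ram_used_mb: int    # shared CPU/GPU memory in use
    ram_total_mb: int
    lfb_blocks: int     # "lfb": count of largest free blocks...
    lfb_block_mb: int   # ...and their size in MB
    gr3d_load_pct: int  # GPU load, not GPU memory
    gr3d_freq: int      # GPU clock


def parse_tegrastats(line: str) -> Optional[TegraSample]:
    """Parse one tegrastats line; return None if the expected fields are missing."""
    ram = _RAM.search(line)
    gr3d = _GR3D.search(line)
    if ram is None or gr3d is None:
        return None
    used, total, blocks, block_mb = map(int, ram.groups())
    load, freq = map(int, gr3d.groups())
    return TegraSample(used, total, blocks, block_mb, load, freq)


if __name__ == "__main__":
    line = ("RAM 3489/7851MB (lfb 256x4MB) cpu [0%@1996,off,off,1%@1995,"
            "87%@1997,1%@1998] EMC 20%@665 APE 150 GR3D 99%@114")
    print(parse_tegrastats(line))
```

Piping the monitor output through such a parser makes it easy to see whether the crash coincides with the GPU load spike or with memory running out.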

shuuchen · 2018-02-07T05:09:29Z

Sometimes the camera starts working, but the program is killed later.

I wonder whether there is a way to control how often images are grabbed, or whether some other thread inside the grab function reads images asynchronously and exhausts the GPU RAM.
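If the aim is to run detection less often than the camera delivers frames, a time-based throttle on the Python side can skip frames. This is a minimal sketch of a hypothetical `FrameThrottle` class (not a ZED SDK feature); it only limits how often the detector runs and does not change what the SDK does inside `grab()`.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Allow at most `max_fps` frames per second through (hypothetical helper).

    `clock` is injectable so the logic can be tested without sleeping.
    """

    def __init__(self, max_fps: float, clock: Callable[[], float] = time.monotonic):
        if max_fps <= 0:
            raise ValueError("max_fps must be positive")
        self.period = 1.0 / max_fps
        self.clock = clock
        self._next: Optional[float] = None  # earliest time the next frame may pass

    def ready(self) -> bool:
        """Return True, and schedule the next slot, if a frame may be processed now."""
        now = self.clock()
        if self._next is None or now >= self._next:
            self._next = now + self.period
            return True
        return False
```

In the grab loop, call `throttle.ready()` right after a successful `zed.grab(...)` and `continue` when it returns False, so frames that arrive too soon are dropped instead of being passed to the network.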

Error message

name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 3.98GiB
2018-02-07 04:59:16.559008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2018-02-07 04:59:16.559032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2018-02-07 04:59:16.559057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
ZED (Init) >> Video mode: VGA@15
True True
Image resolution: 672 x 376 || Image timestamp: 1517979575170237179

(376, 672, 3)
<keras_frcnn.config.Config object at 0x7f67afe9e8>
img min side: 480.0
resized image shape: (480, 640, 3)
Killed

The corresponding monitor message

RAM 6101/7851MB (lfb 87x4MB) cpu [60%@2004,off,off,3%@2034,35%@2035,24%@2035] EMC 8%@1600 APE 150 GR3D 32%@114
RAM 6463/7851MB (lfb 70x4MB) cpu [5%@1998,off,off,10%@1997,32%@1997,67%@1998] EMC 13%@1600 APE 150 GR3D 18%@624
RAM 6604/7851MB (lfb 26x4MB) cpu [16%@2012,off,off,12%@2014,85%@2016,27%@2016] EMC 12%@1600 APE 150 GR3D 0%@216
RAM 6615/7851MB (lfb 24x4MB) cpu [15%@1972,off,off,76%@1982,89%@1985,7%@1986] EMC 19%@1600 APE 150 GR3D 66%@114
RAM 6772/7851MB (lfb 4x4MB) cpu [34%@1984,off,off,54%@1988,31%@1985,18%@1981] EMC 19%@1600 APE 150 GR3D 9%@114
RAM 7128/7851MB (lfb 2x4MB) cpu [14%@2034,off,off,21%@2034,8%@2035,81%@2035] EMC 15%@1600 APE 150 GR3D 30%@114
RAM 7449/7851MB (lfb 2x4MB) cpu [52%@806,off,off,40%@806,15%@806,6%@806] EMC 29%@665 APE 150 GR3D 0%@114
RAM 7476/7851MB (lfb 2x4MB) cpu [16%@1618,off,off,12%@1680,16%@1680,17%@1703] EMC 9%@1600 APE 150 GR3D 32%@114
RAM 7511/7851MB (lfb 2x4MB) cpu [13%@2035,off,off,10%@2035,17%@2035,77%@2035] EMC 9%@1600 APE 150 GR3D 0%@114
***RAM 7604/7851MB (lfb 2x4MB) cpu [19%@806,off,off,24%@806,19%@806,34%@806] EMC 20%@665 APE 150 GR3D 14%@114***
RAM 7670/7851MB (lfb 2x4MB) cpu [27%@806,off,off,29%@806,32%@805,14%@806] EMC 19%@665 APE 150 GR3D 0%@114
RAM 7671/7851MB (lfb 2x4MB) cpu [26%@806,off,off,22%@807,7%@806,14%@806] EMC 15%@665 APE 150 GR3D 0%@114
RAM 7673/7851MB (lfb 2x4MB) cpu [26%@806,off,off,37%@806,3%@805,9%@807] EMC 14%@665 APE 150 GR3D 0%@114
RAM 7686/7851MB (lfb 2x4MB) cpu [44%@806,off,off,6%@806,18%@808,12%@806] EMC 5%@1600 APE 150 GR3D 0%@522
RAM 7687/7851MB (lfb 2x4MB) cpu [33%@2034,off,off,30%@2035,21%@2035,2%@2036] EMC 5%@1600 APE 150 GR3D 0%@420
RAM 7694/7851MB (lfb 2x4MB) cpu [19%@1995,off,off,15%@2016,73%@1144,37%@2056] EMC 5%@1600 APE 150 GR3D 0%@420
RAM 7698/7851MB (lfb 2x4MB) cpu [23%@2019,off,off,18%@2017,17%@2022,76%@2015] EMC 5%@1600 APE 150 GR3D 0%@318

In this case, the monitor freezes at the row marked with *** and only recovers after the main program is killed.
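In this log, free memory shrinks steadily (7851 MB total, 7698 MB used in the last row) until the process is killed, which likely points to the kernel's out-of-memory killer rather than a camera problem. A minimal sketch of a hypothetical check that finds the first sample where the remaining memory drops below a threshold, using the values from the rows above:

```python
from typing import Iterable, Optional, Tuple


def first_low_headroom(samples: Iterable[Tuple[int, int]], min_free_mb: int) -> Optional[int]:
    """Return the index of the first (used_mb, total_mb) sample whose free
    memory is below `min_free_mb`, or None if that never happens."""
    for i, (used, total) in enumerate(samples):
        if total - used < min_free_mb:
            return i
    return None


if __name__ == "__main__":
    # "RAM used/total" values copied from the monitor output above.
    used = [6101, 6463, 6604, 6615, 6772, 7128, 7449, 7476, 7511,
            7604, 7670, 7671, 7673, 7686, 7687, 7694, 7698]
    samples = [(u, 7851) for u in used]
    print(first_low_headroom(samples, min_free_mb=500))  # prints 6
```

Run alongside the detector, a check like this could stop the program cleanly (or log a warning) before the system becomes unresponsive.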

adujardin · 2018-02-07T09:17:09Z

I think your TF model might be too heavy for the TX2.
You could test grabbing the image with OpenCV in Python (unrectified) to use the least possible amount of memory. Something like this:

import numpy as np
import cv2

cap = cv2.VideoCapture(0)  # depending on the ZED ID

while True:
    ret, frame = cap.read()  # capture side-by-side (SbS) frames
    if not ret:  # stop if no frame could be read
        break
    height, width = frame.shape[:2]
    left = frame[0:height, 0:width // 2]  # extract the left image

    # object detection
    ...

# Release the capture
cap.release()
cv2.destroyAllWindows()

If this doesn't work, you might want to reduce the input size of the network or use another trick to make it fit in memory.
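One way to shrink the network input is to lower the target length of the image's shorter side (the detector log above prints `img min side: 480.0`). Memory for convolutional feature maps grows roughly with the number of input pixels, so a smaller shorter side can help noticeably. A minimal sketch with a hypothetical `scaled_size` helper that keeps the aspect ratio:

```python
from typing import Tuple


def scaled_size(width: int, height: int, min_side: int) -> Tuple[int, int]:
    """Return (width, height) scaled so the shorter side equals `min_side`,
    keeping the aspect ratio (the longer side is rounded to an int)."""
    if width <= height:
        return min_side, round(height * min_side / width)
    return round(width * min_side / height), min_side


if __name__ == "__main__":
    # 672 x 376 is the ZED VGA frame size reported in the log above.
    for side in (480, 300):
        w, h = scaled_size(672, 376, side)
        print(side, (w, h), w * h)
```

For the 672 x 376 frame, a shorter side of 480 gives 858 x 480 (411,840 pixels), while 300 gives 536 x 300 (160,800 pixels), about 39% as many. Whether detection accuracy stays acceptable at the smaller size has to be checked on your own data.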

shuuchen · 2018-02-09T00:48:54Z

@adujardin
Thanks very much.
I used OpenCV 3.4 and now it runs well!

The only problem is that it cannot get the depth image.
Instead, it returns a wide image that combines the left and right images side by side.

shuuchen changed the title ~~read images in real time object detection project~~ GPU-CPU sync issues in real time object detection project Feb 6, 2018

shuuchen mentioned this issue Feb 12, 2018

Cannot get depth image using OpenCV API #29

Closed

adujardin closed this as completed Sep 13, 2018
