GPU-CPU sync issues in real time object detection project #28

Closed
shuuchen opened this issue Feb 6, 2018 · 5 comments
Comments

@shuuchen

shuuchen commented Feb 6, 2018

Hi there,

I have run into a problem using the ZED camera in a real-time object detection project.
In this project, images are read from the camera and processed by deep neural networks in real time.

The project runs on an NVIDIA Jetson TX2 board connected to a ZED camera.
It uses CUDA 8.0, cuDNN 5.1, and TensorFlow 1.3.

When I run it, a sync problem occurs and the following error messages are shown:

  • Aborted. GPU sync failed.
  • Killed

I would like to know how to use the ZED camera in such a project so as to avoid the GPU-CPU sync issues.

The main code is as follows:

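# Module aliases assumed from the names used below (the post omits its imports)
import pyzed.camera as zcam
import pyzed.core as core
import pyzed.defines as sl
import pyzed.types as tp

# Create a PyZEDCamera object
zed = zcam.PyZEDCamera()

# Create a PyInitParameters object and set configuration parameters
init_params = zcam.PyInitParameters(svo_real_time_mode=False)
init_params.camera_resolution = sl.PyRESOLUTION.PyRESOLUTION_VGA
init_params.camera_fps = 1

# Open the camera
err = zed.open(init_params)
if err != tp.PyERROR_CODE.PySUCCESS:
    print("-----------Cannot open camera---------")
    print(err)
    exit(1)

img = core.PyMat()
runtime_params = zcam.PyRuntimeParameters()  # created once, reused for every grab

while True:
    err = zed.grab(runtime_params)
    if err == tp.PyERROR_CODE.PySUCCESS:
        zed.retrieve_image(img, sl.PyVIEW.PyVIEW_LEFT)
        # Use a separate name for the ndarray so that img stays a PyMat
        # for the next retrieve_image call
        frame = img.get_data()  # convert to ndarray
        frame = frame[:, :, :3]  # keep the first three channels
        print(frame.shape)

        # object detection
        ......
    else:
        break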

Thanks very much.

@shuuchen shuuchen changed the title read images in real time object detection project GPU-CPU sync issues in real time object detection project Feb 6, 2018
@adujardin
Member

Hi,
It seems this issue comes from TensorFlow, possibly because too much GPU memory is being used, from what I have read, but I'm not very familiar with it. Could you monitor the GPU RAM usage while running your code?

Also, to use less memory, you can disable depth if you don't need it:

    init_params.depth_mode = sl.PyDEPTH_MODE.PyDEPTH_MODE_NONE
    init_params.depth_stabilization = False

@shuuchen
Author

shuuchen commented Feb 7, 2018

@adujardin
Thanks for your reply.
I monitored GPU RAM usage and got the following messages.

Error message

...
2018-02-07 04:49:46.403798: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-02-07 04:49:46.403940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 4.35GiB
2018-02-07 04:49:46.404005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2018-02-07 04:49:46.404036: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2018-02-07 04:49:46.404068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2018-02-07 04:49:59.685627: E tensorflow/stream_executor/cuda/cuda_driver.cc:1098] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED :: No stack trace available
2018-02-07 04:49:59.685749: F tensorflow/core/common_runtime/gpu/gpu_util.cc:370] GPU sync failed
Aborted (core dumped)

The corresponding monitor message

...
RAM 2980/7851MB (lfb 256x4MB) cpu [100%@1997,off,off,0%@1996,1%@1997,2%@1996] EMC 4%@1600 APE 150 GR3D 0%@114
RAM 2984/7851MB (lfb 256x4MB) cpu [29%@2034,off,off,2%@2036,72%@2035,1%@2034] EMC 5%@1600 APE 150 GR3D 0%@114
RAM 2984/7851MB (lfb 256x4MB) cpu [0%@1998,off,off,0%@1996,100%@1996,0%@1999] EMC 5%@1600 APE 150 GR3D 0%@114
RAM 2985/7851MB (lfb 256x4MB) cpu [0%@2036,off,off,0%@2035,100%@2035,1%@2035] EMC 4%@1600 APE 150 GR3D 0%@114
RAM 2985/7851MB (lfb 256x4MB) cpu [0%@1997,off,off,0%@1996,100%@1997,1%@1996] EMC 4%@1600 APE 150 GR3D 0%@114
RAM 2999/7851MB (lfb 256x4MB) cpu [0%@2034,off,off,0%@2035,100%@2035,1%@2034] EMC 4%@1600 APE 150 GR3D 0%@114
RAM 3050/7851MB (lfb 256x4MB) cpu [26%@1380,off,off,8%@1373,14%@1349,41%@1349] EMC 5%@1600 APE 150 GR3D 30%@114
RAM 3214/7851MB (lfb 256x4MB) cpu [4%@2011,off,off,11%@2034,63%@2036,9%@2034] EMC 6%@1600 APE 150 GR3D 16%@114
RAM 3489/7851MB (lfb 256x4MB) cpu [0%@1996,off,off,1%@1995,87%@1997,1%@1998] EMC 20%@665 APE 150 GR3D 99%@114
RAM 3501/7851MB (lfb 256x4MB) cpu [34%@2035,off,off,12%@1997,38%@2034,56%@1999] EMC 5%@1600 APE 150 GR3D 0%@522
RAM 2625/7851MB (lfb 256x4MB) cpu [100%@1999,off,off,10%@2034,11%@1999,100%@2035] EMC 4%@1600 APE 150 GR3D 0%@114
RAM 2625/7851MB (lfb 256x4MB) cpu [70%@2035,off,off,7%@2034,1%@2034,100%@2035] EMC 4%@1600 APE 150 GR3D 0%@114
...

There is a clear peak (GR3D 99%@114) in GPU usage.
Should I close the camera while the DNN is processing images?
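
For reading the monitor lines above, a minimal parsing sketch may help. It assumes the usual tegrastats layout, where `RAM used/total` is system memory (shared by the CPU and GPU on the Jetson) and `GR3D load%@MHz` is the GPU load and clock rather than GPU memory. The function name `parse_tegrastats` and the dictionary keys are hypothetical, chosen only for this illustration.

```python
import re

# Only the RAM and GR3D fields are extracted; lfb, cpu, EMC and APE are ignored.
_TEGRASTATS = re.compile(
    r"RAM (?P<used>\d+)/(?P<total>\d+)MB.*?GR3D (?P<load>\d+)%@(?P<mhz>\d+)"
)


def parse_tegrastats(line):
    """Return RAM and GR3D readings from one monitor line, or None if absent."""
    match = _TEGRASTATS.search(line)
    if match is None:
        return None
    return {
        "ram_used_mb": int(match.group("used")),
        "ram_total_mb": int(match.group("total")),
        "gpu_load_pct": int(match.group("load")),
        "gpu_freq_mhz": int(match.group("mhz")),
    }


# One of the lines from the log above.
sample = ("RAM 3489/7851MB (lfb 256x4MB) cpu [0%@1996,off,off,1%@1995,"
          "87%@1997,1%@1998] EMC 20%@665 APE 150 GR3D 99%@114")
reading = parse_tegrastats(sample)
print(reading)
```

Under that reading, the 99% value is a short burst of GPU load, while the RAM figure in the same line stays well below the 7851 MB total.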

@shuuchen
Author

shuuchen commented Feb 7, 2018

Sometimes the camera works at first, but the program is killed later.

Is there a way to control how often images are grabbed? Or is there another thread inside the grab function that reads images asynchronously and exhausts the GPU RAM?

Error message

name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 3.98GiB
2018-02-07 04:59:16.559008: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2018-02-07 04:59:16.559032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2018-02-07 04:59:16.559057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
ZED (Init) >> Video mode: VGA@15
True True
Image resolution: 672 x 376 || Image timestamp: 1517979575170237179

(376, 672, 3)
<keras_frcnn.config.Config object at 0x7f67afe9e8>
img min side: 480.0
resized image shape: (480, 640, 3)
Killed

The corresponding monitor message

RAM 6101/7851MB (lfb 87x4MB) cpu [60%@2004,off,off,3%@2034,35%@2035,24%@2035] EMC 8%@1600 APE 150 GR3D 32%@114
RAM 6463/7851MB (lfb 70x4MB) cpu [5%@1998,off,off,10%@1997,32%@1997,67%@1998] EMC 13%@1600 APE 150 GR3D 18%@624
RAM 6604/7851MB (lfb 26x4MB) cpu [16%@2012,off,off,12%@2014,85%@2016,27%@2016] EMC 12%@1600 APE 150 GR3D 0%@216
RAM 6615/7851MB (lfb 24x4MB) cpu [15%@1972,off,off,76%@1982,89%@1985,7%@1986] EMC 19%@1600 APE 150 GR3D 66%@114
RAM 6772/7851MB (lfb 4x4MB) cpu [34%@1984,off,off,54%@1988,31%@1985,18%@1981] EMC 19%@1600 APE 150 GR3D 9%@114
RAM 7128/7851MB (lfb 2x4MB) cpu [14%@2034,off,off,21%@2034,8%@2035,81%@2035] EMC 15%@1600 APE 150 GR3D 30%@114
RAM 7449/7851MB (lfb 2x4MB) cpu [52%@806,off,off,40%@806,15%@806,6%@806] EMC 29%@665 APE 150 GR3D 0%@114
RAM 7476/7851MB (lfb 2x4MB) cpu [16%@1618,off,off,12%@1680,16%@1680,17%@1703] EMC 9%@1600 APE 150 GR3D 32%@114
RAM 7511/7851MB (lfb 2x4MB) cpu [13%@2035,off,off,10%@2035,17%@2035,77%@2035] EMC 9%@1600 APE 150 GR3D 0%@114
***RAM 7604/7851MB (lfb 2x4MB) cpu [19%@806,off,off,24%@806,19%@806,34%@806] EMC 20%@665 APE 150 GR3D 14%@114***
RAM 7670/7851MB (lfb 2x4MB) cpu [27%@806,off,off,29%@806,32%@805,14%@806] EMC 19%@665 APE 150 GR3D 0%@114
RAM 7671/7851MB (lfb 2x4MB) cpu [26%@806,off,off,22%@807,7%@806,14%@806] EMC 15%@665 APE 150 GR3D 0%@114
RAM 7673/7851MB (lfb 2x4MB) cpu [26%@806,off,off,37%@806,3%@805,9%@807] EMC 14%@665 APE 150 GR3D 0%@114
RAM 7686/7851MB (lfb 2x4MB) cpu [44%@806,off,off,6%@806,18%@808,12%@806] EMC 5%@1600 APE 150 GR3D 0%@522
RAM 7687/7851MB (lfb 2x4MB) cpu [33%@2034,off,off,30%@2035,21%@2035,2%@2036] EMC 5%@1600 APE 150 GR3D 0%@420
RAM 7694/7851MB (lfb 2x4MB) cpu [19%@1995,off,off,15%@2016,73%@1144,37%@2056] EMC 5%@1600 APE 150 GR3D 0%@420
RAM 7698/7851MB (lfb 2x4MB) cpu [23%@2019,off,off,18%@2017,17%@2022,76%@2015] EMC 5%@1600 APE 150 GR3D 0%@318

In this case, the monitor blocks at the row marked with *** and recovers after the main program is killed.
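
On the question of controlling how often images are processed, one option that does not depend on the camera settings is to throttle in the application loop. The sketch below is only an illustration: `FrameThrottle` is a hypothetical helper, not part of the ZED SDK, and it limits how often detection runs rather than how much memory the model needs. Note that the log above reports `VGA@15` even though `camera_fps` was set to 1, so the requested rate was apparently not applied.

```python
import time


class FrameThrottle:
    """Allow at most one processed frame every `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the logic can be checked without sleeping
        self._last = None   # time of the last accepted frame

    def ready(self):
        """Return True if enough time has passed since the last accepted frame."""
        now = self.clock()
        if self._last is None or now - self._last >= self.min_interval:
            self._last = now
            return True
        return False


# Demonstration with a fake clock: frames arrive at these timestamps (seconds).
ticks = iter([0.0, 0.4, 1.0, 1.5, 2.1])
throttle = FrameThrottle(1.0, clock=lambda: next(ticks))
decisions = [throttle.ready() for _ in range(5)]
print(decisions)
```

In a grab loop, the frame would still be grabbed each time, and the detection step would be skipped whenever `throttle.ready()` returns False.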

@adujardin
Member

I think your TF model might be too heavy for the TX2.
You could try grabbing the (unrectified) image with OpenCV in Python, which uses the least possible amount of memory. Something like this:

import cv2

cap = cv2.VideoCapture(0)  # index depends on the ZED ID

while True:
    ret, frame = cap.read()  # Capture SbS frames
    if not ret:  # stop if no frame could be read
        break
    height, width = frame.shape[:2]
    left = frame[0:height, 0:width // 2]  # Extract the left img

    # object detection
    .......

# Release the capture
cap.release()
cv2.destroyAllWindows()

If this doesn't work, you might want to reduce the input size of the network, or use another trick to make it fit.

@shuuchen
Author

shuuchen commented Feb 9, 2018

@adujardin
Thanks very much.
I used OpenCV 3.4 and now it runs well!

The only drawback is that it cannot get the depth image.
Instead, it returns a wide image that combines the left and right images side by side.
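
The wide image can be split into its two views with array slicing. The sketch below uses a synthetic array in place of a captured frame, and `split_side_by_side` is a hypothetical helper name. It only separates the unrectified left and right images; it does not produce depth, which would still require calibration and stereo matching (not shown here).

```python
import numpy as np


def split_side_by_side(frame):
    """Split a side-by-side stereo frame into (left, right) views."""
    height, width = frame.shape[:2]
    half = width // 2
    # Slicing returns views, so no pixel data is copied.
    return frame[:, :half], frame[:, half:2 * half]


# Synthetic stand-in for a captured frame: two 672x376 images side by side.
frame = np.zeros((376, 1344, 3), dtype=np.uint8)
frame[:, 672:] = 255  # mark the right half so the two views differ

left, right = split_side_by_side(frame)
print(left.shape, right.shape)
```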


2 participants