CUDNN=1 is not working ? #23

kidapu · 2017-04-11T01:54:55Z

I trained a face detector on the FDDB dataset (as I described in #13) and tried to detect faces, but detection fails with CUDNN=1.

$ vim MakeFile

GPU=1
CUDNN=1
OPENCV=1
DEBUG=1

$ ./darknet-cpp detector test cfg/face.data cfg/tiny-yolo-face.cfg tiny-yolo-face_final.weights FaceData2/JPEGImages/2002-07-19-big-img_254.jpg

On the other hand, face detection succeeds with CUDNN=0.

$ vim MakeFile

GPU=1
CUDNN=0
OPENCV=1
DEBUG=1


$ ./darknet-cpp detector test cfg/face.data cfg/tiny-yolo-face.cfg tiny-yolo-face_final.weights FaceData2/JPEGImages/2002-07-19-big-img_254.jpg

My environment is below.

nvidia-docker
nvidia tesla k40c (12G GPU)
Ubuntu 16.04
opencv 2.4 (installed by libopencv-dev)
CUDA 8.0
cudnn 5.1


prabindh · 2017-04-20T15:12:07Z

Which version of CUDA8.0 is this ?

kidapu · 2017-04-24T04:10:18Z

@prabindh
I use nvidia/cuda:8.0-devel-ubuntu16.04 from this Dockerfile.
https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/8.0/devel/cudnn5/Dockerfile#L1

prabindh · 2017-04-25T01:42:45Z

I strongly feel it may not be related to CUDNN. Did you stop the training in both of them after reasonable accuracies have been obtained in training ? Can you let the CUDNN version run longer epochs and check ?

kidapu · 2017-05-12T10:25:33Z

I have re-trained on my face data once with CuDNN=1.
The following graph shows my training log, (x, y) = (epoch, loss rate). I ran 29000 epochs.

My CuDNN version is 5.1.10.
The result is unchanged: CuDNN=1 isn't working, but CuDNN=0 works fine.

However, when I run the following example with CuDNN=1 and with CuDNN=0, both work fine...

./darknet-cpp detector demo cfg/coco.data cfg/yolo.cfg yolo.weights

bobeo · 2017-08-16T14:45:56Z

@kidapu have you sorted this? I have the same problem. I trained tiny yolo and it only works when CUDNN = 0. But this problem only happens when I try to link libdarknet-cpp-shared.so to my program. The ./darknet binary still works fine.

My environment:
Ubuntu 16
Cuda 8
Cudnn 6
GTX 1050

kidapu · 2017-08-16T17:40:44Z

@bobeo
No, I have not solved it. Exactly the same thing happens to me!

prabindh · 2017-08-19T07:48:50Z

@bobeo Have you ensured your wrapper application (that uses the .so) also has the same options that are used for building the darknet shared lib ?

prabindh · 2017-08-19T07:50:03Z

@kidapu Does inference work with CUDNN=1, with the shared lib ?

kidapu · 2017-08-19T17:56:12Z

In summary, the following happens in my case.

(1) CuDNN == 0 && ( darknet-cpp || darknet-cpp-shared)
COCO and my dataset both work fine.

(2) CuDNN == 1 && ( darknet-cpp || darknet-cpp-shared)
COCO works fine.
My dataset does not work.

prabindh · 2018-05-10T13:13:18Z

Is this behaviour seen with the latest master as well ? Please check the latest master and confirm

ooobelix · 2018-09-17T11:50:29Z

I need to confirm but I have this behaviour on v6.5-1-g372b25d with a GPU machine:

(CuDNN == 0 || CuDNN == 1) && GPU == 1 && darknet-cpp-shared && arapaho : no detection
(CuDNN == 0 || CuDNN == 1) && GPU == 0 && darknet-cpp-shared && arapaho : detections

prabindh · 2018-09-17T12:25:23Z

@ooobelix please confirm - that you are building Arapaho, and darknet with same options (for GPU, CUDNN) in both the Makefiles.

ooobelix · 2018-09-17T13:13:24Z

I'm working on it!

~/darknet$ grep -iE "^GPU=|^CUDNN" Makefile arapaho/Makefile
Makefile:GPU=1
Makefile:CUDNN=1
arapaho/Makefile:GPU=1
arapaho/Makefile:CUDNN=1

After that, I'm using my own code with Arapaho to do some predictions.

Thanks for your help!
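
The check requested above (the same GPU and CUDNN settings in both Makefiles) can also be scripted. The following is a minimal sketch, assuming both files assign the options as plain `NAME=value` lines starting in the first column, as in the grep output above. The function names `get_opt` and `check_same_opts` are hypothetical helpers, not part of darknet-cpp or Arapaho.

```shell
#!/bin/sh
# Print the value assigned to a build option (e.g. GPU) in a Makefile.
# Assumption: the option is set as "NAME=value" at the start of a line;
# only the first such line is reported.
get_opt() {
    # $1 = option name, $2 = Makefile path
    sed -n "s/^$1[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p" "$2" | head -n 1
}

# Compare GPU and CUDNN between two Makefiles.
# Prints one line per option and returns non-zero if any value differs.
check_same_opts() {
    # $1, $2 = Makefile paths
    status=0
    for opt in GPU CUDNN; do
        a=$(get_opt "$opt" "$1")
        b=$(get_opt "$opt" "$2")
        if [ "$a" = "$b" ]; then
            echo "$opt: match ($a)"
        else
            echo "$opt: MISMATCH ($1 has '$a', $2 has '$b')"
            status=1
        fi
    done
    return $status
}
```

Run from the darknet directory, it would be called as `check_same_opts Makefile arapaho/Makefile`. It only compares the Makefiles, so it cannot catch a wrapper application that is built by a different build system with different flags.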

prabindh · 2018-09-17T16:21:16Z

Could you confirm, what cfg is being used ?

ooobelix · 2018-09-18T07:50:16Z

From GIT:

5d442b0e550e6c640068e7e15e498599 yolov3.cfg

With 0.1 threshold

ooobelix · 2018-09-18T10:07:35Z

I'm:

compiling libdarknet-cpp-shared.so with GPU=1 and CUDNN=1
using your Arapaho code in my application with CFLAGS "-DCUDNN", linking with "cuda cudart cublas curand cudnn"

Results:

without GPU, it works well
with GPU, Detect always returns 0 detections

prabindh · 2018-09-18T14:32:39Z

I think you already tried with GPU=1, but I noticed that GPU is not defined in the flags quoted in your last comment:

my application with CFLAGS "-DCUDNN"
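
A missing define like this can be caught before compiling. The sketch below assumes the application needs both `-DGPU` and `-DCUDNN` when the shared library was built with GPU=1 and CUDNN=1, which is what this thread suggests. `missing_defines` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Print the required preprocessor defines that are absent from a flag string.
# Assumption: GPU and CUDNN are the two defines that must match the library build.
missing_defines() {
    # $1 = compiler flags, e.g. "$CFLAGS"
    missing=""
    for def in GPU CUDNN; do
        case " $1 " in
            # Accept both "-DNAME" and "-DNAME=value"
            *" -D$def "*|*" -D$def="*) ;;
            *) missing="$missing -D$def" ;;
        esac
    done
    # Strip the leading space; prints an empty line if nothing is missing
    echo "${missing# }"
}
```

For example, `missing_defines "$CFLAGS"` prints the defines that still need to be added, and an empty line when both are present.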

ooobelix · 2018-09-18T14:52:52Z

Sorry, that was a mistake, you are right! I have already tested with GPU=1 and CUDNN=1.

prabindh · 2018-09-19T05:24:16Z

I tried the Arapaho build (Windows build from darknet-cpp-windows) with the latest code, using the Yolo-tinyv3 cfg and CUDA 9.1. I am able to see detections with the default yolov3 weights.

ooobelix · 2018-09-19T16:41:52Z

OK, I made a stupid mistake in my CMakeFile with the GPU and CUDNN options.

Now it works well with GPU=1 and CUDNN=1, but I don't need to link with "-lcudnn". Is that normal?

prabindh · 2018-10-22T02:21:41Z

"-lcudnn" should be required. Can we close this as the issue is resolved ?

ooobelix · 2018-10-22T14:10:24Z

I'm using CMakeList, where "CUDNN=1" leads to "set(LNK_DEP [...] cudnn", and it works well.
As far as I'm concerned, you can close this issue.

kidapu mentioned this issue Apr 11, 2017

Segmentation fault (Core Dumped) on traing my own dataset... #13

Closed
