CUDNN=1 is not working ? #23

Open
kidapu opened this issue Apr 11, 2017 · 22 comments

@kidapu

kidapu commented Apr 11, 2017

I trained a face detector on the FDDB dataset (as I described in #13) and tried to detect faces, but detection does not work with CUDNN=1.

$ vim MakeFile

GPU=1
CUDNN=1
OPENCV=1
DEBUG=1

$ ./darknet-cpp detector test cfg/face.data cfg/tiny-yolo-face.cfg tiny-yolo-face_final.weights FaceData2/JPEGImages/2002-07-19-big-img_254.jpg

screenshot from 2017-04-11 10-40-16

On the other hand, I can detect faces successfully with CUDNN=0.

$ vim MakeFile

GPU=1
CUDNN=0
OPENCV=1
DEBUG=1


$ ./darknet-cpp detector test cfg/face.data cfg/tiny-yolo-face.cfg tiny-yolo-face_final.weights FaceData2/JPEGImages/2002-07-19-big-img_254.jpg

screenshot from 2017-04-11 10-44-49

My environment is below.

  • nvidia-docker
  • nvidia tesla k40c (12G GPU)
  • Ubuntu 16.04
  • opencv 2.4 (installed by libopencv-dev)
  • CUDA 8.0
  • cudnn 5.1
@prabindh
Owner

Which version of CUDA 8.0 is this?

@kidapu
Author

kidapu commented Apr 24, 2017

@prabindh
I use nvidia/cuda:8.0-devel-ubuntu16.04 from this Dockerfile.
https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/8.0/devel/cudnn5/Dockerfile#L1

@prabindh
Owner

I strongly feel this may not be related to CUDNN. In both cases, did you stop training only after reasonable accuracy had been reached? Can you let the CUDNN version train for more epochs and check again?

@kidapu
Author

kidapu commented May 12, 2017

I re-trained on my face data once with CUDNN=1.
The following graph shows my training log, with (x, y) = (epoch, loss). I ran 29000 epochs.
screen shot 2017-05-12 at 15 47 18

My cuDNN version is 5.1.10.
The result is unchanged: CUDNN=1 does not work, but CUDNN=0 works fine.

However, when I run the following example with either CUDNN=1 or CUDNN=0, it works fine:

./darknet-cpp detector demo cfg/coco.data cfg/yolo.cfg yolo.weights

@bobeo

bobeo commented Aug 16, 2017

@kidapu have you sorted this? I have the same problem. I trained tiny yolo and it only works when CUDNN = 0. But this problem only happens when I try to link libdarknet-cpp-shared.so to my program. The ./darknet binary still works fine.

My environment:
Ubuntu 16
Cuda 8
Cudnn 6
GTX 1050

@kidapu
Author

kidapu commented Aug 16, 2017

@bobeo
No, I have not solved it. Exactly the same thing happens to me!

@prabindh
Owner

@bobeo Have you made sure that your wrapper application (the one that uses the .so) is built with the same options that were used to build the darknet shared lib?

@prabindh
Owner

@kidapu Does inference work with CUDNN=1, with the shared lib ?

@kidapu
Author

kidapu commented Aug 19, 2017

In summary, the following happens in my case.

(1) CuDNN == 0 && ( darknet-cpp || darknet-cpp-shared)
coco and my dataset both work fine.

(2) CuDNN == 1 && ( darknet-cpp || darknet-cpp-shared)

  • coco works fine
  • my dataset does not work

@prabindh
Owner

Is this behaviour seen with the latest master as well? Please check the latest master and confirm.

@ooobelix

I still need to confirm, but I see this behaviour with v6.5-1-g372b25d on a GPU machine:

  • (CuDNN == 0 || CuDNN == 1) && GPU == 1 && darknet-cpp-shared && arapaho : no detection
  • (CuDNN == 0 || CuDNN == 1) && GPU == 0 && darknet-cpp-shared && arapaho : detections

@prabindh
Owner

@ooobelix please confirm - that you are building Arapaho, and darknet with same options (for GPU, CUDNN) in both the Makefiles.
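
A quick way to check this is to compare the flag lines of the two Makefiles directly. The sketch below is only an illustration: the helper names `flags_of` and `same_build_flags` are made up here, and it assumes the flags are set as plain `GPU=...` and `CUDNN=...` assignments at the start of a line, as in the Makefile excerpts earlier in this thread.

```shell
# Hypothetical helpers (not part of darknet or Arapaho).
# Print the GPU= and CUDNN= lines of a Makefile, sorted, so two files can be compared.
flags_of() {
    grep -E '^(GPU|CUDNN)=' "$1" | sort
}

# Succeed only when both Makefiles set GPU and CUDNN identically.
same_build_flags() {
    [ "$(flags_of "$1")" = "$(flags_of "$2")" ]
}
```

For example, `same_build_flags Makefile arapaho/Makefile && echo "flags match" || echo "flags differ"`. If the application is built some other way (custom CFLAGS, CMake), the same comparison has to be made against whatever defines `-DGPU` and `-DCUDNN` there.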

@ooobelix

ooobelix commented Sep 17, 2018

I'm working on it!

~/darknet$ grep -i "^GPU=\|^CUDNN" Makefile arapaho/Makefile
Makefile:GPU=1
Makefile:CUDNN=1
arapaho/Makefile:GPU=1
arapaho/Makefile:CUDNN=1

After that, I'm using my own code with Arapaho to do some predictions.

Thanks for your help!

@prabindh
Owner

Could you confirm, what cfg is being used ?

@ooobelix

From GIT:

5d442b0e550e6c640068e7e15e498599 yolov3.cfg

With 0.1 threshold

@ooobelix

I'm:

  • compiling libdarknet-cpp-shared.so with GPU=1 and CUDNN=1
  • using your Arapaho code in my application with CFLAGS "-DCUDNN", linked with "cuda cudart cublas curand cudnn"

Results:

  • without GPU, it works well
  • with GPU, Detect always returns 0 detections

@prabindh
Owner

I think you already tried with GPU=1, but I observed that in the last comment GPU is not defined.

my application with CFLAGS "-DCUDNN"

@ooobelix

Sorry it's a mistake, you are right! I have already tested with GPU=1 and CUDNN=1

@prabindh
Owner

I tried the Arapaho build (the Windows build from darknet-cpp-windows) with the latest code, the Yolo-tinyv3 cfg, and CUDA 9.1. I am able to see detections with the default yolov3 weights.

@ooobelix

OK, I made a stupid mistake in my CMakeFile with the GPU and CUDNN options.

Now it works well with GPU=1 and CUDNN=1, but without needing to link "-lcudnn". Is that normal?

@prabindh
Owner

"-lcudnn" should be required. Can we close this as the issue is resolved ?

@ooobelix

I'm using CMakeList and "CUDNN=1" to "set(LNK_DEP [...] cudnn" and it works well.
For me, you can close this issue.
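
On the "-lcudnn" question, one way to see what the final binary actually loads is to inspect its resolved shared-library dependencies with `ldd`, which also lists libraries pulled in indirectly through other shared objects. The following is a minimal sketch; the helper name `mentions_cudnn` and the binary name `./my_app` are placeholders, not anything from this repository.

```shell
# Hypothetical helper: read `ldd` output on stdin and succeed if libcudnn appears in it.
mentions_cudnn() {
    grep -q 'libcudnn\.so'
}

# Example (./my_app is a placeholder for your own binary):
#   ldd ./my_app | mentions_cudnn && echo "cuDNN is loaded" || echo "cuDNN not found"
```

This only shows whether the library is resolved at load time; it does not say which link line brought it in.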

4 participants