Does not support latest CUDA 9.0 and cuDNN 7 #278

Open
encore2020 opened this issue Oct 31, 2017 · 11 comments

@encore2020

I installed CUDA 9.0 and cuDNN 7 (the build for CUDA 9.0).

If I set CUDNN=1, the build fails with a link error:
/examples/go.c:641:13: warning: ignoring return value of ‘scanf’, declared with attribute warn_unused_result [-Wunused-result]
scanf("%s", type);
^
gcc -Iinclude/ -Isrc/ -DOPENCV pkg-config --cflags opencv -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/rnn.c -o obj/rnn.o
./examples/rnn.c: In function ‘get_seq2seq_data’:
./examples/rnn.c:104:13: warning: unused variable ‘dlen’ [-Wunused-variable]
int dlen = strlen(dest[index]);
^
./examples/rnn.c:103:13: warning: unused variable ‘slen’ [-Wunused-variable]
int slen = strlen(source[index]);
^
gcc -Iinclude/ -Isrc/ -DOPENCV pkg-config --cflags opencv -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/segmenter.c -o obj/segmenter.o
gcc -Iinclude/ -Isrc/ -DOPENCV pkg-config --cflags opencv -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/regressor.c -o obj/regressor.o
gcc -Iinclude/ -Isrc/ -DOPENCV pkg-config --cflags opencv -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/classifier.c -o obj/classifier.o
gcc -Iinclude/ -Isrc/ -DOPENCV pkg-config --cflags opencv -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/coco.c -o obj/coco.o
gcc -Iinclude/ -Isrc/ -DOPENCV pkg-config --cflags opencv -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/yolo.c -o obj/yolo.o
gcc -Iinclude/ -Isrc/ -DOPENCV pkg-config --cflags opencv -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/detector.c -o obj/detector.o
gcc -Iinclude/ -Isrc/ -DOPENCV pkg-config --cflags opencv -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/nightmare.c -o obj/nightmare.o
gcc -Iinclude/ -Isrc/ -DOPENCV pkg-config --cflags opencv -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/attention.c -o obj/attention.o
gcc -Iinclude/ -Isrc/ -DOPENCV pkg-config --cflags opencv -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/darknet.c -o obj/darknet.o
gcc -Iinclude/ -Isrc/ -DOPENCV pkg-config --cflags opencv -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN obj/captcha.o obj/lsd.o obj/super.o obj/art.o obj/tag.o obj/cifar.o obj/go.o obj/rnn.o obj/segmenter.o obj/regressor.o obj/classifier.o obj/coco.o obj/yolo.o obj/detector.o obj/nightmare.o obj/attention.o obj/darknet.o libdarknet.a -o darknet -lm -pthread pkg-config --libs opencv -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand -lcudnn -lstdc++ libdarknet.a
libdarknet.a(convolutional_layer.o): In function `cudnn_convolutional_setup':
convolutional_layer.c:(.text+0xcbc): undefined reference to `cudnnSetConvolutionGroupCount'
collect2: error: ld returned 1 exit status
Makefile:76: recipe for target 'darknet' failed
make: *** [darknet] Error 1
ubuntu@ubuntu-Z270N-WIFI:~/darknet$

-------------
My OpenCV is the latest version, 3.3.
If I set CUDNN=0 and GPU=1, the build succeeds, but training aborts after I run this command:

sudo ./darknet detector train cfg/voc.data cfg/tiny-yolo.cfg darknet.conv.weights
tiny-yolo
layer filters size input output
0 conv 16 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 16
1 max 2 x 2 / 2 416 x 416 x 16 -> 208 x 208 x 16
2 conv 32 3 x 3 / 1 208 x 208 x 16 -> 208 x 208 x 32
3 max 2 x 2 / 2 208 x 208 x 32 -> 104 x 104 x 32
4 conv 64 3 x 3 / 1 104 x 104 x 32 -> 104 x 104 x 64
5 max 2 x 2 / 2 104 x 104 x 64 -> 52 x 52 x 64
6 conv 128 3 x 3 / 1 52 x 52 x 64 -> 52 x 52 x 128
7 max 2 x 2 / 2 52 x 52 x 128 -> 26 x 26 x 128
8 conv 256 3 x 3 / 1 26 x 26 x 128 -> 26 x 26 x 256
9 max 2 x 2 / 2 26 x 26 x 256 -> 13 x 13 x 256
10 conv 512 3 x 3 / 1 13 x 13 x 256 -> 13 x 13 x 512
11 max 2 x 2 / 1 13 x 13 x 512 -> 13 x 13 x 512
12 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
13 conv 512 3 x 3 / 1 13 x 13 x1024 -> 13 x 13 x 512
14 conv 425 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 425
15 detection
mask_scale: Using default '1.000000'
Loading weights from darknet.conv.weights...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
384
Loaded: 0.017535 seconds
Region Avg IOU: 0.073414, Class: 0.005123, Obj: 0.433099, No Obj: 0.503481, Avg Recall: 0.000000, count: 3
CUDA Error: mapping of buffer object failed
darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.
Aborted (core dumped)


If I set GPU=0 and run on the CPU only, both the build and the run work.

@AurusHuang

Well...you complained about it too in Caffe area...
Be sure to check if your CUDA and cuDNN are installed properly.
I will also try to repeat the issue. Stay with me.
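
One way to make that check concrete: the link error above names cudnnSetConvolutionGroupCount, which belongs to the grouped-convolution API that first shipped in cuDNN 7, so an older libcudnn on the linker path would produce exactly this symptom. The sketch below is only an illustration; the helper names are made up, and it assumes the cuDNN 7-era version encoding (major*1000 + minor*100 + patch, so 7005 means 7.0.5).

```c
#include <stddef.h>

/* Hypothetical helper type, not part of darknet or cuDNN. */
typedef struct {
    int major;
    int minor;
    int patch;
} cudnn_version_parts;

/* Split a cuDNN version number into its parts.
 * Assumes the encoding major*1000 + minor*100 + patch (e.g. 7005 = 7.0.5). */
cudnn_version_parts split_cudnn_version(size_t version)
{
    cudnn_version_parts p;
    p.major = (int)(version / 1000);
    p.minor = (int)((version % 1000) / 100);
    p.patch = (int)(version % 100);
    return p;
}

/* Returns 1 if this version should provide cudnnSetConvolutionGroupCount,
 * which was introduced with grouped convolutions in cuDNN 7. */
int has_group_count(size_t version)
{
    return split_cudnn_version(version).major >= 7;
}
```

In a real build you would pass the value returned by cudnnGetVersion() (the library actually loaded) and compare it with the CUDNN_VERSION macro from cudnn.h (the header actually compiled against). If the two disagree, the header and the library probably come from different installs.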

@ppantalone

I ran into the same problem. I had a new network running successfully in darknet with GPU support on CUDA 8.0 and cuDNN 6, but moving the same network and code to a CUDA 9.0 / cuDNN 7 environment did not work. The problem seems to be related to the additional forward and backward convolution algorithms in CUDA 9.0 / cuDNN 7, some of which return a workspace size of 0. Checking for a zero-size workspace and, in that case, falling back to darknet's own im2col/GEMM path instead of calling cuDNN fixed the problem for me. The changes I made were in convolutional_kernels.cu and were as follows:

in forward_convolutional_layer_gpu function:
original code:
float one = 1;
cudnnConvolutionForward(cudnn_handle(),
&one,
l.srcTensorDesc,
net.input_gpu,
l.weightDesc,
l.weights_gpu,
l.convDesc,
l.fw_algo,
net.workspace,
l.workspace_size,
&one,
l.dstTensorDesc,
l.output_gpu);
new code:
if (l.workspace_size > 0)
{
float one = 1;
cudnnConvolutionForward(cudnn_handle(),
&one,
l.srcTensorDesc,
net.input_gpu,
l.weightDesc,
l.weights_gpu,
l.convDesc,
l.fw_algo,
net.workspace,
l.workspace_size,
&one,
l.dstTensorDesc,
l.output_gpu);
}
else
{
int i, j;
int m = l.n/l.groups;
int k = l.size*l.size*l.c/l.groups;
int n = l.out_w*l.out_h;
for(i = 0; i < l.batch; ++i){
    for(j = 0; j < l.groups; ++j){
        float *a = l.weights_gpu + j*l.nweights/l.groups;
        float *b = net.workspace;
        float *c = l.output_gpu + (i*l.groups + j)*n*m;

        im2col_gpu(net.input_gpu + (i*l.groups + j)*l.c/l.groups*l.h*l.w,
            l.c/l.groups, l.h, l.w, l.size, l.stride, l.pad, b);
        gemm_gpu(0,0,m,n,k,1,a,k,b,n,1,c,n);
    }
}
}

Also in function backward_convolutional_layer_gpu
Original code:
float one = 1;
cudnnConvolutionBackwardFilter(cudnn_handle(),
&one,
l.srcTensorDesc,
net.input_gpu,
l.ddstTensorDesc,
l.delta_gpu,
l.convDesc,
l.bf_algo,
net.workspace,
l.workspace_size,
&one,
l.dweightDesc,
l.weight_updates_gpu);

if(net.delta_gpu){
    if(l.binary || l.xnor) swap_binary(&l);
    cudnnConvolutionBackwardData(cudnn_handle(),
            &one,
            l.weightDesc,
            l.weights_gpu,
            l.ddstTensorDesc,
            l.delta_gpu,
            l.convDesc,
            l.bd_algo,
            net.workspace,
            l.workspace_size,
            &one,
            l.dsrcTensorDesc,
            net.delta_gpu);
    if(l.binary || l.xnor) swap_binary(&l);
    if(l.xnor) gradient_array_gpu(original_input, l.batch*l.c*l.h*l.w, HARDTAN, net.delta_gpu);
}

New code
if (l.workspace_size > 0)
{
float one = 1;
cudnnConvolutionBackwardFilter(cudnn_handle(),
&one,
l.srcTensorDesc,
net.input_gpu,
l.ddstTensorDesc,
l.delta_gpu,
l.convDesc,
l.bf_algo,
net.workspace,
l.workspace_size,
&one,
l.dweightDesc,
l.weight_updates_gpu);

if(net.delta_gpu){
    if(l.binary || l.xnor) swap_binary(&l);
    cudnnConvolutionBackwardData(cudnn_handle(),
            &one,
            l.weightDesc,
            l.weights_gpu,
            l.ddstTensorDesc,
            l.delta_gpu,
            l.convDesc,
            l.bd_algo,
            net.workspace,
            l.workspace_size,
            &one,
            l.dsrcTensorDesc,
            net.delta_gpu);
    if(l.binary || l.xnor) swap_binary(&l);
    if(l.xnor) gradient_array_gpu(original_input, l.batch*l.c*l.h*l.w, HARDTAN, net.delta_gpu);
}
}
else
{
int m = l.n/l.groups;
int n = l.size*l.size*l.c/l.groups;
int k = l.out_w*l.out_h;

int i, j;
for(i = 0; i < l.batch; ++i){
    for(j = 0; j < l.groups; ++j){
        float *a = l.delta_gpu + (i*l.groups + j)*m*k;
        float *b = net.workspace;
        float *c = l.weight_updates_gpu + j*l.nweights/l.groups;

        float *im = net.input_gpu+(i*l.groups + j)*l.c/l.groups*l.h*l.w;

        im2col_gpu(im, l.c/l.groups, l.h, l.w,
                l.size, l.stride, l.pad, b);
        gemm_gpu(0,1,m,n,k,1,a,k,b,k,1,c,n);

        if(net.delta_gpu){
            if(l.binary || l.xnor) swap_binary(&l);
            a = l.weights_gpu + j*l.nweights/l.groups;
            b = l.delta_gpu + (i*l.groups + j)*m*k;
            c = net.workspace;

            gemm_gpu(1,0,n,k,m,1,a,n,b,k,0,c,k);

            col2im_gpu(net.workspace, l.c/l.groups, l.h, l.w, l.size, l.stride, 
                l.pad, net.delta_gpu + (i*l.groups + j)*l.c/l.groups*l.h*l.w);
            if(l.binary || l.xnor) {
                swap_binary(&l);
            }
        }
        if(l.xnor) gradient_array_gpu(original_input + i*l.c*l.h*l.w, l.c*l.h*l.w, HARDTAN, net.delta_gpu + i*l.c*l.h*l.w);
    }
}
}

I hope this helps. Please let me know if there are any other insights into this issue or other CUDA 9 / cuDNN 7 conversion issues.
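
To make the fallback branch easier to follow, here is a small sketch of the GEMM dimensions it computes (m, k, n in the forward pass) and of how many bytes the im2col buffer b would need. The conv_shape struct is a hypothetical, trimmed-down stand-in for darknet's layer struct, and the function names are invented for illustration; only the arithmetic is taken from the code above.

```c
#include <stddef.h>

/* Hypothetical stand-in for the fields of darknet's layer used here. */
typedef struct {
    int n;       /* number of filters */
    int c;       /* input channels */
    int size;    /* kernel width and height */
    int groups;  /* convolution groups */
    int out_w;   /* output width */
    int out_h;   /* output height */
} conv_shape;

typedef struct {
    int m;  /* rows of the weight matrix: filters per group */
    int k;  /* shared dimension: kernel elements per group */
    int n;  /* columns: output pixels per image */
} gemm_dims;

/* Same arithmetic as the forward fallback: weights (m x k) times
 * im2col output (k x n) gives the layer output (m x n). */
gemm_dims forward_gemm_dims(conv_shape l)
{
    gemm_dims d;
    d.m = l.n / l.groups;
    d.k = l.size * l.size * l.c / l.groups;
    d.n = l.out_w * l.out_h;
    return d;
}

/* Bytes needed to hold one k x n im2col matrix of floats. */
size_t im2col_bytes(conv_shape l)
{
    gemm_dims d = forward_gemm_dims(l);
    return (size_t)d.k * (size_t)d.n * sizeof(float);
}
```

For layer 0 of the tiny-yolo network printed earlier (16 filters, 3 x 3 kernel, 3 input channels, 416 x 416 output) this gives m = 16, k = 27, n = 173056. Note that the fallback writes the im2col result into net.workspace, so that buffer needs room for k*n floats; whether it has that room when cuDNN reported a workspace size of 0 is not something this thread confirms.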

@Liedermaus

Liedermaus commented Jan 17, 2018

This problem is probably caused by having multiple versions of CUDA installed on your computer, especially if you use automatic updates for CUDA (which you shouldn't).
In my case the problem came from a PATH variable that includes CUDA:
...:/usr/local/cuda-8.0/bin/:...
When CUDA is upgraded, for example to 9.1, you need to update this to
...:/usr/local/cuda-9.1/bin/:...
Otherwise the wrong NVIDIA compiler (nvcc) is used.

However, I would recommend removing the old CUDA version and doing a clean install of the new one. Please also remember to update cuDNN, because it depends on the CUDA version, so you have to be careful which version you select.
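
The reason a stale PATH entry matters is that PATH is searched left to right and the first match wins. The sketch below illustrates that with a made-up helper, first_path_entry_with, which returns the first PATH entry containing a given substring. It is a simplification: a real shell looks for an executable named nvcc in each directory rather than matching on the directory name.

```c
#include <stddef.h>
#include <string.h>

/* Copy the first entry of a colon-separated PATH string that contains
 * `needle` into `out`. Entries that do not fit in `out` are skipped.
 * Returns 1 if an entry was found, 0 otherwise (and `out` is emptied). */
int first_path_entry_with(const char *path, const char *needle,
                          char *out, size_t out_len)
{
    const char *start = path;
    while (*start) {
        const char *end = strchr(start, ':');
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (len > 0 && len < out_len) {
            memcpy(out, start, len);
            out[len] = '\0';
            if (strstr(out, needle)) return 1;  /* first match wins */
        }
        if (!end) break;       /* last entry handled */
        start = end + 1;       /* move past the colon */
    }
    if (out_len > 0) out[0] = '\0';
    return 0;
}
```

With a PATH such as /usr/bin:/usr/local/cuda-8.0/bin/:/usr/local/cuda-9.1/bin/ and the needle "cuda", this yields the cuda-8.0 entry, which mirrors the situation described above: the newer toolkit is installed, but the older nvcc is still found first.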

@TanFluent

@encore2020
1. Check your default CUDA version (nvcc --version) and its install path (which nvcc).
2. If the default nvcc is not the one you want, go to lines 49 and 51 of the Makefile and change the CUDA path to your own CUDA path.

@Grabber

Grabber commented Mar 8, 2018 via email

@Yumin-Sun-00

You are right... controlling my anger.

@kb1ooo

kb1ooo commented Mar 8, 2018

LOL, I don't think it's possible to find a deep learning framework with fewer dependencies. This one has exactly 1 required dependency. I think the "mapping of buffer object" error is due to running out of GPU memory. Try increasing your subdivisions up to the same value as "batch". If that works, then decrease by powers of 2 to find the lowest value for which it will not crash. To clarify, the subdivisions is a setting in the cfg file.
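
As a rough illustration of why raising subdivisions helps: darknet's cfg parser divides batch by subdivisions, so the GPU only has to hold batch/subdivisions images per forward/backward pass. The sketch below encodes that division and the power-of-two search described above. The function names and the max_images_per_step limit are hypothetical; in practice that limit is unknown and is found by trial, exactly as suggested, and the search assumes batch is a power of two.

```c
/* Images processed per forward/backward pass for given cfg values. */
int images_per_step(int batch, int subdivisions)
{
    return batch / subdivisions;
}

/* Smallest power-of-two subdivisions value for which one step fits,
 * given a hypothetical limit on how many images fit in GPU memory.
 * Never returns more than `batch` (one image per step). */
int min_subdivisions(int batch, int max_images_per_step)
{
    int s = 1;
    while (s < batch && images_per_step(batch, s) > max_images_per_step) {
        s *= 2;
    }
    return s;
}
```

For example, with batch=64 and room for 8 images per step, the search settles on subdivisions=8; if only one image fits, it ends at subdivisions=64, the starting point recommended above.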

@AlexeyAB
Collaborator

AlexeyAB commented Mar 8, 2018

@waschbaer00 There is a bug in the C API of OpenCV 3.4.1: opencv/opencv#10963
Use OpenCV 3.4.0 or lower.

@nuannuan1991

Dear @TanFluent
My GPU is a GeForce GT 1030 (compute capability 6.1) on Ubuntu 16.04.
CUDA is release 7.5, V7.5.17, and which nvcc gives: /usr/local/cuda-7.5/bin//nvcc
So my Makefile is:
GPU=1
CUDNN=0
OPENCV=0
OPENMP=0
DEBUG=0
........
ifeq ($(GPU), 1)
COMMON+= -DGPU -I/usr/local/cuda-7.5/include/
CFLAGS+= -DGPU
LDFLAGS+= -L/usr/local/cuda-7.5/lib64 -lcuda -lcudart -lcublas -lcurand
endif

When I run make, the following error occurs:
/usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
return (char *) memcpy (__dest, __src, __n) + __n;
^
compilation terminated due to -Wfatal-errors.
Makefile:88: recipe for target 'obj/convolutional_kernels.o' failed
make: *** [obj/convolutional_kernels.o] Error 1

I have compiled it many times and get the same error every time. I really don't know how to fix it.
Can you give me some support? Thanks a lot!

@nuannuan1991

error
@TanFluent

@thanif

thanif commented Nov 18, 2019

The following fix worked for me.

ifeq ($(CUDNN), 1)
COMMON+= -DCUDNN
ifeq ($(OS),Darwin) #MAC
CFLAGS+= -DCUDNN -I/usr/local/cuda/include
LDFLAGS+= -L/usr/local/cuda/lib -lcudnn
else # ** The fix **
CFLAGS+= -DCUDNN -I/usr/local/include
LDFLAGS+= -L/usr/local/include -lcudnn
endif
endif
