
Check failed: error == cudaSuccess (8 vs. 0) invalid device function #2

Closed
twtygqyy opened this issue Oct 8, 2015 · 38 comments

@twtygqyy commented Oct 8, 2015

There is no problem for me running fast-rcnn's demo.py; however, I get the following error when I try to run py-faster-rcnn's demo.py after a successful make -j8 && make pycaffe:
Loaded network /home/ubuntu/py-faster-rcnn/data/faster_rcnn_models/ZF_faster_rcnn_final.caffemodel
F1008 04:30:16.139123 5360 roi_pooling_layer.cu:91] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***

Does anyone have the same problem?

@twtygqyy (Author) commented Oct 8, 2015

I'm running the code on a K520 with 4 GB of GPU memory. Is it because the code cannot support this GPU? CPU mode works fine.

@rbgirshick (Owner)

You might find some solutions here.

@sunshineatnoon

@rbgirshick I got the same error, but I can run fast-rcnn on the GPU using the same Makefile.config to compile caffe-fast-rcnn.

@PierreHao

I got the same error too. I have run many tests and tried editing Makefile.config, but I still get the same "invalid device function" error, though sometimes it comes from another .cu file rather than roi_pooling_layer.cu. So I think the version of caffe-fast-rcnn that faster-rcnn uses has a compatibility problem? And if I want to use another version of caffe, e.g. the caffe in fast-rcnn, which files should I copy into faster-rcnn's caffe-fast-rcnn? @rbgirshick

@cxj273 commented Oct 16, 2015

I got the same error too. I have carefully read the solutions pointed out by @rbgirshick, but the error still exists. In the end I had to go back to the MATLAB version.

@PierreHao

I have done many tests, and I found that this type of error may be caused by some function of faster-rcnn that fast-rcnn doesn't have. When I use the ImageNet model there is no error and everything runs well; after RPN training, when generating proposals, the "invalid device function" error occurs. Strangely, the first call to net.forward in im_proposals() runs fine, but the second call fails. When I comment out the last layer:
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rois'
  top: 'scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}
everything runs well. So could this problem be caused by some function of faster-rcnn? @rbgirshick

@PierreHao

I have fixed the problem; after some modifications it now runs well.

@twtygqyy (Author)

@PierreHao Could you share your modifications?

@PierreHao

OK, it works for me, but you should test it for your own problem. I found that CPU mode runs fine, so the problem is on the GPU side. In the nms code, the GPU version (nms_gpu) is called by default, so if we use caffe in GPU mode together with nms_gpu there is an error on our type of GPU (not certain; my guess). You can change nms_wrapper.py to force CPU mode, or comment out the nms call and related code in proposal_layer.py's forward(). nms_cpu mode is slow; commenting out nms entirely is faster. You can try it yourself. Good luck!
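The dispatch PierreHao describes can be sketched like this. This is an illustrative stand-in, not the actual py-faster-rcnn source: the names cpu_nms, gpu_nms, and USE_GPU_NMS mirror the project's layout, but the bodies here are stubs.

```python
def cpu_nms(dets, thresh):
    # Placeholder for the Cython CPU implementation; always available.
    return ("cpu", dets)

def gpu_nms(dets, thresh, device_id=0):
    # Placeholder for the CUDA kernel; on a GPU architecture the kernel
    # was not compiled for, this is where "invalid device function" surfaces.
    raise RuntimeError("invalid device function")

USE_GPU_NMS = True  # plays the role of __C.USE_GPU_NMS from config.py

def nms(dets, thresh, force_cpu=False):
    """Dispatch to the GPU kernel unless the caller forces the CPU path."""
    if USE_GPU_NMS and not force_cpu:
        return gpu_nms(dets, thresh)
    return cpu_nms(dets, thresh)
```

Calling nms(dets, 0.3, force_cpu=True), or flipping the config flag to False, skips the failing GPU kernel entirely.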

@rbgirshick (Owner)

@twtygqyy @PierreHao I've pushed a small change to demo.py that I hope will fix the underlying problem. Let me know if you have a chance to check the patch. Thanks.

@twtygqyy (Author)

@PierreHao Thank you for the information. I tried commenting out nms, but it did not help me get past the error.
@rbgirshick Thanks for the update. However, I still have the same error after modifying the code.

@PierreHao

@rbgirshick I think the problem is caused by the GPU: some GPUs can't launch a GPU program from inside another GPU program. When I try a Titan it works; when I try two different Teslas I get the error "invalid device function" (but the error goes away if I use the CPU mode of nms).

@sunshineatnoon

@PierreHao For me, changing __C.USE_GPU_NMS = True to __C.USE_GPU_NMS = False in py-faster-rcnn/lib/fast_rcnn/config.py solves the problem; thanks for your information. It took about 0.975s for 300 object proposals. That is not faster than fast-rcnn, which takes 2.205s for 21007 object proposals. But if you don't do NMS, multiple windows will appear for a single object.

@PierreHao

@sunshineatnoon 0.975s means that you are using NMS in CPU mode, which is why it runs slowly.

@sunshineatnoon

@PierreHao If you delete all the code related to nms, will multiple bboxes appear in an image?

@PierreHao

@sunshineatnoon If you delete nms from the training process, there may be an error. NMS is not strictly necessary; without it, multiple bboxes appear. You can try it.
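To illustrate what NMS does here (and why skipping it leaves multiple boxes per object), a minimal pure-Python greedy NMS sketch; this is a generic textbook version, not the project's Cython/CUDA implementation:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, thresh=0.3):
    """Keep the highest-scoring box, drop any box overlapping it above thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

Two heavily overlapping detections of one object collapse to the single best-scoring box; without this step, both would be drawn.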

@alantrrs commented Nov 5, 2015

Finally found the solution. You need to change the architecture to match yours here:
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/setup.py#L134

@rbgirshick any chance we can support multiple architectures in there?

@sunshineatnoon

@alantrrs Can you specify how to change the architecture? My GPU is Quadro K4000.

@alantrrs commented Nov 5, 2015

@sunshineatnoon I believe your GPU has a Kepler architecture, so you can change sm_35 to sm_30.

@sunshineatnoon

@alantrrs I changed my setup.py file like this, but I still get the error:

    Extension('nms.gpu_nms',
        ['nms/nms_kernel.cu', 'nms/gpu_nms.pyx'],
        library_dirs=[CUDA['lib64']],
        libraries=['cudart'],
        language='c++',
        runtime_library_dirs=[CUDA['lib64']],
        # this syntax is specific to this build system
        # we're only going to use certain compiler args with nvcc and not with gcc
        # the implementation of this trick is in customize_compiler() below
        extra_compile_args={'gcc': ["-Wno-unused-function"],
                            'nvcc': ['-arch=sm_30',
                                     '--ptxas-options=-v',
                                     '-c',
                                     '--compiler-options',
                                     "'-fPIC'"]},
        include_dirs = [numpy_include, CUDA['include']]
    )

@mesnilgr commented Nov 9, 2015

@PierreHao thanks Pierre for your solution!
In $FRCN_ROOT/lib/fast_rcnn/config.py set __C.USE_GPU_NMS = False.
It worked in my case (using a GPU on AWS).

@twtygqyy (Author)

@alantrrs It finally works. Thank you so much!

@PierreHao

@twtygqyy What did you change? Your GPU is old; I have tested a GPU with compute capability 5.0 and everything ran well.

@twtygqyy (Author)

@PierreHao I changed the setting from sm_35 to sm_30. I'm using an AWS g2.8xlarge instance.

@zimenglan-sysu-512

@twtygqyy Hi, I got the same error too if I set __C.USE_GPU_NMS = True in $FRCN_ROOT/lib/fast_rcnn/config.py. I'm using an AWS g2.0xlarge instance. So how can I change the architecture to solve the problem? Thanks a lot.

@twtygqyy (Author)

@zimenglan-sysu-512 If you're using a GPU instance on AWS, then please change the architecture setting to:

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50

Because the GPU on AWS does not support compute_35.

@zimenglan-sysu-512

@twtygqyy I changed the settings from sm_35 to sm_30 and removed *_50, but it did not work. What other settings should be changed? Thanks.

@zimenglan-sysu-512

@twtygqyy I have solved the problem. In my case I use the K520 on AWS. Thanks for your help. Here is my solution (three steps):
1. If you're using a GPU instance on AWS, change the architecture setting to:
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50
because the GPU on AWS does not support compute_35.
2. Change sm_35 to sm_30 in the lib/setup.py file.
3. cd lib and remove these files if they exist: utils/bbox.c, nms/cpu_nms.c, nms/gpu_nms.cpp.
Then make && cd ../caffe/ && make clean && make -j8 && make pycaffe -j8
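As context for why sm_35 vs sm_30 matters: nvcc's -arch flag must match the GPU's compute capability, or the compiled kernels fail at runtime with "invalid device function". A quick lookup sketch for the cards mentioned in this thread (the dict and helper are illustrative, not part of py-faster-rcnn; double-check your card against NVIDIA's compute capability table):

```python
# Compute capabilities for GPUs mentioned in this thread (to my knowledge;
# verify against NVIDIA's documentation for your exact card).
COMPUTE_CAPABILITY = {
    "GRID K520": "3.0",        # AWS g2.* instances
    "Quadro K4000": "3.0",
    "GeForce GTX 760": "3.0",
    "GeForce GTX TITAN": "3.5",
    "GeForce GTX 1060": "6.1",
    "GeForce GTX 1080": "6.1",
}

def sm_flag(gpu_name):
    """Map a GPU name to the sm_XX architecture to put in lib/setup.py."""
    cap = COMPUTE_CAPABILITY[gpu_name]
    return "sm_" + cap.replace(".", "")
```

For example, sm_flag("GRID K520") gives "sm_30", which is why sm_35 binaries fail on the AWS g2 instances.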

@twtygqyy (Author)

@zimenglan-sysu-512 Sorry for the late reply, I'm glad to hear that your problem has been solved.
Good Luck!

@rodrigob

@alantrrs thanks for the pointer! That fixed the problem.

@sunshineatnoon did you remove the *.so files and recompile the $FRCN_ROOT/lib ?

@sunshineatnoon commented May 19, 2016

@rodrigob I tried removing *.so in $FRCN_ROOT/lib/nms and $FRCN_ROOT/lib/utils, and now it works. Thanks very much!

@xiaohujecky

@sunshineatnoon I use a GeForce GTX 760 and ran into the problem too. My solution:

  1. I changed sm_35 to sm_30 in the lib/setup.py file, and
  2. set __C.USE_GPU_NMS = False in $FRCN_ROOT/lib/fast_rcnn/config.py.

That solved the problem. The difference between --cpu mode and --gpu mode is:
    GPU: Detection took 0.158s for 100 object proposals
    CPU: Detection took 1.505s for 100 object proposals
Wonderful! Thank you for your answer!

@ashwin commented Sep 20, 2016

@xiaohujecky Note that if you set __C.USE_GPU_NMS = False, then changing sm_35 in lib/setup.py should have no effect; sm_35 is a CUDA compilation setting and affects only GPU code.

In any case, I still face this error. It is pretty simple to reproduce. Run Faster-RCNN training and alongside it run a simple CUDA program that tries to cudaMalloc as much GPU memory as it can grab. Faster-RCNN training will crash with this error. Neither of the above solutions worked for me.

@loretoparisi

I'm hitting this error with

$ docker run -ti caffe:gpu caffe --version
libdc1394 error: Failed
caffe version 1.0.0-rc3

and

$ nvidia-smi
Tue Oct 25 15:08:35 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
|  0%   48C    P8     7W / 200W |     62MiB /  8105MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
|  0%   38C    P8     7W / 200W |      1MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1241    G   /usr/lib/xorg/Xorg                              60MiB |
+-----------------------------------------------------------------------------+

and

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

@manhcuogntin4

I changed sm_35 to sm_30 in the lib/setup.py file, and set __C.USE_GPU_NMS = False in $FRCN_ROOT/lib/fast_rcnn/config.py.
It works well in my case. Thank you @xiaohujecky

dacox added a commit to kinsolresearch/caffe that referenced this issue May 10, 2017
rubin-zhou pushed a commit to rubin-zhou/py-faster-rcnn that referenced this issue Jul 30, 2017
@Hodapp87 commented Aug 20, 2017

Still getting this on a GTX 1060. Tried __C.USE_GPU_NMS = False.

Update: Adding -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN=52;60 -DCUDA_ARCH_PTX=60 to the CMake options resolved it.

__C.USE_GPU_NMS = False made no difference.

Not sure why this issue is closed when it still seems to be a constant problem.

@femelo commented Nov 30, 2018

Finally found the solution. You need to change the architecture to match yours in here:
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/setup.py#L134

@rbgirshick any chance we can support multiple architectures in there?

That worked for me.

@seemon2 commented Mar 31, 2021

Hi, can anyone advise where the setup.py file I need to change is located in a Windows 10 environment?
I have the same error when trying to run OpenposeVideo.bat.
I understand my Nvidia card should use sm_86, but I'm fine with removing the GPU if this doesn't really work in this openpose script.
