
Check failed: error == cudaSuccess (8 vs. 0) invalid device function #2

Closed
twtygqyy opened this issue Oct 8, 2015 · 38 comments

@twtygqyy commented Oct 8, 2015

There is no problem for me running fast-rcnn's demo.py; however, I get the following error when I try to run py-faster-rcnn's demo.py after a successful make -j8 && make pycaffe:
Loaded network /home/ubuntu/py-faster-rcnn/data/faster_rcnn_models/ZF_faster_rcnn_final.caffemodel
F1008 04:30:16.139123 5360 roi_pooling_layer.cu:91] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
*** Check failure stack trace: ***

Does anyone have the same problem?

@twtygqyy (Author) commented Oct 8, 2015

I'm running the code on a K520 with 4 GB of GPU memory. Is it because the code cannot support this GPU? CPU mode works fine.

@rbgirshick (Owner)

You might find some solutions here.

@sunshineatnoon

@rbgirshick I got the same error, but I can run fast-rcnn on the GPU using the same Makefile.config to compile caffe-fast-rcnn.

@PierreHao

I got the same error too. I have run many tests and tried editing Makefile.config, but I still get the same "invalid device function" error, though sometimes it comes from another .cu file rather than roi_pooling_layer.cu. So I think the version of caffe-fast-rcnn that faster-rcnn uses has a compatibility problem? And if I want to use another version of caffe, e.g. the caffe in fast-rcnn, which files should I copy into faster-rcnn's caffe-fast-rcnn? @rbgirshick

@cxj273 commented Oct 16, 2015

I got the same error too. I have carefully read the solutions pointed out by @rbgirshick, but the error still exists. In the end I had to go back to the MATLAB version.

@PierreHao

I have done many tests, and I found that this type of error may be caused by some function of faster-rcnn that fast-rcnn doesn't have. When I use the ImageNet model there is no error and everything runs well; after RPN training, when generating proposals, the "invalid device function" error occurs. Strangely, the first call to net.forward in im_proposals() runs fine, but the second call fails. When I comment out the last layer:
layer {
  name: 'proposal'
  type: 'Python'
  bottom: 'rpn_cls_prob_reshape'
  bottom: 'rpn_bbox_pred'
  bottom: 'im_info'
  top: 'rois'
  top: 'scores'
  python_param {
    module: 'rpn.proposal_layer'
    layer: 'ProposalLayer'
    param_str: "'feat_stride': 16"
  }
}
everything runs well. So could this problem be caused by some function of faster-rcnn? @rbgirshick

@PierreHao

I have fixed the problem; after some modifications it now runs well.

@twtygqyy (Author)

@PierreHao Could you share your modifications?

@PierreHao

OK, it works for me, but you should test it for your own problem. I found that CPU mode runs fine, so the problem is on the GPU side. In the nms code, the GPU version (nms_gpu) is called by default, so if we use caffe in GPU mode together with nms_gpu there is an error on our type of GPU (not certain; my guess). You can change nms_wrapper.py to force CPU mode, or comment out the nms call and related code in proposal_layer.py's forward(). nms_cpu mode is slow; commenting out nms entirely is faster. You can try it yourself. Good luck!
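The dispatch PierreHao describes can be sketched like this. This is an illustrative stand-in, not the actual py-faster-rcnn source: the names cpu_nms, gpu_nms, and USE_GPU_NMS mirror the project's layout, but the bodies here are stubs.

```python
def cpu_nms(dets, thresh):
    # Placeholder for the Cython CPU implementation; always available.
    return ("cpu", dets)

def gpu_nms(dets, thresh, device_id=0):
    # Placeholder for the CUDA kernel; on a GPU architecture the kernel
    # was not compiled for, this is where "invalid device function" surfaces.
    raise RuntimeError("invalid device function")

USE_GPU_NMS = True  # plays the role of __C.USE_GPU_NMS from config.py

def nms(dets, thresh, force_cpu=False):
    """Dispatch to the GPU kernel unless the caller forces the CPU path."""
    if USE_GPU_NMS and not force_cpu:
        return gpu_nms(dets, thresh)
    return cpu_nms(dets, thresh)
```

Calling nms(dets, 0.3, force_cpu=True), or flipping the config flag to False, skips the failing GPU kernel entirely.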

@rbgirshick (Owner)

@twtygqyy @PierreHao I've pushed a small change to demo.py that I hope will fix the underlying problem. Let me know if you have a chance to check the patch. Thanks.

@twtygqyy (Author)

@PierreHao Thank you for the information. I tried commenting out nms, but it did not help me get past the error.
@rbgirshick Thanks for the update. However, I still have the same error after modifying the code.

@PierreHao

@rbgirshick I think the problem is caused by the GPU: some GPUs can't launch a GPU program from inside another GPU program. When I try a Titan it works; when I try two different Teslas I get the error "invalid device function" (but the error goes away if I use the CPU mode of nms).

@sunshineatnoon

@PierreHao For me, changing __C.USE_GPU_NMS = True to __C.USE_GPU_NMS = False in py-faster-rcnn/lib/fast_rcnn/config.py solves the problem; thanks for your information. It took about 0.975s for 300 object proposals. That is not faster than fast-rcnn, which takes 2.205s for 21007 object proposals. But if you don't do NMS, multiple windows will appear for a single object.

@PierreHao

@sunshineatnoon 0.975s means that you are using NMS in CPU mode, which is why it runs slowly.

@sunshineatnoon

@PierreHao If you delete all the code related to nms, will multiple bboxes appear in an image?

@PierreHao

@sunshineatnoon If you delete nms from the training process, there may be an error. NMS is not strictly necessary; without it, multiple bboxes appear. You can try it.
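To illustrate what NMS does here (and why skipping it leaves multiple boxes per object), a minimal pure-Python greedy NMS sketch; this is a generic textbook version, not the project's Cython/CUDA implementation:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, thresh=0.3):
    """Keep the highest-scoring box, drop any box overlapping it above thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

Two heavily overlapping detections of one object collapse to the single best-scoring box; without this step, both would be drawn.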

@alantrrs commented Nov 5, 2015

Finally found the solution. You need to change the architecture to match yours here:
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/setup.py#L134

@rbgirshick any chance we can support multiple architectures in there?

@sunshineatnoon

@alantrrs Can you specify how to change the architecture? My GPU is Quadro K4000.

@alantrrs commented Nov 5, 2015

@sunshineatnoon I believe your GPU has a Kepler architecture, so you can change sm_35 to sm_30.

@sunshineatnoon

@alantrrs I changed my setup.py file like this, but I still get the error:

    Extension('nms.gpu_nms',
        ['nms/nms_kernel.cu', 'nms/gpu_nms.pyx'],
        library_dirs=[CUDA['lib64']],
        libraries=['cudart'],
        language='c++',
        runtime_library_dirs=[CUDA['lib64']],
        # this syntax is specific to this build system
        # we're only going to use certain compiler args with nvcc and not with gcc
        # the implementation of this trick is in customize_compiler() below
        extra_compile_args={'gcc': ["-Wno-unused-function"],
                            'nvcc': ['-arch=sm_30',
                                     '--ptxas-options=-v',
                                     '-c',
                                     '--compiler-options',
                                     "'-fPIC'"]},
        include_dirs = [numpy_include, CUDA['include']]
    )

@mesnilgr commented Nov 9, 2015

@PierreHao thanks Pierre for your solution!
In $FRCN_ROOT/lib/fast_rcnn/config.py set __C.USE_GPU_NMS = False.
It worked in my case (using a GPU on AWS).

@twtygqyy (Author)

@alantrrs It finally works. Thank you so much!

@PierreHao

@twtygqyy What did you change? Your GPU is old; I have tested a GPU with compute capability 5.0 and everything ran well.

@twtygqyy (Author)

@PierreHao I changed the setting from sm_35 to sm_30. I'm using an AWS g2.8xlarge instance.

@zimenglan-sysu-512

@twtygqyy Hi, I got the same error too if I set __C.USE_GPU_NMS = True in $FRCN_ROOT/lib/fast_rcnn/config.py. I'm using an AWS g2.0xlarge instance. So how can I change the architecture to solve the problem? Thanks a lot.

@twtygqyy (Author)

@zimenglan-sysu-512 If you're using a GPU instance on AWS, then please change the architecture setting to:

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50

Because the GPU on AWS does not support compute_35.

@zimenglan-sysu-512

@twtygqyy I changed the settings from sm_35 to sm_30 and removed *_50, but it did not work. What other settings should be changed? Thanks.

@zimenglan-sysu-512

@twtygqyy I have solved the problem. In my case I use the K520 on AWS. Thanks for your help. Here is my solution (three steps):
1. If you're using a GPU instance on AWS, change the architecture setting to:
# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_50 lines for compatibility.
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50
because the GPU on AWS does not support compute_35.
2. Change sm_35 to sm_30 in the lib/setup.py file.
3. cd lib and remove these files if they exist: utils/bbox.c, nms/cpu_nms.c, nms/gpu_nms.cpp.
Then make && cd ../caffe/ && make clean && make -j8 && make pycaffe -j8
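As context for why sm_35 vs sm_30 matters: nvcc's -arch flag must match the GPU's compute capability, or the compiled kernels fail at runtime with "invalid device function". A quick lookup sketch for the cards mentioned in this thread (the dict and helper are illustrative, not part of py-faster-rcnn; double-check your card against NVIDIA's compute capability table):

```python
# Compute capabilities for GPUs mentioned in this thread (to my knowledge;
# verify against NVIDIA's documentation for your exact card).
COMPUTE_CAPABILITY = {
    "GRID K520": "3.0",        # AWS g2.* instances
    "Quadro K4000": "3.0",
    "GeForce GTX 760": "3.0",
    "GeForce GTX TITAN": "3.5",
    "GeForce GTX 1060": "6.1",
    "GeForce GTX 1080": "6.1",
}

def sm_flag(gpu_name):
    """Map a GPU name to the sm_XX architecture to put in lib/setup.py."""
    cap = COMPUTE_CAPABILITY[gpu_name]
    return "sm_" + cap.replace(".", "")
```

For example, sm_flag("GRID K520") gives "sm_30", which is why sm_35 binaries fail on the AWS g2 instances.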

@twtygqyy (Author)

@zimenglan-sysu-512 Sorry for the late reply, I'm glad to hear that your problem has been solved.
Good Luck!

@rodrigob

@alantrrs thanks for the pointer! That fixed the problem.

@sunshineatnoon did you remove the *.so files and recompile the $FRCN_ROOT/lib ?

@sunshineatnoon commented May 19, 2016

@rodrigob I tried removing *.so in $FRCN_ROOT/lib/nms and $FRCN_ROOT/lib/utils, and now it works. Thanks very much!

@xiaohujecky

@sunshineatnoon I use a GeForce GTX 760 and ran into the problem too. My solution:

  1. I changed sm_35 to sm_30 in the lib/setup.py file, and
  2. set __C.USE_GPU_NMS = False in $FRCN_ROOT/lib/fast_rcnn/config.py.

That solved the problem. The difference between --cpu mode and --gpu mode is:
    GPU: Detection took 0.158s for 100 object proposals
    CPU: Detection took 1.505s for 100 object proposals
Wonderful! Thank you for your answer!

@ashwin commented Sep 20, 2016

@xiaohujecky Note that if you set __C.USE_GPU_NMS = False, then changing sm_35 in lib/setup.py should have no effect; sm_35 is a CUDA compilation setting and affects only GPU code.

In any case, I still face this error. It is pretty simple to reproduce. Run Faster-RCNN training and alongside it run a simple CUDA program that tries to cudaMalloc as much GPU memory as it can grab. Faster-RCNN training will crash with this error. Neither of the above solutions worked for me.

@loretoparisi

I'm hitting this error with

$ docker run -ti caffe:gpu caffe --version
libdc1394 error: Failed
caffe version 1.0.0-rc3

and

$ nvidia-smi
Tue Oct 25 15:08:35 2016       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 370.28                 Driver Version: 370.28                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:01:00.0      On |                  N/A |
|  0%   48C    P8     7W / 200W |     62MiB /  8105MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:02:00.0     Off |                  N/A |
|  0%   38C    P8     7W / 200W |      1MiB /  8113MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1241    G   /usr/lib/xorg/Xorg                              60MiB |
+-----------------------------------------------------------------------------+

and

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

@manhcuogntin4

I changed sm_35 to sm_30 in the lib/setup.py file, and set __C.USE_GPU_NMS = False in $FRCN_ROOT/lib/fast_rcnn/config.py.
It works well in my case. Thank you @xiaohujecky

dacox added a commit to kinsolresearch/caffe that referenced this issue May 10, 2017
rubin-zhou pushed a commit to rubin-zhou/py-faster-rcnn that referenced this issue Jul 30, 2017
@Hodapp87 commented Aug 20, 2017

Still getting this on a GTX 1060. Tried __C.USE_GPU_NMS = False.

Update: Adding -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN=52;60 -DCUDA_ARCH_PTX=60 to the CMake options resolved it.

__C.USE_GPU_NMS = False made no difference.

Not sure why this issue is closed when it still seems to be a constant problem.

@femelo commented Nov 30, 2018

Finally found the solution. You need to change the architecture to match yours in here:
https://github.com/rbgirshick/py-faster-rcnn/blob/master/lib/setup.py#L134

@rbgirshick any chance we can support multiple architectures in there?

That worked for me.

@seemon2 commented Mar 31, 2021

Hi, can anyone advise where the setup.py file I need to change is located in a Windows 10 environment?
I have the same error when trying to run OpenposeVideo.bat.
I understand my Nvidia card should use sm_86, but I'm fine with removing the GPU if this doesn't really work in this openpose script.
