Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nvbit_at_context_init_hook(): Assertion `cudaGetLastError() == cudaSuccess' failed #18

Closed
louisfeng opened this issue May 31, 2020 · 7 comments

Comments

@louisfeng
Copy link

louisfeng commented May 31, 2020

Hello, I am trying to run the vectoradd example in the README, but ran into the following error:

=> ./test-apps/vectoradd/vectoradd
Final sum = 100000.000000; sum/n = 1.000000 (should be ~1)

=> LD_PRELOAD=./tools/instr_count/instr_count.so ./test-apps/vectoradd/vectoradd
------------- NVBit (NVidia Binary Instrumentation Tool v1.3.1) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
      START_GRID_NUM = 0 - Beginning of the kernel gird launch interval where to apply instrumentation
        END_GRID_NUM = 4294967295 - End of the kernel launch interval where to apply instrumentation
    COUNT_WARP_LEVEL = 1 - Count warp level or thread level instructions
    EXCLUDE_PRED_OFF = 0 - Exclude predicated off instruction from count
   ACTIVE_FROM_START = 1 - Start instruction counting from start or wait for cuProfilerStart and cuProfilerStop
       MANGLED_NAMES = 1 - Print kernel names mangled or not
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
----------------------------------------------------------------------------------------------------
vectoradd: ../../core/nvbit_tool.h:82: void nvbit_at_context_init_hook(): Assertion `cudaGetLastError() == cudaSuccess' failed.
Aborted (core dumped)

Here is the system config:
CentOS Linux release 7.8.2003
cuda-10.2
cuda-9.2

=> g++ --version
g++ (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

=> which g++
/opt/rh/devtoolset-7/root/usr/bin/g++

=> ldd ./test-apps/vectoradd/vectoradd
	linux-vdso.so.1 =>  (0x00007ffde6bb1000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f16dc044000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f16dbe28000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f16dbc24000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f16db91d000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f16db61b000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f16db405000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f16db037000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f16dc24c000)

=> ldd ./tools/instr_count/instr_count.so
	linux-vdso.so.1 =>  (0x00007fff4257d000)
	libcuda.so.1 => /usr/lib64/nvidia/libcuda.so.1 (0x00007fd451819000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fd451611000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd4513f5000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fd4511f1000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fd450eea000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fd450be8000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd4509d2000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fd450604000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fd452ac2000)
	libnvidia-fatbinaryloader.so.396.69 => /usr/lib64/nvidia/libnvidia-fatbinaryloader.so.396.69 (0x00007fd4503b8000)

=> nvidia-smi
Sat May 30 21:47:15 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.69                 Driver Version: 396.69                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:1C:00.0 Off |                    0 |
| N/A   39C    P0    70W / 300W |   4665MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
...

nvbit_release/tools
PATH=/usr/local/cuda-10.2/bin:$PATH make
nvbit_release/test-apps/
PATH=/usr/local/cuda-9.2/bin:$PATH make

Can't use cuda-10.2 for vectoradd

=> ./vectoradd/vectoradd
Cuda error in function '(vecAdd<<<gridSize, blockSize>>>(d_a, d_b, d_c, n))' file 'vectoradd.cu' in line 81 : CUDA driver version is insufficient for CUDA runtime version.

GDB

=> LD_PRELOAD=./tools/instr_count/instr_count.so gdb ./test-apps/vectoradd/vectoradd
------------- NVBit (NVidia Binary Instrumentation Tool v1.3.1) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
      START_GRID_NUM = 0 - Beginning of the kernel gird launch interval where to apply instrumentation
        END_GRID_NUM = 4294967295 - End of the kernel launch interval where to apply instrumentation
    COUNT_WARP_LEVEL = 1 - Count warp level or thread level instructions
    EXCLUDE_PRED_OFF = 0 - Exclude predicated off instruction from count
   ACTIVE_FROM_START = 1 - Start instruction counting from start or wait for cuProfilerStart and cuProfilerStop
       MANGLED_NAMES = 1 - Print kernel names mangled or not
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
----------------------------------------------------------------------------------------------------
GNU gdb (GDB) Red Hat Enterprise Linux 8.1.90.20180727-44.el7
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test-apps/vectoradd/vectoradd...(no debugging symbols found)...done.
(gdb) set environment LD_PRELOAD ./tools/instr_count/instr_count.so
(gdb) start
Temporary breakpoint 1 at 0x403070
Starting program: nvbit_release/test-apps/vectoradd/vectoradd
------------- NVBit (NVidia Binary Instrumentation Tool v1.3.1) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
      START_GRID_NUM = 0 - Beginning of the kernel gird launch interval where to apply instrumentation
        END_GRID_NUM = 4294967295 - End of the kernel launch interval where to apply instrumentation
    COUNT_WARP_LEVEL = 1 - Count warp level or thread level instructions
    EXCLUDE_PRED_OFF = 0 - Exclude predicated off instruction from count
   ACTIVE_FROM_START = 1 - Start instruction counting from start or wait for cuProfilerStart and cuProfilerStop
       MANGLED_NAMES = 1 - Print kernel names mangled or not
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
----------------------------------------------------------------------------------------------------
warning: Unable to open "librpm.so.3" (librpm.so.3: cannot open shared object file: No such file or directory), missing debuginfos notifications will not be displayed
Missing separate debuginfo for /lib64/ld-linux-x86-64.so.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/27/ffd1fbc69569c776e666474eed723395e6d727.debug
Missing separate debuginfo for /lib64/librt.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/cc/d4be566dd5a8fc7fa62b224c14b698f51b0d0d.debug
Missing separate debuginfo for /lib64/libpthread.so.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/2b/482b3bae79def4e5bc9791bc6bbdae0e93e359.debug
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfo for /lib64/libdl.so.2
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/f2/c36986e11a291a0d4bcb3a81632b24ae2359ea.debug
Missing separate debuginfo for /lib64/libstdc++.so.6
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/e2/fb6e9c483d89e8e96d73c7ccf3e3a91e91bb81.debug
Missing separate debuginfo for /lib64/libm.so.6
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/08/5d924f5d23b9f15a8ad28b7231ee93c09e13f1.debug
Missing separate debuginfo for /lib64/libgcc_s.so.1
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/da/c0179f4555aefec9e97476201802fd20c03ec5.debug
Missing separate debuginfo for /lib64/libc.so.6
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/d7/8066a9c36f5fd63e2f6ac851ae3515c4c9792a.debug
------------- NVBit (NVidia Binary Instrumentation Tool v1.3.1) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
      START_GRID_NUM = 0 - Beginning of the kernel gird launch interval where to apply instrumentation
        END_GRID_NUM = 4294967295 - End of the kernel launch interval where to apply instrumentation
    COUNT_WARP_LEVEL = 1 - Count warp level or thread level instructions
    EXCLUDE_PRED_OFF = 0 - Exclude predicated off instruction from count
   ACTIVE_FROM_START = 1 - Start instruction counting from start or wait for cuProfilerStart and cuProfilerStop
       MANGLED_NAMES = 1 - Print kernel names mangled or not
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
----------------------------------------------------------------------------------------------------

Temporary breakpoint 1, 0x0000000000403070 in main ()
(gdb) c
Continuing.
[New Thread 0x7fffef5a2700 (LWP 455460)]
[New Thread 0x7fffeeda1700 (LWP 455465)]
vectoradd: ../../core/nvbit_tool.h:82: void nvbit_at_context_init_hook(): Assertion `cudaGetLastError() == cudaSuccess' failed.

Thread 1 "vectoradd" received signal SIGABRT, Aborted.
0x00007ffff68d2387 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff68d2387 in raise () from /lib64/libc.so.6
#1  0x00007ffff68d3a78 in abort () from /lib64/libc.so.6
#2  0x00007ffff68cb1a6 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff68cb252 in __assert_fail () from /lib64/libc.so.6
#4  0x00007ffff7ada768 in nvbit_at_context_init_hook () from ./tools/instr_count/instr_count.so
#5  0x00007ffff7ae4a25 in Nvbit::create_ctx(CUctx_st*) () from ./tools/instr_count/instr_count.so
#6  0x00007ffff7ae9b4c in nvbitToolsCallbackFunc(void*, CUtools_cb_domain_enum, unsigned int, void const*) ()
   from ./tools/instr_count/instr_count.so
#7  0x00007ffff5c55138 in ?? () from /usr/lib64/nvidia/libcuda.so.1
#8  0x00007ffff5ad0c0f in ?? () from /usr/lib64/nvidia/libcuda.so.1
#9  0x00007ffff5ad225f in ?? () from /usr/lib64/nvidia/libcuda.so.1
#10 0x00007ffff5a0b03c in ?? () from /usr/lib64/nvidia/libcuda.so.1
#11 0x00007ffff5b44ea6 in cuDevicePrimaryCtxRetain () from /usr/lib64/nvidia/libcuda.so.1
#12 0x000000000042e590 in cudart::contextStateManager::initPrimaryContext(cudart::device*) ()
#13 0x000000000042edfd in cudart::contextStateManager::initDriverContext() ()
#14 0x000000000042feec in cudart::contextStateManager::getRuntimeContextState(cudart::contextState**, bool) ()
#15 0x000000000042379c in cudart::doLazyInitContextState() ()
#16 0x0000000000407ca8 in cudart::cudaApiMalloc(void**, unsigned long) ()
#17 0x000000000044291c in cudaMalloc ()
#18 0x00000000004030c8 in main ()
@x-y-z
Copy link
Collaborator

x-y-z commented Jun 1, 2020

The error from the execution of your vectoradd tells that your GPU driver is too old to run programs compiled with CUDA 10.2. You either need to use an older CUDA/nvcc to compile your program and nvbit tools or upgrade your GPU driver.

@louisfeng
Copy link
Author

The error from the execution of your vectoradd tells that your GPU driver is too old to run programs compiled with CUDA 10.2. You either need to use an older CUDA/nvcc to compile your program and nvbit tools or upgrade your GPU driver.

Actually the vectoradd program was compiled with CUDA 9.2 and ran fine without the LD_PRELOAD.

@x-y-z
Copy link
Collaborator

x-y-z commented Jun 1, 2020

The error from the execution of your vectoradd tells that your GPU driver is too old to run programs compiled with CUDA 10.2. You either need to use an older CUDA/nvcc to compile your program and nvbit tools or upgrade your GPU driver.

Actually the vectoradd program was compiled with CUDA 9.2 and ran fine without the LD_PRELOAD.

OK, the issue is the incompatibility between the nvbit tool and your GPU driver. The nvbit tools need to be compiled with CUDA 10.2, which requires >=v440 GPU drivers, but your driver is too old.

I would suggest you to upgrade your GPU driver to >= v440

@louisfeng
Copy link
Author

Thanks @x-y-z, I will give it a try.

@louisfeng
Copy link
Author

louisfeng commented Jun 1, 2020

@x-y-z does nvbit have hard dependency on the CUDA 10.2? Our data center don't always use the latest driver. Is it possible to relax this requirement? For example, would it possible to use CUDA 10.1 (driver 418)?

@louisfeng
Copy link
Author

I was able to build and run with CUDA 10.1 (driver 418). Thanks.

------------- NVBit (NVidia Binary Instrumentation Tool v1.3.1) Loaded --------------
NVBit core environment variables (mostly for nvbit-devs):
            NVDISASM = nvdisasm - override default nvdisasm found in PATH
            NOBANNER = 0 - if set, does not print this banner
---------------------------------------------------------------------------------
         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
      START_GRID_NUM = 0 - Beginning of the kernel gird launch interval where to apply instrumentation
        END_GRID_NUM = 4294967295 - End of the kernel launch interval where to apply instrumentation
    COUNT_WARP_LEVEL = 1 - Count warp level or thread level instructions
    EXCLUDE_PRED_OFF = 0 - Exclude predicated off instruction from count
   ACTIVE_FROM_START = 1 - Start instruction counting from start or wait for cuProfilerStart and cuProfilerStop
       MANGLED_NAMES = 1 - Print kernel names mangled or not
        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
----------------------------------------------------------------------------------------------------

kernel 0 - _Z6vecAddPdS_S_i - #thread-blocks 98,  kernel instructions 50077, total instructions 50077
Final sum = 100000.000000; sum/n = 1.000000 (should be ~1)
Total app instructions: 50077

@ovilla
Copy link
Collaborator

ovilla commented Jun 2, 2020

Great. We are glad it was resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants