numba cannot find libnvvm.so <OSError: libnvvm.so: cannot open shared object file: No such file or directory> #9216

Closed
AromaticJ opened this issue Sep 28, 2023 · 9 comments
Labels
CUDA CUDA related issue/PR more info needed This issue needs more information

Comments

@AromaticJ

Hi, when I use Numba, I encounter the following error:

/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/torch/distributed/launch.py:180: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use_env is set by default in torchrun.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Traceback (most recent call last):
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/cudadrv/nvvm.py", line 139, in __new__
    inst.driver = open_cudalib('nvvm')
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/cudadrv/libs.py", line 64, in open_cudalib
    return ctypes.CDLL(path)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvvm.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./train.py", line 23, in <module>
    from utils.monitor import MonitorEval
  File "/data_e/BaiduSyncdisk/code/MonoFlex/utils/monitor.py", line 9, in <module>
    from data.datasets.evaluation.kitti_object_eval_python.eval import get_official_eval_result,get_coco_eval_result
  File "/data_e/BaiduSyncdisk/code/MonoFlex/data/datasets/evaluation/kitti_object_eval_python/eval.py", line 5, in <module>
    from .rotate_iou import rotate_iou_gpu_eval
  File "/data_e/BaiduSyncdisk/code/MonoFlex/data/datasets/evaluation/kitti_object_eval_python/rotate_iou.py", line 19, in <module>
    def trangle_area(a, b, c):
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/decorators.py", line 131, in _jit
    disp.compile_device(argtypes, restype)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/dispatcher.py", line 875, in compile_device
    cres = compile_cuda(self.py_func, return_type, args,
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/compiler.py", line 194, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler.py", line 770, in compile_extra
    return pipeline.compile_extra(func)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler.py", line 461, in compile_extra
    return self._compile_bytecode()
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler.py", line 529, in _compile_bytecode
    return self._compile_core()
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler.py", line 508, in _compile_core
    raise e
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler.py", line 495, in _compile_core
    pm.run(self.state)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 368, in run
    raise patched_exception
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 356, in run
    self._runPass(idx, pass_inst, state)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
    mutated |= check(pss.run_pass, internal_state)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 273, in check
    mangled = func(compiler_state)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/typed_passes.py", line 464, in run_pass
    lower = self.lowering_class(targetctx, library, fndesc, interp,
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/lowering.py", line 39, in __init__
    self.module = self.library.create_ir_module(self.fndesc.unique_name)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/codegen.py", line 574, in create_ir_module
    ir_module = self._codegen._create_empty_module(name)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/codegen.py", line 346, in _create_empty_module
    ir_module.data_layout = nvvm.NVVM().data_layout
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/cudadrv/nvvm.py", line 144, in __new__
    raise NvvmSupportError(errmsg % e)
numba.cuda.cudadrv.error.NvvmSupportError: libNVVM cannot be found. Do `conda install cudatoolkit`:
libnvvm.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/cudadrv/nvvm.py", line 139, in __new__
    inst.driver = open_cudalib('nvvm')
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/cudadrv/libs.py", line 64, in open_cudalib
    return ctypes.CDLL(path)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libnvvm.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./train.py", line 23, in <module>
    from utils.monitor import MonitorEval
  File "/data_e/BaiduSyncdisk/code/MonoFlex/utils/monitor.py", line 9, in <module>
    from data.datasets.evaluation.kitti_object_eval_python.eval import get_official_eval_result,get_coco_eval_result
  File "/data_e/BaiduSyncdisk/code/MonoFlex/data/datasets/evaluation/kitti_object_eval_python/eval.py", line 5, in <module>
    from .rotate_iou import rotate_iou_gpu_eval
  File "/data_e/BaiduSyncdisk/code/MonoFlex/data/datasets/evaluation/kitti_object_eval_python/rotate_iou.py", line 19, in <module>
    def trangle_area(a, b, c):
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/decorators.py", line 131, in _jit
    disp.compile_device(argtypes, restype)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/dispatcher.py", line 875, in compile_device
    cres = compile_cuda(self.py_func, return_type, args,
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/compiler.py", line 194, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler.py", line 770, in compile_extra
    return pipeline.compile_extra(func)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler.py", line 461, in compile_extra
    return self._compile_bytecode()
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler.py", line 529, in _compile_bytecode
    return self._compile_core()
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler.py", line 508, in _compile_core
    raise e
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler.py", line 495, in _compile_core
    pm.run(self.state)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 368, in run
    raise patched_exception
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 356, in run
    self._runPass(idx, pass_inst, state)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
    mutated |= check(pss.run_pass, internal_state)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/compiler_machinery.py", line 273, in check
    mangled = func(compiler_state)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/typed_passes.py", line 464, in run_pass
    lower = self.lowering_class(targetctx, library, fndesc, interp,
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/lowering.py", line 39, in __init__
    self.module = self.library.create_ir_module(self.fndesc.unique_name)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/core/codegen.py", line 574, in create_ir_module
    ir_module = self._codegen._create_empty_module(name)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/codegen.py", line 346, in _create_empty_module
    ir_module.data_layout = nvvm.NVVM().data_layout
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/cudadrv/nvvm.py", line 144, in __new__
    raise NvvmSupportError(errmsg % e)
numba.cuda.cudadrv.error.NvvmSupportError: libNVVM cannot be found. Do `conda install cudatoolkit`:
libnvvm.so: cannot open shared object file: No such file or directory
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 8929) of binary: /home/zyf/miniconda3/envs/monoflex/bin/python
Traceback (most recent call last):
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/torch/distributed/launch.py", line 195, in <module>
    main()
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/torch/distributed/launch.py", line 191, in main
    launch(args)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/torch/distributed/launch.py", line 176, in launch
    run(args)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
./train.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-09-28_10:08:03
  host      : 4c37173a9629
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 8930)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-09-28_10:08:03
  host      : 4c37173a9629
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 8929)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

The detailed environment information, as reported by 'numba -s', is as follows:

(monoflex) zyf@4c37173a9629:/data_e/BaiduSyncdisk/code/MonoFlex$ numba -s
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time)                   : 2023-09-28 10:13:07.983914
UTC start time                                : 2023-09-28 10:13:07.983923
Running time (s)                              : 3.192452

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : icelake-server
CPU Count                                     : 112
Number of accessible CPUs                     : 32
List of accessible CPUs cores                 : 0-111
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  : 64bit adx aes avx avx2
                                                avx512bitalg avx512bw avx512cd
                                                avx512dq avx512f avx512ifma
                                                avx512vbmi avx512vbmi2 avx512vl
                                                avx512vnni avx512vpopcntdq bmi
                                                bmi2 clflushopt clwb cmov crc32
                                                cx16 cx8 f16c fma fsgsbase fxsr
                                                gfni invpcid lzcnt mmx movbe
                                                pclmul pconfig pku popcnt prfchw
                                                rdpid rdrnd rdseed rtm sahf sgx
                                                sha sse sse2 sse3 sse4.1 sse4.2
                                                ssse3 vaes vpclmulqdq wbnoinvd
                                                xsave xsavec xsaveopt xsaves

Memory Total (MB)                             : 1547551
Memory Available (MB)                         : 1527522

__OS Information__
Platform Name                                 : Linux-3.10.0-1160.90.1.el7.x86_64-x86_64-with-glibc2.17
Platform Release                              : 3.10.0-1160.90.1.el7.x86_64
OS Name                                       : Linux
OS Version                                    : #1 SMP Thu May 4 15:21:22 UTC 2023
OS Specific Version                           : ?
Libc Version                                  : glibc 2.31

__Python Information__
Python Compiler                               : GCC 11.2.0
Python Implementation                         : CPython
Python Version                                : 3.8.18
Python Locale                                 : en_US.UTF-8

__Numba Toolchain Versions__
Numba Version                                 : 0.58.0
llvmlite Version                              : 0.41.0

__LLVM Information__
LLVM Version                                  : 14.0.6

__CUDA Information__
CUDA Device Initialized                       : True
CUDA Driver Version                           : 12.2
CUDA Runtime Version                          : 11.8
CUDA NVIDIA Bindings Available                : ?
CUDA NVIDIA Bindings In Use                   : ?
CUDA Minor Version Compatibility Available    : ?
CUDA Minor Version Compatibility Needed       : ?
CUDA Minor Version Compatibility In Use       : ?
CUDA Detect Output:
Found 2 CUDA devices
id 0           b'NVIDIA A40'                              [SUPPORTED]
                      Compute Capability: 8.6
                           PCI Device ID: 0
                              PCI Bus ID: 23
                                    UUID: GPU-c35028ea-7942-b3b0-79a5-159b3d2b3c0d
                                Watchdog: Disabled
             FP32/FP64 Performance Ratio: 32
id 1           b'NVIDIA A40'                              [SUPPORTED]
                      Compute Capability: 8.6
                           PCI Device ID: 0
                              PCI Bus ID: 152
                                    UUID: GPU-652c3288-b5ba-96cd-ab63-b57423115ea9
                                Watchdog: Disabled
             FP32/FP64 Performance Ratio: 32
Summary:
        2/2 devices are supported

CUDA Libraries Test Output:
None

__NumPy Information__
NumPy Version                                 : 1.24.4
NumPy Supported SIMD features                 : ('MMX', 'SSE', 'SSE2', 'SSE3', 'SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512VPOPCNTDQ', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512VNNI', 'AVX512IFMA', 'AVX512VBMI', 'AVX512VBMI2', 'AVX512BITALG', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL', 'AVX512_ICL')
NumPy Supported SIMD dispatch                 : ('SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512_KNL', 'AVX512_KNM', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL', 'AVX512_ICL')
NumPy Supported SIMD baseline                 : ('SSE', 'SSE2', 'SSE3')
NumPy AVX512_SKX support detected             : True

__SVML Information__
SVML State, config.USING_SVML                 : False
SVML Library Loaded                           : False
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : False

__Threading Layer Information__
TBB Threading Layer Available                 : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available              : True
+-->Vendor: GNU
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.

__Conda Information__
Conda Build                                   : not installed
Conda Env                                     : 23.7.4
Conda Platform                                : linux-64
Conda Python Version                          : 3.11.4.final.0
Conda Root Writable                           : True

__Installed Packages__
_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
absl-py                   2.0.0                    pypi_0    pypi
ca-certificates           2023.08.22           h06a4308_0  
cachetools                5.3.1                    pypi_0    pypi
certifi                   2023.7.22                pypi_0    pypi
charset-normalizer        3.2.0                    pypi_0    pypi
contourpy                 1.1.1                    pypi_0    pypi
cycler                    0.11.0                   pypi_0    pypi
fire                      0.5.0                    pypi_0    pypi
fonttools                 4.42.1                   pypi_0    pypi
fvcore                    0.1.5.post20221221          pypi_0    pypi
google-auth               2.23.1                   pypi_0    pypi
google-auth-oauthlib      1.0.0                    pypi_0    pypi
grpcio                    1.58.0                   pypi_0    pypi
idna                      3.4                      pypi_0    pypi
imageio                   2.31.4                   pypi_0    pypi
importlib-metadata        6.8.0                    pypi_0    pypi
importlib-resources       6.1.0                    pypi_0    pypi
iopath                    0.1.10                   pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
lazy-loader               0.3                      pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.4.4                h6a678d5_0  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
llvmlite                  0.41.0                   pypi_0    pypi
markdown                  3.4.4                    pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
matplotlib                3.7.3                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
networkx                  3.1                      pypi_0    pypi
numba                     0.58.0                   pypi_0    pypi
numpy                     1.24.4                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
opencv-python             4.8.0.76                 pypi_0    pypi
openssl                   3.0.10               h7f8727e_2  
packaging                 23.1                     pypi_0    pypi
pillow                    10.0.1                   pypi_0    pypi
pip                       23.2.1           py38h06a4308_0  
portalocker               2.8.2                    pypi_0    pypi
protobuf                  4.24.3                   pypi_0    pypi
pyasn1                    0.5.0                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pycocotools               2.0.7                    pypi_0    pypi
pyparsing                 3.1.1                    pypi_0    pypi
python                    3.8.18               h955ad1f_0  
python-dateutil           2.8.2                    pypi_0    pypi
pywavelets                1.4.1                    pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h5eee18b_0  
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rsa                       4.9                      pypi_0    pypi
scikit-image              0.21.0                   pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
setuptools                68.0.0           py38h06a4308_0  
shapely                   2.0.1                    pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0  
tabulate                  0.9.0                    pypi_0    pypi
tensorboard               2.14.0                   pypi_0    pypi
tensorboard-data-server   0.7.1                    pypi_0    pypi
tensorboardx              2.6.2.2                  pypi_0    pypi
termcolor                 2.3.0                    pypi_0    pypi
tifffile                  2023.7.10                pypi_0    pypi
tk                        8.6.12               h1ccaba5_0  
torch                     1.13.1+cu117             pypi_0    pypi
torchaudio                0.13.1+cu117             pypi_0    pypi
torchvision               0.14.1+cu117             pypi_0    pypi
tqdm                      4.66.1                   pypi_0    pypi
typing-extensions         4.8.0                    pypi_0    pypi
urllib3                   2.0.5                    pypi_0    pypi
werkzeug                  2.3.7                    pypi_0    pypi
wheel                     0.38.4           py38h06a4308_0  
xz                        5.4.2                h5eee18b_0  
yacs                      0.1.8                    pypi_0    pypi
zipp                      3.17.0                   pypi_0    pypi
zlib                      1.2.13               h5eee18b_0  

No errors reported.


__Warning log__
Warning (cuda): Probing CUDA failed (device and driver present, runtime problem?)
(cuda) <class 'TypeError'>: stat: path should be string, bytes, os.PathLike or integer, not NoneType
Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
--------------------------------------------------------------------------------
If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as
appropriate.

=============================================================
IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.
=============================================================

I confirmed that my GPU driver and CUDA are installed correctly and that the rest of the PyTorch model training code runs properly. My environment runs in a Docker container on a CentOS host; the image itself is Ubuntu.

(monoflex) zyf@4c37173a9629:/data_e/BaiduSyncdisk/code/MonoFlex$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
(monoflex) zyf@4c37173a9629:/data_e/BaiduSyncdisk/code/MonoFlex$ nvidia-smi
Thu Sep 28 10:27:48 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A40                     Off | 00000000:17:00.0 Off |                    0 |
|  0%   37C    P0              78W / 300W |      4MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A40                     Off | 00000000:98:00.0 Off |                    0 |
|  0%   36C    P0              81W / 300W |      4MiB / 46068MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
(monoflex) zyf@4c37173a9629:/data_e/BaiduSyncdisk/code/MonoFlex$ uname --all
Linux 4c37173a9629 3.10.0-1160.90.1.el7.x86_64 #1 SMP Thu May 4 15:21:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux


@esc esc added the CUDA CUDA related issue/PR label Sep 28, 2023
@gmarkall
Member

If you run

python -c "from numba import cuda; cuda.cudadrv.libs.test()"

what is the output?

@stuartarchibald stuartarchibald added the more info needed This issue needs more information label Oct 3, 2023
@AromaticJ
Author

> If you run
>
> python -c "from numba import cuda; cuda.cudadrv.libs.test()"
>
> what is the output?

Sorry for the late reply; I have been on vacation.

(monoflex) zyf@4c37173a9629:/data_e/BaiduSyncdisk/code/MonoFlex$ python -c "from numba import cuda; cuda.cudadrv.libs.test()"
Finding driver from candidates:
        [libcuda.so](http://libcuda.so/)
        libcuda.so.1
        /usr/lib/[libcuda.so](http://libcuda.so/)
        /usr/lib/libcuda.so.1
        /usr/lib64/[libcuda.so](http://libcuda.so/)
        /usr/lib64/libcuda.so.1
Using loader <class 'ctypes.CDLL'>
        Trying to load driver...        ok
                Loaded from [libcuda.so](http://libcuda.so/)
        Mapped [libcuda.so](http://libcuda.so/) paths:
                /usr/lib/x86_64-linux-gnu/libcuda.so.535.104.05
Finding nvvm from CUDA_HOME
        Located at [libnvvm.so](http://libnvvm.so/)
        Trying to open library...       ERROR: failed to open nvvm:
[libnvvm.so](http://libnvvm.so/): cannot open shared object file: No such file or directory
Finding nvrtc from CUDA_HOME
        Located at [libnvrtc.so](http://libnvrtc.so/)
        Trying to open library...       ok
Finding cudart from CUDA_HOME
        Located at [libcudart.so](http://libcudart.so/)
        Trying to open library...       ok
Finding cudadevrt from CUDA_HOME
        Located at libcudadevrt.a
        Checking library...     ERROR: failed to find cudadevrt:
libcudadevrt.a not found
Finding libdevice from CUDA_HOME
        Located at None
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/cudadrv/libs.py", line 169, in test
    check_static_lib(path)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/cudadrv/libs.py", line 68, in check_static_lib
    if not os.path.isfile(path):
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
        Checking library...(monoflex) zyf@4c37173a9629:/data_e/BaiduSyncdisk/code/MonoFlex$

@gmarkall
Member

gmarkall commented Oct 5, 2023

A couple of notes:

  • Seeing http://libcuda.so/ in the output is very odd - is something interfering with your terminal?
  • Do you have CUDA_HOME set? If so, does it point to the location of a CUDA toolkit?

@AromaticJ
Author

AromaticJ commented Oct 7, 2023

> A couple of notes:
>
>   • Seeing http://libcuda.so/ in the output is very odd - is something interfering with your terminal?
>   • Do you have CUDA_HOME set? If so, does it point to the location of a CUDA toolkit?

Thank you for your reply.

For note 1:
I found that when I copied the output log, the text editor automatically turned some of the text into hyperlinks.
With that interference removed, the output looks like this. I carefully checked it against both the terminal output and the content on GitHub.

(monoflex) zyf@4c37173a9629:/data_e/BaiduSyncdisk/code/MonoFlex$ python -c "from numba import cuda; cuda.cudadrv.libs.test()"
Finding driver from candidates:
        libcuda.so
        libcuda.so.1
        /usr/lib/libcuda.so
        /usr/lib/libcuda.so.1
        /usr/lib64/libcuda.so
        /usr/lib64/libcuda.so.1
Using loader <class 'ctypes.CDLL'>
        Trying to load driver...        ok
                Loaded from libcuda.so
        Mapped libcuda.so paths:
                /usr/lib/x86_64-linux-gnu/libcuda.so.535.104.05
Finding nvvm from CUDA_HOME
        Located at libnvvm.so
        Trying to open library...       ERROR: failed to open nvvm:
libnvvm.so: cannot open shared object file: No such file or directory
Finding nvrtc from CUDA_HOME
        Located at libnvrtc.so
        Trying to open library...       ok
Finding cudart from CUDA_HOME
        Located at libcudart.so
        Trying to open library...       ok
Finding cudadevrt from CUDA_HOME
        Located at libcudadevrt.a
        Checking library...     ERROR: failed to find cudadevrt:
libcudadevrt.a not found
Finding libdevice from CUDA_HOME
        Located at None
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/cudadrv/libs.py", line 169, in test
    check_static_lib(path)
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/site-packages/numba/cuda/cudadrv/libs.py", line 68, in check_static_lib
    if not os.path.isfile(path):
  File "/home/zyf/miniconda3/envs/monoflex/lib/python3.8/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
        Checking library...(monoflex) zyf@4c37173a9629:/data_e/BaiduSyncdisk/code/MonoFlex$ 

Two errors occurred:

  1. ERROR: failed to open nvvm:
  2. ERROR: failed to find cudadevrt:

But I can find both files:
(monoflex) zyf@4c37173a9629:/usr$ find -name libnvvm.so
./local/cuda-11.8/nvvm/lib64/libnvvm.so
(monoflex) zyf@4c37173a9629:/usr$ find -name libcudadevrt.a
./local/cuda-11.8/targets/x86_64-linux/lib/libcudadevrt.a

For note 2:
I have added CUDA_HOME and the other CUDA paths to my .bashrc as follows:

# CUDA PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export PATH=$PATH:/usr/local/cuda/bin
export CUDA_HOME=$CUDA_HOME:/usr/local/cuda
export NUMBAPRO_NVVM=$NUMBAPRO_NVVM:/usr/local/cuda/nvvm/lib64/libnvvm.so
export NUMBAPRO_LIBDEVICE=$NUMBAPRO_LIBDEVICE:/usr/local/cuda/nvvm/libdevice/

@gmarkall
Member

gmarkall commented Oct 9, 2023

If your CUDA SDK is in /usr/local/cuda-11.8 then I think you would need to export:

export CUDA_HOME=/usr/local/cuda-11.8

Can you try setting this and trying the checks again please?

(Note that the NUMBAPRO_* environment variables have no effect on Numba)
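
As a rough way to double-check a candidate toolkit directory before exporting it, the sketch below reports which of the two files located with find earlier in this thread exist beneath a given root. The relative paths are copied from that find output; the name toolkit_files_present is hypothetical, and this is an illustration only, not how Numba itself searches for libraries.

```python
import os

# Relative locations copied from the `find` output earlier in this thread.
EXPECTED_FILES = {
    "libnvvm.so": os.path.join("nvvm", "lib64", "libnvvm.so"),
    "libcudadevrt.a": os.path.join("targets", "x86_64-linux", "lib", "libcudadevrt.a"),
}


def toolkit_files_present(root):
    """Map each expected file name to True/False depending on whether it exists under root."""
    return {
        name: os.path.isfile(os.path.join(root, rel))
        for name, rel in EXPECTED_FILES.items()
    }


if __name__ == "__main__":
    # Both directories are the ones mentioned in this thread.
    for root in ("/usr/local/cuda", "/usr/local/cuda-11.8"):
        print(root, toolkit_files_present(root))
```

A root for which both entries are True is a plausible value for CUDA_HOME on a layout like the one shown in this thread.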

@AromaticJ
Author

> If your CUDA SDK is in /usr/local/cuda-11.8 then I think you would need to export:
>
> export CUDA_HOME=/usr/local/cuda-11.8
>
> Can you try setting this and trying the checks again please?
>
> (Note that the NUMBAPRO_* environment variables have no effect on Numba)

Thank you for the response. Following your suggestion, I have modified my .bashrc as follows:

# CUDA PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.8/lib64
export PATH=$PATH:/usr/local/cuda-11.8/bin
export CUDA_HOME=$CUDA_HOME:/usr/local/cuda-11.8

But nothing has changed: the output of python -c "from numba import cuda; cuda.cudadrv.libs.test()" is exactly the same as above.

@gmarkall
Member

gmarkall commented Oct 9, 2023

You have:

export CUDA_HOME=$CUDA_HOME:/usr/local/cuda-11.8

Could you instead try:

export CUDA_HOME=/usr/local/cuda-11.8

please? (CUDA_HOME should be a path, not a colon-separated list of paths)
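
To illustrate why the append-style export breaks things, here is a small sketch that mimics how the shell expands `$CUDA_HOME:/usr/local/cuda-11.8` when CUDA_HOME is unset or already set. The helper expand_append is hypothetical, and joining the result with nvvm/lib64/libnvvm.so is only an assumption about how a single-directory value would typically be used, not a statement about Numba's internals.

```python
import posixpath


def expand_append(previous, new_dir):
    """Mimic shell expansion of `$VAR:new_dir`; an unset VAR expands to an empty string."""
    return f"{previous or ''}:{new_dir}"


# First shell: CUDA_HOME was unset, so the value starts with a stray colon.
first = expand_append(None, "/usr/local/cuda-11.8")
# Nested shell that sources .bashrc again: the value grows into a list.
second = expand_append(first, "/usr/local/cuda-11.8")

# Treating the value as one directory yields a path that is not even absolute.
candidate = posixpath.join(first, "nvvm", "lib64", "libnvvm.so")

if __name__ == "__main__":
    print(repr(first))
    print(repr(second))
    print(candidate, posixpath.isabs(candidate))
```

The append pattern is appropriate for list-valued variables such as PATH and LD_LIBRARY_PATH, but not for a variable that is read as a single directory.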

@AromaticJ
Author

> You have:
>
> export CUDA_HOME=$CUDA_HOME:/usr/local/cuda-11.8
>
> Could you instead try:
>
> export CUDA_HOME=/usr/local/cuda-11.8
>
> please? (CUDA_HOME should be a path, not a colon-separated list of paths)

Yes, it works well! Thank you very much!

@gmarkall
Member

@AromaticJ Thanks for the feedback! It looks like Numba doesn't handle it very gracefully when CUDA_HOME is incorrectly set, so I'll create a new issue for that - it would be much better if it made it clear that CUDA_HOME was not correct, rather than giving the exception you got, and failing to report the library test output in numba -s.
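
For readers who want an explicit check in their own scripts in the meantime, the following is a minimal sketch of the kind of validation described above. It is not Numba code: the function name cuda_home_problems and the messages are hypothetical, and the nvvm/lib64/libnvvm.so location is assumed from the find output earlier in this thread.

```python
import os


def cuda_home_problems(value):
    """Return human-readable problems with a CUDA_HOME value (empty list if none found)."""
    if not value:
        return ["CUDA_HOME is empty or unset"]
    if os.pathsep in value:
        return [
            f"CUDA_HOME={value!r} contains {os.pathsep!r}; "
            "it must be a single directory, not a list of paths"
        ]
    if not os.path.isdir(value):
        return [f"CUDA_HOME={value!r} is not an existing directory"]
    # Assumed layout: libnvvm.so under nvvm/lib64, as in the find output above.
    nvvm = os.path.join(value, "nvvm", "lib64", "libnvvm.so")
    if not os.path.isfile(nvvm):
        return [f"libnvvm.so not found at {nvvm}"]
    return []


if __name__ == "__main__":
    for problem in cuda_home_problems(os.environ.get("CUDA_HOME")):
        print("warning:", problem)
```

Running this before importing code that uses numba.cuda would flag the colon-prefixed value from this thread with a direct message instead of a TypeError deep inside the library test.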

4 participants