
No kernel image is available for execution on the device #5723

Closed
lee-van-oetz opened this issue Feb 13, 2021 · 29 comments
Assignees
Labels
NVIDIA GPU: Issues specific to NVIDIA GPUs
P2 (eventual): This ought to be addressed, but has no schedule at the moment. (Assignee optional)
question: Questions for the JAX team

Comments

@lee-van-oetz

lee-van-oetz commented Feb 13, 2021

I installed jax and jaxlib with
pip3 install --upgrade jax jaxlib==0.1.61+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html
nvcc --version shows

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

CUDA 11.1 is at /usr/local/cuda-11.1
and yet I am getting

RuntimeError: Unknown: no kernel image is available for execution on the device
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(44): 'cuLinkAddData( link_state, CU_JIT_INPUT_CUBIN, static_cast<void*>(image.bytes.data()), image.bytes.size(), "", 0, nullptr, nullptr)'

Output of nvidia-smi:

Tue Feb 16 21:26:58 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   53C    P8     6W /  N/A |    684MiB /  5944MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1700      G   /usr/lib/xorg/Xorg                106MiB |
|    0   N/A  N/A      9639      G   /usr/lib/xorg/Xorg                288MiB |
|    0   N/A  N/A      9833      G   /usr/bin/gnome-shell              136MiB |
|    0   N/A  N/A     10493      G   ...AAAAAAAAA= --shared-files        7MiB |
|    0   N/A  N/A     81098      G   ...gAAAAAAAAA --shared-files      135MiB |
+-----------------------------------------------------------------------------+

when trying the quickstart example.
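For reference, a minimal sketch along the lines of the quickstart (shapes arbitrary; any first GPU computation forces kernel compilation, which is where the error surfaces):

import jax
import jax.numpy as jnp

# The first operation that compiles a GPU kernel is where the
# RuntimeError above is raised.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (3000, 3000))
print(jnp.dot(x, x.T).block_until_ready())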

@hawkinsp
Member

Can you share the output of nvidia-smi as well? (This shows what GPU you have, etc.)

@hawkinsp hawkinsp added the question Questions for the JAX team label Feb 16, 2021
@hawkinsp hawkinsp self-assigned this Feb 16, 2021
@lee-van-oetz
Author

Can you share the output of nvidia-smi as well? (This shows what GPU you have, etc.)

Added now. Thanks for flagging.

@hawkinsp
Member

I'm not sure why this happens, but I see the same thing with driver version 450.102.04 and CUDA version 11.1.

According to NVIDIA (https://docs.nvidia.com/deploy/cuda-compatibility/index.html), these two versions should work together; I don't know why they don't here. I suggest either upgrading your driver or installing an older CUDA release.
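A quick way to put the two versions side by side (a sketch using standard nvidia-smi/nvcc query flags, nothing JAX-specific):

import subprocess

# Driver version, as reported by the driver itself.
print(subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True).stdout.strip())

# Locally installed CUDA toolkit version (what the jaxlib +cudaXXX wheel must match).
print(subprocess.run(
    ["nvcc", "--version"],
    capture_output=True, text=True).stdout.strip().splitlines()[-2])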

@alelovato

Hello,

I am experiencing a similar problem, even with a newer version of jaxlib

python3 -m pip install --upgrade jax jaxlib==0.1.62+cuda112 -f https://storage.googleapis.com/jax-releases/jax_releases.html

In my case, nvcc --version shows:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Jan_28_19:32:09_PST_2021
Cuda compilation tools, release 11.2, V11.2.142
Build cuda_11.2.r11.2/compiler.29558016_0

and nvidia-smi gives me this output:

Fri Mar 12 13:54:27 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:07:00.0 Off |                    0 |
| N/A   23C    P0    52W / 400W |      0MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The actual error that I am getting is the following:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    x = jnp.arange(10)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 3045, in arange
    return lax.iota(dtype, np.ceil(start))  # avoids materializing
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 1492, in iota
    return iota_p.bind(dtype=dtype, shape=(size,), dimension=0)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/core.py", line 284, in bind
    out = top_trace.process_primitive(self, tracers, params)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/core.py", line 622, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 241, in apply_primitive
    compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args), **params)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/util.py", line 198, in wrapper
    return cached(bool(config.x64_enabled), *args, **kwargs)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/util.py", line 191, in cached
    return f(*args, **kwargs)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 291, in xla_primitive_callable
    compiled = backend_compile(backend, built_c, options)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 355, in backend_compile
    return backend.compile(built_c, compile_options=options)
RuntimeError: Unknown: no kernel image is available for execution on the device
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(44): 'cuLinkAddData( link_state, CU_JIT_INPUT_CUBIN, static_cast<void*>(image.bytes.data()), image.bytes.size(), "", 0, nullptr, nullptr)'

Note that I manually set the environment variable pointing to the CUDA path with
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/gpfs/fs1/soft/swing/manual/cuda/11.2.1

Thanks for helping

@hawkinsp
Member

We've confirmed this issue is caused by using too new a CUDA release with too old a driver. If you see this issue, the workaround is either to use an older CUDA release or a newer NVIDIA driver.

We may also be able to work around this at the JAX level in the future.
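For example, pinning a CUDA 11.0 build from the same wheel index would look like this (the exact version tag is illustrative of the jaxlib releases of this era):

pip install --upgrade jax jaxlib==0.1.62+cuda110 -f https://storage.googleapis.com/jax-releases/jax_releases.html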

@kngwyu

kngwyu commented Jul 14, 2021

I encountered this problem on an HPC cluster where I'm not an admin, and I set XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to suppress the error.
It may be a problem with Singularity, but I just want to share my experience here.
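If it helps, here is a sketch of applying that flag from Python rather than the shell; it must be set before JAX initializes its GPU backend:

import os

# Set before the first JAX import/use so XLA reads it at backend initialization.
os.environ["XLA_FLAGS"] = "--xla_gpu_force_compilation_parallelism=1"

import jax.numpy as jnp

print(jnp.arange(10))  # compiles single-threaded, sidestepping the cuLink path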

@jasonkyuyim

Setting the XLA flag works for me as well, but it comes with the warning:

The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to bypass it, but expect to get longer compilation time due to the lack of multi-threading.

Losing multithreaded compilation sounds undesirable. Can we get a more permanent solution?

@samuela
Contributor

samuela commented Sep 4, 2021

I'm also seeing this issue. I'm on NixOS 20.09.3301.42809feaa9f, jaxlib 0.1.71, and here's my nvidia-smi:

[nix-shell:~/dev/nixpkgs]$ nvidia-smi 
Sat Sep  4 20:47:26 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P0    25W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Based on the NVIDIA docs, it seems like these two versions should be compatible.

Passing XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 does work, but I'd rather not get hit with that slowdown.

@samuela
Contributor

samuela commented Sep 5, 2021

I just upgraded to NixOS 21.05.2796.110a2c9ebbf to get driver version 470.57.02, and the issue has gone away.

@A-Talavera

I am using CentOS 7 and was having the issue mentioned here; the problem was solved after upgrading the NVIDIA driver from 460 to 470.74.

@bantin

bantin commented Oct 13, 2021

I'm having the same issue on an HPC cluster using a Singularity container. I'm not the admin, so I can't update the NVIDIA driver. If a JAX workaround that still allows multithreaded compilation is possible, that would be awesome!

@xmax1

xmax1 commented Oct 25, 2021

I'm getting the same error, and XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 doesn't work for me; it introduces a new error.

jax version

# packages in environment at /home/energy/amawi/miniconda3/envs/pansatz:
#
# Name                    Version                   Build    Channel
jax                       0.2.24                    pypi_0   pypi
jaxlib                    0.1.73+cuda11.cudnn82     pypi_0   pypi

nvcc and nvidia-smi

nvidia-smi
Mon Oct 25 13:38:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.31       Driver Version: 465.31       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:1A:00.0 Off |                  N/A |
| 30%   26C    P8    20W / 350W |      1MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
and

whereis nvcc
nvcc: /usr/local/cuda-11.4/bin/nvcc.profile /usr/local/cuda-11.4/bin/nvcc
with
/usr/local/cuda-11.4
cuDNN v8.2.4

Before XLA flag

2021-10-25 13:59:17.326622: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc:63] cuLinkAddData fails. This is usually caused by stale driver version.
2021-10-25 13:59:17.326873: E external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:1105] The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to bypass it, but expect to get longer compilation time due to the lack of multi-threading.

...

File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 24, in run_vmc keys = rnd.PRNGKey(cfg['seed']) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 122, in PRNGKey key = prng.seed_with_impl(impl, seed) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 203, in seed_with_impl return PRNGKeyArray(impl, impl.seed(seed)) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 241, in threefry_seed k1 = convert(lax.shift_right_logical(seed_arr, lax._const(seed_arr, 32))) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 408, in shift_right_logical return shift_right_logical_p.bind(x, y) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 272, in bind out = top_trace.process_primitive(self, tracers, params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 624, in process_primitive return primitive.impl(*tracers, **params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 312, in apply_primitive **params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/util.py", line 187, in wrapper return cached(config._trace_context(), *args, **kwargs) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/util.py", line 180, in cached return f(*args, **kwargs) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 335, in xla_primitive_callable prim.name, donated_invars, *arg_specs) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 654, in _xla_callable_uncached *arg_specs).compile().unsafe_call File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 770, in compile self.name, self.hlo(), *self.compile_args) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 798, in from_xla_computation compiled = compile_or_get_cached(backend, xla_computation, options) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 87, in compile_or_get_cached return backend_compile(backend, computation, compile_options) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 369, in backend_compile return backend.compile(built_c, compile_options=options) RuntimeError: UNKNOWN: no kernel image is available for execution on the device in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(66): 'status'

After including flag

File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 26, in run_vmc keys = rnd.split(keys, cfg['n_devices']).reshape(cfg['n_devices'], 2) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 191, in split return _return_prng_keys(wrapped, _split(key, num)) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 177, in _split return key._split(num) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 191, in _split return PRNGKeyArray(self.impl, self.impl.split(self._keys, num)) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 422, in threefry_split return _threefry_split(key, int(num)) # type: ignore File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback return fun(*args, **kwargs) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/api.py", line 419, in cache_miss donated_invars=donated_invars, inline=inline) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1632, in bind return call_bind(self, fun, *args, **params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1623, in call_bind outs = primitive.process(top_trace, fun, tracers, params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1635, in process return trace.process_call(self, fun, tracers, params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 627, in process_call return primitive.impl(f, *tracers, **params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 584, in _xla_call_impl out = compiled_fun(*args) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 977, in _execute_compiled out_bufs = compiled.execute(input_bufs) jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: the provided PTX was compiled with an unsupported toolchain. The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "demo_vmc.py", line 21, in <module>
    log = run_vmc(cfg)
  File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 26, in run_vmc
    keys = rnd.split(keys, cfg['n_devices']).reshape(cfg['n_devices'], 2)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 191, in split
    return _return_prng_keys(wrapped, _split(key, num))
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 177, in _split
    return key._split(num)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 191, in _split
    return PRNGKeyArray(self.impl, self.impl.split(self._keys, num))
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 422, in threefry_split
    return _threefry_split(key, int(num))  # type: ignore
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 977, in _execute_compiled
    out_bufs = compiled.execute(input_bufs)
RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: the provided PTX was compiled with an unsupported toolchain.

Running without the XLA flag but with TF_CPP_MIN_LOG_LEVEL=0, as in issue #7118, returns the same error.

Running with export XLA_PYTHON_CLIENT_PREALLOCATE=false and without the XLA flag (the solution in #7118) also returns the same error.

@lkhphuc
Contributor

lkhphuc commented Nov 30, 2021

I have the same problem on RTX 3090.
jax 0.2.25
jaxlib 0.1.74+cuda11.cudnn82
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
| NVIDIA-SMI 470.74 Driver Version: 470.74 CUDA Version: 11.4 |

Running with --xla_gpu_force_compilation_parallelism=1 gives another error (could be related to #8506):

RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
...

Edit:
I got everything working by using conda-forge with this merge-pending jaxlib GPU build: conda-forge/jaxlib-feedstock#72

@Machao-be-simple

Machao-be-simple commented Dec 20, 2021

(Quoting @lkhphuc:) I have the same problem on RTX 3090. jax 0.2.25 jaxlib 0.1.74+cuda11.cudnn82 Cuda compilation tools, release 11.5, V11.5.119 Build cuda_11.5.r11.5/compiler.30672275_0 | NVIDIA-SMI 470.74 Driver Version: 470.74 CUDA Version: 11.4 |

Running with force compilation parallelism = 1 gives another error (could be related to #8506):

RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
...

Edit: I got everything working by using 'conda-forge' with this merge-pending jaxlib-gpu: conda-forge/jaxlib-feedstock#72

Hello, can you tell me specifically how to install this package?
I currently download it locally from this link: https://anaconda.org/wolfv/jaxlib/files, and then try to install it with
conda install --use-local jaxlib-0.1.73-cuda112py39h52c056e_0.tar.bz2
but it doesn't work...

@lkhphuc
Contributor

lkhphuc commented Dec 20, 2021

@qinggeduoqing Install everything with conda-forge, e.g. conda install tensorflow jax -c conda-forge. Then install jaxlib with conda install jaxlib -c wolfv.

@Machao-be-simple

@lkhphuc Thank you so much for your reply, let me try it this way...

@mshafiei

mshafiei commented Feb 3, 2022

I get an error with the following command:

>>> rng, init_rng = jax.random.split(jax.random.PRNGKey(1))
2022-02-02 17:44:08.505863: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2089] Execution of replica 0 failed: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/random.py", line 188, in split
    return _return_prng_keys(wrapped, _split(key, num))
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/random.py", line 174, in _split
    return key._split(num)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/prng.py", line 187, in _split
    return PRNGKeyArray(self.impl, self.impl.split(self._keys, num))
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/prng.py", line 435, in threefry_split
    return _threefry_split(key, int(num))  # type: ignore
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/traceback_util.py", line 165, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/api.py", line 430, in cache_miss
    out_flat = xla.xla_call(
  File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 1681, in bind
    return call_bind(self, fun, *args, **params)
  File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 1693, in call_bind
    outs = top_trace.process_call(primitive, fun, tracers, params)
  File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 594, in process_call
    return primitive.impl(f, *tracers, **params)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/dispatch.py", line 145, in _xla_call_impl
    out = compiled_fun(*args)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/dispatch.py", line 444, in _execute_compiled
    out_bufs = compiled.execute(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: no kernel image is available for execution on the device

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

JAX can see the GPUs, though:

>>> jax.devices()
[GpuDevice(id=0, process_index=0), GpuDevice(id=1, process_index=0)]

I installed jaxlib 0.1.77 and jax 0.2.28 from source. I am using CUDA 11.5 and cuDNN 8.3.0. I made sure that the PATH environment variable is set up properly and that the Python session is loading the correct CUDA libraries. I'm not sure what else could be wrong. I'm running the program on Ubuntu 20.04 with two GTX 1080s.

@hawkinsp
Member

hawkinsp commented Feb 3, 2022

@mshafiei Can you share the output of nvidia-smi?

@mshafiei

mshafiei commented Feb 3, 2022

@hawkinsp sure,

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 27%   34C    P8    10W / 180W |   7328MiB /  8116MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 28%   31C    P8     6W / 180W |    212MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    214915      C   python                            107MiB |
|    0   N/A  N/A    266904      C   python                           7219MiB |
|    1   N/A  N/A    214915      C   python                            105MiB |
|    1   N/A  N/A    266904      C   python                            105MiB |

@hawkinsp
Member

hawkinsp commented Feb 3, 2022

Hmm. Interesting. I would have expected that to work. My best suggestion for getting unblocked would be to build jaxlib from source, explicitly opting in to the CUDA capability for your device (https://jax.readthedocs.io/en/latest/developer.html). There's a flag for specifying a list of CUDA compute capabilities to build.py (try --help).
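A sketch of what that invocation looks like (the flag name is per build.py --help of this era; substitute the capability that matches your GPU, e.g. 6.1 for a GTX 1080):

python build/build.py --enable_cuda --cuda_compute_capabilities=6.1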

@mshafiei

mshafiei commented Feb 3, 2022

@hawkinsp I am actually building jaxlib from source and passing the CUDA specifications as below:

python ./build/build.py  \
  --enable_cuda \
  --cuda_path='/usr/local/cuda-11.5' \
  --cudnn_path='/usr/local/cuda-11.5' \
  --cuda_version='11.5' \
  --cudnn_version='8.3.0' \
  --cuda_compute_capabilities 8.0 \
  --noenable_mkl_dnn

Are these flags what you were referring to?

@danieldanciu

In my case, making sure that nvcc --version and nvidia-smi both reported the same CUDA version (11.4 in my case) fixed the problem.

When nvcc --version was at 11.7:

Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

and nvidia-smi was showing cuda version 11.4:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+

I was getting the error above, and I could only work around it by setting
export XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1
Note however that this disables parallelism when compiling your model, so for a big model it comes at a significant cost. In my case, parallel compilation reduced the XLA compilation time from ~4.5 minutes to ~1.5 minutes.

@sudhakarsingh27 sudhakarsingh27 added NVIDIA GPU Issues specific to NVIDIA GPUs P0 (urgent) An issue of the highest priority. We are addressing this urgently. (Assignee required) labels Aug 10, 2022
@sudhakarsingh27
Collaborator

@lee-van-oetz is this fixed now?

@sudhakarsingh27 sudhakarsingh27 added P2 (eventual) This ought to be addressed, but has no schedule at the moment. (Assignee optional) and removed P0 (urgent) An issue of the highest priority. We are addressing this urgently. (Assignee required) labels Aug 12, 2022
@hawkinsp
Member

hawkinsp commented Dec 6, 2022

I'm pretty sure this is fixed in recent jaxlib releases. We added code to jaxlib that falls back to not using cuLink... if the driver version is too old.
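For anyone verifying a setup, a small sketch for checking what JAX sees (compute_capability is exposed on GPU devices in recent jaxlib releases, hence the getattr guard):

import jax

for d in jax.devices():
    # getattr falls back gracefully on versions/platforms without the attribute.
    print(d.platform, d.device_kind, getattr(d, "compute_capability", "n/a"))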

@hawkinsp hawkinsp closed this as completed Dec 6, 2022
@hosein-cnn

Hi, I am also facing this problem.

I use the following :

  1. Windows 10 , 19044(21H2)
  2. Visual Studio 2019
  3. Nvidia GeForce GTX 960m , Maxwell , Capability 5.0
  4. Cuda Toolkit 11.7 or 11.8 or 12.0
  5. I want to write simple Cuda C++ code.

After I installed all the packages related to C++ on VS2019, I installed Cuda Toolkit.
When I run sample code , I get the following error :

  • No kernel image is available for execution on the device.

I even ran CUDA 11.7, 11.8, and 12.0 on VS2019 and VS2022, but the error still exists.
I have been facing this error for 20 days and I am really fed up. Also, deviceQuery.exe, nvidia-smi, and nvcc --version all run fine.
I have also checked NVIDIA's site and others, and my GPU has no compatibility problem with these CUDA versions.
What could be the reasons for this error?

Please help me out.
Thanks, all.

@deoxyribose

deoxyribose commented Nov 22, 2023

Anyone else still struggling with this? I've tried dozens of combinations of NVIDIA drivers, Ubuntu's nvidia-cuda-toolkit, different CUDA versions installed from NVIDIA's website, and jaxlib/jax releases; nothing has solved the problem.

I'm using
Ubuntu 23.04
GeForce 940MX

nvidia-smi
Driver Version: 525.147.05 CUDA Version: 12.0

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

jaxlib==0.4.20+cuda11.cudnn86
jax==0.4.20

print(jax.devices()) shows the GPU
[cuda(id=0)]

But trying to use it results in
XlaRuntimeError: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.custom_call' failed: jaxlib/gpu/prng_kernels.cc:33: operation gpuGetLastError() failed: no kernel image is available for execution on the device; current tracing scope: custom-call.1; current profiling annotation: XlaModule:#hlo_module=jit__normal,program_id=2#.

FWIW, I had JAX working fine just a few days ago; this might be caused by a recent update to 23.04.

@hawkinsp
Member

@deoxyribose I think the problem is that we are building for GPUs with SM version 5.2 at a minimum:

jax/.bazelrc, line 68 (at commit 961ba3c):

build:cuda --action_env TF_CUDA_COMPUTE_CAPABILITIES="sm_52,sm_60,sm_70,sm_80,compute_90"

but your GPU appears to have SM version 5.0.

The fix is to build jaxlib yourself, explicitly specifying your SM version. Try:

python build/build.py --enable_cuda --cuda_compute_capabilities=sm_50

We've actually never shipped support for that model of GPU.
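On drivers new enough to support the query, the SM version can be read directly (a quick check for whether a given jaxlib build covers your GPU):

nvidia-smi --query-gpu=compute_cap --format=csv,noheader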

copybara-service bot pushed a commit that referenced this issue Nov 22, 2023
This improves compatibility with older Maxwell cards, and it probably doesn't matter a whole lot for performance.

See: #5723 (comment)
PiperOrigin-RevId: 584641856
@hawkinsp
Member

hawkinsp commented Nov 22, 2023

@deoxyribose #18644 will add sm_50 support to the next jaxlib release.

@deoxyribose

deoxyribose commented Nov 22, 2023

@hawkinsp Thanks for the quick reply!
When I try to build, I get
ERROR: @xla//xla/python:enable_gpu :: Error loading option @xla//xla/python:enable_gpu: no such package '@local_config_cuda//cuda': Repository command failed
Inconsistent CUDA toolkit path: /usr vs /usr/lib

I tried removing and reinstalling CUDA with
sudo apt install nvidia-cuda-toolkit

I'm not sure if this is the expected location:

$ whereis cuda
cuda: /usr/lib/cuda /usr/include/cuda

$ which nvcc
/usr/bin/nvcc

$ whereis cuda.h
cuda.h: /usr/include/cuda.h

I tried with
python build/build.py --enable_cuda --cuda_compute_capabilities=sm_50 --cuda_path=/usr/
but no difference - any tips?

copybara-service bot pushed a commit that referenced this issue Nov 28, 2023
This improves compatibility with older Maxwell cards, and it probably doesn't matter a whole lot for performance.

See: #5723 (comment)
PiperOrigin-RevId: 585967281