
No kernel image is available for execution on the device #5723

Closed
lee-van-oetz opened this issue Feb 13, 2021 · 29 comments
Assignees
Labels
NVIDIA GPU: Issues specific to NVIDIA GPUs
P2 (eventual): This ought to be addressed, but has no schedule at the moment. (Assignee optional)
question: Questions for the JAX team

Comments

@lee-van-oetz

lee-van-oetz commented Feb 13, 2021

I installed jax and jaxlib with
pip3 install --upgrade jax jaxlib==0.1.61+cuda111 -f https://storage.googleapis.com/jax-releases/jax_releases.html
nvcc --version shows

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

CUDA 11.1 is at /usr/local/cuda-11.1
and yet I am getting

RuntimeError: Unknown: no kernel image is available for execution on the device
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(44): 'cuLinkAddData( link_state, CU_JIT_INPUT_CUBIN, static_cast<void*>(image.bytes.data()), image.bytes.size(), "", 0, nullptr, nullptr)'

Output of nvidia-smi:

Tue Feb 16 21:26:58 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   53C    P8     6W /  N/A |    684MiB /  5944MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1700      G   /usr/lib/xorg/Xorg                106MiB |
|    0   N/A  N/A      9639      G   /usr/lib/xorg/Xorg                288MiB |
|    0   N/A  N/A      9833      G   /usr/bin/gnome-shell              136MiB |
|    0   N/A  N/A     10493      G   ...AAAAAAAAA= --shared-files        7MiB |
|    0   N/A  N/A     81098      G   ...gAAAAAAAAA --shared-files      135MiB |
+-----------------------------------------------------------------------------+

when trying the quickstart example.
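For reference, a minimal sketch along the lines of the quickstart (shapes arbitrary; any first GPU computation forces kernel compilation, which is where the error surfaces):

import jax
import jax.numpy as jnp

# The first operation that compiles a GPU kernel is where the
# RuntimeError above is raised.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (3000, 3000))
print(jnp.dot(x, x.T).block_until_ready())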

@hawkinsp
Member

Can you share the output of nvidia-smi as well? (This shows what GPU you have, etc.)

@hawkinsp hawkinsp added the question Questions for the JAX team label Feb 16, 2021
@hawkinsp hawkinsp self-assigned this Feb 16, 2021
@lee-van-oetz
Author

Can you share the output of nvidia-smi as well? (This shows what GPU you have, etc.)

Added now. Thanks for flagging.

@hawkinsp
Member

I'm not sure why this happens, but I see the same thing with driver version 450.102.04 and CUDA version 11.1.

According to NVIDIA (https://docs.nvidia.com/deploy/cuda-compatibility/index.html), these two versions should work together; I don't know why they don't here. I suggest either upgrading your driver or installing an older CUDA release.
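A quick way to put the two versions side by side (a sketch using standard nvidia-smi/nvcc query flags, nothing JAX-specific):

import subprocess

# Driver version, as reported by the driver itself.
print(subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True).stdout.strip())

# Locally installed CUDA toolkit version (what the jaxlib +cudaXXX wheel must match).
print(subprocess.run(
    ["nvcc", "--version"],
    capture_output=True, text=True).stdout.strip().splitlines()[-2])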

@alelovato

Hello,

I am experiencing a similar problem, even with a newer version of jaxlib

python3 -m pip install --upgrade jax jaxlib==0.1.62+cuda112 -f https://storage.googleapis.com/jax-releases/jax_releases.html

In my case, nvcc --version shows:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Jan_28_19:32:09_PST_2021
Cuda compilation tools, release 11.2, V11.2.142
Build cuda_11.2.r11.2/compiler.29558016_0

and nvidia-smi gives me this output:

Fri Mar 12 13:54:27 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100-SXM4-40GB      On   | 00000000:07:00.0 Off |                    0 |
| N/A   23C    P0    52W / 400W |      0MiB / 40537MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The actual error that I am getting is the following:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    x = jnp.arange(10)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 3045, in arange
    return lax.iota(dtype, np.ceil(start))  # avoids materializing
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 1492, in iota
    return iota_p.bind(dtype=dtype, shape=(size,), dimension=0)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/core.py", line 284, in bind
    out = top_trace.process_primitive(self, tracers, params)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/core.py", line 622, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 241, in apply_primitive
    compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args), **params)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/util.py", line 198, in wrapper
    return cached(bool(config.x64_enabled), *args, **kwargs)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/_src/util.py", line 191, in cached
    return f(*args, **kwargs)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 291, in xla_primitive_callable
    compiled = backend_compile(backend, built_c, options)
  File "/gpfs/fs1/home/lovato/my_env/lib/python3.8/site-packages/jax/interpreters/xla.py", line 355, in backend_compile
    return backend.compile(built_c, compile_options=options)
RuntimeError: Unknown: no kernel image is available for execution on the device
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(44): 'cuLinkAddData( link_state, CU_JIT_INPUT_CUBIN, static_cast<void*>(image.bytes.data()), image.bytes.size(), "", 0, nullptr, nullptr)'

Note that I manually set the environment variable pointing to the CUDA path with
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/gpfs/fs1/soft/swing/manual/cuda/11.2.1

Thanks for helping

@hawkinsp
Member

We've confirmed this issue is caused by using too new a CUDA release with too old a driver. If you see this issue, the workaround is either to use an older CUDA release or a newer NVIDIA driver.

We may also be able to work around this at the JAX level in the future.
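For example, pinning a CUDA 11.0 build from the same wheel index would look like this (the exact version tag is illustrative of the jaxlib releases of this era):

pip install --upgrade jax jaxlib==0.1.62+cuda110 -f https://storage.googleapis.com/jax-releases/jax_releases.html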

@kngwyu

kngwyu commented Jul 14, 2021

I encountered this problem on an HPC cluster where I'm not an admin, and I set XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to suppress the error.
It may be a problem with Singularity, but I just want to share my experience here.
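If it helps, here is a sketch of applying that flag from Python rather than the shell; it must be set before JAX initializes its GPU backend:

import os

# Set before the first JAX import/use so XLA reads it at backend initialization.
os.environ["XLA_FLAGS"] = "--xla_gpu_force_compilation_parallelism=1"

import jax.numpy as jnp

print(jnp.arange(10))  # compiles single-threaded, sidestepping the cuLink path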

@jasonkyuyim

Setting the XLA flag works for me as well, but it comes with the warning:

The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to bypass it, but expect to get longer compilation time due to the lack of multi-threading.

Losing multithreaded compilation sounds undesirable. Can we get a more permanent solution?

@samuela
Contributor

samuela commented Sep 4, 2021

I'm also seeing this issue. I'm on NixOS 20.09.3301.42809feaa9f, jaxlib 0.1.71, and here's my nvidia-smi:

[nix-shell:~/dev/nixpkgs]$ nvidia-smi 
Sat Sep  4 20:47:26 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.38       Driver Version: 455.38       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P0    25W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Based on the NVIDIA docs, it seems like these two versions should be compatible.

Passing XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 does work, but I'd rather not get hit with that slowdown.

@samuela
Contributor

samuela commented Sep 5, 2021

I just upgraded to NixOS 21.05.2796.110a2c9ebbf to get driver version 470.57.02, and the issue has gone away.

@A-Talavera

I am using CentOS 7 and was having the issue mentioned here; the problem was solved after upgrading the NVIDIA driver from 460 to 470.74.

@bantin

bantin commented Oct 13, 2021

I'm having the same issue on an HPC cluster using a Singularity container. I'm not the admin, so I can't update the NVIDIA driver. If a JAX workaround that still allows multithreaded compilation is possible, that would be awesome!

@xmax1

xmax1 commented Oct 25, 2021

I'm getting the same error, and XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 doesn't work for me; it introduces a new error.

jax version

# packages in environment at /home/energy/amawi/miniconda3/envs/pansatz:
#
# Name                    Version                   Build    Channel
jax                       0.2.24                    pypi_0   pypi
jaxlib                    0.1.73+cuda11.cudnn82     pypi_0   pypi

nvcc and nvidia-smi

nvidia-smi
Mon Oct 25 13:38:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.31       Driver Version: 465.31       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:1A:00.0 Off |                  N/A |
| 30%   26C    P8    20W / 350W |      1MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
and

whereis nvcc
nvcc: /usr/local/cuda-11.4/bin/nvcc.profile /usr/local/cuda-11.4/bin/nvcc
with
/usr/local/cuda-11.4
cuDNN v8.2.4

Before XLA flag

2021-10-25 13:59:17.326622: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc:63] cuLinkAddData fails. This is usually caused by stale driver version.
2021-10-25 13:59:17.326873: E external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:1105] The CUDA linking API did not work. Please use XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 to bypass it, but expect to get longer compilation time due to the lack of multi-threading.

...

File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 24, in run_vmc keys = rnd.PRNGKey(cfg['seed']) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 122, in PRNGKey key = prng.seed_with_impl(impl, seed) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 203, in seed_with_impl return PRNGKeyArray(impl, impl.seed(seed)) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 241, in threefry_seed k1 = convert(lax.shift_right_logical(seed_arr, lax._const(seed_arr, 32))) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 408, in shift_right_logical return shift_right_logical_p.bind(x, y) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 272, in bind out = top_trace.process_primitive(self, tracers, params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 624, in process_primitive return primitive.impl(*tracers, **params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 312, in apply_primitive **params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/util.py", line 187, in wrapper return cached(config._trace_context(), *args, **kwargs) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/util.py", line 180, in cached return f(*args, **kwargs) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 335, in xla_primitive_callable prim.name, donated_invars, *arg_specs) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 654, in _xla_callable_uncached *arg_specs).compile().unsafe_call File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 770, in compile self.name, self.hlo(), *self.compile_args) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 798, in from_xla_computation compiled = compile_or_get_cached(backend, xla_computation, options) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 87, in compile_or_get_cached return backend_compile(backend, computation, compile_options) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 369, in backend_compile return backend.compile(built_c, compile_options=options) RuntimeError: UNKNOWN: no kernel image is available for execution on the device in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_asm_compiler.cc(66): 'status'

After including flag

File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 26, in run_vmc keys = rnd.split(keys, cfg['n_devices']).reshape(cfg['n_devices'], 2) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 191, in split return _return_prng_keys(wrapped, _split(key, num)) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 177, in _split return key._split(num) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 191, in _split return PRNGKeyArray(self.impl, self.impl.split(self._keys, num)) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 422, in threefry_split return _threefry_split(key, int(num)) # type: ignore File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback return fun(*args, **kwargs) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/api.py", line 419, in cache_miss donated_invars=donated_invars, inline=inline) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1632, in bind return call_bind(self, fun, *args, **params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1623, in call_bind outs = primitive.process(top_trace, fun, tracers, params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 1635, in process return trace.process_call(self, fun, tracers, params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/core.py", line 627, in process_call return primitive.impl(f, *tracers, **params) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 584, in _xla_call_impl out = compiled_fun(*args) File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 977, in _execute_compiled out_bufs = compiled.execute(input_bufs) jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: the provided PTX was compiled with an unsupported toolchain. The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "demo_vmc.py", line 21, in <module>
    log = run_vmc(cfg)
  File "/home/energy/amawi/projects/nn_ansatz/src/nn_ansatz/routines.py", line 26, in run_vmc
    keys = rnd.split(keys, cfg['n_devices']).reshape(cfg['n_devices'], 2)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 191, in split
    return _return_prng_keys(wrapped, _split(key, num))
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/random.py", line 177, in _split
    return key._split(num)
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 191, in _split
    return PRNGKeyArray(self.impl, self.impl.split(self._keys, num))
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/_src/prng.py", line 422, in threefry_split
    return _threefry_split(key, int(num))  # type: ignore
  File "/home/energy/amawi/miniconda3/envs/pansatz/lib/python3.7/site-packages/jax/interpreters/xla.py", line 977, in _execute_compiled
    out_bufs = compiled.execute(input_bufs)
RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: the provided PTX was compiled with an unsupported toolchain.

Running without the XLA flag but with TF_CPP_MIN_LOG_LEVEL=0, as in issue #7118, returns the same error.

Running with export XLA_PYTHON_CLIENT_PREALLOCATE=false and without the XLA flag (the solution in #7118) also returns the same error.

@lkhphuc
Contributor

lkhphuc commented Nov 30, 2021

I have the same problem on RTX 3090.
jax 0.2.25
jaxlib 0.1.74+cuda11.cudnn82
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
| NVIDIA-SMI 470.74 Driver Version: 470.74 CUDA Version: 11.4 |

Running with --xla_gpu_force_compilation_parallelism=1 gives another error (could be related to #8506):

RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
...

Edit:
I got everything working by using conda-forge with this merge-pending jaxlib GPU build: conda-forge/jaxlib-feedstock#72

@Machao-be-simple

Machao-be-simple commented Dec 20, 2021

(Quoting @lkhphuc:) I have the same problem on RTX 3090. jax 0.2.25 jaxlib 0.1.74+cuda11.cudnn82 Cuda compilation tools, release 11.5, V11.5.119 Build cuda_11.5.r11.5/compiler.30672275_0 | NVIDIA-SMI 470.74 Driver Version: 470.74 CUDA Version: 11.4 |

Running with force compilation parallelism = 1 gives another error (could be related to #8506):

RuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
...

Edit: I got everything working by using 'conda-forge' with this merge-pending jaxlib-gpu: conda-forge/jaxlib-feedstock#72

Hello, can you tell me specifically how to install this package?
I currently download it locally from this link: https://anaconda.org/wolfv/jaxlib/files, and then try to install it with
conda install --use-local jaxlib-0.1.73-cuda112py39h52c056e_0.tar.bz2
but it doesn't work...

@lkhphuc
Contributor

lkhphuc commented Dec 20, 2021

@qinggeduoqing Install everything with conda-forge, e.g. conda install tensorflow jax -c conda-forge. Then install jaxlib with conda install jaxlib -c wolfv.

@Machao-be-simple

@lkhphuc Thank you so much for your reply, let me try it this way...

@mshafiei

mshafiei commented Feb 3, 2022

I get an error with the following command:

>>> rng, init_rng = jax.random.split(jax.random.PRNGKey(1))
2022-02-02 17:44:08.505863: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2089] Execution of replica 0 failed: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: no kernel image is available for execution on the device
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/random.py", line 188, in split
    return _return_prng_keys(wrapped, _split(key, num))
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/random.py", line 174, in _split
    return key._split(num)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/prng.py", line 187, in _split
    return PRNGKeyArray(self.impl, self.impl.split(self._keys, num))
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/prng.py", line 435, in threefry_split
    return _threefry_split(key, int(num))  # type: ignore
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/traceback_util.py", line 165, in reraise_with_filtered_traceback
    return fun(*args, **kwargs)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/api.py", line 430, in cache_miss
    out_flat = xla.xla_call(
  File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 1681, in bind
    return call_bind(self, fun, *args, **params)
  File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 1693, in call_bind
    outs = top_trace.process_call(primitive, fun, tracers, params)
  File "/home/mohammad/Projects/optimizer/jax/jax/core.py", line 594, in process_call
    return primitive.impl(f, *tracers, **params)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/dispatch.py", line 145, in _xla_call_impl
    out = compiled_fun(*args)
  File "/home/mohammad/Projects/optimizer/jax/jax/_src/dispatch.py", line 444, in _execute_compiled
    out_bufs = compiled.execute(input_bufs)
jax._src.traceback_util.UnfilteredStackTrace: RuntimeError: INTERNAL: CustomCall failed: jaxlib/cuda_prng_kernels.cc:30: operation cudaGetLastError() failed: no kernel image is available for execution on the device

The stack trace below excludes JAX-internal frames.
The preceding is the original exception that occurred, unmodified.

JAX can see the GPUs, though:

>>> jax.devices()
[GpuDevice(id=0, process_index=0), GpuDevice(id=1, process_index=0)]

I installed jaxlib 0.1.77 and jax 0.2.28 from source. I am using CUDA 11.5 and cuDNN 8.3.0. I made sure that the PATH environment variable is set up properly and that the Python session is loading the correct CUDA libraries. I'm not sure what else could be wrong. I'm running the program on Ubuntu 20.04 with two GTX 1080s.

@hawkinsp
Member

hawkinsp commented Feb 3, 2022

@mshafiei Can you share the output of nvidia-smi?

@mshafiei

mshafiei commented Feb 3, 2022

@hawkinsp sure,

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.29.05    Driver Version: 495.29.05    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 27%   34C    P8    10W / 180W |   7328MiB /  8116MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 28%   31C    P8     6W / 180W |    212MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    214915      C   python                            107MiB |
|    0   N/A  N/A    266904      C   python                           7219MiB |
|    1   N/A  N/A    214915      C   python                            105MiB |
|    1   N/A  N/A    266904      C   python                            105MiB |

@hawkinsp
Member

hawkinsp commented Feb 3, 2022

Hmm. Interesting. I would have expected that to work. My best suggestion for getting unblocked would be to build jaxlib from source, explicitly opting in to the CUDA capability for your device (https://jax.readthedocs.io/en/latest/developer.html). There's a flag for specifying a list of CUDA compute capabilities to build.py (try --help).
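A sketch of what that invocation looks like (the flag name is per build.py --help of this era; substitute the capability that matches your GPU, e.g. 6.1 for a GTX 1080):

python build/build.py --enable_cuda --cuda_compute_capabilities=6.1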

@mshafiei

mshafiei commented Feb 3, 2022

@hawkinsp I am actually building jaxlib from source and passing the CUDA specifications as below:

python ./build/build.py  \
  --enable_cuda \
  --cuda_path='/usr/local/cuda-11.5' \
  --cudnn_path='/usr/local/cuda-11.5' \
  --cuda_version='11.5' \
  --cudnn_version='8.3.0' \
  --cuda_compute_capabilities 8.0 \
  --noenable_mkl_dnn

Are these flags what you were referring to?

@danieldanciu

In my case, making sure that nvcc --version and nvidia-smi both reported the same CUDA version (11.4 in my case) fixed the problem.

When nvcc --version was at 11.7:

Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

and nvidia-smi was showing cuda version 11.4:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+

I was getting the error above, and I could only work around it by setting
export XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1
Note however that this disables parallelism when compiling your model, so for a big model it comes at a significant cost. In my case, parallel compilation reduced the XLA compilation time from ~4.5 minutes to ~1.5 minutes.

@sudhakarsingh27 sudhakarsingh27 added NVIDIA GPU Issues specific to NVIDIA GPUs P0 (urgent) An issue of the highest priority. We are addressing this urgently. (Assignee required) labels Aug 10, 2022
@sudhakarsingh27
Collaborator

@lee-van-oetz is this fixed now?

@sudhakarsingh27 sudhakarsingh27 added P2 (eventual) This ought to be addressed, but has no schedule at the moment. (Assignee optional) and removed P0 (urgent) An issue of the highest priority. We are addressing this urgently. (Assignee required) labels Aug 12, 2022
@hawkinsp
Member

hawkinsp commented Dec 6, 2022

I'm pretty sure this is fixed in recent jaxlib releases. We added code to jaxlib that falls back to not using cuLink... if the driver version is too old.
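For anyone verifying a setup, a small sketch for checking what JAX sees (compute_capability is exposed on GPU devices in recent jaxlib releases, hence the getattr guard):

import jax

for d in jax.devices():
    # getattr falls back gracefully on versions/platforms without the attribute.
    print(d.platform, d.device_kind, getattr(d, "compute_capability", "n/a"))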

@hawkinsp hawkinsp closed this as completed Dec 6, 2022
@hosein-cnn

Hi, I am also facing this problem.

I use the following :

  1. Windows 10 , 19044(21H2)
  2. Visual Studio 2019
  3. Nvidia GeForce GTX 960m , Maxwell , Capability 5.0
  4. Cuda Toolkit 11.7 or 11.8 or 12.0
  5. I want to write simple Cuda C++ code.

After I installed all the packages related to C++ on VS2019, I installed Cuda Toolkit.
When I run sample code , I get the following error :

  • No kernel image is available for execution on the device.

I even ran CUDA 11.7, 11.8, and 12.0 on VS2019 and VS2022, but the error still exists.
I have been facing this error for 20 days and I am really fed up. Also, deviceQuery.exe, nvidia-smi, and nvcc --version all run fine.
I have also checked NVIDIA's site and others, and my GPU has no compatibility problem with these CUDA versions.
What could be the reasons for this error?

Please help me out.
Thanks, all.

@deoxyribose

deoxyribose commented Nov 22, 2023

Anyone else still struggling with this? I've tried dozens of combinations of NVIDIA drivers, Ubuntu's nvidia-cuda-toolkit, different CUDA versions installed from NVIDIA's website, and jaxlib/jax releases; nothing has solved the problem.

I'm using
Ubuntu 23.04
GeForce 940MX

nvidia-smi
Driver Version: 525.147.05 CUDA Version: 12.0

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

jaxlib==0.4.20+cuda11.cudnn86
jax==0.4.20

print(jax.devices()) shows the GPU
[cuda(id=0)]

But trying to use it results in
XlaRuntimeError: INTERNAL: Failed to execute XLA Runtime executable: run time error: custom call 'xla.gpu.custom_call' failed: jaxlib/gpu/prng_kernels.cc:33: operation gpuGetLastError() failed: no kernel image is available for execution on the device; current tracing scope: custom-call.1; current profiling annotation: XlaModule:#hlo_module=jit__normal,program_id=2#.

FWIW, I had JAX working fine just a few days ago; this might be caused by a recent update to 23.04.

@hawkinsp
Member

@deoxyribose I think the problem is that we are building for GPUs with SM version 5.2 at a minimum:

jax/.bazelrc, line 68 (at commit 961ba3c):

build:cuda --action_env TF_CUDA_COMPUTE_CAPABILITIES="sm_52,sm_60,sm_70,sm_80,compute_90"

but your GPU appears to have SM version 5.0.

The fix is to build jaxlib yourself, explicitly specifying your SM version. Try:

python build/build.py --enable_cuda --cuda_compute_capabilities=sm_50

We've actually never shipped support for that model of GPU.
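On drivers new enough to support the query, the SM version can be read directly (a quick check for whether a given jaxlib build covers your GPU):

nvidia-smi --query-gpu=compute_cap --format=csv,noheader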

copybara-service bot pushed a commit that referenced this issue Nov 22, 2023
This improves compatibility with older Maxwell cards, and it probably doesn't matter a whole lot for performance.

See: #5723 (comment)
PiperOrigin-RevId: 584641856
@hawkinsp
Member

hawkinsp commented Nov 22, 2023

@deoxyribose #18644 will add sm_50 support to the next jaxlib release.

@deoxyribose

deoxyribose commented Nov 22, 2023

@hawkinsp Thanks for the quick reply!
When I try to build, I get
ERROR: @xla//xla/python:enable_gpu :: Error loading option @xla//xla/python:enable_gpu: no such package '@local_config_cuda//cuda': Repository command failed
Inconsistent CUDA toolkit path: /usr vs /usr/lib

I tried removing and reinstalling CUDA with
sudo apt install nvidia-cuda-toolkit

I'm not sure if this is the expected location:

$ whereis cuda
cuda: /usr/lib/cuda /usr/include/cuda

$ which nvcc
/usr/bin/nvcc

$ whereis cuda.h
cuda.h: /usr/include/cuda.h

I tried with
python build/build.py --enable_cuda --cuda_compute_capabilities=sm_50 --cuda_path=/usr/
but no difference - any tips?

copybara-service bot pushed a commit that referenced this issue Nov 28, 2023
This improves compatibility with older Maxwell cards, and it probably doesn't matter a whole lot for performance.

See: #5723 (comment)
PiperOrigin-RevId: 585967281