Unimplemented: DNN library is not found. #4920
That means that cuDNN is not in your library path. Can you try adding your CUDA lib path to `LD_LIBRARY_PATH`?
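Concretely, something like the following, assuming CUDA lives under `/usr/local/cuda` (adjust the path for your installation):

```shell
# Append the CUDA library directory to the dynamic loader's search path
# so that libcudnn can be found at runtime.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/cuda/lib64"
```

Putting the `export` in `~/.bashrc` makes it persistent across shells.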
This issue seems related to google-deepmind/dm-haiku#83; perhaps something has changed recently?
About 5 months ago (a141cc6) we switched how we link GPU libraries to be the same as TensorFlow. I suspect you would see the exact same behavior with TensorFlow, and I also suspect that setting your library path so cuDNN can be found would make the problem go away. I agree the error message isn't very helpful; we should probably fix that.
That does get rid of that error (although some other issues still remain in google-deepmind/dm-haiku#83). It would be great if the cuDNN dependency were documented in more detail in the installation guide. The CUDA bit is clear, with the symbolic link and environment variable, but I didn't know about cuDNN.
After trying that, it still outputs the same error.
TensorFlow 2.3 works perfectly, with no error.
Do you mean upgrading to 2.3 or downgrading to 2.3?
After creating a new conda environment and installing tensorflow-gpu==2.3 with pip, there's no error with CUDA or TensorFlow and I can train successfully. However, JAX still fails.
https://groups.google.com/a/tensorflow.org/g/discuss/c/TiWgve-KERo/m/NgUohfTiAgAJ ^ I think this TensorFlow thread is relevant too.
Looks like this may have been fixed in 0.2.6. @milmor can you confirm?
The issue has not been fixed in 0.2.6. Although the installed jaxlib .whl is the cuda111 build and the jax version is 0.2.6, it seems to be looking for a different CUDA version at runtime.
Thank you for your comments on this thread. I ran into a similar problem and solved it by manually installing cuDNN; you can refer to the official guide for installing it. I agree that the cuDNN dependency should be mentioned in the new version's install instructions.
Hi everyone, I'm having a similar issue: I also get the same error. However, I see a different message suggesting that cuDNN was loaded. Any pointers?
I added `export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64` to my .bashrc based on the advice above, but the error stays when using a CNN module from stax or from Haiku: `RuntimeError: Unimplemented: DNN library is not found.: while running replica 0 and partition 0 of a replicated computation (other replicas may have failed as well).`
I am also having the same issue. I can confirm that my LD_LIBRARY_PATH is correctly configured.
I am also having the same issue; my LD_LIBRARY_PATH is correctly configured. I pointed LD_LIBRARY_PATH to the CUDA path.
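One quick way to verify, as a sketch (`ldconfig` is Linux-specific), is to ask the dynamic linker whether it can resolve cuDNN at all:

```shell
# Prints matching entries if the loader can find cuDNN, otherwise a notice.
ldconfig -p 2>/dev/null | grep libcudnn || echo "libcudnn not found in ldconfig cache"
```

If nothing matching `libcudnn` shows up, setting `LD_LIBRARY_PATH` alone won't help, because the library isn't installed where the loader can see it.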
I was also getting this error. I don't know the details of what was happening, but the issue for me seemed to stem from JAX and TensorFlow not sharing the GPU nicely. When I added this code snippet (taken from the Flax MNIST example) to the top of my code, it seems to run:
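The snippet in question hides the GPU from TensorFlow so it cannot grab the memory JAX needs. A sketch, reconstructed from memory of the Flax MNIST example (the `try`/`except` guard is an addition so it also runs without TensorFlow installed):

```python
try:
    import tensorflow as tf
    # Make sure TensorFlow does not allocate GPU memory that JAX needs.
    tf.config.experimental.set_visible_devices([], "GPU")
except ImportError:
    pass  # TensorFlow is not installed; nothing to hide.
```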
The comment suggests this is a known issue, but a quick Google search only brings up an old closed issue #120. I don't get the same issue running the same code on Colab (without the above snippet), so it may be particular to my machine's configuration.

Edit: Ah, I now see there is a whole page on this in the JAX documentation. It would be very useful if JAX could detect this issue and give a helpful error message. Based on the "DNN library not found" error, I went down the rabbit hole of thinking I had the wrong version of CUDA/cuDNN.
I also believe this is due to a wrong version of CUDA/cuDNN. I was able to overcome this issue by recreating the conda environment.
Under CUDA 11.2, install cuDNN >= 8.2.
I was able to solve this problem by adding these 4 lines of code at the head of the file:

```python
import os
os.environ['XLA_PYTHON_CLIENT_PREALLOCATE'] = 'false'
os.environ['XLA_PYTHON_CLIENT_ALLOCATOR'] = 'platform'
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
```
What worked for me: check whether LD_LIBRARY_PATH is empty, and if it is, set it to your CUDA library path.
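As a sketch of that check (the CUDA path is an assumption; adjust to your installation):

```shell
# If LD_LIBRARY_PATH is empty, point it at the CUDA libraries.
if [ -z "${LD_LIBRARY_PATH}" ]; then
  export LD_LIBRARY_PATH=/usr/local/cuda/lib64
fi
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
```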
@milmor is this resolved now?
Fixed for me.
Still broken.
Arrived here after googling, having run into the same error. The comment from @gabehope helped me resolve my problem. Specifically, I was running both TensorFlow and JAX in the same script and, presumably, they were both fighting for GPU memory. For reference, here's the (quite helpful!) page on memory allocation with JAX. It would be helpful if there were some way for the error to better indicate that it's a memory issue, though it sounds like for others it may be a different problem than what Gabe and I were running into.
I solved it by doing this, thanks a lot!
We've added an FAQ section addressing various CUDA library loading issues and solutions/workarounds, and have (hopefully) made it easier to find by including it in some error messages that often correlate to these memory starvation issues. I'm going to go ahead and close this specific issue, since the FAQ documentation should provide proper workarounds.
Working on a local GPU (RTX 2060 SUPER, CUDA 11.1), I got this error. JAX installed successfully and the symlink is in place; jax reports the GPU and can do math operations, but I still can't train the model.
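For anyone debugging a similar setup, a minimal check that JAX actually sees a device (assuming jax is installed) is:

```python
# Lists the devices JAX's backend found; on a working CUDA setup this
# includes a GPU device, while a CPU-only install lists CPU devices.
try:
    import jax
    print(jax.devices())
except ImportError:
    print("jax is not installed")
```

Note that a GPU showing up here only proves the CUDA runtime loaded; the cuDNN lookup happens later, when a convolution is compiled, which is why simple math can work while training a CNN fails.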