libcudnn.so* not found #20

robotsorcerer · 2015-11-28T11:28:23Z

Thanks for this great code. I tried to follow your README instructions as religiously as possible. When I tried running the eval.lua script, I got:

User@User:~/Documents/neuraltalk2$ th eval.lua -model ../KarpathyNN/model_id1-501-1448236541.t7 -image_folder ../MS-CoCo/test2014/ -num_images 10
/usr/local/share/lua/5.1/cudnn/ffi.lua:574: libcudnn.so: cannot open shared object file: No such file or directory
/usr/local/bin/luajit: /usr/local/share/lua/5.1/trepl/init.lua:363: /usr/local/share/lua/5.1/cudnn/ffi.lua:577: 'libcudnn.so not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure all the files named as libcudnn.so* are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/trepl/init.lua:363: in function 'require'
eval.lua:59: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
[C]: at 0x00406260

But I do have the libcudnn.so* files installed, as locate libcudnn shows:

/home/User/Documents/cuda/lib64/libcudnn.so
/home/User/Documents/cuda/lib64/libcudnn.so.7.0
/home/User/Documents/cuda/lib64/libcudnn.so.7.0.64
/home/User/Documents/cuda/lib64/libcudnn_static.a

So I added this path to my LD_LIBRARY_PATH with

export LD_LIBRARY_PATH=/home/User/cuda:${LD_LIBRARY_PATH}

When I echo $LD_LIBRARY_PATH, I get

/home/User/cuda:/home/User/catkin_ws/devel/lib:/home/User/cuda/lib64:/home/User/cuda/lib64/home/User/catkin_ws/devel/lib:/home/User/catkin_ws/devel/lib/x86_64-linux-gnu:/opt/ros/indigo/lib/x86_64-linux-gnu:/usr/local/cuda-7.0/lib64:/opt/ros/indigo/lib

It appears libcudnn is now in the LD_LIBRARY_PATH. However, running the eval script again still produces

User@User:~/Documents/neuraltalk2$ th eval.lua -model ../KarpathyNN/model_id1-501-1448236541.t7 -image_folder ../MS-CoCo/test2014/ -num_images 10
/usr/local/share/lua/5.1/cudnn/ffi.lua:574: libcudnn.so: cannot open shared object file: No such file or directory
/usr/local/bin/luajit: /usr/local/share/lua/5.1/trepl/init.lua:363: /usr/local/share/lua/5.1/cudnn/ffi.lua:577: 'libcudnn.so not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure all the files named as libcudnn.so* are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/trepl/init.lua:363: in function 'require'
eval.lua:59: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
[C]: at 0x00406260
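One thing worth checking here: the locate output above lists the files under /home/User/Documents/cuda/lib64, while the entries on LD_LIBRARY_PATH are /home/User/cuda and /home/User/cuda/lib64. The following is a minimal sketch, assuming a POSIX-style shell, of how to confirm whether any directory on the path really holds the file. The helper name find_lib_dirs is hypothetical and not part of any tool mentioned in this thread.

```shell
# find_lib_dirs NAME [PATHLIST]
# Print every directory in the colon-separated PATHLIST (default: the current
# LD_LIBRARY_PATH) that contains a file called NAME.
# The body runs in a subshell, so the IFS and globbing changes stay local.
find_lib_dirs() (
    name=$1
    pathlist=${2-${LD_LIBRARY_PATH-}}
    IFS=:       # split the list on colons
    set -f      # do not glob-expand the directory names
    for dir in $pathlist; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir"
        fi
    done
    exit 0
)
```

Calling find_lib_dirs libcudnn.so with no second argument checks the current LD_LIBRARY_PATH. Empty output means no entry on that variable contains the file, although the dynamic loader may still find it through its cache or default directories.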

I'm sorry for the bother but would appreciate any help.


robotsorcerer · 2015-11-28T11:54:52Z

Okay. So I manually copied the libcudnn.so* files into my CUDA directory at /usr/local/cuda/lib64. The earlier error is gone, but now I get

lex@lex:~/Documents/neuraltalk2$ th eval.lua -model ../KarpathyNN/model_id1-501-1448236541.t7 -image_folder ../MS-CoCo/test2014/ -num_images 10
/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:294: unknown object
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/torch/File.lua:294: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:240: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
eval.lua:68: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
[C]: at 0x00406260

karpathy · 2015-11-28T18:09:56Z

Does your folder contain non-image files? That could possibly be the issue.

soumith · 2015-11-28T18:11:28Z

Your torch is out of date.

robotsorcerer · 2015-11-29T03:41:14Z

@karpathy No.

@soumith, thanks. I got around it by rebuilding Torch and all the Lua dependencies from scratch.

I am adding what I changed here in case someone comes across the same problem.

I updated my torch package using the curl script from this site. Here are the entries from luarocks list that are relevant to the required dependencies:

lex@lex:~/Documents$ luarocks list

Installed rocks:

cudnn
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
cunn
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
cutorch
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
image
1.1.alpha-0 (installed) - /usr/local/lib/luarocks/rocks
nn
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
nngraph
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
nnx
0.1-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
torch
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks

Alrighty, it appears I have all the dependencies @karpathy listed in his readme.md file, namely image, nn, nngraph, cutorch and cunn. I also have loadcaffe in /usr/local/lib/luarocks/rocks/loadcaffe/1.0-0/*.

BTW, I already had torch-hdf5 and h5py installed, so I did not touch those.

I deleted the neuraltalk2 folder I had cloned earlier. Someone on the issues page mentioned that the MS COCO validation images were actually PNG files despite their .jpg extensions. I had been using those, so I got rid of them and took new .jpg files from my smartphone, which I put in a folder called neuraltalk_images. I put the model in a folder called neuralmodel and placed both folders in my Documents folder. Running

th eval.lua -model ../neuralmodel/model_id1-501-1448236541.t7 -image_folder ../neuraltalk_images/ -num_images 10

gave me the sort of results I would expect:

DataLoaderRaw found 236 images
constructing clones inside the LanguageModel
cp "../neuraltalk_images/DSC03001.JPG" vis/imgs/img1.jpg
image 1: a close up of a person holding a red apple
evaluating performance... 1/10 (0.000000)
cp "../neuraltalk_images/DSC02819.JPG" vis/imgs/img2.jpg
image 2: a street sign on the side of the road
evaluating performance... 2/10 (0.000000)
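On the remark about PNG files carrying a .jpg extension: if you want to check a file yourself, here is a rough sketch that looks at the first bytes of the file instead of trusting its name. It relies on the standard file signatures (JPEG data starts with the bytes FF D8 FF, PNG data with 89 50 4E 47) and assumes head -c and od are available. The helper name image_kind is hypothetical.

```shell
# image_kind FILE
# Print "jpeg", "png" or "unknown" based on the first four bytes of FILE,
# ignoring the file extension.
image_kind() {
    # Dump the first four bytes as lowercase hex with no separators.
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case $magic in
        ffd8ff*)  echo jpeg ;;
        89504e47) echo png ;;
        *)        echo unknown ;;
    esac
}
```

For example, image_kind ../neuraltalk_images/DSC03001.JPG reports what the file actually contains, whatever its extension says.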

Thanks to both of you! A small step for a man. A giant leap for deep learning :)

gforge · 2016-02-13T18:33:36Z

After some debugging I would like to share my insights into a related libcudnn issue (since this is the top Google hit): make sure that the cuDNN library matches your CUDA version, i.e. you sometimes have to reinstall cuDNN after updating the CUDA toolkit. After upgrading to cuDNN v4, require 'cudnn' works. The new error message is "libcudnn (R4) not found in library path." The R4 is the obvious giveaway that this bug is simply a version mismatch.
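To see which versioned files are actually installed, a small sketch like the one below lists the suffixes of the libcudnn.so.* files in a directory, so they can be compared with the release named in the error message (R4 in this case). The helper name cudnn_suffixes is hypothetical, and the directory argument is whatever location your install uses. The sketch only reads file names and does not inspect the library itself.

```shell
# cudnn_suffixes DIR
# Print the version suffix of every libcudnn.so.* file in DIR, one per line
# (for example "7.0" for libcudnn.so.7.0). Prints nothing if there are none.
cudnn_suffixes() {
    dir=$1
    for f in "$dir"/libcudnn.so.*; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        # Strip everything up to and including "libcudnn.so.".
        printf '%s\n' "${f##*/libcudnn.so.}"
    done
}
```

If the list does not include the release the error asks for, the library path is probably fine and the installed cuDNN is simply a different version.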

soumith · 2016-02-13T18:35:58Z

@gforge this is the actual error message now. Is that not what you see?
Also, if you have any suggestions on tweaking the error message to make things clearer, I'm happy to incorporate them.
https://github.com/soumith/cudnn.torch/blob/master/ffi.lua#L1279-L1282

gforge · 2016-02-13T18:51:05Z

@soumith yes, that is the message. I got led astray because most of the issues on the web are about LD_LIBRARY_PATH not being set up properly. If possible, I think it would be useful to add a check for whether 'libcudnn.so' exists and, if it does, perhaps change the message to:

error([[You seem to have an invalid libcudnn version: the software requires version 4 (R4), and neither libcudnn.so.4 nor libcudnn.4.dylib was found in your library path.
Please download and install cuDNN v4 from https://developer.nvidia.com/cuDNN.
]])

The original error message is probably fine, but because it is so similar to the one from the previous issue it can send you down the wrong path when googling. If I had actually read the details of the error before pasting it into my search bar, I would probably have saved two hours of frustration. It is hard to understand why the NVIDIA people decided that the cuDNN library could not be distributed together with the toolkit...
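The check I have in mind can be sketched outside of Lua as well. The snippet below is only an illustration in shell, not the code in ffi.lua: it separates "the required release is there", "some libcudnn is there but not the required release" and "no libcudnn at all". The file name pattern libcudnn.so.N is taken from the message proposed above, the .dylib case is not handled, and the function name diagnose_cudnn is hypothetical.

```shell
# diagnose_cudnn REQUIRED DIR...
# Classify what the given directories contain:
#   ok                libcudnn.so.REQUIRED is present in one of them
#   version-mismatch  some libcudnn.so* file exists, but not the required one
#   missing           no libcudnn.so* file exists in any of them
diagnose_cudnn() {
    required=$1
    shift
    found_any=no
    for dir in "$@"; do
        if [ -e "$dir/libcudnn.so.$required" ]; then
            echo ok
            return 0
        fi
        for f in "$dir"/libcudnn.so*; do
            # An unmatched pattern stays literal and fails the -e test.
            if [ -e "$f" ]; then
                found_any=yes
            fi
        done
    done
    if [ "$found_any" = yes ]; then
        echo version-mismatch
    else
        echo missing
    fi
}
```

For example, diagnose_cudnn 4 /usr/local/cuda/lib64 /usr/local/lib would print version-mismatch on a machine that has only an older libcudnn in those directories, which is the case where a different error message would help.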

BigeyeDestroyer · 2016-06-20T16:38:29Z

Maybe it's due to the cuDNN version update. You can clear the contents of ~/.theano/ and then compile your code again. A brilliant guy in my lab told me to do so, and it really works.

robotsorcerer closed this as completed Nov 29, 2015

ashwinnair14 mentioned this issue Dec 10, 2015

Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED #44

Open
