libcudnn.so* not found #20

Closed
robotsorcerer opened this issue Nov 28, 2015 · 8 comments

Comments

@robotsorcerer

Thanks for this great code. I tried to follow your README instructions as closely as possible. When I ran the eval.lua script, I got:

User@User:~/Documents/neuraltalk2$ th eval.lua -model ../KarpathyNN/model_id1-501-1448236541.t7 -image_folder ../MS-CoCo/test2014/ -num_images 10
/usr/local/share/lua/5.1/cudnn/ffi.lua:574: libcudnn.so: cannot open shared object file: No such file or directory
/usr/local/bin/luajit: /usr/local/share/lua/5.1/trepl/init.lua:363: /usr/local/share/lua/5.1/cudnn/ffi.lua:577: 'libcudnn.so not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure all the files named as libcudnn.so* are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/trepl/init.lua:363: in function 'require'
eval.lua:59: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
[C]: at 0x00406260

But I do have the libcudnn.so* files installed; locate libcudnn gives:

/home/User/Documents/cuda/lib64/libcudnn.so
/home/User/Documents/cuda/lib64/libcudnn.so.7.0
/home/User/Documents/cuda/lib64/libcudnn.so.7.0.64
/home/User/Documents/cuda/lib64/libcudnn_static.a

So I added this path to my LD_LIBRARY_PATH with

export LD_LIBRARY_PATH=/home/User/cuda:${LD_LIBRARY_PATH}

When I echo $LD_LIBRARY_PATH, I get

/home/User/cuda:/home/User/catkin_ws/devel/lib:/home/User/cuda/lib64:/home/User/cuda/lib64/home/User/catkin_ws/devel/lib:/home/User/catkin_ws/devel/lib/x86_64-linux-gnu:/opt/ros/indigo/lib/x86_64-linux-gnu:/usr/local/cuda-7.0/lib64:/opt/ros/indigo/lib

It appears libcudnn is now in the LD_LIBRARY_PATH. However, running the eval script again still produces

User@User:~/Documents/neuraltalk2$ th eval.lua -model ../KarpathyNN/model_id1-501-1448236541.t7 -image_folder ../MS-CoCo/test2014/ -num_images 10
/usr/local/share/lua/5.1/cudnn/ffi.lua:574: libcudnn.so: cannot open shared object file: No such file or directory
/usr/local/bin/luajit: /usr/local/share/lua/5.1/trepl/init.lua:363: /usr/local/share/lua/5.1/cudnn/ffi.lua:577: 'libcudnn.so not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure all the files named as libcudnn.so* are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/trepl/init.lua:363: in function 'require'
eval.lua:59: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
[C]: at 0x00406260

I'm sorry for the bother but would appreciate any help.
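
One thing worth checking at this point: locate reports the libraries under /home/User/Documents/cuda/lib64, while the export above adds /home/User/cuda, and the dynamic loader only searches the exact directories listed in LD_LIBRARY_PATH, not their subdirectories. Below is a minimal Python sketch (not part of neuraltalk2; the function name dirs_with_cudnn is made up for illustration) that reports which LD_LIBRARY_PATH entries actually contain libcudnn.so* files.

```python
import glob
import os


def dirs_with_cudnn(ld_library_path, pattern="libcudnn.so*"):
    """Map each LD_LIBRARY_PATH entry to the libcudnn files found directly in it.

    Hypothetical helper for illustration only. Subdirectories are not
    searched, which mirrors how the loader treats LD_LIBRARY_PATH entries.
    """
    result = {}
    for entry in ld_library_path.split(":"):
        if not entry:
            continue  # empty entries are skipped here for simplicity
        matches = glob.glob(os.path.join(entry, pattern))
        result[entry] = sorted(os.path.basename(m) for m in matches)
    return result


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for directory, libs in dirs_with_cudnn(current).items():
        print(directory, "->", libs if libs else "no libcudnn here")
```

If every entry reports no match, the directory that holds the files (the lib64 one in the locate output) is probably the one that needs to be added.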

@robotsorcerer
Author

Okay. So I manually copied the libcudnn.so* files to my cuda directory in /usr/local/cuda/lib64. The earlier error is gone but now I have

lex@lex:~/Documents/neuraltalk2$ th eval.lua -model ../KarpathyNN/model_id1-501-1448236541.t7 -image_folder ../MS-CoCo/test2014/ -num_images 10
/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:294: unknown object
stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/torch/File.lua:294: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:240: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
eval.lua:68: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
[C]: at 0x00406260

@karpathy
Owner

Does your folder contain non-image files? That could be the issue.

@soumith

soumith commented Nov 28, 2015

your torch is out of date

@robotsorcerer
Author

@karpathy No.

@soumith, thanks. I got around it by rebuilding torch and all the Lua dependencies from scratch.

I am adding what I changed here in case someone comes across the same problem.

I updated my torch package using the curl script from this site. Here are the entries from my luarocks list that are relevant to the required dependencies:

lex@lex:~/Documents$ luarocks list

Installed rocks:

cudnn
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
cunn
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
cutorch
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
image
1.1.alpha-0 (installed) - /usr/local/lib/luarocks/rocks
nn
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
nngraph
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
nnx
0.1-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
torch
scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks

Alrighty, it appears I have all the dependencies @karpathy talked about in his readme.md file, namely image, nn, nngraph, cutorch and cunn. Also, I have loadcaffe in /usr/local/lib/luarocks/rocks/loadcaffe/1.0-0/*.
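
The same comparison can be done mechanically. This is a rough Python sketch, assuming the listing has the shape shown above (a rock name on one line, then a line containing "(installed)"); the names installed_rocks, missing_rocks, SAMPLE and REQUIRED are hypothetical, and the required list is just the one named in this comment.

```python
def installed_rocks(listing):
    """Parse text shaped like the `luarocks list` output above into {name: version}."""
    rocks = {}
    previous = None
    for raw in listing.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "(installed)" in line and previous is not None:
            # Version line: its first token is the version of the rock above it.
            rocks[previous] = line.split()[0]
        else:
            # Anything else is treated as a candidate rock name (or a header).
            previous = line
    return rocks


def missing_rocks(listing, required):
    """Return the required rock names that do not appear in the listing."""
    found = installed_rocks(listing)
    return [name for name in required if name not in found]


# Illustrative excerpt only; paste the real listing here.
SAMPLE = """Installed rocks:

cutorch
   scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
image
   1.1.alpha-0 (installed) - /usr/local/lib/luarocks/rocks
nn
   scm-1 (installed) - /home/lex/torch/install/lib/luarocks/rocks
"""

REQUIRED = ["image", "nn", "nngraph", "cutorch", "cunn"]

print("missing:", missing_rocks(SAMPLE, REQUIRED))
```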

BTW, I had torch-hdf5 and h5py installed before, so I did not bother touching those.

I deleted the neuraltalk2 folder I had earlier cloned. Someone on the issues page mentioned the validation images from MS COCO were .png files instead of .jpg files even though they had .jpg extensions. I was using those earlier so I got rid of them and got new .jpg files from my smartphone which I packed into one folder I called neuraltalk_images and put the model in a folder I called neuralmodel. I placed the model and images folder into my Documents folder. Running

th eval.lua -model ../neuralmodel/model_id1-501-1448236541.t7 -image_folder ../neuraltalk_images/ -num_images 10

gave me the sort of results I would expect:

DataLoaderRaw found 236 images
constructing clones inside the LanguageModel
cp "../neuraltalk_images/DSC03001.JPG" vis/imgs/img1.jpg
image 1: a close up of a person holding a red apple
evaluating performance... 1/10 (0.000000)
cp "../neuraltalk_images/DSC02819.JPG" vis/imgs/img2.jpg
image 2: a street sign on the side of the road
evaluating performance... 2/10 (0.000000)

Thanks to both of you! A small step for a man. A giant leap for deep learning :)

@gforge

gforge commented Feb 13, 2016

After some debugging I would like to share my insight into a related libcudnn issue (since this is the top Google hit): make sure that the cudnn library matches your cuda version, i.e. you sometimes must reinstall cudnn if you have updated the cuda toolkit. After upgrading to cuDNN v4, require 'cudnn' works. The new error message is "libcudnn (R4) not found in library path." The R4 is the obvious giveaway that this bug is simply a version mismatch.
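
A quick way to see which library names the loader can actually open is to try them directly. The following is a minimal Python sketch using ctypes, not the check the cudnn binding itself performs; the helper name can_load and the list of names tried are assumptions for illustration.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library.

    Hypothetical helper. An OSError can also mean the file was found but
    one of its own dependencies is missing, so False is only a hint.
    """
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # Illustrative names: the unversioned link, then a versioned one.
    for name in ("libcudnn.so", "libcudnn.so.4"):
        print(name, "loadable" if can_load(name) else "not loadable")
```

If the unversioned name loads but the versioned one does not, that points toward the version mismatch described above rather than a path problem.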

@soumith

soumith commented Feb 13, 2016

@gforge this is the actual error message now, is that not what you see?
Also, if you have any suggestions on tweaking the error message to make things clearer, I'm happy to incorporate them.
https://github.com/soumith/cudnn.torch/blob/master/ffi.lua#L1279-L1282

@gforge

gforge commented Feb 13, 2016

@soumith yes, that is the message. I got led astray since most of the issues on the web are related to LD_LIBRARY_PATH not being properly set up. If possible, I think it would be useful to add a check for whether 'libcudnn.so' exists and, if so, perhaps change the message to:

error([[You seem to have an invalid libcudnn version: this software requires version 4 (R4), and neither libcudnn.so.4 nor libcudnn.4.dylib was found in your library path.
Please download and install cuDNN v4 from https://developer.nvidia.com/cuDNN.
]])

The original error message is probably fine, but because it is so similar to the previous one it can send a Google search in the wrong direction. If I had actually read the details in the error before pasting it into my search bar, I would probably have saved two hours of frustration. It is hard to understand why the NVIDIA people decided that the cuDNN library could not be distributed together with the toolkit...
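
The proposed distinction between "nothing installed" and "wrong version installed" can be sketched as follows. This is an illustrative Python version of the idea, not code from cudnn.torch; the function names and the returned strings are made up, and it assumes the number right after ".so." in the file name is what the binding matches against.

```python
import re


def cudnn_major_versions(filenames):
    """Collect the first number after '.so.' from names like libcudnn.so.4.0.7."""
    majors = set()
    for name in filenames:
        match = re.fullmatch(r"libcudnn\.so\.(\d+)(?:\.\d+)*", name)
        if match:
            majors.add(int(match.group(1)))
    return majors


def diagnose(filenames, required_major):
    """Classify a list of library file names against the required version."""
    if required_major in cudnn_major_versions(filenames):
        return "ok"
    if any(name.startswith("libcudnn.so") for name in filenames):
        # Some libcudnn is present, just not the one being asked for.
        return "version mismatch"
    return "not found"


if __name__ == "__main__":
    # File names taken from the locate output earlier in this thread.
    files = ["libcudnn.so", "libcudnn.so.7.0", "libcudnn.so.7.0.64", "libcudnn_static.a"]
    print(diagnose(files, 4))
```

Only the "not found" case would then need the LD_LIBRARY_PATH hint; the "version mismatch" case could point straight at the download page.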

@BigeyeDestroyer

Maybe it's due to the cuDNN version update. You can clear the contents of ~/.theano/ and then compile your code again. A brilliant guy in my lab told me to do so, and it really works.
