GPU ISSUE: OneHot.lua:17: attempt to call method 'long' (a nil value) #2

Closed
mschonwe opened this issue May 21, 2015 · 6 comments

Comments

@mschonwe

Training proceeds fine on CPU ("-gpuid -1"), but errors as follows on GPU ("-gpuid 0"):

th train.lua -data_dir data/tinyshakespeare

using CUDA on GPU 0...
loading data files...
cutting off end of data so that the batches/sequences divide evenly
reshaping tensor...
data load done. Number of batches in train: 211, val: 11, test: 1
vocab size: 65
creating an LSTM with 2 layers
number of parameters in the model: 154165
cloning criterion
cloning softmax
cloning embed
cloning rnn
/home/username/torch/install/bin/luajit: ./util/OneHot.lua:17: attempt to call method 'long' (a nil value)
stack traceback:
   ./util/OneHot.lua:17: in function 'forward'
   train.lua:172: in function 'opfunc'
   /home/username/torch/install/share/lua/5.1/optim/rmsprop.lua:36: in function 'rmsprop'
   train.lua:226: in main chunk
   [C]: in function 'dofile'
   .../username/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
   [C]: at 0x00406640

@karpathy
Owner

hmmm, I'm not exactly sure what could be causing this. What GPU do you have? Is this a recent install of Torch and its dependencies? Is CUDA up to date?

@mschonwe
Author

GPU is a GTX 980
CUDA 7.0

I ran a git pull in torch, and 'update.sh' before posting the issue. I'm not sure about 'its dependencies', but I'll start tracking that down.

Most recently I've been working on the neon code / NervanaGPU, which I got running on the GPU, but maybe this introduced a conflict.

I should also say, thanks so much for posting this project!

mschonwe reopened this May 21, 2015
@karpathy
Owner

Can you try removing the call to long()? Or replacing it with int() or short()?

@hsheil

hsheil commented May 21, 2015

I also got an Optim / rmsprop error (train.lua:226: attempt to call field 'rmsprop' (a nil value)) when training.

Pulling and rebuilding latest torch fixed this for me. I guess I was running a build of torch locally that was older than Mar 23rd when rmsprop was exported via init.lua. Can confirm that the code runs fine both on GPU and CPU after I upgraded to latest torch.

@karpathy
Owner

@hsheil awesome thank you!

@mschonwe
Author

Working now :)
Running torch/install.sh did the trick. (Looks like I also had some file permissions set as root, so I fixed those too -- perhaps this is why torch/update.sh didn't get it going.)
Thanks again.
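
For anyone who suspects the same permissions problem, here is a minimal sketch of how files owned by another user could be spotted in a Torch checkout. The helper name `list_owned_by` is hypothetical, and the `~/torch` path in the usage comments is an assumption based on the traceback above; adjust both to your setup.

```shell
# Hypothetical helper: list everything under a directory that is owned
# by a given user (name or numeric uid).
# Usage: list_owned_by USER DIR
list_owned_by() {
    find "$2" -user "$1" -print
}

# Assumed usage (the ~/torch location is an assumption, not confirmed here):
#   list_owned_by root ~/torch
# If that prints anything, ownership could be handed back to your own user
# with something like:
#   sudo chown -R "$(id -un)" ~/torch
```

If the first command prints nothing, root-owned files are probably not the cause and a clean reinstall is the more likely fix.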
