sample.lua fails to run: error in function addmm() #21
Comments
This other tool works fine however:
Did you train one with GPU and the other with CPU? Check the "gpuid" flag: is it "0" in both checkpoints?
Yes, GPU on both. What's interesting is that checkpoints created later in the training process do work, as does the very last one.
@swisspol, I'm running into the same issue inside an iTorch notebook environment, but it works fine from the standard command line. I'm very new to Lua / Torch, but it would be good to figure out what's causing this.
I had this error on OSX 10.10. Using -opencl 1 on both train.lua and sample.lua made it work. |
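To spell out the workaround above: the backend has to match between training and sampling, so the `-opencl 1` flag must be passed to both scripts. A minimal sketch of the matching invocations (the checkpoint filename below is illustrative; data_dir uses the tiny Shakespeare set mentioned later in the thread):

```shell
# Train and sample with the same backend; loading a checkpoint with a
# different backend than it was trained with triggers tensor-type errors.
th train.lua -data_dir data/tinyshakespeare -opencl 1
th sample.lua cv/some_checkpoint.t7 -opencl 1
```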
Had this problem on a c2.2xl instance. Tried -opencl 1 but no luck. |
@svickers can you post the fourth line of your output, the one that has 'Linear.lua:46' in it? Edit: and also the results of
@hughperkins Sorry Hugh! I killed that vm and went to g2.2xl and everything worked straight away. |
@svickers oh, nice! Hmmm, c2s don't actually have a GPU, right? g2 sounds like a GPU instance?
Hello. I ran into the same problem, and am dealing with a lot of frustration in my holy quest of running tests with a Monty Python's Flying Circus corpus :). The error also appears when trying the tiny Shakespeare data set. I run CPU-only computing (no GPU). The training goes well, with no NaN values:
Sampling raises an error regardless of which checkpoint file is used:
And the inspect output looks OK:
This is a silly bug I think I introduced only a few days ago, unfortunately. Fixing...
Ok, I think I patched this issue with this commit: see if things work properly now with the new sampling script. The issue is that CPU models use doubles, but when I was converting GPU models I converted them to float() and then changed the sampling script to use float(), which broke the previous CPU-only functionality. Sorry about the mess; when I originally designed this code I always used GPUs, and I didn't anticipate the conversion issues, or that training on CPU or converting GPU->CPU would be a common use case.
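For readers hitting similar errors elsewhere: the underlying failure mode is a tensor-type mismatch, where addmm() errors out when the model's weights are FloatTensors but the sampling inputs are DoubleTensors (or vice versa). A minimal Lua/Torch sketch of the kind of type unification the fix performs (the checkpoint layout with a `protos` table matches char-rnn's, but this is an illustration, not the actual commit):

```lua
-- Illustrative only: force every module in a loaded checkpoint to a
-- single tensor type so addmm() always sees matching operand types.
require 'torch'
require 'nn'

local checkpoint = torch.load(opt.model)  -- opt.model: path to a .t7 checkpoint
for _, proto in pairs(checkpoint.protos) do
  proto:double()  -- CPU sampling: everything as DoubleTensor
end

-- Inputs fed in during sampling must then use the same tensor type:
local x = torch.DoubleTensor(1, 1)
```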
Looks like it works. Thank you and may my cat bless you, m'lord! |
Fixed it for me, too. Thanks @karpathy! |
Great job @karpathy! Had the same issue and now it works perfectly. |
Just trying out the default data set:
And then using a checkpoint file: