/rnn/SequencerCriterion.lua:42: expecting target table #31

Closed
svenwoldt opened this issue Jun 5, 2016 · 38 comments

@svenwoldt

Hello,
I tried to train neuralconvo with some variations of the following command:

$ th train.lua --opencl --dataset 5000 --hiddenSize 100

I am running with OpenCL (ATI Radeon) or plain CPU; both cause the following error:

libthclnn_searchpath /home/jack/torch-cl/install/lib/lua/5.1/libTHCLNN.so
-- Loading dataset
Loading vocabulary from data/vocab.t7 ...

Dataset stats:
Vocabulary size: 10682
Examples: 15877
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Oland

-- Epoch 1 / 50

/home/jack/torch-cl/install/bin/luajit: ...k/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: expecting target table
stack traceback:
[C]: in function 'assert'
...k/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: in function 'forward'
./seq2seq.lua:80: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
...k/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d30

Would you know what the issue is here? Unfortunately I am quite new to Lua.

@mtanana

mtanana commented Jun 5, 2016

Had the same issue. It happened right after I updated a couple of modules (including RNN)

Trying to work through it now...

@mtanana

mtanana commented Jun 5, 2016

The issue seems to be that the decoder is outputting a table of tensors instead of a tensor, and the criterion wants either all tables or all tensors. The weird thing is that in the encoder-decoder example on the rnn page, the decoder correctly outputs a tensor, even though that example is basically the same model as this one. This is a tough one!

@coreybobco

I had the same problem.

This was broken by this commit to the rnn dependency, Element-Research/rnn@a1373c4aaf8c3a40c41332b35559fd77a64b815b. If you check out rnn directly from GitHub, revert to the commit before that one (git checkout 14aff64132aa90339b6d510604a2b090f6509300), and then copy all of the .lua files from your rnn checkout directory into the Torch rnn dependency subdirectory (in your case /torch/install/share/lua/5.1/rnn/), it will work.

@mtanana

mtanana commented Jun 6, 2016

I got it working again with the newer commit, but I had to rewrite the input data so that it is a seqLength x 1 tensor instead of just a seqLength tensor. The code is something like this:

torch.Tensor({table_of_wordids}):t()
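
For example, a minimal sketch of the reshaping (table_of_wordids here is just an illustrative placeholder, not a variable from the repo):

    -- hypothetical word IDs for a 4-token sequence
    local table_of_wordids = {12, 45, 7, 3}

    -- old shape: a 1D tensor of size seqLength
    local input1d = torch.Tensor(table_of_wordids)        -- size 4

    -- new shape: a seqLength x 1 (2D) tensor, which the updated rnn code expects
    local input2d = torch.Tensor({table_of_wordids}):t()  -- size 4 x 1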

I also integrated some of the newer features from the rnn example.

I'm testing it now...if it works I'll post the fork (feel free to message me if you'd like to join in the testing fun)

As a side note, I'm also rewriting the import script... it can't handle files larger than the LuaJIT VM memory limit (there are tricks to get around this), and there is a bug if you try to shrink the vocab size.

@kenkit

kenkit commented Jun 6, 2016

Cool. I've just been training on my 2.7 GHz CPU without a GPU, which is slow even when you limit --dataset 5 --hiddenSize 5.
It takes 2 hours to finish 1 epoch.
But I got impatient waiting.
And got this:

[screenshot]

which means I need a good GPU.

@mtanana

mtanana commented Jun 6, 2016

Yeah, it takes a long time to run these models without a good GPU. The smallest decent seq2seq model in the literature is 1 layer of 1000 units... so maybe it's time to buy yourself an early Christmas present!

@kenkit

kenkit commented Jun 7, 2016

By the way, I am planning on creating a neural net that predicts matches after training it on data from previous games. I don't know how hard this will be as a beginner in neural nets, since the maths involved is way beyond me.
Otherwise I'm not such a bad programmer. If you can give me hints on how to go about it (some math topics), that would be great.

@svenwoldt
Author

Sorry, I have now dug into the suggestions extensively. For the first option, downgrading rnn: I suppose it depends on changes to nn as well, and that you need to check out rnn together with an nn version from before the last commit to this repository to make it work.

The second option is not clear to me either; could you be somewhat more specific regarding your implementation, mtanana?

@mtanana

mtanana commented Jun 7, 2016

So I don't know where the change came from, but the long and the short of it is that if you just change the inputs from a 1-dimensional tensor to a 2-dimensional tensor, it fixes things.

So anywhere you see torch.Tensor(table), change it to torch.Tensor({table}):t(), fix some of the outputs, and it works.

If you guys can hold on a couple of days, I can post my fork that has AdaGrad, runs a test set every so often, and can do multilayer LSTMs.

@svenwoldt
Author

Can't wait for it!!!

@vikram-gupta

vikram-gupta commented Jun 12, 2016

@mtanana Can't wait for the change !! Let us know when we can play with it :)

@mtanana

mtanana commented Jun 13, 2016

It's in the testing stage right now. I'm trying it with 9M talk turns from OpenSubtitles. I had some success with 200k talk turns.
2 layers, 1000 LSTM cells per layer, AdaGrad, and 4 epochs:

Hi : Hello.

What is your name : What happened to you.

How old are you : Thirteen

What is the meaning of life : The getman is a sin of the earth.

Do you like swimming : I have to go to bed

It's been a long day : Like a woman, a star

goodbye : Good evening, maam.

@mtanana

mtanana commented Jun 13, 2016

If anyone wants to play with it while the bugs are still being fixed, I'd be happy to put it up now. But I should warn that the code has been going through big changes on a daily basis

@vikram-gupta

Hey @mtanana, the results are looking nice. I would like to try it out. Can you please put it up (with a warning, though)?

@mtanana

mtanana commented Jun 13, 2016

sure....

https://github.com/mtanana/torchneuralconvo

Let me know if you hit any bugs. I won't be able to handle a lot of pull requests until it's stabilized, though.

@vikram-gupta

Thanks @mtanana, this is great!

Will keep you posted with any bugs that I encounter.

@vikram-gupta

vikram-gupta commented Jun 13, 2016

Hi @macournoyer @chenb67

I checked out the 13th May commit and it works perfectly without any changes. However, the latest code breaks. I think the issue might have been introduced into our code after the 13th May commits. Any suggestions on how to run the updated code?

@chenb67
Contributor

chenb67 commented Jun 16, 2016

Hi @vikram-gupta,
What error are you getting?
Did you try updating the torch + rnn packages (we are using the new nn.Select)?

If not, try:

    luarocks install torch
    luarocks install nn
    luarocks install rnn

@vikram-gupta

vikram-gupta commented Jun 16, 2016

Thanks @chenb67, it worked after the updates, but I am getting an out-of-memory error on the same dataset now.

I am using the following command: th train.lua --cuda --dataset 50000 --hiddenSize 1000

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-2235/cutorch/lib/THC/generic/THCStorage.cu line=41 error=2 : out of memory
/home/ubuntu/DeepLearning/torch/install/bin/lua: ...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:458: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-2235/cutorch/lib/THC/generic/THCStorage.cu:41
stack traceback:
[C]: in function 'resizeAs'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:458: in function 'momentumGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:485: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
./seq2seq.lua:81: in function 'train'
train.lua:88: in main chunk
[C]: in function 'dofile'
...ning/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: in ?

@chenb67
Contributor

chenb67 commented Jun 16, 2016

@vikram-gupta The default now is a batch size of 10; before, it was 1.
I suggest trying to decrease --hiddenSize or using a smaller --batchSize.
Try to find a balance between speed and size (batchSize 1 is very slow).
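
For example, something like this (the numbers are purely illustrative, not recommended settings):

    th train.lua --cuda --dataset 50000 --hiddenSize 500 --batchSize 5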

@vikram-gupta

Thanks @chenb67

Are you aware of any fundamental reasons for, or solutions to, the out-of-memory issue? The Google papers mention that they use 1000 hidden units, and that with a much bigger batch size and with 4 layers.

Do we need a better machine than an Amazon EC2 instance with a 4 GB Nvidia GPU? Are you able to run this on your machines with 1000 hidden units and batch size 10? If yes, what configurations are you using?

@vikram-gupta

@chenb67 Correct me if I am wrong, but batch size would also impact the quality of the models. With a batch size of 1, we are doing online training, which may be noisy!

@chenb67
Contributor

chenb67 commented Jun 16, 2016

The Google paper uses a number of GPUs (I think 8).

A big issue with the current settings is the vocabulary size, which makes the softmax layer huge, and due to a bug we can't control the vocabulary size.
I'm working on it and will submit a PR soon.

@mtanana

mtanana commented Jun 16, 2016

Hey all-

I've got a fork that can control vocab size

https://github.com/mtanana/torchneuralconvo

It seems to be working OK at this point, but let me know.

@kenkit

kenkit commented Jun 23, 2016

If anyone can visualize the seq2seq model with nngraph, please post back the graph plus the code used to achieve it; I've tried, but I can't get it to display anything.
When I do print(model) I only get
neuralconvo.Seq2Seq. Sorry, but I am new to this stuff.
Thanks in advance.

@mtanana

mtanana commented Jun 29, 2016

ok...confirmed, this sucker is working:

https://github.com/mtanana/torchneuralconvo

If you're working with these models, AdaGrad does wonders. The multilayer LSTM is nice as well. I've added a fixed vocab size and train/test splits.

In the next month or so I'm planning to add beam search to the decoding.

Let me know if anyone tries this and whether you see any issues

@vikram-gupta

Great work @mtanana !

It would be great if you could share some nice replies from the bot, or the perplexity.

@macournoyer
Owner

@mtanana I'm also curious about your results. Could you share a sample conversation?

Also, why ada-grad?

@mtanana

mtanana commented Jun 30, 2016

I posted a conversation earlier in this thread from a set of 200k examples. Right now I'm on the second epoch of a 9M example set, so it has a little way to go (early on, these models end up answering "I don't know" to a lot of prompts...). I'll share the results when I get another epoch or two down the road.

On perplexity: this is pretty dataset- and vocab-size-specific. But on my current set (9 million movie examples plus 300k from a domain-specific dataset, with a 30k vocab size) I have a perplexity of 7.46 at epoch 1.5 on the test set. That won't be comparable to other datasets, or to the same dataset with a different vocab size.

AdaGrad is pretty awesome. It has made drastic improvements on basically every NLP problem I've ever applied it to. The reason is that it scales each parameter's SGD update inversely to the square root of the running total of squared gradients for that parameter. In other words, if you have seen the word "I" 1,000 times, the updates to the parameters associated with that word are much smaller than for a new word you've never seen before. And it is so simple to implement (Duchi's paper looks messy, but the algorithm is crazy simple... and Torch can do it for you if you use the optim package). Many people think this should now just be the default for SGD-type problems, because it isn't any harder to implement than momentum. I've even found it improves the solutions to sparse Max-Ent models!
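
For anyone curious, here is a minimal sketch of an AdaGrad step with the optim package (model, criterion, input, and target are placeholders here, not the fork's actual code):

    local optim = require 'optim'

    local params, gradParams = model:getParameters()
    local adagradConfig = { learningRate = 0.01 }   -- illustrative value
    local adagradState  = {}                        -- accumulates squared gradients across steps

    -- feval returns the loss and the gradient w.r.t. params for one batch
    local function feval(x)
       if x ~= params then params:copy(x) end
       gradParams:zero()
       local output = model:forward(input)
       local loss = criterion:forward(output, target)
       model:backward(input, criterion:backward(output, target))
       return loss, gradParams
    end

    -- one AdaGrad update; reuse adagradState on every call
    optim.adagrad(feval, params, adagradConfig, adagradState)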

@macournoyer
Owner

Interesting. I'll take a look at ada-grad. Thx @mtanana.

Can't wait to see your results!

@wazzy

wazzy commented Jul 4, 2016

I followed the thread and I am still getting the same issue:

th train.lua --dataset 50000.0 --hiddenSize 1000

-- Loading dataset  
Loading vocabulary from data/vocab.t7 ...   

Dataset stats:  
  Vocabulary size: 25931    
         Examples: 83632    

-- Epoch 1 / 50 

/home/wasim/torch/install/bin/luajit: ...m/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: expecting target table
stack traceback:
    [C]: in function 'assert'
    ...m/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: in function 'forward'
    ./seq2seq.lua:74: in function 'train'
    train.lua:85: in main chunk
    [C]: in function 'dofile'
    ...asim/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405d70

I tried updating via luarocks and I tried reinstalling Torch, but no luck. Please help me.

I am trying out https://github.com/mtanana/torchneuralconvo and it is training, so I am not sure what is going wrong here.

@felixzfx

felixzfx commented Jul 4, 2016

I also ran into this problem; now I have fixed it.
I found it's because SequencerCriterion only accepts a table:
in the source code of SequencerCriterion.lua, most functions use code like this:

    for i, input in ipairs(inputTable) do
       ...
    end

So, currently, you have to feed data into the SequencerCriterion class as a table (not a tensor).
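
To illustrate what the assertion at SequencerCriterion.lua:42 is checking, here is a toy sketch (not the neuralconvo code; the wrapped criterion and the shapes are just placeholders): both the input and the target have to be Lua tables with one entry per time step.

    require 'rnn'

    local crit = nn.SequencerCriterion(nn.ClassNLLCriterion())

    local seqLen, nClasses = 3, 5
    local inputs, targets = {}, {}
    for t = 1, seqLen do
       inputs[t]  = torch.rand(nClasses):log()  -- fake log-probabilities for one step
       targets[t] = torch.random(nClasses)      -- target class index for that step
    end

    -- passing tables works; passing a plain tensor as the target
    -- triggers the "expecting target table" assertion
    local loss = crit:forward(inputs, targets)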

@wazzy

wazzy commented Jul 5, 2016

When I convert the tensor to a table in Lua,

      local inputTable = torch.totable(input)
      print(inputTable)
      local err = model:train(inputTable, target)

I get this error:

/home/wasim/torch/install/bin/luajit: /home/wasim/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 1 module of nn.Sequential:
/home/wasim/torch/install/share/lua/5.1/nn/LookupTable.lua:59: attempt to call method 'isContiguous' (a nil value)
stack traceback:
    /home/wasim/torch/install/share/lua/5.1/nn/LookupTable.lua:59: in function 'makeInputContiguous'
    /home/wasim/torch/install/share/lua/5.1/nn/LookupTable.lua:71: in function </home/wasim/torch/install/share/lua/5.1/nn/LookupTable.lua:68>
    [C]: in function 'xpcall'
    /home/wasim/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/wasim/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./seq2seq.lua:71: in function 'train'
    train.lua:88: in main chunk
    [C]: in function 'dofile'
    ...asim/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405d70

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    /home/wasim/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    /home/wasim/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./seq2seq.lua:71: in function 'train'
    train.lua:88: in main chunk
    [C]: in function 'dofile'
    ...asim/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405d70

@macournoyer
Owner

I think the original problem was resolved by updating nn and rnn:

   luarocks install nn
   luarocks install rnn

If not, feel free to re-open.

@kenkit

kenkit commented Jul 13, 2016

Hi again,
Is it possible to resume training, to save time in case of interruption?

@macournoyer
Owner

@kenkit not atm, but it could be an option to load the model like in eval.th and resume training on it instead of starting from scratch. Shouldn't be too hard to implement.
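
A rough sketch of what that could look like (the checkpoint path and the Seq2Seq constructor arguments are assumptions for illustration, not the repo's actual API):

    require 'torch'
    -- (neuralconvo, dataset, and options are assumed to be set up as in train.lua)

    local model
    if paths.filep("data/model.t7") then
       model = torch.load("data/model.t7")          -- resume from a saved checkpoint
    else
       model = neuralconvo.Seq2Seq(dataset.wordsCount, options.hiddenSize)
    end

    -- ...training loop as before, saving a checkpoint after each epoch:
    torch.save("data/model.t7", model)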

@PicassoCT

Hi,
I did update the nn and rnn packages as recommended.
It still results in this error:

/home/picassoct/torch/install/bin/luajit: ...t/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:47: expecting target table
stack traceback:
    [C]: in function 'assert'
    ...t/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:47: in function 'forward'
    ./seq2seq.lua:74: in function 'train'
    train.lua:86: in main chunk
    [C]: in function 'dofile'
    ...soct/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405d50

Can we have this reopened?

@kenkit

kenkit commented Mar 7, 2017

@PicassoCT That means that Lua is not able to access a file from the main script; make sure all your files exist and the names are OK.
