/rnn/SequencerCriterion.lua:42: expecting target table #31

Closed
svenwoldt opened this issue Jun 5, 2016 · 38 comments

@svenwoldt

Hello,
I tried to train neuralconvo with some variations of the following command:

$ th train.lua --opencl --dataset 5000 --hiddenSize 100

I am running with OpenCL (ATI Radeon) or plain CPU; both cause the following error:

libthclnn_searchpath /home/jack/torch-cl/install/lib/lua/5.1/libTHCLNN.so
-- Loading dataset
Loading vocabulary from data/vocab.t7 ...

Dataset stats:
Vocabulary size: 10682
Examples: 15877
Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
Using OpenCL device: Oland

-- Epoch 1 / 50

/home/jack/torch-cl/install/bin/luajit: ...k/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: expecting target table
stack traceback:
[C]: in function 'assert'
...k/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: in function 'forward'
./seq2seq.lua:80: in function 'train'
train.lua:85: in main chunk
[C]: in function 'dofile'
...k/torch-cl/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d30

Would you know what the issue is here? Unfortunately I am quite new to Lua.

@mtanana

mtanana commented Jun 5, 2016

Had the same issue. It happened right after I updated a couple of modules (including RNN)

Trying to work through it now...

@mtanana

mtanana commented Jun 5, 2016

The issue seems to be that the decoder is outputting a table of tensors instead of a tensor, and the criterion wants either all tables or all tensors. The weird thing is that in the encoder-decoder example on the rnn page, the decoder correctly outputs a tensor, even though that example is basically the same model as this one. This is a tough one!

@coreybobco

I had the same problem.

This was broken by this commit to the rnn dependency, Element-Research/rnn@a1373c4aaf8c3a40c41332b35559fd77a64b815b. If you check out rnn directly from GitHub, revert to the commit before that one (git checkout 14aff64132aa90339b6d510604a2b090f6509300), and then copy all of the .lua files from your rnn checkout directory into the Torch rnn dependency subdirectory (in your case /torch/install/share/lua/5.1/rnn/), it will work.

@mtanana

mtanana commented Jun 6, 2016

I got it working again with the newer commit, but I had to rewrite the input data so that it is a seqLength x 1 tensor instead of just a seqLength tensor. The code is something like this:

torch.Tensor({table_of_wordids}):t()
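
For example, a minimal sketch of the reshaping (table_of_wordids here is just an illustrative placeholder, not a variable from the repo):

    -- hypothetical word IDs for a 4-token sequence
    local table_of_wordids = {12, 45, 7, 3}

    -- old shape: a 1D tensor of size seqLength
    local input1d = torch.Tensor(table_of_wordids)        -- size 4

    -- new shape: a seqLength x 1 (2D) tensor, which the updated rnn code expects
    local input2d = torch.Tensor({table_of_wordids}):t()  -- size 4 x 1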

I also integrated some of the newer features from the rnn example.

I'm testing it now...if it works I'll post the fork (feel free to message me if you'd like to join in the testing fun)

As a side note, I'm also rewriting the import script... it can't handle files larger than the LuaJIT VM memory limit (there are tricks to get around this), and there is a bug if you try to shrink the vocab size.

@kenkit

kenkit commented Jun 6, 2016

Cool. I've just been training on my 2.7 GHz CPU without a GPU, which is slow even when you limit --dataset 5 --hiddenSize 5.
It takes 2 hours to finish 1 epoch.
But I got impatient waiting.
And got this:

[screenshot]

which means I need a good GPU.

@mtanana

mtanana commented Jun 6, 2016

Yeah, it takes a long time to run these models without a good GPU. The smallest decent seq2seq model in the literature is 1 layer of 1000 units... so maybe it's time to buy yourself an early Christmas present!

@kenkit

kenkit commented Jun 7, 2016

By the way, I am planning on creating a neural net that predicts matches after training it on data from previous games. I don't know how hard this will be as a beginner in neural nets, since the maths involved is way beyond me.
Otherwise I'm not such a bad programmer. If you can give me hints on how to go about it (some math topics), that would be great.

@svenwoldt
Author

Sorry, I have now dug into the suggestions extensively. For the first option, downgrading rnn: I suppose it depends on changes to nn as well, and that you need to check out rnn together with an nn version from before the last commit to this repository to make it work.

The second option is not clear to me either; could you be somewhat more specific regarding your implementation, mtanana?

@mtanana

mtanana commented Jun 7, 2016

So I don't know where the change came from, but the long and the short of it is that if you just change the inputs from a 1-dimensional tensor to a 2-dimensional tensor, it fixes things.

So anywhere you see torch.Tensor(table), change it to torch.Tensor({table}):t(), fix some of the outputs, and it works.

If you guys can hold on a couple of days, I can post my fork that has AdaGrad, runs a test set every so often, and can do multilayer LSTMs.

@svenwoldt
Author

Can't wait for it!!!

@vikram-gupta

vikram-gupta commented Jun 12, 2016

@mtanana Can't wait for the change !! Let us know when we can play with it :)

@mtanana

mtanana commented Jun 13, 2016

It's in the testing stage right now. I'm trying it with 9M talk turns from OpenSubtitles. I had some success with 200k talk turns.
2 layers, 1000 LSTM cells per layer, AdaGrad, and 4 epochs:

Hi : Hello.

What is your name : What happened to you.

How old are you : Thirteen

What is the meaning of life : The getman is a sin of the earth.

Do you like swimming : I have to go to bed

It's been a long day : Like a woman, a star

goodbye : Good evening, maam.

@mtanana

mtanana commented Jun 13, 2016

If anyone wants to play with it while the bugs are still being fixed, I'd be happy to put it up now. But I should warn that the code has been going through big changes on a daily basis

@vikram-gupta

Hey @mtanana, the results are looking nice. I would like to try it out. Can you please put it up (with a warning, though)?

@mtanana

mtanana commented Jun 13, 2016

sure....

https://github.com/mtanana/torchneuralconvo

Let me know if you hit any bugs. I won't be able to handle a lot of pull requests until it's stabilized, though.

@vikram-gupta

Thanks @mtanana, this is great!

Will keep you posted with any bugs that I encounter.

@vikram-gupta

vikram-gupta commented Jun 13, 2016

Hi @macournoyer @chenb67

I checked out the 13th May commit and it works perfectly without any changes. However, the latest code breaks. I think the issue might have been introduced into our code after the 13th May commits. Any suggestions on how to run the updated code?

@chenb67
Contributor

chenb67 commented Jun 16, 2016

Hi @vikram-gupta,
What error are you getting?
Did you try updating the torch + rnn packages (we are using the new nn.Select)?

If not, try:

    luarocks install torch
    luarocks install nn
    luarocks install rnn

@vikram-gupta

vikram-gupta commented Jun 16, 2016

Thanks @chenb67, it worked after the updates, but I am getting an out-of-memory error on the same dataset now.

I am using the following command: th train.lua --cuda --dataset 50000 --hiddenSize 1000

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-2235/cutorch/lib/THC/generic/THCStorage.cu line=41 error=2 : out of memory
/home/ubuntu/DeepLearning/torch/install/bin/lua: ...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:458: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-2235/cutorch/lib/THC/generic/THCStorage.cu:41
stack traceback:
[C]: in function 'resizeAs'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:458: in function 'momentumGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:485: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
...DeepLearning/torch/install/share/lua/5.2/dpnn/Module.lua:478: in function 'updateGradParameters'
./seq2seq.lua:81: in function 'train'
train.lua:88: in main chunk
[C]: in function 'dofile'
...ning/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: in ?

@chenb67
Contributor

chenb67 commented Jun 16, 2016

@vikram-gupta The default now is a batch size of 10; before, it was 1.
I suggest trying to decrease --hiddenSize or using a smaller --batchSize.
Try to find a balance between speed and size (batchSize 1 is very slow).
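
For example, something like this (the numbers are purely illustrative, not recommended settings):

    th train.lua --cuda --dataset 50000 --hiddenSize 500 --batchSize 5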

@vikram-gupta

Thanks @chenb67

Are you aware of any fundamental reasons for, or solutions to, the out-of-memory issue? The Google papers mention that they use 1000 hidden units, and that with a much bigger batch size and with 4 layers.

Do we need a better machine than an Amazon EC2 instance with a 4 GB Nvidia GPU? Are you able to run this on your machines with 1000 hidden units and batch size 10? If yes, what configurations are you using?

@vikram-gupta

@chenb67 Correct me if I am wrong, but batch size would also impact the quality of the models. With a batch size of 1, we are doing online training, which may be noisy!

@chenb67
Contributor

chenb67 commented Jun 16, 2016

The Google paper uses a number of GPUs (I think 8).

A big issue with the current settings is the vocabulary size, which makes the softmax layer huge, and due to a bug we can't control the vocabulary size.
I'm working on it and will submit a PR soon.

@mtanana

mtanana commented Jun 16, 2016

Hey all-

I've got a fork that can control vocab size

https://github.com/mtanana/torchneuralconvo

It seems to be working OK at this point, but let me know.

@kenkit

kenkit commented Jun 23, 2016

If anyone can visualize the seq2seq model with nngraph, please post back the graph plus the code used to achieve it; I've tried, but I can't get it to display anything.
When I do print(model) I only get
neuralconvo.Seq2Seq. Sorry, but I am new to this stuff.
Thanks in advance.

@mtanana

mtanana commented Jun 29, 2016

ok...confirmed, this sucker is working:

https://github.com/mtanana/torchneuralconvo

If you're working with these models, AdaGrad does wonders. The multilayer LSTM is nice as well. I've added a fixed vocab size and train/test splits.

In the next month or so I'm planning to add beam search to the decoding.

Let me know if anyone tries this and whether you see any issues

@vikram-gupta

Great work @mtanana !

It would be great if you could share some nice replies from the bot, or the perplexity.

@macournoyer
Owner

@mtanana I'm also curious about your results. Could you share a sample conversation?

Also, why ada-grad?

@mtanana

mtanana commented Jun 30, 2016

I posted a conversation earlier in this thread from a set of 200k examples. Right now I'm on the second epoch of a 9M example set, so it has a little way to go (early on, these models end up answering "I don't know" to a lot of prompts...). I'll share the results when I get another epoch or two down the road.

On perplexity: this is pretty dataset- and vocab-size-specific. But on my current set (9 million movie examples plus 300k from a domain-specific dataset, with a 30k vocab size) I have a perplexity of 7.46 at epoch 1.5 on the test set. That won't be comparable to other datasets, or to the same dataset with a different vocab size.

AdaGrad is pretty awesome. It has made drastic improvements on basically every NLP problem I've ever applied it to. The reason is that it scales each parameter's SGD update inversely to the square root of the running total of squared gradients for that parameter. In other words, if you have seen the word "I" 1,000 times, the updates to the parameters associated with that word are much smaller than for a new word you've never seen before. And it is so simple to implement (Duchi's paper looks messy, but the algorithm is crazy simple... and Torch can do it for you if you use the optim package). Many people think this should now just be the default for SGD-type problems, because it isn't any harder to implement than momentum. I've even found it improves the solutions to sparse Max-Ent models!
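
For anyone curious, here is a minimal sketch of an AdaGrad step with the optim package (model, criterion, input, and target are placeholders here, not the fork's actual code):

    local optim = require 'optim'

    local params, gradParams = model:getParameters()
    local adagradConfig = { learningRate = 0.01 }   -- illustrative value
    local adagradState  = {}                        -- accumulates squared gradients across steps

    -- feval returns the loss and the gradient w.r.t. params for one batch
    local function feval(x)
       if x ~= params then params:copy(x) end
       gradParams:zero()
       local output = model:forward(input)
       local loss = criterion:forward(output, target)
       model:backward(input, criterion:backward(output, target))
       return loss, gradParams
    end

    -- one AdaGrad update; reuse adagradState on every call
    optim.adagrad(feval, params, adagradConfig, adagradState)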

@macournoyer
Owner

Interesting. I'll take a look at ada-grad. Thx @mtanana.

Can't wait to see your results!

@wazzy

wazzy commented Jul 4, 2016

I followed the thread and I am still getting the same issue:

th train.lua --dataset 50000.0 --hiddenSize 1000

-- Loading dataset  
Loading vocabulary from data/vocab.t7 ...   

Dataset stats:  
  Vocabulary size: 25931    
         Examples: 83632    

-- Epoch 1 / 50 

/home/wasim/torch/install/bin/luajit: ...m/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: expecting target table
stack traceback:
    [C]: in function 'assert'
    ...m/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:42: in function 'forward'
    ./seq2seq.lua:74: in function 'train'
    train.lua:85: in main chunk
    [C]: in function 'dofile'
    ...asim/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405d70

I tried updating via luarocks and I tried reinstalling Torch, but no luck. Please help me.

I am trying out https://github.com/mtanana/torchneuralconvo and it is training, so I am not sure what is going wrong here.

@felixzfx

felixzfx commented Jul 4, 2016

I also ran into this problem; now I have fixed it.
I found it's because SequencerCriterion only accepts a table:
in the source code of SequencerCriterion.lua, most functions use code like this:

    for i, input in ipairs(inputTable) do
       ...
    end

So, currently, you have to feed data into the SequencerCriterion class as a table (not a tensor).
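
To illustrate what the assertion at SequencerCriterion.lua:42 is checking, here is a toy sketch (not the neuralconvo code; the wrapped criterion and the shapes are just placeholders): both the input and the target have to be Lua tables with one entry per time step.

    require 'rnn'

    local crit = nn.SequencerCriterion(nn.ClassNLLCriterion())

    local seqLen, nClasses = 3, 5
    local inputs, targets = {}, {}
    for t = 1, seqLen do
       inputs[t]  = torch.rand(nClasses):log()  -- fake log-probabilities for one step
       targets[t] = torch.random(nClasses)      -- target class index for that step
    end

    -- passing tables works; passing a plain tensor as the target
    -- triggers the "expecting target table" assertion
    local loss = crit:forward(inputs, targets)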

@wazzy

wazzy commented Jul 5, 2016

When I convert the tensor to a table in Lua,

      local inputTable = torch.totable(input)
      print(inputTable)
      local err = model:train(inputTable, target)

I get this error:

/home/wasim/torch/install/bin/luajit: /home/wasim/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 1 module of nn.Sequential:
/home/wasim/torch/install/share/lua/5.1/nn/LookupTable.lua:59: attempt to call method 'isContiguous' (a nil value)
stack traceback:
    /home/wasim/torch/install/share/lua/5.1/nn/LookupTable.lua:59: in function 'makeInputContiguous'
    /home/wasim/torch/install/share/lua/5.1/nn/LookupTable.lua:71: in function </home/wasim/torch/install/share/lua/5.1/nn/LookupTable.lua:68>
    [C]: in function 'xpcall'
    /home/wasim/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/wasim/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./seq2seq.lua:71: in function 'train'
    train.lua:88: in main chunk
    [C]: in function 'dofile'
    ...asim/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405d70

WARNING: If you see a stack trace below, it doesn't point to the place where this error occured. Please use only the one above.
stack traceback:
    [C]: in function 'error'
    /home/wasim/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
    /home/wasim/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    ./seq2seq.lua:71: in function 'train'
    train.lua:88: in main chunk
    [C]: in function 'dofile'
    ...asim/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405d70

@macournoyer
Owner

I think the original problem was resolved by updating nn and rnn:

   luarocks install nn
   luarocks install rnn

If not, feel free to re-open.

@kenkit

kenkit commented Jul 13, 2016

Hi again,
Is it possible to resume training, to save time in case of interruption?

@macournoyer
Owner

@kenkit not atm, but it could be an option to load the model like in eval.th and resume training on it instead of starting from scratch. Shouldn't be too hard to implement.
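
A rough sketch of what that could look like (the checkpoint path and the Seq2Seq constructor arguments are assumptions for illustration, not the repo's actual API):

    require 'torch'
    -- (neuralconvo, dataset, and options are assumed to be set up as in train.lua)

    local model
    if paths.filep("data/model.t7") then
       model = torch.load("data/model.t7")          -- resume from a saved checkpoint
    else
       model = neuralconvo.Seq2Seq(dataset.wordsCount, options.hiddenSize)
    end

    -- ...training loop as before, saving a checkpoint after each epoch:
    torch.save("data/model.t7", model)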

@PicassoCT

Hi,
I did update the nn and rnn packages as recommended.
It still results in this error:

/home/picassoct/torch/install/bin/luajit: ...t/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:47: expecting target table
stack traceback:
    [C]: in function 'assert'
    ...t/torch/install/share/lua/5.1/rnn/SequencerCriterion.lua:47: in function 'forward'
    ./seq2seq.lua:74: in function 'train'
    train.lua:86: in main chunk
    [C]: in function 'dofile'
    ...soct/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00405d50

Can we have this reopened?

@kenkit

kenkit commented Mar 7, 2017

@PicassoCT That means that Lua is not able to access a file from the main script; make sure all your files exist and the names are OK.
