Sequence labelling with rnn #21
Hi @shuokay, thank you for your question. So if I understand correctly, you want to input a sequence of images (handwritten characters) into the LSTM to predict a sequence of characters (using SoftMax)?
@nicholas-leonard thanks for your reply. The input is a sequence of points that each stroke in one character passes through, not images, and the output is ONE character. ('Online' means you can get every point of a stroke exactly, while 'offline' means you cannot get the points at all; all you get is an image.)
For many-to-one problems, you can use this kind of architecture:

rnn = nn.Sequential()
rnn:add(nn.Sequencer(nn.Recurrent(...)))
rnn:add(nn.SelectTable(-1))
rnn:add(nn.Linear(...))
rnn:add(nn.LogSoftMax())

So basically, the input is a sequence of points (coordinates or whatever) and the output is the log-likelihood of a character. The key is the nn.SelectTable(-1), which keeps only the output of the last step in the sequence.
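For illustration, here is one hedged way the elided arguments might be filled in, assuming the nn.Recurrent(start, input, feedback, transfer, rho) constructor from the rnn package and arbitrary sizes chosen for the example:

require 'rnn'
inputSize, hiddenSize, nClass, rho = 2, 100, 10, 5
r = nn.Recurrent(
   hiddenSize,                          -- start: size of the initial hidden state
   nn.Linear(inputSize, hiddenSize),    -- input layer applied at every step
   nn.Linear(hiddenSize, hiddenSize),   -- feedback layer applied to the previous hidden state
   nn.Sigmoid(),                        -- transfer function
   rho                                  -- maximum number of steps to backpropagate through time
)
rnn = nn.Sequential()
rnn:add(nn.Sequencer(r))
rnn:add(nn.SelectTable(-1))             -- keep only the last step's output
rnn:add(nn.Linear(hiddenSize, nClass))
rnn:add(nn.LogSoftMax())
-- a Sequencer takes a table of per-step inputs
inputs = {torch.randn(inputSize), torch.randn(inputSize), torch.randn(inputSize)}
print(rnn:forward(inputs))              -- log-likelihoods over nClass characters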
@nicholas-leonard thank you very much. Just one suggestion: Torch and its packages need more detailed documentation. ^_^
Hi @nicholas-leonard, I still get errors. My code looks like this:

require 'dp'
require 'torch'
rho = 5
hiddenSize = 100
inputSize = 2
outputSize = 10
lstm = nn.LSTM(inputSize, hiddenSize, rho)
model = nn.Sequential()
model:add(lstm)
model:add(nn.SelectTable(-1))
model:add(nn.Linear(hiddenSize, outputSize))
model:add(nn.LogSoftMax())
criterion = nn.ClassNLLCriterion()
input = torch.Tensor({{0,0,1,1,2,2,3,3}}) -- the input points in one stroke: (0,0),(1,1),(2,2),(3,3)
output = torch.Tensor({1,0,0,0,0,0,0,0,0,0})
model:forward(input[1])

and it gives errors:
require 'dp'
require 'torch'
hiddenSize = 100
inputSize = 2
outputSize = 10
lstm = nn.LSTM(inputSize, hiddenSize)
model = nn.Sequential()
model:add(nn.SplitTable(1,2)) -- split the seqLen x inputSize tensor into a table of seqLen input vectors
model:add(nn.Sequencer(lstm))
model:add(nn.SelectTable(-1)) -- keep only the last step's output
model:add(nn.Linear(hiddenSize, outputSize))
model:add(nn.LogSoftMax())
criterion = nn.ClassNLLCriterion()
input = torch.Tensor({{0,0},{1,1},{2,2},{3,3}}) -- the input points in one stroke: (0,0),(1,1),(2,2),(3,3)
output = torch.Tensor({1,0,0,0,0,0,0,0,0,0})
model:forward(input)
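For completeness, a minimal sketch of a single training step on the model above. Note that nn.ClassNLLCriterion expects the target as a class index rather than a one-hot vector, so the one-hot output tensor from the snippet is replaced here by the index 1 (assumed to be the intended character class):

-- one training step on the model defined above (the learning rate is arbitrary)
target = 1                        -- class index expected by nn.ClassNLLCriterion
pred = model:forward(input)       -- input is the 4x2 tensor of stroke points
err = criterion:forward(pred, target)
model:zeroGradParameters()
gradPred = criterion:backward(pred, target)
model:backward(input, gradPred)
model:updateParameters(0.01)
model:forget()                    -- clear the recurrent state before the next, independent sequence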
@nicholas-leonard everything works now, thank you very much. I will close this issue.
@nicholas-leonard I have a question along similar lines. Do you mind telling me what is wrong with my training here? In this toy example, the RNN doesn't learn the sum of a series well.

-- more imports than necessary
require 'nn'
require 'cunn'
require 'rnn'
require 'dp'
require 'cutorch'
-- set to a small learning rate due to possible explosion
lr = .00001
-- roll lstm over the numbers in the sequence, select last output layer, apply linear model
rnn = nn.Sequential()
lstm = nn.FastLSTM( 1,500)
rnn:add( nn.Sequencer( lstm) )
rnn:add( nn.SelectTable(-1) )
rnn:add( nn.Linear(500,1) )
rnn:cuda()
criterion = nn.MSECriterion():cuda()
-- random numbers that are scaled to make the problem a little harder
inputs = torch.rand(100,200)
for i=1,100 do
inputs[i] = inputs[i]* i
end
inputs = inputs:cuda()
targets = inputs:sum(2):cuda()
baseline = targets:std()^2
print( baseline ) -- usually around 8,600,000
for i =1,10000 do
rnn:training()
errors = {}
for j = 1,100 do
--get input row and target
local input = inputs[j]:split(1)
local target = targets[j]
-- print( target)
local output = rnn:forward( input)
local err = criterion:forward( output, target)
table.insert( errors, err)
local gradOutputs = criterion:backward(output, target)
rnn:backward(input, gradOutputs)
rnn:updateParameters(lr)
rnn:zeroGradParameters()
rnn:forget()
end
print ( torch.mean( torch.Tensor( errors)) )
end
@flybass Your problem is very hard. Maybe the model needs more capacity. You could try adding more hidden units (1000 instead of 500), and/or stacking another LSTM on top of the first one. Also, you could try it with more examples; 100 is not a lot. Finally, the data is unbounded, so a sum could be 1 or it could be 10000. If you could find a way to keep it within a range, say between -1 and 1, that might help. To do so, you could normalize the dataset so that the min and max sums map to -1 and 1 respectively.
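For instance, a minimal sketch of that rescaling, reusing the targets tensor from the snippet above (the exact mapping is just one possible choice):

-- rescale the targets so that the smallest sum maps to -1 and the largest to 1
minT, maxT = targets:min(), targets:max()
targets:add(-minT):div(maxT - minT)   -- now in [0, 1]
targets:mul(2):add(-1)                -- now in [-1, 1]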
@nicholas-leonard Thanks for the advice (and the super fast response). I'll see if normalization makes a difference here; at least I'm using the modules correctly. I added clipping on the gradients and managed to get the MSE much lower than the baseline. However, I still suspect the LSTM should be able to do a better job of learning addition (adding more hidden units seems like a lot of parameters for such a simple function). I'm working on an analogous problem with very long documents (but with word embeddings as the first layer). Are there any tips on initialization for these models? I see many use cases that set the weights to uniform values. Thanks again.
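For what it's worth, here is a rough sketch of one simple way to clip gradients in the loop above (assuming the flattened gradients are fetched once before training; the threshold of 5 is arbitrary):

-- fetch the flattened parameters and gradients once, before the training loop
params, gradParams = rnn:getParameters()
-- ... then, inside the inner loop, right after rnn:backward(input, gradOutputs):
gradParams:clamp(-5, 5)   -- element-wise gradient clipping
rnn:updateParameters(lr)
rnn:zeroGradParameters()
rnn:forget()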
@flybass You can try the default initialization, or getParameters():uniform(0.1) (or something around that number). It's empirical, so trial and error, but I usually find uniform 0.1 works for me.
@nicholas-leonard I spoke to a professor about this issue and he suggested initializing the LSTM weights so that they are orthogonal. I'll compare this method.
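For reference, a rough sketch of that idea in Torch (an assumption about how it might be done with torch.qr; it is not code from this thread, and the helper name is made up):

-- hypothetical helper: a random matrix with orthonormal rows or columns, via QR
function orthogonalWeight(rows, cols)
   local a = torch.randn(math.max(rows, cols), math.min(rows, cols))
   local q = torch.qr(a)               -- thin QR; q has orthonormal columns
   if rows < cols then q = q:t() end   -- use orthonormal rows when the matrix is wide
   return q:contiguous()
end

-- apply it to every Linear weight inside the container
-- (assuming findModules reaches the Linear layers inside the LSTM)
for _, m in ipairs(rnn:findModules('nn.Linear')) do
   m.weight:copy(orthogonalWeight(m.weight:size(1), m.weight:size(2)))
end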
@flybass let me know if that works best. Also, if it does work, you could add a
Is there a way to initialize the rnn for a one-to-many sequence? I am building a single-input to six-output system. Everything is fine until I call
I train as follows in mini-batches
I am getting errors when I call
/home/local/ANT/ogunmolu/torch/install/bin/luajit: ...l/ANT/ogunmolu/torch/install/share/lua/5.1/nn/Linear.lua:75: size mismatch, m1: [6 x 1], m2: [6 x 1] at /home/local/ANT/ogunmolu/torch/pkg/torch/lib/TH/generic/THTensorMath.c:706
stack traceback:
[C]: in function 'addmm'
...l/ANT/ogunmolu/torch/install/share/lua/5.1/nn/Linear.lua:75: in function 'updateGradInput'
...T/ogunmolu/torch/install/share/lua/5.1/nn/Sequential.lua:55: in function 'updateGradInput'
...NT/ogunmolu/torch/install/share/lua/5.1/rnn/Recursor.lua:45: in function '_updateGradInput'
...lu/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:46: in function 'updateGradInput'
...T/ogunmolu/torch/install/share/lua/5.1/rnn/Sequencer.lua:78: in function 'updateGradInput'
...l/ANT/ogunmolu/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
rnn.lua:475: in function 'train'
rnn.lua:688: in main chunk
[C]: in function 'dofile'
...molu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406240
I would appreciate your help! I left a gist here in case you are interested in the code.
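For reference, a minimal one-to-many sketch using nn.Repeater from the rnn package (all sizes are made up; this is not the code from the gist):

require 'rnn'
inputSize, hiddenSize, outputSize, nStep = 1, 20, 1, 6
net = nn.Sequential()
net:add(nn.Repeater(nn.FastLSTM(inputSize, hiddenSize), nStep)) -- feed the single input 6 times
net:add(nn.Sequencer(nn.Linear(hiddenSize, outputSize)))        -- one prediction per step
crit = nn.SequencerCriterion(nn.MSECriterion())
input = torch.randn(inputSize)                                  -- one input
targets = {}
for i = 1, nStep do targets[i] = torch.randn(outputSize) end    -- six targets
outputs = net:forward(input)                                    -- table of 6 outputs
err = crit:forward(outputs, targets)
net:zeroGradParameters()
net:backward(input, crit:backward(outputs, targets))
net:updateParameters(0.01)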
I want to recognize online handwritten characters with your LSTM. Is there an example? The example in https://github.com/nicholas-leonard/dp/blob/master/examples/recurrentlanguagemodel.lua is a language model and is not suited to my task.