
Question about Sequencer.lua #61

Closed

helson73 opened this issue Jan 10, 2017 · 4 comments

helson73 commented Jan 10, 2017

I've implemented a recursive net and initialized a Sequencer with it (also using the memory optimizer).
The source code is:

require('nngraph')
local RVNN, parent = torch.class('onmt.RVNN', 'nn.Container')

function RVNN:__init (outSize, relDim, numRel, dropout)
  parent.__init(self)
  self.outSize = outSize
  self.relDim = relDim
  self.numRel = numRel
  self.dropout = dropout
  self.net = self:_buildModel()
  self:add(self.net)
end

function RVNN:_buildModel ()
  local model = nn.Linear(self.outSize*2+self.relDim, self.outSize, true)
  local emb = nn.LookupTable(self.numRel, self.relDim)
  local inputs = {nn.Identity()(), nn.Identity()(), nn.Identity()()}
  local rel = emb(inputs[3])
  local proj = nn.JoinTable(2)({inputs[1], inputs[2], rel})
  if self.dropout > 0 then
    proj = onmt.BayesianDropout(self.dropout, 'recursive')(proj)
  end
  local out = nn.Tanh()(model(proj))
  return nn.gModule(inputs, {out})
end

function RVNN:updateOutput(input)
  self.output = self.net:updateOutput(input)
  return self.output
end

function RVNN:updateGradInput(input, gradOutput)
  return self.net:updateGradInput(input, gradOutput)
end

function RVNN:accGradParameters(input, gradOutput, scale)
  return self.net:accGradParameters(input, gradOutput, scale)
end

But I found that it returns a gradient with zero dimension.
I had to change the updateGradInput function to:

function RVNN:updateGradInput(input, gradOutput)
  self.gradInput = self.net:updateGradInput(input, gradOutput)
  return self.gradInput
end

which is not necessary in LSTM.lua.
I can't find any difference between a Sequencer wrapping an LSTM and a Sequencer wrapping my recursive net.
I am wondering: in the current Sequencer implementation, how is self.gradInput redirected to self.net.gradInput?
Thanks.

@guillaumekln (Collaborator)

How do you initialize the Sequencer?

helson73 (Author) commented Jan 13, 2017

@guillaumekln
I initialize the Sequencer like this:

local Tree, parent = torch.class('onmt.Tree', 'onmt.Sequencer')

function Tree:__init (rvnn)
  self.rvnn = rvnn
  parent.__init(self, self.rvnn)
  self:resetPreallocation()
end

function Tree.load(pretrained)
  local self = torch.factory('onmt.Tree')()
  self.rvnn = pretrained.modules[1]
  parent.__init(self, self.rvnn)
  self:resetPreallocation()
  return self
end

function Tree:training()
  parent.training(self)
end

function Tree:evaluate()
  parent.evaluate(self)
end

function Tree:serialize()
  return {
    modules = self.modules
  }
end

function Tree:maskPadding()
  self.maskPad = true
end

function Tree:resetPreallocation()
  self.headProto = torch.Tensor()
  self.depProto = torch.Tensor()
  self.gradFeedProto = torch.Tensor()
end

function Tree:forward(batch, f2s_)
  if self.train then
    self.inputs = {}
    self:_reset_noise()
  end

  local head_ = onmt.utils.Tensor.reuseTensor(self.headProto,
                                              {batch.size, self.rvnn.outSize})
  local dep_ = onmt.utils.Tensor.reuseTensor(self.depProto,
                                              {batch.size, self.rvnn.outSize})

  for t = 1, batch.headLength do
    onmt.utils.DepTree._get(head_, f2s_, batch.head[t])
    onmt.utils.DepTree._get(dep_, f2s_, batch.dep[t])
    local tree_input = {head_, dep_, batch.relation[t]}
    if self.train then
      self.inputs[t] = tree_input
    end
    onmt.utils.DepTree._set(f2s_, self:net(t):forward(tree_input), batch.update[t])
  end
  return f2s_
end

function Tree:backward(batch, gradFeedOutput)
  local gradFeed_ = onmt.utils.Tensor.reuseTensor(self.gradFeedProto,
                                                  {batch.size, self.rvnn.outSize})
  for t = batch.headLength, 1, -1 do
    onmt.utils.DepTree._get(gradFeed_, gradFeedOutput, batch.update[t])
    local dtree = self:net(t):backward(self.inputs[t], gradFeed_)
    onmt.utils.DepTree._add(gradFeedOutput, dtree[1], batch.head[t])
    onmt.utils.DepTree._add(gradFeedOutput, dtree[2], batch.dep[t])
    onmt.utils.DepTree._fill(gradFeedOutput, 0, batch.update[t])
  end
  return gradFeedOutput
end

guillaumekln (Collaborator) commented Jan 13, 2017

The difference is that you call backward on the RVNN module because it is the one exposed by the Sequencer. As you don't override the backward function, the definition from nn.Module is used:

function Module:backward(input, gradOutput, scale)
   scale = scale or 1
   self:updateGradInput(input, gradOutput)
   self:accGradParameters(input, gradOutput, scale)
   return self.gradInput
end

which expects self.gradInput not to be nil.

On the other hand, the LSTM module is not directly exposed by the Sequencer and it only relies on updateGradInput's return value. See https://github.com/torch/nngraph/blob/master/gmodule.lua#L420 which is called on each node in the graph.
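
To make this concrete, here is a minimal sketch (not code from this repository) of an equivalent fix: override backward on the container itself so that self.gradInput is filled in before it is returned, mirroring what nn.Module:backward expects.

-- Hypothetical alternative to patching updateGradInput: override backward on
-- the container so self.gradInput is set before it is returned.
function RVNN:backward(input, gradOutput, scale)
  scale = scale or 1
  self.gradInput = self.net:updateGradInput(input, gradOutput)
  self.net:accGradParameters(input, gradOutput, scale)
  return self.gradInput
end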


However, these lines:

self.gradInput = self.net:updateGradInput(input, gradOutput)
return self.gradInput

should also appear in the LSTM module for consistency.
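
For reference, a minimal sketch of that consistency change in LSTM.lua (assuming the LSTM class wraps its graph in self.net, as RVNN does above) could look like this:

-- Sketch of the suggested consistency change: store the wrapped network's
-- gradient in self.gradInput so that nn.Module:backward returns it correctly.
function LSTM:updateGradInput(input, gradOutput)
  self.gradInput = self.net:updateGradInput(input, gradOutput)
  return self.gradInput
end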

So thank you for your question.

helson73 (Author)

@guillaumekln That's really helpful. Thanks!
