
Question about Sequencer.lua #61

Closed

helson73 opened this issue Jan 10, 2017 · 4 comments

helson73 commented Jan 10, 2017

I've implemented a recursive net and initialized a Sequencer with it (also using the memory optimizer).
The source code is:

require('nngraph')
local RVNN, parent = torch.class('onmt.RVNN', 'nn.Container')

function RVNN:__init (outSize, relDim, numRel, dropout)
  parent.__init(self)
  self.outSize = outSize
  self.relDim = relDim
  self.numRel = numRel
  self.dropout = dropout
  self.net = self:_buildModel()
  self:add(self.net)
end

function RVNN:_buildModel ()
  local model = nn.Linear(self.outSize*2+self.relDim, self.outSize, true)
  local emb = nn.LookupTable(self.numRel, self.relDim)
  local inputs = {nn.Identity()(), nn.Identity()(), nn.Identity()()}
  local rel = emb(inputs[3])
  local proj = nn.JoinTable(2)({inputs[1], inputs[2], rel})
  if self.dropout > 0 then
    proj = onmt.BayesianDropout(self.dropout, 'recursive')(proj)
  end
  local out = nn.Tanh()(model(proj))
  return nn.gModule(inputs, {out})
end

function RVNN:updateOutput(input)
  self.output = self.net:updateOutput(input)
  return self.output
end

function RVNN:updateGradInput(input, gradOutput)
  return self.net:updateGradInput(input, gradOutput)
end

function RVNN:accGradParameters(input, gradOutput, scale)
  return self.net:accGradParameters(input, gradOutput, scale)
end

But I found that it returns a gradient with zero dimension.
I had to change the updateGradInput function to:

function RVNN:updateGradInput(input, gradOutput)
  self.gradInput = self.net:updateGradInput(input, gradOutput)
  return self.gradInput
end

which is not necessary in LSTM.lua.
I can't find any difference between a Sequencer wrapping an LSTM and a Sequencer wrapping my recursive net.
I am wondering: in the current Sequencer implementation, how is self.gradInput redirected to self.net.gradInput?
Thanks.

@guillaumekln (Collaborator)

How do you initialize the Sequencer?

helson73 (Author) commented Jan 13, 2017

@guillaumekln
I initialize the Sequencer like this:

local Tree, parent = torch.class('onmt.Tree', 'onmt.Sequencer')

function Tree:__init (rvnn)
  self.rvnn = rvnn
  parent.__init(self, self.rvnn)
  self:resetPreallocation()
end

function Tree.load(pretrained)
  local self = torch.factory('onmt.Tree')()
  self.rvnn = pretrained.modules[1]
  parent.__init(self, self.rvnn)
  self:resetPreallocation()
  return self
end

function Tree:training()
  parent.training(self)
end

function Tree:evaluate()
  parent.evaluate(self)
end

function Tree:serialize()
  return {
    modules = self.modules
  }
end

function Tree:maskPadding()
  self.maskPad = true
end

function Tree:resetPreallocation()
  self.headProto = torch.Tensor()
  self.depProto = torch.Tensor()
  self.gradFeedProto = torch.Tensor()
end

function Tree:forward(batch, f2s_)
  if self.train then
    self.inputs = {}
    self:_reset_noise()
  end

  local head_ = onmt.utils.Tensor.reuseTensor(self.headProto,
                                              {batch.size, self.rvnn.outSize})
  local dep_ = onmt.utils.Tensor.reuseTensor(self.depProto,
                                              {batch.size, self.rvnn.outSize})

  for t = 1, batch.headLength do
    onmt.utils.DepTree._get(head_, f2s_, batch.head[t])
    onmt.utils.DepTree._get(dep_, f2s_, batch.dep[t])
    local tree_input = {head_, dep_, batch.relation[t]}
    if self.train then
      self.inputs[t] = tree_input
    end
    onmt.utils.DepTree._set(f2s_, self:net(t):forward(tree_input), batch.update[t])
  end
  return f2s_
end

function Tree:backward(batch, gradFeedOutput)
  local gradFeed_ = onmt.utils.Tensor.reuseTensor(self.gradFeedProto,
                                                  {batch.size, self.rvnn.outSize})
  for t = batch.headLength, 1, -1 do
    onmt.utils.DepTree._get(gradFeed_, gradFeedOutput, batch.update[t])
    local dtree = self:net(t):backward(self.inputs[t], gradFeed_)
    onmt.utils.DepTree._add(gradFeedOutput, dtree[1], batch.head[t])
    onmt.utils.DepTree._add(gradFeedOutput, dtree[2], batch.dep[t])
    onmt.utils.DepTree._fill(gradFeedOutput, 0, batch.update[t])
  end
  return gradFeedOutput
end

guillaumekln (Collaborator) commented Jan 13, 2017

The difference is that you call backward on the RVNN module because it is the one exposed by the Sequencer. As you don't override the backward function, the definition from nn.Module is used:

function Module:backward(input, gradOutput, scale)
   scale = scale or 1
   self:updateGradInput(input, gradOutput)
   self:accGradParameters(input, gradOutput, scale)
   return self.gradInput
end

which expects self.gradInput not to be nil.

On the other hand, the LSTM module is not directly exposed by the Sequencer and it only relies on updateGradInput's return value. See https://github.com/torch/nngraph/blob/master/gmodule.lua#L420 which is called on each node in the graph.
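
To make this concrete, here is a minimal sketch (not code from this repository) of an equivalent fix: override backward on the container itself so that self.gradInput is filled in before it is returned, mirroring what nn.Module:backward expects.

-- Hypothetical alternative to patching updateGradInput: override backward on
-- the container so self.gradInput is set before it is returned.
function RVNN:backward(input, gradOutput, scale)
  scale = scale or 1
  self.gradInput = self.net:updateGradInput(input, gradOutput)
  self.net:accGradParameters(input, gradOutput, scale)
  return self.gradInput
end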


However, these lines:

self.gradInput = self.net:updateGradInput(input, gradOutput)
return self.gradInput

should also appear in the LSTM module for consistency.
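
For reference, a minimal sketch of that consistency change in LSTM.lua (assuming the LSTM class wraps its graph in self.net, as RVNN does above) could look like this:

-- Sketch of the suggested consistency change: store the wrapped network's
-- gradient in self.gradInput so that nn.Module:backward returns it correctly.
function LSTM:updateGradInput(input, gradOutput)
  self.gradInput = self.net:updateGradInput(input, gradOutput)
  return self.gradInput
end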

So thank you for your question.

helson73 (Author)

@guillaumekln That's really helpful. Thanks!
