dpnn : deep extensions to nn

This package provides many useful features that aren't part of the main nn package. These include sharedClone, which allows you to clone a module and share parameters or gradParameters with the original module, without incurring any memory overhead. We also redefined type such that the type-cast preserves Tensor sharing within a structure of modules.

The package provides the following Modules:

  • Decorator : abstract class to change the behaviour of an encapsulated module ;
  • DontCast : prevent encapsulated module from being cast by Module:type() ;
  • Serial : decorate a module to make its serialized output more compact ;
  • Inception : implements the Inception module of the GoogLeNet article ;
  • Dictionary : a LookupTable with sparse updates;
  • Collapse : just like nn.View(-1);
  • Convert : convert between different tensor types or shapes;
  • ZipTable : zip a table of tables into a table of tables;
  • ReverseTable : reverse the order of elements in a table;
  • PrintSize : prints the size of inputs and gradOutputs (useful for debugging);

A lot of the functionality implemented here was pulled from dp, which makes heavy use of this package. However, dpnn can be used without dp (e.g. you can use it with optim), which is one of the main reasons why we made it.


The Module interface has been further extended with methods that facilitate stochastic gradient descent, such as updateGradParameters (i.e. momentum learning), weightDecay, maxParamNorm (for regularization), and so on.


The dpnn_parameters table specifies the names of the parameter attributes. It defaults to {'weight', 'bias'}, which is a static variable (i.e. the table exists in the class namespace). Sub-classes can define their own table statically.


The dpnn_gradParameters table specifies the names of the gradient w.r.t. parameter attributes. It defaults to {'gradWeight', 'gradBias'}, which is a static variable (i.e. the table exists in the class namespace). Sub-classes can define their own table statically.

[self] Module:type(type_str)

This function converts all the parameters of a module to the given type_str. The type_str can be one of the types defined for torch.Tensor, like torch.DoubleTensor, torch.FloatTensor and torch.CudaTensor. Unlike the type method defined in nn, this one was overridden to maintain the sharing of storage among Tensors. This is especially useful when cloned modules share parameters and gradParameters.

[clone] Module:sharedClone([shareParams, shareGradParams])

Similar to clone, except that when shareParams = true (the default), the cloned module shares its parameters with the original module. Furthermore, when shareGradParams = true (the default), the cloned module shares its gradients w.r.t. parameters with the original module. This is equivalent to:

clone = mlp:clone()
clone:share(mlp, 'weight', 'bias', 'gradWeight', 'gradBias')

yet it is much more efficient, especially for modules with lots of parameters, as these Tensors aren't needlessly copied during the clone. This is particularly useful for recurrent neural networks, which require efficient copies with shared parameters and gradients w.r.t. parameters for each time-step.
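To make the sharing semantics concrete, here is a minimal plain-Lua sketch (no Torch) in which a module is modelled as an ordinary table and parameter Tensors as nested tables. The function sharedCloneSketch is hypothetical and is not dpnn's implementation: it deep-copies every field except the parameter and gradient fields (using the default names from dpnn_parameters and dpnn_gradParameters), which it copies by reference, so the clone and the original point at the same storage.

```lua
-- Hypothetical illustration of sharedClone semantics (not dpnn code).
-- A "module" is a plain table; parameter tensors are nested tables.

-- Recursive deep copy for ordinary (non-shared) fields.
local function deepCopy(v)
  if type(v) ~= "table" then return v end
  local out = {}
  for k, x in pairs(v) do out[k] = deepCopy(x) end
  return out
end

-- Both sharing flags default to true, as documented above.
local function sharedCloneSketch(module, shareParams, shareGradParams)
  if shareParams == nil then shareParams = true end
  if shareGradParams == nil then shareGradParams = true end
  local shared = {}
  if shareParams then shared.weight, shared.bias = true, true end
  if shareGradParams then shared.gradWeight, shared.gradBias = true, true end
  local clone = {}
  for k, v in pairs(module) do
    if shared[k] then
      clone[k] = v            -- same table: nothing is copied
    else
      clone[k] = deepCopy(v)  -- independent copy (e.g. output buffers)
    end
  end
  return clone
end

local mlp = {weight = {{1, 2}, {3, 4}}, bias = {0, 0},
             gradWeight = {{0, 0}, {0, 0}}, gradBias = {0, 0},
             output = {0, 0}}
local clone = sharedCloneSketch(mlp)
clone.weight[1][1] = 10  -- an update through the clone...
print(mlp.weight[1][1])  -- ...is visible in the original: prints 10
```

This is why a recurrent network can keep one clone per time-step without multiplying its parameter memory.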

Module:maxParamNorm([maxOutNorm, maxInNorm])

This method implements a hard constraint on the upper bound of the norm of output and/or input neuron weights (Hinton et al. 2012, p. 2). In a weight matrix, this is a constraint on rows (maxOutNorm) and/or columns (maxInNorm), respectively. It has a regularization effect analogous to weightDecay, but with hyper-parameters that are easier to optimize. It assumes that parameters are arranged (output dim x ... x input dim) and only affects parameters with more than one dimension. The method should normally be called after updateParameters. It uses the C/CUDA-optimized torch.renorm function. Hint: maxOutNorm = 2 usually does the trick.
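The constraint can be illustrated on a weight matrix stored as a plain-Lua table of rows. This is a sketch of what a renorm-based max-norm constraint does, not dpnn's code (which calls torch.renorm on Tensors): every row whose L2 norm exceeds maxOutNorm is rescaled to have norm exactly maxOutNorm and, optionally, the same is done for columns with maxInNorm. The names maxNormSketch, renormRows and renormCols are hypothetical.

```lua
-- Hypothetical plain-Lua sketch of a max-norm constraint on a 2D matrix.
-- Rows correspond to output neurons, columns to input neurons.

local function l2(values)
  local s = 0
  for _, v in ipairs(values) do s = s + v * v end
  return math.sqrt(s)
end

-- Rescale each row in place if its L2 norm exceeds maxNorm.
local function renormRows(w, maxNorm)
  for _, row in ipairs(w) do
    local n = l2(row)
    if n > maxNorm then
      for j = 1, #row do row[j] = row[j] * maxNorm / n end
    end
  end
end

-- Same, column by column.
local function renormCols(w, maxNorm)
  for j = 1, #w[1] do
    local col = {}
    for i = 1, #w do col[i] = w[i][j] end
    local n = l2(col)
    if n > maxNorm then
      for i = 1, #w do w[i][j] = w[i][j] * maxNorm / n end
    end
  end
end

local function maxNormSketch(w, maxOutNorm, maxInNorm)
  if maxOutNorm then renormRows(w, maxOutNorm) end -- output-neuron weights
  if maxInNorm then renormCols(w, maxInNorm) end   -- input-neuron weights
  return w
end

local w = {{3, 4}, {0.3, 0.4}}
maxNormSketch(w, 2)
-- row 1 (norm 5) is scaled to {1.2, 1.6}; row 2 (norm 0.5) is untouched
```

Unlike weight decay, rows that are already within the bound are not shrunk at all.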

[momGradParams] Module:momentumGradParameters()

Returns a table of Tensors (momGradParams). Each element corresponds to a parameter Tensor (params) and its gradient w.r.t. parameters (gradParams), as returned by a call to parameters. This method is used internally by updateGradParameters.

Module:updateGradParameters(momFactor [, momDamp, momNesterov])

Applies classic momentum or Nesterov momentum (Sutskever, Martens et al. 2013) to parameter gradients. Each parameter Tensor (params) has a corresponding Tensor of the same size for gradients w.r.t. parameters (gradParams). When using momentum learning, another Tensor is added for each parameter Tensor (momGradParams). This method should be called before updateParameters, as it affects the gradients w.r.t. parameters.

Classic momentum is computed as follows :

momGradParams = momFactor*momGradParams + (1-momDamp)*gradParams
gradParams = momGradParams

where momDamp has a default value of momFactor.

Nesterov momentum (momNesterov = true) is computed as follows (the first line is the same as classic momentum):

momGradParams = momFactor*momGradParams + (1-momDamp)*gradParams
gradParams = gradParams + momFactor*momGradParams

The default is to use classic momentum (momNesterov = false).
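Both update rules can be sketched in plain Lua on flat arrays of numbers. This is an illustration, not dpnn's implementation: the function momentumStep is hypothetical, state.mom stands in for momGradParams, and initializing the momentum buffer with a copy of the first gradient is an assumption of this sketch.

```lua
-- Hypothetical plain-Lua sketch of classic and Nesterov momentum.
-- state.mom plays the role of momGradParams; grad is updated in place.
local function momentumStep(state, grad, momFactor, momDamp, momNesterov)
  momDamp = momDamp or momFactor -- documented default
  if state.mom == nil then
    -- assumption of this sketch: the buffer starts as a copy of the gradient
    state.mom = {}
    for i = 1, #grad do state.mom[i] = grad[i] end
  else
    for i = 1, #grad do
      state.mom[i] = momFactor * state.mom[i] + (1 - momDamp) * grad[i]
    end
  end
  for i = 1, #grad do
    if momNesterov then
      grad[i] = grad[i] + momFactor * state.mom[i] -- Nesterov
    else
      grad[i] = state.mom[i]                        -- classic
    end
  end
  return grad
end

local state = {}
momentumStep(state, {1.0}, 0.5, 0)         -- first call fills the buffer
local g = momentumStep(state, {1.0}, 0.5, 0)
-- g[1] is 1.5, i.e. 0.5*1.0 + (1 - 0)*1.0
```

Note that with the default momDamp = momFactor, the classic rule is an exponential moving average of the gradients.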

Module:weightDecay(wdFactor [, wdMinDim])

Decays the weights of the parameterized modules. Implements an L2 norm loss on parameters with at least wdMinDim dimensions (default is 2). The resulting gradients are accumulated into the corresponding gradients w.r.t. parameters, so this method should be called before updateParameters.
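Assuming the penalty is 0.5 * wdFactor * ||params||^2, its gradient is wdFactor * params, which is what gets added to gradParams. A minimal plain-Lua sketch of that accumulation follows; parameters are modelled as hypothetical records with a dim field (number of dimensions) and a flat values array, an assumption of this illustration rather than dpnn's Tensor-based code.

```lua
-- Hypothetical plain-Lua sketch of weight decay:
-- gradParams[k] += wdFactor * params[k], only when params[k].dim >= wdMinDim.
-- Each parameter is modelled as {dim = <number of dims>, values = {...}}.
local function weightDecaySketch(params, gradParams, wdFactor, wdMinDim)
  wdMinDim = wdMinDim or 2 -- documented default: skip vectors such as biases
  for k, p in ipairs(params) do
    if p.dim >= wdMinDim then
      local g = gradParams[k]
      for i = 1, #p.values do
        g[i] = g[i] + wdFactor * p.values[i]
      end
    end
  end
end

local params = {{dim = 2, values = {1, 2, 3, 4}}, -- a 2x2 weight matrix
                {dim = 1, values = {5, 6}}}       -- a bias vector
local grads = {{0, 0, 0, 0}, {0, 0}}
weightDecaySketch(params, grads, 0.5)
-- grads[1] becomes {0.5, 1, 1.5, 2}; the bias gradient stays {0, 0}
```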

Module:gradParamClip(cutoffNorm [, moduleLocal])

Implements a constraint on the norm of gradients w.r.t. parameters (Pascanu et al. 2012). When moduleLocal = false (the default), the norm is computed globally over the Module on which the method is called. So if you call it on an MLP, the norm is computed on the concatenation of all gradient Tensors. When moduleLocal = true, the norm constraint is applied separately to the gradients of each component (non-container) module. This method is useful to prevent exploding gradients in recurrent neural networks.
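The global case (moduleLocal = false) can be sketched in plain Lua: compute the L2 norm over the concatenation of all gradient arrays and, if it exceeds cutoffNorm, rescale every gradient by cutoffNorm / norm, so the direction is preserved. The function gradClipSketch and its flat-array representation are hypothetical, and returning the pre-clipping norm is a choice of this sketch.

```lua
-- Hypothetical plain-Lua sketch of global gradient-norm clipping.
-- gradParams is a list of flat arrays, rescaled in place.
local function gradClipSketch(gradParams, cutoffNorm)
  local sq = 0
  for _, g in ipairs(gradParams) do
    for i = 1, #g do sq = sq + g[i] * g[i] end
  end
  local norm = math.sqrt(sq) -- norm of the concatenated gradients
  if norm > cutoffNorm then
    local scale = cutoffNorm / norm
    for _, g in ipairs(gradParams) do
      for i = 1, #g do g[i] = g[i] * scale end
    end
  end
  return norm -- norm before clipping
end

local grads = {{3, 0}, {4}}
local before = gradClipSketch(grads, 1)
-- before is 5; grads become {0.6, 0} and {0.8}, whose joint norm is 1
```

Calling the same function once per component module's gradient list would roughly mirror moduleLocal = true.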


dmodule = nn.Decorator(module)

This module is an abstract class used to decorate a module. This means that method calls to dmodule will call the same method on the encapsulated module, and return its results.
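The delegation idea can be sketched in plain Lua with a metatable: any method the decorator does not define itself is looked up on the encapsulated module and called with that module as self. The helper decorate and the toy doubler module are hypothetical; this shows the pattern, not how nn.Decorator is written.

```lua
-- Hypothetical plain-Lua sketch of the decorator pattern via __index.
local function decorate(module)
  local d = {module = module}
  return setmetatable(d, {
    __index = function(_, key)
      local v = module[key]
      if type(v) == "function" then
        -- forward the call to the encapsulated module and return its results
        return function(_, ...) return v(module, ...) end
      end
      return v
    end
  })
end

-- A toy module whose forward doubles each element of its input.
local doubler = {}
function doubler:forward(input)
  local out = {}
  for i, x in ipairs(input) do out[i] = 2 * x end
  self.output = out
  return out
end

local d = decorate(doubler)
local y = d:forward({1, 2, 3}) -- delegated to doubler:forward
```

A concrete decorator would then override only the methods whose behaviour it changes.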


dmodule = nn.DontCast(module)

This module is a decorator. Use it to decorate a module that you don't want to be cast when the type() method is called.

module = nn.DontCast(nn.Linear(3,4):float())
module:double() -- has no effect on the decorated module
print(module:forward(torch.FloatTensor{1,2,3}))
[torch.FloatTensor of size 4]


dmodule = nn.Serial(module)

This module is a decorator that can be used to control the serialization/deserialization behavior of the encapsulated module. It can make the resulting string or file heavy (the default), medium or light in terms of size. Furthermore, when specified, the type attribute (e.g. float, double, cuda, torch.FloatTensor, torch.DoubleTensor and so on) determines what type the module will be cast to during serialization. Note that this will also be the type of the deserialized object.

The heavySerial([type]) method makes serialization include every attribute in the module graph, which is the default behavior of nn.

The mediumSerial([type]) method makes serialization include everything except the attributes specified in each module's dpnn_mediumEmpty table, which has a default value of {'output', 'gradInput', 'momGradParams', 'dpnn_input'}. During serialization, these attributes are emptied (no storage), whether they are tables or Tensors. Some modules overwrite the default Module.dpnn_mediumEmpty static attribute with their own. The default serialization type of mediumSerial() is float.

The lightSerial([type]) method empties everything that a call to mediumSerial(type) would (so it uses dpnn_mediumEmpty), but also empties all the parameter gradients specified by the attribute dpnn_gradParameters, which defaults to {'gradWeight', 'gradBias'}.

We recommend using mediumSerial() for training, and lightSerial() for production (feed-forward-only models).
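What medium and light serialization strip from a module can be approximated in plain Lua: each attribute named in the module's dpnn_mediumEmpty list (or the documented default list) is replaced by an empty table, and the light variant also empties the attributes named in dpnn_gradParameters. The helper mediumEmptySketch is hypothetical, works on a shallow copy of a table-based module, and skips the type cast that mediumSerial and lightSerial also perform.

```lua
-- Hypothetical plain-Lua sketch: build the table that would be written
-- under "medium" (light = false) or "light" (light = true) serialization.
local DEFAULT_MEDIUM_EMPTY = {'output', 'gradInput', 'momGradParams', 'dpnn_input'}
local DEFAULT_GRAD_PARAMS = {'gradWeight', 'gradBias'}

local function mediumEmptySketch(module, light)
  local copy = {}
  for k, v in pairs(module) do copy[k] = v end -- shallow copy; original untouched
  for _, name in ipairs(module.dpnn_mediumEmpty or DEFAULT_MEDIUM_EMPTY) do
    if copy[name] ~= nil then copy[name] = {} end -- emptied: no storage
  end
  if light then
    for _, name in ipairs(module.dpnn_gradParameters or DEFAULT_GRAD_PARAMS) do
      if copy[name] ~= nil then copy[name] = {} end
    end
  end
  return copy
end

local linear = {weight = {1, 2}, bias = {3}, gradWeight = {4, 5}, gradBias = {6},
                output = {7, 8}, gradInput = {9}}
local medium = mediumEmptySketch(linear)      -- keeps gradWeight/gradBias
local light = mediumEmptySketch(linear, true) -- also empties them
```

Parameters always survive, which is why a lightSerial model can still be used for feed-forward inference.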




module = nn.Collapse(nInputDim)

This module is the equivalent of:

view = nn.View(-1)
view:setNumInputDims(nInputDim)

It collapses all non-batch dimensions, where nInputDim is the number of non-batch dimensions of the input. This is useful for converting a spatial feature map to the single dimension required by a dense hidden layer like Linear.
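The shape arithmetic behind this is simple: the last nInputDim sizes are multiplied together, and any leading dimensions are treated as the batch. A plain-Lua sketch of that size computation follows; the function collapsedSize is hypothetical, and folding several leading dimensions into a single batch dimension is an assumption of this sketch.

```lua
-- Hypothetical plain-Lua sketch of the output size of Collapse(nInputDim).
-- size: the input size as a list; nInputDim: number of non-batch dimensions.
local function collapsedSize(size, nInputDim)
  local nDim = #size
  local features = 1
  for i = nDim - nInputDim + 1, nDim do features = features * size[i] end
  if nDim > nInputDim then
    local batch = 1
    for i = 1, nDim - nInputDim do batch = batch * size[i] end
    return {batch, features} -- batched input keeps a batch dimension
  end
  return {features}          -- a single example is fully flattened
end

local s = collapsedSize({8, 3, 4, 5}, 3) -- batch of 8 maps of 3x4x5: {8, 60}
```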


module = nn.Convert([inputShape, outputShape])

Module to convert between different data formats. For example, we can flatten images by using :

module = nn.Convert('bchw', 'bf')

or equivalently

module = nn.Convert('chw', 'f')

Let's try it with a batch of three inputs; the flattened output is:

 0.5692 -0.0190  0.5243  0.7530  0.4230  1.2483
-0.9142  0.6013  0.5608 -1.0417 -1.4014  1.0177
-1.5207 -0.1641 -0.4166  1.4810 -1.1725 -1.0037
[torch.DoubleTensor of size 3x6]

You could also try:

module = nn.Convert('chw', 'hwc')
input = torch.randn(1,2,3,2)
input:select(2,1):fill(1)
input:select(2,2):fill(2)
print(input)
(1,1,.,.) = 
  1  1
  1  1
  1  1

(1,2,.,.) = 
  2  2
  2  2
  2  2
[torch.DoubleTensor of size 1x2x3x2]
print(module:forward(input))
(1,1,.,.) = 
  1  2
  1  2

(1,2,.,.) = 
  1  2
  1  2

(1,3,.,.) = 
  1  2
  1  2
[torch.DoubleTensor of size 1x3x2x2]
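For the 'chw' to 'hwc' case, the conversion is a permutation of indices: output[h][w][c] = input[c][h][w]. Here is a plain-Lua sketch on nested tables for a single, non-batched example; the function chwToHwc is hypothetical (nn.Convert itself works on Tensors and also handles the batch dimension).

```lua
-- Hypothetical plain-Lua sketch: permute a c x h x w nested table into h x w x c.
local function chwToHwc(input)
  local C, H, W = #input, #input[1], #input[1][1]
  local out = {}
  for h = 1, H do
    out[h] = {}
    for w = 1, W do
      out[h][w] = {}
      for c = 1, C do
        out[h][w][c] = input[c][h][w]
      end
    end
  end
  return out
end

-- Mirrors the 1x2x3x2 example without its batch dimension:
-- 2 channels of size 3x2, filled with 1s and 2s respectively.
local input = {
  {{1, 1}, {1, 1}, {1, 1}},
  {{2, 2}, {2, 2}, {2, 2}},
}
local out = chwToHwc(input) -- 3 x 2 x 2; every out[h][w] is {1, 2}
```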

Furthermore, it automatically converts the input to have the same type as self.output (i.e. the type of the module). So you can also just use it for automatic input type conversions:

module = nn.Convert()
print(module.output) -- type of module
[torch.DoubleTensor with no dimension]
input = torch.FloatTensor{1,2,3}
print(module:forward(input))
 1
 2
 3
[torch.DoubleTensor of size 3]


module = nn.ZipTable()

Zips a table of tables into a table of tables: the i-th output table collects the i-th element of each input table.


print(module:forward{ {'a1','a2'}, {'b1','b2'}, {'c1','c2'} })
{ {'a1','b1','c1'}, {'a2','b2','c2'} }
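The zip itself is easy to express in plain Lua. This sketch (hypothetical name zipTables, assuming every inner table has the same length as the first) reproduces the example above without the nn.Module machinery.

```lua
-- Hypothetical plain-Lua sketch of zipping a table of tables.
local function zipTables(input)
  local output = {}
  for i = 1, #input[1] do
    output[i] = {}
    for j, inner in ipairs(input) do
      output[i][j] = inner[i] -- i-th element of the j-th input table
    end
  end
  return output
end

local zipped = zipTables{ {'a1','a2'}, {'b1','b2'}, {'c1','c2'} }
-- zipped is { {'a1','b1','c1'}, {'a2','b2','c2'} }
```

Applying the same function to zipped gives back the original nesting, since zipping is its own inverse for rectangular inputs.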


module = nn.ReverseTable()

Reverses the order of elements in a table.
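A plain-Lua sketch of the same operation on an ordinary array-like table; the function reverseTable is hypothetical, and returning a new table (rather than reversing in place) is a choice of this sketch.

```lua
-- Hypothetical plain-Lua sketch: return a new table with elements reversed.
local function reverseTable(input)
  local output = {}
  local n = #input
  for i = 1, n do output[i] = input[n - i + 1] end
  return output
end

local r = reverseTable{'x1', 'x2', 'x3'} -- {'x3', 'x2', 'x1'}
```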





criterion = nn.ModuleCriterion(criterion [, inputModule, targetModule, castTarget])

This criterion decorates a criterion by allowing the input and target to be fed through an optional inputModule and targetModule before being passed to the criterion. The inputModule must not contain parameters, as these would not be updated.

When castTarget = true (the default), the targetModule is cast along with the inputModule and criterion. Otherwise, the targetModule isn't.
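The forward computation of such a decorated criterion amounts to transforming the input and target before delegating. Below is a plain-Lua sketch with function-valued transforms and a toy squared-error criterion; moduleCriterionSketch, sse and oneHot3 are hypothetical names, and gradients and type casting are omitted.

```lua
-- Hypothetical plain-Lua sketch of a criterion with optional
-- input/target transforms applied before the wrapped criterion.
local function moduleCriterionSketch(criterion, inputModule, targetModule)
  return function(input, target)
    if inputModule then input = inputModule(input) end -- must be parameter-free
    if targetModule then target = targetModule(target) end
    return criterion(input, target)
  end
end

-- Toy criterion: sum of squared errors between two arrays.
local function sse(input, target)
  local s = 0
  for i = 1, #input do s = s + (input[i] - target[i]) ^ 2 end
  return s
end

-- Toy target transform: class index to a one-hot array of length 3.
local function oneHot3(class)
  local t = {0, 0, 0}
  t[class] = 1
  return t
end

local crit = moduleCriterionSketch(sse, nil, oneHot3)
local loss = crit({0, 1, 0}, 2) -- target 2 becomes {0, 1, 0}, so loss is 0
```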