Common Settings

Both the series and single models were trained with a 2-layer feedforward controller (with hidden sizes of 128 and 256 respectively) and ReLU activations, and both share the following set of hyperparameters:

  • RMSProp optimizer with a learning rate of 10⁻⁴ and momentum of 0.9.
  • Memory word size of 10, with a single read head.
  • Controller weights are initialized from samples 1 standard deviation away from a zero-mean normal distribution with a variance of $\sqrt{2/N}$, where $N$ is the size of the input vector coming into the weight matrix (see the sketch after this list).
  • A batch size of 1.
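
As an illustration of the controller and the initialization described above, here is a minimal NumPy sketch (the function names, bias handling, and exact layer shapes are assumptions for illustration, not the repo's actual code):

```python
import numpy as np

def init_weights(input_size, output_size):
    # Zero-mean normal scaled by sqrt(2 / input_size), per the bullet above;
    # input_size is the size of the input vector coming into the weight matrix.
    return np.random.normal(0.0, np.sqrt(2.0 / input_size),
                            size=(input_size, output_size))

def controller_forward(x, W1, b1, W2, b2):
    # 2-layer feedforward controller with ReLU activations.
    h = np.maximum(0.0, x @ W1 + b1)
    return np.maximum(0.0, h @ W2 + b2)
```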

All output from the DNC is squashed between 0 and 1 using a sigmoid function, and a binary cross-entropy (or logistic) loss of the form:

$$\mathcal{L}(y, \hat{y}) = -\frac{1}{B\,T\,S}\sum_{b=1}^{B}\sum_{t=1}^{T}\sum_{s=1}^{S}\left[ y_{bts}\log\hat{y}_{bts} + (1 - y_{bts})\log(1 - \hat{y}_{bts}) \right]$$

is used; that is, the mean of the logistic loss across the batch size $B$, the time steps $T$, and the output size $S$.
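
For concreteness, a minimal NumPy sketch of this loss (the function name and the epsilon guard against log(0) are illustrative assumptions):

```python
import numpy as np

def logistic_loss(y_true, y_pred, eps=1e-9):
    # y_true, y_pred have shape (batch, time_steps, output_size);
    # y_pred is assumed already squashed into (0, 1) by a sigmoid.
    # eps guards the logs against predictions of exactly 0 or 1.
    ce = -(y_true * np.log(y_pred + eps)
           + (1.0 - y_true) * np.log(1.0 - y_pred + eps))
    return ce.mean()  # mean over batch, time steps, and output size
```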

All gradients are clipped between -10 and 10.
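
A sketch of how the RMSProp settings and this clipping could fit together in TF1-style TensorFlow, assuming a loss tensor named `loss` already exists (not necessarily the repo's exact training code):

```python
import tensorflow as tf

# RMSProp with the hyperparameters listed above.
optimizer = tf.train.RMSPropOptimizer(learning_rate=1e-4, momentum=0.9)

# Clip every gradient element-wise into [-10, 10] before applying it.
grads_and_vars = optimizer.compute_gradients(loss)
clipped = [(tf.clip_by_value(g, -10.0, 10.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)
```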

Beware that NaNs may still occur during training!

Series Training

The model was first trained on length-2 series of random binary vectors of size 6. Then, starting from the learned length-2 model, a length-4 model was trained in a curriculum-learning fashion.
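
As a rough sketch, one series training sample might be built like this (the shape and the absence of delimiter flags are assumptions; the script's actual input encoding may differ):

```python
import numpy as np

def random_binary_series(series_length=2, vector_size=6):
    # One sample for series training: `series_length` random binary
    # vectors, each of size `vector_size`.
    return np.random.randint(0, 2,
                             size=(series_length, vector_size)).astype(np.float32)

sample = random_binary_series(series_length=2, vector_size=6)
```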

The following plots show the learning curves for the length-2 and length-4 models respectively.

[series-2: learning curve of the length-2 model]

[series-4: learning curve of the length-4 model]

Attempting to train a length-4 model directly always resulted in NaNs. The paper mentions using curriculum learning for the graph and mini-SHRDLU tasks, but it does not mention anything about the copy task, so this may not be the most efficient method.

Retraining

$ python tasks/copy/train-series.py --length=2

Then, assuming that the trained model from that execution was saved under the name 'step-100000':

$ python tasks/copy/train-series.py --length=4 --checkpoint=step-100000 --iterations=20000

Single Training

The model was trained directly on single inputs of lengths between 1 and 10, with the length chosen randomly at each run, so no curriculum learning was used; a sketch of this sampling follows the plot below. The following plot shows the learning curve of the single model.

[single-10: learning curve of the single model]
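
For illustration, the per-run length sampling could look like the following (both the sampling scheme and the size-6 vectors, carried over from the series setup, are assumptions):

```python
import numpy as np

# Draw a fresh input length in [1, 10] for each run, then build the input.
length = np.random.randint(1, 11)  # upper bound is exclusive
single_input = np.random.randint(0, 2, size=(length, 6)).astype(np.float32)
```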

Retraining

$ python tasks/copy/train.py --iterations=50000