# Sequential non-normal initializers for RNNs
This repository contains the code for reproducing the results in the following paper:
Orhan AE, Pitkow X (2020) Improved memory in recurrent neural networks with sequential non-normal dynamics. International Conference on Learning Representations (ICLR 2020).
`NonnormalInit.py` contains plug-and-play Keras initializer classes implementing the proposed non-normal initializers for RNNs. The code was tested with `keras==2.2.4`; other versions may or may not work.
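As a rough usage sketch (the class name `ChainInit` and its `gain` argument are assumptions for illustration; see `NonnormalInit.py` for the actual class names and constructors), an initializer instance would be passed as the `recurrent_initializer` of a Keras recurrent layer:

```python
from keras.models import Sequential
from keras.layers import SimpleRNN, Dense
from NonnormalInit import ChainInit  # hypothetical class name

model = Sequential()
# The non-normal initializer applies to the hidden-to-hidden (recurrent) weights.
model.add(SimpleRNN(128, input_shape=(None, 1),
                    recurrent_initializer=ChainInit(gain=1.02)))  # `gain` kwarg is an assumption
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')
```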
`NonnormalInit_torchlstm.py` contains torch functions implementing the proposed non-normal initializers for vanilla RNNs and LSTMs. The `ramp_init` function in this file implements the "mixed" initialization strategy discussed in section 3.3 of the paper. The code was tested with `torch==0.4.0`; other versions may or may not work.
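A minimal sketch of how these functions would typically be applied, assuming they fill a hidden-to-hidden weight tensor in place (`ramp_init` is the function named above, but its exact signature is an assumption; check the file):

```python
import torch
import torch.nn as nn
from NonnormalInit_torchlstm import ramp_init  # signature assumed

lstm = nn.LSTM(input_size=1, hidden_size=128, batch_first=True)
with torch.no_grad():
    # Assumption: ramp_init initializes the recurrent weight tensor in place.
    ramp_init(lstm.weight_hh_l0)
```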
The remaining files can be used to replicate the results in Figure 3. Please contact me for the raw data from this figure (it was too large to upload here). An example usage would be as follows:
```
python train.py --task 'copy' --init 'chain' --init_scale 1.02 --lr 5e-5 --rand_seed 3
```
- `task` is the task (`copy`, `addition`, `psmnist`)
- `init` is the initializer for the RNN (`chain`, `fbchain`, `orthogonal`, `identity`; the chain variants are sketched below)
- `init_scale` is the gain of the initializer (in the paper, this corresponds to …)
- `lr` is the learning rate for the RMSprop algorithm
- `rand_seed` is the random seed.

See `train.py` for more options.
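For intuition about the `chain` and `fbchain` options, here is a rough NumPy sketch of the recurrent connectivity patterns as I read them from the paper: a scaled feedforward chain (a subdiagonal shift matrix), and the same chain with added feedback connections. The `fb_gain` parameter and the exact `fbchain` structure are my assumptions; the paper and the initializer files are authoritative.

```python
import numpy as np

def chain_matrix(n, gain):
    # Feedforward chain: unit i drives unit i + 1 (a scaled subdiagonal shift matrix).
    return gain * np.diag(np.ones(n - 1), k=-1)

def fbchain_matrix(n, gain, fb_gain):
    # Chain plus feedback connections along the superdiagonal (assumed structure).
    return chain_matrix(n, gain) + fb_gain * np.diag(np.ones(n - 1), k=1)

print(chain_matrix(4, gain=1.02))
```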
For the language modeling experiments, we used the Salesforce awd-lstm-lm repository, as described in sections 3.1.2 and 3.3 of the paper (with the torch initializers provided here in `NonnormalInit_torchlstm.py`). Again, please feel free to contact me for the raw simulation results from these experiments, as they were too large to upload here.