Adaptive LSTM from Breaking the Activation Function Bottleneck Through Adaptive Parameterization

Adaptive LSTM (aLSTM)

PyTorch implementation of the adaptive LSTM, an extension of the standard LSTM that increases model flexibility through adaptive parameterization.

The aLSTM converges faster than the LSTM and generalizes better. It is also stable: gradient clipping is not needed, even for sequences of up to thousands of terms. For more info, see the paper or the informal write-up.
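To give a rough feel for what "adaptive parameterization" can mean, here is a minimal sketch, assuming the idea can be summarized as letting each input rescale a shared weight matrix. The class name `AdaptiveLinear`, its sub-layers, and the tanh nonlinearities are hypothetical and chosen for illustration only; the paper and the repo's `alstm_cell` define the actual mechanism.

```python
import torch
import torch.nn as nn


class AdaptiveLinear(nn.Module):
    """Hypothetical layer: a shared weight matrix rescaled per input."""

    def __init__(self, input_size, output_size, adapt_size):
        super().__init__()
        # Shared (input-independent) parameters.
        self.linear = nn.Linear(input_size, output_size)
        # Small network that maps the input to a low-dimensional adaptation vector.
        self.policy = nn.Linear(input_size, adapt_size)
        # Projections from the adaptation vector to per-unit scaling vectors.
        self.scale_in = nn.Linear(adapt_size, input_size)
        self.scale_out = nn.Linear(adapt_size, output_size)

    def forward(self, x):
        z = torch.tanh(self.policy(x))
        d_in = self.scale_in(z)    # scales the columns of W
        d_out = self.scale_out(z)  # scales the rows of W (and the bias)
        # Same as diag(d_out) @ (W @ diag(d_in) @ x + b), computed elementwise.
        return torch.tanh(d_out * self.linear(d_in * x))


layer = AdaptiveLinear(input_size=8, output_size=10, adapt_size=3)
y = layer(torch.rand(5, 8))  # batch of 5 inputs
```

Because the scaling vectors depend on `x`, the effective weight matrix differs from one input to the next, while the number of extra parameters stays small when `adapt_size` is small.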

If you use this code or our results in your research, please cite

  title   = {{Breaking the Activation Function Bottleneck through Adaptive Parameterization}},
  author  = {Flennerhag, Sebastian and Hujun, Yin and Keane, John and Elliot, Mark},
  journal = {{arXiv preprint arXiv:1805.08574}},
  year    = {2018}


This implementation should run on any PyTorch version. It has been tested on v0.2 through v0.4. To install:

git clone; cd alstm
python setup.py install


This implementation follows the LSTM implementation in the official (and constantly changing) PyTorch repo. It provides an alstm_cell function and its aLSTMCell module wrapper, both of which operate on a single time step. The aLSTM class provides an end-user API with variational dropout and our hybrid RHN-LSTM adaptation model for multi-layer aLSTMs.

import torch
from torch.autograd import Variable
from alstm import aLSTM

# Dimensions of a toy problem
seq_len, batch_size, input_size, hidden_size, adapt_size, output_size = 20, 5, 8, 10, 3, 7

alstm = aLSTM(input_size, hidden_size, adapt_size, output_size, nlayers=2)

# The feature dimension of the input must match input_size
X = Variable(torch.rand(seq_len, batch_size, input_size))
out, hidden = alstm(X)
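The split between a per-time-step cell and a sequence-level wrapper described above can be sketched with stock PyTorch parts. The example below is an illustration under stated assumptions, not the repo's code: it uses `nn.LSTMCell` as a stand-in for `aLSTMCell`, the helper name `unroll` is hypothetical, and the dropout shown is one common reading of variational dropout (a single mask sampled per sequence and reused at every time step, here applied to the inputs only).

```python
import torch
import torch.nn as nn


def unroll(cell, X, dropout=0.0, training=True):
    """Apply a single-step cell over X of shape (seq_len, batch, input_size)."""
    seq_len, batch_size, input_size = X.size()
    h = X.new_zeros(batch_size, cell.hidden_size)
    c = X.new_zeros(batch_size, cell.hidden_size)

    # Sample the dropout mask once so every time step sees the same mask.
    mask = None
    if training and dropout > 0.0:
        keep = 1.0 - dropout
        mask = torch.bernoulli(X.new_full((batch_size, input_size), keep)) / keep

    outputs = []
    for t in range(seq_len):
        x_t = X[t] if mask is None else X[t] * mask
        h, c = cell(x_t, (h, c))  # one time step
        outputs.append(h)
    return torch.stack(outputs), (h, c)


cell = nn.LSTMCell(input_size=8, hidden_size=10)  # stand-in for aLSTMCell
out, (h, c) = unroll(cell, torch.rand(20, 5, 8), dropout=0.3)
```

Sampling the mask outside the loop is what distinguishes this from ordinary dropout, which would draw a fresh mask at each step.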


To replicate the original experiments of the aLSTM paper, see examples.


If you spot a bug, think the docs are useless or have an idea for an extension, don't hesitate to send a PR! If your contribution is substantial, please raise an issue first to check that it is in line with the scope of this repo. Quick wins that would be great to have are:

  • Support for bidirectional aLSTM
  • Support PyTorch's PackedSequence
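For anyone picking up either item, it may help to see the behaviour being asked for on PyTorch's built-in `nn.LSTM`, which already supports both. The snippet below is only a reference point under that assumption; it says nothing about how the aLSTM would implement these features, and the sizes are arbitrary.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Built-in LSTM used as a reference for the desired behaviour.
lstm = nn.LSTM(input_size=8, hidden_size=10, bidirectional=True)

padded = torch.rand(6, 3, 8)  # (max_seq_len, batch, input_size)
lengths = [6, 4, 2]           # true lengths, sorted in descending order

# Pack so the LSTM skips the padded positions.
packed = pack_padded_sequence(padded, lengths)
packed_out, (h, c) = lstm(packed)

# Unpack back to a padded tensor; positions past each length are zero-filled.
out, out_lengths = pad_packed_sequence(packed_out)
```

With `bidirectional=True`, the last dimension of `out` is twice the hidden size, since forward and backward outputs are concatenated.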