GitHub - huyanxin/sru: Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)

About

SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, without loss of accuracy on the many tasks tested.

Average processing time of LSTM, conv2d and SRU, tested on GTX 1070

For example, the figure above presents the processing time of a single mini-batch of 32 samples. SRU achieves 10 to 16 times speed-up compared to LSTM, and operates as fast as (or faster than) word-level convolution using conv2d.
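To take a comparable per-mini-batch measurement on your own hardware, a small timing helper along these lines may be useful. This is only a sketch: the name `average_time` and its parameters are hypothetical and not part of this repository, and it is not the script used to produce the figure.

```python
import time

def average_time(fn, n_warmup=3, n_runs=10, sync=None):
    """Average wall-clock seconds per call of fn() (hypothetical helper).

    sync is an optional zero-argument callable that blocks until pending
    work has finished; it is called before the clock starts and again
    before it stops.
    """
    for _ in range(n_warmup):      # warm-up calls are not timed
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / n_runs

# CPU-only usage example; replace the lambda with a forward pass.
seconds = average_time(lambda: sum(range(1000)))
print("average seconds per call:", seconds)
```

When timing a GPU model, pass `sync=torch.cuda.synchronize`, since CUDA kernels are launched asynchronously and the clock could otherwise stop before the work is done.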

Reference:

Training RNNs as Fast as CNNs

@article{lei2017sru,
  title={Training RNNs as Fast as CNNs},
  author={Lei, Tao and Zhang, Yu},
  journal={arXiv preprint arXiv:1709.02755},
  year={2017}
}

Requirements

GPU and CUDA 8 are required
PyTorch
CuPy
pynvrtc

Install requirements via pip install -r requirements.txt. CuPy and pynvrtc are needed to compile the CUDA code into a callable function at runtime. Only single-GPU training is supported.
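Before running the examples, it can help to confirm that the Python packages listed above are visible to the interpreter. The sketch below assumes the import names `torch`, `cupy` and `pynvrtc`; the helper `missing_requirements` is hypothetical and not part of this repository. It only checks that the modules can be located, not that a GPU or CUDA 8 is present.

```python
import importlib.util

# Import names assumed for the packages listed under Requirements.
REQUIRED_MODULES = ("torch", "cupy", "pynvrtc")

def missing_requirements(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules
            if importlib.util.find_spec(name) is None]

missing = missing_requirements()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```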

Examples

The usage of SRU is similar to nn.LSTM. SRU likely requires more stacked layers than LSTM. We recommend starting with 2 layers and using more if necessary (see our report for more experimental details).

import torch
from torch.autograd import Variable
from cuda_functional import SRU, SRUCell

# input has length 20, batch size 32 and dimension 128
x = Variable(torch.FloatTensor(20, 32, 128).cuda())

input_size, hidden_size = 128, 128

rnn = SRU(input_size, hidden_size,
    num_layers = 2,          # number of stacking RNN layers
    dropout = 0.0,           # dropout applied between RNN layers
    rnn_dropout = 0.0,       # variational dropout applied on linear transformation
    use_tanh = 1,            # use tanh?
    use_relu = 0,            # use ReLU?
    bidirectional = False    # bidirectional RNN ?
)
rnn.cuda()

output, hidden = rnn(x)      # forward pass

# output is (length, batch size, hidden size * number of directions)
# hidden is (layers, batch size, hidden size * number of directions)
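For intuition about what one SRU layer computes, here is a minimal NumPy sketch of the recurrence as described in the referenced paper (forget gate, reset gate and highway connection, with tanh as the activation). It is a reference under stated assumptions, not the CUDA kernel in cuda_functional.py: the function name and argument layout are hypothetical, the layer is unidirectional, dropout is omitted, and the input size is assumed equal to the hidden size so that the highway term can reuse the input directly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_layer_reference(x, W, W_f, b_f, W_r, b_r, c0=None):
    """Single SRU layer (hypothetical reference, not the repository's API).

    x: (length, batch, d); W, W_f, W_r: (d, d); b_f, b_r: (d,).
    Returns h of shape (length, batch, d) and the final state c (batch, d).
    """
    length, batch, d = x.shape
    # These projections do not depend on the previous state, so they are
    # computed for all time steps at once.
    x_tilde = x @ W
    f = sigmoid(x @ W_f + b_f)   # forget gate
    r = sigmoid(x @ W_r + b_r)   # reset gate
    c = np.zeros((batch, d)) if c0 is None else c0
    h = np.empty_like(x_tilde)
    # Only this cheap element-wise update is sequential.
    for t in range(length):
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]  # highway connection
    return h, c

# Same shapes as the example above: length 20, batch size 32, dimension 128.
rng = np.random.default_rng(0)
d = 128
x = rng.standard_normal((20, 32, d))
W, W_f, W_r = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
h, c = sru_layer_reference(x, W, W_f, np.zeros(d), W_r, np.zeros(d))
print(h.shape, c.shape)
```

Because the matrix products are independent of the recurrent state, they can be batched over the whole sequence, leaving only element-wise operations inside the time loop.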

Make sure cuda_functional.py and the shared library cuda/lib64 can be found by the system, e.g.

export LD_LIBRARY_PATH=/usr/local/cuda/lib64
export PYTHONPATH=path_to_repo/sru

Instead of using PYTHONPATH, the SRU module can now be installed as a regular package via python setup.py install or pip install. See this PR.
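If neither PYTHONPATH nor a package install is convenient, the module directory can also be added to the import path from inside a script. The following is a sketch only: the helper `add_sru_to_path` is hypothetical, and the directory passed to it is assumed to be the one containing cuda_functional.py.

```python
import os
import sys

def add_sru_to_path(repo_dir):
    """Prepend repo_dir to sys.path if it contains cuda_functional.py.

    Hypothetical helper; returns the absolute path that was added.
    """
    repo_dir = os.path.abspath(repo_dir)
    module_file = os.path.join(repo_dir, "cuda_functional.py")
    if not os.path.isfile(module_file):
        raise FileNotFoundError("cuda_functional.py not found in " + repo_dir)
    if repo_dir not in sys.path:
        sys.path.insert(0, repo_dir)
    return repo_dir

# Usage (the path is a placeholder, as in the export example above):
#   add_sru_to_path("path_to_repo/sru")
#   from cuda_functional import SRU, SRUCell
```

This only affects Python's module search. LD_LIBRARY_PATH generally still has to be set in the environment before Python starts.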

classification
question answering (SQuAD)
language modelling on PTB
machine translation (to be included in OpenNMT-py)
speech recognition (Note: implemented in CNTK instead of PyTorch)

Contributors

https://github.com/taolei87/sru/graphs/contributors

To-do

ReLU activation
support multi-GPU (context change)
Layer normalization + residual to compare with highway connection (current version)
