
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains the code in both PyTorch and TensorFlow for our paper

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

  • The source code is in the tf/ folder, supporting (1) single-node multi-GPU training and (2) multi-host TPU training.
  • Besides the source code, we also provide pretrained TensorFlow models with the state-of-the-art (SoTA) performance reported in the paper.
  • Please refer to tf/ for details.

PyTorch

  • The source code is in the pytorch/ folder, supporting single-node multi-GPU training via the module nn.DataParallel.
  • Please refer to pytorch/ for details.


Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. It is also the first model to break through the 1.0 barrier (bits per character) on character-level language modeling. Below is a summary.

| Method         | enwiki8 (bpc) | text8 (bpc) | One Billion Word (PPL) | WT-103 (PPL) | PTB w/o finetuning (PPL) |
|----------------|---------------|-------------|------------------------|--------------|--------------------------|
| Previous Best  | 1.06          | 1.13        | 23.7                   | 20.5         | 55.5                     |
| Transformer-XL | 0.99          | 1.08        | 21.8                   | 18.3         | 54.5                     |
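The "beyond a fixed-length context" capability comes from segment-level recurrence: hidden states from the previous segment are cached as a memory and prepended to the attention context of the current segment. Below is a minimal plain-Python sketch of that bookkeeping, using illustrative names (`update_memory`, `attention_context`) that are not from this repository; the real implementation operates on hidden-state tensors and detaches the cached memory from the gradient graph.

```python
def update_memory(memory, hidden, mem_len):
    """Append the new segment's hidden states and keep only the last mem_len."""
    combined = memory + hidden
    return combined[-mem_len:]

def attention_context(memory, segment):
    """What a segment can attend to: the cached memory plus itself."""
    return memory + segment

# Process a token stream in fixed-length segments of 4 with a memory of 4.
tokens = list(range(12))
seg_len, mem_len = 4, 4
memory = []
for start in range(0, len(tokens), seg_len):
    segment = tokens[start:start + seg_len]
    # A vanilla fixed-length Transformer would only see `segment`;
    # Transformer-XL attends over `context`, so the effective context
    # length grows with the number of layers.
    context = attention_context(memory, segment)
    memory = update_memory(memory, segment, mem_len)
```

After the first segment, each segment of 4 tokens attends over 8 positions (4 cached + 4 current) at no extra training cost per segment, which is the mechanism behind the longer effective context.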


A large portion of the code comes from the awd-lstm repo. Happy Language Modeling :)
