
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains the code in both PyTorch and TensorFlow for our paper

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (*: equal contribution)

Preprint 2018

TensorFlow

  • The source code is in the tf/ folder, supporting (1) single-node multi-GPU training and (2) multi-host TPU training.
  • Besides the source code, we also provide pretrained TensorFlow models with the state-of-the-art (SoTA) performance reported in the paper.
  • Please refer to tf/ for details.

PyTorch

  • The source code is in the pytorch/ folder, supporting single-node multi-GPU training via the module nn.DataParallel.
  • Please refer to pytorch/ for details.


Transformer-XL achieves new state-of-the-art results on multiple language modeling benchmarks. It is also the first model to break through the 1.0 barrier (bits per character) on character-level language modeling. Below is a summary.

| Method         | enwiki8 (bpc) | text8 (bpc) | One Billion Word (PPL) | WT-103 (PPL) | PTB w/o finetuning (PPL) |
|----------------|---------------|-------------|------------------------|--------------|--------------------------|
| Previous Best  | 1.06          | 1.13        | 23.7                   | 20.5         | 55.5                     |
| Transformer-XL | 0.99          | 1.08        | 21.8                   | 18.3         | 54.5                     |
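The "beyond a fixed-length context" capability comes from segment-level recurrence: hidden states from the previous segment are cached as a memory and prepended to the attention context of the current segment. Below is a minimal plain-Python sketch of that bookkeeping, using illustrative names (`update_memory`, `attention_context`) that are not from this repository; the real implementation operates on hidden-state tensors and detaches the cached memory from the gradient graph.

```python
def update_memory(memory, hidden, mem_len):
    """Append the new segment's hidden states and keep only the last mem_len."""
    combined = memory + hidden
    return combined[-mem_len:]

def attention_context(memory, segment):
    """What a segment can attend to: the cached memory plus itself."""
    return memory + segment

# Process a token stream in fixed-length segments of 4 with a memory of 4.
tokens = list(range(12))
seg_len, mem_len = 4, 4
memory = []
for start in range(0, len(tokens), seg_len):
    segment = tokens[start:start + seg_len]
    # A vanilla fixed-length Transformer would only see `segment`;
    # Transformer-XL attends over `context`, so the effective context
    # length grows with the number of layers.
    context = attention_context(memory, segment)
    memory = update_memory(memory, segment, mem_len)
```

After the first segment, each segment of 4 tokens attends over 8 positions (4 cached + 4 current) at no extra training cost per segment, which is the mechanism behind the longer effective context.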


A large portion of the code comes from the awd-lstm repo. Happy Language Modeling :)
