Pytorch world language model (text generation) for PTB dataset example
Clone or download
Latest commit 19eac43 Oct 29, 2018

Word-level language modeling RNN

This example trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task. By default, the training script uses the PTB dataset, provided. The trained model can then be used by the generate script to generate new text. This is a porting of pytorch/examples/word_language_model making it usables on FloydHub.


The script accepts the following arguments:

optional arguments:
  -h, --help         show this help message and exit
  --data DATA        location of the data corpus
  --model MODEL      type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU)
  --emsize EMSIZE    size of word embeddings
  --nhid NHID        number of hidden units per layer
  --nlayers NLAYERS  number of layers
  --lr LR            initial learning rate
  --clip CLIP        gradient clipping
  --epochs EPOCHS    upper epoch limit
  --batch-size N     batch size
  --bptt BPTT        sequence length
  --dropout DROPOUT  dropout applied to layers (0 = no dropout)
  --decay DECAY      learning rate decay per epoch
  --tied             tie the word embedding and softmax weights
  --seed SEED        random seed
  --cuda             use CUDA
  --log-interval N   report interval
  --save SAVE        path to save the final model

With these arguments, a variety of models can be tested. As an example, the following arguments produce slower but better models:

python --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40           # Test perplexity of 80.97
python --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied    # Test perplexity of 75.96
python --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40        # Test perplexity of 77.42
python --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied # Test perplexity of 72.30

These perplexities are equal or better than Recurrent Neural Network Regularization (Zaremba et al. 2014) and are similar to Using the Output Embedding to Improve Language Models (Press & Wolf 2016 and Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling (Inan et al. 2016), though both of these papers have improved perplexities by using a form of recurrent dropout (variational dropout).



Run on FloydHub

Here's the commands to training, evaluating and serving your language modeling task on FloydHub.

Project Setup

Before you start, log in on FloydHub with the floyd login command, then fork and init the project:

$ git clone
$ cd word-language-model
$ floyd init word-language-model


Before you start, you need to upload the Penn Treebank-3 dataset as a FloydHub Dataset following this guide: create and upload a dataset. Then you will be ready to play with different language models.

# Train a LSTM on PTB with CUDA, reaching perplexity of 114.22
floyd run --gpu --env pytorch-0.2 --data <USERNAME>/dataset/<PENN-TB3>/<VERSION>:input "python --cuda --epochs 7"

# Train a tied LSTM on PTB with CUDA, reaching perplexity of 110.44
floyd run --gpu --env pytorch-0.2 --data <USERNAME>/dataset/<PENN-TB3>/<VERSION>:input "python --cuda --epochs 7 --tied"

# Train a tied LSTM on PTB with CUDA for 40 epochs, reaching perplexity of 87.17
floyd run --gpu --env pytorch-0.2 --data <USERNAME>/dataset/<PENN-TB3>/<VERSION>:input "python --cuda --tied"


  • --gpu run your job on a FloydHub GPU instance.
  • --env pytorch-0.2 prepares a pytorch environment for python 3.
  • --data <USERNAME>/dataset/<PENN-TB3>/<VERSION>:input mounts the previus uploaded Penn Treebank-3 dataset in the /input folder inside the container for our job.

The model uses the nn.RNN module (and its sister modules nn.GRU and nn.LSTM) which will automatically use the cuDNN backend if run on CUDA with cuDNN installed.

During training, if a keyboard interrupt (Ctrl-C) is received, training is stopped and the current model is evaluated against the test dataset.

You can follow along the progress by using the logs command. The first 2 examples of training should be completed in about 5 minutes on a GPU instance and 40' on a CPU one. The last example should take about 30' on a GPU instance and above 3 hours on a CPU instace.


It's time to evaluate our model generating some text:

# Generate samples from the trained LSTM model.
floyd run --gpu --env pytorch-0.2 --data <USERNAME>/dataset/<PENN-TB3>/<VERSION>:input --data <REPLACE_WITH_JOB_OUTPUT_NAME>:model "python --cuda"

Try our pre-trained model

We have provided to you a pre-trained model trained for 40 epochs reaching perplexity of 87.17:

# Generate samples from the trained LSTM model.
floyd run --gpu --env pytorch-0.2 --data <USERNAME>/dataset/<PENN-TB3>/<VERSION>:input --data <REPLACE_WITH_JOB_OUTPUT_NAME>:model "python --cuda"

Serve model through REST API

FloydHub supports seving mode for demo and testing purpose. Before serving your model through REST API, you need to create a floyd_requirements.txt and declare the flask requirement in it. If you run a job with --mode serve flag, FloydHub will run the file in your project and attach it to a dynamic service endpoint:

floyd run --gpu --mode serve --env pytorch-0.2  --data <USERNAME>/dataset/<PENN-TB3>/<VERSION>:input --data <REPLACE_WITH_JOB_OUTPUT_NAME>:model

The above command will print out a service endpoint for this job in your terminal console.

The service endpoint will take a couple minutes to become ready. Once it's up, you can interact with the model by sending a POST request wih the number of words and the temperature that the model will use to generate text:

# Template

curl -X POST -o generated.txt -F "words=100" -F "temperature=3"

Any job running in serving mode will stay up until it reaches maximum runtime. So once you are done testing, remember to shutdown the job!

Note that this feature is in preview mode and is not production ready yet

More resources

Some useful resources on NLP for Deep Learning and language modeling task:


For any questions, bug(even typos) and/or features requests do not hesitate to contact me or open an issue!