Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using TensorFlow

Inspired by Andrej Karpathy's char-rnn.


Basic Usage

To train with default parameters on the tinyshakespeare corpus, run the training script with python. To see all the parameters, run it with --help.

To sample from a checkpointed model, run the sampling script with python. Sampling while training is still in progress (to check the latest checkpoint) works only on the CPU or on another GPU. To force CPU mode, use export CUDA_VISIBLE_DEVICES="" and unset CUDA_VISIBLE_DEVICES afterward (on Windows, set CUDA_VISIBLE_DEVICES="" and set CUDA_VISIBLE_DEVICES= respectively).
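The same environment-variable trick can be applied from Python when the sampler is started as a child process, so the parent shell is never modified and nothing has to be unset afterward. This is only a minimal sketch: the script name sample.py is an assumption made for illustration (the text above does not name the script), and cpu_only_env and sample_on_cpu are hypothetical helpers, not part of this repository.

```python
import os
import subprocess
import sys


def cpu_only_env(base_env=None):
    """Return a copy of the environment with every CUDA device hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value hides all GPUs from CUDA, which forces CPU execution.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def sample_on_cpu(script="sample.py", extra_args=()):
    """Run the sampling script in a child process that cannot see any GPU.

    `script` is an assumed file name; replace it with your sampling script.
    """
    cmd = [sys.executable, script, *extra_args]
    return subprocess.run(cmd, env=cpu_only_env(), check=True)
```

Because only the child process receives the modified environment, a training run that is using the GPU in another process is left untouched.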

To continue training after an interruption, or to run for more epochs, pass --init_from=save to the training script.


Datasets

You can use any plain text file as input. For example, you could download The Complete Sherlock Holmes like this:

cd data
mkdir sherlock
cd sherlock
# download the plain-text edition (cnus.txt) into this directory, then:
mv cnus.txt input.txt

Then start training from the top-level directory, passing --data_dir=./data/sherlock/ to the training script.

A quick tip to concatenate many small disparate .txt files into one large training file: ls *.txt | xargs -L 1 cat >> input.txt.
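If you would rather not depend on shell tools, the same concatenation can be sketched in Python. The helper name concatenate_texts is hypothetical, and this version differs from the one-liner in two ways that are choices of the sketch: it overwrites input.txt instead of appending to it, and it never reads input.txt as one of its own inputs.

```python
from pathlib import Path


def concatenate_texts(source_dir, output_name="input.txt"):
    """Join every .txt file in source_dir into source_dir/output_name.

    Files are read in sorted name order. Returns the number of files merged.
    """
    source_dir = Path(source_dir)
    output_path = source_dir / output_name
    # Collect the inputs first and skip the output file,
    # so the result is never fed back into itself.
    sources = sorted(p for p in source_dir.glob("*.txt") if p.name != output_name)
    with output_path.open("w", encoding="utf-8") as out:
        for path in sources:
            text = path.read_text(encoding="utf-8")
            out.write(text)
            # Keep the last line of one file from running into the next file.
            if text and not text.endswith("\n"):
                out.write("\n")
    return len(sources)
```

For example, concatenate_texts("data/sherlock") would write data/sherlock/input.txt, assuming that directory holds your .txt files encoded as UTF-8.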


Tuning

Tuning your models is kind of a "dark art" at this point. In general:

  1. Start with as much clean input.txt data as possible, e.g. 50 MiB.
  2. Start by establishing a baseline using the default settings.
  3. Use tensorboard to compare all of your runs visually to aid in experimenting.
  4. Tweak --rnn_size up somewhat from 128 if you have a lot of input data.
  5. Tweak --num_layers from 2 to 3 but no higher unless you have experience.
  6. Tweak --seq_length up from 50 based on the length of a valid input string (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc). An LSTM cell will "remember" for durations longer than this sequence, but the effect falls off for longer character distances.
  7. Finally once you've done all that, only then would I suggest adding some dropout. Start with --output_keep_prob 0.8 and maybe end up with both --input_keep_prob 0.8 --output_keep_prob 0.5 only after exhausting all the above values.
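The order of the steps above can be written down as a small experiment plan, which makes it easier to keep runs comparable. The sketch below is one possible reading of the list, not a recipe from this repository: the helper names are hypothetical, the larger --rnn_size of 256 is an arbitrary assumed value, and suggest_seq_length assumes your corpus has one valid input string per line. Only the flag names, the defaults (128, 2, 50), and the dropout values come from the list.

```python
import math

# Defaults quoted in the list above.
DEFAULTS = {"rnn_size": 128, "num_layers": 2, "seq_length": 50}


def suggest_seq_length(lines, coverage=0.95):
    """Pick a length that covers `coverage` of the non-empty lines.

    Uses a nearest-rank percentile; assumes one valid input string per line.
    """
    stripped = (line.rstrip("\n") for line in lines)
    lengths = sorted(len(s) for s in stripped if s.strip())
    if not lengths:
        return DEFAULTS["seq_length"]
    rank = max(1, math.ceil(coverage * len(lengths)))
    return lengths[rank - 1]


def tuning_plan(seq_length=None, bigger_rnn_size=256):
    """Return one dict of flag overrides per run, in the order of the list.

    `bigger_rnn_size` is an assumed example value, not a recommendation.
    """
    baseline = {}  # step 2: default settings
    bigger = {"rnn_size": bigger_rnn_size}  # step 4
    deeper = {**bigger, "num_layers": 3}  # step 5
    plan = [baseline, bigger, deeper]
    current = deeper
    if seq_length is not None and seq_length != DEFAULTS["seq_length"]:
        current = {**current, "seq_length": seq_length}  # step 6
        plan.append(current)
    # Step 7: dropout comes last, mild first and stronger second.
    plan.append({**current, "output_keep_prob": 0.8})
    plan.append({**current, "input_keep_prob": 0.8, "output_keep_prob": 0.5})
    return plan


def to_argv(overrides):
    """Turn a dict of overrides into command-line flags."""
    return [f"--{name}={value}" for name, value in overrides.items()]
```

Each entry of tuning_plan() can be passed through to_argv() and appended to your training command. Keeping every run's logs lets you compare them side by side in Tensorboard, as step 3 suggests.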


Tensorboard

To visualize training progress, model graphs, and internal state histograms, fire up Tensorboard and point it at your log_dir. E.g.:

$ tensorboard --logdir=./logs/

Then open a browser to http://localhost:6006 or the correct IP/Port specified.


Roadmap

  • Add explanatory comments
  • Expose more command-line arguments
  • Compare accuracy and performance with char-rnn
  • More Tensorboard instrumentation


Contributing

Please feel free to:

  • Leave feedback in the issues
  • Open a Pull Request
  • Join the Gitter chat
  • Share your success stories and data sets!