
## *Data Science Unit 4 Sprint 3 Assignment 1*

# Recurrent Neural Networks and Long Short Term Memory (LSTM)

![Monkey at a typewriter](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/Chimpanzee_seated_at_typewriter.jpg/603px-Chimpanzee_seated_at_typewriter.jpg)

It is said that [**infinite monkeys typing for an infinite amount of time**](https://en.wikipedia.org/wiki/Infinite_monkey_theorem) will eventually type, among other things, the complete works of William Shakespeare. Let's see if we can get there a bit faster, with the power of Recurrent Neural Networks and LSTM.

We will focus specifically on Shakespeare's Sonnets in order to improve our model's ability to learn from the data.

In [1]:
import random
import sys
import os

import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.callbacks import LambdaCallback

from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, Bidirectional
from tensorflow.keras.layers import LSTM

%matplotlib inline

# a custom data prep class that we'll be using 
from data_cleaning_toolkit_class import data_cleaning_toolkit

### Use requests to pull data from a URL

[**Read through the requests documentation**](https://requests.readthedocs.io/en/master/user/quickstart/#make-a-request) in order to learn how to download the Shakespeare Sonnets from the Gutenberg website. 

**Protip:** Do not overthink it.

In [2]:
# download all of Shakespeare's Sonnets from the Project Gutenberg website

# here's the link for the sonnets
url_shakespeare_sonnets = "https://www.gutenberg.org/cache/epub/1041/pg1041.txt"

# use requests and the url to download all of the sonnets - save the result to `r`

# YOUR CODE HERE
r = requests.get(url_shakespeare_sonnets)

In [3]:
# move the downloaded text out of the request object - save the result to `raw_text_data`
# hint: take a look at the attributes of `r`
# YOUR CODE HERE
raw_text_data = r.text

In [4]:
# check the data type of `raw_text_data`
type(raw_text_data)

str

### Data Cleaning

In [5]:
# as usual, we are tasked with cleaning up messy data
# Question: Do you see any characters that we could use to split up the text?
raw_text_data[:3000]

"\ufeffThe Project Gutenberg EBook of Shakespeare's Sonnets, by William Shakespeare\r\n\r\nThis eBook is for the use of anyone anywhere at no cost and with\r\nalmost no restrictions whatsoever.  You may copy it, give it away or\r\nre-use it under the terms of the Project Gutenberg License included\r\nwith this eBook or online at www.gutenberg.org\r\n\r\n\r\nTitle: Shakespeare's Sonnets\r\n\r\nAuthor: William Shakespeare\r\n\r\nPosting Date: April 7, 2014 [EBook #1041]\r\nRelease Date: September, 1997\r\nLast Updated: March 10, 2010\r\n\r\nLanguage: English\r\n\r\n\r\n*** START OF THIS PROJECT GUTENBERG EBOOK SHAKESPEARE'S SONNETS ***\r\n\r\n\r\n\r\n\r\nProduced by Joseph S. Miller and Embry-Riddle Aeronautical\r\nUniversity Library. HTML version by Al Haines.\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\nTHE SONNETS\r\n\r\nby William Shakespeare\r\n\r\n\r\n\r\n\r\n  I\r\n\r\n  From fairest creatures we desire increase,\r\n  That thereby beauty's rose might never die,\r\n  But as the riper

In [6]:
# split the text into lines and save the result to `split_data`
# YOUR CODE HERE
split_data = raw_text_data.splitlines()


In [7]:
# we need to drop all the boilerplate text (i.e. titles and descriptions) as well as white space
# so that we are left with only the sonnets themselves 
split_data[:20] 

["\ufeffThe Project Gutenberg EBook of Shakespeare's Sonnets, by William Shakespeare",
 '',
 'This eBook is for the use of anyone anywhere at no cost and with',
 'almost no restrictions whatsoever.  You may copy it, give it away or',
 're-use it under the terms of the Project Gutenberg License included',
 'with this eBook or online at www.gutenberg.org',
 '',
 '',
 "Title: Shakespeare's Sonnets",
 '',
 'Author: William Shakespeare',
 '',
 'Posting Date: April 7, 2014 [EBook #1041]',
 'Release Date: September, 1997',
 'Last Updated: March 10, 2010',
 '',
 'Language: English',
 '',
 '',
 "*** START OF THIS PROJECT GUTENBERG EBOOK SHAKESPEARE'S SONNETS ***"]

**Use list index slicing in order to remove the titles and descriptions so we are only left with the sonnets.**


In [8]:
# the sonnets exist between these indices 
# titles and descriptions exist outside of these indices

# use index slicing to isolate the sonnet lines - save the result to `sonnets`

# YOUR CODE HERE
sonnets = split_data[45:-369]
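
The hard-coded indices above depend on the exact layout of the downloaded file. As a minimal sketch of a more robust alternative, the helper below slices between the Gutenberg start and end marker lines. `slice_between_markers` is a hypothetical helper, and the `*** END OF` marker text is an assumption based on the `*** START OF` line seen in the output above. Lines that follow the start marker but precede the first sonnet (producer credits, the title) would still need to be trimmed separately.

```python
def slice_between_markers(lines, start_marker="*** START OF", end_marker="*** END OF"):
    """Return the lines strictly between the start marker and the next end marker."""
    # index of the first line that begins with the start marker
    start = next(i for i, line in enumerate(lines) if line.startswith(start_marker))
    # index of the first end-marker line that comes after the start marker
    end = next(i for i, line in enumerate(lines) if i > start and line.startswith(end_marker))
    return lines[start + 1:end]


# toy stand-in for `split_data`
toy_lines = [
    "Title: Shakespeare's Sonnets",
    "*** START OF THIS PROJECT GUTENBERG EBOOK SHAKESPEARE'S SONNETS ***",
    "  I",
    "  From fairest creatures we desire increase,",
    "*** END OF THIS PROJECT GUTENBERG EBOOK SHAKESPEARE'S SONNETS ***",
    "License text ...",
]

body = slice_between_markers(toy_lines)
print(body)
```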

In [9]:
# notice how all non-sonnet lines have far fewer characters than the actual sonnet lines?
# well, let's use that observation to filter out all the non-sonnet lines
sonnets[200:240]

["    And nothing 'gainst Time's scythe can make defence",
 '    Save breed, to brave him when he takes thee hence.',
 '',
 '  XIII',
 '',
 '  O! that you were your self; but, love you are',
 '  No longer yours, than you your self here live:',
 '  Against this coming end you should prepare,',
 '  And your sweet semblance to some other give:',
 '  So should that beauty which you hold in lease',
 '  Find no determination; then you were',
 "  Yourself again, after yourself's decease,",
 '  When your sweet issue your sweet form should bear.',
 '  Who lets so fair a house fall to decay,',
 '  Which husbandry in honour might uphold,',
 "  Against the stormy gusts of winter's day",
 "  And barren rage of death's eternal cold?",
 '    O! none but unthrifts. Dear my love, you know,',
 '    You had a father: let your son say so.',
 '',
 '  XIV',
 '',
 '  Not from the stars do I my judgement pluck;',
 '  And yet methinks I have astronomy,',
 '  But not to tell of good or evil luck,',
 "  Of plagu

In [15]:
# any string with n_char or fewer characters will be filtered out - save results to `filtered_sonnets`

# YOUR CODE HERE
n_char = 10
filtered_sonnets = [line.lstrip() for line in sonnets if len(line) > n_char]
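
One way to sanity-check the threshold is to ask how long the longest heading line can be. The sketch below assumes that headings are Roman numerals indented by two spaces (as in the output above) and that there are 154 sonnets. `to_roman` is a hypothetical helper written only for this check.

```python
def to_roman(number):
    """Convert a positive integer (below 400) to a Roman numeral string."""
    symbols = [(100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
               (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    result = ""
    for value, symbol in symbols:
        while number >= value:
            result += symbol
            number -= value
    return result


n_char = 10
indent = "  "  # assumption: headings are indented by two spaces

# length of every heading line, including its indentation
heading_lengths = [len(indent + to_roman(n)) for n in range(1, 155)]
longest_heading = max(heading_lengths)

print(longest_heading)
print(all(length <= n_char for length in heading_lengths))
```

Under these assumptions, no heading line is longer than `n_char`, so the `len(line) > n_char` filter removes all of them. Any genuine sonnet line of 10 characters or fewer would be dropped as well, so it is worth confirming that none exist.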

In [16]:
# ok - much better!
# but we still need to remove all the punctuation and case normalize the text
filtered_sonnets

['From fairest creatures we desire increase,',
 "That thereby beauty's rose might never die,",
 'But as the riper should by time decease,',
 'His tender heir might bear his memory:',
 'But thou, contracted to thine own bright eyes,',
 "Feed'st thy light's flame with self-substantial fuel,",
 'Making a famine where abundance lies,',
 'Thy self thy foe, to thy sweet self too cruel:',
 "Thou that art now the world's fresh ornament,",
 'And only herald to the gaudy spring,',
 'Within thine own bud buriest thy content,',
 "And tender churl mak'st waste in niggarding:",
 'Pity the world, or else this glutton be,',
 "To eat the world's due, by the grave and thee.",
 'When forty winters shall besiege thy brow,',
 "And dig deep trenches in thy beauty's field,",
 "Thy youth's proud livery so gazed on now,",
 "Will be a tatter'd weed of small worth held:",
 'Then being asked, where all thy beauty lies,',
 'Where all the treasure of thy lusty days;',
 'To say, within thine own deep sunken eyes,',


### Use custom data cleaning tool 

Use one of the methods in `data_cleaning_toolkit` to clean your data.

There is an example of this in the guided project.

In [12]:
# instantiate the data_cleaning_toolkit class - save result to `dctk`

# YOUR CODE HERE
dctk = data_cleaning_toolkit()

In [19]:
# use data_cleaning_toolkit to remove punctuation and to case normalize - save results to `clean_sonnets`

# YOUR CODE HERE
clean_sonnets = [dctk.clean_data(line) for line in filtered_sonnets]

In [21]:
# much better!
clean_sonnets

['from fairest creatures we desire increase ',
 'that thereby beauty s rose might never die ',
 'but as the riper should by time decease ',
 'his tender heir might bear his memory ',
 'but thoucontracted to thine own bright eyes ',
 'feed st thy light s flame with self-substantial fuel ',
 'making a famine where abundance lies ',
 'thy self thy foeto thy sweet self too cruel ',
 'thou that art now the world s fresh ornament ',
 'and only herald to the gaudy spring ',
 'within thine own bud buriest thy content ',
 'and tender churl mak st waste in niggarding ',
 'pity the worldor else this glutton be ',
 'to eat the world s dueby the grave and thee ',
 'when forty winters shall besiege thy brow ',
 'and dig deep trenches in thy beauty s field ',
 'thy youth s proud livery so gazed on now ',
 'will be a tatter d weed of small worth held ',
 'then being askedwhere all thy beauty lies ',
 'where all the treasure of thy lusty days ',
 'to saywithin thine own deep sunken eyes ',
 'were an al
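
Some words in the output above are fused together (for example `thoucontracted` and `worldor`), which suggests that a comma followed by a space is being deleted rather than replaced with a space. The sketch below is one possible alternative, not the toolkit's implementation. `clean_line` is a hypothetical function that replaces every character other than lowercase letters, whitespace, and hyphens with a space, then collapses repeated whitespace.

```python
import re


def clean_line(line):
    """Lowercase a line, replace punctuation with spaces, and collapse whitespace."""
    lowered = line.lower()
    # keep letters, whitespace and hyphens; everything else becomes a space
    no_punct = re.sub(r"[^a-z\s-]", " ", lowered)
    # collapse runs of whitespace and trim the ends
    return re.sub(r"\s+", " ", no_punct).strip()


print(clean_line("But thou, contracted to thine own bright eyes,"))
print(clean_line("Feed'st thy light's flame with self-substantial fuel,"))
```

Hyphens are kept here because `-` appears in the character vocabulary later in the notebook. Dropping them is an equally reasonable choice.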

### Use your data tool to create character sequences 

We'll need the `create_char_sequences` method for this task. However, this method requires a parameter called `maxlen`, which sets the maximum sequence length. 

So what would be a good sequence length, exactly? 

In order to answer that question, let's do some statistics! 

In [27]:
def calc_stats(corpus):
    """
    Calculates statistics on the length of every line in the sonnets
    """
    
    # write a list comprehension that calculates each sonnet line's length - save the results to `doc_lens` 

    # use numpy to calculate and return the mean, median, std, max, min of the doc lens - all in one line of code

    # YOUR CODE HERE
    # use the `corpus` argument rather than the global `clean_sonnets`
    doc_lens = [len(line) for line in corpus]
    
    return np.mean(doc_lens), np.median(doc_lens), np.std(doc_lens), np.max(doc_lens), np.min(doc_lens)

In [28]:
# sonnet line length statistics 
mean, med, std, max_, min_ = calc_stats(clean_sonnets)
mean, med, std, max_, min_

(41.57753017641597, 42.0, 4.188100070534026, 58, 27)

In [29]:
# using the results of the sonnet line length statistics, use your judgement and select a value for maxlen
# use .create_char_sequences() to create sequences

# YOUR CODE HERE
maxlen = 42
dctk.create_char_sequences(clean_sonnets, maxlen)

Created 18334 sequences.
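
To make the role of `maxlen` concrete, here is a minimal sketch of a sliding-window approach to building character sequences. It is an illustration only: `make_char_sequences` and its `step` parameter are hypothetical, and the toolkit's `create_char_sequences` may be implemented differently.

```python
def make_char_sequences(text, maxlen, step=1):
    """Slide a window of `maxlen` chars over `text`; the char after each window is its target."""
    sequences, next_chars = [], []
    for start in range(0, len(text) - maxlen, step):
        sequences.append(text[start:start + maxlen])
        next_chars.append(text[start + maxlen])
    return sequences, next_chars


toy_text = "from fairest creatures"
seqs, targets = make_char_sequences(toy_text, maxlen=5, step=3)

for seq, target in zip(seqs, targets):
    print(repr(seq), "->", repr(target))
```

Each input sequence has exactly `maxlen` characters, and the model's job is to predict the single character that follows it. A larger `step` produces fewer, less overlapping sequences.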


Take a look at the `data_cleaning_toolkit_class.py` file. 

In the first 4 lines of code in the `create_char_sequences` method, class attributes `n_features` and `unique_chars` are created. Let's call them in the cells below.

In [30]:
# number of input features for our LSTM model
dctk.n_features

28

In [31]:
# unique characters that appear in our sonnets 
dctk.unique_chars

[' ',
 'i',
 'q',
 'h',
 'g',
 'l',
 'n',
 'z',
 'w',
 'o',
 'p',
 'd',
 'k',
 'x',
 'b',
 's',
 'y',
 'v',
 'j',
 '-',
 'c',
 'a',
 't',
 'm',
 'f',
 'u',
 'r',
 'e']

In [33]:
len(dctk.unique_chars)

28

## Time for Questions 

----
**Question 1:** 

Why are the `number of unique characters` (i.e. **dctk.unique_chars**) and the `number of model input features` (i.e. **dctk.n_features**) the same?

**Hint:** The model that we will shortly be building here is very similar to the text generation model that we built in the guided project.

**Answer 1:**

The two numbers are the same because each character is one-hot encoded: every unique character gets its own position in the input vector, so there is exactly one input feature per unique character. The same characters are also the categories that the model predicts at its output.


**Question 2:**

Take a look at the print out of `dctk.unique_chars` one more time. Notice that there is a white space. 

Why is it desirable to have a white space as a possible character to predict?

**Answer 2:**

Including white space lets the model learn where words begin and end. Without it, the model could only emit an unbroken stream of letters, so predicting spaces is what makes it possible to generate separate, human-readable words.

----

### Use our data tool to create X and Y splits

You'll need the `create_X_and_Y` method for this task. 

In [34]:
# TODO: provide a walk through of data_cleaning_toolkit with unit tests that check for understanding 
X, y = dctk.create_X_and_Y()

![](https://miro.medium.com/max/891/0*jGB1CGQ9HdeUwlgB)

In [35]:
# notice that our input matrix isn't actually a matrix - it's a rank 3 tensor
X.shape

(18334, 42, 28)

In $X$.shape we see three numbers (*n1*, *n2*, *n3*). What do these numbers mean?

Well, *n1* tells us the number of samples that we have. But what about the other two?

In [36]:
# first index returns a single sample, which we can see is a sequence 
first_sample_index = 0 
X[first_sample_index]

array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False,  True, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False,  True],
       [ True, False, False, ..., False, False, False]])

Notice that each sequence (i.e. $X[i]$ where $i$ is some index value) is `maxlen` long and has `dctk.n_features` number of features. Let's try to better understand this shape. 

In [37]:
# each sequence is maxlen long and has dctk.n_features number of features
X[first_sample_index].shape

(42, 28)

**Each row corresponds to a character vector** and there are `maxlen` number of character vectors. 

**Each column corresponds to a unique character** and there are `dctk.n_features` number of features. 
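
The row and column layout can be reproduced on a toy example. The sketch below uses plain Python lists instead of the toolkit. The vocabulary, the `char_int` mapping, and the `one_hot_sequence` helper are all illustrative assumptions rather than the toolkit's code.

```python
# toy vocabulary built from a short string
vocab = sorted(set("hello world"))
char_int = {ch: i for i, ch in enumerate(vocab)}


def one_hot_sequence(seq, char_int):
    """Encode each character of `seq` as a row with a single True value."""
    n_features = len(char_int)
    rows = []
    for ch in seq:
        row = [False] * n_features
        row[char_int[ch]] = True   # mark the column that belongs to this character
        rows.append(row)
    return rows


encoded = one_hot_sequence("hello", char_int)

# (sequence length, number of unique characters)
print(len(encoded), len(encoded[0]))
```

Here the sequence length plays the role of `maxlen` and the vocabulary size plays the role of `dctk.n_features`.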


In [38]:
# let's index for a single character vector 
first_char_vect_index = 0
X[first_sample_index][first_char_vect_index]

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False,  True, False, False,
       False])

Notice that there is a single `True` value and all the rest of the values are `False`. 

This is a one-hot encoding for which character appears at each index within a sequence. Specifically, the cell above is looking at the first character in the sequence.

Only a single character can appear as the first character in a sequence, so there will necessarily be a single `True` value and the rest will be `False`. 

Suppose that `True` appears at index $i$, where $i$ is any valid index. How can we find out which character that actually corresponds to?

To answer this question, we need to use the character-to-integer look up dictionaries. 

In [39]:
# take a look at the index to character dictionary
# if a True appears in the 0th index of a character vector,
# then we know that whatever char you see below next to the 0th key 
# is the character that the character vector is encoding
dctk.int_char

{0: ' ',
 1: 'i',
 2: 'q',
 3: 'h',
 4: 'g',
 5: 'l',
 6: 'n',
 7: 'z',
 8: 'w',
 9: 'o',
 10: 'p',
 11: 'd',
 12: 'k',
 13: 'x',
 14: 'b',
 15: 's',
 16: 'y',
 17: 'v',
 18: 'j',
 19: '-',
 20: 'c',
 21: 'a',
 22: 't',
 23: 'm',
 24: 'f',
 25: 'u',
 26: 'r',
 27: 'e'}

In [40]:
# let's look at an example to tie it all together

seq_len_counter = 0

# index for a single sample 
for seq_of_char_vects in X[first_sample_index]:
    
    # get index with max value, which will be the one TRUE value 
    index_with_TRUE_val = np.argmax(seq_of_char_vects)
    
    print (dctk.int_char[index_with_TRUE_val])
    
    seq_len_counter+=1
    
print ("Sequence length: {}".format(seq_len_counter))

f
r
o
m
 
f
a
i
r
e
s
t
 
c
r
e
a
t
u
r
e
s
 
w
e
 
d
e
s
i
r
e
 
i
n
c
r
e
a
s
e
 
Sequence length: 42


## Time for Questions 

----
**Question 1:** 

In your own words, how would you describe the numbers from the shape print out of `X.shape` to a fellow classmate?


**Answer 1:**

The first number in X.shape is the number of sequences created.
The second number in X.shape is the number of characters in each sequence (== maxlen == 42).
The third number in X.shape is the length of each one-hot character vector, i.e. the number of unique characters in the corpus (== dctk.n_features == 28).

### Build a Text Generation model

Now that we have prepped our data (and understood that process) let's finally build out our character generation model, similar to what we did in the guided project.

In [41]:
def sample(preds, temperature=1.0):
    """
    Helper function to sample an index from a probability array
    """
    # convert preds to array 
    preds = np.asarray(preds).astype('float64')
    # scale values 
    preds = np.log(preds) / temperature
    # exponentiate values
    exp_preds = np.exp(preds)
    # this equation should look familiar to you (hint: it's an activation function)
    preds = exp_preds / np.sum(exp_preds)
    # draw one sample from a multinomial distribution (returns a one-hot array)
    probas = np.random.multinomial(1, preds, 1)
    # return the index of the sampled character (the position of the 1 in the one-hot draw)
    return np.argmax(probas)

def on_epoch_end(epoch, _):
    """
    Function invoked at end of each epoch. Prints the text generated by our model.
    """
    
    print()
    print('----- Generating text after Epoch: %d' % epoch)
    

    # randomly pick a starting index 
    # will be used to take a random sequence of chars from `text`
    start_index = random.randint(0, len(text) - dctk.maxlen - 1)
    
    # this is our seed string (i.e. input sequence into the model)
    generated = ''

    # start the sentence at index `start_index` and include the next `dctk.maxlen` number of chars
    sentence = text[start_index: start_index + dctk.maxlen]

    # add to generated
    generated += sentence

    
    print('----- Generating with seed: "' + sentence + '"')
    sys.stdout.write(generated)
    
    # use model to predict what the next 40 chars should be that follow the seed string
    for i in range(40):

        # shape of a single sample in a rank 3 tensor 
        x_dims = (1, dctk.maxlen, dctk.n_features)
        # create an array of zeros with shape x_dims
        # recall that python considers zeros and boolean FALSE as the same
        x_pred = np.zeros(x_dims)

        # create a seq vector for our randomly selected sequence 
        # i.e. create a numerical encoding for each char in the sequence 
        for t, char in enumerate(sentence):
            # for sample 0 in seq index t and character `char` encode a 1 (which is the same as a TRUE)
            x_pred[0, t, dctk.char_int[char]] = 1

        # next, take the seq vector and pass into model to get a prediction of what the next char should be 
        preds = model.predict(x_pred, verbose=0)[0]
        # use the sample helper function to get index for next char 
        next_index = sample(preds)
        # use look up dict to get next char 
        next_char = dctk.int_char[next_index]

        # append next char to sequence 
        sentence = sentence[1:] + next_char 
        
        sys.stdout.write(next_char)
        sys.stdout.flush()
    print()
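
The `temperature` argument of `sample` controls how adventurous the sampling is. As a minimal sketch, the function below repeats the same log, divide, and softmax steps using only the standard library, so the effect on a toy distribution can be inspected directly. `apply_temperature` and the example probabilities are illustrative assumptions.

```python
import math


def apply_temperature(probs, temperature=1.0):
    """Rescale a probability distribution: take logs, divide by temperature, then softmax."""
    scaled = [math.log(p) / temperature for p in probs]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


probs = [0.6, 0.3, 0.1]  # toy next-character probabilities

for temperature in (0.5, 1.0, 2.0):
    rescaled = apply_temperature(probs, temperature)
    print(temperature, [round(p, 3) for p in rescaled])
```

A temperature below 1 sharpens the distribution, so the most likely character is chosen even more often. A temperature above 1 flattens it, which gives more varied but noisier text. A temperature of exactly 1 leaves the model's probabilities unchanged.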

In [42]:
# need this for on_epoch_end()
text = " ".join(clean_sonnets)

In [43]:
# create callback object that will print out text generation at the end of each epoch 
# use for real-time monitoring of model performance
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

----
### Train Model

Build a text generation model using LSTMs. Feel free to reference the model used in the guided project. 

It is recommended that you train this model for at least 50 epochs (more if your computer can handle it). 

You are free to change up the architecture as you wish. 

Just in case you have difficulty training a model, there is a pre-trained model saved to a file called `trained_text_gen_model.h5` that you can load in (the same way that you learned how to load in Keras models in Sprint 2 Module 4). 

In [51]:
# build text generation model layer by layer 
# fit model

# YOUR CODE HERE
model = Sequential()

# first LSTM layer
model.add(LSTM(256, 
               input_shape= (dctk.maxlen, dctk.n_features), 
               activation = 'tanh',
               return_sequences=True))

# second LSTM layer
model.add(LSTM(128, activation = 'tanh'))

# Output layer
model.add(Dense(dctk.n_features, activation = 'softmax'))

# Compile
model.compile(loss = 'categorical_crossentropy', 
              optimizer = 'adam')

model.fit(X, y, 
          batch_size = 256, 
          epochs = 50, 
          workers = 8, 
          callbacks = [print_callback])

Epoch 1/50

----- Generating text after Epoch: 0
----- Generating with seed: "frown they in their glory die  the painful"
frown they in their glory die  the painful optyohhe tna vu i a  lirul dtydne  ab o
Epoch 2/50

----- Generating text after Epoch: 1
----- Generating with seed: "ow can love s eye be true  that is so vexe"
ow can love s eye be true  that is so vexeiyoiestusecf-calotgovoith ssiesueet egbr
Epoch 3/50

----- Generating text after Epoch: 2
----- Generating with seed: "  farewellthou art too dear for my possess"
  farewellthou art too dear for my possess bot nrgee megrntmyve thud eysfets  walo
Epoch 4/50

----- Generating text after Epoch: 3
----- Generating with seed: " for my lovenot for their rhyme  exceeded "
 for my lovenot for their rhyme  exceeded  urve hoqreaut  os roth eto ee mron af o
Epoch 5/50

----- Generating text after Epoch: 4
----- Generating with seed: " twoslight airand purging fire are both wi"
 twoslight airand purging fire are both wiulwbehaek pire a

trespass with compare  myself corruptingsall tle to spuch -neing tain the eres of 
Epoch 31/50

----- Generating text after Epoch: 30
----- Generating with seed: "ll denote love s eye is not so true as all"
ll denote love s eye is not so true as all my love by apey  when i werstoulfong sp
Epoch 32/50

----- Generating text after Epoch: 31
----- Generating with seed: "ad  past reason hatedas a swallow d bait  "
ad  past reason hatedas a swallow d bait  love om meseepase whesswith theing on th
Epoch 33/50

----- Generating text after Epoch: 32
----- Generating with seed: "ve err d  and to this false plague are the"
ve err d  and to this false plague are the urof stall  and the are out soul blevea
Epoch 34/50

----- Generating text after Epoch: 33
----- Generating with seed: "ack againand straight grow sad  mine eye a"
ack againand straight grow sad  mine eye angeaistet st canld my beauty ferer creec
Epoch 35/50

----- Generating text after Epoch: 34
----- Generating with seed: "nowledge 

<keras.callbacks.History at 0x192e4bd94f0>

### Model appears to be overfitting, based on at least the last 10 epochs

In [52]:
# save trained model to file 
model.save("trained_text_gen_model.h5")

### Let's play with our trained model 

Now that we have a trained model that, though far from perfect, is able to generate actual English words, we can take a look at the predictions to continue to learn more about how a text generation model works. 

We can also take this as an opportunity to unpack the `on_epoch_end` function to better understand how it works. 

In [53]:
# this is our joined clean sonnet data
text



In [56]:
# randomly pick a starting index 
# will be used to take a random sequence of chars from `text`
# run this cell a few times and you'll see `start_index` is random
start_index = random.randint(0, len(text) - dctk.maxlen - 1)
start_index

8599

In [57]:
# next use the randomly selected starting index to sample a sequence from the `text`

# this is our seed string (i.e. input sequence into the model)
generated = ''

# start the sentence at index `start_index` and include the next `dctk.maxlen` number of chars
sentence = text[start_index: start_index + dctk.maxlen]

# add to generated
generated += sentence

generated

'ecret influence comment  when i perceive t'

In [58]:
# this block of code lets us know what the seed string is 
# i.e. the input sequence into the model
print('----- Generating with seed: "' + sentence + '"')
sys.stdout.write(generated)

----- Generating with seed: "ecret influence comment  when i perceive t"
ecret influence comment  when i perceive t

42

In [59]:
# use model to predict what the next 40 chars should be that follow the seed string
for i in range(40):

    # shape of a single sample in a rank 3 tensor 
    x_dims = (1, dctk.maxlen, dctk.n_features)
    # create an array of zeros with shape x_dims
    # recall that python considers zeros and boolean FALSE as the same
    x_pred = np.zeros(x_dims)

    # create a seq vector for our randomly selected sequence 
    # i.e. create a numerical encoding for each char in the sequence 
    for t, char in enumerate(sentence):
        # for sample 0 in seq index t and character `char` encode a 1 (which is the same as a TRUE)
        x_pred[0, t, dctk.char_int[char]] = 1

    # next, take the seq vector and pass into model to get a prediction of what the next char should be 
    preds = model.predict(x_pred, verbose=0)[0]
    # use the sample helper function to get index for next char 
    next_index = sample(preds)
    # use look up dict to get next char 
    next_char = dctk.int_char[next_index]

    # append next char to sequence 
    sentence = sentence[1:] + next_char 
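
The line `sentence = sentence[1:] + next_char` keeps the input window at a fixed length: the oldest character is dropped each time a new one is appended. The sketch below illustrates this with a stand-in for the model. `generate` and `next_char_fn` are hypothetical names, and the lambda simply returns `"x"` in place of `model.predict` followed by `sample`.

```python
def generate(seed, next_char_fn, n_chars):
    """Slide a fixed-length window forward, collecting only the newly generated characters."""
    window = seed
    new_chars = ""
    for _ in range(n_chars):
        next_char = next_char_fn(window)
        new_chars += next_char
        # drop the oldest character and append the newest one
        window = window[1:] + next_char
    return window, new_chars


seed = "abcdefgh"  # window length 8 stands in for maxlen
window, new_chars = generate(seed, lambda w: "x", n_chars=6)

print(window)            # final window: the last seed chars plus the generated chars
print(seed + new_chars)  # full text: seed followed by only the new characters
```

Because fewer characters are generated than the window holds, the final window still starts with the tail of the seed. The same thing happens in the loop above, where 40 characters are generated with a 42-character window.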

In [60]:
# this is the seed string
generated

'ecret influence comment  when i perceive t'

In [61]:
# `sentence` is the final 42-char sliding window: the last 2 seed chars followed by the 40 generated chars
sentence

' thy wolld to bairty thound warghssandgr d'

In [62]:
# now put it all together (note that the 2 overlapping seed chars appear twice)
generated + sentence

'ecret influence comment  when i perceive t thy wolld to bairty thound warghssandgr d'

# Resources and Stretch Goals

## Stretch goals:
- Refine the training and generation of text to be able to ask for different genres/styles of Shakespearean text (e.g. plays versus sonnets)
- Train a classification model that takes text and returns which work of Shakespeare it is most likely to be from
- Make it more performant! Many possible routes here - lean on Keras, optimize the code, and/or use more resources (AWS, etc.)
- Revisit the news example from class, and improve it - use categories or tags to refine the model/generation, or train a news classifier
- Run on bigger, better data

## Resources:
- [The Unreasonable Effectiveness of Recurrent Neural Networks](https://karpathy.github.io/2015/05/21/rnn-effectiveness/) - a seminal writeup demonstrating a simple but effective character-level NLP RNN
- [Simple NumPy implementation of RNN](https://github.com/JY-Yoon/RNN-Implementation-using-NumPy/blob/master/RNN%20Implementation%20using%20NumPy.ipynb) - Python 3 version of the code from "Unreasonable Effectiveness"
- [TensorFlow RNN Tutorial](https://github.com/tensorflow/models/tree/master/tutorials/rnn) - code for training a RNN on the Penn Tree Bank language dataset
- [4 part tutorial on RNN](http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/) - relates RNN to the vanishing gradient problem, and provides example implementation
- [RNN training tips and tricks](https://github.com/karpathy/char-rnn#tips-and-tricks) - some rules of thumb for parameterizing and training your RNN