# Generating Shakespeare

## Setup

We're going to download the collected plays of Shakespeare to use as our data.

Source: http://www.gutenberg.org/cache/epub/100/pg100.txt

The original source was preprocessed to remove the sonnets and the non-Shakespearean text added by Project Gutenberg.

In [1]:
import numpy as np

In [2]:
import os

BASE_DIR = os.getcwd()
DATA_DIR = BASE_DIR + '/data/shakespeare/'

In [3]:
model_path = DATA_DIR + 'models/'
if not os.path.exists(model_path): os.mkdir(model_path)

In [4]:
data = DATA_DIR + 'gutenberg_shakespeare_modified.txt' # preprocessed

with open(data, 'r') as f:
    text = f.read()
print('corpus length:', len(text))

('corpus length:', 5291227)


In [5]:
chars = sorted(list(set(text)))
vocab_size = len(chars)+1
print('total chars:', vocab_size)

('total chars:', 88)


Sometimes it's useful to have a zero value in the dataset, e.g. for padding, so we reserve index 0 for the null character.

In [6]:
chars.insert(0, "\0")

In [7]:
''.join(chars)

'\x00\n\r !"&\'(),-.0123456789:;<?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_`abcdefghijklmnopqrstuvwxyz|}\xbb\xbf\xef'

Map chars to indices and vice versa

In [8]:
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

In [9]:
print(char_indices)

{'\x00': 0, ' ': 3, '(': 8, ',': 10, '0': 13, '4': 17, '8': 21, '\xbb': 85, '<': 25, '\xbf': 86, 'D': 30, 'H': 34, 'L': 38, 'P': 42, 'T': 46, 'X': 50, '`': 56, 'd': 60, 'h': 64, 'l': 68, '\xef': 87, 'p': 72, 't': 76, 'x': 80, '|': 83, "'": 7, '3': 16, '7': 20, ';': 24, '?': 26, 'C': 29, 'G': 33, 'K': 37, 'O': 41, 'S': 45, 'W': 49, '[': 53, '_': 55, 'c': 59, 'g': 63, 'k': 67, 'o': 71, 's': 75, 'w': 79, '\n': 1, '"': 5, '&': 6, '.': 12, '2': 15, '6': 19, ':': 23, 'B': 28, 'F': 32, 'J': 36, 'N': 40, 'R': 44, 'V': 48, 'Z': 52, 'b': 58, 'f': 62, 'j': 66, 'n': 70, 'r': 74, 'v': 78, 'z': 82, '\r': 2, '!': 4, ')': 9, '-': 11, '1': 14, '5': 18, '9': 22, 'A': 27, 'E': 31, 'I': 35, 'M': 39, 'Q': 43, 'U': 47, 'Y': 51, ']': 54, 'a': 57, 'e': 61, 'i': 65, 'm': 69, 'q': 73, 'u': 77, 'y': 81, '}': 84}
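As a minimal sketch of the same mapping on a toy string (the names `toy_text`, `toy_chars`, and so on are hypothetical and not part of the notebook), the round trip from characters to indices and back looks like this. Index 0 is reserved for the padding character, as above.

```python
# Toy corpus standing in for the Shakespeare text (hypothetical example data).
toy_text = "to be or not to be"

# Sorted unique characters, with "\0" reserved at index 0 for padding.
toy_chars = sorted(set(toy_text))
toy_chars.insert(0, "\0")

# Forward and reverse lookup tables.
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Encode the text as integers, then decode it back.
toy_idx = [toy_char_indices[c] for c in toy_text]
decoded = "".join(toy_indices_char[i] for i in toy_idx)

print(toy_chars)
print(toy_idx[:5])
print(decoded == toy_text)
```

Because the padding character never occurs in the text itself, index 0 never appears in the encoded sequence.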


*idx* holds the Shakespearean text converted to character indices (based on the *char_indices* mapping above)

In [10]:
idx = [char_indices[c] for c in text]

In [11]:
print(idx[:70])

[87, 85, 86, 45, 29, 31, 40, 31, 23, 2, 1, 44, 71, 77, 75, 65, 68, 68, 71, 70, 24, 3, 42, 57, 74, 65, 75, 24, 3, 32, 68, 71, 74, 61, 70, 59, 61, 24, 3, 39, 57, 74, 75, 61, 65, 68, 68, 61, 75, 2, 1, 2, 1, 2, 1, 27, 29, 46, 3, 35, 12, 3, 45, 29, 31, 40, 31, 3, 14, 12]


In [12]:
''.join(indices_char[i] for i in idx[:70])

'\xef\xbb\xbfSCENE:\r\nRousillon; Paris; Florence; Marseilles\r\n\r\n\r\nACT I. SCENE 1.'

## 3 char model

### GLOBALS needed from this point on

In [13]:
from keras.layers import Input, Embedding, LSTM, merge, SimpleRNN, TimeDistributed
from keras.layers.core import Dense, Dropout, Flatten
from keras.models import Model, Sequential
from keras.optimizers import Adam
from keras.layers.normalization import BatchNormalization

Using Theano backend.
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)


In [14]:
n_fac = 42 # number of latent factors (size of embedding matrix)
n_hidden = 256 # hyperparameter: size of hidden state

### Create inputs

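Create four lists, each taking every 3rd character (the stride is `nc`), starting at the 0th, 1st, 2nd, and 3rd characters respectively. The first three are the inputs; the fourth is the label.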

In [15]:
nc = 3 # num chars
c1_dat = [idx[i] for i in xrange(0, len(idx)-1-nc, nc)]
c2_dat = [idx[i+1] for i in xrange(0, len(idx)-1-nc, nc)]
c3_dat = [idx[i+2] for i in xrange(0, len(idx)-1-nc, nc)]
c4_dat = [idx[i+3] for i in xrange(0, len(idx)-1-nc, nc)]

In [16]:
0, len(idx)-1-nc, nc

(0, 5291223, 3)

In [17]:
len(c1_dat), len(c4_dat)

(1763741, 1763741)
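To make the striding concrete, here is a small sketch on a made-up index sequence (`toy_idx` is hypothetical; the slicing logic mirrors the cell above). Each row printed is one training example: three input indices and the index that follows them.

```python
# Toy index sequence standing in for `idx` (hypothetical data).
toy_idx = list(range(10, 24))  # 14 values: 10, 11, ..., 23
nc = 3  # number of input characters per example

# Same striding as the notebook: window starts are 0, nc, 2*nc, ...
starts = range(0, len(toy_idx) - 1 - nc, nc)
c1 = [toy_idx[i] for i in starts]
c2 = [toy_idx[i + 1] for i in starts]
c3 = [toy_idx[i + 2] for i in starts]
c4 = [toy_idx[i + 3] for i in starts]  # label: the character after the window

for row in zip(c1, c2, c3, c4):
    print(row[:3], '->', row[3])
```

Note that the label of one example is also the first input of the next one, since the windows advance by `nc` while each example spans `nc + 1` characters.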

Our inputs

In [18]:
x1 = np.stack(c1_dat)
x2 = np.stack(c2_dat)
x3 = np.stack(c3_dat)

Our output

In [19]:
y = np.stack(c4_dat)

In [20]:
x1.shape, y.shape

((1763741,), (1763741,))

Create inputs and embedding outputs for each of our 3 character inputs

In [21]:
def embedding_input(name, n_in, n_out):
    inp = Input(shape=(1,), dtype='int64', name=name+'_in')
    emb = Embedding(n_in, n_out, input_length=1, name=name+'_emb')(inp)
    return inp, Flatten()(emb)

In [22]:
c1_in, c1_emb = embedding_input('c1', vocab_size, n_fac)
c2_in, c2_emb = embedding_input('c2', vocab_size, n_fac)
c3_in, c3_emb = embedding_input('c3', vocab_size, n_fac)

### Create and train model

![3char](./3char.png)

`dense_in` is the 'green arrow' in the diagram - the layer operation from input to hidden

In [23]:
dense_in = Dense(n_hidden, activation='relu')

Our first hidden activation is simply this function applied to the result of the embedding of the first character.

In [24]:
c1_hidden = dense_in(c1_emb)

`dense_hidden` is the 'orange arrow' from our diagram - the layer operation from hidden to hidden

_Note:_ unsure why the activation for this is `tanh`

In [25]:
dense_hidden = Dense(n_hidden, activation='tanh')

Our second and third hidden activations add the previous hidden state (after applying `dense_hidden`) to the new input state (after applying `dense_in`).

In [26]:
# merge([new input state, orange arrow from previous hidden state])
c2_hidden = merge([dense_in(c2_emb), dense_hidden(c1_hidden)])
c3_hidden = merge([dense_in(c3_emb), dense_hidden(c2_hidden)])
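The recurrence built so far can be written out in plain NumPy. This is only a sketch under stated assumptions: the weights are random placeholders, the sizes are tiny, and the embeddings are random vectors. It also assumes `merge` is left in its default `'sum'` mode, as in the cell above.

```python
import numpy as np

rng = np.random.RandomState(0)
n_fac, n_hidden = 4, 5  # tiny sizes for illustration (the notebook uses 42 and 256)

# Hypothetical weights standing in for the two shared layers.
W_in, b_in = rng.randn(n_fac, n_hidden), np.zeros(n_hidden)        # dense_in ("green arrow")
W_hid, b_hid = rng.randn(n_hidden, n_hidden), np.zeros(n_hidden)   # dense_hidden ("orange arrow")

def dense_in(x):
    return np.maximum(0, np.dot(x, W_in) + b_in)   # relu activation

def dense_hidden(h):
    return np.tanh(np.dot(h, W_hid) + b_hid)       # tanh activation

# Three random vectors standing in for the character embeddings.
e1, e2, e3 = rng.randn(3, n_fac)

h1 = dense_in(e1)
h2 = dense_in(e2) + dense_hidden(h1)   # sum of new input and transformed previous state
h3 = dense_in(e3) + dense_hidden(h2)
print(h3.shape)
```

The same two weight matrices are reused at every step, which is what makes this an (unrolled) recurrent network rather than three independent layers.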

`dense_out` is the 'blue arrow' from our diagram - the layer operation from hidden to output

In [27]:
dense_out = Dense(vocab_size, activation='softmax')

The third hidden state is the input to our output layer

In [28]:
c4_out = dense_out(c3_hidden)

In [29]:
model = Model([c1_in, c2_in, c3_in], c4_out)
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
model.optimizer.lr=0.000001

In [30]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
c1_in (InputLayer)               (None, 1)             0                                            
____________________________________________________________________________________________________
c2_in (InputLayer)               (None, 1)             0                                            
____________________________________________________________________________________________________
c1_emb (Embedding)               (None, 1, 42)         3696        c1_in[0][0]                      
____________________________________________________________________________________________________
c2_emb (Embedding)               (None, 1, 42)         3696        c2_in[0][0]                      
___________________________________________________________________________________________

In [31]:
model.fit([x1, x2, x3], y, batch_size=64, nb_epoch=4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7f4a1b992990>

In [32]:
model.optimizer.lr=0.01

In [33]:
model.fit([x1, x2, x3], y, batch_size=64, nb_epoch=4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7f4a1a351e50>

In [34]:
model.optimizer.lr=0.000001

In [35]:
model.fit([x1, x2, x3], y, batch_size=64, nb_epoch=4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7f4a1b838550>

In [36]:
model.optimizer.lr=0.01

In [37]:
model.fit([x1, x2, x3], y, batch_size=64, nb_epoch=4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7f4a1eaa5550>

In [38]:
model.fit([x1, x2, x3], y, batch_size=64, nb_epoch=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f4a1b838750>

Let's save the model.

In [42]:
save1_path = model_path + 'save1.h5'
if not os.path.exists(save1_path):
    model.save_weights(save1_path)
model.load_weights(save1_path)

### Test Model

"`newaxis` is used to increase the dimension of the existing array by one more dimension, when used once" - [source](https://stackoverflow.com/questions/29241056/the-use-of-numpy-newaxis)

In [43]:
def get_next(m, inp):
    idxs = [char_indices[c] for c in inp]
    arrs = [np.array(i)[np.newaxis] for i in idxs]
    p = m.predict(arrs)
    i = np.argmax(p)
    return chars[i]
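A quick illustration of what `np.newaxis` does to the shapes involved (the index values here are arbitrary): each character index becomes a batch containing a single example, which is the shape `predict` expects for each input.

```python
import numpy as np

# A single character index, as a 0-d array.
i = np.array(61)
print(i.shape)                  # ()

# np.newaxis adds a leading axis, giving a batch of one example.
batched = i[np.newaxis]
print(batched.shape)            # (1,)

# For a whole sequence of indices, the same trick yields shape (1, seq_len).
seq = np.array([73, 77, 61])[np.newaxis, :]
print(seq.shape)                # (1, 3)
```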

In [44]:
get_next(model, 'phi')

's'

In [45]:
get_next(model, ' th')

'e'

In [46]:
get_next(model, ' an')

'd'

## Our first RNN!

### GLOBALS needed from this point on

In [15]:
nc = 8 # numChars == size of our unrolled RNN

`xs` (+ `c_in_dat`), `y` (+ `c_out_dat`), `cs` (+ `embedding_input()`)

### Create inputs

Now let's try predicting char 9 using chars 1-8.

For each of 0 through 7, create a list of every 8th character with that starting point. These will be the 8 inputs to our model.

In [16]:
c_in_dat = [[idx[i+n] for i in xrange(0, len(idx)-1-nc, nc)]
           for n in range(nc)]

Then create a list of the next character in each of these series. These will be the labels for our model.

In [17]:
c_out_dat = [idx[i+nc] for i in xrange(0, len(idx)-1-nc, nc)]

In [18]:
xs = [np.stack(c) for c in c_in_dat]

In [51]:
len(xs), xs[0].shape

(8, (661403,))

In [19]:
y = np.stack(c_out_dat)

So each column below is one series of 8 characters from the text:

In [53]:
[xs[n][:nc] for n in range(nc)]

[array([87, 23, 68, 74, 74, 57, 75, 29]),
 array([85,  2, 68, 65, 61, 74,  2, 46]),
 array([86,  1, 71, 75, 70, 75,  1,  3]),
 array([45, 44, 70, 24, 59, 61,  2, 35]),
 array([29, 71, 24,  3, 61, 65,  1, 12]),
 array([31, 77,  3, 32, 24, 68,  2,  3]),
 array([40, 75, 42, 68,  3, 68,  1, 45]),
 array([31, 65, 57, 71, 39, 61, 27, 29])]

...and this is the next character after each sequence:

In [54]:
y[:nc]

array([23, 68, 74, 74, 57, 75, 29, 31])

### Create and train model

In [20]:
def embedding_input(name, n_in, n_out):
    inp = Input(shape=(1,), dtype='int64', name=name+'_in')
    emb = Embedding(n_in, n_out, input_length=1, name=name+'_emb')(inp)
    return inp, Flatten()(emb)

In [21]:
cs = [embedding_input('c'+str(n), vocab_size, n_fac) for n in range(nc)]

"I'd suggest trying the trick I mentioned in the lesson for simple RNNs: using an identity matrix to initialize your hidden state, and use relu instead of tanh." - [Jeremy on forums](http://forums.fast.ai/t/purpose-of-rnns-and-theano/242/5)

In [57]:
dense_in = Dense(n_hidden, activation='relu')
dense_hidden = Dense(n_hidden, activation='relu', init='identity')
dense_out = Dense(vocab_size, activation='softmax')

The embedding of the first character of each sequence goes through `dense_in` to create our first hidden activations.

In [58]:
hidden = dense_in(cs[0][1])

Then for each successive layer, we combine the output of `dense_in` on the next character with the output of `dense_hidden` on the current hidden state to create the new hidden state.

In [59]:
for i in range(1, nc):
    dense = dense_in(cs[i][1])
    hidden = dense_hidden(hidden)
    hidden = merge([dense, hidden])

Putting the final hidden state through `dense_out` gives us our output.

In [60]:
out = dense_out(hidden)

Now we can create our model.

In [61]:
model = Model([c[0] for c in cs], out)
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
c0_in (InputLayer)               (None, 1)             0                                            
____________________________________________________________________________________________________
c1_in (InputLayer)               (None, 1)             0                                            
____________________________________________________________________________________________________
c0_emb (Embedding)               (None, 1, 42)         3696        c0_in[0][0]                      
____________________________________________________________________________________________________
c1_emb (Embedding)               (None, 1, 42)         3696        c1_in[0][0]                      
___________________________________________________________________________________________

In [62]:
model.fit(xs, y, batch_size=64, nb_epoch=12)

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f4a036c9b10>

### Test Model

In [63]:
def get_next(m, inp):
    arrs = [np.array(char_indices[c])[np.newaxis] for c in inp]
    p = m.predict(arrs)
    return chars[np.argmax(p)]

In [64]:
get_next(model, 'for thos')

' '

In [65]:
get_next(model, 'part of ')

't'

In [66]:
get_next(model, 'queens a')

'n'

Here's a helper function that extends a starter sequence one character at a time until `k+1` more spaces have been generated, i.e. roughly `k` additional whitespace-separated words:

In [67]:
def get_seq(m, inp, k):
    k_count = 0
    seq = inp
    while k_count < k+1:
        pc = get_next(m, inp)
        seq += pc
        inp = inp[1:] + pc
        if (pc == ' '):
            k_count += 1
    return seq
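The rolling-window loop can be exercised without a trained model. In this sketch, `generate_words` mirrors `get_seq`, and `toy_predict` is a hypothetical stand-in for `get_next` that simply cycles through a fixed pattern.

```python
def generate_words(predict_next, seed, k):
    """Append characters until k+1 spaces have been generated (mirrors get_seq)."""
    spaces = 0
    seq = seed
    window = seed
    while spaces < k + 1:
        pc = predict_next(window)
        seq += pc
        window = window[1:] + pc   # drop the oldest char, append the new one
        if pc == ' ':
            spaces += 1
    return seq

# Hypothetical stand-in for the model: the next char depends only on the last char.
cycle = {'a': 'b', 'b': 'c', 'c': ' ', ' ': 'a'}

def toy_predict(window):
    return cycle.get(window[-1], 'a')

result = generate_words(toy_predict, 'xx ', 2)
print(repr(result))
```

The window always keeps the same length as the seed, so the model is only ever shown the most recent characters. When the seed ends mid-word, the first generated space just finishes that word, which is why the loop counts to `k+1`.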

In [68]:
get_seq(model, 'queens a', 10)

'queens and the son the roper the roper the roper the roper '

In [69]:
get_seq(model, 'part of ', 10)

'part of the roper the roper the roper the roper the roper the '

In [70]:
get_seq(model, 'for thos', 10)

'for thos  a dount of the roper the roper the roper '

The model currently seems to 'fixate' on phrases like "the some sore" or "the roper".

In [71]:
model.fit(xs, y, batch_size=64, nb_epoch=12)

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f4a036c9fd0>

In [72]:
get_seq(model, 'queens a', 10)

'queens and the sor of the sor of the sor of the '

In [73]:
get_seq(model, 'part of ', 10)

'part of the sor of the sor of the sor of the sor '

In [74]:
get_seq(model, 'for thos', 10)

'for thos  h and the sor of the sor of the '

In [75]:
model.fit(xs, y, batch_size=64, nb_epoch=12)

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f4a036c9e10>

In [76]:
get_seq(model, 'queens a', 10)

'queens and the hand the hand the hand the hand the hand '

In [77]:
get_seq(model, 'part of ', 10)

'part of the gonder the sore the gonder the sore the gonder the '

In [78]:
get_seq(model, 'for thos', 10)

'for thos of the gonder the sore the gonder the sore the '

After more training the model fixates on different phrases, such as "the best with" and "the gonder the sore".
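One property of this setup makes repetition unavoidable once it starts: `get_next` takes the argmax, so the next character is a deterministic function of the current 8-character window. As soon as a window recurs, the output must loop forever. The sketch below demonstrates this with a hypothetical lookup-table predictor (`toy_predict`) in place of the trained model; it does not claim this is the only reason the real model repeats itself.

```python
def find_cycle(predict_next, seed, max_steps=1000):
    """Run greedy decoding and report when a window repeats.

    Returns (first_seen_step, repeat_step), or None if no window
    repeats within max_steps.
    """
    seen = {seed: 0}
    window = seed
    for step in range(1, max_steps + 1):
        window = window[1:] + predict_next(window)
        if window in seen:
            return seen[window], step
        seen[window] = step
    return None

# Hypothetical deterministic predictor: next char depends only on the last char.
table = {'t': 'h', 'h': 'e', 'e': ' ', ' ': 't'}

def toy_predict(window):
    return table.get(window[-1], ' ')

loop = find_cycle(toy_predict, 'for thos')
print(loop)
```

Sampling from the predicted distribution instead of taking the argmax is one common way to break such loops.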

In [79]:
save2_path = model_path + 'save2.h5'
if not os.path.exists(save2_path):
    model.save_weights(save2_path)
model.load_weights(save2_path)

## Our first RNN with keras!

This is nearly equivalent to the RNN we built ourselves in the previous section.

In [80]:
model = Sequential([
        Embedding(vocab_size, n_fac, input_length=nc),
        SimpleRNN(n_hidden, activation='relu', inner_init='identity'),
        Dense(vocab_size, activation='softmax')
    ])
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
embedding_1 (Embedding)          (None, 8, 42)         3696        embedding_input_1[0][0]          
____________________________________________________________________________________________________
simplernn_1 (SimpleRNN)          (None, 256)           76544       embedding_1[0][0]                
____________________________________________________________________________________________________
dense_7 (Dense)                  (None, 88)            22616       simplernn_1[0][0]                
Total params: 102856
____________________________________________________________________________________________________
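The parameter counts in this summary can be reproduced by hand, which is a useful check on what each layer contains. The sizes below are the ones used in this notebook.

```python
vocab_size, n_fac, n_hidden = 88, 42, 256

# One n_fac-dimensional vector per character.
embedding = vocab_size * n_fac
# Input-to-hidden weights + hidden-to-hidden weights + bias.
simple_rnn = n_fac * n_hidden + n_hidden * n_hidden + n_hidden
# Hidden-to-output weights + bias.
dense = n_hidden * vocab_size + vocab_size

total = embedding + simple_rnn + dense
print(embedding, simple_rnn, dense, total)
```

The `SimpleRNN` layer bundles the equivalents of `dense_in` and `dense_hidden` from the hand-built model into a single layer.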


To avoid `IndexError: axis 1 out of bounds [0, 1)`: http://forums.fast.ai/t/lesson-6-discussion/245/70

In [86]:
#model.fit(np.concatenate([x[np.newaxis] for x in xs]).T, y, batch_size=64, nb_epoch=12)
model.fit(np.concatenate(xs, axis=1), y, batch_size=64, nb_epoch=12)

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f49f2b4f090>

In [87]:
def get_next_keras(m, inp):
    idxs = [char_indices[c] for c in inp]
    arrs = np.array(idxs)[np.newaxis,:]
    p = m.predict(arrs)[0]
    return chars[np.argmax(p)]

In [88]:
def get_keras_seq(m, inp, k):
    k_count = 0
    seq = inp
    while k_count < k+1:
        pc = get_next_keras(m, inp)
        seq += pc
        inp = inp[1:] + pc
        if (pc == ' '):
            k_count += 1
    return seq

In [89]:
get_keras_seq(model, 'queens a', 10)

'queens and the shall be the shall be the shall be the '

In [90]:
get_keras_seq(model, 'part of ', 10)

'part of the shall be the shall be the shall be the shall '

In [91]:
get_keras_seq(model, 'for thos', 10)

'for those the shall be the shall be the shall be the '

_Fixations_: "the sent", "the shall be"

In [92]:
#model.fit(np.concatenate([x[np.newaxis] for x in xs]).T, y, batch_size=64, nb_epoch=12)
model.fit(np.concatenate(xs, axis=1), y, batch_size=64, nb_epoch=12)

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f49f1b72110>

In [93]:
get_keras_seq(model, 'queens a', 10)

'queens and the strength the strength the strength the strength the strength '

In [94]:
get_keras_seq(model, 'part of ', 10)

'part of the strength the strength the strength the strength the strength the '

In [95]:
get_keras_seq(model, 'for thos', 10)

'for those thou shalt the strength the strength the strength the strength '

_Fixations_: "the serve me", "the strength"

In [96]:
save3_path = model_path + 'save3.h5'
if not os.path.exists(save3_path):
    model.save_weights(save3_path)
model.load_weights(save3_path)

## Returning sequences

### GLOBALS needed from this point on

`ys` (+ `c_out_dat`)

### Create inputs

To use a sequence model, we can leave our input unchanged - but we have to change our output to a sequence.

Here, `c_out_dat` is built the same way as `c_in_dat`, but shifted along by one character, so each input character is paired with the character that follows it.

In [22]:
c_out_dat = [[idx[i+n] for i in xrange(1, len(idx)-nc, nc)]
            for n in range(nc)]

In [23]:
ys = [np.stack(c) for c in c_out_dat]

Reading down each column shows one set of inputs and outputs

In [99]:
[xs[n][:nc] for n in range(nc)]

[array([[87],
        [23],
        [68],
        [74],
        [74],
        [57],
        [75],
        [29]]), array([[85],
        [ 2],
        [68],
        [65],
        [61],
        [74],
        [ 2],
        [46]]), array([[86],
        [ 1],
        [71],
        [75],
        [70],
        [75],
        [ 1],
        [ 3]]), array([[45],
        [44],
        [70],
        [24],
        [59],
        [61],
        [ 2],
        [35]]), array([[29],
        [71],
        [24],
        [ 3],
        [61],
        [65],
        [ 1],
        [12]]), array([[31],
        [77],
        [ 3],
        [32],
        [24],
        [68],
        [ 2],
        [ 3]]), array([[40],
        [75],
        [42],
        [68],
        [ 3],
        [68],
        [ 1],
        [45]]), array([[31],
        [65],
        [57],
        [71],
        [39],
        [61],
        [27],
        [29]])]

In [100]:
[ys[n][:nc] for n in range(nc)]

[array([85,  2, 68, 65, 61, 74,  2, 46]),
 array([86,  1, 71, 75, 70, 75,  1,  3]),
 array([45, 44, 70, 24, 59, 61,  2, 35]),
 array([29, 71, 24,  3, 61, 65,  1, 12]),
 array([31, 77,  3, 32, 24, 68,  2,  3]),
 array([40, 75, 42, 68,  3, 68,  1, 45]),
 array([31, 65, 57, 71, 39, 61, 27, 29]),
 array([23, 68, 74, 74, 57, 75, 29, 31])]
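The relationship between inputs and shifted targets is easier to see on letters than on indices. This sketch applies the same list comprehensions to a hypothetical 16-character string with `nc = 4`, then transposes so that each row is one training example.

```python
toy_text = "abcdefghijklmnop"  # hypothetical 16-character corpus
nc = 4

# Inputs: nc parallel lists, one per position inside each window.
c_in = [[toy_text[i + n] for i in range(0, len(toy_text) - 1 - nc, nc)]
        for n in range(nc)]
# Targets: the same construction, shifted one character to the right.
c_out = [[toy_text[i + n] for i in range(1, len(toy_text) - nc, nc)]
         for n in range(nc)]

# Transpose so that each row is one training example.
inputs = [''.join(col) for col in zip(*c_in)]
targets = [''.join(col) for col in zip(*c_out)]
for x, y in zip(inputs, targets):
    print(x, '->', y)
```

Every position in the window now has its own label, so the model receives a training signal at each time step rather than only at the last one.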

### Create and train model

In [101]:
dense_in = Dense(n_hidden, activation='relu')
dense_hidden = Dense(n_hidden, activation='relu', init='identity')
dense_out = Dense(vocab_size, activation='softmax', name='output')

We're going to pass a vector of all zeros as our starting point. Here is the input layer for that:

In [102]:
inp1 = Input(shape=(n_fac,), name='zeros')
hidden = dense_in(inp1)

In [103]:
outs = []

for i in range(nc):
    dense = dense_in(cs[i][1])
    hidden = dense_hidden(hidden)
    hidden = merge([dense, hidden], mode='sum')
    # every layer now has an output
    outs.append(dense_out(hidden))

In [104]:
model = Model([inp1] + [c[0] for c in cs], outs)
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
c0_in (InputLayer)               (None, 1)             0                                            
____________________________________________________________________________________________________
c0_emb (Embedding)               (None, 1, 42)         3696        c0_in[0][0]                      
____________________________________________________________________________________________________
zeros (InputLayer)               (None, 42)            0                                            
____________________________________________________________________________________________________
dense_8 (Dense)                  (None, 256)           11008       zeros[0][0]                      
                                                                   flatten_4[0][0]         

In [105]:
zeros = np.tile(np.zeros(n_fac), (len(xs[0]), 1))
zeros.shape

(661403, 42)

In [106]:
model.fit([zeros]+xs, ys, batch_size=64, nb_epoch=12)

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f49f6f81f50>

### Test model

In [107]:
def get_nexts(m, inp):
    idxs = [char_indices[c] for c in inp]
    arrs = [np.array(i)[np.newaxis] for i in idxs]
    # use the model passed in as `m`, not the global `model`
    p = m.predict([np.zeros(n_fac)[np.newaxis,:]] + arrs)
    print(list(inp))
    return [chars[np.argmax(o)] for o in p]

In [108]:
get_nexts(model, ' this is')

[' ', 't', 'h', 'i', 's', ' ', 'i', 's']


[' ', 'h', 'e', 't', ' ', 'm', 's', ' ']

In [109]:
get_nexts(model, ' part of')

[' ', 'p', 'a', 'r', 't', ' ', 'o', 'f']


[' ', 'o', 'r', 'e', 'o', 'o', 'f', ' ']

In [110]:
get_nexts(model, 'queens a')

['q', 'u', 'e', 'e', 'n', 's', ' ', 'a']


['u', 'i', 'e', 'n', ' ', ' ', 't', 'n']

### GLOBALS needed from this point on

In [24]:
xs[0].shape

(661403,)

In [25]:
x_rnn = np.stack(np.squeeze(xs), axis=1)
y_rnn = np.atleast_3d(np.stack(ys, axis=1))

In [26]:
x_rnn.shape, y_rnn.shape

((661403, 8), (661403, 8, 1))
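The reshaping above turns a list of per-time-step arrays into a single array with one row per example. A small sketch with made-up values (`xs_toy` and `ys_toy` are hypothetical) shows the resulting layout.

```python
import numpy as np

nc, n_examples = 3, 5  # tiny sizes for illustration

# A list of nc arrays, one per time step, each holding one index per example.
xs_toy = [np.arange(n_examples) + 10 * t for t in range(nc)]
ys_toy = [x + 1 for x in xs_toy]

# Stack along axis 1 so rows are examples and columns are time steps.
x_rnn_toy = np.stack(xs_toy, axis=1)
# Targets get a trailing axis of size 1, matching the y_rnn shape above.
y_rnn_toy = np.atleast_3d(np.stack(ys_toy, axis=1))

print(x_rnn_toy.shape, y_rnn_toy.shape)
print(x_rnn_toy[0])
```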

### Sequence model with keras

To convert our previous keras model into a sequence model, simply add the `return_sequences=True` parameter, and add `TimeDistributed` around our dense layer.

In [114]:
model = Sequential([
        Embedding(vocab_size, n_fac, input_length=nc),
        SimpleRNN(n_hidden, return_sequences=True, activation='relu', inner_init='identity'),
        TimeDistributed(Dense(vocab_size, activation='softmax'))
    ])
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
embedding_2 (Embedding)          (None, 8, 42)         3696        embedding_input_2[0][0]          
____________________________________________________________________________________________________
simplernn_2 (SimpleRNN)          (None, 8, 256)        76544       embedding_2[0][0]                
____________________________________________________________________________________________________
timedistributed_1 (TimeDistribute(None, 8, 88)         22616       simplernn_2[0][0]                
Total params: 102856
____________________________________________________________________________________________________


In [115]:
model.fit(x_rnn, y_rnn, batch_size=64, nb_epoch=8)

Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8


<keras.callbacks.History at 0x7f49ce7a1190>

In [116]:
model.fit(x_rnn, y_rnn, batch_size=64, nb_epoch=4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x7f49e5eec450>

In [117]:
def get_nexts_keras(m, inp):
    idxs = [char_indices[c] for c in inp]
    arrs = np.array(idxs)[np.newaxis,:]
    p = m.predict(arrs)[0]
    print(list(inp))
    return [chars[np.argmax(o)] for o in p]

In [118]:
get_nexts_keras(model, ' this is')

[' ', 't', 'h', 'i', 's', ' ', 'i', 's']


[' ', 'h', 'e', 's', ' ', 's', 's', ' ']

In [119]:
get_nexts_keras(model, ' part of')

[' ', 'p', 'a', 'r', 't', ' ', 'o', 'f']


[' ', 'r', 'r', 't', ' ', 'o', 'f', ' ']

In [120]:
get_nexts_keras(model, 'queens a')

['q', 'u', 'e', 'e', 'n', 's', ' ', 'a']


['u', 'i', 'e', 'n', ',', ' ', 'o', 'n']

In [121]:
save4_path = model_path + 'save4.h5'
if not os.path.exists(save4_path):
    model.save_weights(save4_path)
model.load_weights(save4_path)

## Stateful model with keras

In [27]:
bs = 64
nc = 40

In [28]:
c_in_dat = [[idx[i+n] for i in xrange(0, len(idx)-1-nc, nc)]
           for n in range(nc)]
c_out_dat = [[idx[i+n] for i in xrange(1, len(idx)-nc, nc)]
            for n in range(nc)]

In [29]:
xs = [np.stack(c) for c in c_in_dat]
xs = np.concatenate([[np.array(o)] for o in xs])

In [30]:
ys = [np.stack(c) for c in c_out_dat]
ys = np.concatenate([[np.array(o)] for o in ys])

In [31]:
xs.shape, ys.shape

((40, 132280), (40, 132280))

In [32]:
x_rnn = np.stack(np.squeeze(xs), axis=1)
y_rnn = np.atleast_3d(np.stack(ys, axis=1))

In [33]:
x_rnn.shape, y_rnn.shape

((132280, 40), (132280, 40, 1))

In [42]:
def make_model(batch_size_override=None):
    if batch_size_override is None:
        batch_size_override = bs
    model = Sequential([
        Embedding(vocab_size, n_fac, input_length=nc, batch_input_shape=(batch_size_override,nc)),
        BatchNormalization(),
        LSTM(n_hidden, input_dim=n_fac, return_sequences=True, stateful=True, dropout_U=0.2, dropout_W=0.2,
             consume_less='gpu'),
        LSTM(n_hidden, input_dim=n_fac, return_sequences=True, stateful=True, dropout_U=0.2, dropout_W=0.2,
             consume_less='gpu'),
        TimeDistributed(Dense(n_hidden, activation='relu')),
        Dropout(0.2),
        TimeDistributed(Dense(vocab_size, activation='softmax'))
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer=Adam())
    return model

In [35]:
def print_example(m, seed, gen_length=320):
    pred_m = make_model(batch_size_override=1) # batch_size_override is the important bit
    for layer, pred_layer in zip(m.layers, pred_m.layers):
        pred_layer.set_weights(layer.get_weights())
    
    output = seed
    for i in range(gen_length):
        x = np.array([char_indices[c] for c in output[-nc:]])[np.newaxis,:]
        preds = pred_m.predict(x, verbose=0, batch_size=1)[0][-1]
        preds = preds / np.sum(preds)
        output += np.random.choice(chars, p=preds)
    print(output)
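Unlike the earlier `get_next` helpers, `print_example` samples the next character from the predicted distribution rather than taking the argmax. The sketch below contrasts the two on a hypothetical three-character distribution. The renormalisation step mirrors `preds / np.sum(preds)` above; casting to float64 first is an extra precaution added here, since `choice` rejects probabilities that do not sum to 1 within its tolerance.

```python
import numpy as np

rng = np.random.RandomState(0)
chars_toy = ['a', 'b', 'c']

# Hypothetical model output for one time step (float32, as a network would emit).
preds = np.array([0.7, 0.2, 0.1], dtype=np.float32)

# Greedy decoding always returns the same character.
greedy = chars_toy[int(np.argmax(preds))]

# Sampling: renormalise so the probabilities sum to 1, then draw an index.
p = preds.astype(np.float64)
p = p / p.sum()
samples = [chars_toy[rng.choice(len(chars_toy), p=p)] for _ in range(1000)]

print(greedy)
print({c: samples.count(c) for c in chars_toy})
```

Sampling still favours likely characters but lets less likely ones through occasionally, which helps the generated text avoid the fixed loops seen earlier.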

In [36]:
def run_epochs(m, num_epochs=12, seed='In the cathedral church of Westminster, '):
    for i in range(num_epochs):
        m.reset_states()
        m.fit(x_rnn[:mx], y_rnn[:mx], batch_size=bs, nb_epoch=1, shuffle=False)
        print_example(m, seed)
        print()

In [43]:
model = Sequential([
        Embedding(vocab_size, n_fac, input_length=nc, batch_input_shape=(bs,nc)),
        BatchNormalization(),
        LSTM(n_hidden, input_dim=n_fac, return_sequences=True, stateful=True, dropout_U=0.2, dropout_W=0.2,
             consume_less='gpu'),
        LSTM(n_hidden, input_dim=n_fac, return_sequences=True, stateful=True, dropout_U=0.2, dropout_W=0.2,
             consume_less='gpu'),
        TimeDistributed(Dense(n_hidden, activation='relu')),
        Dropout(0.2),
        TimeDistributed(Dense(vocab_size, activation='softmax'))
    ])

In [44]:
model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam())
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
embedding_4 (Embedding)          (64, 40, 42)          3696        embedding_input_2[0][0]          
____________________________________________________________________________________________________
batchnormalization_4 (BatchNormal(64, 40, 42)          84          embedding_4[0][0]                
____________________________________________________________________________________________________
lstm_7 (LSTM)                    (64, 40, 256)         306176      batchnormalization_4[0][0]       
____________________________________________________________________________________________________
lstm_8 (LSTM)                    (64, 40, 256)         525312      lstm_7[0][0]                     
___________________________________________________________________________________________
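The two LSTM parameter counts above can be checked with the standard LSTM formula: four sets of weights (input, forget, and output gates plus the cell candidate), each with input weights, recurrent weights, and a bias. The helper name `lstm_params` is hypothetical.

```python
n_fac, n_hidden = 42, 256

def lstm_params(n_in, n_out):
    # Four weight sets, each with input weights, recurrent weights and a bias.
    return 4 * (n_in * n_out + n_out * n_out + n_out)

first_lstm = lstm_params(n_fac, n_hidden)      # fed by the 42-d embeddings
second_lstm = lstm_params(n_hidden, n_hidden)  # fed by the first LSTM's 256-d output
print(first_lstm, second_lstm)
```

The second count matches only when the input size is 256, which suggests the `input_dim=n_fac` argument on the second LSTM is overridden by the actual shape of the preceding layer's output.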

Since we're using a fixed batch shape, we have to ensure the number of input and output sequences is an exact multiple of the batch size.

In [45]:
mx = len(x_rnn)//bs*bs
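With the shapes shown earlier, the effect of this truncation is small. A quick check of the arithmetic, using the row count from `x_rnn.shape`:

```python
n_sequences = 132280  # number of rows in x_rnn, from the shapes shown earlier
bs = 64

mx = n_sequences // bs * bs   # largest multiple of bs that fits
dropped = n_sequences - mx    # sequences left out of training
print(mx, dropped)
```

Integer division rounds down to a whole number of batches, and multiplying back by `bs` gives the number of rows to keep.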

In [46]:
run_epochs(model, 25)

Epoch 1/1
In the cathedral church of Westminster, to
    She will then, what hath of this
    good your back not;
    Pelive so good nigh deed diest her is basit sh                          [Manion, parthing of with be partena sacit'd thou'll do you cordents resorn this,
    And upon chang'd- In sicns, as bid your blood! Call hands her gorthouss!
  PLORY. or'igne
()
Epoch 1/1
In the cathedral church of Westminster, he that they may be r'd-didd
    With their head all an every; sweet,
    Me; like to have this, and dokethang the revolt uncompanyly.
  AUTOLYCUS. Your lord who done. Hath see their passices will be harm on my devil-would,
    The stomy charge againly's painstractions-
  MARTIZEL. My lord, sweet time is't the 'on
()
Epoch 1/1
In the cathedral church of Westminster, we'll cannot a husbander, what hath a blood that I know form again,
    Up you as no heaven;
    Which freed like untall in a well make him,
    see in her foolish, do you not all
    life]  Now, piper of your fo

In [47]:
save5_path = model_path + 'save5.h5'
if not os.path.exists(save5_path):
    model.save_weights(save5_path)
model.load_weights(save5_path)

In [48]:
run_epochs(model, 25)

Epoch 1/1
In the cathedral church of Westminster, I come
    So so, the visit and deceive your
    drop of your friend
    It three month we see the forms will stoppell them approaching mouth,
    you should call the ways she been persuaded bosom.'
  ULYSSES. O that, Hector.
    Never heard you mine.
    Die for nurse,
    His fortune and hear upon men blood;
()
Epoch 1/1
In the cathedral church of Westminster, content thereof you have unjest,
    Together now is sad
    If your eyes were your care was not plac'd
    Means me, he'll be sent come,
    That Goth my bloody peril of him.
    Walk: let 'gno.
  DEMETRIUS. 'Tis by our offices.
  THURIO. Even the estate lives that. O,
    Let's presently be pass'd me; but I c
()
Epoch 1/1
In the cathedral church of Westminster, or by a fortune of these book;
    To do it holouring fits of what I do wear
    So, sir, what banish'd to swear with common officers, and
    make impossible. I be come in her stone, praise you one still unwoman and so

In [49]:
save5_2_path = model_path + 'save5_2.h5'
if not os.path.exists(save5_2_path):
    model.save_weights(save5_2_path)
model.load_weights(save5_2_path)