## Lesson 6

[lesson 6 wiki](http://wiki.fast.ai/index.php/Lesson_6)

In [1]:
%matplotlib inline
import utils
import imp
imp.reload(utils)
from utils import *

Using TensorFlow backend.


## Setup

We're going to download the collected works of Nietzsche to use a sout data for this class.

In [2]:
path = get_file('nietzsche.txt', origin='http://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read()
print('corpus length:', len(text))

corpus length: 600893


In [3]:
chars = sorted(list(set(text)))
vocab_size = len(chars) + 1
print('total chars', vocab_size)

total chars 85


Sometimes it's useful to have zero value in the dataset, e.g. for padding

In [4]:
chars.insert(0, '\0')

In [5]:
''.join(chars[1:-6])

'\n !"\'(),-.0123456789:;=?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_abcdefghijklmnopqrstuvwxy'

Map from chars to indices and back again

In [6]:
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

*idx* will be the data we use from now own - it simply converts all the characters to their index (based on the mapping above)

In [7]:
idx = [char_indices[c] for c in text]

In [8]:
idx[:10]

[40, 42, 29, 30, 25, 27, 29, 1, 1, 1]

In [9]:
''.join(indices_char[i] for i in idx[:70])

'PREFACE\n\n\nSUPPOSING that Truth is a woman--what then? Is there not gro'

## 3 char model

### create inputs

Create a list of every 4th character, starting at the 0th, 1st, 2nd, then 3rd characters

In [10]:
cs = 3
c1_dat = [idx[i] for i in range(0, len(idx) - 1 - cs, cs)]
c2_dat = [idx[i+1] for i in range(0, len(idx) - 1 - cs, cs)]
c3_dat = [idx[i+2] for i in range(0, len(idx) - 1 - cs, cs)]
c4_dat = [idx[i+3] for i in range(0, len(idx) - 1 - cs, cs)]

Our inputs

In [11]:
x1 = np.stack(c1_dat[:-2])
x2 = np.stack(c2_dat[:-2])
x3 = np.stack(c3_dat[:-2])

Our output

In [12]:
y = np.stack(c4_dat[:-2])

The first 4 inputs and outputs

In [13]:
x1[:4], x2[:4], x3[:4]

(array([40, 30, 29,  1]), array([42, 25,  1, 43]), array([29, 27,  1, 45]))

In [14]:
y[:4]

array([30, 29,  1, 40])

In [15]:
x1.shape, y.shape

((200295,), (200295,))

The number of latent factors to create (i.e. the size of the embedding matrix)

In [16]:
n_fac = 42

Create inpus and embedding outputs for each our 3 character inputs

In [19]:
def embedding_input(name, n_in, n_out):
    inp = Input(shape = (1,), dtype = 'int64', name = name)
    emb = Embedding(n_in, n_out, input_length = 1)(inp)
    return inp, Flatten()(emb)

In [20]:
c1_in, c1 = embedding_input('c1', vocab_size, n_fac)
c2_in, c2 = embedding_input('c2', vocab_size, n_fac)
c3_in, c3 = embedding_input('c3', vocab_size, n_fac)

### Create and train model

Pick a size for out hidden state

In [21]:
n_hidden = 256

This is the 'green arrow' from our diagram - the layer operation from input to hidden

In [22]:
dense_in = Dense(n_hidden, activation = 'relu')

Our first hidden activation is simply this function applied to the result of the embedding of the first character.

In [23]:
c1_hidden = dense_in(c1)

This is the 'orange arrow' from our diagram - the layer operation from hidden to hidden

In [24]:
dense_hidden = Dense(n_hidden, activation = 'tanh')

Our second and third hidden activations sum up the previous hidden state (agter applying dense_hidden) to the new input state.

In [26]:
c2_dense = dense_in(c2)
hidden_2 = dense_hidden(c1_hidden)
c2_hidden = merge([c2_dense, hidden_2])

  This is separate from the ipykernel package so we can avoid doing imports until


In [27]:
c3_dense = dense_in(c3)
hidden_3 = dense_hidden(c2_hidden)
c3_hidden = merge([c3_dense, hidden_3])

  This is separate from the ipykernel package so we can avoid doing imports until


This is the 'blue arrow' from out diagram - the layer operation from hidden to output.

In [28]:
dense_out = Dense(vocab_size, activation = 'softmax')

The third hidden state is the inupt to our output layer.

In [29]:
c4_out = dense_out(c3_hidden)

In [30]:
model = Model([c1_in, c2_in, c3_in], c4_out)

In [32]:
model.compile(loss = 'sparse_categorical_crossentropy', optimizer = Adam())

In [33]:
model.optimizer.lr = 1e-6

In [34]:
model.fit([x1, x2, x3], y, batch_size = 64, epochs = 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0xde8ae80>

In [35]:
model.optimizer.kr = 0.01

In [36]:
model.fit([x1, x2, x3], y, batch_size = 64, epochs = 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0xbb1e358>

In [37]:
model.optimizer.lr = 1e-6

In [38]:
model.fit([x1, x2, x3], y, batch_size = 64, epochs = 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x13ed9780>

In [39]:
model.optimizer.lr = 0.01

In [40]:
model.fit([x1, x2, x3], y, batch_size = 64, epochs = 4)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x13ed90b8>

### Test model

In [41]:
def get_next(inp):
    idxs = [char_indices[c] for c in inp]
    arrs = [np.array(i)[np.newaxis] for i in idxs]
    p = model.predict(arrs)
    i = np.argmax(p)
    return chars[i]

In [42]:
get_next('phi')

' '

In [46]:
get_next(' th')

' '

In [47]:
get_next(' an')

' '

## Out first RNN!

### Create inputs

This is the size of out unrolled RNN.

In [48]:
cs = 8

For each of 0 through 7, create a list of every 8th character with taht starting point. These will be the 8 inputs to out model.

In [49]:
c_in_dat = [[idx[i+n] for i in range(0, len(idx)-1-cs, cs)] for n in range(cs)]

Then create a list of the next character in each of these series. This will be the labels for our model.

In [50]:
c_out_dat = [idx[i+cs] for i in range(0, len(idx)-1-cs, cs)]

In [51]:
xs = [np.stack(c[:-2]) for c in c_in_dat]

In [52]:
len(xs), xs[0].shape

(8, (75109,))

In [54]:
y = np.stack(c_out_dat[:-2])

So each column below is one series of 8 characters from the text.

In [55]:
[xs[n][:cs] for n in range(cs)]

[array([40,  1, 33,  2, 72, 67, 73,  2]),
 array([42,  1, 38, 44,  2,  9, 61, 73]),
 array([29, 43, 31, 71, 54,  9, 58, 61]),
 array([30, 45,  2, 74,  2, 76, 67, 58]),
 array([25, 40, 73, 73, 76, 61, 24, 71]),
 array([27, 40, 61, 61, 68, 54,  2, 58]),
 array([29, 39, 54,  2, 66, 73, 33,  2]),
 array([ 1, 43, 73, 62, 54,  2, 72, 67])]

..and this is the next character after each sequence.

In [56]:
y[:cs]

array([ 1, 33,  2, 72, 67, 73,  2, 68])

In [57]:
n_fac = 42

### Create and train model

In [58]:
def embedding_input(name, n_in, n_out) :
    inp = Input(shape = (1,), dtype = 'int64', name = name + '_in')
    emb = Embedding(n_in, n_out, input_length = 1, name = name + '_emb')(inp)
    return inp, Flatten()(emb)

In [59]:
c_ins = [embedding_input('c' + str(n), vocab_size, n_fac) for n in range(cs)]

In [60]:
n_hidden = 256

In [68]:
dense_in = Dense(n_hidden, activation = 'relu')
dense_hidden = Dense(n_hidden, activation = 'relu', kernel_initializer = 'identity')
dense_out = Dense(vocab_size, activation = 'softmax')

The first character of each sequence goes through dense_in(), to create out first hidden activations.

In [69]:
hidden = dense_in(c_ins[0][1])

Then for each successive layer we combine the output of dense_in() on the next character with the output of dense_hidden() on the current hidden state, to create new hidden state.

In [70]:
for i in range(1, cs):
    c_dense = dense_in(c_ins[i][1])
    hidden = dense_hidden(hidden)
    hidden = merge([c_dense, hidden])

  after removing the cwd from sys.path.


Putting the final hidden state through desnse_out() gives us our output.

In [72]:
c_out = dense_out(hidden)

So now we can create out model.

In [73]:
model = Model([c[0] for c in c_ins], c_out)
model.compile(loss = 'sparse_categorical_crossentropy', optimizer = Adam())

In [74]:
model.fit(xs, y, batch_size = 64, epochs = 12)

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x11e987b8>

### Test model

In [75]:
def get_next(inp):
    idxs = [np.array(char_indices[c])[np.newaxis] for c in inp]
    p = model.predict(idxs)
    return chars[np.argmax(p)]

In [76]:
get_next('for thos')

'e'

In [78]:
get_next('part of ')

't'

In [79]:
get_next('queens a')

'n'

## Our first RNN with keras!

In [81]:
n_hidden, n_fac, cs, vocab_size = (256, 42, 8, 86)

This is nearly exactly equivalent to the RNN we built ourselves in the previous section.

In [83]:
model = Sequential([
    Embedding(vocab_size, n_fac, input_length = cs),
    SimpleRNN(n_hidden, activation = 'relu', inner_init = 'identity'),
    Dense(vocab_size, activation = 'softmax')
])

  This is separate from the ipykernel package so we can avoid doing imports until


In [84]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, 8, 42)             3612      
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, 256)               76544     
_________________________________________________________________
dense_20 (Dense)             (None, 86)                22102     
Total params: 102,258
Trainable params: 102,258
Non-trainable params: 0
_________________________________________________________________


In [85]:
model.compile(loss = 'sparse_categorical_crossentropy', optimizer = Adam())

In [86]:
model.fit(np.concatenate(xs, axis = 1), y, batch_size = 64, epochs = 8)

Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8


<keras.callbacks.History at 0x11decf98>

In [87]:
def get_next_keras(inp):
    idxs = [char_indices[c] for c in inp]
    arrs = np.array(idxs)[np.newaxis,:]
    p = model.predict(arrs)[0]
    return chars[np.argmax(p)]

In [88]:
get_next_keras('this is ')

't'

In [89]:
get_next_keras('part of ')

't'

In [90]:
get_next_keras('queens a')

'n'

## Returning sequeces

### Create inputs

To use a sequence model, we can leave out input unchanged - but we have to change out output to a sequence (of course!)

Here, c_out_dat is identical to c_in_dat, but moved across 1 character,

In [91]:
c_out_dat = [[idx[i+n] for i in range(1, len(idx)-cs, cs)] for n in range(cs)]

In [94]:
ys = [np.stack(c[:-2]) for c in c_out_dat]

Reading down each column shows one set of inputs and outputs.

In [95]:
[xs[n][:cs] for n in range(cs)]

[array([[40],
        [ 1],
        [33],
        [ 2],
        [72],
        [67],
        [73],
        [ 2]]), array([[42],
        [ 1],
        [38],
        [44],
        [ 2],
        [ 9],
        [61],
        [73]]), array([[29],
        [43],
        [31],
        [71],
        [54],
        [ 9],
        [58],
        [61]]), array([[30],
        [45],
        [ 2],
        [74],
        [ 2],
        [76],
        [67],
        [58]]), array([[25],
        [40],
        [73],
        [73],
        [76],
        [61],
        [24],
        [71]]), array([[27],
        [40],
        [61],
        [61],
        [68],
        [54],
        [ 2],
        [58]]), array([[29],
        [39],
        [54],
        [ 2],
        [66],
        [73],
        [33],
        [ 2]]), array([[ 1],
        [43],
        [73],
        [62],
        [54],
        [ 2],
        [72],
        [67]])]

In [96]:
[ys[n][:cs] for n in range(cs)]

[array([42,  1, 38, 44,  2,  9, 61, 73]),
 array([29, 43, 31, 71, 54,  9, 58, 61]),
 array([30, 45,  2, 74,  2, 76, 67, 58]),
 array([25, 40, 73, 73, 76, 61, 24, 71]),
 array([27, 40, 61, 61, 68, 54,  2, 58]),
 array([29, 39, 54,  2, 66, 73, 33,  2]),
 array([ 1, 43, 73, 62, 54,  2, 72, 67]),
 array([ 1, 33,  2, 72, 67, 73,  2, 68])]

### Create and train model

In [97]:
dense_in = Dense(n_hidden, activation = 'relu')
dense_hidden = Dense(n_hidden, activation = 'relu', init = 'identity')
dense_out = Dense(vocab_size, activation = 'softmax', name = 'output')

  


In [99]:
inp1 = Input(shape = (n_fac,), name = 'zero')
hidden = dense_in(inp1)

In [100]:
outs = []

for i in range(cs):
    c_dense = dense_in(c_ins[i][1])
    hidden = dense_hidden(hidden)
    hidden = merge([c_dense, hidden], mode = 'sum')
    
    # every Layer new has an output
    outs.append(dense_out(hidden))

  


In [101]:
model = Model([inp1] + [c[0] for c in c_ins], outs)
model.compile(loss = 'sparse_categorical_crossentropy', optimizer = Adam())

In [102]:
zeros = np.tile(np.zeros(n_fac), (len(xs[0]), 1))
zeros.shape

(75109, 42)

In [103]:
model.fit([zeros] + xs, ys, batch_size = 64, epochs = 12)

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x10466780>

### Test model

In [107]:
def get_nexts(inp):
    idxs = [char_indices[c] for c in inp]
    arrs = [np.array(i)[np.newaxis] for i in idxs]
    p = model.predict([np.zeros(n_fac)[np.newaxis,:]] + arrs)
    print(list(inp))
    return [chars[np.argmax(o)] for o in p]

In [108]:
get_nexts(' this is')

[' ', 't', 'h', 'i', 's', ' ', 'i', 's']


['t', 'h', 'e', 't', ' ', 'e', 'n', ' ']

In [109]:
get_nexts(' part of')

[' ', 'p', 'a', 'r', 't', ' ', 'o', 'f']


['t', 'o', 'r', 't', ' ', 'o', 'f', ' ']

### Sequence model with keras

In [110]:
n_hidden, n_fac, cs, vocab_size

(256, 42, 8, 86)

To convert out previous keras model into a sequence model, simply add the 'return_sequences = True' parameter, and add TimeDistributed() around out dense layer.

In [114]:
model = Sequential([
    Embedding(vocab_size, n_fac, input_length = cs),
    SimpleRNN(n_hidden, return_sequences = True, activation = 'relu', inner_init = 'identity'),
    TimeDistributed(Dense(vocab_size, activation = 'softmax'))
])

  This is separate from the ipykernel package so we can avoid doing imports until


In [115]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_8 (Embedding)      (None, 8, 42)             3612      
_________________________________________________________________
simple_rnn_4 (SimpleRNN)     (None, 8, 256)            76544     
_________________________________________________________________
time_distributed_1 (TimeDist (None, 8, 86)             22102     
Total params: 102,258
Trainable params: 102,258
Non-trainable params: 0
_________________________________________________________________


In [116]:
model.compile(loss = 'sparse_categorical_crossentropy', optimizer = Adam())

In [117]:
xs[0].shape

(75109, 1)

In [118]:
x_rnn = np.stack(np.squeeze(xs), axis = 1)
y_rnn = np.atleast_3d(np.stack(ys, axis = 1))

In [120]:
x_rnn.shape, y_rnn.shape

((75109, 8), (75109, 8, 1))

In [121]:
model.fit(x_rnn, y_rnn, batch_size = 64, epochs = 8)

Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8


<keras.callbacks.History at 0x1c482f98>

In [124]:
def get_nexts_keras(inp):
    idxs = [char_indices[c] for c in inp]
    arrs = np.array(idxs)[np.newaxis, :]
    p = model.predict(arrs)[0]
    print(list(p))
    return [chars[np.argmax(o)] for o in p]

In [125]:
get_nexts_keras(' this is')

[array([  1.3300e-08,   9.7863e-04,   9.3279e-03,   1.4379e-05,   9.3872e-03,   2.2838e-04,
         4.0600e-03,   6.6583e-06,   8.0739e-05,   8.7644e-06,   1.8521e-05,   2.7755e-08,
         2.4759e-04,   6.6310e-05,   5.1688e-05,   3.7037e-05,   3.7574e-05,   4.9740e-05,
         2.9261e-05,   8.3142e-05,   1.4095e-05,   1.3405e-05,   1.1970e-05,   6.1021e-05,
         2.2689e-05,   4.1309e-03,   2.2781e-03,   1.3450e-03,   1.5092e-03,   3.1624e-03,
         3.7977e-03,   3.8464e-03,   1.7714e-03,   6.5103e-03,   2.2833e-04,   2.7988e-04,
         1.5426e-03,   1.8487e-03,   1.9987e-03,   2.5050e-03,   2.5550e-03,   1.9709e-04,
         1.4508e-03,   3.0313e-03,   5.9994e-03,   7.3280e-04,   4.2734e-04,   5.3910e-03,
         2.3499e-06,   3.4971e-04,   3.5913e-05,   3.7655e-04,   2.0996e-07,   7.4795e-04,
         1.4141e-01,   2.9348e-02,   2.6940e-02,   3.0382e-02,   2.2971e-02,   3.3395e-02,
         1.2665e-02,   4.7787e-02,   6.8860e-02,   2.6954e-03,   3.7443e-03,   1.4519e-02

['a', 'h', 'e', 's', ' ', 'c', 's', ' ']

### one-hot sequence model with keras

This is the keras version of th theano model taht we're about to create.

In [127]:
model = Sequential([
    SimpleRNN(n_hidden, return_sequences = True, input_shape = (cs, vocab_size), activation = 'relu', inner_init = 'identity'),
    TimeDistributed(Dense(vocab_size, activation = 'softmax'))
])
model.compile(loss = 'categorical_crossentropy', optimizer = Adam())

  


In [130]:
oh_ys = [to_categorical(o, vocab_size) for o in ys]
oh_y_rnn = np.stack(oh_ys, axis = 1)

oh_xs = [to_categorical(o, vocab_size) for o in xs]
oh_x_rnn = np.stack(oh_xs, axis = 1)

oh_x_rnn.shape, oh_y_rnn.shape

((75109, 8, 86), (75109, 8, 86))

In [None]:
model.fit(oh_x_rnn, oh_y_rnn, batch_size = 64, epochs = 8)

Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8

In [None]:
def get_nexts_oh(inp):
    idxs = np.array([char_indices[c] for c in inp])
    arr = to_categorical(idxs, vocab_size)
    
    p = model.predict(arr[np.newaxis, :])[0]
    print(list(inp))
    return [chars[np.argmax(o)] for o in p]

In [None]:
get_nexts_oh(' this is')