# Lesson 6 - RNN - redux 1
By me.

In [1]:
from theano.sandbox import cuda
cuda.use('gpu1')

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)


In [2]:
%matplotlib inline
import utils; reload(utils)
from utils import *
from __future__ import division, print_function

Using Theano backend.


## Setup
We'll work on a Nietzsche text corpus.

In [3]:
path = get_file('nietzsche.txt', origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
text = open(path).read()
print('corpus length:', len(text))

corpus length: 600901


In [4]:
chars = sorted(list(set(text)))
vocab_size = len(chars)+1
print('total chars:', vocab_size)

total chars: 86


In [5]:
# we're adding a zero for padding (sometimes it's useful to have a meaningless token)
chars.insert(0, "\0")

In [7]:
print(' '.join(chars))

  
   ! " ' ( ) , - . 0 1 2 3 4 5 6 7 8 9 : ; = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] _ a b c d e f g h i j k l m n o p q r s t u v w x y z � � � � � �


In [8]:
# we want to work with numbers so we need to turn these chars (our vocabulary) into indices
char_indices = {c:i for i, c in enumerate(chars)}
indices_char = {i:c for i,c in enumerate(chars)}

In [9]:
# and now we change the entire corpus into numbers
idx = [char_indices[c] for c in text]

In [10]:
idx[:10]

[40, 42, 29, 30, 25, 27, 29, 1, 1, 1]

In [12]:
''.join([indices_char[i] for i in idx[:70]])

'PREFACE\n\n\nSUPPOSING that Truth is a woman--what then? Is there not gro'
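
The encode/decode round trip above can be checked on a tiny example. This is a minimal standalone sketch: `toy_text` and the other `toy_*` names are made up for illustration and are not part of the notebook.

```python
# Toy corpus standing in for the Nietzsche text (hypothetical example string)
toy_text = "hello world"

# Same recipe as above: sorted unique chars, with "\0" reserved at index 0 for padding
toy_chars = sorted(set(toy_text))
toy_chars.insert(0, "\0")

toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Encode the text as integers, then decode it back
toy_idx = [toy_char_indices[c] for c in toy_text]
decoded = ''.join(toy_indices_char[i] for i in toy_idx)

print(toy_idx)
print(decoded)
```

Because the padding token sits at index 0 and never occurs in the text, no real character is ever encoded as 0.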

## Models
### 3 Char Model
Start with the simplest.

#### Create input
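For this model, the inputs are three lists built by taking every third character of the corpus (step `cs = 3`), starting at offsets 0, 1 and 2. A fourth list, starting at offset 3, holds the character we want to predict.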

In [14]:
# what is cs? It's the number of previous chars (here 3) from which we'll be trying to predict the next (4th) one
cs = 3

# ok, so we step through idx with a stride of cs=3. Each window covers 4 chars: c1_dat grabs the 1st char of every
# window, c2_dat the 2nd char, and so on.
c1_dat = [idx[i] for i in xrange(0, len(idx)-1-cs, cs)]
c2_dat = [idx[i+1] for i in xrange(0, len(idx)-1-cs, cs)]
c3_dat = [idx[i+2] for i in xrange(0, len(idx)-1-cs, cs)]
c4_dat = [idx[i+3] for i in xrange(0, len(idx)-1-cs, cs)]

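# so c1_dat holds the 0th, 3rd, 6th, ... chars of idx (that's how a step of cs = 3 works)
# c2_dat holds the 1st, 4th, 7th, etc.
# and c4_dat (3rd, 6th, 9th, ...) is our y, what we're trying to predict;
# note that the target of one window is also the first input of the next window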

In [16]:
c1_dat[:10]

[40, 30, 29, 1, 40, 43, 31, 61, 2, 74]
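
To see how the step of `cs = 3` lays out the windows, here is a small sketch that runs the same list comprehensions on a toy sequence. `toy_idx` (the numbers 0 to 9) is a made-up stand-in for `idx`, and `range` replaces `xrange` so the snippet runs on Python 2 and 3.

```python
toy_idx = list(range(10))   # 0..9 standing in for character indices
cs = 3

# Same start positions as in the notebook: range(0, len - 1 - cs, cs)
starts = range(0, len(toy_idx) - 1 - cs, cs)

c1 = [toy_idx[i] for i in starts]
c2 = [toy_idx[i + 1] for i in starts]
c3 = [toy_idx[i + 2] for i in starts]
c4 = [toy_idx[i + 3] for i in starts]   # the target for each window

windows = list(zip(c1, c2, c3, c4))
print(windows)   # [(0, 1, 2, 3), (3, 4, 5, 6)]
```

Each window starts three positions after the previous one, so the target of one window is the first input of the next.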

In [30]:
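# turn them into inputs (np.ndarrays, using np.stack). The four lists already have equal length, so the [:-2]
# isn't needed for alignment; it just drops the last two samples (presumably a safety margin)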
x1 = np.stack(c1_dat[:-2])
x2 = np.stack(c2_dat[:-2])
x3 = np.stack(c3_dat[:-2])

In [31]:
x1

array([40, 30, 29, ..., 62, 72, 59])

In [32]:
# and outputs (y)
y = np.stack(c4_dat[:-2])

In [33]:
x1.shape, y.shape

((200297,), (200297,))

Let's define the number of latent factors:

In [34]:
n_fac = 42

Create an input and an embedding output for each of our 3 characters (we define a helper function for this):

In [35]:
from keras.layers import Input, Embedding, Flatten  # Flatten is used below, so import it explicitly

def embedding_input(name, n_in, n_out):
    inp = Input(shape=(1,), dtype='int64', name=name)
    emb = Embedding(n_in, n_out, input_length=1)(inp)
    return inp, Flatten()(emb)

In [36]:
# n_in is our vocab size, n_out is the number of latent factors we've defined
c1_in, c1 = embedding_input("c1", vocab_size, n_fac)
c2_in, c2 = embedding_input("c2", vocab_size, n_fac)
c3_in, c3 = embedding_input("c3", vocab_size, n_fac)
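
Conceptually, an embedding layer is a lookup table: each character index selects one row of a `vocab_size x n_fac` weight matrix. The NumPy sketch below illustrates that idea only; `emb_matrix` is a randomly initialised stand-in, not the weights Keras would learn, and the initialisation range is an arbitrary assumption.

```python
import numpy as np

vocab_size, n_fac = 86, 42            # values used in this notebook
rng = np.random.RandomState(0)

# Random stand-in for the weight matrix an Embedding layer would learn
emb_matrix = rng.uniform(-0.05, 0.05, size=(vocab_size, n_fac))

char_ids = np.array([40, 30, 29])     # first three entries of c1_dat
vectors = emb_matrix[char_ids]        # row lookup: one n_fac-long vector per character

# The lookup gives the same result as multiplying one-hot rows by the matrix
one_hot = np.eye(vocab_size)[char_ids]
same = np.allclose(one_hot.dot(emb_matrix), vectors)

print(vectors.shape, same)
```

This is why the inputs can stay as plain integers: the lookup replaces the much larger one-hot matrix product.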

#### Create and train model
We've got the first 2 layers already done.

In [37]:
# pick the number of activations in our hidden fully connected layer:
n_hidden = 256

The green arrow from our diagram (from every input to hidden layer):

In [38]:
from keras.layers import Dense

dense_in = Dense(n_hidden, activation="relu")

For our first input (the first character of each 4-char sequence) we just apply this green arrow to get our first hidden activations.

In [39]:
c1_hidden = dense_in(c1)  # this is the functional notation, passing something to the layer

These are the orange arrows, passing info from one hidden layer to the next.

In [40]:
dense_hidden = Dense(n_hidden, activation="tanh")  # no explanation why we used tanh here

Remember from the diagram that the 2nd and 3rd characters come in after the previous ones have already been turned
via the green arrow into a hidden dense matrix.

In [41]:
c2_dense = dense_in(c2)  # (green) this is just the green arrow for c2 input
hidden_2 = dense_hidden(c1_hidden)  # (orange) the hidden-to-hidden transform applied to c1_hidden
c2_hidden = merge([c2_dense, hidden_2])  # this is the full c2_hidden layer, a SUM of c2_dense and the hidden from c1.

In [43]:
c2_hidden.shape

Shape.0

In [45]:
# repeat for the c3
c3_dense = dense_in(c3) # green arrow for c3
hidden_3 = dense_hidden(c2_hidden) # orange arrow between 2 hidden dense layers
c3_hidden = merge([c3_dense, hidden_3]) # this is a merge (default=sum) of the input from c3 and .. 
# ... the previous hidden dense.

Now for the blue arrow, going from last hidden to output.

In [46]:
dense_out = Dense(vocab_size, activation="softmax")
# we want it to output a char, hence vocab_size

In [47]:
# the last hidden state is the input to this last layer
c4_out = dense_out(c3_hidden)

The model is defined by the list of 3 inputs and by c4_out, which holds the whole chain of operations (we've chained them functionally).

In [49]:
model = Model([c1_in, c2_in, c3_in], c4_out)
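
To make the wiring concrete, here is a NumPy sketch of the forward pass this graph computes. It is an illustration under assumptions, not the Keras implementation: the weights are random stand-ins, biases are omitted, and all names (`E1`, `W_in`, `W_hid`, `W_out`, `forward`) are made up for this example.

```python
import numpy as np

vocab_size, n_fac, n_hidden = 86, 42, 256
rng = np.random.RandomState(0)

# Random stand-ins for the learned weights (biases omitted for brevity)
E1, E2, E3 = [rng.normal(scale=0.1, size=(vocab_size, n_fac)) for _ in range(3)]  # one embedding per input
W_in = rng.normal(scale=0.1, size=(n_fac, n_hidden))        # green arrow, shared by all 3 inputs
W_hid = rng.normal(scale=0.1, size=(n_hidden, n_hidden))    # orange arrow, shared
W_out = rng.normal(scale=0.1, size=(n_hidden, vocab_size))  # blue arrow

def relu(x):
    return np.maximum(x, 0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(c1_ids, c2_ids, c3_ids):
    h1 = relu(E1[c1_ids].dot(W_in))                           # c1_hidden
    h2 = relu(E2[c2_ids].dot(W_in)) + np.tanh(h1.dot(W_hid))  # c2_hidden (merge = sum)
    h3 = relu(E3[c3_ids].dot(W_in)) + np.tanh(h2.dot(W_hid))  # c3_hidden (merge = sum)
    return softmax(h3.dot(W_out))                             # c4_out

# A batch of two 3-char windows, using indices from idx[:6]
probs = forward(np.array([40, 30]), np.array([42, 25]), np.array([29, 27]))
print(probs.shape)
```

The same `W_in` and `W_hid` are reused at every step, which is what makes this a hand-unrolled recurrent network.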

In [50]:
model.compile(loss="sparse_categorical_crossentropy", optimizer=Adam())
# we use sparse categorical crossentropy because we didn't one-hot encode our output.
# it takes integer targets directly, and the loss is the same as if we had one-hot encoded them
# REALLY USEFUL - this way we don't need to create thousand-columned target arrays:
# we can skip one-hot encoding the targets in Keras!
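
The equivalence between the sparse and the one-hot form of the loss can be checked by hand. The sketch below uses plain Python and made-up probabilities; `sparse_cce`, `cce` and `one_hot` are illustrative helpers, not Keras functions.

```python
import math

# Made-up predicted distributions over 3 classes, and integer targets
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.1, 0.8]]
targets = [0, 2]

def sparse_cce(probs, targets):
    # Pick out the probability of the correct class directly by index
    return sum(-math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)

def one_hot(t, n):
    return [1.0 if i == t else 0.0 for i in range(n)]

def cce(probs, one_hot_targets):
    # Full cross-entropy sum; every term except the correct class is multiplied by 0
    losses = [-sum(y * math.log(p) for y, p in zip(ys, ps))
              for ys, ps in zip(one_hot_targets, probs)]
    return sum(losses) / len(losses)

sparse_loss = sparse_cce(probs, targets)
dense_loss = cce(probs, [one_hot(t, 3) for t in targets])
print(sparse_loss, dense_loss)
```

Both forms give the same number. The sparse version just avoids building the one-hot target array.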

In [51]:
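# update the existing learning-rate variable in place; plain assignment (model.optimizer.lr = ...) replaces
# the variable with a float and is ignored once the training function has been compiled
model.optimizer.lr.set_value(0.000001)

In [54]:
model.fit([x1, x2, x3], y, batch_size=64, nb_epoch=4)

Epoch 1/4
Epoch 2/4

KeyboardInterrupt: 

In [55]:
model.optimizer.lr.set_value(0.01)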

In [None]:
model.fit([x1, x2, x3], y, batch_size=64, nb_epoch=4)  # fit needs the targets y as well as the inputs