# The Neural Network That Reads and Writes Obama's Speeches

This tutorial shows how to train recurrent neural networks on sequence data with MXNet.
We will train a model that reads President Obama's speeches and tries to imitate his style by emitting English characters one at a time.

### Download the data

In [13]:
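import os
import urllib
# Download and unpack the data only if the archive is not already present.
# urllib.urlretrieve is the Python 2 API; on Python 3 it is urllib.request.urlretrieve.
if not os.path.exists("lab_data.zip"):
    urllib.urlretrieve("http://webdocs.cs.ualberta.ca/~bx3/lab_data.zip", "lab_data.zip")
os.system("unzip -o lab_data.zip")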

0

In [1]:
import mxnet as mx
import numpy as np
import random
import bisect
import urllib

In [2]:
# set up logging
import logging
reload(logging)
logging.basicConfig(format='%(asctime)s %(levelname)s:%(message)s', level=logging.DEBUG, datefmt='%I:%M:%S')

# A Glance at the LSTM structure and embedding layer

We will build an LSTM network that learns from characters alone. At each time step, the input is a single character. We will see that this LSTM is able to learn words and grammar from a sequence of characters.

The following figure shows an unrolled LSTM network and how we generate the embedding of a character. The one-hot-to-embedding operation is a special case of a fully connected layer.
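To see why the embedding is a special case of a fully connected layer, here is a minimal NumPy sketch. It assumes a bias-free layer and uses toy sizes (`vocab_size = 5`, `num_embed = 3`) chosen only for illustration; none of these names come from the MXNet code in this tutorial.

```python
import numpy as np

vocab_size = 5   # illustrative only; the tutorial builds its vocabulary from the text
num_embed = 3    # illustrative only; the tutorial uses 256

rng = np.random.RandomState(0)
# Weight matrix of a bias-free fully connected layer: one row per char id
W = rng.randn(vocab_size, num_embed)

char_id = 2
one_hot = np.zeros(vocab_size)
one_hot[char_id] = 1.0

# Fully connected layer applied to the one-hot vector
fc_output = one_hot.dot(W)
# Embedding lookup: pick the row that belongs to this char id
embedding = W[char_id]

# Both routes give the same vector
same = np.allclose(fc_output, embedding)
print(same)
```

Multiplying by a one-hot vector selects a single row of the weight matrix, so an embedding layer can skip the multiplication and index the row directly.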


In [3]:
#image

In [10]:
# Read the whole file into a single string
def read_content(path):
    with open(path) as ins:
        content = ins.read()
        return content

# Build a vocabulary that maps each distinct char in the content to an integer id
def build_vocab(path):
    content = read_content(path)
    print("k")

    content = list(content)
    print(content[0:20])
    idx = 1 # 0 is left for zero-padding
    the_vocab = {}
    for word in content:
        if len(word) == 0:
            continue
        if word not in the_vocab:
            the_vocab[word] = idx
            idx += 1
    return the_vocab

# Map each char of a sentence to its numerical id in the vocabulary
def text2id(sentence, the_vocab):
    words = list(sentence)
    words = [the_vocab[w] for w in words if len(w) > 0]
    return words

# Evaluation metric: perplexity, the exponential of the average negative log-probability of the true label
def Perplexity(label, pred):
    loss = 0.
    for i in range(pred.shape[0]):
        loss += -np.log(max(1e-10, pred[i][int(label[i])]))
    return np.exp(loss / label.size)
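As a quick sanity check on the helpers above, the following sketch runs them on a toy string. `build_vocab_from_text` is a hypothetical variant of `build_vocab` that takes a string instead of a file path, and `perplexity` repeats the computation of `Perplexity`. A uniform guess over N classes should give a perplexity of N, and a perfect prediction should give 1.

```python
import numpy as np

def build_vocab_from_text(content):
    # Hypothetical helper: same logic as build_vocab, but takes a string, not a path
    the_vocab = {}
    idx = 1  # 0 is left for zero-padding
    for ch in content:
        if ch not in the_vocab:
            the_vocab[ch] = idx
            idx += 1
    return the_vocab

def text2id(sentence, the_vocab):
    return [the_vocab[ch] for ch in sentence]

def perplexity(label, pred):
    # Same computation as Perplexity in the cell above
    loss = 0.
    for i in range(pred.shape[0]):
        loss += -np.log(max(1e-10, pred[i][int(label[i])]))
    return np.exp(loss / label.size)

toy_vocab = build_vocab_from_text("hello")
ids = text2id("hello", toy_vocab)  # 'h'->1, 'e'->2, 'l'->3, 'o'->4

# 4 distinct chars plus the padding id 0
num_classes = len(toy_vocab) + 1
label = np.array(ids, dtype=float)

# A model that spreads probability evenly over all classes
uniform_pred = np.full((len(ids), num_classes), 1.0 / num_classes)
uniform_ppl = perplexity(label, uniform_pred)

# A model that puts all probability on the correct char
perfect_pred = np.zeros((len(ids), num_classes))
perfect_pred[np.arange(len(ids)), ids] = 1.0
perfect_ppl = perplexity(label, perfect_pred)

print(ids, uniform_ppl, perfect_ppl)
```

Lower perplexity means the model assigns more probability to the characters that actually come next.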

## Setting up the LSTM

In [17]:
from bucket_io import BucketSentenceIter
# The batch size for training
batch_size = 32
# Bucketing supports inputs of various lengths.
# For this problem, we cut each input sequence to a length of 129,
# so we only need a single fixed-length bucket.
buckets = [129]
# number of hidden units in each LSTM cell
num_hidden = 512
# embedding dimension: each char is mapped to a 256-dimensional vector
num_embed = 256
# number of LSTM layers
num_lstm_layer = 3
# we will show a quick demo with 1 epoch
# and we will see the result of training for 75 epochs
num_epoch = 1
# learning rate 
learning_rate = 0.01
# we will use plain SGD without momentum
momentum = 0.0
# we can select multiple GPUs for training;
# for this demo we only use one
devs = [mx.context.gpu(i) for i in range(1)]
# initialize states for LSTM
init_c = [('l%d_init_c'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
init_h = [('l%d_init_h'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
init_states = init_c + init_h
# build an iterator over the text
# (`vocab` comes from build_vocab in the "Build vocab" cell below; run that cell first)
data_train = BucketSentenceIter("./obama.txt", vocab, buckets, batch_size,
                                init_states, seperate_char='\n',
                                text2id=text2id, read_content=read_content)


bucket of len 129 : 8290 samples
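To make the fixed-length bucket concrete, here is a rough sketch of how one sequence of char ids could be turned into an input/label pair. `make_example` is a hypothetical helper, not part of `bucket_io`, and the actual behavior of `BucketSentenceIter` may differ. It assumes the sequence fits in the bucket, that shorter sequences are right-padded with id 0 (the id the vocabulary reserves for padding), and that the label at each position is the next input char.

```python
import numpy as np

def make_example(ids, bucket_len):
    # Hypothetical helper: right-pad the ids with 0 up to bucket_len
    data = np.zeros(bucket_len, dtype=int)
    data[:len(ids)] = ids
    # Next-char target: the label at position t is the input at position t + 1
    label = np.zeros(bucket_len, dtype=int)
    label[:-1] = data[1:]
    return data, label

# Toy bucket length of 8 for readability; the tutorial uses 129
data, label = make_example([1, 2, 3, 3, 4], bucket_len=8)
print(data.tolist())
print(label.tolist())
```

With a single bucket, every batch has the same shape, so only one unrolled network is needed.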


## Build vocab

In [18]:
# build char vocabulary from input
vocab = build_vocab("./obama.txt")

k
['C', 'a', 'l', 'l', ' ', 't', 'o', ' ', 'R', 'e', 'n', 'e', 'w', 'a', 'l', ' ', 'K', 'e', 'y', 'n']


In [19]:
# generate the unrolled network symbol for a given sequence length
def sym_gen(seq_len):
    return lstm_unroll(num_lstm_layer, seq_len, len(vocab) + 1,
                       num_hidden=num_hidden, num_embed=num_embed,
                       num_label=len(vocab) + 1, dropout=0.2)

In [21]:
# the network symbol
from lstm import lstm_unroll
symbol = sym_gen(buckets[0])

In [22]:
symbol

<mxnet.symbol.Symbol at 0x1140eb190>

# Train model

In [23]:
# Training an LSTM network is as simple as training a feedforward network
model = mx.model.FeedForward(ctx=devs,
                             symbol=symbol,
                             num_epoch=num_epoch,
                             learning_rate=learning_rate,
                             momentum=momentum,
                             wd=0.0001,
                             initializer=mx.init.Xavier(factor_type="in", magnitude=2.34))

In [24]:
# Fit it
model.fit(X=data_train,
          eval_metric = mx.metric.np(Perplexity),
          batch_end_callback=mx.callback.Speedometer(batch_size, 5),
          epoch_end_callback=mx.callback.do_checkpoint("obama"))

04:27:46 INFO:Start training with [gpu(0)]


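MXNetError: [16:27:46] src/storage/storage.cc:43: Please compile with CUDA enabled

This error means the installed MXNet was built without CUDA support. Either rebuild MXNet with CUDA enabled, or set `devs = [mx.context.cpu()]` in the setup cell to train on the CPU.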