This notebook is heavily inspired by:
- [LSTM Text Generation](https://github.com/Lasagne/Recipes/blob/master/examples/lstm_text_generation.py)
- [Lasagne doc about recurrent](http://lasagne.readthedocs.io/en/latest/modules/layers/recurrent.html)

In [1]:
import numpy as np
import theano
import theano.tensor as T
import lasagne
seed = 1
lasagne.random.set_rng(np.random.RandomState(seed))

Using cuDNN version 6021 on context None
Mapped name None to device cuda0: GeForce GTX TITAN Black (0000:03:00.0)


# Hyperparameters

All of the following values, except `PRINT_FREQ`, are hyperparameters that affect the learning algorithm.

In [2]:
# Sequence Length
SEQ_LENGTH = 20

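# Number of stacked LSTM layers and number of units in each of them
DEPTH = 2
N_HIDDEN = 512
# Nonlinearity applied to the cell state to compute the output of each LSTM layer
NON_LINEARITY = lasagne.nonlinearities.rectify

# Gradients flowing between time steps in the LSTM layers are clipped at this value
GRAD_CLIP = 100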

# Number of epochs to train the net
NUM_EPOCHS = 50

# Optimization learning rate
LEARNING_RATE = 0.01

# Batch Size
BATCH_SIZE = 128

# Number of minibatches between two progress reports (sample text and average loss)
PRINT_FREQ = 500
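`NUM_EPOCHS`, `BATCH_SIZE` and `PRINT_FREQ` together determine how long training lasts. The sketch below makes that arithmetic explicit under one simplifying assumption: an epoch corresponds to `data_size / BATCH_SIZE` minibatches. This is the same approximation the training loop uses for the epoch counter it prints. The helper names and the 1,000,000-character text size are hypothetical.

```python
def training_iterations(data_size, num_epochs, batch_size, print_freq):
    """Number of outer iterations (blocks of print_freq minibatches) needed to
    reach num_epochs, assuming one epoch = data_size / batch_size minibatches."""
    minibatches_per_epoch = data_size / batch_size
    return int(num_epochs * minibatches_per_epoch / print_freq)

def epochs_completed(it, print_freq, batch_size, data_size):
    """Epochs seen after `it` outer iterations, under the same assumption."""
    return it * print_freq * batch_size / data_size

# Hypothetical text of 1,000,000 characters with the values defined above
print(training_iterations(1_000_000, num_epochs=50, batch_size=128, print_freq=500))  # 781
print(epochs_completed(781, print_freq=500, batch_size=128, data_size=1_000_000))
```

Each outer iteration covers `PRINT_FREQ * BATCH_SIZE` examples. Raising `PRINT_FREQ` therefore means fewer, longer blocks between progress reports for the same number of epochs.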

Define the optimizer used for training. An optimizer can be seen as a function that takes the gradient of the loss, obtained by backpropagation, and returns the updates to apply to the current parameters. Other optimizers are listed in the [optimizer reference](http://lasagne.readthedocs.io/en/latest/modules/updates.html?highlight=update).

In [3]:
# Use the LEARNING_RATE hyperparameter instead of a hard-coded value
my_optimizer = lambda loss, params: lasagne.updates.nesterov_momentum(
        loss, params, learning_rate=LEARNING_RATE, momentum=0.9)
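For a single scalar parameter, the update rule applied by `nesterov_momentum` can be written out by hand. This sketch follows the formulas in the Lasagne documentation for `nesterov_momentum`:

- velocity := momentum * velocity - learning_rate * gradient
- param := param + momentum * velocity - learning_rate * gradient

It applies them to the toy function f(w) = (w - 3)^2. The function name `nesterov_step` and the toy objective are illustrative only.

```python
def nesterov_step(param, velocity, grad, learning_rate=0.01, momentum=0.9):
    """One Nesterov momentum update for a scalar parameter."""
    velocity = momentum * velocity - learning_rate * grad
    param = param + momentum * velocity - learning_rate * grad
    return param, velocity

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_toy, v_toy = 0.0, 0.0
for _ in range(500):
    grad = 2.0 * (w_toy - 3.0)  # the gradient, as backpropagation would provide it
    w_toy, v_toy = nesterov_step(w_toy, v_toy, grad)
print(round(w_toy, 6))  # 3.0
```

The optimizer only sees gradients and returns new parameter values. This is why Lasagne can expose it as a function of `(loss, params)` that builds the update expressions.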

# Dataset

The following cell loads the dataset (the full text of *Beyond Good and Evil* by Friedrich Nietzsche) directly from the Internet.
You can also replace the dataset with your own by adapting the commented line.

In [4]:
import urllib.request  # For downloading the sample text file. You won't need this if you are providing your own file.
try:
    in_text = urllib.request.urlopen('https://s3.amazonaws.com/text-datasets/nietzsche.txt').read()
    # in_text = open('your_file.txt', 'rb').read()  # binary mode, so that .decode() below works
    in_text = in_text.decode("utf-8")
    print(in_text[:250])
except Exception as e:
    print("Please verify the location of the input file/URL.")
    print("A sample txt file can be downloaded from https://s3.amazonaws.com/text-datasets/nietzsche.txt")
    raise IOError('Unable to Read Text') from e

PREFACE


SUPPOSING that Truth is a woman--what then? Is there not ground
for suspecting that all philosophers, in so far as they have been
dogmatists, have failed to understand women--that the terrible
seriousness and clumsy importunity with which t
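To train on your own text, a small helper can hide the difference between a URL and a local file. This is a sketch with a hypothetical `load_text` function. It assumes the content is UTF-8 encoded, and it opens local files in text mode with an explicit encoding, so no separate `.decode()` call is needed.

```python
import os
import tempfile
import urllib.request

def load_text(source):
    """Return the text at `source`, either an http(s) URL or a local file path.

    Both branches assume UTF-8 content, like the cell above.
    """
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            return response.read().decode("utf-8")
    with open(source, "r", encoding="utf-8") as f:
        return f.read()

# Usage with a local file (a temporary one here, standing in for 'your_file.txt')
with tempfile.NamedTemporaryFile("w", suffix=".txt", encoding="utf-8", delete=False) as tmp:
    tmp.write("SUPPOSING that Truth is a woman--what then?")
    tmp_path = tmp.name
sample_text = load_text(tmp_path)
os.remove(tmp_path)
print(sample_text[:9])  # SUPPOSING
```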


Pre-processing consists of retrieving the set of symbols occurring in the text and mapping each of them to a unique index. This index is used to build the one-hot representation of the symbol, which is the input of the model.

In [None]:
chars = sorted(set(in_text))  # sorted so that symbol indices are identical across runs
data_size, vocab_size = len(in_text), len(chars)
char_to_ix = {ch:i for i,ch in enumerate(chars)}
ix_to_char = {i:ch for i,ch in enumerate(chars)}
print('Number of unique symbols: {}'.format(vocab_size))
print('Number of symbols in the dataset: {}'.format(data_size))
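The same mapping can be tried on a toy string. This plain-Python sketch uses hypothetical helpers, `build_vocab` and `one_hot`. It prefixes its variables with `toy_` so that running it in the notebook does not overwrite `char_to_ix` and `ix_to_char`. Sorting the symbols makes the indices deterministic.

```python
def build_vocab(text):
    """Map each distinct symbol of `text` to an index, and back."""
    symbols = sorted(set(text))
    to_ix = {ch: i for i, ch in enumerate(symbols)}
    to_char = {i: ch for i, ch in enumerate(symbols)}
    return to_ix, to_char

def one_hot(ch, to_ix):
    """One-hot vector (list of floats) representing a single symbol."""
    vec = [0.0] * len(to_ix)
    vec[to_ix[ch]] = 1.0
    return vec

toy_char_to_ix, toy_ix_to_char = build_vocab("hello")
print(toy_char_to_ix)                 # {'e': 0, 'h': 1, 'l': 2, 'o': 3}
print(one_hot("l", toy_char_to_ix))   # [0.0, 0.0, 1.0, 0.0]
```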

The following auxiliary function creates a minibatch as a 3D tensor of shape (batch_size, SEQ_LENGTH, vocab_size).
Each datapoint (fixed first coordinate of the tensor) is a matrix of shape (SEQ_LENGTH, vocab_size)
whose rows are the one-hot vectors of the characters at the corresponding positions. Datapoint `n` starts at character `p + n`, so consecutive datapoints are shifted by one character. Its target is the character that immediately follows the sequence. All sequences have the same length (SEQ_LENGTH), and a single sequence can span several sentences.

In [None]:
input_shape = (None, None, vocab_size)

def iterate_minibatch(
    p,
    batch_size=BATCH_SIZE,
    data=in_text,
    return_target=True):
    """
    Return a minibatch compatible with the input of the model and the associated targets

    :type p: int
    :param p: index of the character at which to start reading
    :type batch_size: int
    :param batch_size: number of datapoints in the minibatch
    :type data: str
    :param data: the whole text
    :type return_target: bool
    :param return_target: whether to build the targets (next character) associated with the sequences
    :return: x of shape (batch_size, SEQ_LENGTH, vocab_size) and int32 targets y of shape (batch_size,)
    """
    x = np.zeros((batch_size, SEQ_LENGTH, vocab_size))
    y = np.zeros(batch_size, dtype='int32')

    for n in range(batch_size):
        # Datapoint n starts n characters after p
        for i in range(SEQ_LENGTH):
            x[n, i, char_to_ix[data[p + n + i]]] = 1.
        if return_target:
            # The target is the character right after the sequence
            y[n] = char_to_ix[data[p + n + SEQ_LENGTH]]
    return x, y
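To see which characters end up in a minibatch without inspecting one-hot arrays, the indexing of `iterate_minibatch` can be replayed on plain strings. `toy_windows` is a hypothetical helper that mirrors the loops above: datapoint `n` reads `seq_length` characters starting at `p + n`, and its target is the next character.

```python
def toy_windows(text, p, batch_size, seq_length):
    """(input, target) string pairs that iterate_minibatch would one-hot encode."""
    pairs = []
    for n in range(batch_size):
        start = p + n
        pairs.append((text[start:start + seq_length], text[start + seq_length]))
    return pairs

print(toy_windows("abcdefgh", p=0, batch_size=3, seq_length=4))
# [('abcd', 'e'), ('bcde', 'f'), ('cdef', 'g')]
```

A minibatch starting at `p` therefore reads characters `p` to `p + batch_size + seq_length - 1`, targets included.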

# Model definition

Recurrent layers can be used like feed-forward layers, except that the input shape is expected to be (batch_size, sequence_length, num_inputs). Setting the first two dimensions to None allows them to vary. Since they correspond to the batch size and the sequence length, we can feed batches of varying size containing sequences of varying length.

If `only_return_final` is set, the layer only returns the output of the final time step (e.g. for tasks where a single target value for the entire sequence is desired). In this case, Theano makes an optimization that saves memory.

When working with variable-length sequences, a mask of shape (batch_size, sequence_length) can be given to the recurrent layer through its `mask_input` argument, as a separate input layer. Its entries are 1 within each sequence and 0 after the end of the sequence.
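This notebook does not need a mask, because every sequence has exactly `SEQ_LENGTH` characters. For variable-length data, a mask of the kind described above could be built as follows. `length_mask` is a hypothetical helper. In Lasagne, the resulting array would be fed through its own input layer, passed to the recurrent layer as `mask_input`.

```python
def length_mask(lengths, max_len):
    """(batch_size, max_len) 0/1 mask: 1 for positions inside sequence i, 0 after its end."""
    return [[1 if j < length else 0 for j in range(max_len)] for length in lengths]

print(length_mask([3, 1, 4], max_len=4))
# [[1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 1]]
```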

In [None]:
def create_lstm(
    vocab_size,
    input_shape,
    input_var = None,
    nonlinearity = lasagne.nonlinearities.tanh,
    depth=2, 
    n_hidden=800,
    grad_clip=100):
    """
    A generic function for creating an LSTM neural network.

    :type vocab_size: int
    :param vocab_size: number of elements in the dictionary
    :type input_shape: tuple
    :param input_shape: shape of the input, (batch_size, sequence_length, vocab_size)
    :type input_var: theano.tensor.var.TensorVariable
    :param input_var: a theano symbolic variable, created automatically if None
    :type nonlinearity: callable from lasagne.nonlinearities
    :param nonlinearity: nonlinearity applied to the cell state to compute the output of each LSTM layer
    :type depth: int
    :param depth: number of stacked LSTM layers
    :type n_hidden: int or list
    :param n_hidden: number of hidden units per LSTM layer (if int, the same for all layers)
    :type grad_clip: float
    :param grad_clip: value at which the gradients flowing between time steps are clipped
    :return: the output layer of the network, of shape (batch_size, vocab_size)
   """

    # First, we build the network, starting with an input layer
    # Recurrent layers expect input of shape
    # (batch size, SEQ_LENGTH, num_features)
    network = lasagne.layers.InputLayer(
        shape=input_shape,
        input_var=input_var
    )

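    # We now build the stacked LSTM layers.
    # Gradients are clipped at grad_clip to prevent the problem of exploding gradients.
    if isinstance(n_hidden, int):
        n_hidden = [n_hidden] * depth
    assert len(n_hidden) == depth, "n_hidden must contain one entry per LSTM layer"

    # All layers but the last return the whole sequence of hidden outputs
    for units in n_hidden[:-1]:
        network = lasagne.layers.LSTMLayer(
            network,
            num_units=units,
            grad_clipping=grad_clip,
            nonlinearity=nonlinearity)

    # The last layer only returns the output of the final time step
    network = lasagne.layers.LSTMLayer(
        network,
        num_units=n_hidden[-1],
        grad_clipping=grad_clip,
        nonlinearity=nonlinearity,
        only_return_final=True)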

    # The output of the previous layer, of shape (batch_size, n_hidden),
    # is passed through a dense layer with a softmax nonlinearity to
    # create a probability distribution over the dictionary.
    # The output of this stage has shape (batch_size, vocab_size).
    network = lasagne.layers.DenseLayer(
        network,
        num_units=vocab_size,
        W=lasagne.init.Normal(),
        nonlinearity=lasagne.nonlinearities.softmax)
    return network
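To make the role of the `nonlinearity` argument concrete, here is one time step of a single-unit LSTM in plain Python. It is a sketch of the standard LSTM equations, not Lasagne's implementation.

- Lasagne's `LSTMLayer` also uses peephole connections by default (`peepholes=True`), which this sketch omits.
- As in Lasagne, the candidate cell value uses tanh, and `nonlinearity` is only applied to the new cell state to produce the hidden output.
- The weight values and helper names are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function used by the input, forget and output gates."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, weights, nonlinearity=math.tanh):
    """One time step of a single-unit LSTM without peephole connections.

    `weights` maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre_activation(gate):
        w_x, w_h, b = weights[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre_activation("in"))            # input gate
    f = sigmoid(pre_activation("forget"))        # forget gate
    o = sigmoid(pre_activation("out"))           # output gate
    c_tilde = math.tanh(pre_activation("cell"))  # candidate cell value
    c = f * c_prev + i * c_tilde                 # new cell state
    h = o * nonlinearity(c)                      # new hidden output
    return h, c

def lstm_sequence(xs, weights, nonlinearity=math.tanh, only_return_final=False):
    """Run the LSTM from zero states; return all hidden outputs, or only the last one."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, weights, nonlinearity)
        outputs.append(h)
    return outputs[-1] if only_return_final else outputs

def relu(z):
    """Rectifier, the nonlinearity selected by NON_LINEARITY in this notebook."""
    return max(0.0, z)

# With all weights at zero, every gate equals sigmoid(0) = 0.5 and the candidate is 0
ZERO_WEIGHTS = {g: (0.0, 0.0, 0.0) for g in ("in", "forget", "cell", "out")}
print(lstm_step(1.0, 0.0, 2.0, ZERO_WEIGHTS, nonlinearity=relu))  # (0.5, 1.0)

TOY_WEIGHTS = {"in": (0.5, 0.1, 0.0), "forget": (0.3, -0.2, 1.0),
               "cell": (1.0, 0.4, 0.0), "out": (0.8, 0.0, 0.0)}
print(lstm_sequence([1.0, -0.5, 2.0], TOY_WEIGHTS, only_return_final=True))
```

`only_return_final=True` corresponds to keeping only the last element of `outputs`. This is all the dense softmax layer on top of the network needs.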

# Optimization

In the following, we want to maximize the probability of predicting the correct next character given the SEQ_LENGTH previous ones. To do this, we retrieve the output of our model, which is a softmax over the characters, and compute its categorical cross-entropy with the actual next character. Minimizing this quantity maximizes the log-probability of the correct character. Since we use minibatches of size `BATCH_SIZE`, we take the mean over the examples of the minibatch.

In [None]:
# Theano symbolic variables for the inputs and the targets
input_var = T.tensor3('inputs')
target_values = T.ivector('target_output')

network = create_lstm(
    vocab_size,
    input_shape,
    input_var,
    nonlinearity=NON_LINEARITY,
    depth=DEPTH,
    n_hidden=N_HIDDEN,
    grad_clip=GRAD_CLIP)

# lasagne.layers.get_output produces a variable for the output of the net
network_output = lasagne.layers.get_output(network)

# The loss function is calculated as the mean of the (categorical) cross-entropy between the prediction and target.
loss = lasagne.objectives.categorical_crossentropy(network_output, target_values).mean()

# Retrieve all the trainable parameters of the model and create the optimizer updates
params = lasagne.layers.get_all_params(network,trainable=True)
updates = my_optimizer(loss, params)
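With integer targets, `categorical_crossentropy` followed by `.mean()` amounts to averaging `-log` of the probability that the model assigns to each correct character. Here is a plain-Python sketch with made-up probabilities. The helper name is hypothetical, and the `toy_` names avoid clobbering the notebook's `loss`.

```python
import math

def mean_categorical_crossentropy(predictions, targets):
    """Average over the minibatch of -log(probability given to the target index)."""
    losses = [-math.log(row[t]) for row, t in zip(predictions, targets)]
    return sum(losses) / len(losses)

# Two datapoints over a toy 3-symbol vocabulary
toy_predictions = [[0.7, 0.2, 0.1], [0.25, 0.25, 0.5]]
toy_targets = [0, 2]
toy_loss = mean_categorical_crossentropy(toy_predictions, toy_targets)
print(round(toy_loss, 4))  # 0.5249
```

A model that predicts the uniform distribution over V symbols gets a loss of log(V), which is a useful baseline when reading the training losses.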

In [None]:
# Theano functions for training and computing cost
print("Compiling functions ...")
train = theano.function([input_var, target_values], loss, updates=updates, allow_input_downcast=True)
compute_loss = theano.function([input_var, target_values], loss, allow_input_downcast=True)

# To generate text from the network, we need the probability distribution of the next character
# given the current input sequence (initially, the seed phrase).
# We compile a function called probs that returns this distribution.
probs = theano.function([input_var],network_output,allow_input_downcast=True)

The next function generates text from a seed phrase of at least `SEQ_LENGTH` characters, set through the variable `generation_phrase`.
Only the last `SEQ_LENGTH` characters of the phrase are fed to the model.
The optional argument `N` sets the number of characters to generate.

In [None]:
generation_phrase = "The meaning of life is" #This phrase will be used as seed to generate text.

def predict(N=200):
    """
    Print the seed phrase followed by N characters generated by the current model

    :type N: int
    :param N: the number of characters to generate
    """
    assert len(generation_phrase) >= SEQ_LENGTH
    sample_ix = []
    # One-hot encode the last SEQ_LENGTH characters of the seed phrase
    x, _ = iterate_minibatch(len(generation_phrase) - SEQ_LENGTH, 1, generation_phrase, return_target=False)

    for _ in range(N):
        # Distribution of the next character, converted to float64 and renormalized
        # so that np.random.choice does not reject a float32 sum slightly different from 1
        next_probs = probs(x).ravel().astype('float64')
        next_probs /= next_probs.sum()

        # Pick the character that got assigned the highest probability
        # ix = np.argmax(next_probs)

        # Alternatively, to sample from the distribution instead:
        ix = np.random.choice(np.arange(vocab_size), p=next_probs)
        sample_ix.append(ix)

        # Slide the window: drop the oldest character and append the new one
        x[:, 0:SEQ_LENGTH-1, :] = x[:, 1:, :]
        x[:, SEQ_LENGTH-1, :] = 0
        x[0, SEQ_LENGTH-1, ix] = 1.

    random_snippet = generation_phrase + ''.join(ix_to_char[ix] for ix in sample_ix)
    print("----\n %s \n----" % random_snippet)
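The two decoding options in `predict` are taking the argmax or sampling from the distribution. The window shift can be sketched with plain lists instead of the one-hot NumPy array. The helper names are hypothetical. Renormalizing before sampling mirrors the guard against probabilities that do not sum exactly to 1.

```python
import random

def greedy_index(probabilities):
    """Index of the most probable symbol (the commented-out argmax option)."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

def sample_index(probabilities, rng=random):
    """Draw an index with the given probabilities, renormalizing first so that
    a total slightly different from 1 (e.g. float32 rounding) is harmless."""
    total = sum(probabilities)
    weights = [q / total for q in probabilities]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def slide_window(window, new_item):
    """Drop the oldest element and append the newest, keeping the length fixed."""
    return window[1:] + [new_item]

toy_probs = [0.1, 0.6, 0.3]
print(greedy_index(toy_probs))  # 1
toy_rng = random.Random(0)
toy_draws = [sample_index(toy_probs, toy_rng) for _ in range(10)]
print(toy_draws)
print(slide_window(list("life"), " "))  # ['i', 'f', 'e', ' ']
```

Greedy decoding always returns the same continuation for a given seed and tends to repeat itself. Sampling produces varied text, which is why `predict` samples by default.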

In [None]:
print("Training ...")
print("Seed used for text generation is: " + generation_phrase)
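p = 0
# Each outer iteration runs PRINT_FREQ minibatches, so divide by PRINT_FREQ to stop after NUM_EPOCHS
for it in range(int(data_size * NUM_EPOCHS / (BATCH_SIZE * PRINT_FREQ))):
    predict()  # Generate text from generation_phrase with the current model

    avg_cost = 0
    for i in range(PRINT_FREQ):
        x, y = iterate_minibatch(p)

        # Move to the next chunk of text, and wrap around to the beginning
        # when the next minibatch would run past the end of the text
        p += SEQ_LENGTH + BATCH_SIZE - 1
        if p + BATCH_SIZE + SEQ_LENGTH >= data_size:
            print('Carriage Return')
            p = 0

        avg_cost += train(x, y)
    print("Epoch {} average loss = {}".format(it*1.0*PRINT_FREQ/data_size*BATCH_SIZE, avg_cost / PRINT_FREQ))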


Training ...
Seed used for text generation is: The meaning of life is
----
 The meaning of life isS"Eé:Ra(h6fW9rclR7f]fYxjh0CweeUFVzCZqi6yo]3p-HX)XLmLP,9zjkXn1VAK)n-wF--AL43a'æpkryodPcD02N"WW0[U.ySTBj4AIH-c?kXYQc(4)3SEXKguth.ëLpfROwndhHæv[220PI2BXd(EVR,LUP2p-é(Æv_?R9bZHcE0vw)ztæ,NRc9Lk=02jkp6(5M"L 
----
Epoch 0.0 average loss = 3.2399878993034363
----
 The meaning of life isi ol nn ihonybtielseS
h ta o  fEd essupltyrtefa dronirarr mieearvesiRttorenUdr  leate;aro,ytr,trstenrn tiU w .toiittrisue evntsnt ipip ywtt"da eTglvhainsy net inhoi den sneteohtm olef"et p ctn rt teha 
----
Epoch 0.10650814704115374 average loss = 3.039201445102692
----
 The meaning of life is ileur soutens 
8leate
dtaendsy, teo vmdenuTtutiay aor ,d,, enregts eiwEd eemt nh iae, -o
etstden
nfaaards ifeattesl,uIu7nthil, as=aetvlhie ëed ogia tfnuchpoachac8"tissioe paate nee, anr iadeoc"spoj5g 
----
Epoch 0.2130162940823075 average loss = 2.7900235023498534
----
 The meaning of life ismseguosisd Nast
ely teed!tsSen af t