# Text generation with RNN

In this lab, we are going to generate text with RNNs.

We'll try to have an RNN learn La Fontaine's *Fables*.

Let's load the *Fables* into a variable:

In [20]:
with open('./fables.txt') as f:
    text = f.read()

print text

À Monseigneur le Dauphin
 
Je chante les héros dont Ésope est le père,
Troupe de qui l'histoire, encor que mensongère,
Contient des vérités qui servent de leçons.
Tout parle en mon ouvrage, et même les poissons.
Ce qu'ils disent s'adresse à tous tant que nous sommes.
Je me sers d'animaux pour instruire les hommes.
Illustre rejeton d'un Prince aimé des Cieux,
Sur qui le monde entier a maintenant les yeux,
Et qui, faisant fléchir les plus superbes têtes,
Comptera désormais ses jours par ses conquêtes,
Quelque autre te dira d'une plus forte voix
Les faits de tes aïeux et les vertus des rois.
Je vais t'entretenir de moindres aventures,
Te tracer en ces vers de légères peintures :
Et, si de t'agréer je n'emporte le prix,
J'aurai du moins l'honneur de l'avoir entrepris.
 
 
PRÉFACE
 
      L'indulgence que l'on a eue pour quelques-unes de mes fables me donne lieu d'espérer la même grâce pour ce recueil. Ce n'est pas qu'un des maîtres de notre éloquence n'ait désapprouvé le dessein de les met

### Helpers

Define some helper functions to read this text:
- a batch generator that yields batches of text
- an encoder that translates a batch into a more convenient numerical form

In [85]:
import numpy as np

vocab = sorted(set(text))  # my vocabulary (many letters)
print "I have", len(vocab), "different elements in  my text which are :"
print ' '.join(vocab)


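def sample_gen(batch_size, n_items):
    """Yield random batches of `batch_size` strings, each `n_items` characters long"""
    while True:
        # Shuffle every valid start position, then consume them batch by batch
        permutations = list(np.random.permutation(len(text) - n_items))
        # Stop when fewer than batch_size start positions remain
        while len(permutations) >= batch_size:
            # Generate a batch
            batch = []
            for i in range(batch_size):
                p = permutations.pop()
                batch.append(text[p : p + n_items])
            yield batch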

def encode_batch(batch, one_hot=False):
    """Take a batch of strings and encode it as a numerical batch of
    vocabulary indices, with shape (len(batch), len(batch[0]))"""
    # np.zeros returns an initialized array (np.ndarray leaves memory uninitialized)
    batch_new = np.zeros((len(batch), len(batch[0])))
    for i in range(len(batch)):
        for j in range(len(batch[0])):
            batch_new[i][j] = vocab.index(batch[i][j])

    if one_hot:
        # One-hot encoding is not implemented yet
        # TODO
        pass
    return batch_new


a = sample_gen(10, 28)
b = next(a)
print b
print encode_batch(b)

I have 91 different elements in  my text which are :

   ! " ' ( ) , - . 0 1 2 3 4 5 6 7 8 9 : ; ? A B C D E F G H I J L M N O P Q R S T U V X Y Z ` a b c d e f g h i j l m n o p q r s t u v x y z � � � � � � � � � � � � � � � � � � �
['partit : " Mes Petits sont m', 'rnir ;\n      Et cela me fait', 'l esprit, elle le fit na\xc3\xaetr', "\xc3\xa0 ce Marchand il n'en co\xc3\xbbt", 'leur est arriv\xc3\xa9 de plus rem', 'nd cr\xc3\xa9dit pr\xc3\xa8s de Lyc\xc3\xa9rus', '\xa9s\n      Feront aux Oisillon', 'es les plus lentes :\nIl fit ', '\nLaissa tomber sa proie, afi', 'n accusant son sort :\n      ']
[[ 62.  48.  64.  66.  56.  66.   1.  20.   1.   3.   1.  34.  52.  65.
    1.  37.  52.  66.  56.  66.  65.   1.  65.  61.  60.  66.   1.  59.]
 [ 64.  60.  56.  64.   1.  21.   0.   1.   1.   1.   1.   1.   1.  27.
   66.   1.  50.  52.  58.  48.   1.  59.  52.   1.  53.  48.  56.  66.]
 [ 58.   1.  52.  65.  62.  64.  56.  66.   7.   1.  52.  58.  58.  52.
    1.  58.  52.   1.  53.  56. 
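
The `one_hot` branch of `encode_batch` is left as a TODO. A minimal sketch of one way to fill it, assuming integer indices like those printed above: index the rows of an identity matrix. The names `one_hot_encode`, `decode_batch` and `toy_vocab` are hypothetical and not part of the notebook.

```python
import numpy as np

def one_hot_encode(indices, vocab_size):
    """Turn an index array of shape (batch, time) into a
    one-hot array of shape (batch, time, vocab_size)."""
    indices = np.asarray(indices, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for index i
    return np.eye(vocab_size, dtype=np.float32)[indices]

def decode_batch(batch, vocab):
    """Map each row of vocabulary indices back to a string."""
    return [''.join(vocab[int(i)] for i in row) for row in batch]

# Toy vocabulary standing in for the notebook's `vocab` (assumption)
toy_vocab = sorted(set("le corbeau et le renard"))
toy_indices = np.array([[toy_vocab.index(c) for c in "le renard"]])
toy_one_hot = one_hot_encode(toy_indices, len(toy_vocab))

print(toy_one_hot.shape)
# argmax over the last axis recovers the indices, which decode to text
print(decode_batch(toy_one_hot.argmax(axis=-1), toy_vocab))
```

Taking the `argmax` over the last axis inverts the encoding, which is also how a predicted distribution can be turned back into a character.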

### Sample training code taken from the web

In [13]:
# https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb
import tensorflow as tf
from tensorflow.contrib import rnn

# Training Parameters
learning_rate = 0.001
training_steps = 100
batch_size = 128
display_step = 200

# Network Parameters
num_input = len(vocab)
timesteps = 28 # timesteps
num_hidden = 128 # hidden layer num of features
num_classes = len(vocab)

# tf Graph input
tf.reset_default_graph()
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

# Define weights
W1 = tf.Variable(tf.random_normal([num_hidden, num_classes]))
B1 = tf.Variable(tf.random_normal([num_classes]))

def RNN(x, W1, B1):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, timesteps, n_input)
    # Required shape: 'timesteps' tensors list of shape (batch_size, n_input)

    # Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, timesteps, 1)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)

    # Get lstm cell output
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], W1) + B1

with tf.name_scope('model'):
    logits = RNN(X, W1, B1)
    prediction = tf.nn.softmax(logits)

with tf.name_scope('loss'):
    loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
        logits=logits, labels=Y))

with tf.name_scope('optimizer'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
    train_op = optimizer.minimize(loss_op)

with tf.name_scope('metrics'):
    correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

In [None]:
# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, training_steps+1):
        batch_x, batch_y = ??

        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))

    print("Optimization Finished!")
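
The training loop leaves `batch_x, batch_y = ??` open. One possible way to build such a pair, given the placeholder shapes `[None, timesteps, num_input]` and `[None, num_classes]`, is to sample windows of `timesteps + 1` characters, use the first `timesteps` as input and the last one as the target. This is only a sketch under that assumption: `make_training_batch` and the toy text are hypothetical, and only NumPy is used.

```python
import numpy as np

def make_training_batch(text, vocab, batch_size, timesteps, rng=np.random):
    """Sample `batch_size` windows of `timesteps + 1` characters.
    Returns one-hot inputs of shape (batch_size, timesteps, len(vocab))
    and one-hot targets of shape (batch_size, len(vocab))."""
    char_to_idx = {c: i for i, c in enumerate(vocab)}
    # Start positions leave room for the window plus the target character
    starts = rng.randint(0, len(text) - timesteps, size=batch_size)
    idx = np.array([[char_to_idx[c] for c in text[s:s + timesteps + 1]]
                    for s in starts])
    one_hot = np.eye(len(vocab), dtype=np.float32)[idx]
    # Inputs: all characters but the last; target: the last character
    return one_hot[:, :-1, :], one_hot[:, -1, :]

# Toy data standing in for the notebook's `text` and `vocab` (assumption)
toy_text = "maitre corbeau sur un arbre perche"
toy_vocab = sorted(set(toy_text))
toy_x, toy_y = make_training_batch(toy_text, toy_vocab, batch_size=4,
                                   timesteps=5, rng=np.random.RandomState(0))
print(toy_x.shape, toy_y.shape)
```

In the notebook this would correspond to something like `make_training_batch(text, vocab, batch_size, timesteps)` inside the loop, which matches the shapes of `X` and `Y`.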



### Train a model (6/10)
Train a model that learns to generate text from a given input, letter by letter.

Don't forget to explain what you do, why, and whether it appears to be working.
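
Once a letter-wise model is trained, text can be generated by repeatedly predicting the next character and appending it to the input. The sketch below shows only that loop. `predict_proba` is a hypothetical callable that returns a probability vector over the vocabulary for a given window of text; in the notebook it would presumably wrap a `sess.run` on `prediction`. A trivial stand-in model is used here so the loop can run on its own.

```python
import numpy as np

def generate(seed, predict_proba, vocab, timesteps, n_chars, rng=np.random):
    """Extend `seed` by `n_chars` characters, sampling each one from
    the distribution returned by `predict_proba`."""
    out = seed
    for _ in range(n_chars):
        window = out[-timesteps:]  # the model only sees the last characters
        probs = np.asarray(predict_proba(window), dtype=np.float64)
        probs = probs / probs.sum()  # guard against rounding drift
        next_idx = rng.choice(len(vocab), p=probs)
        out += vocab[next_idx]
    return out

# Stand-in model (assumption): always predicts the next letter of a cycle
toy_vocab = ['a', 'b', 'c']

def toy_predict(window):
    probs = np.zeros(len(toy_vocab))
    probs[(toy_vocab.index(window[-1]) + 1) % len(toy_vocab)] = 1.0
    return probs

generated = generate("a", toy_predict, toy_vocab, timesteps=2, n_chars=5)
print(generated)
```

Sampling from the distribution, rather than always taking the `argmax`, is a common choice because it tends to avoid repetitive loops in the generated text.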

### Train a model (4/10)
Train a model that learns to generate text from a given input, word by word. Use a word embedding seen in class, such as CBOW.

Don't forget to explain what you do, why, and whether it appears to be working.
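
CBOW learns embeddings by predicting a word from its surrounding context words. As a starting point, here is a minimal sketch of how (context, target) training pairs could be extracted from a tokenized text. The function name `cbow_pairs`, the window size and the whitespace tokenization are assumptions for illustration, not the method seen in class.

```python
def cbow_pairs(tokens, window=2):
    """Build (context_words, target_word) pairs, where the context is
    up to `window` words on each side of the target."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip targets with no neighbours at all
            pairs.append((context, target))
    return pairs

# Naive whitespace tokenization of a toy sentence (assumption)
tokens = "le corbeau et le renard".split()
pairs = cbow_pairs(tokens, window=1)
for context, target in pairs:
    print(context, "->", target)
```

Each word would then be mapped to an index in a word-level vocabulary, in the same way characters are mapped to indices in `encode_batch`.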