<h1 style="text-align:center">Deep Learning   </h1>
<h1 style="text-align:center"> Lab Session 3 - 3 Hours </h1>
<h1 style="text-align:center">Long Short Term Memory (LSTM) for Language Modeling</h1>

<b> Student 1:</b> Sofiene JERBI
 
 
In this lab session, you will build and train a recurrent neural network based on Long Short-Term Memory (LSTM) units for a next-word prediction task.

Answers and experiments should be completed in groups of one or two students. Each group should fill in and run the appropriate notebook cells.
Once you have completed all of the code implementations and answered each question, you may finalize your work by exporting the IPython notebook as a PDF document using print to PDF (Ctrl+P). Do not forget to run all your cells before generating your final report, and do not forget to include the names of all participants in the group. The lab session should be completed by June 9th 2017.

Send your PDF file to benoit.huet@eurecom.fr and olfa.ben-ahmed@eurecom.fr using **[DeepLearning_lab3]** as the subject of your email.

#  Introduction

You will train an LSTM to predict the next word using a sample short story. The LSTM will learn to predict the next item of a sentence from the 3 previous items (given as input). Punctuation marks are treated as dictionary items, so they can be predicted too. Figure 1 shows the LSTM and the next-word prediction process.
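The windowing described above can be sketched in plain Python. This is only an illustration on a made-up token list; the helper name `make_windows` is hypothetical and is not part of the lab code.

```python
def make_windows(tokens, n_input=3):
    """Return (context, target) pairs: n_input consecutive tokens and the token that follows."""
    pairs = []
    for i in range(len(tokens) - n_input):
        context = tokens[i:i + n_input]
        target = tokens[i + n_input]
        pairs.append((context, target))
    return pairs


# Punctuation is kept as a separate token, so it can be a prediction target too
tokens = "there was once a young shepherd boy .".split()
pairs = make_windows(tokens, n_input=3)
for context, target in pairs:
    print(context, "->", target)
```

Each pair corresponds to one training example: three input items and the item the network should predict.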

<img src="lstm.png" height="370" width="370"> 


Each word (and punctuation mark) in the text is encoded as a unique integer. The integer value is the index of the corresponding word (or punctuation mark) in the dictionary. The network output is a vector with one score per dictionary entry; the index of its largest value, read as a one-hot vector, gives the predicted word in the reversed dictionary (Section 1.2). For example, if the prediction is 86, the predicted word will be "company".
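As a small illustration of this encoding and decoding, the sketch below uses a toy four-entry vocabulary and a hand-written score vector. The indices and scores are assumptions chosen for the example, not values from the lab.

```python
import numpy as np

# Toy vocabulary (hypothetical indices, not the ones used in the lab)
dictionary = {"the": 0, ",": 1, "wolf": 2, "boy": 3}
reverse_dictionary = {index: word for word, index in dictionary.items()}

# Encode a three-word context as integers
context = ["the", "boy", ","]
encoded = [dictionary[word] for word in context]

# Pretend the network returned one score per vocabulary entry
scores = np.array([0.1, 0.2, 2.5, 0.3])
predicted_index = int(np.argmax(scores))
predicted_word = reverse_dictionary[predicted_index]

# One-hot vector marking the predicted index
one_hot = np.zeros(len(dictionary))
one_hot[predicted_index] = 1.0

print(encoded, predicted_index, predicted_word, one_hot)
```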



You will use a sample short story from Aesop’s Fables (http://www.taleswithmorals.com/) to train your model. 


<font size="3" face="verdana" > <i> "There was once a young Shepherd Boy who tended his sheep at the foot of a mountain near a dark forest.

It was rather lonely for him all day, so he thought upon a plan by which he could get a little company and some excitement.
He rushed down towards the village calling out "Wolf, Wolf," and the villagers came out to meet him, and some of them stopped with him for a considerable time.
This pleased the boy so much that a few days afterwards he tried the same trick, and again the villagers came to his help.
But shortly after this a Wolf actually did come out from the forest, and began to worry the sheep, and the boy of course cried out "Wolf, Wolf," still louder than before.
But this time the villagers, who had been fooled twice before, thought the boy was again deceiving them, and nobody stirred to come to his help.
So the Wolf made a good meal off the boy's flock, and when the boy complained, the wise man of the village said:
"A liar will not be believed, even when he speaks the truth."  "</i> </font>.    







Start by loading the necessary libraries and resetting the default computational graph. For more details about the rnn packages, we suggest you take a look at https://www.tensorflow.org/api_guides/python/contrib.rnn

In [1]:
import numpy as np
import collections # used to build the dictionary
import random
import time
import pickle # may be used to save your model 
import matplotlib.pyplot as plt
#Import Tensorflow and rnn
import tensorflow as tf
from tensorflow.contrib import rnn  

# Target log path
logs_path = 'lstm_words'
writer = tf.summary.FileWriter(logs_path)

# Next-word prediction task

## Part 1: Data  preparation

### 1.1. Loading data

Load and split the text of our story

In [2]:
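def load_data(filename):
    # Read the story, lower-case it and split it on whitespace
    with open(filename) as f:
        lines = f.readlines()
    # Flatten all lines into a single list of tokens so that
    # files with several lines also yield a flat 1-D array
    words = [word for line in lines for word in line.strip().lower().split()]
    data = np.array(words)
    print(data)
    return data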

#Run the cell 
train_file ='data/story.txt'
train_data = load_data(train_file)
print("Loaded training data...")
print(len(train_data))

['there' 'was' 'once' 'a' 'young' 'shepherd' 'boy' 'who' 'tended' 'his'
 'sheep' 'at' 'the' 'foot' 'of' 'a' 'mountain' 'near' 'a' 'dark' 'forest'
 '.' 'it' 'was' 'rather' 'lonely' 'for' 'him' 'all' 'day' ',' 'so' 'he'
 'thought' 'upon' 'a' 'plan' 'by' 'which' 'he' 'could' 'get' 'a' 'little'
 'company' 'and' 'some' 'excitement' '.' 'he' 'rushed' 'down' 'towards'
 'the' 'village' 'calling' 'out' 'wolf' ',' 'wolf' ',' 'and' 'the'
 'villagers' 'came' 'out' 'to' 'meet' 'him' ',' 'and' 'some' 'of' 'them'
 'stopped' 'with' 'him' 'for' 'a' 'considerable' 'time' '.' 'this'
 'pleased' 'the' 'boy' 'so' 'much' 'that' 'a' 'few' 'days' 'afterwards'
 'he' 'tried' 'the' 'same' 'trick' ',' 'and' 'again' 'the' 'villagers'
 'came' 'to' 'his' 'help' '.' 'but' 'shortly' 'after' 'this' 'a' 'wolf'
 'actually' 'did' 'come' 'out' 'from' 'the' 'forest' ',' 'and' 'began' 'to'
 'worry' 'the' 'sheep,' 'and' 'the' 'boy' 'of' 'course' 'cried' 'out'
 'wolf' ',' 'wolf' ',' 'still' 'louder' 'than' 'before' '.' 'but' 't

### 1.2. Symbol encoding

The LSTM inputs can only be numbers. One way to convert words (or symbols, or any other items) to numbers is to assign a unique integer to each word. The assignment is often based on frequency of occurrence, for efficient coding.

Here, we define a function to build an indexed word dictionary (word->number). The "build_vocabulary" function builds both:

- Dictionary: used for encoding words to numbers for the LSTM inputs
- Reversed dictionary: used for decoding the outputs of the LSTM into words (and punctuation).

For example, in the story above, we have **113** individual words. The "build_vocabulary" function builds a dictionary with the following entries ['the': 0], [',': 1], ['company': 85],...
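The frequency-based indexing described above can be tried on a toy token list. The sketch below is an illustration only; the names `word_to_index` and `index_to_word` are hypothetical, and the sentence is made up.

```python
import collections

tokens = "the wolf , the boy , the sheep".split()

# most_common() sorts by decreasing count, so frequent tokens get small indices
counts = collections.Counter(tokens).most_common()
word_to_index = {word: index for index, (word, _) in enumerate(counts)}
index_to_word = {index: word for word, index in word_to_index.items()}

print(counts)
print(word_to_index)
```

The most frequent token ("the") receives index 0 and the next most frequent (",") receives index 1, which mirrors the first entries of the story dictionary.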


In [3]:
def build_vocabulary(words):
    count = collections.Counter(words).most_common()
    dic= dict()
    for word, _ in count:
        dic[word] = len(dic)
    reverse_dic= dict(zip(dic.values(), dic.keys()))
    return dic, reverse_dic


Run the cell below to display the vocabulary

In [4]:
dictionary, reverse_dictionary = build_vocabulary(train_data)
vocabulary_size= len(dictionary) 
print("Dictionary size (Vocabulary size) = ", vocabulary_size)
print("\n")
print("Dictionary : \n")
print(dictionary)
print("\n")
print("Reverted Dictionary : \n" )
print(reverse_dictionary)

Dictionary size (Vocabulary size) =  113


Dictionary : 

{'but': 17, 'stirred': 32, 'days': 33, 'villagers': 11, 'forest': 25, 'cried': 35, 'calling': 36, 'he': 6, 'sheep,': 37, 'afterwards': 71, 'came': 19, 'plan': 38, 'deceiving': 39, 'of': 9, 'louder': 40, ',': 1, 'little': 41, 'get': 42, 'could': 43, 'tried': 44, 'before': 20, 'speaks': 45, 'did': 46, 'when': 21, 'nobody': 47, 'them': 22, 'had': 48, 'who': 23, 'course': 49, 'some': 24, 'once': 50, 'wolf': 5, 'shepherd': 52, 'meet': 104, 'even': 54, 'near': 34, 'tended': 55, 'stopped': 56, 'lonely': 57, 'by': 58, 'than': 59, 'rather': 60, 'excitement': 61, 'considerable': 62, 'still': 63, 'day': 64, 'shortly': 65, 'flock': 66, 'come': 26, 'was': 13, 'with': 67, 'from': 68, 'out': 10, 'made': 69, 'this': 14, 'same': 70, 'time': 18, 'help': 29, 'wise': 73, 'worry': 92, 'so': 15, 'which': 75, 'liar': 76, 'at': 77, 'been': 78, 'trick': 79, 'thought': 27, 'fooled': 80, 'a': 2, 'dark': 81, 'and': 3, 'rushed': 82, ':': 83, 'the': 0, 'off'

## Part 2 : LSTM Model in TensorFlow

Now that you have defined how the data is encoded, you will develop an LSTM model that predicts the word following a sequence of 3 words.

### 2.1. Model definition

Define a 2-layer LSTM model.

For this, use the following classes from the tensorflow.contrib library:

- rnn.BasicLSTMCell(number of hidden units) 
- rnn.static_rnn(rnn_cell, data, dtype=tf.float32)
- rnn.MultiRNNCell(,)


You may need some tensorflow functions (https://www.tensorflow.org/api_docs/python/tf/) :
- tf.split
- tf.reshape 
- ...
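As a rough picture of what a reshape followed by a split does to a batch of encoded sequences, here is a NumPy analogue. NumPy is used only so that the snippet runs without TensorFlow; the batch values and the `[batch, n_input, 1]` layout are assumptions made for the illustration.

```python
import numpy as np

n_input = 3
# Two encoded 3-word sequences, shaped [batch, n_input, 1]
batch = np.array([[[5], [0], [12]],
                  [[7], [3], [1]]], dtype=np.float32)

flat = np.reshape(batch, [-1, n_input])      # shape (2, 3)
steps = np.split(flat, n_input, axis=1)      # list of 3 arrays, each of shape (2, 1)

print(flat.shape, len(steps), steps[0].shape)
print(steps[0].ravel())  # first word of each sequence
```

The result is a list with one entry per time step, which is the kind of input a statically unrolled RNN expects.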




In [5]:
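def lstm_model(x, w, b):
    # Turn the [batch, n_input, 1] input into a list of n_input tensors of shape [batch, 1]
    inputs = tf.reshape(x, [-1, n_input])
    inputs = tf.split(inputs, n_input, 1)
    # Create one cell object per layer: reusing a single object ([lstm]*2)
    # can fail or share weights between layers in TensorFlow versions after 1.0
    stacked_lstm = rnn.MultiRNNCell([rnn.BasicLSTMCell(n_hidden) for _ in range(2)])
    outputs, state = rnn.static_rnn(stacked_lstm, inputs, dtype=tf.float32)
    # Project the output of the last time step onto the vocabulary
    model = tf.matmul(outputs[-1], w['out']) + b['out']
    return model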
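To give an idea of what a single LSTM layer computes at each time step, here is a minimal NumPy sketch of the standard LSTM update. It is not the TensorFlow implementation: the gate ordering, the weight layout, the random initialization and the function name `lstm_step` are all assumptions made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [x_t, h_prev] to four stacked gate pre-activations."""
    n_hidden = h_prev.shape[1]
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    i = sigmoid(z[:, 0 * n_hidden:1 * n_hidden])   # input gate
    f = sigmoid(z[:, 1 * n_hidden:2 * n_hidden])   # forget gate
    o = sigmoid(z[:, 2 * n_hidden:3 * n_hidden])   # output gate
    g = np.tanh(z[:, 3 * n_hidden:4 * n_hidden])   # candidate cell values
    c = f * c_prev + i * g                         # new cell state
    h = o * np.tanh(c)                             # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_hidden, n_features = 4, 1
W = rng.normal(scale=0.1, size=(n_features + n_hidden, 4 * n_hidden))
b = np.zeros(4 * n_hidden)

h = np.zeros((1, n_hidden))
c = np.zeros((1, n_hidden))
for value in [5.0, 0.0, 12.0]:          # three encoded words, one per time step
    h, c = lstm_step(np.array([[value]]), h, c, W, b)
print(h.shape)
```

In a 2-layer stack, the hidden state `h` of the first layer at each step would be the input `x_t` of the second layer.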

Training Parameters and constants

In [6]:
# Training Parameters
learning_rate = 0.001
epochs = 5000
display_step = 100
n_input = 3

#For each LSTM cell that you initialise, supply a value for the hidden dimension, number of units in LSTM cell
n_hidden = 64

# tf Graph input
x = tf.placeholder("float", [None, n_input, 1])
y = tf.placeholder("float", [None, vocabulary_size])

# LSTM  weights and biases
weights = { 'out': tf.Variable(tf.random_normal([n_hidden, vocabulary_size]))}
biases = {'out': tf.Variable(tf.random_normal([vocabulary_size])) }


#build the model
pred = lstm_model(x, weights, biases)

Define the Loss/Cost and optimizer

In [7]:
# Loss and optimizer
#cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)



# Model evaluation
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
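For readers who want to see what the loss and accuracy above compute, here is a NumPy sketch of softmax cross-entropy on logits and of argmax accuracy. The function names and the example values are hypothetical; the snippet only mirrors the idea, not TensorFlow's implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between one-hot labels and softmax(logits), computed stably."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(labels * log_probs).sum(axis=1).mean())

def accuracy(logits, labels):
    """Fraction of rows whose largest logit matches the one-hot label."""
    return float((logits.argmax(axis=1) == labels.argmax(axis=1)).mean())

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

print(softmax_cross_entropy(logits, labels), accuracy(logits, labels))
# With all-zero logits the loss equals log(number of classes)
print(softmax_cross_entropy(np.zeros((1, 3)), labels[:1]), np.log(3))
```

A model that spread its probability uniformly over the 113 dictionary entries would score log(113) ≈ 4.73, which is a useful reference point when reading the loss values printed during training.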

The test function is provided below.

In [55]:
#run the cell
def test(sentence, session, verbose=False):
    words = sentence.strip().split(' ')
    if len(words) != n_input:
        print("sentence length should be equal to", n_input, "!")
        return None
    # Report the first out-of-vocabulary word instead of relying on a bare except
    unknown = [word for word in words if word not in dictionary]
    if unknown:
        print(" ".join(["Word", unknown[0], "not in dictionary"]))
        return None
    symbols_inputs = [dictionary[word] for word in words]
    keys = np.reshape(np.array(symbols_inputs), [-1, n_input, 1])
    onehot_pred = session.run(pred, feed_dict={x: keys})
    # np.argmax works on the returned array and adds no new op to the graph
    onehot_pred_index = int(np.argmax(onehot_pred, axis=1)[0])
    predicted_word = reverse_dictionary[onehot_pred_index]
    if verbose:
        print(" ".join(words + [predicted_word]))
    return predicted_word

## Part 3 : LSTM Training  

During training, at each iteration, 3 consecutive words are taken from the training data and encoded as integers to form the input vector. The training label is a one-hot vector encoding the word that comes after the 3 input words. Display the loss and the training accuracy every 1000 iterations. Save the model at the end of training in the **lstm_model** folder.
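The input and label arrays described above can be built in a vectorized way. The sketch below is one possible approach on a made-up integer sequence; the variable names and the tiny vocabulary size are assumptions, and it keeps every available window of the sequence.

```python
import numpy as np

n_input = 3
vocabulary_size = 6
encoded = np.array([0, 4, 2, 1, 5, 3, 0, 2])   # hypothetical integer-encoded text

n_samples = len(encoded) - n_input
# Row i holds encoded[i : i + n_input]
inputs = np.stack([encoded[i:i + n_input] for i in range(n_samples)])
inputs = inputs.reshape(-1, n_input, 1).astype(np.float32)
# Row k of the identity matrix is the one-hot vector for index k
targets = np.eye(vocabulary_size, dtype=np.float32)[encoded[n_input:]]

print(inputs.shape, targets.shape)
```

Indexing an identity matrix with the target indices avoids setting and resetting a scratch vector for each sample.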

In [19]:
# Initializing the variables
start_time = time.time()
init = tf.global_variables_initializer()
model_saver = tf.train.Saver()
# Create a summary to monitor cost tensor
tf.summary.scalar("Loss", cost)
# Create a summary to monitor accuracy tensor
tf.summary.scalar("Accuracy", accuracy)
# Merge all summaries into a single op
merged_summary_op = tf.summary.merge_all()
# Initializing the session 
with tf.Session() as sess:
    sess.run(init)
    # op to write logs to Tensorboard
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    print("Start Training")
    ##############################################
    total_batch = len(train_data)-n_input-1
    xs = []
    ys = []
    yt = np.zeros(vocabulary_size)
    for i in range(total_batch):
        xs += [np.array([dictionary[word] for word in train_data[i:i+n_input]]).reshape(1,n_input,1)]
        idx = dictionary[train_data[i+n_input]]
        yt[idx] = 1
        ys += [np.copy(yt).reshape(1,vocabulary_size)]
        yt[idx] = 0
    xs_bis = np.array(xs).reshape(-1,n_input,1)
    ys_bis = np.array(ys).reshape(-1,vocabulary_size)
    for epoch in range(epochs):
        avg_cost = 0.
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = xs[i], ys[i]
            # Run optimization op (backprop), cost op (to get loss value)
            # and summary nodes
            _, c, summary, onehot_pred = sess.run([optimizer, cost, merged_summary_op, pred], feed_dict={x: batch_xs, y: batch_ys})
            # Write logs at every iteration
            summary_writer.add_summary(summary, epoch * total_batch + i)
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print("Epoch: ", '%02d' % (epoch+1), "  =====> Loss=", "{:.9f}".format(avg_cost), "  Validation Accuracy=", "{:.9f}".format(accuracy.eval(feed_dict={x: xs_bis, y: ys_bis})))

    ##############################################
    print("Training finished!")
    print("time: ",time.time() - start_time)
    print("For tensorboard visualisation run on command line.")
    print("\ttensorboard --logdir=%s" % (logs_path))
    print("and point your web browser to the returned link")
    ##############################################
    save_path = model_saver.save(sess, "lstm_model/model"+str(int(time.time())))
    ##############################################
    print("Model saved")

Start Training
Epoch:  100   =====> Loss= 0.147865921   Validation Accuracy= 0.600000024
Epoch:  200   =====> Loss= 0.170389204   Validation Accuracy= 0.738095224
Epoch:  300   =====> Loss= 0.168773343   Validation Accuracy= 0.819047630
Epoch:  400   =====> Loss= 0.174424479   Validation Accuracy= 0.933333337
Epoch:  500   =====> Loss= 0.115744019   Validation Accuracy= 0.933333337
Epoch:  600   =====> Loss= 0.104514964   Validation Accuracy= 0.966666639
Epoch:  700   =====> Loss= 0.113949627   Validation Accuracy= 0.961904764
Epoch:  800   =====> Loss= 0.101774005   Validation Accuracy= 0.942857146
Epoch:  900   =====> Loss= 0.096509059   Validation Accuracy= 0.909523785
Epoch:  1000   =====> Loss= 0.099715668   Validation Accuracy= 0.966666639
Epoch:  1100   =====> Loss= 0.120024360   Validation Accuracy= 0.966666639
Epoch:  1200   =====> Loss= 0.064044490   Validation Accuracy= 0.966666639
Epoch:  1300   =====> Loss= 0.068750653   Validation Accuracy= 0.947619021
Epoch:  1400   ====

At the end of training, we reach a good training accuracy, close to 97%. Training converged quickly.<br/>
<img src="graphs/accuracy_3.png"> <br/>
<img src="graphs/loss_3.png"> 

## Part 4 : Test your model 

### 4.1. Next word prediction

Load your model (using the model_saved variable given in the training session) and test the following sentences:
- 'get a little' 
- 'nobody tried to'
- Try other sentences using words from the story's vocabulary.

In [76]:
tf.reset_default_graph()
with tf.Session() as sess:
    n_input = 3
    x = tf.placeholder("float", [None, n_input, 1])
    weights = { 'out': tf.Variable(tf.random_normal([n_hidden, vocabulary_size]))}
    biases = {'out': tf.Variable(tf.random_normal([vocabulary_size])) }
    pred = lstm_model(x, weights, biases)
    model_saver = tf.train.Saver()
    init = tf.global_variables_initializer()
    # Restore variables from disk.
    model_saver.restore(sess, "lstm_model/model1496327640")
    print("Model restored.")
    # Do some work with the model
    for sentence in ["get a little","nobody tried to", "he speaks the", "the boy was", "the man came"]:
        print(sentence)
        print(test(sentence, sess, verbose=False))
        print("**********")

Model restored.
get a little
company
**********
nobody tried to
come
**********
he speaks the
truth
**********
the boy was
tended
**********
the man came
out
**********


### 4.2. More fun with the Fable Writer!

You will use the RNN/LSTM model learned in the previous question to create a
new story/fable.
To do this, choose 3 words from the dictionary to start your
story and initialize your network. From those 3 words, the RNN will generate
the next word of the story. Then, using the last 3 words (the newly predicted one
and the last 2 from the input), use the network to predict the 5th
word of the story, and so on until your story is 5 sentences long.
End your story with a period.
To implement this, use the test function.
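The rolling-window procedure described above can be sketched independently of the trained network. In the snippet below, `toy_predict` is a hypothetical stand-in for the model (a lookup table on the last word only), and the `max_words` guard is an added safety limit that is not part of the assignment.

```python
def generate(seed_words, predict_next, max_sentences=5, max_words=200):
    """Greedy generation: repeatedly predict from the last len(seed_words) words."""
    n_input = len(seed_words)
    story = list(seed_words)
    sentences = 0
    while sentences < max_sentences and len(story) < max_words:
        word = predict_next(story[-n_input:])
        story.append(word)
        if word == ".":
            sentences += 1
    return story


# Hypothetical stand-in for the trained network: a lookup table on the last word
toy_table = {"the": "boy", "boy": "cried", "cried": "wolf", "wolf": ".", ".": "the"}

def toy_predict(context):
    return toy_table[context[-1]]

story = generate(["the", "boy", "cried"], toy_predict, max_sentences=2)
print(" ".join(story))
```

With the lab model, `predict_next` could wrap the test function, for instance by joining the context with spaces and passing it along with the session. Because the prediction is deterministic, a context that reappears leads to the same continuation, so generated text can fall into a cycle.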

In [33]:
with tf.Session() as sess:
    model_saver.restore(sess, "lstm_model/model1496327640")
    fable = "The boy was"
    nwords = "the boy was"
    nbpoints = 0
    while nbpoints<5:
        word = test(nwords, sess, verbose=False)
        if word in [",", "."]:
            fable+= word
        elif fable[-1]==".":
            fable+= " " + word.title()
        else:
            fable+= " " + word
        # Slide the window: drop the oldest word and append the new one
        nwords = " ".join(nwords.split(" ")[1:] + [word])
        if word==".":
            nbpoints+=1
    print(fable)

The boy was again deceiving them, and nobody stirred to come to his help. So the wolf made a good by which he could get a little company and some excitement. He rushed down towards the village calling out wolf, wolf, and the villagers came to his help. So the wolf made a good by which he could get a little company and some excitement. He rushed down towards the village calling out wolf, wolf, and the villagers came to his help.


We see that the generated fable makes some sense (at least it is grammatically correct), even though generation falls into a two-sentence loop with the given 3-word input.

### 4.3. Play with the number of inputs

The number of inputs in our example is 3. See what happens when you use other values (1 and 5).

In [15]:
for n_input in [1,5]:
    tf.reset_default_graph()
    x = tf.placeholder("float", [None, n_input, 1])
    y = tf.placeholder("float", [None, vocabulary_size])
    weights = { 'out': tf.Variable(tf.random_normal([n_hidden, vocabulary_size]))}
    biases = {'out': tf.Variable(tf.random_normal([vocabulary_size])) }
    pred = lstm_model(x, weights, biases)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)
    # Model evaluation
    correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

    # Initializing the variables
    start_time = time.time()
    init = tf.global_variables_initializer()
    model_saver = tf.train.Saver()
    # Create a summary to monitor cost tensor
    tf.summary.scalar("Loss", cost)
    # Create a summary to monitor accuracy tensor
    tf.summary.scalar("Accuracy", accuracy)
    # Merge all summaries into a single op
    merged_summary_op = tf.summary.merge_all()
    # Initializing the session 
    with tf.Session() as sess:
        sess.run(init)
        # op to write logs to Tensorboard
        summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
        print("Start Training")
        ##############################################
        total_batch = len(train_data)-n_input-1
        xs = []
        ys = []
        yt = np.zeros(vocabulary_size)
        for i in range(total_batch):
            xs += [np.array([dictionary[word] for word in train_data[i:i+n_input]]).reshape(1,n_input,1)]
            idx = dictionary[train_data[i+n_input]]
            yt[idx] = 1
            ys += [np.copy(yt).reshape(1,vocabulary_size)]
            yt[idx] = 0
        xs_bis = np.array(xs).reshape(-1,n_input,1)
        ys_bis = np.array(ys).reshape(-1,vocabulary_size)
        for epoch in range(epochs):
            avg_cost = 0.
            # Loop over all batches
            for i in range(total_batch):
                batch_xs, batch_ys = xs[i], ys[i]
                # Run optimization op (backprop), cost op (to get loss value)
                # and summary nodes
                _, c, summary, onehot_pred = sess.run([optimizer, cost, merged_summary_op, pred], feed_dict={x: batch_xs, y: batch_ys})
                # Write logs at every iteration
                summary_writer.add_summary(summary, epoch * total_batch + i)
                # Compute average loss
                avg_cost += c / total_batch
            # Display logs per epoch step
            if (epoch+1) % display_step == 0:
                print("Epoch: ", '%02d' % (epoch+1), "  =====> Loss=", "{:.9f}".format(avg_cost), "  Validation Accuracy=", "{:.9f}".format(accuracy.eval(feed_dict={x: xs_bis, y: ys_bis})))

        ##############################################
        print("Training finished!")
        print("time: ",time.time() - start_time)
        print("For tensorboard visualisation run on command line.")
        print("\ttensorboard --logdir=%s" % (logs_path))
        print("and point your web browser to the returned link")
        ##############################################
        model_name = "model"+str(int(time.time()))
        save_path = model_saver.save(sess, "lstm_model/"+model_name)
        ##############################################
        print("Model saved")

Start Training
Epoch:  100   =====> Loss= 3.653317738   Validation Accuracy= 0.132075474
Epoch:  200   =====> Loss= 3.521692361   Validation Accuracy= 0.136792451
Epoch:  300   =====> Loss= 3.468380347   Validation Accuracy= 0.132075474
Epoch:  400   =====> Loss= 3.457010381   Validation Accuracy= 0.127358496
Epoch:  500   =====> Loss= 3.408037501   Validation Accuracy= 0.113207549
Epoch:  600   =====> Loss= 3.365570235   Validation Accuracy= 0.165094346
Epoch:  700   =====> Loss= 3.302691246   Validation Accuracy= 0.127358496
Epoch:  800   =====> Loss= 3.356638100   Validation Accuracy= 0.136792451
Epoch:  900   =====> Loss= 3.468619726   Validation Accuracy= 0.146226421
Epoch:  1000   =====> Loss= 3.486100896   Validation Accuracy= 0.160377353
Epoch:  1100   =====> Loss= 3.474109301   Validation Accuracy= 0.146226421
Epoch:  1200   =====> Loss= 3.360642204   Validation Accuracy= 0.146226421
Epoch:  1300   =====> Loss= 3.585799253   Validation Accuracy= 0.165094346
Epoch:  1400   ====

In [80]:
tf.reset_default_graph()
with tf.Session() as sess:
    n_input = 1
    x = tf.placeholder("float", [None, n_input, 1])
    weights = { 'out': tf.Variable(tf.random_normal([n_hidden, vocabulary_size]))}
    biases = {'out': tf.Variable(tf.random_normal([vocabulary_size])) }
    pred = lstm_model(x, weights, biases)
    model_saver = tf.train.Saver()
    init = tf.global_variables_initializer()
    model_saver.restore(sess, "lstm_model/model1496341229")
    fable = "Help"
    nwords = "help"
    nbpoints = 0
    it=0
    while nbpoints<5 and it<10:
        it+=1
        word = test(nwords, sess, verbose=False)
        print(nwords, word)
        if word in [",", "."]:
            fable+= word
        elif fable[-1]==".":
            fable+= " " + word.title()
        else:
            fable+= " " + word
        nwords = word
        if word==".":
            nbpoints+=1
    print(fable)

help the
the boy
boy who
who the
the boy
boy who
who the
the boy
boy who
who the
Help the boy who the boy who the boy who the


In [70]:
tf.reset_default_graph()
with tf.Session() as sess:
    n_input = 5
    x = tf.placeholder("float", [None, n_input, 1])
    weights = { 'out': tf.Variable(tf.random_normal([n_hidden, vocabulary_size]))}
    biases = {'out': tf.Variable(tf.random_normal([vocabulary_size])) }
    pred = lstm_model(x, weights, biases)
    model_saver = tf.train.Saver()
    init = tf.global_variables_initializer()
    model_saver.restore(sess, "lstm_model/model1496343309")
    fable = "The boy was a shepherd"
    nwords = "the boy was a shepherd"
    nbpoints = 0
    while nbpoints<5:
        word = test(nwords, sess, verbose=False)
        if word in [",", "."]:
            fable+= word
        elif fable[-1]==".":
            fable+= " " + word.title()
        else:
            fable+= " " + word
        # Slide the window: drop the oldest word and append the new one
        nwords = " ".join(nwords.split(" ")[1:] + [word])
        if word==".":
            nbpoints+=1
    print(fable)

The boy was a shepherd a and some again deceiving, so the the and the this a villagers complained been come out from day, so he thought upon a plan by which he could get a little company and some excitement. He rushed down towards the village calling out wolf, wolf, and the villagers came out to meet him, and some of them stopped with him for a considerable time. This pleased the boy so much that a few days afterwards he tried the same trick, and again the villagers came to his help. But shortly after this a wolf actually did come out from the forest, and began to worry the sheep, and the boy of course cried out wolf, wolf, and the villagers came out to meet him, and some of them stopped with him for a considerable time. This pleased the boy so much that a few days afterwards he tried the same trick, and again the villagers came to his help.


As we can see, training with a 1-word input underfits: the accuracy reported in the log (computed on the training data, despite its "Validation Accuracy" label) stays low, and the suggestions are poor, since generation falls into the "the boy who" loop very quickly (after 1 or 2 words) and cannot produce a fable. Training with 5-word inputs, on the contrary, overfits: the training accuracy reaches 99.5% very quickly, but during generation the model soon falls back to reproducing the original fable exactly as it was.

We can deduce from this that n_input = 3 was probably the best value of this parameter for our model.
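One way to make the effect of the context length more concrete is to measure how often a context of n words determines the next word in the training text. The sketch below computes, on a toy sentence, the highest training accuracy that any predictor seeing only the last n tokens could reach. The function name and the toy text are assumptions, and the measure says nothing about generalization.

```python
import collections

def best_possible_accuracy(tokens, n_input):
    """Highest training accuracy reachable when only the last n_input tokens are seen."""
    followers = collections.defaultdict(collections.Counter)
    for i in range(len(tokens) - n_input):
        context = tuple(tokens[i:i + n_input])
        followers[context][tokens[i + n_input]] += 1
    total = sum(sum(counter.values()) for counter in followers.values())
    # For each context, the best guess is its most frequent follower
    correct = sum(counter.most_common(1)[0][1] for counter in followers.values())
    return correct / total


tokens = "the boy cried wolf , and the villagers came , and the boy ran .".split()
for n in (1, 3, 5):
    print(n, round(best_possible_accuracy(tokens, n), 3))
```

Short contexts are ambiguous (the same word is followed by different words in different places), which caps the reachable accuracy, whereas long contexts tend to be unique in a small text, so a model can memorize the text word for word.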