# Week 9 Homework

For homework, we will build on the class exercises. As you looked at text generated by our language model, you may have noticed that it was difficult to tell how well the model was doing other than by eyeballing the quality of the generated text. It turns out there is a way to evaluate the model quantitatively, using a metric called perplexity. Run the following cell to get started.

In [None]:
import numpy as np
import re
import utils
import math
import io
import random
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop
from keras.models import load_model

Run the cells below to load our dataset and pretrained model as before.

In [None]:
with io.open('resources/fake.txt', encoding='utf-8') as f:
    articles_raw = f.read()
    articles_split = re.split("<a>", articles_raw)[1:]
    articles = [a[:-6].strip() for a in articles_split]

X, y, char_indices, indices_char = utils.get_X_y(articles_raw)

print("Shapes:", X.shape, y.shape)

number_of_chunks, chunk_length, number_of_characters = X.shape

print(number_of_chunks, chunk_length, number_of_characters)

In [None]:
pretrained_model = load_model('resources/pretrained_model.h5')
pretrained_model.summary()

We can see that we are working with a 2-layer LSTM. When evaluating models, we generally prefer to tune our hyperparameters based on quantitative performance metrics, such as test set accuracy. We don't have anything like that for our language model, so instead we will use perplexity.

Intuitively, perplexity measures how confused our model is when it looks at our original dataset. For each chunk, we compare the model's predicted distribution to the actual next character; the less probability the model assigns to the actual character, the higher its perplexity. Thus, we aim to minimize perplexity.

More formally, perplexity is the geometric mean of the inverse prediction probabilities for the correct characters:

![Perplexity](resources/assets/perplexity.png)
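
To make the definition concrete, here is a minimal sketch on made-up numbers. The function name `perplexity` and the probability lists are hypothetical and are not part of the homework code; the lists stand in for the probabilities a model assigned to the correct next character.

```python
import math

def perplexity(probs):
    """Geometric mean of the inverse probabilities of the correct characters."""
    product = 1.0
    for p in probs:
        product *= 1.0 / p
    return product ** (1.0 / len(probs))

# Hypothetical probabilities assigned to the correct next character
confident = [0.9, 0.8, 0.95]
unsure = [0.5, 0.25, 0.125]

print(perplexity(confident))  # close to 1: the model is rarely surprised
print(perplexity(unsure))     # about 4: like guessing among 4 equally likely characters

# A model that spreads probability uniformly over V characters has perplexity of about V
V = 50
print(perplexity([1.0 / V] * 10))
```

One way to read the result: a perplexity of k means the model is, on average, about as uncertain as if it were choosing uniformly among k characters.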

Note that for us, T is the number of chunks. To calculate perplexity, we evaluate our model on each chunk to produce a y_hat for that chunk. In practice, we may instead calculate log(Perplexity), which simplifies to the following:

![log perplexity](resources/assets/log-perplexity.png)
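
One likely reason to prefer the log form is numerical: a product of many small probabilities underflows, while a sum of logs does not. The sketch below assumes the natural logarithm and uses a hypothetical helper `log_perplexity` on made-up probabilities; it is an illustration, not the homework solution.

```python
import math

def log_perplexity(probs):
    """Mean negative log probability of the correct characters (natural log)."""
    return sum(-math.log(p) for p in probs) / len(probs)

probs = [0.5, 0.25, 0.125]  # hypothetical per-chunk probabilities
print(math.exp(log_perplexity(probs)))  # about 4, matching the geometric-mean definition

# With many chunks, the raw product underflows to 0.0, but the log form stays finite
many = [0.1] * 10000
product = 1.0
for p in many:
    product *= p
print(product)               # 0.0 (underflow)
print(log_perplexity(many))  # about 2.3026, i.e. log(10)
```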

Note that in both of these equations we have the following term:

![Probability term](resources/assets/prob-term.png)

Does this term look familiar? It is the dot-product of our prediction vector and the one-hot label vector. When we compute the dot product with a one-hot vector, we are effectively doing an index lookup at the position represented by the one-hot vector. This sum always simplifies to the predicted probability of the correct character, but it is still simpler to compute this as a dot product in code.
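
As a quick check of the index-lookup claim, the sketch below compares the dot product with a direct lookup. The vectors `y_hat` and `y_true` are made-up values over a hypothetical 4-character vocabulary.

```python
import numpy as np

# Hypothetical prediction over a 4-character vocabulary (sums to 1)
y_hat = np.array([0.1, 0.6, 0.2, 0.1])
# One-hot label: the correct character is at index 2
y_true = np.array([0, 0, 1, 0])

via_dot = np.dot(y_hat, y_true)        # sum of elementwise products
via_index = y_hat[np.argmax(y_true)]   # direct lookup at the one-hot position
print(via_dot, via_index)  # both are 0.2
```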

Now you will finish a function to compute log(perplexity). Note that to compute this, you need all of the prediction vectors for each chunk (this corresponds to the y_hat(t) vectors in the equations above). We have provided the below function to get all of the predictions for our dataset using our pretrained model. The reason this code is more complicated than you'd expect is that Keras expects the input to come as batches of size 128 since this is how we trained our model. Don't worry about the details of this!

In [None]:
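def get_pred(model, X):
    # Run the model over X in batches of 128 and collect one prediction vector per chunk
    number_of_chunks, chunk_length, number_of_characters = X.shape
    pred = np.zeros((number_of_chunks, number_of_characters))
    batch_size = 128
    num_batches = int(math.ceil(number_of_chunks / float(batch_size)))
    for i in range(num_batches):
        start, end = i * batch_size, (i + 1) * batch_size
        # The final batch may be shorter than batch_size; slicing handles that
        pred[start:end] = model.predict(X[start:end])
    return pred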

print("Getting predictions... this will take a few minutes.")
pretrained_pred = get_pred(pretrained_model, X)
print("Done getting predictions.")
print(pretrained_pred.shape)
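
If you are curious how the batching arithmetic works, here is a small sketch with a made-up dataset size. The value 300 is hypothetical, chosen only because it is not a multiple of 128.

```python
import math
import numpy as np

batch_size = 128
number_of_chunks = 300  # hypothetical dataset size, not a multiple of 128
data = np.arange(number_of_chunks)

# Rounding up ensures the leftover chunks get their own (shorter) batch
num_batches = int(math.ceil(number_of_chunks / float(batch_size)))
batch_lengths = [len(data[i * batch_size:(i + 1) * batch_size])
                 for i in range(num_batches)]
print(num_batches, batch_lengths)  # 3 [128, 128, 44]
```

Because slicing past the end of an array simply stops at the end, the last batch needs no special handling.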

Now that we have our predictions, we simply need to use them, along with *y*, to compute log(perplexity). You only need to change two lines to finish the function below. Hint: the first line involves computing the predicted probability of the correct character. You may find *np.dot* useful for this.

Note: in the case that the predicted probability of the correct character is exactly 0.0 due to underflow or some other issue, we skip this chunk to avoid log domain errors. This is obviously a hack, but we do it infrequently enough that our metric is still meaningful.
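
To see the log domain error this note refers to, the short sketch below calls `math.log` on 0.0. The flag name `raised` is only for illustration.

```python
import math

raised = False
try:
    math.log(0.0)
except ValueError:
    raised = True  # math.log rejects 0.0 instead of returning -inf
print(raised)  # True
```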

In [None]:
def get_log_perplexity(pred, y):
    number_of_chunks, number_of_characters = pred.shape
    total = 0.0
    for t in range(number_of_chunks):
        ### YOUR CODE HERE
        prob = None
        ### END CODE
        # If prob is 0.0, skip to avoid log domain errors
        if prob == 0.0:
            continue
        ### YOUR CODE HERE
        total += None
        ### END CODE
    return total / number_of_chunks

In [None]:
get_log_perplexity(pretrained_pred, y)

**Expected output:**

1.110400204748201
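
To interpret this number, you can exponentiate it to get back to perplexity. The sketch below assumes the log was the natural logarithm (as with `math.log`); if you used a different base, exponentiate with that base instead.

```python
import math

log_perplexity = 1.110400204748201  # the expected output above
perplexity = math.exp(log_perplexity)
print(perplexity)  # roughly 3.04
```

Under that assumption, the pretrained model is on average about as uncertain as a uniform guess among three characters.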

## Optional: Train your own model and evaluate it!

You can also train your own model with your own choice of architecture and evaluate it against our new metric! Go into the *resources* directory and edit *lstm_text_generation.py* to your liking. If you only want to change the architecture, you should only need to edit lines 93-97, but feel free to change anything else you like. Run `python lstm_text_generation.py` to train for up to 300 epochs, saving model files each epoch (this will take a really long time, so feel free to stop training at any time). The model files will be saved to the *resources/outputs* directory, which will be created if it does not already exist. Expect the model to take a few minutes per epoch and around 100 epochs to mostly converge. Once you have a trained model, insert your model path below and run the cells to evaluate it. See if you can do better than our pretrained model!

In [None]:
### YOUR MODEL PATH HERE
your_model = load_model('resources/outputs/lstm_epochXXX.h5')
### END MODEL PATH
your_model.summary()

In [None]:
your_pred = get_pred(your_model, X)
get_log_perplexity(your_pred, y)