### NLP Lab w/ Keras + Deep Learning

Welcome to tonight's lab!  We're going to rebuild what was covered in the previous class, but this time with you in the driver's seat.

The purpose is to better understand the details of building models with neural networks by recreating what was already covered.  Everything gets clearer with practice.

You can refer to class notes from the previous lab, but it's best to first try to recall the steps from memory.

**Step 1:** Import the necessary modules:  numpy, pandas, tensorflow and keras, and load in the dataset.  You can find it in the `data` folder in the `Unit4` directory.

In [1]:
# your answer here
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras

url = r"https://raw.githubusercontent.com/JonathanBechtel/dat-11-15/main/ClassMaterial/Unit4/data/IMDB.csv"
df  = pd.read_csv(url)

**Step 2:** Data Cleaning.  

Now go ahead and do some necessary data prep for our modeling.  Try and take the following steps:

 - Convert the `sentiment` column to 1 and 0
 - Remove the HTML line-break tags (`<br />`) from the reviews
 - Create training and test sets with an 80 / 20 split

In [2]:
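# your answer here
from sklearn.model_selection import train_test_split

# strip the HTML line breaks; replace with a space so adjacent words stay separate
df['review'] = df['review'].str.replace('<br />', ' ', regex = False)
# encode the target: positive -> 1, anything else -> 0
df['sentiment'] = np.where(df['sentiment'] == 'positive', 1, 0)
# hold out 20% of the reviews as a test set
X_train, X_test, y_train, y_test = train_test_split(df['review'], df['sentiment'], test_size = 0.2, random_state = 42)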
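
Line-break tags can show up in a few spellings (`<br>`, `<br/>`, `<br />`), so a regular expression is one way to catch them all.  The snippet below is a minimal sketch using only the standard library; the `strip_br` helper and the sample string are made up for illustration and are not part of the dataset.

```python
import re

# matches <br>, <br/>, <br /> in any letter case
BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)

def strip_br(text):
    """Replace line-break tags with a space, then collapse repeated whitespace."""
    without_tags = BR_TAG.sub(" ", text)
    return re.sub(r"\s+", " ", without_tags).strip()

# hypothetical review text, for illustration only
sample = "Loved it.<br /><br />Would watch again.<BR>Really."
print(strip_br(sample))  # Loved it. Would watch again. Really.
```

The same pattern should also work on a whole column with `df['review'].str.replace(r'<br\s*/?>', ' ', regex=True)`.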

**Step 3:** Language Processing

We'll now go ahead and convert our word corpus into numeric form that can be used for modeling.  We'll do so with the following steps:

 - Create a tokenizer with a vocabulary size of 10,000 words
 - Fit it on the training set only, then use it to transform both the training and test sets (the test set should be encoded with the vocabulary learned from the training set, not the other way around)
 - Arrange your samples so that all of them are 300 tokens (words) long

In [3]:
# your answer here

# tokenizing to develop word index
tokenizer = keras.preprocessing.text.Tokenizer(num_words = 10000)
tokenizer.fit_on_texts(X_train)
X_train   = tokenizer.texts_to_sequences(X_train)
X_test    = tokenizer.texts_to_sequences(X_test)

# pad (or truncate) so that every sequence is exactly 300 tokens long
X_train   = keras.preprocessing.sequence.pad_sequences(X_train, maxlen = 300)
X_test    = keras.preprocessing.sequence.pad_sequences(X_test, maxlen = 300)

**Knowledge check:** Can you make sense out of why your new values for `X_train` and `X_test` appear the way they do?  
  - Why are there 0's?
  - Can you take your numbers and reverse-engineer them back into their original words?

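There are 0's when a review has fewer than 300 tokens: `pad_sequences` inserts them (at the front, by default) to bring the sequence up to the required length.  Reviews longer than 300 tokens are truncated instead.

Each number is the index of a word.  `tokenizer.word_index` maps words to numbers, so to go from a number back to its word, look it up in the reverse mapping, `tokenizer.index_word`.  The value 0 is reserved for padding and has no word attached to it.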
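
To make both answers concrete, here is a small pure-Python imitation of what happens to one review.  It is only a sketch: `pad_pre` mimics the default behaviour of `pad_sequences` (pad and truncate at the front) rather than calling Keras, and the four-word `index_word` dictionary is a made-up stand-in for `tokenizer.index_word`.

```python
# hypothetical toy vocabulary; in the lab this role is played by tokenizer.index_word
index_word = {1: "the", 2: "movie", 3: "was", 4: "great"}

def pad_pre(seq, maxlen, value=0):
    """Keep the last `maxlen` tokens and pad on the left with `value`."""
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

def decode(seq, index_word):
    """Map indices back to words, skipping the padding value 0."""
    return " ".join(index_word.get(i, "?") for i in seq if i != 0)

padded = pad_pre([1, 2, 3, 4], maxlen=6)
print(padded)                      # [0, 0, 1, 2, 3, 4]
print(decode(padded, index_word))  # the movie was great
```

Note that the decoded text will not always match the original review exactly, because words outside the 10,000-word vocabulary are dropped when the text is converted to sequences.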

**Step 4:** Construct a sequential model

Now use the `Sequential` module in keras to build a network with the following layers:

- A word embedding with 64 weights for each of the 10,000 words in our vocabulary
- A layer to resize the embedding output back into two dimensions
- Two dense layers with 64 columns of weights each
- A dense layer at the end for the final prediction

**Note:** Use the appropriate activation functions where necessary!

In [4]:
# your answer here
mod = keras.models.Sequential([
      keras.layers.Embedding(10000, 64, input_length = 300),
      keras.layers.Flatten(),
      keras.layers.Dense(64, activation = 'relu'),
      keras.layers.Dense(64, activation = 'relu'),
      keras.layers.Dense(1, activation = 'sigmoid')
])
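
It can help to work out by hand how many weights each layer holds.  The arithmetic below is a sketch that assumes the layer sizes used in the cell above (10,000-word vocabulary, 64-dimensional embedding, 300 tokens per review); `mod.summary()` should report matching numbers.

```python
vocab_size, embed_dim, seq_len = 10000, 64, 300

embedding_params = vocab_size * embed_dim      # one 64-value vector per word
flat_width = seq_len * embed_dim               # Flatten: (300, 64) -> 19,200 values per review
dense_1_params = flat_width * 64 + 64          # weights + biases
dense_2_params = 64 * 64 + 64
output_params = 64 * 1 + 1

total = embedding_params + dense_1_params + dense_2_params + output_params
print(f"Flattened width: {flat_width:,}")      # Flattened width: 19,200
print(f"Total parameters: {total:,}")          # Total parameters: 1,873,089
```

Most of the weights sit in the embedding and in the first dense layer, which has to connect to every one of the 19,200 flattened values.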

**Knowledge check:**  Can you describe what your activation functions actually do?  Specifically:
 - Could you write a function in regular python that would recreate their behavior?
 - Can you explain why each one is useful for where it's being used?

In [5]:
# relu
def relu(X):
    # clip negative values to 0
    return np.maximum(0, X)

# sigmoid -- squashes values into the range (0, 1), i.e. a probability
def sigmoid(X):
    return 1 / (1 + np.exp(-X))

1). ReLU activations are typically used in the inner (hidden) layers of a neural network.  They are useful here because their gradient is cheap to calculate (1 for positive inputs, 0 otherwise) and does not shrink toward zero for large positive inputs, which helps networks train faster.  They also supply the non-linearity the network needs: without a non-linear activation, the stacked dense layers would collapse into a single linear transformation.

2).  Sigmoid functions are 'squashing' functions that map any real value to a float between 0 and 1, which can be read as a probability.  That makes them useful for the final layer of a binary classifier, where the output is interpreted as the probability of the positive class.
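
The point about gradients can be checked numerically.  The sketch below writes out the derivatives of both activations in NumPy; the helper names `relu_grad` and `sigmoid_grad` are introduced here for illustration and are not Keras functions.

```python
import numpy as np

def relu_grad(x):
    """Derivative of ReLU: 1 where the input is positive, 0 elsewhere."""
    return (x > 0).astype(float)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25."""
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([-10.0, -1.0, 0.5, 10.0])
print(relu_grad(x))     # stays at 1 for every positive input
print(sigmoid_grad(x))  # shrinks toward 0 at both extremes
```

The sigmoid's gradient is never larger than 0.25 and vanishes for large inputs, which is one reason it is usually kept for the output layer rather than stacked in hidden layers.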

**Step 5:** Compile your model. 

This part is a little bit different from what we've done previously.  For a neural network, you have to explicitly tell it what type of loss function to use and (optionally) what metrics to track during training.  

Take a moment and read through the keras documentation to find the one that best suits this purpose:  https://keras.io/api/losses/ and use it in the compilation step.  

Also specify that your model use the accuracy metric during training.  You can read more about available metrics here:  https://keras.io/api/metrics/

In [6]:
# your answer here
mod.compile(loss = 'binary_crossentropy', metrics = ['acc'])
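
To see what `binary_crossentropy` measures, it can be written out in a few lines of NumPy.  This is a sketch of the standard formula, not Keras's own implementation; the clipping constant `eps` is an assumption added here to avoid taking `log(0)`.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y * log(p) + (1 - y) * log(1 - p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(losses))

# same labels, opposite predictions
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(f"{confident_right:.4f}")  # 0.1054
print(f"{confident_wrong:.4f}")  # 2.3026
```

The loss is small when the predicted probability agrees with the label and grows quickly when the model is confidently wrong, which is the behaviour you want for a two-class problem with a sigmoid output.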

**Step 6:** Fit your model.

Use the following criteria:

 - 5 epochs (fitting rounds)
 - Use 20% of your training data for validation

In [None]:
# your answer here
mod.fit(X_train, y_train.values, epochs = 5, validation_split = 0.2)

Epoch 1/5
Epoch 2/5
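
One detail worth knowing about `validation_split`: Keras takes the validation samples from the end of the arrays you pass in, before any shuffling, rather than sampling them at random.  The sketch below imitates that slicing in plain Python; `split_for_validation` is a hypothetical helper and the ten-item list is toy data.

```python
def split_for_validation(samples, validation_split):
    """Hold out the last `validation_split` fraction of `samples`."""
    split_at = int(len(samples) * (1 - validation_split))
    return samples[:split_at], samples[split_at:]

toy_samples = list(range(10))  # stand-in for the rows of X_train
train_part, val_part = split_for_validation(toy_samples, validation_split=0.2)
print(train_part)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(val_part)    # [8, 9]
```

Because `train_test_split` already shuffled the reviews in Step 2, taking the tail here still gives a reasonable mix of positive and negative examples.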

**Step 7:** Make a prediction with your model on a single training sample to validate that it works.  Then evaluate the model on your test set and compare the result with your best validation score.

In [None]:
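# prediction for 1st sample
# predict returns an array of shape (1, 1), so pull out the scalar before formatting
pred = mod.predict(X_train[:1])
print(f"Probability of 1st sample being a positive review: {pred[0][0]:.2%}")

# test score: evaluate returns [loss, accuracy] for this model
test_loss, test_acc = mod.evaluate(X_test, y_test.values)
print(f"Test loss is: {test_loss:.4f}, test accuracy is: {test_acc:.2%}")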