### Import packages

In [None]:
# Import base packages
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Import DL packages
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import regularizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Import scikit-learn utilities
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Check library versions
print(f'TensorFlow version: {tf.__version__}')
print(f'Keras version: {tf.keras.__version__}')

### Read train/test datasets
Create a function that reads the train and test datasets and performs the following actions:
- Check readme.txt in the unzipped folder for more information about the datasets
- Columns should be renamed to 'rate' and 'text'
- Take a random sample of 5000 records from each of the training and test datasets
- Positive labels should be mapped to 0 (instead of 2 in the initial dataset)
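
A minimal pandas sketch of the steps above. It assumes (check this against readme.txt) that each file is a headerless CSV with two columns: the class index followed by the review text. The `n_samples` and `random_state` parameters are hypothetical additions for reproducibility and are not part of the exercise.

```python
import io

import pandas as pd


def read_format_dataset(dataset_path, n_samples=5000, random_state=42):
    """Sketch: read a headerless two-column CSV and format it.

    Assumed layout (verify against readme.txt): class index, review text.
    n_samples and random_state are illustrative parameters.
    """
    # header=None: the assumed files have no header row; names= sets the column names
    df = pd.read_csv(dataset_path, header=None, names=['rate', 'text'])
    # Random sample of n_samples rows (or every row if the file is smaller)
    df = df.sample(n=min(n_samples, len(df)), random_state=random_state)
    # Map positive labels (2 in the raw data) to 0; other labels are kept as-is
    df['rate'] = df['rate'].replace({2: 0})
    return df.reset_index(drop=True)


# Usage on a tiny in-memory CSV (illustrative data, not the real dataset)
sample_csv = io.StringIO('1,"slow and rude"\n2,"great food"\n2,"loved it"\n')
demo = read_format_dataset(sample_csv)
print(sorted(demo['rate'].tolist()))  # [0, 0, 1]
```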

In [None]:
def read_format_dataset(dataset_path):
    # TODO: read the file, rename the columns, sample 5000 rows and remap the labels
    return ...

train_dataset_path = 'path_to_train_data'
test_dataset_path = 'path_to_test_data'
train_data = read_format_dataset(train_dataset_path)
test_data = read_format_dataset(test_dataset_path)
train_data.head()

### Define your Keras model using Transfer Learning
Now define your neural network (NN) as a Keras `Sequential` model. Your base model comes from TensorFlow Hub: https://tfhub.dev/google/universal-sentence-encoder/4.
To include this base model in your network, use the `hub.KerasLayer` class (https://www.tensorflow.org/hub/api_docs/python/hub/KerasLayer) with the following parameters:
- input_shape = []
- dtype = tf.string
- trainable = False

Note the `trainable` option: it controls whether the base model's weights are updated during training (fine-tuning the entire NN) or kept frozen.

Now that you have your base model, add a new layer on top of it to predict the probability of our two classes (Positive/Negative). Which layer would you use for this? Which activation function?

Your final model should have two layers:
- base model with 256,797,824 parameters
- prediction layer with 513 parameters

Look at the number of trainable parameters and how it depends on the base model's `trainable` option.
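
The expected counts follow from simple arithmetic. Universal Sentence Encoder v4 outputs a 512-dimensional embedding, so a `Dense` prediction layer with a single unit has 512 weights plus 1 bias, i.e. 513 parameters. The plain-Python sketch below reproduces these numbers and shows how the trainable count should change with `trainable`; the helper names are illustrative only.

```python
USE_EMBEDDING_DIM = 512      # size of the Universal Sentence Encoder v4 output
BASE_PARAMS = 256_797_824    # base model size quoted above


def dense_params(input_dim, units):
    """Fully connected layer: one weight per (input, unit) pair plus one bias per unit."""
    return input_dim * units + units


head_params = dense_params(USE_EMBEDDING_DIM, 1)
total_params = BASE_PARAMS + head_params


def trainable_params(base_trainable):
    """Expected trainable count for a given hub.KerasLayer(trainable=...) setting."""
    # A frozen base contributes only non-trainable parameters
    return total_params if base_trainable else head_params


print(head_params)               # 513
print(total_params)              # 256798337
print(trainable_params(False))   # 513
print(trainable_params(True))    # 256798337
```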

In [None]:
model = ...
model.summary()

### Model compiler
Compile your Keras model using the Adam optimizer, binary cross-entropy as the loss, and accuracy as the metric.
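
To make the loss and metric concrete, here is a NumPy sketch of what they compute for a single probability output: binary cross-entropy averaged over samples, and binary accuracy with a 0.5 threshold (for a one-unit output, Keras interprets the `'accuracy'` string as binary accuracy). The clipping constant mirrors Keras's default epsilon of 1e-7; this is an illustration, not Keras's own implementation.

```python
import numpy as np


def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over samples."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 to avoid log(0)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


def binary_accuracy(y_true, p_pred, threshold=0.5):
    """Share of samples whose thresholded prediction equals the label."""
    y_hat = (np.asarray(p_pred) > threshold).astype(int)
    return float(np.mean(y_hat == np.asarray(y_true)))


y_true = [1, 0, 1, 0]
p_pred = [0.9, 0.2, 0.6, 0.7]
print(round(binary_crossentropy(y_true, p_pred), 4))  # 0.5108
print(binary_accuracy(y_true, p_pred))                # 0.75
```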

In [None]:
model.compile(...)

### Model training
Split your training data into x_train, x_valid, y_train, y_valid using scikit-learn's `train_test_split` function. Set the test size to 0.3 and make sure the distribution of the target variable is similar in the training and validation samples.
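
A sketch of a stratified split on toy data: passing the labels to the `stratify` argument of `train_test_split` keeps the class proportions (here 70/30) identical in both samples. The arrays are made-up stand-ins for the text and label columns, and `random_state` is an optional addition for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the text and label columns: 70 samples of class 1, 30 of class 0
texts = np.array([f'review {i}' for i in range(100)])
labels = np.array([1] * 70 + [0] * 30)

x_train, x_valid, y_train, y_valid = train_test_split(
    texts, labels, test_size=0.3, stratify=labels, random_state=42
)

print(len(x_train), len(x_valid))      # 70 30
print(y_train.mean(), y_valid.mean())  # both 0.7: class balance preserved
```

With the notebook's DataFrame, the same call would take `train_data['text']` and `train_data['rate']` as inputs, with `stratify=train_data['rate']`.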

In [None]:
x_train, x_valid, y_train, y_valid = ...

Now train your NN on the training set, passing the validation set as well (for example through the `validation_data` argument of `model.fit`). Set the number of epochs to 5 for now. Store the output of `model.fit` in a variable named `history` so the loss can be plotted later.

In [None]:
history = model.fit(...)

### Train/validation error history
You can plot the training/validation loss and accuracy using the following function.

In [None]:
def plot_loss_acc(history):
    """Plot training and (optionally) validation loss and accuracy"""

    loss = history.history['loss']
    epochs = range(1, len(loss) + 1)

    plt.figure(figsize=(10, 10))

    plt.subplot(2, 1, 1)
    plt.plot(epochs, loss, '.--', label='Training loss')
    final_loss = loss[-1]
    title = 'Training loss: {:.4f}'.format(final_loss)
    plt.ylabel('Loss')
    if 'val_loss' in history.history:
        val_loss = history.history['val_loss']
        plt.plot(epochs, val_loss, 'o-', label='Validation loss')
        final_val_loss = val_loss[-1]
        title += ', Validation loss: {:.4f}'.format(final_val_loss)
    plt.title(title)
    plt.legend()

    acc = history.history['accuracy']

    plt.subplot(2, 1, 2)
    plt.plot(epochs, acc, '.--', label='Training acc')
    final_acc = acc[-1]
    title = 'Training accuracy: {:.2f}%'.format(final_acc * 100)
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    if 'val_accuracy' in history.history:
        val_acc = history.history['val_accuracy']
        plt.plot(epochs, val_acc, 'o-', label='Validation acc')
        final_val_acc = val_acc[-1]
        title += ', Validation accuracy: {:.2f}%'.format(final_val_acc * 100)
    plt.title(title)
    plt.legend()

plot_loss_acc(history)

### Performance on test dataset
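Compute the accuracy on the test dataset. Since the model predicts probabilities, convert them to class labels before calling `accuracy_score`.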
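
With a single-unit prediction layer, `model.predict` returns one probability per sample with shape `(n_samples, 1)`, while `accuracy_score` expects class labels. A sketch of the conversion, using made-up probabilities in place of the model output and assuming a 0.5 threshold:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up stand-in for model.predict(...) output: one probability per sample, shape (n, 1)
probs = np.array([[0.92], [0.15], [0.55], [0.40]])
y_test = np.array([1, 0, 0, 0])

# Flatten to shape (n,) and threshold at 0.5 to get 0/1 class labels
y_pred = (probs.ravel() > 0.5).astype(int)
acc = accuracy_score(y_test, y_pred)
print(y_pred, acc)  # [1 0 1 0] 0.75
```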

In [None]:
y_pred = ...
acc = ...
print(acc)

### 1st interpretation
Apart from the final accuracy, what do you think about the training/validation curves? Is there any evidence of overfitting when we freeze the base layer?

Now go back to the model definition and unfreeze the base layer (`trainable=True`). The number of trainable parameters should change accordingly. Do not change any other parameter. Training your model will take noticeably longer.

### 2nd interpretation
What do you notice now in the training/validation curves? Do you have any idea why we observe such results?

You need to find a way to solve this problem. Go back to the model definition and try adding a new type of layer.
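
The notebook already imports `Dropout` and `regularizers`, which hints at possible directions. To illustrate what a dropout layer does, here is a NumPy sketch of inverted dropout, the scheme Keras's `Dropout` layer follows: during training each unit is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)` so the expected activation is unchanged; at inference the input passes through untouched. The function name is illustrative and this is not Keras code.

```python
import numpy as np


def inverted_dropout(x, rate, training, rng):
    """Zero each entry with probability `rate` during training and rescale the rest."""
    if not training or rate == 0.0:
        # Inference (or rate 0): dropout is the identity
        return x
    keep = rng.random(x.shape) >= rate  # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
activations = np.ones(100_000)

train_out = inverted_dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = inverted_dropout(activations, rate=0.5, training=False, rng=rng)

print(np.unique(train_out))                    # [0. 2.]
print(round(train_out.mean(), 2))              # close to 1.0
print(np.array_equal(infer_out, activations))  # True
```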