<a href="https://colab.research.google.com/github/royn5618/Talks_Resources/blob/main/PyConPortugal2022/Improved_Keras_Classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**About:**

This notebook has an improved implementation of an NLP classifier that predicts emotions, using Keras Tuner together with other data- and text-based approaches for model improvement.

**Data Source on Kaggle:**

https://www.kaggle.com/praveengovi/emotions-dataset-for-nlp

**Data Source on HuggingFace:**

https://huggingface.co/datasets/emotion

**Recap Top 5 Techniques to Improve Model Performance**

 - Add more data, which gives an ML model more examples to learn and generalize from.
 - Engineer features to extract information that helps the model find patterns efficiently, and select features to address the garbage-in, garbage-out (GIGO) problem. Feature selection lets the model work with only a few useful features, removes noise, and saves computation time and resources.
 - Try multiple algorithms to find the one best suited to the prediction task.
 - Use cross-validation for a robust, well-generalized model. Cross-validation trains and tests the model on multiple chunks of the dataset, so you can average the scores and judge whether the model is performing at its best.
 - Tune the hyperparameters to find the combination best suited to the dataset, since they strongly influence the outcome of the training process.

# Data import

In [None]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")

In [None]:
train_data = pd.read_csv('Data/train.txt', sep=';', names=['text', 'emotion'])
train_data.head()

In [None]:
test_data = pd.read_csv('Data/test.txt', sep=';', names=['text', 'emotion'])
test_data.head()

In [None]:
val_data = pd.read_csv('Data/val.txt', sep=';', names=['text', 'emotion'])
val_data.head()

### Note 

I use the train set to train the model and the validation set to validate the training process. Previously, I held out 33% of the train data for validation, so I now have more data for both training and validation.

# Data preparation
## Label Encoding

In [None]:
train_data["emotion"] = train_data["emotion"].astype('category')
train_data["emotion_label"] = train_data["emotion"].cat.codes
train_data.head()

In [None]:
test_data["emotion"] = test_data["emotion"].astype('category')
test_data["emotion_label"] = test_data["emotion"].cat.codes
test_data.head()

In [None]:
val_data["emotion"] = val_data["emotion"].astype('category')
val_data["emotion_label"] = val_data["emotion"].cat.codes
val_data.head()
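The cells above encode each split separately, which yields matching codes only when every split contains the same set of emotions. A minimal sketch of a shared mapping, built once from the training labels and reused for the other splits, is shown below. The helper names and the toy labels are illustrative assumptions, not part of the notebook.

```python
def build_label_index(labels):
    """Map each distinct label to an integer code, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


def encode_labels(labels, label_index):
    """Encode labels with a mapping built elsewhere (e.g. on the train split)."""
    return [label_index[label] for label in labels]


# Toy splits (hypothetical); the validation split is missing "anger".
toy_train = ["joy", "anger", "sadness", "joy"]
toy_val = ["sadness", "joy"]

label_index = build_label_index(toy_train)
train_codes = encode_labels(toy_train, label_index)
val_codes = encode_labels(toy_val, label_index)

# Encoding the validation split on its own shifts the codes.
val_codes_independent = encode_labels(toy_val, build_label_index(toy_val))
```

In pandas, passing the training categories explicitly, for example through `pd.Categorical(..., categories=...)`, serves the same purpose.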

## One Hot Encoding

In [None]:
import tensorflow as tf

In [None]:
train_features, train_labels = train_data['text'], tf.one_hot(train_data["emotion_label"], 6)
test_features, test_labels = test_data['text'], tf.one_hot(test_data["emotion_label"], 6)
val_features, val_labels = val_data['text'], tf.one_hot(val_data["emotion_label"], 6)

In [None]:
train_features[:5]

In [None]:
train_labels[:5]

## Decoder

In [None]:
def get_labels_from_oh_code(oh_code):
    """Take a one-hot encoded matrix and return a list of decoded categories."""
    # The position of the 1 in each row is the category code.
    label_code = np.argmax(oh_code, axis=1)
    # Category codes index directly into the ordered category names.
    label = test_data.emotion.cat.categories[label_code]
    return list(label)

In [None]:
# Test the decoder on the first five training labels
test = np.array(train_labels[:5])
get_labels_from_oh_code(test)
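To make the encode/decode round trip explicit without TensorFlow, here is a small plain-Python sketch. The category list is assumed to be the six emotions in alphabetical order, and the function names are illustrative.

```python
# Assumed category order: the six emotions sorted alphabetically.
CATEGORIES = ["anger", "fear", "joy", "love", "sadness", "surprise"]


def one_hot(code, depth):
    """Return a list of length `depth` with a 1.0 at position `code`."""
    return [1.0 if i == code else 0.0 for i in range(depth)]


def decode(rows, categories):
    """Map each one-hot row back to its category name via the largest entry."""
    decoded = []
    for row in rows:
        best = max(range(len(row)), key=lambda i: row[i])
        decoded.append(categories[best])
    return decoded


codes = [4, 0, 3]
encoded = [one_hot(code, len(CATEGORIES)) for code in codes]
labels = decode(encoded, CATEGORIES)
```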

## Text Preprocessing

### Note 

I will remove stopwords and stem the tokens. Stemming is a crude heuristic process that reduces variations of the same word or concept to a common root form. A stemmed token is not always a valid dictionary word.

Example: 
```
Important -> import
Imported -> import
bravery -> braveri
```
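To show what "crude heuristic" means in practice, here is a toy suffix-stripping sketch. It is deliberately not the Porter algorithm used in this notebook: the three rules and the tiny stopword set are assumptions chosen only to reproduce the examples above.

```python
# Toy stopword set (hypothetical); the notebook uses NLTK's English list.
TOY_STOPWORDS = {"i", "the", "a", "was", "and", "of"}

# (suffix, replacement) pairs tried in order; the first match wins.
TOY_SUFFIX_RULES = [("ant", ""), ("ed", ""), ("y", "i")]


def toy_stem(word):
    """Lowercase the word and apply the first matching suffix rule."""
    word = word.lower()
    for suffix, replacement in TOY_SUFFIX_RULES:
        # Require a stem of at least three characters before stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def toy_preprocess(text):
    """Drop stopwords, stem the remaining tokens, and rejoin them."""
    tokens = text.lower().split()
    return " ".join(toy_stem(t) for t in tokens if t not in TOY_STOPWORDS)


stems = [toy_stem(w) for w in ["Important", "Imported", "bravery"]]
cleaned = toy_preprocess("I imported the bravery of a friend")
```

The output `braveri` is not a dictionary word, which is the trade-off stemming accepts in exchange for collapsing word variants.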

In [None]:
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data required by word_tokenize in newer NLTK releases
nltk.download('stopwords')
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

STOPWORDS = set(stopwords.words('english'))  # set membership checks are faster than list lookups
PORTER_STEMMER = PorterStemmer()

In [None]:
def preprocess_text(text):
    filtered_text = []
    for each_word in word_tokenize(text):
        if each_word not in STOPWORDS:
            filtered_text.append(PORTER_STEMMER.stem(each_word))
    return " ".join(filtered_text)

In [None]:
train_data['text'] = train_data.text.apply(preprocess_text)
test_data['text'] = test_data.text.apply(preprocess_text)
val_data['text'] = val_data.text.apply(preprocess_text)

In [None]:
import tensorflow.keras as keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [None]:
vocab_size = 10000
max_seq_len = 20

tokenizer = Tokenizer(oov_token = "<OOV>", num_words=vocab_size, lower=True)
tokenizer.fit_on_texts(train_data['text'])

sequences_train = tokenizer.texts_to_sequences(train_data['text'])
sequences_test = tokenizer.texts_to_sequences(test_data['text'])
sequences_val = tokenizer.texts_to_sequences(val_data['text'])

padded_train = pad_sequences(sequences_train, padding = 'post', maxlen=max_seq_len)
padded_test = pad_sequences(sequences_test, padding = 'post', maxlen=max_seq_len)
padded_val = pad_sequences(sequences_val, padding = 'post', maxlen=max_seq_len)
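The tokenizing and padding steps can be sketched in plain Python to show what the resulting integers mean. This is a simplified approximation under stated assumptions (whitespace splitting, no punctuation filtering), not the Keras implementation. It mirrors three Keras conventions: index 0 is reserved for padding, index 1 for the OOV token, and `pad_sequences` by default truncates over-long sequences from the front. The function names are illustrative.

```python
from collections import Counter


def fit_word_index(texts, oov_token="<OOV>"):
    """Index words by descending frequency; 0 is padding, 1 is the OOV token."""
    counts = Counter(word for text in texts for word in text.lower().split())
    word_index = {oov_token: 1}
    for rank, (word, _) in enumerate(counts.most_common(), start=2):
        word_index[word] = rank
    return word_index


def to_sequence(text, word_index, num_words):
    """Replace each word with its index; unknown or rare words become OOV (1)."""
    oov = 1
    sequence = []
    for word in text.lower().split():
        index = word_index.get(word, oov)
        sequence.append(index if index < num_words else oov)
    return sequence


def pad_post(sequence, maxlen):
    """Zero-pad at the end; over-long sequences keep their last `maxlen` items."""
    sequence = sequence[-maxlen:]
    return sequence + [0] * (maxlen - len(sequence))


word_index = fit_word_index(["feel happi feel", "feel sad"])
sequence = to_sequence("feel angri sad", word_index, num_words=10)
padded = pad_post(sequence, maxlen=5)
truncated = pad_post([1, 2, 3, 4, 5, 6], maxlen=4)
```

One consequence worth noting: with `max_seq_len = 20`, any text longer than 20 tokens loses its first tokens, not its last ones.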

# Model Designing

### Note

I will use hyperparameter tuning to select the best layer configuration for my ANN.

**Hyperparameter Tuning**: choosing a set of optimal hyperparameters for a learning algorithm. Common search strategies are:

- Random
- Grid Search
- Bayesian
- Early-stopping based



## Build a Hyperband Model

### Hyperband Tuner

An extension of the Successive Halving Algorithm (SHA) for **adaptive resource allocation** with **early stopping**.

**Successive Halving Algorithm:**
1. Allocate a small, equal share of the resources/time to every hyperparameter set and train each of them with it.

2. The top-performing half of the hyperparameter sets is then "progressed" to the next stage, where the resulting models are trained with more resources/time.

3. Repeat until only one configuration remains.
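The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the budget doubles each round and using a made-up scoring function in place of real model training; none of the names come from Keras Tuner.

```python
def successive_halving(configs, evaluate, min_budget=1):
    """Keep the better half of the configurations each round, doubling the budget."""
    survivors = list(configs)
    budget = min_budget
    history = []
    while len(survivors) > 1:
        # Score every surviving configuration with the current budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = scored[: max(1, len(scored) // 2)]
        history.append((budget, list(survivors)))
        budget *= 2
    return survivors[0], history


def toy_score(learning_rate, budget):
    """Hypothetical objective: grows with budget and peaks at learning rate 1e-3."""
    return budget / (1.0 + abs(learning_rate - 1e-3) * 1000)


candidates = [1e-1, 1e-2, 1e-3, 1e-4]
best, rounds = successive_halving(candidates, toy_score)
```

Four candidates need two rounds: the first keeps two of them at budget 1, the second keeps one at budget 2.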


**Hyperband Tuner** uses **η**, the **rate of elimination**: only 1/η of the hyperparameter sets progress to the next round of training and evaluation. In Keras Tuner, η is set through the `factor` argument, and the Keras Tuner tutorial describes the number of models trained in a bracket as ```1 + log_factor(max_epochs)```, rounded up to the nearest integer.
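As a quick check of the expression `1 + log_factor(max_epochs)` for the values used later in this notebook (`factor=3`, `max_epochs=20`), here is a small sketch. The function name is an illustrative assumption.

```python
import math


def hyperband_bracket_term(max_epochs, factor):
    """Return 1 + log_factor(max_epochs), unrounded."""
    return 1 + math.log(max_epochs) / math.log(factor)


value = hyperband_bracket_term(max_epochs=20, factor=3)
rounded_nearest = round(value)
rounded_up = math.ceil(value)
```

For these settings the unrounded value is about 3.73, so rounding to the nearest integer and rounding up both give 4.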

### Hyperparameters Chosen:
```
vector_size — Integer| range 100 to 500, step size: 100
dropout_rate — Float | range 0.6 to 0.9, step size: 0.1
lstm_units1 — Integer| range 32 to 512, step size: 32
lstm_units2 — Integer| range 16 to 512, step size: 32
learning_rate — Choice| 1e-2, 1e-3, 1e-4
```
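To get a feel for the size of this search space, the sketch below enumerates the values of each hyperparameter and multiplies the counts. It assumes the upper bounds are inclusive and that values advance from the minimum in fixed steps; the helper name is illustrative.

```python
def int_range(min_value, max_value, step):
    """Values from min_value up to and including max_value, in fixed steps."""
    return list(range(min_value, max_value + 1, step))


search_space = {
    "vector_size": int_range(100, 500, 100),
    # Dropout rates are built from integer tenths to avoid float drift.
    "dropout_rate": [tenths / 10 for tenths in range(6, 10)],
    "lstm_units1": int_range(32, 512, 32),
    "lstm_units2": int_range(16, 512, 32),
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

sizes = {name: len(values) for name, values in search_space.items()}

total_combinations = 1
for size in sizes.values():
    total_combinations *= size
```

Under these assumptions there are 15,360 combinations, which is why an exhaustive grid search is impractical here and an early-stopping-based tuner is attractive.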



In [None]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Embedding, Dropout, LSTM

In [None]:
!pip install -q -U keras-tuner

In [None]:
import keras_tuner as kt

In [None]:
def model_builder(hp):
    model = Sequential()
    # The embedding dimension is tuned, so pass the sampled value to the layer.
    hp_vector_size = hp.Int('vector_size', min_value=100, max_value=500, step=100)
    model.add(
        Embedding(input_dim=vocab_size,
                  output_dim=hp_vector_size,
                  input_length=max_seq_len))
    hp_dropout_rate = hp.Float('dropout_rate', min_value=0.6, max_value=0.9, step=0.1)
    model.add(Dropout(hp_dropout_rate))
    hp_lstm_units1 = hp.Int('lstm_units1', min_value=32, max_value=512, step=32)
    model.add(LSTM(hp_lstm_units1,return_sequences=True))
    hp_lstm_units2 = hp.Int('lstm_units2', min_value=16, max_value=512, step=32)
    model.add(LSTM(hp_lstm_units2))
    model.add(Dense(6,activation='softmax'))
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
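    model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss='categorical_crossentropy',
                  # Explicit names keep the logged keys stable ("val_recall") across trials.
                  metrics=[tf.keras.metrics.Recall(name='recall'),
                           tf.keras.metrics.Precision(name='precision')])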
    return model

In [None]:
tuner = kt.Hyperband(model_builder,
                     objective=kt.Objective("val_recall", direction="max"),
                     max_epochs=20,
                     factor=3,
                     directory="model_trials_1",
                     project_name="emotion_detector_1"
                     )

In [None]:
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                              mode='min', 
                                              patience=5)

In [None]:
tuner.search(padded_train, 
             train_labels, 
             epochs=20, 
             validation_data=(padded_val, val_labels), 
             callbacks=[stop_early]
             )

In [None]:
# Get the optimal hyperparameters
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]

In [None]:
best_hps.get('vector_size')

In [None]:
best_hps.get('dropout_rate')

## Build Model with Best HyperParameters

In [None]:
# Build the model with the optimal hyperparameters; it is trained below for up to 20 epochs
model_best_hp = tuner.hypermodel.build(best_hps)

In [None]:
model_best_hp.summary()

### Set Callbacks

In [None]:
callbacks = [
    keras.callbacks.EarlyStopping(monitor='val_loss',
                                  mode='min',
                                  patience=5,
                                  verbose=1,
                                  restore_best_weights=True)  
]
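The patience rule behind this callback can be sketched in plain Python: training stops once the monitored loss has gone `patience` consecutive epochs without reaching a new minimum, and the best epoch is the one whose weights would be restored. The function name and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs pass without a new minimum.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch.
    return len(val_losses), best_epoch


toy_val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.66, 0.64, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(toy_val_losses, patience=5)
```

Here the minimum is reached at epoch 3 and training stops at epoch 8, so five epochs of compute are spent confirming that the loss has stopped improving.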

## Train Model

In [None]:
history = model_best_hp.fit(padded_train, 
                            train_labels, 
                            epochs=20,
                            callbacks=callbacks,
                            validation_data=(padded_val, val_labels))

In [None]:
model_best_hp.metrics

## Visualize and verify the Loss per epoch

In [None]:
from plotly.graph_objs import *

In [None]:
metric_to_plot = "loss"
epochs = list(range(1, max(history.epoch) + 2))
training_loss = history.history[metric_to_plot]
validation_loss = history.history["val_" + metric_to_plot]

trace1 = {
    "mode": "lines+markers",
    "name": "Training Loss",
    "type": "scatter",
    "x": epochs,
    "y": training_loss
}

trace2 = {
    "mode": "lines+markers",
    "name": "Validation Loss",
    "type": "scatter",
    "x": epochs,
    "y": validation_loss
}

data = [trace1, trace2]  # a plain list of traces; the legacy Data class is deprecated
layout = {
    "title": "Training - Validation Loss",
    "xaxis": {
        # Axis title font is set under title.font; "titlefont" is deprecated.
        "title": {
            "text": "Number of epochs",
            "font": {
                "size": 18,
                "color": "#7f7f7f"
            }
        }
    },
    "yaxis": {
        "title": {
            "text": "Loss",
            "font": {
                "size": 18,
                "color": "#7f7f7f"
            }
        }
    }
}
fig = Figure(data=data, layout=layout)
fig.update_layout(hovermode="x unified")
fig.show()

# Model Evaluation

In [None]:
from sklearn.metrics import classification_report

In [None]:
y_pred_probs = model_best_hp.predict(padded_train)
y_pred_probs

In [None]:
# Take the argmax of the probabilities directly. Thresholding at 0.5 first
# would map every row with no class above 0.5 to class 0.
y_pred = np.argmax(y_pred_probs, axis=1)
print(classification_report(train_data['emotion_label'], y_pred))

In [None]:
# Model Evaluation on Test Data
# Use the most probable class per row instead of thresholding at 0.5 first.
y_pred = np.argmax(model_best_hp.predict(padded_test), axis=1)
print(classification_report(test_data['emotion_label'], y_pred))
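When turning softmax outputs into class predictions, binarizing at 0.5 before taking the argmax is not equivalent to taking the argmax of the probabilities. A row with no probability above 0.5 becomes all zeros, and the argmax of an all-zero row falls back to class 0. A minimal sketch with made-up probabilities and illustrative function names:

```python
def argmax(row):
    """Index of the largest entry; ties resolve to the first index."""
    return max(range(len(row)), key=lambda i: row[i])


def predict_with_threshold(probabilities, threshold=0.5):
    """Binarize each row at the threshold, then take the argmax."""
    binarized = [[1 if p > threshold else 0 for p in row] for row in probabilities]
    return [argmax(row) for row in binarized]


def predict_with_argmax(probabilities):
    """Take the argmax of the raw probabilities."""
    return [argmax(row) for row in probabilities]


toy_probabilities = [
    [0.05, 0.05, 0.80, 0.04, 0.03, 0.03],  # confident prediction
    [0.10, 0.15, 0.20, 0.05, 0.45, 0.05],  # no class above 0.5
]

thresholded = predict_with_threshold(toy_probabilities)
direct = predict_with_argmax(toy_probabilities)
```

The two approaches agree on the confident row but differ on the uncertain one, where thresholding reports class 0 even though class 4 is the most probable.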

### Note

 - Cross-validation for deep learning models is computationally expensive and time-consuming. You can use K-fold or stratified K-fold (for imbalanced data), train the model on each fold, and then compare the scores.
 - Try other RNN layers instead of LSTM, as well as other architecture designs, and compare their performance. Some experiments are here: https://github.com/royn5618/Medium_Blog_Codes/tree/master/Emotion%20Detection
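As a rough illustration of stratified folds, the sketch below deals the indices of each label round-robin into folds so that every fold keeps a similar label mix. It is a hand-rolled simplification without shuffling, and the function names and toy labels are assumptions. In practice a library implementation such as scikit-learn's `StratifiedKFold` would be the usual choice.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Assign sample indices to folds so each fold has a similar label mix."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_label.values():
        # Deal the indices of each label round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def train_validation_splits(labels, n_folds):
    """Yield (train_indices, validation_indices), holding out one fold at a time."""
    folds = stratified_folds(labels, n_folds)
    for held_out in range(n_folds):
        validation = sorted(folds[held_out])
        train = sorted(i for k, fold in enumerate(folds) if k != held_out for i in fold)
        yield train, validation


toy_labels = ["joy"] * 6 + ["anger"] * 3
splits = list(train_validation_splits(toy_labels, n_folds=3))
```

Each split would then drive one full training run, which is where the computational cost mentioned above comes from.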