# Model Comparison: First Attempt

This notebook evaluates three ML models on an unseen, unlabeled dataset:
- NBC
- RNN with Tokenizer
- RNN with TextVectorization

Because the dataset has no labels, performance is assessed by manually checking each predicted sentiment (human evaluation).

## Setup

In [1]:
import numpy as np
import pandas as pd

from tqdm import tqdm
tqdm.pandas()

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import load_model

import joblib

import sys
sys.path.append('../scripts')  # add the 'scripts' directory to sys.path
from word_normalization import preprocess_text  # for customized preprocessing

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\ASUS\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


## Loading Unseen Dataset

In [2]:
dataset = pd.read_csv('../datasets/a2_RestaurantReviews_FreshDump.tsv', delimiter = '\t', quoting = 3)
dataset

Unnamed: 0,Review
0,Spend your money elsewhere.
1,Their regular toasted bread was equally satisf...
2,The Buffet at Bellagio was far from what I ant...
3,"And the drinks are WEAK, people!"
4,-My order was not correct.
...,...
95,I think food should have flavor and texture an...
96,Appetite instantly gone.
97,Overall I was not impressed and would not go b...
98,"The whole experience was underwhelming, and I ..."


In [3]:
dataset = dataset.dropna()  # drop rows with missing values
dataset['Review'] = dataset['Review'].progress_apply(preprocess_text)  # normalize each review with the custom preprocessing

100%|███████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 163.47it/s]


In [4]:
raw_X = dataset.Review
raw_X

0                                  spend money elsewher
1     regular toast bread equal satisfi occasion pat...
2                           buffet bellagio far anticip
3                                      drink weak peopl
4                                     order not correct
                            ...                        
95                        think food flavor textur lack
96                                 appetit instantli go
97                 overal not impress would not go back
98    whole experi underwhelm think well go ninja su...
99    have not wast enough life pour salt wound draw...
Name: Review, Length: 100, dtype: object

## Loading Models

There are two ways to use a trained model with unseen data:

1. **Modular**: Import the component that performs the vectorization, apply it to the unseen data, then import the model and use it on the vectorized unseen data.
    ```python
    count_vectorizer = joblib.load("<vectorizer only>")  # placeholder path
    nbc = joblib.load("<model only>")  # placeholder path
    ```

2. **All-in-one**: Import the object that already includes both the model and the vectorizer.
    ```python
    nbc = joblib.load("<GridCV that includes Pipeline that includes model and vectorizer>")  # placeholder path
    ```
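The difference between the two approaches can be illustrated with a minimal sketch. It assumes scikit-learn's `CountVectorizer`, `MultinomialNB`, and `Pipeline`; the toy texts and labels are invented for illustration and are not taken from the project's datasets, and the saving and loading steps with `joblib` are omitted.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data, purely illustrative (not from the project datasets).
train_texts = ["good food", "great service", "bad food", "terrible service"]
train_labels = [1, 1, 0, 0]
unseen_texts = ["good service", "bad service"]

# Modular: the vectorizer and the model are separate objects,
# so unseen text must be vectorized explicitly before predicting.
count_vectorizer = CountVectorizer()
modular_nbc = MultinomialNB()
modular_nbc.fit(count_vectorizer.fit_transform(train_texts), train_labels)
modular_pred = modular_nbc.predict(count_vectorizer.transform(unseen_texts))

# All-in-one: a Pipeline bundles both steps, so raw text goes straight in.
pipeline_nbc = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("nbc", MultinomialNB()),
])
pipeline_nbc.fit(train_texts, train_labels)
pipeline_pred = pipeline_nbc.predict(unseen_texts)

print(modular_pred, pipeline_pred)
```

Both routes give the same predictions here. The all-in-one form reduces the risk of pairing a model with the wrong vectorizer, at the cost of a larger saved object.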

In [5]:
# model 1: NBC
nbc = joblib.load('../models/sentiment_analysis_nbc_model.joblib')

# model 2: RNN with Tokenizer (the fitted tokenizer is loaded separately)
tokenizer = joblib.load("../text_transformers/tokenizer.pkl")
rnn_tokenizer = load_model("../models/sentiment_analysis_rnn_tokenizer_model.keras")

# model 3: RNN with TextVectorization (accepts raw text directly)
rnn_text_vectorization = load_model("../models/sentiment_analysis_rnn_textvectorization_model.keras")

## Prediction

In [6]:
y_pred = nbc.predict(raw_X)
print(y_pred)

[1 1 1 0 0 0 0 1 0 1 0 0 0 0 0 0 2 1 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0
 0 1 0 1 0 0 0 0 0 1 0 0 2 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 1 1 0 0 0 0 0 2 1 0 1 0 0 0 0 0 1 0 1 0 1 0 0 1 0]


In [7]:
dataset['nbc_prediction'] = y_pred.tolist()

In [8]:
X_sequences = tokenizer.texts_to_sequences(raw_X)
X = pad_sequences(X_sequences, maxlen=80, padding='post')

prediction_probabilities = rnn_tokenizer.predict(X)
y_pred = np.argmax(prediction_probabilities, axis=1)  # index of the highest-probability class in each row (axis=1)
print(y_pred)

4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 113ms/step
[1 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 1 1 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 1
 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 1 1 0 0 0 0 0
 1 0 1 0 0 1 1 0 0 1 0 1 1 1 0 1 1 1 1 1 0 1 0 0 1 0]


In [9]:
dataset['rnn_tokenizer_prediction'] = y_pred.tolist()
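The `np.argmax(..., axis=1)` step used above can be seen on a small example. The probabilities below are hypothetical values, not outputs of the trained RNN; the sketch also keeps the winning probability, which may help prioritize low-confidence rows during human evaluation.

```python
import numpy as np

# Hypothetical class probabilities for three reviews over three classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
])

# Column index of the largest value in each row, i.e. the predicted class.
labels = np.argmax(probs, axis=1)

# The largest value in each row, i.e. the probability of the predicted class.
confidence = probs.max(axis=1)

print(labels)
print(confidence)
```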

In [10]:
prediction_probabilities = rnn_text_vectorization.predict(tf.convert_to_tensor(raw_X)) # convert to a tensor
y_pred = np.argmax(prediction_probabilities, axis=1)
print(y_pred)

4/4 ━━━━━━━━━━━━━━━━━━━━ 1s 126ms/step
[1 1 1 1 0 0 1 1 0 1 0 2 1 0 0 0 1 1 1 1 0 2 1 1 1 0 1 1 0 2 0 1 1 0 0 2 1
 0 1 0 2 0 1 0 1 1 0 2 1 1 0 0 0 2 0 1 1 2 0 1 1 0 1 0 1 0 0 1 1 0 0 0 2 0
 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 1 0 1 0]


In [11]:
dataset['rnn_textvectorizer_prediction'] = y_pred.tolist()

In [12]:
dataset

Unnamed: 0,Review,nbc_prediction,rnn_tokenizer_prediction,rnn_textvectorizer_prediction
0,spend money elsewher,1,1,1
1,regular toast bread equal satisfi occasion pat...,1,1,1
2,buffet bellagio far anticip,1,0,1
3,drink weak peopl,0,0,1
4,order not correct,0,0,0
...,...,...,...,...
95,think food flavor textur lack,1,1,1
96,appetit instantli go,0,0,1
97,overal not impress would not go back,0,0,0
98,whole experi underwhelm think well go ninja su...,1,1,1


## Insight

The following facts were already known before hyperparameter tuning:
- Each model was trained on 3 classes from the chosen training dataset.
- The currently chosen testing dataset contains only 2 classes.
- The class distribution within the chosen training dataset is not uniform.
- In the training dataset, the average word count per sentence after preprocessing is around 3-4, with a maximum of 19.
- In the testing dataset, the average word count per sentence after preprocessing is around 65, with a maximum of 472.
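Statistics like those listed above can be gathered with a few pandas calls. This is a minimal sketch on a toy frame: the `label` column name and the four sample rows are assumptions for illustration, not the project's actual training data.

```python
import pandas as pd

# Toy stand-in for a preprocessed training set; "label" is a hypothetical column name.
train = pd.DataFrame({
    "Review": [
        "order not correct",
        "drink weak peopl",
        "appetit instantli go",
        "spend money elsewher now",
    ],
    "label": [0, 0, 0, 1],
})

# Share of each class; a skewed result indicates a non-uniform distribution.
class_share = train["label"].value_counts(normalize=True)

# Number of whitespace-separated words per review after preprocessing.
word_counts = train["Review"].str.split().str.len()

print(class_share)
print("mean words:", word_counts.mean(), "max words:", word_counts.max())
```

Running the same checks on both the training and the testing data makes mismatches in class balance and text length visible before any model is trained.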

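Comparing the predictions in this notebook reinforces the conclusion that these models perform poorly on unseen data: the three models frequently disagree with one another (for example, on rows 2, 3, and 96 in the table above).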

This leads to a hypothesis: **the models are likely to perform much better when trained on datasets that are randomly sampled and have balanced class distributions.**

Before the next comparison, each model will be retrained on improved datasets, which still need to be selected.
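One possible way to obtain a balanced, shuffled dataset is to downsample every class to the size of the smallest one. The sketch below is an assumption about how this could be done, not the method chosen for the retraining; the toy data and the `label` column name are hypothetical.

```python
import pandas as pd

# Toy imbalanced dataset; "label" is a hypothetical column name.
df = pd.DataFrame({
    "Review": [f"review {i}" for i in range(10)],
    "label": [0] * 6 + [1] * 3 + [2] * 1,
})

# Size of the rarest class determines how many rows to keep per class.
smallest = df["label"].value_counts().min()

balanced = (
    df.groupby("label")
      .sample(n=smallest, random_state=42)   # random draw of equal size per class
      .sample(frac=1, random_state=42)       # shuffle the combined rows
      .reset_index(drop=True)
)

print(balanced["label"].value_counts())
```

Downsampling discards data, so with a very small minority class other options (collecting more samples, oversampling, or class weights) may be preferable.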

This notebook was created by `La Wun Nannda`.