# Amazon Review Classification Using Statistical Learning and Deep Learning

In this notebook, we examine review classification via a variety of methods.

## Word2Vec and Similarity Comparisons
We begin by examining Word2Vec. Word2Vec is a library that allows us to train on our own corpus to find embeddings of words. First, we examine the GoogleNews-300 vector dataset which is a pretrained embeddings corpus based on Google News. 

We compute similarities on phrases such as `Awesome ~ Amazing and excellent ~ outstanding` on the Google News corpus and the reviews corpus and we find the following results

Google News Corpus

*   Awesome ~ Amazing: Awesome is 82.82865285873413% similar to Amazing 
*   excellent ~ outstanding: Awesome is Excellent is 55.67485690116882% similar to Outstanding 

Amazon Corpus:


*   Awesome ~ Amazing: Awesome is 77.89764404296875% similar to Amazing 
*   excellent ~ outstanding: Excellent is 63.154661655426025% similar to Outstanding 

For one of the words above, the dataset we provided is better. This is because we trained on a corpus of reviews, and the words we tested on correspond to sentiment words.

The reviews corpus is only trained on a small subset of reviews and thus has a smaller vocabulary (so it might fail when a word we encounter is not within the corpus); in a broader sense, it is best to use the Google News Corpus

## Training Simple Models
Before we train simple models, we take the average embeddings of all of the words in a review and use that as our embeddings per review and perform a 80-20 split of train/test data. 

## Perceptron
We train a perceptron model on the averaged embeddings and find a **60.63%** accuracy

## Linear SVM
We train a linear SVM model on the averaged embeddings and find a **68.18%** accuracy

### Accuracy Comparisons with TFIDF
*   Perceptron (TFIDF): 64.06%
*   Perceptron (Word2Vec): 60.63%
*   SVM (TFIDF): 70.58%
*   SVM (Word2Vec): 68.18%


When training using the perceptron, TFIDF performs better than Word2Vec (64.06% using TFIDF, 60.6% using Word2Vec)

Similarly, training using the perceptron, TFIDF performs better than Word2Vec (70.58% using TFIDF, 68.18% using Word2Vec)

Overall, TFIDF performs better. 

Both show improvement when using SVM. The perceptron may be overfitting and thus the SVM with the features averaged performs better

## Feedforward Neural Network
A feedforward neural network is a network with hidden layers that train in one direction without aa hidden state capturing information about thue past. FFNNs are typically used for tasks such as image classification, text classification, and regression, where the input data is fixed in size and there is no temporal relationship between observations.

We train a feedforward neural network with 2 hidden layers and each with 100 and 10 nodes, respectively. We use our averaged embeddings as features from earlier. We use cross-entropy as our loss function along with an Adam Optimizer with betas (0.9, 0.999) and a learning rate of 0.001 and train for 20 epochs.

We obtain an accuracy of **69.34%** using the configuration described above.

### First 10 words
We then condense our features and only use the first 10 words of every review. 

We obtain an accuracy of **59.56%** using only the first 10 words, which intuitively makes sense since we are losing context.


## Recurrent Neural Network
Recurrent neural networks introduce the context of "hidden states" which allow them to incorporate the idea of sequential memory. The hidden state is updated at each time step and at each iteration the hidden state from the previous time step is used as input to the network alongside the input at the current time step.

We use the embeddings of the first 20 words of the review, truncating the review if longer or padding with 0 if shorter as our features.

Our RNN has an input_size of 300 dimensions, a hidden dimension of size 20, and a batch size of 32 with an Adam Optimizer with betas (0.9, 0.999) and a learning rate of 0.001 and train for 20 epochs, using cross entropy as our loss function. 

We obtain an accuracy of **58.00%** on the test set after training the simple RNN.

## Gated Recurrent Unit (GRU)
GRUs have the concepts of an "update" gate and a "reset" gate which control how much of the current state to use and how much of the past information is important. The update gate z_t controls how much of the new information to incorporate into the hidden state, while the reset gate r_t controls how much of the previous hidden state to forget.

The gates are defined as follows:

z_t = sigmoid(W_xz * x_t + W_hz * h_t-1) <br>
r_t = sigmoid(W_xr * x_t + W_hr * h_t-1)

where W_xz, W_hz, W_xr, and W_hr are weight matrices that determine the contribution of the input and the previous hidden state to each gate, and sigmoid is the sigmoid activation function.

Our GRU has an input_size of 3 layers, 300 dimensions, a hidden dimension of size 20, and a batch size of 32 with an Adam Optimizer with betas (0.9, 0.999) and a learning rate of 0.001 and train for 20 epochs, using cross entropy as our loss function. 

We obtain an accuracy of **59.04%** on the test set after training the GRU.

## LSTM
LSTMs are similar to GRUs, but have an input gate, a forget gate, and an output gate. . At each time step t, the model receives an input x_t and computes the activations of the gates and the new cell state c_t~ with the weights and biases of the hidden state and all the gates. 

The input gate i_t controls how much of the new input to incorporate into the new cell state c_t , while the forget gate f_t controls how much of the previous cell state c_t-1 to forget. The output gate o_t determines how much of the new cell state to output as the hidden state h_t. The cell state c_t is updated as a combination of the previous cell state c_t-1 and the new cell state c_t~.

By using these gates and the cell state, the LSTM architecture can selectively learn and forget information over time, allowing it to capture long-term dependencies more effectively. The input gate i_t and the forget gate f_t determine how much information to store in the cell state, while the output gate o_t determines how much of the stored information to output as the hidden state h_t. The hidden state can be seen as the "summary" of the previous inputs, which is then used as input to the next time step. 

The cell state can be seen as the "long-term memory" of the model, which can store information over multiple time steps and selectively forget or update it based on the activations of the gates.

Our LSTM has an input_size of 4 layers, 300 dimensions, a hidden dimension of size 20, and a batch size of 32 with an Adam Optimizer with betas (0.9, 0.999) and a learning rate of 0.001 and train for 20 epochs, using cross entropy as our loss function. 

We obtain an accuracy of **60.75%** on the test set after training the LSTM.



# Environment Setup

In [1]:
env = 'colab'

In [2]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
# Dataset: https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Beauty_v1_00.tsv.gz
!wget https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Beauty_v1_00.tsv.gz
!gunzip /content/amazon_reviews_us_Beauty_v1_00.tsv.gz

--2023-02-24 20:36:23--  https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Beauty_v1_00.tsv.gz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.172.72, 52.216.166.109, 52.216.245.14, ...
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.172.72|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 914070021 (872M) [application/x-gzip]
Saving to: ‘amazon_reviews_us_Beauty_v1_00.tsv.gz’


2023-02-24 20:37:39 (11.6 MB/s) - ‘amazon_reviews_us_Beauty_v1_00.tsv.gz’ saved [914070021/914070021]



In [4]:
import pandas as pd
import numpy as np
# import nltk
# import re
# from bs4 import BeautifulSoup

In [5]:
dataset_path = None
if env == "colab":
   dataset_path = './amazon_reviews_us_Beauty_v1_00.tsv'
else:
    dataset_path = './data.tsv'

In [6]:
from google.colab import drive
import os
drive.mount('/content/drive')

Mounted at /content/drive


In [7]:
import gensim
wv = gensim.models.KeyedVectors.load_word2vec_format("/content/drive/MyDrive/GoogleNews-vectors-negative300.bin.gz", binary=True)

# Word Embeddings
Generate word2vec embeddings- learning semantic and syntactic relationships between words

## Checking semantic similarities on word2vec-google-news-300 
On phrases such as `Awesome ~ Amazing and excellent ~ outstanding`

In [None]:
similar_excellent = wv.similarity('excellent', 'outstanding')
print(f'Excellent is {similar_excellent * 100}% similar to Outstanding ')

Excellent is 55.67485690116882% similar to Outstanding 


In [None]:
similar_awesome = wv.similarity('awesome', 'amazing')
print(f'Awesome is {similar_awesome * 100}% similar to Amazing ')

Awesome is 82.82865285873413% similar to Amazing 


# Training Word2Vec on Amazon Reviews Dataset

## Preprocessing Dataset
Removing HTML, URLs, converting floats to strings, dropping all columns except for the reviews

In [8]:
import gc
gc.collect()

14

In [9]:
merged_df = pd.read_csv(dataset_path, sep='\t', on_bad_lines='skip')

In [10]:
merged_df.drop(merged_df.columns.difference(['review_body', 'star_rating']), 1, inplace=True)

In [11]:
merged_df.loc[merged_df['star_rating'] == '1', 'class'] = 1
merged_df.loc[merged_df['star_rating'] == '1', 'class'] = 1
merged_df.loc[merged_df['star_rating'] == '3', 'class'] = 2
merged_df.loc[merged_df['star_rating'] == '4', 'class'] = 3
merged_df.loc[merged_df['star_rating'] == '5', 'class'] = 3

In [12]:
merged_df["review_body"] = merged_df["review_body"].apply(lambda x: str(x) if type(x)==float else x)

In [13]:
class1_sample = merged_df.loc[merged_df["class"] == 1].sample(n = 20000)
class2_sample = merged_df.loc[merged_df["class"] == 2].sample(n = 20000)
class3_sample = merged_df.loc[merged_df["class"] == 3].sample(n = 20000)

In [14]:
merged_df = pd.concat([class1_sample, class2_sample, class3_sample], axis=0)

In [15]:
reviews = merged_df["review_body"].values.tolist()

In [16]:
tokenized_reviews = [review.lower().split() for review in reviews]

In [None]:
import gensim.models

In [None]:
model = gensim.models.Word2Vec(sentences=tokenized_reviews, size=300, window = 13, min_count=9)

In [None]:
similar_excellent_rev = model.similarity('excellent', 'outstanding')
print(f'Excellent is {similar_excellent_rev * 100}% similar to Outstanding ')

Excellent is 63.154661655426025% similar to Outstanding 


In [None]:
similar_awesome_rev = model.similarity('awesome', 'amazing')
print(f'Awesome is {similar_awesome_rev * 100}% similar to Amazing ')

Awesome is 77.89764404296875% similar to Amazing 


For one of the words above, the dataset we provided is better. This is because we trained on a corpus of reviews, and the words we tested on correspond to sentiment words. <br>

The reviews corpus is only trained on a small subset of reviews and thus has a smaller vocabulary (so it might fail when a word we encounter is not within the corpus); in a broader sense, it is best to use the Google News Corpus

## Training simple models

In [25]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [18]:
labels_Y = merged_df["class"].values
labels_Y = labels_Y[:, np.newaxis]

In [19]:
def average_vectors(reviews, model):
  reviews_X = np.empty((60000, 300))
  for idx, review in enumerate(reviews):
    vectors = [model[word] if word in model else 0.0 for word in review]
    avg_vectors = np.mean(vectors, axis=0)
    reviews_X[idx] = avg_vectors
  return reviews_X

In [20]:
reviews_X = average_vectors(tokenized_reviews, wv)

In [21]:
X_train, X_test, y_train, y_test = train_test_split(reviews_X, labels_Y, test_size=0.2, random_state=42)

In [None]:
# Check the shapes of the resulting matrices
if env == 'colab':
  print("Shape of X_train:", X_train.shape)
  print("Shape of X_test:", X_test.shape)
  print("Shape of y_train:", y_train.shape)
  print("Shape of y_test:", y_test.shape)

Shape of X_train: (48000, 300)
Shape of X_test: (12000, 300)
Shape of y_train: (48000, 1)
Shape of y_test: (12000, 1)


### Perceptron Model with Word2Vec

In [23]:
from sklearn.linear_model import Perceptron

perceptron = Perceptron(random_state=0)

perceptron.fit(X_train, y_train)

Perceptron()

In [24]:
# Make predictions on the testing data
y_pred_perceptron = perceptron.predict(X_test)

In [26]:
acc_perceptron = accuracy_score(y_test, y_pred_perceptron)
print(f'Perceptron accuracy: {acc_perceptron}')

Perceptron accuracy: 0.6063333333333333


### SVM with Word2Vec

In [27]:
from sklearn.svm import LinearSVC

svm = LinearSVC(random_state=0, max_iter=1000).fit(X_train, y_train)

In [28]:
svm_pred = svm.predict(X_test)

In [29]:
acc_svm = accuracy_score(y_test, svm_pred)
print(f'SVM accuracy: {acc_svm}')

SVM accuracy: 0.6818333333333333


When training using the perceptron, TFIDF performs better than Word2Vec (64.06% using TFIDF, 60.6% using Word2Vec)

Similarly, training using the perceptron, TFIDF performs better than Word2Vec (70.58% using TFIDF, 68.18% using Word2Vec)

Both show improvement when using SVM. The perceptron may be overfitting and thus the SVM with the features averaged performs better

## Feedforward Neural Network

Below, we examine using a feedforward neural network using the average of the word2vec vectors, similar to above.

Converting the SKLearn data to PyTorch dataloader, curteousy of https://www.kaggle.com/code/glebbuzin/solving-sklearn-datasets-with-pytorch

In [30]:
def one_hot(y, num_classes):
  y_t = torch.transpose(y, 0, 1).squeeze() 
  one_hot = torch.nn.functional.one_hot(y_t, num_classes)
  return one_hot

In [31]:
import torch
from torch.utils.data import TensorDataset, DataLoader
from torch import nn
import torch.nn.functional as F

In [32]:
import torch
X_train_t = torch.from_numpy(X_train).to(torch.float32)
y_train_t = torch.from_numpy(y_train).to(torch.long)
X_test_t = torch.from_numpy(X_test).to(torch.float32)
y_test_t = torch.from_numpy(y_test).to(torch.long)

In [33]:
y_train_t = one_hot(y_train_t, 4)
y_test_t = one_hot(y_test_t, 4)

In [None]:
train_dataset = TensorDataset(X_train_t, y_train_t)
test_dataset = TensorDataset(X_test_t, y_test_t)
train_dataloader = DataLoader(train_dataset, batch_size=32)
test_dataloader = DataLoader(test_dataset, batch_size=32)

In [None]:
class FeedForwardNN(nn.Module):
  def __init__(self, input_size, output_size):
    super(FeedForwardNN, self).__init__()
    self.input_size = input_size
    self.output_size = output_size

    self.in1 = nn.Linear(input_size, 100)
    self.relu1 = nn.ReLU()
    self.l2 = nn.Linear(100, 10)
    self.relu2 = nn.ReLU()
    self.out = nn.Linear(10, output_size)
  def forward(self, x):
    y = self.in1(x)
    y = self.relu1(y)
    y = self.l2(y)
    y = self.relu2(y)
    y = self.out(y)
    return y

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

'cuda'

In [None]:
model = FeedForwardNN(300, 4).to(device)
print(model)

FeedForwardNN(
  (in1): Linear(in_features=300, out_features=100, bias=True)
  (relu1): ReLU()
  (l2): Linear(in_features=100, out_features=10, bias=True)
  (relu2): ReLU()
  (out): Linear(in_features=10, out_features=4, bias=True)
)


In [None]:
import gc
class Trainer:
  def __init__(self, model, training_data, device):
    self.hyperparams = {
        'lr': 1e-3,
        'epochs': 20,
        'betas': (0.9, 0.999)
    }
    self.td = training_data
    self.device = device
    self.loss_fn = nn.CrossEntropyLoss()
    self.model = model
    self.optim = torch.optim.Adam(self.model.parameters(), lr = self.hyperparams['lr'], betas=(self.hyperparams['betas']))
  
  def calc_accuracy(self, output, y):
    pred = torch.argmax(output, dim=1)
    y = torch.argmax(y, dim=1)
    return (pred == y).sum().item() / len(y)
  
  def calc_accuracy_test(self, output, y):
    pred = torch.argmax(output, dim=1)
    return (pred == y).sum().item() / len(y)
  
  def train_epoch(self):
    running_loss = 0.0
    running_acc = 0.0

    for review, label in self.td:
      self.optim.zero_grad()
      review = review.to(self.device, dtype=torch.float32)
      label = label.to(self.device, dtype=torch.float32)
      output = self.model(review)
      if isinstance(output, tuple):
        output = output[0]

      loss = self.loss_fn(output, label)

      loss.backward()
      self.optim.step()

      running_loss += loss.item()
      running_acc += self.calc_accuracy(output, label)

      del review, label, output

    train_loss = running_loss / len(self.td)
    training_acc = running_acc / len(self.td)
    return train_loss, training_acc
  
  def fit(self):        
    train_losses,train_accs = [], []
    min_vl = 999999

    for epoch in range(self.hyperparams['epochs']):
        
        print(f"------EPOCH {epoch+1}/{self.hyperparams['epochs']}------")
        
        self.model.train()
        
        train_loss, train_acc = self.train_epoch()
        train_losses.append(train_loss)
        train_accs.append(train_acc)
        

        
        print(f"Training LOSS: {train_loss} | ACCURACY: {train_acc}")
        
        
        
        # CLEANUP
        gc.collect()
        torch.cuda.empty_cache()
        

    return (train_losses, train_accs)
            




  

In [None]:
trainer = Trainer(model, train_dataloader, device)

In [None]:
(train_losses, train_accs) = trainer.fit()

------EPOCH 1/20------
Training LOSS: 0.8580149493018786 | ACCURACY: 0.616
------EPOCH 2/20------
Training LOSS: 0.7594271267851194 | ACCURACY: 0.6742708333333334
------EPOCH 3/20------
Training LOSS: 0.7386357170740764 | ACCURACY: 0.6841041666666666
------EPOCH 4/20------
Training LOSS: 0.7250249315301577 | ACCURACY: 0.6900416666666667
------EPOCH 5/20------
Training LOSS: 0.7146575082937876 | ACCURACY: 0.6951666666666667
------EPOCH 6/20------
Training LOSS: 0.7060685671965281 | ACCURACY: 0.6981666666666667
------EPOCH 7/20------
Training LOSS: 0.6984484259088835 | ACCURACY: 0.7018125
------EPOCH 8/20------
Training LOSS: 0.6910099113980929 | ACCURACY: 0.704875
------EPOCH 9/20------
Training LOSS: 0.6836891483068466 | ACCURACY: 0.7082083333333333
------EPOCH 10/20------
Training LOSS: 0.6768757694164912 | ACCURACY: 0.7116875
------EPOCH 11/20------
Training LOSS: 0.670400546014309 | ACCURACY: 0.7146458333333333
------EPOCH 12/20------
Training LOSS: 0.664261295457681 | ACCURACY: 0.7

In [None]:
@torch.no_grad()
def infer(trainer, model, test_dataloader):
    
    running_loss = 0
    running_acc = 0
    test_output = []
    true_res = []
    incorrect_examples = []
    incorrect_labels = []
    incorrect_pred = []
    
    
    for x,y in test_dataloader:
        
        x = x.to(device, dtype=torch.float)
        y = y.to(device, dtype=torch.float)
        
        output = model(x)
        if isinstance(output, tuple):
          output = output[0]
        pred = torch.argmax(output, dim=1)
        y = torch.argmax(y, dim=1)
        test_output.append(pred)
        true_res.append(y)

        idxs_mask = ((pred == y.view_as(pred))==False).view(-1)
        if idxs_mask.numel(): #if index masks is non-empty append the correspoding data value in incorrect examples
          incorrect_examples.append(x[idxs_mask].squeeze().cpu().numpy())
          incorrect_labels.append(y[idxs_mask].cpu().numpy()) #the corresponding target to the misclassified image
          incorrect_pred.append(pred[idxs_mask].squeeze().cpu().numpy()) #the corresponiding predicted class of the misclassified image
        
        loss = trainer.loss_fn(output, y)
        
        running_loss += loss.item()
        running_acc += trainer.calc_accuracy_test(output,y)
        
        del x,y,output
        
    test_loss = running_loss/len(test_dataloader)
    test_acc = running_acc/len(test_dataloader)
    
    return test_loss, test_acc, test_output, true_res, incorrect_examples, incorrect_labels, incorrect_pred


In [None]:
test_loss_plain, test_acc_plain, test_out, true_res, incorrect_examples, incorrect_labels, incorrect_pred = infer(model, test_dataloader)

In [None]:
print(f'Test Loss: {test_loss_plain}')
print(f'Test Accuracy: {test_acc_plain}')

Test Loss: 0.7319384795029958
Test Accuracy: 0.6934166666666667


### Using only the first 10 words


In [None]:
def topten_reviews(reviews, model):
    num_reviews = len(reviews)
    vector_size = model.vector_size
    reviews_X = np.zeros((num_reviews, 10, vector_size))
    for i, review in enumerate(reviews):
        num_words = min(len(review), 10)
        for j in range(num_words):
            word = review[j]
            if word in model:
                reviews_X[i,j,:] = model[word]
    return reviews_X

In [None]:
reviews_top10 = topten_reviews(tokenized_reviews, wv)

In [None]:
X_train_t10, X_test_t10, y_train_t10, y_test_t10 = train_test_split(reviews_top10, labels_Y, test_size=0.2, random_state=42)

In [None]:
import torch
X_train_reduced_t = torch.from_numpy(X_train_t10).to(torch.float32)
y_train_t_reduced = torch.from_numpy(y_train_t10).to(torch.long)
X_test_reduced_t = torch.from_numpy(X_test_t10).to(torch.float32)
y_test_t_reduced = torch.from_numpy(y_test_t10).to(torch.long)

In [None]:
y_train_t_reduced = one_hot(y_train_t_reduced, 4)
y_test_t_reduced = one_hot(y_test_t_reduced, 4)

In [None]:
# Check the shapes of the resulting matrices
if env == 'colab':
  print("Shape of X_train:", X_train_reduced_t.shape)
  print("Shape of X_test:", X_test_reduced_t.shape)
  print("Shape of y_train:", y_train_t_reduced.shape)
  print("Shape of y_test:", y_test_t_reduced.shape)

Shape of X_train: torch.Size([48000, 10, 300])
Shape of X_test: torch.Size([12000, 10, 300])
Shape of y_train: torch.Size([48000, 4])
Shape of y_test: torch.Size([12000, 4])


In [None]:
train_dataset_reduced = TensorDataset(X_train_reduced_t, y_train_t_reduced)
test_dataset_reduced = TensorDataset(X_test_reduced_t, y_test_t_reduced)
train_dataloader_reduced = DataLoader(train_dataset_reduced, batch_size=32)
test_dataloader_reduced = DataLoader(test_dataset_reduced, batch_size=32)

In [None]:
class FeedForwardNNTopX(nn.Module):
  def __init__(self, input_size, output_size):
    super(FeedForwardNNTopX, self).__init__()
    self.input_size = input_size
    self.output_size = output_size

    self.in1 = nn.Linear(input_size, 100)
    self.relu1 = nn.ReLU()
    self.l2 = nn.Linear(100, 10)
    self.relu2 = nn.ReLU()
    self.out = nn.Linear(10, output_size)

  def forward(self, x):
    batch_size = x.size(0)
    x = x.view(batch_size, -1)  # Reshape input to (batch_size, input_size)
    y = self.in1(x)
    y = self.relu1(y)
    y = self.l2(y)
    y = self.relu2(y)
    y = self.out(y)
    return y

In [None]:
model_reduced = FeedForwardNNTopX(3000, 4).to(device)
print(model_reduced)

FeedForwardNNTop10(
  (in1): Linear(in_features=3000, out_features=100, bias=True)
  (relu1): ReLU()
  (l2): Linear(in_features=100, out_features=10, bias=True)
  (relu2): ReLU()
  (out): Linear(in_features=10, out_features=4, bias=True)
)


In [None]:
trainer_reduced = Trainer(model_reduced, train_dataloader_reduced, device)

In [None]:
(train_losses, train_accs) = trainer_reduced.fit()

------EPOCH 1/20------
Training LOSS: 0.8609319173494975 | ACCURACY: 0.5948541666666667
------EPOCH 2/20------
Training LOSS: 0.7586062659819921 | ACCURACY: 0.6608125
------EPOCH 3/20------
Training LOSS: 0.6888598437905311 | ACCURACY: 0.7007708333333333
------EPOCH 4/20------
Training LOSS: 0.6045685842235883 | ACCURACY: 0.7456458333333333
------EPOCH 5/20------
Training LOSS: 0.5123702744940917 | ACCURACY: 0.79275
------EPOCH 6/20------
Training LOSS: 0.42147025881210964 | ACCURACY: 0.836
------EPOCH 7/20------
Training LOSS: 0.33807916860779125 | ACCURACY: 0.8742708333333333
------EPOCH 8/20------
Training LOSS: 0.2696292301391562 | ACCURACY: 0.9031458333333333
------EPOCH 9/20------
Training LOSS: 0.22766238913064202 | ACCURACY: 0.9176666666666666
------EPOCH 10/20------
Training LOSS: 0.20108065392325322 | ACCURACY: 0.9254375
------EPOCH 11/20------
Training LOSS: 0.1858758095347633 | ACCURACY: 0.9307291666666667
------EPOCH 12/20------
Training LOSS: 0.17518385212744275 | ACCURAC

In [None]:
test_loss_plain_r, test_acc_plain_r, test_out_r, true_res_r, incorrect_examples_r, incorrect_labels_r, incorrect_pred_r = infer(trainer_reduced, model_reduced, test_dataloader_reduced)

In [None]:
print(f'Test Loss (on first 10): {test_loss_plain_r}')
print(f'Test Accuracy (on first 10): {test_acc_plain_r}')

Test Loss (on first 10): 3.2353582534790037
Test Accuracy (on first 10): 0.5956666666666667


## Recurrent Neural Network

Reference: https://blog.floydhub.com/a-beginners-guide-on-recurrent-neural-networks-with-pytorch/

In [None]:
def tokenize_all_reviews(reviews, model, rev_len):
    num_reviews = len(reviews)
    vector_size = model.vector_size
    reviews_X = np.zeros((num_reviews, rev_len, vector_size))
    for i, review in enumerate(reviews):
        num_words = len(review)
        for j in range(min(num_words, rev_len)):
            word = review[j]
            if word in model:
                reviews_X[i,j,:] = model[word]
    return reviews_X

In [None]:
X_f20_rnn_tokenized = tokenize_all_reviews(reviews, wv, 20)

In [None]:
X_train_f20, X_test_f20, y_train_f20, y_test_f20 = train_test_split(X_f20_rnn_tokenized, labels_Y, test_size=0.2, random_state=42)

In [None]:
# Free up memory
reviews = None
wv = None
labels_Y = None
gc.collect()

14

In [None]:
import torch
X_train_t_rnn = torch.from_numpy(X_train_f20).to(torch.float32)
y_train_t_rnn = torch.from_numpy(y_train_f20).to(torch.long)
X_test_t_rnn = torch.from_numpy(X_test_f20).to(torch.float32)
y_test_t_rnn = torch.from_numpy(y_test_f20).to(torch.long)

In [None]:
y_train_t_rnn = one_hot(y_train_t_rnn, 4)
y_test_t_rnn = one_hot(y_test_t_rnn, 4)

In [None]:
train_dataset_rnn = TensorDataset(X_train_t_rnn, y_train_t_rnn)
test_dataset_rnn = TensorDataset(X_test_t_rnn, y_test_t_rnn)
train_dataloader_rnn = DataLoader(train_dataset_rnn, batch_size=32)
test_dataloader_rnn = DataLoader(test_dataset_rnn, batch_size=32)

In [None]:
class SimpleRNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(SimpleRNN, self).__init__()

        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim * 20, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size).to(device)
        out, hidden = self.rnn(x, hidden)
        out = out.contiguous().view(batch_size, -1)
        out = self.fc(out)
        return out, hidden

    def init_hidden(self, batch_size):
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim)
        return hidden

In [None]:
simplernn_model = SimpleRNN(300, 4, 20, 3).to(device)
print(simplernn_model)

SimpleRNN(
  (rnn): RNN(300, 20, num_layers=3, batch_first=True)
  (fc): Linear(in_features=400, out_features=4, bias=True)
)


In [None]:
trainer_rnn = Trainer(simplernn_model, train_dataloader_rnn, device)

In [None]:
(train_losses_rnn, train_accs_rnn) = trainer_rnn.fit()

------EPOCH 1/20------
Training LOSS: 1.0060612388451895 | ACCURACY: 0.478375
------EPOCH 2/20------
Training LOSS: 0.9562386382420858 | ACCURACY: 0.5243125
------EPOCH 3/20------
Training LOSS: 0.9332923852205276 | ACCURACY: 0.54175
------EPOCH 4/20------
Training LOSS: 0.9160294694900513 | ACCURACY: 0.5532083333333333
------EPOCH 5/20------
Training LOSS: 0.9019498533805211 | ACCURACY: 0.5620208333333333
------EPOCH 6/20------
Training LOSS: 0.8890658142169316 | ACCURACY: 0.5714583333333333
------EPOCH 7/20------
Training LOSS: 0.8777119198242823 | ACCURACY: 0.5814166666666667
------EPOCH 8/20------
Training LOSS: 0.8680261938969295 | ACCURACY: 0.58875
------EPOCH 9/20------
Training LOSS: 0.8595240587393442 | ACCURACY: 0.593625
------EPOCH 10/20------
Training LOSS: 0.8519853099981943 | ACCURACY: 0.5967708333333334
------EPOCH 11/20------
Training LOSS: 0.8452907829284668 | ACCURACY: 0.6011458333333334
------EPOCH 12/20------
Training LOSS: 0.8394368140697479 | ACCURACY: 0.6051875
-

In [None]:
test_loss_srnn, test_acc_srnn, test_out_srnn, true_res_lstm, incorrect_examples_srnn, incorrect_labels_srnn, incorrect_pred_srnn = infer(trainer_rnn, simplernn_model, test_dataloader_rnn)

In [None]:
print(f'Simple RNN Test Loss: {test_loss_srnn}')
print(f'Simple RNN Test Accuracy: {test_acc_srnn}')

Simple RNN Test Loss: 0.8856915869712829
Simple RNN Test Accuracy: 0.5800833333333333


## Gated Recurrent Unit

In [None]:
class GRUNet(nn.Module):
  def __init__(self, input_size, output_size, hidden_dim, n_layers):
    super(GRUNet, self).__init__()

    self.hidden_dim = hidden_dim
    self.n_layers = n_layers

    self.rnn = nn.GRU(input_size, hidden_dim, n_layers, batch_first=True)
    self.fc = nn.Linear(hidden_dim * 20, output_size)

  def forward(self, x):
    batch_size = x.size(0)
    hidden = self.init_hidden(batch_size).to(device)
    out, hidden = self.rnn(x, hidden)
    out = out.contiguous().view(batch_size, -1)
    out = self.fc(out)

    return out, hidden
  

  def init_hidden(self, batch_size):
    hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim)
    return hidden


In [None]:
gru_model = GRUNet(300, 4, 20, 4).to(device)
print(gru_model)

GRUNet(
  (rnn): GRU(300, 20, num_layers=4, batch_first=True)
  (fc): Linear(in_features=400, out_features=4, bias=True)
)


In [None]:
trainer_gru = Trainer(gru_model, train_dataloader_rnn, device)

In [None]:
(train_losses_gru, train_accs_gru) = trainer_gru.fit()

------EPOCH 1/20------
Training LOSS: 0.7165122906366984 | ACCURACY: 0.6806458333333333
------EPOCH 2/20------
Training LOSS: 0.7113434735139211 | ACCURACY: 0.683875
------EPOCH 3/20------
Training LOSS: 0.7063109607696533 | ACCURACY: 0.6862916666666666
------EPOCH 4/20------
Training LOSS: 0.7013799872994423 | ACCURACY: 0.6897916666666667
------EPOCH 5/20------
Training LOSS: 0.6965669304728508 | ACCURACY: 0.691625
------EPOCH 6/20------
Training LOSS: 0.6918963541587194 | ACCURACY: 0.6940625
------EPOCH 7/20------
Training LOSS: 0.6871913316448529 | ACCURACY: 0.696
------EPOCH 8/20------
Training LOSS: 0.682539682606856 | ACCURACY: 0.6983541666666667
------EPOCH 9/20------
Training LOSS: 0.6783448173602422 | ACCURACY: 0.69925
------EPOCH 10/20------
Training LOSS: 0.6741498392621676 | ACCURACY: 0.7018125
------EPOCH 11/20------
Training LOSS: 0.6705937325755755 | ACCURACY: 0.7029583333333334
------EPOCH 12/20------
Training LOSS: 0.6670641004244486 | ACCURACY: 0.705625
------EPOCH 13

In [None]:
test_loss_gru, test_acc_gru, test_out_gru, true_res_gru, incorrect_examples_gru, incorrect_labels_gru, incorrect_pred_gru = infer(trainer_gru, gru_model, test_dataloader_rnn)

In [None]:
print(f'GRU Test Loss: {test_loss_gru}')
print(f'GRU Test Accuracy: {test_acc_gru}')

GRU Test Loss: 0.9679175629615784
GRU Test Accuracy: 0.5904166666666667


## LSTM

In [None]:
import torch.nn as nn

class LSTMNet(nn.Module):
    def __init__(self, input_size, output_size, hidden_dim, n_layers):
        super(LSTMNet, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        self.lstm = nn.LSTM(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim * 20, output_size)
        
    def forward(self, x):
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size).to(device)
        c0 = self.init_hidden(batch_size).to(device)
        # print(x.shape)
        # print(hidden.shape)
        # print(c0.shape)

        out, (hidden, c0) = self.lstm(x, (hidden, c0))
        out = out.contiguous().view(batch_size, -1)
        out = self.fc(out)
        return out, hidden
    
    def init_hidden(self, batch_size):
        hidden = torch.zeros(self.n_layers, batch_size, self.hidden_dim)
        return hidden

In [None]:
lstm_model = LSTMNet(300, 4, 20, 4).to(device)
print(lstm_model)

LSTMNet(
  (lstm): LSTM(300, 20, num_layers=4, batch_first=True)
  (fc): Linear(in_features=400, out_features=4, bias=True)
)


In [None]:
trainer_lstm = Trainer(lstm_model, train_dataloader_rnn, device)

In [None]:
(train_losses_lstm, train_accs_lstm) = trainer_lstm.fit()

------EPOCH 1/20------
Training LOSS: 1.0273742792208989 | ACCURACY: 0.4436875
------EPOCH 2/20------
Training LOSS: 0.9540639110008875 | ACCURACY: 0.5191875
------EPOCH 3/20------
Training LOSS: 0.9089155247608821 | ACCURACY: 0.554875
------EPOCH 4/20------
Training LOSS: 0.8778061361710231 | ACCURACY: 0.5779166666666666
------EPOCH 5/20------
Training LOSS: 0.8557336489756902 | ACCURACY: 0.5902291666666667
------EPOCH 6/20------
Training LOSS: 0.8389605298042297 | ACCURACY: 0.600625
------EPOCH 7/20------
Training LOSS: 0.8261014535029729 | ACCURACY: 0.6093958333333334
------EPOCH 8/20------
Training LOSS: 0.8155284626483917 | ACCURACY: 0.615375
------EPOCH 9/20------
Training LOSS: 0.8057626006205877 | ACCURACY: 0.6221875
------EPOCH 10/20------
Training LOSS: 0.7966937431494395 | ACCURACY: 0.6287291666666667
------EPOCH 11/20------
Training LOSS: 0.7888028825322787 | ACCURACY: 0.6330833333333333
------EPOCH 12/20------
Training LOSS: 0.7817297620773316 | ACCURACY: 0.638020833333333

In [None]:
test_loss_lstm, test_acc_lstm, test_out_lstm, true_res_lstm, incorrect_examples_lstm, incorrect_labels_lstm, incorrect_pred_lstm = infer(trainer_lstm, lstm_model, test_dataloader_rnn)

In [None]:
print(f'LSTM Test Loss: {test_loss_lstm}')
print(f'LSTM Test Accuracy: {test_acc_lstm}')

LSTM Test Loss: 0.8346940994262695
LSTM Test Accuracy: 0.6075
