<a href="https://colab.research.google.com/github/nataliakoliou/NLP-Various-Implementations/blob/main/Assignment-2/Assignment-2c/nlp-2c.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**`❕`** <font size="2">**WARNING:** Some of the code lines in this notebook may be cropped out due to display limitations. To view the entire code properly, please click on this [link](https://nbviewer.org/github/nataliakoliou/NLP-Various-Implementations/blob/main/Assignment-2/Assignment-2c/nlp-2c.ipynb) to open the notebook in nbviewer or this [link](https://colab.research.google.com/github/nataliakoliou/NLP-Various-Implementations/blob/main/Assignment-2/Assignment-2c/nlp-2c.ipynb) to open the notebook in Google Colab.</font>

# **NLP-Various Implementations | Text Classification with RNNs**

**Overview:** In this part of the project, I trained several neural network models, including RNNs and LSTMs, with different architectures and hyperparameters to evaluate their performance on some simple classification tasks. For this purpose, I used the AG News Topic Classification and the IMDB movie review datasets. Overall, this work provided me with valuable hands-on experience in training neural networks and insight into the factors that affect their performance in text classification tasks. Through experimenting with different architectures and hyperparameters, I gained a deeper understanding of how these models operate and can be optimized to achieve high accuracy levels.

## **1. Import all the necessary modules**

**Briefly:** The `time` library provides functions for working with time-related tasks, the `torch` library provides support for deep learning operations using tensors, the `random` library provides tools for generating random numbers and the `pandas` library provides data manipulation and analysis tools. Additionally, the `nn` module provides support for building neural networks, the `tqdm` library provides a progress bar to track loops, the `defaultdict` class provides a way to create a dictionary with default values for nonexistent keys, the `PrettyTable` class provides a way to display data in a table format, the `functional` module provides stateless function versions of neural network operations such as `softmax`, the `DataLoader` class provides a way to load data in batches for training neural networks, the `get_tokenizer` function provides a way to tokenize text and the `build_vocab_from_iterator` function provides a way to build a vocabulary from an iterator of text. Finally, `accuracy_score` provides a way to calculate the accuracy of a model, along with other useful metrics such as `classification_report` and `confusion_matrix`.

In [1]:
import time
import torch
import random
import pandas as pd
from torch import nn
from tqdm import tqdm
from collections import defaultdict
from prettytable import PrettyTable
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchtext.data import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

## **2. Define and initialize the models' parameters**

The set_device function checks if a CUDA-enabled GPU is available and sets the device accordingly. The tokenizer variable is set to tokenize the text data using the "basic_english" tokenizer. Both models and classes are lists that contain the different models and classes used for the classification process, whereas accuracies, parameters, and time_costs are empty lists that will be used to store evaluation metrics during the training process. Finally, the remaining variables are hyperparameters used for this training process.

* The `models` listed in the models list are different types of neural network models that will be used for classification. Specifically, they are variations of recurrent neural networks (RNNs) and long short-term memory (LSTM) networks with different numbers of layers and types of connections between layers.

* The `classes` list specifies the different categories or classes that the classification model will be trained to predict. In this case, we have four classes: World, Sports, Business, and Sci/Tech. This means that our model will be trained to classify news articles into these four broad categories.

* `MAX_WORDS = 25` sets a maximum limit for the number of words allowed in a text sample. This means that if a text sample contains more than 25 words, it will be truncated to 25 words before being fed into the classification model.

In [2]:
def set_device(primary, secondary):
    return torch.device(primary if torch.cuda.is_available() else secondary) # device used to perform the computations for the machine learning model

device = set_device("cuda","cpu")
tokenizer = get_tokenizer("basic_english")
models = ["1Uni-RNN", "1Bi-RNN", "2Bi-RNN", "1Uni-LSTM", "1Bi-LSTM", "2Bi-LSTM"]; classes = ["World", "Sports", "Business", "Sci/Tech"]; accuracies = []; parameters = []; time_costs = []
MIN_FREQ = 10 ; MAX_WORDS = 25; EPOCHS = 15; LEARNING_RATE = 1e-3; BATCH_SIZE = 1024; EMBEDDING_DIM = 100; HIDDEN_DIM = 64; PADDED = "<PAD>"; UNKNOWN = "<UNK>"

## **3. Load and preprocess the training and testing datasets**

The load_dataset() function is used to load and preprocess a CSV file containing text data. It reads the CSV file using pandas, shuffles the rows (except the first one), and selects a subset of the data based on the given percent and mode arguments. It then combines the selected features into a single text column and returns a list of tuples, where each tuple contains the label and text data for each row of the dataset. The function is called twice to create train_dataset and test_dataset, which are used for training and testing a machine learning model.

* `data.iloc[:1]` selects only the first row of the data. Note that `pd.read_csv` treats the first line of the file as the header by default, so the column names are already stored in `data.columns` and this first row is an ordinary sample. The code keeps that row in place and shuffles the remaining rows with `sample(frac=1)`.

* If `mode` is set to *start*, the first `percent`% of rows are selected, whereas if `mode` is set to *end*, the last `percent`% of rows are selected. The cut-off index is computed by multiplying the dataset length by the corresponding fraction and truncating the result with `int()`.

In [3]:
def load_dataset(path, features, label, percent, mode):
    data = pd.read_csv(path)
    data = pd.concat([data.iloc[:1], data.iloc[1:].sample(frac=1)], ignore_index=True)  # shuffle all rows except the first one
    if mode == 'start':
        end_index = int(len(data) * (percent / 100))
        data = data.iloc[:end_index]
    elif mode == 'end':
        start_index = int(len(data) * ((100 - percent) / 100))
        data = pd.concat([data.iloc[0:0], data.iloc[start_index:]], ignore_index=True)
    text = data[features].astype(str).agg(' '.join, axis=1)
    return [(data[label][i], text[i]) for i in range(len(data))]

train_dataset, test_dataset = load_dataset("train.csv", ["Title","Description"], "Class Index", 100, "start"), load_dataset("test.csv", ["Title","Description"], "Class Index", 100, "start")
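The `start`/`end` selection described above can be illustrated without pandas. The following is a minimal sketch rather than the notebook's own code: `select_slice` is a hypothetical helper that mirrors the index arithmetic of `load_dataset` on a plain list, and raising an error for an unknown mode is an added assumption.

```python
def select_slice(rows, percent, mode):
    """Return the first or last `percent`% of `rows`, mirroring load_dataset's index arithmetic."""
    n = len(rows)
    if mode == "start":
        end_index = int(n * (percent / 100))  # int() truncates towards zero
        return rows[:end_index]
    if mode == "end":
        start_index = int(n * ((100 - percent) / 100))
        return rows[start_index:]
    # assumption: fail loudly on an unknown mode (the original silently keeps all rows)
    raise ValueError(f"unknown mode: {mode!r}")


rows = list(range(8))  # stand-in for eight dataset rows
first_part = select_slice(rows, 25, "start")
last_part = select_slice(rows, 25, "end")
print(first_part)  # [0, 1]
print(last_part)   # [6, 7]
```

With `percent=100` and `mode="start"`, as used in the notebook, the whole dataset is kept.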

## **4. Build PyTorch DataLoaders for efficient model training and testing**

The generate_loader() function takes in a dataset, a maximum number of words, a batch size, and a shuffle flag, and returns a PyTorch DataLoader object with the specified parameters. The collate_batch() function is used as a custom collate function for the DataLoader, and it preprocesses the input data by tokenizing the text, padding the sequences with `<PAD>` tokens or truncating the sequences to a maximum length of max_words, and converting the data into PyTorch tensors. generate_loader() is called twice to create train_loader and test_loader, with train_loader being shuffled for better model training and test_loader being unshuffled for model evaluation.

In [4]:
def generate_loader(dataset, max_words, batch_size, shuffle):
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, collate_fn=lambda b: collate_batch(b, max_words))

def collate_batch(batch, max_words):
    Y, X = list(zip(*batch))
    Y = torch.tensor(Y) - 1  # target names in range [0,1,2,3] instead of [1,2,3,4]
    X = [vocab(tokenizer(text)) for text in X] # type(X): list of lists
    X = [tokens+([vocab['<PAD>']]* (max_words-len(tokens))) if len(tokens)<max_words else tokens[:max_words] for tokens in X]  # brings all samples to MAX_WORDS length - shorter texts are padded with <PAD> sequences, longer texts are truncated
    return torch.tensor(X, dtype=torch.int32).to(device), Y.to(device)

train_loader, test_loader = generate_loader(train_dataset, MAX_WORDS, BATCH_SIZE, True), generate_loader(test_dataset, MAX_WORDS, BATCH_SIZE, False)
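The padding and truncation step inside `collate_batch` can be isolated as follows. This is a small sketch on plain lists of token ids; `pad_or_truncate` and `PAD_ID` are hypothetical names introduced only for illustration.

```python
def pad_or_truncate(token_ids, max_words, pad_id):
    """Bring a list of token ids to exactly `max_words` entries."""
    if len(token_ids) < max_words:
        # shorter samples are right-padded with the padding index
        return token_ids + [pad_id] * (max_words - len(token_ids))
    # longer (or equal-length) samples are cut to the first max_words ids
    return token_ids[:max_words]


PAD_ID = 0  # assumption: index of the padding token in the vocabulary
short_sample = pad_or_truncate([7, 8, 9], 5, PAD_ID)
long_sample = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], 5, PAD_ID)
print(short_sample)  # [7, 8, 9, 0, 0]
print(long_sample)   # [1, 2, 3, 4, 5]
```

Because every sample ends up with the same length, the batch can be stacked into a single tensor of shape (batch_size, max_words).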

## **5. Build the vocabulary from the training and testing datasets**

The build_vocab() function takes in datasets, min_freq, padded, and unknown as input parameters. It uses the tokenize() function to iterate over all the text in the datasets and tokenize them. Then, it builds a vocabulary from the tokens using build_vocab_from_iterator() with min_freq, padded, and unknown as arguments. It sets the default index to the index of the unknown token. Finally, it returns the vocabulary. The code then calls build_vocab() with train_dataset and test_dataset as datasets, MIN_FREQ, PADDED, and UNKNOWN as input arguments and assigns the returned vocabulary to vocab.

* `set_default_index(vocab[unknown])` means that if a word is not present in the vocabulary, its index will be the same as the index of the unknown token.

In [5]:
def build_vocab(datasets, min_freq, padded, unknown):
    vocab = build_vocab_from_iterator(tokenize(datasets), min_freq=min_freq, specials=[padded, unknown])
    vocab.set_default_index(vocab[unknown])
    return vocab

def tokenize(datasets):
    for dataset in datasets:
        for _, text in dataset:
            yield tokenizer(text)

vocab = build_vocab([train_dataset, test_dataset], MIN_FREQ, PADDED, UNKNOWN)
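To make the roles of `min_freq`, the special tokens and the default index more concrete, here is a simplified stand-in written with the standard library only. `TinyVocab` is a hypothetical class and not torchtext's implementation (for instance, its ordering of regular tokens is an assumption); it only imitates the lookup behaviour described above.

```python
from collections import Counter


class TinyVocab:
    """Simplified stand-in for a vocabulary with a default (unknown) index."""

    def __init__(self, token_lists, min_freq, specials):
        counts = Counter(tok for tokens in token_lists for tok in tokens)
        kept = sorted(t for t, c in counts.items() if c >= min_freq)  # drop rare tokens
        self.itos = list(specials) + kept                              # specials come first
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}
        self.default_index = None

    def set_default_index(self, index):
        self.default_index = index

    def __getitem__(self, token):
        if token in self.stoi:
            return self.stoi[token]
        if self.default_index is None:
            raise KeyError(token)
        return self.default_index  # unseen or rare token

    def __call__(self, tokens):
        return [self[t] for t in tokens]


corpus = [["stocks", "rise"], ["stocks", "fall"], ["team", "wins"]]
vocab_demo = TinyVocab(corpus, min_freq=2, specials=["<PAD>", "<UNK>"])
vocab_demo.set_default_index(vocab_demo["<UNK>"])
ids = vocab_demo(["stocks", "rise", "tomorrow"])
print(ids)  # [2, 1, 1]
```

Only "stocks" reaches the frequency threshold here, so "rise" (too rare) and "tomorrow" (never seen) both map to the index of `<UNK>`.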

## **6. RNN & LSTM models trained on AGNTC dataset with sequence length of 25 words**

The setup_model function sets up a classification model. It takes in the model type, the output classes, the vocab and other parameters such as embedding_dim, hidden_dim, num_layers, bidirectional, learning_rate, embeddings, and freeze. It returns the classifier, the cross-entropy loss function and the Adam optimizer.

The RNN_model class is defined as a subclass of nn.Module and has a constructor that sets up the model architecture. The RNN_model constructor uses the get_directions function to determine the size of the hidden state based on whether or not the RNN is bidirectional. The get_directions function returns 2 if the RNN is bidirectional and 1 otherwise, which is used to compute the size of the hidden state in the linear layer. This ensures that the output of the RNN can be fed into the linear layer correctly, regardless of whether or not the RNN is bidirectional.

* `hidden_size` is set to the product of hidden_dim and the number of directions, which is either 1 or 2 depending on the bidirectional parameter. This is because in a bidirectional RNN, the number of hidden units in the forward and backward directions are added together to obtain the total number of hidden units, whereas in a unidirectional RNN, there is only one set of hidden units.

* `nn.Linear` is then defined with an input size equal to hidden_dim times the number of directions, and an output size of output_dim. This linear layer is used to map the final hidden state of the RNN to the output classes.

The forward function takes in a batch of input data X_batch and passes it through the model. The input data is first passed through an embedding layer to transform it into a dense vector representation. This embedding is then fed into an RNN layer, which processes the input data sequence and produces output at each time step. The output of the RNN is concatenated and passed through a linear layer to produce the final output logits, which are then passed through a softmax function to generate class probabilities. The final probabilities are returned as the output of the forward pass.

* `output_concat` is built by concatenating two slices of `output` along the last dimension. The output tensor has shape (batch_size, sequence_length, hidden_dim*num_directions), where the first hidden_dim features of each time step come from the forward direction and the remaining ones from the backward direction. Because `self.hidden_size` already equals hidden_dim*num_directions, `output[:, :, :self.hidden_size]` covers the whole feature dimension and `output[:, :, self.hidden_size:]` is empty, so `output_concat` holds the same values as `output`.

* the `:` in `output_concat[:, -1, :]` means that we include all elements along the first and third dimensions of the tensor (i.e., every sample in the batch and every feature of the RNN output). The -1 means that we only take the last element along the second dimension (i.e., the RNN output at the last time step of the sequence).
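To see what the two slices in `forward` select, the sketch below uses a plain Python list as a stand-in for the feature dimension of a single time step. This is an assumption made so that the example runs without PyTorch; slicing a tensor along one dimension follows the same rules. `hidden_dim` is set to 3 purely for illustration.

```python
hidden_dim = 3          # illustrative value, the notebook uses HIDDEN_DIM = 64
num_directions = 2      # bidirectional case
hidden_size = hidden_dim * num_directions  # same formula as self.hidden_size

# features of one time step: forward part first, backward part second
step_output = ["f0", "f1", "f2", "b0", "b1", "b2"]

first_slice = step_output[:hidden_size]    # counterpart of output[:, :, :self.hidden_size]
second_slice = step_output[hidden_size:]   # counterpart of output[:, :, self.hidden_size:]
concatenated = first_slice + second_slice

forward_part = step_output[:hidden_dim]    # forward-direction features only
backward_part = step_output[hidden_dim:]   # backward-direction features only

print(concatenated == step_output)  # True
print(second_slice)                 # []
print(forward_part, backward_part)  # ['f0', 'f1', 'f2'] ['b0', 'b1', 'b2']
```

Splitting at `hidden_dim` rather than at `hidden_size` is what separates the two directions.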

### **6.1. Unidirectional RNN model with 1 layer**

**6.1.1. Set-up model:** An instance of the RNN_model is created and passed as an argument to setup_model function along with other hyperparameters, to set up a specific configuration of the model - in this case: unidirectional RNN model with 1 layer.

In [6]:
def setup_model(device, model, classes, vocab, embedding_dim, hidden_dim, num_layers, bidirectional, learning_rate, embeddings, freeze):
    classifier = model(len(vocab), embedding_dim, hidden_dim, num_layers, bidirectional, len(classes), embeddings, freeze).to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam([param for param in classifier.parameters() if param.requires_grad == True],lr=learning_rate)
    return classifier, loss_fn, optimizer
  
class RNN_model(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, num_layers, bidirectional, output_dim, none, freeze):
        super(RNN_model, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_layers, bidirectional=bidirectional, batch_first=True)
        self.hidden_size = hidden_dim * get_directions(bidirectional)
        self.linear = nn.Linear(hidden_dim * get_directions(bidirectional), output_dim)
    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings)
        output_concat = torch.cat([output[:, :, :self.hidden_size], output[:, :, self.hidden_size:]], dim=2) # concatenates outputs
        logits = self.linear(output_concat[:, -1, :]) # the last output of the concatenated RNN is used for sequence classification
        probs = F.softmax(logits, dim=1)
        return probs

def get_directions(bidirectional):
    return 2 if bidirectional else 1

classifier, loss_fn, optimizer = setup_model(device, RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, False, LEARNING_RATE, None, None)
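`nn.CrossEntropyLoss` is documented as expecting unnormalized logits, because it applies a log-softmax internally. Since `forward` already returns softmax probabilities, the loss receives values confined to [0, 1], which bounds how low it can go. The following sketch, written in plain Python as an illustration rather than a change to the notebook, computes that lower bound for four classes. `cross_entropy_from_scores` is a hypothetical helper.

```python
import math


def cross_entropy_from_scores(scores, target):
    """Cross-entropy of softmax(scores) against the target index, for one sample."""
    log_norm = math.log(sum(math.exp(s) for s in scores))
    return log_norm - scores[target]


# Best case when the scores are already probabilities: a perfect one-hot prediction.
perfect_probs = [1.0, 0.0, 0.0, 0.0]
loss_floor = cross_entropy_from_scores(perfect_probs, target=0)

# The same prediction expressed as confident raw logits gives a loss near zero.
confident_logits = [10.0, 0.0, 0.0, 0.0]
loss_logits = cross_entropy_from_scores(confident_logits, target=0)

print(round(loss_floor, 3))  # 0.744
print(loss_logits < 0.001)   # True
```

A floor of roughly 0.744 is consistent with the training losses in this notebook levelling off around 0.84. Predictions taken with `argmax` are unaffected, since softmax preserves the ordering of the scores.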

The train_model function trains the classifier using the specified loss function and optimizer over a given number of epochs. For every batch in the training loader, it computes the loss and updates the model weights using backpropagation. At the end of each epoch it prints the mean of the batch losses, and it finally returns the average time taken per epoch.

**6.1.2. Train the model:** The train_model function is called with the specified parameters. The time_cost variable stores the average time taken for each epoch.

In [7]:
def train_model(classifier, loss_fn, optimizer, train_loader, epochs):
    times = []
    for i in range(1, epochs+1):
        classifier.train()
        print('\033[1mEpoch\033[0m:',i)
        losses = []
        start_time = time.time()
        for X, Y in tqdm(train_loader):
            Y_preds = classifier(X)
            loss = loss_fn(Y_preds, Y)
            losses.append(loss.item())
            optimizer.zero_grad(); loss.backward(); optimizer.step()
        epoch_time = time.time() - start_time
        times.append(epoch_time)
        print("\033[1mTrain Loss\033[0m: {:.3f}\n".format(torch.tensor(losses).mean()))
    return sum(times)/len(times)

time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
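The value printed as the train loss is the unweighted mean of the per-batch losses. When the last batch is smaller than the others, this differs slightly from the mean over individual samples. A minimal sketch with made-up numbers (not results from this notebook):

```python
batch_losses = [0.90, 0.80, 0.40]  # hypothetical mean loss of each batch
batch_sizes = [1024, 1024, 192]    # hypothetical sizes, the last batch is smaller

# what train_model reports: every batch counts equally
mean_of_batches = sum(batch_losses) / len(batch_losses)

# per-sample mean: every batch is weighted by its number of samples
weighted_sum = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
per_sample_mean = weighted_sum / sum(batch_sizes)

print(round(mean_of_batches, 3))  # 0.7
print(round(per_sample_mean, 3))  # 0.811
```

With many batches per epoch the gap between the two figures is much smaller than in this three-batch example.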

Epoch: 1
100%|██████████| 118/118 [00:06<00:00, 18.55it/s]
Train Loss: 1.293

Epoch: 2
100%|██████████| 118/118 [00:04<00:00, 27.82it/s]
Train Loss: 1.059

Epoch: 3
100%|██████████| 118/118 [00:04<00:00, 23.94it/s]
Train Loss: 0.970

Epoch: 4
100%|██████████| 118/118 [00:04<00:00, 26.19it/s]
Train Loss: 0.931

Epoch: 5
100%|██████████| 118/118 [00:04<00:00, 28.56it/s]
Train Loss: 0.908

Epoch: 6
100%|██████████| 118/118 [00:05<00:00, 23.21it/s]
Train Loss: 0.890

Epoch: 7
100%|██████████| 118/118 [00:04<00:00, 27.47it/s]
Train Loss: 0.878

Epoch: 8
100%|██████████| 118/118 [00:04<00:00, 28.06it/s]
Train Loss: 0.869

Epoch: 9
100%|██████████| 118/118 [00:05<00:00, 22.45it/s]
Train Loss: 0.862

Epoch: 10
100%|██████████| 118/118 [00:04<00:00, 28.46it/s]
Train Loss: 0.858

Epoch: 11
100%|██████████| 118/118 [00:04<00:00, 25.36it/s]
Train Loss: 0.850

Epoch: 12
100%|██████████| 118/118 [00:04<00:00, 25.69it/s]
Train Loss: 0.846

Epoch: 13
100%|██████████| 118/118 [00:04<00:00, 27.77it/s]
Train Loss: 0.842

Epoch: 14
100%|██████████| 118/118 [00:05<00:00, 23.12it/s]
Train Loss: 0.839

Epoch: 15
100%|██████████| 118/118 [00:04<00:00, 28.06it/s]
Train Loss: 0.837

The evaluate_model function sets the classifier to evaluation mode and then evaluates the performance of the model on the test data. It uses the classifier to make predictions on each test batch and computes the loss for that batch. The actual and predicted labels are then concatenated and converted to numpy arrays using detach(), cpu() and numpy(). The function then detects misclassified data using a helper function called detect_misclassification. Finally, the function prints the accuracy of the model on the test data, the classification report and the confusion matrix. It then returns the mean loss, the actual and predicted labels, and the misclassified data.

The detect_misclassification() function takes in three arguments: test_data, which is a dictionary containing the features and labels of the test dataset, Y_actual, which is an array of the true labels of the test data, and Y_preds, which is an array of the predicted labels of the test data. The function iterates over each data point in the test dataset and compares the true label with the predicted label. If they differ, it appends the corresponding text and predicted label, as a tuple, to the list stored under the true label in the misclass_data dictionary, which is returned at the end.

The to_dict() function takes in a list of tuples, where each tuple contains a label and a feature, and returns a dictionary with two keys: features and labels. The features key holds a list of all the features in the tuples, and the labels key holds a list of all the labels in the tuples.

**6.1.3. Evaluate the model:** The evaluate_model function is called with the necessary parameters. The returned values are stored in variables for later use.

In [8]:
def evaluate_model(classes, classifier, loss_fn, test_loader, test_data):
    classifier.eval()
    with torch.no_grad():  # during evaluation we don't update the model's parameters
        Y_actual, Y_preds, losses = [],[],[]
        for X, Y in test_loader:
            preds = classifier(X)
            loss = loss_fn(preds, Y)
            losses.append(loss.item())
            Y_actual.append(Y)
            Y_preds.append(preds.argmax(dim=-1))
        Y_actual, Y_preds = torch.cat(Y_actual), torch.cat(Y_preds)
        Y_actual, Y_preds = Y_actual.detach().cpu().numpy(), Y_preds.detach().cpu().numpy()
        misclass_data = detect_misclassification(test_data, Y_actual, Y_preds)
        print("\033[1mTest Accuracy\033[0m: {:.3f}\n".format(accuracy_score(Y_actual, Y_preds)))
        print("\033[1mClassification Report:\033[0m\n", classification_report(Y_actual, Y_preds, target_names=classes))
        print("\033[1mConfusion Matrix:\033[0m\n", confusion_matrix(Y_actual, Y_preds))
    return torch.tensor(losses).mean(), Y_actual, Y_preds, misclass_data      

def detect_misclassification(test_data, Y_actual, Y_preds):
    misclass_data = defaultdict(list)
    for i in range(len(Y_actual)):
        true_label = Y_actual[i]
        predicted_label = Y_preds[i]
        if true_label != predicted_label:
            text = test_data["features"][i]
            misclass_data[true_label].append((text, predicted_label))
    return misclass_data

def to_dict(tuples_list):
    return {'features': [d[1] for d in tuples_list], 'labels': [d[0] for d in tuples_list]}

_, Y_actual, Y_preds, misclass_data_1UniRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))

Test Accuracy: 0.862

Classification Report:
               precision    recall  f1-score   support

       World       0.89      0.86      0.87      1900
      Sports       0.92      0.92      0.92      1900
    Business       0.83      0.81      0.82      1900
    Sci/Tech       0.81      0.85      0.83      1900

    accuracy                           0.86      7600
   macro avg       0.86      0.86      0.86      7600
weighted avg       0.86      0.86      0.86      7600

Confusion Matrix:
 [[1637   85   91   87]
 [  46 1753   74   27]
 [  66   35 1543  256]
 [  94   28  158 1620]]
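The figures in the classification report can be recomputed from the confusion matrix alone. The sketch below copies the matrix printed above and assumes scikit-learn's convention that rows are true classes and columns are predicted classes; the variable names are illustrative.

```python
confusion = [
    [1637, 85, 91, 87],    # true World
    [46, 1753, 74, 27],    # true Sports
    [66, 35, 1543, 256],   # true Business
    [94, 28, 158, 1620],   # true Sci/Tech
]
class_names = ["World", "Sports", "Business", "Sci/Tech"]
n = len(confusion)

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(n))  # diagonal = correct predictions
accuracy = correct / total

# recall: correct predictions divided by the row sum (all samples of that true class)
recall = [confusion[i][i] / sum(confusion[i]) for i in range(n)]
# precision: correct predictions divided by the column sum (all samples predicted as that class)
precision = [confusion[i][i] / sum(row[i] for row in confusion) for i in range(n)]

print(f"accuracy: {accuracy:.3f}")  # accuracy: 0.862
for name, p, r in zip(class_names, precision, recall):
    print(f"{name:>8}: precision={p:.2f} recall={r:.2f}")
```

The off-diagonal entries also show where the model struggles: 256 Business articles were predicted as Sci/Tech and 158 Sci/Tech articles as Business, the two largest confusions in the matrix.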


The function count_parameters calculates the number of trainable parameters in a PyTorch model. The accuracies, parameters, and time_costs lists are then updated with the accuracy score, parameter count, and time cost of the model. As we'll see, these values will be stored for each model separately and will later be used to create a pretty table for comparison.

In [9]:
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)
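The number returned by `count_parameters` can also be derived by hand from the layer shapes. The sketch below is an illustration under stated assumptions: `recurrent_params` and `classifier_params` are hypothetical helpers, the formula follows the parameter layout of `nn.RNN` and `nn.LSTM` (input weights, recurrent weights and two bias vectors per direction, times four gates for an LSTM), and `VOCAB_SIZE` is a made-up value since the real vocabulary size is not shown here.

```python
def recurrent_params(input_size, hidden_dim, num_layers, bidirectional, gates=1):
    """Trainable parameters of a stacked recurrent layer (gates=1 for RNN, gates=4 for LSTM)."""
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # deeper layers receive the concatenated outputs of all directions
        layer_input = input_size if layer == 0 else hidden_dim * directions
        # per direction: input weights + recurrent weights + two bias vectors
        per_direction = gates * (hidden_dim * layer_input + hidden_dim * hidden_dim + 2 * hidden_dim)
        total += directions * per_direction
    return total


def classifier_params(vocab_size, embedding_dim, hidden_dim, num_layers, bidirectional, num_classes, gates=1):
    """Embedding + recurrent layers + final linear layer."""
    directions = 2 if bidirectional else 1
    embedding = vocab_size * embedding_dim
    recurrent = recurrent_params(embedding_dim, hidden_dim, num_layers, bidirectional, gates)
    linear = hidden_dim * directions * num_classes + num_classes  # weights + bias
    return embedding + recurrent + linear


VOCAB_SIZE = 20000  # hypothetical vocabulary size, for illustration only
uni_rnn = classifier_params(VOCAB_SIZE, 100, 64, 1, False, 4)
print(recurrent_params(100, 64, 1, False))           # 10624
print(recurrent_params(100, 64, 2, True))            # 46080
print(recurrent_params(100, 64, 1, False, gates=4))  # 42496
print(uni_rnn)                                       # 2010884
```

Under these assumptions the embedding table accounts for almost all of the parameters, so the differences between the recurrent architectures are comparatively small in the total count.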

### **6.2. Bidirectional RNN model with 1 layer**

We now apply the same procedure to another model, specifically a bidirectional RNN with one layer. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [10]:
classifier, loss_fn, optimizer = setup_model(device, RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1BiRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1
100%|██████████| 118/118 [00:04<00:00, 25.70it/s]
Train Loss: 1.305

Epoch: 2
100%|██████████| 118/118 [00:05<00:00, 21.78it/s]
Train Loss: 1.076

Epoch: 3
100%|██████████| 118/118 [00:04<00:00, 26.36it/s]
Train Loss: 0.979

Epoch: 4
100%|██████████| 118/118 [00:05<00:00, 20.41it/s]
Train Loss: 0.934

Epoch: 5
100%|██████████| 118/118 [00:04<00:00, 26.35it/s]
Train Loss: 0.911

Epoch: 6
100%|██████████| 118/118 [00:04<00:00, 25.18it/s]
Train Loss: 0.893

Epoch: 7
100%|██████████| 118/118 [00:05<00:00, 21.75it/s]
Train Loss: 0.881

Epoch: 8
100%|██████████| 118/118 [00:04<00:00, 26.59it/s]
Train Loss: 0.873

Epoch: 9
100%|██████████| 118/118 [00:05<00:00, 21.31it/s]
Train Loss: 0.865

Epoch: 10
100%|██████████| 118/118 [00:04<00:00, 26.36it/s]
Train Loss: 0.859

Epoch: 11
100%|██████████| 118/118 [00:04<00:00, 26.36it/s]
Train Loss: 0.854

Epoch: 12
100%|██████████| 118/118 [00:05<00:00, 21.06it/s]
Train Loss: 0.849

Epoch: 13
100%|██████████| 118/118 [00:04<00:00, 26.42it/s]
Train Loss: 0.845

Epoch: 14
100%|██████████| 118/118 [00:05<00:00, 21.86it/s]
Train Loss: 0.842

Epoch: 15
100%|██████████| 118/118 [00:04<00:00, 26.23it/s]
Train Loss: 0.839

Test Accuracy: 0.864

Classification Report:
               precision    recall  f1-score   support

       World       0.86      0.88      0.87      1900
      Sports       0.91      0.94      0.92      1900
    Business       0.86      0.80      0.83      1900
    Sci/Tech       0.83      0.85      0.84      1900

    accuracy                           0.86      7600
   macro avg       0.86      0.86      0.86      7600
weighted avg       0.86      0.86      0.86      7600

Confusion Matrix:
 [[1669   88   82   61]
 [  61 1778    8   53]
 [ 136   26 1514  224]
 [  73   55  165 1607]]


### **6.3. Bidirectional RNN model with 2 layers**

We now apply the same procedure to another model, specifically a bidirectional RNN with two layers. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [11]:
classifier, loss_fn, optimizer = setup_model(device, RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 2, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_2BiRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1
100%|██████████| 118/118 [00:04<00:00, 24.60it/s]
Train Loss: 1.255

Epoch: 2
100%|██████████| 118/118 [00:05<00:00, 20.79it/s]
Train Loss: 1.052

Epoch: 3
100%|██████████| 118/118 [00:04<00:00, 23.68it/s]
Train Loss: 0.986

Epoch: 4
100%|██████████| 118/118 [00:05<00:00, 20.49it/s]
Train Loss: 0.962

Epoch: 5
100%|██████████| 118/118 [00:04<00:00, 24.51it/s]
Train Loss: 0.938

Epoch: 6
100%|██████████| 118/118 [00:05<00:00, 21.60it/s]
Train Loss: 0.922

Epoch: 7
100%|██████████| 118/118 [00:05<00:00, 22.48it/s]
Train Loss: 0.916

Epoch: 8
100%|██████████| 118/118 [00:04<00:00, 24.91it/s]
Train Loss: 0.919

Epoch: 9
100%|██████████| 118/118 [00:05<00:00, 20.22it/s]
Train Loss: 0.899

Epoch: 10
100%|██████████| 118/118 [00:05<00:00, 22.96it/s]
Train Loss: 0.889

Epoch: 11
100%|██████████| 118/118 [00:05<00:00, 20.12it/s]
Train Loss: 0.889

Epoch: 12
100%|██████████| 118/118 [00:04<00:00, 24.42it/s]
Train Loss: 0.883

Epoch: 13
100%|██████████| 118/118 [00:05<00:00, 21.43it/s]
Train Loss: 0.880

Epoch: 14
100%|██████████| 118/118 [00:05<00:00, 22.58it/s]
Train Loss: 0.875

Epoch: 15
100%|██████████| 118/118 [00:04<00:00, 24.13it/s]
Train Loss: 0.878

Test Accuracy: 0.845

Classification Report:
               precision    recall  f1-score   support

       World       0.86      0.87      0.86      1900
      Sports       0.92      0.94      0.93      1900
    Business       0.81      0.75      0.78      1900
    Sci/Tech       0.79      0.82      0.80      1900

    accuracy                           0.85      7600
   macro avg       0.84      0.85      0.84      7600
weighted avg       0.84      0.85      0.84      7600

Confusion Matrix:
 [[1649   72  103   76]
 [  74 1788   16   22]
 [ 120   32 1428  320]
 [  72   57  212 1559]]


The LSTM_model class is defined as a subclass of nn.Module and has a constructor that sets up the model architecture. The LSTM_model constructor uses the get_directions function to determine the size of the hidden state based on whether or not the LSTM is bidirectional. The get_directions function returns 2 if the LSTM is bidirectional and 1 otherwise, which is used to compute the size of the hidden state in the linear layer. This ensures that the output of the LSTM can be fed into the linear layer correctly, regardless of whether or not the LSTM is bidirectional.

* `hidden_size` is set to the product of hidden_dim and the number of directions, which is either 1 or 2 depending on the bidirectional parameter. This is because in a bidirectional LSTM, the number of hidden units in the forward and backward directions are added together to obtain the total number of hidden units, whereas in a unidirectional LSTM, there is only one set of hidden units.

* `nn.Linear` is then defined with an input size equal to hidden_dim times the number of directions, and an output size of output_dim. This linear layer is used to map the final hidden state of the LSTM to the output classes.

The forward function takes in a batch of input data X_batch and passes it through the model. The input data is first passed through an embedding layer to transform it into a dense vector representation. This embedding is then fed into an LSTM layer, which processes the input data sequence, produces the output at each time step and updates the hidden and cell state. The output of the LSTM is concatenated and passed through a linear layer to produce the final output logits, which are then passed through a softmax function to generate class probabilities. The final probabilities are returned as the output of the forward pass.

* `output_concat` is built by concatenating two slices of `output` along the last dimension. The output tensor has shape (batch_size, sequence_length, hidden_dim*num_directions), where the first hidden_dim features of each time step come from the forward direction and the remaining ones from the backward direction. Because `self.hidden_size` already equals hidden_dim*num_directions, `output[:, :, :self.hidden_size]` covers the whole feature dimension and `output[:, :, self.hidden_size:]` is empty, so `output_concat` holds the same values as `output`.

* the `:` in `output_concat[:, -1, :]` means that we include all elements along the first and third dimensions of the tensor (i.e., every sample in the batch and every feature of the LSTM output). The -1 means that we only take the last element along the second dimension (i.e., the LSTM output at the last time step of the sequence).

### **6.4. Unidirectional LSTM model with 1 layer**

An instance of the LSTM_model is created and passed as an argument to setup_model function along with other hyperparameters, to set up a specific configuration of the model - in this case: unidirectional LSTM model with 1 layer. The train_model function is called with the specified parameters. The time_cost variable stores the average time taken for each epoch. The evaluate_model function is called with the necessary parameters. The returned values are stored in variables for later use. The accuracies, parameters, and time_costs lists are then updated with the accuracy score, parameter count, and time cost of the model. As we'll see, these values will be stored for each model separately and will later be used to create a pretty table for comparison.

In [12]:
class LSTM_model(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, num_layers, bidirectional, output_dim, none, freeze):
        # none and freeze are accepted but not used by this model
        super(LSTM_model, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_layers, bidirectional=bidirectional, batch_first=True)
        self.hidden_size = hidden_dim * get_directions(bidirectional) # total feature size across directions
        self.linear = nn.Linear(hidden_dim * get_directions(bidirectional), output_dim)
    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch) # (batch_size, sequence_length, embedding_dim)
        output, (hidden, cell) = self.lstm(embeddings) # output: (batch_size, sequence_length, self.hidden_size)
        output_concat = torch.cat([output[:, :, :self.hidden_size], output[:, :, self.hidden_size:]], dim=2) # first slice spans all features and the second is empty, so this equals output
        logits = self.linear(output_concat[:, -1, :]) # the output at the last time step is used for sequence classification
        probs = F.softmax(logits, dim=1) # class probabilities
        return probs

classifier, loss_fn, optimizer = setup_model(device, LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, False, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1UniLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1

100%|██████████| 118/118 [00:05<00:00, 21.02it/s]

Train Loss: 1.252

Epoch: 2

100%|██████████| 118/118 [00:04<00:00, 25.70it/s]

Train Loss: 0.979

Epoch: 3

100%|██████████| 118/118 [00:05<00:00, 20.94it/s]

Train Loss: 0.912

Epoch: 4

100%|██████████| 118/118 [00:04<00:00, 25.58it/s]

Train Loss: 0.882

Epoch: 5

100%|██████████| 118/118 [00:04<00:00, 25.43it/s]

Train Loss: 0.863

Epoch: 6

100%|██████████| 118/118 [00:05<00:00, 20.74it/s]

Train Loss: 0.851

Epoch: 7

100%|██████████| 118/118 [00:04<00:00, 25.80it/s]

Train Loss: 0.842

Epoch: 8

100%|██████████| 118/118 [00:05<00:00, 21.43it/s]

Train Loss: 0.835

Epoch: 9

100%|██████████| 118/118 [00:04<00:00, 24.50it/s]

Train Loss: 0.828

Epoch: 10

100%|██████████| 118/118 [00:05<00:00, 22.89it/s]

Train Loss: 0.824

Epoch: 11

100%|██████████| 118/118 [00:05<00:00, 22.21it/s]

Train Loss: 0.820

Epoch: 12

100%|██████████| 118/118 [00:04<00:00, 25.03it/s]

Train Loss: 0.817

Epoch: 13

100%|██████████| 118/118 [00:05<00:00, 20.89it/s]

Train Loss: 0.814

Epoch: 14

100%|██████████| 118/118 [00:04<00:00, 25.11it/s]

Train Loss: 0.812

Epoch: 15

100%|██████████| 118/118 [00:05<00:00, 19.67it/s]

Train Loss: 0.810

Test Accuracy: 0.884

Classification Report:
               precision    recall  f1-score   support

       World       0.90      0.88      0.89      1900
      Sports       0.94      0.95      0.94      1900
    Business       0.86      0.83      0.85      1900
    Sci/Tech       0.84      0.89      0.86      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600

Confusion Matrix:
 [[1663   59   98   80]
 [  44 1796   27   33]
 [  75   32 1578  215]
 [  67   25  124 1684]]
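
The figures in the classification report can be recomputed from the confusion matrix above. The sketch below assumes the scikit-learn convention that rows are true classes and columns are predicted classes, in the order World, Sports, Business, Sci/Tech; the helper name `per_class_metrics` is hypothetical.

```python
# Confusion matrix of the unidirectional LSTM, copied from the output above.
# Assumed layout: rows are true classes, columns are predicted classes.
confusion = [
    [1663, 59, 98, 80],
    [44, 1796, 27, 33],
    [75, 32, 1578, 215],
    [67, 25, 124, 1684],
]
class_names = ["World", "Sports", "Business", "Sci/Tech"]

def per_class_metrics(matrix):
    """Return (accuracy, precisions, recalls) computed from a square confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    # Precision divides the diagonal by the column sum (everything predicted as class i).
    precisions = [matrix[i][i] / sum(matrix[r][i] for r in range(n)) for i in range(n)]
    # Recall divides the diagonal by the row sum (everything that truly is class i).
    recalls = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    return correct / total, precisions, recalls

accuracy, precisions, recalls = per_class_metrics(confusion)
print(f"accuracy: {accuracy:.4f}")
for name, p, r in zip(class_names, precisions, recalls):
    print(f"{name:>8}  precision={p:.2f}  recall={r:.2f}")
```

Under that assumption, the rounded values agree with the report: for example, Business is confused with Sci/Tech far more often than with any other class (215 of its 1900 test texts).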


### **6.5. Bidirectional LSTM model with 1 layer**

We now apply the same procedure to another model, specifically a bidirectional LSTM with one layer. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [13]:
classifier, loss_fn, optimizer = setup_model(device, LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1BiLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1

100%|██████████| 118/118 [00:05<00:00, 21.73it/s]

Train Loss: 1.258

Epoch: 2

100%|██████████| 118/118 [00:06<00:00, 18.74it/s]

Train Loss: 0.980

Epoch: 3

100%|██████████| 118/118 [00:05<00:00, 21.30it/s]

Train Loss: 0.914

Epoch: 4

100%|██████████| 118/118 [00:06<00:00, 19.51it/s]

Train Loss: 0.885

Epoch: 5

100%|██████████| 118/118 [00:05<00:00, 20.94it/s]

Train Loss: 0.867

Epoch: 6

100%|██████████| 118/118 [00:05<00:00, 21.00it/s]

Train Loss: 0.855

Epoch: 7

100%|██████████| 118/118 [00:06<00:00, 19.42it/s]

Train Loss: 0.844

Epoch: 8

100%|██████████| 118/118 [00:05<00:00, 21.39it/s]

Train Loss: 0.836

Epoch: 9

100%|██████████| 118/118 [00:06<00:00, 18.88it/s]

Train Loss: 0.831

Epoch: 10

100%|██████████| 118/118 [00:05<00:00, 21.63it/s]

Train Loss: 0.826

Epoch: 11

100%|██████████| 118/118 [00:06<00:00, 18.65it/s]

Train Loss: 0.821

Epoch: 12

100%|██████████| 118/118 [00:05<00:00, 21.44it/s]

Train Loss: 0.817

Epoch: 13

100%|██████████| 118/118 [00:06<00:00, 18.57it/s]

Train Loss: 0.814

Epoch: 14

100%|██████████| 118/118 [00:05<00:00, 21.92it/s]

Train Loss: 0.812

Epoch: 15

100%|██████████| 118/118 [00:06<00:00, 18.98it/s]

Train Loss: 0.810

Test Accuracy: 0.882

Classification Report:
               precision    recall  f1-score   support

       World       0.90      0.88      0.89      1900
      Sports       0.93      0.94      0.94      1900
    Business       0.85      0.83      0.84      1900
    Sci/Tech       0.84      0.87      0.86      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600

Confusion Matrix:
 [[1676   63   88   73]
 [  43 1788   43   26]
 [  73   36 1579  212]
 [  74   30  137 1659]]


### **6.6. Bidirectional LSTM model with 2 layers**

We now apply the same procedure to another model, specifically a bidirectional LSTM with two layers. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [14]:
classifier, loss_fn, optimizer = setup_model(device, LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 2, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_2BiLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1

100%|██████████| 118/118 [00:06<00:00, 18.96it/s]

Train Loss: 1.191

Epoch: 2

100%|██████████| 118/118 [00:06<00:00, 16.94it/s]

Train Loss: 0.951

Epoch: 3

100%|██████████| 118/118 [00:06<00:00, 19.00it/s]

Train Loss: 0.899

Epoch: 4

100%|██████████| 118/118 [00:06<00:00, 17.04it/s]

Train Loss: 0.873

Epoch: 5

100%|██████████| 118/118 [00:06<00:00, 19.04it/s]

Train Loss: 0.857

Epoch: 6

100%|██████████| 118/118 [00:06<00:00, 16.97it/s]

Train Loss: 0.846

Epoch: 7

100%|██████████| 118/118 [00:06<00:00, 17.23it/s]

Train Loss: 0.838

Epoch: 8

100%|██████████| 118/118 [00:06<00:00, 18.39it/s]

Train Loss: 0.832

Epoch: 9

100%|██████████| 118/118 [00:07<00:00, 16.85it/s]

Train Loss: 0.827

Epoch: 10

100%|██████████| 118/118 [00:06<00:00, 19.20it/s]

Train Loss: 0.826

Epoch: 11

100%|██████████| 118/118 [00:07<00:00, 16.47it/s]

Train Loss: 0.820

Epoch: 12

100%|██████████| 118/118 [00:06<00:00, 19.06it/s]

Train Loss: 0.818

Epoch: 13

100%|██████████| 118/118 [00:06<00:00, 16.95it/s]

Train Loss: 0.814

Epoch: 14

100%|██████████| 118/118 [00:06<00:00, 19.40it/s]

Train Loss: 0.811

Epoch: 15

100%|██████████| 118/118 [00:07<00:00, 16.73it/s]

Train Loss: 0.811

Test Accuracy: 0.881

Classification Report:
               precision    recall  f1-score   support

       World       0.91      0.87      0.89      1900
      Sports       0.94      0.93      0.94      1900
    Business       0.84      0.86      0.85      1900
    Sci/Tech       0.84      0.86      0.85      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600

Confusion Matrix:
 [[1652   59  110   79]
 [  31 1775   35   59]
 [  65   27 1627  181]
 [  64   26  171 1639]]


## **7. Visualize the performance metrics of the models**

The visualize function takes four arguments: models, accuracies, parameters, and time_costs. It creates a PrettyTable object with four columns, sets the column names to Model, Accuracy, Parameters, and Time Cost, and then adds rows to the table based on the values in the input lists. Specifically, for each model in the models list, the function retrieves the corresponding accuracy, parameters, and time cost and adds them to a new row in the table. The code then calls this visualize function to create and print this table.

In [15]:
def visualize(models, accuracies, parameters, time_costs):
    # four-column table; the ANSI codes render the header names in bold
    pt = PrettyTable(field_names=[f"\033[1m{field}\033[0m" for field in ["Model", "Accuracy", "Parameters", "Time Cost"]])
    for i, model in enumerate(models): # one row per model, with rounded accuracy and time cost
        pt.add_row([model, round(accuracies[i], 4), parameters[i], round(time_costs[i], 4)])
    print(pt)

visualize(models, accuracies, parameters, time_costs)

+-----------+----------+------------+-----------+
|   Model   | Accuracy | Parameters | Time Cost |
+-----------+----------+------------+-----------+
|  1Uni-RNN |  0.8622  |  2136284   |   4.6727  |
|  1Bi-RNN  |  0.8642  |  2147164   |   4.9244  |
|  2Bi-RNN  |  0.8453  |  2171996   |   5.2599  |
| 1Uni-LSTM |  0.8843  |  2168156   |   5.1549  |
|  1Bi-LSTM |  0.8818  |  2210908   |   5.852   |
|  2Bi-LSTM |  0.8807  |  2310236   |   6.6313  |
+-----------+----------+------------+-----------+
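
The Parameters column can be checked by hand. The sketch below counts the weights of an embedding layer, a stacked recurrent layer and a linear classifier, using the layout PyTorch uses for nn.RNN and nn.LSTM (per direction and layer: an input weight matrix, a hidden weight matrix and two bias vectors, repeated once for an RNN and four times for the LSTM gates). The vocabulary size, embedding size and hidden size are assumptions: they were back-solved from the table so that the counts match, and are not read from the notebook's configuration.

```python
def recurrent_params(input_size, hidden, layers, bidirectional, gates):
    """Weights of a stacked recurrent layer:
    per direction and layer, gates * (in_features*hidden + hidden*hidden + 2*hidden)."""
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(layers):
        # Layers after the first receive the concatenated outputs of all directions.
        in_features = input_size if layer == 0 else hidden * directions
        total += directions * gates * (in_features * hidden + hidden * hidden + 2 * hidden)
    return total

def model_params(vocab_size, embedding_dim, hidden, layers, bidirectional, gates, num_classes):
    """Embedding + recurrent stack + linear classifier (weights and bias)."""
    directions = 2 if bidirectional else 1
    embedding = vocab_size * embedding_dim
    recurrent = recurrent_params(embedding_dim, hidden, layers, bidirectional, gates)
    linear = hidden * directions * num_classes + num_classes
    return embedding + recurrent + linear

# Assumed values, back-solved from the table rather than taken from the notebook.
ASSUMED_VOCAB_SIZE, ASSUMED_EMBEDDING_DIM, ASSUMED_HIDDEN_DIM, NUM_CLASSES = 21254, 100, 64, 4

configs = [  # (name, layers, bidirectional, gates: 1 for RNN, 4 for LSTM)
    ("1Uni-RNN", 1, False, 1),
    ("1Bi-RNN", 1, True, 1),
    ("2Bi-RNN", 2, True, 1),
    ("1Uni-LSTM", 1, False, 4),
    ("1Bi-LSTM", 1, True, 4),
    ("2Bi-LSTM", 2, True, 4),
]
counts = {
    name: model_params(ASSUMED_VOCAB_SIZE, ASSUMED_EMBEDDING_DIM, ASSUMED_HIDDEN_DIM,
                       layers, bidirectional, gates, NUM_CLASSES)
    for name, layers, bidirectional, gates in configs
}
for name, count in counts.items():
    print(f"{name:>9}: {count}")
```

With these assumed sizes the embedding layer alone accounts for 2,125,400 parameters, which would explain why all six models have similar totals even though the LSTM layers are several times larger than the RNN layers.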


`####################################################`

## **8. Analyze the results of the models**

The analyze_results function analyzes the results of a classification task, given the classes, the models used and the misclassified texts of each model. It creates a dictionary, common_misclass_data, that stores the texts misclassified by all the models, keyed by true label. For each such text, it stores the text along with the labels predicted by all the models in a list. It then calls the helper functions count_times(), get_top_pair() and get_random_text() to further analyze the misclassifications.

* `labels` is a list that stores the predicted labels for a single text misclassified by the first model. The first element in labels is the label predicted by the first model, and the remaining elements are either the labels predicted by the other models, if they also misclassified that text, or an empty string otherwise.

* `(text, labels)` is appended to the list stored under the key true_label in the dictionary common_misclass_data only if all(labels) is true, i.e., no element in labels is an empty string (or any other falsy value). Otherwise, the conditional expression evaluates to None and nothing is appended. The reason we do this is to identify common text misclassifications across all models.
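
One thing to watch for with this filter: `all()` treats every falsy value as a failure, not just the empty-string placeholder. The sketch below assumes that predicted labels are zero-based integer class indices, which the `classes[label]` lookup in to_category suggests but which this section does not show directly. Both function names are hypothetical.

```python
def keep_if_all_truthy(labels):
    """Mirror of the filter described above: keep the text only when all(labels) is true."""
    return all(labels)

def keep_if_none_missing(labels, missing=''):
    """Hypothetical alternative: compare each label against the placeholder explicitly."""
    return all(label != missing for label in labels)

# Hypothetical predictions of three models for one text; '' marks "classified correctly".
every_model_wrong = [2, 3, 2]
one_model_right = [2, '', 2]
wrong_with_class_zero = [2, 0, 2]  # every model wrong, one of them predicted class 0

print(keep_if_all_truthy(every_model_wrong))       # kept
print(keep_if_all_truthy(one_model_right))         # dropped, as intended
print(keep_if_all_truthy(wrong_with_class_zero))   # dropped, because 0 is falsy
print(keep_if_none_missing(wrong_with_class_zero)) # kept by the explicit comparison
```

If the labels are indeed zero-based, any text for which some model predicted the first class would be filtered out. That would be consistent with the frequency table printed in this section's output, where World never appears as a predicted label.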


The count_times function takes as input the common_misclass_data dictionary and the list of classes that contains all the possible labels in the dataset. It first counts the number of misclassified texts for each true label by computing the length of the corresponding dictionary value (which is a list of tuples where each tuple contains a misclassified text and its predicted labels from all models). Then, it creates a pretty table to display the number of common misclassified texts for each true label across all models.

The get_top_pair function takes as input the common_misclass_data dictionary and the list of classes that contains all the possible labels in the dataset. It then iterates through each true label and the corresponding misclassified texts, and counts the number of times each pair of true label and predicted label occurs in the list of misclassified texts. It then determines the pair with the highest count and prints it as the most common misclassification pair. Finally, the function creates a PrettyTable object to display the frequencies of all the misclassification pairs, sorted in descending order by frequency.
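
The counting described above can be sketched on a toy dictionary with `collections.Counter`. The data and the function name `pair_frequencies` are made up for illustration; the notebook itself uses a defaultdict and numeric labels.

```python
from collections import Counter

# Hypothetical stand-in for common_misclass_data:
# true label -> list of (text, labels predicted by each of three models).
toy_common_misclass = {
    "Business": [
        ("text a", ["Sci/Tech", "Sci/Tech", "World"]),
        ("text b", ["Sci/Tech", "Sci/Tech", "Sci/Tech"]),
    ],
    "World": [
        ("text c", ["Business", "Sports", "Business"]),
    ],
}

def pair_frequencies(common_misclass):
    """Count (true label, predicted label) pairs, one count per model prediction."""
    counts = Counter()
    for true_label, entries in common_misclass.items():
        for _text, predicted in entries:
            counts.update((true_label, p) for p in predicted)
    return counts

freqs = pair_frequencies(toy_common_misclass)
top_pair, top_count = freqs.most_common(1)[0]
print("most common pair:", top_pair, top_count)
print("all pairs, descending:", freqs.most_common())
print("total counts:", sum(freqs.values()))
```

Because every text contributes one count per model, the frequencies add up to the number of common texts times the number of models. In this section's output, the per-class counts sum to 244 texts and the Frequency column sums to 1464, which is 244 × 6.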

The get_random_text function takes as input the list of all the RNN and LSTM models we used, the common_misclass_data dictionary and the list of classes that contains all the possible labels in the dataset. It randomly selects a misclassified text and displays it along with its true label. It then shows the predictions made by each model in a table. The purpose of this function is to allow the user to see an example of a misclassified text and the various predictions made by the models for that text.

The to_category function takes a numerical label as input, along with a list of classes, and returns the corresponding string label for that numerical value.

In [16]:
def analyze_results(classes, models, misclassified):
    common_misclass_data = defaultdict(list)
    for true_label in misclassified[0].keys():
        for text, label in misclassified[0][true_label]:
            # label predicted by every other model for this text, or '' if that model did not misclassify it
            labels = [label] + [next((l for t, l in model[true_label] if t == text), '') for model in misclassified[1:]]
            # keep the text only if every entry is truthy; all() is false for '' and also for a label equal to 0
            if all(labels):
                common_misclass_data[true_label].append((text, labels))
    count_times(common_misclass_data, classes)
    get_top_pair(common_misclass_data, classes)
    get_random_text(models, common_misclass_data, classes)

def count_times(common_misclass_data, classes):
    # number of commonly misclassified texts per true label
    misclass_counts = {true_label: len(misclass_tuples) for true_label, misclass_tuples in common_misclass_data.items()}
    pt = PrettyTable(field_names=[f"\033[1m{field}\033[0m" for field in ["True Label", "Misclassified Texts"]])
    for true_label, count in misclass_counts.items():
        pt.add_row([to_category(true_label, classes), count])
    print("\033[1mCommon Misclassified Texts per Class:\033[0m")
    print(pt)

def get_top_pair(common_misclass_data, classes):
    # one count per (true label, predicted label) pair and per model prediction
    misclass_freqs = defaultdict(int)
    for true_label, values in common_misclass_data.items():
        for text, pred_labels in values:
            for pl in pred_labels:
                misclass_freqs[(true_label, pl)] += 1
    max_tuple, max_count = max(misclass_freqs.items(), key=lambda x: x[1])
    sorted_tuples = sorted(misclass_freqs.items(), key=lambda x: x[1], reverse=True)
    print(f"\n\033[1mMost common Misclassification Pair:\033[0m ({to_category(max_tuple[0], classes)}, {to_category(max_tuple[1], classes)})")
    pt = PrettyTable(field_names=[f"\033[1m{field}\033[0m" for field in ["True Label", "Predicted Label", "Frequency"]])
    for tup, count in sorted_tuples:
        pt.add_row([to_category(tup[0], classes), to_category(tup[1], classes), count])
    print(pt)

def get_random_text(models, common_misclass_data, classes):
    # pick a random true label, then a random commonly misclassified text with that label
    rand_true_label = random.choice(list(common_misclass_data.keys()))
    rand_misclass_tuple = random.choice(common_misclass_data[rand_true_label])
    print("\n\033[1m" + "Random Text: " + "\033[0m" + rand_misclass_tuple[0] + "\033[1m" + "\nTrue Label: " + "\033[0m" + to_category(rand_true_label, classes))
    pt = PrettyTable(field_names=[f"\033[1m{field}\033[0m" for field in ["Model", "Prediction"]])
    for idx, model in enumerate(models):
        pt.add_row([model, to_category(rand_misclass_tuple[1][idx], classes)])
    print(pt)

def to_category(label, classes):
    # map a numerical label to its class name
    return classes[label]

analyze_results(classes, models, [misclass_data_1UniRNN, misclass_data_1BiRNN, misclass_data_2BiRNN, misclass_data_1UniLSTM, misclass_data_1BiLSTM, misclass_data_2BiLSTM])

Common Misclassified Texts per Class:
+------------+---------------------+
| True Label | Misclassified Texts |
+------------+---------------------+
|  Sci/Tech  |          45         |
|  Business  |          83         |
|   World    |         110         |
|   Sports   |          6          |
+------------+---------------------+

Most common Misclassification Pair: (Business, Sci/Tech)
+------------+-----------------+-----------+
| True Label | Predicted Label | Frequency |
+------------+-----------------+-----------+
|  Business  |     Sci/Tech    |    461    |
|   World    |     Business    |    280    |
|  Sci/Tech  |     Business    |    247    |
|   World    |      Sports     |    216    |
|   World    |     Sci/Tech    |    164    |
|  Business  |      Sports     |     37    |
|   Sports   |     Business    |     25    |
|  Sci/Tech  |      Sports     |     23    |
|   Sports   |     Sci/Tech    |     11    |
+------------+-----------------+-----------+

`####################################################`

## **9. RNN & LSTM models trained on AGNTC dataset with sequence length of 50 words**

This is a variation of the previous implementation (see: Unit 6), where the MAX_WORDS hyperparameter is now set to 50 instead of 25. This means that only the first 50 words of each text in our dataset are used as input to the model. The train_loader and test_loader are generated using the generate_loader function, which takes in the train_dataset and test_dataset, along with the MAX_WORDS and BATCH_SIZE hyperparameters. The last argument specifies whether the data is used for training (value: True) or testing (value: False), respectively.
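
The effect of MAX_WORDS can be illustrated with a small sketch. The implementation of generate_loader is not shown in this section, so the right-padding and the padding id of 0 used here are assumptions, and `pad_or_truncate` is a hypothetical helper rather than a function from the notebook.

```python
def pad_or_truncate(token_ids, max_words, pad_id=0):
    """Return exactly max_words ids: cut long sequences, right-pad short ones with pad_id."""
    clipped = token_ids[:max_words]
    return clipped + [pad_id] * (max_words - len(clipped))

short_text = [5, 8, 13]           # a 3-token text (made-up ids)
long_text = list(range(1, 61))    # a 60-token text (made-up ids)

print(pad_or_truncate(short_text, 5))            # padded up to 5 ids
print(len(pad_or_truncate(long_text, 50)))       # cut down to 50 ids
print(pad_or_truncate(long_text, 50)[-1])        # last kept id is the 50th token
```

Under these assumptions, doubling MAX_WORDS from 25 to 50 means longer texts keep more of their content, while shorter texts carry more padding that the recurrent layers still have to process.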

In [17]:
MAX_WORDS = 50
train_loader, test_loader = generate_loader(train_dataset, MAX_WORDS, BATCH_SIZE, True), generate_loader(test_dataset, MAX_WORDS, BATCH_SIZE, False)
accuracies = []; parameters = []; time_costs = []

### **9.1. Unidirectional RNN model with 1 layer**

The RNN_model class is passed to the setup_model function along with the other hyperparameters, to set up a specific configuration of the model; in this case, a unidirectional RNN with 1 layer. The train_model function is then called with the specified parameters and returns the average time taken per epoch, which is stored in the time_cost variable. Next, the evaluate_model function is called, and the values it returns are stored in variables for later use. Finally, the accuracies, parameters, and time_costs lists are updated with the accuracy score, parameter count, and time cost of the model. As we'll see, these values are stored for each model separately and are later used to create a pretty table for comparison.

In [18]:
classifier, loss_fn, optimizer = setup_model(device, RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, False, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1UniRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1

100%|██████████| 118/118 [00:04<00:00, 24.78it/s]

Train Loss: 1.379

Epoch: 2

100%|██████████| 118/118 [00:05<00:00, 20.65it/s]

Train Loss: 1.346

Epoch: 3

100%|██████████| 118/118 [00:04<00:00, 25.45it/s]

Train Loss: 1.348

Epoch: 4

100%|██████████| 118/118 [00:05<00:00, 21.16it/s]

Train Loss: 1.342

Epoch: 5

100%|██████████| 118/118 [00:04<00:00, 24.62it/s]

Train Loss: 1.357

Epoch: 6

100%|██████████| 118/118 [00:04<00:00, 24.82it/s]

Train Loss: 1.360

Epoch: 7

100%|██████████| 118/118 [00:06<00:00, 18.48it/s]

Train Loss: 1.357

Epoch: 8

100%|██████████| 118/118 [00:04<00:00, 24.98it/s]

Train Loss: 1.347

Epoch: 9

100%|██████████| 118/118 [00:05<00:00, 20.73it/s]

Train Loss: 1.334

Epoch: 10

100%|██████████| 118/118 [00:04<00:00, 24.80it/s]

Train Loss: 1.341

Epoch: 11

100%|██████████| 118/118 [00:08<00:00, 13.87it/s]

Train Loss: 1.333

Epoch: 12

100%|██████████| 118/118 [00:06<00:00, 17.07it/s]

Train Loss: 1.286

Epoch: 13

100%|██████████| 118/118 [00:05<00:00, 21.08it/s]

Train Loss: 1.329

Epoch: 14

100%|██████████| 118/118 [00:04<00:00, 24.65it/s]

Train Loss: 1.343

Epoch: 15

100%|██████████| 118/118 [00:05<00:00, 20.73it/s]

Train Loss: 1.338

Test Accuracy: 0.342

Classification Report:
               precision    recall  f1-score   support

       World       0.40      0.18      0.25      1900
      Sports       0.36      0.64      0.46      1900
    Business       0.30      0.46      0.37      1900
    Sci/Tech       0.37      0.08      0.13      1900

    accuracy                           0.34      7600
   macro avg       0.36      0.34      0.30      7600
weighted avg       0.36      0.34      0.30      7600

Confusion Matrix:
 [[ 347  512  975   66]
 [  94 1218  489   99]
 [ 272  651  881   96]
 [ 154 1038  554  154]]


### **9.2. Bidirectional RNN model with 1 layer**

We now apply the same procedure to another model, specifically a bidirectional RNN with one layer. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [19]:
classifier, loss_fn, optimizer = setup_model(device, RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1BiRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1

100%|██████████| 118/118 [00:07<00:00, 16.17it/s]

Train Loss: 1.374

Epoch: 2

100%|██████████| 118/118 [00:06<00:00, 18.92it/s]

Train Loss: 1.354

Epoch: 3

100%|██████████| 118/118 [00:05<00:00, 22.46it/s]

Train Loss: 1.330

Epoch: 4

100%|██████████| 118/118 [00:06<00:00, 18.73it/s]

Train Loss: 1.305

Epoch: 5

100%|██████████| 118/118 [00:05<00:00, 22.36it/s]

Train Loss: 1.282

Epoch: 6

100%|██████████| 118/118 [00:06<00:00, 19.08it/s]

Train Loss: 1.312

Epoch: 7

100%|██████████| 118/118 [00:05<00:00, 21.77it/s]

Train Loss: 1.344

Epoch: 8

100%|██████████| 118/118 [00:05<00:00, 20.41it/s]

Train Loss: 1.272

Epoch: 9

100%|██████████| 118/118 [00:05<00:00, 20.53it/s]

Train Loss: 1.283

Epoch: 10

100%|██████████| 118/118 [00:06<00:00, 19.44it/s]

Train Loss: 1.218

Epoch: 11

100%|██████████| 118/118 [00:05<00:00, 20.26it/s]

Train Loss: 1.179

Epoch: 12

100%|██████████| 118/118 [00:05<00:00, 21.99it/s]

Train Loss: 1.160

Epoch: 13

100%|██████████| 118/118 [00:06<00:00, 19.14it/s]

Train Loss: 1.118

Epoch: 14

100%|██████████| 118/118 [00:05<00:00, 22.62it/s]

Train Loss: 1.100

Epoch: 15

100%|██████████| 118/118 [00:06<00:00, 18.93it/s]

Train Loss: 1.129

Test Accuracy: 0.614

Classification Report:
               precision    recall  f1-score   support

       World       0.56      0.86      0.67      1900
      Sports       0.68      0.75      0.71      1900
    Business       0.67      0.49      0.56      1900
    Sci/Tech       0.58      0.36      0.45      1900

    accuracy                           0.61      7600
   macro avg       0.62      0.61      0.60      7600
weighted avg       0.62      0.61      0.60      7600

Confusion Matrix:
 [[1629  124  111   36]
 [ 139 1423   34  304]
 [ 709  102  928  161]
 [ 450  445  315  690]]


### **9.3. Bidirectional RNN model with 2 layers**

We now apply the same procedure to another model, specifically a bidirectional RNN with two layers. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [20]:
classifier, loss_fn, optimizer = setup_model(device, RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 2, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_2BiRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1

100%|██████████| 118/118 [00:06<00:00, 19.62it/s]

Train Loss: 1.378

Epoch: 2

100%|██████████| 118/118 [00:07<00:00, 16.77it/s]

Train Loss: 1.333

Epoch: 3

100%|██████████| 118/118 [00:06<00:00, 19.08it/s]

Train Loss: 1.343

Epoch: 4

100%|██████████| 118/118 [00:06<00:00, 17.36it/s]

Train Loss: 1.293

Epoch: 5

100%|██████████| 118/118 [00:06<00:00, 19.06it/s]

Train Loss: 1.271

Epoch: 6

100%|██████████| 118/118 [00:06<00:00, 17.41it/s]

Train Loss: 1.266

Epoch: 7

100%|██████████| 118/118 [00:06<00:00, 18.15it/s]

Train Loss: 1.283

Epoch: 8

100%|██████████| 118/118 [00:06<00:00, 18.44it/s]

Train Loss: 1.280

Epoch: 9

100%|██████████| 118/118 [00:06<00:00, 17.18it/s]

Train Loss: 1.281

Epoch: 10

100%|██████████| 118/118 [00:06<00:00, 19.44it/s]

Train Loss: 1.251

Epoch: 11

100%|██████████| 118/118 [00:06<00:00, 17.43it/s]

Train Loss: 1.269

Epoch: 12

100%|██████████| 118/118 [00:06<00:00, 19.56it/s]

Train Loss: 1.283

Epoch: 13

100%|██████████| 118/118 [00:06<00:00, 17.50it/s]

Train Loss: 1.257

Epoch: 14

100%|██████████| 118/118 [00:06<00:00, 19.29it/s]

Train Loss: 1.248

Epoch: 15

100%|██████████| 118/118 [00:06<00:00, 17.25it/s]

Train Loss: 1.275

Test Accuracy: 0.456

Classification Report:
               precision    recall  f1-score   support

       World       0.51      0.54      0.52      1900
      Sports       0.82      0.31      0.45      1900
    Business       0.44      0.36      0.40      1900
    Sci/Tech       0.35      0.61      0.45      1900

    accuracy                           0.46      7600
   macro avg       0.53      0.46      0.45      7600
weighted avg       0.53      0.46      0.45      7600

Confusion Matrix:
 [[1031   69  330  470]
 [ 267  581  164  888]
 [ 430   17  689  764]
 [ 304   42  392 1162]]


### **9.4. Unidirectional LSTM model with 1 layer**

We now apply the same procedure to another model, specifically a unidirectional LSTM with one layer. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [21]:
classifier, loss_fn, optimizer = setup_model(device, LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, False, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1UniLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1

100%|██████████| 118/118 [00:05<00:00, 20.77it/s]

Train Loss: 1.320

Epoch: 2

100%|██████████| 118/118 [00:06<00:00, 17.88it/s]

Train Loss: 1.097

Epoch: 3

100%|██████████| 118/118 [00:05<00:00, 21.12it/s]

Train Loss: 0.995

Epoch: 4

100%|██████████| 118/118 [00:06<00:00, 18.51it/s]

Train Loss: 0.946

Epoch: 5

100%|██████████| 118/118 [00:05<00:00, 20.72it/s]

Train Loss: 0.921

Epoch: 6

100%|██████████| 118/118 [00:06<00:00, 18.34it/s]

Train Loss: 0.912

Epoch: 7

100%|██████████| 118/118 [00:05<00:00, 20.73it/s]

Train Loss: 0.896

Epoch: 8

100%|██████████| 118/118 [00:06<00:00, 18.49it/s]

Train Loss: 0.882

Epoch: 9

100%|██████████| 118/118 [00:05<00:00, 21.18it/s]

Train Loss: 0.873

Epoch: 10

100%|██████████| 118/118 [00:06<00:00, 18.50it/s]

Train Loss: 0.866

Epoch: 11

100%|██████████| 118/118 [00:05<00:00, 20.72it/s]

Train Loss: 0.858

Epoch: 12

100%|██████████| 118/118 [00:06<00:00, 18.28it/s]

Train Loss: 0.858

Epoch: 13

100%|██████████| 118/118 [00:05<00:00, 21.30it/s]

Train Loss: 0.847

Epoch: 14

100%|██████████| 118/118 [00:06<00:00, 17.88it/s]

Train Loss: 0.842

Epoch: 15

100%|██████████| 118/118 [00:05<00:00, 21.30it/s]

Train Loss: 0.838

Test Accuracy: 0.884

Classification Report:
               precision    recall  f1-score   support

       World       0.93      0.85      0.89      1900
      Sports       0.93      0.96      0.95      1900
    Business       0.86      0.83      0.85      1900
    Sci/Tech       0.82      0.89      0.85      1900

    accuracy                           0.88      7600
   macro avg       0.89      0.88      0.88      7600
weighted avg       0.89      0.88      0.88      7600

Confusion Matrix:
 [[1615   81  123   81]
 [  14 1828    6   52]
 [  62   20 1582  236]
 [  45   35  126 1694]]


### **9.5. Bidirectional LSTM model with 1 layer**

We now apply the same procedure to another model, specifically a bidirectional LSTM with one layer. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [22]:
classifier, loss_fn, optimizer = setup_model(device, LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1BiLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:07<00:00, 15.88it/s]


[1mTrain Loss[0m: 1.338

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:06<00:00, 18.53it/s]


[1mTrain Loss[0m: 1.078

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:07<00:00, 16.55it/s]


[1mTrain Loss[0m: 0.963

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:06<00:00, 18.37it/s]


[1mTrain Loss[0m: 0.920

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:07<00:00, 16.41it/s]


[1mTrain Loss[0m: 0.898

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:06<00:00, 18.59it/s]


[1mTrain Loss[0m: 0.887

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:07<00:00, 16.27it/s]


[1mTrain Loss[0m: 0.872

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:06<00:00, 16.94it/s]


[1mTrain Loss[0m: 0.861

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:06<00:00, 17.91it/s]


[1mTrain Loss[0m: 0.856

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:07<00:00, 16.27it/s]


[1mTrain Loss[0m: 0.856

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:06<00:00, 18.75it/s]


[1mTrain Loss[0m: 0.848

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:07<00:00, 16.38it/s]


[1mTrain Loss[0m: 0.844

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:06<00:00, 18.34it/s]


[1mTrain Loss[0m: 0.844

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:07<00:00, 16.53it/s]


[1mTrain Loss[0m: 0.844

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:06<00:00, 18.34it/s]


[1mTrain Loss[0m: 0.840

[1mTest Accuracy[0m: 0.878

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.92      0.85      0.89      1900
      Sports       0.89      0.97      0.93      1900
    Business       0.85      0.82      0.83      1900
    Sci/Tech       0.85      0.87      0.86      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600

[1mConfusion Matrix:[0m
 [[1617  108  118   57]
 [  15 1848   15   22]
 [  69   59 1551  221]
 [  48   64  131 1657]]


### **9.6. Bidirectional LSTM model with 2 layers**

We now apply the same procedure to another model, specifically a bidirectional LSTM with two layers. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [23]:
classifier, loss_fn, optimizer = setup_model(device, LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 2, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_2BiLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:07<00:00, 15.34it/s]


[1mTrain Loss[0m: 1.353

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:07<00:00, 15.33it/s]


[1mTrain Loss[0m: 1.265

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:06<00:00, 17.39it/s]


[1mTrain Loss[0m: 1.048

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:07<00:00, 15.32it/s]


[1mTrain Loss[0m: 0.953

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:07<00:00, 16.54it/s]


[1mTrain Loss[0m: 0.907

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:07<00:00, 15.79it/s]


[1mTrain Loss[0m: 0.886

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:07<00:00, 15.47it/s]


[1mTrain Loss[0m: 0.870

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:06<00:00, 17.09it/s]


[1mTrain Loss[0m: 0.859

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:07<00:00, 15.23it/s]


[1mTrain Loss[0m: 0.852

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:06<00:00, 16.93it/s]


[1mTrain Loss[0m: 0.844

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:07<00:00, 15.79it/s]


[1mTrain Loss[0m: 0.840

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:07<00:00, 15.49it/s]


[1mTrain Loss[0m: 0.837

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:06<00:00, 17.33it/s]


[1mTrain Loss[0m: 0.832

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:07<00:00, 15.38it/s]


[1mTrain Loss[0m: 0.827

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:06<00:00, 17.26it/s]


[1mTrain Loss[0m: 0.825

[1mTest Accuracy[0m: 0.882

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.90      0.87      0.89      1900
      Sports       0.91      0.97      0.94      1900
    Business       0.87      0.81      0.84      1900
    Sci/Tech       0.85      0.87      0.86      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600

[1mConfusion Matrix:[0m
 [[1658   95   84   63]
 [  22 1836   11   31]
 [ 104   39 1547  210]
 [  53   47  139 1661]]
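
The figures in the classification report can be recomputed by hand from the confusion matrix. The sketch below does this for the matrix printed just above, assuming rows are true classes and columns are predicted classes (each row sums to the support of 1900, which is consistent with that reading). The variable names are illustrative and not part of the notebook.

```python
# Confusion matrix of the 2-layer bidirectional LSTM, copied from the output above.
# Assumption: rows are true classes, columns are predicted classes.
classes = ["World", "Sports", "Business", "Sci/Tech"]
confusion = [
    [1658, 95, 84, 63],
    [22, 1836, 11, 31],
    [104, 39, 1547, 210],
    [53, 47, 139, 1661],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(classes)))
accuracy = correct / total

# Recall: correct predictions divided by the row sum (all true members of the class).
recall = [confusion[i][i] / sum(confusion[i]) for i in range(len(classes))]
# Precision: correct predictions divided by the column sum (all predictions of the class).
precision = [
    confusion[i][i] / sum(row[i] for row in confusion) for i in range(len(classes))
]

print(f"accuracy = {accuracy:.4f}")
for name, p, r in zip(classes, precision, recall):
    print(f"{name:>9}  precision = {p:.2f}  recall = {r:.2f}")
```

The off-diagonal entries show where the errors concentrate: the largest one (210) is Business text predicted as Sci/Tech.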


## **10. Visualize the performance metrics of the models**

The visualize function is called to generate and display a table that compares the accuracy, parameter count and time cost of all the models trained above.

In [24]:
visualize(models, accuracies, parameters, time_costs)

+-----------+----------+------------+-----------+
|   Model   | Accuracy | Parameters | Time Cost |
+-----------+----------+------------+-----------+
|  1Uni-RNN |  0.3421  |  2136284   |   5.561   |
|  1Bi-RNN  |  0.6145  |  2147164   |   5.8997  |
|  2Bi-RNN  |  0.4557  |  2171996   |   6.497   |
| 1Uni-LSTM |  0.8841  |  2168156   |   6.0233  |
|  1Bi-LSTM |  0.878   |  2210908   |   6.8385  |
|  2Bi-LSTM |  0.8818  |  2310236   |   7.3513  |
+-----------+----------+------------+-----------+
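
The Parameters column can be reproduced with a little arithmetic. The sketch below is a minimal, hedged reconstruction: it assumes that count_parameters (defined outside this section) counts every weight of the embedding, recurrent and linear layers, and it uses a vocabulary size of 21254, an embedding dimension of 100, a hidden size of 64 and 4 classes. Those sizes are not stated in this section; they are inferred values, chosen because they reproduce the table exactly. The per-layer formula follows the way nn.RNN and nn.LSTM store an input-to-hidden matrix, a hidden-to-hidden matrix and two bias vectors per gate.

```python
def recurrent_params(input_dim, hidden_dim, num_layers, bidirectional, gates):
    """Parameter count of a stacked RNN (gates=1) or LSTM (gates=4)."""
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers after the first receive the concatenated outputs of both directions.
        layer_input = input_dim if layer == 0 else hidden_dim * directions
        # Input-to-hidden and hidden-to-hidden weights plus two bias vectors, per gate.
        per_direction = gates * (hidden_dim * (layer_input + hidden_dim) + 2 * hidden_dim)
        total += directions * per_direction
    return total


def model_params(vocab_size, embedding_dim, hidden_dim, num_layers,
                 bidirectional, num_classes, gates):
    """Embedding table + recurrent stack + final linear layer."""
    directions = 2 if bidirectional else 1
    embedding = vocab_size * embedding_dim
    recurrent = recurrent_params(embedding_dim, hidden_dim, num_layers, bidirectional, gates)
    linear = hidden_dim * directions * num_classes + num_classes
    return embedding + recurrent + linear


# Assumed sizes (inferred, not stated in this section).
VOCAB, EMB, HID, CLS = 21254, 100, 64, 4

# (num_layers, bidirectional, gates) for each row of the table.
configs = {
    "1Uni-RNN": (1, False, 1),
    "1Bi-RNN": (1, True, 1),
    "2Bi-RNN": (2, True, 1),
    "1Uni-LSTM": (1, False, 4),
    "1Bi-LSTM": (1, True, 4),
    "2Bi-LSTM": (2, True, 4),
}
counts = {
    name: model_params(VOCAB, EMB, HID, layers, bidir, CLS, gates)
    for name, (layers, bidir, gates) in configs.items()
}
for name, count in counts.items():
    print(f"{name:>9}: {count}")
```

Under these assumptions the embedding table accounts for 2,125,400 of the roughly 2.1 to 2.3 million parameters, so the recurrent layers differ far less in size than the totals might suggest.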


`################################################`

## **11. RNN & LSTM models trained on AGNTC dataset with sequence length of 25 words and pre-trained word embeddings (glove.6B.100d)**

This is a variation of the very first implementation (see: Unit 6) that incorporates pre-trained word embeddings (glove.6B.100d). The MAX_WORDS hyperparameter is set to 25, so only the first 25 words of each text in the dataset are used as input to the model. The train_loader and test_loader are generated by the generate_loader function, which takes a dataset (train_dataset or test_dataset) together with the MAX_WORDS and BATCH_SIZE hyperparameters. The last argument specifies whether the data is used for training (True) or testing (False).

In [25]:
MAX_WORDS = 25
train_loader, test_loader = generate_loader(train_dataset, MAX_WORDS, BATCH_SIZE, True), generate_loader(test_dataset, MAX_WORDS, BATCH_SIZE, False)
models = ["1Uni-preRNN", "1Bi-preRNN", "2Bi-preRNN", "1Uni-preLSTM", "1Bi-preLSTM", "2Bi-preLSTM"]; accuracies = []; parameters = []; time_costs = []
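
To make the effect of MAX_WORDS concrete, here is a minimal sketch of clipping a tokenized text to a fixed length. generate_loader is defined outside this section, so the helper name truncate_or_pad, the right-padding behaviour and the padding id of 0 are all assumptions made for illustration, not a description of the notebook's implementation.

```python
def truncate_or_pad(token_ids, max_words, pad_id=0):
    """Keep the first max_words ids and right-pad shorter sequences with pad_id.

    Hypothetical helper: the padding side and pad_id are assumptions.
    """
    clipped = token_ids[:max_words]
    return clipped + [pad_id] * (max_words - len(clipped))


short_text = truncate_or_pad([5, 8, 2], max_words=5)
long_text = truncate_or_pad(list(range(1, 31)), max_words=25)

print(short_text)      # [5, 8, 2, 0, 0]
print(len(long_text))  # 25
print(long_text[-1])   # 25, i.e. tokens 26..30 were dropped
```

Every sequence in a batch ends up with the same length, which is what lets the batch be stacked into a single tensor.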

The load_embeddings function reads pre-trained word embeddings from the file at the given path and keeps only the vectors of words that appear in the vocabulary vocab. It returns a tensor of size len(vocab) × dimension in which each row is the embedding of one vocabulary word; rows for words that do not occur in the file remain all zeros. Here the function is called with "glove.6B.100d.txt" as the path to the pre-trained embeddings, vocab as the vocabulary containing the words for which we need embeddings and EMBEDDING_DIM as the dimension of the word embeddings. The result is assigned to the embeddings variable.

In [26]:
def load_embeddings(path, vocab, dimension):
    # rows of vocabulary words that never appear in the file stay all-zero
    embeddings = torch.zeros(len(vocab), dimension)
    with open(path, 'r', encoding='utf-8') as f:
        for line in f:  # stream the file instead of loading every line into memory
            word, vec = line.strip().split(' ', 1)  # each line reads "word v1 v2 ... vN"
            if word in vocab:
                embeddings[vocab[word]] = torch.tensor([float(x) for x in vec.split()])
    return embeddings

embeddings = load_embeddings("glove.6B.100d.txt", vocab, EMBEDDING_DIM)
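
Because rows without a pre-trained vector stay at zero, it can be useful to know how much of the vocabulary the embedding file covers. The sketch below is one way to measure that. It uses a plain dict and a three-line list as stand-ins for vocab and the GloVe file; the function name embedding_coverage and the toy data are illustrative assumptions.

```python
def embedding_coverage(glove_lines, vocab):
    """Return the fraction of vocabulary words that have a pre-trained vector,
    together with the sorted list of words that do not."""
    found = set()
    for line in glove_lines:
        word = line.split(" ", 1)[0]  # first field of "word v1 v2 ..."
        if word in vocab:
            found.add(word)
    missing = sorted(set(vocab) - found)
    return len(found) / len(vocab), missing


# Toy data in GloVe's text format; real lines carry 100 values each.
toy_glove = ["the 0.1 0.2", "market 0.3 0.4", "game 0.5 0.6"]
toy_vocab = {"<unk>": 0, "the": 1, "market": 2, "nasdaq": 3}

coverage, missing = embedding_coverage(toy_glove, toy_vocab)
print(coverage)  # 0.5
print(missing)   # ['<unk>', 'nasdaq']
```

With frozen embeddings, every missing word shares the same all-zero vector, so the model cannot tell such words apart.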

The pretrained_RNN_model class is defined as a subclass of nn.Module, and its constructor sets up the model architecture. The embedding_layer is defined using nn.Embedding and initialized with the pre-trained embeddings, which are copied into its weight tensor using the copy_ method. The requires_grad attribute of that weight is then set to the value of the freeze argument. The calls below pass freeze = False, so requires_grad is False and the embedding weights stay fixed at their pre-trained values during training (as the code is written, passing freeze = True would make them trainable instead). The constructor uses the get_directions function, which returns 2 if the RNN is bidirectional and 1 otherwise, to compute the input size of the linear layer. This ensures that the output of the RNN can be fed into the linear layer correctly, whether or not the RNN is bidirectional.

* `hidden_size` is set to the product of hidden_dim and the number of directions, which is either 1 or 2 depending on the bidirectional parameter. In a bidirectional RNN the forward and backward hidden states, each of size hidden_dim, are concatenated at every time step, so the output carries 2 × hidden_dim features, whereas a unidirectional RNN has only one set of hidden units.

* `nn.Linear` is then defined with an input size equal to hidden_dim times the number of directions, and an output size of output_dim. This linear layer is used to map the final hidden state of the RNN to the output classes.

The forward function takes in a batch of input data X_batch and passes it through the model. The input is first passed through the embedding layer, which turns each token index into a dense vector. These embeddings are fed into the RNN layer, which processes the sequence and produces an output at each time step. The output at the last time step is passed through the linear layer to produce the logits, which are then passed through a softmax function to generate class probabilities. These probabilities are returned as the output of the forward pass.

* `output_concat` is built from two slices of the RNN output, whose shape is (batch_size, sequence_length, hidden_dim × num_directions). Because self.hidden_size already equals hidden_dim × num_directions, output[:, :, :self.hidden_size] covers every feature of both directions and output[:, :, self.hidden_size:] is empty, so the concatenation returns the output tensor unchanged. In the bidirectional case, the forward states occupy the first hidden_dim features and the backward states the last hidden_dim.

* in `output_concat[:, -1, :]`, the first : keeps every example in the batch, the -1 selects the last position along the second dimension (the last time step of the sequence) and the final : keeps all features of that step. The result has shape (batch_size, hidden_dim × num_directions) and is the input to the linear layer.
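
One consequence of indexing the last time step is worth spelling out for the bidirectional models. In a bidirectional layer the backward direction reads the sequence from right to left, so its state at the last time step has processed only the last token. The toy sketch below illustrates this with running sums standing in for the recurrent update; it is a deliberate simplification and not the notebook's model, and the function name is hypothetical.

```python
def toy_bidirectional_outputs(sequence):
    """Return one (forward_state, backward_state) pair per time step.

    Running sums replace the real recurrent update (illustrative simplification);
    the layout mirrors the per-step output of a bidirectional layer.
    """
    forward, total = [], 0
    for x in sequence:            # left to right
        total += x
        forward.append(total)
    backward, total = [], 0
    for x in reversed(sequence):  # right to left
        total += x
        backward.append(total)
    backward.reverse()            # align backward states with time steps
    return list(zip(forward, backward))


outputs = toy_bidirectional_outputs([1, 2, 3, 4])
print(outputs[-1])  # (10, 4): forward has seen every token, backward only the last
print(outputs[0])   # (1, 10): backward has seen every token at the first step
```

A common alternative is to pair the forward state of the last step with the backward state of the first step, or to use the final hidden states that nn.RNN returns alongside the output. Whether that would change the accuracies reported here has not been tested.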

### **11.1. Unidirectional RNN model with 1 layer**

The pretrained_RNN_model class is passed as an argument to the setup_model function along with the other hyperparameters, to set up a specific configuration of the model - in this case: unidirectional RNN model with 1 layer. The train_model function is called with the specified parameters. The time_cost variable stores the average time taken for each epoch. The evaluate_model function is called with the necessary parameters, and the returned values are stored in variables for later use. The accuracies, parameters, and time_costs lists are then updated with the accuracy score, parameter count, and time cost of the model. As we'll see, these values are stored for each model separately and are later used to create a pretty table for comparison.

In [27]:
class pretrained_RNN_model(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, num_layers, bidirectional, output_dim, embeddings, freeze):
        super(pretrained_RNN_model, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)
        self.embedding_layer.weight.data.copy_(embeddings)
        self.embedding_layer.weight.requires_grad = freeze  # requires_grad takes the value of freeze, so freeze=False keeps the pre-trained weights fixed
        self.rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_layers, bidirectional=bidirectional, batch_first=True)
        self.hidden_size = hidden_dim * get_directions(bidirectional)
        self.linear = nn.Linear(hidden_dim * get_directions(bidirectional), output_dim)
    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, hidden = self.rnn(embeddings)
        output_concat = torch.cat([output[:, :, :self.hidden_size], output[:, :, self.hidden_size:]], dim=2) # self.hidden_size already spans both directions, so the second slice is empty and output_concat equals output
        logits = self.linear(output_concat[:, -1, :]) # the last output of the concatenated RNN is used for sequence classification
        probs = F.softmax(logits, dim=1)
        return probs

classifier, loss_fn, optimizer = setup_model(device, pretrained_RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, False, LEARNING_RATE, embeddings, False)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1UniRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:03<00:00, 30.02it/s]


[1mTrain Loss[0m: 1.061

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:04<00:00, 28.99it/s]


[1mTrain Loss[0m: 0.905

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:04<00:00, 24.22it/s]


[1mTrain Loss[0m: 0.900

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:04<00:00, 29.16it/s]


[1mTrain Loss[0m: 0.887

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:03<00:00, 29.85it/s]


[1mTrain Loss[0m: 0.900

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:04<00:00, 23.90it/s]


[1mTrain Loss[0m: 0.886

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:04<00:00, 29.45it/s]


[1mTrain Loss[0m: 0.889

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:04<00:00, 28.74it/s]


[1mTrain Loss[0m: 0.890

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:04<00:00, 23.80it/s]


[1mTrain Loss[0m: 0.882

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:03<00:00, 29.82it/s]


[1mTrain Loss[0m: 0.885

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:04<00:00, 25.84it/s]


[1mTrain Loss[0m: 0.880

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:04<00:00, 26.20it/s]


[1mTrain Loss[0m: 0.893

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:03<00:00, 30.42it/s]


[1mTrain Loss[0m: 0.891

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:04<00:00, 25.41it/s]


[1mTrain Loss[0m: 0.917

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:04<00:00, 28.05it/s]


[1mTrain Loss[0m: 1.006

[1mTest Accuracy[0m: 0.674

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.81      0.85      0.83      1900
      Sports       0.79      0.98      0.88      1900
    Business       0.50      0.86      0.64      1900
    Sci/Tech       1.00      0.00      0.01      1900

    accuracy                           0.67      7600
   macro avg       0.78      0.67      0.59      7600
weighted avg       0.78      0.67      0.59      7600

[1mConfusion Matrix:[0m
 [[1616  150  134    0]
 [  25 1860   15    0]
 [  95  166 1639    0]
 [ 257  165 1471    7]]


### **11.2. Bidirectional RNN model with 1 layer**

We now apply the same procedure to another model, specifically a bidirectional RNN with one layer. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [28]:
classifier, loss_fn, optimizer = setup_model(device, pretrained_RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, True, LEARNING_RATE, embeddings, False)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1BiRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:04<00:00, 27.60it/s]


[1mTrain Loss[0m: 1.076

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:05<00:00, 23.57it/s]


[1mTrain Loss[0m: 0.912

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:04<00:00, 29.28it/s]


[1mTrain Loss[0m: 0.905

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:04<00:00, 28.11it/s]


[1mTrain Loss[0m: 0.897

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:05<00:00, 23.53it/s]


[1mTrain Loss[0m: 0.889

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:04<00:00, 29.21it/s]


[1mTrain Loss[0m: 0.884

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:04<00:00, 28.10it/s]


[1mTrain Loss[0m: 0.883

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:05<00:00, 22.66it/s]


[1mTrain Loss[0m: 0.880

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:04<00:00, 28.31it/s]


[1mTrain Loss[0m: 0.881

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:04<00:00, 24.52it/s]


[1mTrain Loss[0m: 0.877

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:04<00:00, 27.11it/s]


[1mTrain Loss[0m: 0.880

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:04<00:00, 28.90it/s]


[1mTrain Loss[0m: 0.881

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:05<00:00, 22.72it/s]


[1mTrain Loss[0m: 0.885

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:04<00:00, 28.84it/s]


[1mTrain Loss[0m: 0.887

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:04<00:00, 28.92it/s]


[1mTrain Loss[0m: 0.880

[1mTest Accuracy[0m: 0.863

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.91      0.84      0.87      1900
      Sports       0.91      0.97      0.94      1900
    Business       0.86      0.76      0.81      1900
    Sci/Tech       0.78      0.89      0.83      1900

    accuracy                           0.86      7600
   macro avg       0.87      0.86      0.86      7600
weighted avg       0.87      0.86      0.86      7600

[1mConfusion Matrix:[0m
 [[1594   91   89  126]
 [  18 1835   17   30]
 [  85   57 1439  319]
 [  55   37  119 1689]]


### **11.3. Bidirectional RNN model with 2 layers**

We now apply the same procedure to another model, specifically a bidirectional RNN with two layers. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [29]:
classifier, loss_fn, optimizer = setup_model(device, pretrained_RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 2, True, LEARNING_RATE, embeddings, False)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_2BiRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:05<00:00, 22.22it/s]


[1mTrain Loss[0m: 1.013

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:04<00:00, 28.30it/s]


[1mTrain Loss[0m: 0.917

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:04<00:00, 24.39it/s]


[1mTrain Loss[0m: 0.904

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:04<00:00, 24.71it/s]


[1mTrain Loss[0m: 0.926

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:04<00:00, 27.32it/s]


[1mTrain Loss[0m: 0.906

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:05<00:00, 21.51it/s]


[1mTrain Loss[0m: 0.894

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:04<00:00, 27.14it/s]


[1mTrain Loss[0m: 0.928

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:04<00:00, 27.43it/s]


[1mTrain Loss[0m: 0.937

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:05<00:00, 21.91it/s]


[1mTrain Loss[0m: 0.897

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:04<00:00, 27.34it/s]


[1mTrain Loss[0m: 0.886

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:05<00:00, 21.73it/s]


[1mTrain Loss[0m: 0.888

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:04<00:00, 26.54it/s]


[1mTrain Loss[0m: 0.888

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:04<00:00, 26.62it/s]


[1mTrain Loss[0m: 0.911

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:05<00:00, 22.37it/s]


[1mTrain Loss[0m: 0.887

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:04<00:00, 26.79it/s]


[1mTrain Loss[0m: 0.881

[1mTest Accuracy[0m: 0.864

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.90      0.86      0.88      1900
      Sports       0.91      0.96      0.94      1900
    Business       0.80      0.83      0.82      1900
    Sci/Tech       0.85      0.80      0.82      1900

    accuracy                           0.86      7600
   macro avg       0.86      0.86      0.86      7600
weighted avg       0.86      0.86      0.86      7600

[1mConfusion Matrix:[0m
 [[1634   90  120   56]
 [  18 1828   27   27]
 [  84   46 1586  184]
 [  83   43  257 1517]]


The pretrained_LSTM_model class is defined as a subclass of nn.Module, and its constructor sets up the model architecture. The embedding_layer is defined using nn.Embedding and initialized with the pre-trained embeddings, which are copied into its weight tensor using the copy_ method. The requires_grad attribute of that weight is then set to the value of the freeze argument. The calls below pass freeze = False, so requires_grad is False and the embedding weights stay fixed at their pre-trained values during training (as the code is written, passing freeze = True would make them trainable instead). The constructor uses the get_directions function, which returns 2 if the LSTM is bidirectional and 1 otherwise, to compute the input size of the linear layer. This ensures that the output of the LSTM can be fed into the linear layer correctly, whether or not the LSTM is bidirectional.

* `hidden_size` is set to the product of hidden_dim and the number of directions, which is either 1 or 2 depending on the bidirectional parameter. In a bidirectional LSTM the forward and backward hidden states, each of size hidden_dim, are concatenated at every time step, so the output carries 2 × hidden_dim features, whereas a unidirectional LSTM has only one set of hidden units.

* `nn.Linear` is then defined with an input size equal to hidden_dim times the number of directions, and an output size of output_dim. This linear layer is used to map the final hidden state of the LSTM to the output classes.

The forward function takes in a batch of input data X_batch and passes it through the model. The input is first passed through the embedding layer, which turns each token index into a dense vector. These embeddings are fed into the LSTM layer, which processes the sequence, produces an output at each time step and updates the hidden and cell states. The output at the last time step is passed through the linear layer to produce the logits, which are then passed through a softmax function to generate class probabilities. These probabilities are returned as the output of the forward pass.

* `output_concat` is built from two slices of the LSTM output, whose shape is (batch_size, sequence_length, hidden_dim × num_directions). Because self.hidden_size already equals hidden_dim × num_directions, output[:, :, :self.hidden_size] covers every feature of both directions and output[:, :, self.hidden_size:] is empty, so the concatenation returns the output tensor unchanged. In the bidirectional case, the forward states occupy the first hidden_dim features and the backward states the last hidden_dim.

* in `output_concat[:, -1, :]`, the first : keeps every example in the batch, the -1 selects the last position along the second dimension (the last time step of the sequence) and the final : keeps all features of that step. The result has shape (batch_size, hidden_dim × num_directions) and is the input to the linear layer.
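
Returning softmax probabilities from forward has a side effect on the reported loss if loss_fn is nn.CrossEntropyLoss. That is an assumption, since setup_model is defined outside this section. nn.CrossEntropyLoss expects unnormalized logits and applies log-softmax itself, so feeding it probabilities applies softmax twice. The sketch below reimplements that computation in plain Python to show the smallest loss such a setup can reach with four classes; the function name is illustrative.

```python
import math


def cross_entropy_from_scores(scores, target):
    """Cross-entropy that applies log-softmax to `scores` before taking
    the negative log-likelihood of the target class."""
    log_norm = math.log(sum(math.exp(s) for s in scores))
    return log_norm - scores[target]


# Best case: the model's softmax output is a perfect one-hot vector.
perfect_probs = [1.0, 0.0, 0.0, 0.0]
floor = cross_entropy_from_scores(perfect_probs, target=0)

# Uninformed case: uniform probabilities over the four classes.
uniform = cross_entropy_from_scores([0.25] * 4, target=0)

print(round(floor, 3))    # 0.744
print(round(uniform, 3))  # 1.386
```

Under that assumption, the train loss could never fall below roughly 0.744, which would be consistent with the losses in this notebook levelling off in the 0.82 to 0.90 range even for models that reach high test accuracy.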

### **11.4. Unidirectional LSTM model with 1 layer**

The pretrained_LSTM_model class is passed as an argument to the setup_model function along with the other hyperparameters, to set up a specific configuration of the model - in this case: unidirectional LSTM model with 1 layer. The train_model function is called with the specified parameters. The time_cost variable stores the average time taken for each epoch. The evaluate_model function is called with the necessary parameters, and the returned values are stored in variables for later use. The accuracies, parameters, and time_costs lists are then updated with the accuracy score, parameter count, and time cost of the model. As we'll see, these values are stored for each model separately and are later used to create a pretty table for comparison.

In [30]:
class pretrained_LSTM_model(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, num_layers, bidirectional, output_dim, embeddings, freeze):
        super(pretrained_LSTM_model, self).__init__()
        self.embedding_layer = nn.Embedding(num_embeddings=input_dim, embedding_dim=embedding_dim)
        self.embedding_layer.weight.data.copy_(embeddings)
        self.embedding_layer.weight.requires_grad = freeze  # requires_grad takes the value of freeze, so freeze=False keeps the pre-trained weights fixed
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=num_layers, bidirectional=bidirectional, batch_first=True)
        self.hidden_size = hidden_dim * get_directions(bidirectional)
        self.linear = nn.Linear(hidden_dim * get_directions(bidirectional), output_dim)
    def forward(self, X_batch):
        embeddings = self.embedding_layer(X_batch)
        output, (hidden, cell) = self.lstm(embeddings)
        output_concat = torch.cat([output[:, :, :self.hidden_size], output[:, :, self.hidden_size:]], dim=2) # self.hidden_size already spans both directions, so the second slice is empty and output_concat equals output
        logits = self.linear(output_concat[:, -1, :]) # the last output of the concatenated LSTM is used for sequence classification
        probs = F.softmax(logits, dim=1)
        return probs

classifier, loss_fn, optimizer = setup_model(device, pretrained_LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, False, LEARNING_RATE, embeddings, False)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1UniLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:04<00:00, 24.40it/s]


[1mTrain Loss[0m: 1.029

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:04<00:00, 25.79it/s]


[1mTrain Loss[0m: 0.874

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:04<00:00, 27.31it/s]


[1mTrain Loss[0m: 0.864

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:05<00:00, 22.64it/s]


[1mTrain Loss[0m: 0.858

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:04<00:00, 28.07it/s]


[1mTrain Loss[0m: 0.855

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:04<00:00, 27.43it/s]


[1mTrain Loss[0m: 0.852

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:05<00:00, 22.72it/s]


[1mTrain Loss[0m: 0.848

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:04<00:00, 27.38it/s]


[1mTrain Loss[0m: 0.846

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:04<00:00, 25.65it/s]


[1mTrain Loss[0m: 0.844

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:04<00:00, 24.93it/s]


[1mTrain Loss[0m: 0.841

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:04<00:00, 28.31it/s]


[1mTrain Loss[0m: 0.840

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:05<00:00, 22.20it/s]


[1mTrain Loss[0m: 0.838

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:04<00:00, 28.11it/s]


[1mTrain Loss[0m: 0.836

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:04<00:00, 28.47it/s]


[1mTrain Loss[0m: 0.835

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:05<00:00, 22.62it/s]


[1mTrain Loss[0m: 0.834

[1mTest Accuracy[0m: 0.893

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.88      0.90      0.89      1900
      Sports       0.95      0.96      0.96      1900
    Business       0.90      0.81      0.85      1900
    Sci/Tech       0.84      0.90      0.87      1900

    accuracy                           0.89      7600
   macro avg       0.89      0.89      0.89      7600
weighted avg       0.89      0.89      0.89      7600

[1mConfusion Matrix:[0m
 [[1716   58   68   58]
 [  39 1825   19   17]
 [  98   17 1532  253]
 [  86   13   89 1712]]


### **11.5. Bidirectional LSTM model with 1 layer**

We now apply the same procedure to another model, specifically a bidirectional LSTM with one layer. We set up the model using the setup_model() function, train it using the train_model() function, evaluate it using the evaluate_model() function, and then store the accuracy, number of parameters, and time cost for the trained model in the corresponding lists.

In [31]:
classifier, loss_fn, optimizer = setup_model(device, pretrained_LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, True, LEARNING_RATE, embeddings, False)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1BiLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:04<00:00, 24.92it/s]


[1mTrain Loss[0m: 1.034

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:05<00:00, 21.47it/s]


[1mTrain Loss[0m: 0.873

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:04<00:00, 26.26it/s]


[1mTrain Loss[0m: 0.863

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:04<00:00, 26.24it/s]


[1mTrain Loss[0m: 0.857

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:05<00:00, 20.75it/s]


[1mTrain Loss[0m: 0.854

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:04<00:00, 25.82it/s]


[1mTrain Loss[0m: 0.850

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:05<00:00, 21.75it/s]


[1mTrain Loss[0m: 0.848

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:04<00:00, 25.30it/s]


[1mTrain Loss[0m: 0.845

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:04<00:00, 26.13it/s]


[1mTrain Loss[0m: 0.842

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:05<00:00, 21.37it/s]


[1mTrain Loss[0m: 0.841

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:04<00:00, 25.98it/s]


[1mTrain Loss[0m: 0.840

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:05<00:00, 21.56it/s]


[1mTrain Loss[0m: 0.837

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:04<00:00, 25.28it/s]


[1mTrain Loss[0m: 0.836

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:04<00:00, 24.78it/s]


[1mTrain Loss[0m: 0.833

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:05<00:00, 21.47it/s]


[1mTrain Loss[0m: 0.832

[1mTest Accuracy[0m: 0.899

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.92      0.88      0.90      1900
      Sports       0.95      0.97      0.96      1900
    Business       0.86      0.86      0.86      1900
    Sci/Tech       0.86      0.89      0.87      1900

    accuracy                           0.90      7600
   macro avg       0.90      0.90      0.90      7600
weighted avg       0.90      0.90      0.90      7600

[1mConfusion Matrix:[0m
 [[1678   62   94   66]
 [  25 1837   19   19]
 [  65   18 1631  186]
 [  51   11  153 1685]]


### **11.6. Bidirectional LSTM model with 2 layers**

We now apply the same procedure to a bidirectional LSTM with two layers: `setup_model()` configures the model, `train_model()` trains it, and `evaluate_model()` evaluates it. The accuracy, parameter count, and time cost of the trained model are then appended to the corresponding lists.

In [32]:
classifier, loss_fn, optimizer = setup_model(device, pretrained_LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 2, True, LEARNING_RATE, embeddings, False)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_2BiLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:04<00:00, 23.85it/s]


[1mTrain Loss[0m: 1.011

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:05<00:00, 20.07it/s]


[1mTrain Loss[0m: 0.873

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:04<00:00, 24.71it/s]


[1mTrain Loss[0m: 0.868

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:05<00:00, 22.24it/s]


[1mTrain Loss[0m: 0.860

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:05<00:00, 22.92it/s]


[1mTrain Loss[0m: 0.855

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:04<00:00, 25.18it/s]


[1mTrain Loss[0m: 0.851

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:05<00:00, 20.57it/s]


[1mTrain Loss[0m: 0.849

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:04<00:00, 25.02it/s]


[1mTrain Loss[0m: 0.846

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:05<00:00, 21.23it/s]


[1mTrain Loss[0m: 0.844

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:04<00:00, 24.09it/s]


[1mTrain Loss[0m: 0.843

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:04<00:00, 24.51it/s]


[1mTrain Loss[0m: 0.840

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:05<00:00, 20.84it/s]


[1mTrain Loss[0m: 0.839

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:04<00:00, 25.05it/s]


[1mTrain Loss[0m: 0.838

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:05<00:00, 21.03it/s]


[1mTrain Loss[0m: 0.835

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:04<00:00, 24.51it/s]


[1mTrain Loss[0m: 0.835

[1mTest Accuracy[0m: 0.898

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.88      0.91      0.90      1900
      Sports       0.95      0.97      0.96      1900
    Business       0.87      0.85      0.86      1900
    Sci/Tech       0.88      0.86      0.87      1900

    accuracy                           0.90      7600
   macro avg       0.90      0.90      0.90      7600
weighted avg       0.90      0.90      0.90      7600

[1mConfusion Matrix:[0m
 [[1727   62   67   44]
 [  25 1843   20   12]
 [ 103   18 1614  165]
 [  97   13  149 1641]]


## **12. Visualize the performance metrics of the models**

The `visualize` function is called to generate and display a table that compares the accuracy, parameter count, and time cost of all the models trained above.

In [33]:
visualize(models, accuracies, parameters, time_costs)

+--------------+----------+------------+-----------+
|    [1mModel[0m     | [1mAccuracy[0m | [1mParameters[0m | [1mTime Cost[0m |
+--------------+----------+------------+-----------+
| 1Uni-preRNN  |  0.6739  |   10884    |   4.3174  |
|  1Bi-preRNN  |  0.8628  |   21764    |   4.4582  |
|  2Bi-preRNN  |  0.8638  |   46596    |   4.757   |
| 1Uni-preLSTM |  0.8928  |   42756    |   4.6302  |
| 1Bi-preLSTM  |  0.8988  |   85508    |   4.9775  |
| 2Bi-preLSTM  |  0.898   |   184836   |   5.1593  |
+--------------+----------+------------+-----------+
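
The parameter counts in this table can be reproduced by hand, which helps explain why the LSTM rows are roughly four times larger than the matching RNN rows. The sketch below is an illustration under stated assumptions, not the notebook's `count_parameters`: it assumes PyTorch's usual recurrent layout (two weight matrices and two bias vectors per direction and layer), a single linear output layer over 4 classes, an embedding size of 100, a hidden size of 64, and that the embedding weights are not counted. The sizes 100 and 64 are inferred from the numbers in the table rather than read from this section, and the helper names are hypothetical.

```python
def recurrent_params(cell, input_dim, hidden_dim, layers, bidirectional):
    """Count weights and biases of a stacked RNN/LSTM (PyTorch-style layout)."""
    gates = {"rnn": 1, "lstm": 4}[cell]          # an LSTM has 4 gate blocks per cell
    directions = 2 if bidirectional else 1
    total = 0
    layer_input = input_dim
    for _ in range(layers):
        # Per direction: W_ih (gates*h x in), W_hh (gates*h x h), plus two bias vectors.
        per_direction = gates * hidden_dim * (layer_input + hidden_dim + 2)
        total += directions * per_direction
        # The next layer reads the concatenated outputs of all directions.
        layer_input = hidden_dim * directions
    return total


def classifier_params(cell, layers, bidirectional,
                      embedding_dim=100, hidden_dim=64, num_classes=4):
    """Recurrent stack plus one linear output layer (sizes are assumptions)."""
    directions = 2 if bidirectional else 1
    linear = hidden_dim * directions * num_classes + num_classes
    return recurrent_params(cell, embedding_dim, hidden_dim, layers, bidirectional) + linear


configs = {
    "1Uni-preRNN": ("rnn", 1, False),
    "1Bi-preRNN": ("rnn", 1, True),
    "2Bi-preRNN": ("rnn", 2, True),
    "1Uni-preLSTM": ("lstm", 1, False),
    "1Bi-preLSTM": ("lstm", 1, True),
    "2Bi-preLSTM": ("lstm", 2, True),
}
counts = {name: classifier_params(*cfg) for name, cfg in configs.items()}
for name, value in counts.items():
    print(f"{name:>12}: {value}")
```

Under these assumptions the computed values coincide with the Parameters column above, which suggests that the counts in this table cover only the recurrent and output layers.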


`##########################`

## **13. RNN & LSTM models trained on AGNTC dataset with sequence length of 25 words and frozen pre-trained word embeddings (glove-6B100d)**

This is a variation of the previous implementation (see: Unit 11) that incorporates frozen pre-trained word embeddings (glove-6B100d). The embedding layer is frozen so that its weights are not updated during training; the pre-trained word embeddings act as fixed features in the model. In the calls below, the last argument of `setup_model` is set to `True` (it was `False` in Unit 11).

In [34]:
accuracies = []; parameters = []; time_costs = []

### **13.1. Unidirectional RNN model with 1 layer**

`pretrained_RNN_model` is passed to the `setup_model` function along with the other hyperparameters to set up a specific configuration, in this case a unidirectional RNN with 1 layer. `train_model` then trains the classifier and returns `time_cost`, the average time taken per epoch. `evaluate_model` evaluates the classifier on the test set, and its return values are stored for later use. Finally, the `accuracies`, `parameters`, and `time_costs` lists are updated with the model's accuracy score, parameter count, and time cost. These values are stored for each model separately and are later used to build a comparison table.

In [35]:
classifier, loss_fn, optimizer = setup_model(device, pretrained_RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, False, LEARNING_RATE, embeddings, True)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1UniRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:04<00:00, 25.42it/s]


[1mTrain Loss[0m: 1.088

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:04<00:00, 25.22it/s]


[1mTrain Loss[0m: 0.892

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:04<00:00, 27.20it/s]


[1mTrain Loss[0m: 0.892

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:05<00:00, 22.95it/s]


[1mTrain Loss[0m: 0.865

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:04<00:00, 27.34it/s]


[1mTrain Loss[0m: 0.859

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:04<00:00, 28.42it/s]


[1mTrain Loss[0m: 0.855

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:05<00:00, 22.88it/s]


[1mTrain Loss[0m: 0.851

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:04<00:00, 27.37it/s]


[1mTrain Loss[0m: 0.853

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:04<00:00, 24.97it/s]


[1mTrain Loss[0m: 0.845

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:04<00:00, 25.18it/s]


[1mTrain Loss[0m: 0.854

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:04<00:00, 27.96it/s]


[1mTrain Loss[0m: 0.865

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:05<00:00, 22.38it/s]


[1mTrain Loss[0m: 0.911

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:04<00:00, 28.02it/s]


[1mTrain Loss[0m: 0.848

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:04<00:00, 26.86it/s]


[1mTrain Loss[0m: 0.862

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:05<00:00, 22.77it/s]


[1mTrain Loss[0m: 0.849

[1mTest Accuracy[0m: 0.879

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.90      0.87      0.89      1900
      Sports       0.94      0.95      0.95      1900
    Business       0.86      0.81      0.83      1900
    Sci/Tech       0.81      0.89      0.85      1900

    accuracy                           0.88      7600
   macro avg       0.88      0.88      0.88      7600
weighted avg       0.88      0.88      0.88      7600

[1mConfusion Matrix:[0m
 [[1657   55   87  101]
 [  48 1802   15   35]
 [  82   34 1532  252]
 [  57   16  138 1689]]
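
The accuracy, precision, and recall reported above all follow from the confusion matrix. As a minimal sketch, assuming the scikit-learn convention that rows are actual classes and columns are predicted classes, the figures can be re-derived in plain Python. The helper name `per_class_metrics` is hypothetical and is not part of the notebook.

```python
classes = ["World", "Sports", "Business", "Sci/Tech"]

# Confusion matrix copied from the output above (rows: actual, columns: predicted).
matrix = [
    [1657, 55, 87, 101],
    [48, 1802, 15, 35],
    [82, 34, 1532, 252],
    [57, 16, 138, 1689],
]


def per_class_metrics(matrix):
    """Return overall accuracy plus per-class precision and recall lists."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    # Precision: correct predictions divided by the column total (everything predicted as class i).
    precision = [matrix[i][i] / sum(matrix[r][i] for r in range(n)) for i in range(n)]
    # Recall: correct predictions divided by the row total (everything that really is class i).
    recall = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    return correct / total, precision, recall


accuracy, precision, recall = per_class_metrics(matrix)
print(f"accuracy: {accuracy:.4f}")
for name, p, r in zip(classes, precision, recall):
    print(f"{name:>9}: precision={p:.2f} recall={r:.2f}")
```

Read this way, the largest off-diagonal entry (252) is Business articles predicted as Sci/Tech, which matches the lower Business recall in the report.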


### **13.2. Bidirectional RNN model with 1 layer**

We now apply the same procedure to a bidirectional RNN with one layer: `setup_model()` configures the model, `train_model()` trains it, and `evaluate_model()` evaluates it. The accuracy, parameter count, and time cost of the trained model are then appended to the corresponding lists.

In [36]:
classifier, loss_fn, optimizer = setup_model(device, pretrained_RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, True, LEARNING_RATE, embeddings, True)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1BiRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:04<00:00, 25.96it/s]


[1mTrain Loss[0m: 1.101

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:05<00:00, 21.90it/s]


[1mTrain Loss[0m: 0.912

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:04<00:00, 26.81it/s]


[1mTrain Loss[0m: 0.882

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:04<00:00, 26.37it/s]


[1mTrain Loss[0m: 0.872

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:05<00:00, 21.40it/s]


[1mTrain Loss[0m: 0.872

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:04<00:00, 26.72it/s]


[1mTrain Loss[0m: 0.856

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:05<00:00, 23.35it/s]


[1mTrain Loss[0m: 0.853

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:04<00:00, 24.77it/s]


[1mTrain Loss[0m: 0.853

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:04<00:00, 26.92it/s]


[1mTrain Loss[0m: 0.848

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:05<00:00, 21.58it/s]


[1mTrain Loss[0m: 0.848

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:04<00:00, 25.83it/s]


[1mTrain Loss[0m: 0.843

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:04<00:00, 23.94it/s]


[1mTrain Loss[0m: 0.842

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:04<00:00, 23.63it/s]


[1mTrain Loss[0m: 0.839

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:04<00:00, 26.45it/s]


[1mTrain Loss[0m: 0.839

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:05<00:00, 22.10it/s]


[1mTrain Loss[0m: 0.838

[1mTest Accuracy[0m: 0.889

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.92      0.88      0.90      1900
      Sports       0.94      0.96      0.95      1900
    Business       0.87      0.81      0.84      1900
    Sci/Tech       0.83      0.90      0.86      1900

    accuracy                           0.89      7600
   macro avg       0.89      0.89      0.89      7600
weighted avg       0.89      0.89      0.89      7600

[1mConfusion Matrix:[0m
 [[1673   66   91   70]
 [  16 1832   14   38]
 [  87   26 1543  244]
 [  49   19  124 1708]]


### **13.3. Bidirectional RNN model with 2 layers**

We now apply the same procedure to a bidirectional RNN with two layers: `setup_model()` configures the model, `train_model()` trains it, and `evaluate_model()` evaluates it. The accuracy, parameter count, and time cost of the trained model are then appended to the corresponding lists.

In [37]:
classifier, loss_fn, optimizer = setup_model(device, pretrained_RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 2, True, LEARNING_RATE, embeddings, True)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_2BiRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:04<00:00, 23.70it/s]


[1mTrain Loss[0m: 1.001

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:05<00:00, 22.07it/s]


[1mTrain Loss[0m: 0.899

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:05<00:00, 22.47it/s]


[1mTrain Loss[0m: 0.873

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:05<00:00, 23.47it/s]


[1mTrain Loss[0m: 0.858

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:05<00:00, 20.14it/s]


[1mTrain Loss[0m: 0.852

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:04<00:00, 24.41it/s]


[1mTrain Loss[0m: 0.848

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:05<00:00, 19.94it/s]


[1mTrain Loss[0m: 0.846

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:04<00:00, 24.41it/s]


[1mTrain Loss[0m: 0.848

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:05<00:00, 20.55it/s]


[1mTrain Loss[0m: 0.871

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:05<00:00, 23.06it/s]


[1mTrain Loss[0m: 0.863

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:04<00:00, 24.16it/s]


[1mTrain Loss[0m: 0.840

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:05<00:00, 20.45it/s]


[1mTrain Loss[0m: 0.845

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:05<00:00, 23.38it/s]


[1mTrain Loss[0m: 0.843

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:05<00:00, 20.08it/s]


[1mTrain Loss[0m: 0.844

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:04<00:00, 24.01it/s]


[1mTrain Loss[0m: 0.843

[1mTest Accuracy[0m: 0.889

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.95      0.85      0.90      1900
      Sports       0.92      0.97      0.95      1900
    Business       0.86      0.82      0.84      1900
    Sci/Tech       0.83      0.90      0.87      1900

    accuracy                           0.89      7600
   macro avg       0.89      0.89      0.89      7600
weighted avg       0.89      0.89      0.89      7600

[1mConfusion Matrix:[0m
 [[1621   83  128   68]
 [  12 1850   17   21]
 [  41   39 1566  254]
 [  35   33  116 1716]]


### **13.4. Unidirectional LSTM model with 1 layer**

We now apply the same procedure to a unidirectional LSTM with one layer: `setup_model()` configures the model, `train_model()` trains it, and `evaluate_model()` evaluates it. The accuracy, parameter count, and time cost of the trained model are then appended to the corresponding lists.

In [38]:
classifier, loss_fn, optimizer = setup_model(device, pretrained_LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, False, LEARNING_RATE, embeddings, True)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1UniLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:05<00:00, 20.28it/s]


[1mTrain Loss[0m: 1.019

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:04<00:00, 24.15it/s]


[1mTrain Loss[0m: 0.857

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:04<00:00, 24.48it/s]


[1mTrain Loss[0m: 0.841

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:05<00:00, 20.95it/s]


[1mTrain Loss[0m: 0.831

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:04<00:00, 24.68it/s]


[1mTrain Loss[0m: 0.823

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:05<00:00, 20.40it/s]


[1mTrain Loss[0m: 0.818

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:04<00:00, 24.89it/s]


[1mTrain Loss[0m: 0.813

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:05<00:00, 21.58it/s]


[1mTrain Loss[0m: 0.809

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:05<00:00, 23.42it/s]


[1mTrain Loss[0m: 0.807

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:04<00:00, 24.84it/s]


[1mTrain Loss[0m: 0.805

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:05<00:00, 20.45it/s]


[1mTrain Loss[0m: 0.804

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:04<00:00, 25.03it/s]


[1mTrain Loss[0m: 0.801

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:05<00:00, 20.99it/s]


[1mTrain Loss[0m: 0.800

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:04<00:00, 24.43it/s]


[1mTrain Loss[0m: 0.799

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:04<00:00, 23.66it/s]


[1mTrain Loss[0m: 0.798

[1mTest Accuracy[0m: 0.908

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.92      0.91      0.91      1900
      Sports       0.96      0.97      0.96      1900
    Business       0.88      0.87      0.87      1900
    Sci/Tech       0.88      0.89      0.88      1900

    accuracy                           0.91      7600
   macro avg       0.91      0.91      0.91      7600
weighted avg       0.91      0.91      0.91      7600

[1mConfusion Matrix:[0m
 [[1727   52   74   47]
 [  24 1845   17   14]
 [  64   16 1650  170]
 [  70   11  137 1682]]


### **13.5. Bidirectional LSTM model with 1 layer**

We now apply the same procedure to a bidirectional LSTM with one layer: `setup_model()` configures the model, `train_model()` trains it, and `evaluate_model()` evaluates it. The accuracy, parameter count, and time cost of the trained model are then appended to the corresponding lists.

In [39]:
classifier, loss_fn, optimizer = setup_model(device, pretrained_LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, True, LEARNING_RATE, embeddings, True)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1BiLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:05<00:00, 19.85it/s]


[1mTrain Loss[0m: 1.046

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:05<00:00, 20.53it/s]


[1mTrain Loss[0m: 0.860

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:06<00:00, 18.83it/s]


[1mTrain Loss[0m: 0.841

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:05<00:00, 20.55it/s]


[1mTrain Loss[0m: 0.831

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:06<00:00, 18.85it/s]


[1mTrain Loss[0m: 0.823

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:05<00:00, 21.38it/s]


[1mTrain Loss[0m: 0.818

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:06<00:00, 18.44it/s]


[1mTrain Loss[0m: 0.813

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:05<00:00, 21.10it/s]


[1mTrain Loss[0m: 0.810

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:06<00:00, 18.61it/s]


[1mTrain Loss[0m: 0.807

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:05<00:00, 21.33it/s]


[1mTrain Loss[0m: 0.805

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:06<00:00, 18.42it/s]


[1mTrain Loss[0m: 0.803

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:05<00:00, 21.09it/s]


[1mTrain Loss[0m: 0.802

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:06<00:00, 18.53it/s]


[1mTrain Loss[0m: 0.801

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:05<00:00, 21.60it/s]


[1mTrain Loss[0m: 0.799

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:06<00:00, 18.91it/s]


[1mTrain Loss[0m: 0.798

[1mTest Accuracy[0m: 0.908

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.92      0.91      0.91      1900
      Sports       0.95      0.97      0.96      1900
    Business       0.88      0.86      0.87      1900
    Sci/Tech       0.88      0.89      0.88      1900

    accuracy                           0.91      7600
   macro avg       0.91      0.91      0.91      7600
weighted avg       0.91      0.91      0.91      7600

[1mConfusion Matrix:[0m
 [[1720   56   74   50]
 [  19 1846   16   19]
 [  70   19 1641  170]
 [  66   14  128 1692]]


### **13.6. Bidirectional LSTM model with 2 layers**

We now apply the same procedure to a bidirectional LSTM with two layers: `setup_model()` configures the model, `train_model()` trains it, and `evaluate_model()` evaluates it. The accuracy, parameter count, and time cost of the trained model are then appended to the corresponding lists.

In [40]:
classifier, loss_fn, optimizer = setup_model(device, pretrained_LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 2, True, LEARNING_RATE, embeddings, True)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_2BiLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 118/118 [00:06<00:00, 18.11it/s]


[1mTrain Loss[0m: 0.987

[1mEpoch[0m: 2


100%|██████████| 118/118 [00:07<00:00, 16.45it/s]


[1mTrain Loss[0m: 0.855

[1mEpoch[0m: 3


100%|██████████| 118/118 [00:06<00:00, 18.62it/s]


[1mTrain Loss[0m: 0.842

[1mEpoch[0m: 4


100%|██████████| 118/118 [00:07<00:00, 16.32it/s]


[1mTrain Loss[0m: 0.833

[1mEpoch[0m: 5


100%|██████████| 118/118 [00:07<00:00, 16.84it/s]


[1mTrain Loss[0m: 0.827

[1mEpoch[0m: 6


100%|██████████| 118/118 [00:06<00:00, 17.93it/s]


[1mTrain Loss[0m: 0.821

[1mEpoch[0m: 7


100%|██████████| 118/118 [00:07<00:00, 16.44it/s]


[1mTrain Loss[0m: 0.818

[1mEpoch[0m: 8


100%|██████████| 118/118 [00:06<00:00, 18.88it/s]


[1mTrain Loss[0m: 0.814

[1mEpoch[0m: 9


100%|██████████| 118/118 [00:07<00:00, 16.45it/s]


[1mTrain Loss[0m: 0.811

[1mEpoch[0m: 10


100%|██████████| 118/118 [00:06<00:00, 18.57it/s]


[1mTrain Loss[0m: 0.809

[1mEpoch[0m: 11


100%|██████████| 118/118 [00:07<00:00, 16.48it/s]


[1mTrain Loss[0m: 0.806

[1mEpoch[0m: 12


100%|██████████| 118/118 [00:06<00:00, 18.88it/s]


[1mTrain Loss[0m: 0.805

[1mEpoch[0m: 13


100%|██████████| 118/118 [00:07<00:00, 16.15it/s]


[1mTrain Loss[0m: 0.803

[1mEpoch[0m: 14


100%|██████████| 118/118 [00:06<00:00, 16.89it/s]


[1mTrain Loss[0m: 0.803

[1mEpoch[0m: 15


100%|██████████| 118/118 [00:06<00:00, 18.46it/s]


[1mTrain Loss[0m: 0.800

[1mTest Accuracy[0m: 0.908

[1mClassification Report:[0m
               precision    recall  f1-score   support

       World       0.92      0.90      0.91      1900
      Sports       0.94      0.98      0.96      1900
    Business       0.88      0.86      0.87      1900
    Sci/Tech       0.88      0.88      0.88      1900

    accuracy                           0.91      7600
   macro avg       0.91      0.91      0.91      7600
weighted avg       0.91      0.91      0.91      7600

[1mConfusion Matrix:[0m
 [[1714   67   71   48]
 [  14 1862   12   12]
 [  74   24 1643  159]
 [  64   20  135 1681]]


## **14. Visualize the performance metrics of the models**

The `visualize` function is called to generate and display a table that compares the accuracy, parameter count, and time cost of all the models trained above.

In [41]:
visualize(models, accuracies, parameters, time_costs)

+--------------+----------+------------+-----------+
|    [1mModel[0m     | [1mAccuracy[0m | [1mParameters[0m | [1mTime Cost[0m |
+--------------+----------+------------+-----------+
| 1Uni-preRNN  |  0.8789  |  2136284   |   4.6362  |
|  1Bi-preRNN  |  0.8889  |  2147164   |   4.8546  |
|  2Bi-preRNN  |  0.8886  |  2171996   |   5.3012  |
| 1Uni-preLSTM |  0.9084  |  2168156   |   5.1848  |
| 1Bi-preLSTM  |  0.9078  |  2210908   |   5.9696  |
| 2Bi-preLSTM  |  0.9079  |  2310236   |   6.8009  |
+--------------+----------+------------+-----------+
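
A quick arithmetic check relates the Parameters column above to the one in the Unit 12 table. The sketch below only subtracts the reported numbers. The embedding size of 100 is an assumption taken from the name of the GloVe vectors, and reading the difference as an embedding matrix is an inference, not something stated in the notebook.

```python
# Parameter counts copied from the comparison tables of Unit 12 and Unit 14.
unit_12 = {
    "1Uni-preRNN": 10884, "1Bi-preRNN": 21764, "2Bi-preRNN": 46596,
    "1Uni-preLSTM": 42756, "1Bi-preLSTM": 85508, "2Bi-preLSTM": 184836,
}
unit_14 = {
    "1Uni-preRNN": 2136284, "1Bi-preRNN": 2147164, "2Bi-preRNN": 2171996,
    "1Uni-preLSTM": 2168156, "1Bi-preLSTM": 2210908, "2Bi-preLSTM": 2310236,
}

EMBEDDING_DIM = 100  # assumption: 100-dimensional GloVe vectors

# Difference per model between the two runs.
differences = {name: unit_14[name] - unit_12[name] for name in unit_12}
for name, diff in differences.items():
    print(f"{name:>12}: {diff}")

# If the difference is an embedding matrix, this is its number of rows.
implied_rows = differences["1Uni-preRNN"] // EMBEDDING_DIM
print("implied embedding rows:", implied_rows)
```

The difference is the same for all six models, which is what one would expect if one of the two runs includes an embedding matrix of that size in its count and the other does not. Which run includes it depends on how `setup_model` interprets its last argument and on what `count_parameters` counts, and both are defined outside this section.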


`##########################################################`

## **15. RNN & LSTM models trained on IMDB dataset with sequence length of 25 words**

This is a variation of the very first implementation (see: Unit 6), where the dataset is now IMDB instead of AGNTC. The `classes` list specifies the categories that the classification models are trained to predict. In this case there are two classes, Positive and Negative, so the models are trained to classify movie reviews by sentiment. We load the IMDB dataset from a CSV file and create two datasets, `train_dataset` and `test_dataset`. The first contains 80% of the data and is used for training, while the second contains the remaining 20% and is used for evaluating the models' performance.

In [42]:
models = ["1Uni-RNN", "1Bi-RNN", "2Bi-RNN", "1Uni-LSTM", "1Bi-LSTM", "2Bi-LSTM"]; classes = ["Positive", "Negative"]; accuracies = []; parameters = []; time_costs = []
train_dataset, test_dataset = load_dataset("IMDB Dataset.csv", ["review"], "sentiment", 80, "start"), load_dataset("IMDB Dataset.csv", ["review"], "sentiment", 20, "end")
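
The 80/20 split described above can be sketched as a pair of slices. This is an illustration only: `load_dataset` is defined earlier in the notebook, and the sketch assumes that its last two arguments mean "take this percentage of rows from the start or from the end of the file". The helper name `split_by_percent` is hypothetical.

```python
def split_by_percent(rows, percent, side):
    """Return `percent`% of `rows`, taken from the "start" or the "end"."""
    count = len(rows) * percent // 100
    if side == "start":
        return rows[:count]
    if side == "end":
        # Slice from an explicit index so that count == 0 yields an empty list.
        return rows[len(rows) - count:]
    raise ValueError(f"side must be 'start' or 'end', got {side!r}")


# Stand-in for the 50,000 reviews of the IMDB CSV file.
rows = list(range(50_000))
train_rows = split_by_percent(rows, 80, "start")
test_rows = split_by_percent(rows, 20, "end")
print(len(train_rows), len(test_rows))
```

Taking the first 80% and the last 20% keeps the two subsets disjoint. The 10,000 test rows are consistent with the support totals in the classification reports below.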

The `replace_labels` function takes a dataset along with two lists, `categorical` and `numerical`, which hold the current labels in the dataset and their corresponding numerical values. It replaces each categorical label with its numerical value and returns a new dataset with the updated labels. `train_dataset` and `test_dataset` are then reassigned to the datasets returned by `replace_labels`.

In [43]:
def replace_labels(dataset, categorical, numerical):
    # Pair each categorical label with its numerical value by position
    mapping = dict(zip(categorical, numerical))
    # Rebuild the dataset as (numerical_label, text) tuples
    return [(mapping[label], text) for label, text in dataset]

train_dataset, test_dataset = replace_labels(train_dataset, ["negative", "positive"], [1,2]), replace_labels(test_dataset, ["negative", "positive"], [1,2])
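
One detail worth checking is how the numeric labels line up with the `classes` list. The sketch below assumes that the evaluation code displays numeric label `k` under the name `classes[k - 1]`, as 1-based labels would suggest. That assumption is not confirmed anywhere in this section, and if `evaluate_model` resolves names differently the sketch does not apply.

```python
classes = ["Positive", "Negative"]           # order used in this unit
label_ids = {"negative": 1, "positive": 2}   # mapping passed to replace_labels

# Assumption: numeric label k is displayed under the name classes[k - 1].
displayed_as = {raw: classes[k - 1] for raw, k in label_ids.items()}
print(displayed_as)

# A class list ordered by numeric id keeps names and ids aligned.
aligned_classes = [
    raw.capitalize()
    for raw, _ in sorted(label_ids.items(), key=lambda item: item[1])
]
print(aligned_classes)
```

Under that assumption the per-class rows of the reports would carry swapped names, while the overall accuracy would be unaffected.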

The data loaders for the training and testing datasets are generated with the `generate_loader` function, and the vocabulary is built with the `build_vocab` function, just as in the very first implementation.

In [44]:
train_loader, test_loader = generate_loader(train_dataset, MAX_WORDS, BATCH_SIZE, True), generate_loader(test_dataset, MAX_WORDS, BATCH_SIZE, False)
vocab = build_vocab([train_dataset, test_dataset], MIN_FREQ, PADDED, UNKNOWN)

### **15.1. Unidirectional RNN model with 1 layer**

`RNN_model` is passed to the `setup_model` function along with the other hyperparameters to set up a specific configuration, in this case a unidirectional RNN with 1 layer. `train_model` then trains the classifier and returns `time_cost`, the average time taken per epoch. `evaluate_model` evaluates the classifier on the test set, and its return values are stored for later use. Finally, the `accuracies`, `parameters`, and `time_costs` lists are updated with the model's accuracy score, parameter count, and time cost. These values are stored for each model separately and are later used to build a comparison table.

In [45]:
classifier, loss_fn, optimizer = setup_model(device, RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, False, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1UniRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 40/40 [00:06<00:00,  6.05it/s]


[1mTrain Loss[0m: 0.694

[1mEpoch[0m: 2


100%|██████████| 40/40 [00:06<00:00,  5.96it/s]


[1mTrain Loss[0m: 0.688

[1mEpoch[0m: 3


100%|██████████| 40/40 [00:07<00:00,  5.57it/s]


[1mTrain Loss[0m: 0.668

[1mEpoch[0m: 4


100%|██████████| 40/40 [00:06<00:00,  6.30it/s]


[1mTrain Loss[0m: 0.642

[1mEpoch[0m: 5


100%|██████████| 40/40 [00:07<00:00,  5.58it/s]


[1mTrain Loss[0m: 0.612

[1mEpoch[0m: 6


100%|██████████| 40/40 [00:06<00:00,  6.48it/s]


[1mTrain Loss[0m: 0.590

[1mEpoch[0m: 7


100%|██████████| 40/40 [00:07<00:00,  5.62it/s]


[1mTrain Loss[0m: 0.567

[1mEpoch[0m: 8


100%|██████████| 40/40 [00:06<00:00,  6.41it/s]


[1mTrain Loss[0m: 0.546

[1mEpoch[0m: 9


100%|██████████| 40/40 [00:07<00:00,  5.59it/s]


[1mTrain Loss[0m: 0.532

[1mEpoch[0m: 10


100%|██████████| 40/40 [00:06<00:00,  6.50it/s]


[1mTrain Loss[0m: 0.516

[1mEpoch[0m: 11


100%|██████████| 40/40 [00:07<00:00,  5.59it/s]


[1mTrain Loss[0m: 0.507

[1mEpoch[0m: 12


100%|██████████| 40/40 [00:06<00:00,  6.02it/s]


[1mTrain Loss[0m: 0.494

[1mEpoch[0m: 13


100%|██████████| 40/40 [00:06<00:00,  5.91it/s]


[1mTrain Loss[0m: 0.482

[1mEpoch[0m: 14


100%|██████████| 40/40 [00:07<00:00,  5.55it/s]


[1mTrain Loss[0m: 0.475

[1mEpoch[0m: 15


100%|██████████| 40/40 [00:06<00:00,  6.44it/s]


[1mTrain Loss[0m: 0.469

[1mTest Accuracy[0m: 0.825

[1mClassification Report:[0m
               precision    recall  f1-score   support

    Positive       0.84      0.79      0.82      4954
    Negative       0.81      0.85      0.83      5046

    accuracy                           0.82     10000
   macro avg       0.83      0.82      0.82     10000
weighted avg       0.83      0.82      0.82     10000

[1mConfusion Matrix:[0m
 [[3938 1016]
 [ 738 4308]]
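
The time cost recorded for each model is described as the average time taken per epoch. A minimal sketch of such a measurement is shown below. It is not the notebook's `train_model`: the helper `average_epoch_time`, its `run_epoch` callback, and the injectable `clock` are hypothetical names introduced for illustration.

```python
import time


def average_epoch_time(run_epoch, epochs, clock=time.perf_counter):
    """Call `run_epoch` `epochs` times and return the mean duration in seconds."""
    if epochs <= 0:
        raise ValueError("epochs must be a positive integer")
    durations = []
    for _ in range(epochs):
        start = clock()
        run_epoch()                      # one full pass over the training data
        durations.append(clock() - start)
    return sum(durations) / len(durations)


# Usage with a dummy epoch that only sleeps briefly.
mean_seconds = average_epoch_time(lambda: time.sleep(0.01), epochs=3)
print(f"average time per epoch: {mean_seconds:.4f} s")
```

Passing the clock as a parameter keeps the function easy to test. `time.perf_counter` is used as the default because it is a monotonic, high-resolution timer.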


### **15.2. Bidirectional RNN model with 1 layer**

We now apply the same procedure to a bidirectional RNN with one layer: `setup_model()` configures the model, `train_model()` trains it, and `evaluate_model()` evaluates it. The accuracy, parameter count, and time cost of the trained model are then appended to the corresponding lists.

In [46]:
classifier, loss_fn, optimizer = setup_model(device, RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1BiRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

[1mEpoch[0m: 1


100%|██████████| 40/40 [00:07<00:00,  5.45it/s]


[1mTrain Loss[0m: 0.695

[1mEpoch[0m: 2


100%|██████████| 40/40 [00:06<00:00,  6.29it/s]


[1mTrain Loss[0m: 0.687

[1mEpoch[0m: 3


100%|██████████| 40/40 [00:07<00:00,  5.50it/s]


[1mTrain Loss[0m: 0.663

[1mEpoch[0m: 4


100%|██████████| 40/40 [00:06<00:00,  5.79it/s]


[1mTrain Loss[0m: 0.632

[1mEpoch[0m: 5


100%|██████████| 40/40 [00:06<00:00,  6.00it/s]


[1mTrain Loss[0m: 0.603

[1mEpoch[0m: 6


100%|██████████| 40/40 [00:07<00:00,  5.37it/s]


[1mTrain Loss[0m: 0.580

[1mEpoch[0m: 7


100%|██████████| 40/40 [00:06<00:00,  6.28it/s]


[1mTrain Loss[0m: 0.560

[1mEpoch[0m: 8


100%|██████████| 40/40 [00:07<00:00,  5.36it/s]


[1mTrain Loss[0m: 0.540

[1mEpoch[0m: 9


100%|██████████| 40/40 [00:06<00:00,  6.31it/s]


[1mTrain Loss[0m: 0.527

[1mEpoch[0m: 10


100%|██████████| 40/40 [00:07<00:00,  5.55it/s]


[1mTrain Loss[0m: 0.510

[1mEpoch[0m: 11


100%|██████████| 40/40 [00:06<00:00,  6.15it/s]


[1mTrain Loss[0m: 0.496

[1mEpoch[0m: 12


100%|██████████| 40/40 [00:07<00:00,  5.58it/s]


[1mTrain Loss[0m: 0.491

[1mEpoch[0m: 13


100%|██████████| 40/40 [00:07<00:00,  5.52it/s]


[1mTrain Loss[0m: 0.479

[1mEpoch[0m: 14


100%|██████████| 40/40 [00:06<00:00,  6.18it/s]


[1mTrain Loss[0m: 0.473

[1mEpoch[0m: 15


100%|██████████| 40/40 [00:07<00:00,  5.49it/s]


[1mTrain Loss[0m: 0.465

[1mTest Accuracy[0m: 0.828

[1mClassification Report:[0m
               precision    recall  f1-score   support

    Positive       0.82      0.83      0.83      4954
    Negative       0.83      0.83      0.83      5046

    accuracy                           0.83     10000
   macro avg       0.83      0.83      0.83     10000
weighted avg       0.83      0.83      0.83     10000

[1mConfusion Matrix:[0m
 [[4106  848]
 [ 877 4169]]
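
The figures in the classification report can be cross-checked against the confusion matrix. The following is a minimal sketch, assuming the scikit-learn convention that rows are true labels and columns are predictions, with Positive as the first class (its row sum, 4954, matches the reported support). The helper name `per_class_metrics` is made up for this illustration and is not part of the notebook.

```python
# Confusion matrix printed above for the 1-layer bidirectional RNN.
# Assumption: rows = true class, columns = predicted class,
# in the order [Positive, Negative].
confusion = [[4106, 848],
             [877, 4169]]
labels = ["Positive", "Negative"]


def per_class_metrics(matrix, index):
    """Return (precision, recall, support) for the class at `index`."""
    true_positive = matrix[index][index]
    predicted_as_class = sum(row[index] for row in matrix)  # column sum
    actually_class = sum(matrix[index])                      # row sum
    return (true_positive / predicted_as_class,
            true_positive / actually_class,
            actually_class)


# Accuracy is the diagonal (correct predictions) over all samples.
total = sum(sum(row) for row in confusion)
accuracy = sum(confusion[i][i] for i in range(len(confusion))) / total
print(f"accuracy = {accuracy:.4f}")

for i, label in enumerate(labels):
    precision, recall, support = per_class_metrics(confusion, i)
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} support={support}")
```

Rounded to two decimals, these values agree with the report, and the unrounded accuracy of 0.8275 matches the 1Bi-RNN row of the summary table in section 16.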


### **15.3. Bidirectional RNN model with 2 layers**

We now repeat the procedure for a bidirectional RNN with two layers (the arguments `2` and `True` in the `setup_model()` call). We set up the model with `setup_model()`, train it with `train_model()`, evaluate it with `evaluate_model()`, and then append its accuracy, parameter count, and time cost to the corresponding lists.

In [47]:
classifier, loss_fn, optimizer = setup_model(device, RNN_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 2, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_2BiRNN = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1
100%|██████████| 40/40 [00:06<00:00,  6.10it/s]
Train Loss: 0.693

Epoch: 2
100%|██████████| 40/40 [00:07<00:00,  5.37it/s]
Train Loss: 0.674

Epoch: 3
100%|██████████| 40/40 [00:07<00:00,  5.46it/s]
Train Loss: 0.651

Epoch: 4
100%|██████████| 40/40 [00:06<00:00,  5.92it/s]
Train Loss: 0.629

Epoch: 5
100%|██████████| 40/40 [00:07<00:00,  5.42it/s]
Train Loss: 0.602

Epoch: 6
100%|██████████| 40/40 [00:06<00:00,  6.17it/s]
Train Loss: 0.581

Epoch: 7
100%|██████████| 40/40 [00:07<00:00,  5.47it/s]
Train Loss: 0.566

Epoch: 8
100%|██████████| 40/40 [00:06<00:00,  6.15it/s]
Train Loss: 0.539

Epoch: 9
100%|██████████| 40/40 [00:07<00:00,  5.27it/s]
Train Loss: 0.525

Epoch: 10
100%|██████████| 40/40 [00:07<00:00,  5.49it/s]
Train Loss: 0.513

Epoch: 11
100%|██████████| 40/40 [00:06<00:00,  6.09it/s]
Train Loss: 0.497

Epoch: 12
100%|██████████| 40/40 [00:07<00:00,  5.38it/s]
Train Loss: 0.484

Epoch: 13
100%|██████████| 40/40 [00:06<00:00,  6.05it/s]
Train Loss: 0.473

Epoch: 14
100%|██████████| 40/40 [00:07<00:00,  5.38it/s]
Train Loss: 0.474

Epoch: 15
100%|██████████| 40/40 [00:06<00:00,  6.00it/s]
Train Loss: 0.462

Test Accuracy: 0.828

Classification Report:
               precision    recall  f1-score   support

    Positive       0.83      0.83      0.83      4954
    Negative       0.83      0.83      0.83      5046

    accuracy                           0.83     10000
   macro avg       0.83      0.83      0.83     10000
weighted avg       0.83      0.83      0.83     10000

Confusion Matrix:
[[4089  865]
 [ 853 4193]]


### **15.4. Unidirectional LSTM model with 1 layer**

We now repeat the procedure for a unidirectional LSTM with one layer (the arguments `1` and `False` in the `setup_model()` call). We set up the model with `setup_model()`, train it with `train_model()`, evaluate it with `evaluate_model()`, and then append its accuracy, parameter count, and time cost to the corresponding lists.

In [48]:
classifier, loss_fn, optimizer = setup_model(device, LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, False, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1UniLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1
100%|██████████| 40/40 [00:06<00:00,  6.08it/s]
Train Loss: 0.692

Epoch: 2
100%|██████████| 40/40 [00:07<00:00,  5.41it/s]
Train Loss: 0.675

Epoch: 3
100%|██████████| 40/40 [00:06<00:00,  6.08it/s]
Train Loss: 0.626

Epoch: 4
100%|██████████| 40/40 [00:07<00:00,  5.41it/s]
Train Loss: 0.588

Epoch: 5
100%|██████████| 40/40 [00:06<00:00,  6.19it/s]
Train Loss: 0.559

Epoch: 6
100%|██████████| 40/40 [00:07<00:00,  5.35it/s]
Train Loss: 0.539

Epoch: 7
100%|██████████| 40/40 [00:07<00:00,  5.45it/s]
Train Loss: 0.519

Epoch: 8
100%|██████████| 40/40 [00:06<00:00,  6.10it/s]
Train Loss: 0.498

Epoch: 9
100%|██████████| 40/40 [00:07<00:00,  5.38it/s]
Train Loss: 0.486

Epoch: 10
100%|██████████| 40/40 [00:06<00:00,  6.15it/s]
Train Loss: 0.473

Epoch: 11
100%|██████████| 40/40 [00:07<00:00,  5.37it/s]
Train Loss: 0.459

Epoch: 12
100%|██████████| 40/40 [00:06<00:00,  6.19it/s]
Train Loss: 0.450

Epoch: 13
100%|██████████| 40/40 [00:07<00:00,  5.41it/s]
Train Loss: 0.448

Epoch: 14
100%|██████████| 40/40 [00:07<00:00,  5.41it/s]
Train Loss: 0.431

Epoch: 15
100%|██████████| 40/40 [00:06<00:00,  6.11it/s]
Train Loss: 0.426

Test Accuracy: 0.862

Classification Report:
               precision    recall  f1-score   support

    Positive       0.86      0.87      0.86      4954
    Negative       0.87      0.86      0.86      5046

    accuracy                           0.86     10000
   macro avg       0.86      0.86      0.86     10000
weighted avg       0.86      0.86      0.86     10000

Confusion Matrix:
[[4288  666]
 [ 715 4331]]


### **15.5. Bidirectional LSTM model with 1 layer**

We now repeat the procedure for a bidirectional LSTM with one layer (the arguments `1` and `True` in the `setup_model()` call). We set up the model with `setup_model()`, train it with `train_model()`, evaluate it with `evaluate_model()`, and then append its accuracy, parameter count, and time cost to the corresponding lists.

In [49]:
classifier, loss_fn, optimizer = setup_model(device, LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 1, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_1BiLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1
100%|██████████| 40/40 [00:07<00:00,  5.29it/s]
Train Loss: 0.693

Epoch: 2
100%|██████████| 40/40 [00:06<00:00,  5.99it/s]
Train Loss: 0.681

Epoch: 3
100%|██████████| 40/40 [00:07<00:00,  5.31it/s]
Train Loss: 0.639

Epoch: 4
100%|██████████| 40/40 [00:07<00:00,  5.21it/s]
Train Loss: 0.598

Epoch: 5
100%|██████████| 40/40 [00:06<00:00,  5.88it/s]
Train Loss: 0.565

Epoch: 6
100%|██████████| 40/40 [00:07<00:00,  5.25it/s]
Train Loss: 0.539

Epoch: 7
100%|██████████| 40/40 [00:06<00:00,  5.94it/s]
Train Loss: 0.519

Epoch: 8
100%|██████████| 40/40 [00:07<00:00,  5.22it/s]
Train Loss: 0.505

Epoch: 9
100%|██████████| 40/40 [00:07<00:00,  5.28it/s]
Train Loss: 0.486

Epoch: 10
100%|██████████| 40/40 [00:06<00:00,  6.02it/s]
Train Loss: 0.472

Epoch: 11
100%|██████████| 40/40 [00:07<00:00,  5.26it/s]
Train Loss: 0.462

Epoch: 12
100%|██████████| 40/40 [00:06<00:00,  5.92it/s]
Train Loss: 0.451

Epoch: 13
100%|██████████| 40/40 [00:07<00:00,  5.15it/s]
Train Loss: 0.443

Epoch: 14
100%|██████████| 40/40 [00:07<00:00,  5.26it/s]
Train Loss: 0.436

Epoch: 15
100%|██████████| 40/40 [00:06<00:00,  5.98it/s]
Train Loss: 0.431

Test Accuracy: 0.839

Classification Report:
               precision    recall  f1-score   support

    Positive       0.88      0.79      0.83      4954
    Negative       0.81      0.89      0.85      5046

    accuracy                           0.84     10000
   macro avg       0.84      0.84      0.84     10000
weighted avg       0.84      0.84      0.84     10000

Confusion Matrix:
[[3899 1055]
 [ 553 4493]]


### **15.6. Bidirectional LSTM model with 2 layers**

We now repeat the procedure for a bidirectional LSTM with two layers (the arguments `2` and `True` in the `setup_model()` call). We set up the model with `setup_model()`, train it with `train_model()`, evaluate it with `evaluate_model()`, and then append its accuracy, parameter count, and time cost to the corresponding lists.

In [50]:
classifier, loss_fn, optimizer = setup_model(device, LSTM_model, classes, vocab, EMBEDDING_DIM, HIDDEN_DIM, 2, True, LEARNING_RATE, None, None)
time_cost = train_model(classifier, loss_fn, optimizer, train_loader, EPOCHS)
_, Y_actual, Y_preds, misclass_data_2BiLSTM = evaluate_model(classes, classifier, loss_fn, test_loader, to_dict(test_dataset))
accuracies.append(accuracy_score(Y_actual, Y_preds))
parameters.append(count_parameters(classifier))
time_costs.append(time_cost)

Epoch: 1
100%|██████████| 40/40 [00:08<00:00,  4.85it/s]
Train Loss: 0.688

Epoch: 2
100%|██████████| 40/40 [00:08<00:00,  4.98it/s]
Train Loss: 0.651

Epoch: 3
100%|██████████| 40/40 [00:07<00:00,  5.46it/s]
Train Loss: 0.604

Epoch: 4
100%|██████████| 40/40 [00:08<00:00,  4.94it/s]
Train Loss: 0.570

Epoch: 5
100%|██████████| 40/40 [00:07<00:00,  5.48it/s]
Train Loss: 0.545

Epoch: 6
100%|██████████| 40/40 [00:07<00:00,  5.12it/s]
Train Loss: 0.523

Epoch: 7
100%|██████████| 40/40 [00:08<00:00,  5.00it/s]
Train Loss: 0.506

Epoch: 8
100%|██████████| 40/40 [00:07<00:00,  5.64it/s]
Train Loss: 0.489

Epoch: 9
100%|██████████| 40/40 [00:08<00:00,  4.97it/s]
Train Loss: 0.475

Epoch: 10
100%|██████████| 40/40 [00:08<00:00,  4.99it/s]
Train Loss: 0.470

Epoch: 11
100%|██████████| 40/40 [00:07<00:00,  5.63it/s]
Train Loss: 0.457

Epoch: 12
100%|██████████| 40/40 [00:08<00:00,  4.98it/s]
Train Loss: 0.445

Epoch: 13
100%|██████████| 40/40 [00:07<00:00,  5.37it/s]
Train Loss: 0.437

Epoch: 14
100%|██████████| 40/40 [00:07<00:00,  5.04it/s]
Train Loss: 0.430

Epoch: 15
100%|██████████| 40/40 [00:07<00:00,  5.03it/s]
Train Loss: 0.427

Test Accuracy: 0.861

Classification Report:
               precision    recall  f1-score   support

    Positive       0.85      0.87      0.86      4954
    Negative       0.87      0.85      0.86      5046

    accuracy                           0.86     10000
   macro avg       0.86      0.86      0.86     10000
weighted avg       0.86      0.86      0.86     10000

Confusion Matrix:
[[4300  654]
 [ 737 4309]]
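
The cells in this section repeat the same steps, with only the model class, the layer count, and the direction flag changing. One way to express that pattern is a loop over configurations. This is a sketch of the pattern only: `run_experiments` and its callable parameters are hypothetical stand-ins rather than the notebook's `setup_model()`, `train_model()`, or `evaluate_model()` signatures, and the lambdas in the usage lines return dummy values instead of training anything.

```python
def run_experiments(configs, setup, train, evaluate):
    """Run setup -> train -> evaluate for each configuration.

    configs:  iterable of (name, cell_type, num_layers, bidirectional)
    setup:    callable(cell_type, num_layers, bidirectional) -> model
    train:    callable(model) -> time cost
    evaluate: callable(model) -> accuracy
    Returns a dict mapping each name to its recorded metrics.
    """
    results = {}
    for name, cell_type, num_layers, bidirectional in configs:
        model = setup(cell_type, num_layers, bidirectional)
        time_cost = train(model)
        accuracy = evaluate(model)
        results[name] = {"accuracy": accuracy, "time_cost": time_cost}
    return results


# The configurations trained in the cells above.
CONFIGS = [
    ("1Bi-RNN", "RNN", 1, True),
    ("2Bi-RNN", "RNN", 2, True),
    ("1Uni-LSTM", "LSTM", 1, False),
    ("1Bi-LSTM", "LSTM", 1, True),
    ("2Bi-LSTM", "LSTM", 2, True),
]

# Dummy callables so the sketch runs on its own; real code would wrap
# the notebook's own setup, training, and evaluation functions.
demo = run_experiments(
    CONFIGS,
    setup=lambda cell, layers, bi: (cell, layers, bi),
    train=lambda model: 0.0,
    evaluate=lambda model: 0.0,
)
print(list(demo))
```

Keeping the configurations in one list makes it harder for a model name and its settings to drift apart, at the cost of losing the per-model output cells shown above.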


## **16. Visualize the performance metrics of the models**

The `visualize()` function generates and displays a table comparing the accuracy, parameter count, and time cost of all the models trained above.

In [51]:
visualize(models, accuracies, parameters, time_costs)

+-----------+----------+------------+-----------+
|   Model   | Accuracy | Parameters | Time Cost |
+-----------+----------+------------+-----------+
|  1Uni-RNN |  0.8246  |  2929754   |   6.7313  |
|  1Bi-RNN  |  0.8275  |  2940506   |   6.9442  |
|  2Bi-RNN  |  0.8282  |  2965338   |   7.033   |
| 1Uni-LSTM |  0.8619  |  2961626   |   7.0056  |
|  1Bi-LSTM |  0.8392  |  3004250   |   7.2695  |
|  2Bi-LSTM |  0.8609  |  3103578   |   7.7724  |
+-----------+----------+------------+-----------+
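
The Parameters column can be reproduced by hand, which also shows why the models differ so little in size. The sketch below is a worked calculation, not the notebook's `count_parameters()` function. It assumes PyTorch-style recurrent layers (input weights, recurrent weights, and two bias vectors per gate block, with 1 gate block for a plain RNN and 4 for an LSTM) and a single linear output layer fed by the hidden state, with both directions concatenated when the model is bidirectional. The sizes `VOCAB_SIZE = 29190`, `EMBEDDING_DIM = 100`, `HIDDEN_DIM = 64`, and 2 classes are inferred from the reported counts, not read from the notebook's configuration cells.

```python
# Sizes inferred from the reported parameter counts (assumptions, not
# values read from the notebook's configuration).
VOCAB_SIZE = 29190
EMBEDDING_DIM = 100
HIDDEN_DIM = 64
NUM_CLASSES = 2


def count_params(gates, num_layers, bidirectional):
    """Parameter count for embedding + recurrent stack + linear head.

    gates: 1 for a plain RNN, 4 for an LSTM (PyTorch weight layout).
    """
    directions = 2 if bidirectional else 1
    total = VOCAB_SIZE * EMBEDDING_DIM  # embedding table

    layer_input = EMBEDDING_DIM
    for _ in range(num_layers):
        # Per direction: input weights, recurrent weights, two bias vectors.
        per_direction = gates * (layer_input * HIDDEN_DIM
                                 + HIDDEN_DIM * HIDDEN_DIM
                                 + 2 * HIDDEN_DIM)
        total += directions * per_direction
        # The next layer receives the outputs of all directions.
        layer_input = directions * HIDDEN_DIM

    # Linear classifier on the (possibly concatenated) hidden state.
    total += directions * HIDDEN_DIM * NUM_CLASSES + NUM_CLASSES
    return total


# (gates, num_layers, bidirectional) for each row of the table.
configs = {
    "1Uni-RNN": (1, 1, False),
    "1Bi-RNN": (1, 1, True),
    "2Bi-RNN": (1, 2, True),
    "1Uni-LSTM": (4, 1, False),
    "1Bi-LSTM": (4, 1, True),
    "2Bi-LSTM": (4, 2, True),
}
for name, args in configs.items():
    print(f"{name:>9}: {count_params(*args)}")
```

Under these assumptions the computed totals match all six rows of the table, and the embedding table alone contributes 2,919,000 parameters, between roughly 94% and 99.6% of each model. The recurrent layers therefore account for only a small share of the total, even for the two-layer bidirectional LSTM.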


`############################`