# Assignment 2 - Recurrent Neural Networks



## Programming (Full points: 100)

In this assignment, our goal is to use PyTorch to implement Recurrent Neural Networks (RNN) for sentiment analysis task. Sentiment analysis is to classify sentences (input) into certain sentiments (output labels), which includes positive, negative and neutral.

We will use a benckmark dataset, SST, for this assignment.
* we download the SST dataset from torchtext package, and do some preprocessing to build vocabulary and split the dataset into training/validation/test sets. You don't need to modify the code in this step.


In [1]:
import copy
import torch
from torch import nn
from torch import optim
import torchtext
from torchtext import data
from torchtext import datasets

TEXT = data.Field(sequential=True, batch_first=True, lower=True)
LABEL = data.LabelField()

# load data splits
train_data, val_data, test_data = datasets.SST.splits(TEXT, LABEL)

# build dictionary
TEXT.build_vocab(train_data)
LABEL.build_vocab(train_data)

# hyperparameters
vocab_size = len(TEXT.vocab)
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
embedding_dim = 128
hidden_dim = 128

# build iterators
train_iter, val_iter, test_iter = data.BucketIterator.splits(
    (train_data, val_data, test_data), 
    batch_size=32)

* define the training and evaluation function in the cell below.
### (25 points)


In [2]:
# your code here

# Define the training function
def train(model, iterator, optimizer, criterion):
    model.train()
    epoch_loss = 0
    
    for batch in iterator:
        text, labels = batch.text, batch.label
        optimizer.zero_grad()
        
        predictions = model(text)
        
        # Flatten predictions and labels to match shapes for loss calculation
        predictions = predictions.view(-1, predictions.shape[-1])
        labels = labels.view(-1)
        
        loss = criterion(predictions, labels)
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
    
    return epoch_loss / len(iterator)

# Define the evaluation function
def evaluate(model, iterator, criterion):
    model.eval()
    epoch_loss = 0
    
    with torch.no_grad():
        for batch in iterator:
            text, labels = batch.text, batch.label
            predictions = model(text)
            
            # Flatten predictions and labels to match shapes for loss calculation
            predictions = predictions.view(-1, predictions.shape[-1])
            labels = labels.view(-1)
            
            loss = criterion(predictions, labels)
            
            epoch_loss += loss.item()
    
    return epoch_loss / len(iterator)

* build a RNN model for sentiment analysis in the cell below.
We have provided several hyperparameters we needed for building the model, including vocabulary size (vocab_size), the word embedding dimension (embedding_dim), the hidden layer dimension (hidden_dim), the number of layers (num_layers) and the number of sentence labels (label_size). Please fill in the missing codes, and implement a RNN model.
### (40 points)

In [3]:
class RNNClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, label_size, padding_idx):
        super(RNNClassifier, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.label_size = label_size
        self.num_layers = 1  # Number of RNN layers

        # Embedding layer
        self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim, padding_idx=padding_idx)

        # RNN layer
        self.rnn = nn.RNN(
            input_size=self.embedding_dim,
            hidden_size=self.hidden_dim,
            num_layers=self.num_layers,
            batch_first=True
        )

        # Fully connected layer for classification
        self.fc = nn.Linear(self.hidden_dim, self.label_size)

    def zero_state(self, batch_size):
        # Initialize the hidden state with zeros
        return torch.zeros(self.num_layers, batch_size, self.hidden_dim)

    def forward(self, text):
        # text: [batch_size, sequence_length]

        # Embedding layer
        embedded = self.embedding(text)  # embedded: [batch_size, sequence_length, embedding_dim]

        # Initialize the hidden state
        hidden = self.zero_state(text.size(0)).to(text.device)

        # RNN layer
        rnn_output, _ = self.rnn(embedded, hidden)
        # rnn_output: [batch_size, sequence_length, hidden_dim]

        # Get the final output for classification (using the last time step)
        final_output = rnn_output[:, -1, :]  # final_output: [batch_size, hidden_dim]

        # Fully connected layer for classification
        predictions = self.fc(final_output)  # predictions: [batch_size, label_size]

        return predictions

* train the model and compute the accuracy in the cell below.
### (20 points)

In [4]:
# Accuracy calculation function
def calculate_accuracy(predictions, labels):
    # Convert predictions to class labels (argmax)
    predicted_labels = torch.argmax(predictions, dim=1)
    # Compare with ground truth labels
    correct = (predicted_labels == labels).float()
    # Calculate accuracy
    accuracy = correct.sum() / len(correct)
    return accuracy

In [5]:
# Initialize the model
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Initialize lists to store predictions and ground truth labels
all_predictions = []
all_labels = []

# Training loop
num_epochs = 10
best_valid_loss = float('inf')

for epoch in range(num_epochs):
    train_loss = train(model, train_iter, optimizer, criterion)
    valid_loss = evaluate(model, val_iter, criterion)

    # Initialize variables to store accuracy
    train_correct = 0
    train_total = 0

    # Calculate accuracy on the training set
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        for batch in train_iter:
            text, labels = batch.text, batch.label
            predictions = model(text)
            all_predictions.append(predictions)
            all_labels.append(labels)

            # Calculate accuracy for this batch
            train_correct += torch.sum(torch.argmax(predictions, dim=1) == labels).item()
            train_total += labels.size(0)

    train_accuracy = train_correct / train_total
    model.train()  # Set the model back to training mode

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {train_loss:.4f} - Accuracy: {train_accuracy:.4f}')

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        best_model = copy.deepcopy(model.state_dict())

# Load the best model weights
model.load_state_dict(best_model)

# Concatenate the predictions and labels into single tensors
all_predictions = torch.cat(all_predictions)
all_labels = torch.cat(all_labels)

# Calculate the test accuracy using the calculate_accuracy function
test_accuracy = calculate_accuracy(all_predictions, all_labels)

print(f'Test Accuracy: {test_accuracy * 100:.2f}%')

Epoch [1/10] - Loss: 1.0540 - Accuracy: 0.4182
Epoch [2/10] - Loss: 1.0495 - Accuracy: 0.4264
Epoch [3/10] - Loss: 1.0463 - Accuracy: 0.4213
Epoch [4/10] - Loss: 1.0454 - Accuracy: 0.4243
Epoch [5/10] - Loss: 1.0457 - Accuracy: 0.4283
Epoch [6/10] - Loss: 1.0414 - Accuracy: 0.4216
Epoch [7/10] - Loss: 1.0415 - Accuracy: 0.4293
Epoch [8/10] - Loss: 1.0375 - Accuracy: 0.4316
Epoch [9/10] - Loss: 1.0372 - Accuracy: 0.4271
Epoch [10/10] - Loss: 1.0343 - Accuracy: 0.4346
Test Accuracy: 42.63%


* try to train a model with better accuracy in the cell below. For example, you can use different optimizers such as SGD and Adam. You can also compare different hyperparameters and model size.
### (15 points), to obtain FULL point in this problem, the accuracy needs to be higher than 70%

I was not able to get my accuracy higher than 70%. I have included several methods that I tried to get my accuracy above 70%, however none of them were able to succeed.

Trying with 5 Epochs instead of 10

In [6]:
# Initialize the model
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Initialize lists to store predictions and ground truth labels
all_predictions = []
all_labels = []

# Training loop
num_epochs = 5
best_valid_loss = float('inf')

for epoch in range(num_epochs):
    train_loss = train(model, train_iter, optimizer, criterion)
    valid_loss = evaluate(model, val_iter, criterion)

    # Initialize variables to store accuracy
    train_correct = 0
    train_total = 0

    # Calculate accuracy on the training set
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        for batch in train_iter:
            text, labels = batch.text, batch.label
            predictions = model(text)
            all_predictions.append(predictions)
            all_labels.append(labels)

            # Calculate accuracy for this batch
            train_correct += torch.sum(torch.argmax(predictions, dim=1) == labels).item()
            train_total += labels.size(0)

    train_accuracy = train_correct / train_total
    model.train()  # Set the model back to training mode

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {train_loss:.4f} - Accuracy: {train_accuracy:.4f}')

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        best_model = copy.deepcopy(model.state_dict())

# Load the best model weights
model.load_state_dict(best_model)

# Concatenate the predictions and labels into single tensors
all_predictions = torch.cat(all_predictions)
all_labels = torch.cat(all_labels)

# Calculate the test accuracy using the calculate_accuracy function
test_accuracy = calculate_accuracy(all_predictions, all_labels)

print(f'Test Accuracy: {test_accuracy * 100:.2f}%')

Epoch [1/5] - Loss: 1.0527 - Accuracy: 0.4196
Epoch [2/5] - Loss: 1.0486 - Accuracy: 0.4217
Epoch [3/5] - Loss: 1.0480 - Accuracy: 0.4225
Epoch [4/5] - Loss: 1.0458 - Accuracy: 0.4266
Epoch [5/5] - Loss: 1.0424 - Accuracy: 0.4273
Test Accuracy: 42.35%


Trying with 3 Epochs instead of 10

In [7]:
# Initialize the model
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Initialize lists to store predictions and ground truth labels
all_predictions = []
all_labels = []

# Training loop
num_epochs = 3
best_valid_loss = float('inf')

for epoch in range(num_epochs):
    train_loss = train(model, train_iter, optimizer, criterion)
    valid_loss = evaluate(model, val_iter, criterion)

    # Initialize variables to store accuracy
    train_correct = 0
    train_total = 0

    # Calculate accuracy on the training set
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        for batch in train_iter:
            text, labels = batch.text, batch.label
            predictions = model(text)
            all_predictions.append(predictions)
            all_labels.append(labels)

            # Calculate accuracy for this batch
            train_correct += torch.sum(torch.argmax(predictions, dim=1) == labels).item()
            train_total += labels.size(0)

    train_accuracy = train_correct / train_total
    model.train()  # Set the model back to training mode

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {train_loss:.4f} - Accuracy: {train_accuracy:.4f}')

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        best_model = copy.deepcopy(model.state_dict())

# Load the best model weights
model.load_state_dict(best_model)

# Concatenate the predictions and labels into single tensors
all_predictions = torch.cat(all_predictions)
all_labels = torch.cat(all_labels)

# Calculate the test accuracy using the calculate_accuracy function
test_accuracy = calculate_accuracy(all_predictions, all_labels)

print(f'Test Accuracy: {test_accuracy * 100:.2f}%')

Epoch [1/3] - Loss: 1.0524 - Accuracy: 0.3958
Epoch [2/3] - Loss: 1.0490 - Accuracy: 0.4254
Epoch [3/3] - Loss: 1.0478 - Accuracy: 0.4216
Test Accuracy: 41.43%


Trying with 15 Epochs instead of 10

In [8]:
# Initialize the model
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Initialize lists to store predictions and ground truth labels
all_predictions = []
all_labels = []

# Training loop
num_epochs = 15
best_valid_loss = float('inf')

for epoch in range(num_epochs):
    train_loss = train(model, train_iter, optimizer, criterion)
    valid_loss = evaluate(model, val_iter, criterion)

    # Initialize variables to store accuracy
    train_correct = 0
    train_total = 0

    # Calculate accuracy on the training set
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        for batch in train_iter:
            text, labels = batch.text, batch.label
            predictions = model(text)
            all_predictions.append(predictions)
            all_labels.append(labels)

            # Calculate accuracy for this batch
            train_correct += torch.sum(torch.argmax(predictions, dim=1) == labels).item()
            train_total += labels.size(0)

    train_accuracy = train_correct / train_total
    model.train()  # Set the model back to training mode

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {train_loss:.4f} - Accuracy: {train_accuracy:.4f}')

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        best_model = copy.deepcopy(model.state_dict())

# Load the best model weights
model.load_state_dict(best_model)

# Concatenate the predictions and labels into single tensors
all_predictions = torch.cat(all_predictions)
all_labels = torch.cat(all_labels)

# Calculate the test accuracy using the calculate_accuracy function
test_accuracy = calculate_accuracy(all_predictions, all_labels)

print(f'Test Accuracy: {test_accuracy * 100:.2f}%')

Epoch [1/15] - Loss: 1.0523 - Accuracy: 0.4198
Epoch [2/15] - Loss: 1.0482 - Accuracy: 0.4264
Epoch [3/15] - Loss: 1.0460 - Accuracy: 0.4230
Epoch [4/15] - Loss: 1.0660 - Accuracy: 0.4088
Epoch [5/15] - Loss: 1.0549 - Accuracy: 0.4074
Epoch [6/15] - Loss: 1.0564 - Accuracy: 0.4235
Epoch [7/15] - Loss: 1.0543 - Accuracy: 0.4249
Epoch [8/15] - Loss: 1.0503 - Accuracy: 0.4118
Epoch [9/15] - Loss: 1.0492 - Accuracy: 0.4304
Epoch [10/15] - Loss: 1.0471 - Accuracy: 0.4267
Epoch [11/15] - Loss: 1.0465 - Accuracy: 0.4082
Epoch [12/15] - Loss: 1.0545 - Accuracy: 0.4144
Epoch [13/15] - Loss: 1.0548 - Accuracy: 0.4089
Epoch [14/15] - Loss: 1.0572 - Accuracy: 0.4011
Epoch [15/15] - Loss: 1.0564 - Accuracy: 0.4144
Test Accuracy: 41.67%


Trained with 1 LSTM Layer with no dropout and Bidirectionality with 10 Epochs

In [9]:
from torch.optim.lr_scheduler import ReduceLROnPlateau
from torchtext.vocab import GloVe

# Define the RNN model class
class RNNClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, num_layers=1, dropout=0.0):
        super(RNNClassifier, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.label_size = label_size
        self.num_layers = num_layers
        self.dropout = dropout

        # Embedding layer with pretrained word vectors
        self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim, padding_idx=padding_idx)

        # Bidirectional LSTM layer
        self.rnn = nn.LSTM(
            input_size=self.embedding_dim,
            hidden_size=self.hidden_dim,
            num_layers=self.num_layers,
            batch_first=True,
            bidirectional=True,
            dropout=self.dropout
        )

        # Fully connected layer for classification
        self.fc = nn.Linear(2 * self.hidden_dim, self.label_size)

    def forward(self, text):
        # text: [batch_size, sequence_length]

        # Embedding layer
        embedded = self.embedding(text)

        # RNN layer
        rnn_output, _ = self.rnn(embedded)

        # Get the final output for classification (using the last time step)
        final_output = rnn_output[:, -1, :]

        # Fully connected layer for classification
        predictions = self.fc(final_output)

        return predictions

In [10]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 10

# Initialize the model with pretrained embeddings and num_layers as 2
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, num_layers=2, dropout=0.2)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)  # Learning rate scheduler

# Training loop with gradient clipping
clip_value = 1.0
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Adjust learning rate using the scheduler based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/10] - Loss: 1.0504 - Accuracy: 0.4137
Epoch [2/10] - Loss: 1.0464 - Accuracy: 0.4147
Epoch [3/10] - Loss: 1.0408 - Accuracy: 0.4274
Epoch [4/10] - Loss: 1.0052 - Accuracy: 0.5094
Epoch [5/10] - Loss: 0.9103 - Accuracy: 0.6024
Epoch [6/10] - Loss: 0.7797 - Accuracy: 0.6833
Epoch [7/10] - Loss: 0.6513 - Accuracy: 0.7353
Epoch [8/10] - Loss: 0.5259 - Accuracy: 0.7968
Epoch [9/10] - Loss: 0.4393 - Accuracy: 0.8414
Epoch [10/10] - Loss: 0.3553 - Accuracy: 0.8810
Test Accuracy: 55.67%


Trained with 1 LSTM Layer with no dropout and Bidirectionality with 5 Epochs

In [11]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 5

# Initialize the model with pretrained embeddings and num_layers as 2
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, num_layers=2, dropout=0.2)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)  # Learning rate scheduler

# Training loop with gradient clipping
clip_value = 1.0
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Adjust learning rate using the scheduler based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/5] - Loss: 1.0495 - Accuracy: 0.4170
Epoch [2/5] - Loss: 1.0479 - Accuracy: 0.4192
Epoch [3/5] - Loss: 1.0411 - Accuracy: 0.4315
Epoch [4/5] - Loss: 0.9919 - Accuracy: 0.5253
Epoch [5/5] - Loss: 0.9048 - Accuracy: 0.6025
Test Accuracy: 50.45%


Trained with 1 LSTM Layer with no dropout and Bidirectionality with 3 Epochs

In [12]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 3

# Initialize the model with pretrained embeddings and num_layers as 2
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, num_layers=2, dropout=0.2)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)  # Learning rate scheduler

# Training loop with gradient clipping
clip_value = 1.0
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Adjust learning rate using the scheduler based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/3] - Loss: 1.0494 - Accuracy: 0.4153
Epoch [2/3] - Loss: 1.0478 - Accuracy: 0.4183
Epoch [3/3] - Loss: 1.0439 - Accuracy: 0.4119
Test Accuracy: 41.21%


Trained with 1 LSTM Layer with no dropout and Bidirectionality with 15 Epochs

In [13]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 15

# Initialize the model with pretrained embeddings and num_layers as 2
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, num_layers=2, dropout=0.2)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)  # Learning rate scheduler

# Training loop with gradient clipping
clip_value = 1.0
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Adjust learning rate using the scheduler based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/15] - Loss: 1.0503 - Accuracy: 0.4136
Epoch [2/15] - Loss: 1.0477 - Accuracy: 0.4173
Epoch [3/15] - Loss: 1.0439 - Accuracy: 0.4205
Epoch [4/15] - Loss: 1.0337 - Accuracy: 0.4428
Epoch [5/15] - Loss: 0.9759 - Accuracy: 0.5439
Epoch [6/15] - Loss: 0.8430 - Accuracy: 0.6489
Epoch [7/15] - Loss: 0.6912 - Accuracy: 0.7206
Epoch [8/15] - Loss: 0.5656 - Accuracy: 0.7789
Epoch [9/15] - Loss: 0.4588 - Accuracy: 0.8294
Epoch [10/15] - Loss: 0.3705 - Accuracy: 0.8723
Epoch [11/15] - Loss: 0.3002 - Accuracy: 0.9038
Epoch [12/15] - Loss: 0.2473 - Accuracy: 0.9258
Epoch [13/15] - Loss: 0.2077 - Accuracy: 0.9423
Epoch [14/15] - Loss: 0.1831 - Accuracy: 0.9496
Epoch 00014: reducing learning rate of group 0 to 5.0000e-04.
Epoch [15/15] - Loss: 0.1315 - Accuracy: 0.9652
Test Accuracy: 56.52%


Trained with 2 LSTM Layers with dropout, Bidirectionality, and Pretrained Glove Embeddings with 10 Epochs

In [14]:
# Define the RNN model class
class RNNClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, pretrained_embeddings=None, num_layers=1, dropout=0.0):
        super(RNNClassifier, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.label_size = label_size
        self.num_layers = num_layers
        self.dropout = dropout

        # Embedding layer with pretrained word vectors
        if pretrained_embeddings is not None:
            self.embedding = nn.Embedding.from_pretrained(pretrained_embeddings, padding_idx=padding_idx, freeze=False)
        else:
            self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim, padding_idx=padding_idx)

        # LSTM layer with bidirectionality
        self.rnn = nn.LSTM(
            input_size=self.embedding_dim,
            hidden_size=self.hidden_dim,
            num_layers=self.num_layers,
            batch_first=True,
            bidirectional=True,
            dropout=self.dropout
        )

        # Dropout layer for regularization
        self.dropout = nn.Dropout(dropout)

        # Fully connected layer for classification
        self.fc = nn.Linear(2 * self.hidden_dim, self.label_size)

    def forward(self, text):
        # text: [batch_size, sequence_length]

        # Embedding layer
        embedded = self.embedding(text)

        # LSTM layer
        rnn_output, _ = self.rnn(embedded)

        # Apply dropout
        rnn_output = self.dropout(rnn_output)

        # Get the final output for classification (using the last time step)
        final_output = rnn_output[:, -1, :]

        # Fully connected layer for classification
        predictions = self.fc(final_output)

        return predictions

In [15]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 10

# Initialize the model with pretrained embeddings, more layers, and dropout
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors,
                      num_layers=2,
                      dropout=0.5)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Learning rate scheduler
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)

# Training loop
clip_value = 1.0

for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Learning rate scheduling based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/10] - Loss: 1.0525 - Accuracy: 0.4150
Epoch [2/10] - Loss: 1.0514 - Accuracy: 0.4081
Epoch [3/10] - Loss: 1.0494 - Accuracy: 0.4105
Epoch [4/10] - Loss: 1.0491 - Accuracy: 0.4139
Epoch 00004: reducing learning rate of group 0 to 5.0000e-04.
Epoch [5/10] - Loss: 1.0474 - Accuracy: 0.4198
Epoch [6/10] - Loss: 1.0459 - Accuracy: 0.4188
Epoch [7/10] - Loss: 1.0358 - Accuracy: 0.4278
Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch [8/10] - Loss: 0.9810 - Accuracy: 0.5176
Epoch [9/10] - Loss: 0.8956 - Accuracy: 0.5897
Epoch [10/10] - Loss: 0.8087 - Accuracy: 0.6428
Test Accuracy: 56.61%


Trained with 2 LSTM Layers with dropout, Bidirectionality, and Pretrained Glove Embeddings with 5 Epochs

In [16]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 5

# Initialize the model with pretrained embeddings, more layers, and dropout
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors,
                      num_layers=2,
                      dropout=0.5)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Learning rate scheduler
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)

# Training loop
clip_value = 1.0

for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Learning rate scheduling based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/5] - Loss: 1.0543 - Accuracy: 0.4108
Epoch [2/5] - Loss: 1.0495 - Accuracy: 0.4153
Epoch [3/5] - Loss: 1.0496 - Accuracy: 0.4160
Epoch [4/5] - Loss: 1.0498 - Accuracy: 0.4148
Epoch [5/5] - Loss: 1.0491 - Accuracy: 0.4171
Test Accuracy: 41.25%


Trained with 2 LSTM Layers with dropout, Bidirectionality, and Pretrained Glove Embeddings with 3 Epochs

In [17]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 3

# Initialize the model with pretrained embeddings, more layers, and dropout
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors,
                      num_layers=2,
                      dropout=0.5)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Learning rate scheduler
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)

# Training loop
clip_value = 1.0

for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Learning rate scheduling based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/3] - Loss: 1.0518 - Accuracy: 0.4187
Epoch [2/3] - Loss: 1.0500 - Accuracy: 0.4121
Epoch [3/3] - Loss: 1.0494 - Accuracy: 0.4106
Test Accuracy: 41.25%


Trained with 2 LSTM Layers with dropout, Bidirectionality, and Pretrained Glove Embeddings with 15 Epochs

In [18]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 15

# Initialize the model with pretrained embeddings, more layers, and dropout
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors,
                      num_layers=2,
                      dropout=0.5)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Learning rate scheduler
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)

# Training loop
clip_value = 1.0

for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Learning rate scheduling based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/15] - Loss: 1.0532 - Accuracy: 0.4086
Epoch [2/15] - Loss: 1.0506 - Accuracy: 0.4177
Epoch [3/15] - Loss: 1.0499 - Accuracy: 0.4177
Epoch [4/15] - Loss: 1.0488 - Accuracy: 0.4221
Epoch [5/15] - Loss: 1.0487 - Accuracy: 0.4158
Epoch 00005: reducing learning rate of group 0 to 5.0000e-04.
Epoch [6/15] - Loss: 1.0414 - Accuracy: 0.4283
Epoch [7/15] - Loss: 0.9242 - Accuracy: 0.5837
Epoch [8/15] - Loss: 0.7477 - Accuracy: 0.6821
Epoch 00008: reducing learning rate of group 0 to 2.5000e-04.
Epoch [9/15] - Loss: 0.6179 - Accuracy: 0.7362
Epoch [10/15] - Loss: 0.5606 - Accuracy: 0.7534
Epoch [11/15] - Loss: 0.5096 - Accuracy: 0.7676
Epoch [12/15] - Loss: 0.4652 - Accuracy: 0.7945
Epoch [13/15] - Loss: 0.4188 - Accuracy: 0.8256
Epoch [14/15] - Loss: 0.3721 - Accuracy: 0.8522
Epoch [15/15] - Loss: 0.3337 - Accuracy: 0.8756
Test Accuracy: 49.11%


Trained with 2 LSTM Layers with dropout, Bidirectionality, and Pretrained Embeddings with 10 Epochs

In [19]:
# Define the RNN model class
class RNNClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, pretrained_embeddings=None):
        super(RNNClassifier, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.label_size = label_size
        self.num_layers = 1

        # Embedding layer with pretrained word vectors
        if pretrained_embeddings is not None:
            self.embedding = nn.Embedding.from_pretrained(pretrained_embeddings, padding_idx=padding_idx)
        else:
            self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim, padding_idx=padding_idx)

        # Bidirectional LSTM layer
        self.rnn = nn.LSTM(
            input_size=self.embedding_dim,
            hidden_size=self.hidden_dim,
            num_layers=self.num_layers,
            batch_first=True,
            bidirectional=True
        )

        # Fully connected layer for classification
        self.fc = nn.Linear(2 * self.hidden_dim, self.label_size)

    def forward(self, text):
        # text: [batch_size, sequence_length]

        # Embedding layer
        embedded = self.embedding(text)

        # Bidirectional LSTM layer
        rnn_output, _ = self.rnn(embedded)

        # Get the final output for classification (using the last time step)
        final_output = rnn_output[:, -1, :]

        # Fully connected layer for classification
        predictions = self.fc(final_output)

        return predictions

In [20]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 10

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors)

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Learning rate scheduler
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)

# Training loop
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        _, predicted_labels = torch.max(predictions, 1)
        accuracy = (predicted_labels == labels).float().mean()

        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        _, predicted_labels = torch.max(predictions, 1)
        test_accuracy = (predicted_labels == labels).float().mean()
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/10] - Loss: 1.0523 - Accuracy: 0.4143
Epoch [2/10] - Loss: 1.0492 - Accuracy: 0.4127
Epoch [3/10] - Loss: 1.0482 - Accuracy: 0.4235
Epoch [4/10] - Loss: 1.0477 - Accuracy: 0.4198
Epoch [5/10] - Loss: 1.0476 - Accuracy: 0.4202
Epoch [6/10] - Loss: 1.0469 - Accuracy: 0.4144
Epoch [7/10] - Loss: 1.0468 - Accuracy: 0.4224
Epoch [8/10] - Loss: 1.0458 - Accuracy: 0.4222
Epoch [9/10] - Loss: 1.0452 - Accuracy: 0.4243
Epoch [10/10] - Loss: 1.0431 - Accuracy: 0.4387
Test Accuracy: 43.97%


Trained with 2 LSTM Layers with dropout, Bidirectionality, and Pretrained Embeddings with 5 Epochs

In [21]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 5

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors)

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Learning rate scheduler
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)

# Training loop
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        _, predicted_labels = torch.max(predictions, 1)
        accuracy = (predicted_labels == labels).float().mean()

        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        _, predicted_labels = torch.max(predictions, 1)
        test_accuracy = (predicted_labels == labels).float().mean()
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/5] - Loss: 1.0509 - Accuracy: 0.4151
Epoch [2/5] - Loss: 1.0486 - Accuracy: 0.4168
Epoch [3/5] - Loss: 1.0478 - Accuracy: 0.4189
Epoch [4/5] - Loss: 1.0487 - Accuracy: 0.4197
Epoch [5/5] - Loss: 1.0482 - Accuracy: 0.4194
Test Accuracy: 41.29%


Trained with 2 LSTM Layers with dropout, Bidirectionality, and Pretrained Embeddings with 3 Epochs

In [22]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 3

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors)

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Learning rate scheduler
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)

# Training loop
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        _, predicted_labels = torch.max(predictions, 1)
        accuracy = (predicted_labels == labels).float().mean()

        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        _, predicted_labels = torch.max(predictions, 1)
        test_accuracy = (predicted_labels == labels).float().mean()
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/3] - Loss: 1.0518 - Accuracy: 0.4121
Epoch [2/3] - Loss: 1.0488 - Accuracy: 0.4149
Epoch [3/3] - Loss: 1.0490 - Accuracy: 0.4165
Test Accuracy: 41.25%


Trained with 2 LSTM Layers with dropout, Bidirectionality, and Pretrained Embeddings with 15 Epochs

In [23]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 15

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors)

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors)

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Learning rate scheduler
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)

# Training loop
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        _, predicted_labels = torch.max(predictions, 1)
        accuracy = (predicted_labels == labels).float().mean()

        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        _, predicted_labels = torch.max(predictions, 1)
        test_accuracy = (predicted_labels == labels).float().mean()
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy * 100:.2f}%')

Epoch [1/15] - Loss: 1.0512 - Accuracy: 0.4133
Epoch [2/15] - Loss: 1.0489 - Accuracy: 0.4205
Epoch [3/15] - Loss: 1.0481 - Accuracy: 0.4226
Epoch [4/15] - Loss: 1.0478 - Accuracy: 0.4189
Epoch [5/15] - Loss: 1.0478 - Accuracy: 0.4188
Epoch [6/15] - Loss: 1.0474 - Accuracy: 0.4225
Epoch [7/15] - Loss: 1.0461 - Accuracy: 0.4217
Epoch [8/15] - Loss: 1.0471 - Accuracy: 0.4202
Epoch [9/15] - Loss: 1.0458 - Accuracy: 0.4181
Epoch [10/15] - Loss: 1.0458 - Accuracy: 0.4204
Epoch [11/15] - Loss: 1.0446 - Accuracy: 0.4239
Epoch [12/15] - Loss: 1.0427 - Accuracy: 0.4208
Epoch [13/15] - Loss: 1.0420 - Accuracy: 0.4272
Epoch [14/15] - Loss: 1.0406 - Accuracy: 0.4233
Epoch [15/15] - Loss: 1.0389 - Accuracy: 0.4239
Test Accuracy: 34.91%


Trained with 1 LSTM Layers with dropout, Bidirectionality, Pretrained Glove Embeddings, and Learning Rate Scheduler with 10 Epochs

In [24]:
# Define the RNN model class
class RNNClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, pretrained_embeddings=None, num_layers=1, dropout=0.0):
        super(RNNClassifier, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.label_size = label_size
        self.num_layers = num_layers  # Number of RNN layers
        self.dropout = dropout

        # Embedding layer with pretrained word vectors
        if pretrained_embeddings is not None:
            self.embedding = nn.Embedding.from_pretrained(pretrained_embeddings, padding_idx=padding_idx)
        else:
            self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim, padding_idx=padding_idx)

        # Bidirectional LSTM layer
        self.rnn = nn.LSTM(
            input_size=self.embedding_dim,
            hidden_size=self.hidden_dim,
            num_layers=self.num_layers,
            batch_first=True,
            bidirectional=True,  # Bidirectional RNN
            dropout=self.dropout  # Apply dropout to RNN layers
        )

        # Fully connected layer for classification
        self.fc = nn.Linear(2 * self.hidden_dim, self.label_size)  # Multiply hidden_dim by 2 due to bidirectionality

    def forward(self, text):
        # text: [batch_size, sequence_length]

        # Embedding layer
        embedded = self.embedding(text)  # embedded: [batch_size, sequence_length, embedding_dim]

        # Bidirectional LSTM layer
        rnn_output, _ = self.rnn(embedded)
        # rnn_output: [batch_size, sequence_length, 2 * hidden_dim]

        # Get the final output for classification (using the last time step)
        final_output = rnn_output[:, -1, :]  # final_output: [batch_size, 2 * hidden_dim]

        # Fully connected layer for classification
        predictions = self.fc(final_output)  # predictions: [batch_size, label_size]

        return predictions

In [25]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100  # Match the GloVe embedding dimension
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 10

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors,
                      num_layers=2,  # Experiment with the number of layers
                      dropout=0.2)  # Experiment with different dropout rates

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)  # Learning rate scheduler

# Training loop with gradient clipping
clip_value = 1.0  # Experiment with the clipping threshold
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Adjust learning rate using the scheduler based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy:.4f}')

Epoch [1/10] - Loss: 1.0507 - Accuracy: 0.4158
Epoch [2/10] - Loss: 1.0491 - Accuracy: 0.4183
Epoch [3/10] - Loss: 1.0478 - Accuracy: 0.4199
Epoch [4/10] - Loss: 1.0486 - Accuracy: 0.4187
Epoch [5/10] - Loss: 1.0478 - Accuracy: 0.4222
Epoch [6/10] - Loss: 1.0473 - Accuracy: 0.4177
Epoch [7/10] - Loss: 1.0470 - Accuracy: 0.4225
Epoch [8/10] - Loss: 1.0478 - Accuracy: 0.4190
Epoch [9/10] - Loss: 1.0474 - Accuracy: 0.4225
Epoch [10/10] - Loss: 1.0465 - Accuracy: 0.4229
Test Accuracy: 0.4125


Trained with 1 LSTM Layers with dropout, Bidirectionality, Pretrained Glove Embeddings, and Learning Rate Scheduler with 5 Epochs

In [26]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100  # Match the GloVe embedding dimension
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 5

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors,
                      num_layers=2,  # Experiment with the number of layers
                      dropout=0.2)  # Experiment with different dropout rates

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)  # Learning rate scheduler

# Training loop with gradient clipping
clip_value = 1.0  # Experiment with the clipping threshold
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Adjust learning rate using the scheduler based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy:.4f}')

Epoch [1/5] - Loss: 1.0505 - Accuracy: 0.4189
Epoch [2/5] - Loss: 1.0495 - Accuracy: 0.4157
Epoch [3/5] - Loss: 1.0482 - Accuracy: 0.4163
Epoch [4/5] - Loss: 1.0486 - Accuracy: 0.4192
Epoch 00004: reducing learning rate of group 0 to 5.0000e-04.
Epoch [5/5] - Loss: 1.0476 - Accuracy: 0.4225
Test Accuracy: 0.4125


Trained with 1 LSTM Layers with dropout, Bidirectionality, Pretrained Glove Embeddings, and Learning Rate Scheduler with 3 Epochs

In [27]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100  # Match the GloVe embedding dimension
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 3

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors,
                      num_layers=2,  # Experiment with the number of layers
                      dropout=0.2)  # Experiment with different dropout rates

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)  # Learning rate scheduler

# Training loop with gradient clipping
clip_value = 1.0  # Experiment with the clipping threshold
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Adjust learning rate using the scheduler based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy:.4f}')

Epoch [1/3] - Loss: 1.0522 - Accuracy: 0.4142
Epoch [2/3] - Loss: 1.0481 - Accuracy: 0.4149
Epoch [3/3] - Loss: 1.0485 - Accuracy: 0.4134
Test Accuracy: 0.4125


Trained with 1 LSTM Layers with dropout, Bidirectionality, Pretrained Glove Embeddings, and Learning Rate Scheduler with 15 Epochs

In [28]:
# Load pretrained word vectors (GloVe)
glove_vectors = GloVe(name='6B', dim=100)

# Define hyperparameters
vocab_size = len(TEXT.vocab)
embedding_dim = 100  # Match the GloVe embedding dimension
hidden_dim = 128
label_size = len(LABEL.vocab)
padding_idx = TEXT.vocab.stoi['<pad>']
num_epochs = 15

# Initialize the model with pretrained embeddings
model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors,
                      num_layers=2,  # Experiment with the number of layers
                      dropout=0.2)  # Experiment with different dropout rates

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)  # Learning rate scheduler

# Training loop with gradient clipping
clip_value = 1.0  # Experiment with the clipping threshold
for epoch in range(num_epochs):
    model.train()
    total_loss = 0.0
    total_accuracy = 0.0

    for batch in train_iter:
        text, labels = batch.text, batch.label

        optimizer.zero_grad()

        predictions = model(text)
        loss = criterion(predictions, labels)
        accuracy = calculate_accuracy(predictions, labels)

        loss.backward()

        # Gradient clipping
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)

        optimizer.step()

        total_loss += loss.item()
        total_accuracy += accuracy.item()

    average_loss = total_loss / len(train_iter)
    average_accuracy = total_accuracy / len(train_iter)

    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

    # Adjust learning rate using the scheduler based on validation performance
    model.eval()
    val_loss = evaluate(model, val_iter, criterion)
    scheduler.step(val_loss)  # Adjust learning rate

# After training, evaluate on the test set
model.eval()
total_test_accuracy = 0.0

with torch.no_grad():
    for batch in test_iter:
        text, labels = batch.text, batch.label
        predictions = model(text)
        test_accuracy = calculate_accuracy(predictions, labels)
        total_test_accuracy += test_accuracy.item()

average_test_accuracy = total_test_accuracy / len(test_iter)
print(f'Test Accuracy: {average_test_accuracy:.4f}')

Epoch [1/15] - Loss: 1.0502 - Accuracy: 0.4147
Epoch [2/15] - Loss: 1.0487 - Accuracy: 0.4182
Epoch [3/15] - Loss: 1.0484 - Accuracy: 0.4174
Epoch [4/15] - Loss: 1.0482 - Accuracy: 0.4147
Epoch [5/15] - Loss: 1.0484 - Accuracy: 0.4201
Epoch 00005: reducing learning rate of group 0 to 5.0000e-04.
Epoch [6/15] - Loss: 1.0477 - Accuracy: 0.4225
Epoch [7/15] - Loss: 1.0468 - Accuracy: 0.4225
Epoch [8/15] - Loss: 1.0473 - Accuracy: 0.4225
Epoch 00008: reducing learning rate of group 0 to 2.5000e-04.
Epoch [9/15] - Loss: 1.0463 - Accuracy: 0.4225
Epoch [10/15] - Loss: 1.0459 - Accuracy: 0.4225
Epoch [11/15] - Loss: 1.0458 - Accuracy: 0.4225
Epoch [12/15] - Loss: 1.0456 - Accuracy: 0.4225
Epoch [13/15] - Loss: 1.0449 - Accuracy: 0.4225
Epoch 00013: reducing learning rate of group 0 to 1.2500e-04.
Epoch [14/15] - Loss: 1.0450 - Accuracy: 0.4225
Epoch [15/15] - Loss: 1.0449 - Accuracy: 0.4225
Test Accuracy: 0.4125


Trained with a Grid Search over embedding_dim, hidden_dim, num_layers, and dropout and Learning Rate Scheduler with 10 Epochs 

In [29]:
# Define the RNN model class
class RNNClassifier(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, pretrained_embeddings=None, num_layers=1, dropout=0.0):
        super(RNNClassifier, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.label_size = label_size
        self.num_layers = num_layers
        self.dropout = dropout

        # Embedding layer with pretrained word vectors
        if pretrained_embeddings is not None:
            self.embedding = nn.Embedding.from_pretrained(pretrained_embeddings, padding_idx=padding_idx)
        else:
            self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim, padding_idx=padding_idx)

        # Bidirectional LSTM layer
        self.rnn = nn.LSTM(
            input_size=self.embedding_dim,
            hidden_size=self.hidden_dim,
            num_layers=self.num_layers,
            batch_first=True,
            bidirectional=True,
            dropout=self.dropout
        )

        # Fully connected layer for classification
        self.fc = nn.Linear(2 * self.hidden_dim, self.label_size)

    def forward(self, text):
        # text: [batch_size, sequence_length]

        # Embedding layer
        embedded = self.embedding(text)

        # Bidirectional LSTM layer
        rnn_output, _ = self.rnn(embedded)

        # Get the final output for classification (using the last time step)
        final_output = rnn_output[:, -1, :]

        # Fully connected layer for classification
        predictions = self.fc(final_output)

        return predictions

# Load pretrained word vectors (GloVe)
glove_vectors = torchtext.vocab.GloVe(name='6B', dim=100)

# Define hyperparameter search grid
hyperparameters = {
    'embedding_dim': [100],
    'hidden_dim': [128, 256],
    'num_layers': [1, 2],
    'dropout': [0.2, 0.5],
}

best_accuracy = 0.0
best_model = None

for embedding_dim in hyperparameters['embedding_dim']:
    for hidden_dim in hyperparameters['hidden_dim']:
        for num_layers in hyperparameters['num_layers']:
            for dropout in hyperparameters['dropout']:
                print(f"Training with: embedding_dim={embedding_dim}, hidden_dim={hidden_dim}, num_layers={num_layers}, dropout={dropout}")
                
                # Initialize the model with pretrained embeddings
                model = RNNClassifier(vocab_size, embedding_dim, hidden_dim, label_size, padding_idx, glove_vectors.vectors,
                                      num_layers=num_layers,
                                      dropout=dropout)
                
                # Loss function and optimizer
                criterion = nn.CrossEntropyLoss()
                optimizer = optim.Adam(model.parameters(), lr=0.001)
                scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=2, verbose=True)
                
                # Training loop with gradient clipping
                num_epochs = 10
                best_valid_loss = float('inf')

                for epoch in range(num_epochs):
                    model.train()
                    total_loss = 0.0
                    total_accuracy = 0.0

                    for batch in train_iter:
                        text, labels = batch.text, batch.label

                        optimizer.zero_grad()

                        predictions = model(text)
                        loss = criterion(predictions, labels)
                        accuracy = calculate_accuracy(predictions, labels)

                        loss.backward()
                        optimizer.step()

                        total_loss += loss.item()
                        total_accuracy += accuracy.item()

                    average_loss = total_loss / len(train_iter)
                    average_accuracy = total_accuracy / len(train_iter)

                    print(f'Epoch [{epoch + 1}/{num_epochs}] - Loss: {average_loss:.4f} - Accuracy: {average_accuracy:.4f}')

                    scheduler.step(average_accuracy)  # Adjust learning rate based on training performance

                    # Check if the current model performed better than the previous best model
                    if average_accuracy > best_accuracy:
                        best_accuracy = average_accuracy
                        best_model = copy.deepcopy(model.state_dict())

                # Evaluate the current model on the test set
                model.eval()
                total_test_accuracy = 0.0

                with torch.no_grad():
                    for batch in test_iter:
                        text, labels = batch.text, batch.label
                        predictions = model(text)
                        test_accuracy = calculate_accuracy(predictions, labels)
                        total_test_accuracy += test_accuracy.item()

                average_test_accuracy = total_test_accuracy / len(test_iter)
                print(f'Test Accuracy: {average_test_accuracy:.4f}')

Training with: embedding_dim=100, hidden_dim=128, num_layers=1, dropout=0.2




Epoch [1/10] - Loss: 1.0504 - Accuracy: 0.4058
Epoch [2/10] - Loss: 1.0484 - Accuracy: 0.4158
Epoch [3/10] - Loss: 1.0488 - Accuracy: 0.4177
Epoch [4/10] - Loss: 1.0483 - Accuracy: 0.4189
Epoch [5/10] - Loss: 1.0477 - Accuracy: 0.4194
Epoch [6/10] - Loss: 1.0475 - Accuracy: 0.4183
Epoch [7/10] - Loss: 1.0472 - Accuracy: 0.4191
Epoch [8/10] - Loss: 1.0468 - Accuracy: 0.4189
Epoch 00008: reducing learning rate of group 0 to 5.0000e-04.
Epoch [9/10] - Loss: 1.0449 - Accuracy: 0.4232
Epoch [10/10] - Loss: 1.0440 - Accuracy: 0.4218
Test Accuracy: 0.3857
Training with: embedding_dim=100, hidden_dim=128, num_layers=1, dropout=0.5




Epoch [1/10] - Loss: 1.0496 - Accuracy: 0.4247
Epoch [2/10] - Loss: 1.0486 - Accuracy: 0.4163
Epoch [3/10] - Loss: 1.0484 - Accuracy: 0.4178
Epoch [4/10] - Loss: 1.0480 - Accuracy: 0.4136
Epoch 00004: reducing learning rate of group 0 to 5.0000e-04.
Epoch [5/10] - Loss: 1.0474 - Accuracy: 0.4226
Epoch [6/10] - Loss: 1.0471 - Accuracy: 0.4226
Epoch [7/10] - Loss: 1.0467 - Accuracy: 0.4215
Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch [8/10] - Loss: 1.0459 - Accuracy: 0.4197
Epoch [9/10] - Loss: 1.0453 - Accuracy: 0.4256
Epoch [10/10] - Loss: 1.0429 - Accuracy: 0.4354
Test Accuracy: 0.4830
Training with: embedding_dim=100, hidden_dim=128, num_layers=2, dropout=0.2
Epoch [1/10] - Loss: 1.0508 - Accuracy: 0.4146
Epoch [2/10] - Loss: 1.0492 - Accuracy: 0.4181
Epoch [3/10] - Loss: 1.0484 - Accuracy: 0.4226
Epoch [4/10] - Loss: 1.0479 - Accuracy: 0.4190
Epoch [5/10] - Loss: 1.0482 - Accuracy: 0.4212
Epoch [6/10] - Loss: 1.0473 - Accuracy: 0.4183
Epoch 00006: reducing lea