# Sentiment Analysis With LLMs
### Pre-trained Model vs Fine-Tuned

This notebook was specifically constructed to work with MacOS MPS as a GPU. It has code that allows it to run on CUDA or CPU, but I have not validated if it works or not.

This implementation uses custom native PyTorch to contrast the methods. For a more streamlined way to do this with built in methods, see this documentation on Fine-Tuning with Hugging Face: [https://huggingface.co/docs/transformers/training](https://huggingface.co/docs/transformers/training) 

## Setup

In [1]:
# !pip install torch
# !pip install transformers
# !pip install datasets


In [2]:
# Imports
import torch
import torch.nn.functional as F 
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification, AdamW
from datasets import load_dataset, load_metric
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report


In [3]:
# set up CUDA/MPS 
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
print(device)

mps


In [4]:
# dataset containing text descriptions of movies, and a sentiment label
dataset = load_dataset("imdb")

In [5]:
# train test split
train_texts, test_texts, train_labels, test_labels = train_test_split(
    dataset["train"]["text"], dataset["train"]["label"], test_size=0.2, random_state=42
)

In [6]:
# load Tokenizer and BERT for Classification 
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

In [7]:
# Tokenize and preprocess the data for both pre-training and fine-tuning
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=256)
test_encodings = tokenizer(test_texts, truncation=True, padding=True, max_length=256)

In [8]:
# Prep Dataset Object
class IMDbDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

In [9]:
# Encoded train and test dataset
train_dataset = IMDbDataset(train_encodings, train_labels)
test_dataset = IMDbDataset(test_encodings, test_labels)

# Pre-Trained 

In [10]:
# Load a pretrained model
pretrained_model = AutoModel.from_pretrained(model_name)

# Freeze it (no fine tuning)
for param in pretrained_model.parameters():
    param.requires_grad = False

# Add a classification head
classification_head = torch.nn.Linear(pretrained_model.config.hidden_size, 2)

# Create a custom model class
class CustomModel(torch.nn.Module):
    def __init__(self, base_model, classification_head):
        super(CustomModel, self).__init__()
        self.base_model = base_model
        self.classification_head = classification_head

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):  # Add token_type_ids
        outputs = self.base_model(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)  # Pass token_type_ids
        logits = self.classification_head(outputs.pooler_output)  # Use pooler_output for BERT-based models
        return logits

# Combine the base model and classification head
model_frozen_weights = CustomModel(pretrained_model, classification_head)

# Train the pre-trained model
optimizer_pretrain = AdamW(model_frozen_weights.parameters(), lr=5e-5, no_deprecation_warning=True)
train_loader_pretrain = torch.utils.data.DataLoader(train_dataset, batch_size=8, shuffle=True)

# Move the optimizer state to GPU
optimizer_pretrain.state = {key: value.to(device) for key, value in optimizer_pretrain.state.items()}

# Training loop for pre-training with frozen weights
# This loop redefines the optimizer in each loop because I was running into a type error caused by weird mps handling. 
# Its a bit verbose as a result.
num_epochs_pretrain = 1
count_epoch = 1
for epoch in range(num_epochs_pretrain):
    model_frozen_weights.train()
    for batch in train_loader_pretrain:
        # Move the model to the selected device
        model_frozen_weights.to(device)

        # Create a new optimizer for each batch
        optimizer_pretrain = AdamW(model_frozen_weights.parameters(), lr=5e-5, no_deprecation_warning=True)

        optimizer_pretrain.zero_grad()

        # Move input tensors to the selected device
        inputs = {key: value.to(device) for key, value in batch.items()}

        # Pass token_type_ids to the model if available
        if "token_type_ids" in inputs:
            outputs = model_frozen_weights(inputs["input_ids"], attention_mask=inputs["attention_mask"], token_type_ids=inputs["token_type_ids"])
        else:
            outputs = model_frozen_weights(inputs["input_ids"], attention_mask=inputs["attention_mask"])

        logits = outputs
        loss = F.cross_entropy(logits, inputs["labels"])
        loss.backward()
        optimizer_pretrain.step()
    print(f"Training Epoch {count_epoch} Complete")
    count_epoch += 1

Training Epoch 1 Complete


In [14]:
# Move the model to MPS
model_frozen_weights = model_frozen_weights.to(device)
model_frozen_weights.eval()
predictions_pretrain = []

with torch.no_grad():
    for batch in torch.utils.data.DataLoader(test_dataset, batch_size=8):
        # Move the model to the selected device
        model_frozen_weights.to(device)

        inputs = batch["input_ids"]
        inputs = inputs.to(device)  # Move inputs to MPS device

        outputs = model_frozen_weights(input_ids=inputs)
        predictions_pretrain.extend(torch.argmax(outputs, dim=1).tolist())

# Calculate and print performance metrics for the pre-trained model 
accuracy_pretrain = accuracy_score(test_labels, predictions_pretrain)
report_pretrain = classification_report(test_labels, predictions_pretrain)

print("\nResults for Pre-trained Model with Frozen Weights:")
print(f"Accuracy: {accuracy_pretrain:.2f}")
print("Classification Report:\n", report_pretrain)




Results for Pre-trained Model with Frozen Weights:
Accuracy: 0.55
Classification Report:
               precision    recall  f1-score   support

           0       0.75      0.16      0.27      2515
           1       0.53      0.94      0.68      2485

    accuracy                           0.55      5000
   macro avg       0.64      0.55      0.47      5000
weighted avg       0.64      0.55      0.47      5000



## Fine Tuning

In [15]:
# Fine-tuned Model
finetuned_model = AutoModelForSequenceClassification.from_pretrained(model_name)
optimizer_finetune = AdamW(finetuned_model.parameters(), lr=5e-5, no_deprecation_warning=True)

# Move the model to the selected device
finetuned_model.to(device)

# Move the optimizer state to selected device
optimizer_finetune.state = {key: value.to(device) for key, value in optimizer_finetune.state.items()}

# Fine-tuning loop for the fine-tuned model
num_epochs_finetune = 1
count_epoch = 1

for epoch in range(num_epochs_finetune):
    finetuned_model.train()
    
    # Create a new optimizer for each epoch
    optimizer_finetune = AdamW(finetuned_model.parameters(), lr=5e-5, no_deprecation_warning=True)

    for batch in torch.utils.data.DataLoader(train_dataset, batch_size=8, shuffle=True):
        optimizer_finetune.zero_grad()

        # Move input tensors to the selected device
        inputs = {key: value.to(device) for key, value in batch.items()}

        outputs = finetuned_model(**inputs)
        loss = outputs.loss
        loss.backward()
        optimizer_finetune.step()
    print(f"Training Epoch {count_epoch} Complete")
    count_epoch += 1


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Training Epoch 1 Complete


In [16]:
# Move the model to MPS
finetuned_model = finetuned_model.to(device)
finetuned_model.eval()
predictions_finetune = []

with torch.no_grad():
    for batch in torch.utils.data.DataLoader(test_dataset, batch_size=8):
        # Move the model to the selected device
        finetuned_model.to(device)

        inputs = {key: value.to(device) for key, value in batch.items()}

        outputs = finetuned_model(**inputs)
        logits = outputs.logits
        predictions_finetune.extend(torch.argmax(logits, dim=1).tolist())

# Calculate and print performance metrics for fine-tuned model
accuracy_finetune = accuracy_score(test_labels, predictions_finetune)
report_finetune = classification_report(test_labels, predictions_finetune)

print("\nResults for Fine-tuned Model:")
print(f"Accuracy: {accuracy_finetune:.2f}")
print("Classification Report:\n", report_finetune)



Results for Fine-tuned Model:
Accuracy: 0.90
Classification Report:
               precision    recall  f1-score   support

           0       0.91      0.88      0.90      2515
           1       0.88      0.92      0.90      2485

    accuracy                           0.90      5000
   macro avg       0.90      0.90      0.90      5000
weighted avg       0.90      0.90      0.90      5000



## Results Summary

Fine tuning is much more accurate than using a frozen pre-trained model (91% vs 55% in one epoch on a balanced dataset). 

Fine tuning is more computationally and memory intensive. We are training all of the weights in the model, not just the final layers. We are required to load and track gradients for these parameters in memory as well.

My general guidance: fine-tuning is the way to go. If you're coming to BERT and other high level contextual models, you're looking for high accuracy on challenging data. If you want to minimize computation in a classification problem, there are simpler methods than going to contextual embeddings. 

BERT "understands" natural language in general, and fine tuning allows us to make slight focusing adjustments to this understanding to achieve our goals. It's extremely powerful and most modern NLP applications are built on transfer learning in some form (i.e. using BERT or OpenAI's API). No need to start from scratch when we stand on the shoulders of giants with extremely powerful public models available.