<center><br><font size=6>Final Project</font><br>
<font size=5>Advanced Topics in Deep Learning</font><br>
<b><font size=4>Part B</font></b>
<br><font size=4>Model Compression</font><br><br>
Authors: Ido Rappaport & Eran Tascesme
</center>

**Submission Details:**
<font size=2>
<br>Ido Rappaport, ID: 322891623
<br>Eran Tascesme, ID: 205708720 </font>


**Import libraries**

❗Note the package versions; they are listed in requirements.txt.❗

In [2]:
# Standard libraries
import os
import re
import string
import random
import warnings
from collections import Counter

# Data handling and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# NLP libraries
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
import nlpaug.augmenter.word as naw
import nlpaug.augmenter.sentence as nas
from gensim import corpora, models
from urllib.parse import urlparse

# Machine learning and deep learning
import torch
import torch.nn.utils.prune as prune
import torch.nn.functional as F
from torch.utils.data import DataLoader  # torch's Dataset is omitted: datasets.Dataset (imported below) would shadow it
from torch import nn, optim
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    accuracy_score,
    classification_report,
    confusion_matrix,
    ConfusionMatrixDisplay
)

# Hugging Face Transformers
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
    EarlyStoppingCallback,
    set_seed,
    TrainerCallback,
    TrainerState,
    TrainerControl,
    DataCollatorWithPadding,
    RobertaForSequenceClassification,
    MarianMTModel,
    MarianTokenizer
)
from datasets import Dataset, DatasetDict, load_dataset
from transformers.modeling_outputs import SequenceClassifierOutput
from peft import LoraConfig, get_peft_model, PeftModel
import evaluate

# Other libraries
import optuna
import wandb
from tqdm import tqdm

# Filter warnings
warnings.filterwarnings('ignore')

# Download NLTK resources
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')
try:
    nltk.data.find('corpora/stopwords')
except LookupError:
    nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [4]:
from huggingface_hub import login
login()

**Load CSV Files**

Based on the results, the best-performing models were those trained on the clean, truncated dataset after augmentation. Therefore, we will proceed using this dataset and these specific models.

In [5]:
train_data = pd.read_csv("data/train_balanced.csv", encoding="ISO-8859-1")
test_data = pd.read_csv("data/test_clean.csv", encoding="ISO-8859-1")

<h2>First Compression Method: <u><b>Quantization</b></u>.</h2>
In this section, we will perform quantization on the two selected models.

 `QuantizeModel`

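This function takes a base model name and the path to its fine-tuned weights, then applies dynamic quantization to reduce the model's size and potentially speed up inference. It performs the following steps:

1.  Loads the tokenizer and the base model, then loads the fine-tuned weights into the model.
2.  Applies dynamic quantization, converting the weights of the specified layers (here, `Linear` layers) to 8-bit integers (`torch.qint8`).
3.  Saves the quantized model's state dictionary (`model.pt`) and a small `meta.txt` file recording what is needed to reconstruct the model.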

In [42]:
def QuantizeModel(base_weights_path, quantized_model_path, model_name):
    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Load base model and weights
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=5, ignore_mismatched_sizes=True
    )
    state_dict = torch.load(base_weights_path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()

    # Quantize
    quantized_model = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )
    quantized_model.eval()

    os.makedirs(quantized_model_path, exist_ok=True)

    q_state_path = os.path.join(quantized_model_path, "model.pt")
    torch.save(quantized_model.state_dict(), q_state_path)

    # Save a small metadata file recording what is needed to reconstruct the model
    with open(os.path.join(quantized_model_path, "meta.txt"), "w") as f:
        f.write(f"model_name={model_name}\nnum_labels=5\nquantized_layers=Linear\n")

    print(f"✅ Quantized model saved at {quantized_model_path}")
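
To make the 8-bit conversion in step 2 concrete, here is a minimal pure-Python sketch of symmetric per-tensor quantization. It is an illustration under simplifying assumptions, not PyTorch's actual implementation; `quantize_int8` and `dequantize` are hypothetical helper names introduced only for this example.

```python
def quantize_int8(weights):
    """Map floats to integer codes in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


weights = [0.50, -1.27, 0.03, 0.0, 1.00]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

Each code fits in one byte instead of the four bytes of a float32 value, which is where the size reduction comes from; the price is the small rounding error measured by `max_error`.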

**First Model**

twitter-roberta-base-sentiment

From exercise 4

In [43]:
base_weights_path = "final_models/roberta_sentiment_exc4_weights.pt"
quantized_model_path = "final_models/roberta_sentiment_exc4_quantized"
model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"

QuantizeModel(base_weights_path, quantized_model_path, model_name)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([3, 768]) in the checkpo

✅ Quantized model saved at /content/drive/MyDrive/Colab Notebooks/final_models/roberta_sentiment_exc4_quantized


From exercise 5

In [44]:
base_weights_path = "final_models/roberta_sentiment_weights.pt"
quantized_model_path = "final_models/roberta_sentiment_quantized"
model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"

QuantizeModel(base_weights_path, quantized_model_path, model_name)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([3, 768]) in the checkpo

✅ Quantized model saved at /content/drive/MyDrive/Colab Notebooks/final_models/roberta_sentiment_quantized


**Second Model**

distilbert-base-uncased-finetuned-sst-2-english

From exercise 4

In [45]:
base_weights_path = "final_models/distilbert_exc4_weights.pt"
quantized_model_path = "final_models/distilbert_exc4_quantized"
model_name = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

QuantizeModel(base_weights_path, quantized_model_path, model_name)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([5, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Quantized model saved at /content/drive/MyDrive/Colab Notebooks/final_models/distilbert_exc4_quantized


From exercise 5

In [46]:
base_weights_path = "final_models/distilbert_weights.pt"
quantized_model_path = "final_models/distilbert_quantized"
model_name = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

QuantizeModel(base_weights_path, quantized_model_path, model_name)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([5, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Quantized model saved at /content/drive/MyDrive/Colab Notebooks/final_models/distilbert_quantized


<h2>Second Compression Method: <u><b>Pruning</b></u>.</h2>
In this section, we will perform 40% pruning on the two selected models.

 `PruneModel`

This function takes a base model name and the path to its fine-tuned weights, then applies unstructured L1 pruning. The parameter count stays the same; the pruned weights are set to zero. It performs the following steps:

1.  Loads the tokenizer and the base model, then loads the fine-tuned weights into the model.
2.  Applies L1 unstructured pruning to the `weight` of every `Linear` layer, zeroing the `prune_amount` fraction of weights with the smallest magnitude in each layer.
3.  Removes the pruning reparameterization to make the pruned weights permanent.
4.  Saves the full pruned model object (via `torch.save`) together with the tokenizer.

In [12]:
def PruneModel(base_weights_path, pruned_model_path, model_name, prune_amount=0.4):
    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Load base model and weights
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=5, ignore_mismatched_sizes=True
    )
    state_dict = torch.load(base_weights_path, map_location="cpu")
    model.load_state_dict(state_dict)

    # Apply pruning
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=prune_amount)

    # Remove pruning reparameterization so pruned weights are stored directly
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            prune.remove(module, "weight")

    # Count parameters (still the same, but many are zero)
    num_params = sum(p.numel() for p in model.parameters())
    num_nonzero = sum((p != 0).sum().item() for p in model.parameters())
    print(f"Total params: {num_params}, Non-zero params: {num_nonzero}")

    # === Save with PyTorch + tokenizer ===
    os.makedirs(pruned_model_path, exist_ok=True)
    torch.save(model, os.path.join(pruned_model_path, "model.pt"))
    tokenizer.save_pretrained(pruned_model_path)

    print(f"✅ Pruned model saved to {pruned_model_path}")
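
As a rough illustration of what L1 unstructured pruning does to a single layer, the following pure-Python sketch zeroes the smallest-magnitude entries of a flat weight list. It is a simplified stand-in rather than the `torch.nn.utils.prune` implementation, and `l1_prune` is a hypothetical helper name.

```python
def l1_prune(weights, amount):
    """Zero the `amount` fraction of entries with the smallest absolute value."""
    n_prune = round(amount * len(weights))
    # Indices ordered by magnitude, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


layer = [0.9, -0.05, 0.4, -0.7, 0.01, 0.2, -0.3, 0.6, -0.15, 0.08]
pruned = l1_prune(layer, amount=0.4)
num_zero = sum(1 for w in pruned if w == 0.0)
print(pruned, num_zero)
```

The list keeps its original length and only the values change, which mirrors why `PruneModel` reports an unchanged total parameter count alongside a lower non-zero count.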

**First Model**

twitter-roberta-base-sentiment

From exercise 4

In [13]:
base_weights_path = "final_models/roberta_sentiment_exc4_weights.pt"
pruned_model_path = "final_models/roberta_exc4_pruned"
model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"

PruneModel(base_weights_path, pruned_model_path, model_name, prune_amount=0.4)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([3, 768]) in the checkpo

Total params: 124649477, Non-zero params: 90437371
✅ Pruned model saved to /content/drive/MyDrive/Colab Notebooks/final_models/roberta_exc4_pruned


From exercise 5

In [14]:
base_weights_path = "final_models/roberta_sentiment_weights.pt"
pruned_model_path = "final_models/roberta_pruned"
model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"

PruneModel(base_weights_path, pruned_model_path, model_name)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([3, 768]) in the checkpo

Total params: 124649477, Non-zero params: 90437371
✅ Pruned model saved to /content/drive/MyDrive/Colab Notebooks/final_models/roberta_pruned


**Second Model**

distilbert-base-uncased-finetuned-sst-2-english

From exercise 4

In [15]:
base_weights_path = "final_models/distilbert_exc4_weights.pt"
pruned_model_path = "final_models/distilbert_exc4_pruned"
model_name = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

PruneModel(base_weights_path, pruned_model_path, model_name)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([5, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Total params: 66957317, Non-zero params: 49732915
✅ Pruned model saved to /content/drive/MyDrive/Colab Notebooks/final_models/distilbert_exc4_pruned


From exercise 5

In [16]:
base_weights_path = "final_models/distilbert_weights.pt"
pruned_model_path = "final_models/distilbert_pruned"
model_name = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

PruneModel(base_weights_path, pruned_model_path, model_name)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([5, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Total params: 66957317, Non-zero params: 49732915
✅ Pruned model saved to /content/drive/MyDrive/Colab Notebooks/final_models/distilbert_pruned
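
The parameter counts printed above can be turned into an overall sparsity figure. The short calculation below uses only the totals reported in the outputs. The result comes out below the 40% pruning amount, presumably because only the `weight` tensors of `Linear` layers are pruned, while embeddings, biases, and normalization parameters are left untouched.

```python
# (total parameters, non-zero parameters) as printed by PruneModel above
reported = {
    "RoBERTa": (124_649_477, 90_437_371),
    "DistilBERT": (66_957_317, 49_732_915),
}

sparsity = {}
for name, (total, nonzero) in reported.items():
    # Fraction of all parameters that are exactly zero after pruning
    sparsity[name] = (total - nonzero) / total
    print(f"{name}: {sparsity[name]:.1%} of all parameters are zero")
```

This gives roughly 27.4% for RoBERTa and 25.7% for DistilBERT.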


<h2>Third Compression Method: <u><b>Distillation</b></u>.</h2>
In this section, we distill each of the two selected models (the teachers) into a smaller student model.

`DistillationTrainer`

This custom trainer class extends the standard HuggingFace `Trainer` to perform knowledge distillation. It incorporates a teacher model to guide the training of a student model. Key aspects include:

1.  **Initialization:** Takes a `teacher_model`, temperature for softening logits, and an alpha parameter to balance the hard (cross-entropy) and soft (KL divergence) losses.
2.  **`compute_loss` Method:** Overrides the standard loss computation to include both:
    *   A **hard loss** (cross-entropy) between the student's predictions and the true labels.
    *   A **soft loss** (KL divergence) between the softened logits of the student and the teacher.
    The final loss is a weighted sum of these two losses, controlled by the `alpha` parameter.

In [17]:
class DistillationTrainer(Trainer):
    def __init__(self, teacher_model=None, alpha=0.5, temperature=2.0, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.teacher = teacher_model
        self.teacher.eval()
        self.alpha = alpha
        self.temperature = temperature

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.get("labels")
        outputs_student = model(**inputs)
        student_logits = outputs_student.logits

        with torch.no_grad():
            outputs_teacher = self.teacher(**inputs)
            teacher_logits = outputs_teacher.logits

        # Hard loss
        loss_ce = F.cross_entropy(student_logits, labels)

        # Soft loss
        loss_kl = F.kl_div(
            F.log_softmax(student_logits / self.temperature, dim=-1),
            F.softmax(teacher_logits / self.temperature, dim=-1),
            reduction="batchmean"
        ) * (self.temperature ** 2)

        loss = self.alpha * loss_ce + (1.0 - self.alpha) * loss_kl
        return (loss, outputs_student) if return_outputs else loss
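
To show how the two loss terms combine, here is a small pure-Python sketch of the same formula for a single example (for a batch of one, the `batchmean` reduction equals a plain sum). The logits are made-up values and `distillation_loss` is a hypothetical helper, not part of the trainer above.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    # Hard loss: cross-entropy between the student's prediction and the true label.
    hard = -math.log(softmax(student_logits)[label])
    # Soft loss: KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in the trainer above.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    soft *= temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft, hard, soft


# Made-up logits for a 5-class problem
student = [2.0, 0.5, -1.0, 0.0, 0.3]
teacher = [3.0, 0.2, -1.5, 0.1, 0.0]
loss, hard, soft = distillation_loss(student, teacher, label=0)
print(loss, hard, soft)
```

When the student's logits match the teacher's exactly, the soft term vanishes and the loss reduces to `alpha` times the cross-entropy.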

`freeze_student_layers`

This helper selectively freezes the student's layers during distillation, so that the lower layers keep their pre-trained representations and training focuses on the last block and the classifier head. Specifically, it:

1.  Sets `requires_grad` to `False` for all parameters of the student's base model (everything except the classifier).
2.  Sets `requires_grad` back to `True` for the parameters of the **last transformer block** and the **classifier**, so only these parts are trained during distillation. The last block is unfrozen only when the base model exposes `encoder.layer` (as BERT and RoBERTa models do); otherwise only the classifier is trained.

In [18]:
def freeze_student_layers(student):
    for param in student.base_model.parameters():
        param.requires_grad = False

    # last transformer block
    if hasattr(student.base_model, "encoder"):
        last_layer = student.base_model.encoder.layer[-1]
        for param in last_layer.parameters():
            param.requires_grad = True

    # classifier
    for param in student.classifier.parameters():
        param.requires_grad = True

    print("✅ Froze all layers except last block + classifier")
    return student

`distill_student`

This function orchestrates the knowledge distillation process to train a student model using a pre-trained teacher model. Key steps include:

1.  **Data Loading and Preparation:** Loads training and test data and tokenizes it using the student model's tokenizer.
2.  **Model Initialization:** Loads the teacher model (optionally with custom weights) and the student model.
3.  **Layer Freezing:** Freezes most layers of the student model, except for the last transformer block and the classifier, to focus training.
4.  **Trainer Setup:** Initializes a `DistillationTrainer` with the student and teacher models, training arguments, and datasets.
5.  **Training:** Starts the distillation training process.
6.  **Saving:** Saves the distilled student model and its tokenizer.

In [19]:
# ==== General Function ====
def distill_student(base_weights_path, teacher_name, student_name, output_dir, num_labels,
                    train_data, test_data, text_col="text", label_col="label"):
    """
    Perform teacher-student distillation training.

    Args:
        base_weights_path (str or None): Optional .pt checkpoint for teacher (PyTorch state_dict).
        teacher_name (str): HuggingFace model name for teacher.
        student_name (str): HuggingFace model name for student.
        output_dir (str): Path to save distilled student.
        num_labels (int): Number of labels for classification.
        train_data (pd.DataFrame or Dataset): Raw training data with 'text' and 'label'.
        test_data (pd.DataFrame or Dataset): Raw test data with 'text' and 'label'.
        text_col (str): Name of text column.
        label_col (str): Name of label column.
    """

    # Convert pandas → Dataset if needed
    if not isinstance(train_data, Dataset):
        train_data = Dataset.from_pandas(train_data)
    if not isinstance(test_data, Dataset):
        test_data = Dataset.from_pandas(test_data)

    # Load teacher
    teacher = AutoModelForSequenceClassification.from_pretrained(teacher_name, num_labels=num_labels, ignore_mismatched_sizes=True)

    # If custom weights provided, load them into teacher
    if base_weights_path is not None:
        checkpoint = torch.load(base_weights_path, map_location="cpu")
        teacher.load_state_dict(checkpoint, strict=False)
        print(f"✅ Loaded teacher weights from {base_weights_path}")

    # Load student
    student = AutoModelForSequenceClassification.from_pretrained(student_name, num_labels=num_labels, ignore_mismatched_sizes=True)
    student = freeze_student_layers(student)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    teacher.to(device)
    student.to(device)

    # Tokenizer
    tokenizer = AutoTokenizer.from_pretrained(student_name)

    # Tokenization function
    def tokenize_fn(batch):
        return tokenizer(
            [str(x) for x in batch[text_col]],
            truncation=True,
            padding="max_length",
            max_length=128
        )

    tokenized_train = train_data.map(tokenize_fn, batched=True)
    tokenized_test = test_data.map(tokenize_fn, batched=True)

    tokenized_train = tokenized_train.rename_column(label_col, "labels")
    tokenized_test = tokenized_test.rename_column(label_col, "labels")

    tokenized_train.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
    tokenized_test.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

    # Training args
    training_args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",
        save_strategy="epoch",
        logging_dir=f"{output_dir}/logs",
        learning_rate=5e-5,
        per_device_train_batch_size=64,
        per_device_eval_batch_size=64,
        num_train_epochs=3,
        weight_decay=0.01,
        metric_for_best_model="accuracy",
        load_best_model_at_end=True,
    )

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        preds = logits.argmax(axis=1)
        acc = accuracy_score(labels, preds)
        f1 = f1_score(labels, preds, average="macro")
        return {"accuracy": acc, "f1": f1}


    # Trainer
    trainer_distill = DistillationTrainer(
        model=student,
        teacher_model=teacher,
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_test,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics
    )

    # Train
    trainer_distill.train()

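    # === Save ===
    os.makedirs(output_dir, exist_ok=True)
    # Save the full student model object (not only its state_dict) plus the tokenizer
    torch.save(student, os.path.join(output_dir, "model.pt"))
    tokenizer.save_pretrained(output_dir)

    print(f"✅ Weights saved to {output_dir}")

    # Return the trained student so that callers can assign the result
    return student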


**First Model**

twitter-roberta-base-sentiment

student: distilroberta-base


From exercise 4

In [20]:
distill_student(
    base_weights_path="final_models/roberta_sentiment_exc4_weights.pt",
    teacher_name="cardiffnlp/twitter-roberta-base-sentiment-latest",
    student_name="distilroberta-base",
    output_dir="final_models/distilroberta_exc4-base",
    num_labels=5,
    train_data=train_data,
    test_data=test_data
)


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([3, 768]) in the checkpo

✅ Loaded teacher weights from /content/drive/MyDrive/Colab Notebooks/final_models/roberta_sentiment_exc4_weights.pt


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Froze all layers except last block + classifier


Map:   0%|          | 0/48910 [00:00<?, ? examples/s]

Map:   0%|          | 0/3798 [00:00<?, ? examples/s]


| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | 1.6099 | 1.157857 | 0.512375 | 0.518285 |
| 2 | 1.2044 | 1.115284 | 0.52475 | 0.530687 |
| 3 | 1.1701 | 1.080193 | 0.532122 | 0.53957 |


✅ Weights saved to /content/drive/MyDrive/Colab Notebooks/final_models/distilroberta_exc4-base


From exercise 5

In [21]:
student = distill_student(
    base_weights_path="final_models/roberta_sentiment_weights.pt",
    teacher_name="cardiffnlp/twitter-roberta-base-sentiment-latest",
    student_name="distilroberta-base",
    output_dir="final_models/distilroberta-base",
    num_labels=5,
    train_data=train_data,
    test_data=test_data
)


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([3, 768]) in the checkpo

✅ Loaded teacher weights from /content/drive/MyDrive/Colab Notebooks/final_models/roberta_sentiment_weights.pt


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at distilroberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Froze all layers except last block + classifier


Map:   0%|          | 0/48910 [00:00<?, ? examples/s]

Map:   0%|          | 0/3798 [00:00<?, ? examples/s]

| Epoch | Training Loss | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|
| 1 | 1.6477 | 1.287802 | 0.500527 | 0.503265 |
| 2 | 1.252 | 1.217458 | 0.525803 | 0.532052 |
| 3 | 1.2136 | 1.188516 | 0.537915 | 0.544353 |


✅ Weights saved to /content/drive/MyDrive/Colab Notebooks/final_models/distilroberta-base


**Second Model**

distilbert-base-uncased-finetuned-sst-2-english

student: Harsha901/tinybert-imdb-sentiment-analysis-model

From exercise 4

In [22]:
student2 = distill_student(
    base_weights_path="final_models/distilbert_exc4_weights.pt",
    teacher_name="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    student_name="Harsha901/tinybert-imdb-sentiment-analysis-model",
    output_dir="final_models/tinybert_exc4",
    num_labels=5,
    train_data=train_data,
    test_data=test_data
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([5, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Loaded teacher weights from /content/drive/MyDrive/Colab Notebooks/final_models/distilbert_exc4_weights.pt


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at Harsha901/tinybert-imdb-sentiment-analysis-model and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 312]) in the checkpoint and torch.Size([5, 312]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Froze all layers except last block + classifier


Map:   0%|          | 0/48910 [00:00<?, ? examples/s]

Map:   0%|          | 0/3798 [00:00<?, ? examples/s]

Epoch,Training Loss,Validation Loss,Accuracy,F1
1,2.6587,2.05436,0.323591,0.263523
2,2.2922,1.967033,0.365192,0.32079
3,2.2533,1.931177,0.373354,0.331221


✅ Weights saved to /content/drive/MyDrive/Colab Notebooks/final_models/tinybert_exc4


From exercise 5

In [23]:
student2 = distill_student(
    base_weights_path="final_models/distilbert_weights.pt",
    teacher_name="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    student_name="Harsha901/tinybert-imdb-sentiment-analysis-model",
    output_dir="final_models/tinybert",
    num_labels=5,
    train_data=train_data,
    test_data=test_data
)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([5, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Loaded teacher weights from /content/drive/MyDrive/Colab Notebooks/final_models/distilbert_weights.pt


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at Harsha901/tinybert-imdb-sentiment-analysis-model and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 312]) in the checkpoint and torch.Size([5, 312]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Froze all layers except last block + classifier


Map:   0%|          | 0/48910 [00:00<?, ? examples/s]

Map:   0%|          | 0/3798 [00:00<?, ? examples/s]

Epoch,Training Loss,Validation Loss,Accuracy,F1
1,1.7564,1.474477,0.345182,0.306904
2,1.469,1.402876,0.397578,0.373936
3,1.4382,1.373414,0.406793,0.386043


✅ Weights saved to /content/drive/MyDrive/Colab Notebooks/final_models/tinybert


<h2>Fourth Compression Method: <u><b>Low-Rank Factorization using SVD</b></u>.</h2>

`compress_model_low_rank`

This function compresses a transformer model with low-rank factorization based on Singular Value Decomposition (SVD). Each eligible Linear layer, mainly the attention and feed-forward projections, is replaced by two smaller Linear layers built from the top `rank` singular components of its weight matrix, while the classifier head is left unchanged. The compressed model is then saved along with its tokenizer.
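To make the saving concrete, the short sketch below counts weights before and after factorization for the two layer shapes that appear in the compression logs (768x768 and 768x3072) at the rank of 64 used in this notebook. The helper names are illustrative only, and biases are ignored because they are copied over unchanged.

```python
def dense_params(in_features: int, out_features: int) -> int:
    """Weights in a single Linear-style matrix (bias excluded)."""
    return in_features * out_features


def factored_params(in_features: int, out_features: int, rank: int) -> int:
    """Weights in the two factors: [rank, in] followed by [out, rank]."""
    return rank * in_features + out_features * rank


rank = 64
shapes = {"attention projection": (768, 768), "feed-forward": (768, 3072)}
for label, (n_in, n_out) in shapes.items():
    before = dense_params(n_in, n_out)
    after = factored_params(n_in, n_out, rank)
    print(f"{label}: {before:,} -> {after:,} weights ({before / after:.1f}x fewer)")
```

The count only tells us how much smaller each layer becomes; how much accuracy is lost at a given rank has to be measured on the validation set.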


In [19]:
def compress_model_low_rank(base_weights_path, save_model_path, model_name, rank=64):
    """
    Apply low-rank SVD compression to transformer model and save it.
    """
    # === Load pretrained model & tokenizer ===
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=5,
        ignore_mismatched_sizes=True
    )

    checkpoint = torch.load(base_weights_path, map_location="cpu")
    if "model_state_dict" in checkpoint:
        model.load_state_dict(checkpoint['model_state_dict'], strict=False)
    else:
        model.load_state_dict(checkpoint, strict=False)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    print("✅ Model loaded successfully")

    # === Helper: Low-rank factorization ===
    def low_rank_approximation_linear(layer: nn.Linear, rank: int):
        W = layer.weight.data  # [out, in]
        U, S, Vt = torch.linalg.svd(W, full_matrices=False)  # better numerics than torch.svd

        U_r = U[:, :rank]      # [out, rank]
        S_r = S[:rank]         # [rank]
        Vt_r = Vt[:rank, :]    # [rank, in]

        # First layer: in_features -> rank
        first = nn.Linear(layer.in_features, rank, bias=False)
        first.weight.data = Vt_r

        # Second layer: rank -> out_features
        second = nn.Linear(rank, layer.out_features, bias=True)
        # nn.Linear(rank, out).weight has shape [out, rank], so no transpose is needed
        second.weight.data = (U_r * S_r).contiguous()

        if layer.bias is not None:
            second.bias.data = layer.bias.data.clone()

        return nn.Sequential(first, second)

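    # === Apply compression to every eligible Linear layer outside the classifier head ===
    # list(...) snapshots the module tree so replacing layers does not disturb iteration
    for name, module in list(model.named_modules()):
        if isinstance(module, nn.Linear):
            # Skip classifier head (must remain unchanged)
            if name.startswith("classifier"):
                continue
            # Compress only layers whose dimensions are both at least `rank`
            # (e.g. 768x768 attention and 768x3072 feed-forward projections)
            if module.in_features >= rank and module.out_features >= rank:
                print(f"Compressing {name} with rank={rank}")
                new_layer = low_rank_approximation_linear(module, rank)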

                parent = model
                for attr in name.split(".")[:-1]:
                    parent = getattr(parent, attr)
                setattr(parent, name.split(".")[-1], new_layer)

    # === Save compressed model (architecture + weights) ===
    os.makedirs(save_model_path, exist_ok=True)
    torch.save(model, os.path.join(save_model_path, "model.pt"))
    tokenizer.save_pretrained(save_model_path)

    print(f"💾 Compressed model saved at: {save_model_path}")
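As a sanity check on the factorization idea, here is a minimal sketch on a hypothetical toy layer (8 inputs, 6 outputs) rather than on the real models. The helper `factorize_linear` is an illustrative stand-in, not the notebook's function. At full rank the two-layer replacement should reproduce the original layer up to rounding, and at a small rank it should be visibly lossy.

```python
import torch
from torch import nn


def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace one Linear with two smaller ones via truncated SVD."""
    W = layer.weight.data                                   # [out, in]
    U, S, Vt = torch.linalg.svd(W, full_matrices=False)
    first = nn.Linear(layer.in_features, rank, bias=False)
    first.weight.data = Vt[:rank, :].clone()                # [rank, in]
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    second.weight.data = (U[:, :rank] * S[:rank]).clone()   # [out, rank]
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)


torch.manual_seed(0)
toy = nn.Linear(8, 6)                   # hypothetical toy layer, not from the models above
x = torch.randn(4, 8)

full = factorize_linear(toy, rank=6)    # rank == min(in, out): exact up to rounding
small = factorize_linear(toy, rank=2)   # aggressive truncation: lossy

with torch.no_grad():
    err_full = (toy(x) - full(x)).abs().max().item()
    err_small = (toy(x) - small(x)).abs().max().item()
print(f"max abs error, rank 6: {err_full:.2e}")
print(f"max abs error, rank 2: {err_small:.2e}")
```

Running the same comparison on a real 768x768 projection at rank 64 would give a rough feel for how much signal the truncation discards before any evaluation on the task itself.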


**First Model**

twitter-roberta-base-sentiment


From exercise 4

In [17]:
base_weights_path = "final_models/roberta_sentiment_exc4_weights.pt"
save_model_path = "final_models/roberta_sentiment_exc4_SVD"
model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"

compress_model_low_rank(base_weights_path, save_model_path, model_name, rank=64)


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([3, 768]) in the checkpo

✅ Model loaded successfully
Compressing roberta.encoder.layer.0.attention.self.query (768x768) with rank=64
Compressing roberta.encoder.layer.0.attention.self.key (768x768) with rank=64
Compressing roberta.encoder.layer.0.attention.self.value (768x768) with rank=64
Compressing roberta.encoder.layer.0.attention.output.dense (768x768) with rank=64
Compressing roberta.encoder.layer.0.intermediate.dense (768x3072) with rank=64
Compressing roberta.encoder.layer.0.output.dense (3072x768) with rank=64
Compressing roberta.encoder.layer.1.attention.self.query (768x768) with rank=64
Compressing roberta.encoder.layer.1.attention.self.key (768x768) with rank=64
Compressing roberta.encoder.layer.1.attention.self.value (768x768) with rank=64
Compressing roberta.encoder.layer.1.attention.output.dense (768x768) with rank=64
Compressing roberta.encoder.layer.1.intermediate.dense (768x3072) with rank=64
Compressing roberta.encoder.layer.1.output.dense (3072x768) with rank=64
Compressing roberta.encoder.

From exercise 5

In [50]:
base_weights_path = "final_models/roberta_sentiment_weights.pt"
save_model_path = "final_models/roberta_sentiment_SVD"
model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"

compress_model_low_rank(base_weights_path, save_model_path, model_name, rank=64)


Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([3, 768]) in the checkpo

✅ Model loaded successfully
Compressing roberta.encoder.layer.0.attention.self.query with rank=64
Compressing roberta.encoder.layer.0.attention.self.key with rank=64
Compressing roberta.encoder.layer.0.attention.self.value with rank=64
Compressing roberta.encoder.layer.0.attention.output.dense with rank=64
Compressing roberta.encoder.layer.1.attention.self.query with rank=64
Compressing roberta.encoder.layer.1.attention.self.key with rank=64
Compressing roberta.encoder.layer.1.attention.self.value with rank=64
Compressing roberta.encoder.layer.1.attention.output.dense with rank=64
Compressing roberta.encoder.layer.2.attention.self.query with rank=64
Compressing roberta.encoder.layer.2.attention.self.key with rank=64
Compressing roberta.encoder.layer.2.attention.self.value with rank=64
Compressing roberta.encoder.layer.2.attention.output.dense with rank=64
Compressing roberta.encoder.layer.3.attention.self.query with rank=64
Compressing roberta.encoder.layer.3.attention.self.key with ra

**Second Model**

distilbert-base-uncased-finetuned-sst-2-english

From exercise 4

In [51]:
base_weights_path = "final_models/distilbert_exc4_weights.pt"
save_model_path = "final_models/distilbert_exc4_SVD"
model_name = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

compress_model_low_rank(base_weights_path, save_model_path, model_name, rank=64)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([5, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Model loaded successfully
Compressing distilbert.transformer.layer.0.attention.q_lin with rank=64
Compressing distilbert.transformer.layer.0.attention.k_lin with rank=64
Compressing distilbert.transformer.layer.0.attention.v_lin with rank=64
Compressing distilbert.transformer.layer.0.attention.out_lin with rank=64
Compressing distilbert.transformer.layer.1.attention.q_lin with rank=64
Compressing distilbert.transformer.layer.1.attention.k_lin with rank=64
Compressing distilbert.transformer.layer.1.attention.v_lin with rank=64
Compressing distilbert.transformer.layer.1.attention.out_lin with rank=64
Compressing distilbert.transformer.layer.2.attention.q_lin with rank=64
Compressing distilbert.transformer.layer.2.attention.k_lin with rank=64
Compressing distilbert.transformer.layer.2.attention.v_lin with rank=64
Compressing distilbert.transformer.layer.2.attention.out_lin with rank=64
Compressing distilbert.transformer.layer.3.attention.q_lin with rank=64
Compressing distilbert.transfo

From exercise 5

In [52]:
base_weights_path = "final_models/distilbert_weights.pt"
save_model_path = "final_models/distilbert_SVD"
model_name = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

compress_model_low_rank(base_weights_path, save_model_path, model_name, rank=64)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([5, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Model loaded successfully
Compressing distilbert.transformer.layer.0.attention.q_lin with rank=64
Compressing distilbert.transformer.layer.0.attention.k_lin with rank=64
Compressing distilbert.transformer.layer.0.attention.v_lin with rank=64
Compressing distilbert.transformer.layer.0.attention.out_lin with rank=64
Compressing distilbert.transformer.layer.1.attention.q_lin with rank=64
Compressing distilbert.transformer.layer.1.attention.k_lin with rank=64
Compressing distilbert.transformer.layer.1.attention.v_lin with rank=64
Compressing distilbert.transformer.layer.1.attention.out_lin with rank=64
Compressing distilbert.transformer.layer.2.attention.q_lin with rank=64
Compressing distilbert.transformer.layer.2.attention.k_lin with rank=64
Compressing distilbert.transformer.layer.2.attention.v_lin with rank=64
Compressing distilbert.transformer.layer.2.attention.out_lin with rank=64
Compressing distilbert.transformer.layer.3.attention.q_lin with rank=64
Compressing distilbert.transfo

<h2>Fifth Compression Method: <u><b>LoRA</b></u>.</h2>

LoRA, or Low-Rank Adaptation, is a technique that freezes a large model's weights and trains only small, newly added low-rank matrices to adapt it to a specific task. We use it to drastically reduce the number of trainable parameters, which makes fine-tuning faster and more memory-efficient. The result is a small set of task-specific adapter weights that is quick to train and cheap to store alongside the frozen base model.
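For reference, the standard LoRA update for a single frozen weight matrix can be written as follows. The symbols $W_0$, $A$, $B$, $r$ and $\alpha$ follow the usual LoRA notation and are not names taken from this notebook's code.

```latex
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad
W_0 \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}},\quad
B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
A \in \mathbb{R}^{r \times d_{\text{in}}},\quad
r \ll \min(d_{\text{in}}, d_{\text{out}})
```

Only $A$ and $B$ receive gradients, so each adapted matrix contributes $r\,(d_{\text{in}} + d_{\text{out}})$ trainable weights instead of $d_{\text{in}}\, d_{\text{out}}$. With the values `r=8` and `lora_alpha=32` used in this notebook's configuration, the scaling factor $\alpha / r$ is 4.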

In [6]:
# === Load CSV files ===
drive_path = "data/"
train_dataset = pd.read_csv(drive_path + "train_balanced.csv", encoding="ISO-8859-1")
eval_dataset = pd.read_csv(drive_path + "val_clean.csv", encoding="ISO-8859-1")

# Convert pandas → HF Dataset
train_dataset = Dataset.from_pandas(train_dataset)
eval_dataset = Dataset.from_pandas(eval_dataset)

`lora_training`

This function sets up and runs the training process for a model with LoRA (Low-Rank Adaptation) adapters. It prepares the datasets, configures the HuggingFace TrainingArguments for training parameters, initializes a Trainer object with the LoRA-enabled model and data, and then starts the training. The function handles tokenization, data formatting, and uses the specified training arguments. After training, it returns the trained LoRA model.

In [7]:
def lora_training(model, tokenizer, train_dataset, eval_dataset, output_dir, num_epochs=3):
    def preprocess(batch):
        texts = [str(x) if x is not None else "" for x in batch["text"]]
        return tokenizer(texts, truncation=True, padding="max_length", max_length=128)

    train_dataset_proc = train_dataset.map(preprocess, batched=True)
    eval_dataset_proc = eval_dataset.map(preprocess, batched=True)

    train_dataset_proc = train_dataset_proc.rename_column("label", "labels")
    eval_dataset_proc = eval_dataset_proc.rename_column("label", "labels")

    train_dataset_proc.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
    eval_dataset_proc.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    training_args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",
        save_strategy="epoch",
        learning_rate=5e-4,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=num_epochs,
        weight_decay=0.01,
        logging_dir=output_dir + "/logs",
        logging_steps=50,
        save_total_limit=1,
        load_best_model_at_end=True,
        push_to_hub=False,
        report_to="none"
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset_proc,
        eval_dataset=eval_dataset_proc,
        tokenizer=tokenizer
    )

    trainer.train()
    return model

`LoraModel`

This function orchestrates the LoRA workflow: it loads the fine-tuned base model, attaches LoRA adapters to the attention query and value projections, trains the adapters, and saves the trained adapters together with the tokenizer.

In [19]:
from peft import LoraConfig, get_peft_model  # needed for LoraConfig / get_peft_model below

def LoraModel(model_name, base_weights_path, save_lora_path, model_type="roberta", num_labels=5):
    """
    Apply LoRA adapters to a pretrained model (Roberta / DistilBERT).

    Args:
        model_name (str): HuggingFace model name
        base_weights_path (str): Path to fine-tuned base model weights
        save_lora_path (str): Directory to save the trained LoRA model
        model_type (str): "roberta" or "distilbert"
        num_labels (int): Number of classification labels
    """

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Load base model with correct number of labels
    base_model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels, ignore_mismatched_sizes=True
    )

    # Load fine-tuned weights
    state_dict = torch.load(base_weights_path, map_location="cpu")
    base_model.load_state_dict(state_dict, strict=False)

    # Choose target_modules depending on model type
    if model_type.lower() == "roberta":
        target_modules = ["query", "value"]
    elif model_type.lower() == "distilbert":
        target_modules = ["q_lin", "v_lin"]
    else:
        raise ValueError(f"Unknown model_type '{model_type}'. Use 'roberta' or 'distilbert'.")

    # LoRA config
    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        target_modules=target_modules,
        lora_dropout=0.1,
        bias="none",
        task_type="SEQ_CLS"
    )

    # Apply LoRA adapters
    lora_model = get_peft_model(base_model, lora_config)

    # === Train LoRA adapters ===
    lora_model = lora_training(
        model=lora_model,
        tokenizer=tokenizer,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        output_dir=save_lora_path,
        num_epochs=3
    )
    print("✅ LoRA training finished.")

    # Save tokenizer + trained LoRA adapters (after training, so they can be reloaded later)
    tokenizer.save_pretrained(save_lora_path)
    lora_model.save_pretrained(save_lora_path)
    print("✅ Trained LoRA model saved to:", save_lora_path)
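A rough count of the adapter weights this configuration adds, assuming the standard base architectures (hidden size 768, 12 encoder layers for RoBERTa-base and 6 for DistilBERT) and two target modules per layer. The layer counts and the helper name are assumptions for illustration. PEFT typically also keeps the classification head trainable for sequence classification, so the total reported by `print_trainable_parameters()` would be somewhat larger.

```python
def lora_adapter_params(hidden: int, r: int, n_layers: int, n_targets: int) -> int:
    """Weights in the LoRA A ([r, hidden]) and B ([hidden, r]) matrices only."""
    per_module = r * hidden + hidden * r
    return per_module * n_targets * n_layers


r = 8  # matches r in the LoraConfig above
# Assumed architecture sizes: 12 layers (RoBERTa-base), 6 layers (DistilBERT)
roberta = lora_adapter_params(hidden=768, r=r, n_layers=12, n_targets=2)
distilbert = lora_adapter_params(hidden=768, r=r, n_layers=6, n_targets=2)

print(f"RoBERTa-base adapter weights: {roberta:,}")
print(f"DistilBERT adapter weights:   {distilbert:,}")
```

Both figures are a small fraction of the base models' total weight count, which is why the saved adapter directories stay small.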



**First Model**

twitter-roberta-base-sentiment

From exercise 4

In [11]:
model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"

base_weights_path = "final_models/roberta_sentiment_exc4_weights.pt"
save_lora_path = "final_models/roberta_sentiment_exc4_LoRA"
LoraModel(model_name, base_weights_path, save_lora_path)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([3, 768]) in the checkpo

✅ LoRA adapters applied.


Map:   0%|          | 0/48910 [00:00<?, ? examples/s]

Map:   0%|          | 0/4116 [00:00<?, ? examples/s]

Epoch,Training Loss,Validation Loss
1,0.4674,0.447802
2,0.4644,0.346314
3,0.3871,0.297852


✅ LoRA training finished.
✅ Trained LoRA model saved to: /content/drive/MyDrive/Colab Notebooks/final_models/roberta_sentiment_exc4_LoRA


From exercise 5

In [10]:
base_weights_path = "final_models/roberta_sentiment_weights.pt"
save_lora_path = "final_models/roberta_sentiment_exc5_LoRA"
LoraModel(model_name, base_weights_path, save_lora_path)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest and are newly initialized because the shapes did not match:
- classifier.out_proj.weight: found shape torch.Size([3, 768]) in the checkpo

✅ LoRA adapters applied.


Map:   0%|          | 0/48910 [00:00<?, ? examples/s]

Map:   0%|          | 0/4116 [00:00<?, ? examples/s]

Epoch,Training Loss,Validation Loss
1,0.2766,0.376805
2,0.2761,0.291098
3,0.2681,0.255997


✅ LoRA training finished.
✅ Trained LoRA model saved to: /content/drive/MyDrive/Colab Notebooks/final_models/roberta_sentiment_exc5_LoRA


**Second Model**

distilbert-base-uncased-finetuned-sst-2-english

From exercise 4

In [20]:
model_name = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

base_weights_path = "final_models/distilbert_exc4_weights.pt"
save_lora_path = "final_models/distilbert_exc4_LoRA"
LoraModel(model_name, base_weights_path, save_lora_path, model_type = "distilbert")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([5, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/48910 [00:00<?, ? examples/s]

Map:   0%|          | 0/4116 [00:00<?, ? examples/s]

Epoch,Training Loss,Validation Loss
1,0.2816,0.175807
2,0.1945,0.126904
3,0.1514,0.114513


✅ LoRA training finished.
✅ Trained LoRA model saved to: /content/drive/MyDrive/Colab Notebooks/final_models/distilbert_exc4_LoRA


From exercise 5

In [21]:
base_weights_path = "final_models/distilbert_weights.pt"
save_lora_path = "final_models/distilbert_exc5_LoRA"
LoraModel(model_name, base_weights_path, save_lora_path, model_type = "distilbert")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([5]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([5, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/48910 [00:00<?, ? examples/s]

Map:   0%|          | 0/4116 [00:00<?, ? examples/s]

Epoch,Training Loss,Validation Loss
1,0.298,0.388128
2,0.279,0.300767
3,0.2799,0.286446


✅ LoRA training finished.
✅ Trained LoRA model saved to: /content/drive/MyDrive/Colab Notebooks/final_models/distilbert_exc5_LoRA


<center><h1>END</h1></center>
