# Assignment 3.3
Fine-tune a pre-trained transformer on a classification task (e.g., sentiment analysis or intent classification). Analyze the trade-offs between accuracy, training time, and memory usage for different fine-tuning techniques.

In [1]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from sklearn.metrics import accuracy_score
import torch
import time
import psutil

## 1. Load Model and Tokenizer

In [2]:
model_ckpt = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)

# Prepare two models: one for full fine-tuning, one for LoRA
model_full = AutoModelForSequenceClassification.from_pretrained(model_ckpt, num_labels=2)
model_lora = AutoModelForSequenceClassification.from_pretrained(model_ckpt, num_labels=2)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

## 2. Load and Preprocess Data
Using the IMDB dataset for binary sentiment classification.

In [3]:
dataset = load_dataset("imdb")

def preprocess(example):
    return tokenizer(example["text"], padding="max_length", truncation=True, max_length=128)

tokenized = dataset.map(preprocess, batched=True)
train_data = tokenized["train"].shuffle(seed=1).select(range(800))
test_data = tokenized["test"].shuffle(seed=1).select(range(200))

Map: 100%|██████████| 25000/25000 [00:07<00:00, 3458.47 examples/s]
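
With only 200 test examples, it helps to know what accuracy a trivial classifier would reach before reading the results. The sketch below computes a majority-class baseline; the helper name `majority_baseline` and the toy labels are illustrative assumptions, not part of the original notebook.

```python
from collections import Counter

def majority_baseline(labels):
    """Return (majority_label, accuracy of always predicting that label)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)

# Hypothetical toy labels standing in for the real test labels
toy_labels = [0, 1, 1, 0, 1]
print(majority_baseline(toy_labels))  # (1, 0.6)
```

In the notebook this could presumably be called as `majority_baseline(list(test_data["label"]))`; any model accuracy at or below that value suggests the model has not learned the task.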



## 3. LoRA Parameter-Efficient Fine-Tuning

In [4]:
lora_cfg = LoraConfig(
    r=4,
    lora_alpha=8,
    target_modules=["q_lin"],
    lora_dropout=0.05,
    bias="none",
    task_type="SEQ_CLS"
)
model_lora = get_peft_model(model_lora, lora_cfg)
model_lora.print_trainable_parameters()

trainable params: 628,994 || all params: 67,584,004 || trainable%: 0.9307
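
The trainable count above can be reproduced by hand. The arithmetic below assumes the standard `distilbert-base-uncased` shape (6 layers, hidden size 768) and that PEFT keeps the classification head (`pre_classifier` and `classifier`) trainable for `SEQ_CLS` tasks, which is consistent with the printed total.

```python
hidden = 768       # DistilBERT hidden size (assumed standard config)
layers = 6         # transformer blocks in distilbert-base-uncased
r = 4              # LoRA rank from lora_cfg
num_labels = 2

# Each adapted q_lin gets two low-rank matrices: A (r x hidden) and B (hidden x r)
lora_params = layers * (r * hidden + hidden * r)

# Classification head: weights plus biases of both linear layers
pre_classifier = hidden * hidden + hidden
classifier = hidden * num_labels + num_labels
head_params = pre_classifier + classifier

trainable = lora_params + head_params
print(lora_params, head_params, trainable)  # 36864 592130 628994
```

Under these assumptions, most of the 0.93% trainable share comes from the classification head; the LoRA adapters themselves account for only 36,864 parameters.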


## 4. Training and Evaluation Utility

In [5]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)
    acc = accuracy_score(labels, predictions)
    return {"accuracy": acc}

def train_eval(model, train, test, label):
    args = TrainingArguments(
        output_dir=f"results_{label}",
        learning_rate=3e-4,  # shared by both runs; high for full fine-tuning, where 2e-5 to 5e-5 is more typical
        per_device_train_batch_size=8,
        num_train_epochs=1,
        weight_decay=0.01,
        eval_strategy="epoch",
        save_strategy="no",
        logging_dir="logs",
        logging_steps=10,
        report_to="none"
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train,
        eval_dataset=test,
        tokenizer=tokenizer,  # newer transformers releases deprecate this in favor of processing_class
        compute_metrics=compute_metrics
    )
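    # RSS is the resident CPU memory of this process. The before/after delta
    # is only a rough proxy: it misses the peak reached during training and
    # can be affected by memory already allocated by an earlier run.
    process = psutil.Process()
    mem_before = process.memory_info().rss / 1e6
    t0 = time.time()
    trainer.train()
    t1 = time.time()
    mem_after = process.memory_info().rss / 1e6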
    eval_result = trainer.evaluate()
    print(f"\n{label} Results:")
    print(f"Accuracy: {eval_result.get('eval_accuracy', eval_result.get('accuracy', 'N/A'))}")
    print(f"Training Time: {t1-t0:.2f} seconds")
    print(f"Memory Usage: {mem_after-mem_before:.2f} MB (approximate)")
    return eval_result, t1-t0, mem_after-mem_before
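
The before/after RSS delta used in `train_eval` can miss memory that is allocated and released during training. One possible alternative is to poll the memory reading in a background thread and keep the maximum. The sketch below is an assumption-laden illustration: `track_peak` is a hypothetical helper, and the demo uses a fake reader so it runs without any training.

```python
import threading
import time
from contextlib import contextmanager

@contextmanager
def track_peak(read_bytes, interval=0.05):
    """Poll read_bytes() in a background thread and record the highest value seen."""
    start = read_bytes()
    stats = {"start": start, "peak": start}
    stop = threading.Event()

    def poll():
        while not stop.is_set():
            stats["peak"] = max(stats["peak"], read_bytes())
            stop.wait(interval)

    worker = threading.Thread(target=poll, daemon=True)
    worker.start()
    try:
        yield stats
    finally:
        stop.set()
        worker.join()
        # One last reading, then report the peak increase in MB
        stats["peak"] = max(stats["peak"], read_bytes())
        stats["peak_delta_mb"] = (stats["peak"] - stats["start"]) / 1e6

# Demo with a fake reader (hypothetical values) instead of a real training run
fake = {"rss": 100_000_000}
with track_peak(lambda: fake["rss"], interval=0.01) as demo:
    fake["rss"] = 300_000_000   # simulated spike during "training"
    time.sleep(0.2)
    fake["rss"] = 150_000_000   # memory released before the block exits
print(demo["peak_delta_mb"])
```

In the notebook, the reader could be `lambda: process.memory_info().rss` wrapped around `trainer.train()`. Polling only approximates the true peak, since spikes shorter than `interval` can still be missed.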

## 5. Run Experiments

In [6]:
# Full fine-tuning
full_result, full_time, full_mem = train_eval(model_full, train_data, test_data, "FullFineTune")

# LoRA fine-tuning
lora_result, lora_time, lora_mem = train_eval(model_lora, train_data, test_data, "LoRA")

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.6913 | 0.694215 | 0.465 |




No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.



FullFineTune Results:
Accuracy: 0.465
Training Time: 259.53 seconds
Memory Usage: 1261.50 MB (approximate)




| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.5996 | 0.608323 | 0.74 |





LoRA Results:
Accuracy: 0.74
Training Time: 184.91 seconds
Memory Usage: 88.94 MB (approximate)
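
Both runs above share one learning rate (3e-4). A fairer rerun might give each method its own value. The sketch below is one way to organize that; the `LEARNING_RATES` table, the `training_kwargs` helper, and the 2e-5 value for full fine-tuning are assumptions for illustration, not tuned or measured settings.

```python
# Hypothetical per-method learning rates; assumed values, not tuned results
LEARNING_RATES = {
    "FullFineTune": 2e-5,  # within the range commonly used when all weights are updated
    "LoRA": 3e-4,          # the value used in this notebook
}

def training_kwargs(label, **overrides):
    """Build keyword arguments for TrainingArguments for one run."""
    kwargs = {
        "output_dir": f"results_{label}",
        "learning_rate": LEARNING_RATES[label],
        "per_device_train_batch_size": 8,
        "num_train_epochs": 1,
        "weight_decay": 0.01,
        "eval_strategy": "epoch",
        "save_strategy": "no",
        "report_to": "none",
    }
    kwargs.update(overrides)
    return kwargs

print(training_kwargs("FullFineTune")["learning_rate"])  # 2e-05
```

Inside `train_eval`, the arguments could then be built with `TrainingArguments(**training_kwargs(label))`. Whether this changes the full fine-tuning result would need to be checked by rerunning the experiment.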


## 6. Discussion

**Trade-offs between Full Fine-Tuning and LoRA:**

- **Accuracy:**  
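  Full fine-tuning reached an accuracy of **0.465**, while LoRA reached **0.74**. The full fine-tuning score is below the 0.5 chance level for a roughly balanced binary task, and its training loss (0.6913) stayed close to ln 2 ≈ 0.693, so the fully fine-tuned model effectively did not learn. A likely cause is the shared learning rate of 3e-4, which is high when all weights are updated. With 800 training examples, 200 test examples, one epoch, and a single seed, this gap shows that LoRA was more robust under this configuration, not that it is generally more accurate.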

- **Training Time:**  
  Full fine-tuning required **259.53 seconds**, whereas LoRA completed in **184.91 seconds**, about 29% less wall-clock time. The saving comes from computing gradients and optimizer updates for fewer parameters; the forward pass still runs through the whole model.

- **Memory Usage:**  
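  Process memory (RSS) grew by **1261.50 MB** during full fine-tuning and by **88.94 MB** during LoRA, about 93% less. The direction is expected, since LoRA keeps gradients and optimizer state only for the 0.93% of parameters that are trainable. The figures are approximate: they are before/after RSS deltas rather than peaks, they cover CPU memory only, and the LoRA run executed second in the same process, so it may have reused memory allocated during the first run.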

In this experiment, LoRA gave higher accuracy, shorter training time, and much smaller measured memory growth than full fine-tuning. The time and memory advantages are consistent with what parameter-efficient fine-tuning is designed to provide, especially for large models or limited hardware. The accuracy advantage is less conclusive, because full fine-tuning scored below chance and was probably held back by a learning rate that suits LoRA better than it suits updating all weights.
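
As a rough cross-check on the memory figures, the training state can be estimated from parameter counts alone. The sketch below assumes fp32 training with AdamW (two moment buffers per trainable parameter) and ignores activations, so it is a lower bound rather than a prediction. The base-model count of 66,955,010 is inferred from the printed PEFT total minus the adapter and head copies, and should be verified with `sum(p.numel() for p in model_full.parameters())`.

```python
BYTES_FP32 = 4

def training_memory_mb(total_params, trainable_params):
    """Rough fp32 estimate of weight memory and of gradient plus AdamW state memory."""
    weights = total_params * BYTES_FP32
    grads = trainable_params * BYTES_FP32
    adam_moments = 2 * trainable_params * BYTES_FP32
    return {
        "weights_mb": weights / 1e6,
        "grads_and_optimizer_mb": (grads + adam_moments) / 1e6,
    }

# Parameter counts: the LoRA numbers come from print_trainable_parameters();
# the full-model count is an inferred assumption (see lead-in)
full = training_memory_mb(66_955_010, 66_955_010)
lora = training_memory_mb(67_584_004, 628_994)

print(f"full: {full['grads_and_optimizer_mb']:.1f} MB")  # full: 803.5 MB
print(f"lora: {lora['grads_and_optimizer_mb']:.1f} MB")  # lora: 7.5 MB
```

Because both models were loaded before `mem_before` was read, the measured deltas mostly reflect gradients, optimizer state, and activations rather than the weights. Under these assumptions, the estimate points in the same direction as the measurement: roughly 800 MB of extra state for full fine-tuning versus under 10 MB for LoRA.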