<a href="https://colab.research.google.com/github/marcoevans693-eng/Advanced_AI_Chatbot_Python_PyTorch/blob/main/LoRA_Training_Pipeline_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LoRA Training Pipeline Tutorial (Google Colab)

This notebook walks step by step through fine-tuning a Transformer model using **LoRA (Low-Rank Adaptation)**.

**What you'll learn:**
- How to install and import the core libraries: `transformers`, `datasets`, `peft`, and `accelerate`.
- How to load a base model and tokenizer.
- How LoRA works conceptually (freeze most weights, train small adapters).
- How to prepare a small text classification dataset.
- How to run training and evaluation in a Colab-friendly way (CPU or GPU).
- How to save and reload the LoRA adapter for inference.

We’ll go in **micro-steps**. At each stage, we’ll:
1. Write a small block of code.
2. Run it.
3. Briefly reason about what it did and why.
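
Before touching any real model, the "freeze most weights, train small adapters" idea from the list above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sizes are toy values, and the helper names `matvec` and `lora_forward` are made up for this sketch rather than taken from `peft`. It shows the usual LoRA formulation, where a frozen weight matrix `W` is augmented by a scaled low-rank product `B @ A`, and `B` starts at zero so the adapted layer initially behaves exactly like the base layer.

```python
import random

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x). W is frozen; only A and B would train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

random.seed(0)
d_out, d_in, r, alpha = 4, 6, 2, 4   # toy sizes (assumption), not DistilBERT's

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                   # trainable, d_out x r, zeros
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B = 0 the adapter contributes nothing, so both outputs match.
y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)
print("Same output at initialization:", y_base == y_lora)

# Parameter count for one 768 x 768 projection at rank 16.
full_params = 768 * 768
lora_params = 16 * (768 + 768)
print(f"Full matrix: {full_params:,} params, LoRA pair: {lora_params:,} params")
```

The last two lines are the reason LoRA is cheap: the adapter pair for a 768 x 768 projection holds 24,576 values instead of 589,824.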

In [None]:
import sys
import torch

print("Python version:", sys.version.split()[0])
print("PyTorch version:", torch.__version__)

device = "cuda" if torch.cuda.is_available() else "cpu"
print("CUDA available:", torch.cuda.is_available())
print("Using device:", device)

Python version: 3.12.12
PyTorch version: 2.9.0+cu126
CUDA available: True
Using device: cuda


In [None]:
%%capture
!pip install -q transformers datasets peft accelerate evaluate

In [None]:
from transformers import __version__ as transformers_version
import datasets
import peft
import accelerate
import evaluate

print("Transformers:", transformers_version)
print("Datasets:", datasets.__version__)
print("PEFT:", peft.__version__)
print("Accelerate:", accelerate.__version__)
print("Evaluate:", evaluate.__version__)
print("Setup complete ✅")

Transformers: 4.57.2
Datasets: 4.0.0
PEFT: 0.18.0
Accelerate: 1.12.0
Evaluate: 0.4.6
Setup complete ✅


In [None]:
import random
import numpy as np
import torch

# ---- Reproducibility ----
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

# ---- Task & model config ----
BASE_MODEL_NAME = "distilbert-base-uncased"  # small, Colab-friendly
NUM_LABELS = 2                               # binary classification

# ---- LoRA hyperparameters ----
lora_r = 16
lora_alpha = 32
lora_dropout = 0.05

print("Config ready ✅")
print("Base model:", BASE_MODEL_NAME)
print("Num labels:", NUM_LABELS)
print("LoRA r/alpha/dropout:", lora_r, lora_alpha, lora_dropout)
print("Using device:", device)

Config ready ✅
Base model: distilbert-base-uncased
Num labels: 2
LoRA r/alpha/dropout: 16 32 0.05
Using device: cuda


In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)

base_model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL_NAME,
    num_labels=NUM_LABELS
).to(device)

def print_num_parameters(model):
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total params: {total:,}")
    print(f"Trainable params: {trainable:,}")
    print(f"Trainable %: {100 * trainable / total:.2f}%")

print("Base model parameter counts:")
print_num_parameters(base_model)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Base model parameter counts:
Total params: 66,955,010
Trainable params: 66,955,010
Trainable %: 100.00%


In [None]:
from peft import LoraConfig, get_peft_model, TaskType

# 1) Define HOW LoRA should be applied
lora_config = LoraConfig(
    r=lora_r,                  # rank of the low-rank matrices (smaller = lighter)
    lora_alpha=lora_alpha,     # scales the LoRA updates
    lora_dropout=lora_dropout, # dropout on the LoRA layers (regularization)
    target_modules=["q_lin", "v_lin"],  # which layers inside DistilBERT to adapt
    task_type=TaskType.SEQ_CLS # tells PEFT this is a sequence classification task
)

# 2) Wrap the base model with LoRA adapters
lora_model = get_peft_model(base_model, lora_config).to(device)

print("LoRA-wrapped model parameter counts:")
print_num_parameters(lora_model)
print()
lora_model.print_trainable_parameters()

LoRA-wrapped model parameter counts:
Total params: 67,842,052
Trainable params: 887,042
Trainable %: 1.31%

trainable params: 887,042 || all params: 67,842,052 || trainable%: 1.3075
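
The 887,042 figure can be reproduced by hand, which is a useful sanity check on what is actually being trained. The sketch below assumes DistilBERT's standard shape (6 layers, hidden size 768) and assumes that, besides the LoRA matrices on `q_lin` and `v_lin`, the classification head (`pre_classifier` and `classifier`) is kept trainable. That second assumption is an inference from the numbers matching, not something the output states directly.

```python
hidden = 768            # DistilBERT hidden size
n_layers = 6            # DistilBERT transformer layers
r = 16                  # lora_r from the config cell
num_labels = 2
targets_per_layer = 2   # q_lin and v_lin

# Each adapted linear layer gets A (r x hidden) and B (hidden x r).
lora_per_module = r * hidden + hidden * r
lora_total = n_layers * targets_per_layer * lora_per_module

# Classification head: weights plus biases (assumed trainable, see lead-in).
pre_classifier = hidden * hidden + hidden
classifier = hidden * num_labels + num_labels
head_total = pre_classifier + classifier

trainable = lora_total + head_total
base_total = 66_955_010             # from the base model cell above
all_params = base_total + trainable # matches the reported total

print(f"LoRA matrices: {lora_total:,}")
print(f"Head:          {head_total:,}")
print(f"Trainable:     {trainable:,}")
print(f"All params:    {all_params:,}")
print(f"Trainable %:   {100 * trainable / all_params:.4f}")
```

Under these assumptions, roughly two thirds of the trainable parameters sit in the classification head rather than in the LoRA matrices themselves.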


In [None]:
from datasets import Dataset

# Tiny toy dataset: simple positive (1) vs negative (0) sentiment
texts = [
    "I love this product, it works great!",
    "This is the worst experience I've ever had.",
    "Absolutely fantastic, I would buy again.",
    "Terrible quality and horrible support.",
    "Pretty good overall, just a few issues.",
    "I hate it, totally disappointed.",
    "This is amazing, exceeded my expectations.",
    "Not good at all, I want a refund.",
]

# 1 = positive, 0 = negative
labels = [
    1,  # love this product
    0,  # worst experience
    1,  # fantastic
    0,  # terrible quality
    1,  # pretty good
    0,  # hate it
    1,  # amazing
    0,  # not good
]

# Create a Hugging Face Dataset from the raw Python lists
raw_dataset = Dataset.from_dict({"text": texts, "label": labels})

raw_dataset

Dataset({
    features: ['text', 'label'],
    num_rows: 8
})

In [None]:
# Split into train and test sets (75% train, 25% test)
dataset = raw_dataset.train_test_split(test_size=0.25, seed=SEED)

dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 6
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2
    })
})

In [None]:
def tokenize_batch(batch):
    # We only tokenize the "text" field here
    return tokenizer(
        batch["text"],
        truncation=True,      # cut off very long texts
        max_length=128,       # reasonable length for tiny examples
        padding=False,        # no padding here; we'll pad dynamically later
    )

In [None]:
# Apply the tokenization to train and test splits
tokenized_datasets = dataset.map(
    tokenize_batch,
    batched=True,          # process multiple examples at once
    remove_columns=["text"]  # we won't need the raw text inside the Trainer
)

# Rename "label" to "labels" because Trainer expects this column name
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")

# Make the dataset return PyTorch tensors when it is indexed
tokenized_datasets.set_format(type="torch")

tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['labels', 'input_ids', 'attention_mask'],
        num_rows: 6
    })
    test: Dataset({
        features: ['labels', 'input_ids', 'attention_mask'],
        num_rows: 2
    })
})

In [None]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
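
Because tokenization used `padding=False`, each example keeps its own length and the collator pads per batch. The following is a rough pure-Python sketch of that idea, not the library's implementation. The function name `pad_batch` and the token ids are hypothetical, and the pad id of 0 is an assumption; in practice the value should be read from `tokenizer.pad_token_id`.

```python
def pad_batch(features, pad_token_id=0):
    """Pad every example to the longest sequence in this batch only."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # Padded positions get mask 0 so attention ignores them.
        batch["attention_mask"].append(f["attention_mask"] + [0] * n_pad)
    return batch

# Made-up token ids of different lengths, for illustration only.
features = [
    {"input_ids": [101, 1045, 2293, 102], "attention_mask": [1, 1, 1, 1]},
    {"input_ids": [101, 6659, 102], "attention_mask": [1, 1, 1]},
]

batch = pad_batch(features)
print(batch["input_ids"])
print(batch["attention_mask"])
```

Padding to the longest sequence in the batch, rather than to `max_length=128`, keeps the tensors small when the texts are short, as they are here.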

In [None]:
import numpy as np
import evaluate

accuracy_metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # Convert logits to predicted class indices
    preds = np.argmax(logits, axis=-1)
    # Use the evaluate library to compute accuracy
    return accuracy_metric.compute(predictions=preds, references=labels)

Downloading builder script: 0.00B [00:00, ?B/s]
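
To make the metric concrete, here is a small dependency-free sketch of the same logits-to-accuracy computation. The logits and labels are invented for illustration, and `accuracy_from_logits` is a hypothetical helper, not part of `evaluate`.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy_from_logits(logits, labels):
    """Pick the highest-scoring class per row and compare with the labels."""
    preds = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}

# Invented example: three rows of two-class logits.
toy_logits = [[0.2, 1.1], [0.9, -0.3], [0.1, 0.4]]
toy_labels = [1, 0, 0]

toy_result = accuracy_from_logits(toy_logits, toy_labels)
print(toy_result)
```

Two of the three rows are predicted correctly, so the accuracy is 2/3. Softmax is not needed for this step because it does not change which class has the largest score.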

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

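# Reload the tokenizer and a fresh base model. get_peft_model above injected
# adapters into the first base_model in place, so training starts from a clean copy.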
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)

base_model = AutoModelForSequenceClassification.from_pretrained(
    BASE_MODEL_NAME,
    num_labels=NUM_LABELS
).to(device)

# Define the LoRA config (same hyperparameters as above)
lora_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    bias="none",
    target_modules=["q_lin", "v_lin"],   # DistilBERT's query and value projections
    task_type="SEQ_CLS",
)

# Wrap model with LoRA
model = get_peft_model(base_model, lora_config)

model.print_trainable_parameters()

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 887,042 || all params: 67,842,052 || trainable%: 1.3075


In [None]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="lora-distilbert-sentiment",  # where to save checkpoints
    per_device_train_batch_size=4,          # small batch size for tiny dataset
    per_device_eval_batch_size=4,
    num_train_epochs=5,                     # a few passes over the tiny dataset
    learning_rate=2e-4,                     # a reasonable LR for LoRA
    weight_decay=0.01,
    logging_steps=5,                        # log every few steps
    eval_strategy="epoch",                  # evaluate at the end of each epoch
    save_strategy="epoch",                  # save at the end of each epoch
    load_best_model_at_end=True,            # reload best checkpoint (by eval metric)
    metric_for_best_model="accuracy",
    greater_is_better=True,
    report_to="none",                       # turn off W&B, etc., for simplicity
)

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    processing_class=tokenizer,  # replaces the deprecated `tokenizer=` argument
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)


In [None]:
train_result = trainer.train()

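| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | No log        | 0.674113        | 0.5      |
| 2     | No log        | 0.675905        | 0.5      |
| 3     | 0.677500      | 0.675852        | 0.5      |
| 4     | 0.677500      | 0.676245        | 0.5      |
| 5     | 0.636200      | 0.676381        | 0.5      |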
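
A little arithmetic helps when reading the table above. The sketch below assumes a single device, no gradient accumulation, and that the last partial batch is kept. Under those assumptions it reproduces why the first two epochs show "No log", and it computes the loss a binary classifier gets when it outputs 50/50 probabilities, for comparison with the validation loss.

```python
import math

n_train = 6         # rows in the train split
n_test = 2          # rows in the test split
batch_size = 4      # per_device_train_batch_size
epochs = 5          # num_train_epochs
logging_steps = 5

steps_per_epoch = math.ceil(n_train / batch_size)
total_steps = steps_per_epoch * epochs
logged_at = list(range(logging_steps, total_steps + 1, logging_steps))

# For each epoch, find the most recent logging step that has already happened.
for epoch in range(1, epochs + 1):
    step = epoch * steps_per_epoch
    seen = [s for s in logged_at if s <= step]
    print(f"epoch {epoch}: step {step}, latest training-loss log: {seen[-1] if seen else 'none yet'}")

# Cross-entropy of a two-class model that always predicts 0.5 / 0.5.
chance_loss = math.log(2)
print(f"Chance-level loss: {chance_loss:.4f}")

# With only two test rows, accuracy can take just these values.
possible_accuracies = [k / n_test for k in range(n_test + 1)]
print("Possible accuracies:", possible_accuracies)
```

The validation loss of about 0.676 sits just below the chance level of about 0.693, and an accuracy of 0.5 on two examples means exactly one was classified correctly. With eight examples in total, the run demonstrates the pipeline mechanics rather than a meaningful fit.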


In [None]:
output_dir = "lora_distilbert_adapter"

model.save_pretrained(output_dir)      # save LoRA adapter weights
tokenizer.save_pretrained(output_dir)  # save tokenizer too
print(f"Saved adapter + tokenizer to {output_dir}")

Saved adapter + tokenizer to lora_distilbert_adapter
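
One quick way to see that only the adapter was saved, rather than the full 268 MB base model, is to list the files in the output folder with their sizes. The helper below is a generic sketch; `list_file_sizes` is a hypothetical name, and the exact file names PEFT writes may vary by version, so none are hard-coded here.

```python
import os

def list_file_sizes(path):
    """Return {file name: size in bytes} for regular files directly inside path."""
    sizes = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            sizes[name] = os.path.getsize(full)
    return sizes

adapter_dir = "lora_distilbert_adapter"  # same folder name as the save cell above

# Guarded so the cell also runs if the adapter has not been saved yet.
if os.path.isdir(adapter_dir):
    for name, size in list_file_sizes(adapter_dir).items():
        print(f"{name:40s} {size / 1e6:8.2f} MB")
else:
    print(f"{adapter_dir} not found; run the save cell first.")
```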


In [None]:
from peft import PeftConfig, PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

peft_model_id = "lora_distilbert_adapter"

# Load PEFT config (tells us which base model to use)
peft_config = PeftConfig.from_pretrained(peft_model_id)

# Reload base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    peft_config.base_model_name_or_path,
    num_labels=NUM_LABELS,
).to(device)

# Reload tokenizer (from the adapter folder)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)

# Attach the LoRA adapter on top of the base model
inference_model = PeftModel.from_pretrained(base_model, peft_model_id).to(device)
inference_model.eval()

print("Reloaded base model + LoRA adapter ✅")

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Reloaded base model + LoRA adapter ✅


In [None]:
import torch

def predict_texts(texts):
    # Tokenize a list of texts into tensors on the selected device (GPU or CPU)
    inputs = tokenizer(
        texts,
        padding=True,
        truncation=True,
        return_tensors="pt",
    ).to(device)

    # No gradients needed for inference
    with torch.no_grad():
        outputs = inference_model(**inputs)
        logits = outputs.logits

    # Convert logits to probabilities + predicted class index
    probs = torch.softmax(logits, dim=-1)
    preds = probs.argmax(dim=-1).cpu().numpy()
    probs = probs.cpu().numpy()

    return preds, probs

sample_texts = [
    "I really love this product, it works great!",
    "This is the worst thing I have ever bought.",
]

preds, probs = predict_texts(sample_texts)

for text, pred, prob in zip(sample_texts, preds, probs):
    print("TEXT:", text)
    print("  predicted label:", int(pred))   # 0 or 1
    print("  probabilities:", prob)
    print("-" * 40)

TEXT: I really love this product, it works great!
  predicted label: 0
  probabilities: [0.5135009  0.48649904]
----------------------------------------
TEXT: This is the worst thing I have ever bought.
  predicted label: 0
  probabilities: [0.5240246 0.4759754]
----------------------------------------
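
Both predictions above land close to 0.5, so the predicted label alone hides how unsure the model is. A small helper can report the margin between the top two classes. This is a sketch: `describe_prediction` and the `min_margin` threshold of 0.2 are hypothetical choices for illustration, and the label names follow the `1 = positive, 0 = negative` convention from the dataset cell.

```python
def describe_prediction(probs, labels=("negative", "positive"), min_margin=0.2):
    """Return the top label, its margin over the runner-up, and a confidence flag."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    margin = probs[top] - probs[runner_up]
    return {
        "label": labels[top],
        "margin": margin,
        "confident": margin >= min_margin,  # threshold is an arbitrary example value
    }

# Probabilities copied from the output above.
outputs_seen = [
    [0.5135009, 0.48649904],
    [0.5240246, 0.4759754],
]

summaries = [describe_prediction(p) for p in outputs_seen]
for summary in summaries:
    print(summary)
```

With margins of roughly 0.03 and 0.05, neither prediction clears the example threshold, which is consistent with the near-chance validation loss seen during training.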
