# What is LoRA?

**LoRA (Low-Rank Adaptation)** is a parameter-efficient fine-tuning technique that:

1. **Freezes** the pre-trained model weights
2. **Injects** trainable low-rank decomposition matrices into selected layers (in this notebook, the attention query and value projections)
3. **Updates** only these small matrices during training

**Architecture:**
```
Frozen BART Weights + Low-Rank Adapters → Fine-tuned for Summarization
```
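
The three steps above can be sketched with plain NumPy. This is a toy illustration, not the PEFT implementation: the sizes, the names `W`, `A`, `B`, and the helper `lora_forward` are assumptions made for the example, and the initialization (random `A`, zero `B`) follows the scheme described in the LoRA paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration
d_in, d_out, r, alpha = 16, 16, 4, 8

W = rng.normal(size=(d_out, d_in))          # pre-trained weight: frozen, never updated
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init so the update starts at 0


def lora_forward(x):
    """Frozen path plus the scaled low-rank update: x W^T + (alpha / r) * x A^T B^T."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)


x = rng.normal(size=(2, d_in))
base_out = x @ W.T
lora_out = lora_forward(x)

full_params = W.size           # what a full fine-tune of this layer would update
lora_params = A.size + B.size  # what LoRA trains instead
print(full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly; training then moves only `A` and `B`.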

# Install Requirements

In [None]:
!pip install rouge_score
!pip install evaluate
!pip install kagglehub
!pip install peft  # Parameter-Efficient Fine-Tuning library
!pip install accelerate

Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... done
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24934 sha256=92fb1ee3c0c7b0c30925df59e20345b338891270bcfdd520c6e603d9aba740f0
  Stored in directory: /root/.cache/pip/wheels/85/9d/af/01feefbe7d55ef5468796f0c68225b6788e85d9d0a281e7a70
Successfully built rouge_score
Installing collected packages: rouge_score
Successfully installed rouge_score-0.1.2
Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.1/84.1 kB 4.3 MB/s eta 0:00:00
Installing collected packages: evaluate
Successfully installed evaluate-0.4.6


# Import Libraries

In [None]:
import pandas as pd
import numpy as np
import os

import matplotlib.pyplot as plt
import seaborn as sns

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

from transformers import AutoTokenizer
from transformers import BartForConditionalGeneration
from transformers import DataCollatorForSeq2Seq
from transformers import Seq2SeqTrainer
from transformers import Seq2SeqTrainingArguments

# PEFT imports for LoRA
from peft import LoraConfig, get_peft_model, TaskType, PeftModel

import datasets
from datasets import Dataset as HFDataset
from datasets import DatasetDict

from tqdm import tqdm
import kagglehub

import nltk
from nltk.tokenize import sent_tokenize
nltk.download("punkt")
nltk.download("punkt_tab")

import warnings
warnings.filterwarnings('ignore')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

Using device: cuda


# Download Dataset

In [None]:
# Download BBC News Summary dataset
path = kagglehub.dataset_download("pariza/bbc-news-summary")
print("Path to dataset files:", path)

Using Colab cache for faster access to the 'bbc-news-summary' dataset.
Path to dataset files: /kaggle/input/bbc-news-summary


# Load and Prepare Data

In [None]:
paths = os.listdir(os.path.join(path, 'BBC News Summary/News Articles'))
articles_path = os.path.join(path, 'BBC News Summary/News Articles/')
summaries_path = os.path.join(path, 'BBC News Summary/Summaries/')

articles = []
summaries = []
file_arr = []

for p in paths:
    files = os.listdir(articles_path + p)
    for file in files:
        article_file_path = articles_path + p + '/' + file
        summary_file_path = summaries_path + p + '/' + file
        try:
            # Read both files before appending so the lists stay aligned
            with open(article_file_path, 'r', encoding='utf-8', errors='ignore') as f:
                article = '.'.join(line.rstrip() for line in f)
            with open(summary_file_path, 'r', encoding='utf-8', errors='ignore') as f:
                summary = '.'.join(line.rstrip() for line in f)
        except OSError:
            # Skip articles whose file or matching summary cannot be read
            continue
        articles.append(article)
        summaries.append(summary)
        file_arr.append(p + '/' + file)

In [None]:
df = pd.DataFrame({'path': file_arr, 'article': articles, 'summary': summaries})
df.head()

Unnamed: 0,path,article,summary
0,politics/361.txt,Budget to set scene for election..Gordon Brown...,- Increase in the stamp duty threshold from £6...
1,politics/245.txt,Army chiefs in regiments decision..Military ch...,"""They are very much not for the good and will ..."
2,politics/141.txt,Howard denies split over ID cards..Michael How...,Michael Howard has denied his shadow cabinet w...
3,politics/372.txt,Observers to monitor UK election..Ministers wi...,The report said individual registration should...
4,politics/333.txt,Kilroy names election seat target..Ex-chat sho...,"UKIP's leader, Roger Knapman, has said he is g..."


In [None]:
# Remove NaNs
df.dropna(inplace=True)
df.isnull().sum()

Unnamed: 0,0
path,0
article,0
summary,0


In [None]:
def word_count(text):
    # Count whitespace-separated words
    return len(text.split())

df['num_words_article'] = df['article'].apply(word_count)
df['num_words_summary'] = df['summary'].apply(word_count)
df.head()

Unnamed: 0,path,article,summary,num_words_article,num_words_summary
0,politics/361.txt,Budget to set scene for election..Gordon Brown...,- Increase in the stamp duty threshold from £6...,532,192
1,politics/245.txt,Army chiefs in regiments decision..Military ch...,"""They are very much not for the good and will ...",496,266
2,politics/141.txt,Howard denies split over ID cards..Michael How...,Michael Howard has denied his shadow cabinet w...,533,225
3,politics/372.txt,Observers to monitor UK election..Ministers wi...,The report said individual registration should...,490,223
4,politics/333.txt,Kilroy names election seat target..Ex-chat sho...,"UKIP's leader, Roger Knapman, has said he is g...",435,185


In [None]:
# Extract statistics
Q1 = df["num_words_article"].quantile(0.25)
Q3 = df["num_words_article"].quantile(0.75)
IQR = Q3 - Q1
article_upper_whisker = Q3 + 1.5 * IQR
article_lower_whisker = Q1 - 1.5 * IQR

print("ARTICLE LENGTH STATISTICS")
print(f"  Q1 (25th percentile): {Q1:.0f} words")
print(f"  Q3 (75th percentile): {Q3:.0f} words")
print(f"  IQR: {IQR:.0f} words")
print(f"  Upper whisker: {article_upper_whisker:.0f} words")
print(f"  Lower whisker: {article_lower_whisker:.0f} words")

ARTICLE LENGTH STATISTICS
  Q1 (25th percentile): 242 words
  Q3 (75th percentile): 465 words
  IQR: 223 words
  Upper whisker: 800 words
  Lower whisker: -92 words


In [None]:
# Extract statistics
Q1 = df["num_words_summary"].quantile(0.25)
Q3 = df["num_words_summary"].quantile(0.75)
IQR = Q3 - Q1
summary_upper_whisker = Q3 + 1.5 * IQR
summary_lower_whisker = Q1 - 1.5 * IQR


print("SUMMARY LENGTH STATISTICS")
print(f"  Q1 (25th percentile): {Q1:.0f} words")
print(f"  Q3 (75th percentile): {Q3:.0f} words")
print(f"  IQR: {IQR:.0f} words")
print(f"  Upper whisker: {summary_upper_whisker:.0f} words")
print(f"  Lower whisker: {summary_lower_whisker:.0f} words")

SUMMARY LENGTH STATISTICS
  Q1 (25th percentile): 103 words
  Q3 (75th percentile): 202 words
  IQR: 99 words
  Upper whisker: 350 words
  Lower whisker: -46 words


In [None]:
# Remove outliers
new_df = df[(df['num_words_summary'] <= summary_upper_whisker) &
            (df['num_words_article'] <= article_upper_whisker)]

new_df = new_df.drop(columns=["num_words_article", "num_words_summary", "path"])
new_df = new_df.sample(frac=0.3, random_state=42)  # Using 30% of data
new_df = new_df.reset_index(drop=True)

print(f"Dataset size: {len(new_df)}")
new_df.head()

Dataset size: 644


Unnamed: 0,article,summary
0,Jobs growth still slow in the US..The US creat...,The job gains mean that President Bush can cel...
1,French suitor holds LSE meeting..European stoc...,European stock market Euronext has met with th...
2,"Nat Insurance to rise, say Tories..National In...",Tony Blair has said he does not want higher ta...
3,Blair Labour's longest-serving PM..Tony Blair ...,Both Mr Brown and Mr Blair rose to prominence ...
4,Virus poses as Christmas e-mail..Security firm...,Anti-virus firm Sophos said that 10% of the e-...


In [None]:
# Split into train and test
ratio = 0.8
split = int(len(new_df) * ratio)

train_df = new_df.iloc[:split]
test_df = new_df.iloc[split:]

print(f"Train size: {len(train_df)}")
print(f"Test size: {len(test_df)}")

Train size: 515
Test size: 129


In [None]:
# Convert to HuggingFace Dataset
train_dataset = HFDataset.from_pandas(train_df)
test_dataset = HFDataset.from_pandas(test_df)
dataset = DatasetDict({"train": train_dataset, "test": test_dataset})
dataset

DatasetDict({
    train: Dataset({
        features: ['article', 'summary'],
        num_rows: 515
    })
    test: Dataset({
        features: ['article', 'summary'],
        num_rows: 129
    })
})

# Tokenizer Setup

In [None]:
# Using BART-large for better performance
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")

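# The whiskers are word counts; they are reused here as token limits, so an
# article that tokenizes to more tokens than this is truncated.
max_input_length = int(article_upper_whisker)
max_target_length = int(summary_upper_whisker)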

print(f"Max article length: {max_input_length}")
print(f"Max summary length: {max_target_length}")

Max article length: 799
Max summary length: 350
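
The article limit prints as 799 here although the upper whisker was reported as 800 earlier. A small worked check, assuming the printed quartiles (242 and 465) are the exact values, suggests the difference comes from rounding versus truncation:

```python
# Quartiles as printed in the statistics cell, assumed here to be exact
q1, q3 = 242, 465
iqr = q3 - q1
upper = q3 + 1.5 * iqr   # 465 + 334.5

print(upper)             # the unrounded whisker
print(f"{upper:.0f}")    # the format spec rounds to the nearest integer
print(int(upper))        # int() truncates toward zero
```

Using `round()` or `math.ceil()` instead of `int()` would keep the two numbers in agreement, though a one-token difference has no practical effect on training.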


In [None]:
def preprocess_function(examples):
    model_inputs = tokenizer(
        examples["article"],
        max_length=max_input_length,
        truncation=True,
    )
    labels = tokenizer(
        examples["summary"],
        max_length=max_target_length,
        truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_datasets = dataset.map(
    preprocess_function,
    batched=True,
    remove_columns=["article", "summary"]
)


# Load BART Model with LoRA Configuration

In [None]:
# Load base BART-large model
model_checkpoint = "facebook/bart-large"
base_model = BartForConditionalGeneration.from_pretrained(model_checkpoint)

print(f"Loaded base model: {model_checkpoint}")
print(f"Base model parameters: {sum(p.numel() for p in base_model.parameters()):,}")

pytorch_model.bin:   0%|          | 0.00/1.02G [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.02G [00:00<?, ?B/s]

Loaded base model: facebook/bart-large
Base model parameters: 406,291,456


In [None]:
# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,  # Sequence-to-sequence task
    r=8,                               # LoRA rank (dimensionality of low-rank matrices)
    lora_alpha=32,                     # LoRA scaling factor
    lora_dropout=0.1,                  # Dropout for LoRA layers
    target_modules=["q_proj", "v_proj"],  # Adapt the attention query and value projections
    bias="none",
)

# Apply LoRA to the model
model = get_peft_model(base_model, lora_config)
model = model.to(device)

# Print trainable parameters
model.print_trainable_parameters()

trainable params: 1,179,648 || all params: 407,471,104 || trainable%: 0.2895
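
The reported count can be reproduced by hand. The sketch below assumes the standard BART-large shape (hidden size 1024, 12 encoder and 12 decoder layers, with each decoder layer holding both a self-attention and a cross-attention block); the variable names are illustrative only.

```python
hidden_size = 1024          # BART-large model dimension
encoder_layers = 12
decoder_layers = 12
rank = 8                    # r in the LoraConfig
targets_per_attention = 2   # q_proj and v_proj

# One attention block per encoder layer, two per decoder layer
# (self-attention and cross-attention)
attention_blocks = encoder_layers * 1 + decoder_layers * 2

# Each adapted projection adds A (rank x in) and B (out x rank)
params_per_projection = rank * hidden_size + hidden_size * rank

trainable = attention_blocks * targets_per_attention * params_per_projection
base = 406_291_456          # base parameter count printed when the model was loaded
total = base + trainable

print(f"trainable params: {trainable:,}")
print(f"all params: {total:,}")
print(f"trainable%: {100 * trainable / total:.4f}")
```

Doubling `rank` doubles the trainable count, and adding more target modules (for example `k_proj` or `out_proj`) scales it by the number of projections adapted per attention block.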


# ROUGE Metrics

In [None]:
import evaluate
rouge_score = evaluate.load("rouge")

In [None]:
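def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # Some model outputs arrive as a tuple; the generated ids come first
    if isinstance(predictions, tuple):
        predictions = predictions[0]

    # Replace -100 (padding) so the ids can be decoded
    predictions = np.where(predictions != -100, predictions, tokenizer.pad_token_id)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)

    # Decode generated and reference summaries
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)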

    # ROUGE expects newline after each sentence
    decoded_preds = ["\n".join(sent_tokenize(pred.strip())) for pred in decoded_preds]
    decoded_labels = ["\n".join(sent_tokenize(label.strip())) for label in decoded_labels]

    # Compute ROUGE scores
    result = rouge_score.compute(
        predictions=decoded_preds,
        references=decoded_labels,
        use_stemmer=True
    )

    return {k: round(v, 4) for k, v in result.items()}
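
To make the metric concrete, here is a simplified, hand-rolled version of the ROUGE-1 F-measure. It is only a sketch: the helper `rouge1_f1` is hypothetical, it splits on whitespace and skips the stemming and tokenization rules that the `evaluate` implementation applies, so its numbers will not match the library exactly.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between two strings (lowercased, whitespace-split)."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a word counts at most as often as it appears in both texts
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat", "the cat sat on the mat")
print(round(score, 4))
```

In this example every predicted word appears in the reference (precision 1.0) but only half of the reference words are covered (recall 0.5), which is why short generated summaries can still score poorly.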

# Training Configuration

In [None]:
# Data collator
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, padding=True)
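
The collator pads each batch to its longest sequence, using the tokenizer's pad id for inputs and -100 for labels so that padded positions are ignored by the loss. The sketch below mimics that behavior in plain Python; `pad_batch` and the token ids are made up for illustration and are not part of the library.

```python
PAD_ID = 1            # BART's pad token id
LABEL_PAD_ID = -100   # ignored by PyTorch's cross-entropy loss by default


def pad_batch(sequences, pad_value):
    """Right-pad every sequence to the length of the longest one in the batch."""
    longest = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (longest - len(seq)) for seq in sequences]


# Toy token ids, made up for illustration
input_ids = [[0, 11, 12, 13, 2], [0, 21, 2]]
labels = [[0, 31, 2], [0, 41, 42, 43, 2]]

padded_inputs = pad_batch(input_ids, PAD_ID)
padded_labels = pad_batch(labels, LABEL_PAD_ID)

# Before decoding, -100 has to be mapped back to a real token id
decodable = [[tok if tok != LABEL_PAD_ID else PAD_ID for tok in seq]
             for seq in padded_labels]

print(padded_inputs)
print(padded_labels)
print(decodable)
```

This is the reason `compute_metrics` swaps -100 for the pad token id before calling `batch_decode`: -100 is not a valid vocabulary index.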

In [None]:
batch_size = 10
num_train_epochs = 10
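# Counted in optimizer steps; with gradient accumulation an epoch has fewer
# optimizer steps than batches, so this logs about once every two epochs.
logging_steps = len(tokenized_datasets["train"]) // batch_size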
model_name = "bart-large-lora-summarizer"

args = Seq2SeqTrainingArguments(
    output_dir=f"{model_name}-LoRA",
    eval_strategy="epoch",
    learning_rate=1e-4,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    weight_decay=0.01,
    save_total_limit=2,
    num_train_epochs=num_train_epochs,
    predict_with_generate=True,
    logging_steps=logging_steps,
    fp16=torch.cuda.is_available(),
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="rouge1",
    gradient_accumulation_steps=2,
    warmup_steps=100,
    report_to="none",
)

In [None]:
# Initialize trainer
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,  # newer transformers releases rename this argument to processing_class
    compute_metrics=compute_metrics,
)

# Train the Model

In [None]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|---|---|---|---|---|---|---|
| 1 | No log | 0.697933 | 0.1429 | 0.0886 | 0.1155 | 0.1191 |
| 2 | 0.799000 | 0.540023 | 0.1432 | 0.0897 | 0.1161 | 0.1199 |
| 3 | 0.799000 | 0.31202 | 0.1415 | 0.0896 | 0.1197 | 0.1219 |
| 4 | 0.446800 | 0.222797 | 0.1437 | 0.1046 | 0.1324 | 0.1342 |
| 5 | 0.446800 | 0.191082 | 0.1612 | 0.1293 | 0.1538 | 0.1546 |
| 6 | 0.223400 | 0.190055 | 0.1647 | 0.1378 | 0.1589 | 0.1598 |
| 7 | 0.223400 | 0.182366 | 0.1622 | 0.1346 | 0.1564 | 0.1565 |
| 8 | 0.200900 | 0.177929 | 0.1699 | 0.1444 | 0.1641 | 0.1652 |
| 9 | 0.200900 | 0.174747 | 0.1715 | 0.1464 | 0.1661 | 0.1667 |
| 10 | 0.198100 | 0.17439 | 0.1727 | 0.1473 | 0.1669 | 0.1679 |


TrainOutput(global_step=260, training_loss=0.3700549698792971, metrics={'train_runtime': 898.2964, 'train_samples_per_second': 5.733, 'train_steps_per_second': 0.289, 'total_flos': 7802856670371840.0, 'train_loss': 0.3700549698792971, 'epoch': 10.0})
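
The `global_step=260` figure and the "No log" entry for the first epoch both follow from the training configuration. The arithmetic below is a sketch that assumes a single GPU and that the incomplete final batch of each epoch still counts as a batch.

```python
import math

train_samples = 515
batch_size = 10
grad_accum = 2
epochs = 10
logging_steps = train_samples // batch_size                # 51, as in the config

batches_per_epoch = math.ceil(train_samples / batch_size)  # forward/backward passes
updates_per_epoch = batches_per_epoch // grad_accum        # optimizer steps
total_updates = updates_per_epoch * epochs

# Optimizer steps at which a training loss is logged
log_points = list(range(logging_steps, total_updates + 1, logging_steps))
# Epoch (1-based) whose evaluation row is the first to show each logged loss
first_epoch_shown = [math.ceil(step / updates_per_epoch) for step in log_points]

print(total_updates)
print(log_points)
print(first_epoch_shown)
```

Since the first log lands after the first epoch has already been evaluated, that row has no training loss, and each logged value is repeated until the next log point. Dividing `logging_steps` by the accumulation factor would give roughly one log per epoch.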

# Inference with LoRA Model

In [None]:
def generate_summary(model, tokenizer, article, max_length=None, device='cuda'):
    if max_length is None:
        max_length = max_target_length

    model.eval()

    # Tokenize article (a single input needs no padding)
    inputs = tokenizer(
        article,
        max_length=max_input_length,
        truncation=True,
        return_tensors='pt'
    ).to(device)

    # Generate summary
    with torch.no_grad():
        summary_ids = model.generate(
            input_ids=inputs['input_ids'],
            attention_mask=inputs['attention_mask'],
            max_length=max_length,
            num_beams=4,
            early_stopping=True,
            no_repeat_ngram_size=3,
            length_penalty=2.0,
        )

    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

In [None]:
def print_summary(idx):
    article = dataset["test"][idx]["article"]
    summary = dataset["test"][idx]["summary"]
    g_summary = generate_summary(model, tokenizer, article, device=device)
    score = rouge_score.compute(predictions=[g_summary], references=[summary])
    scores = {k: round(v, 4) for k, v in score.items()}

    print(f">>> Article: {article[:500]}...")
    print(f"\n>>> Reference Summary: {summary}")
    print(f"\n>>> Generated Summary: {g_summary}")
    print(f"\n>>> ROUGE Score: {scores}")

print_summary(69)

>>> Article: Lewis-Francis eyeing world gold..Mark Lewis-Francis says his Olympic success has made him determined to bag World Championship 100m gold in 2005...The 22-year-old pipped Maurice Greene on the last leg of the 4x100m relay in Athens to take top honours for Team GB. But individually, the Birchfield Harrier has yet to build on his World Junior Championship win four years ago. "The gold medal in Athens has made me realise that I can get to the top level and I want to get there again. It can happen, I...

>>> Reference Summary: Mark Lewis-Francis says his Olympic success has made him determined to bag World Championship 100m gold in 2005.Lewis-Francis has still to decided what events will feature in his build-up to the worlds - with one exception.But individually, the Birchfield Harrier has yet to build on his World Junior Championship win four years ago."The gold medal in Athens has made me realise that I can get to the top level and I want to get there again.

>>> Generated Su

# Save LoRA Model

**Important:** With LoRA, we only save the adapter weights (small!), not the entire model.

In [None]:
# Save only the LoRA adapters (very small file!)
output_dir = "./bart-lora-summarizer-final"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"LoRA adapters saved to {output_dir}!")
print(f"\nAdapter weights are roughly 5MB (vs ~1.5GB for full BART-large)")
print("\nTo load the model later:")
print(f"  base_model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')")
print(f"  model = PeftModel.from_pretrained(base_model, '{output_dir}')")
print(f"  tokenizer = AutoTokenizer.from_pretrained('{output_dir}')")

LoRA adapters saved to ./bart-lora-summarizer-final!

Adapter weights are roughly 5MB (vs ~1.5GB for full BART-large)

To load the model later:
  base_model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
  model = PeftModel.from_pretrained(base_model, './bart-lora-summarizer-final')
  tokenizer = AutoTokenizer.from_pretrained('./bart-lora-summarizer-final')
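
A back-of-envelope estimate of the adapter size can be derived from the trainable-parameter count reported earlier. The sketch assumes the adapter tensors are stored as 32-bit floats and ignores the small config file and the tokenizer files that are saved to the same directory.

```python
trainable_params = 1_179_648   # from print_trainable_parameters()
bytes_per_param = 4            # assumption: adapters stored as 32-bit floats

adapter_bytes = trainable_params * bytes_per_param
adapter_mb = adapter_bytes / 1e6
print(f"~{adapter_mb:.1f} MB of adapter weights")
```

The tokenizer vocabulary and merges files add a few more megabytes on disk, so the directory as a whole is somewhat larger than the adapter weights alone.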


# References

* **LoRA Paper**: [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685)
* **PEFT Library**: https://github.com/huggingface/peft
* **HuggingFace Transformers**: https://huggingface.co/docs/transformers/
* **BART Paper**: [BART: Denoising Sequence-to-Sequence Pre-training](https://arxiv.org/abs/1910.13461)
* **ROUGE Metric**: https://huggingface.co/spaces/evaluate-metric/rouge