# üìò Summarization Model with LoRA and PEFT using FLAN-T5

### üîç Objective:
The goal of this notebook is to build a lightweight, instruction-following **summarization model** that generates concise answers from a given **question + context** input.

We fine-tune the **FLAN-T5** model using a **parameter-efficient method** (PEFT) called **LoRA (Low-Rank Adaptation)**. The dataset consists of context-question-answer triples, where the answer serves as a summary of the context in response to the question.

---

### üõ†Ô∏è Approach Summary:
- **Model**: `google/flan-t5-base` (pretrained on instruction-following tasks)
- **Task framing**: Instruction-based summarization using `"summarize: "` prefix
- **Data**: `neural-bridge/rag-dataset-12000` (QA-style summarization)
- **Fine-tuning**: We apply **LoRA adapters** to inject trainable parameters without updating the full model
- **Training Framework**: `adapter-transformers` + HuggingFace Trainer

---

### üß† What is LoRA?

**LoRA (Low-Rank Adaptation)** is a technique for fine-tuning large language models by **inserting trainable rank-decomposed matrices** into each layer, while **keeping the original weights frozen**.

Instead of updating the full weight matrix $W \in \mathbb{R}^{d \times d}$ , LoRA approximates the update as:

$$
\Delta W \approx A B \quad \text{where } A \in \mathbb{R}^{d \times r}, \; B \in \mathbb{R}^{r \times d}, \; r \ll d
$$

This drastically reduces the number of trainable parameters and allows efficient adaptation with less compute and memory.

---

### üß© What is PEFT?

**PEFT (Parameter-Efficient Fine-Tuning)** refers to any technique that fine-tunes only a **subset of a model's parameters**.  
LoRA is one such method under this umbrella, making it possible to:
- Reuse the same base model across tasks
- Add/remove adapters without retraining
- Reduce storage and deployment cost

---

### ‚ö° Why This Setup?

Fine-tuning large models like FLAN-T5 from scratch is expensive and often unnecessary.  
LoRA + PEFT lets us train compact and effective models even on modest hardware, making this setup ideal for:
- Domain adaptation
- Instruction tuning
- Fast experimentation

## 1. Load Tokenizer

We load the tokenizer for the FLAN-T5 base model.  
This tokenizer will be used for encoding inputs (question + context) and decoding model outputs (answers).

In [None]:
from transformers import AutoTokenizer

base_model = "google/flan-t5-base"

tokenizer = AutoTokenizer.from_pretrained(base_model)
prefix = 'summarize: '

## 2. Define Tokenization Function

This function encodes a batch of data:
- Inputs are created by concatenating `question` + `context`, prefixed with `"summarize: "`.
- Targets are the `answer` texts.
- Both are tokenized with truncation and padding.
- The result is returned as a dictionary with `input_ids` and `labels`.


In [None]:
def encode_batch(examples):
    text_column1 = 'context'
    text_column2 = 'question'
    summary_column = 'answer'
    
    padding = "max_length"

    inputs, targets = [], []
    for i in range(len(examples[text_column1])):
        if examples[text_column1][i] and examples[text_column2][i] and examples[summary_column][i]:
            # Concatenate question + context
            input_text = examples[text_column2][i] + " " + examples[text_column1][i]
            inputs.append(input_text)
            targets.append(examples[summary_column][i])


    inputs = [prefix + inp for inp in inputs]

    model_inputs = tokenizer(inputs, max_length=512, padding=padding, truncation=True)
    labels = tokenizer(targets, max_length=128, padding=padding, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

## 3. Load and Prepare Dataset

This function loads the dataset split (`train` or `test`) and:
- Filters out rows with missing `context` or `answer`
- Limits the number of rows to `max_items`
- Applies tokenization using `encode_batch`
- Formats the dataset as PyTorch tensors for training


In [None]:
def load_split(split_name, max_items):
    
    dataset = load_dataset("neural-bridge/rag-dataset-12000")[split_name] 


    dataset = dataset.filter(lambda example: example['context'] is not None and example['answer'] is not None)
    
    dataset = dataset.filter(lambda _, idx: idx < max_items, with_indices=True)
    
    
    dataset = dataset.map(
        encode_batch,
        batched=True,
        remove_columns=dataset.column_names,
        desc="Running tokenizer on " + split_name + " dataset",
    )
    
    dataset.set_format(type="torch", columns=["input_ids", "labels"])

    return dataset

## 4. Load Model and Add LoRA Adapter

We use the AdapterHub-compatible model (`AutoAdapterModel`) and apply **LoRA (Low-Rank Adaptation)**:
- `r = 8`: Low-rank dimensionality
- `alpha = 16`: Scaling factor
- `intermediate_lora` and `output_lora`: Apply LoRA to both FFN and output layers

This enables **parameter-efficient fine-tuning** without updating the entire base model.


In [None]:
#from transformers import AutoModelForSeq2SeqLM
from adapters import LoRAConfig

from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained(base_model)

# Load the model
#model = AutoModelForSeq2SeqLM.from_pretrained(base_model)

config = LoRAConfig(
    r=8,
    alpha=16,
    intermediate_lora=True,
    output_lora=True
)


In [17]:
print(type(model))

<class 'transformers.models.t5.modeling_t5.T5ForConditionalGeneration'>


In [5]:
print(type(model))

<class 'adapters.models.t5.adapter_model.T5AdapterModel'>


In [None]:
#model.add_adapter("my_summary_adapter", config=config, adapter_type="lora")
model.add_adapter(adapter_name="my_summary_adapter", config=config)

model.train_adapter("my_summary_adapter")
model.set_active_adapters("my_summary_adapter")

## 5. Define Training Configuration and Start Training

We configure the trainer using HuggingFace's `TrainingArguments`:
- 2 epochs
- Batch size of 2
- Logging every 50 steps

We use `AdapterTrainer` to train only the adapter layer, leaving the base model frozen.

Training and evaluation are performed on subsets of 1,000 training and 100 test examples.


In [None]:
from transformers import TrainingArguments
from adapters import AdapterTrainer
from datasets import load_dataset
batch_size = 2  

training_args = TrainingArguments(
    learning_rate=3e-4,
    num_train_epochs=2,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    logging_steps=50,
    output_dir="./training_output",
    overwrite_output_dir=True,
    remove_unused_columns=False,
)

trainer = AdapterTrainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=load_split("train", 1000),
    eval_dataset=load_split("test", 100),
)

trainer.train()


Filter:   0%|          | 0/9600 [00:00<?, ? examples/s]

Filter:   0%|          | 0/9598 [00:00<?, ? examples/s]

Running tokenizer on train dataset:   0%|          | 0/1000 [00:00<?, ? examples/s]

Filter:   0%|          | 0/2400 [00:00<?, ? examples/s]

Filter:   0%|          | 0/2399 [00:00<?, ? examples/s]

Running tokenizer on test dataset:   0%|          | 0/100 [00:00<?, ? examples/s]

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


Step,Training Loss
50,23.367
100,5.4429
150,2.8273
200,1.0929
250,0.5771
300,0.5632
350,0.5436
400,0.6298
450,0.5274
500,0.454


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.


TrainOutput(global_step=1000, training_loss=2.032686315536499, metrics={'train_runtime': 383.374, 'train_samples_per_second': 5.217, 'train_steps_per_second': 2.608, 'total_flos': 1381594300416000.0, 'train_loss': 2.032686315536499, 'epoch': 2.0})

In [12]:
trainer.evaluate()

{'eval_loss': 0.3849603533744812,
 'eval_runtime': 6.2349,
 'eval_samples_per_second': 16.039,
 'eval_steps_per_second': 8.019,
 'epoch': 2.0}

## 6. Merge Adapter with Base Model

After training, the adapter is merged into the base model so it can be used for standalone inference without adapter activation.

In [None]:

model.merge_adapter("my_summary_adapter")

## 7. Run Inference

We test the trained summarization model by passing in a question and a long context.  
The model generates a summary using its learned instruction-following ability.

#### Output:
> **Generated Summary**:  
> *The story of the magic mill spread far and wide.*

This shows the model's ability to extract and compress key information into a brief, high-level summary.


In [None]:
context = """
Once upon a time, there were two brothers ‚Äî one was rich, and the other was poor. The poor brother ran out of food and went to his rich brother, begging for something to eat.

The rich brother, not happy about helping, said, ‚ÄúI‚Äôll give you this ham, but you must take it to Dead Man‚Äôs Hall.‚Äù

Grateful for the food, the poor brother agreed. He walked all day and finally reached a large building at dusk. Outside, an old man was chopping wood.

‚ÄúExcuse me, sir,‚Äù said the poor brother. ‚ÄúIs this the way to Dead Man‚Äôs Hall?‚Äù

‚ÄúYes, you‚Äôve arrived,‚Äù replied the old man. ‚ÄúInside, they will want to buy your ham. But don‚Äôt sell it unless they give you the hand-mill that stands behind the door.‚Äù

The poor brother thanked the old man, went inside, and everything happened just as the old man had said. The poor brother left with the hand-mill and asked the old man how to use it. Then, he set off home.

The hand-mill was magical. When the poor brother got home, he asked it to grind a feast of food and drink. To stop the mill, he simply had to say, ‚ÄúThank you, magic mill, you can stop now.‚Äù

When the rich brother saw that his brother was no longer poor, he became jealous. ‚ÄúGive me that mill!‚Äù he demanded. The poor brother, having everything he needed, agreed to sell it but didn‚Äôt tell his rich brother how to stop it.

The rich brother eagerly asked the mill to grind food when he got home, but because he didn‚Äôt know how to stop it, the mill kept grinding until food overflowed from the house and across the fields. In a panic, he ran to his poor brother‚Äôs house. ‚ÄúPlease take it back!‚Äù he cried. ‚ÄúIf it doesn‚Äôt stop, the whole town will be buried!‚Äù

The poor brother took the mill back and was never poor or hungry again.

Soon, the story of the magic mill spread far and wide. One day, a sailor knocked at the poor brother‚Äôs door. ‚ÄúDoes the mill grind salt?‚Äù he asked.

‚ÄúOf course,‚Äù replied the brother. ‚ÄúIt will grind anything you ask.‚Äù

The sailor, eager to stop traveling far for salt, offered a thousand coins for the mill. Though the brother was hesitant, he eventually agreed.

In his hurry, the sailor forgot to ask how to stop the mill. Once at sea, he placed the mill on deck and commanded, ‚ÄúGrind salt, and grind quickly!‚Äù

The mill obeyed, but it didn‚Äôt stop. The pile of salt grew and grew until the ship sank under its weight.

The mill still lies at the bottom of the sea, grinding salt to this day, and that‚Äôs why the sea is salty.

"""
question = "Summarize the story."

input_text = prefix + question + " " + context

inputs = tokenizer(input_text, return_tensors="pt", truncation=True).to(model.device)

output = model.generate(**inputs, max_length=128)

generated_summary = tokenizer.decode(output[0], skip_special_tokens=True)
print("Input:\n", input_text)
print("\nGenerated Summary:\n", generated_summary)


Input:
 summarize: Summarize the story. 
Once upon a time, there were two brothers ‚Äî one was rich, and the other was poor. The poor brother ran out of food and went to his rich brother, begging for something to eat.

The rich brother, not happy about helping, said, ‚ÄúI‚Äôll give you this ham, but you must take it to Dead Man‚Äôs Hall.‚Äù

Grateful for the food, the poor brother agreed. He walked all day and finally reached a large building at dusk. Outside, an old man was chopping wood.

‚ÄúExcuse me, sir,‚Äù said the poor brother. ‚ÄúIs this the way to Dead Man‚Äôs Hall?‚Äù

‚ÄúYes, you‚Äôve arrived,‚Äù replied the old man. ‚ÄúInside, they will want to buy your ham. But don‚Äôt sell it unless they give you the hand-mill that stands behind the door.‚Äù

The poor brother thanked the old man, went inside, and everything happened just as the old man had said. The poor brother left with the hand-mill and asked the old man how to use it. Then, he set off home.

The hand-mill was magical.