<a href="https://colab.research.google.com/github/ruksad/Machine-learning/blob/master/deep_learning_gen_ai/flan_t5_quickstart_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# FLAN-T5 Quickstart: Classification and Summarization (Beginner-Friendly)

Welcome! This notebook shows how to use [FLAN-T5](https://huggingface.co/google/flan-t5-small) for:
- Zero-shot classification (SST-2 style: positive/negative)
- Zero-shot summarization (dialogues)
- Optional: a tiny parameter-efficient fine-tune (LoRA) on a small slice of SST-2

We keep things simple first, explain each step, and keep runtime short. Later, you can scale up.

## Setup
We'll install the required libraries (works on Google Colab). If you're running locally, ensure you have a recent GPU-enabled PyTorch.

Tip: On Colab, go to Runtime → Change runtime type → Hardware accelerator → GPU.

In [2]:
# Install libraries (quiet mode). Safe to run multiple times.
!pip -q install -U "transformers>=4.46" "datasets>=2.20" "peft>=0.12" accelerate evaluate rouge-score scikit-learn sentencepiece "pyarrow<20.0.0a0"

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m494.8/494.8 kB[0m [31m15.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m42.1/42.1 MB[0m [31m18.1 MB/s[0m eta [36m0:00:00[0m
[?25h

## Load model and tokenizer
We'll use the small FLAN-T5 model to keep things light.
- Tokenizer converts text ↔ tokens
- Model generates outputs given the tokens
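To make "text ↔ tokens" and the `padding=True` argument in the cells below concrete, here is a toy sketch that needs no download. The word-level vocabulary and the helpers `encode`, `decode` and `pad_batch` are hypothetical illustrations; the real T5 tokenizer splits text into SentencePiece subwords, but it does use id 0 for `<pad>` and id 1 for `</s>` and, by default, pads on the right.

```python
# Hypothetical word-level vocabulary standing in for the real SentencePiece tokenizer.
# Ids 0, 1, 2 follow T5's <pad>, </s>, <unk> convention.
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
vocab = {"<pad>": PAD_ID, "</s>": EOS_ID, "<unk>": UNK_ID}


def encode(text):
    """Map each word to an id (growing the toy vocab as needed) and append </s>."""
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)
        ids.append(vocab[word])
    return ids + [EOS_ID]


def decode(ids):
    """Map ids back to words, dropping <pad> and </s> (like skip_special_tokens=True)."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids if i not in (PAD_ID, EOS_ID))


def pad_batch(seqs, pad_value):
    """Right-pad every sequence to the longest one; mask is 1 for real tokens, 0 for padding."""
    longest = max(len(s) for s in seqs)
    padded = [s + [pad_value] * (longest - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (longest - len(s)) for s in seqs]
    return padded, mask


batch = [encode("great movie"), encode("not great at all")]
input_ids, attention_mask = pad_batch(batch, PAD_ID)
print(input_ids)       # [[3, 4, 1, 0, 0], [5, 3, 6, 7, 1]]
print(attention_mask)  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]

# Training labels are padded with -100 instead, so the loss ignores those positions;
# -100 is not a real token id, so swap it back to <pad> before decoding.
label_ids, _ = pad_batch(batch, -100)
print(label_ids[0])                                                  # [3, 4, 1, -100, -100]
print(decode([t if t != -100 else PAD_ID for t in label_ids[0]]))    # great movie
```

Label padding matters later: `DataCollatorForSeq2Seq` pads labels with -100 by default so those positions are skipped by the loss, which is also why -100 has to be replaced before calling `batch_decode` on labels.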

In [2]:
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/flan-t5-small"
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
device


'cuda'

## Zero-shot classification (SST-2 style)
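FLAN-T5 was instruction-tuned, so it usually responds better to a plain-English question than to a bare task prefix such as `sst2: <text>`, which the small model tends to echo back instead of answering. Its answer is free-form text, so we then map it onto `positive` or `negative`.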
We'll write a tiny helper to classify one or more texts.
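Because the model answers with free-form text, mapping that text onto a label is its own step. Here is a standalone sketch of that step; `normalize_sentiment` is a hypothetical helper, not part of any library. It matches whole words and returns `None` for anything it does not recognize, so unparseable generations can be counted as errors instead of silently becoming a label.

```python
import re


def normalize_sentiment(generated, labels=("positive", "negative")):
    """Map free-form generated text to a label, or None if no label word appears.

    Whole-word matching means 'nonnegative' does not count as 'negative'; if both
    labels appear, the one mentioned first wins.
    """
    text = generated.strip().lower()
    hits = []
    for label in labels:
        match = re.search(rf"\b{re.escape(label)}\b", text)
        if match:
            hits.append((match.start(), label))
    return min(hits)[1] if hits else None


for raw in ["Positive", "It is negative.", "positive, not negative", "sst2:", "nonnegative"]:
    print(repr(raw), "->", normalize_sentiment(raw))
```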

In [7]:
def classify(texts, max_new_tokens=8):
    """Zero-shot sentiment: returns 'positive', 'negative' or 'unknown' for each text."""
    if isinstance(texts, str):
        texts = [texts]
    # Instruction-style prompt; FLAN-T5 follows these more reliably than a bare 'sst2:' prefix
    prompts = [f"Review: {t}\nIs this movie review sentence negative or positive?" for t in texts]
    enc = tok(prompts, return_tensors='pt', padding=True).to(device)
    with torch.no_grad():
        out = model.generate(**enc, max_new_tokens=max_new_tokens)
    decoded = tok.batch_decode(out, skip_special_tokens=True)
    # Map the free-form answer onto a label; flag anything else instead of guessing
    normalized_preds = []
    for d in decoded:
        d_lower = d.strip().lower()
        if 'positive' in d_lower:
            normalized_preds.append('positive')
        elif 'negative' in d_lower:
            normalized_preds.append('negative')
        else:
            normalized_preds.append('unknown')
    return normalized_preds

examples = [
    "I absolutely loved this movie. It was fantastic!",
    "The plot was predictable and the acting was bad.",
    "Not great, not terrible."
]
preds = classify(examples)
for t, p in zip(examples, preds):
    print(f"Text: {t}\nPrediction: {p}")


## Zero-shot summarization
For summarization, prefix the input with `summarize:` and provide the content (e.g., a short dialogue).

In [8]:
def summarize(text, max_new_tokens=80):
    prompt = f'summarize: {text}'
    enc = tok(prompt, return_tensors='pt').to(device)
    with torch.no_grad():
        out = model.generate(**enc, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)

dialogue = (
    "John: Let's meet at 5 pm.\n"
    "Jane: Can we do 6 pm instead?\n"
    "John: Sure. See you then."
)
print(summarize(dialogue))

John and Jane will meet at 5 pm.


## Optional: tiny fine-tuning for classification (LoRA)
If you'd like to improve classification beyond zero-shot, we can fine-tune a small number of adapter parameters using [LoRA](https://arxiv.org/abs/2106.09685).
- We freeze the base model and train only small adapters → faster, less memory.
- We'll train on the first 1,000 SST-2 training sentences for 1 epoch and evaluate on the validation split (872 sentences) to keep it quick.

You can skip this section if you're just exploring.
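A minimal NumPy sketch of what a LoRA adapter computes, assuming the formulation from the LoRA paper as used by PEFT: the frozen weight `W` is left untouched and a trainable low-rank product `B @ A`, scaled by `lora_alpha / r`, is added on top. `B` starts at zero (PEFT's default initialization), so training begins from the unchanged base model. The layer sizes are an assumption about FLAN-T5-small's configuration (d_model 512, 6 heads of size 64, 8 encoder and 8 decoder layers); with them, the arithmetic reproduces the 344,064 trainable parameters that `print_trainable_parameters()` reports in the fine-tuning cell.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed FLAN-T5-small shapes: q/v project d_model=512 -> 6 heads * 64 = 384.
d_in, d_out = 512, 384
r, alpha = 8, 16  # same rank and lora_alpha as the LoraConfig used for fine-tuning

W = rng.normal(size=(d_out, d_in))           # frozen pretrained weight (never updated)
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, starts at zero


def lora_forward(x):
    """Frozen projection plus the scaled low-rank update: W x + (alpha / r) * B (A x)."""
    return W @ x + (alpha / r) * (B @ (A @ x))


x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W @ x))  # True: with B = 0 the adapter starts as a no-op

# Trainable parameters per adapted matrix: r * (d_in + d_out), versus d_in * d_out frozen.
per_module = r * (d_in + d_out)
# q and v in 8 encoder self-attention, 8 decoder self-attention and 8 cross-attention blocks.
n_modules = 2 * (8 + 8 + 8)
print(per_module, n_modules, per_module * n_modules)  # 7168 48 344064
print(f"{per_module / (d_in * d_out):.1%} of one frozen matrix")
```

The zero-initialized `B` is the design choice that makes LoRA safe to bolt on: before any training step the adapted model behaves exactly like the base model.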

In [13]:
from datasets import load_dataset
from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainer, Seq2SeqTrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from sklearn.metrics import accuracy_score, f1_score
import numpy as np

def set_seed(seed=42):
    import random
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)

# 1) Load SST-2 and prep a tiny subset
ds = load_dataset('glue', 'sst2')
label_map = {0: 'negative', 1: 'positive'}

def tokenize_batch(batch, max_src=256, max_tgt=4):
    inputs = [f"sst2: {s}" for s in batch['sentence']]
    targets = [label_map[int(l)] for l in batch['label']]
    # No padding here: DataCollatorForSeq2Seq pads each batch dynamically and pads
    # labels with -100, so padding positions are excluded from the loss.
    enc = tok(inputs, truncation=True, max_length=max_src)
    # text_target replaces the deprecated as_target_tokenizer() context manager
    lab = tok(text_target=targets, truncation=True, max_length=max_tgt)
    enc['labels'] = lab['input_ids']
    return enc

cols = ds['train'].column_names
train_small = ds['train'].select(range(min(1000, len(ds['train']))))
val_split = 'validation' if 'validation' in ds else 'test'
eval_small = ds[val_split].select(range(min(1000, len(ds[val_split]))))
train_tok = train_small.map(tokenize_batch, batched=True, remove_columns=cols)
eval_tok = eval_small.map(tokenize_batch, batched=True, remove_columns=cols)

# 2) Wrap model with LoRA (adapters only)
base_model = model  # reuse the loaded model; get_peft_model injects the adapters into it in place
lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                 # adapter rank (capacity)
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=['q', 'v']  # T5 attention projections; simple & effective
)
ft_model = get_peft_model(base_model, lora_cfg)
ft_model.print_trainable_parameters()

# 3) Trainer setup
data_collator = DataCollatorForSeq2Seq(tok, model=ft_model)

def normalize_labels(strs):
    out = []
    for s in strs:
        s = s.strip().lower()
        if 'positive' in s and 'negative' in s:
            out.append('positive' if s.find('positive') <= s.find('negative') else 'negative')
        elif 'positive' in s:
            out.append('positive')
        elif 'negative' in s:
            out.append('negative')
        else:
            out.append('positive' if s.startswith('p') else 'negative')
    return out

def compute_metrics(eval_pred):
    pred_ids, label_ids = eval_pred
    # Generated ids and labels can both contain -100 padding; swap it for the pad id before decoding
    pred_ids = np.where(pred_ids != -100, pred_ids, tok.pad_token_id)
    label_ids = np.where(label_ids != -100, label_ids, tok.pad_token_id)
    preds = normalize_labels(tok.batch_decode(pred_ids, skip_special_tokens=True))
    labels = normalize_labels(tok.batch_decode(label_ids, skip_special_tokens=True))
    return {
        'accuracy': accuracy_score(labels, preds),
        'f1': f1_score(labels, preds, pos_label='positive'),
    }

args = Seq2SeqTrainingArguments(
    output_dir='out_lora_sst2',
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=5e-4,
    num_train_epochs=1,
    eval_strategy='epoch',       # evaluate once per epoch
    save_strategy='epoch',       # must match eval_strategy for load_best_model_at_end
    predict_with_generate=True,  # score generated text rather than teacher-forced logits
    generation_max_length=4,
    logging_steps=50,
    save_total_limit=1,
    fp16=False,                  # T5 models tend to overflow in fp16 (NaN loss), so train in fp32
    report_to=['none'],
    load_best_model_at_end=True,
    metric_for_best_model='f1',
    greater_is_better=True,
    seed=42,
)

trainer = Seq2SeqTrainer(
    model=ft_model,
    args=args,
    train_dataset=train_tok,
    eval_dataset=eval_tok,
    processing_class=tok,        # replaces the deprecated tokenizer= argument (transformers>=4.46)
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

train_result = trainer.train()
metrics = trainer.evaluate()
metrics

trainable params: 344,064 || all params: 77,305,216 || trainable%: 0.4451


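Mixed-precision `fp16` training is a widely reported source of NaN losses with T5-family models: some activations exceed the float16 range, become `inf`, and then turn into NaN in later arithmetic. The NumPy sketch below uses made-up activation values to show that mechanism; it does not reproduce T5 itself. On GPUs with native bfloat16 support, `bf16=True` is a common alternative; otherwise full float32 is the safe choice.

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)
print(FP16_MAX)  # 65504.0, the largest finite float16 value

# Made-up activation values; the last one exceeds the float16 range.
acts32 = np.array([1_000.0, 30_000.0, 70_000.0], dtype=np.float32)

with np.errstate(over="ignore", invalid="ignore"):
    acts16 = acts32.astype(np.float16)   # 70000 cannot be represented: becomes inf
    shifted16 = acts16 - acts16.max()    # softmax-style max shift: inf - inf = nan

shifted32 = acts32 - acts32.max()        # the same arithmetic stays finite in float32
print(acts16, shifted16, shifted32, sep="\n")
```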
## Try the fine-tuned classifier
Let's compare predictions from the fine-tuned model on a few texts.

In [18]:
def classify_with(model_obj, texts, max_new_tokens=8):
    """Classify with any seq2seq model, using the same 'sst2:' prefix as the LoRA training data."""
    if isinstance(texts, str):
        texts = [texts]
    prompts = [f'sst2: {t}' for t in texts]
    enc = tok(prompts, return_tensors='pt', padding=True).to(device)
    with torch.no_grad():
        out = model_obj.generate(**enc, max_new_tokens=max_new_tokens)
    decoded = tok.batch_decode(out, skip_special_tokens=True)
    # Map the free-form answer onto a label; flag anything else instead of guessing
    normalized_preds = []
    for d in decoded:
        d_lower = d.strip().lower()
        if 'positive' in d_lower:
            normalized_preds.append('positive')
        elif 'negative' in d_lower:
            normalized_preds.append('negative')
        else:
            normalized_preds.append('unknown')
    return normalized_preds

test_texts = [
    "A joyful, heartwarming story.",
    "What a waste of time, utterly boring."
]
# get_peft_model modified `model` in place, so switch the adapters off for a true zero-shot baseline
with ft_model.disable_adapter():
    print('Zero-shot:', classify(test_texts))
print('LoRA fine-tuned:', classify_with(ft_model, test_texts))


## Save and load the adapter (optional)
You can save just the LoRA adapter weights (344,064 parameters here, versus about 77M in the full model) and later attach them to a freshly loaded base model, so you never need to store a fine-tuned copy of the whole model.

In [None]:
adapter_path = 'lora_sst2_adapter'
ft_model.save_pretrained(adapter_path)
print('Saved adapter to', adapter_path)

# To load later:
from peft import PeftModel
base = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME).to(device)
loaded = PeftModel.from_pretrained(base, adapter_path).to(device)
print('Loaded adapter back into base model.')
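A quick back-of-the-envelope check of how small the adapter is, using the parameter counts printed by `print_trainable_parameters()` in the fine-tuning cell. This assumes 4 bytes per float32 parameter; real checkpoint files add a little overhead (config JSON, file headers) and may be stored in another dtype, so treat the numbers as estimates.

```python
# Parameter counts copied from the print_trainable_parameters() output.
trainable = 344_064
total = 77_305_216            # base model + LoRA adapters
base = total - trainable      # frozen FLAN-T5-small weights

BYTES_PER_PARAM = 4           # assumes float32 storage; fp16/bf16 would halve this
adapter_mb = trainable * BYTES_PER_PARAM / 1e6
base_mb = base * BYTES_PER_PARAM / 1e6
print(f"adapter ≈ {adapter_mb:.2f} MB vs base ≈ {base_mb:.1f} MB "
      f"({trainable / base:.2%} of the base size)")
# adapter ≈ 1.38 MB vs base ≈ 307.8 MB (0.45% of the base size)
```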

## Where to go next
- Try different capacities for LoRA (r=4, 8, 16) to see trade-offs.
- Explore Prefix-Tuning and Prompt-Tuning (see project README for a full script).
- Scale up: use `google/flan-t5-base` if you have more GPU memory.
- Evaluate summarization with ROUGE using `evaluate` if you create a validation set.
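ROUGE scores word overlap with reference summaries, so it needs references to compare against. The sketch below hand-rolls a simplified ROUGE-1 (unigram F1) to show what the number means; `unigrams` and `rouge1_f1` are hypothetical helpers, and in practice `evaluate.load("rouge")`, backed by the `rouge-score` package installed above, adds stemming options and ROUGE-2/ROUGE-L. With the earlier dialogue and a hypothetical reference that states the agreed 6 pm time, the model's summary still scores 0.875 even though it gets the time wrong, a reminder that overlap metrics can miss factual errors.

```python
import re
from collections import Counter


def unigrams(text):
    """Lowercase and keep alphanumeric runs, roughly like rouge-score's default tokenizer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1: F1 of clipped unigram overlap (no stemming, single reference)."""
    pred, ref = unigrams(prediction), unigrams(reference)
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


model_summary = "John and Jane will meet at 5 pm."  # the zero-shot summary shown earlier
reference = "John and Jane will meet at 6 pm."      # hypothetical human-written reference
print(round(rouge1_f1(model_summary, reference), 3))  # 0.875
```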