# Assignment: Performance Evaluation Before and After Fine-Tuning with LoRA


## Introduction
In this notebook, we will evaluate the performance of a pre-trained language model (GPT-2) on the task of generating headlines from full news articles. We will first evaluate the model's performance before fine-tuning, then fine-tune the model using LoRA, and finally, evaluate the performance after fine-tuning to measure any improvements.


## 1. Performance Before Fine-Tuning


We will first evaluate the pre-trained GPT-2 model on a small validation set of article/title pairs, using BLEU and ROUGE scores to measure how closely the generated headlines match the original titles.


In [None]:

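# Import necessary libraries
# (datasets.load_metric is deprecated and pipeline/load_dataset were unused, so they are dropped)
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the pre-trained GPT-2 model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Define the dataset
validation_data = [
    {
        "full_article": "Investment guru Warren Buffett confirmed Wednesday the trimming of his Apple, Inc. (NASDAQ:AAPL) stake...",
        "original_title": "Warren Buffett's Berkshire Confirms Apple Sale, Dumps This PC Maker, Finally Reveals Mystery Stock...",
    },
    {
        "full_article": "Samsung gleefully joined this month’s Apple pile-on, unveiling a new ad for a Galaxy tablet that mocks its rival...",
        "original_title": "Apple’s disastrous iPad ad mocked by rival Samsung in new 43-second spot: ‘We would never crush creativity’",
    }
]

# Assumed prompt format: the article followed by a "Title:" cue. Pre-trained GPT-2 has no
# notion of the headline task, so without a cue it simply continues the article.
PROMPT_SUFFIX = "\nTitle:"
MAX_NEW_TOKENS = 20

# Evaluation function: generate one headline per article with the given model
def evaluate_model(model, tokenizer, data, max_new_tokens=MAX_NEW_TOKENS):
    model.eval()  # disable dropout (e.g. LoRA dropout left on after training)
    suffix_ids = tokenizer(PROMPT_SUFFIX, return_tensors="pt").input_ids
    # Truncate the article so article + suffix + new tokens fit in GPT-2's 1024-token context.
    max_article_tokens = tokenizer.model_max_length - suffix_ids.shape[1] - max_new_tokens

    generated_titles = []
    original_titles = [item["original_title"] for item in data]
    for item in data:
        article_ids = tokenizer(
            item["full_article"], return_tensors="pt", truncation=True, max_length=max_article_tokens
        ).input_ids
        input_ids = torch.cat([article_ids, suffix_ids], dim=1).to(model.device)
        with torch.no_grad():
            outputs = model.generate(
                input_ids=input_ids,
                attention_mask=torch.ones_like(input_ids),
                max_new_tokens=max_new_tokens,  # max_length would also count the prompt tokens
                num_return_sequences=1,
                pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
            )
        # Decode only the newly generated tokens, then keep the first line as the headline.
        new_tokens = outputs[0, input_ids.shape[1]:]
        generated_title = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
        generated_titles.append(generated_title.split("\n")[0])

    return generated_titles, original_titles

# Generate titles with the pre-trained model (the baseline for comparison)
generated_titles, original_titles = evaluate_model(model, tokenizer, validation_data)
print("Generated Titles:", generated_titles)
print("Original Titles:", original_titles)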
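The cell above only prints the titles; the BLEU and ROUGE scores still have to be computed. Maintained implementations exist (for example the Hugging Face `evaluate` library's `sacrebleu` and `rouge` metrics), but the dependency-free sketch below shows what these scores measure. It is a simplified illustration, not a reference implementation: tokenization is lowercased whitespace splitting, BLEU is sentence-level with add-one smoothing for n > 1 (an assumption so short headlines do not collapse to 0), and the function names are hypothetical. Its numbers will not match `sacrebleu` or `rouge_score` exactly.

```python
import math
from collections import Counter


def _tokens(text):
    # Lowercased whitespace tokenization (a simplifying assumption).
    return text.lower().split()


def _ngrams(tokens, n):
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Geometric mean of clipped n-gram precisions, times a brevity penalty."""
    cand, ref = _tokens(candidate), _tokens(reference)
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = _ngrams(cand, n), _ngrams(ref, n)
        # Clip each n-gram's count by how often it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = max(len(cand) - n + 1, 0)
        if n > 1:  # add-one smoothing for higher-order n-grams
            matches, total = matches + 1, total + 1
        if matches == 0:
            return 0.0
        log_precision += math.log(matches / total) / max_n
    # Brevity penalty: penalise candidates shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_precision)


def _f1(overlap, cand_len, ref_len):
    if overlap == 0:
        return 0.0
    precision, recall = overlap / cand_len, overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1_f1(candidate, reference):
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand, ref = _tokens(candidate), _tokens(reference)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1: based on the longest common subsequence (LCS) of tokens."""
    cand, ref = _tokens(candidate), _tokens(reference)
    # lcs[i][j] = LCS length of cand[:i] and ref[:j].
    lcs = [[0] * (len(ref) + 1) for _ in range(len(cand) + 1)]
    for i, c in enumerate(cand, 1):
        for j, r in enumerate(ref, 1):
            lcs[i][j] = lcs[i - 1][j - 1] + 1 if c == r else max(lcs[i - 1][j], lcs[i][j - 1])
    return _f1(lcs[-1][-1], len(cand), len(ref))


def score_titles(generated, references):
    """Average each metric over (generated, reference) pairs."""
    pairs = list(zip(generated, references))
    if not pairs:
        raise ValueError("need at least one (generated, reference) pair")
    metrics = [("bleu", sentence_bleu), ("rouge1", rouge_1_f1), ("rougeL", rouge_l_f1)]
    return {name: sum(fn(g, r) for g, r in pairs) / len(pairs) for name, fn in metrics}
```

In the notebook this would be called as `scores_before = score_titles(generated_titles, original_titles)`, and later again on the fine-tuned model's titles. ROUGE-L is often the more informative of the two for headlines, since it rewards matching word order without requiring exact 4-gram matches.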


## 2. Fine-Tuning the Model with LoRA


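Next, we will fine-tune GPT-2 for headline generation on a training set of article/title pairs using LoRA (Low-Rank Adaptation). LoRA freezes the pre-trained weights and trains small low-rank update matrices injected into selected layers, so only a small fraction of the parameters is updated.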
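Concretely, LoRA keeps an adapted weight matrix $W_0$ frozen and learns a low-rank update, so the layer computes $h = W_0 x + \frac{\alpha}{r} B A x$ with $A \in \mathbb{R}^{r \times d_{in}}$, $B \in \mathbb{R}^{d_{out} \times r}$ and $r \ll d_{in}, d_{out}$. The pure-Python sketch below (helper names are hypothetical; no PyTorch needed) shows that forward pass and counts the trainable parameters when rank 8 is applied to GPT-2 small's fused query/key/value projection `c_attn` (768 inputs, 2304 outputs) in all 12 blocks. The rank and target layer are illustrative choices, not prescribed values.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W0, A, B, x, alpha, rank):
    """h = W0 x + (alpha / rank) * B (A x); W0 is frozen, only A and B are trained."""
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))  # A: rank x d_in, B: d_out x rank
    scale = alpha / rank
    return [h + scale * u for h, u in zip(base, update)]


def lora_trainable_params(d_in, d_out, rank, n_layers):
    """Parameters in A (rank x d_in) and B (d_out x rank), summed over adapted layers."""
    return n_layers * rank * (d_in + d_out)


# GPT-2 small: 12 blocks, each with a c_attn projection from 768 to 2304 features.
full = 12 * 768 * 2304  # frozen weights in those projections
lora = lora_trainable_params(768, 2304, rank=8, n_layers=12)
print(full, lora, f"{lora / full:.2%}")

# LoRA initialises B to zero, so the adapted layer starts out identical to W0.
W0 = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]  # rank 1
B = [[0.0], [0.0]]
print(lora_forward(W0, A, B, [1.0, 1.0], alpha=2, rank=1))
```

Under these assumptions the adapters add 294,912 trainable parameters against 21,233,664 frozen ones in the adapted projections (1/72, about 1.39%), and the zero-initialised `B` means training starts exactly from the pre-trained model's behaviour.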


In [None]:

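# Fine-tune GPT-2 with LoRA using the PEFT library (pip install peft).
# Only the low-rank adapter matrices are trained; the original GPT-2 weights stay frozen.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

def fine_tune_model_with_lora(model, tokenizer, train_data, output_dir="./results"):
    """Wrap `model` with LoRA adapters, train them on `train_data`, and save the adapter.

    `train_data` must already be tokenized: a dataset (or list) of examples with "input_ids".
    """
    # GPT-2 has no padding token; reuse EOS so batches can be padded.
    tokenizer.pad_token = tokenizer.eos_token

    # LoRA hyperparameters: common starting values, not tuned for this task.
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,                        # rank of the update matrices A and B
        lora_alpha=32,              # the update is scaled by lora_alpha / r
        lora_dropout=0.05,
        target_modules=["c_attn"],  # GPT-2's fused query/key/value projection
        fan_in_fan_out=True,        # GPT-2's Conv1D layers store weights transposed
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()

    # Prepare training arguments
    training_args = TrainingArguments(
        output_dir=output_dir,
        overwrite_output_dir=True,
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,  # assumed value; LoRA is usually run with a higher rate than full fine-tuning
        save_steps=10_000,
        save_total_limit=2,
    )

    # Pads each batch and copies input_ids into labels (mlm=False means causal LM).
    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_data,
        data_collator=data_collator,
    )

    # Fine-tune the adapters
    trainer.train()

    # Saves only the adapter weights, not a full copy of GPT-2.
    model.save_pretrained(output_dir)
    return model

# Note: call fine_tune_model_with_lora(model, tokenizer, train_data) once a tokenized
# training set is available; this notebook does not define one.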
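The fine-tuning function expects `train_data` to be tokenized already. One way to build it from records shaped like `validation_data` is sketched below. The format (article, then a `"\nTitle:"` separator, then the headline and an EOS token) and the helper names are assumptions; whatever separator is used here should also be appended to the article when prompting the model at evaluation time. The article is truncated first so the headline, the part the model must learn, is never cut off.

```python
PROMPT_SUFFIX = "\nTitle:"  # hypothetical separator between article and headline


def encode_example(item, tokenizer, max_length=512, suffix=PROMPT_SUFFIX):
    """Turn one {"full_article", "original_title"} record into a causal-LM example.

    Sequence layout: article tokens + suffix + " " + headline + EOS.
    Works with any tokenizer that is callable on a string and returns {"input_ids": [...]}
    and exposes eos_token_id (GPT2Tokenizer does both).
    """
    # Encode the target part first so its length is known before truncating the article.
    target_ids = tokenizer(f"{suffix} {item['original_title']}")["input_ids"] + [tokenizer.eos_token_id]
    article_budget = max(max_length - len(target_ids), 0)
    article_ids = tokenizer(item["full_article"])["input_ids"][:article_budget]
    input_ids = article_ids + target_ids
    return {"input_ids": input_ids, "attention_mask": [1] * len(input_ids)}


def build_train_data(records, tokenizer, max_length=512):
    """Encode every record; the result is a plain list of dicts."""
    return [encode_example(record, tokenizer, max_length) for record in records]
```

A plain list of dicts should work as a map-style dataset for `Trainer`; `datasets.Dataset.from_list(...)` is an alternative if dataset features are needed. Two design caveats: the loss is also computed on the article tokens (masking them out of the labels is a common refinement not shown here), and when `pad_token` is set to `eos_token`, `DataCollatorForLanguageModeling` masks the appended EOS as padding, so the model may not learn where a headline ends.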


## 3. Performance After Fine-Tuning


Finally, we will evaluate the fine-tuned model on the same validation examples and again use BLEU and ROUGE scores to measure any improvement over the pre-trained baseline from Section 1.


In [None]:

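# Load the base model and attach the LoRA adapter (assumes the adapter weights were saved
# to "./results" with model.save_pretrained after fine-tuning).
# from peft import PeftModel
# base_model = GPT2LMHeadModel.from_pretrained(model_name)
# model_ft = PeftModel.from_pretrained(base_model, "./results")

# Generate titles using the fine-tuned model
# generated_titles_after_ft, original_titles = evaluate_model(model_ft, tokenizer, validation_data)

# print("Generated Titles After Fine-Tuning:", generated_titles_after_ft)
# print("Original Titles:", original_titles)

# Compute BLEU and ROUGE scores for generated_titles (baseline) and generated_titles_after_ft
# against original_titles to compare performance before and after fine-tuning.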
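Once the before and after scores are available as dictionaries mapping a metric name to a value, a small helper can report the change per metric. Both functions below are hypothetical, not part of any library. With only two validation examples, a difference in these averages is weak evidence; a larger held-out set would be needed for a meaningful comparison.

```python
def compare_scores(before, after):
    """Pair up metrics present in both dicts as (before, after, change)."""
    return {
        metric: (before[metric], after[metric], after[metric] - before[metric])
        for metric in before
        if metric in after
    }


def print_comparison(rows):
    """Print a small fixed-width table of before/after scores."""
    print(f"{'metric':<8} {'before':>8} {'after':>8} {'change':>8}")
    for metric, (b, a, delta) in rows.items():
        print(f"{metric:<8} {b:>8.3f} {a:>8.3f} {delta:>+8.3f}")
```

Usage would be `print_comparison(compare_scores(scores_before, scores_after))`, where `scores_before` and `scores_after` are hypothetical dictionaries of averaged BLEU/ROUGE scores for the pre-trained and fine-tuned models. Metrics missing from either dictionary are skipped rather than reported as zero.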


## Conclusion


In this notebook, we have walked through evaluating a pre-trained GPT-2 model on generating news headlines, fine-tuning it with LoRA, and re-evaluating the fine-tuned model in the same way. Fine-tuning with LoRA is expected to bring the generated headlines closer to the reference titles, because the adapters are trained directly on article/headline pairs; comparing the BLEU and ROUGE scores before and after fine-tuning is what measures whether it does.
