## Install the libraries needed to complete the tasks in this assignment.

In [None]:
!pip install transformers datasets evaluate rouge_score

## Q1) Load the Dataset (5)

We begin by loading the CNN/DailyMail news summarization dataset, which will be used throughout this assignment to evaluate the pre-trained models.

This dataset contains news articles paired with human-written summaries, providing a rich source of real-world examples for model development and testing.

You can find details about the dataset and instructions for loading it here: https://huggingface.co/datasets/abisee/cnn_dailymail
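
For orientation, each record in this dataset exposes three string fields: `article`, `highlights`, and `id`. The sketch below mimics that layout with a hand-made stand-in so the access pattern can be tried without downloading anything. The `toy_split` rows and the `first_n` helper are invented for illustration and are not part of the dataset or the `datasets` library.

```python
# Hand-made stand-in for one split. The real rows come from load_dataset;
# the texts below are invented purely for illustration.
toy_split = [
    {
        "id": "a1",
        "article": "The city council approved a new budget on Monday after a long debate.",
        "highlights": "Council approves budget.",
    },
    {
        "id": "a2",
        "article": "Heavy rain flooded several streets and delayed morning trains.",
        "highlights": "Rain floods streets and delays trains.",
    },
]


def first_n(split, n, field):
    """Return the given field from the first n rows of a split (hypothetical helper)."""
    return [split[i][field] for i in range(min(n, len(split)))]


articles_preview = first_n(toy_split, 2, "article")
references_preview = first_n(toy_split, 2, "highlights")
print(len(articles_preview), references_preview[0])
```

With the real dataset, the same row-then-field indexing applies to a split such as `dataset["test"]`.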

In [None]:
from datasets import load_dataset

# Load the CNN/DailyMail dataset from the Hugging Face Hub:

dataset = load_dataset("abisee/cnn_dailymail", "3.0.0") # We will use the version 3.0.0 in this assignment

## Q2) Create a summarization Pipeline (10)

In this step, we create a text summarization pipeline using a pre-trained model from the Hugging Face Transformers library.
You will be working with two models: `"google-t5/t5-small"` and `"sshleifer/distilbart-cnn-12-6"`.

*Note:* Ensure the pipeline is configured to generate summaries with a minimum length of 30 tokens and a maximum of 128 tokens.

Find more details about the models and pipelines below:

t5-small: https://huggingface.co/google-t5/t5-small

distilbart-cnn-12-6: https://huggingface.co/sshleifer/distilbart-cnn-12-6

Pipeline: https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.SummarizationPipeline


In [None]:
from transformers import pipeline


model_name = "sshleifer/distilbart-cnn-12-6"  # Options: "google-t5/t5-small", "sshleifer/distilbart-cnn-12-6"
# TODO: Create a summarization pipeline with a minimum length of 30 and a maximum length of 128.
summarizer = pipeline("summarization", model=model_name, tokenizer=model_name, min_length=30, max_length=128)

## Q3) Summary Generation (10)

In this section, you will generate summaries for the first 20 articles from the test split of the CNN/DailyMail dataset using the summarization pipeline you created earlier.

For each article, you'll fetch the text, generate a summary with truncation enabled, and then store both the original article and its summary in separate lists.

Refer to the example provided here to work with Hugging Face datasets: https://huggingface.co/docs/datasets/en/access
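
As an alternative to calling the pipeline once per article, the fetch-summarize-store steps above can be wrapped in a small helper. This is a minimal sketch, not the required solution: `summarize_all` and `fake_summarizer` are hypothetical names, and the helper assumes the summarizer accepts a list of strings and returns one `{"summary_text": ...}` dictionary per input. The stand-in below only imitates that call shape so the code runs offline.

```python
def summarize_all(summarizer, texts, min_length=30, max_length=128):
    """Summarize a list of texts and return only the summary strings.

    Assumes `summarizer` takes a list of strings and returns one dict with a
    "summary_text" key per input (an assumption about the call shape).
    """
    outputs = summarizer(
        texts,
        truncation=True,        # clip inputs that exceed the model's input limit
        min_length=min_length,  # lower bound on summary length, in tokens
        max_length=max_length,  # upper bound on summary length, in tokens
    )
    return [out["summary_text"] for out in outputs]


def fake_summarizer(texts, **generate_kwargs):
    """Offline stand-in: 'summarizes' each text by keeping its first sentence."""
    return [{"summary_text": text.split(".")[0] + "."} for text in texts]


demo = summarize_all(fake_summarizer, ["First sentence. Second sentence.", "Only one."])
print(demo)
```

In the notebook, you would pass the real `summarizer` and the list of 20 articles in place of the stand-in and the demo strings.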

In [None]:
summaries = []
articles = []

# We will generate summaries for the first 20 articles in the dataset's 'test' split.
for i in range(20):
    # Get the article from the dataset's test split
    article =  dataset["test"][i]["article"]

    # Generate a summary for the fetched article using the summarization pipeline; set 'truncation' to 'True' while generating summaries
    output =  summarizer(article, truncation=True)

    # TODO: Append the generated summary from the output to the summaries list
    summaries.append(output[0]["summary_text"])

    # TODO: Append the original article to the articles list
    articles.append(article)

## Q4) Evaluating the Summaries (5)

In this section, you will evaluate the quality of the generated summaries by comparing them with the reference summaries using the ROUGE metric.

Specifically, you will calculate the ROUGE-1 F1 score for each summary and compute the average across all 20 examples to assess overall summarization performance.

You can read more about the metric and its usage here: https://huggingface.co/spaces/evaluate-metric/rouge
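
To make the metric concrete, here is a simplified sketch of ROUGE-1 F1 as unigram overlap between a prediction and a reference. It is an illustration only: it tokenizes on whitespace and skips stemming, so its scores will not exactly match those of the `evaluate` ROUGE metric. The function name `rouge1_f1` and the example sentences are made up for this sketch.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1: unigram overlap with whitespace tokens, no stemming."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_counts.values())  # share of predicted words that match
    recall = overlap / sum(ref_counts.values())      # share of reference words recovered
    return 2 * precision * recall / (precision + recall)


# Score two toy pairs, then average them the same way the assignment averages 20 scores.
scores = [
    rouge1_f1("the cat sat on the mat", "the cat lay on the mat"),
    rouge1_f1("a dog barked", "the cat slept"),
]
average = sum(scores) / len(scores)
print(f"{scores[0]:.4f} {scores[1]:.4f} {average:.4f}")
```

The first pair shares five of six words in each direction, so precision and recall are both 5/6. The second pair shares no words and scores 0.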

In [None]:
from evaluate import load

## TODO: Load the ROUGE metric
rouge = load("rouge")

## We will load the reference summaries
reference_summaries = [dataset["test"][i]['highlights'] for i in range(20)]

total_rouge1_f1 = 0
# Accumulate the ROUGE-1 F1 score of each summary
for i, (pred, ref) in enumerate(zip(summaries, reference_summaries)):
    # TODO: Compute the ROUGE-1 scores for each summary
    result = rouge.compute(predictions=[pred], references=[ref], use_stemmer=True)
    rouge_1_f1 = result["rouge1"]

    total_rouge1_f1 += rouge_1_f1  # ROUGE-1 F1 score for this summary


rouge1_f1_score = total_rouge1_f1 / len(summaries)
print(f"Average ROUGE-1 F1 Score is : {rouge1_f1_score:.2f}")

## Output Storage (Optional)

You can store your summaries as shown below, and then repeat the process for the other model.

Feel free to use loops or print statements to analyze the five summaries for the written part of the assignment. You can also store them in a CSV or JSON file and analyze them separately.
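
For the JSON option, a minimal sketch might look like the following. The `save_summaries` helper, the record keys, and the file name are assumptions chosen for illustration. Tagging each record with the model name is one way to keep runs from the two models apart.

```python
import json
import os
import tempfile


def save_summaries(path, articles, summaries, model_name):
    """Write one JSON record per article and return the number of records written."""
    records = [
        {"model": model_name, "article": article, "summary": summary}
        for article, summary in zip(articles, summaries)
    ]
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, ensure_ascii=False, indent=2)
    return len(records)


# Toy data so the sketch runs on its own; in the notebook you would pass
# the `articles` and `summaries` lists built in Q3.
demo_path = os.path.join(tempfile.mkdtemp(), "summaries_demo.json")
written = save_summaries(
    demo_path,
    ["Article text one.", "Article text two."],
    ["Summary one.", "Summary two."],
    "sshleifer/distilbart-cnn-12-6",
)

# Read the file back to confirm the round trip.
with open(demo_path, encoding="utf-8") as handle:
    loaded = json.load(handle)
print(written, loaded[0]["summary"])
```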

In [None]:
# # Confirm which model produced `summaries` before storing them
# t5_summaries = summaries

In [None]:
# Confirm which model produced `summaries` before storing them
bart_summaries = summaries

In [None]:
# Example for analysis
for i in range(5):
  print("Article", articles[i])
  print("----------------------------------------------------------------------------------------- \n")
  # print("Summary generated by t5: ", t5_summaries[i])
  # print("----------------------------------------------------------------------------------------- \n")
  print("Summary generated by DistilBART: ", bart_summaries[i])
  print("------XX------XX------XX------XX------XX------XX------XX------XX------XX------XX-------XX \n")

In [None]:
# Example for storing the summaries in a CSV file for later analysis
import pandas as pd


df = pd.DataFrame({
    'Article': articles,
    # 'T-5 Summary':  t5_summaries,
    'Distill-bart Summary': bart_summaries
})

# df.to_csv('cs421_assgn4_summ_results_t5.csv', index=False)
df.to_csv('cs421_assgn4_summ_results_distillbart.csv', index=False)