<a href="https://colab.research.google.com/github/springboardmentor0327/Text_Summarization_Infosys_Internship_Oct2024/blob/BandariRohith/abstractive_summarization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Setup

In [None]:
!pip install transformers datasets rouge-score nltk
!pip install evaluate


Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.3


Importing Necessary Libraries

In [None]:
# Importing necessary libraries
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration, BartTokenizer, BartForConditionalGeneration
from datasets import load_dataset
from evaluate import load  # This is the updated import for metrics like ROUGE
import nltk
nltk.download('punkt')

# Load the ROUGE metric using evaluate
rouge = load('rouge')

# Function to evaluate predictions using ROUGE
def evaluate_summary(predictions, references):
    rouge_output = rouge.compute(predictions=predictions, references=references)
    return rouge_output


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!



Loading the Datasets

In [None]:
# Load CNN DailyMail Dataset
dataset = load_dataset("cnn_dailymail", '3.0.0', split='test')
print(dataset[0])  # Print the first item in the test dataset


{'article': '(CNN)The Palestinian Authority officially became the 123rd member of the International Criminal Court on Wednesday, a step that gives the court jurisdiction over alleged crimes in Palestinian territories. The formal accession was marked with a ceremony at The Hague, in the Netherlands, where the court is based. The Palestinians signed the ICC\'s founding Rome Statute in January, when they also accepted its jurisdiction over alleged crimes committed "in the occupied Palestinian territory, including East Jerusalem, since June 13, 2014." Later that month, the ICC opened a preliminary examination into the situation in Palestinian territories, paving the way for possible war crimes investigations against Israelis. As members of the court, Palestinians may be subject to counter-charges as well. Israel and the United States, neither of which is an ICC member, opposed the Palestinians\' efforts to join the body. But Palestinian Foreign Minister Riad al-Malki, speaking at Wednesday

Loading Pre-trained BART and T5 Models

For BART:

In [None]:
bart_model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
bart_tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')




For T5:

In [None]:
t5_model = T5ForConditionalGeneration.from_pretrained('t5-base')
t5_tokenizer = T5Tokenizer.from_pretrained('t5-base')


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Summarization Functions

BART Summarization Function:

In [None]:
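def summarize_bart(text):
    # Tokenize the article; anything beyond 1024 tokens is truncated
    inputs = bart_tokenizer([text], max_length=1024, return_tensors='pt', truncation=True)
    # Generate with 4-beam search, capping the summary at 150 tokens
    summary_ids = bart_model.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'],
                                      num_beams=4, max_length=150, early_stopping=True)
    # Decode the best beam back to plain text
    summary = bart_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary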


T5 Summarization Function:

In [None]:
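def summarize_t5(text):
    # T5 expects a task prefix; anything beyond 512 tokens is truncated
    inputs = t5_tokenizer("summarize: " + text, return_tensors='pt', max_length=512, truncation=True)
    # Generate with 4-beam search, capping the summary at 150 tokens
    summary_ids = t5_model.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'],
                                    num_beams=4, max_length=150, early_stopping=True)
    # Decode the best beam back to plain text
    summary = t5_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary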
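
Both summarization functions silently truncate long articles (1024 tokens for BART, 512 for T5), so the two models may not see the same amount of text. The sketch below shows one way to check how much of an input would be dropped. `truncation_report` is a hypothetical helper, not part of the notebook, and the demo uses whitespace splitting as a stand-in tokenizer, which is an assumption made only to keep the example self-contained.

```python
from typing import Callable, List


def truncation_report(text: str, encode: Callable[[str], List], limit: int) -> dict:
    """Report how many tokens of `text` fall beyond `limit` (hypothetical helper)."""
    n_tokens = len(encode(text))
    dropped = max(0, n_tokens - limit)
    return {
        "tokens": n_tokens,
        "limit": limit,
        "dropped": dropped,
        "truncated": dropped > 0,
    }


# Stand-in tokenizer for the demo: whitespace splitting (an assumption).
demo_text = "word " * 600
report = truncation_report(demo_text, str.split, limit=512)
print(report)
```

In the notebook itself, `encode` could plausibly be `t5_tokenizer.encode` (applied to `"summarize: " + text`) with `limit=512`, or `bart_tokenizer.encode` with `limit=1024`. Counts from the real tokenizers will differ from the whitespace counts shown here.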


Evaluating Using ROUGE

In [None]:
from evaluate import load  # Hugging Face `evaluate` library

# Load the ROUGE metric
rouge = load('rouge')

# Compute ROUGE between generated summaries and their reference summaries.
# Both arguments are lists of strings of equal length.
def evaluate_summary(predictions, references):
    rouge_output = rouge.compute(predictions=predictions, references=references)
    return rouge_output
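
To make the `rouge1` number less opaque, here is a minimal sketch of a ROUGE-1 style score computed by hand from unigram overlap. It is a simplification, not the library's implementation: tokenization is plain lowercase whitespace splitting (an assumption), there is no stemming, and F1 is used on the assumption that it corresponds to the value the metric reports. Scores from `rouge.compute` will therefore not match it exactly. `rouge1_f1` is a hypothetical name.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps, per token, the smaller of the two counts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)  # share of predicted tokens that match
    recall = overlap / len(ref_tokens)      # share of reference tokens recovered
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 4))
```

Five of the six tokens overlap in each direction, so precision and recall are both 5/6 and so is their F1.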


Testing on CNN/DailyMail Dataset

In [None]:
# Select the first test article as a one-item sample for the demo
sample = dataset.select([0])

# Get the original text and summary
original_text = sample['article'][0]
reference_summary = sample['highlights'][0]

# Generate BART and T5 summaries
bart_summary = summarize_bart(original_text)
t5_summary = summarize_t5(original_text)

# Print the generated summaries
print("BART Summary:\n", bart_summary)
print("T5 Summary:\n", t5_summary)

# ROUGE Evaluation
bart_rouge = evaluate_summary([bart_summary], [reference_summary])
t5_rouge = evaluate_summary([t5_summary], [reference_summary])

print("BART ROUGE Scores:", bart_rouge)
print("T5 ROUGE Scores:", t5_rouge)


BART Summary:
 The Palestinian Authority becomes the 123rd member of the International Criminal Court. The move gives the court jurisdiction over alleged crimes in Palestinian territories. Israel and the United States opposed the Palestinians' efforts to join the body. But Palestinian Foreign Minister Riad al-Malki said it was a move toward greater justice.
T5 Summary:
 the formal accession was marked by a ceremony at The Hague, in the Netherlands. the ICC opened a preliminary examination into the situation in the occupied territories. as members of the court, Palestinians may be subject to counter-charges.
BART ROUGE Scores: {'rouge1': 0.441860465116279, 'rouge2': 0.30952380952380953, 'rougeL': 0.39534883720930236, 'rougeLsum': 0.39534883720930236}
T5 ROUGE Scores: {'rouge1': 0.2191780821917808, 'rouge2': 0.028169014084507043, 'rougeL': 0.1643835616438356, 'rougeLsum': 0.2191780821917808}
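
The two score dictionaries are hard to compare at a glance. The sketch below lays them out side by side with the per-metric gap. The numbers are copied from the output above and `compare_scores` is a hypothetical helper. Since the scores come from a single article, the gap should be read as an illustration and not as a general ranking of the two models.

```python
# Scores copied from the notebook output above (one article only).
bart_rouge = {
    'rouge1': 0.441860465116279,
    'rouge2': 0.30952380952380953,
    'rougeL': 0.39534883720930236,
    'rougeLsum': 0.39534883720930236,
}
t5_rouge = {
    'rouge1': 0.2191780821917808,
    'rouge2': 0.028169014084507043,
    'rougeL': 0.1643835616438356,
    'rougeLsum': 0.2191780821917808,
}


def compare_scores(a: dict, b: dict) -> dict:
    """Return a[k] - b[k] for every metric present in both dictionaries."""
    return {k: a[k] - b[k] for k in a if k in b}


gaps = compare_scores(bart_rouge, t5_rouge)

# Print a small fixed-width table: metric, BART, T5, and the signed gap.
print(f"{'metric':<10}{'BART':>8}{'T5':>8}{'gap':>8}")
for metric, gap in gaps.items():
    print(f"{metric:<10}{bart_rouge[metric]:>8.3f}{t5_rouge[metric]:>8.3f}{gap:>+8.3f}")
```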
