---
# Text Summarization Evaluation using BLEU, METEOR, and ROUGE Scores

This notebook demonstrates the process of summarizing a given text using the Hugging Face Transformers library and evaluating the generated summary using three popular evaluation metrics: BLEU, METEOR, and ROUGE scores. 

The notebook is organized as follows:

1. **Installation and Importing Libraries**: We begin by installing the Hugging Face Transformers library and downloading the WordNet data that NLTK's METEOR implementation relies on.

2. **Loading the Summarization Pipeline**: Next, we load the summarization pipeline from the Hugging Face Transformers library.

3. **Generating a Summary**: We use the loaded pipeline to generate a summary for a given input text. The summary is generated by setting specific parameters such as `max_length`, `min_length`, `length_penalty`, and `num_beams`.

4. **Defining Evaluation Metrics**: We define a function that calculates BLEU, METEOR, and ROUGE-style overlap scores. No human-written reference summary is available here, so the original text serves as the reference: the scores indicate how much of the source wording the summary preserves, not how it compares with a gold-standard summary.

5. **Testing Evaluation Functions**: We test the evaluation functions by calculating the scores for a given example of original text and a manually created summary.

6. **Summarize and Evaluate**: We generate summaries for the original text with varying output lengths and evaluate them using the defined metrics. This step allows us to understand how the output length affects the evaluation scores.

By the end of this notebook, you will have a better understanding of how to generate summaries using the Hugging Face Transformers library and how to evaluate their quality using BLEU, METEOR, and ROUGE scores.

---


# install libs

In [None]:
!pip install transformers
import nltk
nltk.download('wordnet')

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.27.1-py3-none-any.whl (6.7 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.13.2-py3-none-any.whl (199 kB)
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.13.2 tokenizers-0.13.2 transformers-4.27.1


# import libs and load model into pipeline

In [None]:
from nltk.util import ngrams
from nltk.translate.bleu_score import corpus_bleu
from nltk.translate.meteor_score import single_meteor_score
from transformers import pipeline

# Load the summarization pipeline
summarization_pipeline = pipeline("summarization")

# Input text
text = "In an effort to help slow the spread of COVID-19, many countries have implemented social distancing measures, including the closure of non-essential businesses. Despite the challenges, some entrepreneurs have found ways to adapt and even thrive in the new environment. For example, a restaurant in Italy has started offering home delivery, while a clothing store in the United States has shifted to online sales."

# Generate a summary of the input text
summary = summarization_pipeline(text, max_length=70, min_length=30, length_penalty=2.0, num_beams=4)[0]['summary_text']

print("Original text:")
print(text)
print("\nSummary:")
print(summary)


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


Original text:
In an effort to help slow the spread of COVID-19, many countries have implemented social distancing measures, including the closure of non-essential businesses. Despite the challenges, some entrepreneurs have found ways to adapt and even thrive in the new environment. For example, a restaurant in Italy has started offering home delivery, while a clothing store in the United States has shifted to online sales.

Summary:
 In an effort to help slow the spread of COVID-19, many countries have implemented social distancing measures, including the closure of non-essential businesses . Despite the challenges, some entrepreneurs have found ways to adapt and thrive in the new environment .
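
The warning in the output above notes that relying on the default model is not recommended outside of experiments. One way to make the run reproducible is to name the model and revision explicitly, using the values the warning itself reports. The following is a minimal sketch; `load_pinned_summarizer` is a hypothetical helper name, and calling it downloads the model weights on first use.

```python
# Values copied from the pipeline warning printed above.
MODEL_NAME = "sshleifer/distilbart-cnn-12-6"
MODEL_REVISION = "a4f8f3e"


def load_pinned_summarizer(model_name=MODEL_NAME, revision=MODEL_REVISION):
    """Build a summarization pipeline with the model and revision fixed."""
    # Imported lazily so that defining this helper does not load the library.
    from transformers import pipeline

    return pipeline("summarization", model=model_name, revision=revision)
```

The returned object can be used in place of `summarization_pipeline` in the cells that follow.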


# define scoring functions

In [None]:
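def evaluation_scores(original_text, summary):
    """Score `summary` against `original_text`, which is used as the reference."""
    reference_summary = original_text.split()
    generated_summary = summary.split()

    # NLTK documents corpus_bleu(list_of_references, hypotheses), with one *list* of
    # tokenized references per hypothesis. Passing [reference_summary] makes each
    # word act as its own reference, which is why BLEU comes out near 1e-232.
    bleu = corpus_bleu([reference_summary], [generated_summary])
    # NLTK documents single_meteor_score(reference, hypothesis); here the summary
    # is passed in the reference position.
    meteor = single_meteor_score(generated_summary, reference_summary)

    # Pool the unique 1-, 2- and 3-grams of each text into one set.
    original_ngrams = {g for n in (1, 2, 3) for g in ngrams(reference_summary, n)}
    summary_ngrams = {g for n in (1, 2, 3) for g in ngrams(generated_summary, n)}

    overlap = original_ngrams & summary_ngrams
    # These are not standard ROUGE-1/2/L: "rouge_1" is recall and "rouge_2" is
    # precision over the pooled n-gram sets, and "rouge_l" is the larger of the two.
    rouge_1 = len(overlap) / len(original_ngrams)
    rouge_2 = len(overlap) / len(summary_ngrams)
    rouge_l = max(rouge_1, rouge_2)

    return bleu, meteor, rouge_1, rouge_2, rouge_l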

# Test the function

original_text = "In an effort to help slow the spread of COVID-19, many countries have implemented social distancing measures, including the closure of non-essential businesses. Despite the challenges, some entrepreneurs have found ways to adapt and even thrive in the new environment. For example, a restaurant in Italy has started offering home delivery, while a clothing store in the United States has shifted to online sales."
summary = "Many countries have closed non-essential businesses to slow the spread of COVID-19. Some entrepreneurs have adapted and thrived, such as a restaurant in Italy offering home delivery and a clothing store in the US shifting to online sales."

bleu, meteor, rouge_1, rouge_2, rouge_l = evaluation_scores(original_text, summary)

print("BLEU score:", bleu)
print("METEOR score:", meteor)
print("ROUGE-1 score:", rouge_1)
print("ROUGE-2 score:", rouge_2)
print("ROUGE-L score:", rouge_l)


BLEU score: 8.726094729337945e-232
METEOR score: 0.6741036650012007
ROUGE-1 score: 0.24431818181818182
ROUGE-2 score: 0.4095238095238095
ROUGE-L score: 0.4095238095238095
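
The values labelled ROUGE-1, ROUGE-2 and ROUGE-L above come from a pooled set of unique 1- to 3-grams, which differs from the usual definitions. Conventionally, ROUGE-N counts clipped overlap of n-grams of a single order n, and ROUGE-L is based on the longest common subsequence (LCS). The sketch below is a minimal plain-Python illustration of those definitions, not the official ROUGE toolkit; the function names are hypothetical, and tokenization is plain whitespace splitting, as elsewhere in this notebook.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n):
    """Return (recall, precision, f1) for clipped n-gram overlap of order n."""
    ref, cand = ngram_counts(reference, n), ngram_counts(candidate, n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """Return (recall, precision, f1) based on the LCS length."""
    lcs = lcs_length(reference, candidate)
    recall = lcs / max(len(reference), 1)
    precision = lcs / max(len(candidate), 1)
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return recall, precision, f1


reference = "the cat sat on the mat".split()
candidate = "the cat on the mat".split()
print("ROUGE-1:", rouge_n(reference, candidate, 1))
print("ROUGE-2:", rouge_n(reference, candidate, 2))
print("ROUGE-L:", rouge_l(reference, candidate))
```

Reporting recall, precision and F1 separately makes it visible when a short summary scores high on precision simply because it copies from the source.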


# summarize and evaluate

## default pipeline

In [None]:
original_text = "In an effort to help slow the spread of COVID-19, many countries have implemented social distancing measures, including the closure of non-essential businesses. Despite the challenges, some entrepreneurs have found ways to adapt and even thrive in the new environment. For example, a restaurant in Italy has started offering home delivery, while a clothing store in the United States has shifted to online sales."

In [None]:
summary = summarization_pipeline(original_text, max_length=70, min_length=30, length_penalty=2.0, num_beams=4)[0]['summary_text']

In [None]:
bleu, meteor, rouge_1, rouge_2, rouge_l = evaluation_scores(original_text, summary)
print("BLEU score:", bleu)
print("METEOR score:", meteor)
print("ROUGE-1 score:", rouge_1)
print("ROUGE-2 score:", rouge_2)
print("ROUGE-L score:", rouge_l)

BLEU score: 7.199666163340923e-232
METEOR score: 0.8199542154975648
ROUGE-1 score: 0.5454545454545454
ROUGE-2 score: 0.8495575221238938
ROUGE-L score: 0.8495575221238938
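
The BLEU values printed in this notebook stay near 1e-232 even when the summary copies whole sentences from the source. A likely cause is argument nesting: NLTK's `corpus_bleu` takes one list of tokenized references per hypothesis, so a single reference is usually passed as `[[reference_tokens]]`. To show what BLEU measures when the reference is matched word for word, here is a simplified, unsmoothed, single-reference BLEU in plain Python. It is an illustrative sketch, not NLTK's implementation, and `simple_bleu` and `ngram_counts` are hypothetical names.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(reference, hypothesis, max_n=4):
    """Single-reference BLEU with uniform weights and no smoothing."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        total = sum(hyp.values())
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
        if total == 0 or matched == 0:
            # Without smoothing, one empty precision makes the whole score zero.
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: hypotheses no longer than the reference are scaled down.
    r, c = len(reference), len(hypothesis)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the cat sat on the mat".split()
print(simple_bleu(reference, reference))
print(simple_bleu(reference, "the cat sat on".split()))
```

When an extractive summary is scored against its own source, the n-gram precisions tend to be high, so the result is driven largely by the brevity penalty `exp(1 - r/c)`.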


In [None]:
summary

' In an effort to help slow the spread of COVID-19, many countries have implemented social distancing measures, including the closure of non-essential businesses . Despite the challenges, some entrepreneurs have found ways to adapt and thrive in the new environment .'

In [None]:
print(f"current size is {round(len(summary) / len(original_text) * 100, 1)}% of original")

current size is 64.6% of original


## shorter output length

In [None]:
summary = summarization_pipeline(original_text, max_length=30, min_length=15, length_penalty=2.0, num_beams=4)[0]['summary_text']

In [None]:
bleu, meteor, rouge_1, rouge_2, rouge_l = evaluation_scores(original_text, summary)
print("BLEU score:", bleu)
print("METEOR score:", meteor)
print("ROUGE-1 score:", rouge_1)
print("ROUGE-2 score:", rouge_2)
print("ROUGE-L score:", rouge_l)

BLEU score: 8.510469113101058e-232
METEOR score: 0.6307139807079587
ROUGE-1 score: 0.2840909090909091
ROUGE-2 score: 0.847457627118644
ROUGE-L score: 0.847457627118644


In [None]:
summary

' In an effort to help slow the spread of COVID-19, many countries have implemented social distancing measures . Despite the challenges,'

In [None]:
print(f"current size is {round(len(summary) / len(original_text) * 100, 1)}% of original")

current size is 32.8% of original
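
The two runs above change `max_length` and `min_length` by hand. A small loop makes it easier to compare several settings side by side. In the sketch below, `sweep_lengths`, `size_ratio`, and the `summarize` callable are hypothetical names, and `first_words` is a stand-in summarizer so that the example runs without downloading a model.

```python
def size_ratio(original, summary):
    """Summary length as a percentage of the original, measured in characters."""
    return round(len(summary) / len(original) * 100, 1)


def sweep_lengths(text, summarize, settings):
    """Summarize `text` once per (max_length, min_length) pair and collect sizes.

    `summarize` is any callable taking (text, max_length, min_length) and
    returning the summary as a string.
    """
    rows = []
    for max_length, min_length in settings:
        summary = summarize(text, max_length, min_length)
        rows.append({
            "max_length": max_length,
            "min_length": min_length,
            "size_pct": size_ratio(text, summary),
            "summary": summary,
        })
    return rows


def first_words(text, max_length, min_length):
    """Stand-in summarizer: keep the first `max_length` whitespace-separated words."""
    return " ".join(text.split()[:max_length])


for row in sweep_lengths("one two three four five six", first_words, [(3, 1), (6, 1)]):
    print(row["max_length"], row["size_pct"], row["summary"])
```

With the real pipeline, `summarize` could wrap the call used above, for example `lambda t, mx, mn: summarization_pipeline(t, max_length=mx, min_length=mn, length_penalty=2.0, num_beams=4)[0]['summary_text']`. Note that the pipeline's `max_length` counts model tokens, whereas the stand-in counts words and the size percentage counts characters.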
