# T5 Model

In [1]:
# sentencepiece is required by the (slow) T5Tokenizer used below
!pip install transformers torch sentencepiece
from transformers import T5Tokenizer, T5ForConditionalGeneration



In [2]:
model_name = 't5-small'
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)


In [3]:
text = """
On 24 February 2022, Russia invaded Ukraine in a major escalation of the Russo-Ukrainian War, which started in 2014.
The invasion, the largest conflict in Europe since World War II, has caused hundreds of thousands of military casualties and tens
of thousands of Ukrainian civilian casualties. As of 2024, Russian troops occupy about 20% of Ukraine. From a population of 41 million,
about 8 million Ukrainians had been internally displaced and more than 8.2 million had fled the country by April 2023, creating Europe's largest
refugee crisis since World War II.
"""

preprocessed_text = "summarize: " + text

inputs = tokenizer.encode(preprocessed_text, return_tensors="pt", max_length=512, truncation=True)

summary_ids = model.generate(inputs, max_length=100, min_length=30, length_penalty=2.0, num_beams=4, early_stopping=True)

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:\n", summary)

Summary:
 as of 2024, Russian troops occupy about 20% of Ukraine. from a population of 41 million, about 8 million Ukrainians had been internally displaced and more than 8.2 million had fled the country by April 2023.
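
The T5 summary above starts its sentences in lowercase ("as of 2024", "from a population"). A minimal post-processing sketch that restores sentence-initial capitals is shown below. The helper name `capitalize_sentences` is hypothetical and not part of `transformers`; the regex is a simple heuristic that would also capitalize the word after an abbreviation such as "e.g.".

```python
import re

def capitalize_sentences(summary: str) -> str:
    """Upper-case the first letter of the text and of each sentence that
    follows '.', '!' or '?' plus whitespace (hypothetical helper)."""
    summary = summary.strip()
    # Group 1: start of string, or sentence-ending punctuation and whitespace.
    # Group 2: the lowercase letter that should be capitalized.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        summary,
    )

example = "as of 2024, Russian troops occupy about 20% of Ukraine. from a population of 41 million."
print(capitalize_sentences(example))
```

Decimal numbers such as "8.2 million" are left untouched because the period is not followed by whitespace.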


# BART Model

In [4]:
!pip install transformers torch
from transformers import BartForConditionalGeneration, BartTokenizer



In [5]:
model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)


In [6]:
text = """Extractive summarization is a text summarization technique based on identifying and separating the primary sentences or phrases in the source text to create summary.
The extractive summarization systems employ statistical algorithms and linguistic analysis to assess word frequency, sentence position, and keyword occurrence to gauge the importance of each type of textual input.
The prioritized sentences are then placed together to develop a brief, information summary.
The primary benefit of extractive summarization is its simplicity and the ability for computational deployment. Additionally, the process is relatively straight forward, as the summary is based on the pre-existing text and its extraction. However, in the operational mode, the summaries may lose interpersonal aspects and lack a wholistic context.
"""

In [10]:
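# "summarize: " is a T5 task prefix; BART does not need it. It is kept here so the recorded output below still matches.
# BART accepts at most 1024 input positions, so truncate to 1024 rather than 2000.
inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=1024, truncation=True)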

In [11]:
summary_ids = model.generate(inputs, max_length=150, min_length=40, length_penalty=5.0, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:\n", summary)
# len() on a string counts characters, not words or tokens
print("Length of text:", len(text))
print("Length of the summary:", len(summary))

Summary:
  extractive summarization systems employ statistical algorithms and linguistic analysis to assess word frequency, sentence position, and keyword occurrence to gauge the importance of each type of textual input. The prioritized sentences are then placed together to develop a brief, information summary.
Length of text: 820
Length of the summary: 302
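
The lengths printed above are character counts. A small sketch that reports both character and word counts, plus the summary-to-text word ratio, may make the T5, BART, and LLM results easier to compare. The function name `length_stats` and the dictionary keys are assumptions made for illustration.

```python
def length_stats(text: str, summary: str) -> dict:
    """Return character counts, whitespace-delimited word counts,
    and the summary/text word ratio (hypothetical helper)."""
    text_words = len(text.split())
    summary_words = len(summary.split())
    return {
        "text_chars": len(text),
        "summary_chars": len(summary),
        "text_words": text_words,
        "summary_words": summary_words,
        # Guard against an empty source text to avoid division by zero.
        "word_ratio": summary_words / text_words if text_words else 0.0,
    }

stats = length_stats("one two three four", "one two")
print(stats)
```

In the notebook, `length_stats(text, summary)` could be called right after decoding the summary.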


# LLM

In [12]:
from google.colab import userdata
import os
os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')

In [13]:
!pip install --upgrade --quiet tiktoken langchain langchain-google-genai


In [14]:
from langchain_google_genai import ChatGoogleGenerativeAI

SUPPORTED_MODELS = ("gemini-1.5-pro", "gemini-1.5-flash")

def load_llm(model="gemini-1.5-pro"):
    """Return a Gemini chat model with temperature 0 for one of the supported names."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Invalid model name: {model!r}")
    return ChatGoogleGenerativeAI(
        model=model,
        temperature=0,
        max_tokens=None,
        timeout=None,
        max_retries=2,
    )

In [31]:
from langchain_core.prompts import ChatPromptTemplate

def get_prompt_template():
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                # "\n\n" inserts real newlines; "\\n\\n" would send a literal backslash-n to the model
                "Write a concise summary of the following in {num_words} words:\n\n",
            ),
            ("human", "{context}")
        ]
    )
    return prompt

def summarize_text(text, num_words=50, model="gemini-1.5-flash"):
    llm = load_llm(model)
    prompt = get_prompt_template()
    chain = prompt | llm

    result = chain.invoke({
        "context": text,
        "num_words": num_words
    })

    # Post-process the result to enforce word limit
    summary = result.content
    words = summary.split()
    if len(words) > num_words:
        summary = " ".join(words[:num_words]) + "..."
    return summary

In [32]:
# Input text
text = '''In this notebook we delve into the evaluation techniques for abstractive summarization tasks using a simple example. We explore traditional evaluation methods like ROUGE and BERTScore, in addition to showcasing a more novel approach using LLMs as evaluators.
Evaluating the quality of summaries is a time-consuming process, as it involves different quality metrics such as coherence, conciseness, readability and content.
Traditional automatic evaluation metrics such as ROUGE and BERTScore and others are concrete and reliable, but they may not correlate well with the actual quality of summaries. They show relatively low correlation with human judgments, especially for open-ended generation tasks (Liu et al., 2023). There's a growing need to lean on human evaluations, user feedback, or model-based metrics while being vigilant about potential biases.
While human judgment provides invaluable insights, it is often not scalable and can be cost-prohibitive.'''

# Generate summary
summary = summarize_text(text, num_words=50, model="gemini-1.5-flash")

# Note: text length is measured in characters, summary length in words
print(f"\nSummary: {summary}")
print(f"\nText Length: {len(text)}")
print(f"\nSummary Length: {len(summary.split())}")


Summary: This notebook explores abstractive summarization evaluation, comparing traditional metrics (ROUGE, BERTScore) with LLM-based evaluation.  While traditional methods are efficient, they often poorly correlate with human judgment.  Human evaluation, though ideal, is costly and unscalable.


Text Length: 961

Summary Length: 34
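
The post-processing step in `summarize_text` cuts the summary at exactly `num_words` words, which can end it mid-sentence. One possible alternative, sketched here, keeps only whole sentences that fit within the limit and falls back to the hard cut when even the first sentence is too long. The function name `truncate_to_word_limit` is hypothetical, and the sentence splitter is a simple punctuation-based heuristic.

```python
import re

def truncate_to_word_limit(summary: str, num_words: int) -> str:
    """Trim a summary to at most num_words words, preferring sentence boundaries."""
    if len(summary.split()) <= num_words:
        return summary

    # Split after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", summary.strip())
    kept, count = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if count + n > num_words:
            break
        kept.append(sentence)
        count += n

    if kept:
        return " ".join(kept)
    # Fallback: hard cut, as in the original post-processing step.
    return " ".join(summary.split()[:num_words]) + "..."

print(truncate_to_word_limit("One two three. Four five six. Seven eight.", 6))
```

The result can be shorter than the requested limit, so this trades length accuracy for a complete final sentence.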


In [38]:
from langchain_core.prompts import ChatPromptTemplate

def get_prompt_template():
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                # "\n\n" inserts real newlines; "\\n\\n" would send a literal backslash-n to the model
                "Write a summary of the following in {num_words} words:\n\n",
            ),
            ("human", "{context}")
        ]
    )
    return prompt

def summarize_text(text, num_words=100, model="gemini-1.5-flash"):
    llm = load_llm(model)
    prompt = get_prompt_template()
    chain = prompt | llm

    result = chain.invoke({
        "context": text,
        "num_words": num_words
    })

    summary = result.content
    return summary

# Input text
text = '''In this notebook we delve into the evaluation techniques for abstractive summarization tasks using a simple example. We explore traditional evaluation methods like ROUGE and BERTScore, in addition to showcasing a more novel approach using LLMs as evaluators.
Evaluating the quality of summaries is a time-consuming process, as it involves different quality metrics such as coherence, conciseness, readability and content.
Traditional automatic evaluation metrics such as ROUGE and BERTScore and others are concrete and reliable, but they may not correlate well with the actual quality of summaries. They show relatively low correlation with human judgments, especially for open-ended generation tasks (Liu et al., 2023). There's a growing need to lean on human evaluations, user feedback, or model-based metrics while being vigilant about potential biases.
While human judgment provides invaluable insights, it is often not scalable and can be cost-prohibitive.'''

# Generate summary
summary = summarize_text(text, num_words=100, model="gemini-1.5-flash")

# Note: text length is measured in characters, summary length in words
print(f"\nSummary: {summary}")
print(f"\nText Length: {len(text)}")
print(f"\nSummary Length: {len(summary.split())}")



Summary: This notebook examines abstractive summarization evaluation, comparing traditional methods (ROUGE, BERTScore) with a novel LLM-based approach.  Traditional metrics, while reliable and concrete, often poorly correlate with human judgments of summary quality, particularly in open-ended tasks.  Human evaluation offers superior insights but suffers from scalability and cost limitations.  The notebook highlights the need for a balanced approach, combining automatic metrics with human assessment or LLM evaluation to overcome the shortcomings of each individual method.


Text Length: 961

Summary Length: 73
