During experimentation, I observed that the ROUGE score of the fine-tuned T5 summarization model consistently plateaued around 37 (on a 0-100 scale) across multiple training runs. I read this as a limitation of the model-metric setup rather than clear evidence of poor summary quality.
ROUGE does not fully capture semantic quality, but it remains the primary evaluation metric here, so I explored ways to make the results more robust. Inspired by ensemble methods reported in the text summarization literature, I propose an ensemble-based approach that combines T5 and BART.
Due to hardware memory constraints, a full ensemble implementation was not feasible. Instead, I implemented a proof of concept on a single news article to validate the idea. Selecting between candidate summaries by ROUGE score is impractical at inference time, because it is computationally expensive and requires a reference summary. I therefore adopted a length-based selection heuristic: choose the longer summary, on the assumption that longer outputs preserve more semantic content and tend to achieve higher ROUGE overlap.
This approach is intended as a memory-efficient starting point that can be extended to a full ensemble if additional computational resources become available.
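
The length-based rule can be sketched independently of the models. The helper below, `pick_longest`, is a hypothetical name that does not appear elsewhere in this notebook, and it compares candidates by whitespace-separated word count, which is an assumption (the proof of concept later compares character counts). The two candidate strings are illustrative placeholders, not model outputs.

```python
def pick_longest(candidates):
    """Return the candidate summary with the most whitespace-separated words.

    Hypothetical helper for illustration; ties go to the earlier candidate.
    """
    if not candidates:
        raise ValueError("at least one candidate summary is required")
    # max() returns the first maximal element, so earlier candidates win ties.
    return max(candidates, key=lambda s: len(s.split()))


# Illustrative placeholder strings, not real model outputs.
short_candidate = "Hayne joins the 49ers."
long_candidate = "Jarryd Hayne has signed a three-year contract with the San Francisco 49ers."
print(pick_longest([short_candidate, long_candidate]))
```

Because the rule needs no reference summary, it can run at inference time. Whether length actually tracks quality is the assumption being tested.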

In [None]:
!pip install -q transformers datasets evaluate rouge_score accelerate



In [None]:
# transformers, datasets, and accelerate are already installed by the cell above;
# only the `rouge` package used for scoring later is still missing.
!pip install rouge

import os

# Set before PyTorch initializes CUDA; expandable segments reduce allocator fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

from transformers import T5ForConditionalGeneration, T5Tokenizer, Trainer, TrainingArguments
from datasets import load_dataset



In [None]:
dataset = load_dataset("cnn_dailymail", "3.0.0")


model_name = "t5-base"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)


In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 287113
    })
    validation: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 13368
    })
    test: Dataset({
        features: ['article', 'highlights', 'id'],
        num_rows: 11490
    })
})

In [None]:
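def preprocess_function(examples):
    # Tokenize the articles, truncating or padding each one to 512 tokens.
    inputs = [doc for doc in examples['article']]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")

    # Tokenize the reference highlights, truncating or padding each one to 128 tokens.
    labels = tokenizer(examples['highlights'], max_length=128, truncation=True, padding="max_length")

    # The label ids (including padding positions) become the training targets.
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs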

encoded_dataset = dataset.map(preprocess_function, batched=True)
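
Because the labels are padded to `max_length`, the padding positions are scored by the loss like any other token unless they are masked. A common convention in Hugging Face seq2seq training is to set those positions to -100, which the cross-entropy loss ignores. The helper below is only a sketch: `mask_padding` is a hypothetical name, the example ids are made up, and the step is not applied in this notebook, so the losses reported further down include padding positions.

```python
def mask_padding(label_ids, pad_token_id, ignore_index=-100):
    """Replace padding ids in each label sequence with ignore_index.

    Hypothetical helper for illustration; works on plain lists of token ids.
    """
    return [
        [tok if tok != pad_token_id else ignore_index for tok in seq]
        for seq in label_ids
    ]


# T5 uses 0 as its pad token id; the other ids here are made-up placeholders.
example_labels = [[71, 1782, 1, 0, 0], [8, 1712, 19, 3, 1]]
print(mask_padding(example_labels, pad_token_id=0))
```

Inside `preprocess_function`, this would amount to passing `labels["input_ids"]` and `tokenizer.pad_token_id` through the helper before assigning `model_inputs["labels"]`.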


In [None]:
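# Fine-tune on a random 2,000-example subset of the training split.
train_dataset = encoded_dataset["train"].shuffle(seed=42).select(range(2000))

# Evaluate on a random 1,000-example subset of the validation split (the test split is not used).
test_dataset = encoded_dataset["validation"].shuffle(seed=42).select(range(1000))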

In [None]:

from transformers import TrainingArguments, Trainer

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=3,
    weight_decay=0.01,
    save_total_limit=3,
    report_to="none"
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,

)

# Train the model
trainer.train()

Step,Training Loss
500,1.053061
1000,0.754733
1500,0.600444



TrainOutput(global_step=1500, training_loss=0.8027460327148438, metrics={'train_runtime': 1393.7357, 'train_samples_per_second': 4.305, 'train_steps_per_second': 1.076, 'total_flos': 3653747343360000.0, 'train_loss': 0.8027460327148438, 'epoch': 3.0})

In [None]:
trainer.evaluate()


{'eval_loss': 1.0850255489349365,
 'eval_runtime': 66.9662,
 'eval_samples_per_second': 14.933,
 'eval_steps_per_second': 3.733,
 'epoch': 3.0}

In [None]:
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.to(device)

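def generate_summary_batch(batch):
    # Tokenize the raw articles; padding=True pads to the longest article in the batch.
    inputs = tokenizer(batch["article"], padding=True, truncation=True, max_length=512, return_tensors="pt").to(device)

    with torch.no_grad():
        output = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],  # keep the model from attending to padding
            max_length=150,
            num_beams=5,
            # do_sample=True with num_beams > 1 is beam-search sampling, so outputs vary between runs.
            temperature=0.7,
            top_k=50,
            top_p=0.95,
            do_sample=True,
            early_stopping=True
        )

    summaries = tokenizer.batch_decode(output, skip_special_tokens=True)
    torch.cuda.empty_cache()

    return {"summary": summaries}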

summaries = test_dataset.map(generate_summary_batch, batched=True, batch_size=8)


In [None]:
from rouge import Rouge

def calculate_rouge(reference_list, generated_list):
    rouge = Rouge()
    # Rouge.get_scores expects the hypotheses first and the references second.
    scores = rouge.get_scores(generated_list, reference_list)
    # Average the per-example F1 scores over the whole evaluation set.
    rouge_1 = sum(score['rouge-1']['f'] for score in scores) / len(scores)
    rouge_2 = sum(score['rouge-2']['f'] for score in scores) / len(scores)
    rouge_l = sum(score['rouge-l']['f'] for score in scores) / len(scores)
    return rouge_1, rouge_2, rouge_l

# Initialize lists to store reference and generated summaries

reference_summaries = [example["highlights"] for example in test_dataset]
generated_summaries = [example["summary"] for example in summaries]

# Calculate ROUGE scores

rouge_1, rouge_2, rouge_l = calculate_rouge(reference_summaries,generated_summaries)

print("Average ROUGE-1:", rouge_1)
print("Average ROUGE-2:", rouge_2)

print("Average ROUGE-L:", rouge_l)

Average ROUGE-1: 0.3972270465694003
Average ROUGE-2: 0.1848670426897062
Average ROUGE-L: 0.3757243550218815
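
To make the metric concrete, here is a simplified sketch of a ROUGE-1 F1 computation. It assumes lowercased whitespace tokenization and clipped unigram counts, so it is not guaranteed to reproduce the numbers from the `rouge` package, which may tokenize and count differently. `rouge1_f1` is a hypothetical helper name and the two sentences are toy inputs.

```python
from collections import Counter


def rouge1_f1(hypothesis, reference):
    """Simplified ROUGE-1 F1 over lowercased, whitespace-split unigrams."""
    hyp_counts = Counter(hypothesis.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((hyp_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy inputs: 5 of the 6 unigrams overlap in both directions.
print(round(rouge1_f1("the cat sat on the mat", "the cat lay on the mat"), 3))  # 0.833
```

In this formulation a longer hypothesis can raise recall, but every extra token that is not in the reference lowers precision, so length alone does not guarantee a higher F1.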


In [None]:
from transformers import T5Tokenizer, T5ForConditionalGeneration, BartForConditionalGeneration, BartTokenizer

original_text = """
Jarryd Hayne's move to the NFL is a boost for rugby league in the United States, it has been claimed.
The Australia international full-back or centre quit the National Rugby League in October to try his luck in American football
and was this week given a three-year contract with the San Francisco 49ers.
Peter Illfield, chairman of US Association of Rugby League, said: 'Jarryd, at 27, is one of the most gifted and talented rugby league players in Australia.
He is an extraordinary athlete. His three-year deal with the 49ers, as an expected running back, gives the USA Rugby League a connection with the American football lover like never before.'
"""
t5_model_name = "t5-small"
t5_tokenizer = T5Tokenizer.from_pretrained(t5_model_name)
t5_model = T5ForConditionalGeneration.from_pretrained(t5_model_name)

bart_model_name = "facebook/bart-large-cnn"
bart_tokenizer = BartTokenizer.from_pretrained(bart_model_name)
bart_model = BartForConditionalGeneration.from_pretrained(bart_model_name)

def generate_t5_summary(text):
    # T5 expects a task prefix; "summarize: " selects its summarization task.
    inputs = t5_tokenizer("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)
    summary_ids = t5_model.generate(inputs["input_ids"], max_length=128, num_beams=4, early_stopping=True)
    return t5_tokenizer.decode(summary_ids[0], skip_special_tokens=True)

def generate_bart_summary(text):
    # facebook/bart-large-cnn is already fine-tuned for summarization and needs no prefix.
    inputs = bart_tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
    summary_ids = bart_model.generate(inputs["input_ids"], max_length=128, num_beams=4, early_stopping=True)
    return bart_tokenizer.decode(summary_ids[0], skip_special_tokens=True)

def ensemble_summary(text):
    # Generate one candidate per model, then keep the longer one (by character count).
    t5_summary = generate_t5_summary(text)
    bart_summary = generate_bart_summary(text)

    # The comparison is strict, so ties go to BART.
    return t5_summary if len(t5_summary) > len(bart_summary) else bart_summary

ensemble_summary_text = ensemble_summary(original_text)
print("Ensemble Summary:", ensemble_summary_text)


Ensemble Summary: Jarryd Hayne's move to the NFL is a boost for rugby league in the United States, it has been claimed. The Australia international full-back or centre quit the National Rugby League in October to try his luck in American football. The 27-year-old was this week given a three-year contract with the San Francisco 49ers.


In [None]:
pip install gensim




In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:

from gensim import corpora, models, similarities
from gensim.parsing.preprocessing import preprocess_string

def gensim_textrank_summarizer(text, top_n=3):
    """Extractive summary: keep the top_n sentences by summed TF-IDF similarity.

    This is a centrality-style approximation of TextRank; it does not run PageRank.
    """
    try:
        # Naive sentence split; abbreviations such as "U.K. " also trigger a split.
        sentences = text.split('. ')
        if len(sentences) <= top_n:
            return text

        processed_sentences = [preprocess_string(s) for s in sentences]

        dictionary = corpora.Dictionary(processed_sentences)
        corpus = [dictionary.doc2bow(doc) for doc in processed_sentences]

        tfidf = models.TfidfModel(corpus)
        index = similarities.MatrixSimilarity(tfidf[corpus], num_features=len(dictionary))

        # Score each sentence by its total similarity to every sentence in the article.
        sentence_ranks = []
        for i in range(len(corpus)):
            sims = index[tfidf[corpus[i]]]
            sentence_ranks.append(sum(sims))

        # Pick the top_n sentences, then restore their original order.
        top_indices = sorted(range(len(sentence_ranks)), key=lambda i: sentence_ranks[i], reverse=True)[:top_n]
        top_indices.sort()

        summary = ". ".join([sentences[i].strip() for i in top_indices])
        return summary + "."
    except Exception:
        # Fall back to the first 300 characters if anything fails.
        return text[:300]
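
Splitting on `'. '` also breaks sentences at abbreviations such as "U.K. tour". One possible alternative, sketched here with the standard `re` module, splits only where sentence-ending punctuation is followed by whitespace and then an uppercase letter, a quote, or an opening parenthesis. `split_sentences` is a hypothetical helper, the sample text is made up, and the rule remains a heuristic: it would still split inside "U.S. President", for example.

```python
import re

# Boundary: ., ! or ? followed by whitespace, then a capital letter, quote, or "(".
_SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=["\'(A-Z])')


def split_sentences(text):
    """Heuristic sentence splitter; hypothetical helper for illustration."""
    parts = _SENTENCE_BOUNDARY.split(text.strip())
    return [part.strip() for part in parts if part.strip()]


# Made-up sample text containing an abbreviation followed by a lowercase word.
sample = "Following last year's U.K. tour, the band returns. Tickets go on sale Monday."
for sentence in split_sentences(sample):
    print(sentence)
```

Unlike `text.split('. ')`, this keeps the terminating punctuation attached to each sentence, so a summarizer built on it would not need to append periods when joining.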


In [None]:

dataset_subset = dataset['validation'].select(range(1000))

def apply_gensim(example):
    # gensim_textrank_summarizer handles its own errors; this guard is a last resort.
    try:
        example['gensim_extractive'] = gensim_textrank_summarizer(example['article'])
    except Exception:
        example['gensim_extractive'] = ""
    return example

dataset_subset = dataset_subset.map(apply_gensim)


In [None]:

import pandas as pd

# Show the first 50 articles alongside their reference highlights and extractive summaries.
df = pd.DataFrame(dataset_subset)
display_df = df[['article', 'highlights', 'gensim_extractive']].head(50)
display_df

index,article,highlights,gensim_extractive
0,"(CNN)Share, and your gift will be multiplied. ...",Zully Broussard decided to give a kidney to a ...,"""The ages of the donors and recipients range f..."
1,"(CNN)On the 6th of April 1996, San Jose Clash ...",The 20th MLS season begins this weekend .\nLea...,"Then there's the way the league develops, attr..."
2,"(CNN)French striker Bafetimbi Gomis, who has a...",Bafetimbi Gomis collapses within 10 minutes of...,"(CNN)French striker Bafetimbi Gomis, who has a..."
3,(CNN)It was an act of frustration perhaps more...,Rory McIlroy throws club into water at WGC Cad...,McIlroy composed himself to finish with a seco...
4,(CNN)A Pennsylvania community is pulling toget...,"Cayman Naib, 13, hasn't been heard from since ...","The parents of Cayman Naib, 13, have been comm..."
5,(CNN)My vote for Father of the Year goes to Cu...,Ruben Navarrette: Schilling deserves praise fo...,What was said about 17-year-old Gabby Schillin...
6,"(CNN)Another one for the ""tourists behaving ba...",Two American women arrested for carving initia...,Two American women have reportedly been arrest...
7,(CNN)Following last year's successful U.K. tou...,It will be a first time for the tour stateside...,"tour, Prince and 3rdEyeGirl are bringing the H..."
8,(CNN)A shooting at a bar popular with expatria...,A jihadist group claims responsibility in an a...,(CNN)A shooting at a bar popular with expatria...
9,(CNN)Manchester United defender Jonny Evans an...,Alleged incident happened in match at St James...,Both Evans and Cisse released statements the d...
