<a href="https://colab.research.google.com/github/gcosma/COP509/blob/main/Tutorials/Tutorial7Summarization_with_user_pasted_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Abstractive summarisation models

In [1]:
# @title Step 1: Give the paragraph you want to summarise.
# concatenated_texts_list = A number of the messages are from people who are facing up to their own worries about cancer. Rebecca Stead from Macmillan Cancer Support told BBC News. Hearing the news that you have cancer is a huge moment in anyone's life and there is no right or wrong way to respond. Going through waves of different emotions is completely normal.We do know, however, that many people will experience uncertainty or worry.This could be about practical matters such as paying the mortgage or being confused by the treatment being given, she says, urging people to get advice.The King's openness about having cancer has also been praised as helping remove taboos around the disease. This could be about practical matters such as paying the mortgage or being confused by the treatment being given, she says, urging people to get advice. According to Macmillan's, the King's public acknowledgement of his cancer prompted a surge in people seeking information.
concatenated_texts_list = input("Paste your data: ")
# print(concatenated_texts_list)

Paste your data: How does an authoritarian regime die? As Ernest Hemingway famously said about going broke – gradually then suddenly.  The protesters in Iran and their supporters abroad were hoping that the Islamic regime in Tehran was at the suddenly stage. The signs are, if it is dying, it is still at gradual.  The last two weeks of unrest add up to a big crisis for the regime. Iranian anger and frustration have exploded into the streets before, but the latest explosion comes on top of all the military blows inflicted on Iran in the last two years by the US and Israel.  But more significant for hard-pressed Iranians struggling to feed their families has been the impact of sanctions.  In the latest blow for the Iranian economy, all the UN sanctions lifted under the now dead 2015 nuclear deal were reimposed by the UK, Germany and France in September. In 2025 food price inflation was more than 70%. The currency, the rial, reached a record low in December.  While the Iranian regime is un
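
Pasted text often carries doubled spaces or missing spaces after full stops, as in the sample above (`suddenly.  The`) and in the commented example (`normal.We`). A minimal sketch of a clean-up step, assuming a simple regex heuristic is acceptable for this kind of news text. `tidy_text` is a hypothetical helper, not part of the tutorial's pipeline:

```python
import re

def tidy_text(text: str) -> str:
    """Normalise whitespace in pasted text (hypothetical helper)."""
    # Add a missing space when a lower-case letter, sentence-ending
    # punctuation and a capital letter run together, e.g. "normal.We".
    text = re.sub(r"(?<=[a-z])([.!?])([A-Z])", r"\1 \2", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()

print(tidy_text("Emotions are normal.We do know.  This   matters."))
# Emotions are normal. We do know. This matters.
```

The cleaned string could be assigned back to `concatenated_texts_list` before running the model cells. The lookbehind for a lower-case letter is there so that abbreviations such as `U.S.A` are left alone.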

## Model 1: PRIMERA

In [2]:
# @title Load model and summarise
# Importing necessary modules
from IPython.display import clear_output
from transformers import (
    AutoTokenizer,
    LEDForConditionalGeneration,
    LEDConfig,
)

import torch
# Initializing variables
TOKENIZER = AutoTokenizer.from_pretrained("allenai/PRIMERA-multinews")
CONFIG = LEDConfig.from_pretrained("allenai/PRIMERA-multinews")
MODEL = LEDForConditionalGeneration.from_pretrained("allenai/PRIMERA-multinews", config=CONFIG)
# MODEL.gradient_checkpointing_enable()
PAD_TOKEN_ID = TOKENIZER.pad_token_id
DOCSEP_TOKEN_ID = TOKENIZER.convert_tokens_to_ids("<doc-sep>")

# Use a pipeline as a high-level helper (torch is already imported above)
from transformers import pipeline

pipe = pipeline(
    task = "text2text-generation",
    model = MODEL,
    tokenizer = TOKENIZER,
    torch_dtype=torch.bfloat16,
)

# Use model
result = pipe(
    concatenated_texts_list,
    use_cache = True,
    min_length = 128, # At least around 64 is recommended
    num_beams = 5,
    max_length = 1024, # Must be greater than min_length
    pad_token_id = TOKENIZER.pad_token_id,
    bos_token_id = TOKENIZER.bos_token_id,
    eos_token_id = TOKENIZER.eos_token_id,
    do_sample=True, # Only needed when using the temperature or top_p parameters
    temperature=0.1, # Controls the randomness of the output; lower is more deterministic
    top_p=0.3 # Nucleus sampling: sample only from the top 30% of probability mass
    )

clear_output()
print(result)

[{'generated_text': '– Ernest Hemingway famously said that authoritarian regimes die by going broke—gradually, then suddenly. The signs are that Iran\'s Islamic regime is not about to suddenly die, but is instead in "a big crisis for the regime" after two weeks of unrest, writes Jonathan Chait at the Guardian. Food price inflation is at more than 70%, the rial is at a record low, and all UN sanctions lifted under the 2015 nuclear deal are back in place. "Iranian anger and frustration have exploded into the streets before, but the latest explosion comes on top of all the military blows inflicted on Iran in the last two years by the US and Israel," writes Chait. "The signs are, if it is dying, it is still at gradual."'}]
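
The cell above defines `DOCSEP_TOKEN_ID` but never uses it, because a single pasted passage is summarised. PRIMERA is a multi-document summarisation model, and the `<doc-sep>` token marks document boundaries. A minimal sketch of preparing several documents as one input string, assuming the tokenizer maps the literal text `<doc-sep>` to `DOCSEP_TOKEN_ID`. `join_documents` is a hypothetical helper:

```python
DOC_SEP = "<doc-sep>"  # same marker passed to convert_tokens_to_ids above

def join_documents(documents: list[str], sep: str = DOC_SEP) -> str:
    """Join non-empty documents with the separator marker (hypothetical helper)."""
    cleaned = [doc.strip() for doc in documents if doc and doc.strip()]
    return f" {sep} ".join(cleaned)

docs = ["First report on the protests.", "  ", "Second report on sanctions."]
print(join_documents(docs))
# First report on the protests. <doc-sep> Second report on sanctions.
```

The resulting string could be passed to `pipe(...)` in place of `concatenated_texts_list`. Whether this improves the summary for a given set of documents has not been checked here.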


## Model 2: BRIO Model

In [3]:
# @title Load model and summarise
from transformers import BartTokenizer, PegasusTokenizer
from transformers import BartForConditionalGeneration, PegasusForConditionalGeneration

IS_CNNDM = True # whether to use CNNDM dataset (BART-base) or XSum dataset (PEGASUS-base)
LOWER = False

# Load our model checkpoints
if IS_CNNDM:
    model = BartForConditionalGeneration.from_pretrained('Yale-LILY/brio-cnndm-uncased')
    tokenizer = BartTokenizer.from_pretrained('Yale-LILY/brio-cnndm-uncased')
else:
    model = PegasusForConditionalGeneration.from_pretrained('Yale-LILY/brio-xsum-cased')
    tokenizer = PegasusTokenizer.from_pretrained('Yale-LILY/brio-xsum-cased')

max_length = 1024 if IS_CNNDM else 512


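# Tokenize the text, truncating to the model's maximum input length
# so that long pasted passages do not overflow the position embeddings
input_ids = tokenizer.encode(concatenated_texts_list, return_tensors='pt', truncation=True, max_length=max_length)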

# Generate summary with the model
summary_ids = model.generate(input_ids, max_length=max_length)

# Decode the summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

clear_output()
# Print the summary
print(summary)

Iranian regime is under huge pressure but evidence shows it's not about to die. The signs are that the Islamic regime in Tehran is not going to collapse suddenly. The last two weeks of unrest add up to a big crisis for the regime. But the signs are gradual and the security forces remain loyal.
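
The cell sets `LOWER = False` but never reads it, and the CNNDM checkpoint name ends in `uncased`. A minimal sketch of how such a flag could be applied, assuming the uncased checkpoint was trained on lower-cased text (worth checking against the model card). `prepare_article` is a hypothetical helper:

```python
def prepare_article(text: str, lower: bool) -> str:
    """Optionally lower-case the article before tokenisation (hypothetical helper)."""
    text = text.strip()
    return text.lower() if lower else text

checkpoint = "Yale-LILY/brio-cnndm-uncased"
LOWER = checkpoint.endswith("-uncased")  # derive the flag from the checkpoint name
print(prepare_article("The Iranian regime is under pressure.", LOWER))
# the iranian regime is under pressure.
```

The returned string would replace `concatenated_texts_list` in the `tokenizer.encode(...)` call. The summary may then differ from the one shown above, which was produced from cased input.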


## Model 3: EFactSum

In [4]:
# @title Load model and summarise
from transformers import BartTokenizer, PegasusTokenizer
from transformers import BartForConditionalGeneration, PegasusForConditionalGeneration

IS_CNNDM = True
max_length = 1024 if IS_CNNDM else 512

if IS_CNNDM:
    model = BartForConditionalGeneration.from_pretrained('tanay/efactsum-bart-cnndm')
    tokenizer = BartTokenizer.from_pretrained('tanay/efactsum-bart-cnndm')
else:
    model = PegasusForConditionalGeneration.from_pretrained('tanay/efactsum-pegasus-xsum')
    tokenizer = PegasusTokenizer.from_pretrained('tanay/efactsum-pegasus-xsum')

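# Tokenize the text, truncating to the model's maximum input length
# so that long pasted passages do not overflow the position embeddings
input_ids = tokenizer.encode(concatenated_texts_list, return_tensors='pt', truncation=True, max_length=max_length)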

# Generate summary with the model
summary_ids = model.generate(input_ids, max_length=max_length)

# Decode the summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)


clear_output()
# Print the summary
print(summary)


the last two weeks of unrest add up to a big crisis for the Iranian regime . the latest explosion comes on top of all the military blows inflicted on Iran in the last two years by the US and Israel . more significant for hard-pressed Iranians struggling to feed their families has been the impact of sanctions .
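
The summary above has lower-cased sentence starts and a space before each full stop, presumably a trace of tokenised training data. A minimal sketch of a post-processing step for display, assuming plain English sentences. `tidy_summary` is a hypothetical helper and does not change what the model generated:

```python
import re

def tidy_summary(summary: str) -> str:
    """Remove spaces before punctuation and capitalise sentence starts."""
    # "regime . the" -> "regime. the"
    text = re.sub(r"\s+([.,;:!?])", r"\1", summary.strip())
    # Capitalise the first letter of the text and of each following sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )

print(tidy_summary("the last two weeks add up to a big crisis . the latest explosion comes on top ."))
# The last two weeks add up to a big crisis. The latest explosion comes on top.
```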


# Extractive summarisation methods

## Model 1: BERTSUM

In [13]:
# @title load the model and summarise
%cd /content/
!pip install -q bert-extractive-summarizer
from summarizer import Summarizer
model = Summarizer()
# Summarize the text
summary = model(concatenated_texts_list, num_sentences=3)

clear_output()
# Print the summary
print("- " + summary + "\n")

- As Ernest Hemingway famously said about going broke – gradually then suddenly. The protesters in Iran and their supporters abroad were hoping that the Islamic regime in Tehran was at the suddenly stage. In the latest blow for the Iranian economy, all the UN sanctions lifted under the now dead 2015 nuclear deal were reimposed by the UK, Germany and France in September.
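
An extractive summariser selects sentences from the source rather than generating new ones, so every summary sentence should appear verbatim in the input. A minimal sketch of that check, assuming a naive split on sentence-ending punctuation is good enough. `split_sentences` and `is_extractive` are hypothetical helpers:

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def is_extractive(summary: str, source: str) -> bool:
    """True if every summary sentence occurs verbatim in the source."""
    # Normalise whitespace on both sides so doubled spaces do not cause misses.
    normalised_source = " ".join(source.split())
    return all(
        " ".join(sentence.split()) in normalised_source
        for sentence in split_sentences(summary)
    )

source = "The currency reached a record low.  Sanctions were reimposed in September."
print(is_extractive("Sanctions were reimposed in September.", source))  # True
print(is_extractive("Sanctions returned in September.", source))        # False
```

Running the same check on the abstractive outputs earlier in the notebook would be expected to return `False` wherever a model rephrased the source.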



## Model 2: SBERT

In [None]:
# @title load the model and summarise (SBERT)
%cd /content/
!pip install -q sentence-transformers bert-extractive-summarizer

from summarizer.sbert import SBertSummarizer
from IPython.display import clear_output

model = SBertSummarizer('paraphrase-MiniLM-L6-v2')

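# IMPORTANT: SBERT expects a single string. The pasted input is already a string,
# and " ".join on a string would insert a space between every character,
# so only join when a list of texts is supplied.
if isinstance(concatenated_texts_list, str):
    text = concatenated_texts_list
else:
    text = " ".join(concatenated_texts_list)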

# Summarize the text
summary = model(text, num_sentences=3)

clear_output()
print("- " + summary + "\n")

