# Abstractive Text Summarization Using Pegasus
_By: Ling Li Ya_

References:
1. [Exploring Pegasus - A New Text Summarization NLP Model](https://signal.onepointltd.com/post/102ghb9/exploring-pegasus-a-new-text-summarization-nlp-model)
2. [Colab notebook used as the basis for this notebook](https://colab.research.google.com/drive/1-zq8AJktuC3gQAHTuSiiZ_qvDl4wK7rq#scrollTo=S3PYeeGuda0m)

## 1. Install and Import Dependencies

In [17]:
!pip install sentencepiece



In [18]:
!pip install transformers



In [1]:
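# Pegasus model and tokenizer classes, plus the high-level pipeline helper
from transformers import PegasusForConditionalGeneration, PegasusTokenizer, pipeline
# torch is needed to detect whether a GPU is available
import torch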



## 2. Create Model

In [3]:
model_name = 'google/pegasus-xsum'
# Use the GPU when one is available, otherwise fall back to the CPU
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [4]:
src_text = [
    """Rococo (/rəˈkoʊkoʊ/, also US: /ˌroʊkəˈkoʊ/), less commonly Roccoco or Late Baroque, is an exceptionally ornamental and theatrical style of architecture, art and decoration which combines asymmetry, scrolling curves, gilding, white and pastel colors, sculpted molding, and trompe-l'œil frescoes to create surprise and the illusion of motion and drama. It is often described as the final expression of the Baroque movement.[1] The Rococo style began in France in the 1730s as a reaction against the more formal and geometric Style Louis XIV. It was known as the style rocaille, or rocaille style.[2] It soon spread to other parts of Europe, particularly northern Italy, Austria, southern Germany, Central Europe and Russia.[3] It also came to influence the other arts, particularly sculpture, furniture, silverware, glassware, painting, music, and theatre.[4] Although originally a secular style primarily used for interiors of private residences the Rococo had a spiritual aspect to it which led to its widespread use in church interiors, particularly in Central Europe, Portugal, and South America.[5]""", """The word rococo was first used as a humorous variation of the word rocaille.[6][7] Rocaille was originally a method of decoration, using pebbles, seashells and cement, which was often used to decorate grottoes and fountains since the Renaissance.[8][9] In the late 17th and early 18th century rocaille became the term for a kind of decorative motif or ornament that appeared in the late Style Louis XIV, in the form of a seashell interlaced with acanthus leaves. In 1736 the designer and jeweler Jean Mondon published the Premier Livre de forme rocquaille et cartel, a collection of designs for ornaments of furniture and interior decoration. 
It was the first appearance in print of the term "rocaille" to designate the style.[10] The carved or molded seashell motif was combined with palm leaves or twisting vines to decorate doorways, furniture, wall panels and other architectural elements.[11]"""
]

In [5]:
# batch = tokenizer.prepare_seq2seq_batch(src_text, truncation=True, padding='longest').to(torch_device)
# translated = model.generate(**batch)
# tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
# print(tgt_text)
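
The commented-out cell above calls `prepare_seq2seq_batch`, which later `transformers` releases deprecate in favor of calling the tokenizer directly. A minimal sketch of the same idea follows, assuming a tokenizer and model loaded as in the previous section. The helper name `summarize_batch` and the `max_length=60` default are illustrative choices, not part of the original notebook.

```python
def summarize_batch(texts, tokenizer, model, device="cpu", max_length=60):
    """Summarize a list of strings with an already loaded tokenizer/model pair.

    Hypothetical helper: `tokenizer` and `model` are expected to behave like
    PegasusTokenizer and PegasusForConditionalGeneration.
    """
    # Tokenize all texts together, padding to the longest one and truncating
    # anything beyond the model's maximum input length.
    batch = tokenizer(
        texts, truncation=True, padding="longest", return_tensors="pt"
    ).to(device)
    # Generate summary token ids for every text in the batch.
    generated = model.generate(**batch, max_length=max_length)
    # Convert token ids back to text, dropping padding and end-of-sequence tokens.
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

With the objects from this notebook, the call would look like `summarize_batch(src_text, tokenizer, model, device=torch_device)`, which returns one summary string per input text.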

In [7]:
summarizer = pipeline('summarization', model=model, tokenizer=tokenizer)
res = summarizer(src_text[0], min_length=40, max_length=60)



The pipeline returns a `list` of `dict` objects. Join their values into a single `string` and remove the stray spaces before punctuation.

In [8]:
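summary = ''
for result in res:
    # With the default pipeline settings each dict holds a single value, the summary text.
    # Note: str.capitalize() upper-cases the first character and lower-cases the rest.
    summary += ''.join(val.capitalize() + "\n" for val in result.values())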

summary = summary.replace(' .', '.')
summary = summary.replace(" !", "!")
summary = summary.replace(" ?", "?")
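
The three `replace` calls above can also be written as one regular expression. The sketch below is one possible variant. The function name `tidy_punctuation` is hypothetical, and unlike the original calls it removes any run of spaces or tabs before the punctuation mark, not just a single space.

```python
import re

def tidy_punctuation(text):
    """Remove spaces or tabs that appear directly before '.', '!' or '?'."""
    # [ \t]+ matches one or more spaces/tabs; the captured punctuation mark
    # is kept, so "word ." becomes "word.".
    return re.sub(r"[ \t]+([.!?])", r"\1", text)

print(tidy_punctuation("Hello . How are you ? Fine !"))
```

Applied to this notebook, `summary = tidy_punctuation(summary)` would stand in for the three separate `replace` calls.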

Compute word-count statistics and print the final summary.

In [16]:
words_after = len(summary.split(' '))
words_before = len(src_text[0].split(' '))
reduced_by = (words_before - words_after) / words_before * 100

print("Number of words in summary: " + str(words_after))
print("Number of words in original article: " + str(words_before))
print("Reduced by: " + str(round(reduced_by, 2)) + "%\n")
print(summary)

Number of words in summary: 35
Number of words in original article: 163
Reduced by: 78.53%

The style is often described as the final expression of the baroque movement and is often described as the final expression of the style rocaille (/rkoko/, also us: /rokko/), less commonly roccoco or late baroque,
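
The reduction figure above follows from (163 - 35) / 163 × 100 ≈ 78.53. The same calculation can be wrapped in a small reusable function, sketched below. The name `compression_stats` is hypothetical, and it uses `split()` without an argument, which ignores repeated spaces and trailing newlines, so its counts may differ slightly from those produced by `split(' ')` in the cell above.

```python
def compression_stats(original, summary):
    """Return (words_before, words_after, percent_reduced) for two texts."""
    # split() with no argument splits on any whitespace and drops empty strings.
    words_before = len(original.split())
    words_after = len(summary.split())
    if words_before == 0:
        raise ValueError("original text contains no words")
    reduced_by = (words_before - words_after) / words_before * 100
    return words_before, words_after, round(reduced_by, 2)

print(compression_stats("one two three four", "one  two\n"))
```

In this notebook the call would be `compression_stats(src_text[0], summary)`.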

