# **ABSTRACTIVE TEXT SUMMARIZATION USING PEGASUS**

---
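*   Pegasus is a transformer model for abstractive text summarization, developed by Google Research and distributed through huggingface.co.
*   It produces summaries that read like a human-written interpretation of the input text rather than a stitched-together extract.

*   Pegasus is a self-supervised deep learning model.
*   During pre-training, the most important sentences of a document are masked out and the model learns to generate them from the remaining supporting statements and background information. This objective is what lets it produce compact summaries.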
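
The masking idea can be illustrated with a toy sketch. It is not the actual PEGASUS pre-training code: the published method scores sentences with ROUGE, while the version below substitutes a simple unigram-overlap F1, and the `<mask>` placeholder, the regex sentence splitter, and all function names are assumptions made for illustration only.

```python
import re
from collections import Counter

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def unigram_f1(candidate, reference):
    # Unigram-overlap F1, used here as a rough stand-in for ROUGE-1.
    c = Counter(re.findall(r"\w+", candidate.lower()))
    r = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def gap_sentence_example(text, n_masked=1, mask_token="<mask>"):
    # Score each sentence against the rest of the document, mask the
    # top-scoring ones, and return (masked input, generation target).
    sentences = split_sentences(text)
    scores = []
    for i, sentence in enumerate(sentences):
        rest = " ".join(sentences[:i] + sentences[i + 1:])
        scores.append(unigram_f1(sentence, rest))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:n_masked])
    masked_input = " ".join(
        mask_token if i in chosen else s for i, s in enumerate(sentences)
    )
    target = " ".join(sentences[i] for i in chosen)
    return masked_input, target

doc = (
    "Global temperatures are rising. "
    "Rising global temperatures are melting the ice. "
    "The cat sleeps all day."
)
masked_input, target = gap_sentence_example(doc)
print(masked_input)
print(target)
```

In this toy document the second sentence shares the most words with the rest of the text, so it is the one that gets masked and becomes the target the model would have to generate.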






### **Importing the necessary libraries**

In [24]:
! pip install transformers #installing transformers library



In [25]:
! pip install sentencepiece # installing sentencepiece library to support tokenizing



In [26]:
from transformers import PegasusForConditionalGeneration, PegasusTokenizer 
import torch

### **Initializing variables with the PEGASUS model name and the CUDA (GPU) device**

In [27]:
summarizer = 'google/pegasus-xsum'

In [28]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

### **Setting up the Tokenizer and the Model**

In [29]:
tokenizer = PegasusTokenizer.from_pretrained(summarizer)
model = PegasusForConditionalGeneration.from_pretrained(summarizer).to(device)

### **Performing Summarization**

In [30]:
text = """
Every year the average surface temperature of the earth gets more than the previous year. 
Every year it gets warmer than the previous year. Since 1880 the average surface temperature of earth has increased by about 0.8°Celsius.
The rate of warming has been around 0.15°-0.2°Celsius per decade. 
This is a global change in the earth’s temperature and must not be confused with local changes that we experience every day, during day and night, summer and winter, etc.
The global average temperature of the earth mainly depends on the amount of heat it receives from the sun and what it radiates back into the atmosphere. 
The heat radiated back by the earth depends on the chemical composition of the atmosphere.

"""


### **Creating Tokens**

In [33]:
# prepare_seq2seq_batch is deprecated in recent transformers releases; call the tokenizer directly instead
batch = tokenizer(text, truncation=True, padding='longest')

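In [34]:
# Move the encoded inputs to the same device as the model so generate() does not hit a device mismatch on GPU
tokens = tokenizer(text, truncation=True, padding="longest", return_tensors="pt").to(device)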
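
To make the `truncation` and `padding="longest"` arguments more concrete, here is a simplified pure-Python sketch of what they do to a batch of token id lists. It is an illustration rather than the tokenizer's real implementation: the helper name `pad_and_truncate`, the `max_length` of 5, the pad id of 0, and the sample ids are all assumptions, and a real tokenizer would also keep the end-of-sequence token when it truncates.

```python
def pad_and_truncate(batch_ids, max_length, pad_id=0):
    # Truncation: cut every sequence down to at most max_length ids.
    truncated = [ids[:max_length] for ids in batch_ids]
    # padding="longest": pad up to the longest sequence in this batch,
    # not up to max_length.
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # The attention mask is 1 for real tokens and 0 for padding.
    attention_mask = [
        [1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

example = pad_and_truncate(
    [[2317, 232, 109, 1], [139, 1], [5, 6, 7, 8, 9, 1]],
    max_length=5,
)
print(example["input_ids"])
print(example["attention_mask"])
```

With a single input text, as in this notebook, `padding="longest"` adds no padding at all, because the only sequence in the batch is already the longest one.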

In [35]:
tokens

{'input_ids': tensor([[ 2317,   232,   109,  1077,  1494,  1972,   113,   109,  2776,  1476,
           154,   197,   109,  1331,   232,   107,  2317,   232,   126,  1476,
          9061,   197,   109,  1331,   232,   107,  1685, 21666,   109,  1077,
          1494,  1972,   113,  2776,   148,  1562,   141,   160, 21358,   105,
         40553,   116, 11641,   107,   139,   872,   113,  8309,   148,   174,
           279, 67310,   105,   121, 35968,   105, 40553,   116, 11641,   446,
          3496,   107,   182,   117,   114,  1122,   411,   115,   109,  2776,
           123,   116,  1972,   111,   355,   146,   129,  6436,   122,   391,
           852,   120,   145,   306,   290,   242,   108,   333,   242,   111,
           565,   108,   922,   111,  1582,   108,   733,   107,   139,  1122,
          1077,  1972,   113,   109,  2776,  3187,  3551,   124,   109,   713,
           113,  1206,   126,  7183,   135,   109,  1796,   111,   180,   126,
         53399,   247,   190,   109,  

### **Summarize**

In [46]:
import warnings
warnings.filterwarnings('ignore')

In [47]:
summary = model.generate(**tokens) 
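
The tokenize, generate, and decode steps can also be wrapped in one helper, which makes it easier to experiment with generation settings. The sketch below is one possible arrangement, not part of the original notebook: the function name `summarize` is hypothetical, and the `num_beams=4` and `max_length=64` values are illustrative rather than tuned or recommended settings.

```python
def summarize(texts, tokenizer, model, device="cpu", num_beams=4, max_length=64):
    # Encode the batch and move it to the device the model lives on.
    inputs = tokenizer(
        texts, truncation=True, padding="longest", return_tensors="pt"
    ).to(device)
    # Beam search with an upper bound on the summary length (in tokens).
    summary_ids = model.generate(
        **inputs, num_beams=num_beams, max_length=max_length
    )
    # Turn every generated id sequence back into a plain string.
    return tokenizer.batch_decode(summary_ids, skip_special_tokens=True)

# Usage with the objects created earlier in this notebook:
# summaries = summarize([text], tokenizer, model, device)
# print(summaries[0])
```

Passing the tokenizer and model in as arguments keeps the helper independent of the global variables, so the same function could be reused with another checkpoint.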

In [48]:
summary[0]

tensor([   0,  139, 1077, 1494, 1972,  113,  109, 2776,  148, 1562,  204,  109,
         289, 1902,  107,    1])

In [49]:
tokenizer.decode(summary[0], skip_special_tokens=True)

'The average surface temperature of the earth has increased over the last century.'
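
The effect of `skip_special_tokens=True` can be seen on the generated ids shown above. The sketch below assumes that id 0 is the padding token (which also starts the decoder output) and id 1 is the end-of-sequence token, as the first and last positions of the tensor suggest. The real tokenizer removes every id it has registered as special, not only these two.

```python
# Assumption: 0 = <pad> / decoder start, 1 = </s> (end of sequence).
SPECIAL_IDS = {0, 1}

# Generated ids copied from the summary[0] tensor printed above.
summary_ids = [0, 139, 1077, 1494, 1972, 113, 109, 2776,
               148, 1562, 204, 109, 289, 1902, 107, 1]

# Keep only the ids that correspond to actual summary text.
content_ids = [i for i in summary_ids if i not in SPECIAL_IDS]
print(len(summary_ids), len(content_ids))
```

Only the remaining ids are mapped back to subword pieces and joined into the final summary string.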