# Natural Language Processing (NLP) Approaches for Text Summarization

NLP offers two main approaches to text summarization: extractive and abstractive. \
**Extractive summarization** scores the sentences of the original text and selects the highest-scoring ones verbatim. \
**Abstractive summarization** generates new sentences that preserve the main ideas of the original text.


# Extractive Summarization

## Install

In [1]:
# !pip install nltk
# !pip install numpy
# !pip install pandas
# !pip install transformers
# !pip install sentencepiece

## Import Libraries

In [2]:
import nltk
import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from heapq import nlargest

## Load Text

In [3]:
text = "The old house at the end of the street had always been shrouded in mystery. The rumors were that it was once owned by a powerful witch who practiced dark magic. But most people dismissed it as a silly superstition. One night, a group of curious teenagers decided to break into the house to see if the rumors were true. As they searched the dusty rooms, they found strange symbols etched into the walls and floor. Suddenly, they heard a cackling laughter coming from somewhere deep within the house. As they followed the sound, they stumbled upon a secret room filled with potions, herbs, and ancient books on witchcraft. Suddenly, the door slammed shut behind them, and they were trapped. One by one, the teenagers began to disappear, taken by an unseen force. They could hear whispers and chants coming from the shadows, and they knew they were not alone. In a desperate attempt to escape, the remaining teenagers tried to break down the door, but it was too strong. As they huddled in the corner, they saw a figure emerge from the darkness. It was the witch, more terrifying than they could have ever imagined. With a wave of her hand, she cast a spell that sent them all into a deep sleep. When they awoke, they were outside the house, unharmed but forever changed by their encounter with the witch. From that day forward, they never spoke of what happened in the old house, but they could never forget the feeling of dread that came with the whispers of witchcraft. They knew that the witch was still out there, waiting for her next victims to wander into her grasp."

In [4]:
text

'The old house at the end of the street had always been shrouded in mystery. The rumors were that it was once owned by a powerful witch who practiced dark magic. But most people dismissed it as a silly superstition. One night, a group of curious teenagers decided to break into the house to see if the rumors were true. As they searched the dusty rooms, they found strange symbols etched into the walls and floor. Suddenly, they heard a cackling laughter coming from somewhere deep within the house. As they followed the sound, they stumbled upon a secret room filled with potions, herbs, and ancient books on witchcraft. Suddenly, the door slammed shut behind them, and they were trapped. One by one, the teenagers began to disappear, taken by an unseen force. They could hear whispers and chants coming from the shadows, and they knew they were not alone. In a desperate attempt to escape, the remaining teenagers tried to break down the door, but it was too strong. As they huddled in the corner, 

## Text Preprocessing

In [5]:
# nltk.download('stopwords')

In [6]:
# nltk.download('punkt')

#### Removing the stopwords and punctuation

In [7]:
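stop_words = set(stopwords.words('english'))
# Lowercase before tokenizing so tokens match the lowercase stopword list
words = word_tokenize(text.lower())
# Keep alphanumeric tokens that are not stopwords (this drops punctuation tokens)
filtered_words = [word for word in words if word not in stop_words and word.isalnum()]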

## Sentence Score

In [8]:
sentences = sent_tokenize(text)
filtered_set = set(filtered_words)  # a set gives fast membership tests
sentence_scores = {}
# A sentence's score is the number of its tokens that are content words
for sentence in sentences:
    for word in word_tokenize(sentence.lower()):
        if word in filtered_set:
            sentence_scores[sentence] = sentence_scores.get(sentence, 0) + 1

In [9]:
sentences

['The old house at the end of the street had always been shrouded in mystery.',
 'The rumors were that it was once owned by a powerful witch who practiced dark magic.',
 'But most people dismissed it as a silly superstition.',
 'One night, a group of curious teenagers decided to break into the house to see if the rumors were true.',
 'As they searched the dusty rooms, they found strange symbols etched into the walls and floor.',
 'Suddenly, they heard a cackling laughter coming from somewhere deep within the house.',
 'As they followed the sound, they stumbled upon a secret room filled with potions, herbs, and ancient books on witchcraft.',
 'Suddenly, the door slammed shut behind them, and they were trapped.',
 'One by one, the teenagers began to disappear, taken by an unseen force.',
 'They could hear whispers and chants coming from the shadows, and they knew they were not alone.',
 'In a desperate attempt to escape, the remaining teenagers tried to break down the door, but it was to

In [10]:
len(sentence_scores)

17

## High Score Sentences

In [11]:
sentence_scores

{'The old house at the end of the street had always been shrouded in mystery.': 7,
 'The rumors were that it was once owned by a powerful witch who practiced dark magic.': 7,
 'But most people dismissed it as a silly superstition.': 4,
 'One night, a group of curious teenagers decided to break into the house to see if the rumors were true.': 11,
 'As they searched the dusty rooms, they found strange symbols etched into the walls and floor.': 9,
 'Suddenly, they heard a cackling laughter coming from somewhere deep within the house.': 9,
 'As they followed the sound, they stumbled upon a secret room filled with potions, herbs, and ancient books on witchcraft.': 12,
 'Suddenly, the door slammed shut behind them, and they were trapped.': 6,
 'One by one, the teenagers began to disappear, taken by an unseen force.': 8,
 'They could hear whispers and chants coming from the shadows, and they knew they were not alone.': 8,
 'In a desperate attempt to escape, the remaining teenagers tried to br
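
The scoring loop above adds one point per content word, so longer sentences tend to score higher regardless of how central their words are. One possible variant, which is not part of this notebook, weights each word by how often it occurs in the whole text and divides by the number of content words in the sentence. The sketch below is self-contained: `STOP_WORDS`, `content_words`, and `score_sentences` are hypothetical names, and the small stopword set and regex-based splitting are assumptions standing in for NLTK's `stopwords`, `word_tokenize`, and `sent_tokenize`.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword set and regex tokenization stand in
# for the NLTK resources used in the notebook, so the sketch runs offline.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "it", "was", "is", "on", "at", "by"}

def content_words(s):
    """Return lowercase alphanumeric tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", s.lower()) if w not in STOP_WORDS]

def score_sentences(document):
    """Map each sentence to the mean document frequency of its content words."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    freq = Counter(content_words(document))
    scores = {}
    for sentence in sentences:
        words = content_words(sentence)
        if words:  # skip sentences that contain no content words
            scores[sentence] = sum(freq[w] for w in words) / len(words)
    return scores

demo = "The witch lived in the house. The house was old. Nobody visited."
demo_scores = score_sentences(demo)
```

With this weighting, a short sentence built from frequently repeated words can outrank a long sentence of rare words, which is the opposite bias from the raw count.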

## Print Summarization

In [12]:
num_sentences = 5
# nlargest returns the top-scoring sentences in descending score order
summary_sentences = nlargest(num_sentences, sentence_scores, key=sentence_scores.get)
summary = ' '.join(summary_sentences)
print('Original\n', text)
print('\nSummary\n', summary)

Original
 The old house at the end of the street had always been shrouded in mystery. The rumors were that it was once owned by a powerful witch who practiced dark magic. But most people dismissed it as a silly superstition. One night, a group of curious teenagers decided to break into the house to see if the rumors were true. As they searched the dusty rooms, they found strange symbols etched into the walls and floor. Suddenly, they heard a cackling laughter coming from somewhere deep within the house. As they followed the sound, they stumbled upon a secret room filled with potions, herbs, and ancient books on witchcraft. Suddenly, the door slammed shut behind them, and they were trapped. One by one, the teenagers began to disappear, taken by an unseen force. They could hear whispers and chants coming from the shadows, and they knew they were not alone. In a desperate attempt to escape, the remaining teenagers tried to break down the door, but it was too strong. As they huddled in the
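
`nlargest` returns sentences ordered by score, so the joined summary can present events out of sequence. A minimal sketch of one way to restore document order is shown below; `summarize_in_order` and the toy data are hypothetical and not part of the notebook.

```python
from heapq import nlargest

def summarize_in_order(sentences, scores, k):
    """Pick the k highest-scoring sentences, then emit them in document order."""
    top = set(nlargest(k, scores, key=scores.get))
    return " ".join(s for s in sentences if s in top)

# Hypothetical toy data, not the notebook's text.
toy_sentences = ["A came first.", "B came second.", "C came third."]
toy_scores = {"A came first.": 2, "B came second.": 1, "C came third.": 5}
toy_summary = summarize_in_order(toy_sentences, toy_scores, 2)
```

In the notebook, the same idea would apply with `sentences`, `sentence_scores`, and `num_sentences` as the arguments.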

# Abstractive Summarization

## Import

In [13]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

## Load the T5 tokenizer and model

In [14]:
tokenizer = T5Tokenizer.from_pretrained('t5-large')
model = T5ForConditionalGeneration.from_pretrained('t5-large')

For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


## Generate a summary of the text

In [15]:
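# T5 expects a task prefix; input beyond 512 tokens is truncated
inputs = tokenizer.encode("summarize: " + text, return_tensors='pt', max_length=512, truncation=True)
# Beam search with 4 beams; the summary is constrained to 40-150 tokens
# (early_stopping=True is left disabled here)
outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)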
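
Because the `encode` call truncates at 512 tokens, any text past that point never reaches the model. One possible workaround, not used in this notebook, is to split a long document into chunks and summarize each chunk separately. The sketch below covers only the chunking step. `chunk_sentences` and `max_words` are hypothetical names, and whitespace word counts are an assumption used as a rough proxy for T5 token counts, which are usually higher.

```python
import re

def chunk_sentences(document, max_words=300):
    """Group consecutive sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

demo_chunks = chunk_sentences("One two three. Four five. Six seven eight nine.", max_words=5)
```

Each chunk could then go through the same `tokenizer.encode` and `model.generate` calls, with the partial summaries joined afterwards.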

## Print Summarization

In [16]:
print("Original Document:")
print(text)
print("\nGenerated Summary:")
print(summary)

Original Document:
The old house at the end of the street had always been shrouded in mystery. The rumors were that it was once owned by a powerful witch who practiced dark magic. But most people dismissed it as a silly superstition. One night, a group of curious teenagers decided to break into the house to see if the rumors were true. As they searched the dusty rooms, they found strange symbols etched into the walls and floor. Suddenly, they heard a cackling laughter coming from somewhere deep within the house. As they followed the sound, they stumbled upon a secret room filled with potions, herbs, and ancient books on witchcraft. Suddenly, the door slammed shut behind them, and they were trapped. One by one, the teenagers began to disappear, taken by an unseen force. They could hear whispers and chants coming from the shadows, and they knew they were not alone. In a desperate attempt to escape, the remaining teenagers tried to break down the door, but it was too strong. As they huddl