In [1]:
## input text article
article_text="A litany of text summarization methods have been developed over the last several decades, so answering how text summarization works doesn’t have a single answer. This having been said, these methods can be classified according to their general approaches in addressing the challenge of text summarization. Perhaps the most clear-cut and helpful distinction is that between Extractive and Abstractive text summarization methods. Extractive methods seek to extract the most pertinent information from a text. Extractive text summarization is the more traditional of the two methods, in part because of their relative simplicity compared to abstractive methods. Abstractive methods instead seek to generate a novel body of text that accurately summarizes the original text. Already we can see how this is a more difficult problem - there is a significant degree of freedom in not being limited to simply returning a subset of the original text. This difficulty comes with an upside, though. Despite their relative complexity, Abstractive methods produce much more flexible and arguably faithful summaries, especially in the age of Large Language Models. "

## Import Modules

In [2]:
import re
import nltk
nltk.download('punkt')       # on NLTK 3.9+, sent_tokenize/word_tokenize also need nltk.download('punkt_tab')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\hp\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\hp\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

## Data Preprocessing

In [3]:
article_text = article_text.lower()
article_text

'a litany of text summarization methods have been developed over the last several decades, so answering how text summarization works doesn’t have a single answer. this having been said, these methods can be classified according to their general approaches in addressing the challenge of text summarization. perhaps the most clear-cut and helpful distinction is that between extractive and abstractive text summarization methods. extractive methods seek to extract the most pertinent information from a text. extractive text summarization is the more traditional of the two methods, in part because of their relative simplicity compared to abstractive methods. abstractive methods instead seek to generate a novel body of text that accurately summarizes the original text. already we can see how this is a more difficult problem - there is a significant degree of freedom in not being limited to simply returning a subset of the original text. this difficulty comes with an upside, though. despite the

In [4]:
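# replace every non-letter (punctuation, digits, apostrophes) with a space,
# then collapse runs of whitespace into a single space
clean_text = re.sub(r'[^a-zA-Z]', ' ', article_text)
clean_text = re.sub(r'\s+', ' ', clean_text)
clean_text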
clean_text

'a litany of text summarization methods have been developed over the last several decades so answering how text summarization works doesn t have a single answer this having been said these methods can be classified according to their general approaches in addressing the challenge of text summarization perhaps the most clear cut and helpful distinction is that between extractive and abstractive text summarization methods extractive methods seek to extract the most pertinent information from a text extractive text summarization is the more traditional of the two methods in part because of their relative simplicity compared to abstractive methods abstractive methods instead seek to generate a novel body of text that accurately summarizes the original text already we can see how this is a more difficult problem there is a significant degree of freedom in not being limited to simply returning a subset of the original text this difficulty comes with an upside though despite their relative co
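To see what the two substitutions do, here is a minimal sketch on a made-up string. The function name `clean` is hypothetical, and the trailing `.strip()` is an addition the notebook does not make:

```python
import re

def clean(text):
    # lowercase, turn every non-letter into a space, collapse whitespace runs
    text = text.lower()
    text = re.sub(r'[^a-z]', ' ', text)
    return re.sub(r'\s+', ' ', text).strip()

print(clean("Clear-cut, doesn’t it? 42 ways."))  # → clear cut doesn t it ways
```

The apostrophe in `doesn’t` becomes a space, which is why the cleaned article above contains `doesn t`.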

In [5]:
# split the original (punctuated) text into sentences; clean_text has no periods left to split on
sentence_list = nltk.sent_tokenize(article_text)
sentence_list

['a litany of text summarization methods have been developed over the last several decades, so answering how text summarization works doesn’t have a single answer.',
 'this having been said, these methods can be classified according to their general approaches in addressing the challenge of text summarization.',
 'perhaps the most clear-cut and helpful distinction is that between extractive and abstractive text summarization methods.',
 'extractive methods seek to extract the most pertinent information from a text.',
 'extractive text summarization is the more traditional of the two methods, in part because of their relative simplicity compared to abstractive methods.',
 'abstractive methods instead seek to generate a novel body of text that accurately summarizes the original text.',
 'already we can see how this is a more difficult problem - there is a significant degree of freedom in not being limited to simply returning a subset of the original text.',
 'this difficulty comes with a
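Sentence boundaries depend on the punctuation that `clean_text` has already thrown away, which is why `sent_tokenize` is applied to `article_text`. The sketch below makes the point with a naive regex splitter; `naive_sentences` is hypothetical and far cruder than NLTK's Punkt model (for example, it would also split after an abbreviation such as "e.g."):

```python
import re

def naive_sentences(text):
    # split wherever '.', '!' or '?' is followed by whitespace
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

raw = "extractive methods select. abstractive methods generate."
cleaned = re.sub(r'\s+', ' ', re.sub(r'[^a-z]', ' ', raw)).strip()

print(naive_sentences(raw))      # ['extractive methods select.', 'abstractive methods generate.']
print(naive_sentences(cleaned))  # ['extractive methods select abstractive methods generate']
```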

In [6]:
## stopwords were already downloaded in In [2]; uncomment only if that cell was skipped
# import nltk
# nltk.download('stopwords')

## Word Frequencies

In [7]:
# NLTK's English stopword list; a set makes each membership test O(1)
stopwords = set(nltk.corpus.stopwords.words('english'))

# count how often each non-stopword occurs in the cleaned text
word_frequencies = {}
for word in nltk.word_tokenize(clean_text):
    if word not in stopwords:
        word_frequencies[word] = word_frequencies.get(word, 0) + 1
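The counting loop is equivalent to building a `collections.Counter` over the non-stopword tokens. A minimal dependency-free sketch, assuming a tiny hypothetical `STOPWORDS` set in place of NLTK's list and `str.split()` in place of `nltk.word_tokenize`:

```python
from collections import Counter

# Hypothetical stand-in for nltk.corpus.stopwords.words('english'), which is much longer.
STOPWORDS = {"a", "the", "of", "is", "to", "and", "from"}

def word_counts(clean_text):
    # str.split() replaces nltk.word_tokenize to keep the sketch dependency-free
    return Counter(w for w in clean_text.split() if w not in STOPWORDS)

print(word_counts("extract the text from a text"))  # → Counter({'text': 2, 'extract': 1})
```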

In [8]:
# scale counts into (0, 1] by dividing by the count of the most frequent word
maximum_frequency = max(word_frequencies.values())

for word in word_frequencies:
    word_frequencies[word] = word_frequencies[word] / maximum_frequency
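The scaling is one division per word. In the printed `word_frequencies` (In [10]), `text` has weight 1.0, `methods` 0.889 and `litany` 0.111, consistent with `text` being the most frequent word (9 occurrences), `methods` occurring 8 times and `litany` once. A minimal sketch of the same step as a dict comprehension (`normalize` is a hypothetical name):

```python
def normalize(counts):
    # divide every count by the largest count, so the most frequent word maps to 1.0
    top = max(counts.values())
    return {word: count / top for word, count in counts.items()}

weights = normalize({"text": 9, "methods": 8, "litany": 1})
print(weights)  # → {'text': 1.0, 'methods': 0.8888888888888888, 'litany': 0.1111111111111111}
```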

## Calculate Sentence Scores

In [9]:
sentence_scores = {}

for sentence in sentence_list:
    # only sentences with fewer than 30 space-separated words are scored
    if len(sentence.split(' ')) >= 30:
        continue
    for word in nltk.word_tokenize(sentence):
        if word in word_frequencies:
            # a sentence's score is the sum of the normalized frequencies of its words
            sentence_scores[sentence] = sentence_scores.get(sentence, 0) + word_frequencies[word]
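Two details of this cell show up in the output of In [11]. The 32-word sentence beginning "already we can see" has no score because of the length filter. Also, `nltk.word_tokenize` keeps a hyphenated word such as `clear-cut` as one token, which is not a key of `word_frequencies` (built from `clean_text`, where the hyphen became a space): the printed score of 3.556 for the sentence beginning "perhaps the most clear-cut" equals 32/9, the sum without `clear` and `cut`. The sketch below is a variant, not the notebook's method: it tokenizes each sentence with the same letters-only rule used for `clean_text`, so hyphenated words do count. `score_sentences` and its `max_words` parameter are hypothetical.

```python
import re

def score_sentences(sentences, freqs, max_words=30):
    scores = {}
    for sentence in sentences:
        # same length filter as the notebook: skip sentences with max_words or more words
        if len(sentence.split(' ')) >= max_words:
            continue
        # letters-only tokens, matching how clean_text was built ("clear-cut" -> "clear", "cut")
        weights = [freqs[tok] for tok in re.findall(r'[a-z]+', sentence.lower()) if tok in freqs]
        if weights:  # like the notebook, sentences with no known words get no entry
            scores[sentence] = sum(weights)
    return scores

freqs = {"text": 1.0, "methods": 0.5, "clear": 0.25, "cut": 0.25}
sentences = [
    "text methods are clear-cut.",
    "nothing relevant here.",
    " ".join(["text"] * 30),  # 30 words: removed by the length filter
]
print(score_sentences(sentences, freqs))  # → {'text methods are clear-cut.': 2.0}
```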

In [10]:
word_frequencies

{'litany': 0.1111111111111111,
 'text': 1.0,
 'summarization': 0.5555555555555556,
 'methods': 0.8888888888888888,
 'developed': 0.1111111111111111,
 'last': 0.1111111111111111,
 'several': 0.1111111111111111,
 'decades': 0.1111111111111111,
 'answering': 0.1111111111111111,
 'works': 0.1111111111111111,
 'single': 0.1111111111111111,
 'answer': 0.1111111111111111,
 'said': 0.1111111111111111,
 'classified': 0.1111111111111111,
 'according': 0.1111111111111111,
 'general': 0.1111111111111111,
 'approaches': 0.1111111111111111,
 'addressing': 0.1111111111111111,
 'challenge': 0.1111111111111111,
 'perhaps': 0.1111111111111111,
 'clear': 0.1111111111111111,
 'cut': 0.1111111111111111,
 'helpful': 0.1111111111111111,
 'distinction': 0.1111111111111111,
 'extractive': 0.3333333333333333,
 'abstractive': 0.4444444444444444,
 'seek': 0.2222222222222222,
 'extract': 0.1111111111111111,
 'pertinent': 0.1111111111111111,
 'information': 0.1111111111111111,
 'traditional': 0.1111111111111111,
 '

In [11]:
sentence_scores

{'a litany of text summarization methods have been developed over the last several decades, so answering how text summarization works doesn’t have a single answer.': 4.999999999999998,
 'this having been said, these methods can be classified according to their general approaches in addressing the challenge of text summarization.': 3.2222222222222223,
 'perhaps the most clear-cut and helpful distinction is that between extractive and abstractive text summarization methods.': 3.555555555555556,
 'extractive methods seek to extract the most pertinent information from a text.': 2.7777777777777777,
 'extractive text summarization is the more traditional of the two methods, in part because of their relative simplicity compared to abstractive methods.': 4.888888888888889,
 'abstractive methods instead seek to generate a novel body of text that accurately summarizes the original text.': 4.444444444444445,
 'this difficulty comes with an upside, though.': 0.4444444444444444,
 'despite their rel

## Text Summarization

In [12]:
# pick the 5 highest-scoring sentences; nlargest returns them in descending score order
import heapq
summary = heapq.nlargest(5, sentence_scores, key=sentence_scores.get)

print(" ".join(summary))

a litany of text summarization methods have been developed over the last several decades, so answering how text summarization works doesn’t have a single answer. extractive text summarization is the more traditional of the two methods, in part because of their relative simplicity compared to abstractive methods. abstractive methods instead seek to generate a novel body of text that accurately summarizes the original text. perhaps the most clear-cut and helpful distinction is that between extractive and abstractive text summarization methods. this having been said, these methods can be classified according to their general approaches in addressing the challenge of text summarization.
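Because `heapq.nlargest` returns sentences in descending score order, the summary above follows the scores (4.99, 4.89, 4.44, 3.56, 3.22) rather than the flow of the article. If the summary should read in the original order, a minimal variant (the `summarize` helper is hypothetical) selects the top sentences by score and then emits them in their `sentence_list` order:

```python
import heapq

def summarize(sentences, scores, n=5):
    # choose the n highest-scoring sentences...
    top = set(heapq.nlargest(n, scores, key=scores.get))
    # ...but join them in the order they appear in the article
    return " ".join(s for s in sentences if s in top)

sentences = ["first.", "second.", "third.", "fourth."]
scores = {"first.": 1.0, "second.": 3.0, "third.": 2.0, "fourth.": 0.5}
print(summarize(sentences, scores, n=2))  # → second. third.
```

In the notebook this would be called as `summarize(sentence_list, sentence_scores)`; sentences skipped by the length filter are never selected because they have no score.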
