# **Text Summarization using NLP**


**What is text summarization?**

Text summarization is the process of distilling the most important information from a source text into a shorter version.

**Why automatic text summarization?**



1.   Summaries reduce reading time.
2.   When researching documents, summaries make the selection process easier.
3.   Automatic summarization improves the effectiveness of indexing.
4.   Automatic summarization algorithms can be less biased than human summarizers.
5.   Personalized summaries are useful in question-answering systems because they provide personalized information.
6.   Automatic or semi-automatic summarization systems let commercial abstracting services process more text documents.





# **Types of summarization**

![Types of summarization](https://drive.google.com/uc?id=1AqwSGEpi3vzAOLVt_5XXRXokZHvcn43B)



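**How to do text summarization**

The approach used in this notebook is extractive: it scores the sentences of the source text and keeps the highest-scoring ones instead of generating new sentences.

*   Text cleaning
*   Sentence tokenization
*   Word tokenization
*   Word-frequency table
*   Sentence scoring
*   Summarization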
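The steps above can be sketched end to end in plain Python. This is a minimal, stdlib-only illustration, not the spaCy code used in this notebook: it swaps in a naive regex sentence splitter, a regex word tokenizer and a tiny hypothetical stop-word set `STOP` for spaCy's parser and `STOP_WORDS`. The function name `summarize` is also hypothetical.

```python
import re
from collections import Counter
from heapq import nlargest

# Hypothetical, deliberately tiny stop-word set (spaCy's STOP_WORDS is far larger).
STOP = {"a", "an", "the", "is", "are", "was", "of", "to", "and", "in", "on",
        "he", "his", "him", "it", "who", "which", "that", "for", "this"}


def summarize(text, ratio=0.3):
    # Sentence tokenization: naive split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word tokenization + cleaning: lowercase letter runs, minus stop words.
    def words(sentence):
        return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP]

    # Word-frequency table, normalized by the most frequent word.
    freq = Counter(w for s in sentences for w in words(s))
    if not freq:
        return ""
    top = max(freq.values())
    weights = {w: c / top for w, c in freq.items()}

    # Sentence scoring: sum of the normalized weights of the sentence's words.
    scores = {i: sum(weights[w] for w in words(s)) for i, s in enumerate(sentences)}

    # Summarization: keep the best-scoring sentences (at least one),
    # then emit them in their original order.
    k = max(1, int(len(sentences) * ratio))
    best = nlargest(k, scores, key=scores.get)
    return " ".join(sentences[i] for i in sorted(best))
```

For example, `summarize("Cats purr. Dogs bark loudly. Cats and dogs play. Birds sing.")` keeps one of four sentences and returns `"Cats and dogs play."`, the sentence containing both of the most frequent words.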
 
 

**Text variable**

The input is a short plot summary, stored in the string `text`.

In [65]:
text = '''Santiago is a Shepherd who has a recurring dream which is supposedly prophetic. Inspired on learning this, he undertakes a journey to Egypt to discover the meaning of life and fulfill his destiny. During the course of his travels, he learns of his true purpose and meets many characters, including an “Alchemist”, that teach him valuable lessons about achieving his dreams. Santiago sets his sights on obtaining a certain kind of “treasure” for which he travels to Egypt. The key message is, “when you want something, all the universe conspires in helping you to achieve it.” Towards the final arc, Santiago gets robbed by bandits who end up revealing that the “treasure” he was looking for is buried in the place where his journey began. The end.'''



# Let's Get Started with spaCy

In [66]:
# !pip install -U spacy
# !python -m spacy download en_core_web_sm

In [67]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [68]:
stopwords = list(STOP_WORDS)

In [69]:
!python -m spacy download en_core_web_sm

In [70]:
# !pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz
nlp = spacy.load('en_core_web_sm')

In [71]:
doc = nlp(text)

In [72]:
tokens = [token.text for token in doc]
print(tokens)

['Santiago', 'is', 'a', 'Shepherd', 'who', 'has', 'a', 'recurring', 'dream', 'which', 'is', 'supposedly', 'prophetic', '.', 'Inspired', 'on', 'learning', 'this', ',', 'he', 'undertakes', 'a', 'journey', 'to', 'Egypt', 'to', 'discover', 'the', 'meaning', 'of', 'life', 'and', 'fulfill', 'his', 'destiny', '.', 'During', 'the', 'course', 'of', 'his', 'travels', ',', 'he', 'learns', 'of', 'his', 'true', 'purpose', 'and', 'meets', 'many', 'characters', ',', 'including', 'an', '“', 'Alchemist', '”', ',', 'that', 'teach', 'him', 'valuable', 'lessons', 'about', 'achieving', 'his', 'dreams', '.', 'Santiago', 'sets', 'his', 'sights', 'on', 'obtaining', 'a', 'certain', 'kind', 'of', '“', 'treasure', '”', 'for', 'which', 'he', 'travels', 'to', 'Egypt', '.', 'The', 'key', 'message', 'is', ',', '“', 'when', 'you', 'want', 'something', ',', 'all', 'the', 'universe', 'conspires', 'in', 'helping', 'you', 'to', 'achieve', 'it', '.', '”', 'Towards', 'the', 'final', 'arc', ',', 'Santiago', 'gets', 'robbed'

In [73]:
punctuation = punctuation + '\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'
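`string.punctuation` contains only ASCII punctuation, and `x not in punctuation` is a substring test against that string. Typographic quotes such as “ and ” therefore pass the filter, which is why they appear as words in the word-frequency table. One possible alternative, sketched here as an assumption rather than the notebook's method, is to treat a token as punctuation when every character belongs to a Unicode punctuation category (`P*`); spaCy tokens also expose a `Token.is_punct` flag that can be used directly. The helper name `is_punct_token` is hypothetical.

```python
import string
import unicodedata


def is_punct_token(tok):
    """True if the token is non-empty and every character is Unicode punctuation (P*).

    Whitespace such as '\n' is category Cc, not P*, so it must be filtered separately.
    """
    return bool(tok) and all(unicodedata.category(ch).startswith("P") for ch in tok)


# string.punctuation is ASCII-only, so curly quotes slip through a membership test.
print("“" in string.punctuation)   # False
print(is_punct_token("“"))         # True
print(is_punct_token("..."))       # True
print(is_punct_token("Egypt"))     # False
```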

In [74]:
word_frequencies = {}
for word in doc:
  if word.text.lower() not in stopwords:
    if word.text.lower() not in punctuation:
      if word.text not in word_frequencies.keys():
        word_frequencies[word.text] = 1
      else:
        word_frequencies[word.text] += 1
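This loop stores keys in their original case (`word.text`), while the sentence-scoring cell later looks words up with `word.text.lower()`. Capitalized words such as "Santiago" and "Egypt" are therefore counted here but never contribute to a sentence score. A minimal sketch of a case-consistent table, assuming the tokens are available as a list of strings (in spaCy, `[t.text for t in doc]`) and treating a token as punctuation when it contains no alphanumeric character; `build_frequencies` is a hypothetical name.

```python
from collections import Counter


def build_frequencies(tokens, stopwords):
    """Count lowercased tokens, skipping stop words and tokens with no letters or digits."""
    counts = Counter()
    for tok in tokens:
        key = tok.lower()  # one case for counting and for every later lookup
        if key in stopwords:
            continue
        if not any(ch.isalnum() for ch in key):  # drops '.', ',', '“', '”', '\n'
            continue
        counts[key] += 1
    return counts
```

Because the keys are lowercased here, a later lookup with `.lower()` finds every counted word, including proper nouns.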

In [75]:
print(word_frequencies)

{'Santiago': 3, 'Shepherd': 1, 'recurring': 1, 'dream': 1, 'supposedly': 1, 'prophetic': 1, 'Inspired': 1, 'learning': 1, 'undertakes': 1, 'journey': 2, 'Egypt': 2, 'discover': 1, 'meaning': 1, 'life': 1, 'fulfill': 1, 'destiny': 1, 'course': 1, 'travels': 2, 'learns': 1, 'true': 1, 'purpose': 1, 'meets': 1, 'characters': 1, 'including': 1, '“': 4, 'Alchemist': 1, '”': 4, 'teach': 1, 'valuable': 1, 'lessons': 1, 'achieving': 1, 'dreams': 1, 'sets': 1, 'sights': 1, 'obtaining': 1, 'certain': 1, 'kind': 1, 'treasure': 2, 'key': 1, 'message': 1, 'want': 1, 'universe': 1, 'conspires': 1, 'helping': 1, 'achieve': 1, 'final': 1, 'arc': 1, 'gets': 1, 'robbed': 1, 'bandits': 1, 'end': 2, 'revealing': 1, 'looking': 1, 'buried': 1, 'place': 1, 'began': 1}


In [76]:
max_frequency = max(word_frequencies.values())

In [77]:
max_frequency

4

In [78]:
for word in word_frequencies.keys():
  word_frequencies[word] = word_frequencies[word]/max_frequency
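The loop divides every raw count $f(t)$ by the largest count in the table, so every weight falls in the interval $(0, 1]$:

$$w(t) = \frac{f(t)}{\max_{t'} f(t')}$$

Here $\max_{t'} f(t') = 4$, reached by the curly quotation marks “ and ”, so for example $w(\text{Santiago}) = 3/4 = 0.75$ and $w(\text{journey}) = 2/4 = 0.5$, matching the normalized table printed by the next cell.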

In [79]:
print(word_frequencies)

{'Santiago': 0.75, 'Shepherd': 0.25, 'recurring': 0.25, 'dream': 0.25, 'supposedly': 0.25, 'prophetic': 0.25, 'Inspired': 0.25, 'learning': 0.25, 'undertakes': 0.25, 'journey': 0.5, 'Egypt': 0.5, 'discover': 0.25, 'meaning': 0.25, 'life': 0.25, 'fulfill': 0.25, 'destiny': 0.25, 'course': 0.25, 'travels': 0.5, 'learns': 0.25, 'true': 0.25, 'purpose': 0.25, 'meets': 0.25, 'characters': 0.25, 'including': 0.25, '“': 1.0, 'Alchemist': 0.25, '”': 1.0, 'teach': 0.25, 'valuable': 0.25, 'lessons': 0.25, 'achieving': 0.25, 'dreams': 0.25, 'sets': 0.25, 'sights': 0.25, 'obtaining': 0.25, 'certain': 0.25, 'kind': 0.25, 'treasure': 0.5, 'key': 0.25, 'message': 0.25, 'want': 0.25, 'universe': 0.25, 'conspires': 0.25, 'helping': 0.25, 'achieve': 0.25, 'final': 0.25, 'arc': 0.25, 'gets': 0.25, 'robbed': 0.25, 'bandits': 0.25, 'end': 0.5, 'revealing': 0.25, 'looking': 0.25, 'buried': 0.25, 'place': 0.25, 'began': 0.25}


In [80]:
sentence_tokens = list(doc.sents)
print(sentence_tokens)

[Santiago is a Shepherd who has a recurring dream which is supposedly prophetic., Inspired on learning this, he undertakes a journey to Egypt to discover the meaning of life and fulfill his destiny., During the course of his travels, he learns of his true purpose and meets many characters, including an “Alchemist”, that teach him valuable lessons about achieving his dreams., Santiago sets his sights on obtaining a certain kind of “treasure” for which he travels to Egypt., The key message is, “when you want something, all the universe conspires in helping you to achieve it.”, Towards the final arc, Santiago gets robbed by bandits who end up revealing that the “treasure” he was looking for is buried in the place where his journey began., The end.]


In [81]:
sentence_scores = {}
for sent in sentence_tokens:
  for word in sent:
    if word.text.lower() in word_frequencies.keys():
      if sent not in sentence_scores.keys():
        sentence_scores[sent] = word_frequencies[word.text.lower()]
      else:
        sentence_scores[sent] += word_frequencies[word.text.lower()]
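The loop above adds up the normalized weight of every word in a sentence. For this to work, the lookup key has to be normalized exactly the way the frequency table was built. A minimal sketch that lowercases on the lookup side, assuming a weight table whose keys are already lowercase and sentences given as lists of token strings; `score_sentences` is a hypothetical name, and unlike the notebook it returns a score (possibly 0.0) for every sentence.

```python
def score_sentences(sentences, weights):
    """Sum the normalized word weights of each sentence.

    sentences: list of token lists; weights: dict keyed by lowercased word.
    Tokens missing from the table contribute 0. Returns scores aligned with sentences.
    """
    return [sum(weights.get(tok.lower(), 0.0) for tok in sent) for sent in sentences]


weights = {"santiago": 0.75, "egypt": 0.5, "journey": 0.5}
sentences = [["Santiago", "travels", "to", "Egypt", "."], ["The", "end", "."]]
print(score_sentences(sentences, weights))  # [1.25, 0.0]
```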


In [82]:
sentence_scores

{Santiago is a Shepherd who has a recurring dream which is supposedly prophetic.: 1.0,
 Inspired on learning this, he undertakes a journey to Egypt to discover the meaning of life and fulfill his destiny.: 2.25,
 During the course of his travels, he learns of his true purpose and meets many characters, including an “Alchemist”, that teach him valuable lessons about achieving his dreams.: 5.5,
 Santiago sets his sights on obtaining a certain kind of “treasure” for which he travels to Egypt.: 4.25,
 The key message is, “when you want something, all the universe conspires in helping you to achieve it.”: 3.75,
 Towards the final arc, Santiago gets robbed by bandits who end up revealing that the “treasure” he was looking for is buried in the place where his journey began.: 6.0,
 The end.: 0.5}
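Each score is the sum of the weights of the sentence's tokens, with $w = 0$ for any lowercased token missing from the table:

$$S(s) = \sum_{t \in s} w(\mathrm{lower}(t))$$

Breaking down the two highest scores shows where the points come from. In the third sentence ("During the course of his travels…"), the two curly quotes contribute 1.0 each, "travels" 0.5 and twelve further words 0.25 each; "Alchemist" adds nothing because its table key is capitalized while the lookup is lowercase:

$$S_3 = 2 \times 1.0 + 0.5 + 12 \times 0.25 = 5.5$$

The sixth sentence follows the same pattern, with "end", "treasure" and "journey" at 0.5 and ten words at 0.25:

$$S_6 = 2 \times 1.0 + 3 \times 0.5 + 10 \times 0.25 = 6.0$$

In both sentences, 2.0 of the score comes from quotation marks rather than from words.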

In [83]:
from heapq import nlargest

In [84]:
select_length = int(len(sentence_tokens)*0.3)
select_length

2
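With seven sentences, `int(7 * 0.3)` truncates 2.1 to 2. For a text of three sentences or fewer, the same expression yields 0 and `nlargest` would return an empty summary. A small guard keeps at least one sentence; `summary_length` is a hypothetical helper.

```python
def summary_length(n_sentences, ratio=0.3):
    """Number of sentences to keep: a fraction of the input, but at least one."""
    if n_sentences == 0:
        return 0
    return max(1, int(n_sentences * ratio))


print(summary_length(7))   # 2, same as the notebook
print(summary_length(3))   # 1 instead of 0
```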

In [85]:
summary = nlargest(select_length, sentence_scores, key=sentence_scores.get)

In [86]:
summary

[Towards the final arc, Santiago gets robbed by bandits who end up revealing that the “treasure” he was looking for is buried in the place where his journey began.,
 During the course of his travels, he learns of his true purpose and meets many characters, including an “Alchemist”, that teach him valuable lessons about achieving his dreams.]
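`nlargest` returns the sentences in descending score order, so the final-arc sentence comes before the earlier sentence about the characters Santiago meets. If the summary should follow the original reading order, the selected sentences can be sorted by position. With spaCy spans, `Span.start` (the index of the span's first token) can serve as that key, e.g. `sorted(summary, key=lambda s: s.start)`. The stdlib-only sketch below applies the same idea to a plain list of sentences; `top_k_in_order` is a hypothetical name.

```python
from heapq import nlargest


def top_k_in_order(sentences, scores, k):
    """Pick the k highest-scoring sentences, then restore their original order.

    sentences: list of strings; scores: list of floats aligned with sentences.
    """
    best = nlargest(k, range(len(sentences)), key=lambda i: scores[i])
    return [sentences[i] for i in sorted(best)]


# Scores copied from the sentence_scores output above, one per sentence.
scores = [1.0, 2.25, 5.5, 4.25, 3.75, 6.0, 0.5]
labels = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
print(top_k_in_order(labels, scores, 2))  # ['s3', 's6']
```

Applied to these scores, the sentence about the characters (s3) would precede the final-arc sentence (s6).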

In [87]:
final_summary = [sent.text for sent in summary]

In [88]:
summary = ' '.join(final_summary)

In [89]:
print(text)

Santiago is a Shepherd who has a recurring dream which is supposedly prophetic. Inspired on learning this, he undertakes a journey to Egypt to discover the meaning of life and fulfill his destiny. During the course of his travels, he learns of his true purpose and meets many characters, including an “Alchemist”, that teach him valuable lessons about achieving his dreams. Santiago sets his sights on obtaining a certain kind of “treasure” for which he travels to Egypt. The key message is, “when you want something, all the universe conspires in helping you to achieve it.” Towards the final arc, Santiago gets robbed by bandits who end up revealing that the “treasure” he was looking for is buried in the place where his journey began. The end.


In [90]:
print(summary)

Towards the final arc, Santiago gets robbed by bandits who end up revealing that the “treasure” he was looking for is buried in the place where his journey began. During the course of his travels, he learns of his true purpose and meets many characters, including an “Alchemist”, that teach him valuable lessons about achieving his dreams.
