# **Text Summarization using NLP**


**What is text summarization?**

Text summarization is the process of distilling the most important information from a source text.

**Why automatic text summarization?**



1.   Summaries reduce reading time.
2.   When researching documents, summaries make the selection process easier.
3.   Automatic summarization improves the effectiveness of indexing.
4.   Automatic summarization algorithms can be less biased than human summarizers.
5.   Personalized summaries are useful in question-answering systems because they provide personalized information.
6.   Automatic or semi-automatic summarization systems enable commercial abstract services to increase the number of text documents they can process.





# **Types of summarization**

![image.png](attachment:d24db0d3-4692-4849-a727-b12c61f57fac.png)



**How to do text summarization**


*   Text cleaning
*   Sentence tokenization
*   Word tokenization
*   Word-frequency table
*   Summarization 
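
The five steps above can be sketched end to end in plain Python before bringing in spaCy. This is only an illustration under simplifying assumptions: the `summarize` helper, the tiny `STOP` set, and the regex-based sentence splitter are hypothetical stand-ins, not the spaCy pipeline used in the rest of this notebook.

```python
import re
from collections import Counter
from heapq import nlargest

# Hypothetical, deliberately tiny stop-word list used only for this sketch.
STOP = {"the", "a", "an", "is", "are", "to", "of", "and", "it", "in", "on"}

def words(sentence):
    # Word tokenization: lowercase alphabetic tokens, stop words removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP]

def summarize(text, n_sentences=1):
    # 1. Text cleaning: collapse runs of whitespace.
    cleaned = re.sub(r"\s+", " ", text).strip()
    # 2. Sentence tokenization: naive split after '.', '!' or '?'.
    sentences = re.split(r"(?<=[.!?])\s+", cleaned)
    # 3 + 4. Word tokenization and word-frequency table.
    freq = Counter(words(cleaned))
    # 5. Summarization: score sentences by summed word frequency, keep the top n.
    scores = {s: sum(freq[w] for w in words(s)) for s in sentences}
    return nlargest(n_sentences, sentences, key=scores.get)

sample = "Cats sleep. Cats and dogs play together. Birds sing."
print(summarize(sample))  # ['Cats and dogs play together.']
```

The regex splitter breaks on abbreviations such as "e.g.", which is one reason the notebook relies on spaCy's sentence segmentation instead.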
 
 

  **Text variable**








In [39]:
# Get user input for the paragraph
print("Please enter the paragraph you want to summarize:")
text = input()


Please enter the paragraph you want to summarize:


 Many students who study and work have to balance their schedules carefully. They have to make time to go to class, go to work, and also complete their homework. If they don’t plan their time carefully, they may not be able to meet these obligations and then they will face serious consequences. They may lose money by not making time for work or they may get bad grades in their classes by not having time to study. After these obligations are met, there are other activities many students enjoy like spending time with friends, doing hobbies, or dating. They will not have time for these extra activities without balancing their schedules first. It can be very difficult for students to make time for all of their obligations, but it is essential to their success.




# Let's Get Started with spaCy

In [40]:
!pip install -U spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)

In [47]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [42]:
stopwords = set(STOP_WORDS)  # a set keeps membership checks fast

In [43]:
nlp = spacy.load('en_core_web_sm')

In [44]:
doc = nlp(text)

In [45]:
tokens = [token.text for token in doc]
print(tokens)

['Many', 'students', 'who', 'study', 'and', 'work', 'have', 'to', 'balance', 'their', 'schedules', 'carefully', '.', 'They', 'have', 'to', 'make', 'time', 'to', 'go', 'to', 'class', ',', 'go', 'to', 'work', ',', 'and', 'also', 'complete', 'their', 'homework', '.', 'If', 'they', 'do', 'n’t', 'plan', 'their', 'time', 'carefully', ',', 'they', 'may', 'not', 'be', 'able', 'to', 'meet', 'these', 'obligations', 'and', 'then', 'they', 'will', 'face', 'serious', 'consequences', '.', 'They', 'may', 'lose', 'money', 'by', 'not', 'making', 'time', 'for', 'work', 'or', 'they', 'may', 'get', 'bad', 'grades', 'in', 'their', 'classes', 'by', 'not', 'having', 'time', 'to', 'study', '.', 'After', 'these', 'obligations', 'are', 'met', ',', 'there', 'are', 'other', 'activities', 'many', 'students', 'enjoy', 'like', 'spending', 'time', 'with', 'friends', ',', 'doing', 'hobbies', ',', 'or', 'dating', '.', 'They', 'will', 'not', 'have', 'time', 'for', 'these', 'extra', 'activities', 'without', 'balancing', 

In [46]:
punctuation = punctuation + '\n'
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'
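
One caveat with keeping `punctuation` as a string: `in` on a string is a substring test, not a per-character test. The sketch below shows the difference and one possible alternative; the `is_punct` helper and the `punct_str` / `punct_set` names are hypothetical and not part of the notebook. spaCy tokens also expose an `is_punct` attribute that could serve the same purpose.

```python
from string import punctuation

punct_str = punctuation + "\n"

# Substring semantics: "./" is a contiguous run inside the string, "..." is not.
print("./" in punct_str)   # True
print("..." in punct_str)  # False, so an ellipsis token would be counted as a word

# Assumed alternative: a token is punctuation if every character is punctuation.
punct_set = set(punct_str)

def is_punct(token_text):
    return bool(token_text) and all(ch in punct_set for ch in token_text)

print(is_punct("..."))   # True
print(is_punct("time"))  # False
```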

In [48]:
# Count how often each non-stop-word, non-punctuation token occurs.
# Keys are lowercased so they match the lowercase lookups used when scoring sentences.
word_frequencies = {}
for word in doc:
  key = word.text.lower()
  if key not in stopwords and key not in punctuation:
    word_frequencies[key] = word_frequencies.get(key, 0) + 1

In [49]:
print(word_frequencies)

{'students': 3, 'study': 2, 'work': 3, 'balance': 1, 'schedules': 2, 'carefully': 2, 'time': 7, 'class': 1, 'complete': 1, 'homework': 1, 'plan': 1, 'able': 1, 'meet': 1, 'obligations': 3, 'face': 1, 'consequences': 1, 'lose': 1, 'money': 1, 'making': 1, 'bad': 1, 'grades': 1, 'classes': 1, 'having': 1, 'met': 1, 'activities': 2, 'enjoy': 1, 'like': 1, 'spending': 1, 'friends': 1, 'hobbies': 1, 'dating': 1, 'extra': 1, 'balancing': 1, 'difficult': 1, 'essential': 1, 'success': 1}
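
The same kind of table can also be built with `collections.Counter`. The sketch below uses a hypothetical pre-tokenized list and small stand-in sets rather than the notebook's `doc`, `STOP_WORDS`, and `punctuation`; it also shows why lowercasing the key matters, since "Time" and "time" should count as one word.

```python
from collections import Counter

# Hypothetical tokens; in the notebook these would come from `doc`.
tokens = ["Time", "is", "short", ".", "Make", "time", "count", "."]
stop = {"is"}        # stand-in for STOP_WORDS
punct = {".", ","}   # stand-in for the punctuation characters

word_counts = Counter(
    t.lower() for t in tokens
    if t.lower() not in stop and t not in punct
)
print(word_counts.most_common(1))  # [('time', 2)]
```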


In [50]:
max_frequency = max(word_frequencies.values())

In [51]:
max_frequency

7

In [52]:
for word in word_frequencies.keys():
  word_frequencies[word] = word_frequencies[word]/max_frequency

In [53]:
print(word_frequencies)

{'students': 0.42857142857142855, 'study': 0.2857142857142857, 'work': 0.42857142857142855, 'balance': 0.14285714285714285, 'schedules': 0.2857142857142857, 'carefully': 0.2857142857142857, 'time': 1.0, 'class': 0.14285714285714285, 'complete': 0.14285714285714285, 'homework': 0.14285714285714285, 'plan': 0.14285714285714285, 'able': 0.14285714285714285, 'meet': 0.14285714285714285, 'obligations': 0.42857142857142855, 'face': 0.14285714285714285, 'consequences': 0.14285714285714285, 'lose': 0.14285714285714285, 'money': 0.14285714285714285, 'making': 0.14285714285714285, 'bad': 0.14285714285714285, 'grades': 0.14285714285714285, 'classes': 0.14285714285714285, 'having': 0.14285714285714285, 'met': 0.14285714285714285, 'activities': 0.2857142857142857, 'enjoy': 0.14285714285714285, 'like': 0.14285714285714285, 'spending': 0.14285714285714285, 'friends': 0.14285714285714285, 'hobbies': 0.14285714285714285, 'dating': 0.14285714285714285, 'extra': 0.14285714285714285, 'balancing': 0.142857
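
The normalization divides every count by the largest count, so the most frequent word gets weight 1.0. A minimal sketch with a few counts taken from the table above: it builds a new dictionary (the name `normalized` is an assumption) instead of overwriting `word_frequencies`, which avoids dividing a second time if the cell is accidentally re-run.

```python
# A few raw counts copied from the frequency table printed earlier.
word_frequencies = {"time": 7, "students": 3, "study": 2, "balance": 1}

max_frequency = max(word_frequencies.values())

# Build a separate dict so the raw counts stay intact.
normalized = {w: c / max_frequency for w, c in word_frequencies.items()}

print(normalized["time"])      # 1.0
print(normalized["students"])  # 3/7, about 0.4286
```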

In [54]:
sentence_tokens = list(doc.sents)
print(sentence_tokens)

[Many students who study and work have to balance their schedules carefully., They have to make time to go to class, go to work, and also complete their homework., If they don’t plan their time carefully, they may not be able to meet these obligations and then they will face serious consequences., They may lose money by not making time for work or they may get bad grades in their classes by not having time to study., After these obligations are met, there are other activities many students enjoy like spending time with friends, doing hobbies, or dating., They will not have time for these extra activities without balancing their schedules first., It can be very difficult for students to make time for all of their obligations, but it is essential to their success.]


In [55]:
# Score each sentence as the sum of the normalized frequencies of its words.
sentence_scores = {}
for sent in sentence_tokens:
  for word in sent:
    key = word.text.lower()
    if key in word_frequencies:
      sentence_scores[sent] = sentence_scores.get(sent, 0) + word_frequencies[key]


In [56]:
sentence_scores

{Many students who study and work have to balance their schedules carefully.: 1.8571428571428568,
 They have to make time to go to class, go to work, and also complete their homework.: 1.857142857142857,
 If they don’t plan their time carefully, they may not be able to meet these obligations and then they will face serious consequences.: 2.428571428571428,
 They may lose money by not making time for work or they may get bad grades in their classes by not having time to study.: 3.714285714285714,
 After these obligations are met, there are other activities many students enjoy like spending time with friends, doing hobbies, or dating.: 3.1428571428571423,
 They will not have time for these extra activities without balancing their schedules first.: 1.8571428571428568,
 It can be very difficult for students to make time for all of their obligations, but it is essential to their success.: 2.2857142857142856}
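
Because each score is a plain sum, longer sentences tend to score higher simply because they contain more words. One possible variation, not what this notebook does, is to average over the matched words. The `score_sentence` helper and the weights below are hypothetical and only illustrate the effect.

```python
def score_sentence(words, weights, normalize=False):
    """Sum the weights of known words; optionally average over matched words."""
    matched = [weights[w] for w in words if w in weights]
    if not matched:
        return 0.0
    total = sum(matched)
    return total / len(matched) if normalize else total

# Hypothetical normalized word weights and two tokenized sentences.
weights = {"time": 1.0, "work": 0.5, "study": 0.25}
short = ["time", "work"]
long_ = ["time", "study", "study", "study", "study"]

print(score_sentence(short, weights), score_sentence(long_, weights))
# 1.5 2.0  -> the longer sentence wins on the raw sum
print(score_sentence(short, weights, True), score_sentence(long_, weights, True))
# 0.75 0.4 -> the shorter, denser sentence wins on the average
```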

In [57]:
from heapq import nlargest

In [58]:
select_length = int(len(sentence_tokens)*0.3)
select_length

2
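
Note that `int()` truncates, so with three or fewer sentences the 30% rule yields 0 and the summary would be empty. A small guard, sketched here as a hypothetical `select_length_for` helper, keeps at least one sentence.

```python
def select_length_for(n_sentences, ratio=0.3):
    # int() truncates toward zero, e.g. int(3 * 0.3) == 0, so clamp to at least 1.
    return max(1, int(n_sentences * ratio))

print(select_length_for(7))  # 2, same as the notebook's result
print(select_length_for(3))  # 1 instead of 0
```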

In [59]:
summary = nlargest(select_length, sentence_scores, key = sentence_scores.get)

In [60]:
summary

[They may lose money by not making time for work or they may get bad grades in their classes by not having time to study.,
 After these obligations are met, there are other activities many students enjoy like spending time with friends, doing hobbies, or dating.]
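
`nlargest` returns sentences ordered by score, highest first. In this example that happens to match the order in the original paragraph, but in general it may not. A sketch of one way to restore document order, using hypothetical sentences and scores and selecting by index:

```python
from heapq import nlargest

# Hypothetical sentences and scores, listed in document order.
sentences = ["A.", "B.", "C.", "D."]
scores = [1.0, 2.0, 0.5, 3.0]

# Pick the indices of the two best-scoring sentences.
top_idx = nlargest(2, range(len(sentences)), key=lambda i: scores[i])

by_score = [sentences[i] for i in top_idx]          # ['D.', 'B.']
in_order = [sentences[i] for i in sorted(top_idx)]  # ['B.', 'D.']
print(by_score, in_order)
```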

In [61]:
# Each item in `summary` is a sentence span, so take its text.
final_summary = [sent.text for sent in summary]

In [62]:
summary = ' '.join(final_summary)

In [63]:
print(text)

Many students who study and work have to balance their schedules carefully. They have to make time to go to class, go to work, and also complete their homework. If they don’t plan their time carefully, they may not be able to meet these obligations and then they will face serious consequences. They may lose money by not making time for work or they may get bad grades in their classes by not having time to study. After these obligations are met, there are other activities many students enjoy like spending time with friends, doing hobbies, or dating. They will not have time for these extra activities without balancing their schedules first. It can be very difficult for students to make time for all of their obligations, but it is essential to their success.


In [67]:
print(summary)

They may lose money by not making time for work or they may get bad grades in their classes by not having time to study. After these obligations are met, there are other activities many students enjoy like spending time with friends, doing hobbies, or dating.


In [68]:
print("Length of Original Passage: ",len(text.split()))

Length of Original Passage:  134


In [69]:
print("Length of Summarized Passage: ",len(summary.split()))

Length of Summarized Passage:  46
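
The two word counts above can be combined into a compression ratio. A minimal sketch using the printed values (the variable names are assumptions):

```python
original_len = 134  # words in the original passage, from the output above
summary_len = 46    # words in the summary, from the output above

ratio = summary_len / original_len
print(f"The summary keeps {ratio:.1%} of the words")  # 34.3%
```

The share of words (about 34%) is larger than the share of sentences selected (2 of 7) because the two selected sentences are among the longest in the passage.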
