<a href="https://colab.research.google.com/github/sanjivch/NLP-Text-Summarizer/blob/main/Text_Summarizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install spacy


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
from collections import Counter
from heapq import nlargest

In [3]:
# Assumes the small English model is installed (python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

In [4]:
text = '''Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task. Machine learning algorithms build a mathematical model of sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in the applications of email filtering, detection of network intruders, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task. Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning. Data mining is a field of study within machine learning and focuses on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is also referred to as predictive analytics.'''

In [5]:
doc = nlp(text)

In [6]:
list(doc.sents)

[Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task.,
 Machine learning algorithms build a mathematical model of sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task.,
 Machine learning algorithms are used in the applications of email filtering, detection of network intruders, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task.,
 Machine learning is closely related to computational statistics, which focuses on making predictions using computers.,
 The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning.,
 Data mining is a field of study within machine learning and focuses on exploratory data analysis through unsupervised learning.,
 In its application acros

In [7]:
keyword = []  # candidate keywords, filled in the loop below
stop_words = list(STOP_WORDS)

In [8]:
# parts of speech treated as content words
pos_tag = ['PROPN', 'ADJ', 'NOUN', 'VERB']

In [9]:
for token in doc:
  # skip stop words and punctuation marks
  if (token.text in stop_words) or (token.text in punctuation):
    continue
  # keep only tokens whose part of speech is listed in pos_tag
  if token.pos_ in pos_tag:
    keyword.append(token.text)

In [31]:
len(keyword)

90

In [32]:
word_count = []
for token in doc:
  # collect every token except punctuation marks
  if token.text not in punctuation:
    word_count.append(token.text)

In [34]:
len(word_count)

155

In [10]:
freq_word = Counter(keyword)

In [11]:
#freq_word.most_common(10)
# pick the most frequent word, which is the first entry in the list of tuples
max_freq = freq_word.most_common(1)[0][1]

for word in freq_word.keys():
  freq_word[word] = freq_word[word]/max_freq
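
The loop above divides each count by the largest count, so the most frequent keyword scores 1.0. Because the keys are raw token text, capitalized and lowercase forms of a word are counted separately. A minimal sketch of the same normalization with case folding, assuming that merging such forms is acceptable; `normalized_frequencies` and the sample list are hypothetical and not part of the notebook:

```python
from collections import Counter


def normalized_frequencies(words):
    """Return {word: count / max_count}, folding case first (an assumption)."""
    counts = Counter(word.lower() for word in words)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {word: count / max_count for word, count in counts.items()}


# Illustrative input, not the notebook's keyword list
sample = ["Machine", "learning", "machine", "learning", "data", "learning"]
print(normalized_frequencies(sample))
```

Returning a new dict rather than overwriting the `Counter` in place keeps the raw counts available for later inspection.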


In [12]:
freq_word

Counter({'Data': 0.125,
         'ML': 0.125,
         'Machine': 0.5,
         'algorithm': 0.125,
         'algorithms': 0.375,
         'analysis': 0.125,
         'analytics': 0.125,
         'application': 0.25,
         'applications': 0.125,
         'build': 0.125,
         'business': 0.125,
         'computational': 0.125,
         'computer': 0.25,
         'computers': 0.125,
         'data': 0.375,
         'decisions': 0.125,
         'delivers': 0.125,
         'detection': 0.125,
         'develop': 0.125,
         'domains': 0.125,
         'email': 0.125,
         'exploratory': 0.125,
         'field': 0.25,
         'filtering': 0.125,
         'focuses': 0.25,
         'improve': 0.125,
         'infeasible': 0.125,
         'instructions': 0.125,
         'intruders': 0.125,
         'known': 0.125,
         'learning': 1.0,
         'machine': 0.375,
         'making': 0.125,
         'mathematical': 0.25,
         'methods': 0.125,
         'mining': 0.125,
    

In [13]:
sentence_strength = dict()

In [16]:
for sentence in doc.sents:
  for word in sentence:
    # add the normalized frequency of each keyword to its sentence's score
    if word.text in freq_word.keys():
      if sentence in sentence_strength.keys():
        sentence_strength[sentence] += freq_word[word.text]
      else:
        sentence_strength[sentence] = freq_word[word.text]
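
The nested loop above adds up the normalized frequency of every keyword that occurs in a sentence. A sketch of the same scoring on plain Python lists, so that it runs without spaCy; `score_sentences`, the sample tokens, and keying the result by sentence index are illustrative assumptions:

```python
def score_sentences(sentences, freq):
    """Map sentence index -> sum of the frequencies of its known words."""
    scores = {}
    for index, words in enumerate(sentences):
        total = sum(freq.get(word, 0.0) for word in words)
        if total > 0:  # like the loop above, skip sentences with no keywords
            scores[index] = total
    return scores


# Illustrative data, not taken from the notebook
freq = {"learning": 1.0, "data": 0.5, "model": 0.25}
sentences = [
    ["learning", "builds", "a", "model"],
    ["the", "weather", "is", "nice"],
    ["data", "drives", "learning"],
]
print(score_sentences(sentences, freq))
```

Summing favors longer sentences, since they have more chances to contain keywords; dividing each score by the sentence length is one possible adjustment.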



In [17]:
sentence_strength

{Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task.: 4.125,
 Machine learning algorithms build a mathematical model of sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task.: 4.625,
 Machine learning algorithms are used in the applications of email filtering, detection of network intruders, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task.: 4.25,
 Machine learning is closely related to computational statistics, which focuses on making predictions using computers.: 2.625,
 The study of mathematical optimization delivers methods, theory and application domains to the field of machine learning.: 3.125,
 Data mining is a field of study within machine learning and focuses on exploratory data analysis through unsupervised le

In [18]:
# keep the 3 highest-scoring sentences
summarized_sentences = nlargest(3, sentence_strength, key=sentence_strength.get)

In [19]:
summarized_sentences

[Machine learning algorithms build a mathematical model of sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task.,
 Machine learning algorithms are used in the applications of email filtering, detection of network intruders, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task.,
 Data mining is a field of study within machine learning and focuses on exploratory data analysis through unsupervised learning.]

In [20]:
final_summary = [w.text for w in summarized_sentences]

In [22]:
# join with a space so that adjacent sentences do not run together
final_summary = ' '.join(final_summary)

In [23]:
final_summary

'Machine learning algorithms build a mathematical model of sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in the applications of email filtering, detection of network intruders, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task. Data mining is a field of study within machine learning and focuses on exploratory data analysis through unsupervised learning.'
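
`nlargest` returns the selected sentences ordered by score, which can differ from their order in the text. A sketch of one way to restore the original order, under the assumption that the summary should read in document order; `summarize_in_order` and the sample scores are hypothetical:

```python
from heapq import nlargest


def summarize_in_order(sentences, scores, n=3):
    """Pick the n highest-scoring sentences, then restore document order."""
    top = nlargest(n, range(len(sentences)), key=lambda i: scores.get(i, 0.0))
    return " ".join(sentences[i] for i in sorted(top))


# Illustrative data, not taken from the notebook
sentences = ["First point.", "Minor aside.", "Second point.", "Closing remark."]
scores = {0: 2.0, 1: 0.5, 2: 3.0, 3: 1.0}
print(summarize_in_order(sentences, scores, n=2))
```

Ranking indices instead of the sentences themselves makes it cheap to sort the winners back into their original positions.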

In [28]:
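# Each Token object is a distinct key, so repeated words are counted separately
# (every count below is 1); count token.text instead to group by word
Counter(([token for token in doc]))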

Counter({Machine: 1,
         learning: 1,
         (: 1,
         ML: 1,
         ): 1,
         is: 1,
         the: 1,
         scientific: 1,
         study: 1,
         of: 1,
         algorithms: 1,
         and: 1,
         statistical: 1,
         models: 1,
         that: 1,
         computer: 1,
         systems: 1,
         use: 1,
         to: 1,
         progressively: 1,
         improve: 1,
         their: 1,
         performance: 1,
         on: 1,
         a: 1,
         specific: 1,
         task: 1,
         .: 1,
         Machine: 1,
         learning: 1,
         algorithms: 1,
         build: 1,
         a: 1,
         mathematical: 1,
         model: 1,
         of: 1,
         sample: 1,
         data: 1,
         ,: 1,
         known: 1,
         as: 1,
         “: 1,
         training: 1,
         data: 1,
         ”: 1,
         ,: 1,
         in: 1,
         order: 1,
         to: 1,
         make: 1,
         predictions: 1,
         or: 1,
         decisio

In [29]:
for x in doc:
  print(x)

Machine
learning
(
ML
)
is
the
scientific
study
of
algorithms
and
statistical
models
that
computer
systems
use
to
progressively
improve
their
performance
on
a
specific
task
.
Machine
learning
algorithms
build
a
mathematical
model
of
sample
data
,
known
as
“
training
data
”
,
in
order
to
make
predictions
or
decisions
without
being
explicitly
programmed
to
perform
the
task
.
Machine
learning
algorithms
are
used
in
the
applications
of
email
filtering
,
detection
of
network
intruders
,
and
computer
vision
,
where
it
is
infeasible
to
develop
an
algorithm
of
specific
instructions
for
performing
the
task
.
Machine
learning
is
closely
related
to
computational
statistics
,
which
focuses
on
making
predictions
using
computers
.
The
study
of
mathematical
optimization
delivers
methods
,
theory
and
application
domains
to
the
field
of
machine
learning
.
Data
mining
is
a
field
of
study
within
machine
learning
and
focuses
on
exploratory
data
analysis
through
unsupervised
learning
.
In
its
application
a