**IMPORTING REQUIRED MODULES AND PACKAGES**

In [117]:
import numpy as np
import pandas as pd
import nltk
nltk.download('punkt') # one time execution

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

**INPUT TEXT**

The input text:
Extractive text summarization is a vital natural language processing (NLP) task aimed at condensing large volumes of textual information into concise summaries by selecting and assembling key sentences or passages. The process involves leveraging techniques from information retrieval, machine learning, and linguistic analysis to identify and rank the most significant content in a document. Methods such as TextRank and TF-IDF are commonly used to assess sentence importance based on features like word frequency, sentence position, and relationships between sentences. The objective is to create summaries that faithfully represent the primary ideas of the original text, providing users with a succinct overview without the need to delve into the entire document.

Extractive summarization faces challenges in maintaining coherence, handling redundancy, and striking a balance between informativeness and brevity. Algorithms may use graph-based models, feature-based approaches, or machine learning models to score sentences, and they are evaluated using metrics like ROUGE. Real-time applications, such as news aggregation and social media content summarization, benefit from the speed and efficiency of extractive techniques. Ongoing advancements in neural networks, reinforcement learning, and hybrid models that combine extractive and abstractive approaches are shaping the future of extractive text summarization. The field's evolution continues to address limitations, making it a dynamic and critical area within NLP.

Extractive text summarization finds applications across diverse domains, including news articles, legal documents, research papers, and online content. Its real-time capabilities make it valuable for time-sensitive information retrieval, allowing users to quickly access crucial details. The method is particularly relevant in scenarios where brevity is essential, providing decision-makers with summarized insights without the need for exhaustive reading. Redundancy mitigation techniques ensure that key information is not needlessly repeated, contributing to more concise and diverse summaries. Extractive summarization APIs and platforms equipped with pre-trained models democratize the use of summarization functionalities, enabling developers and businesses to integrate these tools seamlessly into their applications. As research in this field advances, there is a growing emphasis on addressing challenges, enhancing the accuracy of summaries, and exploring innovative approaches for improved outcomes in extractive text summarization.

In [118]:
s='Extractive text summarization is a vital natural language processing (NLP) task aimed at condensing large volumes of textual information into concise summaries by selecting and assembling key sentences or passages. The process involves leveraging techniques from information retrieval, machine learning, and linguistic analysis to identify and rank the most significant content in a document. Methods such as TextRank and TF-IDF are commonly used to assess sentence importance based on features like word frequency, sentence position, and relationships between sentences. The objective is to create summaries that faithfully represent the primary ideas of the original text, providing users with a succinct overview without the need to delve into the entire document.Extractive summarization faces challenges in maintaining coherence, handling redundancy, and striking a balance between informativeness and brevity. Algorithms may use graph-based models, feature-based approaches, or machine learning models to score sentences, and they are evaluated using metrics like ROUGE. Real-time applications, such as news aggregation and social media content summarization, benefit from the speed and efficiency of extractive techniques. Ongoing advancements in neural networks, reinforcement learning, and hybrid models that combine extractive and abstractive approaches are shaping the future of extractive text summarization. The fields evolution continues to address limitations, making it a dynamic and critical area within NLP.Extractive text summarization finds applications across diverse domains, including news articles, legal documents, research papers, and online content. Its real-time capabilities make it valuable for time-sensitive information retrieval, allowing users to quickly access crucial details. The method is particularly relevant in scenarios where brevity is essential, providing decision-makers with summarized insights without the need for exhaustive reading. Redundancy mitigation techniques ensure that key information is not needlessly repeated, contributing to more concise and diverse summaries. Extractive summarization APIs and platforms equipped with pre-trained models democratize the use of summarization functionalities, enabling developers and businesses to integrate these tools seamlessly into their applications. As research in this field advances, there is a growing emphasis on addressing challenges, enhancing the accuracy of summaries, and exploring innovative approaches for improved outcomes in extractive text summarization.'

In [119]:
print("Input text:",s)

Input text: Extractive text summarization is a vital natural language processing (NLP) task aimed at condensing large volumes of textual information into concise summaries by selecting and assembling key sentences or passages. The process involves leveraging techniques from information retrieval, machine learning, and linguistic analysis to identify and rank the most significant content in a document. Methods such as TextRank and TF-IDF are commonly used to assess sentence importance based on features like word frequency, sentence position, and relationships between sentences. The objective is to create summaries that faithfully represent the primary ideas of the original text, providing users with a succinct overview without the need to delve into the entire document.Extractive summarization faces challenges in maintaining coherence, handling redundancy, and striking a balance between informativeness and brevity. Algorithms may use graph-based models, feature-based approaches, or mach

**TOKENIZATION OF SENTENCES**

In [120]:
from nltk.tokenize import sent_tokenize
# sent_tokenize already returns a flat list of sentence strings
sentences = sent_tokenize(s)

In [121]:
print("After tokenization:",sentences)

After tokenization: ['Extractive text summarization is a vital natural language processing (NLP) task aimed at condensing large volumes of textual information into concise summaries by selecting and assembling key sentences or passages.', 'The process involves leveraging techniques from information retrieval, machine learning, and linguistic analysis to identify and rank the most significant content in a document.', 'Methods such as TextRank and TF-IDF are commonly used to assess sentence importance based on features like word frequency, sentence position, and relationships between sentences.', 'The objective is to create summaries that faithfully represent the primary ideas of the original text, providing users with a succinct overview without the need to delve into the entire document.Extractive summarization faces challenges in maintaining coherence, handling redundancy, and striking a balance between informativeness and brevity.', 'Algorithms may use graph-based models, feature-bas
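In the tokenized output above, the fourth list item contains two sentences joined as `document.Extractive`, because the input string has no space after that period. The sketch below illustrates the effect with a simple regular-expression splitter. The splitter is an assumption made for illustration (it is not the Punkt model that `sent_tokenize` uses), and `naive_split` is a hypothetical helper name.

```python
import re

def naive_split(text):
    # Split after '.', '!' or '?' only when whitespace follows the mark.
    # This is a stand-in for a real sentence tokenizer, not Punkt itself.
    return [part for part in re.split(r"(?<=[.!?])\s+", text) if part]

fused = "Users need not read the entire document.Extractive summarization faces challenges."
spaced = "Users need not read the entire document. Extractive summarization faces challenges."

print(len(naive_split(fused)))   # no whitespace after the first period, so no split there
print(len(naive_split(spaced)))  # the space after the period allows a split
```

Inserting a space after each paragraph-ending period in the input would likely yield more sentences than the 13 reported later, so the downstream outputs would change accordingly.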

**REMOVE PUNCTUATION, NUMBERS AND SPECIAL CHARACTERS**

In [122]:
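# replace punctuation, digits and special characters with spaces
# regex=True is needed: pandas 2.0+ treats the pattern as a literal string by default
clean_sentences = pd.Series(sentences).str.replace("[^a-zA-Z]", " ", regex=True)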
print("clean sentences:")
print(clean_sentences)

clean sentences:
0     Extractive text summarization is a vital natur...
1     The process involves leveraging techniques fro...
2     Methods such as TextRank and TF IDF are common...
3     The objective is to create summaries that fait...
4     Algorithms may use graph based models  feature...
5     Real time applications  such as news aggregati...
6     Ongoing advancements in neural networks  reinf...
7     The fields evolution continues to address limi...
8     Its real time capabilities make it valuable fo...
9     The method is particularly relevant in scenari...
10    Redundancy mitigation techniques ensure that k...
11    Extractive summarization APIs and platforms eq...
12    As research in this field advances  there is a...
dtype: object




In [123]:
# convert every sentence to lowercase
clean_sentences = [sentence.lower() for sentence in clean_sentences]

In [124]:
print(clean_sentences)

['extractive text summarization is a vital natural language processing  nlp  task aimed at condensing large volumes of textual information into concise summaries by selecting and assembling key sentences or passages ', 'the process involves leveraging techniques from information retrieval  machine learning  and linguistic analysis to identify and rank the most significant content in a document ', 'methods such as textrank and tf idf are commonly used to assess sentence importance based on features like word frequency  sentence position  and relationships between sentences ', 'the objective is to create summaries that faithfully represent the primary ideas of the original text  providing users with a succinct overview without the need to delve into the entire document extractive summarization faces challenges in maintaining coherence  handling redundancy  and striking a balance between informativeness and brevity ', 'algorithms may use graph based models  feature based approaches  or ma
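The two cleaning steps above (replacing non-letters with spaces, then lowercasing) can also be expressed without pandas. The following is a minimal sketch using only the standard library; `clean_sentence` is a hypothetical helper name, and the sample sentence is made up for illustration.

```python
import re

def clean_sentence(sentence):
    # Replace every character that is not an ASCII letter with a space,
    # then lowercase the result. The string length is unchanged.
    return re.sub(r"[^a-zA-Z]", " ", sentence).lower()

sample = "Methods such as TextRank and TF-IDF (2 of many)."
cleaned = clean_sentence(sample)
print(cleaned.split())
```

Because each removed character becomes a space, hyphenated terms such as `TF-IDF` turn into two separate tokens (`tf`, `idf`), which matches what the notebook output shows.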

**REMOVAL OF STOPWORDS**

In [125]:
nltk.download('stopwords')
from nltk.corpus import stopwords
# a set gives fast membership tests
stop_words = set(stopwords.words('english'))
print("before removing stop words")
print(clean_sentences)
def remove_stopwords(words):
    # keep only the words that are not stop words and rejoin them
    return " ".join(w for w in words if w not in stop_words)
clean_sentences = [remove_stopwords(r.split()) for r in clean_sentences]
print("after removing stop words:")
print(clean_sentences)

before removing stop words
['extractive text summarization is a vital natural language processing  nlp  task aimed at condensing large volumes of textual information into concise summaries by selecting and assembling key sentences or passages ', 'the process involves leveraging techniques from information retrieval  machine learning  and linguistic analysis to identify and rank the most significant content in a document ', 'methods such as textrank and tf idf are commonly used to assess sentence importance based on features like word frequency  sentence position  and relationships between sentences ', 'the objective is to create summaries that faithfully represent the primary ideas of the original text  providing users with a succinct overview without the need to delve into the entire document extractive summarization faces challenges in maintaining coherence  handling redundancy  and striking a balance between informativeness and brevity ', 'algorithms may use graph based models  feat

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
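To make the filtering step concrete, here is a small sketch that runs without NLTK. It assumes a deliberately tiny, hypothetical stop word set (`TINY_STOP_WORDS`); the real NLTK English list is much longer. The input is the second cleaned sentence from the notebook.

```python
# Hypothetical, deliberately tiny stop word set for illustration only.
TINY_STOP_WORDS = {"the", "from", "and", "to", "most", "in", "a"}

def remove_stopwords(sentence, stop_words=TINY_STOP_WORDS):
    # split() also collapses the double spaces left by the cleaning step
    return " ".join(w for w in sentence.split() if w not in stop_words)

cleaned = ("the process involves leveraging techniques from information retrieval  "
           "machine learning  and linguistic analysis to identify and rank the most "
           "significant content in a document ")
filtered = remove_stopwords(cleaned)
print(filtered)
```

For this sentence the tiny set happens to produce the same result as the notebook's "Before lemmatization" output; other sentences would need the full list.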


**LEMMATIZATION OF SENTENCES**

In [126]:
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
print("Before lemmatization:")
print(clean_sentences)

from nltk import pos_tag
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer


part = {
    'N' : 'n',
    'V' : 'v',
    'J' : 'a',
    'R' : 'r'
}

wnl = WordNetLemmatizer()

def convert_tag(penn_tag):
    # map the first letter of a Penn Treebank tag to a WordNet POS; default to noun
    return part.get(penn_tag, 'n')


def tag_and_lem(element):
    # POS-tag each word, then lemmatize it with the matching WordNet POS
    tagged = pos_tag(word_tokenize(element))
    return ' '.join(wnl.lemmatize(word, convert_tag(tag[0]))
                    for word, tag in tagged)

lemmatized_sentences = [tag_and_lem(sentence) for sentence in clean_sentences]
print("After lemmatization:")
print(lemmatized_sentences)

Before lemmatization:
['extractive text summarization vital natural language processing nlp task aimed condensing large volumes textual information concise summaries selecting assembling key sentences passages', 'process involves leveraging techniques information retrieval machine learning linguistic analysis identify rank significant content document', 'methods textrank tf idf commonly used assess sentence importance based features like word frequency sentence position relationships sentences', 'objective create summaries faithfully represent primary ideas original text providing users succinct overview without need delve entire document extractive summarization faces challenges maintaining coherence handling redundancy striking balance informativeness brevity', 'algorithms may use graph based models feature based approaches machine learning models score sentences evaluated using metrics like rouge', 'real time applications news aggregation social media content summarization benefit s

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [127]:
len(lemmatized_sentences)

13
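The lemmatizer needs a WordNet part-of-speech code, while `pos_tag` returns Penn Treebank tags, so the notebook maps the first letter of each tag. The sketch below isolates that mapping so it can be checked without downloading any NLTK data. `to_wordnet_pos` is a hypothetical name, and unlike the notebook's `convert_tag` it accepts the full tag rather than only its first letter.

```python
# First letter of a Penn Treebank tag -> WordNet POS code
PENN_TO_WORDNET = {"N": "n", "V": "v", "J": "a", "R": "r"}

def to_wordnet_pos(penn_tag):
    # Anything that is not a noun, verb, adjective or adverb falls back to noun.
    return PENN_TO_WORDNET.get(penn_tag[:1], "n")

for tag in ["NNS", "VBD", "JJ", "RB", "IN"]:
    print(tag, "->", to_wordnet_pos(tag))
```

The noun fallback means prepositions, determiners and similar words are passed to the lemmatizer as nouns, which usually leaves them unchanged.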

**SIMILARITY OF SENTENCES USING COSINE SIMILARITY**

In [128]:
import math

In [129]:
sim_mat = np.zeros([len(sentences), len(sentences)])

In [130]:
sim_mat

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

In [131]:
# fill sim_mat with the cosine similarity of the word-count vectors of each sentence pair
for x in range(len(lemmatized_sentences)):
    s1_words = lemmatized_sentences[x].split()
    for y in range(len(lemmatized_sentences)):
        s2_words = lemmatized_sentences[y].split()
        # combined vocabulary of the pair, in first-seen order
        unique_words = list(dict.fromkeys(s1_words + s2_words))
        # word-count vectors over that vocabulary
        d1 = [s1_words.count(w) for w in unique_words]
        d2 = [s2_words.count(w) for w in unique_words]
        s1_norm = math.sqrt(sum(c * c for c in d1))
        s2_norm = math.sqrt(sum(c * c for c in d2))
        # an empty sentence has a zero norm; leave its similarity at 0
        if s1_norm == 0 or s2_norm == 0:
            continue
        sim_mat[x][y] = sum(a * b for a, b in zip(d1, d2)) / (s1_norm * s2_norm)

In [132]:
sim_mat

array([[1.        , 0.05504819, 0.13055824, 0.15569979, 0.04264014,
        0.11396058, 0.1956464 , 0.16116459, 0.05170877, 0.        ,
        0.24618298, 0.13957263, 0.20100756],
       [0.05504819, 1.        , 0.        , 0.04714045, 0.10327956,
        0.13801311, 0.05923489, 0.09759001, 0.12524486, 0.        ,
        0.1490712 , 0.        , 0.        ],
       [0.13055824, 0.        , 1.        , 0.        , 0.36742346,
        0.        , 0.        , 0.        , 0.        , 0.05270463,
        0.        , 0.04454354, 0.        ],
       [0.15569979, 0.04714045, 0.        , 1.        , 0.        ,
        0.09759001, 0.16754156, 0.13801311, 0.04428074, 0.18856181,
        0.10540926, 0.11952286, 0.21516574],
       [0.04264014, 0.10327956, 0.36742346, 0.        , 1.        ,
        0.        , 0.18353259, 0.        , 0.        , 0.        ,
        0.        , 0.17457431, 0.04714045],
       [0.11396058, 0.13801311, 0.        , 0.09759001, 0.        ,
        1.        , 0.18394

In [133]:
m=len(sim_mat)

In [134]:
print(m)

13
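Each entry of the matrix above is the cosine of the angle between two word-count vectors: the dot product divided by the product of the vector lengths. As a worked example, the sketch below computes that value for two short made-up phrases. `cosine_similarity` is a hypothetical helper built on `collections.Counter`, not the notebook's own loop.

```python
import math
from collections import Counter

def cosine_similarity(sentence_a, sentence_b):
    counts_a = Counter(sentence_a.split())
    counts_b = Counter(sentence_b.split())
    # Only words present in both sentences contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Two shared words out of three in each phrase: 2 / (sqrt(3) * sqrt(3)) = 2/3
print(cosine_similarity("graph based model", "graph based approach"))
```

A sentence compared with itself scores 1, and sentences with no words in common score 0, which is why the diagonal of `sim_mat` is 1 and some off-diagonal entries are exactly 0.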


**CALCULATING TEXTRANK FOR SENTENCES**

In [135]:
# initial score vector (m x 1): every sentence starts at 0.85
# despite its name, this holds the scores that are iterated below
damping_factor_matrix = [[0.85] for _ in range(m)]
print(damping_factor_matrix)

[[0.85], [0.85], [0.85], [0.85], [0.85], [0.85], [0.85], [0.85], [0.85], [0.85], [0.85], [0.85], [0.85]]


In [136]:
for i in range(m):
    for j in range(1):
        print(damping_factor_matrix[i][j], end = " ")
    print()

0.85 
0.85 
0.85 
0.85 
0.85 
0.85 
0.85 
0.85 
0.85 
0.85 
0.85 
0.85 
0.85 


In [137]:
transpose_matrix=np.transpose(sim_mat)

In [138]:
# propagate the scores through the similarity matrix three times
for k in range(3):
    res = np.dot(transpose_matrix, damping_factor_matrix)
    damping_factor_matrix = res

In [139]:
print(res)

[[10.75817698]
 [ 5.78931206]
 [ 4.45560132]
 [ 9.29758771]
 [ 6.48513699]
 [10.41245592]
 [11.44607289]
 [10.57803548]
 [ 4.40609073]
 [ 2.41741714]
 [ 6.10673794]
 [10.1264003 ]
 [10.78217957]]


In [140]:
# map each sentence index to its score (idx avoids overwriting the input text s)
rank_dict = {idx: res[idx][0] for idx in range(len(sentences))}
print(rank_dict)

{0: 10.75817697748218, 1: 5.789312060016041, 2: 4.455601315427076, 3: 9.29758771103963, 4: 6.485136994735127, 5: 10.41245591633683, 6: 11.446072894629252, 7: 10.578035480653654, 8: 4.4060907294096605, 9: 2.4174171388628727, 10: 6.106737939733235, 11: 10.126400304103619, 12: 10.782179574959532}
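The scores above come from three unnormalized multiplications by the similarity matrix, starting from 0.85 for every sentence. For comparison, the sketch below shows the PageRank-style update that TextRank is usually described with: edge weights are normalized per sentence, and the damping factor mixes the propagated score with a uniform term. This is an alternative formulation under stated assumptions, not the computation used in this notebook; `textrank_scores`, the iteration count and the toy matrix are all hypothetical.

```python
def textrank_scores(sim, damping=0.85, iterations=50):
    n = len(sim)
    # Ignore self-similarity so a sentence does not vote for itself.
    weights = [[0.0 if i == j else sim[i][j] for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of the edge j -> i.
            incoming = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores

# Toy 3-sentence similarity matrix (made-up values)
toy = [[1.0, 0.5, 0.2],
       [0.5, 1.0, 0.1],
       [0.2, 0.1, 1.0]]
print(textrank_scores(toy))
```

With this normalization the scores stay bounded and sum to 1 when every sentence has at least one neighbour, so they can be compared across documents of different lengths.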


**SORTING THE SENTENCES BASED ON THEIR RANKS**

In [141]:
ranked_sentences = sorted(((rank_dict[i],s) for i,s in enumerate(sentences)), reverse=True)
print(ranked_sentences)

[(11.446072894629252, 'Ongoing advancements in neural networks, reinforcement learning, and hybrid models that combine extractive and abstractive approaches are shaping the future of extractive text summarization.'), (10.782179574959532, 'As research in this field advances, there is a growing emphasis on addressing challenges, enhancing the accuracy of summaries, and exploring innovative approaches for improved outcomes in extractive text summarization.'), (10.75817697748218, 'Extractive text summarization is a vital natural language processing (NLP) task aimed at condensing large volumes of textual information into concise summaries by selecting and assembling key sentences or passages.'), (10.578035480653654, 'The fields evolution continues to address limitations, making it a dynamic and critical area within NLP.Extractive text summarization finds applications across diverse domains, including news articles, legal documents, research papers, and online content.'), (10.41245591633683,

**PRINTING THE TOP-RANKED SENTENCES**

In [142]:
for i in range(5):
    print(ranked_sentences[i][1])

Ongoing advancements in neural networks, reinforcement learning, and hybrid models that combine extractive and abstractive approaches are shaping the future of extractive text summarization.
As research in this field advances, there is a growing emphasis on addressing challenges, enhancing the accuracy of summaries, and exploring innovative approaches for improved outcomes in extractive text summarization.
Extractive text summarization is a vital natural language processing (NLP) task aimed at condensing large volumes of textual information into concise summaries by selecting and assembling key sentences or passages.
The fields evolution continues to address limitations, making it a dynamic and critical area within NLP.Extractive text summarization finds applications across diverse domains, including news articles, legal documents, research papers, and online content.
Real-time applications, such as news aggregation and social media content summarization, benefit from the speed and eff
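The loop above prints the selected sentences in score order. One possible variation, sketched below, keeps the same top-n selection but prints the sentences in their original document order, which may read more coherently. `top_sentences_in_order` is a hypothetical helper, and the sample sentences and scores are made up.

```python
def top_sentences_in_order(sentences, scores, n=5):
    # Indices of the n highest-scoring sentences
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    # Re-sort the chosen indices so the summary follows the source order
    return [sentences[i] for i in sorted(top)]

sample_sentences = ["first", "second", "third", "fourth"]
sample_scores = [0.1, 0.9, 0.3, 0.8]
print(top_sentences_in_order(sample_sentences, sample_scores, n=2))
```

In the notebook, the equivalent call would pass `sentences` together with the values stored in `rank_dict`.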