**Extractive** Extractive summarization identifies the most important sentences in the original text and copies them verbatim into the summary. 
It is comparatively easy to implement and can be done quickly with an unsupervised approach that requires no prior training. 
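
To make the idea concrete, here is a minimal sketch of an unsupervised extractive summarizer that scores sentences by average word frequency. It is not the method used in the rest of this notebook; the function name `frequency_summary`, the regex sentence splitter, and the tiny stopword set are all illustrative assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (an assumption, not NLTK's list).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "on", "it"}

def frequency_summary(text, top_n=1):
    """Return the top_n sentences with the highest average word frequency."""
    # Naive sentence split: ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Pick the best-scoring sentences, then restore document order.
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:top_n]
    return " ".join(sentences[i] for i in sorted(best))

doc = "Cats sleep a lot. Cats and dogs are pets, and cats purr. The weather is mild."
print(frequency_summary(doc, top_n=1))
```

No model is trained at any point: the scores come only from counts within the input text, which is what makes this kind of approach unsupervised.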

## Install Libraries

In [39]:
#!pip install  sumy

## Importing Libraries

In [10]:
import os
import re
import warnings

import matplotlib.pyplot as plt
import networkx as nx
import nltk
import numpy as np
import pandas as pd
import seaborn as sns
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize


## Loading data into DataFrame

In [2]:
path_, filename_, category_, article_or_summary_ = [],[],[],[]
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        path_.append(os.path.join(dirname, filename))
        filename_.append(filename)
        category_.append(dirname.split("/")[-1])
        article_or_summary_.append(dirname.split("/")[-2])

In [3]:
bbc = pd.DataFrame({"path":path_, "filename":filename_, "category":category_, "article_or_summary":article_or_summary_}, columns=["path", "filename", "category", "article_or_summary"])
bbc.head()

| | path | filename | category | article_or_summary |
|---|---|---|---|---|
| 0 | /kaggle/input/bbc-news-summary/BBC News Summar... | 361.txt | politics | Summaries |
| 1 | /kaggle/input/bbc-news-summary/BBC News Summar... | 245.txt | politics | Summaries |
| 2 | /kaggle/input/bbc-news-summary/BBC News Summar... | 141.txt | politics | Summaries |
| 3 | /kaggle/input/bbc-news-summary/BBC News Summar... | 372.txt | politics | Summaries |
| 4 | /kaggle/input/bbc-news-summary/BBC News Summar... | 333.txt | politics | Summaries |
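
Since the `category` and `article_or_summary` columns are taken from the last two directory names, an article and its reference summary can be matched on category and file name. A minimal sketch, assuming that `<kind>/<category>/<file>` layout and that matching files share a name; `pair_articles_with_summaries` and the example paths are hypothetical:

```python
from collections import defaultdict
from pathlib import PurePosixPath

def pair_articles_with_summaries(paths):
    """Group paths by (category, filename) and return article/summary pairs."""
    grouped = defaultdict(dict)
    for p in paths:
        parts = PurePosixPath(p).parts
        # Last three components: kind ("News Articles" or "Summaries"), category, file name.
        kind, category, filename = parts[-3], parts[-2], parts[-1]
        grouped[(category, filename)][kind] = p
    # Keep only entries for which both an article and a summary were found.
    return {
        key: (found["News Articles"], found["Summaries"])
        for key, found in grouped.items()
        if "News Articles" in found and "Summaries" in found
    }

# Hypothetical paths, used only to exercise the function.
example_paths = [
    "data/News Articles/politics/001.txt",
    "data/Summaries/politics/001.txt",
    "data/News Articles/sport/002.txt",
]
pairs = pair_articles_with_summaries(example_paths)
print(pairs)
```

Pairing explicitly avoids relying on the directory walk returning articles and summaries in the same order.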


## Preprocessing
### Sentence Tokenization

In [7]:
def read_article(text):
    # Split the raw text into sentences with NLTK's sentence tokenizer.
    # Sentences are returned unmodified (punctuation and digits are kept),
    # so the summary can quote them verbatim.
    return sent_tokenize(text)
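
If non-alphanumeric characters should actually be stripped before scoring, note that `str.replace` treats its first argument as a literal string and returns a new string rather than modifying the original, so a regex substitution with `re.sub` is needed. A minimal sketch, assuming a hypothetical helper `clean_sentences` that is not part of the notebook:

```python
import re

def clean_sentences(sentences):
    """Replace non-alphanumeric characters with spaces and collapse repeated spaces."""
    cleaned = []
    for sentence in sentences:
        text = re.sub(r"[^a-zA-Z0-9]", " ", sentence)      # regex-aware substitution
        cleaned.append(re.sub(r"\s+", " ", text).strip())  # tidy up whitespace
    return cleaned

print(clean_sentences(["Mr Brown has about £2bn to spare.", "Budget at 1230 GMT!"]))
```

Cleaned sentences are best used only for scoring; the untouched originals read better in the final summary.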

In [8]:
file_path = bbc[bbc['article_or_summary']=='News Articles'].iloc[0]['path']
with open(file_path, "r") as f:
    article = f.read()

In [11]:
sent_tok = read_article(article)
sent_tok

["Budget to set scene for election\n\nGordon Brown will seek to put the economy at the centre of Labour's bid for a third term in power when he delivers his ninth Budget at 1230 GMT.",
 'He is expected to stress the importance of continued economic stability, with low unemployment and interest rates.',
 'The chancellor is expected to freeze petrol duty and raise the stamp duty threshold from £60,000.',
 'But the Conservatives and Lib Dems insist voters face higher taxes and more means-testing under Labour.',
 'Treasury officials have said there will not be a pre-election giveaway, but Mr Brown is thought to have about £2bn to spare.',
 "- Increase in the stamp duty threshold from £60,000 \n - A freeze on petrol duty \n - An extension of tax credit scheme for poorer families \n - Possible help for pensioners The stamp duty threshold rise is intended to help first time buyers - a likely theme of all three of the main parties' general election manifestos.",
 'Ten years ago, buyers had a m

### Sentence Similarity

In [12]:
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

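def sentence_similarity(sent1, sent2, embed):
    # Embed each sentence with the Universal Sentence Encoder.
    A = embed([sent1])[0]
    B = embed([sent2])[0]
    # Note: despite the name, this returns the cosine *distance*
    # (1 - cosine similarity): 0 means same direction, larger means less similar.
    return 1 - (np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B)))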
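
The function above returns one minus the cosine similarity, which is a cosine distance. The relationship can be checked on small vectors; the helper names and toy vectors below are illustrative assumptions standing in for real sentence embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_distance(a, b):
    """1 - cosine similarity, the quantity computed in the notebook."""
    return 1.0 - cosine_similarity(a, b)

same = cosine_distance([1, 2, 3], [2, 4, 6])   # parallel vectors
orthogonal = cosine_distance([1, 0], [0, 1])   # perpendicular vectors
print(same, orthogonal)
```

Parallel vectors give a distance near 0 and perpendicular vectors give a distance near 1, so with this convention a lower score means the two sentences are closer in meaning.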



In [14]:
print(f"\033[92m Sentence 1 : {sent_tok[0]}")
print(f"\033[92m Sentence 2 : {sent_tok[1]}")
print(f"\033[92m Similarity Score : {sentence_similarity(sent_tok[0], sent_tok[1], embed)}")

 Sentence 1 : Budget to set scene for election

Gordon Brown will seek to put the economy at the centre of Labour's bid for a third term in power when he delivers his ninth Budget at 1230 GMT.
 Sentence 2 : He is expected to stress the importance of continued economic stability, with low unemployment and interest rates.
 Similarity Score : 0.7918333411216736


In [13]:
def build_similarity_matrix(sentences,embeds):
    similarity_matrix = np.zeros((len(sentences),len(sentences)))
    for idx1 in range(len(sentences)):
        for idx2 in range(len(sentences)):
            if idx1!=idx2:
                similarity_matrix[idx1][idx2] = sentence_similarity(sentences[idx1],sentences[idx2],embeds)
    return similarity_matrix
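
The nested loop re-embeds both sentences for every pair. A possible alternative, sketched here under the assumption that all sentence embeddings are already stacked in one array (for example via `np.asarray(embed(sentences))`, which is not run here), computes the whole matrix at once. `pairwise_cosine_distance` is a hypothetical name; it keeps the notebook's `1 - cosine similarity` convention and zero diagonal.

```python
import numpy as np

def pairwise_cosine_distance(embeddings):
    """Return an (n, n) matrix of 1 - cosine similarity with a zero diagonal."""
    X = np.asarray(embeddings, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # L2-normalise each row
    matrix = 1.0 - unit @ unit.T                         # all pairs in one product
    np.fill_diagonal(matrix, 0.0)                        # no self-edges
    return matrix

# Toy 2-D vectors standing in for sentence embeddings.
toy = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(pairwise_cosine_distance(toy).round(3))
```

With this layout each sentence is embedded once instead of once per pair, which matters for long articles.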

In [15]:
sim_mat = build_similarity_matrix(sent_tok, embed)

## Summarization

In [16]:
file_path_summary = bbc[bbc['article_or_summary']=='Summaries'].iloc[0]['path']
with open(file_path_summary, "r") as f:
    actual_summary = f.read()

In [17]:
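def generate_extractive_summary(text, top_n, embeds):
    # Split the text into sentences and score every sentence pair.
    sentences = read_article(text)
    sentence_similarity_matrix = build_similarity_matrix(sentences, embeds)
    # Treat the matrix as a weighted graph and rank the sentences with PageRank.
    sentence_similarity_graph = nx.from_numpy_array(sentence_similarity_matrix)
    scores = nx.pagerank(sentence_similarity_graph)
    ranked_sentences = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)
    # Keep the top_n highest-ranked sentences, in rank order; slicing avoids an
    # IndexError when the text has fewer than top_n sentences.
    summarize_text = [s for _, s in ranked_sentences[:top_n]]
    return " ".join(summarize_text)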
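
`nx.pagerank` gives each sentence a score that depends on the weights of the edges pointing to it. The ranking step can be sketched as a simplified power iteration in plain NumPy. This is not NetworkX's implementation; the function names, the fixed iteration count, and the toy graph are assumptions for illustration (0.85 matches NetworkX's default damping factor).

```python
import numpy as np

def pagerank_scores(weights, damping=0.85, iterations=100):
    """Power-iteration PageRank on a non-negative weight matrix."""
    W = np.asarray(weights, dtype=float)
    n = W.shape[0]
    row_sums = W.sum(axis=1, keepdims=True)
    # Row-normalise; rows with no outgoing weight jump uniformly to every node.
    P = np.where(row_sums > 0, W / np.where(row_sums == 0, 1.0, row_sums), 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(iterations):
        scores = (1.0 - damping) / n + damping * (P.T @ scores)
    return scores

def top_sentences(sentences, scores, top_n):
    """Pick the top_n highest-scoring sentences, returned in document order."""
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]

# Toy graph: sentence 0 is connected to both other sentences.
toy_weights = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
toy_scores = pagerank_scores(toy_weights)
print(top_sentences(["a", "b", "c"], toy_scores, top_n=1))
```

Unlike the notebook's function, `top_sentences` returns the selected sentences in their original order, which tends to read more naturally; whether that is preferable is a design choice.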

In [27]:
" ".join(sent_tok)

'Budget to set scene for election\n\nGordon Brown will seek to put the economy at the centre of Labour\'s bid for a third term in power when he delivers his ninth Budget at 1230 GMT. He is expected to stress the importance of continued economic stability, with low unemployment and interest rates. The chancellor is expected to freeze petrol duty and raise the stamp duty threshold from £60,000. But the Conservatives and Lib Dems insist voters face higher taxes and more means-testing under Labour. Treasury officials have said there will not be a pre-election giveaway, but Mr Brown is thought to have about £2bn to spare. - Increase in the stamp duty threshold from £60,000 \n - A freeze on petrol duty \n - An extension of tax credit scheme for poorer families \n - Possible help for pensioners The stamp duty threshold rise is intended to help first time buyers - a likely theme of all three of the main parties\' general election manifestos. Ten years ago, buyers had a much greater chance of a

In [18]:
Original_Text = " ".join(sent_tok)
Summarized_Text = generate_extractive_summary(Original_Text, top_n=5, embeds = embed)

In [19]:
Original_Text

'Budget to set scene for election\n\nGordon Brown will seek to put the economy at the centre of Labour\'s bid for a third term in power when he delivers his ninth Budget at 1230 GMT. He is expected to stress the importance of continued economic stability, with low unemployment and interest rates. The chancellor is expected to freeze petrol duty and raise the stamp duty threshold from £60,000. But the Conservatives and Lib Dems insist voters face higher taxes and more means-testing under Labour. Treasury officials have said there will not be a pre-election giveaway, but Mr Brown is thought to have about £2bn to spare. - Increase in the stamp duty threshold from £60,000 \n - A freeze on petrol duty \n - An extension of tax credit scheme for poorer families \n - Possible help for pensioners The stamp duty threshold rise is intended to help first time buyers - a likely theme of all three of the main parties\' general election manifestos. Ten years ago, buyers had a much greater chance of a

In [20]:
Summarized_Text

'He added: "I don\'t accept there is any need for any changes to the plans we have set out to meet our spending commitments." He is expected to stress the importance of continued economic stability, with low unemployment and interest rates. "But a lot of that is built on an increase in personal and consumer debt over the last few years - that makes the economy quite vulnerable potentially if interest rates ever do have to go up in a significant way." Plaid Cymru\'s economics spokesman Adam Price said he wanted help to get people on the housing ladder and an increase in the minimum wage to £5.60 an hour. Treasury officials have said there will not be a pre-election giveaway, but Mr Brown is thought to have about £2bn to spare.'

In [21]:
actual_summary

'- Increase in the stamp duty threshold from £60,000 - A freeze on petrol duty - An extension of tax credit scheme for poorer families - Possible help for pensioners The stamp duty threshold rise is intended to help first time buyers - a likely theme of all three of the main parties\' general election manifestos.The chancellor is expected to freeze petrol duty and raise the stamp duty threshold from £60,000.The Tories are also thought likely to propose increased thresholds, with shadow chancellor Oliver Letwin branding stamp duty a "classic Labour stealth tax".Tax credits As a result, the number of properties incurring stamp duty has rocketed as has the government\'s tax take.Since then, average UK property prices have more than doubled while the starting threshold for stamp duty has not increased.For the Lib Dems David Laws said: "The chancellor will no doubt tell us today how wonderfully the economy is doing," he said.The Liberal Democrats unveiled their own proposals to raise the st

In [22]:
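hypothesis = Summarized_Text
reference = actual_summary
# Both arguments are plain strings here, so NLTK iterates over them character by
# character and the score below is a character-level BLEU, not a word-level one.
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis)
print(f"BLEUscore : {BLEUscore}")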

BLEUscore : 0.28932823819107034
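
BLEU is built from clipped n-gram precisions, so the unit of tokenization matters: a raw string is treated as a sequence of characters, while a list of words yields word-level n-grams. The sketch below computes a single clipped n-gram precision in plain Python to show the difference. `ngram_precision` is a hypothetical helper, not NLTK's `sentence_bleu`; it omits the brevity penalty and the averaging over n-gram orders, and the two example sentences are made up.

```python
from collections import Counter

def ngram_precision(reference, hypothesis, n=1):
    """Clipped n-gram precision of a hypothesis against a single reference."""
    def ngrams(seq):
        return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

    ref_counts, hyp_counts = ngrams(reference), ngrams(hypothesis)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total

ref = "the budget will freeze petrol duty"
hyp = "the budget may freeze fuel duty"
word_level = ngram_precision(ref.split(), hyp.split(), n=2)  # word bigrams
char_level = ngram_precision(ref, hyp, n=2)                  # character bigrams
print(word_level, char_level)
```

Character-level overlap is far easier to achieve than word-level overlap, so scores computed on raw strings are not comparable with word-level BLEU figures reported elsewhere.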


In [23]:
# sentence_similarity returns 1 - cosine similarity, so a lower value means closer texts.
print(f"Sentence Similarity Score : {sentence_similarity(Summarized_Text, actual_summary, embed)}")

Sentence Similarity Score : 0.5488407611846924


In [35]:
sample_text1 = " The Indian government has cleared the supply of several essential commodities to the Maldives, including items such as rice, wheat and onions whose exports are currently banned, amid a downturn in relations between the two sides.\
The government allowed the export of these commodities for 2024-25 under a bilateral mechanism at the request of the Maldivian government, the Indian high commission in Male said in a statement on Friday. The approved quantities are also the highest since the mechanism was put in place in 1981.\
The clearance for the exports comes at a time when ties between India and the Maldives are at a low, especially after the election last year of President Mohamed Muizzu, who has sought to end the Indian archipelago’s dependence on India in strategic sectors. Muizzu has also moved the Maldives closer to China."

summ_text1 = generate_extractive_summary(sample_text1, top_n = 2, embeds = embed)

print("SUMMARY : \n",summ_text1)

SUMMARY : 
 Muizzu has also moved the Maldives closer to China.  The Indian government has cleared the supply of several essential commodities to the Maldives, including items such as rice, wheat and onions whose exports are currently banned, amid a downturn in relations between the two sides.The government allowed the export of these commodities for 2024-25 under a bilateral mechanism at the request of the Maldivian government, the Indian high commission in Male said in a statement on Friday.


In [36]:
sample_text2 = "A recent episode of Shark Tank India saw a lucrative deal that involved all five sharks on the panel — \
Anupam Mittal, Aman Gupta, Azhar Iqubal, Namita Thapar and Radhika Gupta. Two co-founders representing their biomaterial \
science company Canvaloop, which specialises in producing alternative fibres that can reduce the environmental toll left by \
cotton and synthetic, asked for Rs 1 crore in exchange of 1.33% equity, valuing the company at Rs 75 crore.The founders claimed \
that their alternative fibres, which are created from their ‘zero-waste proprietary technology’, consumes 82% less energy, 87% \
less carbon emissions, and 99% less water than synthetic and cotton manufacturing. They said that their claims have been independently \
verified by a third-party organisation, and it was only after this that they started getting business from global brands such as Levi’s."

summ_text2 = generate_extractive_summary(sample_text2, top_n = 3, embeds = embed)

print("SUMMARY : \n",summ_text2)

SUMMARY : 
 A recent episode of Shark Tank India saw a lucrative deal that involved all five sharks on the panel — Anupam Mittal, Aman Gupta, Azhar Iqubal, Namita Thapar and Radhika Gupta. They said that their claims have been independently verified by a third-party organisation, and it was only after this that they started getting business from global brands such as Levi’s. Two co-founders representing their biomaterial science company Canvaloop, which specialises in producing alternative fibres that can reduce the environmental toll left by cotton and synthetic, asked for Rs 1 crore in exchange of 1.33% equity, valuing the company at Rs 75 crore.The founders claimed that their alternative fibres, which are created from their ‘zero-waste proprietary technology’, consumes 82% less energy, 87% less carbon emissions, and 99% less water than synthetic and cotton manufacturing.


In [40]:
import sumy
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

In [41]:
parser = PlaintextParser.from_string(Original_Text,Tokenizer("english"))

summarizer = LexRankSummarizer()
# Summarize the document with 5 sentences
summary = summarizer(parser.document, 5)

for sentence in summary:
    print(sentence)

The chancellor is expected to freeze petrol duty and raise the stamp duty threshold from £60,000.
Treasury officials have said there will not be a pre-election giveaway, but Mr Brown is thought to have about £2bn to spare.
- Increase in the stamp duty threshold from £60,000 - A freeze on petrol duty - An extension of tax credit scheme for poorer families - Possible help for pensioners The stamp duty threshold rise is intended to help first time buyers - a likely theme of all three of the main parties' general election manifestos.
The Tories say whatever the chancellor gives away will be clawed back in higher taxes if Labour is returned to power.
"If Labour is elected there will be a very substantial tax increase in the Budget after the election, of the order of around £10bn."


In [53]:
def summarize_extractive(text):
    parser = PlaintextParser.from_string(text,Tokenizer("english"))

    summarizer = LexRankSummarizer()
    summary = summarizer(parser.document, 2)

    for sentence in summary:
        print(sentence)

In [54]:
summarize_extractive(sample_text1)

The Indian government has cleared the supply of several essential commodities to the Maldives, including items such as rice, wheat and onions whose exports are currently banned, amid a downturn in relations between the two sides.The government allowed the export of these commodities for 2024-25 under a bilateral mechanism at the request of the Maldivian government, the Indian high commission in Male said in a statement on Friday.
The approved quantities are also the highest since the mechanism was put in place in 1981.The clearance for the exports comes at a time when ties between India and the Maldives are at a low, especially after the election last year of President Mohamed Muizzu, who has sought to end the Indian archipelago’s dependence on India in strategic sectors.


In [55]:
summarize_extractive(sample_text2)

A recent episode of Shark Tank India saw a lucrative deal that involved all five sharks on the panel — Anupam Mittal, Aman Gupta, Azhar Iqubal, Namita Thapar and Radhika Gupta.
Two co-founders representing their biomaterial science company Canvaloop, which specialises in producing alternative fibres that can reduce the environmental toll left by cotton and synthetic, asked for Rs 1 crore in exchange of 1.33% equity, valuing the company at Rs 75 crore.The founders claimed that their alternative fibres, which are created from their ‘zero-waste proprietary technology’, consumes 82% less energy, 87% less carbon emissions, and 99% less water than synthetic and cotton manufacturing.
