
In [None]:
import requests
from bs4 import BeautifulSoup

In [None]:
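def extract_text_from_article(url, max_chars=None):
    """Return the text of every <p> element at `url`, optionally truncated to `max_chars`.

    Returns None if the page cannot be fetched.
    """
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # treat HTTP errors (404, 500, ...) as failures
    except requests.RequestException:
        return None
    soup = BeautifulSoup(response.text, 'html.parser')
    article_text = ''
    for paragraph in soup.find_all('p'):
        article_text += paragraph.text + '\n'
        # Stop early once enough characters have been collected
        if max_chars and len(article_text) >= max_chars:
            break
    return article_text[:max_chars] if max_chars else article_text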
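A minimal sketch for checking the paragraph-joining and `max_chars` truncation logic of `extract_text_from_article` offline, without a network request. It swaps in the standard library's `html.parser.HTMLParser` for BeautifulSoup and assumes every `<p>` is explicitly closed; `ParagraphCollector` and `extract_paragraph_text` are hypothetical names, and the HTML string is made up for illustration.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text inside <p> elements (a stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False  # True between <p> and </p>

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self._in_p = True
            self.paragraphs.append('')

    def handle_endtag(self, tag):
        if tag == 'p':
            self._in_p = False

    def handle_data(self, data):
        # Text of nested tags such as <b> also arrives here, as with BeautifulSoup's .text
        if self._in_p:
            self.paragraphs[-1] += data


def extract_paragraph_text(html, max_chars=None):
    """Join paragraph texts with newlines, then optionally truncate to max_chars."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = ''.join(p + '\n' for p in parser.paragraphs)
    return text[:max_chars] if max_chars else text


html = '<p>First <b>bold</b> para.</p><div>skip me</div><p>Second.</p>'
print(repr(extract_paragraph_text(html)))                # 'First bold para.\nSecond.\n'
print(repr(extract_paragraph_text(html, max_chars=10)))  # 'First bold'
```

Text outside `<p>` (the `<div>` here) is ignored, matching the `soup.find_all('p')` loop.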

In [None]:
article_url = 'https://en.wikipedia.org/wiki/Kakatiya_dynasty#:~:text=The%20Kakatiya%20dynasty%20(IAST%3A%20K%C4%81kat%C4%ABya,Tamil%20Nadu%2C%20and%20southern%20Odisha.'
max_chars = 1000

In [None]:
tex = extract_text_from_article(article_url, max_chars)

In [None]:
print(tex if tex else "Failed to extract text from the article.")





The Kakatiya dynasty (IAST: Kākatīya)[a] was an Indian dynasty that ruled most of eastern Deccan region in present-day India between 12th and 14th centuries.[6] Their territory comprised much of the present day Telangana and Andhra Pradesh, and parts of eastern Karnataka, northern Tamil Nadu, and southern Odisha.[7][8] Their capital was Orugallu, now known as Warangal.The Kakatiya rulers traced their ancestry to a legendary chief or ruler named Durjaya, a descendant of Karikala Chola.  

Early Kakatiya rulers served as feudatories to Rashtrakutas and Western Chalukyas for more than two centuries. They assumed sovereignty under Prataparudra I in 1163 CE by suppressing other Chalukya subordinates in the Telangana region.[9] Ganapati Deva (r. 1199–1262) significantly expanded Kakatiya lands during the 1230s and brought under Kakatiya control the Telugu-speaking lowland delta areas around the Godavari and Krishna rivers. Ganapati Deva was succeeded by Rudrama Devi (r. 1262–1289) who is

In [None]:
len(tex)

1000

In [None]:
import spacy

In [None]:
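# Small English pipeline; if it is missing, install it with: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')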

In [None]:
doc = nlp(tex)

In [None]:
tokens = [token.text.lower() for token in doc
          if not token.is_stop and
          not token.is_punct]

In [None]:
tokens

['\n\n\n\n',
 'kakatiya',
 'dynasty',
 'iast',
 'kākatīya)[a',
 'indian',
 'dynasty',
 'ruled',
 'eastern',
 'deccan',
 'region',
 'present',
 'day',
 'india',
 '12th',
 '14th',
 'centuries.[6',
 'territory',
 'comprised',
 'present',
 'day',
 'telangana',
 'andhra',
 'pradesh',
 'parts',
 'eastern',
 'karnataka',
 'northern',
 'tamil',
 'nadu',
 'southern',
 'odisha.[7][8',
 'capital',
 'orugallu',
 'known',
 'warangal',
 'kakatiya',
 'rulers',
 'traced',
 'ancestry',
 'legendary',
 'chief',
 'ruler',
 'named',
 'durjaya',
 'descendant',
 'karikala',
 'chola',
 ' \n\n',
 'early',
 'kakatiya',
 'rulers',
 'served',
 'feudatories',
 'rashtrakutas',
 'western',
 'chalukyas',
 'centuries',
 'assumed',
 'sovereignty',
 'prataparudra',
 '1163',
 'ce',
 'suppressing',
 'chalukya',
 'subordinates',
 'telangana',
 'region.[9',
 'ganapati',
 'deva',
 'r.',
 '1199–1262',
 'significantly',
 'expanded',
 'kakatiya',
 'lands',
 '1230s',
 'brought',
 'kakatiya',
 'control',
 'telugu',
 'speaking',
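The token list above carries two kinds of noise: whitespace-only entries (`'\n\n\n\n'`, `' \n\n'`), which get through because the filter tests `is_stop` and `is_punct` but not whitespace, and footnote residue glued to words (`'kākatīya)[a'`, `'centuries.[6'`). One option is to clean the text before calling `nlp`; the sketch below uses only the standard library. `clean_wikipedia_text` is a hypothetical helper, and the pattern assumes footnote markers are digits or lower-case letters in square brackets, so it would also drop something like `[sic]`. For the whitespace part alone, spaCy's `token.is_space` flag is an alternative.

```python
import re

# Assumption: Wikipedia footnote markers look like [6], [7][8] or [a]
FOOTNOTE_RE = re.compile(r'\[[0-9a-z]+\]')


def clean_wikipedia_text(text):
    """Drop footnote markers and collapse every run of whitespace to one space."""
    without_notes = FOOTNOTE_RE.sub('', text)
    return ' '.join(without_notes.split())


raw = ('\n\n\n\nThe Kakatiya dynasty (IAST: Kākatīya)[a] was an Indian dynasty that ruled '
       'most of eastern Deccan region in present-day India between 12th and 14th centuries.[6]')
# Prints the sentence without '[a]', '[6]' or the leading newlines
print(clean_wikipedia_text(raw))
```

Calling `doc = nlp(clean_wikipedia_text(tex))` would then hand spaCy the cleaned text; note that collapsing whitespace also discards the paragraph breaks.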
 

In [None]:
from collections import Counter

In [None]:
wor_fre = Counter(tokens)

In [None]:
wor_fre

Counter({'\n\n\n\n': 1,
         'kakatiya': 5,
         'dynasty': 2,
         'iast': 1,
         'kākatīya)[a': 1,
         'indian': 1,
         'ruled': 1,
         'eastern': 2,
         'deccan': 1,
         'region': 1,
         'present': 2,
         'day': 2,
         'india': 1,
         '12th': 1,
         '14th': 1,
         'centuries.[6': 1,
         'territory': 1,
         'comprised': 1,
         'telangana': 2,
         'andhra': 1,
         'pradesh': 1,
         'parts': 1,
         'karnataka': 1,
         'northern': 1,
         'tamil': 1,
         'nadu': 1,
         'southern': 1,
         'odisha.[7][8': 1,
         'capital': 1,
         'orugallu': 1,
         'known': 1,
         'warangal': 1,
         'rulers': 2,
         'traced': 1,
         'ancestry': 1,
         'legendary': 1,
         'chief': 1,
         'ruler': 1,
         'named': 1,
         'durjaya': 1,
         'descendant': 1,
         'karikala': 1,
         'chola': 1,
         ' \n\n'

In [None]:
max_fre = max(wor_fre.values())

In [None]:
max_fre

5

In [None]:
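# Scale every count by the largest one. This divides in place, so running
# this cell a second time (without recomputing max_fre) divides again.
for wor in wor_fre:
    wor_fre[wor] = wor_fre[wor] / max_fre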

In [None]:
wor_fre

Counter({'\n\n\n\n': 0.04,
         'kakatiya': 0.2,
         'dynasty': 0.08,
         'iast': 0.04,
         'kākatīya)[a': 0.04,
         'indian': 0.04,
         'ruled': 0.04,
         'eastern': 0.08,
         'deccan': 0.04,
         'region': 0.04,
         'present': 0.08,
         'day': 0.08,
         'india': 0.04,
         '12th': 0.04,
         '14th': 0.04,
         'centuries.[6': 0.04,
         'territory': 0.04,
         'comprised': 0.04,
         'telangana': 0.08,
         'andhra': 0.04,
         'pradesh': 0.04,
         'parts': 0.04,
         'karnataka': 0.04,
         'northern': 0.04,
         'tamil': 0.04,
         'nadu': 0.04,
         'southern': 0.04,
         'odisha.[7][8': 0.04,
         'capital': 0.04,
         'orugallu': 0.04,
         'known': 0.04,
         'warangal': 0.04,
         'rulers': 0.08,
         'traced': 0.04,
         'ancestry': 0.04,
         'legendary': 0.04,
         'chief': 0.04,
         'ruler': 0.04,
         'named': 
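The weights above peak at 0.2 for `kakatiya` rather than 1.0. That appears consistent with the in-place loop having been run twice while `max_fre` was still 5 (5 / 5 / 5 = 0.2, 2 / 5 / 5 = 0.08, 1 / 5 / 5 = 0.04). Because every weight is scaled by the same factor, the later sentence ranking is unaffected, but the values are no longer relative to the most frequent word. A minimal sketch of a non-mutating alternative, using counts copied from three entries of the frequency table; `normalize` is a hypothetical helper:

```python
from collections import Counter


def normalize(counts):
    """Return a new dict of count / max count; the input is left unchanged."""
    max_count = max(counts.values())
    return {word: count / max_count for word, count in counts.items()}


counts = Counter({'kakatiya': 5, 'dynasty': 2, 'iast': 1})

weights = normalize(counts)
print(weights)                        # {'kakatiya': 1.0, 'dynasty': 0.4, 'iast': 0.2}
print(normalize(weights) == weights)  # True: normalizing again changes nothing

# Re-running an in-place loop with a stale max_fre divides twice
max_fre = 5
stale = Counter(counts)
for _ in range(2):
    for wor in stale:
        stale[wor] = stale[wor] / max_fre
print(dict(stale))                    # {'kakatiya': 0.2, 'dynasty': 0.08, 'iast': 0.04}
```

Returning a new dict means the cell can be re-run safely, at the cost of holding both the raw counts and the weights.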

In [None]:
sent_tok = [sent.text for sent in doc.sents]

In [None]:
sent_tok

['\n\n\n\nThe Kakatiya dynasty (IAST: Kākatīya)[a] was an Indian dynasty that ruled most of eastern Deccan region in present-day India between 12th and 14th centuries.[6]',
 'Their territory comprised much of the present day Telangana and Andhra Pradesh, and parts of eastern Karnataka, northern Tamil Nadu, and southern Odisha.[7][8]',
 'Their capital was Orugallu, now known as Warangal.',
 'The Kakatiya rulers traced their ancestry to a legendary chief or ruler named Durjaya, a descendant of Karikala Chola.  \n\n',
 'Early Kakatiya rulers served as feudatories to Rashtrakutas and Western Chalukyas for more than two centuries.',
 'They assumed sovereignty under Prataparudra I in 1163 CE by suppressing other Chalukya subordinates in the Telangana region.[9]',
 'Ganapati Deva (r. 1199–1262) significantly expanded Kakatiya lands during the 1230s and brought under Kakatiya control the Telugu-speaking lowland delta areas around the Godavari and Krishna rivers.',
 'Ganapati Deva was succeeded

In [None]:
sent_scor = {}

# Score each sentence as the sum of the normalized frequencies of its words
for sent in sent_tok:
    for wor in sent.split():
        if wor.lower() in wor_fre:
            sent_scor[sent] = sent_scor.get(sent, 0) + wor_fre[wor.lower()]
            print(wor)  # show which words matched

Kakatiya
dynasty
Indian
dynasty
ruled
eastern
Deccan
region
India
12th
14th
territory
comprised
present
day
Telangana
Andhra
parts
eastern
northern
Tamil
southern
capital
known
Kakatiya
rulers
traced
ancestry
legendary
chief
ruler
named
descendant
Karikala
Early
Kakatiya
rulers
served
feudatories
Rashtrakutas
Western
Chalukyas
assumed
sovereignty
Prataparudra
1163
CE
suppressing
Chalukya
subordinates
Telangana
Ganapati
Deva
significantly
expanded
Kakatiya
lands
1230s
brought
Kakatiya
control
lowland
delta
areas
Godavari
Krishna
Ganapati
Deva
succeeded
Rudrama
Devi
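`sent.split()` keeps punctuation attached to words, so `'Pradesh,'`, `'Orugallu,'` and `'Warangal.'` never match the keys in `wor_fre`; they are missing from the matched words printed above, and the capital sentence scores only 0.08 (from `capital` and `known`). A sketch of one fix strips leading and trailing punctuation with `str.strip(string.punctuation)` before the lookup; `score_sentence` is a hypothetical helper, and the weights are the 0.04 values recorded for these four words.

```python
from string import punctuation


def score_sentence(sentence, weights):
    """Sum the weights of a sentence's words, ignoring case and edge punctuation."""
    score = 0.0
    for word in sentence.split():
        key = word.strip(punctuation).lower()  # 'Warangal.' -> 'warangal'
        score += weights.get(key, 0.0)
    return score


weights = {'capital': 0.04, 'orugallu': 0.04, 'known': 0.04, 'warangal': 0.04}
sentence = 'Their capital was Orugallu, now known as Warangal.'

naive = sum(weights.get(w.lower(), 0.0) for w in sentence.split())
print(round(naive, 2))                              # 0.08, the split-only behaviour
print(round(score_sentence(sentence, weights), 2))  # 0.16
```

This only trims the ends of each word: hyphenated forms such as `present-day` still will not match `present` or `day`.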


In [None]:
sent_scor

{'\n\n\n\nThe Kakatiya dynasty (IAST: Kākatīya)[a] was an Indian dynasty that ruled most of eastern Deccan region in present-day India between 12th and 14th centuries.[6]': 0.7200000000000002,
 'Their territory comprised much of the present day Telangana and Andhra Pradesh, and parts of eastern Karnataka, northern Tamil Nadu, and southern Odisha.[7][8]': 0.6000000000000001,
 'Their capital was Orugallu, now known as Warangal.': 0.08,
 'The Kakatiya rulers traced their ancestry to a legendary chief or ruler named Durjaya, a descendant of Karikala Chola.  \n\n': 0.6,
 'Early Kakatiya rulers served as feudatories to Rashtrakutas and Western Chalukyas for more than two centuries.': 0.5199999999999999,
 'They assumed sovereignty under Prataparudra I in 1163 CE by suppressing other Chalukya subordinates in the Telangana region.[9]': 0.4,
 'Ganapati Deva (r. 1199–1262) significantly expanded Kakatiya lands during the 1230s and brought under Kakatiya control the Telugu-speaking lowland delta a
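Each score is simply the sum of the `wor_fre` weights of the words that matched. For the first sentence, the scoring loop matched `Kakatiya`, `dynasty` (twice), `eastern`, and seven words with weight 0.04 (`Indian`, `ruled`, `Deccan`, `region`, `India`, `12th`, `14th`). With the recorded weights this reproduces the stored value up to floating-point rounding:

```latex
\mathrm{score}(s_0) = \underbrace{0.2}_{\text{kakatiya}} + 2 \times \underbrace{0.08}_{\text{dynasty}} + \underbrace{0.08}_{\text{eastern}} + 7 \times 0.04 = 0.2 + 0.16 + 0.08 + 0.28 = 0.72
```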

In [None]:
import pandas as pd

In [None]:
pd.DataFrame(list(sent_scor.items()),columns = ['Sentence', 'Score'])

   Sentence                                           Score
0  \n\n\n\nThe Kakatiya dynasty (IAST: Kākatīya)[...   0.72
1  Their territory comprised much of the present ...   0.60
2  Their capital was Orugallu, now known as Waran...   0.08
3  The Kakatiya rulers traced their ancestry to a...   0.60
4  Early Kakatiya rulers served as feudatories to...   0.52
5  They assumed sovereignty under Prataparudra I ...   0.40
6  Ganapati Deva (r. 1199–1262) significantly exp...   1.00
7  Ganapati Deva was succeeded by Rudrama Devi (r...   0.28


In [None]:
from heapq import nlargest

In [None]:
num_sen = 3
temp = nlargest(num_sen, sent_scor, key = sent_scor.get)

In [None]:
" ".join(temp)

'Ganapati Deva (r. 1199–1262) significantly expanded Kakatiya lands during the 1230s and brought under Kakatiya control the Telugu-speaking lowland delta areas around the Godavari and Krishna rivers. \n\n\n\nThe Kakatiya dynasty (IAST: Kākatīya)[a] was an Indian dynasty that ruled most of eastern Deccan region in present-day India between 12th and 14th centuries.[6] Their territory comprised much of the present day Telangana and Andhra Pradesh, and parts of eastern Karnataka, northern Tamil Nadu, and southern Odisha.[7][8]'
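`nlargest` returns sentences in score order, so the joined summary above opens with the Ganapati Deva sentence before the sentence that introduces the dynasty, and it keeps the leading `'\n\n\n\n'`. A common refinement for extractive summaries is to pick the top sentences by score but emit them in their original order. A minimal sketch, where `extractive_summary` is a hypothetical helper and the four sentences are placeholders carrying the scores of rows 0, 1, 2 and 6 of the score table:

```python
from heapq import nlargest


def extractive_summary(sentences, scores, n=3):
    """Return the n best-scoring sentences, joined in their original document order."""
    # Rank sentence positions by score (unscored sentences count as 0.0) ...
    top = nlargest(n, range(len(sentences)), key=lambda i: scores.get(sentences[i], 0.0))
    # ... then sort the chosen positions so the summary follows the source text
    return ' '.join(sentences[i].strip() for i in sorted(top))


sentences = ['\n\nFirst.', 'Second.', 'Third.', 'Fourth.']
scores = {'\n\nFirst.': 0.72, 'Second.': 0.6, 'Third.': 0.08, 'Fourth.': 1.0}
print(extractive_summary(sentences, scores))  # First. Second. Fourth.
```

In the notebook, `extractive_summary(sent_tok, sent_scor, num_sen)` would be the corresponding call, since the keys of `sent_scor` are the sentence strings from `sent_tok`.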

In [None]:
from transformers import pipeline

In [None]:
summarizer = pipeline('summarization', model = 't5-base', tokenizer = 't5-base', framework = 'pt')


In [None]:
summary = summarizer(tex, max_length = 150, min_length = 20, do_sample = False)

In [None]:
print(summary[0]['summary_text'])

the Kakatiya dynasty ruled most of eastern Deccan region in present-day India . their territory comprised much of the present day Telangana and Andhra Pradesh . they assumed sovereignty under Prataparudra I in 1163 CE .
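The T5 output above starts each sentence in lower case and puts a space before every full stop (`India .`). A small, purely cosmetic post-processing sketch using regular expressions; `tidy_summary` is a hypothetical helper, and its capitalisation rule is naive (it would also capitalise a lower-case word that follows an abbreviation such as `e.g.`).

```python
import re


def tidy_summary(text):
    """Remove spaces before punctuation and capitalise the first letter of each sentence."""
    text = re.sub(r'\s+([.,;:!?])', r'\1', text)  # 'India .' -> 'India.'
    return re.sub(r'(^|[.!?]\s+)([a-z])',
                  lambda m: m.group(1) + m.group(2).upper(), text)


t5_output = ('the Kakatiya dynasty ruled most of eastern Deccan region in present-day India . '
             'their territory comprised much of the present day Telangana and Andhra Pradesh . '
             'they assumed sovereignty under Prataparudra I in 1163 CE .')
print(tidy_summary(t5_output))
```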
