<a href="https://colab.research.google.com/github/rulocastellanos/practice_data_science_ml/blob/main/Automated_Text_Summarization_Using_NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [42]:
text = """In a world often dominated by negativity, it's important to remember the power of kindness and compassion. Small acts of kindness have the ability to brighten someone's day, uplift spirits, and create a ripple effect of positivity that can spread far and wide. Whether it's a smile to a stranger, a helping hand to a friend in need, or a thoughtful gesture to a colleague, every act of kindness has the potential to make a difference in someone's life.Beyond individual actions, there is also immense power in collective efforts to create positive change. When communities come together to support one another, incredible things can happen. From grassroots initiatives to global movements, people are uniting to tackle pressing social and environmental issues, driving meaningful progress and inspiring hope for a better future.It's also important to recognize the strength that lies within each and every one of us. We all have the ability to make a positive impact, no matter how small our actions may seem. By tapping into our innate compassion and empathy, we can cultivate a culture of kindness and empathy that enriches our lives and those around us.So let's embrace the power of kindness, and strive to make the world a better place one small act at a time. Together, we can create a brighter, more compassionate future for all."""

In [2]:
len(text)

1335

In [3]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [4]:
nlp = spacy.load('en_core_web_sm')

In [5]:
doc = nlp(text)

Tokenization and removing stop words.

In [8]:
# Tokenize and clean: keep lowercased tokens that are not stop words, punctuation, or newlines
tokens = [token.text.lower() for token in doc
          if not token.is_stop
          and not token.is_punct
          and token.text != '\n']

In [9]:
tokens

['world',
 'dominated',
 'negativity',
 'important',
 'remember',
 'power',
 'kindness',
 'compassion',
 'small',
 'acts',
 'kindness',
 'ability',
 'brighten',
 'day',
 'uplift',
 'spirits',
 'create',
 'ripple',
 'effect',
 'positivity',
 'spread',
 'far',
 'wide',
 'smile',
 'stranger',
 'helping',
 'hand',
 'friend',
 'need',
 'thoughtful',
 'gesture',
 'colleague',
 'act',
 'kindness',
 'potential',
 'difference',
 'life',
 'individual',
 'actions',
 'immense',
 'power',
 'collective',
 'efforts',
 'create',
 'positive',
 'change',
 'communities',
 'come',
 'support',
 'incredible',
 'things',
 'happen',
 'grassroots',
 'initiatives',
 'global',
 'movements',
 'people',
 'uniting',
 'tackle',
 'pressing',
 'social',
 'environmental',
 'issues',
 'driving',
 'meaningful',
 'progress',
 'inspiring',
 'hope',
 'better',
 'future',
 'important',
 'recognize',
 'strength',
 'lies',
 'ability',
 'positive',
 'impact',
 'matter',
 'small',
 'actions',
 'tapping',
 'innate',
 'compassion'

In [10]:
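# Alternative filter: keep only content words (adjectives, proper nouns, verbs, nouns)
# that are neither stop words nor punctuation. Unlike the first approach, tokens keep
# their original casing.

tokens1 = []
stopwords = list(STOP_WORDS)
allowed_pos = ['ADJ', 'PROPN', 'VERB', 'NOUN']
for token in doc:
  # Skip stop words (case-sensitive match on the raw text) and punctuation characters
  if token.text in stopwords or token.text in punctuation:
    continue
  # Keep the token only if its part-of-speech tag is in the allowed list
  if token.pos_ in allowed_pos:
    tokens1.append(token.text)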

In [13]:
len(tokens)

105

In [12]:
len(tokens1)

103
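
The two filters return lists of different lengths (105 versus 103). One way to see which tokens account for the gap is a multiset difference with `Counter`. The sketch below uses two short hypothetical lists (`list_a`, `list_b`) rather than the notebook's `tokens` and `tokens1`, and lowercases the second list because the alternative filter keeps original casing.

```python
from collections import Counter

# Hypothetical stand-ins for the two token lists (not the notebook's real data).
list_a = ["world", "kindness", "small", "far"]   # lowercased, like `tokens`
list_b = ["world", "kindness", "Small"]          # original casing, like `tokens1`

counts_a = Counter(list_a)
counts_b = Counter(word.lower() for word in list_b)

# Counter subtraction keeps only positive counts, i.e. the surplus on each side.
only_a = counts_a - counts_b
only_b = counts_b - counts_a

print("only in first list:", dict(only_a))
print("only in second list:", dict(only_b))
```

Applied to the real lists, the same two subtractions would list the tokens that one filter keeps and the other drops.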

Calculate the frequency of each word

In [14]:
from collections import Counter

In [18]:
word_freq = Counter(tokens)

In [19]:
max_freq = max(word_freq.values())

In [20]:
max_freq

5

In [22]:
# Normalize each count by the highest count, so every value lies in (0, 1]
# and the most frequent word gets exactly 1.
for word in word_freq:
  word_freq[word] = word_freq[word] / max_freq
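
As a small worked example of this normalization, the sketch below runs the same steps on a hypothetical token list (`toy_tokens` is made up for illustration). It writes the result to a new dict instead of overwriting the counter, which is a design choice of this sketch and not what the notebook does.

```python
from collections import Counter

# Hypothetical tokens: "kindness" appears 4 times, "power" twice, "small" once.
toy_tokens = ["kindness", "power", "kindness", "small",
              "power", "kindness", "kindness"]

toy_freq = Counter(toy_tokens)
toy_max = max(toy_freq.values())  # highest raw count, here 4

# Divide every count by the maximum; the raw counts in toy_freq stay untouched.
toy_norm = {word: count / toy_max for word, count in toy_freq.items()}

print(toy_norm)  # {'kindness': 1.0, 'power': 0.5, 'small': 0.25}
```

Keeping the raw counts separate makes it safe to re-run the cell, since dividing an already normalized counter by the old maximum a second time would shrink every value.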

In [23]:
word_freq

Counter({'world': 0.4,
         'dominated': 0.2,
         'negativity': 0.2,
         'important': 0.4,
         'remember': 0.2,
         'power': 0.6,
         'kindness': 1.0,
         'compassion': 0.4,
         'small': 0.6,
         'acts': 0.2,
         'ability': 0.4,
         'brighten': 0.2,
         'day': 0.2,
         'uplift': 0.2,
         'spirits': 0.2,
         'create': 0.6,
         'ripple': 0.2,
         'effect': 0.2,
         'positivity': 0.2,
         'spread': 0.2,
         'far': 0.2,
         'wide': 0.2,
         'smile': 0.2,
         'stranger': 0.2,
         'helping': 0.2,
         'hand': 0.2,
         'friend': 0.2,
         'need': 0.2,
         'thoughtful': 0.2,
         'gesture': 0.2,
         'colleague': 0.2,
         'act': 0.4,
         'potential': 0.2,
         'difference': 0.2,
         'life': 0.2,
         'individual': 0.2,
         'actions': 0.4,
         'immense': 0.2,
         'collective': 0.2,
         'efforts': 0.2,
        

Sentence tokenization
Creating a score for each sentence

In [27]:
sent_token = [sent.text for sent in doc.sents]
sent_token

["In a world often dominated by negativity, it's important to remember the power of kindness and compassion.",
 "Small acts of kindness have the ability to brighten someone's day, uplift spirits, and create a ripple effect of positivity that can spread far and wide.",
 "Whether it's a smile to a stranger, a helping hand to a friend in need, or a thoughtful gesture to a colleague, every act of kindness has the potential to make a difference in someone's life.",
 'Beyond individual actions, there is also immense power in collective efforts to create positive change.',
 'When communities come together to support one another, incredible things can happen.',
 'From grassroots initiatives to global movements, people are uniting to tackle pressing social and environmental issues, driving meaningful progress and inspiring hope for a better future.',
 "It's also important to recognize the strength that lies within each and every one of us.",
 'We all have the ability to make a positive impact, 

In [29]:
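# Score each sentence by summing the normalized frequencies of its words
sent_score = {}
for sent in sent_token:
  # split() breaks on whitespace only, so punctuation stays attached to words
  for word in sent.split():
    if word.lower() in word_freq.keys():
      # The lookup uses the word as written; a Counter returns 0 for absent keys
      if sent not in sent_score.keys():
        sent_score[sent] = word_freq[word]
      else:
        sent_score[sent] += word_freq[word]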

sent_score

{"In a world often dominated by negativity, it's important to remember the power of kindness and compassion.": 2.8,
 "Small acts of kindness have the ability to brighten someone's day, uplift spirits, and create a ripple effect of positivity that can spread far and wide.": 3.600000000000001,
 "Whether it's a smile to a stranger, a helping hand to a friend in need, or a thoughtful gesture to a colleague, every act of kindness has the potential to make a difference in someone's life.": 3.0000000000000004,
 'Beyond individual actions, there is also immense power in collective efforts to create positive change.': 2.4,
 'When communities come together to support one another, incredible things can happen.': 1.0,
 'From grassroots initiatives to global movements, people are uniting to tackle pressing social and environmental issues, driving meaningful progress and inspiring hope for a better future.': 3.2,
 "It's also important to recognize the strength that lies within each and every one of 
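
In the scoring loop above, a capitalized word such as "Small" passes the `word.lower()` check but is then looked up with its original casing, so it adds 0, and a word with attached punctuation such as "negativity," never matches at all. A possible variant is sketched below, assuming a simple regex tokenizer is good enough for this purpose. The names `score_sentences`, `toy_freq`, and `toy_sentences` are hypothetical, and the scores it produces would differ from the ones shown above.

```python
import re

def score_sentences(sentences, freq):
    """Sum the normalized frequency of every known word in each sentence."""
    scores = {}
    for sentence in sentences:
        # \w+ extracts alphanumeric runs, so "Power," is matched as "power"
        # once the sentence has been lowercased.
        words = re.findall(r"\w+", sentence.lower())
        total = sum(freq.get(word, 0.0) for word in words)
        if total > 0:
            # Sentences without any known word are left out, as in the loop above.
            scores[sentence] = total
    return scores

# Hypothetical inputs for illustration only.
toy_freq = {"kindness": 1.0, "power": 0.5, "small": 0.25}
toy_sentences = [
    "Small acts of kindness matter.",
    "Power, used well, helps.",
    "Nothing here.",
]

toy_scores = score_sentences(toy_sentences, toy_freq)
print(toy_scores)
```

Here the first sentence scores 1.25 (0.25 for "small" plus 1.0 for "kindness"), the second scores 0.5, and the third is omitted.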

In [30]:
import pandas as pd

In [31]:
pd.DataFrame(list(sent_score.items()),
             columns=['Sentence', 'Score'])

                                            Sentence  Score
0  In a world often dominated by negativity, it's...    2.8
1  Small acts of kindness have the ability to bri...    3.6
2  Whether it's a smile to a stranger, a helping ...    3.0
3  Beyond individual actions, there is also immen...    2.4
4  When communities come together to support one ...    1.0
5  From grassroots initiatives to global movement...    3.2
6  It's also important to recognize the strength ...    1.0
7  We all have the ability to make a positive imp...    2.0
8  By tapping into our innate compassion and empa...    3.0
9  So let's embrace the power of kindness, and st...    3.0


In [32]:
from heapq import nlargest

In [37]:
num_sentences = 3
n = nlargest(num_sentences, sent_score, key = sent_score.get)

In [38]:
" ".join(n)

"Small acts of kindness have the ability to brighten someone's day, uplift spirits, and create a ripple effect of positivity that can spread far and wide. From grassroots initiatives to global movements, people are uniting to tackle pressing social and environmental issues, driving meaningful progress and inspiring hope for a better future. Whether it's a smile to a stranger, a helping hand to a friend in need, or a thoughtful gesture to a colleague, every act of kindness has the potential to make a difference in someone's life."
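
`nlargest` returns the sentences ordered by score, which is why the summary above does not follow the order of the original text. If document order is preferred, the selected sentences can be re-sorted by position. The sketch below uses a hypothetical `toy_scores` dict and assumes the dict was filled in document order, as `sent_score` is when built by looping over `sent_token`.

```python
from heapq import nlargest

# Hypothetical scores, inserted in document order.
toy_scores = {
    "First sentence.": 2.0,
    "Second sentence.": 0.5,
    "Third sentence.": 3.0,
}

# Highest-scoring sentences first: ["Third sentence.", "First sentence."]
top = nlargest(2, toy_scores, key=toy_scores.get)

# Dicts preserve insertion order, so enumerate() recovers each sentence's position.
position = {sentence: index for index, sentence in enumerate(toy_scores)}
ordered_summary = " ".join(sorted(top, key=position.get))

print(ordered_summary)  # First sentence. Third sentence.
```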

Summarization using transformers

In [39]:
from transformers import pipeline

In [40]:
summarizer = pipeline("summarization",
                      model='t5-base',
                      tokenizer='t5-base',
                      framework='pt')

In [44]:
summary = summarizer(text,
                     max_length=100,
                     min_length=10,
                     do_sample=False)

In [45]:
summary

[{'summary_text': "small acts of kindness can brighten someone's day, uplift spirits, and create a ripple effect of positivity . when communities come together to support one another, incredible things can happen . we all have the ability to make a positive impact, no matter how small our actions may seem ."}]

In [46]:
print(summary[0]['summary_text'])

small acts of kindness can brighten someone's day, uplift spirits, and create a ripple effect of positivity . when communities come together to support one another, incredible things can happen . we all have the ability to make a positive impact, no matter how small our actions may seem .
