In [1]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [2]:
text = """Samsung recently cancelled its in-person MWC 2021 event, instead, committing to an online-only format. The South Korean tech giant recently made it official, setting a time and date for the Samsung Galaxy MWC Virtual Event.

The event will be held on June 28 at 17:15 UTC (22:45 IST) and will be live-streamed on YouTube. In its release, Samsung says that it will introduce its “ever-expanding Galaxy device ecosystem”. Samsung also plans to present the latest technologies and innovation efforts in relation to the growing importance of smart device security.

Samsung will also showcase its vision for the future of smartwatches to provide new experiences for users and new opportunities for developers. Samsung also shared an image for the event with silhouettes of a smartwatch, a smartphone, a tablet and a laptop."""

- Stop words are the words in a stop list (also called a stoplist or negative dictionary) that are filtered out (i.e. stopped) before or after processing natural language text, because they carry little meaning on their own (for example "the", "and", "it").

- So we drop stop words as well as punctuation before counting word frequencies.
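The filtering rule can be sketched in plain Python. This is a minimal sketch that assumes a small hand-picked stop set (`STOP_SET`, hypothetical) in place of spaCy's `STOP_WORDS`, plus a hypothetical helper `keep_token`. Like the notebook, it lowercases each token and checks it against the stop words and against `string.punctuation`.

```python
from string import punctuation

# Hypothetical stop set for illustration; the notebook uses spaCy's STOP_WORDS.
STOP_SET = {"the", "will", "be", "on", "and", "its"}


def keep_token(token: str) -> bool:
    """Return True if the token is neither a stop word nor ASCII punctuation."""
    lowered = token.lower()
    # `lowered not in punctuation` is a substring test on a string, not a set lookup.
    return lowered not in STOP_SET and lowered not in punctuation


tokens = ["The", "event", "will", "be", "held", "on", "June", "28", ".", "“", "Galaxy", "”"]
kept = [t for t in tokens if keep_token(t)]
print(kept)
```

`string.punctuation` holds only ASCII characters, so the curly quotes `'“'` and `'”'` pass the filter, which is why they show up in the frequency dictionary below. Keeping the stop words in a `set` also makes each lookup constant-time on average, whereas `list(STOP_WORDS)` is scanned linearly.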

In [3]:
stopwords = list(STOP_WORDS)

print(stopwords)

['me', 'out', 'somehow', 'whither', 'five', "'re", 'along', 'used', 'whoever', 'twenty', 'they', 'give', 'nowhere', 'sixty', 'same', 'and', 'regarding', 'one', 'it', '‘d', 'some', 'those', 'beyond', 'his', 'after', 'else', 'herein', 'again', 'yourself', 'such', 'anyway', 'my', 'always', 'hereupon', 'something', 'using', 'our', 'whenever', 'also', 'into', 'would', '‘s', 'down', 'never', '’m', 'yet', 'nothing', 'moreover', 'often', 'for', 'somewhere', 'he', 'sometimes', 'yourselves', 'perhaps', 'did', 'someone', "'m", 'former', 'ca', 'itself', 'name', 'please', 'have', 'should', 'might', 'can', 'anyone', 'first', 'ours', 'put', 'we', 'n’t', 'otherwise', 'sometime', 'up', 'was', 'before', 'side', 'both', 'thereafter', 'why', 'at', 'further', 'noone', 'rather', 'thence', 'ever', 'or', 'besides', "'d", 'latterly', 'say', 'hence', 'therein', '’ll', 'even', 'has', 'once', 'which', 'not', 'herself', 'in', '‘re', 'another', 'indeed', 'became', 'to', 'last', 'more', 'whereby', 'anything', 'witho

In [4]:
nlp = spacy.load('en_core_web_sm')
doc = nlp(text)

print(doc)

Samsung recently cancelled its in-person MWC 2021 event, instead, committing to an online-only format. The South Korean tech giant recently made it official, setting a time and date for the Samsung Galaxy MWC Virtual Event.

The event will be held on June 28 at 17:15 UTC (22:45 IST) and will be live-streamed on YouTube. In its release, Samsung says that it will introduce its “ever-expanding Galaxy device ecosystem”. Samsung also plans to present the latest technologies and innovation efforts in relation to the growing importance of smart device security.

Samsung will also showcase its vision for the future of smartwatches to provide new experiences for users and new opportunities for developers. Samsung also shared an image for the event with silhouettes of a smartwatch, a smartphone, a tablet and a laptop.


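- Storing the text of each token in `doc` in the list `tokens` (here each token is treated as a word, although spaCy also produces tokens for punctuation and whitespace, such as `'-'`, `','` and `'\n\n'`).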

In [5]:
tokens = [token.text for token in doc]

print(tokens)

['Samsung', 'recently', 'cancelled', 'its', 'in', '-', 'person', 'MWC', '2021', 'event', ',', 'instead', ',', 'committing', 'to', 'an', 'online', '-', 'only', 'format', '.', 'The', 'South', 'Korean', 'tech', 'giant', 'recently', 'made', 'it', 'official', ',', 'setting', 'a', 'time', 'and', 'date', 'for', 'the', 'Samsung', 'Galaxy', 'MWC', 'Virtual', 'Event', '.', '\n\n', 'The', 'event', 'will', 'be', 'held', 'on', 'June', '28', 'at', '17:15', 'UTC', '(', '22:45', 'IST', ')', 'and', 'will', 'be', 'live', '-', 'streamed', 'on', 'YouTube', '.', 'In', 'its', 'release', ',', 'Samsung', 'says', 'that', 'it', 'will', 'introduce', 'its', '“', 'ever', '-', 'expanding', 'Galaxy', 'device', 'ecosystem', '”', '.', 'Samsung', 'also', 'plans', 'to', 'present', 'the', 'latest', 'technologies', 'and', 'innovation', 'efforts', 'in', 'relation', 'to', 'the', 'growing', 'importance', 'of', 'smart', 'device', 'security', '.', '\n\n', 'Samsung', 'will', 'also', 'showcase', 'its', 'vision', 'for', 'the', 'f

- Creating a dictionary of word frequencies (stop words and punctuation excluded)

In [6]:
word_freq = {}
for word in doc:
    # keep only tokens that are neither stop words nor punctuation
    if word.text.lower() not in stopwords and word.text.lower() not in punctuation:
        # count occurrences, starting from 0 for a word not seen before
        word_freq[word.text] = word_freq.get(word.text, 0) + 1
            
print(word_freq)

{'Samsung': 6, 'recently': 2, 'cancelled': 1, 'person': 1, 'MWC': 2, '2021': 1, 'event': 3, 'instead': 1, 'committing': 1, 'online': 1, 'format': 1, 'South': 1, 'Korean': 1, 'tech': 1, 'giant': 1, 'official': 1, 'setting': 1, 'time': 1, 'date': 1, 'Galaxy': 2, 'Virtual': 1, 'Event': 1, '\n\n': 2, 'held': 1, 'June': 1, '28': 1, '17:15': 1, 'UTC': 1, '22:45': 1, 'IST': 1, 'live': 1, 'streamed': 1, 'YouTube': 1, 'release': 1, 'says': 1, 'introduce': 1, '“': 1, 'expanding': 1, 'device': 2, 'ecosystem': 1, '”': 1, 'plans': 1, 'present': 1, 'latest': 1, 'technologies': 1, 'innovation': 1, 'efforts': 1, 'relation': 1, 'growing': 1, 'importance': 1, 'smart': 1, 'security': 1, 'showcase': 1, 'vision': 1, 'future': 1, 'smartwatches': 1, 'provide': 1, 'new': 2, 'experiences': 1, 'users': 1, 'opportunities': 1, 'developers': 1, 'shared': 1, 'image': 1, 'silhouettes': 1, 'smartwatch': 1, 'smartphone': 1, 'tablet': 1, 'laptop': 1}
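The counting loop can also be written with `collections.Counter` from the standard library. A minimal sketch, assuming `filtered` is a hypothetical list of token strings that has already passed the stop-word/punctuation filter. Because `'\n\n'` is neither a stop word nor in `string.punctuation`, the notebook counts it as a word (see `'\n\n': 2` above); the second count shows one way to drop whitespace-only tokens, a variation that is not part of the original notebook (with spaCy tokens, `word.is_space` gives the same check).

```python
from collections import Counter

# Hypothetical sample of tokens that survived the stop-word/punctuation filter.
filtered = ["Samsung", "event", "\n\n", "Samsung", "event", "Galaxy", "\n\n", "event"]

# Equivalent to the manual if/else counting loop.
word_freq = Counter(filtered)
print(word_freq.most_common(1))

# Variation (not in the notebook): also skip whitespace-only tokens such as "\n\n".
word_freq_no_space = Counter(t for t in filtered if not t.isspace())
print(dict(word_freq_no_space))
```

`Counter` is a `dict` subclass, so the normalization and lookup steps that follow work on it unchanged.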


- Extracting the highest word frequency (the count of the most frequent word)

In [7]:
max_freq = max(word_freq.values())

print(max_freq)

6


- Evaluating the normalized frequency of each word (i.e., `current_word_frequency / maximum_frequency`), which scales every value into the range (0, 1]

In [8]:
for word in word_freq.keys():
    word_freq[word] = word_freq[word]/max_freq
    
print(word_freq)

{'Samsung': 1.0, 'recently': 0.3333333333333333, 'cancelled': 0.16666666666666666, 'person': 0.16666666666666666, 'MWC': 0.3333333333333333, '2021': 0.16666666666666666, 'event': 0.5, 'instead': 0.16666666666666666, 'committing': 0.16666666666666666, 'online': 0.16666666666666666, 'format': 0.16666666666666666, 'South': 0.16666666666666666, 'Korean': 0.16666666666666666, 'tech': 0.16666666666666666, 'giant': 0.16666666666666666, 'official': 0.16666666666666666, 'setting': 0.16666666666666666, 'time': 0.16666666666666666, 'date': 0.16666666666666666, 'Galaxy': 0.3333333333333333, 'Virtual': 0.16666666666666666, 'Event': 0.16666666666666666, '\n\n': 0.3333333333333333, 'held': 0.16666666666666666, 'June': 0.16666666666666666, '28': 0.16666666666666666, '17:15': 0.16666666666666666, 'UTC': 0.16666666666666666, '22:45': 0.16666666666666666, 'IST': 0.16666666666666666, 'live': 0.16666666666666666, 'streamed': 0.16666666666666666, 'YouTube': 0.16666666666666666, 'release': 0.1666666666666666
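The normalization step as a small function. This sketch uses a subset of the counts printed earlier and a hypothetical helper `normalize`; it builds a new dictionary with a comprehension instead of updating `word_freq` in place (the notebook's in-place loop is also safe, since it only changes values, not keys).

```python
def normalize(freqs: dict[str, int]) -> dict[str, float]:
    """Divide every count by the largest count, so the most frequent word scores 1.0."""
    max_freq = max(freqs.values())
    return {word: count / max_freq for word, count in freqs.items()}


# A subset of the counts from the notebook output.
counts = {"Samsung": 6, "event": 3, "recently": 2, "cancelled": 1}
print(normalize(counts))
```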

- Storing the sentences of `doc` (spaCy `Span` objects from `doc.sents`) in a list

In [9]:
sent_tokens = list(doc.sents)

print(sent_tokens)

[Samsung recently cancelled its in-person MWC 2021 event, instead, committing to an online-only format., The South Korean tech giant recently made it official, setting a time and date for the Samsung Galaxy MWC Virtual Event.

, The event will be held on June 28 at 17:15 UTC (22:45 IST) and will be live-streamed on YouTube., In its release, Samsung says that it will introduce its “ever-expanding Galaxy device ecosystem”., Samsung also plans to present the latest technologies and innovation efforts in relation to the growing importance of smart device security.

, Samsung will also showcase its vision for the future of smartwatches to provide new experiences for users and new opportunities for developers., Samsung also shared an image for the event with silhouettes of a smartwatch, a smartphone, a tablet and a laptop.]


- Adding up the normalized frequencies of the words in each sentence to get a sentence score

In [10]:
sent_scores = {}
for sent in sent_tokens:
    for word in sent:
        # only words that survived the filter (i.e. are in word_freq) contribute
        if word.text in word_freq:
            sent_scores[sent] = sent_scores.get(sent, 0) + word_freq[word.text]
                
print(sent_scores)

{Samsung recently cancelled its in-person MWC 2021 event, instead, committing to an online-only format.: 3.3333333333333326, The South Korean tech giant recently made it official, setting a time and date for the Samsung Galaxy MWC Virtual Event.

: 4.0, The event will be held on June 28 at 17:15 UTC (22:45 IST) and will be live-streamed on YouTube.: 2.1666666666666665, In its release, Samsung says that it will introduce its “ever-expanding Galaxy device ecosystem”.: 2.8333333333333335, Samsung also plans to present the latest technologies and innovation efforts in relation to the growing importance of smart device security.

: 3.5, Samsung will also showcase its vision for the future of smartwatches to provide new experiences for users and new opportunities for developers.: 3.1666666666666665, Samsung also shared an image for the event with silhouettes of a smartwatch, a smartphone, a tablet and a laptop.: 2.666666666666666}
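The scoring rule, sketched without spaCy. This assumes sentences are given as hypothetical lists of token strings and are keyed by their position, because lists are not hashable (the notebook keys by spaCy `Span` objects directly). As in the loop above, a sentence with no scored word is left out of the result.

```python
def score_sentences(sentences: list[list[str]], norm_freq: dict[str, float]) -> dict[int, float]:
    """Sum the normalized frequency of every scored token in each sentence."""
    scores: dict[int, float] = {}
    for idx, sent in enumerate(sentences):
        for token in sent:
            if token in norm_freq:
                scores[idx] = scores.get(idx, 0.0) + norm_freq[token]
    return scores


# Hypothetical normalized frequencies and pre-tokenized sentences.
norm_freq = {"Samsung": 1.0, "event": 0.5}
sentences = [
    ["Samsung", "cancelled", "its", "event", "."],
    ["The", "event", "is", "online", "."],
    ["It", "starts", "soon", "."],
]
print(score_sentences(sentences, norm_freq))
```

Because each score is a plain sum, longer sentences tend to collect higher scores; the notebook does not divide by sentence length.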


- Length of summary = 30% of the number of sentences in the original text, rounded down (7 sentences × 0.3 gives 2)

In [11]:
select_len=int(len(sent_tokens)*0.3)

print(select_len)

2
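`int()` truncates toward zero, so 7 sentences × 0.3 gives 2. A sketch with two hypothetical helpers; the second adds a guard, an assumption that is not part of the notebook, so that short inputs (fewer than 4 sentences at a 0.3 ratio) do not yield an empty summary.

```python
def summary_length(n_sentences: int, ratio: float = 0.3) -> int:
    """Number of sentences to keep: ratio * n_sentences, truncated by int()."""
    return int(n_sentences * ratio)


def summary_length_at_least_one(n_sentences: int, ratio: float = 0.3) -> int:
    """Variant (assumption, not in the notebook): keep at least one sentence if any exist."""
    return max(1, int(n_sentences * ratio)) if n_sentences else 0


print(summary_length(7))               # 7 * 0.3 is about 2.1, truncated to 2
print(summary_length(3))               # about 0.9, truncated to 0: an empty summary
print(summary_length_at_least_one(3))  # the guard keeps one sentence
```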


- Preparing the summary.
- Taking the `select_len` sentences with the highest sentence scores.

In [12]:
from heapq import nlargest

summary=nlargest(select_len,sent_scores,key=sent_scores.get)

print(summary)

[The South Korean tech giant recently made it official, setting a time and date for the Samsung Galaxy MWC Virtual Event.

, Samsung also plans to present the latest technologies and innovation efforts in relation to the growing importance of smart device security.

]
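`heapq.nlargest` returns the selected items in descending score order, not in the order they appear in the text. In the run above the two winners happen to be in document order already, but that is not guaranteed. A minimal sketch, assuming sentences are identified by their position in a hypothetical `scores_by_position` dictionary (with spaCy `Span` objects, `sent.start` could serve as the sort key instead):

```python
from heapq import nlargest

# Hypothetical sentence scores keyed by sentence position in the document.
scores_by_position = {0: 2.0, 1: 3.0, 2: 5.0, 3: 1.0}

top = nlargest(2, scores_by_position, key=scores_by_position.get)
print(top)  # highest score first

# Re-sort the chosen sentences into their original document order.
print(sorted(top))
```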


- Final summary (joining the sentences from the `summary` list into one string)

In [13]:
final_summary = [sent.text for sent in summary]  # each item is a sentence Span
summary = ' '.join(final_summary)

print(summary)

The South Korean tech giant recently made it official, setting a time and date for the Samsung Galaxy MWC Virtual Event.

 Samsung also plans to present the latest technologies and innovation efforts in relation to the growing importance of smart device security.
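Each sentence's `.text` keeps the paragraph break that follows it (the `'\n\n'` tokens), so joining with a single space leaves a blank line and a leading space in the output above. One option, sketched with plain strings standing in for the `.text` values (hypothetical sample sentences), is to strip each sentence before joining.

```python
# Stand-ins for [sent.text for sent in summary]; note the trailing paragraph breaks.
sentence_texts = [
    "The tech giant made it official.\n\n",
    "Samsung also plans to present new technologies.\n\n",
]

naive = " ".join(sentence_texts)
clean = " ".join(part.strip() for part in sentence_texts)

print(repr(naive))
print(clean)
```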




In [14]:
print(text)

Samsung recently cancelled its in-person MWC 2021 event, instead, committing to an online-only format. The South Korean tech giant recently made it official, setting a time and date for the Samsung Galaxy MWC Virtual Event.

The event will be held on June 28 at 17:15 UTC (22:45 IST) and will be live-streamed on YouTube. In its release, Samsung says that it will introduce its “ever-expanding Galaxy device ecosystem”. Samsung also plans to present the latest technologies and innovation efforts in relation to the growing importance of smart device security.

Samsung will also showcase its vision for the future of smartwatches to provide new experiences for users and new opportunities for developers. Samsung also shared an image for the event with silhouettes of a smartwatch, a smartphone, a tablet and a laptop.


In [15]:
print("Length of original text: ",len(text.split(' ')))
print("Length of summary text: ",len(summary.split(' ')))

Length of original text:  129
Length of summary text:  42
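`str.split(' ')` splits only on single spaces, so two words separated by a paragraph break (such as `'Event.\n\nThe'`) count as one item, and consecutive spaces produce empty strings. Calling `split()` with no argument splits on any run of whitespace instead. A small comparison on a made-up string:

```python
sample = "Samsung cancelled the event.\n\nThe event is online."

print(len(sample.split(" ")))  # single-space split: "event.\n\nThe" stays one item
print(len(sample.split()))     # any-whitespace split

# Consecutive spaces yield empty strings with split(" ").
print("a  b".split(" "))
```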
