# Trying out spaCy

In [1]:
import spacy
# Stop-word list and punctuation characters for normalizing text
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
# heapq.nlargest finds the top N sentences by score
from heapq import nlargest

In [2]:
nlp = spacy.load('en_core_web_sm')

In [5]:
# First few paragraphs from this BBC article: https://www.bbc.com/news/election-us-2020-55134022
raw_text = """As the Trump White House reaches its final days, an eerie quiet has descended on the premises as attempts to challenge the election result founder in the courts. Brian Morgenstern, the deputy communications director, was wearing a jacket with a White House emblem in his office in the West Wing. The jacket was zipped all the way up, as if he were on his way out. The room, a few doors away from the Oval Office, was dark, with the shades drawn. His boss, the president, was in another part of the White House. In that moment, Donald Trump was on speaker phone with Rudy Giuliani, the head of his legal effort to challenge the election, and a group of state lawmakers who had gathered for a "hearing", as they put it, at a hotel in Gettysburg, Pennsylvania. "This election was rigged and we can't let that happen," the president said on the phone. Morgenstern was monitoring the event on his computer screen, in a distracted manner. A moment later he swivelled in his chair and spoke to a visitor about college, real estate, baseball, and, almost as an afterthought, the president's achievements. Trump's effort to contest the election results in Pennsylvania failed on Friday, not long after the so-called hearing, and even that had a shaky legal foundation. An appeals court judge said there was "no basis" for his challenge. A certification of ballots showed President-elect Joe Biden won the state by more than 80,000 votes. Being with Trump the day he lost the election The votes in Arizona were certified on Monday and in Wisconsin that could happen soon - both states Biden won. Government officials have started working towards a transition to the new administration, and the new president starts on 20 January."""

In [6]:
raw_text

'As the Trump White House reaches its final days, an eerie quiet has descended on the premises as attempts to challenge the election result founder in the courts. Brian Morgenstern, the deputy communications director, was wearing a jacket with a White House emblem in his office in the West Wing. The jacket was zipped all the way up, as if he were on his way out. The room, a few doors away from the Oval Office, was dark, with the shades drawn. His boss, the president, was in another part of the White House. In that moment, Donald Trump was on speaker phone with Rudy Giuliani, the head of his legal effort to challenge the election, and a group of state lawmakers who had gathered for a "hearing", as they put it, at a hotel in Gettysburg, Pennsylvania. "This election was rigged and we can\'t let that happen," the president said on the phone. Morgenstern was monitoring the event on his computer screen, in a distracted manner. A moment later he swivelled in his chair and spoke to a visitor a

In [7]:
docx = nlp(raw_text)
docx

As the Trump White House reaches its final days, an eerie quiet has descended on the premises as attempts to challenge the election result founder in the courts. Brian Morgenstern, the deputy communications director, was wearing a jacket with a White House emblem in his office in the West Wing. The jacket was zipped all the way up, as if he were on his way out. The room, a few doors away from the Oval Office, was dark, with the shades drawn. His boss, the president, was in another part of the White House. In that moment, Donald Trump was on speaker phone with Rudy Giuliani, the head of his legal effort to challenge the election, and a group of state lawmakers who had gathered for a "hearing", as they put it, at a hotel in Gettysburg, Pennsylvania. "This election was rigged and we can't let that happen," the president said on the phone. Morgenstern was monitoring the event on his computer screen, in a distracted manner. A moment later he swivelled in his chair and spoke to a visitor abo

In [16]:
for sentence in docx.sents:
    print(sentence)

As the Trump White House reaches its final days, an eerie quiet has descended on the premises as attempts to challenge the election result founder in the courts.
Brian Morgenstern, the deputy communications director, was wearing a jacket with a White House emblem in his office in the West Wing.
The jacket was zipped all the way up, as if he were on his way out.
The room, a few doors away from the Oval Office, was dark, with the shades drawn.
His boss, the president, was in another part of the White House.
In that moment, Donald Trump was on speaker phone with Rudy Giuliani, the head of his legal effort to challenge the election, and a group of state lawmakers who had gathered for a "hearing", as they put it, at a hotel in Gettysburg, Pennsylvania.
"This election was rigged and we can't let that happen," the president said on the phone.
Morgenstern was monitoring the event on his computer screen, in a distracted manner.
A moment later he swivelled in his chair and spoke to a visitor abo

In [11]:
stopwords = list(STOP_WORDS)
stopwords[:20]

['thereupon',
 'otherwise',
 'after',
 'beyond',
 'cannot',
 'in',
 'give',
 'ourselves',
 'less',
 'just',
 '’ve',
 'very',
 'onto',
 'same',
 'was',
 'nine',
 "'ll",
 'least',
 're',
 'quite']

In [12]:
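# Count how often each token that is not a stop word appears.
# The comparison is case-sensitive and does not filter punctuation, so
# capitalized forms ('The', 'As') and tokens like ',' and '.' are counted too.
word_frequencies = {}
for word in docx:
    if word.text not in stopwords:
        word_frequencies[word.text] = word_frequencies.get(word.text, 0) + 1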


In [13]:
word_frequencies

{'As': 1,
 'Trump': 4,
 'White': 3,
 'House': 3,
 'reaches': 1,
 'final': 1,
 'days': 1,
 ',': 25,
 'eerie': 1,
 'quiet': 1,
 'descended': 1,
 'premises': 1,
 'attempts': 1,
 'challenge': 3,
 'election': 5,
 'result': 1,
 'founder': 1,
 'courts': 1,
 '.': 14,
 'Brian': 1,
 'Morgenstern': 2,
 'deputy': 1,
 'communications': 1,
 'director': 1,
 'wearing': 1,
 'jacket': 2,
 'emblem': 1,
 'office': 1,
 'West': 1,
 'Wing': 1,
 'The': 3,
 'zipped': 1,
 'way': 2,
 'room': 1,
 'doors': 1,
 'away': 1,
 'Oval': 1,
 'Office': 1,
 'dark': 1,
 'shades': 1,
 'drawn': 1,
 'His': 1,
 'boss': 1,
 'president': 4,
 'In': 1,
 'moment': 2,
 'Donald': 1,
 'speaker': 1,
 'phone': 2,
 'Rudy': 1,
 'Giuliani': 1,
 'head': 1,
 'legal': 2,
 'effort': 2,
 'group': 1,
 'state': 2,
 'lawmakers': 1,
 'gathered': 1,
 '"': 6,
 'hearing': 2,
 'hotel': 1,
 'Gettysburg': 1,
 'Pennsylvania': 2,
 'This': 1,
 'rigged': 1,
 'let': 1,
 'happen': 2,
 'said': 2,
 'monitoring': 1,
 'event': 1,
 'computer': 1,
 'screen': 1,
 'dist
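
The counts above are dominated by `','` (25) and `'.'` (14), and capitalized stop words such as `'The'` slip through. A minimal sketch of one way to tighten the counting, assuming lowercasing and dropping punctuation-only tokens is what is wanted. `SAMPLE_STOP_WORDS` and `count_content_words` are hypothetical names used only for illustration; in the notebook, spaCy's `STOP_WORDS` would take the place of the small sample set.

```python
from collections import Counter
from string import punctuation

# Hypothetical stand-in for spaCy's STOP_WORDS so the sketch runs on its own.
SAMPLE_STOP_WORDS = {"the", "was", "in", "on", "a", "his"}

def count_content_words(tokens, stop_words=SAMPLE_STOP_WORDS):
    """Count lowercased tokens, skipping stop words and punctuation-only tokens."""
    counts = Counter()
    for token in tokens:
        key = token.lower()
        if key in stop_words:
            continue
        # Skip tokens made up only of punctuation characters, e.g. ',' or '."'
        if all(ch in punctuation for ch in key):
            continue
        counts[key] += 1
    return counts

tokens = ["The", "jacket", "was", "zipped", ",", "the", "jacket", "."]
print(count_content_words(tokens))
```

With a spaCy `Doc`, the token strings could come from `[token.text for token in docx]`. spaCy tokens also expose `is_stop` and `is_punct` flags, which may be a more direct filter than the string checks used here.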

In [14]:
maximum_frequency = max(word_frequencies.values())

In [18]:
# Divide every count by the largest count so the weights fall in (0, 1]
for word in word_frequencies:
    word_frequencies[word] = word_frequencies[word] / maximum_frequency

In [21]:
word_frequencies

{'As': 0.04,
 'Trump': 0.16,
 'White': 0.12,
 'House': 0.12,
 'reaches': 0.04,
 'final': 0.04,
 'days': 0.04,
 ',': 1.0,
 'eerie': 0.04,
 'quiet': 0.04,
 'descended': 0.04,
 'premises': 0.04,
 'attempts': 0.04,
 'challenge': 0.12,
 'election': 0.2,
 'result': 0.04,
 'founder': 0.04,
 'courts': 0.04,
 '.': 0.56,
 'Brian': 0.04,
 'Morgenstern': 0.08,
 'deputy': 0.04,
 'communications': 0.04,
 'director': 0.04,
 'wearing': 0.04,
 'jacket': 0.08,
 'emblem': 0.04,
 'office': 0.04,
 'West': 0.04,
 'Wing': 0.04,
 'The': 0.12,
 'zipped': 0.04,
 'way': 0.08,
 'room': 0.04,
 'doors': 0.04,
 'away': 0.04,
 'Oval': 0.04,
 'Office': 0.04,
 'dark': 0.04,
 'shades': 0.04,
 'drawn': 0.04,
 'His': 0.04,
 'boss': 0.04,
 'president': 0.16,
 'In': 0.04,
 'moment': 0.08,
 'Donald': 0.04,
 'speaker': 0.04,
 'phone': 0.08,
 'Rudy': 0.04,
 'Giuliani': 0.04,
 'head': 0.04,
 'legal': 0.08,
 'effort': 0.08,
 'group': 0.04,
 'state': 0.08,
 'lawmakers': 0.04,
 'gathered': 0.04,
 '"': 0.24,
 'hearing': 0.08,
 'hot
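
In the output above, `','` has weight 1.0 because its count (25) is the maximum, so every other weight is a count divided by 25. The sketch below reproduces that arithmetic on a few counts copied from the `In [13]` output. `normalize_by_max` is a hypothetical helper, not part of the notebook.

```python
def normalize_by_max(counts):
    """Divide every count by the largest count so weights fall in (0, 1]."""
    if not counts:
        return {}
    top = max(counts.values())
    # Build a new dict instead of mutating the input
    return {word: count / top for word, count in counts.items()}

# A few counts copied from the In [13] output
counts = {",": 25, ".": 14, "election": 5, "Trump": 4, "eerie": 1}
weights = normalize_by_max(counts)
print(weights)
```

Returning a new dictionary rather than updating in place also means that re-running the cell does not divide the weights a second time, which the in-place loop in `In [18]` would do.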

In [19]:
sentence_list = list(docx.sents)
sentence_scores = {}
for sent in sentence_list:
    # Only score sentences with fewer than 30 space-separated words
    if len(sent.text.split(' ')) >= 30:
        continue
    for word in sent:
        # The lookup uses lowercased text, but the keys of word_frequencies keep
        # their original case, so capitalized keys such as 'Trump' never match
        key = word.text.lower()
        if key in word_frequencies:
            sentence_scores[sent] = sentence_scores.get(sent, 0) + word_frequencies[key]

In [29]:
sentence_scores

{As the Trump White House reaches its final days, an eerie quiet has descended on the premises as attempts to challenge the election result founder in the courts.: 2.3200000000000003,
 Brian Morgenstern, the deputy communications director, was wearing a jacket with a White House emblem in his office in the West Wing.: 2.8800000000000003,
 The jacket was zipped all the way up, as if he were on his way out.: 1.84,
 The room, a few doors away from the Oval Office, was dark, with the shades drawn.: 3.8400000000000003,
 His boss, the president, was in another part of the White House.: 2.7600000000000002,
 "This election was rigged and we can't let that happen," the president said on the phone.: 2.72,
 Morgenstern was monitoring the event on his computer screen, in a distracted manner.: 1.8,
 A moment later he swivelled in his chair and spoke to a visitor about college, real estate, baseball, and, almost as an afterthought, the president's achievements.: 6.24,
 Trump's effort to contest the 
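
The scores above come from looking up lowercased token text in a dictionary whose keys keep their original case, and punctuation weights contribute most of each total. A minimal sketch of the same scoring idea with consistent lowercase keys follows. It assumes sentences are given as lists of token strings, measures sentence length in tokens rather than space-separated words, and uses hypothetical weights. `score_sentences` is an illustrative name, not a spaCy function.

```python
def score_sentences(sentences, weights, max_words=30):
    """Sum the weights of each sentence's tokens, skipping long sentences.

    `sentences` is a list of token lists and `weights` maps lowercased
    words to weights. Returns {sentence index: score}.
    """
    scores = {}
    for index, tokens in enumerate(sentences):
        if len(tokens) >= max_words:
            continue
        total = sum(weights.get(token.lower(), 0.0) for token in tokens)
        if total > 0:
            scores[index] = total
    return scores

# Hypothetical weights with lowercase keys
weights = {"election": 1.0, "president": 0.8, "jacket": 0.4}
sentences = [
    ["The", "jacket", "was", "zipped", "."],
    ["The", "President", "lost", "the", "election", "."],
]
print(score_sentences(sentences, weights))
```

Keying the result by sentence index instead of by the sentence object keeps the original position available for later steps.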

In [30]:
summarized_sentences = nlargest(7, sentence_scores, key=sentence_scores.get)
summarized_sentences

[A moment later he swivelled in his chair and spoke to a visitor about college, real estate, baseball, and, almost as an afterthought, the president's achievements.,
 The room, a few doors away from the Oval Office, was dark, with the shades drawn.,
 Trump's effort to contest the election results in Pennsylvania failed on Friday, not long after the so-called hearing, and even that had a shaky legal foundation.,
 Brian Morgenstern, the deputy communications director, was wearing a jacket with a White House emblem in his office in the West Wing.,
 His boss, the president, was in another part of the White House.,
 "This election was rigged and we can't let that happen," the president said on the phone.,
 As the Trump White House reaches its final days, an eerie quiet has descended on the premises as attempts to challenge the election result founder in the courts.]
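
`nlargest` returns the sentences ordered by score, so the list above no longer follows the order of the article. One possible follow-up step, sketched here under the assumption that scores are keyed by sentence index, is to sort the selected sentences back into document order before joining them. `build_summary` and the sample data are hypothetical.

```python
from heapq import nlargest

def build_summary(sentences, scores, n=3):
    """Pick the n best-scoring sentences and return them in document order."""
    top_indices = nlargest(n, scores, key=scores.get)
    return " ".join(sentences[i] for i in sorted(top_indices))

sentences = ["First point.", "A minor aside.", "Second point.", "Closing remark."]
scores = {0: 2.5, 1: 0.3, 2: 3.1, 3: 1.2}  # hypothetical scores keyed by sentence index
print(build_summary(sentences, scores, n=2))
```

With spaCy `Span` objects as keys, as in the notebook, sorting the selected spans by their `start` attribute should give the same document ordering.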