# Introduction

I will summarize the debates specifically for both  politicians. After applying text pre-processing and sentence tokenization I will build histogram and calculate the weights of words and sentence scores. Afterwards I will print out the summarization of the speeches.

Steps for this notebook are:
1. Pre-processing the text
2. Tokenizing text into sentences
3. Building the Histogram
4. Calculating the sentences scores
5. Getting the summary

# Importing Libraries

In [1]:
import pandas as pd
import nltk
import re
nltk.download("stopwords")
nltk.download('punkt')
import heapq

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [2]:
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## 1st Presidential Debate

In [3]:
first_corpus = pd.read_pickle("/content/drive/MyDrive/Data Science/us election presidential debates/pickles/first_whole_corpus.pkl")


### Donald Trump 

In [4]:
DT_text = first_corpus.loc["Donald Trump"]["transcript"]
DT_text

'How are you doing?, Thank you very much, Chris. I will tell you very simply. We won the election. Elections have consequences. We have the Senate, we have the White House, and we have a phenomenal nominee respected by all. Top, top academic, good in every way. Good in every way. In fact, some of her biggest endorsers are very liberal people from Notre Dame and other places. So I think she’s going to be fantastic. We have plenty of time. Even if we did it after the election itself. I have a lot of time after the election, as you know. So I think that she will be outstanding. She’s going to be as good as anybody that has served on that court. We really feel that. We have a professor at Notre Dame, highly respected by all, said she’s the single greatest student he’s ever had. He’s been a professor for a long time at a great school., And we won the election and therefore we have the right to choose her, and very few people knowingly would say otherwise. And by the way, the Democrats, they

#### Pre-processing the text

In [5]:
DT_text = re.sub(r'\[[0-9]*\]',' ',DT_text)
DT_text = re.sub(r'\s+',' ',DT_text)
clean_DT = DT_text.lower()
clean_DT = re.sub(r'\W',' ',clean_DT) 
clean_DT = re.sub(r'\d',' ',clean_DT) 
clean_DT = re.sub(r'\s+',' ',clean_DT)
clean_DT

'how are you doing thank you very much chris i will tell you very simply we won the election elections have consequences we have the senate we have the white house and we have a phenomenal nominee respected by all top top academic good in every way good in every way in fact some of her biggest endorsers are very liberal people from notre dame and other places so i think she s going to be fantastic we have plenty of time even if we did it after the election itself i have a lot of time after the election as you know so i think that she will be outstanding she s going to be as good as anybody that has served on that court we really feel that we have a professor at notre dame highly respected by all said she s the single greatest student he s ever had he s been a professor for a long time at a great school and we won the election and therefore we have the right to choose her and very few people knowingly would say otherwise and by the way the democrats they wouldn t even think about not do

#### Tokenizing text into sentences

In [6]:
sentences_DT = nltk.sent_tokenize(DT_text)

In [7]:
stop_words = nltk.corpus.stopwords.words('english')
stop_words.remove("no")
stop_words.remove("not")

#### Building the Histogram

In [8]:
word2count = {}
for word in nltk.word_tokenize(clean_DT):
    if word not in stop_words:
        if word not in word2count.keys():
            word2count[word] = 1
        else:
            word2count[word] += 1 

In [9]:
# histogram
word2count

{'able': 1,
 'absolutely': 3,
 'academic': 1,
 'accept': 1,
 'accord': 1,
 'according': 2,
 'acres': 1,
 'act': 1,
 'actually': 1,
 'addition': 1,
 'administration': 4,
 'afford': 1,
 'afraid': 1,
 'african': 2,
 'ago': 3,
 'agree': 3,
 'agreed': 4,
 'ahead': 5,
 'air': 4,
 'airports': 2,
 'alcohol': 1,
 'alcoholism': 1,
 'allow': 3,
 'allowed': 1,
 'almost': 7,
 'along': 3,
 'already': 7,
 'also': 9,
 'american': 1,
 'americans': 1,
 'announced': 1,
 'another': 1,
 'answer': 4,
 'antifa': 3,
 'anybody': 2,
 'anything': 5,
 'appeals': 1,
 'approval': 1,
 'approximately': 1,
 'area': 1,
 'around': 1,
 'ask': 2,
 'asked': 1,
 'asking': 1,
 'aspect': 1,
 'assets': 1,
 'ate': 2,
 'avenue': 1,
 'away': 5,
 'back': 17,
 'bad': 7,
 'badly': 2,
 'ballot': 10,
 'ballots': 13,
 'baltimore': 1,
 'ban': 2,
 'bank': 1,
 'basket': 2,
 'bastards': 1,
 'beau': 1,
 'beautiful': 1,
 'became': 3,
 'become': 1,
 'behind': 2,
 'believe': 3,
 'bernie': 4,
 'best': 2,
 'better': 6,
 'biden': 1,
 'big': 10,
 

In [10]:
for key in word2count.keys():
    word2count[key] = word2count[key]/max(word2count.values())

In [11]:
# weighted histogram
word2count

{'able': 0.07692307692307693,
 'absolutely': 0.07894736842105263,
 'academic': 0.014925373134328358,
 'accept': 0.07692307692307693,
 'accord': 0.07692307692307693,
 'according': 0.05,
 'acres': 0.07692307692307693,
 'act': 0.07692307692307693,
 'actually': 0.07692307692307693,
 'addition': 0.07692307692307693,
 'administration': 0.26666666666666666,
 'afford': 0.07692307692307693,
 'afraid': 0.07692307692307693,
 'african': 0.13333333333333333,
 'ago': 0.075,
 'agree': 0.125,
 'agreed': 0.16666666666666666,
 'ahead': 0.13157894736842105,
 'air': 0.3076923076923077,
 'airports': 0.10526315789473684,
 'alcohol': 0.05263157894736842,
 'alcoholism': 0.05263157894736842,
 'allow': 0.125,
 'allowed': 0.5,
 'almost': 0.3684210526315789,
 'along': 0.2,
 'already': 0.175,
 'also': 0.225,
 'american': 0.06666666666666667,
 'americans': 0.06666666666666667,
 'announced': 0.3333333333333333,
 'another': 0.05263157894736842,
 'answer': 0.16666666666666666,
 'antifa': 0.23076923076923078,
 'anybody

#### Calculating the sentences scores

In [12]:
sent2score = {}
for sentence in sentences_DT:
    for word in nltk.word_tokenize(sentence.lower()):
        if word in word2count.keys():
            if len(sentence.split(' ')) < 50:
                if sentence not in sent2score.keys():
                    sent2score[sentence] = word2count[word]
                else:
                    sent2score[sentence] += word2count[word]

In [13]:
sent2score

{'47 years you’ve done nothing., Let me just tell you something, Joe.': 2.8379811468970937,
 'A fixing of the VA which was a mess under him, 308,000 people died because they didn’t have proper health care.': 2.0068151147098514,
 'A rebuilding of the military, including Space Force and all of the other things.': 1.0426113360323888,
 'A solicited ballot, okay, solicited, is okay.': 2.4631578947368418,
 'African-Americans are super predators and they’ve never forgotten it.': 1.4364035087719298,
 'Again, two million people would be dead now instead of… Still, 204,000 people is too much.': 4.2318146111547525,
 'All you had to do is turn on the lights and you pick up a lot.': 0.6000000000000001,
 'Also, they took over something that was down here.': 0.6285087719298246,
 'An election could be won or lost with that.': 0.8765383608274417,
 'And I am urging my people.': 2.0,
 'And I think we’re going to do well because people are really happy with the job we’ve done., But you know what?': 5.3151

#### Getting the Summary

In [14]:
best_sentences = heapq.nlargest(25,sent2score,key=sent2score.get)

In [15]:
for sentence in best_sentences:
    print(sentence)

They’re understand., Wrong., Wrong., It’s so wrong., So, if we would have listened to you., If we would’ve listened to you, the country would have been left wide open, millions of people would have died, not 200,000.
Take a look at Carolyn Maloney’s race-, I want to see an honest ballot cut-, I want to see an honest ballot count., And I think he does too-
But if he ever got to run this country and they ran it the way he would want to run it, we would have by the way our suburbs would be gone.
And Joe does the circles and has three people someplace., We’ve had no negative effect., We’ve had no negative effect, and we’ve had 35, 40,000 people at these rallies., If you could get the crowds, you would have done the same thing.
If you look at Chicago, if you look at any place you want to look, Seattle, they heard we were coming in the following day and they put up their hands and we got back Seattle.
But I don’t want to accept the National Guard., Sure, I’m will to do that., I would say alm

### Joe Biden

In [16]:
JB_text = first_corpus.loc["Joe Biden"]["transcript"]
JB_text



#### Pre-processing the text

In [17]:
JB_text = re.sub(r'\[[0-9]*\]',' ',JB_text)
JB_text = re.sub(r'\s+',' ',JB_text)
clean_JB = JB_text.lower()
clean_JB = re.sub(r'\W',' ',clean_JB) 
clean_JB = re.sub(r'\d',' ',clean_JB) 
clean_JB = re.sub(r'\s+',' ',clean_JB)
clean_JB



#### Tokenizing text into sentences

In [18]:
sentences_JB = nltk.sent_tokenize(JB_text)

In [19]:
stop_words = nltk.corpus.stopwords.words('english')
stop_words.remove("no")
stop_words.remove("not")

#### Building the Histogram

In [20]:
word2count = {}
for word in nltk.word_tokenize(clean_JB):
    if word not in stop_words:
        if word not in word2count.keys():
            word2count[word] = 1
        else:
            word2count[word] += 1 

In [21]:
# histogram
word2count

{'man': 12,
 'well': 24,
 'first': 4,
 'thank': 1,
 'looking': 1,
 'forward': 6,
 'mr': 4,
 'president': 24,
 'american': 15,
 'people': 76,
 'right': 8,
 'say': 15,
 'supreme': 2,
 'court': 9,
 'nominee': 1,
 'occurs': 1,
 'vote': 21,
 'united': 8,
 'states': 13,
 'senators': 3,
 'not': 73,
 'going': 60,
 'get': 27,
 'chance': 1,
 'middle': 3,
 'election': 11,
 'already': 6,
 'started': 1,
 'tens': 2,
 'thousands': 6,
 'voted': 3,
 'thing': 3,
 'happen': 6,
 'wait': 3,
 'see': 4,
 'outcome': 4,
 'way': 34,
 'express': 1,
 'view': 1,
 'elect': 2,
 'vice': 4,
 'stake': 2,
 'made': 8,
 'clear': 4,
 'wants': 7,
 'rid': 3,
 'affordable': 5,
 'care': 10,
 'act': 6,
 'running': 2,
 'ran': 1,
 'governing': 1,
 'trying': 6,
 'strip': 1,
 'million': 10,
 'health': 2,
 'insurance': 3,
 'goes': 2,
 'justice': 3,
 'opposed': 3,
 'seems': 1,
 'like': 19,
 'fine': 3,
 'person': 7,
 'written': 2,
 'went': 7,
 'bench': 1,
 'thinks': 1,
 'constitutional': 1,
 'struck': 1,
 'happens': 1,
 'women': 3,
 '

In [22]:
for key in word2count.keys():
    word2count[key] = word2count[key]/max(word2count.values())

In [23]:
# weighted histogram
word2count

{'man': 0.15789473684210525,
 'well': 0.3157894736842105,
 'first': 0.05263157894736842,
 'thank': 0.013157894736842105,
 'looking': 0.013157894736842105,
 'forward': 0.07894736842105263,
 'mr': 0.05263157894736842,
 'president': 0.3157894736842105,
 'american': 0.19736842105263158,
 'people': 1.0,
 'right': 0.1095890410958904,
 'say': 0.2054794520547945,
 'supreme': 0.0273972602739726,
 'court': 0.1232876712328767,
 'nominee': 0.0136986301369863,
 'occurs': 0.0136986301369863,
 'vote': 0.2876712328767123,
 'united': 0.1095890410958904,
 'states': 0.1780821917808219,
 'senators': 0.0410958904109589,
 'not': 1.0,
 'going': 1.0,
 'get': 0.7105263157894737,
 'chance': 0.02631578947368421,
 'middle': 0.07894736842105263,
 'election': 0.2894736842105263,
 'already': 0.15789473684210525,
 'started': 0.02631578947368421,
 'tens': 0.05263157894736842,
 'thousands': 0.15789473684210525,
 'voted': 0.07894736842105263,
 'thing': 0.07894736842105263,
 'happen': 0.15789473684210525,
 'wait': 0.0789

#### Calculating the sentences scores

In [24]:
sent2score = {}
for sentence in sentences_JB:
    for word in nltk.word_tokenize(sentence.lower()):
        if word in word2count.keys():
            if len(sentence.split(' ')) < 50:
                if sentence not in sent2score.keys():
                    sent2score[sentence] = word2count[word]
                else:
                    sent2score[sentence] += word2count[word]

In [25]:
sent2score

{'200,000 dead.': 0.045454545454545456,
 '40,000 people a day are contracting COVID.': 1.8416666666666666,
 'A lot of people died and a lot more are going to die unless he gets a lot smarter, a lot quicker-, Oh, give me a break., Well, let’s have this debate-, Because he doesn’t have a plan.': 5.339671052631579,
 'A reporter came up to him to ask him a question, he said, “No, no, no.': 4.515151515151516,
 'A young woman got killed and they asked the president what he thought.': 1.88578216374269,
 'All these dog whistles and racism don’t work anymore.': 1.0,
 'And I laid out again in July, what we should be doing.': 0.26666666666666666,
 'And If you don’t, then you’re going to have significant economic consequences.”, Well, he hasn’t drawn a line.': 2.3157894736842106,
 'And I’ll be a president, not just for the Democrats.': 1.5157894736842106,
 'And I’m going to eliminate those tax cuts., And make sure that we invest in the people who in fact need the help.': 6.307361111111112,
 'And I

#### Getting the Summary

In [26]:
best_sentences = heapq.nlargest(25,sent2score,key=sent2score.get)

In [27]:
for sentence in best_sentences:
    print(sentence)

Good paying jobs [crosstalk 00:02:41]., He doesn’t know how to do that-, The fact is, it’s going to create millions of good paying jobs, and these tax incentives for people to weatherize, which he wants to get rid of.
Is it going to change, or are you going to get four more years of these lies?, There is no [crosstalk 01:05:32]… There is no evidence of that-, There is no evidence of that-, Five states have had mail-in ballots for the last decade or more.
And here’s the deal-, I’m talking about the Biden plan [crosstalk 00:55:51]-, No., That is not-, Not true-, Not true., Not true., Simply… Look-, That is simply not the case-, What it’s going to do, it’s going to create thousands and millions of jobs.
No one has established at all that there is fraud related to mail-in ballots, that somehow it’s a fraudulent process., He has no idea what he’s talking about.
Come on., Well, just take a look at what is the analysis done by Wall Street firms, points out that my economic plan would create 7

## 2nd Presidential Debate

In [28]:
second_corpus = pd.read_pickle("/content/drive/MyDrive/Data Science/us election presidential debates/pickles/second_whole_corpus.pkl")

### Donald Trump

In [29]:
text_DT = second_corpus.loc["Donald Trump"]["transcript"]
text_DT

'How are you doing? How are you?, So as you know, 2.2 million people modeled out, were expected to die. We closed up the greatest economy in the world in order to fight this horrible disease that came from China. It’s a worldwide pandemic. It’s all over the world. You see the spikes in Europe and many other places right now. If you notice, the mortality rate is down 85%. The excess mortality rate is way down and much lower than almost any other country. And we’re fighting it and we’re fighting it hard. There is a spike. There was a spike in Florida and it’s now gone., There was a very big spike in Texas. It’s now gone. There was a very big spike in Arizona. It’s now gone. And there was some spikes and surges and other places, they will soon be gone. We have a vaccine that’s coming. It’s ready. It’s going to be announced within weeks. And it’s going to be delivered. We have Operation Warp Speed, which is the military is going to distribute the vaccine., I can tell you from personal expe

#### Pre-processing the text

In [30]:
text_DT = re.sub(r'\[[0-9]*\]',' ',text_DT)
text_DT = re.sub(r'\s+',' ',text_DT)
DT_clean = text_DT.lower()
DT_clean = re.sub(r'\W',' ',DT_clean) 
DT_clean = re.sub(r'\d',' ',DT_clean) 
DT_clean = re.sub(r'\s+',' ',DT_clean)
DT_clean

'how are you doing how are you so as you know million people modeled out were expected to die we closed up the greatest economy in the world in order to fight this horrible disease that came from china it s a worldwide pandemic it s all over the world you see the spikes in europe and many other places right now if you notice the mortality rate is down the excess mortality rate is way down and much lower than almost any other country and we re fighting it and we re fighting it hard there is a spike there was a spike in florida and it s now gone there was a very big spike in texas it s now gone there was a very big spike in arizona it s now gone and there was some spikes and surges and other places they will soon be gone we have a vaccine that s coming it s ready it s going to be announced within weeks and it s going to be delivered we have operation warp speed which is the military is going to distribute the vaccine i can tell you from personal experience i was in the hospital i had it 

#### Tokenizing text into sentences

In [31]:
sentences_dt = nltk.sent_tokenize(text_DT)

In [32]:
stop_words = nltk.corpus.stopwords.words('english')
stop_words.remove("no")
stop_words.remove("not")


#### Building the Histogram

In [33]:
word2count = {}
for word in nltk.word_tokenize(DT_clean):
    if word not in stop_words:
        if word not in word2count.keys():
            word2count[word] = 1
        else:
            word2count[word] += 1 

In [34]:
# histogram
word2count

{'know': 29,
 'million': 17,
 'people': 46,
 'modeled': 1,
 'expected': 1,
 'die': 1,
 'closed': 13,
 'greatest': 2,
 'economy': 4,
 'world': 10,
 'order': 1,
 'fight': 2,
 'horrible': 6,
 'disease': 4,
 'came': 9,
 'china': 22,
 'worldwide': 3,
 'pandemic': 2,
 'see': 9,
 'spikes': 3,
 'europe': 4,
 'many': 14,
 'places': 6,
 'right': 9,
 'notice': 1,
 'mortality': 2,
 'rate': 3,
 'excess': 1,
 'way': 12,
 'much': 7,
 'lower': 1,
 'almost': 1,
 'country': 19,
 'fighting': 2,
 'hard': 2,
 'spike': 5,
 'florida': 1,
 'gone': 7,
 'big': 15,
 'texas': 2,
 'arizona': 1,
 'surges': 1,
 'soon': 10,
 'vaccine': 4,
 'coming': 9,
 'ready': 5,
 'going': 57,
 'announced': 2,
 'within': 2,
 'weeks': 3,
 'delivered': 1,
 'operation': 1,
 'warp': 1,
 'speed': 1,
 'military': 3,
 'distribute': 1,
 'tell': 8,
 'personal': 1,
 'experience': 2,
 'hospital': 1,
 'got': 25,
 'better': 9,
 'something': 3,
 'gave': 6,
 'therapeutic': 1,
 'guess': 3,
 'would': 24,
 'call': 4,
 'could': 8,
 'say': 28,
 'cure'

In [35]:
for key in word2count.keys():
    word2count[key] = word2count[key]/max(word2count.values())

In [36]:
# weighted histogram
word2count

{'know': 0.5087719298245614,
 'million': 0.2982456140350877,
 'people': 0.8070175438596491,
 'modeled': 0.017543859649122806,
 'expected': 0.017543859649122806,
 'die': 0.017543859649122806,
 'closed': 0.22807017543859648,
 'greatest': 0.03508771929824561,
 'economy': 0.07017543859649122,
 'world': 0.17543859649122806,
 'order': 0.017543859649122806,
 'fight': 0.03508771929824561,
 'horrible': 0.10526315789473684,
 'disease': 0.07017543859649122,
 'came': 0.15789473684210525,
 'china': 0.38596491228070173,
 'worldwide': 0.05263157894736842,
 'pandemic': 0.03508771929824561,
 'see': 0.15789473684210525,
 'spikes': 0.05263157894736842,
 'europe': 0.07017543859649122,
 'many': 0.24561403508771928,
 'places': 0.10526315789473684,
 'right': 0.15789473684210525,
 'notice': 0.017543859649122806,
 'mortality': 0.03508771929824561,
 'rate': 0.05263157894736842,
 'excess': 0.017543859649122806,
 'way': 0.21052631578947367,
 'much': 0.12280701754385964,
 'lower': 0.017543859649122806,
 'almost': 

#### Calculating the sentences scores

In [37]:
sent2score = {}
for sentence in sentences_dt:
    for word in nltk.word_tokenize(sentence.lower()):
        if word in word2count.keys():
            if len(sentence.split(' ')) < 50:
                if sentence not in sent2score.keys():
                    sent2score[sentence] = word2count[word]
                else:
                    sent2score[sentence] += word2count[word]

In [38]:
sent2score

{'$48 million.': 0.2982456140350877,
 '1994, your crime bill, the superpredators.': 0.8035714285714286,
 'A murderer would come in.': 1.6714285714285713,
 'A rapist would come in.': 1.6714285714285713,
 'A very bad person would come in.': 2.1042105263157893,
 'AOC plus three, they know nothing about the climate.': 2.3487719298245615,
 'Alabama is different than New York.': 1.4285714285714286,
 'All he talks about is shut downs.': 0.2857142857142857,
 'All of the emails, the emails, the horrible emails of the kind of money that you were raking in, you and your family.': 1.8937593984962406,
 'Also, I charged them 25% on dumped steel, because they were killing our steel industry.': 1.0242105263157895,
 'And I got better very fast or I wouldn’t be here tonight.': 0.9,
 'And I think we’re going to win the House.': 2.2744360902255636,
 'And I think you owe an explanation to the American people.': 1.5985964912280701,
 'And I will tell you that I had something that they gave me, a therapeutic,

#### Getting the Summary

In [39]:
best_sentences = heapq.nlargest(25,sent2score,key=sent2score.get)

In [40]:
for sentence in best_sentences:
    print(sentence)

I got criminal justice reform done and prison reform and Opportunity Zones-, … got criminal justice reform done, and prison reform, and Opportunity Zones, I took care of Black colleges and universities, I don’t know what to say, they can say anything, I mean, they can say anything.
You just said, “I’m going to do that, I’m going to do this.” You put tens of thousands of mostly Black young men in prison, now you’re saying you’re going to get… You’re going to undo that, why didn’t you get it done?
Hispanic, women, Asian, people with diplomas, with no diplomas, MIT graduates; number one in the class, everybody had the best numbers.
Boy, oh, boy., Can’t believe that one., Well, you have to understand the first time I ever heard of Black Lives Matter, they were chanting, “Pigs in a blanket,” talking about police, pigs, pigs, talking about our police.
Joe, that calling you a corrupt politician-, They’re calling it the laptop from hell., Excuse me, [crosstalk 00:06:13] the laptop from hell., 

### Joe Biden

In [41]:
text_JB = second_corpus.loc["Joe Biden"]["transcript"]
text_JB

'220,000 Americans dead. You hear nothing else I say tonight, hear this. Anyone who is responsible for not taking control. In fact, not saying I take no responsibility initially. Anyone is responsible for that many deaths should not remain as president of the United States of America. We’re in a situation where there are a thousand deaths a day now. A thousand deaths a day. And there are over 70,000 new cases per day. Compared to what’s going on in Europe as the New England Medical Journal said, they’re starting from a very low rate. We’re starting from a very high rate., The expectation is we’ll have another 200,000 Americans dead between now and the end of the year. If we just wore these masks, the president’s own advisors have told him, we can save a 100,000 lives. And we’re in a circumstance where the president thus far and still has no plan, no comprehensive plan., What I would do is make sure we have everyone encouraged to wear a mask all the time. I would make sure we move into 

#### Pre-processing the text

In [42]:
text_JB = re.sub(r'\[[0-9]*\]',' ',text_JB)
text_JB = re.sub(r'\s+',' ',text_JB)
JB_clean = text_JB.lower()
JB_clean = re.sub(r'\W',' ',JB_clean) 
JB_clean = re.sub(r'\d',' ',JB_clean) 
JB_clean = re.sub(r'\s+',' ',JB_clean)
JB_clean

' americans dead you hear nothing else i say tonight hear this anyone who is responsible for not taking control in fact not saying i take no responsibility initially anyone is responsible for that many deaths should not remain as president of the united states of america we re in a situation where there are a thousand deaths a day now a thousand deaths a day and there are over new cases per day compared to what s going on in europe as the new england medical journal said they re starting from a very low rate we re starting from a very high rate the expectation is we ll have another americans dead between now and the end of the year if we just wore these masks the president s own advisors have told him we can save a lives and we re in a circumstance where the president thus far and still has no plan no comprehensive plan what i would do is make sure we have everyone encouraged to wear a mask all the time i would make sure we move into the direction of rapid testing investing in rapid te

#### Tokenizing text into sentences

In [43]:
sentences_jb = nltk.sent_tokenize(text_JB)

In [44]:
stop_words = nltk.corpus.stopwords.words('english')
stop_words.remove("no")
stop_words.remove("not")

#### Building the Histogram

In [45]:
word2count = {}
for word in nltk.word_tokenize(JB_clean):
    if word not in stop_words:
        if word not in word2count.keys():
            word2count[word] = 1
        else:
            word2count[word] += 1 

In [46]:
# histogram
word2count

{'americans': 5,
 'dead': 2,
 'hear': 2,
 'nothing': 6,
 'else': 5,
 'say': 15,
 'tonight': 3,
 'anyone': 3,
 'responsible': 2,
 'not': 78,
 'taking': 4,
 'control': 6,
 'fact': 32,
 'saying': 11,
 'take': 10,
 'no': 26,
 'responsibility': 2,
 'initially': 1,
 'many': 6,
 'deaths': 3,
 'remain': 1,
 'president': 30,
 'united': 12,
 'states': 22,
 'america': 7,
 'situation': 7,
 'thousand': 3,
 'day': 4,
 'new': 11,
 'cases': 1,
 'per': 1,
 'compared': 1,
 'going': 84,
 'europe': 3,
 'england': 2,
 'medical': 3,
 'journal': 2,
 'said': 42,
 'starting': 2,
 'low': 2,
 'rate': 4,
 'high': 1,
 'expectation': 1,
 'another': 5,
 'end': 5,
 'year': 6,
 'wore': 1,
 'masks': 2,
 'advisors': 1,
 'told': 7,
 'save': 2,
 'lives': 2,
 'circumstance': 1,
 'thus': 1,
 'far': 1,
 'still': 4,
 'plan': 16,
 'comprehensive': 1,
 'would': 13,
 'make': 30,
 'sure': 26,
 'everyone': 4,
 'encouraged': 1,
 'wear': 2,
 'mask': 1,
 'time': 15,
 'move': 4,
 'direction': 1,
 'rapid': 2,
 'testing': 3,
 'investing

In [47]:
for key in word2count.keys():
    word2count[key] = word2count[key]/max(word2count.values())

In [48]:
# weighted histogram
word2count

{'americans': 0.05952380952380952,
 'dead': 0.023809523809523808,
 'hear': 0.023809523809523808,
 'nothing': 0.07142857142857142,
 'else': 0.05952380952380952,
 'say': 0.17857142857142858,
 'tonight': 0.03571428571428571,
 'anyone': 0.03571428571428571,
 'responsible': 0.023809523809523808,
 'not': 0.9285714285714286,
 'taking': 0.047619047619047616,
 'control': 0.07142857142857142,
 'fact': 0.38095238095238093,
 'saying': 0.13095238095238096,
 'take': 0.11904761904761904,
 'no': 0.30952380952380953,
 'responsibility': 0.023809523809523808,
 'initially': 0.011904761904761904,
 'many': 0.07142857142857142,
 'deaths': 0.03571428571428571,
 'remain': 0.011904761904761904,
 'president': 0.35714285714285715,
 'united': 0.14285714285714285,
 'states': 0.2619047619047619,
 'america': 0.08333333333333333,
 'situation': 0.08333333333333333,
 'thousand': 0.03571428571428571,
 'day': 0.047619047619047616,
 'new': 0.13095238095238096,
 'cases': 0.011904761904761904,
 'per': 0.011904761904761904,
 

#### Calculating the sentences scores

In [49]:
sent2score = {}
for sentence in sentences_jb:
    for word in nltk.word_tokenize(sentence.lower()):
        if word in word2count.keys():
            if len(sentence.split(' ')) < 50:
                if sentence not in sent2score.keys():
                    sent2score[sentence] = word2count[word]
                else:
                    sent2score[sentence] += word2count[word]

In [50]:
sent2score

{'10 million people have lost their private insurance, and he wants to take away 22 million more people who have it under Obamacare and over 110 million people with pre-existing conditions.': 8.48454469507101,
 '220,000 Americans dead.': 0.08333333333333333,
 '700 billion more dollars.': 0.8771929824561403,
 '99% of people recover.': 1.1578947368421053,
 '99.9 of young people recover.': 1.263157894736842,
 'A thousand deaths a day.': 0.11904761904761904,
 'A whole range of things the President has said, even today, he thinks we are in control.': 2.5130012531328325,
 'And China’s building a new road to a new golf course you have overseas.': 1.7451583504215078,
 'And I don’t look at this in terms of the way he does, blue states and red states.': 1.8110902255639099,
 'And I’m going to make sure you get that.': 3.1666666666666665,
 'And I’m going to say, as I said at the beginning, what is on the ballot here is the character of this country.': 3.679302422723476,
 'And I’ve laid out a clear

#### Getting the Summary

In [51]:
best_sentences = heapq.nlargest(25,sent2score,key=sent2score.get)

In [52]:
for sentence in best_sentences:
    print(sentence)

The future lies in us being able to breathe and they know they’re good jobs and getting us there., By the way, the fastest growing industry in America is the electric, excuse me, solar energy and wind.
You won’t get federal subsidies to the gas, oh, excuse me to solar and wind., Why are we giving it to oil industry?, He takes everything out of context, but the point is, look, we have to move toward net zero emissions.
There’s not enough people in jail.” And go on my website, get the quote, the date when he said it, “not enough people.” He talked about marauding gangs, young gangs, and the people who are going to maraud our cities.
We’re going to choose to move forward because we have enormous opportunities, enormous opportunities to make things better., We can grow this economy, we can deal with the systemic racism.
Secondly, we’re going to make sure we reduce the premiums and reduce drug prices by making sure that there’s competition, that doesn’t exist now, by allowing Medicare to ne