# Make Pandemic Responses Great Again!
## The Evolution of the White House's Rhetoric on the COVID-19 Pandemic

#### By: Madelyn Mathai

![](trump.jpg)

## Introduction

The COVID-19 pandemic is unlike anything in modern history. It has put entire states in lockdown, plunged the US into an economic recession, and taken 77,000 American lives so far. Even for those who are not directly affected medically or financially, the novel coronavirus has caused many people to put their normal lives on hold and shelter in place. The pandemic has caused the needs of the people to grow and evolve. In this blog, I will determine what these needs have been and when they were addressed by the White House.

In confusing, difficult times like this, people turn to leadership for information, guidance, and a plan. A number of different metrics have been used to characterize the severity of the problem, including the number of tests issued, positive tests, deaths, ventilators, masks, and hospital beds, to name a few. People are constantly looking for measures of success to determine where in the "slope of the curve" they are. Success is largely subjective. People who are liberal tend to view the Trump Administration's pandemic response as chaotic, while Trump's supporters call it heroic.

This blog post will attempt to understand how briefings coming directly from the White House regarding the coronavirus have evolved over time. Hopefully this can offer an interesting analysis of the Trump Administration's handling of the COVID-19 pandemic and we can determine if it has improved over time. 

I hope to answer the following two questions:

1. How has the greatest need evolved over time?
2. How has the Trump Administration's rhetoric regarding responsibility evolved over time?

## Setup

In [8]:
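# Standard library
import csv
import json
import os
import random
import re
import string
from collections import Counter

# Third-party libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# This imports NLTK's tokenize module; the tokenize() function called later
# is assumed to be the one defined in functions.ipynb.
from nltk import tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer

%matplotlib inline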

In [9]:
sid = SentimentIntensityAnalyzer()

[m for m in dir(sid) if not m.startswith('_')]

vlex = list(sid.lexicon.items())


In [87]:
%run functions.ipynb

## Data

To consider these questions, I used a corpus of Official White House Remarks & Briefings from January 1, 2020 to April 30, 2020 with "White House Coronavirus Task Force" in the title. While this is a relatively small corpus because it is pulled from a single source, hopefully it will be useful in understanding how the American leadership has addressed the coronavirus pandemic.

In [11]:
corpus = json.load(open('data/briefing_transcripts.json'))
len(corpus)

53

There are a total of 53 Briefings and Remarks in this corpus. 

## Descriptive Analysis

I want to consider the change in rhetoric over time, so I am going to segment the corpus by month.

In [12]:
apr_briefings = [item for item in corpus if item['date'].split()[0]=='Apr']
mar_briefings = [item for item in corpus if item['date'].split()[0]=='Mar']
feb_briefings = [item for item in corpus if item['date'].split()[0]=='Feb']
jan_briefings = [item for item in corpus if item['date'].split()[0]=='Jan']

In [13]:
print("In April, there are", len(apr_briefings), 'briefings.')
print("In March, there are", len(mar_briefings), 'briefings.')
print("In February, there are", len(feb_briefings), 'briefings.')
print("In January, there are", len(jan_briefings), 'briefings.')


In April, there are 23 briefings.
In March, there are 26 briefings.
In February, there are 2 briefings.
In January, there are 2 briefings.


There aren't many press briefings with Coronavirus Task Force in the title earlier on in the year. January and February only have two. This is a little surprising, but given that Trump declared a National Emergency on March 13, 2020, it makes sense that we would see a large increase in briefings and remarks after.
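The four month filters above can also be written as a single grouping pass. The following is a minimal sketch, assuming each item's `date` field starts with a three-letter month abbreviation (as the `split()[0]` checks imply); `group_by_month` and the sample records are hypothetical and not part of the original notebook.

```python
from collections import defaultdict

def group_by_month(briefings):
    """Group briefing dicts by the month abbreviation that starts their 'date' field."""
    by_month = defaultdict(list)
    for item in briefings:
        month = item['date'].split()[0]  # e.g. 'Apr' from 'Apr 30, 2020' (assumed format)
        by_month[month].append(item)
    return dict(by_month)

# Hypothetical stand-in records, not taken from the real corpus
sample = [
    {'date': 'Jan 31, 2020', 'text': '...'},
    {'date': 'Mar 13, 2020', 'text': '...'},
    {'date': 'Mar 14, 2020', 'text': '...'},
]
briefings_by_month = group_by_month(sample)
counts = {month: len(items) for month, items in briefings_by_month.items()}
print(counts)
```

Grouping once keeps the month labels in one place, which makes copy-and-paste slips between months less likely.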

## Tokenizing the Data

In order to tokenize the data, I ran a for loop for each month. The for loop pulled the text from each briefing and tokenized the text.

In [14]:
apr_tokens = []
for briefing in apr_briefings:
    briefing_text = briefing['text']
    tokens = tokenize(briefing_text, lowercase=True, strip_chars=to_strip)
    apr_tokens.extend(tokens)

In [15]:
mar_tokens = []
for briefing in mar_briefings:
    mar_text = briefing['text']
    tokens = tokenize(mar_text, lowercase=True, strip_chars=to_strip)
    mar_tokens.extend(tokens)

In [16]:
feb_tokens = []
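# Loop over the February briefings. The original cell iterated over jan_briefings,
# which is why the February outputs shown below are identical to January's.
for briefing in feb_briefings: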
    feb_text = briefing['text']
    tokens = tokenize(feb_text, lowercase=True, strip_chars=to_strip)
    feb_tokens.extend(tokens)

In [17]:
jan_tokens = []
for briefing in jan_briefings:
    jan_text = briefing['text']
    tokens = tokenize(jan_text, lowercase=True, strip_chars=to_strip)
    jan_tokens.extend(tokens)
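
The `tokenize` function and `to_strip` value used above come from the functions notebook, which is not shown here. As a rough sketch of what such a helper might do, assuming whitespace splitting and edge-stripping of punctuation, the hypothetical `simple_tokenize` and `tokens_for` below reproduce the per-month loop in one reusable function. The strip set is also an assumption.

```python
def simple_tokenize(text, lowercase=True, strip_chars=''):
    """Split on whitespace, optionally lowercase, and strip characters from token edges."""
    if lowercase:
        text = text.lower()
    tokens = (tok.strip(strip_chars) for tok in text.split())
    return [tok for tok in tokens if tok]  # drop tokens that were only stripped characters

def tokens_for(briefings, strip_chars=''):
    """Concatenate the tokens of every briefing in a list of {'text': ...} dicts."""
    all_tokens = []
    for briefing in briefings:
        all_tokens.extend(simple_tokenize(briefing['text'], strip_chars=strip_chars))
    return all_tokens

# Hypothetical strip set; the real `to_strip` is defined in functions.ipynb
assumed_strip = '.,!?"()'
demo = [{'text': 'Thank you, Mr. Vice President.'}, {'text': 'We will be tested!'}]
demo_tokens = tokens_for(demo, strip_chars=assumed_strip)
print(demo_tokens)
```

With a helper like this, each month becomes one call, for example `tokens_for(feb_briefings, strip_chars=to_strip)`.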

## Frequency Lists

For other analysis like keyness, concordance, and sentiment, it will be useful to consider not only words but also bigrams and trigrams. 

In [18]:
apr_word_dist = Counter(apr_tokens)
apr_bigram_dist = Counter(get_ngram_tokens(apr_tokens, 2))
apr_trigram_dist = Counter(get_ngram_tokens(apr_tokens, 3))

In [19]:
mar_word_dist = Counter(mar_tokens)
mar_bigram_dist = Counter(get_ngram_tokens(mar_tokens, 2))
mar_trigram_dist = Counter(get_ngram_tokens(mar_tokens, 3))

In [20]:
feb_word_dist = Counter(feb_tokens)
feb_bigram_dist = Counter(get_ngram_tokens(feb_tokens, 2))
feb_trigram_dist = Counter(get_ngram_tokens(feb_tokens, 3))

In [21]:
jan_word_dist = Counter(jan_tokens)
jan_bigram_dist = Counter(get_ngram_tokens(jan_tokens, 2))
jan_trigram_dist = Counter(get_ngram_tokens(jan_tokens, 3))
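
`get_ngram_tokens` is defined in the functions notebook. Judging from the outputs (bigrams appear as space-joined strings such as `'going to'`), it probably behaves like the hypothetical `ngram_tokens` below; this is a sketch under that assumption, not the original implementation.

```python
from collections import Counter

def ngram_tokens(tokens, n):
    """Return every run of n consecutive tokens, joined by a single space."""
    return [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

demo_tokens = ['we', 'will', 'be', 'tested', 'and', 'we', 'will', 'win']
bigram_counts = Counter(ngram_tokens(demo_tokens, 2))
trigrams = ngram_tokens(demo_tokens, 3)
print(bigram_counts.most_common(1))
```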

In [22]:
apr_word_dist.most_common(20)

[('the', 18732),
 ('to', 12770),
 ('and', 12318),
 ('that', 8493),
 ('—', 8392),
 ('a', 7985),
 ('of', 7882),
 ('you', 6703),
 ('i', 6527),
 ('we', 6368),
 ('in', 5074),
 ('have', 4770),
 ('it', 4614),
 ('is', 3637),
 ('they', 3172),
 ('but', 3146),
 ('are', 3058),
 ('be', 2904),
 ('this', 2730),
 ('so', 2691)]

In [23]:
mar_word_dist.most_common(20)

[('the', 12986),
 ('to', 8990),
 ('and', 8302),
 ('that', 5967),
 ('of', 5452),
 ('a', 5006),
 ('—', 4837),
 ('we', 4445),
 ('you', 4347),
 ('i', 3937),
 ('in', 3291),
 ('have', 2885),
 ('it', 2623),
 ('is', 2379),
 ('are', 2296),
 ('be', 2269),
 ('this', 2095),
 ('for', 2000),
 ('but', 1956),
 ('with', 1870)]

In [24]:
feb_word_dist.most_common(20)

[('the', 351),
 ('to', 194),
 ('of', 182),
 ('and', 165),
 ('that', 103),
 ('in', 94),
 ('we', 89),
 ('you', 76),
 ('is', 74),
 ('a', 71),
 ('—', 71),
 ('this', 69),
 ('have', 59),
 ('as', 58),
 ('secretary', 57),
 ('are', 56),
 ('i', 49),
 ('china', 40),
 ('with', 40),
 ('so', 39)]

In [25]:
jan_word_dist.most_common(20)

[('the', 351),
 ('to', 194),
 ('of', 182),
 ('and', 165),
 ('that', 103),
 ('in', 94),
 ('we', 89),
 ('you', 76),
 ('is', 74),
 ('a', 71),
 ('—', 71),
 ('this', 69),
 ('have', 59),
 ('as', 58),
 ('secretary', 57),
 ('are', 56),
 ('i', 49),
 ('china', 40),
 ('with', 40),
 ('so', 39)]

This is what we expected. Throughout these months, the most used words were function words such as articles, prepositions, conjunctions, and pronouns. Let's now consider bigrams.

In [26]:
apr_bigram_dist.most_common(20)

[('the president:', 2218),
 ('going to', 1867),
 ('of the', 1551),
 ('in the', 1321),
 ('we have', 1216),
 ('to be', 1204),
 ('i think', 1119),
 ('— the', 1086),
 ('want to', 1072),
 ('you know', 1046),
 ('a lot', 949),
 ('and i', 909),
 ('to the', 789),
 ('lot of', 767),
 ('and we', 708),
 ('to do', 680),
 ('and the', 655),
 ('thank you', 651),
 ('we’re going', 626),
 ('on the', 625)]

In [27]:
mar_bigram_dist.most_common(20)

[('going to', 1208),
 ('of the', 1184),
 ('the president:', 1164),
 ('in the', 839),
 ('to be', 777),
 ('we have', 769),
 ('i think', 728),
 ('want to', 715),
 ('thank you', 707),
 ('to the', 625),
 ('you know', 604),
 ('a lot', 566),
 ('and i', 565),
 ('— the', 524),
 ('and the', 480),
 ('that we', 468),
 ('the president', 451),
 ('lot of', 433),
 ('and we', 429),
 ('to do', 424)]

In [28]:
feb_bigram_dist.most_common(20)

[('of the', 53),
 ('in the', 32),
 ('united states', 32),
 ('the united', 27),
 ('to the', 25),
 ('thank you', 19),
 ('the risk', 17),
 ('we have', 17),
 ('secretary azar:', 16),
 ('the american', 15),
 ('want to', 14),
 ('this is', 14),
 ('and the', 14),
 ('department of', 14),
 ('on the', 13),
 ('i want', 13),
 ('public health', 13),
 ('that we', 13),
 ('continue to', 12),
 ('14 days', 12)]

In [29]:
jan_bigram_dist.most_common(20)

[('of the', 53),
 ('in the', 32),
 ('united states', 32),
 ('the united', 27),
 ('to the', 25),
 ('thank you', 19),
 ('the risk', 17),
 ('we have', 17),
 ('secretary azar:', 16),
 ('the american', 15),
 ('want to', 14),
 ('this is', 14),
 ('and the', 14),
 ('department of', 14),
 ('on the', 13),
 ('i want', 13),
 ('public health', 13),
 ('that we', 13),
 ('continue to', 12),
 ('14 days', 12)]

This is slightly more informative. We can see a few patterns over time:
* shifts away from *secretary azar*, who is the Secretary of Health and Human Services
* shifts away from *the american*, *united states*, and *the united*
* shifts away from *public health*
* shifts away from *14 days*, which is the incubation period emphasized earlier on in the pandemic

However, we must remember that the January and February corpora each contain only two briefings. Therefore, these raw counts are not very informative about how rhetoric has evolved. We want to look at the frequency of words relative to the size of the corpus, so we will need to consider keyness.
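To see why raw counts mislead when corpora differ in size, here is a small sketch of a normalized frequency (occurrences per 10,000 tokens). The function name `per_10k` and the toy counts are made up for illustration; they are not figures from the briefing corpus.

```python
from collections import Counter

def per_10k(word, dist):
    """Occurrences of `word` per 10,000 tokens in a frequency distribution."""
    total = sum(dist.values())
    return 10000 * dist[word] / total

# Toy distributions with made-up counts, only to show the effect of corpus size
small = Counter({'china': 40, 'the': 360})      # 400 tokens in total
large = Counter({'china': 280, 'the': 27720})   # 28,000 tokens in total
small_rate = per_10k('china', small)
large_rate = per_10k('china', large)
print(small_rate, large_rate)
```

In this toy case the word has the larger raw count in the large corpus but a rate ten times higher in the small one, which is the kind of difference keyness is meant to surface.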

## Keyness Analysis of Words

In [30]:
calculate_keyness(jan_word_dist, apr_word_dist, top=20)

WORD                     Corpus Freq.  RC Freq.  Keyness
secretary                57        261       191.082
risk                     25        51        116.845
china                    40        279       105.908
united                   32        262       76.228
low                      19        68        71.505
screening                11        6         71.220
public                   25        168       67.744
chinese                  11        12        61.629
14                       12        25        55.689
department               16        76        52.654
health                   30        364       52.440
travel                   13        46        49.169
security                 11        48        37.762
homeland                 7         10        36.515
azar                     7         11        35.522
citizens                 11        61        33.343
steps                    8         23        32.992
transmission             7         15        32.175
entry 

These are the 20 most key words, meaning these words appear far more frequently in January than April even when accounting for size of corpus. We can group these into categories:
* Origination of COVID-19: *china*, *chinese*
* Action plan: *actions*, *screening*, *security*, *travel*, *steps*, *department*, *entry*
* Nature of the virus: *transmission*, *14* (referring to incubation period), *health*
* US: *homeland*, *secretary*, *united*, *public*, *citizens*, *azar*
* Metrics/Measurements: *low*, *risk*
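
Keyness is often computed with a log-likelihood statistic that compares a word's observed frequency in each corpus with the frequency expected if both corpora used the word at the same rate. I have not confirmed that `calculate_keyness` uses exactly this formula, so treat the sketch below as an assumption; `log_likelihood` and `keyness_table` are hypothetical names and the counts are toy values.

```python
import math
from collections import Counter

def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood score for one word, given its counts and the two corpus sizes."""
    expected_a = size_a * (freq_a + freq_b) / (size_a + size_b)
    expected_b = size_b * (freq_a + freq_b) / (size_a + size_b)
    ll = 0.0
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll

def keyness_table(target, reference, top=5):
    """Rank words that are relatively more frequent in `target` than in `reference`."""
    size_t, size_r = sum(target.values()), sum(reference.values())
    rows = []
    for word, f_t in target.items():
        f_r = reference.get(word, 0)
        if f_t / size_t > f_r / size_r:  # keep only words over-represented in the target
            rows.append((word, f_t, f_r, log_likelihood(f_t, f_r, size_t, size_r)))
    return sorted(rows, key=lambda row: row[3], reverse=True)[:top]

# Toy counts: 'risk' is 10% of the small corpus but only 1% of the large one
demo_target = Counter({'risk': 10, 'the': 90})
demo_reference = Counter({'risk': 10, 'the': 990})
demo_table = keyness_table(demo_target, demo_reference)
for word, f_t, f_r, score in demo_table:
    print(f"{word:<25}{f_t:<10}{f_r:<10}{score:.3f}")
```

A word with the same relative frequency in both corpora scores zero, so the ranking reflects the rate of use rather than the raw count.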

### Keyness Comparing March to April

I now want to consider changes between March and April, as the understanding of the severity of the problem changed drastically in March. Shelter-in-place orders were issued, businesses were shut down, and a national emergency was declared. Let's see what speech was used immediately after this shock and how it evolved over time.

In [31]:
calculate_keyness(mar_word_dist, apr_word_dist, top=20)

WORD                     Corpus Freq.  RC Freq.  Keyness
vice                     508       378       107.722
will                     1166      1196      84.206
thank                    806       789       71.857
available                187       102       71.496
travel                   119       46        69.674
tested                   155       80        64.244
president                1165      1280      58.227
risk                     113       51        55.773
meeting                  98        41        52.781
night                    106       49        50.814
elderly                  47        8         48.634
symptoms                 91        40        46.341
legislation              46        10        41.916
house                    165       115       40.576
senate                   57        18        40.132
commercial               93        47        39.693
fault                    41        8         39.653
children                 48        13        37.997
private 

### Keyness Comparing April to March

In [32]:
calculate_keyness(apr_word_dist, mar_word_dist, top=20)

WORD                     Corpus Freq.  RC Freq.  Keyness
he                       960       357       96.248
states                   1014      406       80.691
oil                      119       10        69.635
military                 175       28        67.322
—                        8392      4837      66.669
ventilators              478       165       58.554
see                      927       397       57.636
jersey                   158       27        57.216
farmers                  91        9         49.017
they                     3172      1726      47.421
antibody                 95        12        43.997
it                       4614      2623      43.561
million                  484       188       42.536
program                  127       24        41.555
machines                 83        10        39.692
metro                    98        15        39.175
infrastructure           65        6         36.318
reopen                   72        8         36.253
banks    

##### Some initial themes:
* March has more federal government terms (*president*, *legislation*, *house*, *senate*)
* March has more language directly related to coronavirus (*tested*, *risk*, *elderly*, *symptoms*)
* April discusses more issues related to the economy (*oil*, *farmers*, *program*, *machines*, *infrastructure*, *banks*)


######  Explanations for these Results
It makes sense that the House and Senate were discussed more in March. On March 26, 2020, Congress passed a $2 trillion economic relief package, which many Americans were depending on. 

The initial narrative was that the coronavirus is only truly harmful for the elderly, but with more data that myth has been dispelled, because younger people have died from the coronavirus. This likely explains why *elderly* is key in March but not in April.

It also makes sense that the military is discussed more in April, because the National Guard has been deployed to hot spots like New York City to help.

In April, Donald Trump also discussed bailing out the oil industry.

There was a national debate in April on when to reopen the states. Some argue that there is not a serious threat in the more rural areas where they live and that they should not have to suffer financially. Others argue that staying home is what has been able to "flatten the curve" in most places, and that if we lift restrictive measures too early, the initial staying-home period will have been for nothing and the virus will come back strong.

## Exploring Topics Further Using Collocation Analysis

One word that I wanted to explore further is **available**. It has a keyness score of 71.496 for March relative to April. Throughout the pandemic, there has been a widespread fear of scarcity, ranging from things like tests to toilet paper. People are worried that there will not be enough of what they need. What is available says a lot about the severity of the problem. I wanted to explore what the Trump Administration was referring to when they said **available** and whether that changed at all from March to April.
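
Collocation analysis here means collecting the words that appear within a fixed window around a node word. The sketch below assumes that a window of `[4,0]` means four tokens to the left and none to the right; `window_collocates` is a hypothetical stand-in for the `collocates` helper in the functions notebook, and the sentence is invented.

```python
from collections import Counter

def window_collocates(tokens, node, left=4, right=0):
    """Collect tokens found up to `left` positions before and `right` after each `node`."""
    found = []
    for i, tok in enumerate(tokens):
        if tok == node:
            found.extend(tokens[max(0, i - left):i])      # left context
            found.extend(tokens[i + 1:i + 1 + right])     # right context (empty when right=0)
    return found

demo = 'we will make more tests available and masks are available now'.split()
colls = Counter(window_collocates(demo, 'available', left=4, right=0))
print(colls.most_common())
```

Looking only to the left favors the things being described as available (tests, masks) over whatever follows the word.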

### March "available" Collocates

In [79]:
mar_colls_available = Counter()
mar_colls_available.update(collocates(mar_tokens, 'available', [4, 0]))

In [80]:
mar_colls_available.most_common(30)

[('that', 43),
 ('testing', 30),
 ('is', 29),
 ('be', 28),
 ('are', 27),
 ('make', 27),
 ('to', 26),
 ('tests', 25),
 ('will', 24),
 ('the', 21),
 ('and', 21),
 ('more', 16),
 ('of', 16),
 ('those', 14),
 ('have', 12),
 ('made', 8),
 ('making', 8),
 ('now', 8),
 ('these', 7),
 ('with', 7),
 ('test', 7),
 ('we', 6),
 ('it', 6),
 ('masks', 6),
 ('that’s', 6),
 ('for', 5),
 ('this', 5),
 ('going', 5),
 ('sure', 5),
 ('they', 5)]

### April "available" Collocates

In [81]:
apr_colls_available = Counter()
apr_colls_available.update(collocates(apr_tokens, 'available', [4, 0]))

In [82]:
apr_colls_available.most_common(30)

[('that', 28),
 ('be', 17),
 ('those', 14),
 ('the', 14),
 ('make', 13),
 ('is', 12),
 ('are', 11),
 ('and', 11),
 ('to', 10),
 ('we', 10),
 ('have', 10),
 ('will', 10),
 ('tests', 8),
 ('made', 7),
 ('this', 6),
 ('of', 6),
 ('testing', 5),
 ('best', 4),
 ('information', 4),
 ('all', 4),
 ('—', 4),
 ('ventilators', 4),
 ('additional', 3),
 ('million', 3),
 ('capacity', 3),
 ('also', 3),
 ('were', 3),
 ('on', 3),
 ('in', 3),
 ('can', 3)]

###### Testing
**Testing** and **tests** are common words used around **available** in both months, although they are used more frequently in March.
Testing is one of the things the Trump Administration has received a lot of backlash for regarding the coronavirus. Donald Trump famously told reporters on March 6, "Anybody that wants a test can get a test." However, this was not the case. As of May 7, the COVID Tracking Project estimates that over 8.1 million people in the United States have been tested. That is approximately 2.4% of the US population. The press is aware of this failure to deliver, so they often ask him about the status of testing, which explains why it appears in both March and April.

###### Masks
The word **masks** was used often with the word **available** in March but not in April. There was a great change in rhetoric regarding masks from March to April. In March, there was a shortage of N95 masks, and it became a large problem because healthcare workers had to reuse masks, which put them at an unnecessary risk of exposure. Masks and other personal protective equipment became things the public was advised not to buy. Now, there is no longer a mask shortage. In a number of counties, masks are required for everyone leaving the home. Many stores have rules that you cannot enter without a mask. What was once supposed to be saved for the people who need it most is now required for all. This explains why **masks** was used often with **available** in the press briefings in March but not April.

###### Ventilators
The word **ventilators** was used often with the word **available** in April but not in March. There was an initial concern that there would not be enough ventilators for people who are sick in the hospital from coronavirus. However, we do not see ventilators as collocates in March. We see them in April because in April Donald Trump called himself the "King of Ventilators," citing his increase in production. Trump claims the United States has more ventilators than we'll ever need, so he wants to stress that in his press briefings in April.


## Keyness Analysis of Bigrams

In addition to the topics addressed, I want to see how the communication style itself changes in order to better understand the communication and leadership style of the Trump Administration. In other words, I want to consider not only what the administration says but also how it says it. I think considering the keyness of bigrams could be a bit more useful for this because they can be verb-noun pairings, which are far more informative than verbs and nouns individually.

### Bigram Keyness Comparing March to April

In [83]:
calculate_keyness(mar_bigram_dist, apr_bigram_dist, top=30)

WORD                     Corpus Freq.  RC Freq.  Keyness
thank you                707       651       80.132
vice president:          273       199       60.758
the vice                 345       278       59.631
mr vice                  116       53        56.411
the president            451       413       52.144
be tested                40        5         46.974
the risk                 44        7         46.944
vice president           225       170       45.978
will be                  360       339       37.523
south korea              64        27        34.152
the fda                  100       60        32.787
american public          61        26        32.179
the senate               46        15        31.531
commercial labs          43        13        31.355
the house                53        21        30.255
let me                   188       157       29.345
with regard              46        17        28.155
regard to                46        17        28.155
of washin

##### Groupings for March Key Bigrams
* American Federal Government: "vice president:", "the vice", "mr vice", "the president", "vice president", "of washington", "the senate", "the house"
* Verbs: "be tested", "will be", "let me", "we're working", "we will"
* Misc: "the risk", "south korea", "last night", "as the", "you the", "health and", "the airline"
* Vaccine/Test: "the fda", "commercial labs", "private sector"
* Americans: "american public", "the american", "stay home"
* Pleasantries: "thank you", "with regard", "regard to", "as the", "you the"

### Bigram Keyness Comparing April to March

In [38]:
calculate_keyness(apr_bigram_dist, mar_bigram_dist, top=30)

WORD                     Corpus Freq.  RC Freq.  Keyness
new jersey               154       27        54.403
the president:           2218      1164      45.084
— the                    1086      524       38.455
he said                  122       26        34.781
should have              79        12        31.783
states that              102       21        30.315
to open                  91        18        28.323
the states               200       66        27.301
can see                  93        22        23.112
what he                  48        6         22.391
opening up               48        6         22.391
more than                301       124       21.682
new orleans              50        7         21.479
it was                   492       229       21.096
and new                  65        13        19.979
it and                   244       97        19.889
to reopen                43        6         18.519
have enough              39        5         17.875
did the  

###### Groupings for April Key Bigrams
* Specific US states/cities: "new jersey", "new orleans"
* General states: "states that", "the states"
* Third person subject: "he said", "what he", "but he", "he didn't", "they're doing"
* Evaluating: "can see", "have enough", "should have", "more than", "it was", "we see", "in fact", "you see"
* Opening: "to open", "opening up", "to reopen"
* Misc: "no no", "- the", "it and", "did the", "a look", "president: you", "some of", "-q"

### Findings from March and April Bigrams: 

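The top bigrams in March have higher keyness scores than the top bigrams in April (80.1 for *thank you* versus 54.4 for *new jersey*), meaning the March bigrams are more distinctive of March than the April bigrams are of April.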

Some themes were reinforced from word keyness: 
* There are more bigrams about Federal Government in March
* There are more bigrams about states and opening in April
* In March there was more talk of developing tests/ vaccines 
    
In April there is more language used for evaluating or assessing. The tone shifts to correct what the Trump Administration believes to be false, initial assumptions.

With bigram keyness, we see more verbs than with word keyness. Verbs that are key in March appear to be mainly in the future tense, while verbs in April appear to be in the present and past. This may indicate that there is less of a plan coming from the Trump Administration in April. Also, pronouns in March appear to be generally first person, while pronouns in April appear to be generally third person. This suggests that there is a shift in responsibility for the pandemic from first to third person.

## Exploring Topics Further Using Concordance Analysis:

We can use concordance analysis to further explore some of the things we found in keyness analysis. I am interested in the idea of responsibility and ownership. In March, most of the key pronouns are in the first person, but in April most of the key pronouns are in the third person. That means that in March the Trump Administration was using words like *I* and *we* at a higher normalized frequency, while in April it was using words like *he* and *they* at a higher normalized frequency. This suggests that the subject of the coronavirus response, and with it the responsibility, has shifted from the White House to others. Another way to test this hypothesis is to consider how the bigram *should have*, which was key in April, was used. Let's consider this bigram in context:

In [39]:
apr_kwic_shouldhave = make_kwic2("should have", apr_tokens, win=6)

In [88]:
print_kwic(sort_kwic(apr_kwic_shouldhave, order=['L1']))

        there’s other tests that other americans  should have  and i think this has really
                         the way we have now and  should have  — the private sector who clearly
                  know i don’t think the captain  should have  been writing letters he’s not ernest
                 on in this country this country  should have  voter id okay let’s do another
           president: well — well look governors  should have  had ventilators; they chose not to
                     have sent that letter or he  should have  gone through his chain of command
                     met with them many times he  should have  never met with them and in
               happy with the president?” “no he  should have  given us 10000” that’s what’s happening
                    anybody in this room what he  should have  done and i’m sure he feels
              is what every department of health  should have  because when you go to that
             your staff or peter navarro himself  

Only a few of these have the word *we* or *I* immediately before *should have*. On the rare occasions where the word preceding **should have** is "I", he uses phrases like "maybe I should have" or "some people think I should have". He doesn't really take responsibility for these errors. Most of the subjects are *they*, *you*, or some other subject, including *governors* and *obama*.
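
Reading the concordance by eye works for a short list, but the same observation can be counted directly. The following is a minimal sketch that tallies the token immediately to the left (the L1 position) of each occurrence of a phrase; `left_neighbors` is a hypothetical helper and the sentence is invented.

```python
from collections import Counter

def left_neighbors(tokens, phrase):
    """Count the token immediately before each occurrence of a multi-word phrase."""
    target = phrase.split()
    n = len(target)
    counts = Counter()
    # Start at 1: an occurrence at position 0 has no left neighbor to count
    for i in range(1, len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            counts[tokens[i - 1]] += 1
    return counts

demo = 'he should have gone and they should have told us and he should have known'.split()
subjects = left_neighbors(demo, 'should have')
print(subjects.most_common())
```

Applied to `apr_tokens` and `mar_tokens`, a tally like this would show how often *i* and *we* precede *should have* compared with *he* and *they*.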

In [41]:
mar_kwic_shouldhave = make_kwic2("should have", mar_tokens, win=6)

In [89]:
print_kwic(sort_kwic(mar_kwic_shouldhave, order=['L1']))

               american — i think every american  should have  a grateful heart first and foremost
                       up to half of the country  should have  caution before traveling? and if that’s
            talking about the ventilators but he  should have  ordered the ventilators and he had
              next italy? the president: well he  should have  — you know the hospital systems
                         how it comes out but it  should have  been — well it’s like —
               you did all those divisions italy  should have  close to 400000 deaths they’re not
               that were right here because they  should have  never allowed it to happen but
                         to all of them but they  should have  told us about this and i
                     they knew about it and they  should have  told us we could have saved
        focus on critical infrastructure jobs we  should have  that guidance before the president and
            cdc-approved tests are moving out

In March there were only 12 occurrences of the bigram *should have*, and none of them were that surprising. It makes sense that there are more instances of this bigram in April: it looks back on the past, and in March less time had gone by for the Trump Administration to reflect on things that should have happened.

## Exploring Further Using Lexicon-based Sentiment Analysis

I want to further explore my hypothesis that the Trump Administration is shifting responsibility/blame to other parties. In order to do this, I am going to perform a sentiment analysis on the words surrounding the pronouns *we* and *he*. We know from keyness analysis that *he* was the most key word in April. We also know that bigrams that included *we* (*we're working*, *we will*) are key in March. I made some alterations to the `make_kwic` function in order to do this, which I have included in the functions notebook.

The VADER sentiment lexicon is sensitive to both the polarity and the intensity of sentiments expressed in social media contexts, but is also generally applicable to sentiment analysis in other domains.

VADER's Methodology:
> Sentiment ratings from 10 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability). Over 9,000 token features were rated on a scale from "[–4] Extremely Negative" to "[4] Extremely Positive", with allowance for "[0] Neutral (or Neither, N/A)". We kept every lexical feature that had a non-zero mean rating, and whose standard deviation was less than 2.5 as determined by the aggregate of those ten independent raters. This left us with just over 7,500 lexical features with validated valence scores that indicated both the sentiment polarity (positive/negative), and the sentiment intensity on a scale from –4 to +4.

Because negative words carry negative scores, a larger corpus does not automatically earn a higher total: each additional word can lower the sum as well as raise it. Even so, when most contexts lean positive the sum still grows with the number of occurrences, so for corpora of different sizes the sum is best read alongside the mean score per occurrence.


Beyond the shift in normalized frequency of these pronouns, let's see if there is a change to the context these pronouns are in. 

### Pronouns

##### He v. We  in March

In [43]:
mar_kwic_he= make_kwic_as_text("he", mar_tokens, win=6)

In [44]:
mar_kwic_he_sidscore = []
for item in mar_kwic_he:
    for tok in item:
        score = sid.lexicon.get(tok,0)
    mar_kwic_he_sidscore.append(score)

In [45]:
sum(mar_kwic_he_sidscore)

19.300000000000004

A score of 19.3 is the sum across all context windows. Because it is above zero, it means the word *he* was generally used in a positive context in March.
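
One detail of the scoring loop above is worth noting: `score` is reassigned for every `tok` and appended once per window, so each window contributes only the valence of the last element iterated. If the intent is to score the whole window, the valences can be summed instead. The sketch below shows that variant with a tiny made-up lexicon standing in for `sid.lexicon`; `score_windows` is a hypothetical helper, and running it on the real data would give different numbers from those reported in this post.

```python
def score_windows(windows, lexicon):
    """Return one score per context window: the sum of the valences of its tokens."""
    scores = []
    for window in windows:
        # Accept either a list of tokens or a space-separated string
        tokens = window.split() if isinstance(window, str) else window
        scores.append(sum(lexicon.get(tok, 0) for tok in tokens))
    return scores

# Tiny lexicon with made-up valences, standing in for sid.lexicon
toy_lexicon = {'great': 3.0, 'good': 2.0, 'fault': -1.5}
windows = [
    ['he', 'did', 'a', 'great', 'job'],
    'it is his fault not a good thing',
    ['he', 'said', 'that'],
]
window_scores = score_windows(windows, toy_lexicon)
total = sum(window_scores)
mean = total / len(window_scores)
print(window_scores, total, mean)
```

Splitting strings into tokens first also avoids iterating over a string character by character, which would look up single letters in the lexicon.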

In [46]:
mar_kwic_we= make_kwic_as_text("we", mar_tokens, win=6)

In [47]:
mar_kwic_we_sidscore = []
for item in mar_kwic_we:
    for tok in item:
        score = sid.lexicon.get(tok,0)
    mar_kwic_we_sidscore.append(score)

In [95]:
sum(mar_kwic_we_sidscore) / len(mar_kwic_we_sidscore)

0.041079865016872934

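We knew that the word *we* was used a lot during March. The cell above reports the mean score per occurrence (about 0.041); summed over the 4,445 occurrences of *we*, that is a total of 182.6, so *we* appeared in a net positive context. The total is far higher than the *he* total of 19.3, which shows that the Trump Administration attached much more positive language overall to *we*, although most of that gap comes from *we* simply being used more often.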

##### He v. We in April

In [49]:
apr_kwic_he= make_kwic_as_text("he", apr_tokens, win=6)

In [50]:
apr_kwic_he_sidscore = []
for item in apr_kwic_he:
    for tok in item:
        score = sid.lexicon.get(tok,0)
    apr_kwic_he_sidscore.append(score)

In [94]:
sum(apr_kwic_he_sidscore)  / len(apr_kwic_he_sidscore)

0.05656249999999998

The word *he* was used in a net positive context because the mean score is greater than 0. However, given that *he* was the most key word in April (used 960 times) and that *he* can refer to a number of other people, I would expect a higher score.

In [52]:
apr_kwic_we= make_kwic_as_text("we", apr_tokens, win=6)

In [53]:
apr_kwic_we_sidscore = []
for item in apr_kwic_we:
    for tok in item:
        score = sid.lexicon.get(tok,0)
    apr_kwic_we_sidscore.append(score)

In [96]:
sum(apr_kwic_we_sidscore) / len(apr_kwic_we_sidscore)

0.04335741206030155

The score is 276.1. This is a large increase from March. The Trump Administration's rhetoric about itself becomes more positive over time. I think this is likely because of the criticism the Trump Administration has amassed for various reasons regarding the coronavirus response. They highlight their accomplishments more as criticism grows.

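The total is also much larger than the corresponding score for *he* in the same month. Holding time constant, the Trump Administration says far more positive things about itself than about others. Per context the two are closer: the April mean for *he* (about 0.057) is slightly above the April mean for *we* (about 0.043), so the gap in totals mostly reflects how much more often *we* is used.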
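The totals quoted in the text and the means printed by the cells are linked by the number of contexts, since total = mean × n. As a rough check that uses only figures already shown in this post, dividing each quoted *we* total by the printed mean recovers the implied number of *we* contexts. The variable names are my own and the counts are inferred, not read from the notebook.

```python
# Totals quoted in the text and means printed by the notebook cells.
mar_we_total, mar_we_mean = 182.6, 0.041079865016872934
apr_we_total, apr_we_mean = 276.1, 0.04335741206030155

# total = mean * n, so n = total / mean (rounded to the nearest whole context).
mar_we_n = round(mar_we_total / mar_we_mean)
apr_we_n = round(apr_we_total / apr_we_mean)

print(mar_we_n, apr_we_n)  # 4445 6368
```

On these figures the total rises by about half between March and April while the mean barely moves, which suggests most of the increase comes from *we* being used more often.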

### Promises to the People

Let's also consider the sentiment of the promises from the Trump Administration. The bigram **we will** was key in March but not in April. This bigram will give us insight into whether and how the promises evolve.

#### March

In [98]:
mar_kwic_wewill = make_kwic2("we will", mar_tokens, win=6)

In [99]:
print_kwic(sort_kwic(mar_kwic_wewill, order=['R1']))

                       of our people i know that  we will  achieve victory and quickly return to
                    fault we want to protect and  we will  all of the things that a
                   to making that data public so  we will  all know? dr birx: we’re committed
               something that i hope — hopefully  we will  all have made the right moves
                   earth and in the coming weeks  we will  all have to make changes and
                    than we were even before and  we will  also have apparatus in place that
                    on a program to address that  we will  also be working with small businesses
                   put in a great healthcare and  we will  always — i will say this
                        is yes and the answer is  we will  always maintain a solvent social security
                 listens to the governors and so  we will  assess at the end of the
                will only do it with preexisting  we will  back preexisting conditions ok

In [67]:
mar_kwic_wewill = make_kwic2_as_text("we will", mar_tokens, win=6)

In [68]:
mar_kwic_wewill_sidscore = []
for item in mar_kwic_wewill:
    for tok in item:
        score = sid.lexicon.get(tok, 0)
    # Appending outside the inner loop records one score per context:
    # the valence of the last token visited.
    mar_kwic_wewill_sidscore.append(score)

In [69]:
sum(mar_kwic_wewill_sidscore)

13.700000000000001

The score is net positive, but it is quite low given the size of the corpus. That suggests the promises reflect the solemn mood of a national emergency.

#### April

In [70]:
apr_kwic_wewill = make_kwic2("we will", apr_tokens, win=6)

In [71]:
print_kwic(sort_kwic(apr_kwic_wewill, order=['R6']))

               enduring all of this together and  we will  soon prevail together we’re making a
                    will and i feel certain that  we will  then we have to worry about
                  texas and we’ve told them that  we will  be staffing those hospitals again above
                with and has been persistent and  we will  emphasize to the american people again:
                 end users of these products and  we will  continue to do — make all
               will end this plague and together  we will  restore the full measure of american
                         to see and the only way  we will  see them is if every american
                      on the fact that very soon  we will  have an antibody test that americans
                   if they need to remain closed  we will  allow them to do that and
                     we will need more money and  we will  — we will sit down and
               they need and americans have jobs  we will  continue — at the departmen

In [76]:
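apr_kwic_wewill = make_kwic2_as_text("we will", apr_tokens, win=6)

In [77]:
apr_kwic_wewill_sidscore = []
for item in apr_kwic_wewill:
    for tok in item:
        score = sid.lexicon.get(tok, 0)
        # Appending inside the inner loop records a score for every token,
        # not one per context as in the March cell.
        apr_kwic_wewill_sidscore.append(score)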

In [78]:
sum(apr_kwic_wewill_sidscore)

165.9000000000001

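The summed score is far higher in April (165.9, up from 13.7). Part of that gap comes from how the two cells are written: the April cell records a score for every token, while the March cell records one score per context, so the two sums are not strictly comparable. The concordance lines still point to a change in tone. While **we will** in March seems to introduce statements of fact like "we will issue new guidance" or "we will be talking about this further," the statements in April seem to be more inspirational, like "end this plague together and we will restore every American", "your struggle is our struggle and we will beat this virus", and "we have learned so much we will be stronger than ever."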
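The March and April **we will** cells differ in where `append` sits relative to the inner loop, and that placement changes what the sum measures. The toy illustration below uses a made-up lexicon and made-up contexts (none of the values come from the briefings or from `sid.lexicon`) and assumes each context is a list of token strings.

```python
# Hypothetical lexicon and contexts, for illustration only.
toy_lexicon = {"prevail": 2.0, "together": 1.0, "worry": -1.5}
toy_contexts = [
    ["we", "will", "soon", "prevail", "together"],
    ["we", "will", "then", "worry", "about"],
]

# Append outside the inner loop: one score per context (the last token's).
last_token_scores = []
for item in toy_contexts:
    for tok in item:
        score = toy_lexicon.get(tok, 0)
    last_token_scores.append(score)

# Append inside the inner loop: one score per token.
every_token_scores = []
for item in toy_contexts:
    for tok in item:
        score = toy_lexicon.get(tok, 0)
        every_token_scores.append(score)

print(len(last_token_scores), sum(last_token_scores))
print(len(every_token_scores), sum(every_token_scores))
```

Running both months through the same version of the loop, and dividing by the number of contexts, would make the March and April figures directly comparable.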

Rather than deliver more updates and information, the Trump Administration has chosen to inspire the people and assure them that America will indeed be Great Again. This seems to indicate that the Administration believes the worst of the coronavirus is over.

# Conclusion

These results offer a few main takeaways about how the Trump Administration's speech regarding the coronavirus has changed over time.

### If It Isn't One Thing, It's Another:
As time goes on, old issues are solved and new ones arise, and the White House addresses the greatest need at the time. In March, a national emergency was declared to signal the severity of the problem, and millions were out of work and filing for unemployment. The Trump Administration spoke a lot about the chambers of Congress that month because the stimulus bill was being released and economic relief was the biggest need for many people. In April, states were beginning to reopen and thousands protested shelter-in-place orders, believing they violated their freedom. The Trump Administration discussed reopening states more in April, as this was the biggest need for many people in conservative states.

In a pandemic, there is public concern over what is available. When something is scarce, people fear they might not be able to get it even if they need it. The collocates of the word "available" changed from March to April because different things became scarce. **Testing** was often used with **available** in both March and April because of initial claims that the Trump Administration was unable to deliver on. Some experts argue that ample testing and tracing is the only way to overcome a pandemic. There was a brief shortage of masks in March, which caused the collocate **mask** to be used frequently with **available**. Masks were the greatest need of healthcare workers, who were putting their lives at risk without the necessary protection. The Trump Administration increased production of ventilators and appears to be using their abundance as a measure of success, so **ventilators** was often used with **available** in April not because they were a great need, but because they were a metric of success.



### The Trump Administration Has Done Its Part and Pats Itself on the Back:

What is discussed is not entirely up to the Trump Administration; they are asked questions by reporters and must respond to what is relevant. However, how they speak about a pandemic shows leadership style and is completely within their control. In March, key bigrams were mainly in the first person, and in April key bigrams tended to be in the third person. That means the Trump Administration initially spoke about things it (*we*) was doing to address the problem and later discussed things that others (*he*, *they*) were or were not doing. This indicates a general shift in responsibility and ownership of the problem from the Trump Administration to other people and parties. Along with this change in subject comes a change in tone, to one that is more critical. Key bigrams of April included *can see*, *have enough*, *should have*, *more than*, *it was*, *we see*, *in fact*, and *you see*, which implies the Trump Administration is correcting some initial assumptions about the state of the problem. Additionally, the bigram *should have* was used extensively in April to refer to other people and measures they should have taken.

Lastly, the Trump Administration is not so critical when it comes to its own actions. Using sentiment analysis, we saw that the summed sentiment of the words surrounding *we* was far higher than that of the words surrounding *he*. This suggests that while Trump uses critical language about others and their response to the COVID-19 pandemic, he thinks that his Administration has generally done many positive things and handled the crisis well.

![](trumpthumbs.gif)