<h1>The Political Role of COVID-19</h1>

Group 3: Justin Deutsch, Djustin8; Emily Lawson, emily03; Matthew Pinho, mpinho16

The COVID-19 pandemic has people flooding the internet with updates, facts, and opinions. But people's opinions can differ, change over time, align with their political party, or do none of the above, which creates a lot of confusion. Now is not the time for indecisiveness, debate, and disagreement. The economy, people’s livelihoods, and, most importantly, people’s lives are at stake.

Therefore, Group 3 will evaluate and analyze how the political relevance of COVID-19 has grown over time, and compare and contrast the sentiments and perspectives of major political figures from both major parties on the pandemic.

We want to know: How have sentiments about the pandemic changed over time, both in general and by party? How has the vocabulary changed? How differently do Democrats and Republicans think? And how loyal are their members to their party’s viewpoints?

<h2> Step 1 </h2>

First, we gathered all Tweets from six political representatives (three from each party) from the beginning of March up until April 19th, 2020.

In [1]:
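import pandas

def appendTweet(data, tweet):
    # One row per tweet, in the same order as the columns of `data`
    row = [tweet.user.name, tweet.user.screen_name, tweet.full_text,
           tweet.created_at, tweet.favorite_count, tweet.retweet_count,
           hasattr(tweet, 'retweeted_status'),  # True when the tweet is a retweet
           'media' in tweet.entities]           # True when media is attached
    row_series = pandas.Series(row, index=data.columns)
    # DataFrame.append was removed in pandas 2.0; pandas.concat is the replacement
    data = pandas.concat([data, row_series.to_frame().T], ignore_index=True)
    return data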

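def getTweetsSince(since_id, username, data):
    # Despite its name, `since_id` is a timestamp string: it is compared with
    # str(tweet.created_at), which sorts chronologically in that format
    page = 1
    while True:
        tweets = api.user_timeline(id=username, page=page, tweet_mode='extended')
        if not tweets:
            return data
        for tweet in tweets:
            # The timeline is newest-first, so stop at the first tweet at or before
            # the cutoff instead of discarding the whole final page
            if str(tweet.created_at) <= since_id:
                return data
            data = appendTweet(data, tweet)
        page += 1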
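
The cutoff logic of a paging loop like `getTweetsSince` can be checked without network access. The sketch below is only an illustration: `take_until_cutoff` and `fake_tweet` are hypothetical helpers, the pages are hand-made stand-ins for API responses, and it compares `datetime` objects rather than strings. It assumes each page is ordered newest-first.

```python
from datetime import datetime
from types import SimpleNamespace

def take_until_cutoff(pages, cutoff):
    """Collect tweets strictly newer than `cutoff` from newest-first pages."""
    kept = []
    for page in pages:
        for tweet in page:
            # Everything after this tweet is older still, so stop here
            if tweet.created_at <= cutoff:
                return kept
            kept.append(tweet)
    return kept

def fake_tweet(day):
    # Hypothetical stand-in for a tweet object; only created_at matters here
    return SimpleNamespace(created_at=datetime(2020, 3, day))

pages = [
    [fake_tweet(5), fake_tweet(4)],
    [fake_tweet(3), fake_tweet(1)],  # the cutoff falls inside this page
]
recent = take_until_cutoff(pages, cutoff=datetime(2020, 3, 2))
print([t.created_at.day for t in recent])  # [5, 4, 3]
```

The tweet from March 3rd is kept even though it shares a page with an older one, which is the case a page-level check would miss.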

<h2> Step 2 </h2>

Next, we filtered the Tweets against a list of COVID-19 keywords to keep only the ones relevant to the pandemic.

## Loading Data

In [100]:
# Loading in the data from FILE
DATA_FILE = './six_politicians.csv'

In [101]:
# Reading the data 
df = pd.read_csv(DATA_FILE, index_col=0)

# Converting the data Created to a datetime object
df['Created'] = pd.to_datetime(df['Created'])

df.shape, df.dtypes

((2498, 8),
 Name                      object
 Username                  object
 Text                      object
 Created           datetime64[ns]
 Favorite Count             int64
 Retweet Count              int64
 Retweeted                   bool
 Media Attached              bool
 dtype: object)

In [102]:
# Displaying the list of people and their twitter handles
names_handles = zip(df['Name'].unique().tolist(), df['Username'].unique().tolist())

print(f"{'Name':>20} ---> Twitter Handle")
print('----------------------------------------')
for n, h in names_handles:
    print(f'{n:>20} ---> {h}')

                Name ---> Twitter Handle
----------------------------------------
     Donald J. Trump ---> realDonaldTrump
        Ron DeSantis ---> GovRonDeSantis
         Marco Rubio ---> marcorubio
           Joe Biden ---> JoeBiden
        Andrew Cuomo ---> NYGovCuomo
      Bernie Sanders ---> BernieSanders


In [103]:
df.head()

Unnamed: 0,Name,Username,Text,Created,Favorite Count,Retweet Count,Retweeted,Media Attached
0,Donald J. Trump,realDonaldTrump,RT @WhiteHouse: LIVE: Press Briefing with Coro...,2020-04-19 22:25:08,0,3060,False,False
1,Donald J. Trump,realDonaldTrump,White House News Conference at 5:45. Thank you!,2020-04-19 20:13:28,62328,12827,False,False
2,Donald J. Trump,realDonaldTrump,Thank you to my boy! https://t.co/GAFe1AdZpt,2020-04-19 19:38:09,70865,25015,False,False
3,Donald J. Trump,realDonaldTrump,“On February 19th there was a Democratic Debat...,2020-04-19 19:18:05,63158,19546,False,False
4,Donald J. Trump,realDonaldTrump,Great book by @SenatorTimScott! https://t.co/9...,2020-04-19 19:15:28,20342,5419,False,False


## Preprocessing

### Filtering by Keywords

In [104]:
KEYWORD_FILE = './covid_keywords.txt'

In [193]:
# Loading keywords, one per line, lowercased; blank lines are skipped because
# an empty keyword would match every tweet in the filter
with open(KEYWORD_FILE, 'r') as f:
    keywords = [line.strip().lower() for line in f if line.strip()]

len(keywords), keywords[:5]

(61, ['unemployment', 'front line', 'testing', 'health', 'public health'])

In [204]:
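import re

# Escaping each keyword so it is matched literally, then keeping only the tweets
# that contain at least one keyword (substring match, case-insensitive)
safe_keywords = [re.escape(word) for word in keywords]
clean_df = df[df['Text'].str.lower().str.contains('|'.join(safe_keywords))].copy()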

df.shape, clean_df.shape

((2498, 8), (1174, 8))
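
Because the filter matches substrings, a keyword such as `testing` also matches inside an unrelated word such as "protesting". One possible refinement, shown here only as a sketch and not as the method used above, is to anchor each keyword on word boundaries. `build_keyword_pattern` is a hypothetical helper, the three keywords are taken from the list shown earlier, and the sample tweets are made up for illustration.

```python
import re

def build_keyword_pattern(keywords):
    # Longest keywords first so multi-word phrases are tried before their parts
    escaped = [re.escape(k) for k in sorted(keywords, key=len, reverse=True)]
    # \b keeps a keyword from matching in the middle of a longer word
    return re.compile(r'\b(?:' + '|'.join(escaped) + r')\b', re.IGNORECASE)

sample_keywords = ['testing', 'health', 'front line']
pattern = build_keyword_pattern(sample_keywords)

sample_tweets = [
    'We need more TESTING now.',
    'Thank you to everyone on the front line!',
    'They are protesting the decision.',
]

substring_hits = [any(k in t.lower() for k in sample_keywords) for t in sample_tweets]
boundary_hits = [bool(pattern.search(t)) for t in sample_tweets]

print(substring_hits)  # [True, True, True]
print(boundary_hits)   # [True, True, False]
```

On the real data the same pattern could be applied with `df['Text'].map(lambda t: bool(pattern.search(t)))`. Whether the stricter match is preferable depends on the keyword list, since it would also stop `test` from matching "tests".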

# Analysis

## Summary Statistics

### Count of Covid Tweets

In [73]:
# counts tweets per user
def count_tweets_per_user(df):
    counts = []
    
    for user in df['Username'].unique():
        counts.append((user, df[df['Username'] == user].shape[0]))
    
    return counts

In [198]:
# counting tweets for each user before/after filtering by keywords
num_tweets_before_filtering = dict(count_tweets_per_user(df))
num_tweets_after_filtering = dict(count_tweets_per_user(clean_df))

# one (username, total tweets, covid tweets) tuple per user, sorted by username;
# a user with no tweets left after filtering gets a count of 0
counts = [(user, total, num_tweets_after_filtering.get(user, 0))
          for user, total in sorted(num_tweets_before_filtering.items())]

counts

[('BernieSanders', 558, 250),
 ('GovRonDeSantis', 40, 16),
 ('JoeBiden', 500, 253),
 ('NYGovCuomo', 620, 407),
 ('marcorubio', 480, 161),
 ('realDonaldTrump', 300, 87)]

In [199]:
# Percentage of each user's tweets that are covid related
for username, total, covid in counts:
    print(f'{username}: {covid / total:.2%}')

BernieSanders: 44.80%
GovRonDeSantis: 40.00%
JoeBiden: 50.60%
NYGovCuomo: 65.65%
marcorubio: 33.54%
realDonaldTrump: 29.00%
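
Since the project compares the two parties, the per-user counts above can also be pooled by party. The sketch below reuses the counts printed earlier. The `party_of` mapping is an assumption added for illustration: it follows the "three from each party" grouping and therefore places Bernie Sanders with the Democrats.

```python
# (username, total tweets, covid tweets), copied from the output above
counts = [
    ('BernieSanders', 558, 250),
    ('GovRonDeSantis', 40, 16),
    ('JoeBiden', 500, 253),
    ('NYGovCuomo', 620, 407),
    ('marcorubio', 480, 161),
    ('realDonaldTrump', 300, 87),
]

# Assumed party grouping (hypothetical mapping, not part of the dataset)
party_of = {
    'BernieSanders': 'Democrat',
    'JoeBiden': 'Democrat',
    'NYGovCuomo': 'Democrat',
    'GovRonDeSantis': 'Republican',
    'marcorubio': 'Republican',
    'realDonaldTrump': 'Republican',
}

# Summing total and covid-related tweets per party
totals = {}
for username, total, covid in counts:
    party = party_of[username]
    all_tweets, covid_tweets = totals.get(party, (0, 0))
    totals[party] = (all_tweets + total, covid_tweets + covid)

for party, (all_tweets, covid_tweets) in sorted(totals.items()):
    print(f'{party}: {covid_tweets} of {all_tweets} tweets ({covid_tweets / all_tweets:.2%})')
```

Pooling weights each politician by how much they tweet, so a prolific account dominates its party's figure. Averaging the six per-user percentages instead would weight each politician equally.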


### Word and Character Counts

In [208]:
clean_df['num_chars'] = clean_df['Text'].map(len)
# Splitting on whitespace undercounts tweets that omit spaces (e.g. "not,very,well,written");
# a regular expression or a tokenizer library would be more robust
clean_df['word_count'] = clean_df['Text'].map(lambda x: len(x.split()))
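
The regular-expression approach mentioned in the comment could look like the sketch below. `count_words` and `WORD_RE` are hypothetical names, and the pattern is one reasonable choice among many: it treats a run of word characters, optionally joined by apostrophes, as one word.

```python
import re

# A word is a run of word characters, with internal apostrophes kept ("don't")
WORD_RE = re.compile(r"\w+(?:['’]\w+)*")

def count_words(text):
    return len(WORD_RE.findall(text))

badly_spaced = 'not,very,well,written,tweets'
print(len(badly_spaced.split()))                     # 1
print(count_words(badly_spaced))                     # 5
print(count_words("Don't panic: wash your hands!"))  # 5
```

One caveat: this pattern splits URLs and @mentions into several tokens, so links would need to be stripped first if they should not count as words.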

In [209]:
clean_df[['num_chars', 'word_count']].describe()

Unnamed: 0,num_chars,word_count
count,1174.0,1174.0
mean,209.552811,33.672061
std,68.383854,11.937988
min,29.0,3.0
25%,144.0,24.0
50%,227.0,36.0
75%,273.0,43.75
max,333.0,57.0

