# Plotting Trump's Twitter behavior

Did Trump's approval rating have a relationship with his Twitter behavior? And did Trump's most erratic online behavior correspond to the dates of major events of his presidency?

The following article by FiveThirtyEight (https://fivethirtyeight.com/features/never-tweet-mr-president/) showed the possibility of a relationship between Trump's approval rating and his anger on Twitter early in his presidency. In this analysis, I attempt to create a similar metric to measure Trump's Twitter anger and determine whether it correlates with his approval rating. Finally, I plot Trump's Twitter anger by day and week of his presidency and highlight key moments of his term to visualize any potential patterns.

# Data Used

1. New York Times complete list of Trump's insults on Twitter: [https://www.nytimes.com/interactive/2021/01/19/upshot/trump-complete-insult-list.html](https://www.nytimes.com/interactive/2021/01/19/upshot/trump-complete-insult-list.html)
2. Trump Twitter Archive full list of Trump's tweets during his presidency: [https://www.thetrumparchive.com/](https://www.thetrumparchive.com/)
3. Trump's approval rating as calculated by FiveThirtyEight's polling aggregation: [https://projects.fivethirtyeight.com/trump-approval-ratings/](https://projects.fivethirtyeight.com/trump-approval-ratings/)
4. A hand-generated list of key moments of the Trump presidency. This is, of course, subjective: it may be missing events, or include events that some would not consider 'key' moments. Take it with a grain of salt.

# Data cleaning

Importing data

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
insults = pd.read_csv("../input/all-trumps-twitter-insults-20152021/trump_insult_tweets_2014_to_2021.csv")
tweets = pd.read_csv("../input/trump-tweets-and-timeline/trumptweets.csv")

Counting number of total tweets and insults per day

In [None]:
insults_count = pd.to_datetime(insults.date).dt.date.value_counts().to_frame().reset_index()
insults_count.columns = ['date', 'insult_count']
tweets_count = pd.to_datetime(tweets.date).dt.date.value_counts().to_frame().reset_index()
tweets_count.columns = ['date', 'tweet_count']
tweets_insults = tweets_count.merge(insults_count, on='date', how='left')
tweets_insults
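
To make the counting step concrete, here is a minimal sketch on made-up timestamps (the toy frames, their values, and the helper `per_day_counts` are assumptions for illustration, not part of the real data or notebook). It shows that the left merge leaves `NaN` in `insult_count` for days that have tweets but no insults, which is why later cells call `fillna(0)`.

```python
import pandas as pd

# Toy stand-ins for the real CSVs (values are invented for illustration)
toy_tweets = pd.DataFrame(
    {"date": ["2019-01-01 08:00", "2019-01-01 21:30", "2019-01-02 07:15"]}
)
toy_insults = pd.DataFrame({"date": ["2019-01-01 08:00"]})


def per_day_counts(frame, name):
    """Count rows per calendar day; returns columns ['date', name]."""
    days = pd.to_datetime(frame["date"]).dt.date
    return days.groupby(days).size().rename(name).rename_axis("date").reset_index()


# Left merge keeps every day that has tweets, even if it has no insults
merged = per_day_counts(toy_tweets, "tweet_count").merge(
    per_day_counts(toy_insults, "insult_count"), on="date", how="left"
)
print(merged)
```

The second day has one tweet and no matching insult row, so its `insult_count` is missing until it is filled with 0.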

Cleaning approval data

In [None]:
approval = pd.read_csv("../input/approval/approval_topline.csv")
# Copy the filtered rows so the new column is not assigned to a slice of the original frame
approval = approval[approval['subgroup'] == 'All polls'].copy()
approval['date'] = pd.to_datetime(approval.modeldate).dt.date
approval

Merging tweet counts, insult counts, and approval rating

In [None]:
full = approval.merge(tweets_insults, how = "left", on = "date")
full.insult_count = full.insult_count.fillna(0)
full.tweet_count = full.tweet_count.fillna(0)
full = full.sort_values(by = "date").reset_index()
full['insult_ratio'] = full.insult_count / full.tweet_count
full.insult_ratio = full.insult_ratio.fillna(0)
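
The `fillna(0)` after the division matters because days with no tweets produce `0 / 0`, which pandas evaluates to `NaN` instead of raising an error. A minimal sketch on invented numbers (the toy frame is an assumption for illustration):

```python
import pandas as pd

# Second row mimics a day with no tweets and therefore no insults
toy = pd.DataFrame({"insult_count": [2.0, 0.0], "tweet_count": [4.0, 0.0]})

raw_ratio = toy.insult_count / toy.tweet_count  # second value is NaN (0 / 0)
insult_ratio = raw_ratio.fillna(0)              # treat "no tweets" as a ratio of 0
print(insult_ratio.tolist())
```

Treating a silent day as a ratio of 0 is a modelling choice; dropping those days before computing correlations would be an equally defensible alternative.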

In [None]:
full.head(10)

# Daily

Plotting insults and total tweets by day

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot(full.tweet_count)
plt.plot(full.insult_count)

Correlation between tweet count per day, insult count per day, and ratio of insults to total tweets per day and approval rating

In [None]:
print("Approval vs. Daily tweet Count:", np.corrcoef(full.approve_estimate, full.tweet_count)[0][1])
print("Approval vs. Daily insult Count:", np.corrcoef(full.approve_estimate, full.insult_count)[0][1])
print("Approval vs. Ratio of insults to tweets:", np.corrcoef(full.approve_estimate, full.insult_ratio)[0][1])
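
For readers unfamiliar with `np.corrcoef`: it returns a 2x2 correlation matrix, and the off-diagonal entry is the Pearson correlation between the two inputs. The sketch below uses invented numbers (not the real approval or tweet data) in which one series falls exactly as the other rises, so the coefficient is -1.

```python
import numpy as np

# Invented values: approval rises by 1 while tweet count falls by 2 each day
approval_toy = np.array([40.0, 41.0, 42.0, 43.0])
tweets_toy = np.array([12.0, 10.0, 8.0, 6.0])

matrix = np.corrcoef(approval_toy, tweets_toy)  # 2x2 matrix, ones on the diagonal
r = matrix[0, 1]                                # Pearson r between the two series
print(round(r, 3))
```

Values near 0 indicate little linear relationship, and correlation on its own says nothing about which variable, if either, drives the other.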

# Weekly

Aggregating data by week

In [None]:
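# Series.dt.week was removed in pandas 2.0; isocalendar().week gives the same ISO week number
full['week'] = pd.to_datetime(full.date).dt.isocalendar().week.astype(int)
full['year'] = pd.to_datetime(full.date).dt.year
full['month'] = pd.to_datetime(full.date).dt.month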

In [None]:
insults_by_week = full.groupby(["week", "year"]).agg({'insult_count':'sum', 'tweet_count': 'sum',
                         'approve_estimate':'mean', 'date': "min"})
insults_by_week["insult_ratio"] = insults_by_week.insult_count / insults_by_week.tweet_count
insults_by_week.insult_ratio = insults_by_week.insult_ratio.fillna(0)
insults_by_week = insults_by_week.sort_values(by=['year', 'week'])
insults_by_week
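
One caveat with grouping on week number plus calendar year: the week numbers follow ISO 8601, so days around New Year can carry a week number that belongs to the neighbouring ISO year. The sketch below, using only the standard library, shows two dates a year apart that share the same (week, calendar year) key. The helper `iso_week_key` is a hypothetical alternative that pairs the ISO week with the ISO year; it is not what the notebook does.

```python
from datetime import date


def iso_week_key(d):
    """Return (ISO year, ISO week), which keeps year-boundary weeks together."""
    iso_year, iso_week, _ = d.isocalendar()
    return iso_year, iso_week


first_monday_2018 = date(2018, 1, 1)   # ISO week 1 of 2018
new_years_eve_2018 = date(2018, 12, 31)  # ISO week 1 of 2019

# Keys as built from ISO week + calendar year: both dates collide
calendar_keys = {(d.isocalendar()[1], d.year) for d in (first_monday_2018, new_years_eve_2018)}

# Keys built from ISO week + ISO year: the dates stay separate
iso_keys = {iso_week_key(d) for d in (first_monday_2018, new_years_eve_2018)}
print(calendar_keys, iso_keys)
```

Only a handful of days per year are affected, so the weekly plots should change little either way, but the affected weeks will mix tweets from January and late December.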

In [None]:
print("Approval vs. Weekly tweet Count:", np.corrcoef(insults_by_week.approve_estimate, insults_by_week.tweet_count)[0][1])
print("Approval vs. Weekly insult Count:", np.corrcoef(insults_by_week.approve_estimate, insults_by_week.insult_count)[0][1])
print("Approval vs. Ratio of insults to tweets:", np.corrcoef(insults_by_week.approve_estimate, insults_by_week.insult_ratio)[0][1])

# Measuring Twitter Mood

The metric used to calculate Trump's "Twitter Mood Index" is an average of:
1. Ratio of insults to total tweets
2. Ratio of "negative" tweets (by TextBlob polarity score) to total tweets
3. Ratio of words tagged as Angry, Sad, Surprised, or Fearful to total emotional words (as determined by the text2emotion package's tagger)
4. Number of aggression points per tweet, with aggression points calculated as .25 points per word in all capital letters or exclamation point. For example, the following Trump tweet would get .5 aggression points (one all-caps word and one exclamation point):
"The Caravan is largely broken up thanks to the strong immigration laws of Mexico and their willingness to use them so as not to cause a giant scene at our Border. Because of the Trump Administrations actions, Border crossings are at a still UNACCEPTABLE 46 year low. Stop drugs!"

Don't take this metric *too* seriously. While it is intended to be a measure of Trump's mood as represented in his social media posting, it has plenty of flaws. But hopefully it can be a representation of the magnitude of angry and aggressive online behavior on a daily or weekly basis by the president.
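
As a quick check of the aggression-point rule, the sketch below scores the example tweet quoted above. The function name `aggression_points` is hypothetical; the counting logic (strip punctuation, count all-caps words, count exclamation points, multiply by .25) follows the rule as described.

```python
import string


def aggression_points(text):
    """0.25 points per all-caps word plus 0.25 per exclamation point."""
    stripped = "".join(ch for ch in text if ch not in string.punctuation)
    all_caps_words = sum(1 for word in stripped.split() if word.isupper())
    exclamations = text.count("!")
    return 0.25 * (all_caps_words + exclamations)


example = (
    "The Caravan is largely broken up thanks to the strong immigration laws of "
    "Mexico and their willingness to use them so as not to cause a giant scene "
    "at our Border. Because of the Trump Administrations actions, Border "
    "crossings are at a still UNACCEPTABLE 46 year low. Stop drugs!"
)
print(aggression_points(example))
```

Note that single-letter capitals such as "I" or "A" also pass the all-caps test under this rule, which is one of the metric's known rough edges.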

Importing packages

In [None]:
from textblob import TextBlob

Calculating polarity score

In [None]:
polarity = []
positive_tweets = []
negative_tweets = []
neutral_tweets = []
for tweet in tweets.text:
    # TextBlob polarity ranges from -1 (negative) to 1 (positive)
    tweet_polarity = TextBlob(tweet).sentiment.polarity
    polarity.append(tweet_polarity)
    # One-hot flags so the columns can be summed per day later
    positive_tweets.append(int(tweet_polarity > 0))
    negative_tweets.append(int(tweet_polarity < 0))
    neutral_tweets.append(int(tweet_polarity == 0))
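
The classification above only depends on the sign of the polarity score. A minimal sketch of that rule, using invented scores in place of real TextBlob output (the helper `polarity_label` and the sample values are assumptions for illustration):

```python
from collections import Counter


def polarity_label(score):
    """Map a polarity score in [-1, 1] to a sentiment bucket by its sign."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented polarity scores standing in for one day's tweets
toy_scores = [0.8, -0.4, 0.0, -0.1]
counts = Counter(polarity_label(s) for s in toy_scores)

# Share of tweets classified as negative, as used in the Mood Index
negative_ratio = counts["negative"] / len(toy_scores)
print(counts, negative_ratio)
```

Because any score below zero counts as negative, a mildly critical tweet and a furious one contribute equally to this component.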


Calculating ratio of emotional words

In [None]:
!pip install text2emotion
import text2emotion as te

In [None]:
happy = []
angry = []
surprise = []
sad = []
fear = []
for tweet in tweets.text:
    emotions = te.get_emotion(tweet)
    happy.append(emotions['Happy'])
    angry.append(emotions['Angry'])
    surprise.append(emotions['Surprise'])
    sad.append(emotions['Sad'])
    fear.append(emotions['Fear'])

In [None]:
tweets['polarity'] = polarity
tweets['positive_tweets'] = positive_tweets
tweets['negative_tweets'] = negative_tweets
tweets['neutral_tweets'] = neutral_tweets
tweets['happy'] = happy
tweets['angry'] = angry
tweets['surprise'] = surprise
tweets['sad'] = sad
tweets['fear'] = fear
tweets

Calculating Aggression Points

In [None]:
import string

def count_uppercase(a_str):
    """Count whitespace-separated words written entirely in capital letters."""
    no_punct = "".join(ch for ch in a_str if ch not in string.punctuation)
    return sum(1 for word in no_punct.split() if word.isupper())

def count_exclamation(a_str):
    """Count exclamation points in the text."""
    return a_str.count("!")

uppers = []
exclamations = []
for tweet in tweets.text:
    uppers.append(count_uppercase(tweet))
    exclamations.append(count_exclamation(tweet))
    
tweets['uppers'] = uppers
tweets['exclamations'] = exclamations

Cleaning and merging data

In [None]:
tweets.date = pd.to_datetime(tweets.date).dt.date
tweets['tweet_count'] = tweets.date
tweets['count'] = tweets.positive_tweets + tweets.negative_tweets + tweets.neutral_tweets
tweetscounts = tweets[["date", "polarity", "positive_tweets", "negative_tweets", "neutral_tweets", "happy", "angry", "surprise", "sad", "fear", "count", 'uppers', 'exclamations']]
tweetscounts = tweetscounts.groupby('date').sum().reset_index()
tweetscounts.head(10)

Cleaning fully merged data

In [None]:
tweetsE_insults = tweetscounts.merge(insults_count, on='date', how='left')

In [None]:
full = approval.merge(tweetsE_insults, how = "left", on = "date")

In [None]:
full.insult_count = full.insult_count.fillna(0)
full['count'] = full['count'].fillna(0)
full = full.sort_values(by = "date").reset_index()
full['insult_ratio'] = full.insult_count / full['count']
full.insult_ratio = full.insult_ratio.fillna(0)
full.positive_tweets = full.positive_tweets.fillna(0)
full.negative_tweets = full.negative_tweets.fillna(0)
full.neutral_tweets = full.neutral_tweets.fillna(0)
full.polarity = full.polarity.fillna(0)
full.happy = full.happy.fillna(0)
full.angry = full.angry.fillna(0)
full.sad = full.sad.fillna(0)
full.fear = full.fear.fillna(0)
full.surprise = full.surprise.fillna(0)
full.uppers = full.uppers.fillna(0)
full.exclamations = full.exclamations.fillna(0)
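# Series.dt.week was removed in pandas 2.0; isocalendar().week gives the same ISO week number
full['week'] = pd.to_datetime(full.date).dt.isocalendar().week.astype(int)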
full['year'] = pd.to_datetime(full.date).dt.year
full['month'] = pd.to_datetime(full.date).dt.month
full

Calculating Mood Index

In [None]:
full['MoodIndex'] = 100 *((full.insult_count + full.negative_tweets + full.angry + full.sad + full.surprise + full.fear + (.25*(full.exclamations + full.uppers))))/(4* full['count'])
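
To unpack the formula: the four components (insults, negative tweets, summed negative-emotion scores, and aggression points) are each divided by the day's tweet count, averaged by dividing by 4, and scaled by 100. The sketch below restates that arithmetic as a standalone function with invented inputs; the function name `mood_index` and the sample numbers are assumptions for illustration.

```python
import math


def mood_index(insults, negative, angry, sad, surprise, fear,
               exclamations, uppers, tweet_count):
    """Average of the four per-tweet components, scaled to 0-100-ish units."""
    if tweet_count == 0:
        # Mirrors the NaN that pandas yields for 0 / 0 on days without tweets
        return float("nan")
    aggression = 0.25 * (exclamations + uppers)
    total = insults + negative + angry + sad + surprise + fear + aggression
    return 100 * total / (4 * tweet_count)


# Invented day: 10 tweets, 2 insults, 3 negative tweets, emotion scores summing to 3.0,
# plus 4 exclamation points and 4 all-caps words (2.0 aggression points)
toy_day = mood_index(2, 3, 1.0, 0.5, 0.5, 1.0, 4, 4, 10)
print(toy_day)
```

The index has no fixed upper bound, since aggression points per tweet can exceed 1 on days heavy with capitals and exclamation points.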

MoodIndex and Approval rating by day of presidency

In [None]:
# DataFrame.plot creates its own figure, so no separate plt.figure call is needed
full.plot(kind='line', x='date', y='MoodIndex', color='red')
full.plot(kind='line', x='date', y='approve_estimate', color='blue')

Importing, cleaning, and merging timeline of key events of Trump presidency with rest of data

In [None]:
timeline = pd.read_csv("../input/trump-tweets-and-timeline/Trump_timeline.csv")
timeline.date = pd.to_datetime(timeline.date).dt.date

In [None]:
full = full.merge(timeline, how = 'left', on = 'date')

# Plot of Twitter Mood Index by day

Mouse over the orange points to see which key event of the Trump presidency happened on that day

In [None]:
!pip install mplcursors

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import mplcursors
# The interactive notebook backend is needed for hover annotations
%matplotlib notebook
plt.rcParams['figure.figsize'] = [10, 5]
x = full.date
y = full.MoodIndex
markers_on = pd.notnull(full.event)
# Reset the index so labels line up with the positions of the plotted event points
labels = full.event[markers_on].reset_index(drop=True)

fig, ax = plt.subplots()
ax.plot(x, y)
points, = ax.plot(x[markers_on], y[markers_on], linestyle="", marker='o')
# Attach the cursor to the event points only, so hovering over the line does not trigger a lookup
cursor = mplcursors.cursor(points, hover=True)
cursor.connect(
    "add", lambda sel: sel.annotation.set_text(labels[int(sel.index)]))

# Aggregating by week

In [None]:
insults_by_week = full.groupby(["week", "year"]).agg({'insult_count':'sum', 'count': 'sum',
                         'approve_estimate':'mean', 'date': "min", "happy": "sum", "angry": "sum", "sad": "sum", "surprise": "sum", "fear": "sum", "exclamations": "sum", "uppers": "sum", "positive_tweets": "sum", "negative_tweets": "sum", "neutral_tweets": "sum", "polarity": "sum"})
insults_by_week["insult_ratio"] = insults_by_week.insult_count / insults_by_week['count']
insults_by_week.insult_ratio = insults_by_week.insult_ratio.fillna(0)
insults_by_week

Getting weekly Mood Index

In [None]:
insults_by_week['MoodIndex'] = 100 *((insults_by_week.insult_count + insults_by_week.negative_tweets + insults_by_week.angry + insults_by_week.sad + insults_by_week.surprise + insults_by_week.fear + (.25*(insults_by_week.exclamations + insults_by_week.uppers))))/(4* insults_by_week['count'])

In [None]:
insults_by_week = insults_by_week.sort_values(by=['year', 'week'])

Adding timeline events to weekly data

In [None]:
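# Series.dt.week was removed in pandas 2.0; isocalendar().week gives the same ISO week number
timeline['week'] = pd.to_datetime(timeline.date).dt.isocalendar().week.astype(int)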
timeline['year'] = pd.to_datetime(timeline.date).dt.year
timeline = timeline.sort_values(by=['year', 'week'])
insults_by_week = insults_by_week.merge(timeline, on= ["week", "year"], how = 'left')
insults_by_week

# Plot of Twitter Mood Index by week

Mouse over the orange points to see which key event of the Trump presidency happened in that week

In [None]:
# The interactive notebook backend is needed for hover annotations
%matplotlib notebook
plt.rcParams['figure.figsize'] = [10, 5]
x = insults_by_week.date_x
y = insults_by_week.MoodIndex
markers_on = pd.notnull(insults_by_week.event)
# Reset the index so labels line up with the positions of the plotted event points
labels = insults_by_week.event[markers_on].reset_index(drop=True)

fig, ax = plt.subplots()
ax.plot(x, y)
points, = ax.plot(x[markers_on], y[markers_on], linestyle="", marker='o')
# Attach the cursor to the event points only, so hovering over the line does not trigger a lookup
cursor = mplcursors.cursor(points, hover=True)
cursor.connect(
    "add", lambda sel: sel.annotation.set_text(labels[int(sel.index)]))

# What can we learn from this?

Definitively? Not much. And I'm sure the metric I came up with can be vastly improved. But it does seem that there may be a pattern between Trump's seemingly erratic decisions and his mood on Twitter. Many of the key events came right before a spike in his Twitter Mood Index or happened at high points. But this may all be coincidence. We do know that there was a weak positive correlation between his approval rating and his number of tweets per day. Still, it's interesting to think that Trump's erratic decisions could perhaps have been foreseen by looking at how he had been tweeting that week.