## US Election 2020 Twitter Sentiment Analysis 

There are plenty of existing population polls trying to predict the outcome of a Presidential election. It would be of interest to perform ML sentiment analysis on a sample of past Twitter feeds (before it became X) to see how accurate it would be as a predictor. Of course, this raises the question of whether the sample of Twitter users is representative of the electorate. On the other hand, the same can be asked of traditional polls and their samples.

#### Dataset 
I plan to use the [US Election 2020 Sample Dataset](https://www.kaggle.com/datasets/manchunhui/us-election-2020-tweets/data) from Kaggle for this analysis. It consists of two CSV files totaling about 1.7 million rows, one with Tweets focused on Joe Biden, and the other on Donald Trump. The dataset needs to be culled of Tweets originating outside of the US. That doesn’t rule out Tweets from non-US citizens inside the US, and it drops Tweets from citizens outside of the US, but it helps filter out international opinions that would skew the results. The rest is data cleaning: for example, removing hashtags, mentions, URLs, and any extraneous text that is not generally readable and may not convert to useful tokens. Some existing tools can help with data cleaning as well.
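
The cleaning step described above could look something like the following. This is a minimal sketch rather than the notebook's final cleaning code: the helper name `clean_tweet`, its `keep_hashtag_text` flag, and the regex patterns are illustrative assumptions.

```python
import re

# Illustrative patterns (assumptions, not taken from the dataset documentation)
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text, keep_hashtag_text=True):
    """Remove URLs and @mentions from a tweet (hypothetical helper).

    Hashtags are either reduced to their word ('#Trump' -> 'Trump')
    or dropped entirely, depending on keep_hashtag_text.
    """
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1" if keep_hashtag_text else " ", text)
    # Collapse the gaps left behind by the removals
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_tweet("@cnnbrk #Trump owes money https://t.co/abc"))
```

Whether to keep the hashtag word is a judgment call: many tweets in this dataset carry the candidate's name only as a hashtag, so dropping hashtags entirely may remove the subject of the sentence.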

#### Tools 
Several publicly available language models can now assist with sentiment analysis, specifically on Twitter/X data. I'd like to explore the use of the [Twitter-roBERTa-base sentiment analysis model](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) hosted on Hugging Face.
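
As a rough sketch of how this model might be applied, the snippet below wraps a `transformers` text-classification pipeline. The helper names `build_classifier` and `label_tweets`, the batch size, and the use of `truncation=True` are assumptions for illustration; loading the model downloads it on first use, so the import is kept inside the builder.

```python
def build_classifier(model_name="cardiffnlp/twitter-roberta-base-sentiment-latest"):
    """Load a Hugging Face sentiment pipeline (hypothetical helper)."""
    from transformers import pipeline  # imported lazily: heavy dependency
    return pipeline("sentiment-analysis", model=model_name, tokenizer=model_name)


def label_tweets(texts, classifier, batch_size=32):
    """Classify texts and return a list of (label, score) tuples.

    `classifier` is assumed to behave like a transformers pipeline:
    called with a list of strings, it returns one dict per string
    with 'label' and 'score' keys.
    """
    results = classifier(list(texts), batch_size=batch_size, truncation=True)
    return [(r["label"], r["score"]) for r in results]
```

With the real model, the call would be `label_tweets(some_texts, build_classifier())`. The label names come from the model's own configuration, so it is worth printing a few results before relying on specific label strings.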

In [10]:
# Import Libraries 
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import plotly.express as px 
import tqdm     # Shows progress bar for long tasks

# Hugging Face Transformer Import
from transformers import pipeline

---
#### Reading in datasets

In [None]:
# Reading Trump Dataset 
trump = pd.read_csv("input/hashtag_donaldtrump.csv", lineterminator='\n')

In [None]:
# Reading Biden Dataset 
biden = pd.read_csv("input/hashtag_joebiden.csv", lineterminator='\n') 

---
#### Exploratory Data Analysis

In [6]:
trump.head()

Unnamed: 0,created_at,tweet_id,tweet,likes,retweet_count,source,user_id,user_name,user_screen_name,user_description,...,user_followers_count,user_location,lat,long,city,country,continent,state,state_code,collected_at
0,2020-10-15 00:00:01,1.316529e+18,#Elecciones2020 | En #Florida: #JoeBiden dice ...,0.0,0.0,TweetDeck,360666500.0,El Sol Latino News,elsollatinonews,🌐 Noticias de interés para latinos de la costa...,...,1860.0,"Philadelphia, PA / Miami, FL",25.77427,-80.19366,,United States of America,North America,Florida,FL,2020-10-21 00:00:00
1,2020-10-15 00:00:01,1.316529e+18,"Usa 2020, Trump contro Facebook e Twitter: cop...",26.0,9.0,Social Mediaset,331617600.0,Tgcom24,MediasetTgcom24,Profilo ufficiale di Tgcom24: tutte le notizie...,...,1067661.0,,,,,,,,,2020-10-21 00:00:00.373216530
2,2020-10-15 00:00:02,1.316529e+18,"#Trump: As a student I used to hear for years,...",2.0,1.0,Twitter Web App,8436472.0,snarke,snarke,"Will mock for food! Freelance writer, blogger,...",...,1185.0,Portland,45.520247,-122.674195,Portland,United States of America,North America,Oregon,OR,2020-10-21 00:00:00.746433060
3,2020-10-15 00:00:02,1.316529e+18,2 hours since last tweet from #Trump! Maybe he...,0.0,0.0,Trumpytweeter,8.283556e+17,Trumpytweeter,trumpytweeter,"If he doesn't tweet for some time, should we b...",...,32.0,,,,,,,,,2020-10-21 00:00:01.119649591
4,2020-10-15 00:00:08,1.316529e+18,You get a tie! And you get a tie! #Trump ‘s ra...,4.0,3.0,Twitter for iPhone,47413800.0,Rana Abtar - رنا أبتر,Ranaabtar,"Washington Correspondent, Lebanese-American ,c...",...,5393.0,Washington DC,38.894992,-77.036558,Washington,United States of America,North America,District of Columbia,DC,2020-10-21 00:00:01.492866121


In [133]:
# Show size of the Trump dataset
print(trump.shape)

# Display all the columns in the DataFrame 
print(trump.columns)

(970919, 22)
Index(['created_at', 'tweet_id', 'tweet', 'likes', 'retweet_count', 'source',
       'user_id', 'user_name', 'user_screen_name', 'user_description',
       'user_join_date', 'user_followers_count', 'user_location', 'lat',
       'long', 'city', 'country', 'continent', 'state', 'state_code',
       'collected_at', 'candidate'],
      dtype='object')


In [None]:
# Show distribution of non-null features
trump.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 970919 entries, 0 to 970918
Data columns (total 21 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   created_at            970919 non-null  object 
 1   tweet_id              970919 non-null  float64
 2   tweet                 970919 non-null  object 
 3   likes                 970919 non-null  float64
 4   retweet_count         970919 non-null  float64
 5   source                970043 non-null  object 
 6   user_id               970919 non-null  float64
 7   user_name             970897 non-null  object 
 8   user_screen_name      970919 non-null  object 
 9   user_description      869651 non-null  object 
 10  user_join_date        970919 non-null  object 
 11  user_followers_count  970919 non-null  float64
 12  user_location         675957 non-null  object 
 13  lat                   445719 non-null  float64
 14  long                  445719 non-null  float64
 15  

In [12]:
biden.head()

Unnamed: 0,created_at,tweet_id,tweet,likes,retweet_count,source,user_id,user_name,user_screen_name,user_description,...,user_followers_count,user_location,lat,long,city,country,continent,state,state_code,collected_at
0,2020-10-15 00:00:01,1.316529e+18,#Elecciones2020 | En #Florida: #JoeBiden dice ...,0.0,0.0,TweetDeck,360666500.0,El Sol Latino News,elsollatinonews,🌐 Noticias de interés para latinos de la costa...,...,1860.0,"Philadelphia, PA / Miami, FL",25.77427,-80.19366,,United States of America,North America,Florida,FL,2020-10-21 00:00:00
1,2020-10-15 00:00:18,1.316529e+18,#HunterBiden #HunterBidenEmails #JoeBiden #Joe...,0.0,0.0,Twitter for iPad,809904400.0,Cheri A. 🇺🇸,Biloximeemaw,"Locked and loaded Meemaw. Love God, my family ...",...,6628.0,,,,,,,,,2020-10-21 00:00:00.517827283
2,2020-10-15 00:00:20,1.316529e+18,@IslandGirlPRV @BradBeauregardJ @MeidasTouch T...,0.0,0.0,Twitter Web App,3494182000.0,Flag Waver,Flag_Wavers,,...,1536.0,Golden Valley Arizona,46.304036,-109.171431,,United States of America,North America,Montana,MT,2020-10-21 00:00:01.035654566
3,2020-10-15 00:00:21,1.316529e+18,@chrislongview Watching and setting dvr. Let’s...,0.0,0.0,Twitter for iPhone,8.242596e+17,Michelle Ferg,MichelleFerg4,,...,27.0,,,,,,,,,2020-10-21 00:00:01.553481849
4,2020-10-15 00:00:22,1.316529e+18,#censorship #HunterBiden #Biden #BidenEmails #...,1.0,0.0,Twitter Web App,1.032807e+18,the Gold State,theegoldstate,A Silicon Valley #independent #News #Media #St...,...,390.0,"California, USA",36.701463,-118.755997,,United States of America,North America,California,CA,2020-10-21 00:00:02.071309132


In [49]:
# Show size of the Biden dataset
print(biden.shape)

# Display all the columns in the DataFrame 
print(biden.columns)

(776886, 22)
Index(['created_at', 'tweet_id', 'tweet', 'likes', 'retweet_count', 'source',
       'user_id', 'user_name', 'user_screen_name', 'user_description',
       'user_join_date', 'user_followers_count', 'user_location', 'lat',
       'long', 'city', 'country', 'continent', 'state', 'state_code',
       'collected_at', 'candidate'],
      dtype='object')


Add a column to differentiate tweets from Trump and Biden datasets before we combine them.

In [38]:
# Create a new column 'candidate' to differentiate 
# between tweets of Trump and Biden upon concatenation 
trump['candidate'] = 'trump'

# Biden dataframe 
biden['candidate'] = 'biden'

# Combine the dataframes 
tweets_df = pd.concat([trump, biden]) 
tweets_df = tweets_df.reset_index(drop=True)

# Final data shape 
print('Final Data Shape :', tweets_df.shape) 

# View the first few rows 
print("\nFirst few rows:") 
print(tweets_df.head(3)) 

Final Data Shape : (1747805, 22)

First few rows:
            created_at      tweet_id  \
0  2020-10-15 00:00:01  1.316529e+18   
1  2020-10-15 00:00:01  1.316529e+18   
2  2020-10-15 00:00:02  1.316529e+18   

                                               tweet  likes  retweet_count  \
0  #Elecciones2020 | En #Florida: #JoeBiden dice ...    0.0            0.0   
1  Usa 2020, Trump contro Facebook e Twitter: cop...   26.0            9.0   
2  #Trump: As a student I used to hear for years,...    2.0            1.0   

             source      user_id           user_name user_screen_name  \
0         TweetDeck  360666534.0  El Sol Latino News  elsollatinonews   
1  Social Mediaset   331617619.0             Tgcom24  MediasetTgcom24   
2   Twitter Web App    8436472.0              snarke           snarke   

                                    user_description  ...  \
0  🌐 Noticias de interés para latinos de la costa...  ...   
1  Profilo ufficiale di Tgcom24: tutte le notizie...  ...   


In [50]:
# Show info on combined tweet dataset
tweets_df.info(show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1747805 entries, 0 to 1747804
Data columns (total 22 columns):
 #   Column                Non-Null Count    Dtype  
---  ------                --------------    -----  
 0   created_at            1747805 non-null  object 
 1   tweet_id              1747805 non-null  float64
 2   tweet                 1747805 non-null  object 
 3   likes                 1747805 non-null  float64
 4   retweet_count         1747805 non-null  float64
 5   source                1746216 non-null  object 
 6   user_id               1747805 non-null  float64
 7   user_name             1747758 non-null  object 
 8   user_screen_name      1747805 non-null  object 
 9   user_description      1564528 non-null  object 
 10  user_join_date        1747805 non-null  object 
 11  user_followers_count  1747805 non-null  float64
 12  user_location         1219049 non-null  object 
 13  lat                   801012 non-null   float64
 14  long                  801012 non-n

In [163]:
# Validate we have the Trump column set properly
tweets_df[tweets_df['candidate'] == 'trump'].head()

Unnamed: 0,created_at,tweet_id,tweet,likes,retweet_count,source,user_id,user_name,user_screen_name,user_description,...,user_location,lat,long,city,country,continent,state,state_code,collected_at,candidate
0,2020-10-15 00:00:01,1.316529e+18,#Elecciones2020 | En #Florida: #JoeBiden dice ...,0.0,0.0,TweetDeck,360666500.0,El Sol Latino News,elsollatinonews,🌐 Noticias de interés para latinos de la costa...,...,"Philadelphia, PA / Miami, FL",25.77427,-80.19366,,US,North America,Florida,FL,2020-10-21 00:00:00,trump
2,2020-10-15 00:00:02,1.316529e+18,"#Trump: As a student I used to hear for years,...",2.0,1.0,Twitter Web App,8436472.0,snarke,snarke,"Will mock for food! Freelance writer, blogger,...",...,Portland,45.520247,-122.674195,Portland,US,North America,Oregon,OR,2020-10-21 00:00:00.746433060,trump
4,2020-10-15 00:00:08,1.316529e+18,You get a tie! And you get a tie! #Trump ‘s ra...,4.0,3.0,Twitter for iPhone,47413800.0,Rana Abtar - رنا أبتر,Ranaabtar,"Washington Correspondent, Lebanese-American ,c...",...,Washington DC,38.894992,-77.036558,Washington,US,North America,District of Columbia,DC,2020-10-21 00:00:01.492866121,trump
5,2020-10-15 00:00:17,1.316529e+18,@CLady62 Her 15 minutes were over long time ag...,2.0,0.0,Twitter for Android,1138416000.0,Farris Flagg,FarrisFlagg,#BidenHarris2020 #JoeBiden2020 #KamalaHarrisFo...,...,"Perris,California",33.782519,-117.228648,,US,North America,California,CA,2020-10-21 00:00:01.866082651,trump
6,2020-10-15 00:00:17,1.316529e+18,@richardmarx Glad u got out of the house! DICK...,0.0,0.0,Twitter for iPhone,7.674018e+17,Michael Wilson,wilsonfire9,,...,"Powell, TN",,,,,,,,2020-10-21 00:00:02.239299182,trump


In [164]:
# Validate we have the Biden column set properly
tweets_df[tweets_df['candidate'] == 'biden'].head()

Unnamed: 0,created_at,tweet_id,tweet,likes,retweet_count,source,user_id,user_name,user_screen_name,user_description,...,user_location,lat,long,city,country,continent,state,state_code,collected_at,candidate
970919,2020-10-15 00:00:01,1.316529e+18,#Elecciones2020 | En #Florida: #JoeBiden dice ...,0.0,0.0,TweetDeck,360666500.0,El Sol Latino News,elsollatinonews,🌐 Noticias de interés para latinos de la costa...,...,"Philadelphia, PA / Miami, FL",25.77427,-80.19366,,US,North America,Florida,FL,2020-10-21 00:00:00,biden
970921,2020-10-15 00:00:20,1.316529e+18,@IslandGirlPRV @BradBeauregardJ @MeidasTouch T...,0.0,0.0,Twitter Web App,3494182000.0,Flag Waver,Flag_Wavers,,...,Golden Valley Arizona,46.304036,-109.171431,,US,North America,Montana,MT,2020-10-21 00:00:01.035654566,biden
970923,2020-10-15 00:00:22,1.316529e+18,#censorship #HunterBiden #Biden #BidenEmails #...,1.0,0.0,Twitter Web App,1.032807e+18,the Gold State,theegoldstate,A Silicon Valley #independent #News #Media #St...,...,"California, USA",36.701463,-118.755997,,US,North America,California,CA,2020-10-21 00:00:02.071309132,biden
970925,2020-10-15 00:00:25,1.316529e+18,"In 2020, #NYPost is being #censorship #CENSORE...",0.0,0.0,Twitter for iPhone,19940330.0,Change Illinois | Biden will increase taxes by...,changeillinois,"Illinois, home of Lincoln and Reagan, used to ...",...,"Chicago, Illinois",41.875562,-87.624421,Chicago,US,North America,Illinois,IL,2020-10-21 00:00:03.106963698,biden
970926,2020-10-15 00:00:31,1.316529e+18,►► Tell Politicians to STICK IT with this FREE...,0.0,0.0,Freebie-Depot,103083200.0,🆓 Freebie Depot,FreebieDepot,Free Stuff - No Fluff! Get all kinds of FREE ...,...,USA - Land of the FREE!,,,,,,,,2020-10-21 00:00:03.624790981,biden
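
In the `head()` outputs, the first row of both files is the same tweet, since it mentions both candidates. Such tweets end up in the combined frame once per label. The sketch below counts them on a toy frame; the helper name `count_shared_tweets` and the choice of `created_at`, `user_id`, and `tweet` as the identifying columns are assumptions.

```python
import pandas as pd


def count_shared_tweets(df, keys=("created_at", "user_id", "tweet")):
    """Count distinct tweets that appear under more than one candidate label."""
    labels_per_tweet = df.groupby(list(keys))["candidate"].nunique()
    return int((labels_per_tweet > 1).sum())


# Toy data standing in for tweets_df
toy = pd.DataFrame({
    "created_at": ["2020-10-15 00:00:01", "2020-10-15 00:00:01", "2020-10-15 00:00:02"],
    "user_id": [1.0, 1.0, 2.0],
    "tweet": ["both candidates", "both candidates", "only one"],
    "candidate": ["trump", "biden", "trump"],
})
print(count_shared_tweets(toy))
```

How to treat these shared tweets (keep both copies, drop them, or score them separately) is a design decision that affects the per-candidate sentiment totals.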


Show distribution of tweets by `country` location.

In [53]:
# Seeing the various countries of tweets
pd.set_option('display.max_rows', 500)  # Display up to 500 rows
print(tweets_df['country'].value_counts())

country
United States of America            332495
United States                        61905
United Kingdom                       58051
India                                40091
Germany                              35379
France                               35299
Canada                               27805
Italy                                20076
Australia                            14899
Mexico                               10903
Turkey                               10368
The Netherlands                       9587
Brazil                                8735
Pakistan                              8597
Spain                                 7252
Ireland                               5452
Netherlands                           5279
Colombia                              4425
Argentina                             4404
Venezuela                             4333
Chile                                 3844
Belgium                               3823
Nigeria                               3810
Ban

---
#### Data Cleaning and Wrangling

This dataset has many null values, and how many depends on the column, so let's see if we can maximize the useful data without dropping ALL rows with null values. First, let's update all USA country fields to show "US". Then, find rows where `user_location` is specified but `country` is not (`state_code` is also missing in many rows). We will focus on users whose stated location is somewhere in the USA, on the assumption that they are part of our voting and/or influence group. There is no way to tell whether any user is a US citizen, and some accounts represent organizations or are bots rather than individuals. Users sometimes enter a city and state (or even 'USA') in the `user_location` field, which we may be able to parse even if the `country` field is null.

For easier manipulation, we will replace the various `country` names for the USA with just "US". 

In [87]:
# Shorten any United States (/of America) to simply "US"
tweets_df['country'] = tweets_df['country'].replace({'United States of America': "US",'United States': "US"}) 

Filter rows to show only tweets where country field is US.

In [122]:
# Isolate tweets where `country` is "US"
tweets_cntryUSA = tweets_df[tweets_df["country"] == "US"]
tweets_cntryUSA.head()

Unnamed: 0,created_at,tweet_id,tweet,likes,retweet_count,source,user_id,user_name,user_screen_name,user_description,...,user_location,lat,long,city,country,continent,state,state_code,collected_at,candidate
0,2020-10-15 00:00:01,1.316529e+18,#Elecciones2020 | En #Florida: #JoeBiden dice ...,0.0,0.0,TweetDeck,360666500.0,El Sol Latino News,elsollatinonews,🌐 Noticias de interés para latinos de la costa...,...,"Philadelphia, PA / Miami, FL",25.77427,-80.19366,,US,North America,Florida,FL,2020-10-21 00:00:00,trump
2,2020-10-15 00:00:02,1.316529e+18,"#Trump: As a student I used to hear for years,...",2.0,1.0,Twitter Web App,8436472.0,snarke,snarke,"Will mock for food! Freelance writer, blogger,...",...,Portland,45.520247,-122.674195,Portland,US,North America,Oregon,OR,2020-10-21 00:00:00.746433060,trump
4,2020-10-15 00:00:08,1.316529e+18,You get a tie! And you get a tie! #Trump ‘s ra...,4.0,3.0,Twitter for iPhone,47413800.0,Rana Abtar - رنا أبتر,Ranaabtar,"Washington Correspondent, Lebanese-American ,c...",...,Washington DC,38.894992,-77.036558,Washington,US,North America,District of Columbia,DC,2020-10-21 00:00:01.492866121,trump
5,2020-10-15 00:00:17,1.316529e+18,@CLady62 Her 15 minutes were over long time ag...,2.0,0.0,Twitter for Android,1138416000.0,Farris Flagg,FarrisFlagg,#BidenHarris2020 #JoeBiden2020 #KamalaHarrisFo...,...,"Perris,California",33.782519,-117.228648,,US,North America,California,CA,2020-10-21 00:00:01.866082651,trump
7,2020-10-15 00:00:18,1.316529e+18,@DeeviousDenise @realDonaldTrump @nypost There...,0.0,0.0,Twitter for iPhone,9.007611e+17,Stacey Gulledge 🇺🇸 Patriot ♥️ KAG 🙏 👮‍♀️♥️,sm_gulledge,"Patriot, Wife, “Shaken not Stirred” Mom of two...",...,"Ohio, USA",40.225357,-82.68814,,US,North America,Ohio,OH,2020-10-21 00:00:02.612515712,trump


Filter full tweet DF for rows where `country` is null but `user_location` is set. This way we can try to parse out tweet user locations that are still in the USA.

In [158]:
# Check to see where user_location is available, but no country specified
tweets_loconly = tweets_df[tweets_df['country'].isnull() & 
                           tweets_df['user_location'].notnull()]
tweets_loconly['user_location'].head(10)

6                     Powell, TN
14       USA - Land of the FREE!
21                  Mother Earth
26                    Everywhere
28                  WASherst, PA
37                       Danmark
47                  Mother Earth
51           #BlueWave USA 🌊🌊🌊🌊🌊
70    37°12'28.3"N 115°57'25.9"W
82               Chula Vista, CA
Name: user_location, dtype: object

Parse `user_location` (again, where `country` is null) for US state abbreviations at the end.

In [92]:
# Read in list of US State abbreviations to parse user_location
states = pd.read_csv("support/states.csv", names=['states'])
statelist = list(states['states'])
print(statelist)

['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', 'USA']


In [None]:
# From the rows with a null country but a filled-in location, keep those
# whose last two characters match a US state abbreviation
user_states = tweets_loconly[tweets_loconly['user_location'].\
                             str[-2:].isin(statelist)]

In [150]:
user_states['user_location'].head(10)

6             Powell, TN
28          WASherst, PA
82       Chula Vista, CA
89        New Castle, IN
124             Davie,FL
148      Chula Vista, CA
189    Olmsted Falls, OH
218      Chula Vista, CA
260         Lawrence, MA
311      Chula Vista, CA
Name: user_location, dtype: object
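
Matching on the last two characters alone accepts any string that happens to end in those letters; an all-caps "INDIA", for example, ends in "IA". A stricter variant, sketched below, requires the code to stand alone or follow a comma or whitespace. The helper name `ends_with_state_code` and the regex are illustrative assumptions, not the notebook's method.

```python
import pandas as pd

STATE_CODES = [
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA", "HI",
    "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN",
    "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH",
    "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA",
    "WV", "WI", "WY",
]

# A state code at the end of the string, preceded by start-of-string,
# a comma, or whitespace (non-capturing groups avoid a pandas warning)
STATE_SUFFIX_PATTERN = r"(?:^|[,\s])(?:" + "|".join(STATE_CODES) + r")\s*$"


def ends_with_state_code(locations):
    """Boolean mask: True where a location ends in a delimited state code."""
    return locations.str.contains(STATE_SUFFIX_PATTERN, na=False)


sample = pd.Series(["Powell, TN", "Davie,FL", "Washington DC", "INDIA", "Danmark", None])
print(ends_with_state_code(sample))
```

Even this stricter match stays ambiguous for codes that double as country abbreviations (for example `IN` or `CA`), so some non-US rows may still slip through.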

Some `user_location` rows indicate "USA" so let's parse for that too.

In [95]:
# Select rows where user_location contains "USA" (rows that also end in a state abbreviation are not excluded)
user_stateUSA = tweets_loconly[tweets_loconly['user_location'].\
                               str.contains("USA", na=False)]

In [149]:
user_stateUSA['user_location'].head(10)

14        USA - Land of the FREE!
51            #BlueWave USA 🌊🌊🌊🌊🌊
83            #BlueWave USA 🌊🌊🌊🌊🌊
101                 NorthWest USA
102                 NorthWest USA
133           #BlueWave USA 🌊🌊🌊🌊🌊
170           #BlueWave USA 🌊🌊🌊🌊🌊
177           #BlueWave USA 🌊🌊🌊🌊🌊
255    USA, the GREATEST Country!
274                   Chicago USA
Name: user_location, dtype: object

Show distribution of users across their stated "USA" location.

In [97]:
user_stateUSA['user_location'].value_counts()

user_location
NorthWest USA                            764
#BlueWave USA 🌊🌊🌊🌊🌊                      536
California USA                           379
N.E USA                                  276
The South-USA                            223
                                        ... 
Rochester, NH, USA                         1
South Standing Rock, USA                   1
La Plasha (USA)                            1
Free Range, USA - Can't Follow             1
420 Beech Ave, Madison, TN 37115, USA      1
Name: count, Length: 2118, dtype: int64
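
A plain substring search for "USA" also matches inside longer uppercase words such as "JERUSALEM". One possible refinement, sketched here under the assumption that "USA" should appear as a standalone token, uses regex word boundaries. The helper name `mentions_usa` is hypothetical.

```python
import pandas as pd

USA_TOKEN_PATTERN = r"\bUSA\b"  # "USA" not embedded in a longer word


def mentions_usa(locations):
    """Boolean mask: True where the location contains 'USA' as its own token."""
    return locations.str.contains(USA_TOKEN_PATTERN, na=False)


sample = pd.Series(["NorthWest USA", "La Plasha (USA)", "The South-USA", "JERUSALEM", None])
print(mentions_usa(sample))
```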

Check if there are any rows that have `country` set but with null `user_location`.

In [124]:
# Check to see where country is available, but no user_location specified
tweets_cntryonly = tweets_df[tweets_df['user_location'].isnull() & 
                             tweets_df['country'].notnull()]
tweets_cntryonly.head()

Unnamed: 0,created_at,tweet_id,tweet,likes,retweet_count,source,user_id,user_name,user_screen_name,user_description,...,user_location,lat,long,city,country,continent,state,state_code,collected_at,candidate


Seeing none, now let's combine `tweets_cntryUSA` with other parsed location-only tweets.

In [129]:
# Combine DFs with "US" country, and those with no country but US locations.
user_USAonly = pd.concat([tweets_cntryUSA, user_states, user_stateUSA]) 
user_USAonly = user_USAonly.reset_index(drop=True)

# Make sure to fill null 'country' fields with "US"
user_USAonly['country'] = user_USAonly['country'].fillna(value="US")
print(f"New size of filtered DF: {user_USAonly.shape}")

New size of filtered DF: (435522, 22)


Now we have a bit over 435 thousand tweets for our final sentiment dataset, all from users with stated locations in the USA.
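
One thing to watch when concatenating separately filtered frames: a location that both ends in a state abbreviation and contains "USA" (say, "USA, Houston TX") would be selected by both filters and appear twice. A possible alternative, sketched below on toy data, builds a single boolean mask so each row is selected at most once. The helper name `select_us_rows` is hypothetical, and the matching rules simply mirror the ones used above.

```python
import pandas as pd


def select_us_rows(df, state_codes):
    """Select rows with a US country or a US-looking location, without duplicates."""
    country_is_us = df["country"].eq("US")
    no_country = df["country"].isna() & df["user_location"].notna()
    ends_in_state = df["user_location"].str[-2:].isin(state_codes)
    has_usa = df["user_location"].str.contains("USA", na=False)

    mask = country_is_us | (no_country & (ends_in_state | has_usa))
    selected = df.loc[mask].copy().reset_index(drop=True)
    selected["country"] = selected["country"].fillna("US")
    return selected


# Toy data standing in for tweets_df
toy = pd.DataFrame({
    "country": ["US", None, None, "Germany", None],
    "user_location": ["Portland", "USA, Houston TX", "Danmark", "Berlin USA fan", None],
})
print(select_us_rows(toy, ["TX", "OR"]))
```

Comparing the row count from this approach with the concatenated frame would show how many rows, if any, were counted twice.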

In [126]:
user_USAonly.head(10)

Unnamed: 0,created_at,tweet_id,tweet,likes,retweet_count,source,user_id,user_name,user_screen_name,user_description,...,user_location,lat,long,city,country,continent,state,state_code,collected_at,candidate
0,2020-10-15 00:00:01,1.316529e+18,#Elecciones2020 | En #Florida: #JoeBiden dice ...,0.0,0.0,TweetDeck,360666500.0,El Sol Latino News,elsollatinonews,🌐 Noticias de interés para latinos de la costa...,...,"Philadelphia, PA / Miami, FL",25.77427,-80.19366,,US,North America,Florida,FL,2020-10-21 00:00:00,trump
1,2020-10-15 00:00:02,1.316529e+18,"#Trump: As a student I used to hear for years,...",2.0,1.0,Twitter Web App,8436472.0,snarke,snarke,"Will mock for food! Freelance writer, blogger,...",...,Portland,45.520247,-122.674195,Portland,US,North America,Oregon,OR,2020-10-21 00:00:00.746433060,trump
2,2020-10-15 00:00:08,1.316529e+18,You get a tie! And you get a tie! #Trump ‘s ra...,4.0,3.0,Twitter for iPhone,47413800.0,Rana Abtar - رنا أبتر,Ranaabtar,"Washington Correspondent, Lebanese-American ,c...",...,Washington DC,38.894992,-77.036558,Washington,US,North America,District of Columbia,DC,2020-10-21 00:00:01.492866121,trump
3,2020-10-15 00:00:17,1.316529e+18,@CLady62 Her 15 minutes were over long time ag...,2.0,0.0,Twitter for Android,1138416000.0,Farris Flagg,FarrisFlagg,#BidenHarris2020 #JoeBiden2020 #KamalaHarrisFo...,...,"Perris,California",33.782519,-117.228648,,US,North America,California,CA,2020-10-21 00:00:01.866082651,trump
4,2020-10-15 00:00:18,1.316529e+18,@DeeviousDenise @realDonaldTrump @nypost There...,0.0,0.0,Twitter for iPhone,9.007611e+17,Stacey Gulledge 🇺🇸 Patriot ♥️ KAG 🙏 👮‍♀️♥️,sm_gulledge,"Patriot, Wife, “Shaken not Stirred” Mom of two...",...,"Ohio, USA",40.225357,-82.68814,,US,North America,Ohio,OH,2020-10-21 00:00:02.612515712,trump
5,2020-10-15 00:00:20,1.316529e+18,One of the single most effective remedies to e...,0.0,0.0,Twitter Web App,540476900.0,Jamieo,jamieo33,"Don't know what I am. Can lean left and right,...",...,"Pennsylvania, USA",40.969989,-77.727883,,US,North America,Pennsylvania,PA,2020-10-21 00:00:02.985732243,trump
6,2020-10-15 00:00:25,1.316529e+18,"In 2020, #NYPost is being #censorship #CENSORE...",0.0,0.0,Twitter for iPhone,19940330.0,Change Illinois | Biden will increase taxes by...,changeillinois,"Illinois, home of Lincoln and Reagan, used to ...",...,"Chicago, Illinois",41.875562,-87.624421,Chicago,US,North America,Illinois,IL,2020-10-21 00:00:04.105381834,trump
7,2020-10-15 00:00:26,1.316529e+18,#Trump #PresidentTrump #Trump2020LandslideVict...,3.0,5.0,Twitter for Android,1.243315e+18,Ron Burgundy,Anchorman_USA,"I'm kind of a Big Deal, People know me! I driv...",...,"San Diego, CA",32.717421,-117.162771,San Diego,US,North America,California,CA,2020-10-21 00:00:04.478598364,trump
8,2020-10-15 00:01:08,1.31653e+18,"@cnnbrk #Trump owes #RicardoAguirre $730,000 t...",3.0,2.0,Twitter for iPhone,194650400.0,MoClarker,MoClarker,Media Maven/Scientist/Fan O Fauci,...,Santa Monica Beach,47.005211,-88.96291,,US,North America,Michigan,MI,2020-10-21 00:00:07.091114077,trump
9,2020-10-15 00:01:10,1.31653e+18,#Democrats have spent more #tax #payer #paid #...,0.0,0.0,Twitter Web App,37738710.0,E Turner,Webinfotech,"Christian Veteran - My Oath to my Country, Fla...",...,United States,39.78373,-100.445882,,US,North America,,,2020-10-21 00:00:07.837547138,trump


Let's take a quick look at the top users by tweet count. Some of them do not appear to be individuals, but they still contribute to sentiment.

In [155]:
# Find tweet counts per unique user where `country` is "US"
user_USAonly["user_screen_name"].value_counts(sort=True)

user_screen_name
Hotpage_News       1843
steveziegenbus2    1259
JournalistJG       1082
mcleod              980
THCPetDoctor        913
                   ... 
KahanYankee           1
RonaldStanleyJr       1
dmariemart            1
DeplorableMan21       1
riotrantrave          1
Name: count, Length: 98928, dtype: int64
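
To gauge how much a handful of prolific accounts could sway the overall sentiment, one option is to compute the share of all tweets that comes from the top N accounts. The helper `top_user_share` below is a hypothetical sketch shown on toy data.

```python
import pandas as pd


def top_user_share(screen_names, n=10):
    """Fraction of all tweets posted by the n most active accounts."""
    counts = screen_names.value_counts()
    return counts.head(n).sum() / counts.sum()


toy_names = pd.Series(["a", "a", "a", "b", "c"])
print(top_user_share(toy_names, n=1))
```

On the real data this would be called as `top_user_share(user_USAonly["user_screen_name"])`.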

In [156]:
# Top10 users by tweet count
top10users = user_USAonly.groupby('user_screen_name')['tweet'].count().sort_values(ascending=False).reset_index().head(10) 

# Interactive bar chart 
top10_bar = px.bar(top10users, x='user_screen_name', y='tweet', 
template='plotly_dark', 
color_discrete_sequence=px.colors.qualitative.G10_r, 
# color_discrete_sequence=px.colors.qualitative.Dark24_r, 
title='Top10 Users by Tweet Count') 

# To view the graph 
top10_bar.show() 


In [None]:
# Tweet counts per candidate
tweets_candidate = user_USAonly.groupby('candidate')['tweet'].count().sort_values(ascending=False).reset_index() 

# Interactive bar chart 
candidate_bar = px.bar(tweets_candidate, x='candidate', y='tweet', 
template='plotly_dark', 
color_discrete_sequence=px.colors.qualitative.Dark24_r, 
title='Tweet counts in the USA for Trump and Biden') 

# To view the graph 
candidate_bar.show() 