# **2020 US General Election Turnout Rates Data**

Source for dataset: https://www.kaggle.com/imoore/2020-us-general-election-turnout-rates (the Kaggle user pulled this data from this source: https://data.world/government/vep-turnout)

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## **Exploratory Data Analysis**

In [None]:
df = pd.read_csv('2020 November General Election - Turnout Rates.csv')
print(df.shape)
df.head()

(52, 15)


Unnamed: 0,State,Source,Official/Unofficial,Total Ballots Counted (Estimate),Vote for Highest Office (President),VEP Turnout Rate,Voting-Eligible Population (VEP),Voting-Age Population (VAP),% Non-citizen,Prison,Probation,Parole,Total Ineligible Felon,Overseas Eligible,State Abv
0,United States,,,158835004,,66.4%,239247182,257605088,7.8%,1461074,1962811,616440,3294457,4971025.0,
1,Alabama,https://www2.alabamavotes.gov/electionnight/st...,Unofficial,2306587,2297295.0,62.6%,3683055,3837540,2.3%,25898,50997,10266,67782,,AL
2,Alaska,https://www.elections.alaska.gov/results/20GEN...,,367000,,69.8%,525568,551117,3.4%,4293,2074,1348,6927,,AK
3,Arizona,https://results.arizona.vote/#/featured/18/0,,3400000,,65.5%,5189000,5798473,8.9%,38520,76844,7536,93699,,AZ
4,Arkansas,https://results.enr.clarityelections.com/AR/10...,Unofficial,1212030,1206697.0,55.5%,2182375,2331171,3.6%,17510,36719,24698,64974,,AR


The dataset has 52 rows and 15 columns. Each row holds voter turnout statistics for one state, except row 0, which holds the totals for the United States as a whole.

### **Preprocessing**

In [None]:
df.isna().sum()/df.shape[0]*100 # percentage of missing values for each column

State                                   0.000000
Source                                 23.076923
Official/Unofficial                    51.923077
Total Ballots Counted (Estimate)        0.000000
Vote for Highest Office (President)    53.846154
VEP Turnout Rate                        0.000000
Voting-Eligible Population (VEP)        0.000000
Voting-Age Population (VAP)             0.000000
% Non-citizen                           0.000000
Prison                                  0.000000
Probation                               0.000000
Parole                                  0.000000
Total Ineligible Felon                  0.000000
Overseas Eligible                      98.076923
State Abv                               1.923077
dtype: float64

In [None]:
df.drop(columns=['Source','Official/Unofficial','Vote for Highest Office (President)','State Abv'], inplace=True)
df.drop(index=[0], inplace=True)
df.head()

Unnamed: 0,State,Total Ballots Counted (Estimate),VEP Turnout Rate,Voting-Eligible Population (VEP),Voting-Age Population (VAP),% Non-citizen,Prison,Probation,Parole,Total Ineligible Felon,Overseas Eligible
1,Alabama,2306587,62.6%,3683055,3837540,2.3%,25898,50997,10266,67782,
2,Alaska,367000,69.8%,525568,551117,3.4%,4293,2074,1348,6927,
3,Arizona,3400000,65.5%,5189000,5798473,8.9%,38520,76844,7536,93699,
4,Arkansas,1212030,55.5%,2182375,2331171,3.6%,17510,36719,24698,64974,
5,California,16800000,64.7%,25962648,30783255,15.0%,104730,0,102586,207316,


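I dropped the `Source`, `Official/Unofficial`, `Vote for Highest Office (President)`, and `State Abv` columns. All four have missing values, two of them are missing more than 50% of their data, and most of them are not important for my analysis. I also dropped row 0, which holds the United States totals rather than a single state.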
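The percentage table above can also drive the column selection directly. This is a minimal sketch on made-up data, assuming a 50% cutoff; `columns_above_missing_threshold` is a hypothetical helper, not a pandas function. Note that a cutoff alone would not have selected `Source` or `State Abv`, which are below 50% missing and were dropped by judgment.

```python
import numpy as np
import pandas as pd

def columns_above_missing_threshold(frame, threshold_pct):
    """Return the names of columns whose share of missing values exceeds threshold_pct."""
    missing_pct = frame.isna().mean() * 100
    return missing_pct[missing_pct > threshold_pct].index.tolist()

# Toy data, made up for illustration (not taken from the dataset)
toy = pd.DataFrame({
    'State': ['A', 'B', 'C', 'D'],
    'Source': ['x', np.nan, np.nan, np.nan],  # 75% missing
    'Ballots': [10, 20, 30, 40],              # 0% missing
})

to_drop = columns_above_missing_threshold(toy, 50)
toy_clean = toy.drop(columns=to_drop)
```

Returning the list of names before dropping keeps the decision visible, so columns can still be added or removed by hand.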

In [None]:
df.fillna(0, inplace=True)
df.head()

Unnamed: 0,State,Total Ballots Counted (Estimate),VEP Turnout Rate,Voting-Eligible Population (VEP),Voting-Age Population (VAP),% Non-citizen,Prison,Probation,Parole,Total Ineligible Felon,Overseas Eligible
1,Alabama,2306587,62.6%,3683055,3837540,2.3%,25898,50997,10266,67782,0
2,Alaska,367000,69.8%,525568,551117,3.4%,4293,2074,1348,6927,0
3,Arizona,3400000,65.5%,5189000,5798473,8.9%,38520,76844,7536,93699,0
4,Arkansas,1212030,55.5%,2182375,2331171,3.6%,17510,36719,24698,64974,0
5,California,16800000,64.7%,25962648,30783255,15.0%,104730,0,102586,207316,0


In [None]:
df.isna().sum()

State                               0
Total Ballots Counted (Estimate)    0
VEP Turnout Rate                    0
Voting-Eligible Population (VEP)    0
Voting-Age Population (VAP)         0
% Non-citizen                       0
Prison                              0
Probation                           0
Parole                              0
Total Ineligible Felon              0
Overseas Eligible                   0
dtype: int64

I filled the missing `Overseas Eligible` values with zeros so that I could produce visualizations of overseas eligible voters in each state.

In [None]:
df[df['Overseas Eligible'] > 0]

Unnamed: 0,State,Total Ballots Counted (Estimate),VEP Turnout Rate,Voting-Eligible Population (VEP),Voting-Age Population (VAP),% Non-citizen,Prison,Probation,Parole,Total Ineligible Felon,Overseas Eligible


No state has any overseas eligible voters recorded. The only non-missing value was in the United States row, which I dropped earlier, so I am dropping the column.
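Counting the non-missing values per column would show this without filling with zeros first. A small sketch, assuming a toy frame that mirrors the situation here, where only the aggregate row carries an `Overseas Eligible` value:

```python
import numpy as np
import pandas as pd

# Toy frame: only the aggregate row (index 0) has an 'Overseas Eligible' value
toy = pd.DataFrame({
    'State': ['United States', 'Alabama', 'Alaska'],
    'Overseas Eligible': [4971025.0, np.nan, np.nan],
})

# Remove the aggregate row, then count what is left in the column
states_only = toy.drop(index=[0])
non_missing = states_only['Overseas Eligible'].notna().sum()

# Drop every column that has no data left at all
states_only = states_only.dropna(axis=1, how='all')
```

Using `dropna(axis=1, how='all')` removes only columns that are entirely empty, so partially filled columns are left alone.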

In [None]:
df.drop(columns=['Overseas Eligible'], inplace=True)
df.head()

Unnamed: 0,State,Total Ballots Counted (Estimate),VEP Turnout Rate,Voting-Eligible Population (VEP),Voting-Age Population (VAP),% Non-citizen,Prison,Probation,Parole,Total Ineligible Felon
1,Alabama,2306587,62.6%,3683055,3837540,2.3%,25898,50997,10266,67782
2,Alaska,367000,69.8%,525568,551117,3.4%,4293,2074,1348,6927
3,Arizona,3400000,65.5%,5189000,5798473,8.9%,38520,76844,7536,93699
4,Arkansas,1212030,55.5%,2182375,2331171,3.6%,17510,36719,24698,64974
5,California,16800000,64.7%,25962648,30783255,15.0%,104730,0,102586,207316


In [None]:
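# Assumes the ValueError comes from thousands separators in text values (e.g. '2,306,587'); strip them before casting
df['Total Ballots Counted (Estimate)'] = pd.to_numeric(df['Total Ballots Counted (Estimate)'].astype(str).str.replace(',', '', regex=False)).astype('int')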
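When a cast like this fails, it can help to list the values that cannot be parsed before deciding how to clean them. A minimal sketch on made-up values (the strings below are illustrative, not taken from the dataset), using `pd.to_numeric` with `errors='coerce'`:

```python
import pandas as pd

# Made-up values: one clean number, one with thousands separators, one non-numeric
raw = pd.Series(['2306587', '3,400,000', 'n/a'])

# errors='coerce' turns anything unparseable into NaN instead of raising
parsed = pd.to_numeric(raw, errors='coerce')

# The original strings that failed to parse
offending = raw[parsed.isna()].tolist()
```

Inspecting `offending` shows whether the problem is formatting, such as separators, or missing-data placeholders, which call for different fixes.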

### **Visualizations**

In [None]:
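# 'VEP Turnout Rate' is read as text such as '62.6%'; convert it to a float so it can be sorted and plotted
df['VEP Turnout Rate'] = df['VEP Turnout Rate'].astype(str).str.rstrip('%').astype(float)
# Names of the 15 states with the highest turnout, in descending order
top15 = df.sort_values('VEP Turnout Rate', ascending=False)['State'].head(15).tolist()
sns.barplot(x='State', y='VEP Turnout Rate', data=df, order=top15)
plt.xticks(rotation=90)
plt.xlabel('State', weight='bold')
plt.ylabel('VEP Turnout Rate (%)', weight='bold')
plt.show()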