# Plotting change in party leaning between 2016 and 2020 presidential elections by county.

## Method

Get a rough percentage of the votes that were Democratic, by county, for each election year, take the difference, and plot the result on a map.

A *positive difference* indicates that the county's political leaning has shifted towards the Democratic candidate,
and a *negative difference* indicates that it has shifted away from the Democratic candidate.

These percentages only take into consideration votes for the Democratic and Republican parties; third-party votes are not included.
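
As a rough illustration of the calculation described above, here is a minimal sketch on toy data. The counties and vote counts are made up for the example and are not taken from the election files.

```python
import pandas as pd

# Toy vote counts for two hypothetical counties (not real election data).
votes = pd.DataFrame({
    "county": ["A", "A", "B", "B"],
    "year": [2016, 2020, 2016, 2020],
    "DEM": [400, 450, 300, 270],
    "REP": [600, 550, 300, 330],
})

# Two-party Democratic share in percent; third-party votes are ignored.
votes["DEM_percent"] = votes["DEM"] / (votes["DEM"] + votes["REP"]) * 100

# One row per county, one column per year, then the 2020 minus 2016 difference.
shares = votes.pivot(index="county", columns="year", values="DEM_percent")
shares["DEM_delta"] = shares[2020] - shares[2016]
print(shares)
```

In this toy data, county A moves about 5 points towards the Democratic candidate and county B moves about 5 points away.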

## Motivation

This was done as a personal exercise and was my first time using Python for data processing. It was also my first time trying to plot anything on a map.

## Known issues

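It appears that the county names in the 2020 file do not all align with the names in the 2016 file. Because the 2020 results are joined to `fips` codes by state and county name with inner joins, counties whose names do not match are dropped and show up blank on the 2020 and change maps. Since this was just an exercise, I'm not planning to add another mapping file (crosswalk) to fill in the remainder of the map.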
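
One way to see which counties fail to line up is a left merge with `indicator=True`, which adds a `_merge` column recording where each row was found. The sketch below uses made-up state codes and county names, and the frame names `results_2020` and `counties_2016` are hypothetical stand-ins, not the variables used in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for the 2020 results (made-up names).
results_2020 = pd.DataFrame({
    "st": ["XX", "XX", "YY"],
    "county": ["Alpha County", "Beta Parish", "Gamma County"],
})

# Hypothetical stand-in for the 2016-derived county table with fips codes.
counties_2016 = pd.DataFrame({
    "st": ["XX", "XX", "YY"],
    "county": ["Alpha County", "Beta", "Gamma County"],
    "fips": ["00001", "00002", "00003"],
})

# A left merge keeps every 2020 row; `_merge` marks rows with no 2016 match.
merged = results_2020.merge(
    counties_2016, on=["st", "county"], how="left", indicator=True
)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["st", "county"]])
```

Here only "Beta Parish" is reported, because the other table spells it "Beta". A list like this would be the starting point for a crosswalk.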

The code would be cleaner if some things could be moved into functions and re-used.

A 2020 elections file which included `fips` would also simplify things since the entire chunk of state-county-fips code to map the 2020 results could be removed completely.  

## Findings

I did not attempt to draw any conclusions from the outcome.

In [None]:
#import chart_studio.plotly as py
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot, plot
init_notebook_mode(connected=True)

In [None]:
import pandas as pd

# used for conditional derived column
import numpy as np

In [None]:
import json

# the geo data must be json formatted (not loaded into a dataframe!) 
geofilepath = r'/kaggle/input/geojsoncountiesfipsjson/geojson-counties-fips.json'

with open(geofilepath) as f:
    countygeo = json.load(f)


## Load in the 2016 data


In [None]:

# load 2016 data -- winner by county
temp = pd.read_csv(r'/kaggle/input/2016uspresidentialvotebycounty/pres16results.csv')
temp = temp.dropna()

df = temp[['fips', 'lead', 'st', 'county', 'cand', 'votes']].drop_duplicates()

df = df[(df['cand']=='Hillary Clinton') | (df['cand']=='Donald Trump')] # Selecting only democratic and republican party.

df['party'] = np.where(df['cand'] == 'Hillary Clinton', 'DEM', 'REP')
df['won'] = df['cand'] == df['lead']

df.drop(['cand', 'lead'], axis=1, inplace=True)

df = df.groupby(['st', 'county', 'fips', 'party']).sum().unstack(level=-1)

df.drop('won', axis=1, inplace=True)
df.columns = ['DEM_2016','REP_2016']
df = df.reset_index()

# Percentage of Democratic Party votes.
df['DEM_percent_2016'] = df['DEM_2016']/(df['REP_2016']+df['DEM_2016']) *100

# give it a better name

df_2016 = df
df_2016.head()


## Load in the 2020 data

In [None]:
df = pd.read_csv(r'/kaggle/input/us-election-2020/president_county_candidate.csv')

df = df[(df['party']=='DEM') | (df['party']=='REP')] # Selecting only democratic and republican party.

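# numeric_only=True excludes any non-numeric columns from the sum instead of trying to add them up
df = df.groupby(['state', 'county', 'party']).sum(numeric_only=True).unstack()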

df.drop('won', axis=1, inplace=True)
df.columns = ['DEM_2020','REP_2020']
df = df.reset_index()

# Percentage of Democratic Party votes.
df['DEM_percent_2020'] = df['DEM_2020']/(df['REP_2020']+df['DEM_2020']) *100

df_2020_t = df

df_2020_t.head()

### Add the county information (specifically `fips`) to the 2020 data

In [None]:
# load county data
# (we read this same file above, but whatever.)
temp = pd.read_csv(r'/kaggle/input/2016uspresidentialvotebycounty/pres16results.csv')
#temp = temp.dropna()

counties = temp[['st', 'county', 'fips']].drop_duplicates()

counties.head()

In [None]:
# Adding one more dataset from kaggle to get USA state codes so that we can join on multiple columns 
# -- it's totally possible for counties to have the same name across state lines.
state_code = pd.read_csv(r"/kaggle/input/latitude-and-longitude-for-every-country-and-state/world_country_and_usa_states_latitude_and_longitude_values.csv")
state_code = state_code[['usa_state_code','usa_state']]

state_code = state_code.dropna()

df2 = df_2020_t.merge(state_code, left_on='state',right_on='usa_state', how='inner')

df2 = df2.merge(counties, left_on=['usa_state_code', 'county'], right_on=['st', 'county'], how='inner')

df2.drop(['usa_state_code', 'state', 'usa_state'], axis=1, inplace=True )

df_2020 = df2
df_2020.head()


## Plotting!


In [None]:
import plotly.express as px

In [None]:
# fix up fips to all be 5 digits (i.e. add leading 0s) so they match the ids in the county geojson
df_2016['fips'] = df_2016['fips'].str.zfill(5)
df_2020['fips'] = df_2020['fips'].str.zfill(5)

### Plot raw 2016 dem percent

In [None]:
fig = px.choropleth(df_2016, geojson=countygeo, locations='fips', 
                        color='DEM_percent_2016',
                           color_continuous_scale="rainbow",
                           range_color=(0, 100),
                           scope="usa",
                           labels={'DEM_percent_2016':'dem perc 2016'}
                          )
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

### Plot raw 2020 dem percent


In [None]:
fig = px.choropleth(df_2020, geojson=countygeo, locations='fips', 
                        color='DEM_percent_2020',
                           color_continuous_scale="rainbow",
                           range_color=(0, 100),
                           scope="usa",
                           labels={'DEM_percent_2020':'dem perc 2020'}
                          )
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

## Blend the data for a 2016 -> 2020 comparison

In [None]:
df = df_2016.merge(df_2020, on='fips', how='outer')
df['DEM_delta'] = df['DEM_percent_2020'] - df['DEM_percent_2016']

df_comb = df
df_comb.head()

In [None]:
upper = df_comb['DEM_delta'].max()
lower = df_comb['DEM_delta'].min()

fig = px.choropleth(df_comb, geojson=countygeo, locations='fips', 
                        color='DEM_delta',
                           color_continuous_scale="rainbow",
                           range_color=(lower, upper),
                           scope="usa",
                           labels={'county_x': 'county', 'DEM_delta':'dem perc county change'}
                          )
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()
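
The change map above scales its colors from the smallest to the largest delta, so zero (no change) does not land on any fixed color. A possible alternative, sketched here with a hypothetical helper `symmetric_range` and made-up values, is to use bounds that are symmetric around zero.

```python
import pandas as pd


def symmetric_range(values: pd.Series) -> tuple:
    """Return (-m, m), where m is the largest absolute value, ignoring NaN."""
    m = float(values.abs().max())
    return (-m, m)


# Made-up deltas, including a missing value like the unmatched counties produce.
deltas = pd.Series([-12.5, 3.0, float("nan"), 8.0])
lower, upper = symmetric_range(deltas)
print(lower, upper)
```

These bounds could then be passed as `range_color=(lower, upper)`, ideally together with a diverging scale such as `color_continuous_scale="RdBu"`, so that shifts towards and away from the Democratic candidate get visually distinct colors with a neutral midpoint.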