# A recent [New York Times article](https://www.nytimes.com/2021/02/05/briefing/marjorie-taylor-greene-uk-vaccine-biden-stimulus.html) asked why left-leaning states and countries have had better Covid prevention but worse vaccine distribution. This notebook seeks to visualize that pattern at the US state level. I will examine and visualize the relationship between Covid cases per capita and the political lean of each state, as well as the relationship between vaccine distribution per capita and political lean. I will use Biden's 2020 vote percentage in each state as a proxy for its partisan lean.

# Please leave a comment with any feedback or suggestions!

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Import data
* sv - Vaccination distribution data by state per day
* cc - Covid cases by county per day
* ci - Information on each county
* vote - Each state's vote shares for Trump and Biden in the 2020 election

In [None]:
sv = pd.read_csv('../input/usa-covid19-vaccinations/us_state_vaccinations.csv')
cc = pd.read_csv('../input/covid19-us-county-jhu-data-demographics/covid_us_county.csv')
ci = pd.read_csv('../input/covid19-us-county-jhu-data-demographics/us_county.csv')
vote = pd.read_csv("../input/2020-us-presidential-election-results-by-state/voting.csv")

# Get the most recent vaccination data (as of 2021-03-01)

In [None]:
sv = sv[sv.date == "2021-03-01"]
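The filter above hard-codes one date. If the goal is each state's most recent record, one option is to sort by date and keep the last row per `location`. The following is a minimal sketch on a hypothetical toy frame. The state names and values are made up, and only the column names mirror `us_state_vaccinations.csv`.

```python
import pandas as pd

# Hypothetical toy rows; only the column names mirror us_state_vaccinations.csv
sv_toy = pd.DataFrame({
    "date": ["2021-02-28", "2021-03-01", "2021-02-27", "2021-02-28"],
    "location": ["State A", "State A", "State B", "State B"],
    "people_vaccinated_per_hundred": [1.0, 2.0, 3.0, 4.0],
})

# Parse the ISO date strings so sorting is chronological
sv_toy["date"] = pd.to_datetime(sv_toy["date"])

# Sort by date, then keep the last (most recent) row for each state
latest = (
    sv_toy.sort_values("date")
    .groupby("location")
    .tail(1)
    .reset_index(drop=True)
)
print(latest)
```

Unlike a fixed-date filter, this keeps a row for a state even if it did not report on the chosen day. The trade-off is that different states may end up with slightly different dates.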

# Preview the data

In [None]:
sv #State vaccine distribution data

In [None]:
cc # County covid case data

In [None]:
ci #County Information (to get county population)

In [None]:
vote #Vote counts and percentages for Trump and Biden in the 2020 election

# Merge data

Join the county case data with the county information, then aggregate to get one row per county holding its latest cumulative totals

In [None]:
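county_merged = pd.merge(cc, ci, how='left', on='fips')
# Group by FIPS code rather than county name: names such as "Washington" repeat across states.
# Cases and deaths are cumulative, so the max is each county's latest total.
county_merged2 = county_merged.groupby("fips").agg({"state_x": "first", "cases": "max", "deaths": "max", "population": "mean"})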
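County names are not unique across states; for example, many states have a Washington County. Aggregating by name therefore merges distinct counties into one row. The following is a minimal sketch on hypothetical toy rows that contrasts grouping by name with grouping by FIPS code. The column names follow the merged frame above, but the values are made up.

```python
import pandas as pd

# Hypothetical daily cumulative counts for two different counties that share a name
cc_toy = pd.DataFrame({
    "fips": [1001, 1001, 2001, 2001],
    "county_x": ["Washington"] * 4,
    "state_x": ["State A", "State A", "State B", "State B"],
    "cases": [10, 15, 100, 120],  # cumulative, so the max is the latest total
    "population": [1000, 1000, 5000, 5000],
})

agg_spec = {"state_x": "first", "cases": "max", "population": "mean"}

# Grouping by name collapses both counties into a single row
by_name = cc_toy.groupby("county_x").agg(agg_spec)

# Grouping by FIPS keeps one row per county
by_fips = cc_toy.groupby("fips").agg(agg_spec)

print(by_name)
print(by_fips)
```

In the name-keyed result, State B's cases are credited to State A, and the population is an average of two unrelated counties.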

Aggregate the county totals by state to get totals for each state, and create columns for cases per 100 residents and deaths per 100 residents

In [None]:
state_merged = county_merged2.groupby("state_x").sum()
state_merged["cases_per_hundred"] = (state_merged["cases"] / state_merged["population"]) * 100
state_merged["deaths_per_hundred"] = (state_merged["deaths"] / state_merged["population"]) * 100
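Summing counts and populations before dividing gives each state's rate as total cases over total population. This is a population-weighted average of its county rates, not the plain mean of those rates. The following is a minimal worked sketch with hypothetical numbers for a single state.

```python
import pandas as pd

# Hypothetical county-level totals for one state
counties = pd.DataFrame({
    "state_x": ["State A", "State A"],
    "cases": [50, 900],
    "population": [1_000, 9_000],
})

# State rate: summed cases over summed population, per 100 residents
state = counties.groupby("state_x")[["cases", "population"]].sum()
state["cases_per_hundred"] = state["cases"] / state["population"] * 100

# Unweighted mean of the county rates, shown only for comparison
county_rates = counties["cases"] / counties["population"] * 100

print(state["cases_per_hundred"].iloc[0], county_rates.mean())
```

In this example, the summed approach gives 950 / 10,000 × 100 = 9.5 cases per 100. Averaging the county rates of 5 and 10 would give 7.5 instead.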

Merge state case count totals with vaccination totals 

In [None]:
data = pd.merge(state_merged, sv, left_on= "state_x", right_on= "location")
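`pd.merge` defaults to an inner join, so any state whose name is spelled differently in the two files is silently dropped. One way to check is an outer merge with `indicator=True`, followed by a look at the rows that did not match. The following is a minimal sketch on hypothetical toy frames. The mismatched name is invented for illustration.

```python
import pandas as pd

# Hypothetical state totals and vaccination rows whose names do not all line up
cases = pd.DataFrame({"state_x": ["State A", "State B", "State C"], "cases": [10, 20, 30]})
vacc = pd.DataFrame({
    "location": ["State A", "State B", "State C (alt. spelling)"],
    "people_vaccinated_per_hundred": [1.0, 2.0, 3.0],
})

# An outer merge keeps everything; the _merge column records where each row came from
check = pd.merge(cases, vacc, left_on="state_x", right_on="location",
                 how="outer", indicator=True)

# Rows found in only one input are the ones an inner merge would drop
unmatched = check[check["_merge"] != "both"]
print(unmatched[["state_x", "location", "_merge"]])
```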

Merge in the 2020 election results, then keep only the columns needed for this analysis

In [None]:
data = pd.merge(data, vote, left_on = "location", right_on = "state")
data = data[["location", "cases_per_hundred", "deaths_per_hundred", "people_vaccinated_per_hundred", "population", "biden_pct", "biden_win"]]
data

# Visualizing the relationships with scatter plots

In [None]:
import matplotlib.colors as mcol
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
plt.style.use('fivethirtyeight')


# Scatter plot: Covid cases per 100 vs. Biden vote percentage by state

In [None]:
x = data.biden_pct
y = data.cases_per_hundred
z = data.population/50000
 
cm1 = mcol.LinearSegmentedColormap.from_list("MyCmapName",["r","b"])
plt.scatter(x, y, s=z, c=x, cmap=cm1, alpha=.8, edgecolors="grey", linewidth=2)
plt.plot(x, np.poly1d(np.polyfit(x, y, 1))(x), alpha = .5)

plt.xlabel("Biden Vote Percentage")
plt.ylabel("Covid Cases per 100")
plt.title("Progressive states have had better virus prevention...")
 
plt.show()
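The trend line is a degree-1 `np.polyfit`, so its slope and the Pearson correlation can be printed to put a number on how strong the pattern is. The following is a minimal sketch on hypothetical toy arrays. In the notebook, `x` and `y` would be `data.biden_pct` and `data.cases_per_hundred`.

```python
import numpy as np

# Hypothetical (vote share, cases per 100) pairs lying exactly on a line
x_toy = np.array([30.0, 40.0, 50.0, 60.0, 70.0])
y_toy = np.array([12.0, 11.0, 10.0, 9.0, 8.0])

slope, intercept = np.polyfit(x_toy, y_toy, 1)  # same degree-1 fit as the plotted trend line
r = np.corrcoef(x_toy, y_toy)[0, 1]             # Pearson correlation coefficient

print(f"slope={slope:.3f} per point of vote share, r={r:.3f}, r^2={r**2:.3f}")
```

On real state data, r will be well short of ±1. Its square is the share of the variance in the rate that the straight-line fit accounts for.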


# Scatter plot: vaccinations per 100 vs. Biden vote percentage by state

In [None]:
x = data.biden_pct
y = data.people_vaccinated_per_hundred
z = data.population/50000
 
cm1 = mcol.LinearSegmentedColormap.from_list("MyCmapName",["r","b"])
plt.scatter(x, y, s=z, c=x, cmap=cm1, alpha=.8, edgecolors="grey", linewidth=2)
plt.plot(x, np.poly1d(np.polyfit(x, y, 1))(x), alpha = .5) 

plt.xlabel("Biden Vote Percentage")
plt.ylabel("Vaccinations per 100")
plt.title("...and slightly better vaccine distribution")
 
plt.show()
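Marker size encodes population, but the trend line weights every state equally. One alternative, which is a swapped-in technique and not the author's method, is a population-weighted fit. `np.polyfit` multiplies each residual by `w` before squaring, so `w = sqrt(population)` minimizes the population-weighted sum of squared residuals. The following is a minimal sketch on hypothetical toy arrays standing in for `data.biden_pct`, `data.people_vaccinated_per_hundred` and `data.population`.

```python
import numpy as np

# Hypothetical state-level values (vote share, vaccinations per 100, population)
x_toy = np.array([40.0, 50.0, 60.0, 70.0])
y_toy = np.array([10.0, 8.0, 9.0, 7.0])
pop_toy = np.array([1e6, 2e6, 5e6, 20e6])

# Unweighted fit: every state counts equally
slope_u, intercept_u = np.polyfit(x_toy, y_toy, 1)

# Weighted fit: polyfit squares w * residual, so sqrt(population) gives population weights
slope_w, intercept_w = np.polyfit(x_toy, y_toy, 1, w=np.sqrt(pop_toy))

print(f"unweighted slope={slope_u:.4f}, population-weighted slope={slope_w:.4f}")
```

The weighted line answers a slightly different question: how the rate varies with vote share per resident rather than per state.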

# 3D plot: Biden vote percentage (x), Covid cases per 100 (y), and vaccinations per 100 (z)

In [None]:
from mpl_toolkits import mplot3d

In [None]:
z = data.people_vaccinated_per_hundred
x = data.biden_pct
y = data.cases_per_hundred
 
# Creating figure
fig = plt.figure(figsize = (10, 7))
ax = plt.axes(projection ="3d")
ax.scatter3D(x, y, z, c = x, cmap = cm1)
plt.title("Partisan lean vs Covid cases per 100 vs Vaccines given per 100")
xLabel = ax.set_xlabel("Biden Vote Percentage", linespacing=3.2)
yLabel = ax.set_ylabel("Covid Cases Per 100", linespacing=3.1)
zLabel = ax.set_zlabel("Vaccinations Per 100", linespacing=3.4)
 
# show plot
plt.show()

In [None]:
!pip install vega_datasets

# Mapping county case counts and state vote share

In [None]:
import altair as alt
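from vega_datasets import data as vega_data  # alias avoids overwriting the `data` DataFrame built above
pop = vega_data.population_engineers_hurricanes()  # provides a numeric state `id` for the map lookup
vote = pd.merge(vote, pop)
covid = county_merged[county_merged.date == "2021-02-05"]
states = alt.topo_feature(vega_data.us_10m.url, 'states')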

variable_list = ['biden_pct', 'biden_vote', 'biden_win']

background = alt.Chart(states).mark_geoshape().encode(
    color=alt.Color('biden_pct:Q', scale=alt.Scale(scheme='redblue')),
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(vote, 'id', list(vote.columns))
).properties(
    width=1000,
    height=600
).project(
    type='albersUsa'
)

points = alt.Chart(covid).mark_circle().encode(
    longitude='long_x',
    latitude='lat_x',
    size=alt.Size('cases', scale=alt.Scale(range=[10, 500])),
    tooltip='county_x'
)

background + points
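The `transform_lookup` in the background layer attaches vote fields to each state shape by matching `id`. It behaves roughly like a pandas left join from shapes to the vote table: shapes with no match get no value (null) and are drawn without a data color. The following is a minimal pandas sketch of that analogy on hypothetical toy frames. It is not Altair code.

```python
import pandas as pd

# Hypothetical stand-ins: one row per map shape, keyed by numeric state id
shapes = pd.DataFrame({"id": [1, 2, 4]})
# ...and a vote table that, like `vote` after merging with `pop`, carries the same id
vote_toy = pd.DataFrame({"id": [1, 4], "state": ["State A", "State C"], "biden_pct": [40.0, 55.0]})

# A left join keeps every shape; shapes without a matching id get NaN
joined = shapes.merge(vote_toy, on="id", how="left")
print(joined)
```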

# What can we learn from this?

At the US state level, the data partially supports the pattern described in the NYT article: more liberal states have had fewer Covid cases per capita, but also fewer vaccines distributed per capita. Of course, no causal relationship can be established, as many confounding variables (population density, for one) are at play.

# Next steps

1. Visualize the relationship between case counts/vaccine distribution and political lean by **country**. This will require a different proxy for partisan lean (government structure? social progress index? other ideas?)
2. Visualize relationships by state for death counts, as well as hospitalizations and mask wearing