**Question:**
Is there a significant difference in the proportion of deaths attributed to COVID-19 between states with Democratic and Republican governors?

We explore this with a hypothesis test, at a significance level of $\alpha = 0.05$, that compares the proportion of deaths attributed to COVID-19 in states with Democratic governors and in states with Republican governors.

Let $\pi_D$ and $\pi_R$ be the proportion of deaths attributed to COVID-19 in Democratic- and Republican-governed states, respectively.

$$
\begin{aligned}
H_0 &: \pi_D - \pi_R = 0 \\
H_a &: \pi_D - \pi_R \neq 0
\end{aligned}
$$

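Our test statistic is:
$$
T = \frac{\hat{\pi}_D - \hat{\pi}_R}{SE}
$$

with standard error:
$$
SE = \sqrt{\frac{s_D^2}{n_D} + \frac{s_R^2}{n_R}}
$$

where $\hat{\pi}_D$ and $\hat{\pi}_R$ are the pooled proportions (total COVID-19 deaths divided by total deaths in each group), $s_D$ and $s_R$ are the standard deviations of the state-level COVID-19 death proportions within each group, and $n_D$ and $n_R$ are the total COVID-19 deaths in each group.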
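In its usual two-sample (Welch) form, each $s$ in this formula is a sample standard deviation and each $n$ is the size of the sample that $s$ was computed from. The minimal NumPy sketch below follows that textbook form for two arrays of state-level proportions; the function name `welch_t` and the toy values are hypothetical, not taken from the data.

```python
import numpy as np


def welch_t(sample_d, sample_r):
    """Two-sample t statistic with the unpooled (Welch) standard error.

    Each sample's standard deviation and size come from that same sample.
    """
    d = np.asarray(sample_d, dtype=float)
    r = np.asarray(sample_r, dtype=float)
    # Sample standard deviations (ddof=1), matching pandas' default .std()
    s_d, s_r = d.std(ddof=1), r.std(ddof=1)
    n_d, n_r = d.size, r.size
    se = np.sqrt(s_d**2 / n_d + s_r**2 / n_r)
    return (d.mean() - r.mean()) / se, se


# Hypothetical state-level proportions, for illustration only
t, se = welch_t([0.10, 0.12, 0.14], [0.13, 0.15, 0.17])
print(f"t = {t:.3f}, SE = {se:.4f}")
```

SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a p-value.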

We assume every death is random and independent.
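If every death is treated as an independent Bernoulli trial, the textbook test for $H_0: \pi_D - \pi_R = 0$ is the pooled two-proportion z-test, with $\hat{\pi} = (x_D + x_R)/(N_D + N_R)$ and $SE = \sqrt{\hat{\pi}(1-\hat{\pi})(1/N_D + 1/N_R)}$, where $x$ counts COVID-19 deaths and $N$ counts all deaths in each group. This is a different test from the one computed in this notebook; the plain-Python sketch below uses hypothetical counts and a hypothetical function name `two_proportion_z`.

```python
import math


def two_proportion_z(x_d, n_d, x_r, n_r):
    """Pooled two-proportion z-test for H0: pi_D - pi_R = 0.

    x_* are COVID-19 deaths, n_* are total deaths in each group.
    Returns the z statistic and its two-sided p-value (normal approximation).
    """
    p_d, p_r = x_d / n_d, x_r / n_r
    p_pool = (x_d + x_r) / (n_d + n_r)  # common proportion under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_d + 1 / n_r))
    z = (p_d - p_r) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts, for illustration only
z, p = two_proportion_z(x_d=120, n_d=1000, x_r=150, n_r=1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```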

For this test we combine the following datasets:

**Data:**

`cdc-deaths-by-county-and-race.csv`

`state-abbreviations.csv`

`state-political-parties.csv`


In [1]:
# Load libraries and data
import numpy as np
import pandas as pd

county_deaths_df = pd.read_csv('../data/cdc-deaths-by-county-and-race.csv')
state_parties_df = pd.read_csv('../data/state-political-parties.csv')
state_abbreviations_df = pd.read_csv('../data/state-abbreviations.csv')

In [2]:
# Aggregate deaths by state
# Note: county_deaths_df repeats each county on 3 consecutive rows with the same counts,
# so we keep every third row to count each county once
county_deaths_df = county_deaths_df.iloc[::3]

# Sum only the two count columns; all other columns are dropped
state_deaths_df = county_deaths_df.groupby('State')[['Total deaths', 'COVID-19 Deaths']].sum()

# Compute the proportion of all deaths attributed to COVID-19
state_deaths_df['COVID-19 Death Proportion'] = state_deaths_df['COVID-19 Deaths'] / state_deaths_df['Total deaths']

# Rename the index to match the abbreviation column used for merging
state_deaths_df.index.rename('State Abbreviation', inplace=True)
state_deaths_df.head()

| State Abbreviation | Total deaths | COVID-19 Deaths | COVID-19 Death Proportion |
|---|---|---|---|
| AK | 7025 | 764 | 0.108754 |
| AL | 104758 | 14461 | 0.138042 |
| AR | 51752 | 7338 | 0.141792 |
| AZ | 145717 | 20202 | 0.138639 |
| CA | 600256 | 77073 | 0.1284 |
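Keeping every third row relies on each county occupying exactly three consecutive rows with identical counts. A small check of that assumption before slicing is sketched below on a hypothetical toy frame that reuses the `State`, `Total deaths` and `COVID-19 Deaths` column names; the layout of the real file is assumed, not verified here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mimicking the assumed layout:
# each county appears on three consecutive rows with identical counts.
toy = pd.DataFrame({
    "State": ["AK"] * 3 + ["AL"] * 3,
    "Total deaths": [100, 100, 100, 250, 250, 250],
    "COVID-19 Deaths": [10, 10, 10, 30, 30, 30],
})

count_cols = ["State", "Total deaths", "COVID-19 Deaths"]
# Label rows 0,0,0,1,1,1,... so each consecutive triple forms one group
triple_id = np.arange(len(toy)) // 3
# True only if the row count is a multiple of 3 and every triple is identical
is_triplicated = (len(toy) % 3 == 0) and bool(
    (toy[count_cols].groupby(triple_id).nunique() == 1).all().all()
)
print(is_triplicated)  # True

deduped = toy.iloc[::3]
print(deduped.groupby("State")[["Total deaths", "COVID-19 Deaths"]].sum())
```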


In [3]:
# Attach each state's governor's party
# Uppercase state names so they match the merge key in the abbreviations file
state_parties_df['State Name'] = state_parties_df['Location'].str.upper()
state_parties_df.drop(columns=['Location'], inplace=True)
state_parties_df.head()

| | Governor Political Affiliation | State Name |
|---|---|---|
| 0 | Republican | ALABAMA |
| 1 | Republican | ALASKA |
| 2 | Republican | ARIZONA |
| 3 | Republican | ARKANSAS |
| 4 | Democrat | CALIFORNIA |


In [4]:
# Add state abbreviations to parties dataframe and set as index
state_parties_df = state_parties_df.merge(state_abbreviations_df, on='State Name')
state_parties_df.set_index('State Abbreviation', inplace=True)
state_parties_df.head()

| State Abbreviation | Governor Political Affiliation | State Name |
|---|---|---|
| AL | Republican | ALABAMA |
| AK | Republican | ALASKA |
| AZ | Republican | ARIZONA |
| AR | Republican | ARKANSAS |
| CA | Democrat | CALIFORNIA |
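The merge on `State Name` is an inner merge, so any name spelled differently in the two files, or present in only one of them, is dropped without warning. One way to surface such rows is an outer merge with `indicator=True`, sketched below on hypothetical toy frames (the real inputs are the two CSV files loaded above). Passing `validate="one_to_one"` to `merge` additionally raises a `MergeError` if a state name repeats on either side.

```python
import pandas as pd

# Hypothetical toy inputs standing in for the parties and abbreviations tables
parties = pd.DataFrame({
    "Governor Political Affiliation": ["Republican", "Democrat", "Democrat"],
    "State Name": ["ALABAMA", "CALIFORNIA", "NEW  YORK"],  # note the extra space
})
abbreviations = pd.DataFrame({
    "State Name": ["ALABAMA", "CALIFORNIA", "NEW YORK"],
    "State Abbreviation": ["AL", "CA", "NY"],
})

# An outer merge with indicator=True labels each row by where it was found
check = parties.merge(abbreviations, on="State Name", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["State Name", "_merge"]])
```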


In [5]:
# Merge state deaths and parties dataframes
state_deaths_df = state_deaths_df.merge(state_parties_df, left_index=True, right_index=True)
state_deaths_df.head()

| State Abbreviation | Total deaths | COVID-19 Deaths | COVID-19 Death Proportion | Governor Political Affiliation | State Name |
|---|---|---|---|---|---|
| AK | 7025 | 764 | 0.108754 | Republican | ALASKA |
| AL | 104758 | 14461 | 0.138042 | Republican | ALABAMA |
| AR | 51752 | 7338 | 0.141792 | Republican | ARKANSAS |
| AZ | 145717 | 20202 | 0.138639 | Republican | ARIZONA |
| CA | 600256 | 77073 | 0.1284 | Democrat | CALIFORNIA |


In [6]:
# Calculate the proportion of COVID-19 deaths for the whole US
US_COVID_19_deaths_proportion = state_deaths_df['COVID-19 Deaths'].sum() / state_deaths_df['Total deaths'].sum()
print(f"The US proportion of COVID-19 deaths is {US_COVID_19_deaths_proportion:.3%}")

The US proportion of COVID-19 deaths is 12.947%
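The US figure is a pooled, deaths-weighted proportion $\sum_s x_s / \sum_s N_s$, not the unweighted mean of the state proportions, so populous states dominate it. The small sketch below, using hypothetical numbers for one large and two small states, shows how far apart the two summaries can be.

```python
import numpy as np

# Hypothetical states: one large, two small
covid_deaths = np.array([1000, 10, 20])
total_deaths = np.array([10000, 200, 100])

pooled = covid_deaths.sum() / total_deaths.sum()   # weighted by total deaths
unweighted = (covid_deaths / total_deaths).mean()  # each state counts once
print(f"pooled = {pooled:.4f}, unweighted mean = {unweighted:.4f}")
# prints: pooled = 0.1000, unweighted mean = 0.1167
```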


In [7]:
# Standard deviation of the state-level COVID-19 death proportions within each party
party_COVID_19_deaths_sd = state_deaths_df.groupby('Governor Political Affiliation')['COVID-19 Death Proportion'].std()
party_COVID_19_deaths_sd.head()

Governor Political Affiliation
Democrat      0.030144
Republican    0.025559
Name: COVID-19 Death Proportion, dtype: float64
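These standard deviations are computed across states, and in the usual two-sample t-test the $n$ paired with such a standard deviation is the number of states in each group. A sketch that collects the mean, sample standard deviation, and count of the state-level proportions in a single `groupby(...).agg(...)` call is shown below; the toy frame is hypothetical and only borrows the column names of `state_deaths_df`.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as state_deaths_df
toy = pd.DataFrame({
    "Governor Political Affiliation": [
        "Democrat", "Democrat", "Republican", "Republican", "Republican",
    ],
    "COVID-19 Death Proportion": [0.10, 0.14, 0.12, 0.15, 0.18],
})

summary = (
    toy.groupby("Governor Political Affiliation")["COVID-19 Death Proportion"]
    .agg(["mean", "std", "count"])  # std uses ddof=1, like .std() above
)
print(summary)
```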

In [8]:
# Calculate the pooled proportion of COVID-19 deaths for each party
party_totals = state_deaths_df.groupby('Governor Political Affiliation')[['COVID-19 Deaths', 'Total deaths']].sum()
party_COVID_19_deaths_proportion = party_totals['COVID-19 Deaths'] / party_totals['Total deaths']
party_COVID_19_deaths_proportion.head()

Governor Political Affiliation
Democrat      0.125156
Republican    0.134201
dtype: float64

In [9]:
# Calculate the hypothesis test statistic
proportion_difference = party_COVID_19_deaths_proportion['Democrat'] - party_COVID_19_deaths_proportion['Republican']

# Total COVID-19 deaths per party; these are used as n_D and n_R in the SE
party_COVID_19_deaths = state_deaths_df.groupby('Governor Political Affiliation')['COVID-19 Deaths'].sum()
democrat_deaths = party_COVID_19_deaths['Democrat']
republican_deaths = party_COVID_19_deaths['Republican']

# SE = sqrt(s_D^2 / n_D + s_R^2 / n_R)
SE = np.sqrt(
    party_COVID_19_deaths_sd['Democrat']**2 / democrat_deaths
    + party_COVID_19_deaths_sd['Republican']**2 / republican_deaths
)
t = proportion_difference / SE
print(f"""
The statistics for this test are:
Democrat deaths: {democrat_deaths}
Republican deaths: {republican_deaths}
Proportion difference: {proportion_difference:.3f}
SE: {SE:.6f}
The t statistic is {t:.3f}
""")


The statistics for this test are:
Democrat deaths: 365035
Republican deaths: 356697
Proportion difference: -0.009
SE: 0.000066
The t statistic is -137.606
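The notebook states that the p-value is below $\alpha$ without computing it. A minimal sketch of the two-sided p-value is shown below, assuming the standard normal approximation to the t distribution (reasonable when the degrees of freedom are large); the helper name `two_sided_p` is hypothetical, the first value is the statistic reported above, and the other two are illustrative.

```python
import math


def two_sided_p(t_stat):
    """Two-sided p-value under a standard normal approximation.

    erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)).
    """
    return math.erfc(abs(t_stat) / math.sqrt(2))


alpha = 0.05
for t_stat in (-137.606, -2.5, 0.5):
    p = two_sided_p(t_stat)
    print(f"t = {t_stat:8.3f}  p = {p:.4g}  reject H0: {p < alpha}")
```

For an exact t reference distribution with `df` degrees of freedom, `2 * scipy.stats.t.sf(abs(t), df)` gives the corresponding two-sided p-value.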



## Concluding remarks
The overall US COVID-19 death proportion is 12.95%. States with Democratic governors sit slightly below it (12.52%) and states with Republican governors slightly above it (13.42%).

The two-sided p-value for a t statistic of -137.6 is far below 0.05, so we reject $H_0$ at $\alpha = 0.05$ in favor of $H_a$.

Although the two proportions differ by less than one percentage point (a difference of -0.009), the difference is statistically significant under this test.

This test shows an association, not a cause: many other factors could explain the difference in COVID-19 death proportions, such as vaccination mandates and social vulnerability, which are explored in other notebooks.