# Analysis on COVID-19 Vaccinations and Deaths in the United States



In this analysis, I compare COVID-19 vaccination rates with COVID-19 deaths to see whether the two are correlated. I use two data sets:
- Our World In Data: https://ourworldindata.org/us-states-vaccinations
    - State-by-state data on COVID-19 vaccinations in the United States
- CDC: https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State-o/9mfq-cb36
    - State-by-state data on COVID-19 cases and deaths over time in the United States
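
As a minimal sketch of what "correlated" could mean here, the snippet below computes a Pearson correlation coefficient with pandas. The frame `toy`, the column `deaths_per_100k`, and all of the values are hypothetical and invented for illustration; they do not come from either data set. Because the toy values are perfectly linear, the coefficient comes out at -1.

```python
import pandas as pd

# Hypothetical per-state values, invented purely for illustration;
# they are not taken from either data set.
toy = pd.DataFrame(
    {
        "state": ["A", "B", "C", "D"],
        "people_fully_vaccinated_per_hundred": [50.0, 60.0, 70.0, 80.0],
        "deaths_per_100k": [400.0, 350.0, 300.0, 250.0],
    }
)

# Series.corr uses the Pearson method by default.
r = toy["people_fully_vaccinated_per_hundred"].corr(toy["deaths_per_100k"])
print(round(r, 3))
```

A coefficient near -1 or 1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship. Correlation alone does not establish causation.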

In [193]:
#all necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly 
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as py
from plotly.offline import iplot
from plotly.subplots import make_subplots
import plotly.figure_factory as ff

In [194]:
#read in data set

#1) Our World In Data Data set
df1 = pd.read_csv("us_state_vaccinations.csv")

#2) CDC Data Set
df2 = pd.read_csv("United_States_COVID-19_Cases_and_Deaths_by_State_over_Time.csv")

## Exploring the Datasets

In [195]:
df1.head()

Unnamed: 0,date,location,total_vaccinations,total_distributed,people_vaccinated,people_fully_vaccinated_per_hundred,total_vaccinations_per_hundred,people_fully_vaccinated,people_vaccinated_per_hundred,distributed_per_hundred,daily_vaccinations_raw,daily_vaccinations,daily_vaccinations_per_million,share_doses_used,total_boosters,total_boosters_per_hundred
0,2021-01-12,Alabama,78134.0,377025.0,70861.0,0.15,1.59,7270.0,1.45,7.69,,,,0.207,,
1,2021-01-13,Alabama,84040.0,378975.0,74792.0,0.19,1.71,9245.0,1.53,7.73,5906.0,5906.0,1205.0,0.222,,
2,2021-01-14,Alabama,92300.0,435350.0,80480.0,,1.88,,1.64,8.88,8260.0,7083.0,1445.0,0.212,,
3,2021-01-15,Alabama,100567.0,444650.0,86956.0,0.28,2.05,13488.0,1.77,9.07,8267.0,7478.0,1525.0,0.226,,
4,2021-01-16,Alabama,,,,,,,,,,7498.0,1529.0,,,


In [196]:
df2.head()

Unnamed: 0,submission_date,state,tot_cases,conf_cases,prob_cases,new_case,pnew_case,tot_death,conf_death,prob_death,new_death,pnew_death,created_at,consent_cases,consent_deaths
0,12/01/2021,ND,163565,135705.0,27860.0,589,220.0,1907,,,9,0.0,12/02/2021 02:35:20 PM,Agree,Not agree
1,08/17/2020,MD,100715,,,503,0.0,3765,3616.0,149.0,3,0.0,08/19/2020 12:00:00 AM,,Agree
2,03/28/2022,VT,107785,,,467,35.0,585,,,0,0.0,03/29/2022 01:30:11 PM,Not agree,Not agree
3,03/18/2020,ME,44,44.0,0.0,12,0.0,0,0.0,0.0,0,0.0,03/20/2020 12:00:00 AM,Agree,Agree
4,08/29/2021,WA,556639,,,2991,479.0,6525,,,9,0.0,08/31/2021 12:00:00 AM,,


In [197]:
df1.tail()

Unnamed: 0,date,location,total_vaccinations,total_distributed,people_vaccinated,people_fully_vaccinated_per_hundred,total_vaccinations_per_hundred,people_fully_vaccinated,people_vaccinated_per_hundred,distributed_per_hundred,daily_vaccinations_raw,daily_vaccinations,daily_vaccinations_per_million,share_doses_used,total_boosters,total_boosters_per_hundred
29151,2022-04-03,Wyoming,740229.0,949085.0,338155.0,51.1,127.9,295755.0,58.43,163.99,54.0,325.0,562.0,0.78,126103.0,21.79
29152,2022-04-04,Wyoming,740234.0,949085.0,338162.0,51.1,127.9,295760.0,58.43,163.99,5.0,324.0,560.0,0.78,126105.0,21.79
29153,2022-04-05,Wyoming,741946.0,949785.0,338316.0,51.2,128.2,296330.0,58.46,164.11,1712.0,523.0,904.0,0.781,126347.0,21.83
29154,2022-04-06,Wyoming,742407.0,950585.0,338372.0,51.21,128.28,296382.0,58.47,164.25,461.0,507.0,876.0,0.781,126422.0,21.84
29155,2022-04-07,Wyoming,743110.0,952485.0,338512.0,51.23,128.4,296503.0,58.49,164.57,703.0,565.0,976.0,0.78,126581.0,21.87


In [198]:
df2.tail()

Unnamed: 0,submission_date,state,tot_cases,conf_cases,prob_cases,new_case,pnew_case,tot_death,conf_death,prob_death,new_death,pnew_death,created_at,consent_cases,consent_deaths
48355,12/15/2020,DC,25339,,,301,0.0,720,,,4,0.0,12/16/2020 02:28:57 PM,,
48356,01/12/2022,WY,123743,97745.0,25998.0,989,246.0,1588,1588.0,0.0,0,0.0,01/13/2022 02:34:51 PM,Agree,Agree
48357,07/11/2020,RI,16456,,,52,0.0,978,,,3,0.0,03/25/2022 01:13:04 PM,Not agree,
48358,11/22/2021,AZ,1245127,1127692.0,117435.0,3249,403.0,21942,19414.0,2528.0,2,0.0,11/23/2021 02:18:53 PM,Agree,Agree
48359,04/16/2020,WY,401,296.0,105.0,8,0.0,2,2.0,0.0,0,0.0,04/16/2020 04:22:39 PM,Agree,Agree


In [199]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29156 entries, 0 to 29155
Data columns (total 16 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   date                                 29156 non-null  object 
 1   location                             29156 non-null  object 
 2   total_vaccinations                   26404 non-null  float64
 3   total_distributed                    26144 non-null  float64
 4   people_vaccinated                    26140 non-null  float64
 5   people_fully_vaccinated_per_hundred  24782 non-null  float64
 6   total_vaccinations_per_hundred       24881 non-null  float64
 7   people_fully_vaccinated              26060 non-null  float64
 8   people_vaccinated_per_hundred        24857 non-null  float64
 9   distributed_per_hundred              24861 non-null  float64
 10  daily_vaccinations_raw               25378 non-null  float64
 11  daily_vaccinations          

In [200]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48360 entries, 0 to 48359
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   submission_date  48360 non-null  object 
 1   state            48360 non-null  object 
 2   tot_cases        48360 non-null  int64  
 3   conf_cases       26643 non-null  float64
 4   prob_cases       26571 non-null  float64
 5   new_case         48360 non-null  int64  
 6   pnew_case        44658 non-null  float64
 7   tot_death        48360 non-null  int64  
 8   conf_death       26272 non-null  float64
 9   prob_death       26272 non-null  float64
 10  new_death        48360 non-null  int64  
 11  pnew_death       44690 non-null  float64
 12  created_at       48360 non-null  object 
 13  consent_cases    40295 non-null  object 
 14  consent_deaths   41106 non-null  object 
dtypes: float64(6), int64(4), object(5)
memory usage: 5.5+ MB


In [201]:
df1.describe()

Unnamed: 0,total_vaccinations,total_distributed,people_vaccinated,people_fully_vaccinated_per_hundred,total_vaccinations_per_hundred,people_fully_vaccinated,people_vaccinated_per_hundred,distributed_per_hundred,daily_vaccinations_raw,daily_vaccinations,daily_vaccinations_per_million,share_doses_used,total_boosters,total_boosters_per_hundred
count,26404.0,26144.0,26140.0,24782.0,24881.0,26060.0,24857.0,24861.0,25378.0,29091.0,27438.0,26144.0,9238.0,8788.0
mean,10415670.0,12860610.0,5476839.0,42.865106,99.337657,4486172.0,51.700431,122.831304,40459.71,39216.53,3653.621838,0.792863,2027009.0,19.916246
std,45396690.0,55950920.0,23290170.0,21.569616,50.880264,19557680.0,22.735779,59.643171,193288.3,179525.8,2738.608534,0.117025,8777949.0,10.970407
min,416.0,6000.0,401.0,0.0,0.17,1.0,0.24,6.14,0.0,0.0,0.0,0.043,8.0,0.01
25%,798025.8,1025415.0,462565.8,28.73,64.49,303101.5,37.59,84.9,1326.25,2363.0,1738.0,0.755,164649.0,10.85
50%,2796562.0,3450545.0,1543121.0,47.355,102.08,1159176.0,55.21,124.6,7859.0,9327.0,2890.5,0.804,562110.5,20.165
75%,7156782.0,8510450.0,3631310.0,58.52,135.6,3050392.0,67.35,167.51,24654.25,24412.0,4870.0,0.856,1448513.0,27.59
max,563999100.0,708450300.0,255975700.0,87.76,222.59,218135600.0,100.6,280.46,4629928.0,3384387.0,27652.0,1.138,98424740.0,52.45


In [202]:
df2.describe()

Unnamed: 0,tot_cases,conf_cases,prob_cases,new_case,pnew_case,tot_death,conf_death,prob_death,new_death,pnew_death
count,48360.0,26643.0,26571.0,48360.0,44658.0,48360.0,26272.0,26272.0,48360.0,44690.0
mean,457527.1,471741.5,68681.618456,1646.044913,263.225133,7487.334615,7727.31132,856.40724,19.952895,2.043656
std,839239.5,776437.1,107786.375493,5296.282952,1559.424472,12194.798981,9693.196417,1279.032256,47.268477,27.430908
min,0.0,0.0,0.0,-10199.0,-171804.0,0.0,0.0,0.0,-352.0,-2594.0
25%,8704.75,45062.0,46.0,27.0,0.0,184.0,889.0,0.0,0.0,0.0
50%,141077.0,208489.0,14096.0,380.0,4.0,2330.0,4008.0,236.0,4.0,0.0
75%,566282.2,639879.0,100822.5,1424.0,171.0,9452.0,11150.5,1212.0,20.0,1.0
max,9105181.0,8503930.0,602704.0,319809.0,171617.0,88355.0,71408.0,6725.0,1178.0,2919.0


In [203]:
df1.location.unique()

array(['Alabama', 'Alaska', 'American Samoa', 'Arizona', 'Arkansas',
       'Bureau of Prisons', 'California', 'Colorado', 'Connecticut',
       'Delaware', 'Dept of Defense', 'District of Columbia',
       'Federated States of Micronesia', 'Florida', 'Georgia', 'Guam',
       'Hawaii', 'Idaho', 'Illinois', 'Indian Health Svc', 'Indiana',
       'Iowa', 'Kansas', 'Kentucky', 'Long Term Care', 'Louisiana',
       'Maine', 'Marshall Islands', 'Maryland', 'Massachusetts',
       'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana',
       'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 'New Mexico',
       'New York State', 'North Carolina', 'North Dakota',
       'Northern Mariana Islands', 'Ohio', 'Oklahoma', 'Oregon',
       'Pennsylvania', 'Puerto Rico', 'Republic of Palau', 'Rhode Island',
       'South Carolina', 'South Dakota', 'Tennessee', 'Texas',
       'United States', 'Utah', 'Vermont', 'Veterans Health',
       'Virgin Islands', 'Virginia', 'Washington', 'West V

In [204]:
df2.state.unique()

array(['ND', 'MD', 'VT', 'ME', 'WA', 'DE', 'WI', 'GU', 'MS', 'AL', 'NV',
       'MI', 'CT', 'NC', 'IN', 'NE', 'MO', 'NH', 'ID', 'IL', 'MT', 'CA',
       'VI', 'OR', 'FSM', 'NJ', 'DC', 'MN', 'AZ', 'RI', 'LA', 'KY', 'SC',
       'VA', 'WY', 'KS', 'FL', 'CO', 'WV', 'AR', 'MP', 'AS', 'HI', 'AK',
       'GA', 'OK', 'PW', 'TX', 'PR', 'UT', 'MA', 'NYC', 'RMI', 'NY', 'SD',
       'OH', 'PA', 'NM', 'TN', 'IA'], dtype=object)
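
The two outputs above show that `df1` identifies states by full name (for example `'New York State'`) while `df2` uses abbreviations and lists `'NYC'` separately from `'NY'`. Joining the two data sets therefore needs a shared key. The sketch below is one possible approach, not necessarily the one this notebook goes on to use. The `name_to_abbr` mapping is a hypothetical, partial helper, the numbers marked as placeholders are invented, and folding `NYC` into `NY` rests on the assumption that the CDC file reports New York City separately from the rest of the state.

```python
import pandas as pd

# Partial, illustrative name-to-abbreviation mapping (hypothetical helper;
# a full analysis would need an entry for every state).
name_to_abbr = {"New York State": "NY", "Wyoming": "WY"}

# Wyoming's value is from df1.tail(); the other two numbers are placeholders.
vax = pd.DataFrame(
    {
        "location": ["Wyoming", "New York State", "United States"],
        "people_fully_vaccinated_per_hundred": [51.23, 70.0, 60.0],
    }
)

# Wyoming's total is from df2.tail(); the NY and NYC numbers are placeholders.
deaths = pd.DataFrame({"state": ["WY", "NY", "NYC"], "tot_death": [1588, 10, 20]})

# Assumption: NYC is reported separately from the rest of New York,
# so its deaths are added to NY to get a whole-state total.
deaths["state"] = deaths["state"].replace({"NYC": "NY"})
deaths = deaths.groupby("state", as_index=False)["tot_death"].sum()

# Locations without an abbreviation (aggregates such as "United States")
# map to NaN and are dropped before the merge.
vax["state"] = vax["location"].map(name_to_abbr)
vax = vax.dropna(subset=["state"])

merged = vax.merge(deaths, on="state", how="inner")
print(merged)
```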

The largest values in `df1` belong to the aggregate `United States` rows: as of 2022-04-07, 563,999,093 total doses had been given and 255,975,678 people had received at least one dose (`people_vaccinated`).

In [205]:
df1[df1["total_vaccinations"] == df1["total_vaccinations"].max()][["date", "location","total_vaccinations"]]

Unnamed: 0,date,location,total_vaccinations
25096,2022-04-07,United States,563999093.0


In [206]:
df1[df1["people_vaccinated"] == df1["people_vaccinated"].max()][["date", "location","people_vaccinated"]]

Unnamed: 0,date,location,people_vaccinated
25096,2022-04-07,United States,255975678.0
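
Both maxima above come from the `United States` row, which is a national aggregate, not a state. To find the leading state instead, the non-state locations can be filtered out first. The sketch below assumes a hand-written `non_states` list based on the `df1.location.unique()` output; the list is illustrative and should be reviewed before use. The three sample rows are copied from the outputs shown earlier.

```python
import pandas as pd

# Sample rows copied from the df1 outputs above; in the notebook this would be df1.
sample = pd.DataFrame(
    {
        "date": ["2021-01-15", "2022-04-07", "2022-04-07"],
        "location": ["Alabama", "Wyoming", "United States"],
        "total_vaccinations": [100567.0, 743110.0, 563999093.0],
    }
)

# Assumed list of aggregate or federal entries that are not states
# (illustrative, not exhaustive).
non_states = [
    "United States",
    "Bureau of Prisons",
    "Dept of Defense",
    "Indian Health Svc",
    "Long Term Care",
    "Veterans Health",
]

# Keep only rows whose location is not in the exclusion list.
states_only = sample[~sample["location"].isin(non_states)]

# idxmax returns the index label of the row with the largest value.
top = states_only.loc[states_only["total_vaccinations"].idxmax()]
print(top["location"], top["total_vaccinations"])
```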


California has the highest cumulative death total of any jurisdiction in `df2`, with 88,355 deaths reported on both 04/05/2022 and 04/06/2022.

In [211]:
df2[df2["tot_death"] == df2["tot_death"].max()][["submission_date", "state","tot_death"]]

Unnamed: 0,submission_date,state,tot_death
281,04/06/2022,CA,88355
35372,04/05/2022,CA,88355


In [207]:
fig = px.histogram(
    data_frame=df1,
    x="date",
    y="total_vaccinations",
    labels={"date": "Date", "total_vaccinations": "Total Vaccinations"},
    title="Total Vaccination Doses Over Time",
)
fig.show()

0        12/01/2021
1        08/17/2020
2        03/28/2022
3        03/18/2020
4        08/29/2021
            ...    
48355    12/15/2020
48356    01/12/2022
48357    07/11/2020
48358    11/22/2021
48359    04/16/2020
Name: submission_date, Length: 48360, dtype: object
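
The output above shows that `submission_date` is stored as strings (`dtype: object`) and that the rows are not in chronological order. Before any time-based comparison, the column could be parsed into real dates and sorted. The sketch below assumes the `MM/DD/YYYY` format seen in the output and uses three rows copied from `df2.head()`.

```python
import pandas as pd

# Rows copied from the df2.head() output; in the notebook this would be df2.
sample = pd.DataFrame(
    {
        "submission_date": ["12/01/2021", "08/17/2020", "03/28/2022"],
        "state": ["ND", "MD", "VT"],
        "tot_death": [1907, 3765, 585],
    }
)

# Parse MM/DD/YYYY strings into datetime64 values.
sample["submission_date"] = pd.to_datetime(sample["submission_date"], format="%m/%d/%Y")

# Sort chronologically now that the column holds real dates.
sample = sample.sort_values("submission_date").reset_index(drop=True)
print(sample)
```

The same conversion applied to the `date` column of `df1` would give both data sets a comparable date type.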