# COVID Dataset Description:
Provided by the CDC, this dataset tracks trends in behavioral indicators regarding COVID vaccines (Definitely Will Get Vaccinated, Probably Will Get Vaccinated or Are Unsure, Probably or Definitely Will Not Get Vaccinated, Vaccinated) across different demographics (e.g., region, age, sex, race) in a large sample.

This dataset was found on the [data.gov website](https://catalog.data.gov/dataset?res_format=CSV) and can be downloaded using the download button there or the link below.  
***Direct link to download the dataset***: https://data.cdc.gov/api/views/qz99-wyhv/rows.csv?accessType=DOWNLOAD  

I will aim to address the following questions:
- Which age group has the highest rate of vaccination?
- What is the difference in vaccination status between those below poverty and above poverty?
- Does vaccination status differ by metropolitan statistical area?
- How does insurance affect vaccination status and willingness to get vaccinated?

 


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the downloaded CDC CSV (saved locally under this file name)
df = pd.read_csv('COVID_VaccTrends.csv')


In [None]:
df.describe()

In [None]:
df

In [None]:
df.columns

In [None]:
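# Mean of every numeric column for each vaccination-status indicator.
# numeric_only=True skips text columns, which would otherwise raise an error in recent pandas versions.
df.groupby("Indicator Category").mean(numeric_only=True)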

In [None]:
plt.figure(figsize=(20, 5), dpi=80)
plt.scatter(df['Indicator Category'], df['Estimate (%)'])
plt.xlabel('Indicator Category')
plt.ylabel('Estimate (%)')
plt.title("Percent Estimates of Vaccination Status Across All Categories")

In [None]:
# Looks at Estimate (%) of Indicator Category (vaccination status) by Group Category (specific demographics)
column = 'Estimate (%)'
df2 = pd.pivot_table(df, values=column, index=["Indicator Category"], columns=["Group Category"])
df2
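One detail of the pivot above is worth illustrating: `pd.pivot_table` aggregates with the mean by default, so if the dataset holds more than one row per indicator/group pair (for example one per survey period, which is an assumption here), each cell of `df2` is an average over those rows. The sketch below uses a toy frame with made-up values, not CDC figures; only the column names are taken from this notebook.

```python
import pandas as pd

# Toy rows with made-up values (NOT real CDC estimates); the column names
# mirror the ones used in this notebook.
toy = pd.DataFrame({
    "Indicator Category": ["Vaccinated"] * 4,
    "Group Category": ["Insured", "Insured", "Not insured", "Not insured"],
    "Estimate (%)": [40.0, 60.0, 20.0, 30.0],
})

# aggfunc defaults to "mean", so the two rows per pair are averaged.
toy_pivot = pd.pivot_table(
    toy,
    values="Estimate (%)",
    index="Indicator Category",
    columns="Group Category",
)
print(toy_pivot)
```

If a single time point is of interest instead of an average, one option is to filter `df` to that period before pivoting.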

### Which age group has the highest rate of vaccination?

In [None]:
df2[['18 - 29 years', '50 - 64 years', '65+ years']].plot(figsize=(20,10))
plt.grid()
plt.ylabel("Estimate %")
plt.title("Estimate % of each indicator by age group",fontsize=40)
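The plot shows the pattern visually; to read off the answer directly, one approach is to take the `Vaccinated` row of the pivot table and ask which column holds the maximum. The sketch below runs on a hypothetical stand-in for `df2` (`toy_df2`) filled with invented numbers, so its output says nothing about the real data; it assumes the pivot's index contains a row labelled `Vaccinated`.

```python
import pandas as pd

# Hypothetical stand-in for df2 with invented numbers (NOT real CDC estimates).
toy_df2 = pd.DataFrame(
    {
        "18 - 29 years": [30.0, 25.0],
        "50 - 64 years": [55.0, 15.0],
        "65+ years": [80.0, 8.0],
    },
    index=["Vaccinated", "Probably or Definitely Will Not Get Vaccinated"],
)
toy_df2.index.name = "Indicator Category"

# Select the "Vaccinated" row, then find the column (age group) with the largest value.
vaccinated = toy_df2.loc["Vaccinated"]
top_group = vaccinated.idxmax()
print(top_group, vaccinated.max())
```

On the real data, the same two lines applied to `df2[['18 - 29 years', '50 - 64 years', '65+ years']]` would name the age group with the highest average estimate.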

### What is the difference in vaccination status between those below poverty and above poverty?

In [None]:
df2[['Below poverty', 'Above poverty, income <$75k']].plot(figsize=(30,20))
plt.grid()
plt.ylabel("Estimate %")
plt.title("Estimate % of each indicator by income status",fontsize=40)
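To put a number on the difference rather than judging it from the plot, the two income columns can be subtracted row by row, giving a gap in percentage points for each indicator. This is a minimal sketch on a hypothetical stand-in for `df2` (`toy_df2`) with invented values, not CDC figures.

```python
import pandas as pd

# Hypothetical stand-in for df2 with invented numbers (NOT real CDC estimates).
toy_df2 = pd.DataFrame(
    {
        "Below poverty": [35.0, 20.0],
        "Above poverty, income <$75k": [45.0, 15.0],
    },
    index=["Vaccinated", "Probably or Definitely Will Not Get Vaccinated"],
)

# Percentage-point gap per indicator: positive means the above-poverty group is higher.
gap = toy_df2["Above poverty, income <$75k"] - toy_df2["Below poverty"]
print(gap)
```

A gap computed this way is descriptive only; it does not account for the uncertainty in the survey estimates.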

### Does vaccination status differ by metropolitan statistical area?

In [None]:
df2[['Rural', 'Suburban',"Urban"]].plot(figsize=(20,20))
plt.grid()
plt.ylabel("Estimate %")
plt.title("Estimate % of each indicator by metropolitan statistical area",fontsize=40)

### How does insurance affect vaccination status and willingness to get vaccinated?

In [None]:
df2[['Insured', 'Not insured']].plot(figsize=(30,20))
plt.grid()
plt.ylabel("Estimate %")
plt.title("Estimate % of each indicator by insurance status",fontsize=40)

## Discussion:

As the data and graphs show, many people in the U.S. have been vaccinated. However, the shares of people who want to get vaccinated, who do not want to get vaccinated, and who are already vaccinated differ by age, income, health insurance status, and where they live.

Correlation does not imply causation, but this analysis suggests that there should be better education and access to resources regardless of socioeconomic status, and in particular regardless of whether someone has health insurance.