## Project Goals:

The goal of this project is to use data science techniques learned in class to examine datasets about COVID vaccination in the US. So far, we have gathered our data, created a project plan and a collaboration plan, and tidied our data to prepare it for analysis.

## Links:
https://github.com/hduece/finaltutorial

https://hduece.github.io/finaltutorial/

## Project Plan: ##

As partners (Hailey Dusablon and Jordan Stein), we are interested in exploring socioeconomic factors that may be correlated with health issues. Since we are both interested in careers in healthcare, analyzing this type of data is a great opportunity to gain insight into how lifestyle factors may affect the health of populations. This research matters because lifestyle choices can be changed, so it is useful to know how our everyday actions can affect our health.

The first dataset we plan to analyze is the COVID Data Tracker published by the CDC (https://covid.cdc.gov/covid-data-tracker/#datatracker-home). It contains data on COVID cases, deaths, and trends at the national, state, and county levels in the United States, including the state-by-state distribution of people who have received the COVID vaccine. One question we would like to explore with this data is how socioeconomic factors, such as poverty and the distribution of minority groups, relate to the spread of COVID and to vaccination rates in different states and counties. We would also like to see whether regions with lower COVID vaccination rates also have lower rates of other common vaccines; since there is significant stigma surrounding the COVID vaccine, it would be interesting to compare its uptake with that of other vaccines. Lastly, we could compare COVID trends in the United States with trends in other countries by researching how other health officials handled the spread of the virus. This comparison may also help with the socioeconomic analysis, since these factors likely differ from country to country.

We are pursuing this research because COVID has been a highly relevant problem for the past year and a half, and the COVID vaccine has been very controversial. We have not seen much COVID research that studies social factors such as religion, access to healthcare resources, and income.

It would be interesting to single out a few of these factors and see whether any of them are related to COVID trends.

Another dataset we could examine is the heart disease dataset at https://www.kaggle.com/redwankarimsony/heart-disease-data, which covers several predictors of heart disease, such as age, gender, resting blood pressure, and cholesterol. As with the COVID Data Tracker, we would like to look at social factors such as occupation, income, and diet that may be correlated with heart disease. Many sources describe how lifestyle factors such as a healthy diet and exercise can reduce the risk of heart disease, so it would be interesting to analyze these factors first-hand and measure how strongly they correlate with it. We could also analyze other datasets to see whether individuals with heart disease are more prone to developing other diseases or disorders. After exploring multiple datasets, we noticed that much of the available data on medical problems covers only medical variables and does not address the social factors that may be correlated with these problems. Another component of this research could be comparing the lifestyle choices of the regions or countries with the lowest rates of heart disease with those of the areas with the highest rates. Because heart disease is the leading cause of death in the United States, it is a very prevalent medical issue, and research like this can help educate people about how their everyday decisions affect their health.
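For the first COVID question in our plan (whether a socioeconomic factor such as poverty is related to state vaccination rates), one simple starting point is a correlation across states. The sketch below is only an illustration under assumptions: `factor_df` stands for a hypothetical state-level table from some other source with a `State` column, and `correlate_with_vaccination` is a hypothetical helper, not code we have run. A correlation on its own would also not show that a factor *affects* vaccination rates.

```python
import pandas as pd

# Column name taken from the CDC vaccination table loaded in this notebook
RATE_COL = "Percent of Total Pop Fully Vaccinated by State of Residence"


def correlate_with_vaccination(
    vaccination: pd.DataFrame,
    factor_df: pd.DataFrame,
    factor_col: str,
    rate_col: str = RATE_COL,
) -> float:
    """Pearson correlation between a state-level factor and a vaccination rate.

    `vaccination` is indexed by "State", like the notebook's DataFrame.
    `factor_df` is a hypothetical table with a "State" column and a numeric
    column named `factor_col` (for example, a poverty rate from another source).
    """
    # Align the two sources on state name; states missing from either side are dropped
    merged = vaccination[[rate_col]].join(
        factor_df.set_index("State")[[factor_col]], how="inner"
    )
    # Compare states only, not the national total row
    merged = merged.drop(index="United States", errors="ignore")
    return merged[rate_col].corr(merged[factor_col])
```

With the notebook's `df`, a call would look like `correlate_with_vaccination(df, poverty_df, "poverty_rate")`, where `poverty_df` and `"poverty_rate"` are placeholders for whichever socioeconomic dataset we end up using. Passing `method="spearman"` to `Series.corr` would be an alternative if the relationship looks monotonic but not linear.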

## Collaboration Plan: ##

To begin the project, we set up a GitHub repository and made sure we were both listed as collaborators. Since the website was created with Hailey’s account, Jordan cloned the repository onto her computer. Once the website was established, we met on Zoom to discuss the datasets we were interested in exploring and how we might analyze them. Going forward, we plan to meet twice a week, on Tuesdays and Thursdays, for two hours each day via Zoom or in person, and we will coordinate our code through a private GitHub repository.

## Challenges with Obtaining Data: ##

The biggest challenge in creating the DataFrame was deciding how to tidy and reformat the data. We decided it would be best to drop the columns we would not need for our research. We dropped all columns with information about the number of vaccines delivered to each state, since we are only interested in the number of vaccines that were administered. We also dropped all columns about other age groups (over 18 and under 12), but we kept the columns about the 65+ age group, because older adults are at a significantly higher risk of severe illness from COVID. Lastly, we dropped all columns about the booster dose, since that data was only available at the national level (because the booster was only introduced very recently).

Another challenge in starting this project was learning how to navigate GitHub. It took some trial and error to figure out how to organize our information on the site, but we are now much more comfortable using it.
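The column-dropping step described above could be expressed in pandas roughly as follows. This is a minimal sketch, not the code we ran: `drop_columns_containing` is a hypothetical helper, and the example uses two rows and a few columns copied from the CDC table in this notebook. Only the booster columns are dropped here; the other groups would be removed the same way with their own keywords.

```python
import re

import pandas as pd


def drop_columns_containing(df: pd.DataFrame, keywords: list[str]) -> pd.DataFrame:
    """Return df without any column whose name contains one of the keywords.

    Matching is case-insensitive, and keywords are escaped so characters such
    as "+" in "65+" are matched literally. Hypothetical helper for illustration.
    """
    pattern = "|".join(re.escape(k) for k in keywords)
    # Boolean array: True for every column name that matches a keyword
    mask = df.columns.str.contains(pattern, case=False, regex=True)
    return df.loc[:, ~mask]


# Two rows copied from the CDC table, restricted to a subset of its columns
df = pd.DataFrame(
    {
        "State": ["Alaska", "Alabama"],
        "Percent of Total Pop Fully Vaccinated by State of Residence": [52.2, 44.4],
        "Percent of 65+ Pop Fully Vaccinated by State of Residence": [81.9, 76.2],
        "People who have received a booster dose": [39974, 147525],
        "Percent of fully vaccinated people with booster doses": [10.5, 6.8],
    }
).set_index("State")

# Booster data is dropped; the 65+ columns are kept on purpose
tidy = drop_columns_containing(df, ["booster"])
print(list(tidy.columns))
```

To drop the delivered-dose columns as well, their keyword would be added to the list, e.g. `["booster", "delivered"]`; `"delivered"` is an assumption about how the raw CDC export names those columns.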

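```python
import pandas as pd

# Load the CDC state-level vaccination data
df = pd.read_csv("../data/vaccine.csv")

# Show every row and column when the DataFrame is displayed
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Use the state name as the row index
df = df.set_index("State")
df
```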

| State | People with at least One Dose by State of Residence | Percent of Total Pop with at least One Dose by State of Residence | People Fully Vaccinated by State of Residence | Percent of Total Pop Fully Vaccinated by State of Residence | People Fully Vaccinated Moderna Resident | People Fully Vaccinated Pfizer Resident | People Fully Vaccinated Janssen Resident | People Fully Vaccinated Unknown 2-dose manufacturer Resident | People with 2 Doses by State of Residence | Percent of Total Pop with 1+ Doses by State of Residence | Percent of Total Pop with 2 Doses by State of Residence | People with 1+ Doses by State of Residence | People 65+ with at least One Dose by State of Residence | Percent of 65+ Pop with at least One Dose by State of Residence | People 65+ Fully Vaccinated by State of Residence | Percent of 65+ Pop Fully Vaccinated by State of Residence | People 65+ Fully Vaccinated_Moderna_Resident | People 65+ Fully Vaccinated_Pfizer_Resident | People 65+ Fully Vaccinated_Janssen_Resident | People 65+ Fully Vaccinated_Unknown 2-dose Manuf_Resident | People who have received a booster dose | Percent of fully vaccinated people with booster doses | People 65+ who have received a booster dose | Percent of fully vaccinated people 65+ with booster doses |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| United States | 220145796 | 66.3 | 190402262 | 57.4 | 69820278 | 105293042 | 15172407 | 116535 | 175343725 | 66.3 | 52.8 | 220145796 | 52627847 | 96.2 | 46294318 | 84.6 | 26181713 | 27892038 | 3263131 | 50846 | 12472219 | 6.6 | 7759894 | 16.8 |
| Alaska | 431007 | 58.9 | 382206 | 52.2 | 144385 | 204181 | 33568 | 72 | 348909 | 58.9 | 47.7 | 431007 | 80610 | 88.0 | 75011 | 81.9 | 41516 | 30456 | 3220 | 29 | 39974 | 10.5 | 16922 | 22.6 |
| Alabama | 2648588 | 54.0 | 2174923 | 44.4 | 926095 | 1104989 | 143400 | 439 | 2033012 | 54.0 | 41.5 | 2648588 | 746465 | 87.8 | 647865 | 76.2 | 342771 | 271794 | 34800 | 289 | 147525 | 6.8 | 93543 | 14.4 |
| Arkansas | 1735828 | 57.5 | 1435635 | 47.6 | 606329 | 728308 | 100252 | 746 | 1336123 | 57.5 | 44.3 | 1735828 | 458879 | 87.6 | 393094 | 75.0 | 223847 | 161051 | 15339 | 420 | 102338 | 7.1 | 59698 | 15.2 |
| Arizona | 4426640 | 60.8 | 3833436 | 52.7 | 1478901 | 2061598 | 288334 | 4603 | 3547652 | 60.8 | 48.7 | 4426640 | 1182952 | 90.4 | 1040007 | 79.5 | 504136 | 489955 | 46876 | 2470 | 253662 | 6.6 | 164631 | 15.8 |
| California | 29182133 | 73.9 | 23996305 | 60.7 | 8678485 | 13415409 | 1898259 | 4152 | 22118410 | 73.9 | 56.0 | 29182133 | 5968420 | 99.9 | 4797587 | 82.2 | 2405947 | 2217387 | 190697 | 1294 | 1475028 | 6.1 | 842775 | 17.6 |
| Colorado | 3864323 | 67.1 | 3518233 | 61.1 | 1325346 | 1917607 | 273690 | 1590 | 3247787 | 67.1 | 56.4 | 3864323 | 782531 | 92.9 | 720852 | 85.6 | 346084 | 352644 | 25252 | 568 | 295161 | 8.4 | 180142 | 25.0 |
| Connecticut | 2789281 | 78.2 | 2506712 | 70.3 | 889590 | 1407814 | 208895 | 413 | 2298914 | 78.2 | 64.5 | 2789281 | 650209 | 99.9 | 586525 | 93.1 | 231025 | 335700 | 20877 | 88 | 171600 | 6.8 | 129361 | 22.1 |
| District of Columbia | 518165 | 73.4 | 437130 | 61.9 | 157706 | 243836 | 35310 | 278 | 402381 | 73.4 | 57.0 | 518165 | 84526 | 96.8 | 74590 | 85.4 | 40067 | 32450 | 2630 | 46 | 22452 | 5.1 | 11780 | 15.8 |
| Delaware | 666545 | 68.5 | 579053 | 59.5 | 210500 | 316841 | 51218 | 494 | 528132 | 68.5 | 54.2 | 666545 | 189563 | 99.9 | 168052 | 89.0 | 68600 | 90051 | 9358 | 270 | 41092 | 7.1 | 29783 | 17.7 |
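While tidying, it may be worth checking this table for redundancy. In every row shown above, "People with 1+ Doses by State of Residence" equals "People with at least One Dose by State of Residence", and "Percent of Total Pop with 1+ Doses by State of Residence" equals "Percent of Total Pop with at least One Dose by State of Residence". In the same rows, the four manufacturer columns (Moderna, Pfizer, Janssen, unknown 2-dose) add up to "People Fully Vaccinated by State of Residence". The sketch below automates both checks on three rows copied from the table; on the real data, the same code would be run on the full `df` from the cell above. `duplicate_columns` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Three rows copied from the table above, restricted to a subset of its columns
df = pd.DataFrame(
    {
        "State": ["United States", "Alaska", "Alabama"],
        "People with at least One Dose by State of Residence": [220145796, 431007, 2648588],
        "Percent of Total Pop with at least One Dose by State of Residence": [66.3, 58.9, 54.0],
        "People Fully Vaccinated by State of Residence": [190402262, 382206, 2174923],
        "People Fully Vaccinated Moderna Resident": [69820278, 144385, 926095],
        "People Fully Vaccinated Pfizer Resident": [105293042, 204181, 1104989],
        "People Fully Vaccinated Janssen Resident": [15172407, 33568, 143400],
        "People Fully Vaccinated Unknown 2-dose manufacturer Resident": [116535, 72, 439],
        "Percent of Total Pop with 1+ Doses by State of Residence": [66.3, 58.9, 54.0],
        "People with 1+ Doses by State of Residence": [220145796, 431007, 2648588],
    }
).set_index("State")


def duplicate_columns(frame: pd.DataFrame) -> list[tuple[str, str]]:
    """Return (first, later) pairs of columns that hold identical values."""
    cols = list(frame.columns)
    pairs = []
    for i, first in enumerate(cols):
        for later in cols[i + 1:]:
            if frame[first].equals(frame[later]):
                pairs.append((first, later))
    return pairs


# Check 1: find redundant columns and keep only the first of each pair
dupes = duplicate_columns(df)
tidy = df.drop(columns=[later for _, later in dupes])

# Check 2: the manufacturer breakdown should add up to the fully vaccinated total
manufacturer_cols = [
    "People Fully Vaccinated Moderna Resident",
    "People Fully Vaccinated Pfizer Resident",
    "People Fully Vaccinated Janssen Resident",
    "People Fully Vaccinated Unknown 2-dose manufacturer Resident",
]
totals_match = (
    df[manufacturer_cols].sum(axis=1)
    == df["People Fully Vaccinated by State of Residence"]
)

print(dupes)
print(totals_match.all())
```

Because these checks only cover the rows copied here, they should be rerun on the full dataset before any column is dropped.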
