# Covid Data Analysis

### Description:
COVID-19 is an ongoing worldwide pandemic that has affected a great many people, likely including you. The goal of the class project is to develop an analytical framework for studying data from the United States in order to understand patterns in the effect and spread of COVID-19.

To achieve that, the project is separated into four stages:

- Stage I - Data and Project Understanding
- Stage II - Data Modeling and Hypothesis Testing
- Stage III - Basic Machine Learning
- Stage IV - Dashboard

*PS: Each stage carries equal weight in points toward the final project.*


## Project Stage - I (Data and Project Understanding)

### COVID-19 Dataset:

We will use data from usafacts.org. The dataset is a daily county-level tracker of COVID-19 cases, which makes it easy to follow cases at a granular level and, combined with the population data, to break down infections per 100,000 people. The underlying data is available for download below the US county map and has helped government agencies such as the Centers for Disease Control and Prevention in their nationwide efforts.

https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/

**Number of Cases**
https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_confirmed_usafacts.csv

**Number of Deaths**
https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_deaths_usafacts.csv

**Population by County**
https://usafactsstatic.blob.core.windows.net/public/data/covid-19/covid_county_population_usafacts.csv
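
The per-100,000 breakdown mentioned above can be sketched as follows. This is a minimal illustration on toy data, not the required solution. The column names (`countyFIPS`, `State`, `population`, and one column per date) are assumptions about the CSV layout and should be checked against the real files, which can be loaded by passing the URLs above to `pd.read_csv`.

```python
import pandas as pd

# Toy stand-ins for the cases and population files.
# Column names are assumptions; verify them against the downloaded CSVs.
cases = pd.DataFrame({
    "countyFIPS": [1001, 1003],
    "State": ["AL", "AL"],
    "2022-01-01": [100, 400],
    "2022-01-02": [110, 440],
})
population = pd.DataFrame({
    "countyFIPS": [1001, 1003],
    "population": [50000, 200000],
})


def cases_per_100k(cases, population, date_col):
    """Join cases to population on the county key and normalize one date column."""
    merged = cases[["countyFIPS", date_col]].merge(
        population[["countyFIPS", "population"]], on="countyFIPS", how="inner"
    )
    merged["per_100k"] = merged[date_col] / merged["population"] * 100_000
    return merged


rates = cases_per_100k(cases, population, "2022-01-02")
print(rates)
```

Normalizing by population puts counties of very different sizes on a comparable scale, which is why the two toy counties end up with the same rate despite different raw counts.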


### Enrichment Datasets for COVID-19

**Census Demographic ACS**

https://data.census.gov/cedsci/table?q=dp&tid=ACSDP1Y2018.DP05

This dataset contains demographic information for each county in the United States and can be combined with the COVID-19 dataset. For example, population estimates broken down by age group can help explain the level of infection in a county.

How to:

![SegmentLocal](gif/ACS_download.gif "segment")


**ACS Social, Economic, and Housing**

https://data.census.gov/cedsci/table?q=dp&tid=ACSDP1Y2018.DP05

At the same link as above, the left tab of the page lists several additional datasets, such as Social Characteristics, Economic Characteristics, and Housing Characteristics. These social, economic, and housing determinants can provide insight into what type of population lives in a county.

![title](image/ACS_characteristics.png)

**Employment Dataset**

https://www.bls.gov/cew/downloadable-data-files.htm

The employment dataset provides the level of employment and the earning potential by county.

![SegmentLocal](gif/Employment_download.gif "segment")


**Presidential Election Results (Political leanings)**

https://www.kaggle.com/unanimad/us-election-2020

This dataset provides the 2020 election results by county. The dataset contains who was the winning candidate and by how much. 

![SegmentLocal](gif/Election.gif "segment")

**Hospital Beds Dataset**

https://protect-public.hhs.gov/pages/hospital-utilization

This dataset provides the number of hospital beds and ICU units available by county. The data is also reported over time, showing the decreasing capacity of beds due to COVID-19.

![SegmentLocal](gif/Hospital_download_2.gif "segment")

## Goals

The idea here is to get acquainted with the different datasets shown above. We will be using all of them in our analysis. 

### Tasks:

#### Task 1: (20 pts)
- Initialize a GitHub repository for your project.
    - Add a description (README.md) to your project. For guidance on setting one up, see: https://bulldogjob.com/news/449-how-to-write-a-good-readme-for-your-github-project

#### Task 2: (30 pts)
- The entire team examines the COVID-19 dataset and understands the types of variables present in each data file. (10 pts)
    - **Deliverable** 
        - Section in the report describing the COVID-19 dataset and its data types (a variable dictionary)
        - Preliminary intuitions from the data
- Each student member of the team takes on an enrichment dataset. They read the data descriptions and understand the variables present in the data. (20 pts)
    - **Deliverable** 
        - Section in the report describing the enrichment data and datatype - variable dictionary.
        - How can you merge the data with the primary COVID-19 dataset? Identify the individual variables that map between the datasets.
        - Describe how your enrichment data can help in the analysis of COVID-19 spread. Pose initial hypothesis questions. 
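
One way to explore the mapping between datasets is sketched below, assuming the shared key is a county FIPS code. The column names (`countyFIPS`, `fips`, `median_income`) and the toy values are hypothetical. A common pitfall is that one source stores the code as an integer and another as a zero-padded string, so the sketch normalizes both sides before merging.

```python
import pandas as pd

# Toy COVID-19 table keyed by an integer county FIPS code (assumed layout).
covid = pd.DataFrame({"countyFIPS": [1001, 1003, 6037], "cases": [10, 20, 30]})

# Toy enrichment table keyed by a zero-padded string FIPS code (hypothetical).
enrich = pd.DataFrame({
    "fips": ["01001", "06037", "99999"],
    "median_income": [55000, 68000, 40000],
})


def normalize_fips(series):
    """Convert a FIPS column to 5-character zero-padded strings."""
    return series.astype(str).str.zfill(5)


covid["fips"] = normalize_fips(covid["countyFIPS"])
enrich["fips"] = normalize_fips(enrich["fips"])

# A left merge keeps every COVID-19 row; indicator=True adds a `_merge`
# column that records which rows found a match in the enrichment table.
merged = covid.merge(enrich, on="fips", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "fips"].tolist()
print(merged)
print("Counties without enrichment data:", unmatched)
```

Listing the unmatched keys is a quick check on whether the chosen mapping variable actually lines up between the two sources.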

Upload the entire report to Canvas and to your GitHub repository.

#### Task 3: (50 pts)
- Team: (20 pts)
    - Create a team notebook to read in the COVID-19 data (cases, deaths, and population) using `pandas` and display the dataframe in a notebook.
    - Merge all three variables (cases, deaths, and population) to create a super COVID-19 dataframe. Export it to CSV format.
- Member: (30 pts)
    - Calculate COVID-19 data trends for the last week of the data. Are the cases increasing, decreasing, or stable? Each student chooses a state to analyze. 
    - Each student member creates notebooks to read the enrichment data and displays it in a notebook. 
    - Each student member performs initial merges with the COVID-19 data using the variables in the Enrichment data. 
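
The team merge and the last-week trend check can be sketched roughly as below. This is one possible approach on toy data, not the expected answer. It assumes the case and death files are wide (one column per date) with cumulative counts, that `countyFIPS` and `State` identify a county, and that a 5% change between weeks separates "stable" from a trend. All of these are assumptions to verify or adjust.

```python
import numpy as np
import pandas as pd

ID_COLS = ["countyFIPS", "State"]  # assumed identifier columns


def to_long(wide, value_name):
    """Reshape a wide table (one column per date) into one row per county and date."""
    long = wide.melt(id_vars=ID_COLS, var_name="date", value_name=value_name)
    long["date"] = pd.to_datetime(long["date"])
    return long


# Toy cumulative data: 15 days, two counties in one state.
dates = pd.date_range("2022-01-01", periods=15).strftime("%Y-%m-%d")
ids = pd.DataFrame({"countyFIPS": [1001, 1003], "State": ["AL", "AL"]})
cum_a = 100 + np.cumsum([0] + [10] * 7 + [20] * 7)
cum_b = 200 + np.cumsum([0] + [5] * 14)
cases_wide = pd.concat([ids, pd.DataFrame([cum_a, cum_b], columns=dates)], axis=1)
deaths_wide = pd.concat(
    [ids, pd.DataFrame([np.arange(15), np.zeros(15, dtype=int)], columns=dates)],
    axis=1,
)
population = pd.DataFrame({"countyFIPS": [1001, 1003], "population": [50000, 200000]})

# Combine cases, deaths, and population into one long table.
super_df = (
    to_long(cases_wide, "cases")
    .merge(to_long(deaths_wide, "deaths"), on=ID_COLS + ["date"])
    .merge(population, on="countyFIPS")
)
# super_df.to_csv("super_covid.csv", index=False)  # hypothetical output filename


def last_week_trend(df, state, tolerance=0.05):
    """Compare new cases in the final 7 days with the 7 days before them."""
    total = df.loc[df["State"] == state].groupby("date")["cases"].sum().sort_index()
    new_cases = total.diff().dropna()  # daily new cases from cumulative counts
    last_week = new_cases.iloc[-7:].sum()
    prior_week = new_cases.iloc[-14:-7].sum()
    if prior_week == 0:
        return "increasing" if last_week > 0 else "stable"
    change = (last_week - prior_week) / prior_week
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "stable"


print(last_week_trend(super_df, "AL"))
```

Comparing against the prior week is only one definition of a trend; a fitted slope over the daily new cases or a rolling average would be reasonable alternatives.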

**Deliverable**
Each member creates separate notebooks for the member tasks. Upload all notebooks to the GitHub repository.

## Deadline: 2022-02-20