# Project Delaware
Project Delaware involves extracting Covid data from https://healthdata.gov/ and transforming the data to provide the following metrics:

1. The total number of PCR tests performed in the United States as of a particular day (`total_pcr_date`).
2. The n-day (`window`) rolling average number of new cases per day for the last k (`rolling_averages_days`) days.
3. The top n (`top_states`) states with the highest test positivity rate (positive tests / tests performed) for tests performed in the last k (`positivity_rates_days`) days.


Parameters:
- `total_pcr_date`: (date) Date up to which total PCR tests are counted. Default = current date - 1 day. Format = `'YYYY-MM-DD'`.
- `window`: (int) Number of days in the rolling average window. Default = 7.
- `rolling_averages_days`: (int) Number of days for which the rolling average is reported. Default = 30.
- `positivity_rates_days`: (int) Number of most recent days of tests used to calculate positivity rates. Default = 30.
- `top_states`: (int) Number of states with the highest positivity rates to return. Default = 10.
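
The parameter defaults above can be kept in one place and checked before any data is fetched. The sketch below uses a hypothetical `WranglerParams` dataclass that is not part of `src/data_wrangler.py`; it shows how the `total_pcr_date` default (current date - 1 day, `YYYY-MM-DD`) can be computed at construction time and how a malformed date or a non-positive count can be rejected early.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


def _yesterday() -> str:
    # Default total_pcr_date: current date - 1 day, formatted as YYYY-MM-DD
    return (date.today() - timedelta(days=1)).isoformat()


@dataclass
class WranglerParams:
    """Hypothetical container mirroring the documented parameters and defaults."""

    total_pcr_date: str = field(default_factory=_yesterday)
    window: int = 7
    rolling_averages_days: int = 30
    positivity_rates_days: int = 30
    top_states: int = 10

    def __post_init__(self) -> None:
        # Raises ValueError for a string that is not an ISO date, e.g. '2022/05/22'
        date.fromisoformat(self.total_pcr_date)
        for name in ("window", "rolling_averages_days", "positivity_rates_days", "top_states"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be a positive integer")
```

Using `default_factory` means the date is computed each time an instance is created, rather than once when the module is imported.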

## Required Libraries
To install the required libraries, run `pip install -r src/requirements.txt` from the `project_del` directory.

## Running the Application
The driver to run the application can be found in `main.py` in the `project_del` directory.
To run the application and view the results, run `python main.py` from the `project_del` directory.

Each parameter defaults to the value required by the questions above.

To change a parameter such as a date, pass the parameter(s) to change, together with their desired values, when instantiating the `DataWrangler` class in `main.py` (`data_wrangler_object = DataWrangler(...)`).

The code that transforms the data and provides the metrics can be found in `src/data_wrangler.py`.


## Application Run

```python
from src.data_wrangler import DataWrangler

data_wrangler_object = DataWrangler()
data_wrangler_object.covid_data
```

| | state | state_name | state_fips | fema_region | overall_outcome | date | new_results_reported | total_results_reported |
|---|---|---|---|---|---|---|---|---|
| 0 | AL | Alabama | 1 | Region 4 | Negative | 2020-03-01 | 96 | 96 |
| 1 | AL | Alabama | 1 | Region 4 | Positive | 2020-03-01 | 16 | 16 |
| 2 | AL | Alabama | 1 | Region 4 | Negative | 2020-03-02 | 72 | 168 |
| 3 | AL | Alabama | 1 | Region 4 | Positive | 2020-03-02 | 6 | 22 |
| 4 | AL | Alabama | 1 | Region 4 | Negative | 2020-03-03 | 94 | 262 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 131624 | WY | Wyoming | 56 | Region 8 | Negative | 2022-05-19 | 860 | 1302998 |
| 131625 | WY | Wyoming | 56 | Region 8 | Positive | 2022-05-19 | 71 | 120950 |
| 131626 | WY | Wyoming | 56 | Region 8 | Inconclusive | 2022-05-20 | 0 | 3345 |
| 131627 | WY | Wyoming | 56 | Region 8 | Negative | 2022-05-20 | 269 | 1303267 |
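
Before doing arithmetic on these columns it helps to make their types explicit. Socrata-style JSON endpoints typically deliver numeric fields as strings and dates as ISO timestamps (for example `2020-03-01T00:00:00.000`). A minimal sketch, assuming the raw records are the list of dicts returned by the JSON endpoint; `to_frame` and `NUMERIC_COLUMNS` are hypothetical names, not taken from `src/data_wrangler.py`:

```python
from typing import Dict, List

import pandas as pd

# Assumption: the numeric columns are the ones shown in the preview above
NUMERIC_COLUMNS = ["state_fips", "new_results_reported", "total_results_reported"]


def to_frame(records: List[Dict[str, str]]) -> pd.DataFrame:
    """Build a typed DataFrame from raw JSON records."""
    df = pd.DataFrame.from_records(records)
    for column in NUMERIC_COLUMNS:
        # Raises on non-numeric text instead of silently keeping strings
        df[column] = pd.to_numeric(df[column])
    # Drop any time-of-day part so each row is keyed by calendar date
    df["date"] = pd.to_datetime(df["date"]).dt.normalize()
    return df
```

With string-typed counts, a later `sum()` would concatenate text rather than add numbers, so the conversion belongs before any of the metrics are computed.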


```python
data_wrangler_object.main()
```


```
1. The total number of PCR tests performed as at 2022-05-22 in the United State is 317858748732

2. The 7-day rolling average number of new cases per day for the last 30 days is
date
2022-04-21    34272.857143
2022-04-22    34821.714286
2022-04-23    35282.714286
2022-04-24    36237.428571
2022-04-25    39042.142857
2022-04-26    42950.142857
2022-04-27    46806.428571
2022-04-28    50961.000000
2022-04-29    54455.428571
2022-04-30    56555.428571
2022-05-01    58098.142857
2022-05-02    59388.142857
2022-05-03    60494.714286
2022-05-04    61724.000000
2022-05-05    62467.428571
2022-05-06    63764.428571
2022-05-07    65641.285714
2022-05-08    67213.714286
2022-05-09    69455.714286
2022-05-10    72624.714286
2022-05-11    76082.285714
2022-05-12    78761.857143
2022-05-13    80478.000000
2022-05-14    82242.000000
2022-05-15    83468.285714
2022-05-16    85481.714286
2022-05-17    86511.571429
2022-05-18    85338.142857
2022-05-19    81489.714286
2022-05-20    73715.714286
Name:
```
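
The first two results can be reproduced from `covid_data` with a few pandas operations. The sketch below is not the code in `src/data_wrangler.py`: the function names are hypothetical, and it assumes `date` is already a datetime column and that a "new case" means a positive PCR result (`overall_outcome == "Positive"`). Because `total_results_reported` is already a running total per state and outcome, summing it across dates counts the same tests many times over; the two total functions below should agree with each other, which makes them a useful cross-check on the reported total.

```python
import pandas as pd


def total_pcr_tests(df: pd.DataFrame, as_of: str) -> int:
    """Sum the daily PCR results reported on or before `as_of` (YYYY-MM-DD)."""
    in_range = df[df["date"] <= pd.Timestamp(as_of)]
    # new_results_reported is a per-day count, so adding it up over dates is valid
    return int(in_range["new_results_reported"].sum())


def total_pcr_tests_from_cumulative(df: pd.DataFrame, as_of: str) -> int:
    """Cross-check: add up the last running total of each (state, outcome) series."""
    in_range = df[df["date"] <= pd.Timestamp(as_of)]
    # total_results_reported is cumulative, so only the latest row per series is kept
    latest = in_range.sort_values("date").groupby(["state", "overall_outcome"]).tail(1)
    return int(latest["total_results_reported"].sum())


def rolling_new_cases(df: pd.DataFrame, window: int = 7, days: int = 30) -> pd.Series:
    """`window`-day rolling mean of new positive results for the last `days` days."""
    # Assumption: a "new case" is a positive PCR result
    positives = df[df["overall_outcome"] == "Positive"]
    daily = positives.groupby("date")["new_results_reported"].sum().sort_index()
    # Insert missing calendar days as 0 so the window spans calendar days, not rows
    daily = daily.asfreq("D", fill_value=0)
    # The first window - 1 days have no complete window and come out as NaN
    return daily.rolling(window).mean().tail(days)
```

On the five Alabama rows shown in the `covid_data` preview, both total functions return 284 for `as_of="2020-03-03"` (96 + 16 + 72 + 6 + 94, or equivalently 262 + 22). Filling missing days with 0 is a design choice; if a gap means "not reported" rather than "no cases", dropping those days instead may be more appropriate.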

## Insights
- As of 2022-05-22, a total of 317,858,748,732 PCR tests had been performed in the United States.

- In the last 30 days, the 7-day rolling average of new cases per day rose from about 34.3k (2022-04-21) to a peak of about 86.5k (2022-05-17), then fell to about 73.7k by 2022-05-20.

- Considering PCR tests performed in the last 30 days, Oklahoma has the highest positivity rate (0.21), followed by Idaho (0.182), New Mexico (0.175), Mississippi (0.168) and Nevada (0.15). The remaining states in the top 10 are:
    - Alabama: 0.141
    - South Dakota: 0.141
    - Texas: 0.136
    - Nebraska: 0.135
    - Utah: 0.134
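
The positivity ranking follows directly from the definition "positive tests / tests performed". A minimal sketch, with the hypothetical function name `top_positivity_states`, under two stated assumptions: the k-day window ends at (and includes) the latest date in the data, and every reported outcome, including `Inconclusive`, counts as a test performed.

```python
import pandas as pd


def top_positivity_states(df: pd.DataFrame, days: int = 30, top_n: int = 10) -> pd.Series:
    """Top `top_n` states by positive tests / tests performed over the last `days` days."""
    # Assumption: the window ends at the latest date in the data and includes it
    start = df["date"].max() - pd.Timedelta(days=days - 1)
    recent = df[df["date"] >= start]
    # One row per state, one column per outcome (Positive, Negative, Inconclusive, ...)
    by_outcome = recent.pivot_table(
        index="state_name",
        columns="overall_outcome",
        values="new_results_reported",
        aggfunc="sum",
        fill_value=0,
    )
    # Assumption: every reported outcome counts as a test performed
    tests = by_outcome.sum(axis=1)
    positives = by_outcome["Positive"] if "Positive" in by_outcome else 0
    return (positives / tests).nlargest(top_n).round(3)
```

Rounding to three decimals matches the precision used in the insights above; drop the `round` call to rank on unrounded rates.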

## Caveat
By default, the endpoint https://healthdata.gov/resource/j8mb-icvb.json returns only the first 1,000 records.
Since the dataset currently holds about 132k records, I added a limit of 300,000 to the URL (https://healthdata.gov/resource/j8mb-icvb.json?$limit=300000) so that all records are pulled in a single request. This limit will need to be raised as the dataset grows, unless the records are instead fetched page by page.
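
One way to avoid hard-coding the limit is to page through the endpoint until a short page comes back. The sketch below assumes healthdata.gov serves this dataset through the Socrata SODA API, which accepts the SoQL parameters `$limit`, `$offset` and `$order`; ordering by the system field `:id` keeps page boundaries stable between requests. `fetch_all_records`, `http_get_page` and the page size of 50,000 are hypothetical choices, not part of the project.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

BASE_URL = "https://healthdata.gov/resource/j8mb-icvb.json"
Page = List[Dict[str, str]]


def http_get_page(url: str, limit: int, offset: int) -> Page:
    """Fetch one page of records using SoQL paging parameters."""
    query = urllib.parse.urlencode(
        {"$limit": limit, "$offset": offset, "$order": ":id"}, safe="$:"
    )
    with urllib.request.urlopen(f"{url}?{query}") as response:
        return json.load(response)


def fetch_all_records(
    url: str = BASE_URL,
    page_size: int = 50_000,
    get_page: Callable[[str, int, int], Page] = http_get_page,
) -> Page:
    """Request pages until one comes back short, then return every record."""
    records: Page = []
    offset = 0
    while True:
        page = get_page(url, page_size, offset)
        records.extend(page)
        # A page smaller than page_size means the end of the dataset was reached
        if len(page) < page_size:
            return records
        offset += page_size
```

The `get_page` argument makes the paging loop testable without network access; the returned list can be passed to `pandas.DataFrame` just like the result of the single large request.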