# Group 5 - Project 1: Dominican Republic and Italy  

## Directions
Once you have collected the data for your preferred locations, submit a Jupyter Notebook file (.ipynb) that contains the data. Use the NumPy and pandas libraries to import, clean, and process your data.

Your Jupyter Notebook should contain the following and will be scored accordingly:

1. NumPy library (1 point)

2. pandas library (1 point)

3. Display of the tabulated data of your 2 locations (1 point)

4. The data should contain the total number of confirmed cases, recoveries, and fatalities per day from March to August 2020 only (3 points)

5. Then, add a column to your tabulated data for the difference in total confirmed cases, recoveries, and fatalities between your 2 locations (1 point)

6. Also add a column to your tabulated data for the difference in active cases (total confirmed cases minus recoveries and fatalities) between the two locations (1 point)

7. Indicate in your Jupyter Notebook the URLs of the source for your data (2 points)

8. Also include a brief description of the contributions of your fellow group members (10 points). Note: A group member with no contribution will receive zero (0) points.

## Library Imports

In [8]:
import numpy as np
import pandas as pd

## Data Cleaning Function 

In [9]:
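def clean_data(csv_file: str) -> pd.DataFrame:
    """Return June to August 2020 totals for Italy and the Dominican Republic.

    Rows are dates and columns are countries.
    """
    return (pd.read_csv(csv_file)
              .rename({"Country/Region": "Country"}, axis=1)
              # keep only the two countries being compared
              .query('Country in ["Italy", "Dominican Republic"]')
              .set_index('Country')
              # label-based slice, so both 6/1/20 and 8/31/20 are included
              .loc[:, '6/1/20':'8/31/20']
              # transpose: dates become rows, countries become columns
              .T)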
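
To see what each step of the chain does without downloading the full file, the following sketch runs the same pipeline on a small in-memory CSV. The layout (`Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date) is assumed to mirror the JHU time-series files. The `Lat`/`Long` values, the `5/31/20` column, and the `Elsewhere` row are placeholders, and `clean_frame` with its `start`/`end` parameters is a hypothetical variant of `clean_data`, not part of the notebook.

```python
import io
import pandas as pd

# Toy CSV assumed to mirror the JHU layout. The Dominican Republic and Italy
# counts for 6/1/20 and 6/2/20 are copied from the Cases table in this notebook;
# Lat/Long, the 5/31/20 column, and the "Elsewhere" row are placeholders.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,5/31/20,6/1/20,6/2/20
,Dominican Republic,0.0,0.0,0,17572,17752
,Italy,0.0,0.0,0,233197,233515
,Elsewhere,0.0,0.0,0,1,2
"""

def clean_frame(raw: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Hypothetical variant of clean_data that takes a DataFrame and a date range."""
    return (raw.rename({"Country/Region": "Country"}, axis=1)
               .query('Country in ["Italy", "Dominican Republic"]')
               .set_index("Country")
               .loc[:, start:end]   # label slice: both endpoints are included
               .T)                  # dates become rows, countries become columns

sample = clean_frame(pd.read_csv(io.StringIO(SAMPLE_CSV)), "6/1/20", "6/2/20")
print(sample)
```

Passing the date range as arguments would make it possible to request a different window, such as `"3/1/20"` to `"8/31/20"`, assuming those column labels exist in the file.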

## Applying the Data Cleaning Function to the CSV files from GitHub

In [10]:
cases_csv = 'https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
Cases = clean_data(cases_csv)
Cases

Country,Dominican Republic,Italy
6/1/20,17572,233197
6/2/20,17752,233515
6/3/20,18040,233836
6/4/20,18319,234013
6/5/20,18708,234531
...,...,...
8/27/20,92964,263949
8/28/20,93390,265409
8/29/20,93732,266853
8/30/20,94241,268218


In [11]:
recovered_csv = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv'
Recovered = clean_data(recovered_csv)
Recovered

Country,Dominican Republic,Italy
6/1/20,10893,158355
6/2/20,11075,160092
6/3/20,11224,160938
6/4/20,11474,161895
6/5/20,11736,163781
...,...,...
8/27/20,64347,206554
8/28/20,65285,206902
8/29/20,66320,208224
8/30/20,66776,208536


In [12]:
deaths_csv = 'https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
Deaths = clean_data(deaths_csv)
Deaths

Country,Dominican Republic,Italy
6/1/20,502,33475
6/2/20,515,33530
6/3/20,516,33601
6/4/20,520,33689
6/5/20,525,33774
...,...,...
8/27/20,1630,35463
8/28/20,1648,35472
8/29/20,1673,35473
8/30/20,1681,35477


## Getting the Differences

In [13]:
joint = {'Cases': Cases, 'Recovered': Recovered, 'Deaths': Deaths}
# Add a 'Diff' column (Dominican Republic minus Italy) to each table in place.
for table in joint.values():
    table['Diff'] = table['Dominican Republic'] - table['Italy']
joint

{'Cases': Country  Dominican Republic   Italy    Diff
 6/1/20                17572  233197 -215625
 6/2/20                17752  233515 -215763
 6/3/20                18040  233836 -215796
 6/4/20                18319  234013 -215694
 6/5/20                18708  234531 -215823
 ...                     ...     ...     ...
 8/27/20               92964  263949 -170985
 8/28/20               93390  265409 -172019
 8/29/20               93732  266853 -173121
 8/30/20               94241  268218 -173977
 8/31/20               94715  269214 -174499
 
 [92 rows x 3 columns],
 'Recovered': Country  Dominican Republic   Italy    Diff
 6/1/20                10893  158355 -147462
 6/2/20                11075  160092 -149017
 6/3/20                11224  160938 -149714
 6/4/20                11474  161895 -150421
 6/5/20                11736  163781 -152045
 ...                     ...     ...     ...
 8/27/20               64347  206554 -142207
 8/28/20               65285  206902 -141617
 8/29/2
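
Since NumPy is one of the required libraries, the same column could also be produced with `np.subtract` on the underlying arrays. This is a minimal sketch on the first two rows of the Cases table above; `cases_head` is a hypothetical name used only for illustration.

```python
import numpy as np
import pandas as pd

# First two rows of the Cases table shown above.
cases_head = pd.DataFrame(
    {"Dominican Republic": [17572, 17752], "Italy": [233197, 233515]},
    index=["6/1/20", "6/2/20"],
)

# Element-wise difference on the NumPy arrays, equivalent to column subtraction.
cases_head["Diff"] = np.subtract(
    cases_head["Dominican Republic"].to_numpy(),
    cases_head["Italy"].to_numpy(),
)
print(cases_head)
```

The resulting `Diff` values, -215625 and -215763, match the first two rows of the output above.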

## Merged Table

In [14]:
joint = pd.concat(joint, axis=1)

# Active cases = confirmed cases - deaths - recoveries, for each country and for the difference.
for s in ['Dominican Republic', 'Italy', 'Diff']:
    joint[('Active Cases', s)] = joint[('Cases', s)] - joint[('Deaths', s)] - joint[('Recovered', s)]
joint

,Cases,Cases,Cases,Recovered,Recovered,Recovered,Deaths,Deaths,Deaths,Active Cases,Active Cases,Active Cases
Country,Dominican Republic,Italy,Diff,Dominican Republic,Italy,Diff,Dominican Republic,Italy,Diff,Dominican Republic,Italy,Diff
6/1/20,17572,233197,-215625,10893,158355,-147462,502,33475,-32973,6177,41367,-35190
6/2/20,17752,233515,-215763,11075,160092,-149017,515,33530,-33015,6162,39893,-33731
6/3/20,18040,233836,-215796,11224,160938,-149714,516,33601,-33085,6300,39297,-32997
6/4/20,18319,234013,-215694,11474,161895,-150421,520,33689,-33169,6325,38429,-32104
6/5/20,18708,234531,-215823,11736,163781,-152045,525,33774,-33249,6447,36976,-30529
...,...,...,...,...,...,...,...,...,...,...,...,...
8/27/20,92964,263949,-170985,64347,206554,-142207,1630,35463,-33833,26987,21932,5055
8/28/20,93390,265409,-172019,65285,206902,-141617,1648,35472,-33824,26457,23035,3422
8/29/20,93732,266853,-173121,66320,208224,-141904,1673,35473,-33800,25739,23156,2583
8/30/20,94241,268218,-173977,66776,208536,-141760,1681,35477,-33796,25784,24205,1579
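
The `Diff` column under `Active Cases` is computed from the three `Diff` columns. This gives the same result as subtracting Italy's active cases from the Dominican Republic's, because active cases are a linear combination of the three totals. A quick check using the 6/1/20 row of the table above (all names here are illustrative):

```python
# 6/1/20 values copied from the merged table above.
dr = {"cases": 17572, "recovered": 10893, "deaths": 502}
it = {"cases": 233197, "recovered": 158355, "deaths": 33475}

def active(row: dict) -> int:
    """Active cases = confirmed - deaths - recovered."""
    return row["cases"] - row["deaths"] - row["recovered"]

# Route 1: difference of the two countries' active cases.
diff_of_active = active(dr) - active(it)

# Route 2: apply the active-case formula to the per-metric differences,
# which is what the loop does for the 'Diff' column.
metric_diffs = {k: dr[k] - it[k] for k in dr}
active_of_diffs = active(metric_diffs)

print(active(dr), active(it), diff_of_active, active_of_diffs)
# 6177 41367 -35190 -35190
```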


## Sources
COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University<br>
https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv<br>
https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv<br>
https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv


## Contributions
ALDAY, Kraemon Joshua - Differences<br>
GARCIA, Enrico Joaquin - Active Cases<br>
QUITELES, Sean Argie - Cases, Recovered<br>
REYES, Justin Rupert F. - clean_data(csv_file) function