# MC - Covid-19 Data preparation

Task description: https://ds-spaces.technik.fhnw.ch/app/uploads/sites/82/2020/09/minichallenge_covid19.pdf

Author: Roman Janic Studer

Data Pipeline Concept: https://en.wikipedia.org/wiki/Pipeline_(computing)

## Procedure

1. Create function for daily data pull
2. Drop unnecessary data
3. Clean data (tidy data principle)
4. Prepare data for visualization (possibly with aggregation)
5. Return global and local dataframe
6. Create visualization (plotly) for global and local data (barplots incl. moving average, e.g. as on srf.ch, or a choropleth map)
7. Document process and code

Steps 1 to 5 should function as a data preparation pipeline.
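
One way to chain such steps is `DataFrame.pipe`, which passes the frame from one function to the next. The following is only a minimal sketch on toy data, not the notebook's actual pipeline: the helper names `drop_unneeded`, `aggregate_by_country` and `prepare` are hypothetical, and the toy frame merely imitates a few columns of the global dataset.

```python
import pandas as pd


def drop_unneeded(df, columns):
    # errors='ignore' skips names that are not present in this frame
    return df.drop(columns=columns, errors='ignore')


def aggregate_by_country(df):
    # Sum the numeric columns so that each country ends up in one row
    return df.groupby('Country_Region', as_index=False).sum(numeric_only=True)


def prepare(df, columns):
    # Each step takes a dataframe and returns a new one, so steps can be chained
    return df.pipe(drop_unneeded, columns).pipe(aggregate_by_country)


# Toy data (made up for illustration only)
raw = pd.DataFrame({
    'Country_Region': ['Switzerland', 'Germany', 'Germany'],
    'Admin2': [None, 'a', 'b'],
    'Confirmed': [10, 5, 7],
    'Deaths': [1, 0, 2],
})

prepared = prepare(raw, ['Admin2', 'FIPS'])
print(prepared)
```

Because every step only depends on its input frame, single steps can be tested or swapped without touching the rest of the pipeline.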

In [None]:
# TODO Link data description

# Datasource local = https://github.com/daenuprobst/covid19-cases-switzerland
# Datasource global = https://github.com/CSSEGISandData/COVID-19

# Imports
import pandas as pd
from datetime import date, timedelta

In [None]:
# Constants
DROP_COLUMNS = ['FIPS', 'Admin2', 'Province_State', 'Recovered', 'Combined_Key', 'Incidence_Rate', 'Case-Fatality_Ratio']
TODAY = date.today()
YESTERDAY = TODAY - timedelta(days=1)

In [None]:
def get_data():
    """
    Pulls the latest data from both sources
    :return df_global: Dataframe containing current Covid-19 data from Johns Hopkins University
    :return df_CH_cases: Daily new cases in Switzerland
    :return df_CH_fatal: Daily new fatalities in Switzerland
    """
    # Get most recent data from Johns Hopkins
    try:
        df_global = pd.read_csv(f'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/{TODAY.strftime("%m-%d-%Y")}.csv')
    except Exception:
        # Today's report may not be published yet (the request then fails), so fall back to yesterday's report
        df_global = pd.read_csv(f'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/{YESTERDAY.strftime("%m-%d-%Y")}.csv')

    # Get most recent data from covid19-cases-switzerland
    df_CH_cases = pd.read_csv('https://raw.githubusercontent.com/daenuprobst/covid19-cases-switzerland/master/covid19_cases_switzerland_openzh-phase2.csv')
    df_CH_fatal = pd.read_csv('https://raw.githubusercontent.com/daenuprobst/covid19-cases-switzerland/master/covid19_fatalities_switzerland_openzh-phase2.csv')
    
    return df_global, df_CH_cases, df_CH_fatal
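
The fallback in `get_data` covers a single day. If the latest report is more than one day late, a small loop over candidate dates might be more robust. The sketch below is an illustration under assumptions, not part of the original notebook: `candidate_report_urls` and `load_first_available` are hypothetical helpers, and the reader function is passed in so the logic can be tried without network access.

```python
from datetime import date, timedelta

BASE_URL = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
            'csse_covid_19_data/csse_covid_19_daily_reports/')


def candidate_report_urls(start, max_days_back=3):
    """Hypothetical helper: report URLs for `start` and the days before it, newest first."""
    return [f'{BASE_URL}{(start - timedelta(days=n)).strftime("%m-%d-%Y")}.csv'
            for n in range(max_days_back + 1)]


def load_first_available(urls, reader):
    """Hypothetical helper: return reader(url) for the first URL that does not raise."""
    last_error = None
    for url in urls:
        try:
            return reader(url)
        except Exception as err:  # e.g. the report for that day is not published yet
            last_error = err
    raise RuntimeError('No daily report could be loaded') from last_error


# Fixed date so the result is reproducible
urls = candidate_report_urls(date(2020, 10, 1), max_days_back=2)
print(urls[0])
```

In the notebook this could be called as `load_first_available(candidate_report_urls(date.today()), pd.read_csv)`.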

In [None]:
df_global, df_CH_cases, df_CH_fatal = get_data()

In [None]:
df_CH_fatal.columns
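
For the tidy data step, a wide table can be reshaped into one row per observation with `DataFrame.melt`. The sketch below assumes that the Swiss files use a wide layout with a `Date` column and one column per canton. That layout is an assumption and should be checked against the column listing above. The toy values and the column names `canton` and `cases` are made up for illustration.

```python
import pandas as pd

# Toy frame imitating the assumed wide layout (values are made up)
wide = pd.DataFrame({
    'Date': ['2020-10-01', '2020-10-02'],
    'ZH': [100, 110],
    'BE': [50, 55],
})

# One row per (date, canton) pair instead of one column per canton
tidy = wide.melt(id_vars='Date', var_name='canton', value_name='cases')
tidy['Date'] = pd.to_datetime(tidy['Date'])
print(tidy)
```

The long format is usually easier to pass to plotting libraries, since the canton becomes an ordinary column that can be used for grouping or colouring.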

In [None]:
def drop_columns(dfs, columns):
    """
    Drops columns in list of dataframes if column name is in columns
    :return dfs: List of dataframes
    """        