# <u> Covid-19 Data Exploration </u>

## <u> Objective: </u> 
### Assess the Quality of the *Case Numbers* Statistic as a Basis for Controlling Response Measures.
We take the ultimate goal of response measures to be preventing hospital capacities from maxing out.\
Thus, we investigate whether there are scenarios in which an increase in hospital admissions was not indicated
by an increase in case numbers.
Furthermore, we investigate the behaviour of hospital admissions and deaths after a substantial increase in testing.
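
One way to make the first question concrete is to compare week-over-week changes and flag the weeks in which admissions rose although case numbers had not risen in that week or the week before. The sketch below runs on made-up numbers. The column names `cases` and `admissions`, the one-week lookback and the helper name `unannounced_rises` are assumptions for illustration and do not come from the ECDC files.

```python
import pandas as pd

def unannounced_rises(df, lookback=1):
    """Return the weeks in which admissions rose without a recent rise in cases.

    df is indexed by week and has the (hypothetical) columns
    'cases' and 'admissions'.
    """
    cases_up = (df['cases'].diff() > 0).astype(int)
    admissions_up = df['admissions'].diff() > 0
    # 1 if cases rose in this week or in any of the `lookback` weeks before it
    recent_case_rise = cases_up.rolling(lookback + 1, min_periods=1).max().astype(bool)
    return df.index[admissions_up & ~recent_case_rise].tolist()

# made-up weekly numbers, for illustration only
toy = pd.DataFrame(
    {'cases': [100, 120, 110, 100, 90, 95],
     'admissions': [10, 11, 13, 12, 14, 13]},
    index=['2021-W01', '2021-W02', '2021-W03', '2021-W04', '2021-W05', '2021-W06'],
)
flagged_weeks = unannounced_rises(toy)
print(flagged_weeks)
```

Whether a one-week lookback is appropriate depends on the delay between infection and admission, so `lookback` is best treated as a parameter to vary.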

## <u> Method: </u>
1. Get an Overview of the Hospital Admission Data
2. Correlate Case Number Data with Testing Strategy
3. Combine Observations from (1) and (2)

In [3]:
import os
import wget
import pandas as pd
import matplotlib.pyplot as plt

## Download and Load all Data Sets

In [4]:
# create data directory if it doesn't exist yet
data_dir = './data/'
os.makedirs(data_dir, exist_ok=True)

# clean data dir from old data (some sets are updated on a daily basis)
for old_file in os.listdir(data_dir):
    os.remove(os.path.join(data_dir, old_file))


# add urls to data file for new data sources here

# From European Centre for Disease Prevention and Control:
cases_deaths_url = 'https://opendata.ecdc.europa.eu/covid19/nationalcasedeath_eueea_daily_ei/csv/data.csv'
hospitalization_url = 'https://opendata.ecdc.europa.eu/covid19/hospitalicuadmissionrates/csv/data.csv'
tests_url = 'https://opendata.ecdc.europa.eu/covid19/testing/csv/data.csv'
variants_url = 'https://opendata.ecdc.europa.eu/covid19/virusvariant/csv/data.csv'
vaccinations_url = 'https://opendata.ecdc.europa.eu/covid19/vaccine_tracker/csv/data.csv'

# add them to the dictionary and specify the desired file name
download_dict = {'cases_deaths.csv': cases_deaths_url, 'hospitalizations.csv': hospitalization_url,
                'tests': tests_url, 'vaccinations': vaccinations_url, 'variants': variants_url}

for filename, url in download_dict.items():
    # download data file
    wget.download(url, data_dir + filename)
    # load data frame named by filename (without extension)
    df_name = filename.split('.')[0]
    globals()[df_name] = pd.read_csv(data_dir + filename)  # use string as variable name
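
Writing into `globals()` is convenient in a notebook, but a plain dictionary keeps all loaded frames in one place and avoids creating variables from strings. The following is a minimal sketch of that alternative. The helper name `load_frames` is hypothetical, and small in-memory CSVs stand in for the downloaded files so that the example runs without network access.

```python
import io
import pandas as pd

def load_frames(sources):
    """Read every CSV source into a dict keyed by the file name without extension."""
    return {name.split('.')[0]: pd.read_csv(src) for name, src in sources.items()}

# made-up in-memory CSVs standing in for the downloaded files
demo_sources = {
    'cases_deaths.csv': io.StringIO('dateRep,cases\n01/03/2021,5\n'),
    'tests': io.StringIO('year_week,tests_done\n2021-W09,100\n'),
}
frames = load_frames(demo_sources)
print(sorted(frames))
```

A frame is then accessed as `frames['tests']` instead of through a global variable named `tests`.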

In [3]:
print('Loaded: \n')
for filename in download_dict.keys():
    print(filename + ', with variables:' + '\n')
    print(globals()[filename.split('.')[0]].columns.values, '\n \n')

Loaded: 

cases_deaths.csv, with variables:

['dateRep' 'day' 'month' 'year' 'cases' 'deaths' 'countriesAndTerritories'
 'geoId' 'countryterritoryCode' 'popData2020' 'continentExp'] 
 

hospitalizations.csv, with variables:

['country' 'indicator' 'date' 'year_week' 'value' 'source' 'url'] 
 

tests, with variables:

['country' 'country_code' 'year_week' 'level' 'region' 'region_name'
 'new_cases' 'tests_done' 'population' 'testing_rate' 'positivity_rate'
 'testing_data_source'] 
 

vaccinations, with variables:

['YearWeekISO' 'FirstDose' 'FirstDoseRefused' 'SecondDose' 'UnknownDose'
 'NumberDosesReceived' 'Region' 'Population' 'ReportingCountry'
 'TargetGroup' 'Vaccine' 'Denominator'] 
 

variants, with variables:

['country' 'country_code' 'year_week' 'source' 'new_cases'
 'number_sequenced' 'percent_cases_sequenced' 'valid_denominator'
 'variant' 'number_detections_variant' 'percent_variant'] 
 



## Overview of ECDC Hospital Admission Data

In [72]:
hospitalizations.info()
hosp_indicators = hospitalizations.indicator.unique()
print('\n \n More specifically, we have data on: \n', hosp_indicators)

# isolate weekly ICU admissions per 100k (selected by name rather than by position)
hosp_by_indicator = hospitalizations.groupby('indicator')
weekly_icu = hosp_by_indicator.get_group('Weekly new ICU admissions per 100k')

# merge weekly_icu with the national rows of the testing data set on country and week
# (DataFrame.join would match 'year_week' against the integer index of tests and fail)
national_tests = tests[tests.level == 'national']
icu_tests = weekly_icu.merge(national_tests, on=['country', 'year_week'], how='left')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20530 entries, 0 to 20529
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   country    20530 non-null  object 
 1   indicator  20530 non-null  object 
 2   date       20530 non-null  object 
 3   year_week  20530 non-null  object 
 4   value      20530 non-null  float64
 5   source     20530 non-null  object 
 6   url        18385 non-null  object 
dtypes: float64(1), object(6)
memory usage: 1.1+ MB

 
 More specifically, we have data on: 
 ['Daily hospital occupancy' 'Daily ICU occupancy'
 'Weekly new hospital admissions per 100k'
 'Weekly new ICU admissions per 100k']
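
Combining the ICU series with the testing data amounts to matching rows on `country` and `year_week`. `DataFrame.join(other, on=...)` matches the named column against the index of `other`, which for `tests` is a plain integer range, so the week strings cannot match. `merge` matches on shared columns instead. The sketch below shows this on made-up rows; only the column names mirror the ECDC files.

```python
import pandas as pd

# made-up rows; only the column names mirror the ECDC files
icu_demo = pd.DataFrame({
    'country': ['Cyprus', 'Cyprus', 'Spain'],
    'year_week': ['2020-W18', '2020-W19', '2020-W18'],
    'value': [0.5, 0.3, 1.2],
})
tests_demo = pd.DataFrame({
    'country': ['Cyprus', 'Spain', 'Spain'],
    'year_week': ['2020-W18', '2020-W18', '2020-W19'],
    'tests_done': [1000, 5000, 6000],
})

# keep every ICU row; weeks without testing data get NaN in tests_done
icu_with_tests = icu_demo.merge(tests_demo, on=['country', 'year_week'], how='left')
print(icu_with_tests)
```

A left merge keeps ICU weeks that have no testing data, which makes gaps in the testing series visible instead of silently dropping those weeks.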

In [61]:
# how many and which countries?
countries = hospitalizations.country.unique()
icu_countries = weekly_icu.country.unique()
print('No. Countries Appearing in the Data Set:', len(countries), '\n \n', countries)
print('\n \n No. Countries with weekly ICU admission Data:', len(icu_countries), '\n \n', icu_countries)

No. Countries Appearing in the Data Set: 29 
 
 ['Austria' 'Belgium' 'Bulgaria' 'Croatia' 'Cyprus' 'Czechia' 'Denmark'
 'Estonia' 'Finland' 'France' 'Germany' 'Greece' 'Hungary' 'Iceland'
 'Ireland' 'Italy' 'Latvia' 'Lithuania' 'Luxembourg' 'Malta' 'Netherlands'
 'Norway' 'Poland' 'Portugal' 'Romania' 'Slovakia' 'Slovenia' 'Spain'
 'Sweden']

 
 No. Countries with weekly ICU admission Data: 14 
 
 ['Cyprus' 'Czechia' 'Estonia' 'France' 'Greece' 'Ireland' 'Latvia'
 'Lithuania' 'Malta' 'Netherlands' 'Norway' 'Slovenia' 'Spain' 'Sweden']


In [70]:
# Visualize weekly ICU admissions per 100k for all countries that report them
icu_by_country = weekly_icu[['value', 'country', 'year_week']].groupby('country')

# one line plot per country, with the country name as title
for country, icu_df in icu_by_country:
    icu_df.plot(kind='line', x='year_week', y='value', title=country)
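
To compare countries in a single chart rather than one chart per country, the long table can be pivoted so that each country becomes a column. This is a minimal sketch on made-up values, using the same column names as `weekly_icu`.

```python
import pandas as pd

# made-up values; column names follow weekly_icu
icu_demo = pd.DataFrame({
    'country': ['Cyprus', 'Cyprus', 'Spain'],
    'year_week': ['2020-W18', '2020-W19', '2020-W18'],
    'value': [0.5, 0.3, 1.2],
})

# one row per week, one column per country; missing weeks become NaN
wide = icu_demo.pivot(index='year_week', columns='country', values='value')
print(wide)
```

Calling `wide.plot()` on such a frame draws one line per country on shared axes, provided matplotlib is installed.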

Unnamed: 0,country,country_code,year_week,level,region,region_name,new_cases,tests_done,population,testing_rate,positivity_rate,testing_data_source
0,Austria,AT,2020-W18,national,AT,Austria,349.0,17956,8901064.0,201.728692,1.94364,Country website
1,Austria,AT,2020-W19,national,AT,Austria,249.0,42153,8901064.0,473.572598,0.590705,Country website
2,Austria,AT,2020-W20,national,AT,Austria,367.0,46001,8901064.0,516.803384,0.797809,Country website
3,Austria,AT,2020-W21,national,AT,Austria,285.0,39348,8901064.0,442.059511,0.724306,Country website
4,Austria,AT,2020-W22,national,AT,Austria,203.0,46677,8901064.0,524.397982,0.434904,Country website


In [30]:
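# Develop a visualization that reveals testing strategy, case numbers, deaths and hospital admissions

# hosp_by_country is not defined in any cell shown here; it is assumed to be the
# hospitalization data grouped by country, with the columns seen in the output below
hosp_by_country = hospitalizations[['value', 'country', 'year_week', 'date']].groupby('country')

# Look at Germany
germany_hosp = hosp_by_country.get_group('Germany')
germany_hosp.head()
# TODO: add case numbers to df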



Unnamed: 0,value,country,year_week,date
8756,200.0,Germany,2020-W12,2020-03-20
8757,308.0,Germany,2020-W12,2020-03-21
8758,364.0,Germany,2020-W12,2020-03-22
8759,451.0,Germany,2020-W13,2020-03-23
8760,616.0,Germany,2020-W13,2020-03-24


In [32]:
cases_by_country = cases_deaths.groupby('countriesAndTerritories')
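
Adding the daily case numbers to the weekly hospital data requires a common key. The `year_week` labels in the Germany output above are consistent with ISO weeks (2020-03-22, a Sunday, falls in W12 and 2020-03-23 in W13), so one option is to derive the same label from each report date. The sketch below uses only the standard library; the helper names `to_year_week` and `week_start` are hypothetical, and the ISO-week reading of `year_week` is an assumption based on that output.

```python
from datetime import date

def to_year_week(day):
    """Format a date as an ISO year-week label such as '2020-W12'."""
    iso_year, iso_week, _ = day.isocalendar()
    return f'{iso_year}-W{iso_week:02d}'

def week_start(year_week):
    """Return the Monday of an ISO year-week label (needs Python 3.8+)."""
    year, week = year_week.split('-W')
    return date.fromisocalendar(int(year), int(week), 1)

print(to_year_week(date(2020, 3, 20)))
print(week_start('2020-W18'))
```

With such a label on every daily row, cases and deaths can be summed per country and week before being merged with the weekly indicators.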

## <u> More Ideas: </u>
1.  <b> Hypothesis: </b> Vaccinating the elderly already contributed a lot to relieving the hospitals (since serious cases occurred mostly among the elderly).\
$\rightarrow$ If this were true, we would observe *normal* spreading, i.e. no anomalies in positive test rates, but decreasing fatalities/hospitalizations.\
2. <b> Hypothesis: </b> The nightly movement restrictions (don't) help to mitigate spreading.\
$\rightarrow$ Are there comparable German federal states (Bundesländer) and/or countries with and without nightly movement restrictions?