# Analysis of the current COVID-19 outbreak

## Getting the data we need
First, import all the data. The dataset is taken from the public Git repository https://github.com/CSSEGISandData/COVID-19, which the code below expects to be cloned into the working directory.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import csv

cases_ts = pd.read_csv("COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
deaths_ts = pd.read_csv("COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv")
recovered_ts = pd.read_csv("COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv")

Group the data by country so that the numbers are reported for each country as a whole. We also do not need the coordinates.

In [None]:
# Drop the columns we do not need first: recent pandas versions no longer
# silently skip non-numeric columns such as 'Province/State' in sum().
drop_cols = ['Province/State', 'Lat', 'Long']
cases_ts = cases_ts.drop(columns=drop_cols).groupby('Country/Region').sum()
deaths_ts = deaths_ts.drop(columns=drop_cols).groupby('Country/Region').sum()
recovered_ts = recovered_ts.drop(columns=drop_cols).groupby('Country/Region').sum()

active_ts = cases_ts - (deaths_ts + recovered_ts)
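
The subtraction above relies on pandas aligning the row labels (countries) and the column labels (dates) of the three tables. As a minimal sketch with made-up numbers (the country labels `A` and `B` and all values are hypothetical), the following shows the arithmetic and what happens when a country is missing from one table. Treating a missing country as having zero recoveries is an assumption made for this illustration only.

```python
import pandas as pd

dates = ["1/22/20", "1/23/20"]
cases = pd.DataFrame([[10, 20], [5, 8]], index=["A", "B"], columns=dates)
deaths = pd.DataFrame([[1, 2], [0, 1]], index=["A", "B"], columns=dates)
# Country "B" is deliberately missing from the recovered table.
recovered = pd.DataFrame([[3, 5]], index=["A"], columns=dates)

# Plain operators align on labels; rows missing from one side become NaN.
active = cases - (deaths + recovered)

# Assumption for this sketch: a missing row counts as zero recoveries.
active_filled = cases.sub(deaths.add(recovered, fill_value=0))

print(active)
print(active_filled)
```

If NaN rows show up in `active_ts`, the country labels of the three source files most likely do not match exactly.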

Of course, now we also want to look at the data. I'll define a function that lets us plot a time series for a given set of countries.

In [None]:
def plot_ts(ts, countries, title='', xlabel='', ylabel=''):
    # Plot one line per country; use the function argument, not a global list.
    for c in countries:
        ts.loc[c].plot(style='x-', figsize=(14,6))
    plt.title(title)
    plt.legend(countries)
    plt.ylabel(ylabel)
    plt.xlabel(xlabel)
    plt.show()

Now let's finally plot the time series for our chosen countries:

In [None]:
countries_to_plot = ['Austria', 'Netherlands', 'Germany', 'Italy']
plot_ts(active_ts, countries_to_plot, title='Active Cases', ylabel='Cases', xlabel='Date')

As you can guess, these are all absolute numbers. But absolute numbers say little about how strongly a specific country is affected. A better measure for this is *cases per capita*: the number of cases divided by the number of people living in the country.

To calculate this ratio, we need the population of each country and divide the number of cases by it. For this we use the dataset provided by the World Bank with the population numbers of 2015: https://data.world/worldbank/total-population-per-country (Some country names had to be adjusted to match the other dataset.)

In [None]:
population = pd.read_csv("population.csv", sep=';').loc[:, ['Country Name', '2015']].set_index('Country Name')
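
Because the two datasets do not use identical country names, it can help to list the countries without a match in the population table before normalizing. A minimal sketch on toy data: the helper `unmatched_countries` is hypothetical and not part of the original notebook, and the labels and rounded population figures are made up for illustration.

```python
import pandas as pd

def unmatched_countries(ts, population):
    """Return the row labels of ts that are absent from the population index."""
    return sorted(ts.index.difference(population.index))

# Toy stand-ins for the case table and the population table.
toy_ts = pd.DataFrame({"3/1/20": [1, 2, 3]},
                      index=["Austria", "Korea, South", "US"])
toy_population = pd.DataFrame({"2015": [8_600_000, 51_000_000]},
                              index=["Austria", "Korea, Rep."])

missing_before = unmatched_countries(toy_ts, toy_population)
print(missing_before)

# Renaming one label in the population table resolves that mismatch.
toy_population = toy_population.rename(index={"Korea, Rep.": "Korea, South"})
missing_after = unmatched_countries(toy_ts, toy_population)
print(missing_after)
```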

With this dataset we can now normalize the cases to a *per 1.000.000 people* measure.

In [None]:
def norm_ts(ts):
    # Normalize every country's time series to cases per 1,000,000 inhabitants.
    norm_dataframe = pd.DataFrame(columns=ts.columns)
    for country_name, cases in ts.iterrows():
        try:
            norm_dataframe.loc[country_name] = cases.div(population.loc[country_name, '2015'] / 1000000)
        except KeyError:
            # Countries without a population entry are skipped silently.
            pass

    return norm_dataframe
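
The loop in `norm_ts` can also be written as a single label-aligned division. This is only a sketch of an alternative, not the notebook's method: the function name `norm_ts_vectorized` and the toy values are hypothetical, and it assumes the population is passed as a Series indexed by country name.

```python
import pandas as pd

def norm_ts_vectorized(ts, population_2015):
    """Divide each country's row by its population in millions.

    Countries without a population entry are dropped, mirroring the
    skipped KeyError cases of the loop version.
    """
    per_million = population_2015 / 1_000_000
    common = ts.index.intersection(per_million.index)
    # axis=0 matches the Series index against the row labels (countries).
    return ts.loc[common].div(per_million.loc[common], axis=0)

toy_ts = pd.DataFrame({"3/1/20": [100, 50, 7], "3/2/20": [200, 80, 9]},
                      index=["A", "B", "Unknown"])
toy_population = pd.Series({"A": 10_000_000, "B": 2_000_000})

toy_norm = norm_ts_vectorized(toy_ts, toy_population)
print(toy_norm)
```

With the notebook's data, the call would presumably look like `norm_ts_vectorized(active_ts, population['2015'])`.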

In [None]:
countries_to_plot = ['China', 'Netherlands', 'Austria', 'Korea, South']
norm_active_ts = norm_ts(active_ts)
plot_ts(norm_active_ts, countries_to_plot, title='Active Cases per 1.000.000', xlabel='Date', ylabel='Active Cases per 1 mio. Inhabitants')

## Global development of the cases
Now that we have all the data we need and have already had a peek at the cases for some countries, we want to investigate the overall, global development of cases. To get the right data, we simply sum up the number of cases over all countries.

In [None]:
cases_ts_global = cases_ts.sum(axis=0)
deaths_ts_global = deaths_ts.sum(axis=0)
recovered_ts_global = recovered_ts.sum(axis=0)
active_ts_global = active_ts.sum(axis=0)

Ok cool. Now let's have a look at the global cases.

In [None]:
plt.figure(figsize=(15,7))
plt.title('timecourse of cases')
plt.plot(active_ts_global, 'x-', label='active')
plt.plot(recovered_ts_global, 'x-',  label='recovered')
plt.plot(deaths_ts_global, 'x-',  label='dead')
plt.plot(cases_ts_global, 'x-', label='infections')
plt.xlabel('time (date)')
plt.ylabel('individuals')
plt.xticks(np.arange(1,len(cases_ts_global),7))
plt.ylim(0, max(cases_ts_global)+10000)
plt.legend()
plt.show()

## How do the cases evolve in different countries?
From the plot of the data we can already see some differences between countries. But we want to be able to compare the dynamics of the outbreak even better. Therefore, we compare the data between countries relative to the day the outbreak started in each of them.

We define a function that plots, for the countries we choose, the part of the data starting from the day the observed cases exceed a threshold T (per million inhabitants).

In [None]:
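def plot_thresholded_ts(ts, country_names, T, title='', xlabel='', ylabel=''):
    serieses = []
    plt.figure(figsize=(14,6))
    for c in country_names:
        # Keep only the values above the threshold T.
        country_values = np.asarray(ts.loc[c])
        curr_country_ts = country_values[country_values > T]
        # Collect the thresholded series so the return value is not empty.
        serieses.append(curr_country_ts)
        plt.plot(curr_country_ts, 'x-')

    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.legend(country_names)
    plt.show()

    return serieses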
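
Note that the boolean mask `> T` keeps every value above the threshold. For cumulative counts, which never decrease, this is the same as starting at the first day above T. For a series that can fall again, such as active cases, the two differ. A minimal sketch with made-up values (the helper `since_first_exceedance` is hypothetical and not part of the notebook):

```python
import numpy as np

def since_first_exceedance(values, threshold):
    """Return the series from the first day it exceeds the threshold onward."""
    values = np.asarray(values, dtype=float)
    above = np.flatnonzero(values > threshold)
    if above.size == 0:
        return values[:0]  # the threshold is never exceeded
    return values[above[0]:]

cumulative = np.array([0.0, 0.5, 2.0, 4.0])   # never decreases
fluctuating = np.array([0.0, 2.0, 0.5, 3.0])  # dips below the threshold again

masked_cumulative = cumulative[cumulative > 1]
aligned_cumulative = since_first_exceedance(cumulative, 1)

masked_fluctuating = fluctuating[fluctuating > 1]
aligned_fluctuating = since_first_exceedance(fluctuating, 1)

print(masked_cumulative, aligned_cumulative)
print(masked_fluctuating, aligned_fluctuating)
```

The mask silently drops the dip in the fluctuating series, which would shorten the "days since outbreak" axis for that country.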

Here are some plots that might be interesting:

In [None]:
country_names = ['Italy', 'Netherlands', 'Austria', 'Malta', 'Greece', 'Spain', 'Germany']
plot_thresholded_ts(norm_ts(cases_ts), country_names, 1, title='Growth of infected cases\n starting from the day 1 in a million\n inhabitants was infected', xlabel='Days since outbreak', ylabel='cases per million inhabitants')

country_names = ['Italy', 'China', 'Iran', 'Korea, South', 'Singapore', 'Japan']
# T = 0.1 is measured per million inhabitants, so the title states that unit.
plot_thresholded_ts(norm_ts(recovered_ts), country_names, 0.1, title='Growth of recoveries starting\n from the day 0.1 per million inhabitants\n had recovered', xlabel='Days since start of recovery', ylabel='recoveries per million inhabitants')

country_names = ['Italy', 'Iran', 'Korea, South', 'Spain']
# T = 0.01 is measured per million inhabitants, so the labels state that unit.
plot_thresholded_ts(norm_ts(deaths_ts), country_names, 0.01, title='Growth of deaths starting\n from the day 0.01 per million inhabitants had died', xlabel='Days since 0.01 deaths per million inhabitants', ylabel='deaths per million inhabitants')

Test numbers can be found at: https://www.worldometers.info/coronavirus/covid-19-testing/
Data source: FluID (www.who.int/fluid)