# Corona Cases in Germany - Geo Analysis for Federation States - Hotspots

In [None]:
from datetime import datetime
import numpy as np
import pandas as pd
import geopandas as gpd
import geoplot
import mapclassify
import matplotlib.pyplot as plt
import matplotlib.colors as colors

%matplotlib inline

Based on the data of the [Berliner Morgenpost](https://interaktiv.morgenpost.de/corona-virus-karte-infektionen-deutschland-weltweit/), this notebook analyzes
- how Corona is currently spreading across the German federal states
- where the current hotspots are
- where the upcoming hotspots will most likely be

It therefore shows
- how the absolute numbers of confirmed and active COVID-19 cases are distributed among the German states
- how the relative numbers of confirmed and active COVID-19 cases per 100000 inhabitants are distributed among the German states
- how large the growth factor is in each German state

### Kudos ..

... to the great data team of the [Berliner Morgenpost](https://interaktiv.morgenpost.de/corona-virus-karte-infektionen-deutschland-weltweit/), which tirelessly collects current case data and seems to be the best-informed source in Germany!


# Statistics

In [None]:
fed_inhabitants = pd.read_csv('../input/de-stats/german-federation-inhabitants.csv', thousands='.', decimal=',')
fed_inhabitants_2018 = fed_inhabitants[['Bundesland','2018']].rename(columns={'2018':'inhabitants_2018'})

In [None]:
de = gpd.read_file('../input/de-stats/german-federation-geo-adm1.shp', encoding='utf-8')

## Corona Data

In [None]:
CURRENT_DATA_URL='https://funkeinteraktiv.b-cdn.net/current.v4.csv'
HISTORY_DATA_URL='https://funkeinteraktiv.b-cdn.net/history.light.v4.csv'

Note: we use `HISTORY_DATA_URL` because the growth calculation needs data for more than one date.

In [None]:
df = pd.read_csv(HISTORY_DATA_URL, usecols=['parent', 'label', 'label_parent', 'population', 'date', 'updated', 'confirmed', 'recovered', 'deaths'])
df['date'] = pd.to_datetime(df['date'], format="%Y%m%d")

LATEST_DATE = df.date.max() - pd.Timedelta(days=1)
history_df = df[df.date<=LATEST_DATE]

**The statistics are from:**

In [None]:
LATEST_DATE

## German Data

In [None]:
de_history_df = history_df.loc[history_df.label_parent=='Deutschland',['label','date','recovered','confirmed','deaths']]
de_history_df['active'] = de_history_df['confirmed'] - (de_history_df['recovered'] + de_history_df['deaths'])
de_by_date = de_history_df.groupby('date').sum()

### German Federal States

In [None]:
de_fed_by_date_all = de_history_df.groupby(['date','label']).sum().reset_index()
#feds = list(de_fed_by_date_all.label.unique())
feds = ['Baden-Württemberg',
 'Bayern',
 'Berlin',
 'Brandenburg',
 'Bremen',
 'Hamburg',
 'Hessen',
 'Mecklenburg-Vorpommern',
 'Niedersachsen',
 'Nordrhein-Westfalen',
 'Rheinland-Pfalz',
 'Saarland',
 'Sachsen',
 'Sachsen-Anhalt',
 'Schleswig-Holstein',
 'Thüringen']

de_fed_by_date = de_fed_by_date_all[de_fed_by_date_all.label.isin(feds)]

## Cases by Federation State

### Absolute Cases by Federation State

In [None]:
wi_df = de.merge(fed_inhabitants_2018,right_on='Bundesland', left_on='NAME_1').drop(['ID_0','ISO','NAME_0','NAME_1', 'ID_1', 'TYPE_1','ENGTYPE_1','NL_NAME_1','VARNAME_1'], axis=1)
with_inhabitants_and_cases_df = wi_df.merge(de_fed_by_date[de_fed_by_date.date==LATEST_DATE], right_on='label', left_on='Bundesland')

### Fed Confirmed/Active Cases per 100000 Inhabitants

In [None]:
PER_INHABITANTS = 100000

In [None]:
confirmed_per_inhabitants = (with_inhabitants_and_cases_df['confirmed']*PER_INHABITANTS)/with_inhabitants_and_cases_df['inhabitants_2018']
active_per_inhabitants = (with_inhabitants_and_cases_df['active']*PER_INHABITANTS)/with_inhabitants_and_cases_df['inhabitants_2018']

# Current State

## Confirmed Cases in the Individual German Federation States

In [None]:
with_inhabitants_and_cases_df['confirmed_per_inhabitants']=confirmed_per_inhabitants
with_inhabitants_and_cases_df['active_per_inhabitants']=active_per_inhabitants
cases_by_bundesland=with_inhabitants_and_cases_df[['Bundesland','confirmed','confirmed_per_inhabitants','active','active_per_inhabitants']]
cases_by_bundesland.sort_values(by=['active_per_inhabitants','active'], ascending=False)

In [None]:
plt.figure(figsize=(14,9))

ax = plt.subplot(121, title = "Absolute Number of Confirmed Cases")
geoplot.choropleth(
    wi_df, hue=cases_by_bundesland.confirmed, 
    scheme = mapclassify.Quantiles(cases_by_bundesland.confirmed, k=7),
    cmap='Oranges', legend=True, ax=ax
);


ax = plt.subplot(122, title = "Confirmed Cases per {} inhabitants".format(PER_INHABITANTS))
geoplot.choropleth(
    wi_df, hue=cases_by_bundesland.confirmed_per_inhabitants, 
    scheme=mapclassify.Quantiles(cases_by_bundesland.confirmed_per_inhabitants, k=7),
    cmap='Oranges',legend=True, ax=ax
);

The plot of the **confirmed** cases per 100000 inhabitants shows that Hamburg and Saarland are actually considerably more affected than the absolute numbers would indicate.

In Hessen and Niedersachsen, the situation seems to be considerably better than the absolute numbers suggest.
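
Why absolute and relative numbers can rank states differently is easiest to see with a small, made-up example. The sketch below uses two hypothetical states (`A` and `B`) with invented figures; only the scaling to 100000 inhabitants mirrors the calculation used in this notebook.

```python
import pandas as pd

# Hypothetical toy figures, NOT real case data.
toy_rates = pd.DataFrame({
    'state': ['A', 'B'],
    'confirmed': [5000, 1000],
    'inhabitants': [10_000_000, 1_000_000],
})

PER = 100_000
# Scale the absolute counts to cases per 100000 inhabitants.
toy_rates['confirmed_per_100k'] = toy_rates['confirmed'] * PER / toy_rates['inhabitants']

# State A leads in absolute numbers, state B leads relative to its population.
most_affected_absolute = toy_rates.sort_values('confirmed', ascending=False).iloc[0]['state']
most_affected_relative = toy_rates.sort_values('confirmed_per_100k', ascending=False).iloc[0]['state']
```

With these invented numbers, state `A` has five times as many confirmed cases, but state `B` has twice as many cases per 100000 inhabitants.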

## Confirmed vs. Active Cases in the Individual German Federation States

`Active Cases = Confirmed Cases - (Recovered Cases + Deaths)`

Let's see how many cases are actually still active, and thus still infectious and potentially in need of hospital treatment.

As the relative case rates per 100000 inhabitants tell us more about the actual burden, we will now compare the relative numbers instead of the absolute ones.
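
As a quick illustration of the active-case formula above, here is a minimal sketch on invented numbers. The state names `A` and `B` and all figures are hypothetical; the column names match the ones used in the data of this notebook.

```python
import pandas as pd

# Hypothetical toy figures, NOT real case data.
toy_active = pd.DataFrame(
    {'confirmed': [100, 250], 'recovered': [60, 100], 'deaths': [5, 10]},
    index=['A', 'B'],
)

# Active = confirmed minus everyone who has recovered or died.
toy_active['active'] = toy_active['confirmed'] - (toy_active['recovered'] + toy_active['deaths'])
```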

In [None]:
plt.figure(figsize=(14,18))

ax = plt.subplot(122, title = "Active Cases per {} inhabitants".format(PER_INHABITANTS))
geoplot.choropleth(
    wi_df, hue=cases_by_bundesland.active_per_inhabitants, 
    scheme=mapclassify.Quantiles(cases_by_bundesland.active_per_inhabitants, k=7),
    cmap='Reds',legend=True, ax=ax
);

ax = plt.subplot(121, title = "Confirmed Cases per {} inhabitants".format(PER_INHABITANTS))
geoplot.choropleth(
    wi_df, hue=confirmed_per_inhabitants, 
    scheme=mapclassify.Quantiles(cases_by_bundesland.confirmed_per_inhabitants, k=7),
    cmap='Oranges',legend=True, ax=ax
);


# New Infections
## New Infections Yesterday
### New Infections Yesterday - Federal States

In [None]:
def highlight_new_max(df):
    num_of_cols = df.size
    if df['new/max %'] >= 100:
        return ['background-color: red']*num_of_cols
    elif df['new/max %'] >= 50:
        return ['background-color: yellow']*num_of_cols
    else:
        return ['']*num_of_cols 

In [None]:
new_1days_by_state=de_fed_by_date.set_index(['date','label']).groupby('label').diff()['confirmed']
new_infections_1days_by_state= pd.DataFrame({'new_infections_1day': new_1days_by_state.groupby('label').last(),'max_new_infections_1day':new_1days_by_state.groupby('label').max()})
new_infections_1days_by_state['new/max %']=new_infections_1days_by_state.new_infections_1day*100/new_infections_1days_by_state.max_new_infections_1day

new_infections_1days_by_state.sort_values(by='new/max %', ascending=False).style.apply(highlight_new_max,axis=1)

In [None]:
plt.figure(figsize=(7,14))

ax = plt.subplot(111, title = "New Infections yesterday/ max new infections 1day ever %")
geoplot.choropleth(
    wi_df.set_index('Bundesland'), hue=new_infections_1days_by_state['new/max %'], 
    cmap='Reds',
    legend=True, 
    legend_kwargs={'orientation': 'horizontal'},  
    ax=ax
);

### New Infections Yesterday - Germany

In [None]:
# new confirmed cases on the latest day (last value of the daily differences)
de_last = float(de_by_date['confirmed'].diff().iloc[-1])
de_max = de_by_date['confirmed'].diff().max()

print ("Germany: Last    Day: ", de_last)
print ("Germany: Max per Day: ", de_max)
print ()
print ("Germany: Last Day/Max per Day %: ", de_last*100/de_max)

## New Infections over the Last 7 Days per State, Absolute and per 100k Inhabitants
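
Before applying it to the real data, the 7-day figure can be sketched on made-up numbers: take the daily differences of the cumulative confirmed cases, sum them over a rolling window of 7 days, and scale the result to 100000 inhabitants. The case counts and the population of 2,000,000 below are hypothetical.

```python
import pandas as pd

# Hypothetical cumulative confirmed cases for 8 consecutive days.
toy_confirmed = pd.Series(
    [1000, 1010, 1030, 1060, 1100, 1150, 1210, 1280],
    index=pd.date_range('2020-04-01', periods=8),
    dtype=float,
)
toy_inhabitants = 2_000_000  # hypothetical population

# Daily new infections, then their sum over the last 7 days.
toy_new_7days = toy_confirmed.diff().rolling(7).sum().iloc[-1]
toy_new_7days_per100k = toy_new_7days * 100_000 / toy_inhabitants
```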

In [None]:
def abs_growth_from(df_by_date):
    return df_by_date.diff().fillna(0)

In [None]:
def severity_colors(c):
    if c >= 200:
        return 'background-color: darkred'
    if c >= 150:
        return 'background-color: red'        
    if c >= 100:
        return 'background-color: orange'    
    if c >= 50:
        return 'background-color: yellow'       
    else:
        return 'background-color: white' 
    
def highlight(df):
    num_of_cols = df.size
    if df.new_infections_7days_per100k < 1.0:
        return ['background-color: yellow']*num_of_cols
    else:
        return [severity_colors(df['new_infections_7days_per100k'])]*num_of_cols 

In [None]:
state_to_new_infections_7days = {}
for fed in feds:
    fed_by_date = de_fed_by_date[de_fed_by_date.label==fed].set_index('date')
    abs_growth_fed_by_date =abs_growth_from(fed_by_date.confirmed)
    abs_growth_fed_by_date_last7days = abs_growth_fed_by_date.rolling(7).sum()
    state_to_new_infections_7days[fed] = list(abs_growth_fed_by_date_last7days.loc[abs_growth_fed_by_date_last7days.index == LATEST_DATE])

new_infections_7days_by_state = pd.DataFrame(state_to_new_infections_7days).transpose()
new_infections_7days_by_state.columns=['new_infections_7days']
new_infections_7days_by_state['new_infections_7days_per100k'] =fed_inhabitants_2018.set_index('Bundesland').merge(new_infections_7days_by_state, left_index=True, right_index=True).apply(lambda r:r['new_infections_7days']*PER_INHABITANTS/r['inhabitants_2018'], axis=1)

nif7=new_infections_7days_by_state.sort_values(by='new_infections_7days_per100k', ascending=False)
nif7.style.apply(highlight, axis=1)        

In [None]:
CUT_MAX=100
new_infections_7days_by_state['new_infections7days_per100k_cut'] = new_infections_7days_by_state['new_infections_7days_per100k'].apply(lambda v:v if v<CUT_MAX else CUT_MAX)

In [None]:
plt.figure(figsize=(14,18))

ax = plt.subplot(221, title = "New Infections last 7 days per {} inhabitants".format(PER_INHABITANTS))
geoplot.choropleth(
    wi_df.set_index('Bundesland'), hue=new_infections_7days_by_state.new_infections_7days_per100k, 
    cmap='Reds',
    legend_kwargs={'orientation': 'horizontal'}, 
    legend=True, 
    ax=ax
);

ax = plt.subplot(222, title = "New Infections last 7 days per {} inhabitants cut to {}".format(PER_INHABITANTS, CUT_MAX))
geoplot.choropleth(
    wi_df.set_index('Bundesland'), 
    hue=new_infections_7days_by_state.new_infections7days_per100k_cut, 
    cmap='Reds',
    legend_kwargs={'orientation': 'horizontal'}, 
    legend=True, 
    ax=ax    
);

# Case Status Quo

## Growth Factor


### Definition
The growth factor on day N is the number of new confirmed cases on day N (confirmed cases on day N minus confirmed cases on day N-1) divided by the number of new confirmed cases on day N-1 (confirmed cases on day N-1 minus confirmed cases on day N-2). See also the notebook [COVID-19-Germany - Case Overview](https://www.kaggle.com/pat777/covid-19-germany-case-overview).

`gf = (new infections on day N) / (new infections on day N-1)`
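
The definition can be checked by hand on a tiny, invented series of cumulative confirmed cases. This is only a sketch of the formula; the numbers are hypothetical, and unlike the helper functions further below it leaves the first differences as `NaN` instead of filling them with 0.

```python
import pandas as pd

# Hypothetical cumulative confirmed cases for 5 consecutive days.
gf_confirmed = pd.Series(
    [100, 110, 130, 170, 190],
    index=pd.date_range('2020-03-01', periods=5),
    dtype=float,
)

# New infections per day: NaN, 10, 20, 40, 20
gf_new_infections = gf_confirmed.diff()

# Growth factor: new infections today / new infections the day before.
gf_example = gf_new_infections / gf_new_infections.shift(1)
```

In this toy series the new infections double twice (growth factor 2.0) and then halve on the last day (growth factor 0.5).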

### Concept
The growth factor is related to the **reproductive number `R`** of the virus.

![How the virus spreads with R0=2](https://cdn.newsapi.com.au/image/v1/665b7257930b5c7c472ab0d03743dc42?width=650)

The idea behind `R` is that in a world without any measures against the virus (e.g. social distancing), the number of infections grows with a basic reproductive number `R0`. For Corona we have `R0 ~ 2.3`, which means that one person infects on average 2.3 other persons during his/her infection.

Over time, more people in the population have already been infected and cannot be infected again (which hopefully also applies to Corona). In addition, the German government ordered social distancing, so the effective reproductive number `Re` decreases.

**When `Re==1`, we have reached the inflection point:**<br>
This is the point at which one person infects only one other person during his/her infection. Now the number of active cases stays nearly constant (formula: `active cases = confirmed - (recovered + deaths)`), so that e.g. hospital capacity would not be overloaded (if the rate of intensive care patients remains stable).

When `Re < 1`, the epidemic stops spreading.
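
The three regimes (`Re > 1`, `Re == 1`, `Re < 1`) can be illustrated with a deliberately simplified sketch. It assumes a constant reproductive number and counts new cases per generation of infections; it ignores generation time, immunity and reporting delays, and the function name `cases_per_generation` is made up for this illustration.

```python
def cases_per_generation(r, generations, initial=1.0):
    """New cases in each generation under a constant reproductive number r.

    Simplified model: every case infects exactly r others in the next generation.
    """
    return [initial * r ** n for n in range(generations)]


growing = cases_per_generation(2.0, 5)    # Re > 1: cases double each generation
stable = cases_per_generation(1.0, 5)     # Re == 1: cases stay constant
shrinking = cases_per_generation(0.5, 5)  # Re < 1: cases halve each generation
```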

**Note:**<br>
For a more detailed explanation of the reproductive number and of how an epidemic can spread and end, please refer to this excellent [article in the Washington Post](https://www.washingtonpost.com/graphics/2020/health/coronavirus-how-epidemics-spread-and-end/).


In [None]:
def rolling_mean(days, df_confirmed_by_date, fed):
    fed_confirmed_by_date=df_confirmed_by_date.loc[df_confirmed_by_date.label==fed,['date','confirmed']].set_index('date')
    fed_confirmed_by_date['rolling_mean'] = fed_confirmed_by_date.rolling(days).mean()    
    return fed_confirmed_by_date

def growth_factor_from(df, shift):
    growth_confirmed = df
    return (growth_confirmed/ growth_confirmed.shift(shift))

def plot_growth_factor(growth_factor_df, max_growth=10, details=""):
    ax = growth_factor_df.plot(figsize=(12,6), linestyle=':', linewidth=2, title='Growth Factor Confirmed (2nd derivative):' + details, ylim=(0.9,max_growth))
    ax.axhline(y=1, color='red', linestyle='--')
    plt.grid(True)
    plt.ylabel('Delta today/Delta day before');
    return ax    

#### Note
As it can take up to 7 days (or even more) in Germany until the case data are collected by the Robert Koch Institute (RKI), we use a rolling mean of the absolute growth over 7 days to calculate the growth factor.
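
The effect of the smoothing can be sketched on an invented series of daily new infections with irregular reporting (for example, days with no reports at all). All numbers are hypothetical; the sketch smooths the daily new infections directly, which should be equivalent to differencing a rolling mean of the cumulative cases once the window is full.

```python
import pandas as pd

# Hypothetical daily new infections for two weeks; the second week doubles the first.
toy_daily_new = pd.Series(
    [10, 0, 5, 20, 10, 15, 10, 20, 0, 10, 40, 20, 30, 20],
    index=pd.date_range('2020-04-01', periods=14),
    dtype=float,
)

# Raw day-over-day growth factor: jumps wildly and is infinite after a zero day.
raw_gf = toy_daily_new / toy_daily_new.shift(1)

# 7-day rolling mean of the new infections, then the growth factor on the smoothed series.
smoothed_new = toy_daily_new.rolling(7).mean()
smoothed_gf = smoothed_new / smoothed_new.shift(1)
```

On the last day the raw growth factor is 20/30, while the smoothed one is 140/130, which reflects the underlying upward trend much better.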

#### Sorted Growth Factors of the Individual Federation States

In [None]:
# collect latest growth factors per state
state_to_gf = {}
for fed in feds:
    fed_confirmed_by_date = rolling_mean(7, de_fed_by_date, fed)
    gf = growth_factor_from(abs_growth_from(fed_confirmed_by_date), 1)
    state_to_gf[fed] = list(gf.loc[gf.index == LATEST_DATE, 'rolling_mean'])

gfs_by_state = pd.DataFrame(state_to_gf).transpose()
gfs_by_state.columns=['gf']
gfs_by_state.sort_values(by='gf', ascending=False)

In [None]:
wgf_df = de.merge(gfs_by_state, right_index=True, left_on='NAME_1').drop(['ID_0','ISO','NAME_0', 'ID_1', 'TYPE_1','ENGTYPE_1','NL_NAME_1','VARNAME_1'], axis=1)

In [None]:
def min_max(v):
    min_v = v.min() if v.min()<1 else 0.99
    max_v = v.max() if v.max()>1 else 1.01
    return (min_v, max_v)

In [None]:
min_v, max_v = min_max(wgf_df.gf)

plt.figure(figsize=(14,14))
divnorm = colors.TwoSlopeNorm(vmin=min_v,vcenter=1.0,vmax=max_v)

ax = plt.subplot(111, title = "Growth Rate")
geoplot.choropleth(
    wgf_df, hue=wgf_df.gf,     
    cmap='coolwarm', norm = divnorm, legend=True, ax=ax
);

In [None]:
# collect latest growth factors per state
state_to_gf4 = {}
for fed in feds:
    fed_confirmed_by_date = rolling_mean(4, de_fed_by_date, fed)
    gf4 = growth_factor_from(abs_growth_from(fed_confirmed_by_date), 4)
    # print (fed, gf4)
    state_to_gf4[fed] = list(gf4.loc[gf4.index == LATEST_DATE, 'rolling_mean'])

gf4s_by_state = pd.DataFrame(state_to_gf4).transpose()
gf4s_by_state.columns=['gf4']
gf4s_by_state.sort_values(by='gf4', ascending=False)

## Growth Factor 4 == R<sub>eff</sub> of RKI


### Definition
see [Corona-Pandemie: Die Mathematik hinter den Reproduktionszahlen R](https://www.heise.de/newsticker/meldung/Corona-Pandemie-Die-Mathematik-hinter-den-Reproduktionszahlen-R-4712676.html)

`gf4 = Reff = (new infections on the 4 days([N-3,N])/4) / (new infections on the 4 days([N-7,N-4])/4)`
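
A minimal sketch of this formula on invented numbers: since both windows are divided by 4, the ratio of the two 4-day means equals the ratio of the two 4-day sums. The daily figures below are hypothetical.

```python
import pandas as pd

# Hypothetical daily new infections for 8 days: days N-7..N-4, then days N-3..N.
reff_daily_new = pd.Series([10, 10, 10, 10, 12, 12, 12, 12], dtype=float)

# Sum over the current 4-day window and over the window 4 days earlier.
four_day_sum = reff_daily_new.rolling(4).sum()
r_eff_example = four_day_sum / four_day_sum.shift(4)

r_eff_latest = r_eff_example.iloc[-1]  # 48 / 40
```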

## R<sub>eff</sub> for the Individual Federation States

In [None]:
wgf4_df = de.merge(gf4s_by_state, right_index=True, left_on='NAME_1').drop(['ID_0','ISO','NAME_0', 'ID_1', 'TYPE_1','ENGTYPE_1','NL_NAME_1','VARNAME_1'], axis=1)

In [None]:
min_v, max_v = min_max(wgf4_df.gf4)

plt.figure(figsize=(14,14))
divnorm = colors.TwoSlopeNorm(vmin=min_v,vcenter=1.0,vmax=max_v)

ax = plt.subplot(111, title = "R eff")
geoplot.choropleth(
    wgf4_df, hue=wgf4_df.gf4,     
    cmap='coolwarm', norm = divnorm, legend=True, ax=ax
);

## R<sub>eff</sub> for Germany
(registered in federation states only)

In [None]:
DAYS_BEFORE=4

In [None]:
abs_growth_de = abs_growth_from(de_fed_by_date.groupby('date').sum()['confirmed'])

In [None]:
abs_growth_de_4days = abs_growth_de.rolling(4).sum()
abs_growth_de_4days.iloc[-1]/abs_growth_de_4days.iloc[-1-DAYS_BEFORE]

# R<sub>7</sub>: 7-days-R of RKI

### Definition

The 7-day R compares the 7-day mean of new cases on a given day with the 7-day mean **four** days earlier.

`gf7 = (new infections on the 7 days([N-6,N])/7) / (new infections on the 7 days([N-10,N-4])/7)`
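
The same idea as a small sketch with invented numbers. Note that the two windows `[N-6,N]` and `[N-10,N-4]` overlap by three days, and that the division by 7 cancels out, so the ratio of the sums is enough. The daily figures are hypothetical.

```python
import pandas as pd

# Hypothetical daily new infections for 11 days (N-10..N): 7 days of 10, then 4 days of 20.
r7_daily_new = pd.Series([10.0] * 7 + [20.0] * 4)

# Sum over the current 7-day window and over the window shifted back by 4 days.
seven_day_sum = r7_daily_new.rolling(7).sum()
r7_example = seven_day_sum / seven_day_sum.shift(4)

r7_latest = r7_example.iloc[-1]  # 110 / 70
```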

In [None]:
state_to_gf7 = {}
for fed in feds:
    fed_by_date = de_fed_by_date[de_fed_by_date.label==fed].set_index('date')
    abs_growth_fed_by_date =abs_growth_from(fed_by_date.confirmed)
    abs_growth_fed_by_date_last7days = abs_growth_fed_by_date.rolling(7).sum()
    state_to_gf7[fed] = abs_growth_fed_by_date_last7days.iloc[-1]/abs_growth_fed_by_date_last7days.iloc[-1-DAYS_BEFORE]

gf7s_by_state = pd.DataFrame(state_to_gf7, index=['gf7']).transpose().sort_values(by='gf7', ascending=False)
gf7s_by_state

In [None]:
wgf7_df = de.merge(gf7s_by_state, right_index=True, left_on='NAME_1').drop(['ID_0','ISO','NAME_0', 'ID_1', 'TYPE_1','ENGTYPE_1','NL_NAME_1','VARNAME_1'], axis=1)

In [None]:
min_v, max_v = min_max(wgf7_df.gf7)

plt.figure(figsize=(14,14))
divnorm = colors.TwoSlopeNorm(vmin=min_v,vcenter=1.0,vmax=max_v)

ax = plt.subplot(111, title = "R7")
geoplot.choropleth(
    wgf7_df, hue=wgf7_df.gf7,     
    cmap='coolwarm', norm = divnorm, legend=True, ax=ax
);

## R<sub>7</sub> for Germany
(registered in federation states only)

In [None]:
abs_growth_de_7days = abs_growth_de.rolling(7).sum()
abs_growth_de_7days.iloc[-1]/abs_growth_de_7days.iloc[-1-DAYS_BEFORE]

# Links

For a more detailed geographical case analysis, please visit
- the [Robert Koch-Institut: COVID-19-Dashboard](https://experience.arcgis.com/experience/478220a4c454480e823b17327b2bf1d4)
- the [Infektionskarte of the Berliner Morgenpost](https://interaktiv.morgenpost.de/corona-virus-karte-infektionen-deutschland-weltweit/)

My further analysis notebooks can be found at
- [COVID-19-Germany - Case Overview](https://www.kaggle.com/pat777/covid-19-germany-case-overview/)
- [COVID-19-Berlin](https://www.kaggle.com/pat777/covid-19-berlin)