## Solutions

### Problem 1
* Task description / introduction to SIR model: see [this notebook](SIRmodel.ipynb)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint

In [None]:
def F(y, t, N, c, w):
  "return the derivatives defining the differential equations of the SIR model, y = (S, I, R)"
  S, I, R = y
  dS = -c*S/N*I
  dI =  c*S/N*I -w*I
  dR =           w*I
  return dS, dI, dR
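
For reference, the system implemented by `F` can be written out as follows. This is read directly off the code; interpreting $c$ as a contact (infection) rate and $w$ as a recovery rate is an assumption based on how the two constants enter the equations.

$$
\frac{dS}{dt} = -\frac{c\,S\,I}{N}, \qquad
\frac{dI}{dt} = \frac{c\,S\,I}{N} - w\,I, \qquad
\frac{dR}{dt} = w\,I
$$

The three right-hand sides sum to zero, so $S + I + R = N$ stays constant over time.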

In [None]:
# solve system of ODEs
def SolveSIRAndPlot(F, y0, ts, args):
    # solve ODE
    S, I, R = odeint(F, y0, ts, args).T
    # plot result
    plt.plot(ts, S, label = "susceptible")
    plt.plot(ts, I, label = "infected")
    plt.plot(ts, R, label = "recovered")
    plt.xlabel("time [days]")
    plt.ylabel("individuals [1]")
    plt.legend()
    plt.show()

In [None]:
# time steps [assume days as time unit]
def TimeSteps(t_min, t_max, dt):
    return np.arange(t_min, t_max+dt, dt)

In [None]:
# initial conditions and constants
y0 = (999, 1, 0) # (S, I, R) at t = 0 (note: odeint works with floats although individual counts are integers)
N  = sum(y0)
c  = 0.10
w  = 0.05

In [None]:
print("r =", c/w, "(basic reproduction number)")
if c/w > N/y0[0]: print("Outbreak")
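
The outbreak test follows from the equation for the infected: `dI = (c*S/N - w)*I` is positive at `t = 0` exactly when `c/w > N/S0`. A small sketch that checks this equivalence for two parameter sets; `initial_growth` is a hypothetical helper, not part of the notebook, and the second parameter pair is made up for contrast:

```python
def initial_growth(S0, I0, N, c, w):
    """Return dI/dt at t = 0 for the SIR model (hypothetical helper)."""
    return c * S0 / N * I0 - w * I0

S0, I0, N = 999, 1, 1000
for c, w in [(0.10, 0.05), (0.05, 0.10)]:
    outbreak = c / w > N / S0                      # criterion used in the notebook
    growing = initial_growth(S0, I0, N, c, w) > 0  # sign of dI/dt at t = 0
    print(f"c={c}, w={w}: outbreak={outbreak}, infections growing={growing}")
```

Both columns should agree for any choice of positive `c`, `w` and `I0`.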

In [None]:
ts = TimeSteps(t_min = 0, t_max = 500, dt = 1)

In [None]:
SolveSIRAndPlot(F, y0, ts, args = (N, c, w))
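
Two quick sanity checks on the numerical solution, sketched with the same parameters as above: the total population should stay constant, and the infected curve should peak where `S` crosses `N*w/c` (the point where `dI = 0`). The function `sir_rhs` is a renamed copy of `F` so that the block is self-contained.

```python
import numpy as np
from scipy.integrate import odeint

def sir_rhs(y, t, N, c, w):
    """Right-hand side of the SIR model, same equations as F above."""
    S, I, R = y
    return -c*S/N*I, c*S/N*I - w*I, w*I

y0 = (999, 1, 0)
N, c, w = sum(y0), 0.10, 0.05
ts = np.arange(0, 501, 1)
S, I, R = odeint(sir_rhs, y0, ts, args=(N, c, w)).T

# check 1: S + I + R stays equal to N (up to integration error)
print("max deviation from N:", np.abs(S + I + R - N).max())

# check 2: at the peak of I, S should be close to N*w/c
i_peak = I.argmax()
print("peak day:", ts[i_peak], "| S at peak:", S[i_peak], "| N*w/c:", N*w/c)
```

Because the solution is only sampled once per day, `S` at the peak matches `N*w/c` only approximately.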

To get a feeling for how the model parameters affect the time evolution, we can use [``ipywidgets``](https://ipywidgets.readthedocs.io/) in Jupyter notebooks. It lets us add interactive sliders for the model parameters and calls a function whenever a slider is moved:

In [None]:
from ipywidgets import interact
interact(lambda c: c**2, c=(0, 0.5, 0.05));

In [None]:
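%matplotlib inline
from ipywidgets import interact, FloatSlider, IntSlider
# continuous_update is an option of the slider widgets, not of interact(),
# so it is set on explicit sliders here (initial values: the constants from above)
interact(lambda c, w, t_max: 
             SolveSIRAndPlot(F, y0, TimeSteps(0, t_max, 5), 
                             args = (N, c, w)), 
         c=FloatSlider(min=0, max=0.5, step=0.05, value=c, continuous_update=False),
         w=FloatSlider(min=0, max=0.5, step=0.05, value=w, continuous_update=False),
         t_max=IntSlider(min=500, max=2500, step=500, value=500, continuous_update=False));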

### Problem 2

This is an open-ended project task. Here, we will just look at some basics to get you started on your own investigations.

In [None]:
### same code as before...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from requests_cache import CachedSession
session = CachedSession('../data/cache_plot_corona_cases', backend='sqlite', expire_after = 86400)

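def SaveURL(url, path):
  # fetch first, so that a failed request neither truncates the file nor writes None
  text = GetURL(url)
  if text is None:
    raise RuntimeError(f"Could not download {url}")
  with open(path, "w") as outf:
    outf.write(text)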
    
def GetURL(url):
  response = session.get(url)
  print(f"Loaded {url} from cache: {response.from_cache}")
  if response.status_code != 200:
    print(f"Request failed with code {response.status_code}")
    return None
  else:
    return response.text

def LoadDataset(url, local_path, date_column, cases_column):
  SaveURL(url, local_path)
  df = pd.read_csv(local_path, parse_dates = [date_column])\
         .set_index(date_column)
  print("Last data point:")
  print(df.tail(1)[cases_column])
  return df

# JHU, documentation: https://github.com/owid/covid-19-data/tree/master/public/data#readme
dfj = LoadDataset("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/jhu/new_cases.csv", 
                  "/tmp/new_cases.csv",
                  "date",
                  "Germany")

Let's pick the numbers for Germany:

In [None]:
df = dfj[["Germany"]]

First issue with the data: sometimes the number of new cases per day is negative.

Reason given in README: 
> Note: the number of cases or deaths reported by any institution—including JHU, the WHO, the ECDC and others—on a given day does not necessarily represent the actual number on that date. This is because of the long reporting chain that exists between a new case/death and its inclusion in statistics. This also means that negative values in cases and deaths can sometimes appear when a country corrects historical data, because it had previously overestimated the number of cases/deaths. Alternatively, large changes can sometimes (although rarely) be made to a country's entire time series if JHU decides (and has access to the necessary data) to correct values retrospectively.

In [None]:
df.describe()

Fortunately, pandas provides a handy solution:

In [None]:
df.clip(lower = 0).describe()

What is the problem with this?

In [None]:
df.sum() - df.clip(lower = 0).sum()

We're artificially inflating the sum: clipping throws away the negative corrections, which is why the difference above is negative. But wait... that's today's number...

In [None]:
df.index[df["Germany"] < 0]

It seems only the last value is wrong, so this is probably not too much of a concern.
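
A toy example (made-up numbers, not the JHU data) of what clipping does to a total: every negative correction that is set to zero raises the sum by its magnitude, so the clipped series overcounts.

```python
import pandas as pd

# toy daily counts; the -30 stands for a retrospective correction (made-up numbers)
cases = pd.Series([100, 120, -30, 90])

raw_total = cases.sum()                    # 100 + 120 - 30 + 90
clipped_total = cases.clip(lower=0).sum()  # the -30 is replaced by 0

print("raw total:    ", raw_total)
print("clipped total:", clipped_total)
print("overcount:    ", clipped_total - raw_total)
```

If the cumulative total matters, keeping the negative values (or spreading the correction over earlier days) preserves it; if only the shape of the daily curve matters, clipping is usually harmless.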

Second issue with the data: day-to-day fluctuations, which are unlikely to be real.

In [None]:
# simplest solution: rolling average (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html)
df = df.assign(rolling = df["Germany"].rolling("7d", center = True).mean())
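
Why a 7-day window in particular: if the fluctuations follow a weekly rhythm, a centered 7-day mean covers each weekday exactly once and the rhythm cancels. A minimal sketch on synthetic data (the weekly pattern is made up; a centered window with a time offset such as `"7d"` assumes a reasonably recent pandas version):

```python
import numpy as np
import pandas as pd

# synthetic data: constant level of 100 cases/day times a made-up weekly pattern
pattern = np.array([1.2, 1.1, 1.0, 1.0, 0.9, 0.5, 0.3])
index = pd.date_range("2021-01-04", periods=28, freq="D")
toy = pd.Series(100 * np.tile(pattern, 4), index=index)

smooth = toy.rolling("7d", center=True).mean()

# away from the edges every window contains each weekday exactly once,
# so the smoothed values are all equal to the mean level
print(smooth.iloc[3:-3].round(6).unique())
print("mean level:", round(100 * pattern.mean(), 6))
```

Near the start and end of the series the window is incomplete, so the last few smoothed values should be read with care.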

In [None]:
df.tail(20)

In [None]:
# plot
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [16, 8]
plt.plot(df["Germany"], "b", alpha = 0.4)
plt.plot(df["rolling"], "b")

In [None]:
df = df.assign(weekday = df.index.weekday)
df

In [None]:
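# ISO dates avoid the day/month ambiguity of strings like "01.11.2020", which may be parsed month-first
df.loc["2020-11-01":"2021-04-30"].plot.scatter("weekday", "Germany", alpha = 0.2);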

(Hard to see anything here, as the numbers vary a lot even within a subset.)

In [None]:
# ratio of each day's count to the centered 7-day mean (1 means "exactly average")
df = df.assign(avg = df["Germany"] / df["rolling"])

In [None]:
df.tail(20)

In [None]:
df.plot.scatter("weekday", "avg", alpha = 0.2);
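
One possible next step is to condense the scatter plot into a single number per weekday, for example the median ratio. The sketch below uses a synthetic stand-in for `df` (the weekly pattern and the column names `cases` and `ratio` are made up for illustration); with the real data the same `groupby` can be applied to the `avg` column.

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the notebook's df (made-up weekly pattern, Monday first)
pattern = np.array([1.2, 1.1, 1.0, 1.0, 0.9, 0.5, 0.3])
index = pd.date_range("2021-01-04", periods=56, freq="D")  # 2021-01-04 is a Monday
toy = pd.DataFrame({"cases": 100 * np.tile(pattern, 8)}, index=index)

toy = toy.assign(rolling=toy["cases"].rolling("7d", center=True).mean(),
                 weekday=toy.index.weekday)
toy = toy.assign(ratio=toy["cases"] / toy["rolling"])

# median ratio per weekday (0 = Monday ... 6 = Sunday)
weekday_effect = toy.groupby("weekday")["ratio"].median()
print(weekday_effect)
```

The median is used instead of the mean so that a few extreme days (such as large retrospective corrections) do not dominate the estimate.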