# Regime shifts in the COVID-19 pandemic

COVID-19 is a particularly dangerous disease for the health-care system:

* The SARS-CoV-2 virus is as infectious as flu.
* A significant share of infected patients require hospitalisation.
* Many patients have an asymptomatic form of the disease that remains undetected.
* The spread of the disease follows an exponential law with a high basic reproduction number.  

For these reasons it is extremely important to know the basic reproduction number $R_0$. It can vary from day to day. We use the shorthand $\alpha_i$ for the basic reproduction number on the $i$-th day.
However, due to the similarity with the common cold and flu, a large number of cases remain undetected. Therefore, a reliable estimate of the reproduction number can be obtained only from 
* the number of hospitalised patients
* the number of recorded deaths.

These observations come with a time lag:  
* On average, a patient is hospitalised 7-14 days after infection.
* On average, death occurs 14-19 days after infection.

Nevertheless, these are the only objective measurements we can obtain across countries, as different countries have different testing procedures.


The average mortality for COVID-19 is 3% if you compare reported deaths with reported cases.
The true mortality is expected to be lower, as a large number of cases remain unreported.
It is important to note that the mortality can vary a lot depending on how overloaded the medical system is. 
In this study, we are interested in how much the initial prevention measures influence the basic reproduction number. 
As most European countries were swift to act, the overload is not significant in this timeframe.  


## I.  Basic notation and background knowledge

Consider the observations of the $i$-th day: 
* Let $\alpha_i$ be the basic reproduction number. 
* Let $x_i$ be the true number of infected individuals.
* Let $y_i$ be the number of recorded hospitalisations.
* Let $z_i$ be the number of recorded deaths.
* Let $u_i$ be the indicator for a potential regime shift.

The regime indicator $u_i$ is set to one if a regime shift is assumed to be possible on the $i$-th day:
* Its exact definition depends on how you formalise background knowledge. 
* For instance, assume that a new prevention measure was implemented on the $i$-th day and you think that it takes up to three days to enforce it. Then you should set $u_i=u_{i+1}=u_{i+2}=1$. 
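
The rule for setting $u_i$ can be sketched in a few lines of Python. The function name `regime_indicator`, its parameters and the example dates are hypothetical and only illustrate the three-day enforcement window from the example; they are not part of any dataset.

```python
from datetime import date

def regime_indicator(first_day, n_days, measure_dates, enforcement_days=3):
    """Return a list u where u[i] = 1 if a regime shift is considered possible on day i.

    first_day        : calendar date of index 0 of the time series
    measure_dates    : dates on which a new prevention measure was introduced
    enforcement_days : assumed number of days needed to enforce a measure
    """
    u = [0] * n_days
    for start in measure_dates:
        offset = (start - first_day).days
        for k in range(enforcement_days):
            i = offset + k
            if 0 <= i < n_days:          # ignore days outside the series
                u[i] = 1
    return u

# Made-up example: a measure introduced on the 4th day of a 10-day series.
u = regime_indicator(date(2020, 3, 1), 10, [date(2020, 3, 4)])
print(u)
```

Days that fall outside the observed series are silently dropped, which is one possible convention among several.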

As infections, hospitalisations and deaths are out of sync, we need to consolidate them. The most naive way is to align hospitalisations and deaths with the infection time using average time delays:

* We assume that a patient is hospitalised after 12 days on average. 
* We assume that a death occurs after 17 days on average.
* These numbers are not substantiated by real evidence.
* Graphs of the Wuhan outbreak seem to indicate a 12-day lag for hospitalisation.  
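
A minimal sketch of this naive alignment, assuming the two series are plain Python lists indexed by reporting day. The helper name `align_to_infection_day` and the synthetic numbers are illustrative assumptions, not real data.

```python
def align_to_infection_day(hospitalised, deaths, hosp_lag=12, death_lag=17):
    """Shift both series to the left so that index i refers to the infection day.

    Returns (y, z), truncated to the common length on which both are defined.
    """
    y = hospitalised[hosp_lag:]      # drop the first 12 reporting days
    z = deaths[death_lag:]           # drop the first 17 reporting days
    n = min(len(y), len(z))
    return y[:n], z[:n]

# Synthetic series for illustration only.
hospitalised = list(range(100, 130))   # 30 daily values
deaths = list(range(0, 30))            # 30 daily values
y, z = align_to_infection_day(hospitalised, deaths)
print(len(y), y[0], z[0])
```

Note that the shift shortens the usable series by the larger of the two lags, so the most recent infection days have no aligned observations yet.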


Hospitalisation and death rates determine the variability of observations $y_i$ and $z_i$:

* We assume that 3% of infected individuals die.
* We assume that 15-20% of infected individuals are hospitalised.
* These rates can be much lower due to asymptomatic infections.
* This is not a problem as we can define $x_i$ as the number of non-asymptomatic infections. 

## II. First order Hidden Markov Model

### Emission probabilities

* If we shift $y_i$ by 12 days and $z_i$ by 17 days to the left, we get a setting where $x_i$ determines $y_i$ and $z_i$.

* The state of the Hidden Markov model is a tuple $(x_i, \alpha_i, \ldots)$ where $\alpha_i$ is the current reproduction number.

* Let $p_y$ and $p_z$ be the probabilities of hospitalisation and death. Then 

\begin{align*}
y_i&\sim Binomial(x_i, p_y)\\
z_i&\sim Binomial(x_i, p_z)\enspace
\end{align*}

* According to the background knowledge we can set $p_y=20\%$ and $p_z=3\%$.
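
The emission probabilities can be evaluated in log space with the standard library alone. The sketch below assumes that $y_i$ and $z_i$ are conditionally independent given $x_i$ (the text gives two separate binomial formulas but does not state this explicitly) and that a non-integer $x_i$ is rounded to the nearest integer. The function names are hypothetical.

```python
import math

def log_binomial_pmf(k, n, p):
    """Log of Pr[Binomial(n, p) = k] for 0 < p < 1; -inf if k is impossible."""
    if k < 0 or k > n:
        return -math.inf
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log1p(-p)

def log_emission(x, y, z, p_y=0.20, p_z=0.03):
    """Log-probability of y hospitalisations and z deaths given x infections.

    Assumption: y and z are independent given x, so the log-probabilities add.
    """
    n = round(x)                     # x may be non-integer after multiplying by alpha
    return log_binomial_pmf(y, n, p_y) + log_binomial_pmf(z, n, p_z)

# With 100 infections, 20 hospitalisations are far more plausible than 60.
print(log_emission(100, 20, 3), log_emission(100, 60, 3))
```

Working with logarithms avoids numerical underflow when many days are multiplied together during inference.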


### State transitions and extra evidence

Let $(x_i, \alpha_i, s_i)$ be the state of the Hidden Markov model, where $s_i$ is the level of containment and mitigation measures.
Most countries have used three levels of mitigation measures:

* no restrictions (0)
* social distancing (1)
* hard lockdown (2)

Note that we do not know when the mitigation measures kick in, as an unknown amount of time is needed to enforce these measures in real life. 
Thus, we must model the process with the following state diagram:

<img src = 'illustrations/one-way-three-state-model.png' width=100%>

From basic assumptions we know that $s_1=0$ and $s_n=2$.
The exact value of $\rho$ is irrelevant as all valid state sequences will get the same probability. 
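
To see why $\rho$ drops out, assume (this is our reading of the diagram, which is not spelled out in the text) that on each of the $n-1$ daily steps the mitigation level either stays the same with probability $1-\rho$ or rises by one with probability $\rho$, regardless of the current level. A valid sequence starts at $s_1=0$ and ends at $s_n=2$, so it contains exactly two upward steps and $n-3$ stay steps, whichever days the changes fall on:

\begin{align*}
\Pr[s_2,\ldots,s_n \mid s_1=0] &= \rho^{2}\,(1-\rho)^{n-3}
\end{align*}

This value does not depend on the days of the two changes, so it is a common factor that cancels when valid sequences are compared.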

In the simplest case we assume that in each state the basic reproduction number is constant and $\alpha_i$ can change only if the state of mitigation measures changes.
This leads to the following evaluation rules:
* The mitigation state changes with the following probability

\begin{align*}
\Pr[s_{i+1}=s_i]&=1-\rho\\
\Pr[s_{i+1}=s_i+1]&=\rho
\end{align*}

* When the mitigation state does not change

\begin{align*}
s_{i+1}&=s_i\\
\alpha_{i+1}&=\alpha_i\\
x_{i+1}&=\alpha_{i}\cdot x_{i}
\end{align*}

* When the mitigation state changes 

\begin{align*}
s_{i+1}&=s_i+1\\
\Pr[\alpha_{i+1}]&=const\\
x_{i+1}&=\alpha_{i}\cdot x_{i}
\end{align*}
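
The evaluation rules above can be sketched as a function that lists the possible next states with their probabilities. The name `successors` and the three-value grid are hypothetical. The sketch also assumes that $\alpha_{i+1}$ is drawn uniformly from a finite grid when the level changes, and that no upward move exists from level 2, so the probabilities listed for level 2 sum to $1-\rho$ rather than one.

```python
def successors(state, rho, alpha_grid):
    """Enumerate (next_state, probability) pairs for one daily step.

    state is a tuple (x, alpha, s). The number of infected always grows by
    the current alpha; alpha is redrawn uniformly from alpha_grid only when
    the mitigation level s increases.
    """
    x, alpha, s = state
    x_next = alpha * x                                   # x_{i+1} = alpha_i * x_i
    result = [((x_next, alpha, s), 1.0 - rho)]           # mitigation level unchanged
    if s < 2:                                            # levels are 0, 1, 2
        share = rho / len(alpha_grid)                    # Pr[alpha_{i+1}] = const
        for new_alpha in alpha_grid:
            result.append(((x_next, new_alpha, s + 1), share))
    return result

# Illustrative call with a coarse, made-up grid.
for next_state, prob in successors((100.0, 1.25, 0), rho=0.1, alpha_grid=[1.0, 1.25, 1.5]):
    print(next_state, round(prob, 4))
```

Because $x_{i+1}$ is a deterministic function of the previous state, the only randomness in a step is whether the level changes and which new $\alpha$ is chosen.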

### Initial probabilities

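The initial mitigation state is $s_1=0$, and the initial reproduction number $\alpha_1$ and the initial number of infected individuals $x_1$ have constant probabilities:
\begin{align*}
\Pr[\alpha_1]&=const\\
\Pr[x_1]&=const
\end{align*}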


### What do these constant probabilities mean

* We use constant probabilities to be maximally impartial about the changes. 
* In practice one should use a grid of values with step size $0.05$ instead of continuous values.
* According to our background knowledge the basic reproduction number is in the range $[1.0, 1.5]$. 
* The initial number of infected patients depends on the point from which you start the timeseries.
* In most cases, the initial number of infected patients is in the range $x_1\in[1, 1000]$.
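
A minimal sketch of such a discretisation, assuming the step size $0.05$ applies to the reproduction number and that every integer in $[1, 1000]$ is a candidate for $x_1$. The helper names are hypothetical, and a coarser grid for $x_1$ may be preferable in practice.

```python
import math

def make_grid(low, high, step):
    """Evenly spaced values low, low + step, ..., high (rounded to avoid float drift)."""
    n_steps = round((high - low) / step)
    return [round(low + k * step, 10) for k in range(n_steps + 1)]

def uniform_log_prior(values):
    """Assign the same log-probability to every candidate value."""
    log_p = -math.log(len(values))
    return {v: log_p for v in values}

alpha_values = make_grid(1.0, 1.5, 0.05)     # candidate reproduction numbers
x1_values = list(range(1, 1001))             # candidate initial infection counts
log_prior_alpha = uniform_log_prior(alpha_values)
log_prior_x1 = uniform_log_prior(x1_values)
print(alpha_values)
```

Since the priors are constant, they shift the log-probability of every path by the same amount and do not affect which path is the most probable.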





## III.  Inference

Note that we formalised the problem in such a way that the mechanics of the Hidden Markov model are fixed and we need to find only the hidden states of the model:

* We can use decoding to find the most probable path and corresponding parameters $x_i, \alpha_i, s_i$.

* We can use belief propagation to estimate marginal probabilities and find the maximising parameters $x_i, \alpha_i, s_i$.
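
Decoding can be sketched with a generic log-space Viterbi routine for a finite state set. In the full model each state index would stand for one grid triple $(x_i, \alpha_i, s_i)$. The toy example below uses only the three mitigation levels with made-up emission likelihoods, and the optional `final_state` argument is our way of enforcing $s_n=2$; all names are hypothetical.

```python
import math

def viterbi(log_init, log_trans, log_emit, final_state=None):
    """Most probable state path of a finite HMM, computed in log space.

    log_init[k]     : log Pr[first state = k]
    log_trans[j][k] : log Pr[next state = k | current state = j]
    log_emit[t][k]  : log Pr[observation at time t | state = k]
    final_state     : if given, the path is forced to end in this state
    """
    n_states = len(log_init)
    score = [log_init[k] + log_emit[0][k] for k in range(n_states)]
    back = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for k in range(n_states):
            best_j = max(range(n_states), key=lambda j: score[j] + log_trans[j][k])
            pointers.append(best_j)
            new_score.append(score[best_j] + log_trans[best_j][k] + log_emit[t][k])
        score = new_score
        back.append(pointers)
    # Trace back from the chosen (or best) final state.
    state = final_state if final_state is not None else max(range(n_states), key=lambda k: score[k])
    best_log_prob = score[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_log_prob

# Toy example: one-way chain over mitigation levels 0 -> 1 -> 2.
NEG_INF = -math.inf
stay, move = math.log(0.9), math.log(0.1)
log_init = [0.0, NEG_INF, NEG_INF]                       # s_1 = 0
log_trans = [[stay, move, NEG_INF],
             [NEG_INF, stay, move],
             [NEG_INF, NEG_INF, stay]]
favoured = [0, 0, 1, 1, 2]                               # made-up evidence per day
log_emit = [[math.log(0.9 if k == f else 0.05) for k in range(3)] for f in favoured]
path, log_prob = viterbi(log_init, log_trans, log_emit, final_state=2)
print(path)
```

The running time is quadratic in the number of states per day, so the grid sizes chosen for $x_i$ and $\alpha_i$ directly determine whether exact decoding is feasible.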



## IV. Available data 

As the situation is changing, we provide only links to the data:
* [Johns Hopkins University Center for Systems Science and Engineering dataset](https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases)
* [European Centre for Disease Prevention and Control dataset](https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide)
* [BIIT dataset for Estonian COVID-19 cases](https://docs.google.com/spreadsheets/d/1nGRqoWD6B8zXqBE7ftW2DG5sX9HNTu5FMoehBygLdg0/edit#gid=0)
* [University of Oxford containment measures dataset](http://epidemicforecasting.org/containment)
* [European Union travel restrictions](https://ec.europa.eu/transport/coronavirus-response_en)
* [Detailed information of events in www.worldometers.info](https://www.worldometers.info/coronavirus/)


## V. Kaplan–Meier survival graphs and higher order chains

* Shifting the timeseries for hospitalisation and death is a very crude measure.
* A Kaplan–Meier survival graph allows us to estimate how many patients are alive after $x$ days.
* If the Kaplan–Meier survival graph is given from the infection date, we can easily model the impact $x_i$ has on $z_{i+j}$.
* A similar graph can be built for the hospitalisation data.
* If we keep $x_i, x_{i-1}, \ldots, x_{i-k}$ in the hidden state, we can compute a much more refined estimate for the emission probabilities.   
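
As a rough illustration of how a survival curve spreads the effect of $x_i$ over later days, the sketch below turns a curve into per-day death probabilities and accumulates expected deaths. It assumes that $x_i$ counts the people infected on day $i$ and that the curve is measured from the infection date. The function name and the survival values are made up, not estimates.

```python
def expected_daily_deaths(infections, survival):
    """Expected deaths per day when deaths are spread over several days.

    infections[i] : number of people infected on day i
    survival[j]   : fraction of infected people still alive j days after
                    infection, with survival[0] == 1.0
    """
    # Probability of dying exactly j days after infection, for j = 1, 2, ...
    death_prob = [survival[j - 1] - survival[j] for j in range(1, len(survival))]
    expected = [0.0] * len(infections)
    for i, x in enumerate(infections):
        for j, p in enumerate(death_prob, start=1):
            if i + j < len(expected):        # ignore deaths beyond the observed window
                expected[i + j] += x * p
    return expected

# Made-up curve: 2% die on day 2 and 1% on day 3 after infection.
expected = expected_daily_deaths([100, 0, 0, 0], [1.0, 1.0, 0.98, 0.97])
print(expected)
```

In a higher-order model these expected counts would replace the single shifted $x_i$ when evaluating the emission probability of $z_i$.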



# Homework

## 6.1 Basic model for COVID-19 (<font color ='red'>5p</font>)

Implement the basic change detection algorithm described above and analyse the data from
* China
* Italy 
* Iran
* Germany
* Spain
* United Kingdom 
* Sweden

Visualise the results. Do different countries have similar basic reproduction numbers? 

You get <font color ='red'>3p</font> for implementing the inference and <font color ='red'>2p</font> for visualisation and interpretation.

## 6.2 Model with variable reproduction number for COVID-19 (<font color ='red'>3p+2p</font>)

Improve the model so that small fluctuations in the basic reproduction number are tolerated inside a block. Define the probability model yourself and justify it.
Redo the analysis and visualise the results in a similar manner as in the previous exercise.
You get up to <font color ='red'>2p</font> extra points if you manage to use $u_i$ in the model.

## 6.3 Higher-order Hidden Markov model for COVID-19 (<font color ='red'>5-10p</font>)

Improve the model so that $x_i$ can contribute to many observations $y_i$ and $z_i$.
You get <font color ='red'>5p</font> if you use some ad hoc estimate for survival to show that such an analysis can be carried out.
You get <font color ='red'>10p</font> if you find some preliminary survival estimates and use them.