# A Model of Prevalence for COVID-19

**This worksheet presents hypothetical mathematical models of COVID-19, and it is too early to draw conclusions about which one represents the current epidemic. In all cases you should follow the advice of public health authorities to stay at home (for everyone who can) and self-isolate if you have any symptoms. For authoritative advice see: https://www.nhs.uk/conditions/coronavirus-covid-19/**

Author

    George Danezis
    University College London
    Twitter: @gdanezis
    Web: http://www0.cs.ucl.ac.uk/staff/G.Danezis/
   
Code and data, as well as the Jupyter notebook, are available here:
https://github.com/gdanezis/COVID-Prevalence
    
I use here the time series of reported outcomes from COVID-19, namely recoveries and deaths in different countries, to estimate the prevalence of the virus, as well as to project its growth. Under most scenarios a significant fraction of the population will be infected in the next 4-6 weeks, unless the latest public health measures slow its growth. However, there are scenarios in which the current apparent high fatality rate is explained by association rather than causation. So it is possible that COVID-19 is highly infectious (particularly within hospitals) but does not cause significant fatalities (CFR < 0.1%). Of course other scenarios with a CFR of about 1% (high) are also possible.

In [1]:
import sys
from analytical import *

# The model

We are trying to infer three key variables:
* Prevalence $p$.
* Testing rate $f$ of mild or asymptomatic carriers.
* Total infected population that recovers, $R_p$ (whether tested or not).

We are given as data:
* The number of deaths $D$ that tested positive with COVID.
* The number of recoveries $R$ that tested positive with COVID.
* The total population $P$.

We have to make assumptions about:
* The Case Fatality Rate (CFR) due to COVID $\text{CFR}_c$.
* The CFR due to other reasons with similar symptoms to COVID $\text{CFR}_o$. We set it to $0.01/100$ (annual mortality of 1% * 1/12 months * ~1/10 serious conditions that may be confused with COVID).
* The increase in risk to be infected with COVID if in a serious condition $\mu$. We usually set $\mu=10$, namely being in a serious medical condition leading to death (for other reasons) also exposes a patient to the equivalent of x10 the risk of catching COVID, due to hospital infection rates.
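
Spelled out, the arithmetic behind the default value of $\text{CFR}_o$ given in the list above is as follows (only the rounding to $0.01/100$ is made explicit here; nothing new is assumed):
\begin{equation}
\text{CFR}_o \approx 0.01 \cdot \frac{1}{12} \cdot \frac{1}{10} \approx 8.3 \times 10^{-5} \approx \frac{0.01}{100}
\end{equation}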

The system of equations we need to solve is:
* The hospital prevalence $p_h$:
\begin{equation}
p_h = 1 - (1 - p)^{\mu}
\end{equation}
* The actual deaths due to COVID. All deaths minus the ones due to other causes that still tested positive for COVID:
\begin{equation}
D_a = \max{(D - p_h \cdot \text{CFR}_o \cdot P, 1)}
\end{equation}
* The prevalence is the fraction of the population that has been infected, namely the actual recoveries plus the deaths with COVID, divided by the total population:
\begin{equation}
p = (R_p + D) / P
\end{equation}
* The observed recoveries are a fraction of the actual recoveries, determined by the testing rate.
\begin{equation}
R = f \cdot R_p
\end{equation}
* The definition of the Case Fatality Rate is the number of deaths due to the virus, divided by the number of infections (the sum of all who died with the virus and those who recovered).
\begin{equation}
\text{CFR}_c = \frac{D_a}{D + R_p}
\end{equation}

This system of equations is non-linear, but we can solve it numerically using iterative methods.
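
As a minimal sketch of one such iterative method (an illustration, not necessarily what the `analytical` module does): since $p = (R_p + D) / P$ gives $D + R_p = p \cdot P$, the CFR definition reduces to a single equation in $p$, namely $\text{CFR}_c \cdot p \cdot P = \max{(D - p_h \cdot \text{CFR}_o \cdot P, 1)}$. The left side grows with $p$ and the right side never does, so bisection on $[0, 1]$ finds the root. The function name `solve_prevalence`, its parameters, and the input numbers are hypothetical and chosen only for illustration.

```python
def solve_prevalence(D, R, P, cfr_covid, cfr_other=0.0001, mu=10.0, iters=100):
    """Solve the model equations for p by bisection (illustrative sketch).

    D, R: reported deaths and recoveries; P: total population.
    cfr_covid, cfr_other, mu: the assumed CFR_c, CFR_o and hospital factor.
    """

    def covid_deaths(p):
        # Hospital prevalence p_h, then deaths actually caused by COVID (D_a).
        p_h = 1.0 - (1.0 - p) ** mu
        return max(D - p_h * cfr_other * P, 1.0)

    def residual(p):
        # Zero when CFR_c * p * P equals D_a; increasing in p.
        return cfr_covid * p * P - covid_deaths(p)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid

    p = 0.5 * (lo + hi)
    D_a = covid_deaths(p)
    R_p = p * P - D                       # from p = (R_p + D) / P
    f = R / R_p if R_p > 0 else float("nan")  # from R = f * R_p
    return {"p": p, "D_a": D_a, "R_p": R_p, "f": f}


if __name__ == "__main__":
    # Made-up inputs for illustration only (not taken from the data set).
    result = solve_prevalence(D=6000, R=7000, P=60_000_000,
                              cfr_covid=0.01, mu=5.0)
    for name, value in result.items():
        print(f"{name}: {value:.6g}")
```

Bisection is used here because the residual is monotone in $p$, which makes the search robust even where the $\max$ makes the equation non-smooth.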

### Limitations

* The testing rate $f$ assumes totally random testing in the population that is not a fatality. However, most countries do use some symptoms, or at least self-selection, as a gate for testing. Therefore a lower rate of testing can explain the observed recovered cases, and the actual rate of testing may be a factor of 5-10 lower than the one reported here (depending on how well symptoms guide testing).
* We assume all deaths either caused by or associated with COVID are tested, and reported in $D$. However, it is not clear that health authorities are testing dead people, and many cases resulting in fatalities may not have been reported.
* The Case Fatality Rate $\text{CFR}_c$ measures the fatalities **caused** by COVID-19, rather than the ones merely associated with COVID. The raw data about recoveries and deaths can only be used directly to estimate the latter (association), since it is not clear whether a fatality is due to COVID or something else (but the patient also tested positive). As a result the CFR we estimate can be much lower than in other studies, since a lot of deaths may simply be due to other causes (evidenced by high comorbidity, and potentially already high prevalence in some places).




# Results for different $\text{CFR}_c$ 

### Discussion for CFR=1%

A CFR of around 1%-2% is an estimate that was feared early on from experiences in China and elsewhere. However, given this CFR the prevalence in South Korea is so small that the testing rate should be close to 31%. In fact we know that about 1-in-170 people have been tested there (huge, but not 31%), weakening the evidence for such a CFR. Other testing rates also seem an order of magnitude off. Comorbidity figures are much lower than those reported from Italy.

In [2]:
# Since we measure prevalence based on outcomes, the figures lag by about 20 days.
CFR_covid = 0.01 # CFR medium high: 1%
hospital_infection_mult = 5.0
make_table(populations, CFR_covid, hospital_infection_mult, flx=sys.stdout)

Assumptions: COVID CFR:  1.00% In Hospital factor: x5.0
Country           Prev      CFR  Testing   Comorb.    Infected
--------------------------------------------------------------
Japan            0.00%    1.00%    5.94%    4.76%        3,958
USA              0.02%    1.00%    0.61%    4.76%       48,721
Germany          0.01%    1.00%    3.91%    4.76%       11,591
Italy            0.96%    1.00%    1.30%    4.68%      573,208
Spain            0.45%    1.00%    1.61%    4.72%      208,073
Belgium          0.07%    1.00%    4.84%    4.76%        8,293
Switzerland      0.13%    1.00%    1.18%    4.75%       11,121
Iran             0.21%    1.00%    4.90%    4.74%      170,794
Korea, South     0.02%    1.00%   30.27%    4.76%       10,460
United Kingdom   0.05%    1.00%    0.43%    4.76%       31,571
Netherlands      0.12%    1.00%    0.01%    4.75%       20,074
France           0.12%    1.00%    2.71%    4.75%       81,054


### Discussion for CFR=0.1%

A CFR of 0.1% is on the low side, and lower than the ones estimated by most studies. In fact it would put COVID-19 on par with seasonal viruses in terms of fatality rate. Such a CFR would require Italy and Spain to have had a single-digit percentage of their populations infected in early March, which means that by now (end of March) about 50% of the population must have had COVID-19 (if the increase continues at a similar rate, see the projection section below). The testing rate for South Korea, and others, is still too large (1-in-20 rather than 1-in-170).
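
To make the comparison of testing rates concrete, the short sketch below converts the modelled South Korean testing rates from the result tables in this section (30.27% at CFR=1%, 4.28% at CFR=0.1%) into 1-in-N form and compares them with the roughly 1-in-170 quoted in the text. The helper name `one_in_n` is made up for this illustration.

```python
def one_in_n(rate):
    """Convert a testing rate (fraction of people tested) to N in '1-in-N'."""
    return 1.0 / rate


# About 1-in-170 people tested in South Korea, as quoted in the text.
observed_rate = 1.0 / 170.0

# Modelled testing rates for South Korea, copied from the result tables.
modelled = {"CFR=1%": 0.3027, "CFR=0.1%": 0.0428}

for label, rate in modelled.items():
    ratio = rate / observed_rate
    print(f"{label}: 1-in-{one_in_n(rate):.1f}, {ratio:.1f}x the observed rate")
```

Under these numbers the CFR=1% scenario overshoots the observed testing rate by a factor of roughly 50, and the CFR=0.1% scenario by a factor of roughly 7.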

In [3]:
# Since we measure prevalence based on outcomes, the figures lag by about 20 days.
CFR_covid = 0.001 # CFR low: 0.1%
hospital_infection_mult = 5.0
make_table(populations, CFR_covid, hospital_infection_mult, flx=sys.stdout)

Assumptions: COVID CFR:  0.10% In Hospital factor: x5.0
Country           Prev      CFR  Testing   Comorb.    Infected
--------------------------------------------------------------
Japan            0.02%    0.10%    0.84%   33.32%       27,962
USA              0.11%    0.10%    0.09%   33.29%      344,391
Germany          0.10%    0.10%    0.55%   33.29%       81,931
Italy            7.00%    0.10%    0.18%   30.30%    4,229,650
Spain            3.22%    0.10%    0.22%   31.92%    1,500,361
Belgium          0.52%    0.10%    0.68%   33.10%       58,780
Switzerland      0.92%    0.10%    0.17%   32.92%       79,031
Iran             1.50%    0.10%    0.69%   32.67%    1,218,232
Korea, South     0.14%    0.10%    4.28%   33.27%       73,959
United Kingdom   0.34%    0.10%    0.06%   33.18%      223,499
Netherlands      0.83%    0.10%    0.00%   32.96%      142,571
France           0.86%    0.10%    0.38%   32.95%      575,753


### Discussion for CFR=0.01%

This is a negligible Case Fatality Rate, and as a result most deaths with COVID are due to other reasons rather than the COVID virus. Consequently the comorbidity rates are high (>90%), which is compatible with what was observed in Italy. This scenario would mean that Italy has by now long since passed the >60% herd immunity threshold, and we should be seeing the tail end of the epidemic soon.
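
The comorbidity figures in the tables can be checked by hand in the low-prevalence limit. The sketch below assumes that the Comorb. column is the share of reported deaths not caused by COVID, $(D - D_a) / D$ (the text does not define the column, so this is an assumption), and uses $p_h \approx \mu \cdot p$ for small $p$, which gives a share of $\mu \cdot \text{CFR}_o / (\text{CFR}_c + \mu \cdot \text{CFR}_o)$. The function name `comorbidity_limit` is hypothetical.

```python
def comorbidity_limit(cfr_covid, cfr_other, mu):
    """Share of deaths from other causes when prevalence is small.

    Assumes p_h ~ mu * p, so deaths from other causes scale as
    mu * cfr_other and deaths caused by COVID scale as cfr_covid.
    """
    return mu * cfr_other / (cfr_covid + mu * cfr_other)


# (CFR_c, hospital factor) for the three scenarios discussed in this section.
scenarios = [(0.01, 5.0), (0.001, 5.0), (0.0001, 10.0)]

for cfr_covid, mu in scenarios:
    share = comorbidity_limit(cfr_covid, cfr_other=0.0001, mu=mu)
    print(f"CFR_c={cfr_covid:.2%}, mu={mu:g}: comorbidity limit {share:.2%}")
```

These limits (about 4.76%, 33.33% and 90.91%) agree with the low-prevalence rows of the three tables. Rows with higher prevalence, such as Italy, fall below the limit because $p_h$ grows more slowly than $\mu \cdot p$.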

In [4]:
# Since we measure prevalence based on outcomes, the figures lag by about 20 days.
CFR_covid_low = 0.0001 # CFR very low: 0.01%
hospital_infection_mult = 10.0
make_table(populations, CFR_covid_low, hospital_infection_mult, flx=sys.stdout)

Assumptions: COVID CFR:  0.01% In Hospital factor: x10.0
Country           Prev      CFR  Testing   Comorb.    Infected
--------------------------------------------------------------
Japan            0.03%    0.01%    0.62%   90.90%       38,186
USA              0.14%    0.01%    0.06%   90.86%      472,266
Germany          0.14%    0.01%    0.40%   90.86%      112,317
Italy           16.65%    0.01%    0.07%   83.43%    10,066,389
Spain            5.30%    0.01%    0.14%   88.79%    2,471,375
Belgium          0.72%    0.01%    0.49%   90.64%       82,301
Switzerland      1.32%    0.01%    0.12%   90.41%      113,058
Iran             2.22%    0.01%    0.47%   90.06%    1,799,738
Korea, South     0.20%    0.01%    3.12%   90.84%      101,616
United Kingdom   0.47%    0.01%    0.04%   90.73%      310,069
Netherlands      1.18%    0.01%    0.00%   90.46%      202,948
France           1.23%    0.01%    0.27%   90.44%      820,873


# Projection forward

Since the data used to estimate prevalence relates to outcomes, we know that the estimate lags behind by the typical time it takes to reach an outcome. From studies we consider this to be about 20 days. We therefore build a projection of the prevalence based on a simple model:
* The increase in prevalence $dp$ follows the difference equation below. Its solution is a logistic curve, and 0.6 represents 60% of the population being infected (after which herd immunity kicks in).
\begin{equation}
dp = (p \cdot r - p) \cdot (0.6 - p)
\end{equation}
* The rate of growth $r$ is computed based on the 3 previous days of growth.
* We assume a low CFR (0.01%).
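
A minimal sketch of this projection is given here. It iterates the difference equation once per day. Estimating $r$ as the geometric mean of the daily growth factor over the last 3 days is an assumption (the text only says that the 3 previous days of growth are used), and the function names and the starting series are made up for illustration.

```python
def growth_rate(prevalence, window=3):
    """Average daily growth factor over the last `window` days (geometric mean)."""
    return (prevalence[-1] / prevalence[-1 - window]) ** (1.0 / window)


def project(p0, r, days, cap=0.6):
    """Iterate dp = (p * r - p) * (cap - p) for `days` daily steps."""
    path = [p0]
    for _ in range(days):
        p = path[-1]
        dp = (p * r - p) * (cap - p)
        path.append(p + dp)
    return path


# Made-up prevalence estimates for four consecutive days (illustration only).
history = [0.0010, 0.0012, 0.00144, 0.001728]
r = growth_rate(history)
path = project(history[-1], r, days=60)
print(f"r = {r:.3f}, prevalence after 60 days = {path[-1]:.3f}")
```

The projected prevalence rises roughly exponentially while $p$ is small and then flattens as it approaches the 0.6 cap.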

The resulting projections are:

![COVID prevalence projections for key countries](figures/All-prev.png)

The influence of $\text{CFR}_c$ is not dramatic: it only shifts the prevalence curves by about 2 weeks. For the United Kingdom, the plot below shows a range of $\text{CFR}_c$ in $[0.01\%, 1\%]$:

![COVID prevalence projections for the United Kingdom](figures/United%20Kingdom-prev.png)

The same plot for Italy:

![COVID prevalence projections for Italy](figures/Italy-prev.png)