-
Notifications
You must be signed in to change notification settings - Fork 0
/
cases_day.Rmd
193 lines (156 loc) · 6.62 KB
/
cases_day.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
---
title: "Registrated cases per day"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
<style type="text/css">
body, td {
font-size: 20px;
}
code.r{
font-size: 16px;
}
pre {
font-size: 12px
}
</style>
# The data
The data is taken from the website of the RIVM: https://www.rivm.nl/coronavirus-covid-19/grafieken.
The figure below shows the daily number of registrated covid-19 cases from 2020/2/27 until 2020/6/25
```{r,include=FALSE}
source("HM+utill.r")
Gm <- read.csv("gemelde.csv")
Gm$Datum <- seq(as.Date("2020/2/27"), as.Date("2020/6/25"), "days")
Gm$Week <- as.numeric(format(Gm$Datum,"%W"))
Gm$cases <- Gm$nieuw+Gm$gisteren
Gm$Ccases <- c(NA,cumsum(Gm$cases)[-length(Gm$cases)])
Gm$Cprev14 <- stepback(14,Gm$cases)##
##
Gm <- Gm[-1,]
```
```{r,echo=FALSE}
ggplot(Gm,aes(x=Datum,y=cases))+
geom_step(color="darkred")+
labs(title="Number of registrated (GGD) covid-19 cases",y="Number of cases",x="Date")+
theme_bw()
```
The reproductive power function is calculated conditional on the number of infected in the previous 14 days.
So the reproductive power function can be seen as the probability that an infected from the 14 previous days, produces a new on.
The estimated reproductive power function is:
```{r,echo=FALSE,warning=FALSE}
fit <- glm(cbind(cases,Cprev14)~factor(Datum)-1,family=binomial,data=Gm)
Gm$Rp_14 <- exp(coef(fit))/(1+exp(coef(fit)))
##
rp_mat <-matrix(NA,nrow=119,ncol=1000)
for (i in 1:1000){
cas <-rnbinom(119,size=Gm$Cprev14,prob=1-Gm$Rp_14)
fits <- glm(cbind(cas,Cprev14)~factor(Datum)-1,
family=binomial,data=Gm)
rp <- exp(coef(fits))/(1+exp(coef(fits)))
rp_mat[,i] <-rp
}
##
plot(c(1,120),c(0,.7),pch="",xlab="Day",ylab="Probability")
title("Reproductive power function per infected of the last 14 days")
for (i in 1:1000){
lines(1:119,rp_mat[,i],col="gray",type="s")
}
lines(1:119,Gm$Rp_14,col="red",type="s")
##
##
```
The beginning is quite messy since the conditioning is on less than 14 days. There is also a clear weekend effect as it is in the raw data.
The gray lines are 1000 parametric bootstrap lines to show the uncertainty. There seems to roughly 4 periods: the first 30 days, between 30 an 50 days, between 50 and 80 days and the last part. Approximately between day 80 an 110 the reproductive power seem to rise again.
# A Markov-switching model
Models with 2 until 11 states were fitted to the data. Due to the large dispersion in the time series of the covid-19 data it was difficult to find stable solutions. Nevertheless reasonable estimates were obtained.
The model with 8 states fitted the data best according to Akaike's Information Criterion (AIC). This large number of states might be explained by the large dispersion in the data especially the weekend effects might have an influence.
In the figure below the reproductive power function (blue line) and the path through the the most likely hidden states from the 8 state model (red line) are shown. As can be seen the estimated states follow the reproductive power function quit accurately. That this number of states is needed might be due to the volatility in the time series. However this model might also overfit the data.
```{r,echo=FALSE,warning=FALSE}
##
##=====================fit 4-state HMM ===============================
##
Prev <- Gm$Cprev14
x <-Gm$cases
##
##=====================fit 4-state HMM ===============================
##
##=====================fit 4-state HMM ===============================
##
m <-4
pi0<-c(.15,.05,.1,.1)
gamma0<-matrix(
c(
0.9,0.1,0.05,0.05,
0.2,0.4,0.1,0.3,
0.05,0.05,0.7,0.2,
0.1,0.1,0.1,0.7
),m,m,byrow=TRUE)
mod4s<-HMM.mle(x,m,pi0,gamma0,stationary=TRUE)
#delta0<-c(1,1,1,1)/4
#mod4h<-HMM.mle(x,m,pi0,gamma0,delta=delta0,stationary=FALSE)
#mod4s
#mod4h
##
##=====================fit 8-state HMM ===============================
##
m<-8
pi0<-c(.20,.10,.05,.05,.05,.05,.1,.05)
gamma0<-matrix(
c(
0.40,0.02,0.02,0.02,0.40,0.02,0.02,0.10,
0.02,0.40,0.50,0.02,0.02,0.02,0.01,0.01,
0.10,0.20,0.20,0.02,0.40,0.02,0.03,0.03,
0.01,0.05,0.01,0.60,0.01,0.30,0.01,0.01,
0.10,0.05,0.30,0.03,0.40,0.03,0.05,0.04,
0.01,0.01,0.01,0.15,0.01,0.80,0.03,0.03,
0.01,0.01,0.01,0.01,0.03,0.90,0.02,0.01,
0.90,0.04,0.01,0.01,0.01,0.01,0.01,0.01
),m,m,byrow=TRUE)
mod8s<-HMM.mle(x,m,pi0,gamma0,stationary=TRUE)
#delta0<-c(0.1,.1,.1,0.1,0.1,.1,.3,.2)
#mod8h<-HMM.mle(x,m,pi0,gamma0,delta=delta0,stationary=FALSE)
#mod8s
#mod8h
##
vit8 <- HMM.viterbi(x,mod8s)
vit4 <- HMM.viterbi(x,mod4s)
Gm$state8 <-mod8s$pi[vit8]
Gm$state4 <-mod4s$pi[vit4]
ggplot(Gm,aes(x=Datum,y=Rp_14))+
geom_step(color="blue")+
geom_step(aes(y=state8),color="red")+
labs(title="Reproductive power function and the path through \n the the most likely hidden states from the 8 state model",y="Probabilit",x="Date")+
geom_vline(xintercept = as.numeric(Gm$Datum[15]),linetype="dashed",
color = "gray", size=1)+
theme_bw()
```
In the figure below the decoded path for the 8-state model is shown again (blue line). Besides some going-up and going-down in this path one might roughly recognize 4 periods. So the decoded path for the 4-state model is also calculated and shown in the figure below (red line). The gray vertical line indicates the first 14 days.
```{r,echo=FALSE}
ggplot(Gm,aes(x=Datum,y=state8))+
geom_step(color="blue")+
geom_step(aes(y=state4),color="red")+
labs(title="Path through the the most likely hidden states",y="Probabilit",x="Date")+
geom_vline(xintercept = as.numeric(Gm$Datum[15]),linetype="dashed",
color = "gray", size=1)+
theme_bw()
```
The 4 state levels are: state 1 with a level of 0.15, state 2 with a level of 0.08, state 3 with a level of 0.05 and state 4 with a level of 0.03.
Using the 4-state model one might come up withe the following periods:
28-2 until 27-3: outbreak state with state level 1;
28-3 until 18-4: switching between state 2 and 3
19-4 until 19-5: switching between state 3 and 4
20-5 until 13-6: switching between state 2 and 3 (except the first of june which had state 4 )
18-6 until the end: switching between state 3 and 4
The figure below shows the reproductive power function with the uncertainty (parametric bootstrap lines) and the most likely state probability path from the 4 state model.
```{r,echo=FALSE,warning=FALSE}
vit <- HMM.viterbi(x,mod4s)
state4 <-mod4s$pi[vit]
plot(c(1,120),c(0,.7),pch="",xlab="Day",ylab="Probability")
for (i in 1:1000){
lines(1:119,rp_mat[,i],col="gray",type="s")
}
lines(1:119,Gm$Rp_14,col="red",type="s")
lines(1:119,state4,col="blue",type="s")
```