-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathREADME.Rmd
141 lines (105 loc) · 8.88 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
#out.width = "100%",
dpi=300,
fig.height=4
)
```
# NobBS: Nowcasting by Bayesian Smoothing
<!-- badges: start -->
<!-- badges: end -->
NobBS is Bayesian approach to estimate the number of occurred-but-not-yet-reported cases from incomplete, time-stamped reporting data for disease outbreaks. NobBS learns the reporting delay distribution and the time evolution of the epidemic curve to produce smoothed nowcasts in both stable and time-varying case reporting settings. For details, see the pre-print by [McGough et al. 2020](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007735).
## Installation
You can install the released version of NobBS from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("NobBS")
```
The development version of NobBS can be installed with:
``` r
library(devtools)
install_github("sarahhbellum/NobBS")
```
NobBS **requires** that [JAGS](https://mcmc-jags.sourceforge.io/) (Just Another Gibbs Sampler) is downloaded to the system or computer on which NobBS will run. Download JAGS here: https://mcmc-jags.sourceforge.io/.
## Usage
NobBS accepts a line list of case onset and case reporting dates as the input data, and users are required to specify:
* `data`: A **data frame** containing the line list data
* `now`: The date at which the nowcast is to be performed (of class `Date`, e.g. `as.Date()`)
* `units`: The temporal unit of reporting, either "1 day" or "1 week"
* `onset_date`: A string indicating the `Date` column containing the date of case onset
* `report_date`: A string indicating the `Date` column containing the date of case report
For example, in the data `denguedat`, cases are reported on a weekly basis:
```{r example}
library(NobBS)
data(denguedat)
head(denguedat)
```
For this dengue report line list, the user would specify `units`="1 week", `onset_date`="onset_week", and `report_date`="report_week" to produce a nowcast for the Date `now` of choice.
The user may optionally specify arguments such as:
* `moving_window`: The numeric size of the moving window used in the estimation of cases. The default option is `NULL`, i.e. no moving window, to consider all possible historical dates
* `max_D`: The numeric maximum delay D to consider in the estimation of cases, in the same units as `units`. For example, if cases are known to be 100% reported within the first 10 weeks, then it would be reasonable to set `max_D` = 9 weeks. Even if there are longer delays in the data, the delay probability for long delays would be modeled as $\Pr(Delay \geq 9 weeks)$. Note that it is not possible to set the maximum delay to be longer than what is possible to observe in the data (e.g. one cannot set `max_D`=15 in a 12-week time series). By default, `max_D` is set equal to either:
+ the length of the reporting time series minus 1, e.g. if 27 weeks of reporting data are provided, `max_D`=26.
+ the length of `moving_window`-1, if `moving_window` is specified.
* `cutoff_D`: A logical argument (default: `TRUE`) indicating whether to ignore delays larger than `max_D`. For example, for a `moving_window` of 27 weeks and a `max_D` of 10 weeks, `cutoff_D = TRUE` would indicate that cases reported with an 11+ week delay would be ignored. Conversely, setting `cutoff_D=FALSE` would model delays longer than 10 weeks within the moving window as $\Pr(27 weeks > Delay \geq 10 weeks)$.
* `proportion_reported`: A decimal between $(0,1]$ indicating the expected proportion reported. Default=1, meaning that 100% of cases are expected to be eventually reported. However, this may not be the case in all disease settings; for example, if the disease contributes to a number of asymptomatic cases that will not lead to detection by the health system, or if severe under-reporting is expected during a large outbreak. In these cases, the nowcast estimates will be inflated by 1/`proportion_reported`.
### Bayesian parameters
The user may specify whether underlying cases occur as part of a Poisson or a Negative Binomial process (Default: Poisson), and may specify the priors on the log-linear model as described in [McGough et al. 2020](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007735), where default values are set to be weakly informative. In addition, arguments governing the JAGS model MCMC sampling may be specified: number of chains (`nChains`, default=1), number of samples (`nSamp`, default=10,000), number of iterations discarded as burn-in (`nBurnin`, default=1000), number of thinned iterations (`nThin`, default=1), and adaptation period (`nAdapt`, default=1000). The percentile of the prediction interval may be specified with `conf` (default = 0.95 for a 95% prediction interval).
For recommendations on priors and moving windows, see the *Materials and Methods* sub-section in [McGough et al. 2020](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007735) entitled "NobBS R package implementation and recommendations."
## Model output
Altogether, a simple model may be run by specifying only the required arguments:
`NobBS(denguedat, as.Date("1990-10-01"),units="1 week",onset_date="onset_week",report_date="report_week")`. The user should **save the output to an R object** because many levels of information are stored in the output.
```{r}
test_nowcast <- NobBS(data=denguedat, now=as.Date("1990-10-01"),
units="1 week",onset_date="onset_week",report_date="report_week")
```
NobBS returns the list `test_nowcast`, which contains the following elements:
1. `estimates`, accessed via `test_nowcast$estimates`. This is a 5-column data frame containing case estimates for each date in the historical time series through `now`, with corresponding date of case onset, lower and upper bounds of the prediction interval, and the number of cases for that onset date reported up to `now`.
2. `estimates.inflated`, accessed via `test_nowcast$estimates.inflated`. This is a 5-column data frame containing case estimates for each date in the historical time series through `now`, *inflated by the proportion_reported*, with corresponding date of case onset, lower and upper bounds of the prediction interval, and the number of cases for that onset date reported up to `now`.
3. `nowcast.post.samples`, accessed via `test_nowcasts$nowcast.post.samples`. This is a vector of 10,000 samples from the posterior predictive distribution of the nowcast itself.
4. `params.post`, accessed via `test_nowcast$params.post`. This is a data frame containing 10,000 posterior samples for all parameters of the Bayesian model, including $\alpha_{t=now}$, {$log(\beta_d)$,...,$log(\beta_{max D})$ }, $\lambda_{t,d}$, and $\tau^2_\alpha$. This data frame can get increasingly large as the length of the time series of available data increases, so the user can control which parameter's posterior samples are outputted in `specs$param_names` (Default: all). See NobBS help file for details of `specs`.
```{r}
tail(test_nowcast$estimates)
```
## Plotting results
The NobBS output can be used in a variety of ways- for example, to inspect the posterior distribution of the nowcast estimate or to plot the nowcast and "hindcasts" (estimates produced for weeks in the past).
### Inspecting the posterior distribution of key parameters
Posterior of the nowcast:
```{r, message=FALSE, warning=FALSE, fig.width=7}
library(ggplot2)
nowcast_posterior <- data.frame(sample_estimate=test_nowcast$nowcast.post.samps)
ggplot(nowcast_posterior, aes(sample_estimate)) +
geom_histogram(binwidth=10) +
theme_bw() +
xlab("Cases (number)") +
ggtitle("Samples from the posterior distribution of the nowcast for Oct 1, 1990")
```
Posterior of $\beta_{d=0}$ (the estimated probability of a case being reported with no delay):
```{r, message=FALSE, warning=FALSE, fig.width=7}
library(ggplot2)
library(dplyr)
beta_0 <- data.frame(prob=exp(test_nowcast$params.post$`Beta 0`))
ggplot(beta_0, aes(prob)) +
geom_histogram() +
theme_bw() +
xlab("Probability of delay=0") +
ggtitle("Posterior distribution of the probability of delay d=0,
estimated over the period Jan 1, 1990 to Oct 1, 1990")
```
### Plotting the sequence of nowcast and hindcasts
```{r, fig.width=7}
nowcasts <- data.frame(test_nowcast$estimates)
ggplot(nowcasts) + geom_line(aes(onset_date,estimate,col="Nowcast estimate"),linetype="longdash") +
geom_line(aes(onset_date,n.reported,col="Reported to date"),linetype="solid") +
scale_colour_manual(name="",values=c("indianred3","black"))+
theme_classic()+
geom_ribbon(fill="indianred3",aes(x = onset_date,ymin=nowcasts$lower,
ymax=nowcasts$upper),alpha=0.3)+
xlab("Case onset date") + ylab("Estimated cases") +
ggtitle("Observed and predicted number of cases \nat the week of nowcast (Oct 1990) and weeks prior")
```