The epidemiological model is implemented in the odin
DSL, and compiled using odin.dust
. The code can be found in inst/odin/carehomes.R
.
The model can be run using the dust
package interface, given all necessary parameters and initial conditions. Typically however, sircovid
internally uses dust
and mcstate
to provide functions to fit parameters, and simulate the model to create forecasts or counterfactuals.
sircovid
implements a state-space model (also known as a compartmental model) for the transmission of COVID-19 in the UK, starting from 2020-Jan-01. The model, at its core, is an SEIR model. On top of this we have added age structure, hospitalisation pathways, seroprevalence testing and a parallel epidemic in carehomes.
{ width = 10cm}
Individuals are initially susceptible () in age groups , typically in 17 five year bands up to 80+. Two additional bands are used to separately model care-home workers, and care-home residents. Individuals are infected (into ) with rate which is calculated from age-specific mixing of infected individuals. Individuals may be asymptomatic , develop mild symptoms or influenza-like illness (ILI) . Individuals recover into - presently we do not model reinfection. Some proportion of those from will develop severe symptoms, in which case they may:
Of those in general hospital beds, two parallel pathways for confirmed and unconfirmed cases exist, along which individuals may:
Those triaged into ICU beds may:
Seroconversion in all exposed individuals is modelled by an independent set of states for pre-seroconversion , and levels which trigger a positive test or below this giving a negative test .
Transitions between all states use random draws from binomial distributions with set probability and the number of individuals in the partition .
The package also includes a model known as the ‘basic’ model, which is an older version of the model used in development. It is a subset of the above model states, which doesn’t model the states for:
Only deaths and number of ICU beds occupied can be used to fit the model to data.
This model has its own dust helper objects (sircovid::basic_compare()
, sircovid::basic_index()
, sircovid::basic_initial()
, and sircovid_parameters()
). This model can be fitted using the same techniques discussed in this vignette, but we do not show the detail here.
The ‘comparison function’ used by mcstate
calculates the likelihood of the model state given observed values. This can take daily counts of:
At any time-point, any of this data may be missing NA
. See sircovid::carehomes_compare()
for details.
Here we will demonstrate how death data from the Office of National Statistics (ONS) and a single seroprevalence survey can be used to fit the model.
We will use publicly available data from the Office for National Statistics, licensed under the Open Government Licence:
For simplicity, we will just fit to a single region here (England), but as most of this data is available subdivided into subregions it would also be possible to run independent model fits for each one. The data used here was accessed on 2020-10-12.
The daily deaths, where COVID-19 was mentioned on the death certification, can be downloaded above, and found in the ‘Covid-19 - Daily occurrences’ tab. We will only fit from 2020-Mar-14 onwards. We include a CSV of this information here, the raw data sources can be found in inst/extdata
.
# Read the data from the CSV
death_data <- read.table("ons_deaths.csv",
sep=",",
header = TRUE,
row.names = NULL,
stringsAsFactors = FALSE)
rmarkdown::paged_table(death_data)
date <chr> | deaths <int> | |||
---|---|---|---|---|
30/01/2020 | 1 | |||
31/01/2020 | 0 | |||
01/02/2020 | 0 | |||
02/02/2020 | 1 | |||
02/03/2020 | 1 | |||
03/03/2020 | 0 | |||
04/03/2020 | 1 | |||
05/03/2020 | 2 | |||
06/03/2020 | 2 | |||
07/03/2020 | 0 |
We can combine this with two data-points from a serosurvey provided by the ONS on 2020-May-04 and 2020-Sep-08:
serology <- read.table("ons_serology.csv",
sep=",",
header = TRUE,
row.names = NULL,
stringsAsFactors = FALSE)
serology
#> date England_npos_15_64 England_tested_15_64
#> 1 04/05/2020 60 885
#> 2 12/08/2020 476 9343
# Combine with the death data
i <- match(death_data$date, serology$date)
serology <- serology[i, ]
rownames(serology) <- NULL
data <- cbind(death_data, serology[setdiff(names(serology), "date")])
We will run this analysis over the first wave, covering the start of lockdown until initial easing of restrictions on 2020-May-11.
data[,'date'] <- as.Date(data[,'date'], format = "%d/%m/%y")
# Cut out outside date range
data <- data[data$date >= as.Date("2020-03-14"), ]
# Prepare for sircovid. Here only analyse up until restrictions ease
first_wave_data <- data[data$date < as.Date("2020-05-11"), ]
We must give the columns their correct names for sircovid::carehomes_particle_filter()
, adding columns of missing data for any data streams not being fitted:
colnames(first_wave_data) <- c("date", "deaths", "npos_15_64", "ntot_15_64")
missing_cols <- c("icu", "hosp", "pillar2_cases", "pillar2_pos", "general", "deaths_comm", "deaths_hosp", "admitted", "new", "new_admitted", 'pillar2_tot', 'pillar2_over25_pos', 'pillar2_over25_tot', 'pillar2_over25_cases', 'react_pos', 'react_tot')
na_col <- as.data.frame(matrix(NA_integer_,
nrow = nrow(first_wave_data),
ncol = length(missing_cols)),
row.names = NULL)
colnames(na_col) <- missing_cols
first_wave_data <- cbind(first_wave_data, na_col)
Let’s also have a look at the full death data-stream over the epidemic so far:
The first step to running sircovid
is to set up a particle filter sircovid::carehomes_particle_filter()
which binds the model with the data described above. We must decide:
sircovid
models the seeding as a single introduction of ten infected individuals in the 15-19 age band.We can use this to process the data into the standard mcstate
data format as follows, referring to the documentation of sircovid::carehomes_particle_filter()
to determine the correct column names for the data:
steps_per_day <- 4L
pf_data <- sircovid::sircovid_data(first_wave_data,
start_date = "2020-01-01",
dt = 1 / steps_per_day)
This also converts the dates into a sircovid_date
for us, which is simply days since 2019-Dec-31 (i.e. sircovid::sircovid_date("2020-Jan-01") == 1
).
This is actually everything needed to intialise the sircovid particle filter, which can be iterated with different parameter sets to run inference and forecasts. To set this object up, we must also decide how many particles to run (more =~ better but slower; use at least 100); how many threads to parallelise the analysis over, ideally a divisor of the number of particles; and optionally a seed to make analysis reproducible:
sircovid_pf <- sircovid::carehomes_particle_filter(pf_data,
n_particles = 100L,
n_threads = 1L,
seed = 1L)
We can run this particle filter with a single parameter set to calculate the likelihood of the parameters given the data . Here we will look at changing the start date and beta, and how that affects the model likelihood:
# Dates are sircovid_date
set.seed(1)
sircovid_pf$run(pars = sircovid::carehomes_parameters(start_date = 15,
region = "England",
beta_value = 0.05))
#> [1] -473.4299
sircovid_pf$run(pars = sircovid::carehomes_parameters(start_date = 30,
region = "England",
beta_value = 0.05))
#> [1] -473.9585
sircovid_pf$run(pars = sircovid::carehomes_parameters(start_date = 15,
region = "England",
beta_value = 0.1))
#> [1] -1497.799
sircovid_pf$run(pars = sircovid::carehomes_parameters(start_date = 30,
region = "England",
beta_value = 0.05))
#> [1] -478.439
A start date of 15-Jan-2020 and appear to be best in this small grid search. We will now use pMCMC to iterate this and calculate full posterior likelihoods for model parameters.
Note: as the particle filter picks up from the last run’s random number state, to make runs of the same particle filter reproducible you would need to either set up a new particle filter object, or use the dust::reset()
function. This is automated in mcstate
and sircovid
.
We must now decide on how to treat the parameters in the model. Broadly, we can make three choices for each one:
For many of the probabilities of taking different routes through hospitalisation, population size, and rate of contact between age groups we used fixed values. Transmission and epidemic characteristics such as Rt and start date are treated as unknown, with only broad limits on their upper and lower bounds imposed. Parameters for disease severity are taken from Verity et al 2020, but experience with fitting suggests that allowing some variation around these estimates rather than exactly fixed values improves the overall fit – in this case we infer their values but impose strong priors.
In this example we will first try and estimate the value of Rt before and after national lockdown in March, and the start date of the epidemic (four parameters). All other parameters will remain fixed.
In deciding how to operate parameters, the first thing to do is to look at the odin model under the section ## User defined parameters - default in parentheses:
or run grep 'user' inst/odin/carehomes.R
to see the names of the model parameters. All of these will automatically be set up, but any can be sampled. Run the following to see the defaults:
sircovid::carehomes_parameters(region = "england",
start_date = 1) # as sircovid_date i.e. 2020-01-01
#> $hosp_transmission
#> [1] 0.1
#>
#> $ICU_transmission
#> [1] 0.05
#>
#> $comm_D_transmission
#> [1] 0.05
#>
#> $dt
#> [1] 0.25
#>
#> $initial_step
#> [1] 4
#>
#> $N_age
#> [1] 19
#>
#> $beta_step
#> [1] 0.08
#>
#> $population
#> [1] 3299637 3538206 3354246 3090232 3487863 3801409 3807954 3733642 3414297
#> [10] 3715812 3907461 3670651 3111835 2796740 2779326 1940686 2836964
#>
#> $carehome_beds
#> [1] 457781
#>
#> $carehome_residents
#> [1] 339674
#>
#> $carehome_workers
#> [1] 339674
#>
#> $m
#> [0,4) [5,9) [10,14) [15,19) [20,24)
#> [0,4) 5.806061e-07 2.319269e-07 1.345742e-07 8.887695e-08 1.342317e-07
#> [5,9) 2.319269e-07 1.875881e-06 3.496137e-07 1.591503e-07 1.297943e-07
#> [10,14) 1.345742e-07 3.496137e-07 2.043065e-06 3.937053e-07 7.134390e-08
#> [15,19) 8.887695e-08 1.591503e-07 3.937053e-07 2.172745e-06 3.888382e-07
#> [20,24) 1.342317e-07 1.297943e-07 7.134390e-08 3.888382e-07 7.434983e-07
#> [25,29) 2.142781e-07 1.917844e-07 1.035054e-07 2.156303e-07 3.832452e-07
#> [30,34) 2.281789e-07 2.745640e-07 1.531476e-07 1.324435e-07 2.232413e-07
#> [35,39) 2.622918e-07 3.271133e-07 2.822296e-07 2.355419e-07 1.922489e-07
#> [40,44) 1.168192e-07 2.740820e-07 3.156079e-07 2.763879e-07 2.177136e-07
#> [45,49) 7.272130e-08 9.568111e-08 1.795048e-07 2.920800e-07 2.551079e-07
#> [50,54) 9.685796e-08 9.174860e-08 9.886474e-08 1.651726e-07 1.823673e-07
#> [55,59) 7.244920e-08 5.944763e-08 1.009726e-07 9.334599e-08 1.407064e-07
#> [60,64) 7.213057e-08 1.116141e-07 7.058405e-08 7.012613e-08 1.059779e-07
#> [65,69) 5.622304e-08 8.732049e-08 8.176299e-08 1.136597e-07 8.642595e-08
#> [70,74) 2.472973e-08 3.409458e-08 6.502366e-08 1.129385e-07 6.513244e-08
#> [75,79) 2.472973e-08 3.409458e-08 6.502366e-08 1.129385e-07 6.513244e-08
#> [80,100) 2.472973e-08 3.409458e-08 6.502366e-08 1.129385e-07 6.513244e-08
#> CHW 1.440416e-07 1.786943e-07 1.629310e-07 1.864934e-07 2.151656e-07
#> CHR 2.472973e-09 3.409458e-09 6.502366e-09 1.129385e-08 6.513244e-09
#> [25,29) [30,34) [35,39) [40,44) [45,49)
#> [0,4) 2.142781e-07 2.281789e-07 2.622918e-07 1.168192e-07 7.272130e-08
#> [5,9) 1.917844e-07 2.745640e-07 3.271133e-07 2.740820e-07 9.568111e-08
#> [10,14) 1.035054e-07 1.531476e-07 2.822296e-07 3.156079e-07 1.795048e-07
#> [15,19) 2.156303e-07 1.324435e-07 2.355419e-07 2.763879e-07 2.920800e-07
#> [20,24) 3.832452e-07 2.232413e-07 1.922489e-07 2.177136e-07 2.551079e-07
#> [25,29) 4.815342e-07 2.829304e-07 2.210583e-07 2.266553e-07 2.378359e-07
#> [30,34) 2.829304e-07 4.376804e-07 3.216381e-07 2.530171e-07 1.956181e-07
#> [35,39) 2.210583e-07 3.216381e-07 4.017525e-07 3.570324e-07 2.537399e-07
#> [40,44) 2.266553e-07 2.530171e-07 3.570324e-07 3.968134e-07 3.643486e-07
#> [45,49) 2.378359e-07 1.956181e-07 2.537399e-07 3.643486e-07 5.039887e-07
#> [50,54) 2.240281e-07 1.981330e-07 1.902021e-07 2.217218e-07 2.025322e-07
#> [55,59) 1.860509e-07 1.733221e-07 1.637580e-07 1.723276e-07 1.764958e-07
#> [60,64) 1.421535e-07 1.637436e-07 1.979219e-07 2.127230e-07 1.691935e-07
#> [65,69) 1.368219e-07 1.065102e-07 1.825915e-07 1.596935e-07 1.024931e-07
#> [70,74) 5.656948e-08 4.161870e-08 5.511788e-08 1.542762e-07 1.257964e-07
#> [75,79) 5.656948e-08 4.161870e-08 5.511788e-08 1.542762e-07 1.257964e-07
#> [80,100) 5.656948e-08 4.161870e-08 5.511788e-08 1.542762e-07 1.257964e-07
#> CHW 2.534554e-07 2.555899e-07 2.635933e-07 2.752691e-07 2.633076e-07
#> CHR 5.656948e-09 4.161870e-09 5.511788e-09 1.542762e-08 1.257964e-08
#> [50,54) [55,59) [60,64) [65,69) [70,74)
#> [0,4) 9.685796e-08 7.244920e-08 7.213057e-08 5.622304e-08 2.472973e-08
#> [5,9) 9.174860e-08 5.944763e-08 1.116141e-07 8.732049e-08 3.409458e-08
#> [10,14) 9.886474e-08 1.009726e-07 7.058405e-08 8.176299e-08 6.502366e-08
#> [15,19) 1.651726e-07 9.334599e-08 7.012613e-08 1.136597e-07 1.129385e-07
#> [20,24) 1.823673e-07 1.407064e-07 1.059779e-07 8.642595e-08 6.513244e-08
#> [25,29) 2.240281e-07 1.860509e-07 1.421535e-07 1.368219e-07 5.656948e-08
#> [30,34) 1.981330e-07 1.733221e-07 1.637436e-07 1.065102e-07 4.161870e-08
#> [35,39) 1.902021e-07 1.637580e-07 1.979219e-07 1.825915e-07 5.511788e-08
#> [40,44) 2.217218e-07 1.723276e-07 2.127230e-07 1.596935e-07 1.542762e-07
#> [45,49) 2.025322e-07 1.764958e-07 1.691935e-07 1.024931e-07 1.257964e-07
#> [50,54) 1.900017e-07 2.465184e-07 1.413444e-07 1.090120e-07 1.100982e-07
#> [55,59) 2.465184e-07 3.178364e-07 2.405478e-07 1.656965e-07 9.720656e-08
#> [60,64) 1.413444e-07 2.405478e-07 2.093669e-07 2.018837e-07 1.252670e-07
#> [65,69) 1.090120e-07 1.656965e-07 2.018837e-07 2.516157e-07 1.599770e-07
#> [70,74) 1.100982e-07 9.720656e-08 1.252670e-07 1.599770e-07 1.779584e-07
#> [75,79) 1.100982e-07 9.720656e-08 1.252670e-07 1.599770e-07 1.779584e-07
#> [80,100) 1.100982e-07 9.720656e-08 1.252670e-07 1.599770e-07 1.779584e-07
#> CHW 2.027563e-07 2.092149e-07 1.832679e-07 1.438780e-07 9.430844e-08
#> CHR 1.100982e-08 9.720656e-09 1.252670e-08 1.599770e-08 1.779584e-08
#> [75,79) [80,100) CHW CHR
#> [0,4) 2.472973e-08 2.472973e-08 1.440416e-07 2.472973e-09
#> [5,9) 3.409458e-08 3.409458e-08 1.786943e-07 3.409458e-09
#> [10,14) 6.502366e-08 6.502366e-08 1.629310e-07 6.502366e-09
#> [15,19) 1.129385e-07 1.129385e-07 1.864934e-07 1.129385e-08
#> [20,24) 6.513244e-08 6.513244e-08 2.151656e-07 6.513244e-09
#> [25,29) 5.656948e-08 5.656948e-08 2.534554e-07 5.656948e-09
#> [30,34) 4.161870e-08 4.161870e-08 2.555899e-07 4.161870e-09
#> [35,39) 5.511788e-08 5.511788e-08 2.635933e-07 5.511788e-09
#> [40,44) 1.542762e-07 1.542762e-07 2.752691e-07 1.542762e-08
#> [45,49) 1.257964e-07 1.257964e-07 2.633076e-07 1.257964e-08
#> [50,54) 1.100982e-07 1.100982e-07 2.027563e-07 1.100982e-08
#> [55,59) 9.720656e-08 9.720656e-08 2.092149e-07 9.720656e-09
#> [60,64) 1.252670e-07 1.252670e-07 1.832679e-07 1.252670e-08
#> [65,69) 1.599770e-07 1.599770e-07 1.438780e-07 1.599770e-08
#> [70,74) 1.779584e-07 1.779584e-07 9.430844e-08 1.779584e-08
#> [75,79) 1.779584e-07 1.779584e-07 9.430844e-08 1.779584e-08
#> [80,100) 1.779584e-07 1.779584e-07 9.430844e-08 1.779584e-08
#> CHW 9.430844e-08 9.430844e-08 4.000000e-06 4.000000e-06
#> CHR 1.779584e-08 1.779584e-08 4.000000e-06 5.000000e-05
#>
#> $N_tot
#> [1] 3299637 3538206 3354246 3090232 3487863 3757132 3763601 3690155 3374529
#> [10] 3672532 3861949 3627897 3075590 2779756 2762342 1889735 2582208 339674
#> [19] 339674
#>
#> $N_tot_15_64
#> [1] 35401480
#>
#> $p_specificity
#> [1] 0.9
#>
#> $pillar2_specificity
#> [1] 0.99
#>
#> $pillar2_sensitivity
#> [1] 0.99
#>
#> $prop_noncovid_sympt
#> [1] 0.01
#>
#> $observation
#> $observation$phi_ICU
#> [1] 1
#>
#> $observation$k_ICU
#> [1] 2
#>
#> $observation$phi_general
#> [1] 1
#>
#> $observation$k_general
#> [1] 2
#>
#> $observation$phi_hosp
#> [1] 1
#>
#> $observation$k_hosp
#> [1] 2
#>
#> $observation$phi_death_hosp
#> [1] 1
#>
#> $observation$k_death_hosp
#> [1] 2
#>
#> $observation$phi_death_comm
#> [1] 1
#>
#> $observation$k_death_comm
#> [1] 2
#>
#> $observation$k_death
#> [1] 2
#>
#> $observation$phi_admitted
#> [1] 1
#>
#> $observation$k_admitted
#> [1] 2
#>
#> $observation$phi_new
#> [1] 1
#>
#> $observation$k_new
#> [1] 2
#>
#> $observation$phi_new_admitted
#> [1] 1
#>
#> $observation$k_new_admitted
#> [1] 2
#>
#> $observation$phi_pillar2_cases
#> [1] 1
#>
#> $observation$k_pillar2_cases
#> [1] 2
#>
#> $observation$rho_pillar2_tests
#> [1] 0.1
#>
#> $observation$exp_noise
#> [1] 1e+06
#>
#>
#> $p_admit_conf
#> [1] 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2
#>
#> $p_asympt
#> [1] 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.34 0.34
#> [16] 0.34 0.34 0.34 0.34
#>
#> $p_death_comm
#> [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.7
#>
#> $p_death_hosp_D
#> [1] 0.1258933 0.1226134 0.1356729 0.1526679 0.1743031 0.1941879 0.2093617
#> [8] 0.2244326 0.2370135 0.2578281 0.2908746 0.3207640 0.3625638 0.3909655
#> [15] 0.4211515 0.4475459 0.4820000 0.2600168 0.4820000
#>
#> $p_death_ICU
#> [1] 0.5234896 0.5234896 0.5234896 0.5234896 0.5234896 0.5234896 0.5234896
#> [8] 0.5234896 0.5234896 0.5234896 0.5234896 0.5234896 0.5234896 0.5234896
#> [15] 0.5234896 0.5234896 0.5234896 0.5234896 0.5234896
#>
#> $p_hosp_ILI
#> [1] 0.001772051 0.001510059 0.002681536 0.005482996 0.010801921 0.020789645
#> [7] 0.030033884 0.043195770 0.050353126 0.067162222 0.084808192 0.106382665
#> [13] 0.131365666 0.151247633 0.169503156 0.204930342 0.200036782 0.065384922
#> [19] 0.200036782
#>
#> $p_ICU_hosp
#> [1] 0.11845387 0.11845387 0.11845387 0.11845387 0.11845387 0.11845387
#> [7] 0.11845387 0.12551064 0.14091042 0.17595537 0.24437611 0.35243489
#> [13] 0.52779221 0.72404472 0.18197685 0.05439050 0.01673267 0.21931358
#> [19] 0.01673267
#>
#> $p_seroconversion
#> [1] 0.8212394 0.8212394 0.8212394 0.8212394 0.8212394 0.8212394 0.8212394
#> [8] 0.8212394 0.8212394 0.8212394 0.8212394 0.8212394 0.8212394 0.8212394
#> [15] 0.8212394 0.8212394 0.8212394 0.8212394 0.8212394
#>
#> $p_sympt_ILI
#> [1] 0.2356449 0.2356449 0.2450545 0.2450545 0.2777233 0.2777233 0.3033047
#> [8] 0.3033047 0.3225450 0.3225450 0.3819877 0.3819877 0.4339815 0.4339815
#> [15] 0.4836359 0.4836359 0.5049071 0.3390641 0.5049071
#>
#> $psi_hosp_ILI
#> [1] 0.008647088 0.007368644 0.013085111 0.026755414 0.052710207 0.101447374
#> [7] 0.146556549 0.210782695 0.245708497 0.327731957 0.413839119 0.519116224
#> [13] 0.641025944 0.738044114 0.827125720 1.000000000 0.976120865 0.319059254
#> [19] 0.976120865
#>
#> $p_hosp_ILI_step
#> [1] 0.2049303
#>
#> $psi_ICU_hosp
#> [1] 0.16360021 0.16360021 0.16360021 0.16360021 0.16360021 0.16360021
#> [7] 0.16360021 0.17334653 0.19461564 0.24301727 0.33751521 0.48675845
#> [13] 0.72894974 1.00000000 0.25133371 0.07512036 0.02311000 0.30290060
#> [19] 0.02311000
#>
#> $p_ICU_hosp_step
#> [1] 0.7240447
#>
#> $psi_death_ICU
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>
#> $p_death_ICU_step
#> [1] 0.5234896
#>
#> $psi_death_hosp_D
#> [1] 0.2611893 0.2543846 0.2814790 0.3167383 0.3616246 0.4028795 0.4343604
#> [8] 0.4656277 0.4917293 0.5349130 0.6034743 0.6654854 0.7522070 0.8111317
#> [15] 0.8737583 0.9285184 1.0000000 0.5394539 1.0000000
#>
#> $p_death_hosp_D_step
#> [1] 0.482
#>
#> $psi_death_comm
#> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
#>
#> $p_death_comm_step
#> [1] 0.7
#>
#> $psi_admit_conf
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>
#> $p_admit_conf_step
#> [1] 0.2
#>
#> $s_E
#> [1] 2
#>
#> $s_asympt
#> [1] 1
#>
#> $s_mild
#> [1] 1
#>
#> $s_ILI
#> [1] 1
#>
#> $s_comm_D
#> [1] 2
#>
#> $s_hosp_D
#> [1] 2
#>
#> $s_hosp_R
#> [1] 2
#>
#> $s_ICU_D
#> [1] 2
#>
#> $s_ICU_R
#> [1] 2
#>
#> $s_triage
#> [1] 2
#>
#> $s_stepdown
#> [1] 2
#>
#> $s_PCR_pos
#> [1] 2
#>
#> $gamma_E
#> [1] 0.4357298
#>
#> $gamma_asympt
#> [1] 0.4784689
#>
#> $gamma_mild
#> [1] 0.4784689
#>
#> $gamma_ILI
#> [1] 0.25
#>
#> $gamma_comm_D
#> [1] 0.4
#>
#> $gamma_hosp_D
#> [1] 0.4
#>
#> $gamma_hosp_R
#> [1] 0.2
#>
#> $gamma_ICU_D
#> [1] 0.4
#>
#> $gamma_ICU_R
#> [1] 0.2
#>
#> $gamma_triage
#> [1] 2
#>
#> $gamma_stepdown
#> [1] 0.4
#>
#> $gamma_R_pre_1
#> [1] 0.2
#>
#> $gamma_R_pre_2
#> [1] 0.1
#>
#> $gamma_test
#> [1] 0.3
#>
#> $gamma_PCR_pos
#> [1] 0.2
start_date
is special in sircovid, and is set from the data in sircovid::sircovid_data()
, and can be sampled by using the start_date
name.
The parameters in the $observation
part of the object can also be sampled, but their use is is in the likelihood sircovid::carehomes_compare()
, where they are also defined.
If the parameters we are sampling are the same as their name in the odin model nothing special needs to be done. But we are actually free to give sample whatever definition of parameters we wish, as long as we define a transform function relating them to parameters in the model. We use start_date
directly, and define three linearly-interpolated values beta_1
, beta_2
and beta_3
covering the period of lockdown. This transform must take these pars
being sampled, and return a suitable sircovid::carehomes_parameters()
object. This return function, as shown above, is very flexible so most of the parameters can be modified, but the start date and region are mandatory.
The transmission constant in the model is multiplied by the age-specific contact matrix to give the transmission intensity , and is equivalent to in the simple SEIR model. This is a special parameter which can change at specified timepoints, which we typically define using key announcements made by the UK government (though you can use whatever you’d like):
These will be linearly interpolated between the beta_value
values at every integration timestep by sircovid::sircovid_parameters_beta()
, but we do only need to call this function ourselves if we would like to make another parameter time varying.
The rates of progression
in hospital are available individually via sircovid::carehomes_parameters_progression()
(which are used by default), but can be modified. Severity rates can be obtained from the markovid package, and loaded with sircovid::sircovid_parameters_severity()
.
We then must give each parameter to be sampled:
For each of the parameters above we will use improper (flat) priors, the default:
start_date_param <-
mcstate::pmcmc_parameter("start_date",
initial = sircovid::sircovid_date("2020-02-08"),
min = sircovid::sircovid_date("2020-01-01"),
max = sircovid::sircovid_date("2020-03-15"),
discrete = TRUE)
beta_date <- c("2020-03-16", "2020-03-23", "2020-03-25")
beta_min <- 0
beta_max <- 1
beta1_param <- mcstate::pmcmc_parameter("beta1", initial = 0.15, min = beta_min, max = beta_max)
beta2_param <- mcstate::pmcmc_parameter("beta2", initial = 0.04, min = beta_min, max = beta_max)
beta3_param <- mcstate::pmcmc_parameter("beta3", initial = 0.03, min = beta_min, max = beta_max)
mcmc_param_list <- list(start_date_param, beta1_param, beta2_param, beta3_param)
Care must be taken to ensure the minimum for start_date
matches that defined in sircovid::sircovid_data
. The defaults for carehomes contact rates C_1
and C_2
should also be modified.
# Returns a function which takes pars (which are being sampled).
# Outside function binds beta date when first called
parameter_transform <- function(beta_date) {
beta_date <- sircovid::sircovid_date(beta_date)
function(pars) {
start_date <- pars[["start_date"]]
beta_value <- unname(pars[c("beta1", "beta2", "beta3")])
ret <- sircovid::carehomes_parameters(region = "england",
start_date = start_date,
beta_value = beta_value,
beta_date = beta_date,
C_1 = 3e-7,
C_2 = 3e-7)
ret
}
}
The final element required to build and run the MCMC object is a proposal distribution for each of the above four parameters. Here we will just take a simple diagonal matrix which gives an independent variance for each parameter. The mcstate
vignette explains how this can be improved by using a previous run.
mcmc_proposals <- matrix(0, nrow = length(mcmc_param_list), ncol = length(mcmc_param_list))
diag(mcmc_proposals) <- c(5, 5e-5, 2e-4, 7e-7)
n_steps <- 2e3
n_chains <- 1L
mcmc_params <- mcstate::pmcmc_parameters$new(mcmc_param_list,
mcmc_proposals,
parameter_transform(beta_date))
We can now run the inference. This will take a long time (~1hr), so increasing the number of threads above can be helpful. For real inference, running for at least 2000 steps on more than one chain is recommended.
samples <- mcstate::pmcmc(mcmc_params,
sircovid_pf,
n_steps = n_steps,
n_chains = n_chains,
save_trajectories = TRUE,
progress = TRUE)
saveRDS(samples, file = "samples.Rds")
We can easily plot the modelled deaths versus the data for a single parameter sample from the posterior
samples <- readRDS("samples.Rds")
# extract trajectories that we are fitting
plot_deaths <- function(samples, step) {
cum_deaths <- samples$trajectories$state[c("deaths_hosp", "deaths_comm"), step, ]
deaths <- apply(cum_deaths, 1, diff)
deaths <- cbind(deaths = rowSums(deaths), deaths)
dx <- sircovid::sircovid_date_as_date(samples$trajectories$step / samples$trajectories$rate)
# plot against the data
cols <- c(Total = "#8c8cd9", Hospital = "#cc0044", Community = "#999966")
plot(data$date, data$deaths, pch = 17, cex = 0.5, xlab = "Date", ylab = "Number of deaths")
lines(dx[c(-1, -2)], deaths[-1 ,'deaths'], lty = 1, col = cols[["Total"]])
lines(dx[c(-1, -2)], deaths[-1 ,'deaths_hosp'], lty = 1, col = cols[["Hospital"]])
lines(dx[c(-1, -2)], deaths[-1 ,'deaths_comm'], lty = 1, col = cols[["Community"]])
legend("topright", lwd = 1, col = cols, legend = names(cols), bty = "n")
}
plot_deaths(samples, n_steps)
This is clearly underestimating the epidemic’s peak. We should also have a look at these results to determine whether the MCMC converged, and enough samples were taken:
mcmc <- coda::as.mcmc(cbind(samples$probabilities, samples$pars))
summary(mcmc)
#>
#> Iterations = 1:2001
#> Thinning interval = 1
#> Number of chains = 1
#> Sample size per chain = 2001
#>
#> 1. Empirical mean and standard deviation for each variable,
#> plus standard error of the mean:
#>
#> Mean SD Naive SE Time-series SE
#> log_prior 0.00000 0.000000 0.000e+00 0.0000000
#> log_likelihood -453.06439 1.881153 4.205e-02 0.2937759
#> log_posterior -453.06439 1.881153 4.205e-02 0.2937759
#> start_date 22.75362 2.451483 5.480e-02 1.0090588
#> beta1 0.10059 0.004743 1.060e-04 0.0011336
#> beta2 0.10040 0.033936 7.586e-04 0.0078185
#> beta3 0.02849 0.001915 4.281e-05 0.0003202
#>
#> 2. Quantiles for each variable:
#>
#> 2.5% 25% 50% 75% 97.5%
#> log_prior 0.00000 0.00000 0.00000 0.00000 0.00000
#> log_likelihood -456.92034 -454.45399 -452.76905 -451.28030 -450.71323
#> log_posterior -456.92034 -454.45399 -452.76905 -451.28030 -450.71323
#> start_date 19.00000 21.00000 22.00000 24.00000 29.00000
#> beta1 0.09209 0.09762 0.09970 0.10364 0.11205
#> beta2 0.04066 0.07282 0.09950 0.12933 0.16835
#> beta3 0.02492 0.02708 0.02825 0.02976 0.03271
coda::effectiveSize(mcmc)
#> log_prior log_likelihood log_posterior start_date beta1
#> 0.000000 41.003043 41.003043 5.902347 17.507546
#> beta2 beta3
#> 18.839754 35.779797
1 - coda::rejectionRate(mcmc)
#> log_prior log_likelihood log_posterior start_date beta1
#> 0.0000 0.1590 0.1590 0.0805 0.1590
#> beta2 beta3
#> 0.1590 0.1590
plot(mcmc)
This is a poor MCMC run. A simple solution is to run more chains for longer, but this is likely to be slow and inefficient. We can try and improve this by setting the proposal distribution to be the same as the observed covariance between samples in this prototype run by setting , which is achieved with proposal = alpha * cov(samples$pars)
where and is the number of parameters (this is Roberts and Rosenthal’s adaptive MCMC algorithm). We can also start the betas from their median value in the previous chain.
beta1_param <- mcstate::pmcmc_parameter("beta1",
initial = summary(mcmc)$quantiles["beta1", "50%"],
min = beta_min, max = beta_max)
beta2_param <- mcstate::pmcmc_parameter("beta2",
initial = summary(mcmc)$quantiles["beta2", "50%"],
min = beta_min, max = beta_max)
beta3_param <- mcstate::pmcmc_parameter("beta3",
initial = summary(mcmc)$quantiles["beta3", "50%"],
min = beta_min, max = beta_max)
mcmc_param_list <- list(start_date_param, beta1_param, beta2_param, beta3_param)
mcmc_params <- mcstate::pmcmc_parameters$new(mcmc_param_list,
cov(samples$pars),
parameter_transform(beta_date))
samples_tuned <- mcstate::pmcmc(mcmc_params,
sircovid_pf,
n_steps = n_steps,
n_chains = n_chains,
save_trajectories = TRUE,
progress = TRUE)
Iterating this procedure will produce well-mixed chains which produce samples efficiently, though some manual tweaks to the covariance matrix may be necessary for good mixing. We used a few iterations of this to improve chain mixing, and provide this data pre-calculated. Once a good covariance matrix has been found, you will likely want to run multiple chains for longer, to get a good effective sample size for each parameter, and confirm convergence by calculating the Rhat statistic.
cov_mat <- matrix(
c(13.0332898709, 0.0433604270, -0.0404375032, 0.0006694847,
0.0433604270, 1.678017e-04, -1.527468e-04, 2.836651e-06,
-0.0404375032, -1.527468e-04, 2.496026e-04, -8.368809e-06,
0.0006694847, 2.836651e-06, -8.368809e-06, 8.658182e-07),
nrow = length(mcmc_param_list), ncol = length(mcmc_param_list))
mcmc_params <- mcstate::pmcmc_parameters$new(mcmc_param_list,
cov_mat,
parameter_transform(beta_date))
n_steps <- 1e4
n_chains <- 4L
samples_tuned <- mcstate::pmcmc(mcmc_params,
sircovid_pf,
n_steps = n_steps,
n_chains = n_chains,
save_trajectories = TRUE,
progress = TRUE)
saveRDS(samples, file = "samples_tuned.Rds")
These chains have a decent effective sample size, and though there is some autocorrelation in them, they appear to have converged to the same optimum.
samples_tuned <- readRDS("samples_tuned.Rds")
mcmc_tuned <- coda::as.mcmc(cbind(samples_tuned$probabilities, samples_tuned$pars))
summary(mcmc_tuned)
#>
#> Iterations = 1:40004
#> Thinning interval = 1
#> Number of chains = 1
#> Sample size per chain = 40004
#>
#> 1. Empirical mean and standard deviation for each variable,
#> plus standard error of the mean:
#>
#> Mean SD Naive SE Time-series SE
#> log_prior 0.00000 0.000000 0.0000000 0.0000000
#> log_likelihood -450.95076 2.261543 0.0113071 0.1302043
#> log_posterior -450.95076 2.261543 0.0113071 0.1302043
#> start_date 35.41463 8.072574 0.0403609 0.5333336
#> beta1 0.13956 0.028966 0.0001448 0.0019676
#> beta2 0.06084 0.035327 0.0001766 0.0026303
#> beta3 0.02918 0.002081 0.0000104 0.0001476
#>
#> 2. Quantiles for each variable:
#>
#> 2.5% 25% 50% 75% 97.5%
#> log_prior 0.000e+00 0.00000 0.00000 0.00000 0.00000
#> log_likelihood -4.555e+02 -452.44364 -451.03025 -449.31918 -446.75978
#> log_posterior -4.555e+02 -452.44364 -451.03025 -449.31918 -446.75978
#> start_date 1.500e+01 31.00000 37.00000 41.00000 47.00000
#> beta1 8.885e-02 0.11782 0.13808 0.15929 0.20009
#> beta2 3.112e-03 0.03373 0.05971 0.08414 0.13783
#> beta3 2.516e-02 0.02775 0.02904 0.03063 0.03331
coda::effectiveSize(mcmc_tuned)
#> log_prior log_likelihood log_posterior start_date beta1
#> 0.0000 301.6889 301.6889 229.1006 216.7086
#> beta2 beta3
#> 180.3887 198.7261
1 - coda::rejectionRate(mcmc_tuned)
#> log_prior log_likelihood log_posterior start_date beta1
#> 0.0000000 0.1529635 0.1529635 0.1369147 0.1529635
#> beta2 beta3
#> 0.1529635 0.1529635
# Separate chains for coda
mcmc_list <- vector(mode = "list", length = n_chains)
for (i in 1:n_chains) {
mcmc_list[[i]] <- coda::as.mcmc(samples_tuned$pars[samples_tuned$chain == i,])
}
mcmc_list <- coda::mcmc.list(mcmc_list)
coda::gelman.diag(mcmc_list)
#> Potential scale reduction factors:
#>
#> Point est. Upper C.I.
#> start_date 1.11 1.29
#> beta1 1.10 1.27
#> beta2 1.03 1.07
#> beta3 1.00 1.01
#>
#> Multivariate psrf
#>
#> 1.1
plot(mcmc_list)
#> Warning in rep(col, length = nchain(x)): partial argument match of 'length' to
#> 'length.out'
#> Warning in rep(col, length = nchain(x)): partial argument match of 'length' to
#> 'length.out'
#> Warning in rep(col, length = nchain(x)): partial argument match of 'length' to
#> 'length.out'
#> Warning in rep(col, length = nchain(x)): partial argument match of 'length' to
#> 'length.out'
We can see the posterior estimates for the epidemic start date (in days into 2020, i.e. a sircovid_date) and the transmission intensity at each time point in beta_date
.
Plotting versus the data again:
This is a better fit to the data than the one above, and captures the overall shape of the epidemic, but still underestimates the peak. Adding more flexibility into the severity parameters, adding more of the available datastreams can help rectify this issue.
We can also calculate and plot Rt using the provided helper function:
# Extract the number of susceptibles from the fit
S_partition_index <- grep("^S_", names(samples_tuned$predict$index))
# this gives dimension 19, n_trajectories, n_times
S <- samples_tuned$trajectories$state[S_partition_index, , , drop = FALSE]
n_steps <- dim(samples_tuned$pars)[1]
# Example for the first trajectory
# We must apply the transform function to the parameters i.e. pars = transform(pars)
Rt_estimate <-
sircovid::carehomes_Rt(samples_tuned$trajectories$step,
S[, n_steps, ],
samples_tuned$predict$transform(samples_tuned$pars[1, ]))
plot(first_wave_data$date,
Rt_estimate$eff_Rt_all[-1],
type='l', xlab = 'Date', ylab = 'Rt')
This is only for the final trajectory of the first chain. The sircovid::carehomes_Rt_trajectories()
function can be used to conveniently calculate this for all of the chains at once.
pars <- lapply(seq_len(nrow(samples_tuned$pars)),
function(i) samples_tuned$predict$transform(samples_tuned$pars[i, ]))
Rt_estimates <-
sircovid::carehomes_Rt_trajectories(
samples_tuned$trajectories$step,
S,
pars,
initial_step_from_parameters = TRUE,
shared_parameters = FALSE
)
Rt_estimates$date <- Rt_estimates$step * pars[[1]]$dt
saveRDS(Rt_estimates, "Rt_estimates.rds")
We can plot the mean and 95% HPD for any of the sampled parameters. Here is a brief example with Rt
Rt_estimates <- readRDS("Rt_estimates.rds")
dates <- sircovid::sircovid_date_as_date(Rt_estimates$date[, 1])
Rt_mean <- apply(Rt_estimates$eff_Rt_all, 1, mean)
Rt_quantiles <- apply(Rt_estimates$eff_Rt_all, 1, quantile, c(0.025, 0.975))
plot(dates[-1], Rt_mean[-1], type = 'l', xlab = "Date", ylab = "Rt")
lines(dates[-1], Rt_quantiles["2.5%", -1], lty = 2, col = "#999999")
lines(dates[-1], Rt_quantiles["97.5%", -1], lty = 2, col = "#999999")
Once the model has been fitted, as above, you can continue to run it forwards in time. This simulates runs from the model using samples from the posterior distributions for the fitted parameters. There is no further fitting to data, as it is not available, and parameters will remain the same over the whole time period.
You can run as far forward in time as you would like, though of course we would note that long term predictions are likely to be inaccurate, as parameters are expected to change over time in response to various external factors. Here we will attempt to forecast two weeks into the future, and compare with the observed deaths over that period.
# Choose a number of samples from the MCMC chains to use,
# the amount of burnin to remove, and how long to forecast for
control <- list(n_sample = 100, burnin = 2500, forecast_days = 14)
samples_for_prediction <-
mcstate::pmcmc_sample(samples_tuned, control$n_sample, control$burnin)
steps_for_prediction <- seq(samples_for_prediction$predict$step,
length.out = control$forecast_days + 1L,
by = samples_for_prediction$predict$rate)
# Now run predictions, using this sample from the posterior
predicted_trajectories <- mcstate::pmcmc_predict(samples_for_prediction,
steps_for_prediction,
prepend_trajectories = TRUE)
predicted_trajectories$date <- predicted_trajectories$step / predicted_trajectories$rate
We can plot the sampled trajectories and their forecasts along with the death data that was fitted to. The break between nowcasting and forecasting is shown here by the vertical dashed line.
# This has dimension (model partitions, samples, steps) = (49, 20, 73)
# Extract the deaths as before
total_steps <- length(predicted_trajectories$step) - 1L
deaths <- array(dim = c(control$n_sample, total_steps))
for (i in 1:control$n_sample) {
cum_deaths <- predicted_trajectories$state[c("deaths_hosp", "deaths_comm"), i, ]
deaths_sample <- rowSums(apply(cum_deaths, 1, diff))
deaths[i, ] <- deaths_sample
}
# Plot - the first value is removed as it is the earliest start date, not
# the fitted start date
par(mar = c(4.1, 5.1, 0.5, 0.5), las = 1)
mycol <- rgb(140, 140, 217, maxColorValue = 255, alpha = 90)
matplot(predicted_trajectories$date[c(-1, -2)],
t(deaths[, -1]), type = "l",
xlab = "Date", ylab = "Deaths",
col = mycol, lty = 1, xaxt="n",
ylim = range(data$deaths))
matpoints(sircovid::as_sircovid_date(data$date)[2:total_steps],
data$deaths[2:total_steps],
pch = 17, cex = 0.75, col = "#000000")
abline(v = sircovid::as_sircovid_date(max(first_wave_data$date)),
lty = 2, col = "#666666")
# Add formatted dates to plot
tics <- seq(1, total_steps, by = 10)
date_format <- format(sircovid::sircovid_date_as_date(predicted_trajectories$date[c(-1, -2)]),
"%d-%b")
axis(1, at = predicted_trajectories$date[c(-1, -2)][tics], labels = date_format[tics])
If you wish to run difference scenarios, or counterfactuals for periods in the past, you can use the run
method of the sircovid::carehomes()
model directly, with your chosen parameters.