estRodis is an R package for statistical analyses in the field of infectious disease dynamics. More precisely, for estimating the effective reproduction number, the dispersion parameter and the testing probability from the sequence cluster size distribution. The theoretical background and a use case of the package's functionalities are presented in the paper "Estimating
At the heart of the R-package estRodis are functions to simulate the size distribution of identical sequence clusters and to estimate parameters related to transmission dynamics from the sequence cluster size distribution. The simulation is based on a mathematical model of the size distribution of identical sequence clusters that takes into account the viral transmission process, the mutation of the virus and incomplete case-detection. We described the probability that a case was detected as the product of the probability that a case was confirmed by a test and the probability that the viral genome of a confirmed case is sequenced. There are two Bayesian inference models to estimate the effective reproduction number, the dispersion parameter, the yearly mutation rate and the testing probability to choose from. The first model uses a weakly informative prior distribution for the probability that a case is confirmed by a test, whereas the second model takes a fixed value for this probability as input. Both models use weakly informative prior distributions for the effective reproduction number, the dispersion parameter and the yearly mutation rate and require a fixed value for the probability that the viral genome of a confirmed case is sequenced as input.
The code used for the statistical analysis of the paper "Estimating
Whenever "model one" is mentioned in comments in the code, this refers to the standard model developed in the paper (prior distribution for testing probability) and "model two" refers to the alternative model described in the section "Sensitivity analysis" of the supplementary material (fixed value for testing probability).
R code to install the latest stable release of the estRodis package:
devtools::install_github("mwohlfender/estRodis", ref = "main", force = TRUE)
R code to install the newest development version of the estRodis package:
devtools::install_github("mwohlfender/estRodis@v0.0.1-zeta", ref = "main", force = TRUE)
In the following we present code examples how to use the main functionalities of the R package estRodis.
We simulate 1000 identical sequence clusters.
simulated_clusters <- estRodis_simulate_cluster_sizes(n_clusters = 1000,
max_cluster_size = 2500,
R = 0.8,
k = 0.3,
yearly_mutation_rate = 14,
mean_generation_interval = 5.2,
testing_proba = 0.6,
sequencing_proba = 0.4)
Result:
size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 19 | 20 | 22 | 24 | 26 |
frequency | 725 | 111 | 59 | 31 | 16 | 13 | 10 | 3 | 6 | 2 | 3 | 4 | 4 | 2 | 4 | 1 | 2 | 1 | 1 | 1 | 1 |
percentage | 0.725 | 0.111 | 0.059 | 0.031 | 0.016 | 0.013 | 0.01 | 0.003 | 0.006 | 0.002 | 0.003 | 0.004 | 0.004 | 0.002 | 0.004 | 0.001 | 0.002 | 0.001 | 0.001 | 0.001 | 0.001 |
We apply model one to estimate the effective reproduction number, the dispersion parameter, the yearly mutation rate and the testing probability from the clusters we simulated in section (D1).
options(mc.cores = parallelly::availableCores())
estRodis_estimate_parameters_one(clusters_size = simulated_clusters$size,
clusters_freq = simulated_clusters$frequency,
sequencing_proba = 0.44)
Result:
Inference for Stan model: estRodis_stan_model_estimate_parameters_one.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
R 0.81 0.00 0.06 0.71 0.76 0.80 0.85 0.95 1171 1
k 0.34 0.00 0.08 0.19 0.28 0.33 0.39 0.53 1813 1
number_yearly_mutations 13.99 0.01 0.49 13.00 13.66 13.99 14.31 14.95 2254 1
testing_proba 0.60 0.01 0.24 0.15 0.41 0.62 0.81 0.99 1061 1
mutation_proba 0.18 0.00 0.01 0.17 0.18 0.18 0.18 0.19 2252 1
detection_proba 0.27 0.00 0.11 0.07 0.18 0.27 0.36 0.43 1061 1
lp__ -1156.67 0.04 1.54 -1160.57 -1157.41 -1156.29 -1155.53 -1154.84 1200 1
Samples were drawn using NUTS(diag_e) at Tue Mar 26 11:01:36 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
We apply model two to estimate the effective reproduction number, the dispersion parameter, the yearly mutation rate and the testing probability from the clusters we simulated in section (D1).
options(mc.cores = parallelly::availableCores())
estRodis_estimate_parameters_two(clusters_size = simulated_clusters$size,
clusters_freq = simulated_clusters$frequency,
testing_proba = 0.55,
sequencing_proba = 0.44)
Result:
Inference for Stan model: estRodis_stan_model_estimate_parameters_two.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
R 0.81 0.00 0.02 0.76 0.79 0.81 0.83 0.86 3181 1
k 0.34 0.00 0.07 0.22 0.29 0.33 0.38 0.51 2630 1
number_yearly_mutations 14.01 0.01 0.51 13.04 13.66 14.00 14.34 15.02 2763 1
mutation_proba 0.18 0.00 0.01 0.17 0.18 0.18 0.18 0.19 2761 1
lp__ -1154.64 0.03 1.22 -1157.85 -1155.19 -1154.32 -1153.76 -1153.24 2102 1
Samples were drawn using NUTS(diag_e) at Tue Mar 26 11:09:22 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The R package estRodis contains nine functions. Short introductions of all functions are provided below, more detailed information on input and output values can be found in the documentation of the respective function within the R package.
Simulate the size of identical sequence clusters.
Simulate a transmission chain.
Create a plot of a transmission chain.
Create a plot of a transmission chain including mutations of the viral genome.
Apply model one to estimate parameters from the sequence cluster size distribution.
Apply model two to estimate parameters from the sequence cluster size distribution.
Determine initial values for model one.
Determine initial values for model two.
Probability density function of a scaled version of the beta distribution.
This distribution is used as prior distribution of the testing probability in model one.