This repository contains all the data and files necessary to re-run the analyses in:
Emery JC, Dodd PJ et al. Estimating the contribution of subclinical tuberculosis disease to transmission.
Full details of the study can be found in the main paper and supplementary materials.
Contents of the folders in the repository:

- `data`: Input data
- `stan`: Stan model files
- `R`: R files to:
  - `1_odds_ratios`: Calculate odds ratios from household contact data
  - `2_run_stan`: Perform MCMC runs in Stan
  - `3_rel_inf`: Estimate relative infectiousness from model results
  - `4_prevalence`: Meta-analyse prevalence survey data (e.g. proportion of prevalent TB that is subclinical)
  - `5_prop_trans`: Estimate the contribution of subclinical TB to transmission
  - `6_outputs`: Construct plots
- `interim_outputs`: Intermediate outputs to be used as inputs for further analysis
- `outputs`: Plots to be included in the:
  - `main_paper`: Main paper
  - `supp_mats`: Supplementary materials

Note that by default the output folders `interim_outputs`, `outputs/main_paper` and `outputs/supp_mats` contain a placeholder text file `Placeholder.rtf` to preserve the otherwise empty folders before any outputs are generated.

Note also that hereafter `XXXX` denotes either `viet` (Viet Nam), `phil` (Philippines), `bang` (Bangladesh) or `act3` (ACT3).

- Household contact data (Supplementary Table 1): `HHC_data_XXXX.csv`
- Subclinical and clinical disease durations (derived from Supplementary Table 2): `duration_data.csv`
- Prevalence survey data (Supplementary Table 3): `prevalence_survey_data.csv`

To perform analyses as per the main paper or supplementary materials:

Calculate odds ratios from household contact data (`R/1_odds_ratios`):

- Household contact data is stored in the folder `data` and labelled `HHC_data_XXXX.csv`.
- Odds ratios by symptoms or smear status are estimated with `odds_symp.R` or `odds_smear.R`, respectively, in the folder `R/1_odds_ratios`.
- Results are saved as `odds_symp.csv` and `odds_smear.csv` to the folder `interim_outputs`.
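
As a rough illustration of what an odds ratio calculation involves, the sketch below computes an odds ratio and a Woolf (log-scale) 95% confidence interval from a 2x2 table in base R. The counts, variable names and grouping are hypothetical; they are not taken from `HHC_data_XXXX.csv`, and `odds_symp.R` and `odds_smear.R` may use a different method.

```r
# Hypothetical 2x2 counts (NOT from HHC_data_XXXX.csv): household contacts
# with a positive or negative outcome, split by index case type.
pos_sub  <- 30  # contacts of subclinical index cases, positive
neg_sub  <- 70  # contacts of subclinical index cases, negative
pos_clin <- 45  # contacts of clinical index cases, positive
neg_clin <- 55  # contacts of clinical index cases, negative

# Odds of a positive outcome in each group, and their ratio
odds_sub   <- pos_sub / neg_sub
odds_clin  <- pos_clin / neg_clin
odds_ratio <- odds_sub / odds_clin

# Woolf standard error of the log odds ratio and a 95% confidence interval
se_log_or <- sqrt(1 / pos_sub + 1 / neg_sub + 1 / pos_clin + 1 / neg_clin)
ci <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

print(round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 2))
```

An odds ratio below 1 in this layout would mean lower odds of a positive outcome among contacts of subclinical index cases than among contacts of clinical index cases.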

Perform MCMC runs in Stan (`R/2_run_stan`):

- Household contact data is stored in the folder `data` and labelled `HHC_data_XXXX.csv`.
- An MCMC run is then executed with `run_stan.R` in the folder `R/2_run_stan` using the appropriate household contact data and Stan model file from the folder `stan`.
- The appropriate data and Stan model file are specified in `run_stan.R` using `viet` (Viet Nam), `phil` (Philippines), `bang` (Bangladesh) or `act3` (ACT3). All studies use `model.stan` except Bangladesh, which uses `model_bang.stan` since only smear-positive index cases were included.
- Full details of the model fit are saved as `fit_XXXX.rds` to the folder `interim_outputs`.
- The mean and variance of the (log) posterior of the relative hazards are saved as `rel_hazards_XXXX.csv` to the folder `interim_outputs`.
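
The mapping from study code to input and output files described above can be sketched as follows. `study_files` is a hypothetical helper written for this illustration, not a function in the repository, and the relative paths assume the working directory is the repository root; `run_stan.R` may select its files differently.

```r
# Hypothetical helper (not part of the repository): map a study code to the
# household contact data, Stan model file and saved fit described above.
study_files <- function(study) {
  studies <- c("viet", "phil", "bang", "act3")
  stopifnot(study %in% studies)
  list(
    data = file.path("data", paste0("HHC_data_", study, ".csv")),
    # Bangladesh included only smear-positive index cases, so it has its own model
    model = file.path("stan", if (study == "bang") "model_bang.stan" else "model.stan"),
    fit = file.path("interim_outputs", paste0("fit_", study, ".rds"))
  )
}

files <- study_files("bang")
print(files)

# A call such as rstan::stan(file = files$model, data = <list>) would then fit
# the model. The data list must match the data block of the Stan file and is
# not shown here because its structure is not described in this README.
```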

Estimate relative infectiousness from model results (`R/3_rel_inf`):

- Mixed-effects meta-analyses of the relative hazards from subclinical and smear-negative index cases are performed using `rel_inf_s.R` and `rel_inf_n.R`, respectively, in the folder `R/3_rel_inf`.
- In both scripts the mean and variance of the (log) posterior of the relative hazards (`rel_hazards_XXXX.csv` in the folder `interim_outputs`) for all studies are imported.
- In both scripts a mixed-effects meta-analysis is performed, with the results for the summary value saved as either `rel_hazards_s.csv` or `rel_hazards_n.csv` to the folder `interim_outputs`.
- In the script for the relative hazards from subclinical index cases, the duration of subclinical TB versus clinical TB is used to estimate the relative infectiousness of subclinical TB for each study separately and for the summary value. In the smear-negative case the relative hazards are assumed equal to the relative infectiousness.
- The results for relative infectiousness for each study and the summary value are saved as `rel_inf_s.csv` and `rel_inf_n.csv` to the folder `interim_outputs`.
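
A minimal sketch of pooling log relative hazards with `metafor` is given below. The means and variances are made up for illustration and are not results from the paper. The call fits `metafor`'s random-effects model without moderators using REML, which is an assumption; `rel_inf_s.R` and `rel_inf_n.R` may specify their models differently. The `requireNamespace` guard only lets the sketch run where `metafor` is not installed.

```r
# Hypothetical posterior means and variances of the log relative hazards for
# four studies; these values are invented and are NOT results from the paper.
log_rh_mean <- c(viet = -0.20, phil = 0.05, bang = -0.35, act3 = -0.10)
log_rh_var  <- c(viet = 0.04, phil = 0.09, bang = 0.06, act3 = 0.05)

if (requireNamespace("metafor", quietly = TRUE)) {
  # Random-effects meta-analysis on the log scale, fitted by REML
  res <- metafor::rma(yi = log_rh_mean, vi = log_rh_var, method = "REML")

  # Back-transform the pooled estimate and its interval to the hazard scale
  print(predict(res, transf = exp))
}
```

Pooling on the log scale and exponentiating afterwards keeps the summary relative hazard and its interval positive.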

Meta-analyse prevalence survey data (`R/4_prevalence`):

- Prevalence survey data is stored in the folder `data` and labelled `prevalence_survey_data.csv`.
- This data is then imported to scripts in the folder `R/4_prevalence` that perform mixed-effects meta-analyses of:
  - The proportion of prevalent TB that is subclinical (`prop_prev_sub.R`)
  - The proportion of subclinical TB that is smear-positive (`prop_sub_pos.R`)
  - The proportion of clinical TB that is smear-positive (`prop_clin_pos.R`)
- The results for the summary value are saved as `prop_prev_sub_summ.csv`, `prop_sub_pos_summ.csv` and `prop_clin_pos_summ.csv` to the folder `interim_outputs`.
- The results for the individual surveys plus summary value are saved as `prop_prev_sub_all.csv`, `prop_sub_pos_all.csv` and `prop_clin_pos_all.csv` to the folder `interim_outputs`.

Estimate the contribution of subclinical TB to transmission (`R/5_prop_trans`):

- The proportion of transmission from subclinical TB is estimated for each setting separately using `prop_trans_setting.R` in the folder `R/5_prop_trans`.
- Prevalence survey data (`prevalence_survey_data.csv`) is imported from the folder `data`.
- The summary values for the relative hazards from subclinical and smear-negative index cases (`rel_hazards_s.csv` and `rel_hazards_n.csv`) are imported from the folder `interim_outputs`.
- The duration of subclinical TB versus clinical TB is again used to estimate the relative infectiousness of subclinical TB. In the smear-negative case the relative hazards are assumed equal to the relative infectiousness.
- The proportion of transmission in each setting is then estimated and saved as `prop_trans_setting.csv` to the folder `interim_outputs`.
- A summary value for the proportion of transmission from subclinical TB is then estimated using `prop_trans_summ.R` in the folder `R/5_prop_trans`.
- The process is the same as above, except that at the second step the summary values `prop_prev_sub_summ.csv`, `prop_sub_pos_summ.csv` and `prop_clin_pos_summ.csv` are imported from the folder `interim_outputs`.
- The summary value for the proportion of transmission from subclinical TB is then saved as `prop_trans_summ.csv` to the folder `interim_outputs`.

Construct plots (`R/6_outputs`):

- Plots for the main paper are constructed using `outputs_main_paper.R` in the folder `R/6_outputs`, which imports all relevant results from the folder `interim_outputs`. All plots are saved to the folder `outputs/main_paper` and named after their label in the main paper (e.g. `fig_3A.png`).
- Plots for the supplementary materials are constructed using `outputs_supp_mats.R` in the folder `R/6_outputs`, which imports all relevant results from the folder `interim_outputs`.
- The appropriate study is specified in `outputs_supp_mats.R` using `viet` (Viet Nam), `phil` (Philippines), `bang` (Bangladesh) or `act3` (ACT3).
- All plots are saved to the folder `outputs/supp_mats` and named by type of plot and study (e.g. `trace_viet.png` for the Viet Nam trace plot).
- Detailed model results (e.g. n_eff, Rhat, mean, mcse, sd and sample quantiles) are viewed in a web browser using `shinystan`.
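
One way to open a saved fit in `shinystan` is sketched below. It assumes that `fit_viet.rds` holds a `stanfit` object written with `saveRDS()` and that the working directory is the repository root; neither is confirmed by this README. The guard means nothing happens outside an interactive session or when the file or package is missing.

```r
# Path to the saved Viet Nam fit (assumed to be a stanfit object)
fit_path <- file.path("interim_outputs", "fit_viet.rds")

if (interactive() && file.exists(fit_path) &&
    requireNamespace("shinystan", quietly = TRUE)) {
  fit <- readRDS(fit_path)
  # Opens the ShinyStan app in the default web browser
  shinystan::launch_shinystan(fit)
}
```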

Stan is used to perform the MCMC runs. The R packages `rstan` and `shinystan` are used to interact with Stan and analyse the results using R. Meta-analyses are performed using the R package `metafor`.

- `rstan` documentation: https://mc-stan.org/users/interfaces/rstan
- `shinystan` documentation: https://mc-stan.org/users/interfaces/shinystan
- `metafor` documentation: https://www.metafor-project.org/doku.php