In the data file, you will find the following:
- real_urinary_microbiome.csv: The real urinary microbiome dataset from our manuscript "A generalized Bayesian stochastic block model for microbiome community detection" with
$p=99$ taxa and$n=75$ samples (postmenopausal women). - simulation.rdata: the simulated networks in R with
$K=3,6,9$ communities. This was our original simulation data for the manuscript. - simulation2.rdata: contains additional simulated networks in R with
$K=12$ communities. These were added later when we revised our paper for submission to Statistics In Medicine, which included an extension of the original simulation study.
- Save/load the function code in R or download it from GitHub.
- Process the abundance data as described in our paper to estimate the co-occurrence network G. Our network in the real data analysis consisted of species level taxa. The MCLR transformation is available in the SPRING package in R.
- Matrix Q should be from one other taxonomic tree level such as genus.
- Run the code for one value of K and f at a time. The function returns all 1000 iterations for parameters Omega and z. You should discard the first half as burn in. Then, use our formulation from the paper to compute the posterior density, MAP, BIC, etc. To compute ARI for simulation data, use the ARI function in the aricode package in R.
- Sample code for a specified G and Q with 5 communities and our recommended MRF prior setting is given below:
# Step 1. Run the function
set.seed(1)
my_results = Bayesian_SBM_MRF(G = G, Q = Q, k = 5, f = 1)
# Step 2. Results from all 1000 iterations
z = my_results[[1]]
omegas = my_results[[2]]
# Step 3. Discard first half as burn in
z = z[-c(1:500),]
omegas = omegas[-c(1:500),,]