vignettes/C-composition-MMLM.Rmd

---
title: "Multilevel Model with Compositional Outcomes"
output: 
  html_document:
    theme: spacelab
    highlight: kate
    toc: yes
    toc_float: yes
    collapsed: no
    smooth_scroll: no
    toc_depth: 4
    fig_width: 6
    fig_height: 4
    fig_caption: yes
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Multilevel Model with Compositional Outcomes}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


In this vignette, we discuss how to specify multilevel models with compositional outcomes using `multilevelcoda`. 
In addition to `multilevelcoda`, we will use `brms` package (to fit models) and
`bayestestR` package (to compute useful indices and compare models). 
We will also attach built in datasets `mcompd` (simulated compositional sleep and wake variables) 
and `sbp` (sequential binary partition).


```r
library(multilevelcoda)
library(brms)
library(bayestestR)

data("mcompd") 
data("sbp") 

options(digits = 3)
```

# Multilevel model with compositional outcomes.
## Computing compositions and isometric log ratio coordinates. 
The ILR coordinates outcomes can be calculated using the `compilr()` functions.


```r
cilr <- compilr(data = mcompd, sbp = sbp,
                parts = c("TST", "WAKE", "MVPA", "LPA", "SB"), idvar = "ID", total = 1440)

head(cilr$TotalILR)
#>        ilr1 ilr2    ilr3  ilr4
#> [1,]  0.287 1.20  0.6270 1.702
#> [2,] -0.472 1.57 -0.8336 0.984
#> [3,] -0.486 1.33  1.3344 2.659
#> [4,] -0.316 1.37 -0.0332 0.551
#> [5,]  0.205 1.43 -0.6893 0.733
#> [6,] -0.446 1.16 -0.0950 0.670
#> attr(,"class")
#> [1] "rmult"
```

## Fitting model

A model with multilevel compositional outcomes is multivariate, as it has multiple ILR coordinate outcomes,each of which is predicted by a set of predictors. 
Our `brms` model can be then fitted using the `brmcoda()` function.


```r
mv <- brmcoda(compilr = cilr,
              formula = mvbind(ilr1, ilr2, ilr3, ilr4) ~ Stress + (1 | ID),
              cores = 8, seed = 123, backend = "cmdstanr")
#> Warning: In the future, 'rescor' will be set to FALSE by default for all models. It is thus recommended to explicitely set 'rescor'
#> via 'set_rescor' instead of using the default.
```

Here is a `summary()` of the model. 
We can see that stress significantly predicted `ilr1` and `ilr2`. 


```r
summary(mv)
#>  Family: MV(gaussian, gaussian, gaussian, gaussian) 
#>   Links: mu = identity; sigma = identity
#>          mu = identity; sigma = identity
#>          mu = identity; sigma = identity
#>          mu = identity; sigma = identity 
#> Formula: ilr1 ~ Stress + (1 | ID) 
#>          ilr2 ~ Stress + (1 | ID) 
#>          ilr3 ~ Stress + (1 | ID) 
#>          ilr4 ~ Stress + (1 | ID) 
#>    Data: tmp (Number of observations: 3540) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Group-Level Effects: 
#> ~ID (Number of levels: 266) 
#>                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sd(ilr1_Intercept)     0.33      0.02     0.30     0.37 1.00     1203     1547
#> sd(ilr2_Intercept)     0.30      0.01     0.28     0.34 1.00     1097     2116
#> sd(ilr3_Intercept)     0.39      0.02     0.35     0.43 1.00     1594     2478
#> sd(ilr4_Intercept)     0.30      0.02     0.27     0.33 1.00     1687     2483
#> 
#> Population-Level Effects: 
#>                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> ilr1_Intercept    -0.43      0.02    -0.48    -0.38 1.00     1089     1836
#> ilr2_Intercept     1.47      0.02     1.42     1.51 1.00      877     2029
#> ilr3_Intercept    -0.87      0.03    -0.93    -0.82 1.00     1644     2583
#> ilr4_Intercept     0.65      0.02     0.60     0.70 1.00     1948     2670
#> ilr1_Stress       -0.01      0.00    -0.02    -0.00 1.00     6441     3679
#> ilr2_Stress        0.01      0.00     0.00     0.01 1.00     5245     3596
#> ilr3_Stress        0.00      0.00    -0.01     0.01 1.00     6436     3519
#> ilr4_Stress        0.01      0.00    -0.00     0.01 1.00     6209     3303
#> 
#> Family Specific Parameters: 
#>            Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma_ilr1     0.44      0.01     0.43     0.45 1.00     5231     3458
#> sigma_ilr2     0.38      0.00     0.37     0.39 1.00     5176     3232
#> sigma_ilr3     0.70      0.01     0.68     0.71 1.00     5339     3388
#> sigma_ilr4     0.53      0.01     0.51     0.54 1.00     5250     3363
#> 
#> Residual Correlations: 
#>                   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> rescor(ilr1,ilr2)    -0.54      0.01    -0.57    -0.52 1.00     5202     3296
#> rescor(ilr1,ilr3)    -0.18      0.02    -0.21    -0.14 1.00     4833     3061
#> rescor(ilr2,ilr3)    -0.05      0.02    -0.09    -0.02 1.00     5156     3254
#> rescor(ilr1,ilr4)     0.11      0.02     0.07     0.14 1.00     4910     3445
#> rescor(ilr2,ilr4)    -0.05      0.02    -0.08    -0.01 1.00     5286     3449
#> rescor(ilr3,ilr4)     0.56      0.01     0.53     0.58 1.00     5255     3316
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
```

# Bayes Factor for compositional multilevel modelling

We are often interested in whether a predictor significantly predict the overall composition, in addition to the individual ILR coordinates. 
In Bayesian, this can be done by comparing the marginal likelihoods of two models. 
Bayes Factors (BFs) are indices of relative evidence of one model over another. 
In the context of compositional multilevel modelling, Bayes Factors provide two main useful functions:

- Testing single parameters within a model
- Comparing models 

We can utilize Bayes factors to answer the following question: 
*"Which model (i.e., set of composition predictors, expressed as ILRs) is more likely to have produced the observed data?"*

Let's examine whether stress predicts the overall sleep-wake composition.

*Note*: To use Bayes factors, `brmsfit` models must be fitted with an additional non-default argument
`save_pars = save_pars(all = TRUE)`.


```r
# intercept only
mv0 <- brmcoda(compilr = cilr,
               formula = mvbind(ilr1, ilr2, ilr3, ilr4) ~ 1 + (1 | ID),
               iter = 6000, chains = 8, cores = 8, seed = 123, warmup = 1000,
               backend = "cmdstanr", save_pars = save_pars(all = TRUE))
#> Warning: In the future, 'rescor' will be set to FALSE by default for all models. It is thus recommended to explicitely set 'rescor'
#> via 'set_rescor' instead of using the default.
# full model
mv <- brmcoda(compilr = cilr,
              formula = mvbind(ilr1, ilr2, ilr3, ilr4) ~ Stress + (1 | ID),
              iter = 6000, chains = 8, cores = 8, seed = 123, warmup = 1000,
              backend = "cmdstanr", save_pars = save_pars(all = TRUE))
#> Warning: In the future, 'rescor' will be set to FALSE by default for all models. It is thus recommended to explicitely set 'rescor'
#> via 'set_rescor' instead of using the default.
```

We can now compare these models with the `bayesfactor_models()` function


```r
bayes_factor(mv$Model, mv0$Model)
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
#> Iteration: 6
#> Iteration: 7
#> Iteration: 8
#> Iteration: 9
#> Iteration: 1
#> Iteration: 2
#> Iteration: 3
#> Iteration: 4
#> Iteration: 5
#> Iteration: 6
#> Iteration: 7
#> Iteration: 8
#> Iteration: 9
#> Iteration: 10
#> Estimated Bayes factor in favor of mv$Model over mv0$Model: 0.00001
```


With a $BF$ < 1, our data favours the intercept only model, showing that there is 
insufficient evidence for stress predicting the overall sleep-wake composition.

Bayes factors provide a intuitive measure of the strength of evidence of one model over the other
or among different models. Check out the `bayestestR` packages for several other useful functions related to BFs.