vignettes/modelbpp.Rmd

---
title: "Get Started"
author: "Shu Fai Cheung"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Get Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
csl: apa.csl
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width  =  6,
  fig.height =  6,
  fig.align = "center",
  fig.path = ""
)
```

# Introduction

This article illustrates how to
use `model_set()` and other functions
from the package
[`modelbpp`](https://sfcheung.github.io/modelbpp/)
to:

- Fit a set of neighboring models,
  each has one more or one less degree
  of freedom than the original fitted
  model.

- Compute the BIC posterior probability
  (BPP), for each model [@wu_simple_2020].

- Use BPP to assess to what extent
  each model is supported by the
  data, compared to all other models
  under consideration.

# Workflow

1. Fit an SEM model, the
    original model, as usual in `lavaan`.

2. Call `model_set()` on the output from
  Step 1. It will automatically do the
  following:

    - Enumerate the neighboring models
      of the original model.

    - Fit all the models and compute their
      BIC posterior probabilities (BPPs).

3. Examine the results by:

    - printing the output of `model_set()`, or

    - generating a graph using `model_graph()`.

# Example

This is a sample dataset,
`dat_serial_4_weak`,
with four variables:

```{r}
library(modelbpp)
head(dat_serial_4_weak)
```

## Step 1: Fit the Original Model

Fit this original model, a serial
mediation model, with one direct path,
from `x` to `y`:

```{r}
library(lavaan)
mod1 <-
"
m1 ~ x
m2 ~ m1
y ~ m2 + x
"
fit1 <- sem(mod1, dat_serial_4_weak)
```

This the summary:

```{r}
summary(fit1,
        fit.measures = TRUE)
```

```{r echo = FALSE}
tmp <- fitMeasures(fit1)
fit1_cfi <- unname(tmp["cfi"])
fit1_rmsea <- unname(tmp["rmsea"])
```

The fit is acceptable, though the RMSEA
is marginal (CFI = `r formatC(fit1_cfi, 3, format = "f")`,
RMSEA = `r formatC(fit1_rmsea, 3, format = "f")`).

## Step 2: Call `model_set()`

Use `model_set()` to find the
neighboring models differ
from the target model by one on model
degrees of freedom, fit them, and compute
the BPPs:

```{r results = FALSE}
out1 <- model_set(fit1)
```

## Step 3: Examine the Results

To examine the results, just print
the output:

```{r}
out1
```

```{r echo = FALSE}
out1_bpp <- out1$bpp
out1_bpp_2 <- sort(out1_bpp, decreasing = TRUE)[2]
```

The total number of models examined,
including the original model, is `r  length(out1$models)`.

(Note: The total number of models
was 9 in previous version. Please
refer to the Note in the printout for
the changes.)

The BIC posterior probabilities
(BPPs) suggest that
the original model is indeed the most
probable model based on BPP. However,
the model with the direct path
dropped, `drop: y~x`, only
has slightly lower BPP
(`r formatC(out1_bpp_2, 3, format = "f")`)

This suggests that, with equal prior
probabilities [@wu_simple_2020], the
support for the model with the direct
and without the direct path have similar
support from the data based on BPP.

Alternatively, we can use `model_graph()`
to visualize the BPPs and model relations
graphically:

```{r graph1, fig.height = 8, fig.width = 8, eval = FALSE}
graph1 <- model_graph(out1)
plot(graph1)
```

![](graph1-1.png)

Each node (circle) represents one model.
The larger the BPP, the larger the node.

The arrow points from a simpler
model (a model with larger model *df*)
to a more complicated model (a model
with smaller model *df*). If two models
are connected by an arrow, then
one model can be formed from another model
by adding or removing one free parameter
(e.g., adding or removing one path).

## Repeat Step 2 with User Prior

In real studies, not all models are
equally probable before having data
(i.e., not all models have equal
prior probabilities). A researcher
fits the original model because

- its
  prior probability is higher than other
  models, at least other neighboring
  models (otherwise, it is not worthy
  of collecting data assess thi original
  model), but

- the prior probability
  is not so high to eliminate the need
  for collecting data to see how much it is
  supported by data.

Suppose we decide that the prior probability
of the original model is .50: probable, but
still needs data to decide whether it is
empirically supported

This can be done by setting `prior_sem_out`
to the desired prior probability when calling
`model_set()`:

```{r results = FALSE}
out1_prior <- model_set(fit1,
                        prior_sem_out = .50)
```

The prior probabilities of all other
models are equal. Therefore, with
nine models and the prior of the target
model being .50, the prior probability
of the other eight model is (1 - .50) / 8
or .0625.

This is the printout:

```{r}
out1_prior
```

If the prior of the target is set to .50,
then, taking into account both the prior
probabilities and the data, the target
model is strongly supported by the data.

This is the output of `model_graph()`:

```{r out1_prior, fig.height = 8, fig.width = 8, eval = FALSE}
graph1_prior <- model_graph(out1_prior)
plot(graph1_prior)
```

![](out1_prior-1.png)

# Advanced Options

## More Neighboring Models

If desired, we can enumerate models
"farther away" from the target model.
For example, we can set the
maximum difference
in model *df* to 2, to include models
having two more or two less *df* than
the original model:

```{r results = FALSE}
out1_df2 <- model_set(fit1,
                      df_change_add = 2,
                      df_change_drop = 2)
```

This is the printout. By default, when there
are more than 20 models, only the top 20
models on BPP will be printed:

```{r}
out1_df2
```

The number of models examined, including
the original model, is `r  length(out1_df2$models)`.

This is the output of `model_graph()`:

```{r graph1_df2, fig.height = 8, fig.width = 8, eval = FALSE}
graph1_df2 <- model_graph(out1_df2,
                          node_label_size = .75)
plot(graph1_df2)
```

![](graph1_df2-1.png)

Note: Due to the number of nodes,
`node_label_size` is used to reduce
the size of the labels.

## Excluding Some Parameters From the Search

When calling `model_set()`, users can
specify parameters that must be excluded
from the list to be added (`must_not_add`),
or must not be dropped (`must_not_drop`).

For example, suppose it is well
established that `m1~x` exists and should
not be dropped, we can exclude it
when calling `model_set()`

```{r results = FALSE}
out1_no_m1_x <- model_set(fit1,
                          must_not_drop = "m1~x")
```

This is the output:

```{r}
out1_no_m1_x
```

The number of models reduced to `r  length(out1_df2$models)`.

This is the plot:

```{r out1_no_m1_x, ig.height = 8, fig.width = 8, eval = FALSE}
out1_no_m1_x <- model_graph(out1_no_m1_x)
plot(out1_no_m1_x)
```

![](out1_no_m1_x-1.png)

## Models With Constraints

If the original model has equality
constraints, they will be included in
the search for neighboring models,
by default. That is, removing one
equality constraint between two models
is considered as a model with an
increase of 1 *df*.

## Recompute BPPs Without Refitting the Models

Users can examine the impact of the prior
probability of the original model without
refitting the models, by using
the output of `model_set()` as the
input, using the `model_set_out` argument:

```{r results = FALSE}
out1_new_prior <- model_set(model_set_out = out1,
                            prior_sem_out = .50)
```

The results are identical to calling
`model_set()` with the original `lavaan`
output as the input:

```{r}
out1_new_prior
```

## Many Neighboring Models

When a model has a lot of free parameters,
the number of neighboring models can be
large and it will take a long time to
fit all of them. Users can enable
parallel processing by setting `parallel`
to `TRUE` when calling `model_set()`.

## More Options

Please refer to the help page of
`model_set()` for options available.

# Further Information

For further information on other
functions,
please refer to their help pages.

# References