Added power

samuel-watson · Mar 16, 2023 · f9e25cd · f9e25cd
1 parent 181fd78
commit f9e25cd
Show file tree

Hide file tree

Showing 8 changed files with 172 additions and 60 deletions.
diff --git a/content/docs/cluster/power/_index.md b/content/docs/cluster/power/_index.md
@@ -4,6 +4,6 @@ bookFlatSection: true
 title: "The `glmmr` packages"
 ---
 
-# `glmmrBase`
-The main package is `glmmrBase`
+# Power calculations
+The `glmmrBase` package provides a range of tools to estimate the power of a design to estimate particular model parameters including both quick approximations using the information matrix or by data simulation and model fitting. The pages in this section discuss [approximations](approximations), [data simulation](data simulation), and include a set of examples and a [comparison](comparisons) with commonly used `design effects'.
 
diff --git a/content/docs/cluster/power/approximations.md b/content/docs/cluster/power/approximations.md
@@ -1,38 +1,126 @@
 ---
-title: Coding the model
+title: Estimating power
 weight: 1
 ---
 
-# Completing the model specification
+# Estimating power
 
-## Supported Families
-The package currently supports the following families and link functions, which are specified using the standard R families in the `stats` package.
+## Calculating the power
+The power for a model parameter $\beta_1$ for a two-sided null hypothesis test at level $\alpha$ is given by
+$$
+Pow(\beta_1) = \Phi \left(\frac{\vert \beta_1 \vert}{SE_{\beta_1}} - \Phi^{-1}\left(1-\frac{\alpha}{2}\right)\right)
+$$
+where $\Phi$ is the Gaussian distribution function and $SE_{\beta_1}$ is the standard error. We can estimate the standard error of the [GLMM](../../glmm/_index.md) with the generalised least squares estimator by calculating the information matrix:
+$$
+M = (X^T \Sigma^{-1} X)^{-1}
+$$
+then the standard error for the $p$th parameter is $\sqrt{M_{p,p}}$. For a Gaussian model with identity link, $\Sigma$ can be calculated exactly as $\Sigma = \sigma^2I + ZDZ^T$, and we can then plug the values in we need to calculate the power. However, for non-Gaussian models, obtaining the covariance matrix is much harder. The package `glmmrBase` provides [approximations](../../glmm/infomat.md) in these cases, which we can use to quickly estimate the power.
 
-| Family | Link functions          | code |
-|--------|-------------------------|------|
-| Gaussian | Identity, log         | `gaussian()` |
-| Binomial | Logit, log, identity  | `binomial()` |
-| Poisson  | Log, Identity         | `poisson()`  |
-| Gamma    | Log, Inverse, Identity| `gamma()`    |
-| Beta     | Logit                 | `Beta()`     |
+## Example: Cluster randomised trial with a Solomon design
+There are a multitude of packages available for R to estimate power for cluster randomised trials. These are shown in the table below. Many of these are relatively limited and provide functionality for specific models with explicit formulae. Because `glmmrBase` can handle a wide range of model specifications and generates the relevant matrices we can calculate the power easily for any type of trial we want under a range of functions. We just have to build the dataset representing the trial. Here, we give an example of a more complex trial design to illustrate.
 
+### The Solomon Design
+A Solomon design trial was developed in the 1940s for use in educational trials. One way of evaluating the performance of an educational intervention is to test students before and after using it. However, the act of completing a test on a particular topic may sensitise students to the material and itself act as an intervention. The Solomon design therefore includes four arms to enable us to account for this interaction. The different arms are shown in the Table. 
 
-The Beta family is provided by the package function `Beta()`, which generates a barebones list specifying the family and link. We use a mean-variance parameterisation of the Beta family. The likelihood is:
+| Trial arm | Pre-test | Intervention | Post test |
+|-----------|----------|--------------|-----------|
+| A         | ✓        |  ✓          | ✓         |
+| B         | ✓        |             | ✓         |
+| C         |          | ✓           | ✓         |
+| D         |          |             |  ✓         |
 
+### Statistical Model
+Two arms are intervention and two are control, with one of each receiving a pre-test, and all receiving the post-test. If we specify a model for individual $i$ in cluster $k$ at time $t$ where $t$ is an indicator for a post-test observation ($t=1$) or pre-test $t=0$;. We use $p_{k}$ to represent whether the cluster is in a pre-test arm ($p_k=1$) or not ($p_k=0$); and $d_k$ is an indicator for if the cluster is in an intervention arm ($d_k=1$) or control arm ($d_k=0$). Then the linear predictor, without random effects, is:
 $$
-f(y_i | \mu_i, \phi) = \frac{y_i^{\mu_i\phi - 1}(1-y_i)^{(1-\mu_i)\phi - 1}}{B(\mu_i\phi, (1-\mu_i)\phi)}
+\eta_{ikt} = \beta_0 + \beta_1 d_k + \beta_2 t + \beta_3 d_k t + \beta_4 p_k t + \beta_5 d_k p_k t
 $$
+The parameter of interest giving the treatment effect is $\beta_5$.
 
-where $B()$ is the Beta function, and we use logit link
-
+We will assume that the test produces a score as the outcome. Many analyses might use a linear model for these scores, however to illustrate a slightly more complex example, we will assume that the score is relatively small, say 10 items, and we assume the participant has an underlying probability of answering a question correctly that we wish to improve. So we specify a Binomial-logistic mixed model with the outcome $y_{ikt}$ being the total number of correct responses:
 $$
-\log\left( \frac{\mu_i}{1-\mu_i} \right) = \mathbf{x}_i\beta + \mathbf{z}_i \mathbf{u}
+y_{ikt} \sim \text{Binomial}(\mu,10)
 $$
-
-We similarly use a mean-variance parameterisation for the Gamma regression function:
-
 $$
-f(y_i | \mu_i, \nu) = \frac{1}{\Gamma(\nu)}\left( \frac{\nu y_i}{\mu_i} \right)^\nu \exp \left( -\frac{\nu y_i}{\mu_i} \right) \frac{1}{y}
+\mu = \frac{1}{1+\exp(-(\eta_{ijk}+\alpha_{kt}))}
 $$
+where $\alpha_{kt}$ represents our random effect. As there are only two periods we will use a "nested exchangable" structure $\alpha_{kt} = \alpha_{1k} + \alpha_{2kt}$ with $\alpha_{1k} \sim N(0,\tau_1^2)$ and $\alpha_{2k} \sim N(0,\tau_2^2)$. 
+
+### Parameter values
+We must select parameter values for the model at which to calculate power. As the model is non-linear, the power depends on the choice of $\beta$ as well as the covaraince parameters. We will initially choose $\beta_0,...,\beta_4 = 0$ so that the mean score is 50\%. We will also choose $\tau_1 = 0.25$ and $\tau_2 = 0.1$, which are relatively high. Finally, we will look at values of $\beta_5$ from zero to two, which gives approximate changes to the mean proportion correctly answers from 0 percentage points (pp) to 40pp. 
+
+In terms of sample size we will have 40 clusters allocated in equal proportion to each trial arm. We will also assume that the number of participants in each cluster is not fixed and is likely to be heterogeneous and betweem two and ten. For the first example, we will generate a random sample for the cluster sizes, but we can average the power over this sample size distribution by re-simulating.
+
+### Setting up the data
+We now have to create a data frame representing the trial described above. This is a little more complex than the standard blocked structure that we can create with the [nelder()](../../glmmr/creating_data.md) function, so we will start from scratch. First, we will generate a data frame describing the pre- and post-test variables and then expand each row to the desired number of clusters:
+```
+df <- expand.grid(pre = c(0,1), treat = c(0,1))
+df$t <- 2
+df1 <- expand.grid(pre = c(1), treat = c(0,1))
+df1$t <- 1
+df$arm <- 1:4
+df1$arm <- c(2,4)
+df <- rbind(df,df1)
+```
+so `df` now has six rows with the variables `pre`, `treat`, `t` and indicating the trial arm. Now we duplicate each row ten times, one for each cluster, and assign cluster labels:
+```
+df <- df[rep(1:6,each=10),]
+df$cl <- c(1:40,11:20,31:40)
+```
+we can then create the different interaction variables described in the model:
+```
+df$intpost <- df$treat*df$post
+df$prepost <- df$pre*df$post
+df$intprepost <- df$pre*df$treat*df$post
+```
+To generate a randomly sampled number of individuals per cluster, we can draw uniformly from the numbers between two and ten and then duplicate each cluster's rows that many times:
+```
+n <- sample(1:10,J*4,replace=TRUE)
+df$n <- n[df$cl]
+df1 <- df[rep(1:nrow(df),df$n),]
+```
+
+### Coding the model
+We can create the model in one command now:
+```
+mod <- Model$new(
+  formula = ~ treat + post + intpost + prepost + intprepost + (1|gr(cl)) + (1|gr(cl,t)),
+  mean = list(parameters = c(0,0,0,0,0,0.1)),
+  covariance = list(parameters = c(0.25,0.10)),
+  family = binomial(),
+  var_par = 10,
+  data = df1
+)
+```
+See [model specification](../../glmmr/model/model_specification.md) for details on writing the formula. The `mean` argument gives the $\beta$ parameters, where we have started with a treatment effect of 0.1. The `covariance` argument gives the parameters $\tau_1$ and $\tau_2$, and we have also specified the family and data. The additional `var_par` here gives the extra scale parameter for the binomial distribution, here representing that there are ten `trials`.
+
+### Estimating power
+Estimating power is simple, we can just use the member function `mod$power()`, which gives:
+```
+mod$power()
+>    Parameter Value        SE      Power
+>1 [Intercept]   0.0 0.1272784 0.02500000
+>2       treat   0.0 0.1796723 0.02500000
+>3        post   0.0 0.1851434 0.02500000
+>4     intpost   0.0 0.2580505 0.02500000
+>5     prepost   0.0 0.1851434 0.02500000
+>6  intprepost   0.1 0.2580851 0.05791791
+```
+So you can see the power for our parameter is a measly 5.8\%. It may be useful to estimate power over a range of values. Rather than re-generating the model each time, we can just update its parameters and re-calculate the power. To do this in a loop and extract the power ready for plotting we can do:
+```
+power.data <- data.frame(b = seq(0,2,by=0.02), power = NA)
+for(i in 1:nrow(power.data)){
+    mod$update_parameters(mean = c(0,0,0,0,0,power.data$b[i]))
+    pwr <- mod$power()
+    power.data$power[i] <- pwr$Power[6]
+}
+```
+and then we can plot these data with `ggplot`:
+```
+require(ggplot2)
+ggplot(data=power.data,aes(x=b,y=power))+
+  geom_line()+
+  theme_bw()
+```
+giving the figure below.
 
-where we also provide logit, inverse, and identity link functions for the specification of $\mu_i$.
+![Power](/power.jpeg)
diff --git a/content/docs/cluster/power/power.jpeg b/content/docs/cluster/power/power.jpeg
diff --git a/content/docs/cluster/power/power_glmmr.md b/content/docs/cluster/power/power_glmmr.md
@@ -3,36 +3,6 @@ title: Coding the model
 weight: 1
 ---
 
-# Completing the model specification
+# Power
 
-## Supported Families
-The package currently supports the following families and link functions, which are specified using the standard R families in the `stats` package.
-
-| Family | Link functions          | code |
-|--------|-------------------------|------|
-| Gaussian | Identity, log         | `gaussian()` |
-| Binomial | Logit, log, identity  | `binomial()` |
-| Poisson  | Log, Identity         | `poisson()`  |
-| Gamma    | Log, Inverse, Identity| `gamma()`    |
-| Beta     | Logit                 | `Beta()`     |
-
-
-The Beta family is provided by the package function `Beta()`, which generates a barebones list specifying the family and link. We use a mean-variance parameterisation of the Beta family. The likelihood is:
-
-$$
-f(y_i | \mu_i, \phi) = \frac{y_i^{\mu_i\phi - 1}(1-y_i)^{(1-\mu_i)\phi - 1}}{B(\mu_i\phi, (1-\mu_i)\phi)}
-$$
-
-where $B()$ is the Beta function, and we use logit link
-
-$$
-\log\left( \frac{\mu_i}{1-\mu_i} \right) = \mathbf{x}_i\beta + \mathbf{z}_i \mathbf{u}
-$$
-
-We similarly use a mean-variance parameterisation for the Gamma regression function:
-
-$$
-f(y_i | \mu_i, \nu) = \frac{1}{\Gamma(\nu)}\left( \frac{\nu y_i}{\mu_i} \right)^\nu \exp \left( -\frac{\nu y_i}{\mu_i} \right) \frac{1}{y}
-$$
-
-where we also provide logit, inverse, and identity link functions for the specification of $\mu_i$.
+Power!
diff --git a/content/docs/glmm/infomat.md b/content/docs/glmm/infomat.md
@@ -0,0 +1,38 @@
+---
+title: Coding the model
+weight: 1
+---
+
+# Completing the model specification
+
+## Supported Families
+The package currently supports the following families and link functions, which are specified using the standard R families in the `stats` package.
+
+| Family | Link functions          | code |
+|--------|-------------------------|------|
+| Gaussian | Identity, log         | `gaussian()` |
+| Binomial | Logit, log, identity  | `binomial()` |
+| Poisson  | Log, Identity         | `poisson()`  |
+| Gamma    | Log, Inverse, Identity| `gamma()`    |
+| Beta     | Logit                 | `Beta()`     |
+
+
+The Beta family is provided by the package function `Beta()`, which generates a barebones list specifying the family and link. We use a mean-variance parameterisation of the Beta family. The likelihood is:
+
+$$
+f(y_i | \mu_i, \phi) = \frac{y_i^{\mu_i\phi - 1}(1-y_i)^{(1-\mu_i)\phi - 1}}{B(\mu_i\phi, (1-\mu_i)\phi)}
+$$
+
+where $B()$ is the Beta function, and we use logit link
+
+$$
+\log\left( \frac{\mu_i}{1-\mu_i} \right) = \mathbf{x}_i\beta + \mathbf{z}_i \mathbf{u}
+$$
+
+We similarly use a mean-variance parameterisation for the Gamma regression function:
+
+$$
+f(y_i | \mu_i, \nu) = \frac{1}{\Gamma(\nu)}\left( \frac{\nu y_i}{\mu_i} \right)^\nu \exp \left( -\frac{\nu y_i}{\mu_i} \right) \frac{1}{y}
+$$
+
+where we also provide logit, inverse, and identity link functions for the specification of $\mu_i$.
diff --git a/content/docs/glmmr/_index.md b/content/docs/glmmr/_index.md
@@ -5,5 +5,16 @@ title: "The `glmmr` packages"
 ---
 
 # `glmmrBase`
-The main package is `glmmrBase`
+The currently available packages are `glmmrBase`, which provides the foundations including model specification, analysis, and estimation, and `glmmrOptim`, which implements a set of optimal design algorithms, with more packages planned in future.
+
+## Installation
+Both packages can be installed from CRAN using, for example
+```
+install.packages("glmmrBase")
+```
+or the most recent release is available from GitHub as:
+```
+devtools::install_github("samuel-watson/glmmrBase")
+```
+
 
diff --git a/content/docs/glmmr/model/model_code.md b/content/docs/glmmr/model/model_code.md
@@ -54,6 +54,12 @@ where we also provide logit, inverse, and identity link functions for the specif
 Data are required to build the matrices and perform the calculations. When we want to analyse a study design prior to data being collected we will have to create our own covariate data for this purpose. We provide several useful functions to create `data.frame`s with different structures to support generating dummy data, see [Generating data](../creating_data).
 
 ## Parameter values
+Parameter values are required to generate the matrices of the GLMM or act as starting values for model fitting algorithms. If they are not specified then default values are used. Parameter values are given separately to the `mean` and `covariance` arguments when specifying a new model object; values are passed as a vector in the order they appear in the formula. For example, for the mean function we might specify (where we include the formula for clarity):
+```
+mean = list(formula = ~x1 + x2,
+            parameters = c(-1,1,2))
+```
+to get the formula (or starting values) $-1 + x_1 + 2x_2$. Similarly, for a covariance function:
 
 
 

diff --git a/content/menu/index.md b/content/menu/index.md
@@ -5,7 +5,7 @@ headless: true
 - [**What is a GLMM?**]({{< relref "/docs/glmm" >}})
 
 - [**glmmr**]({{< relref "/docs/glmmr" >}})
-- [Making a model]({{< relref "/docs/glmmr/model" >}})
+- [Making a model]({{< relref "/docs/glmmr/model/model_specification" >}})
   - [Formula Specification]({{< relref "/docs/glmmr/model/model_specification" >}})
   - [Model Class Specification]({{< relref "/docs/glmmr/model/model_code" >}})
   - [Model Class Functions]({{< relref "/docs/glmmr/model/model_functions" >}})
@@ -15,10 +15,9 @@ headless: true
 
 - [**Cluster Trials**]({{< relref "/docs/cluster" >}})
 - [Power calculations]({{< relref "/docs/cluster/power" >}})
-  - [Approximations]({{< relref "/docs/cluster/power/approximations" >}})
-  - [Calculating in glmmr]({{< relref "/docs/cluster/power/power_glmmr" >}})
+  - [Estimating power]({{< relref "/docs/cluster/power/approximations" >}})
+  - [Simulation]({{< relref "/docs/cluster/power/simulation" >}})
   - [Comparison of methods]({{< relref "/docs/cluster/power/comparisons" >}})
-  - [Simulation]({{< relref "/docs/cluster/power" >}})
 <br />