Commit aaa4336

added examples unfinished

samuel-watson committed Jul 17, 2023
1 parent f8870aa commit aaa4336
Showing 8 changed files with 133 additions and 18 deletions.
11 changes: 11 additions & 0 deletions content/docs/cpp/_index.md
@@ -0,0 +1,11 @@
---
weight: 1
bookFlatSection: true
title: "C++ API"
---

# `glmmr` C++ library
The packages here can also be used as a standalone, header-only C++ library for applications outside of R. The repository and details of its use will be added here.



7 changes: 7 additions & 0 deletions content/docs/cpp/cpp_interface.md
@@ -0,0 +1,7 @@
---
title: C++ API
weight: 1
---

# C++
TBC
2 changes: 1 addition & 1 deletion content/docs/glmmr/creating_data.md
@@ -5,7 +5,7 @@ weight: 1

# Creating data
The package includes the function `nelder()`, which we use to generate data for the examples below. [Nelder (1965)](https://royalsocietypublishing.org/doi/10.1098/rspa.1965.0012) suggested a simple notation that could express a large variety of different blocked designs. The notation was proposed in the context of split-plot experiments for agricultural research, where researchers often split areas of land into blocks, sub-blocks, and other smaller divisions, and apply different combinations of treatments. However, the notation is useful for expressing a large variety of experimental designs with correlation and clustering including cluster trials, cohort studies, and spatial and temporal
prevalence surveys. We have included the function \code{nelder()} that generates a data frame of a design using the notation.
prevalence surveys. We have included the function `nelder()` that generates a data frame of a design using the notation.

There are two operations:
* `>` (or $\to$ in Nelder's notation) indicates "clustered in".
* `*` (or $\times$ in Nelder's notation) indicates "crossed with".
13 changes: 11 additions & 2 deletions content/docs/glmmr/model/optimal_designs.md
@@ -1,7 +1,16 @@
---
title: Model Class Functions
title: Optimal Experimental Designs
weight: 1
---

# c-Optimal experimental designs
A full description of the algorithms and examples to be added here.
This page describes the combinatorial algorithms described in [Watson, Girling, and Hemming (2023)](https://arxiv.org/abs/2303.07953) and [Watson and Pan (2023)](https://arxiv.org/abs/2207.09183), which are used to identify c-optimal experimental designs for experiments with correlated observations. In this example we consider the identification of optimal cluster randomised trial designs, as given in those articles; more examples will be added later. [Watson, Girling, and Hemming (2023)](https://arxiv.org/abs/2303.07953) also summarise and describe a range of other approaches for cluster randomised trials.

## What is c-optimality?
Put simply, a c-optimal experimental design is one that minimises the variance of a linear combination of model parameters. Typically, we are interested in a single parameter that represents the treatment effect of interest. As we are considering experiments with correlated observations, we use a [generalised linear mixed model](../../glmm/_index.md) and consider the generalised least squares estimator of the model fixed effect parameters $\beta$. The information matrix is $M = X^T \Sigma^{-1} X$, where $\Sigma$ is the covariance matrix of the observations, and so the variance matrix of the estimates $\hat{\beta}$ is $V = M^{-1}$. The c-optimality criterion for a design $d$ is
$$
g(d) = c^T M_d^{-1} c
$$
where $c$ is a vector picking out the linear combination of parameters of interest $c^T\beta$.
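To make the criterion concrete, here is a self-contained numerical sketch in base R with toy values (the design matrix, covariance, and contrast are all hypothetical, not taken from the package):

```r
# Toy c-optimality calculation: four observations with exchangeable
# correlation, two fixed effects (intercept and treatment indicator)
X <- cbind(1, c(0, 0, 1, 1))             # design matrix
Sigma <- 0.5 * diag(4) + 0.5             # covariance matrix of the observations
M <- t(X) %*% solve(Sigma) %*% X         # information matrix
c_vec <- c(0, 1)                         # picks out the treatment effect
g <- drop(t(c_vec) %*% solve(M, c_vec))  # variance of the GLS estimate of c'beta
g  # 0.5
```

A design with a smaller value of $g$ gives a more precise estimate of the treatment effect.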

## What is "a design"?
5 changes: 5 additions & 0 deletions content/menu/index.md
@@ -26,6 +26,11 @@ headless: true
- [Marginal standardisation]({{< relref "/other/marginal" >}})
- [Bayesian Survival Analysis]({{< relref "/other/survival" >}})
- [Girling Algorithm]({{< relref "/other/girlingalgo" >}})
- [Optimal design examples]({{< relref "/other/optimal_examples" >}})
<br />

- [**C++ API**]({{< relref "/docs/cpp" >}})
- [C++ library]({{< relref "/docs/cpp/cpp_interface" >}})
<br />

<br />
32 changes: 17 additions & 15 deletions content/other/girlingalgo.md
@@ -112,41 +112,43 @@ where we've used the programming syntax $a += b$ to mean $a = a + b$.
# Example with code
We provide an example here of executing the above algorithm using `glmmrBase` and `glmmrOptim`. We will aim to find the optimal weights for cluster-periods in a stepped-wedge cluster trial design with six cluster sequences and seven time periods. The links in the menu to the left provide more information on model specification and other features (which are due to be completed), so the exposition here will be fairly bare bones.

We first generate a dataset with six cluster and seven time periods with one observation per cluster, adding in the treatment effect
We first generate a dataset with six clusters and seven time periods, with ten observations per cluster-period, adding in the treatment effect:
```
df <- nelder(~ (cl(6)*t(7)) > ind(1))
df <- nelder(~ (cl(6)*t(7)) > ind(10))
df$int <- 0
df[df$t > df$cl, 'int'] <- 1
```
As the algorithm calculates weights for unique experimental conditions, we could set the number of individuals to one per cluster-period and then independently tell the algorithm the total sample size $N$. The alternative, and the approach taken below, is to generate a design with all the observations, as above, and let the function use the total number of observations as $N$.

Now, we will generate a linear mixed model with autoregressive covariance function (although one can use any covariance function with this package). In particular, the covariance function for cluster $k$ and time $t$, if $z_{kt}\mathbf{u} = \alpha_{kt}$ is the random effect term, is:
$$
\text{Cov}(\alpha_{kt},\alpha_{k',t'}) = \begin{cases}
\tau^2 \lambda^{\vert t - t' \vert} & \text{ if } k=k' \\\
0 & \text{ otherwise}
\end{cases}
$$
We will choose covariance parameter values of $\sigma^2 = 1$, $\tau = 0.25$ and $\lambda = 0.8$. The model is set up as:
We will choose covariance parameter values of $\sigma^2 = 1$, $\tau^2 = 0.05$ and $\lambda = 0.8$. The model is set up as:
```
model <- Model$new(
formula = ~ factor(t) + int - 1 + (1|gr(cl)*ar1(t)),
covariance = list(parameters = c(0.25,0.8)),
covariance = list(parameters = c(0.05,0.8)),
family = gaussian(),
data = df
)
```
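For intuition, the within-cluster covariance block implied by these parameter values can be computed directly in base R (a standalone sketch, independent of the package):

```r
# Cov(alpha_kt, alpha_kt') = tau^2 * lambda^|t - t'| within a cluster
tau2 <- 0.05
lambda <- 0.8
t_idx <- 1:7
Sigma_cl <- tau2 * lambda^abs(outer(t_idx, t_idx, "-"))
round(Sigma_cl[1:3, 1:3], 4)
```

Observations in the same cluster-period share variance $\tau^2 = 0.05$, which decays geometrically across time periods.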
The default values are $\sigma^2 = 1$ and $\beta = 0$ so these are not directly set here. The syntax for the call to the algorithm is not very user friendly on account of it being hidden. We first have to get the model to generate its internal C++ representation,
which is normally automatic. Then in the call to the `girling_algorithm` function we provide the pointer to this model, a target sample size (100 here, although it currently does nothing), the model marginal variance, the vector $c$ and the tolerance:
The default value of $\sigma^2$ is 1 and $\beta$ is initialised to random values, so these are not set directly here. The Girling algorithm has now been added to `glmmrOptim`. As with all other algorithms in the package, we first create a design space:
```
ds <- DesignSpace$new(model)
```
Then, we specify the option `algo="girling"` in the call to `optimal()`:
```
model$.__enclos_env__$private$update_ptr(y = rep(0,model$n()))
w <- glmmrBase:::.girling_algorithm(model$.__enclos_env__$private$ptr,
100,
sigma_sq_ = 1,
C_ = c(rep(0,7),1),
tol_ = 1e-6)
outb <- ds$optimal(m = 1, C = list(c(rep(0,7),1)), verbose = TRUE, algo = "girling")
```
This takes about 50 milliseconds to run. And produces the weights shown in the figure.
The first argument `m` is redundant for this algorithm, so we just set it to one. This takes about 50 milliseconds to run and produces the weights shown in the figure.

![Optimal design weights](/images/example_girling_plot.jpg)
![Optimal design weights](images/example_girling_plot.jpg)

## Rounding weights
There are several ways of rounding design weights to whole numbers of clusters, which are described in Watson & Pan (2023). The function `apportion` in our `glmmrOptim` package calculates all of the different rounding schemes. It turns out to be very slow for 42 weights, because the method proposed by Pukelsheim and Rieder considers all possible combinations of design points with zero weight, and so its cost grows exponentially. For this example we just use Hamilton's method to round the weights, which seems to perform the best in almost all the tests we have done. This rounding is quite simple; we will create a design with 100 observations:
Expand All @@ -162,4 +164,4 @@ if(sum(des)<n){
```
which produces the totals in the figure below.

![Optimal design weights](/images/example_girling_rounded.jpg)
![Optimal design weights](images/example_girling_rounded.jpg)
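For completeness, Hamilton's largest-remainder rounding used above can be sketched in a few lines of standalone base R (an illustrative version with made-up weights; the `apportion` function in `glmmrOptim` implements this and the other schemes):

```r
# Hamilton's method: take the floor of n*w, then allocate the remaining
# units to the design points with the largest fractional remainders
hamilton <- function(w, n) {
  des <- floor(n * w)
  rem <- n * w - des
  shortfall <- n - sum(des)
  if (shortfall > 0) {
    idx <- order(rem, decreasing = TRUE)[seq_len(shortfall)]
    des[idx] <- des[idx] + 1
  }
  des
}

w <- c(0.21, 0.116, 0.116, 0.116, 0.116, 0.116, 0.21)  # hypothetical weights
hamilton(w, 10)
```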
81 changes: 81 additions & 0 deletions content/other/optimal_examples.md
@@ -0,0 +1,81 @@
---
title: Optimal Examples
weight: 1
---

# Introduction
This web page accompanies the article "Optimal Study Designs for Cluster Randomised Trials: An Overview of Methods and Results" forthcoming in Statistical Methods in Medical Research. There is an arXiv version of the paper [here](https://arxiv.org/abs/2303.07953). There is more information about model specification and analysis on the other pages on this website, although not all of them are complete yet. The examples below cover combinatorial optimisation algorithms, the conic optimisation methods, and the [Girling algorithm](girlingalgo), which can all be run using the R package `glmmrOptim`. The package can be used to find optimal designs for any study design that can be represented by a generalised linear mixed model. Here, we focus specifically on cluster randomised trials, but other examples are presented elsewhere on this website.

# Data and model specification
The examples in the article all use the "stepped-wedge" design space, which includes $J$ cluster sequences and $J-1$ time periods, and in which the intervention roll-out is staggered over time. The figure below (reproduced from Watson, Girling, and Hemming) shows this design space as "Design Space A".

![Design spaces](images/design_spaces.jpg)

We must first create a data frame that describes the design space or model of interest. We will use seven cluster sequences:
```
require(glmmrOptim)
df <- nelder(formula(~ (cl(7) * t(6)) > ind(10)))
df$int <- 0
df[df$t >= df$cl,'int'] <- 1
```
See [Data specification](../docs/glmmr/creating_data.md) for more information on creating data sets. We can then specify a generalised linear mixed model using the `glmmrBase` model class. For this example we shall use a linear mixed model with a nested exchangeable covariance function (EXC2 in the paper), an ICC of 0.05, and a CAC of 0.8. Assuming an individual-level variance of $\sigma^2 = 1$, these values give parameter values of $\tau^2 = 0.05 \times 0.8/0.95$ (cluster-level variance) and $\omega^2 = 0.05 \times 0.2/0.95$ (cluster-period level variance). We can then specify:
```
model <- Model$new(
formula = ~ factor(t) + int - 1 + (1|gr(cl)) + (1|gr(cl,t)),
covariance = list(parameters = c(0.05*0.8/0.95,0.05*0.2/0.95)),
family = gaussian(),
data = df
)
```
The default value of $\sigma^2$ is 1, and $\beta$ is initialised to random values, so these are not set directly here.
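As a quick standalone check of the parameter arithmetic above (base R only): with $\sigma^2 = 1$, the ICC is $(\tau^2+\omega^2)/(\tau^2+\omega^2+\sigma^2)$ and the CAC is $\tau^2/(\tau^2+\omega^2)$:

```r
sigma2 <- 1
tau2   <- 0.05 * 0.8 / 0.95  # cluster-level variance
omega2 <- 0.05 * 0.2 / 0.95  # cluster-period level variance
icc <- (tau2 + omega2) / (tau2 + omega2 + sigma2)
cac <- tau2 / (tau2 + omega2)
c(icc = icc, cac = cac)  # recovers 0.05 and 0.8
```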

# Clusters as experimental conditions
All of the `glmmrOptim` algorithms are accessed through a `DesignSpace` class, which holds any number of `Model` objects. When a new `DesignSpace` is created, we provide one or more `Model` objects and the identifiers of the experimental conditions (if these are not provided, it is assumed that each row/observation is an experimental condition). For these examples, we will treat the cluster sequences as the experimental conditions:
```
ds <- DesignSpace$new(model,experimental_condition = df$cl)
```

## Conic optimisation
As the experimental conditions are uncorrelated, we can use the conic optimisation methods described by [Holland-Letz, Dette, and Pepelyshev (2011)](https://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2010.00757.x) and [Sagnol (2011)](https://linkinghub.elsevier.com/retrieve/pii/S0378375810005318). The function will automatically detect that the experimental conditions are uncorrelated and will default to using this method; to use the combinatorial algorithms instead, we can set the argument `use_combin=TRUE` (see later sections for details on the combinatorial algorithms). The function also returns the set of optimal apportionments using a range of methods; the argument `m` specifies the total number of clusters for these calculations.
```
outw <- ds$optimal(m = 10,C = list(c(rep(0,6),1)),verbose = TRUE)
> outw
$weights
[1] 0.2101400 0.1159425 0.1159449 0.1159452 0.1159449 0.1159425 0.2101400
$designs
$designs$Pukelsheim_1
[1] 2 1 1 2 1 1 2
$designs$Hamilton
[1] 2 1 1 2 1 1 2
$designs$Jefferson
[1] 2 1 1 1 1 1 3
$designs$Webster
[1] 2 1 1 1 1 1 3
```

# Observations as experimental conditions
We now set up a design space where each observation is an experimental condition:
```
ds <- DesignSpace$new(model)
```
The same function `optimal` is used to run the other algorithms. The argument `algo` specifies which algorithm to run; the options are:
- `1` Local search
- `2` Greedy search
- `3` Reverse greedy search
- `"girling"` Girling's algorithm
For the combinatorial algorithms, we can also string together numbers to run one algorithm after another; for example, `c(3,1)` runs a reverse greedy search followed by a local search.
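For example, a reverse greedy search followed by a local search on the design space above might be run as follows (a sketch only; the values of `m` and `C` are illustrative, and this chunk assumes the `model` and `ds` objects defined above):

```
# run reverse greedy search (3) then local search (1); C picks out the
# treatment effect (the seventh fixed-effect parameter in this model)
out <- ds$optimal(m = 100, C = list(c(rep(0, 6), 1)), verbose = TRUE, algo = c(3, 1))
```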

## Combinatorial algorithm

## Girling algorithm

# Non-linear models

# Small sample bias corrections

# Robust optimal designs
Binary file added static/images/design_spaces.jpg
