---
title: "Kenward-Roger"
package: mmrm
bibliography: '`r system.file("REFERENCES.bib", package = "mmrm")`'
csl: '`r system.file("jss.csl", package = "mmrm")`'
output:
rmarkdown::html_vignette:
toc: true
vignette: |
%\VignetteIndexEntry{Kenward-Roger}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
Here we describe the details of the calculations for the Kenward-Roger degrees of freedom and the adjusted covariance matrix of the coefficients.
## Model definition
The model definition is the same as in [Details of the model fitting in `mmrm`](algorithm.html), and we use the same notation.
### Linear model
For each subject $i$ we observe a vector
\[
Y_i = (y_{i1}, \dotsc, y_{im_i})^\top \in \mathbb{R}^{m_i}
\]
and given a design matrix
\[
X_i \in \mathbb{R}^{m_i \times p}
\]
and a corresponding coefficient vector $\beta \in \mathbb{R}^{p}$ we assume
that the observations are multivariate normal distributed:
\[
Y_i \sim N(X_i\beta, \Sigma_i)
\]
where the covariance matrix $\Sigma_i \in \mathbb{R}^{m_i \times m_i}$ is derived
by subsetting the overall covariance matrix $\Sigma \in \mathbb{R}^{m \times m}$
appropriately by
\[
\Sigma_i = G_i^{-1/2} S_i^\top \Sigma S_i G_i^{-1/2}
\]
where the subsetting matrix $S_i \in \{0, 1\}^{m \times m_i}$ contains
in each of its $m_i$ columns a single 1 indicating which overall time point
matches $t_{ih}$.
$G_i \in \mathbb{R}_{> 0}^{m_i \times m_i}$ is the diagonal weight matrix.
Conditional on the design matrices $X_i$, the coefficient vector $\beta$ and the
covariance matrix $\Sigma$ we assume that the observations are independent between
the subjects.
We can write the linear model for all subjects together as
\[
Y = X\beta + \epsilon
\]
where $Y \in \mathbb{R}^N$ combines all subject-specific observation vectors $Y_i$
such that we have in total $N = \sum_{i = 1}^{n}{m_i}$ observations,
$X \in \mathbb{R}^{N \times p}$ combines all subject-specific design matrices
and $\epsilon \in \mathbb{R}^N$ has a multivariate normal distribution
\[
\epsilon \sim N(0, \Omega)
\]
where $\Omega \in \mathbb{R}^{N \times N}$ is block-diagonal containing the
subject specific $\Sigma_i$ covariance matrices on the diagonal and 0 in the
remaining entries.
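To make the block-diagonal structure concrete, here is a small illustrative sketch (not package code) that assembles $\Omega$ from two hypothetical subject covariance matrices:
```{r}
# Illustrative sketch: assemble the block-diagonal Omega from two
# hypothetical subject-level covariance matrices.
sigma_1 <- matrix(c(1.0, 0.5,
                    0.5, 2.0), nrow = 2)        # subject 1: m_1 = 2 visits
sigma_2 <- matrix(c(1.5, 0.3, 0.2,
                    0.3, 1.0, 0.4,
                    0.2, 0.4, 2.5), nrow = 3)   # subject 2: m_2 = 3 visits
omega <- matrix(0, nrow = 5, ncol = 5)          # N = m_1 + m_2 = 5
omega[1:2, 1:2] <- sigma_1                      # block for subject 1
omega[3:5, 3:5] <- sigma_2                      # block for subject 2
omega
```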
## Mathematical Details of Kenward-Roger method
The mathematical derivation of the Kenward-Roger method is based on a Taylor expansion of the obtained covariance matrix
of $\hat\beta$ to obtain a more accurate estimate of it. All of these derivations are based on the restricted maximum likelihood.
Following the same [notation](algorithm.html#covariance-matrix-model), the covariance matrix $\Omega$ can be represented as
a function of the covariance matrix parameters $\theta = (\theta_1, \dotsc, \theta_k)^\top$, i.e. $\Omega(\theta)$.
After fitting the model with `mmrm`, we obtain the estimate $\hat\beta = \Phi(\hat\theta)X^\top\Omega(\hat\theta)^{-1}Y$,
where $\Phi(\theta) = \left\{X^\top \Omega(\theta)^{-1} X\right\} ^{-1}$ is the asymptotic covariance matrix of $\hat\beta$.
However, @kackar1984 suggest that although $\hat\beta$ is unbiased for $\beta$, the covariance matrix estimate
$\hat\Phi = \left\{X^\top \hat\Omega^{-1} X\right\}^{-1}$ can be biased. They showed that the variability of $\hat\beta$ can be partitioned into
two components,
\[
\Phi_A = \Phi + \Lambda
\]
where $\Phi$ is the variance-covariance matrix of the asymptotic distribution of $\hat\beta$ as $n\rightarrow \infty$, as defined above, and $\Lambda$ represents the amount
by which the asymptotic variance-covariance matrix underestimates $\Phi_A$.
Based on a Taylor series expansion around $\theta$, $\Lambda$ can be approximated by
\[
\Lambda \simeq \Phi \left\{\sum_{h=1}^k{\sum_{j=1}^k{W_{hj}(Q_{hj} - P_h \Phi P_j)} }\right\} \Phi
\]
where
\[
P_h = X^\top \frac{\partial{\Omega^{-1}}}{\partial \theta_h} X
\]
\[
Q_{hj} = X^\top \frac{\partial{\Omega^{-1}}}{\partial \theta_h} \Omega \frac{\partial{\Omega^{-1}}}{\partial \theta_j} X
\]
and $W$ is the inverse of the Hessian matrix of the negative log-likelihood function of $\theta$ evaluated at the estimate $\hat\theta$, i.e. the inverse of the observed Fisher information matrix, which is a consistent estimator of the variance-covariance matrix of $\hat\theta$.
Again, based on a Taylor series expansion about $\theta$, @kenward1997 show that
\[
\hat\Phi \simeq \Phi + \sum_{h=1}^k{(\hat\theta_h - \theta_h)\frac{\partial{\Phi}}{\partial{\theta_h}}} + \frac{1}{2} \sum_{h=1}^k{\sum_{j=1}^k{(\hat\theta_h - \theta_h)(\hat\theta_j - \theta_j)\frac{\partial^2{\Phi}}{\partial{\theta_h}\partial{\theta_j}}}}
\]
Ignoring the possible bias in $\hat\theta$,
\[
E(\hat\Phi) \simeq \Phi + \frac{1}{2} \sum_{h=1}^k{\sum_{j=1}^k{W_{hj}\frac{\partial^2{\Phi}}{\partial{\theta_h}\partial{\theta_j}}}}
\]
Using the previously defined notation, the second-order derivative can be further written as
\[
\frac{\partial^2{\Phi}}{\partial{\theta_h}\partial{\theta_j}} = \Phi (P_h \Phi P_j + P_j \Phi P_h - Q_{hj} - Q_{jh} + R_{hj}) \Phi
\]
where
\[
R_{hj} = X^\top\Omega^{-1}\frac{\partial^2\Omega}{\partial{\theta_h}\partial{\theta_j}} \Omega^{-1} X
\]
Substituting $\Phi$ and $\Lambda$ back into $\Phi_A$, we have
\[
\hat\Phi_A = \hat\Phi + 2\hat\Phi \left\{\sum_{h=1}^k{\sum_{j=1}^k{W_{hj}(Q_{hj} - P_h \hat\Phi P_j - \frac{1}{4}R_{hj})} }\right\} \hat\Phi
\]
where $\Omega(\hat\theta)$ replaces $\Omega(\theta)$ on the right-hand side.
Please note that if we ignore $R_{hj}$, which involves the second-order derivatives of $\Omega$, we obtain a different estimate of the adjusted covariance matrix; we call this the linear Kenward-Roger approximation.
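As an illustrative sketch of this formula (with hypothetical inputs; the actual implementation inside the package differs in detail), the adjusted covariance matrix could be computed in R as follows:
```{r}
# Illustrative sketch of the adjusted covariance formula above.
# Hypothetical inputs: Phi (p x p), W (k x k), a list P of matrices P_h,
# and nested lists Q and R such that Q[[h]][[j]] is Q_{hj} and
# R[[h]][[j]] is R_{hj} (all p x p).
kr_adjust <- function(Phi, W, P, Q, R) {
  k <- nrow(W)
  p <- nrow(Phi)
  inner <- matrix(0, p, p)
  for (h in seq_len(k)) {
    for (j in seq_len(k)) {
      inner <- inner +
        W[h, j] * (Q[[h]][[j]] - P[[h]] %*% Phi %*% P[[j]] - R[[h]][[j]] / 4)
    }
  }
  # Phi_A = Phi + 2 * Phi * { double sum } * Phi
  Phi + 2 * Phi %*% inner %*% Phi
}
```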
### Special Considerations for `mmrm` models
In `mmrm` models, $\Omega$ is a block-diagonal matrix, hence we can calculate $P$, $Q$ and $R$ for each subject and sum them up.
\[
P_h = \sum_{i=1}^{n}{P_{ih}} = \sum_{i=1}^{n}{X_i^\top \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} X_i}
\]
\[
Q_{hj} = \sum_{i=1}^{n}{Q_{ihj}} = \sum_{i=1}^{n}{X_i^\top \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} \Sigma_i \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_j} X_i}
\]
\[
R_{hj} = \sum_{i=1}^{n}{R_{ihj}} = \sum_{i=1}^{n}{X_i^\top\Sigma_i^{-1}\frac{\partial^2\Sigma_i}{\partial{\theta_h}\partial{\theta_j}} \Sigma_i^{-1} X_i}
\]
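The following sketch shows this per-subject accumulation for a single pair $(h, j)$, assuming hypothetical lists `x_list`, `sigma_list`, `d_sigma_inv` and `d2_sigma` that hold $X_i$, $\Sigma_i$ and the derivative matrices (all names are illustrative, not the package internals):
```{r}
# Illustrative sketch: accumulate P_h, Q_{hj} and R_{hj} over subjects.
# Hypothetical inputs: x_list[[i]] is X_i; sigma_list[[i]] is Sigma_i;
# d_sigma_inv[[i]][[h]] is the derivative of Sigma_i^{-1} w.r.t. theta_h;
# d2_sigma[[i]][[h]][[j]] is the second derivative of Sigma_i.
accumulate_pqr <- function(x_list, sigma_list, d_sigma_inv, d2_sigma, h, j) {
  p <- ncol(x_list[[1]])
  P_h <- Q_hj <- R_hj <- matrix(0, p, p)
  for (i in seq_along(x_list)) {
    x_i <- x_list[[i]]
    sigma_inv_i <- solve(sigma_list[[i]])
    P_h <- P_h + t(x_i) %*% d_sigma_inv[[i]][[h]] %*% x_i
    Q_hj <- Q_hj +
      t(x_i) %*% d_sigma_inv[[i]][[h]] %*% sigma_list[[i]] %*%
        d_sigma_inv[[i]][[j]] %*% x_i
    R_hj <- R_hj +
      t(x_i) %*% sigma_inv_i %*% d2_sigma[[i]][[h]][[j]] %*%
        sigma_inv_i %*% x_i
  }
  list(P_h = P_h, Q_hj = Q_hj, R_hj = R_hj)
}
```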
### Derivative of the overall covariance matrix $\Sigma$
The derivative of the overall covariance matrix $\Sigma$ with respect to the variance parameters can be calculated through the derivatives of its Cholesky factor $L$ (where $\Sigma = LL^\top$), which are obtained through automatic differentiation,
following the [matrix identities in differential form](https://en.wikipedia.org/wiki/Matrix_calculus#Identities_in_differential_form).
\[
\frac{\partial{\Sigma}}{\partial{\theta_h}} = \frac{\partial{LL^\top}}{\partial{\theta_h}} = \frac{\partial{L}}{\partial{\theta_h}}L^\top + L\frac{\partial{L^\top}}{\partial{\theta_h}}
\]
\[
\frac{\partial^2{\Sigma}}{\partial{\theta_h}\partial{\theta_j}} = \frac{\partial^2{L}}{\partial{\theta_h}\partial{\theta_j}}L^\top + L\frac{\partial^2{L^\top}}{\partial{\theta_h}\partial{\theta_j}} + \frac{\partial{L}}{\partial{\theta_h}}\frac{\partial{L^\top}}{\partial{\theta_j}} + \frac{\partial{L}}{\partial{\theta_j}}\frac{\partial{L^\top}}{\partial{\theta_h}}
\]
\]
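We can verify the product rule numerically on a small hypothetical example, comparing the analytic expression with a forward finite-difference approximation:
```{r}
# Numeric check of the product rule above on a small hypothetical example:
# L(theta) is 2 x 2 lower-triangular with entries exp(theta[1]), theta[2]
# and exp(theta[3]), and Sigma(theta) = L L^T.
chol_factor <- function(theta) {
  matrix(c(exp(theta[1]), theta[2], 0, exp(theta[3])), nrow = 2)
}
sigma_fun <- function(theta) tcrossprod(chol_factor(theta))

theta <- c(0.1, 0.5, -0.2)
d_l <- matrix(c(exp(theta[1]), 0, 0, 0), nrow = 2)  # dL/dtheta_1
analytic <- d_l %*% t(chol_factor(theta)) + chol_factor(theta) %*% t(d_l)

eps <- 1e-6
numeric <- (sigma_fun(theta + c(eps, 0, 0)) - sigma_fun(theta)) / eps
all.equal(analytic, numeric, tolerance = 1e-4)
```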
### Derivative of $\Sigma^{-1}$
The derivative of $\Sigma^{-1}$ can be obtained by differentiating the identity $\Sigma\Sigma^{-1} = I$:
\[
\frac{\partial{(\Sigma\Sigma^{-1})}}{\partial{\theta_h}}
= \frac{\partial{\Sigma}}{\partial{\theta_h}}\Sigma^{-1} + \Sigma\frac{\partial{\Sigma^{-1}}}{\partial{\theta_h}}
= 0
\]
and hence
\[
\frac{\partial{\Sigma^{-1}}}{\partial{\theta_h}} = - \Sigma^{-1} \frac{\partial{\Sigma}}{\partial{\theta_h}}\Sigma^{-1}
\]
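Similarly, this inverse-derivative identity can be checked numerically, reusing `sigma_fun`, `theta`, `analytic` and `eps` from the previous sketch:
```{r}
# Numeric check of the inverse-derivative identity, reusing objects
# from the previous chunk (analytic is dSigma/dtheta_1).
sigma <- sigma_fun(theta)
d_sigma_inv <- -solve(sigma) %*% analytic %*% solve(sigma)

numeric_inv <- (solve(sigma_fun(theta + c(eps, 0, 0))) - solve(sigma)) / eps
all.equal(d_sigma_inv, numeric_inv, tolerance = 1e-4)
```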
### Subjects with missed visits
If a subject does not have all visits, the corresponding covariance matrix can be represented as
\[
\Sigma_i = S_i^\top \Sigma S_i
\]
and the derivatives can be obtained through
\[
\frac{\partial{\Sigma_i}}{\partial{\theta_h}} = S_i^\top \frac{\partial{\Sigma}}{\partial{\theta_h}} S_i
\]
\[
\frac{\partial^2{\Sigma_i}}{\partial{\theta_h}\partial{\theta_j}} = S_i^\top \frac{\partial^2{\Sigma}}{\partial{\theta_h}\partial{\theta_j}} S_i
\]
The derivative of the $\Sigma_i^{-1}$, $\frac{\partial\Sigma_i^{-1}}{\partial{\theta_h}}$ can be calculated through $\Sigma_i^{-1}$ and $\frac{\partial{\Sigma_i}}{\partial{\theta_h}}$
using the above.
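For illustration, consider a hypothetical subject observed at visits 1 and 3 out of $m = 3$ time points:
```{r}
# Illustrative sketch: a subject observed at visits 1 and 3 out of m = 3.
sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 2.0, 0.6,
                  0.3, 0.6, 1.5), nrow = 3)  # hypothetical overall Sigma
s_i <- matrix(c(1, 0, 0,
                0, 0, 1), nrow = 3)          # columns select visits 1 and 3
sigma_i <- t(s_i) %*% sigma %*% s_i          # 2 x 2 subject covariance
sigma_i
```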
### Scenario under group specific covariance estimates
When fitting grouped `mmrm` models, the covariance matrix for subject $i$ in group $g(i)$ can be written as
\[
\Sigma_i = S_i^\top \Sigma_{g(i)} S_i
\]
Assuming there are $B$ groups, the number of $\theta$ parameters is multiplied by $B$. Since the $\theta$ parameters of one group
do not affect the other groups, the contributions split by group: $P_h$ decomposes into per-group blocks, while $Q$ and $R$ are block-diagonal, with the blocks given by:
\[
P_h = \begin{pmatrix}
P_{h, 1} & \dots & P_{h, B} \\
\end{pmatrix}
\]
\[
Q_{hj} = \begin{pmatrix}
Q_{hj, 1} & 0 & \dots & \dots & 0 \\
0 & Q_{hj, 2} & 0 & \dots & 0\\
\vdots & & \ddots & & \vdots \\
\vdots & & & \ddots & \vdots \\
0 & \dots & \dots & 0 & Q_{hj, B}
\end{pmatrix}
\]
\[
R_{hj} = \begin{pmatrix}
R_{hj, 1} & 0 & \dots & \dots & 0 \\
0 & R_{hj, 2} & 0 & \dots & 0\\
\vdots & & \ddots & & \vdots \\
\vdots & & & \ddots & \vdots \\
0 & \dots & \dots & 0 & R_{hj, B}
\end{pmatrix}
\]
Using $P_{h, b}$, $Q_{hj, b}$ and $R_{hj, b}$ to denote the blocks for group $b$, we have
\[
P_{h,b} = \sum_{g(i) = b}{P_{ih}} = \sum_{g(i) = b}{X_i^\top \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} X_i}
\]
\[
Q_{hj,b} = \sum_{g(i) = b}{Q_{ihj}} = \sum_{g(i) = b}{X_i^\top \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_h} \Sigma_i \frac{\partial{\Sigma_i^{-1}}}{\partial \theta_j} X_i}
\]
\[
R_{hj,b} = \sum_{g(i) = b}{R_{ihj}} = \sum_{g(i) = b}{X_i^\top\Sigma_i^{-1}\frac{\partial^2\Sigma_i}{\partial{\theta_h}\partial{\theta_j}} \Sigma_i^{-1} X_i}
\]
### Scenario under weighted mmrm
For a weighted `mmrm` model, the covariance matrix for subject $i$ can be represented as
\[
\Sigma_i = G_i^{-1/2} S_i^\top \Sigma S_i G_i^{-1/2}
\]
where $G_i$ is the diagonal matrix of weights. Since the weights are constants, deriving $P$, $Q$ and $R$ involves no
mathematical differences: having $G_i$ in addition to $S_i$ does not change the algorithm, and we simply multiply the
formulas by $G_i^{-1/2}$ on both sides, analogously to the subsetting matrix above, as shown in the sketch below.
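Continuing the sketch from the previous section, with hypothetical weights for the two observed visits:
```{r}
# Continuing the previous sketch: Sigma_i = G_i^{-1/2} S_i^T Sigma S_i G_i^{-1/2}
# with a hypothetical diagonal weight matrix G_i for the two observed visits.
g_i <- diag(c(2, 0.5))                        # diagonal weight matrix G_i
g_i_inv_sqrt <- diag(1 / sqrt(diag(g_i)))     # G_i^{-1/2}
sigma_i_weighted <- g_i_inv_sqrt %*% t(s_i) %*% sigma %*% s_i %*% g_i_inv_sqrt
sigma_i_weighted
```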
## Inference
Suppose we are testing the linear combination of $\beta$, $C\beta$ with $C \in \mathbb{R}^{c\times p}$; we can use the following F-statistic
\[
F = \frac{1}{c} (\hat\beta - \beta)^\top C^\top (C \hat\Phi_A C^\top)^{-1} C (\hat\beta - \beta)
\]
and
\[
F^* = \lambda F
\]
approximately follows an $F_{c,m}$ distribution.
When we have only one coefficient to test, $C$ is a row vector containing a single $1$ and zeros elsewhere. We can still follow the same calculations as for the multi-dimensional case. This recovers the degrees of freedom results of the Satterthwaite method.
$\lambda$ and $m$ can be calculated through
\[
M = C^\top (C \Phi C^\top)^{-1} C
\]
\[
A_1 = \sum_{h=1}^k{\sum_{j=1}^k{W_{hj} tr(M \Phi P_h \Phi) tr(M \Phi P_j \Phi)}}
\]
\[
A_2 = \sum_{h=1}^k{\sum_{j=1}^k{W_{hj} tr(M \Phi P_h \Phi M \Phi P_j \Phi)}}
\]
\[
B = \frac{1}{2c}(A_1 + 6A_2)
\]
\[
g = \frac{(c+1)A_1 - (c+4)A_2}{(c+2)A_2}
\]
\[
c_1 = \frac{g}{3c+2(1-g)}
\]
\[
c_2 = \frac{c-g}{3c+2(1-g)}
\]
\[
c_3 = \frac{c+2-g}{3c+2(1-g)}
\]
\[E^*={\left\{1-\frac{A_2}{c}\right\}}^{-1}\]
\[V^*=\frac{2}{c}{\left\{\frac{1+c_1 B}{(1-c_2 B)^2(1-c_3 B)}\right\}}\]
\[\rho = \frac{V^{*}}{2(E^*)^2}\]
\[m = 4 + \frac{c+2}{c\rho - 1}\]
\[\lambda = \frac{m}{E^*(m-2)}\]
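These final steps translate directly into code; the following sketch computes $m$ and $\lambda$ from hypothetical values of $A_1$, $A_2$ and $c$ (illustrative only, not the package implementation):
```{r}
# Illustrative sketch of the final steps: compute m and lambda from
# hypothetical values of A_1, A_2 and the rank c of C.
kr_df <- function(A1, A2, c) {
  B <- (A1 + 6 * A2) / (2 * c)
  g <- ((c + 1) * A1 - (c + 4) * A2) / ((c + 2) * A2)
  denom <- 3 * c + 2 * (1 - g)
  c1 <- g / denom
  c2 <- (c - g) / denom
  c3 <- (c + 2 - g) / denom
  e_star <- 1 / (1 - A2 / c)
  v_star <- (2 / c) * (1 + c1 * B) / ((1 - c2 * B)^2 * (1 - c3 * B))
  rho <- v_star / (2 * e_star^2)
  m <- 4 + (c + 2) / (c * rho - 1)
  lambda <- m / (e_star * (m - 2))
  list(m = m, lambda = lambda)
}
kr_df(A1 = 0.05, A2 = 0.02, c = 1)  # hypothetical inputs
```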
## Parameterization methods and Kenward-Roger
While the Kenward-Roger adjusted covariance matrix uses a Taylor series to approximate the true value, the choice
of parameterization can change the result. Take the unstructured covariance structure as a simple example: in our current approach,
where the parameters are elements of the Cholesky factor of $\Sigma$ (see [parameterization](covariance.html)), the second-order derivatives
of $\Sigma$ with respect to our parameters are non-zero matrices, whereas if we used the elements of $\Sigma$ itself as parameters, the second-order
derivatives would be zero matrices. However, the differences are usually small and do not affect the inference in practice.
If you would like to match SAS results for the unstructured covariance model, you can use the linear Kenward-Roger approximation.
## Implementations in `mmrm`
In the `mmrm` package, we have implemented the Kenward-Roger calculations based on the previous sections.
Specifically, for the first-order and second-order derivatives we use automatic differentiation to obtain
the results easily for non-spatial covariance structures. For spatial covariance structures, we use the
exact derivatives given below.
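For example, a Kenward-Roger fit can be requested via the `method` argument, shown here as a minimal sketch using the `fev_data` example dataset shipped with the package (the available arguments may differ between package versions, so please check `?mmrm` for your installed version):
```{r, eval = FALSE}
library(mmrm)

# Kenward-Roger adjusted covariance matrix and degrees of freedom.
fit_kr <- mmrm(
  FEV1 ~ RACE + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data,
  method = "Kenward-Roger"
)

# Linear Kenward-Roger approximation (e.g. to match SAS for the
# unstructured covariance model); in recent package versions this is
# selected via the `vcov` argument.
fit_krl <- mmrm(
  FEV1 ~ RACE + ARMCD * AVISIT + us(AVISIT | USUBJID),
  data = fev_data,
  method = "Kenward-Roger",
  vcov = "Kenward-Roger-Linear"
)
summary(fit_kr)
```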
### Spatial Exponential Derivatives
For the spatial exponential covariance structure, we have
\[\theta = (\theta_1,\theta_2)\]
\[\sigma = e^{\theta_1}\]
\[\rho = \frac{e^{\theta_2}}{1 + e^{\theta_2}}\]
\[\Sigma_{ij} = \sigma \rho^{d_{ij}}\]
where $d_{ij}$ is the distance between time point $i$ and time point $j$.
So the first-order derivatives can be written as:
\[
\frac{\partial{\Sigma_{ij}}}{\partial\theta_1} = \frac{\partial\sigma}{\partial\theta_1} \rho^{d_{ij}}\\
= e^{\theta_1}\rho^{d_{ij}} \\
= \Sigma_{ij}
\]
\[
\frac{\partial{\Sigma_{ij}}}{\partial\theta_2} = \sigma\frac{\partial{\rho^{d_{ij}}}}{\partial\theta_2} \\
= \sigma\rho^{d_{ij}-1}{d_{ij}}\frac{\partial\rho}{\partial\theta_2}\\
= \sigma\rho^{d_{ij}-1}{d_{ij}}\rho(1-\rho) \\
= \sigma \rho^{d_{ij}} {d_{ij}} (1-\rho)
\]
Second-order derivatives can be written as:
\[
\frac{\partial^2{\Sigma_{ij}}}{\partial\theta_1\partial\theta_1}\\
= \frac{\partial\Sigma_{ij}}{\partial\theta_1}\\
= \Sigma_{ij}
\]
\[
\frac{\partial^2{\Sigma_{ij}}}{\partial\theta_1\partial\theta_2} = \frac{\partial^2{\Sigma_{ij}}}{\partial\theta_2\partial\theta_1} \\
= \frac{\partial\Sigma_{ij}}{\partial\theta_2}\\
= \sigma\rho^{d_{ij}-1}{d_{ij}}\rho(1-\rho)\\
= \sigma\rho^{d_{ij}}{d_{ij}}(1-\rho)
\]
\[
\frac{\partial^2{\Sigma_{ij}}}{\partial\theta_2\partial\theta_2}\\
= \frac{\partial{\sigma\rho^{d_{ij}}{d_{ij}}(1-\rho)}}{\partial\theta_2}\\
= \sigma\rho^{d_{ij}}{d_{ij}}(1-\rho)(d_{ij} (1-\rho) - \rho)
\]
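These closed-form derivatives can be checked numerically; here is a small sketch for a single entry with a hypothetical distance $d$ between two time points:
```{r}
# Numeric check of the spatial exponential first-order derivatives,
# for a single hypothetical entry with distance d.
sigma_ij <- function(theta, d) {
  sigma <- exp(theta[1])
  rho <- exp(theta[2]) / (1 + exp(theta[2]))
  sigma * rho^d
}

theta <- c(0.3, -0.5)  # hypothetical variance parameters
d <- 1.7               # hypothetical distance
rho <- exp(theta[2]) / (1 + exp(theta[2]))

# Analytic first-order derivatives from the formulas above.
d1 <- sigma_ij(theta, d)                  # w.r.t. theta_1: equals Sigma_ij
d2 <- sigma_ij(theta, d) * d * (1 - rho)  # w.r.t. theta_2

eps <- 1e-6
all.equal(d1, (sigma_ij(theta + c(eps, 0), d) - sigma_ij(theta, d)) / eps,
          tolerance = 1e-4)
all.equal(d2, (sigma_ij(theta + c(0, eps), d) - sigma_ij(theta, d)) / eps,
          tolerance = 1e-4)
```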
# References