Created 2018-08-17
by Hirotaka Iwaki

Intuition: a paired t-test uses the difference between BEFORE and AFTER to detect the effect of an intervention. Baseline characteristics do not matter as long as they do not change during the period and do not interact with the intervention. The conditional linear model is an extension of this concept to repeated observations.

# Problem setting
* 4 observations.
* x = 1, 2, 3, 4
* Y = 9, 13, 32, 41
$$Y = \beta_0 x^0 + \beta_1 x^1 + \beta_2 x^2 + \beta_3 x^3$$

# Estimating betas

    x = 1 -> [1, 1, 1, 1]    
    x = 2 -> [1, 2, 4, 8]    
    x = 3 -> [1, 3, 9,27]    
    x = 4 -> [1, 4,16,64]    

$\beta=[\beta_0,\beta_1,\beta_2,\beta_3]^T$

$$Y = X\beta + \epsilon$$

Usually we use the *normal equation* to solve this.

$$X^T Y =X^T X\beta$$

Because $X^T X$ is usually invertible (it is whenever the columns of $X$ are linearly independent),
$$(X^T X)^{-1}X^T Y =(X^T X)^{-1}X^T X\beta$$

So,
$$\beta=(X^T X)^{-1}X^T Y$$

Now let's make the design matrix in Python.

In [2]:
import numpy as np
x= np.array([1,2,3,4])
Xt = np.vstack([x**k for k in range(4)])  # pass a list: np.vstack expects a sequence, not a generator
X = np.transpose(Xt)
print(X)

[[ 1  1  1  1]
 [ 1  2  4  8]
 [ 1  3  9 27]
 [ 1  4 16 64]]
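As a quick numerical check of the normal equation, the sketch below solves $X^T X\beta = X^T Y$ for the toy data. It treats the equation as a linear system via `np.linalg.solve` instead of forming the inverse explicitly; the name `beta_ne` is only an illustrative choice.

```python
import numpy as np

# Same toy data as in the problem setting
x = np.array([1, 2, 3, 4])
Y = np.array([9, 13, 32, 41])
X = np.vstack([x**k for k in range(4)]).T  # columns: x^0, x^1, x^2, x^3

# Solve (X^T X) beta = X^T Y as a linear system
beta_ne = np.linalg.solve(X.T @ X, X.T @ Y)

# With 4 points and 4 coefficients, the fitted cubic should reproduce Y
print(np.allclose(X @ beta_ne, Y))
```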


# QR decomposition

QR decomposition lets us solve the normal equation without calculating an inverse matrix.

Any real $m\ \times\ n$ matrix can be decomposed into an $m\ \times\ m$ orthogonal matrix $Q$ and an $m\ \times\ n$ upper triangular matrix $R$:

$$X=QR$$

Orthogonal means, 
$$Q^T Q= Q Q^{T} = I$$

so 
$$Q^T =Q^{-1}$$


In [4]:
Q,R = np.linalg.qr(X)
print (Q)

[[-0.5         0.67082039  0.5         0.2236068 ]
 [-0.5         0.2236068  -0.5        -0.67082039]
 [-0.5        -0.2236068  -0.5         0.67082039]
 [-0.5        -0.67082039  0.5        -0.2236068 ]]


In [5]:
print(R)

[[ -2.          -5.         -15.         -50.        ]
 [  0.          -2.23606798 -11.18033989 -46.51021393]
 [  0.           0.           2.          15.        ]
 [  0.           0.           0.          -1.34164079]]


In [6]:
np.matmul(Q,R)

array([[ 1.,  1.,  1.,  1.],
       [ 1.,  2.,  4.,  8.],
       [ 1.,  3.,  9., 27.],
       [ 1.,  4., 16., 64.]])

In [7]:
Qt = np.transpose(Q)
np.matmul(Q, Qt)

array([[ 1.00000000e+00, -2.05343406e-17,  8.62196630e-17,
        -5.55111512e-17],
       [-2.05343406e-17,  1.00000000e+00,  0.00000000e+00,
        -1.39367340e-17],
       [ 8.62196630e-17,  0.00000000e+00,  1.00000000e+00,
         8.70292897e-17],
       [-5.55111512e-17, -1.39367340e-17,  8.70292897e-17,
         1.00000000e+00]])

In [8]:
np.matmul(Q, Qt).astype('float32')

array([[ 1.0000000e+00, -2.0534340e-17,  8.6219662e-17, -5.5511151e-17],
       [-2.0534340e-17,  1.0000000e+00,  0.0000000e+00, -1.3936734e-17],
       [ 8.6219662e-17,  0.0000000e+00,  1.0000000e+00,  8.7029287e-17],
       [-5.5511151e-17, -1.3936734e-17,  8.7029287e-17,  1.0000000e+00]],
      dtype=float32)

In [9]:
np.linalg.det(Q)

-0.9999999999999999

# Solve the problem without using inverse matrix

Normal equation
$$X^T Y =X^T X\beta$$

can be converted as follows.    

Substitute $X = QR$:
$$(QR)^T Y = (QR)^T QR \beta$$

Expand the transposes:
$$R^T Q^T Y = R^T Q^T Q R \beta$$

As $Q^T Q$ is $I$,
$$R^T Q^T Y = R^T R \beta$$

Here $R$ is square and triangular, so $R^T$ has an inverse $(R^T)^{-1}$ as long as its diagonal entries are non-zero:
$$(R^T)^{-1} R^T Q^T Y = (R^T)^{-1} R^T R \beta$$

Finally,
$$Q^T Y = R\beta$$

$R$ is an upper triangular matrix, so we can calculate the $\beta$s one by one from the bottom row (back substitution).    

(ref. https://en.wikipedia.org/wiki/QR_decomposition, http://metodososcaruis.blogspot.com/)
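The back-substitution step can be sketched as follows for the toy data. This is a minimal illustration, assuming $R$ has non-zero diagonal entries; `rhs` and `beta_qr` are names introduced here for the example.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
Y = np.array([9.0, 13.0, 32.0, 41.0])
X = np.vstack([x**k for k in range(4)]).T

Q, R = np.linalg.qr(X)
rhs = Q.T @ Y  # left-hand side of Q^T Y = R beta

# Back substitution: start at the bottom row of R and move upwards
n = R.shape[1]
beta_qr = np.zeros(n)
for i in range(n - 1, -1, -1):
    # subtract the contribution of the already-solved coefficients
    beta_qr[i] = (rhs[i] - R[i, i + 1:] @ beta_qr[i + 1:]) / R[i, i]

print(np.allclose(X @ beta_qr, Y))
```

No inverse matrix is formed at any point; each coefficient needs only one division.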

# How do we use it for longitudinal data?

Define $Qi$ as $Q$ with its first column omitted.    
$Qi$ is no longer a square orthogonal matrix: its columns are still orthonormal ($Qi^T Qi = I$), but $Qi\,Qi^T \neq I$.

In [10]:
Qi = Q[:,1:]
Qit = np.transpose(Qi)
print(Qi)

[[ 0.67082039  0.5         0.2236068 ]
 [ 0.2236068  -0.5        -0.67082039]
 [-0.2236068  -0.5         0.67082039]
 [-0.67082039  0.5        -0.2236068 ]]


In [11]:
np.matmul(Qit, Qi)

array([[ 1.00000000e+00,  0.00000000e+00, -1.53429048e-17],
       [ 0.00000000e+00,  1.00000000e+00,  8.32667268e-17],
       [-1.53429048e-17,  8.32667268e-17,  1.00000000e+00]])

In [12]:
np.matmul(Qi, Qit)

array([[ 0.75, -0.25, -0.25, -0.25],
       [-0.25,  0.75, -0.25, -0.25],
       [-0.25, -0.25,  0.75, -0.25],
       [-0.25, -0.25, -0.25,  0.75]])

Let's see what happens to a constant vector ($\in R^4$) when it is multiplied by $Qi^T$.

In [13]:
constant = np.array([1,1,1,1])
constant_t = np.transpose(constant)
np.matmul(Qit, constant_t)

array([-1.11022302e-16, -1.66533454e-16,  5.55111512e-17])

This $Qi^T$ maps a constant vector to the zero vector $0\ (\in R^3)$, up to floating-point error.    
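One way to see why constants vanish: since $Q Q^T = I$ and the first column of $Q$ is a constant unit vector, $Qi\,Qi^T$ equals the centering matrix $I - \frac{1}{m}\mathbf{1}\mathbf{1}^T$, which subtracts the mean from a vector. The sketch below checks this numerically; the vector `v` is an arbitrary illustrative input.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
X = np.vstack([x**k for k in range(4)]).T
Qi = np.linalg.qr(X)[0][:, 1:]  # drop the first (constant) column of Q

m = Qi.shape[0]
centering = np.eye(m) - np.ones((m, m)) / m  # I - (1/m) 1 1^T

# Qi Qi^T matches the 0.75 / -0.25 pattern printed earlier
print(np.allclose(Qi @ Qi.T, centering))

# Applying Qi Qi^T to a vector subtracts its mean
v = np.array([3.0, 5.0, 4.0, 8.0])
print(np.allclose(Qi @ Qi.T @ v, v - v.mean()))
```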

(Side note: The first column of $X$ is the intercept $[1,1,1,1]^T$. Thus the first column of $Q$ is a constant vector, such as $[-0.5, -0.5, -0.5, -0.5]^T$ in the output above. The other columns of $Q$ are orthogonal to it, so they map the data into an $(m-1)$-dimensional space, here $R^{4-1}$, in which constants vanish. More generally, suppose the $i$ th subject has $m$ repeated observations. The polynomial matrix is just a means of getting an $m \times m$ full-rank matrix whose first column is all ones ($\in R^m$). QR decomposition is then a way to obtain $Qi$, an $m \times (m-1)$ matrix whose columns are orthonormal and orthogonal to that vector.)
     
     
Now suppose we want to evaluate the association between a SNP and the decline in some test score $Y$ after age 65.    
Our model is
$$ Y = \beta_0 + \beta_1 x_{sex} + \beta_2 x_{education} + \beta_3 x_{SNP} + \beta_4 x_{age-65} + \beta_5 x_{SNP} * x_{age-65}$$

$$ (Yi = Xi\beta) $$

Consider the $i$ th subject, who was observed at ages 65, 66, 67 and 70.    
The observed design matrix is as follows (columns: intercept, sex, education, SNP, age - 65, and SNP times age - 65):

In [12]:
Xi = np.array([[1, 1, 17, 2, 0, 0],
               [1, 1, 17, 2, 1, 2],
               [1, 1, 17, 2, 2, 4],
               [1, 1, 17, 2, 5, 10]])

We are interested in the $\beta$ of the interaction term ($\beta_5$).    

Multiply the above equation by $Qi^T$:

$$ Qi^T Yi = Qi^T Xi\beta $$

Let's call this the "equation in the transformed space".    
The result of $Qi^T Xi$ is:

In [13]:
transXi=np.matmul(Qit, Xi)
np.around(transXi, 3)

array([[-0.   , -0.   , -0.   , -0.   , -3.578, -7.155],
       [-0.   , -0.   , -0.   , -0.   ,  1.   ,  2.   ],
       [ 0.   ,  0.   ,  0.   ,  0.   , -0.447, -0.894]])

So it is apparent that in the transformed space the columns for the intercept, $x_{sex}$, $x_{education}$ and $x_{SNP}$ are all 0s.

Because these columns carry no information, we cannot estimate $\beta_0$ to $\beta_3$.    
But we can still estimate $\beta_4$ and $\beta_5$ once subjects with different SNP values are pooled (within this single subject the last two columns are proportional).    

In other words, all cross-sectional (time-invariant) features drop out in this space and we don't need to think about them anymore.    
This is particularly helpful when we are not sure which cross-sectional features affect the outcome.     
(We are free from worrying about whether baseline scores, number of kids or any other such factors should be included in the model.)    
The downside is that the number of observations per subject is reduced by 1 (here 4 -> 3).

The above example illustrates only the $i$ th subject (with 4 observations).      
For the whole-data analysis, we transform the observations of each subject with their own $Qi^T$ and stack them together. Then we can fit a linear mixed model that contains only time-varying covariates. 
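The transform-and-stack step could be sketched in Python as below. This is a minimal illustration, not a definitive implementation: `orth_contrasts` and `transform_and_stack` are hypothetical helper names, and the polynomial basis is assumed to be adequate only for a small number of visits per subject (it becomes ill-conditioned when $m$ is large). The data are the two-subject example table listed in the Reference section.

```python
import numpy as np

def orth_contrasts(m):
    """Hypothetical helper: m x (m-1) matrix with orthonormal columns,
    all orthogonal to the ones vector (plays the role of Qi)."""
    t = np.arange(1, m + 1, dtype=float)
    V = np.vstack([t**k for k in range(m)]).T
    return np.linalg.qr(V)[0][:, 1:]

def transform_and_stack(ids, data):
    """Apply each subject's Qi^T to that subject's rows and stack the results."""
    ids = np.asarray(ids)
    data = np.asarray(data, dtype=float)
    blocks = []
    for i in np.unique(ids):
        rows = data[ids == i]
        if rows.shape[0] > 1:  # one observation has no within-subject information
            blocks.append(orth_contrasts(rows.shape[0]).T @ rows)
    return np.vstack(blocks)

# Two subjects with 5 visits each; columns are Time and y
ids = [1] * 5 + [2] * 5
data = np.column_stack([
    [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    [1.12, 1.14, 1.16, 1.20, 1.26, 0.95, 0.83, 0.65, 0.49, 0.34],
])
trans = transform_and_stack(ids, data)
print(trans.shape)  # (8, 2): each subject loses one row
```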

But what about the correlation within individuals? The linear mixed model for the $i$ th subject is written as:

$$Yi = Xi\beta + Zb_i + \epsilon$$

When we multiply the equation by $Qi^T$

$$Qi^TYi = Qi^TXi\beta + Qi^TZb_i + \epsilon'$$

Not only the time-fixed covariates in $X$ but also the time-fixed random effect (the column of $Z$ with all ones, i.e. the random intercept) disappear in the new space. 

**Conditional linear model**    
In many cases, we only add a random intercept to the linear mixed model. This corresponds to a $Z$ that is a single column of ones, so the transformed data should no longer have within-individual correlation. It becomes an ordinary linear regression problem.    
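A small noise-free sketch of this case: three made-up subjects share the same slope but have very different intercepts, and ordinary least squares without an intercept on the transformed data recovers the slope. The helper `orth_contrasts` and all numbers are assumptions for illustration.

```python
import numpy as np

def orth_contrasts(m):
    # Hypothetical helper: orthonormal columns, all orthogonal to the ones vector
    t = np.arange(1, m + 1, dtype=float)
    return np.linalg.qr(np.vstack([t**k for k in range(m)]).T)[0][:, 1:]

true_slope = 2.0
subjects = {  # subject-specific intercept and observation times (made-up values)
    "a": (10.0, np.array([0.0, 1.0, 2.0, 5.0])),
    "b": (-3.0, np.array([0.0, 2.0, 3.0])),
    "c": (5.0, np.array([0.0, 1.0, 4.0, 6.0, 7.0])),
}

t_blocks, y_blocks = [], []
for intercept, t in subjects.values():
    y = intercept + true_slope * t  # noise-free, so the result is easy to verify
    Qit = orth_contrasts(len(t)).T
    t_blocks.append(Qit @ t)
    y_blocks.append(Qit @ y)

t_star = np.concatenate(t_blocks)[:, None]  # single regressor, no intercept column
y_star = np.concatenate(y_blocks)
slope_hat = np.linalg.lstsq(t_star, y_star, rcond=None)[0][0]
print(np.isclose(slope_hat, true_slope))
```

The subject intercepts never enter the fit, because each $Qi^T$ removes them before the regression.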

**Conditional linear mixed model**    
If we care about time-varying random effects, such as a random slope (the hypothesis being that the rate of change also varies among individuals), then we apply a linear mixed model to the transformed data.

(Note: in both models, we should tell the fitting function not to include an intercept. In R this can be done as: lmer(Y~a+b-1+(c+d-1|subject), data=data))      

# Appendix

poly function to get $Q_i$     

(equivalent to R's poly function; https://stackoverflow.com/questions/41317127/python-equivalent-to-r-poly-function)

In [14]:
def poly(x, p):
    x = np.array(x)
    X = np.transpose(np.vstack([x**k for k in range(p+1)]))  # list, not generator
    return np.linalg.qr(X)[0][:,1:]

In [16]:
poly([1,2,3,4],3)

array([[ 0.67082039,  0.5       ,  0.2236068 ],
       [ 0.2236068 , -0.5       , -0.67082039],
       [-0.2236068 , -0.5       ,  0.67082039],
       [-0.67082039,  0.5       , -0.2236068 ]])

There are many ways to obtain $Qi$. For example, a different set of points gives a different $Qi$:

In [23]:
Q2i = poly([1,2,5,11],3)
Q2i

array([[-0.48112522,  0.48324967, -0.53384293],
       [-0.35282516,  0.00507349,  0.79087841],
       [ 0.03207501, -0.81302635, -0.2965794 ],
       [ 0.80187537,  0.32470319,  0.03954392]])

In [24]:
Q2it = np.transpose(Q2i)
transXi2=np.matmul(Q2it, Xi)
np.around(transXi2, 2)

array([[ 0.  ,  0.  ,  0.  ,  0.  ,  3.72,  7.44],
       [-0.  , -0.  , -0.  , -0.  ,  0.  ,  0.01],
       [ 0.  ,  0.  ,  0.  ,  0.  ,  0.4 ,  0.79]])

This confirms a similar result with a different $Qi$ ($Q2i$): the time-invariant columns are again 0. The $\beta_4$ and $\beta_5$ estimates would be the same. 
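For ordinary least squares this invariance can be checked directly, because any valid $Qi$ gives the same $Qi\,Qi^T$ (the centering matrix). The sketch below fits a single age slope for one subject with two different bases; `poly_qi` is a hypothetical helper mirroring `poly` above, and the scores in `y` are made up for illustration.

```python
import numpy as np

def poly_qi(x):
    # Hypothetical helper: Q from a polynomial basis, first column dropped
    x = np.asarray(x, dtype=float)
    V = np.vstack([x**k for k in range(len(x))]).T
    return np.linalg.qr(V)[0][:, 1:]

age = np.array([0.0, 1.0, 2.0, 5.0])    # age - 65 for the example subject
y = np.array([30.0, 29.0, 27.5, 24.0])  # made-up test scores

estimates = []
for basis in ([1, 2, 3, 4], [1, 2, 5, 11]):
    Qit = poly_qi(basis).T
    coef = np.linalg.lstsq((Qit @ age)[:, None], Qit @ y, rcond=None)[0][0]
    estimates.append(coef)

# Both bases reproduce the within-subject (mean-centered) least-squares slope
centered = age - age.mean()
within_slope = centered @ (y - y.mean()) / (centered @ centered)
print(np.allclose(estimates, within_slope))
```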

# Reference


Geert Verbeke, Bart Spiessens and Emmanuel Lesaffre    
Conditional Linear Mixed Models    
The American Statistician, Vol. 55, No. 1 (Feb., 2001), pp. 25-34    
https://www.jstor.org/stable/2685526


Karolina Sikorska, Nahid Mostafavi Montazeri, André Uitterlinden, Fernando Rivadeneira, Paul HC Eilers and Emmanuel Lesaffre    
GWAS with longitudinal phenotypes: performance of approximate procedures    
Eur J Hum Genet. 2015 Oct; 23(10): 1384–1391. Published online 2015 Feb 25.    
doi: 10.1038/ejhg.2015.1, PMCID: PMC4592098, PMID: 25712081    
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4592098/

Example R code from the above article:


            id	y	Time
            1	1.12	1
            1	1.14	2
            1	1.16	3
            1	1.2	4
            1	1.26	5
            2	0.95	1
            2	0.83	2
            2	0.65	3
            2	0.49	4
            2	0.34	5

    library(lme4)  # provides lmer()

    cond = function(data, vars) {
      data = data[order(data$id), ]
      ### delete missing observations
      data1 = data[!is.na(data$y), ]
      ## do the transformations
      ids = unique(data1$id)
      transdata = NULL
      for(i in ids) {
        xi = data1[data1$id == i, vars]
        xi = as.matrix(xi)
        if(nrow(xi) > 1) {
          # orthonormal polynomial basis without the constant column (plays the role of Qi)
          A = cumsum(rep(1, nrow(xi)))
          A1 = poly(A, degree = length(A)-1)
          transxi = t(A1) %*% xi
          transxi = cbind(i, transxi)
          transdata = rbind(transdata, transxi)
        }
      }
      transdata = as.data.frame(transdata)
      names(transdata) = c("id", vars)
      row.names(transdata) = 1:nrow(transdata)
      return(transdata)
    }

    # mydata: a data frame with columns id, y and Time, as in the table above
    trdata = cond(mydata, vars = c("Time", "y"))
    # fit the reduced model and extract random slopes
    mod2 = lmer(y ~ Time - 1 + (Time-1|id), data = trdata)