# Introduction to Longitudinal Data Analysis (LDA)

## Longitudinal Data Set Up

In longitudinal data, we collect data (typically health-related in biostatistics) on multiple people over time, recording the same measurements at every visit, for example weight, heart rate, and systolic blood pressure (SBP).\
![](./images/001_1.png)

Let $i = 1,\dots,n$ index subjects and $j = 1,\dots,n_i$ index visits. In longitudinal data analysis, we model one outcome over time. This differs from (cross-sectional) univariate analysis because each subject contributes multiple observations of the same outcome. Therefore, LDA is a special case of multivariate analysis, and we use strategies from multivariate analysis to do LDA.
$$\text{weight}_{ij}=\text{covariates}_{ij} + \epsilon_{ij}, \qquad \text{weight}_i = \begin{pmatrix} \text{weight}_{i1} \\ \vdots \\ \text{weight}_{in_i} \end{pmatrix}$$\
$$\text{heart rate}_{ij}=\text{covariates}_{ij} + \epsilon_{ij}$$\
$$\text{SBP}_{ij}=\text{covariates}_{ij} + \epsilon_{ij}$$\
where $\epsilon_i = (\epsilon_{i1}, \dots, \epsilon_{in_i})^T \sim \text{Normal}(\vec{0}, \Sigma)$ and $\Sigma$ is $n_i \times n_i$. For three visits, $\Sigma \ne \begin{pmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{pmatrix}$ because this diagonal $\Sigma$ assumes that measurements over time from the same person are independent, which is incorrect.
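To see why a diagonal $\Sigma$ is inappropriate, here is a minimal Python simulation sketch. It assumes a random-intercept model, $\text{weight}_{ij} = 70 + b_i + e_{ij}$ with $b_i \sim \text{Normal}(0, \tau^2)$ and $e_{ij} \sim \text{Normal}(0, \sigma^2)$; this model and all parameter values are hypothetical choices for illustration, not something defined in these notes. Because every visit of subject $i$ shares the same $b_i$, the visits are positively correlated, with theoretical correlation $\tau^2/(\tau^2+\sigma^2)$.

```python
import numpy as np

# Hypothetical settings (assumptions): random-intercept model
# weight_ij = 70 + b_i + e_ij, b_i ~ N(0, tau^2), e_ij ~ N(0, sigma^2).
rng = np.random.default_rng(2024)
n_subjects, n_visits = 5000, 3
tau, sigma = 8.0, 2.0

b = rng.normal(0.0, tau, size=(n_subjects, 1))           # shared by all visits of subject i
e = rng.normal(0.0, sigma, size=(n_subjects, n_visits))  # independent visit-level noise
weight = 70.0 + b + e                                     # row i = (weight_i1, weight_i2, weight_i3)

# Empirical correlation between visits (columns), computed across subjects.
corr = np.corrcoef(weight, rowvar=False)
print(np.round(corr, 2))

# Theoretical correlation between any two visits under this model.
theory = tau**2 / (tau**2 + sigma**2)
print(theory)
```

With these values the theoretical correlation is $64/68 \approx 0.94$, so the off-diagonal entries of the empirical correlation matrix land near that value rather than near 0, which a diagonal $\Sigma$ would imply.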

### Multivariate Longitudinal Analysis

This is not covered in class, but is briefly shown below.\
$$\begin{pmatrix} \vdots \\ \text{weight}_{ij} \\ \vdots \\ \text{heart rate}_{ij} \\ \vdots \\ \text{SBP}_{ij} \\ \vdots \end{pmatrix}=\begin{pmatrix} \vdots \\ \text{covariates}_{ij} \\ \vdots \end{pmatrix}+\begin{pmatrix} \vdots \\ \epsilon_{ij} \\ \vdots \end{pmatrix}$$ where $\epsilon_i \sim \text{Normal}(\vec{0},\Sigma)$. $\Sigma$ captures both the correlation among repeated measures of each outcome and the correlations between the different outcomes.

## Analysis

### Univariate Analysis

We are analyzing only one outcome at a time.\
$$\text{weight}_i=\text{covariates}_i + \epsilon_i$$\
$$\text{heart rate}_i=\text{covariates}_i + \epsilon_i$$\
$$\text{SBP}_i=\text{covariates}_i + \epsilon_i$$\
where the $\epsilon_i$ are independent with $\epsilon_i \sim \text{Normal}(0, \sigma^2)$. Each model is fit separately by **Ordinary Least Squares** (OLS).
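As a minimal sketch of fitting one of these univariate models by OLS, the NumPy code below uses simulated, hypothetical data (SBP regressed on a made-up age covariate; neither the covariate nor the parameter values come from these notes). It computes $\hat{\beta} = (X^TX)^{-1}X^Ty$ and the usual OLS variance $\hat{\sigma}^2(X^TX)^{-1}$, which is valid here because the errors are generated independently.

```python
import numpy as np

# Hypothetical cross-sectional data: one SBP value per subject (values are simulated).
rng = np.random.default_rng(1)
n = 200
age = rng.uniform(30, 70, size=n)                     # made-up covariate
beta_true = np.array([100.0, 0.5])                    # assumed intercept and slope
X = np.column_stack([np.ones(n), age])                # design matrix with intercept column
sbp = X @ beta_true + rng.normal(0.0, 5.0, size=n)    # i.i.d. Normal(0, 5^2) errors

# OLS estimate beta_hat = (X^T X)^{-1} X^T y, computed with a least-squares solver.
beta_hat, *_ = np.linalg.lstsq(X, sbp, rcond=None)

# Residual variance with n - p degrees of freedom, then Var(beta_hat) = sigma^2 (X^T X)^{-1}.
resid = sbp - X @ beta_hat
p = X.shape[1]
sigma2_hat = resid @ resid / (n - p)
var_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(var_beta_hat))
print(beta_hat, se)
```

Using `np.linalg.lstsq` avoids explicitly inverting $X^TX$ for the point estimate; the inverse is only formed for the variance.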

### Multivariate Analysis

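We analyze the outcomes jointly by stacking them into one response vector per subject:
$$\begin{pmatrix} \text{weight}_i \\ \text{heart rate}_i \\ \text{SBP}_i \end{pmatrix} = \text{covariates}_i + \epsilon_i$$
where $\epsilon_i \sim \text{Normal}(\vec{0}, \Sigma)$. $\Sigma$ is the $3 \times 3$ covariance matrix of the three outcomes.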

If $\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{pmatrix}$, then we're assuming the outcomes weight, heart rate, and SBP are independent and we are doing *univariate analysis*.

If $\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{pmatrix}$ and the off-diagonal $\sigma_{ij} \ne 0$, then:

1.  We're not assuming weight, heart rate, and SBP are independent.

2.  We're doing multivariate analysis.

3.  The $\sigma_{ij}$'s are covariances that capture the correlations between weight, heart rate, and SBP.
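The entries of $\Sigma$ are variances and covariances; a correlation is obtained by rescaling, $\rho_{kl} = \sigma_{kl} / (\sigma_k \sigma_l)$. The short NumPy sketch below uses a made-up covariance matrix for (weight, heart rate, SBP) to show the conversion, and that a diagonal $\Sigma$ corresponds to zero correlation between the outcomes.

```python
import numpy as np

# Hypothetical covariance matrix for (weight, heart rate, SBP); values are made up.
Sigma = np.array([
    [100.0, 12.0, 30.0],
    [12.0, 64.0, 16.0],
    [30.0, 16.0, 144.0],
])

# Correlation: rho_kl = sigma_kl / (sigma_k * sigma_l).
sd = np.sqrt(np.diag(Sigma))          # standard deviations (10, 8, 12)
R = Sigma / np.outer(sd, sd)
print(np.round(R, 3))

# Univariate analysis corresponds to keeping only the diagonal: every correlation becomes 0.
Sigma_diag = np.diag(np.diag(Sigma))
R_diag = Sigma_diag / np.outer(sd, sd)
print(R_diag)                         # identity matrix
```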

### Why shouldn't we analyze longitudinal data as cross-sectional data?

**Reason 1:**\
The data are not independent. Specifically, they are positively correlated: observations from the same subject tend to be similar to each other. If a subject has a high value at visit 1, they are likely to have a relatively high value at visit 2 as well.\
Cross-sectional techniques like OLS estimate the variance of $\hat{\beta}$ with $\text{Var}(\hat{\beta})=\sigma^2(X^{T}X)^{-1}$, which assumes the data are uncorrelated. When the data are correlated with covariance matrix $\Sigma$, the variance of the OLS estimator is $\text{Var}(\hat{\beta})= (X^{T}X)^{-1}X^T \Sigma X (X^{T}X)^{-1}$. Using the OLS formula therefore gives the wrong variance estimate, which can lead to incorrect inference: confidence intervals can fail to cover the true value at the nominal rate, and the type I error rate can be inflated.\
**Reason 2:**\
Longitudinal data have patterns over time that we don't want to miss. Looking only at visit 5, we cannot tell whether there is more improvement in the treatment group than in the placebo group; a single time point makes this question difficult to answer. ![](./images/001_2.png)\
Tanya's drawing (below) shows no improvement, but when we look at the data over time, we can observe patterns that we would miss by looking at a single time point.\
![](./images/001_3.png)
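The two variance formulas from Reason 1 can be compared directly. The sketch below assumes a hypothetical balanced design (10 subjects, 4 visits each, a treatment indicator that is constant within subject) and an exchangeable within-subject covariance (equal variance $s^2$ and equal correlation $\rho$ between any two visits); neither assumption comes from these notes. It treats $s^2$ as known, so the only difference between the two results is the formula.

```python
import numpy as np

# Hypothetical balanced design: 10 subjects x 4 visits, half on treatment.
n_subjects, m = 10, 4
trt = np.repeat([0.0, 1.0], n_subjects // 2)        # one treatment value per subject
# Long format, rows ordered subject by subject: columns = (intercept, treatment).
X = np.column_stack([np.ones(n_subjects * m), np.repeat(trt, m)])

# Assumed exchangeable within-subject covariance: variance s2, correlation rho.
s2, rho = 4.0, 0.5
Sigma_i = s2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))
Sigma = np.kron(np.eye(n_subjects), Sigma_i)        # block diagonal: subjects independent

XtX_inv = np.linalg.inv(X.T @ X)
var_naive = s2 * XtX_inv                            # OLS formula, assumes Sigma = s2 * I
var_true = XtX_inv @ X.T @ Sigma @ X @ XtX_inv      # variance of the OLS estimator under Sigma

ratio = np.diag(var_true) / np.diag(var_naive)
print(ratio)
```

In this particular design both ratios equal $1 + (m-1)\rho = 2.5$, so the OLS formula understates the variance and the standard errors are too small. The size, and even the direction, of the error depends on the design and on $\Sigma$; for example, a covariate that varies only within subjects can have its variance overstated instead.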

## Summary

LDA means analyzing one outcome measured at multiple time points.\
Longitudinal data are correlated over time and tend to show patterns over time.\
If we analyze longitudinal data using cross-sectional techniques, the OLS estimator $\hat{\beta}$ is still unbiased (and, under mild conditions, consistent) as long as the mean model is correctly specified, but the estimated variance of $\hat{\beta}$ is wrong, leading to incorrect inference.
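The summary's claim can be checked with a small Monte Carlo sketch: simulate correlated longitudinal data many times, fit OLS each time, then compare the average estimate with the truth and the naive OLS standard error with the actual spread of the estimates. The settings (50 subjects, 4 visits, a treatment constant within subject, an exchangeable within-subject covariance, the coefficient values) are all hypothetical assumptions for illustration.

```python
import numpy as np

# Hypothetical simulation settings (assumptions for illustration only).
rng = np.random.default_rng(7)
n_subjects, m, n_reps = 50, 4, 2000
beta = np.array([120.0, -5.0])          # assumed intercept and treatment effect
s2, rho = 4.0, 0.5                      # assumed exchangeable within-subject covariance

# Long-format design: rows ordered subject by subject; treatment constant within subject.
trt = np.repeat([0.0, 1.0], n_subjects // 2)
X = np.column_stack([np.ones(n_subjects * m), np.repeat(trt, m)])
XtX_inv = np.linalg.inv(X.T @ X)
Sigma_i = s2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))
L = np.linalg.cholesky(Sigma_i)         # lower-triangular, Sigma_i = L @ L.T

estimates = np.empty((n_reps, 2))
naive_se = np.empty(n_reps)
for r in range(n_reps):
    z = rng.standard_normal((n_subjects, m))
    eps = (z @ L.T).ravel()             # each subject's row ~ Normal(0, Sigma_i)
    y = X @ beta + eps
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b_hat
    sigma2_hat = resid @ resid / (len(y) - X.shape[1])
    estimates[r] = b_hat
    naive_se[r] = np.sqrt(sigma2_hat * XtX_inv[1, 1])  # OLS SE, ignores correlation

print(estimates.mean(axis=0))           # average estimate, compare with beta
print(estimates[:, 1].std(ddof=1))      # actual sampling SD of the treatment estimate
print(naive_se.mean())                  # average naive OLS standard error
```

Under these settings the average estimates sit close to `beta`, while the naive standard error of the treatment effect is about 0.28 against an actual sampling SD of about 0.45: the estimator is fine, the variance is not.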