# What are multilevel models and why do we fit them?

### [AMAZING LINK!](http://mfviz.com/hierarchical-models/)

So far, we have been working with independent observations.

Now we will work with data where the observations are correlated with each other.

Some study designs introduce dependency into the data. Models for this kind of data include:

1. Multilevel models.

## Overview

**Fitting statistical models to dependent data: observations that are correlated due to features of the study design**
* Several observations collected at one time point from **sampled clusters** of analytic units (neighborhoods, schools, clinics, etc.)
    * For example, people from the same neighborhood may share a similar socio-economic status
* Several observations collected over time from the **same individuals** in **longitudinal studies**

**These models need to reflect the correlations!**

So far, all the models we used assumed that the observations are independent of each other. Now we are dealing with observations that may be correlated, and we need to choose models that reflect these correlations.

## What are Multilevel Models?

* General class of statistical models to **model dependent data** where **observations within a randomly sampled cluster may be correlated** with each other.
* The coefficients in these models are allowed to **randomly vary** across **randomly sampled** higher-level clusters.
    * So the regression coefficients don't have to be fixed constants that we are trying to estimate. Instead, we allow them to vary across these higher-level units and estimate the variability of these coefficients, that is, how much the relationships they describe differ across clusters.

* For example, in a longitudinal study, time can be a predictor of an outcome of interest.
    * The intercept and slope are allowed to randomly vary across randomly sampled subjects.
        * This allows us to estimate the variability among subjects (or clusters) in these coefficients of interest.
    * So each subject has their own unique intercept and slope.

![how-multilevel-models-work](week-3-img/how-multilevel-models-work.png)

### Multilevel model eqn

**Including random effects of the higher-level, randomly sampled clusters allows the coefficients of a multilevel model to randomly vary. We explicitly include additional effects of these higher-level randomly sampled clusters.**

![how-multilevel-models-work](week-3-img/multilevel-eqn.png)

In the level 1 equation, we write *y* as a function of coefficients. These coefficients have a subscript j, which means the regression coefficients are specific to cluster j: they randomly vary from cluster to cluster. We refer to these coefficients as random coefficients. They are **NOT parameters**; they are **random variables**.

B*1j* captures the relationship of the predictor variable x with y for cluster j, and we still have the error term e associated with observation i within cluster j.

What gives the multilevel model its name is that this equation defines the regression function for the observations at level 1, the values of the dependent variable. Level 1 units could be subjects in a clustered study, or repeated measurements collected over time from the same subjects in a longitudinal study.

In multilevel models, we then have **equations at level 2 of the data hierarchy**.

These are equations for the random coefficients at level 1.
* We have a unique equation for B*0j*, the intercept specific to cluster j, and a unique equation for B*1j*, the slope specific to cluster j.
    * The intercept for cluster j is defined by a fixed intercept, B*0*, which is our regression parameter, plus a random effect u*0j*.
        * B*0* is the fixed parameter we are trying to estimate.
        * u*0j* is a random variable (a random effect) that allows each cluster j to have its own unique intercept.
    * The same is true for the slope of cluster j: we add a random effect u*1j*.
        * This random variable allows each cluster j to have a slope that deviates from the overall fixed slope B*1*.
            
![how-multilevel-models-work](week-3-img/multilevel-eqn-2.png)
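For reference, the level 1 and level 2 equations described above can be written out as follows. This is a sketch of the standard two-level random intercept and slope model in the notation of the text (B written as β, u for random effects, e for the error term); the figure may format it differently. Substituting the level 2 equations into level 1 gives one combined equation that separates the fixed part from the random part.

```latex
\begin{aligned}
\text{Level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1j}\,x_{ij} + e_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \beta_0 + u_{0j} \\
                     & \beta_{1j} = \beta_1 + u_{1j} \\
\text{Combined:}\quad & y_{ij} = \underbrace{\beta_0 + \beta_1 x_{ij}}_{\text{fixed part}}
  + \underbrace{u_{0j} + u_{1j}\,x_{ij} + e_{ij}}_{\text{random part}}
\end{aligned}
```

Observations that share the same j also share the same u_{0j} and u_{1j} in the random part, which is what makes them correlated.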

**The common assumption is that the random effects u follow a normal distribution with a mean of 0, so that the average cluster has the overall fixed intercept B*0* and slope B*1*. There is variability in the values of u, and it is that variance we are trying to estimate: how variable are the cluster-specific coefficients around the overall coefficients B*0* and B*1*?**
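To make the normality assumption concrete, the sketch below simulates data from the two-level model with random intercepts and slopes, using only the Python standard library. The function name `simulate_clusters` and all parameter values are hypothetical, and for simplicity it assumes the two random effects are independent (zero covariance), which fitted multilevel models usually do not have to assume. The spread of the simulated cluster-specific intercepts and slopes around B*0* and B*1* is the variance a multilevel model tries to estimate.

```python
import random
import statistics

def simulate_clusters(n_clusters, n_per_cluster, beta0, beta1,
                      sd_u0, sd_u1, sd_e, seed=0):
    """Simulate a two-level model with random intercepts and slopes.

    Level 1: y_ij = b0j + b1j * x_ij + e_ij
    Level 2: b0j = beta0 + u0j,  b1j = beta1 + u1j
    Assumption: u0j, u1j and e_ij are independent normals with mean 0.
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        u0j = rng.gauss(0, sd_u0)  # random intercept effect for cluster j
        u1j = rng.gauss(0, sd_u1)  # random slope effect for cluster j
        b0j, b1j = beta0 + u0j, beta1 + u1j
        rows = []
        for _ in range(n_per_cluster):
            x = rng.uniform(0, 10)
            y = b0j + b1j * x + rng.gauss(0, sd_e)  # level-1 error term
            rows.append((x, y))
        clusters.append({"b0j": b0j, "b1j": b1j, "rows": rows})
    return clusters

# Hypothetical values: overall intercept 10, overall slope 2.
clusters = simulate_clusters(n_clusters=500, n_per_cluster=5, beta0=10.0,
                             beta1=2.0, sd_u0=3.0, sd_u1=0.5, sd_e=1.0)
intercepts = [c["b0j"] for c in clusters]
slopes = [c["b1j"] for c in clusters]
print("intercepts: mean", round(statistics.mean(intercepts), 2),
      "sd", round(statistics.stdev(intercepts), 2))
print("slopes: mean", round(statistics.mean(slopes), 2),
      "sd", round(statistics.stdev(slopes), 2))
```

The cluster-specific intercepts and slopes center on the fixed values 10 and 2, and their standard deviations recover the hypothetical random-effect SDs of 3 and 0.5.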

<br><br>

* Multilevel models are defined by the explicit inclusion of random effects. By including them, we are saying that observations coming from the same cluster are statistically correlated with each other.
    * That's how we model the correlation of these observations.
* If we did not include these random effects, we would be assuming that observations from the same cluster are **independent** of one another, i.e., that the correlation between observations within a cluster is zero. This is a very strong assumption when we are working with dependent data; the random effects allow us to model that correlation.
* Accounting for these correlations often substantially improves model fit when we work with dependent data.
    * So it is important to check whether adding these random effects gives a significant improvement in model fit.
* Multilevel models also allow us to decompose the unexplained variance in an outcome (the variance not accounted for by the predictors) into **between-cluster** and **within-cluster** variance. The **random effects capture the between-cluster variability**, and the **error term still captures the within-cluster variability.**

* The key question we try to answer with multilevel models is: **how much** of the unexplained variance is due to **between-cluster variance** in the intercepts or slopes of the given model? **This is the key research question.**
    * If we do not care about this variance, we do not need to use multilevel models.

* **We need an explicit research interest in estimating the variances of these random coefficients.** If we are not interested in them, we should use other models for dependent data.
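The link between random effects and within-cluster correlation can be checked numerically. In a model with only random intercepts, two observations from the same cluster share the same u*0j*, so their correlation equals the share of unexplained variance that lies between clusters, known as the intraclass correlation (ICC). The stdlib-only Python sketch below uses hypothetical variance values and hypothetical function names: it computes the ICC from the formula and compares it with the correlation observed in simulated pairs of observations. With a between-cluster variance of zero (no random effects), the ICC is zero, which is the independence assumption described above.

```python
import random

def icc(var_between, var_within):
    """Intraclass correlation for a random-intercept model:
    the share of unexplained variance that lies between clusters."""
    return var_between / (var_between + var_within)

def simulated_within_cluster_corr(var_between, var_within, n_clusters=20000, seed=1):
    """Draw two observations per cluster that share one random intercept
    and return the empirical correlation between the pairs."""
    rng = random.Random(seed)
    sd_u, sd_e = var_between ** 0.5, var_within ** 0.5
    first, second = [], []
    for _ in range(n_clusters):
        u = rng.gauss(0, sd_u)  # random intercept shared by both observations
        first.append(u + rng.gauss(0, sd_e))
        second.append(u + rng.gauss(0, sd_e))
    n = n_clusters
    m1, m2 = sum(first) / n, sum(second) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second)) / n
    v1 = sum((a - m1) ** 2 for a in first) / n
    v2 = sum((b - m2) ** 2 for b in second) / n
    return cov / (v1 * v2) ** 0.5

# Hypothetical variance components: between-cluster 4.0, within-cluster 1.0.
print(icc(4.0, 1.0))                            # from the formula
print(simulated_within_cluster_corr(4.0, 1.0))  # from simulated pairs
print(icc(0.0, 1.0))                            # no random effects: no correlation
```

The ICC is also a compact answer to "how much of the unexplained variance is between clusters" for a random-intercept model.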

## Why do we fit multilevel models?

### We need all of the following points to be true to warrant multilevel modeling.

* **WE NEED a dataset that is organized into clusters** (Clinics, subjects, etc) where there are several **correlated** observations collected from each of those clusters.
    * We have some reason to believe based on the study design that the observations on our dependent variable are going to be correlated within one of these sampled clusters.

* The **clusters themselves need to be randomly sampled** from a larger population of clusters. In other words, we can't treat variables like gender, race, or ethnicity as cluster variables: these are grouping variables, where all the possible groups are represented in one dataset, and **NOT** randomly sampled clusters.
    * We randomly sample neighborhoods, clinics, or hospitals; we don't randomly sample "male" and "female" or racial groups.
    * These random effects allow us to make inference about that larger population from which the clusters were sampled.
 
* We wish to **explicitly model correlation** of observations within the same cluster.
    * The study design gives rise to this kind of dependency, and we want to model that correlation when we fit the statistical model to the data.

* We have **explicit research interest** in estimating between cluster variance in selected regression coefficients in our model.
    * There are other models for dependent data that we can use if we are not interested in the between-cluster variance.

**Examples**

![how-multilevel-models-work](week-3-img/multilevel-eg.png)

### Advantages over other approaches for dependent data

* When we fit these models, we estimate **one parameter** that represents the variance of a given random coefficient across the clusters, instead of estimating **unique regression coefficients for every possible cluster.**
    * In a purely stratified approach, every cluster gets its own unique fixed regression coefficient; here we just estimate one parameter that describes the variance of those random effects.
    * This is much more efficient, especially when we have a large number of clusters.

* Clusters with smaller sample sizes do not have as pronounced an effect on the variance estimate as larger clusters; their estimated effects shrink toward the overall mean of the outcome when we use random effects. This is called **shrinkage**.
    * This really matters when many clusters have small sample sizes, as we don't want them to have a large influence on the overall variance.
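Shrinkage can be illustrated with the simplest case, a random-intercept model with no predictors. Assuming the variance components and the overall mean are known (in practice they are estimated), the predicted mean for cluster j is the overall mean plus a fraction `lambda_j` of the gap between the cluster's raw mean and the overall mean, where `lambda_j = var_between / (var_between + var_within / n_j)`. Small clusters get a small `lambda_j` and are pulled strongly toward the overall mean. The function names and numbers below are hypothetical.

```python
def shrinkage_factor(var_between, var_within, n_j):
    """Weight given to a cluster's own mean in a random-intercept model
    (assumes the variance components are known)."""
    return var_between / (var_between + var_within / n_j)

def shrunken_cluster_mean(cluster_mean, overall_mean, var_between, var_within, n_j):
    """Predicted cluster mean: the raw cluster mean pulled toward the overall mean."""
    lam = shrinkage_factor(var_between, var_within, n_j)
    return overall_mean + lam * (cluster_mean - overall_mean)

# Hypothetical values: two clusters with the same raw mean (60) but different sizes.
overall_mean, var_between, var_within = 50.0, 25.0, 100.0
for n_j in (2, 200):
    lam = shrinkage_factor(var_between, var_within, n_j)
    pred = shrunken_cluster_mean(60.0, overall_mean, var_between, var_within, n_j)
    print(f"n_j={n_j}: lambda={lam:.3f}, predicted mean={pred:.2f}")
```

With these numbers the cluster of 2 observations is predicted at about 53.3, while the cluster of 200 stays close to its raw mean at about 59.8.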

### Example

**With multilevel models, we estimate the variance of a given random coefficient across the higher-level clusters, and we can add cluster-level predictors to the level 2 equations for the random coefficients. We do this to explain variance in those random effects.**

![how-multilevel-models-work](week-3-img/example-1.png)


Let's take a longitudinal example: <br>
y = outcome <br>
x = predictor of interest, in this case, age <br>
t = subscript that represents the time point at which the measurement was collected <br> 
i = subject who is repeatedly measured <br>
We have a unique intercept and a unique slope in this model for each subject.

At level 2, we have added a subject-level predictor T with a subscript i. We can use T to explain variability in the random effects u, just like we would in any other linear regression model.
* We can think of the level 2 equations as mini regression models for the random coefficients. By adding T and its corresponding regression parameter (B*01* in the intercept equation, B*11* in the slope equation), we are trying to explain variance in those random coefficients: the between-cluster variance.
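Written out, the longitudinal model above looks like this. The symbols follow the text (B*01* and B*11* are the coefficients of the subject-level predictor T_i); the labels β_00 and β_10 for the overall fixed intercept and slope are an assumption, and the figure may use different notation. Substituting level 2 into level 1 shows that B*11* acts as a cross-level interaction between T_i and x_ti.

```latex
\begin{aligned}
\text{Level 1:}\quad & y_{ti} = \beta_{0i} + \beta_{1i}\,x_{ti} + e_{ti} \\
\text{Level 2:}\quad & \beta_{0i} = \beta_{00} + \beta_{01}\,T_i + u_{0i} \\
                     & \beta_{1i} = \beta_{10} + \beta_{11}\,T_i + u_{1i} \\
\text{Combined:}\quad & y_{ti} = \beta_{00} + \beta_{01}T_i + \beta_{10}x_{ti} + \beta_{11}T_i x_{ti}
  + u_{0i} + u_{1i}x_{ti} + e_{ti}
\end{aligned}
```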

![how-multilevel-models-work](week-3-img/example-2.png)

This is a unique feature of multilevel models. Once we estimate B*01* and B*11* and test hypotheses about those parameters, if they are significant, we are explaining some of the between-cluster variance. So we can make statements like "45% of the ..." (the full statement is in the figure above). <br>
**We can make inferences about how much of the variance in the random effects is explained by this higher-level covariate.**
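One common way to quantify this, sometimes called the proportional reduction in variance (a pseudo-R²), is to compare the estimated variance of a random effect before and after adding the level 2 covariate. We assume this is the kind of calculation behind statements like the one above. The sketch below uses a hypothetical function name and hypothetical variance estimates; with real estimates the result can occasionally be negative, because both variances are estimated with error.

```python
def proportion_variance_explained(var_without_covariate, var_with_covariate):
    """Proportional reduction in an estimated random-effect variance
    after adding a level-2 (cluster-level) covariate to its equation."""
    return (var_without_covariate - var_with_covariate) / var_without_covariate

# Hypothetical random-intercept variances from two fitted models.
share = proportion_variance_explained(4.0, 3.0)
print(f"{share:.0%} of the between-subject intercept variance is explained")
```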

![how-multilevel-models-work](week-3-img/what-next-1.png)


For which of the following data structures might a multilevel model be appropriate (depending on the research objectives of the analyst)? Select all that apply.


1. A data set of patients nested within hospitals. *(Appropriate)*
2. A simple random sample of students from a single local high school. *(Not appropriate)*
3. Repeated measurements of weights over time from multiple people in a weight loss study. *(Appropriate)*
4. A data set of neighborhoods with a dependent variable only measured once for each neighborhood. *(Not appropriate)*

**Answer: 1 and 3.** In a data set of patients nested within hospitals, observations on a dependent variable of interest may be correlated within a hospital, and this correlation could be modeled using random hospital effects. With a simple random sample from a single school, there is no higher level of clustering in the observations. In the repeated measures study, observations from the same person over time (especially those related to weight loss) will tend to be correlated with each other, and this correlation could also be modeled with random person effects. Finally, if each neighborhood is only measured once, and there are no higher levels of clustering of neighborhoods, then there is no need to model any correlations of the neighborhood observations within higher level units.

# Likelihood Ratio Tests for Fixed Effects and Variance Components
## Hypothesis Testing for Multilevel Regression Models
* When fitting multilevel models, many hypothesis tests regarding the parameters are based on comparisons of competing models
* **Reference model:** A “full” or “saturated” model containing all parameters of interest; different from the null model where only intercepts are included
* **Nested model:** A “reduced” model, where some of the parameters in the reference model are constrained to zero 

## Likelihood Ratio Testing
* Compared to the reference model, does fitting the nested model substantially drop the likelihood (or make the observed data seem less likely)?
* Likelihood ratio tests compare the -2 log-likelihoods of the two models:
    * **Fixed effects:** -2 **ML** log-likelihoods
    * **Covariance parameters:** -2 **REML** log-likelihoods

## Likelihood Ratio Tests for Fixed Effects
* Simple idea, but assumes “large” sample of clusters and “large” samples within clusters
* **Null hypothesis:** selected fixed effects are all equal to zero (not important predictors)
* **Test statistic:** difference in final -2 ML log-likelihoods between nested and reference models (nested model has some fixed effects set to zero)
* Refer difference to chi-square distribution with q degrees of freedom, where q is difference between two models in number of fixed effects estimated
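The fixed-effects test above can be sketched in a few lines of standard-library Python. The chi-square tail probability comes from the closed forms for 1 and 2 degrees of freedom plus the standard recurrence for larger integer degrees of freedom; in practice you would use a statistics library instead. The function names and the -2 ML log-likelihood values in the usage lines are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1 (stdlib only).

    Uses Q(x; 1) = erfc(sqrt(x/2)), Q(x; 2) = exp(-x/2) and the recurrence
    Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    k = 1 if df % 2 else 2
    q = math.erfc(math.sqrt(x / 2)) if k == 1 else math.exp(-x / 2)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def lrt_fixed_effects(neg2ll_nested, neg2ll_reference, q):
    """LRT for q fixed effects set to zero in the nested model (ML fits)."""
    ts = neg2ll_nested - neg2ll_reference  # nested model fits worse, so ts >= 0
    return ts, chi2_sf(ts, q)

# Hypothetical -2 ML log-likelihoods: the nested model drops q = 2 fixed effects.
ts, p = lrt_fixed_effects(neg2ll_nested=1250.4, neg2ll_reference=1244.1, q=2)
print(f"TS = {ts:.1f}, df = 2, p = {p:.4f}")
```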
## Likelihood Ratio Tests for (Co)variance Parameters
* **If the null hypothesis does not specify that a covariance parameter is equal to the boundary of its parameter space (e.g., a variance is equal to zero):**
    * Same approach as for fixed effects, using restricted maximum likelihood

* **If testing that a variance is equal to zero (e.g., the variance of a given random effect):**
    * Compute the test statistic in the same way, but refer it to an approximate mixture of chi-square distributions (not widely implemented in software)

See below for an example using the ESS data.

## Likelihood Ratio Test Example: Testing for Random Slopes in the ESS Case Study
* Null hypothesis: The variance of the random interviewer effects on the slope of interest is zero (in other words, these random effects on the slope are not needed in the model)
* Alternative hypothesis: The variance of the random interviewer effects on the slope of interest is greater than zero
* First, fit the model WITH random interviewer effects on the slope of interest, using restricted maximum likelihood (REML) estimation:
    * -2 REML log-likelihood = 7143.3

* Next, fit the nested model WITHOUT the random interviewer effects on the slope:
    * -2 REML log-likelihood = 7166.8 (higher value = worse fit!)

* Compute the positive difference in the -2 REML log-likelihood values (“REML criterion”) for the two models:
    * Test Statistic (TS) = 7166.8 - 7143.3 = 23.5

* Refer the TS to a mixture of chi-square distributions with 1 and 2 DF and equal weights of 0.5. In R:
    * `0.5*(1-pchisq(23.5,1)) + 0.5*(1-pchisq(23.5,2))` returns `[1] 4.569231e-06`

p < 0.001 (Reject the null hypothesis that the variance of the random interviewer effects on the slope is zero; strong evidence of interviewer variance in the slopes!)
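The same p-value can be reproduced without R. A 50:50 mixture p-value is the average of the two chi-square tail probabilities, and for 1 and 2 degrees of freedom these have closed forms: P(chi-square(1) > x) = erfc(sqrt(x/2)) and P(chi-square(2) > x) = exp(-x/2). The stdlib-only Python sketch below (helper names are hypothetical) should agree with the R output above, about 4.57e-06.

```python
import math

def chi2_sf_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

def chi2_sf_df2(x):
    """P(X > x) for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2)

def mixture_p_1_2(ts):
    """p-value for a 50:50 mixture of chi-square(1) and chi-square(2)."""
    return 0.5 * chi2_sf_df1(ts) + 0.5 * chi2_sf_df2(ts)

# -2 REML log-likelihoods from the ESS example above.
ts = 7166.8 - 7143.3   # nested (no random slopes) minus reference
p = mixture_p_1_2(ts)
print(f"TS = {ts:.1f}, p = {p:.3e}")
```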

* We would follow the same approach when testing the variance of the random interviewer effects on the intercept
* If we fit a model that ONLY includes random effects of the clusters on the intercept (a random intercept model), we would test that variance by removing the random effects, and referring the same test statistic to a mixture of chi-square distributions with 0 and 1 DF, and equal weight 0.5
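For the random-intercept case, the chi-square distribution with 0 DF is a point mass at zero, so for a positive test statistic the mixture p-value is simply half the usual chi-square(1) p-value. A consequence, sketched below in stdlib-only Python (the function name is hypothetical, and the test statistics are illustrative values, not results from the ESS data), is that the 0.05 critical value for the mixture is about 2.71 rather than the 3.84 of a plain chi-square(1) test. Ignoring the boundary and using chi-square(1) alone therefore makes the test conservative.

```python
import math

def mixture_p_0_1(ts):
    """p-value for a 50:50 mixture of chi-square(0) and chi-square(1).

    chi-square(0) is a point mass at zero, so for ts > 0 only the
    chi-square(1) half contributes to the tail probability."""
    if ts <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(ts / 2))

for ts in (2.71, 3.84):
    naive_p = math.erfc(math.sqrt(ts / 2))  # plain chi-square(1) p-value
    print(f"TS = {ts}: naive p = {naive_p:.3f}, mixture p = {mixture_p_0_1(ts):.3f}")
```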