# Adventures in Covariance

```python
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pymc3 as pm
import statsmodels.api as smf
import arviz as az
from scipy import stats

import warnings
warnings.filterwarnings("ignore")
```

### Easy

#### 13E1

Add to the following model varying slopes on the predictor x. 

$$
\begin{align}
y_i &\sim Normal(\mu_i, \sigma) \\
\mu_i &= \alpha_{group[i]} + \beta x_i \\
\alpha_{group} &\sim Normal(\alpha, \sigma_{\alpha}) \\
\alpha &\sim Normal(0, 10) \\
\beta &\sim Normal(0, 1) \\
\sigma &\sim HalfCauchy(0, 2) \\
\sigma_{\alpha} &\sim HalfCauchy(0, 2) \\
\end{align}
$$
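
One possible answer is sketched below. It assumes the slopes receive the same multivariate Gaussian treatment the chapter uses for the café model; the symbols $\sigma_{\beta}$, $\mathbf{S}$, and $\mathbf{R}$, along with the $LKJcorr(2)$ prior, are additions that do not appear in the problem statement.

$$
\begin{align}
y_i &\sim Normal(\mu_i, \sigma) \\
\mu_i &= \alpha_{group[i]} + \beta_{group[i]} x_i \\
\begin{bmatrix} \alpha_{group} \\ \beta_{group} \end{bmatrix} &\sim MVNormal\left( \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \mathbf{S} \right) \\
\mathbf{S} &= \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \end{pmatrix} \mathbf{R} \begin{pmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \end{pmatrix} \\
\alpha &\sim Normal(0, 10) \\
\beta &\sim Normal(0, 1) \\
\sigma &\sim HalfCauchy(0, 2) \\
\sigma_{\alpha} &\sim HalfCauchy(0, 2) \\
\sigma_{\beta} &\sim HalfCauchy(0, 2) \\
\mathbf{R} &\sim LKJcorr(2)
\end{align}
$$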

#### 13E2

Think up a context in which varying intercepts will be positively correlated with varying slopes. Provide a mechanistic explanation for the correlation.

#### 13E3

When is it possible for a varying slopes model to have fewer effective parameters (as estimated by WAIC or DIC) than the corresponding model with fixed (unpooled) slopes? Explain.

#### 13E4

Fit this multilevel model to the simulated café data: 
$$
\begin{align}
W_i &\sim Normal(\mu_i, \sigma) \\
\mu_i &= \alpha_{café[i]} + \beta_{café[i]}A_i \\
\alpha_{café} &\sim Normal(\alpha, \sigma_{\alpha}) \\
\beta_{café} &\sim Normal(\beta,\sigma_{\beta}) \\
\alpha &\sim Normal(0, 10) \\
\beta &\sim Normal(0, 10) \\
\sigma &\sim HalfCauchy(0, 1) \\
\sigma_{\alpha} &\sim HalfCauchy(0, 1) \\
\sigma_{\beta} &\sim HalfCauchy(0, 1)
\end{align}
$$


### Medium

#### 13M1

Repeat the café robot simulation from the beginning of the chapter. This time, set `rho` to zero, so that there is no correlation between intercepts and slopes. How does the posterior distribution of the correlation reflect this change in the underlying simulation?
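
The modified simulation might look like the sketch below. The problem statement only fixes `rho = 0`. Every other value (the population means `a` and `b`, the standard deviations, 20 cafés with 10 visits each, the observation noise of 0.5, and the seed) is an assumption meant to resemble the chapter's setup and should be checked against it.

```python
import numpy as np

rng = np.random.default_rng(13)  # arbitrary seed, assumption

# Population-level parameters (assumed values, check against the chapter)
a, b = 3.5, -1.0              # mean morning wait, mean afternoon difference
sigma_a, sigma_b = 1.0, 0.5   # std devs of intercepts and slopes
rho = 0.0                     # the change requested in 13M1

# Build the covariance matrix as diag(sigmas) @ Rho @ diag(sigmas)
Mu = np.array([a, b])
sigmas = np.array([sigma_a, sigma_b])
Rho = np.array([[1.0, rho], [rho, 1.0]])
Sigma = np.diag(sigmas) @ Rho @ np.diag(sigmas)

# Draw one (intercept, slope) pair per café
n_cafes, n_visits = 20, 10
effects = rng.multivariate_normal(Mu, Sigma, size=n_cafes)
a_cafe, b_cafe = effects[:, 0], effects[:, 1]

# Simulate visits: alternating morning (0) and afternoon (1) per café
cafe_id = np.repeat(np.arange(n_cafes), n_visits)
afternoon = np.tile([0, 1], n_cafes * n_visits // 2)
mu = a_cafe[cafe_id] + b_cafe[cafe_id] * afternoon
wait = rng.normal(mu, 0.5)    # within-café noise of 0.5 is assumed
```

The arrays `cafe_id`, `afternoon`, and `wait` can then be passed to the varying slopes model. With `rho = 0` the off-diagonal entries of `Sigma` are zero, so any correlation in the sampled `a_cafe` and `b_cafe` is due to chance alone.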

#### 13M2

#### 13M3

Re-estimate the varying slopes model for the UCBadmit data, now using a non-centered parameterization. Compare the efficiency of the forms of the model, using `n_eff`. Which is better? Which chain sampled faster?
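
As a reminder of what "non-centered" means here, the sketch below draws standardized effects and then scales them by a Cholesky factor, which is the transformation a non-centered model applies to its varying effects. It uses only NumPy and made-up values for the standard deviations and correlation; it is not the UCBadmit model itself.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Hypothetical standard deviations and correlation, for illustration only
sigmas = np.array([1.0, 0.5])
rho = -0.7
R = np.array([[1.0, rho], [rho, 1.0]])
Sigma = np.diag(sigmas) @ R @ np.diag(sigmas)

# Centered form: sample the effects directly from MVNormal(0, Sigma)
centered = rng.multivariate_normal(np.zeros(2), Sigma, size=100_000)

# Non-centered form: sample z ~ Normal(0, 1), then transform with the
# Cholesky factor L, where L @ L.T == Sigma
L = np.linalg.cholesky(Sigma)
z = rng.standard_normal((2, 100_000))
non_centered = (L @ z).T

# Both sets of draws should share (approximately) the same covariance
cov_centered = np.cov(centered, rowvar=False)
cov_non_centered = np.cov(non_centered, rowvar=False)
```

Both forms describe the same distribution. The difference is that in the non-centered form the sampler explores the standardized `z`, whose scale does not depend on the variance parameters.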

#### 13M4

Use WAIC to compare the Gaussian process model of Oceanic tools to the models fit to the same data in Chapter 10. Pay special attention to the effective numbers of parameters, as estimated by WAIC.

### Hard

#### 13H1

Let’s revisit the Bangladesh fertility data, `data(bangladesh)`, from the practice problems for Chapter 12. Fit a model with both varying intercepts by `district_id` and varying slopes of `urban` by `district_id`. You are still predicting `use.contraception`. Inspect the correlation between the intercepts and slopes. Can you interpret this correlation, in terms of what it tells you about the pattern of contraceptive use in the sample? It might help to plot the mean (or median) varying effect estimates for both the intercepts and slopes, by district. Then you can visualize the correlation and maybe more easily think through what it means to have a particular correlation. Plotting predicted proportion of women using contraception, with urban women on one axis and rural on the other, might also help.

#### 13H2

Varying effects models are useful for modeling time series, as well as spatial clustering. In a time series, the observations cluster by entities that have continuity through time, such as individuals. Since observations within individuals are likely highly correlated, the multilevel structure can help quite a lot. You’ll use the data in `data(Oxboys)`, which is 234 height measurements on 26 boys from an Oxford Boys Club (I think these were like youth athletic leagues?), at 9 different ages (centered and standardized) per boy. You’ll be interested in predicting `height`, using `age`, clustered by `Subject` (individual boy).

Fit a model with varying intercepts and slopes (on age), clustered by Subject. Present and interpret the parameter estimates. Which varying effect contributes more variation to the heights, the intercept or the slope?

#### 13H3

Now consider the correlation between the varying intercepts and slopes. Can you explain its value? How would this estimated correlation influence your predictions about a new sample of boys?

#### 13H4

Use `mvrnorm` (in `library(MASS)`) or `rmvnorm` (in `library(mvtnorm)`) to simulate a new sample of boys, based upon the posterior mean values of the parameters. That is, try to simulate varying intercepts and slopes, using the relevant parameter estimates, and then plot the predicted trends of height on age, one trend for each simulated boy you produce. A sample of 10 simulated boys is plenty, to illustrate the lesson. You can ignore uncertainty in the posterior, just to make the problem a little easier. But if you want to include the uncertainty about the parameters, go for it.

Note that you can construct an arbitrary variance-covariance matrix to pass to either `mvrnorm` or `rmvnorm` with something like:
```r
 S <- matrix( c( sa^2 , sa*sb*rho , sa*sb*rho , sb^2 ) , nrow=2 )
```

where `sa` is the standard deviation of the first variable, `sb` is the standard deviation of the second variable, and `rho` is the correlation between them.
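
Since the rest of this notebook is in Python, a NumPy counterpart of the R snippet may be useful. In the sketch below, `mean_a`, `mean_b`, `sa`, `sb`, and `rho` are placeholder numbers, not estimates from the Oxboys model, and the age grid from -1 to 1 is likewise an assumption. Substitute the posterior means from 13H2 before drawing conclusions.

```python
import numpy as np

# Placeholder values (NOT fitted estimates); replace with posterior means
mean_a, mean_b = 150.0, 6.5   # hypothetical average intercept and slope
sa, sb, rho = 8.0, 1.5, 0.5   # hypothetical std devs and correlation

# Same construction as the R matrix(...) call above
S = np.array([[sa**2,         sa * sb * rho],
              [sa * sb * rho, sb**2        ]])

# Simulate (intercept, slope) pairs for 10 new boys
rng = np.random.default_rng(2024)  # arbitrary seed
n_boys = 10
sim = rng.multivariate_normal([mean_a, mean_b], S, size=n_boys)

# Predicted height trend for each boy over an assumed standardized age grid
age_grid = np.linspace(-1, 1, 9)
trends = sim[:, [0]] + sim[:, [1]] * age_grid  # shape: (n_boys, len(age_grid))
```

Each row of `trends` is one simulated boy, so `plt.plot(age_grid, trends.T)` draws one line per boy.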