# 3. Hierarchical Models

We previously modelled tips for each day as completely independent of
any other day. Perhaps this implicit assumption is not correct. How
might we improve our tips model?

One idea: a hierarchical model in which the days share a **hyperprior**.
Each day keeps its own parameters, yet the days are **simultaneously**
related to one another.

## 3.1 Sharing information, sharing priors

Hierarchical models are particularly useful when the data has a
natural hierarchy. Some examples are:

- Geographical regions (for example, cities, counties, and states)
- Students within schools
- Patients nested within hospitals
- Repeated measurements **on the same** individuals

In a hierarchical model, the parameters of the priors are not fixed
numbers; they are given their own prior distributions (often called
**hyperpriors**). This structure lets groups share information while,
at the same time, allowing **differences between groups.**
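
As a rough illustration of this layered structure, the following sketch simulates data from a small hierarchical model using only NumPy. The number of groups, the number of observations, and all distribution parameters are made up for the example; they are assumptions, not values used later in this chapter.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_groups = 4          # hypothetical number of groups
n_obs_per_group = 50  # hypothetical observations per group

# Hyperprior level: one shared mean and one shared spread for the group means
mu_mu = rng.normal(loc=0.0, scale=10.0)
mu_sd = abs(rng.normal(loc=0.0, scale=10.0))  # a half-normal draw

# Prior level: each group mean is drawn from the shared distribution
mu = rng.normal(loc=mu_mu, scale=mu_sd, size=n_groups)

# Likelihood level: observations scatter around their own group mean
sigma = 1.0  # fixed here only to keep the sketch short
y = rng.normal(loc=mu[:, None], scale=sigma, size=(n_groups, n_obs_per_group))

print(y.shape)         # (4, 50)
print(y.mean(axis=1))  # one sample mean per group, each close to its `mu`
```

The group means differ from each other, but they all come from the same distribution, which is what lets a hierarchical model share information between groups.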



In [None]:
# Perform our typical data science and PyMC imports

# Import cytoolz for data manipulation
import cytoolz.curried as ctc

# Import PyMC and supporting packages
import arviz as az
import pymc as pm
import preliz as pz

# Import other "data science" packages
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# Some additional initialization (for consistency)
az.style.use('arviz-grayscale')
from cycler import cycler
default_cycler = cycler(color=["#000000", "#6a6a6a", "#bebebe", "#2a2eec"])
plt.rc('axes', prop_cycle=default_cycler)
plt.rc('figure', dpi=300)

# Set a consistent random seed
rng = np.random.default_rng(seed=123)

## 3.2 Hierarchical shifts

Proteins are molecules built from 20 kinds of units called amino acids.
Each amino acid can appear in a protein zero or more times.

One way to study proteins is nuclear magnetic resonance. This technique
allows us to measure different quantities such as the chemical shift.

Suppose we want to compare a theoretical method of computing chemical
shifts with experimental observations. This experiment allows us to
evaluate the theory.

Luckily, someone has already performed both the theoretical calculations
and the experiments. We just need to perform the comparison.

The data frame, stored in the variable `cs_data`, has four columns:

- A code that identifies the protein
- A second column that names the amino acid
- The third column contains the theoretical chemical shift values
- The fourth column has the experimental values


In [None]:
# Import the data of interest
cs_data = pd.read_csv('data/chemical_shifts_theo_exp.csv')
cs_data

In [None]:
# The difference (`diff`) is our measure of interest
diff = cs_data.theo - cs_data.exp

# Encode the amino acid name as categories
cat_encode = pd.Categorical(cs_data['aa']) # amino acid
idx = cat_encode.codes

# Use the categories as "coordinates"
coords = {'aa': cat_encode.categories}

In [None]:
diff
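
To make the categorical encoding above more concrete, here is a toy sketch. The labels and the per-group values are invented for illustration and are not taken from `cs_data`. It shows how `pd.Categorical` turns names into integer codes, and how those codes expand one value per group into one value per observation, which is the role `idx` plays in the models below.

```python
import numpy as np
import pandas as pd

# Toy amino acid labels (made up for illustration)
aa = pd.Series(["GLY", "ALA", "PRO", "ALA"])

cat = pd.Categorical(aa)
print(list(cat.categories))  # ['ALA', 'GLY', 'PRO']
print(cat.codes.tolist())    # [1, 0, 2, 0]

# One hypothetical parameter value per category, in category order
mu_per_group = np.array([0.1, 0.2, 0.3])

# Integer indexing expands the per-group values to one value per observation
print(mu_per_group[cat.codes].tolist())  # [0.2, 0.1, 0.3, 0.1]
```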

Now that we have the data, how should we proceed? One option: take the
empirical differences and fit a Gaussian or Student's t model. Because
amino acids are a "family," it would make sense to assume they are all
the same and estimate a single Gaussian for **all** the differences.

But one may argue: isn't each amino acid different from all the others?
Biologically, yes. Chemically, yes, although it is unclear how different.
If we treat each amino acid separately from all the others, will our
model of reality be better than a single model?

Here are some of the consequences:

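| Single model                                                      | Multiple models                                                        |
|-------------------------------------------------------------------|------------------------------------------------------------------------|
| Our estimates will be more accurate (all the data is used at once) | More detailed analysis but with less accuracy (less data per estimate) |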
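
One way to get a feel for this trade-off is to compare standard errors on synthetic data. The sketch below uses five hypothetical groups of 20 observations each, all drawn from the same distribution. These numbers are assumptions chosen for illustration, not properties of the chemical shift data.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic data: 5 hypothetical groups, 20 observations each, same true mean
n_groups, n_obs = 5, 20
data = rng.normal(loc=0.0, scale=1.0, size=(n_groups, n_obs))

# Single model: one mean estimated from all 100 observations
pooled_mean = data.mean()
pooled_se = data.std(ddof=1) / np.sqrt(data.size)

# Multiple models: one mean per group, each from only 20 observations
group_means = data.mean(axis=1)
group_se = data.std(axis=1, ddof=1) / np.sqrt(n_obs)

print(f"pooled:  {pooled_mean:.2f} +/- {pooled_se:.2f}")
for j in range(n_groups):
    print(f"group {j}: {group_means[j]:.2f} +/- {group_se[j]:.2f}")
```

The pooled estimate has a smaller standard error because it uses all the data, but it says nothing about individual groups. The per-group estimates are more detailed but noisier.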

What should we do?

When in doubt, do **everything**! We will build a hierarchical model.
This choice allows estimates at a group level with a "restriction"
that all items belong to a larger group or population.

However, to see the difference, we will actually build **two models**.

- A non-hierarchical (unpooled) model
- A hierarchical model

For reference, the unpooled model is essentially the same as our
`comparing_groups` model from [chapter 2](./ch02-prog-probabilistically.ipynb).

In [None]:
# Our non-hierarchical model
with pm.Model(coords=coords) as cs_nh: # chemical shifts non-hierarchical
    mu = pm.Normal('mu', mu=0, sigma=10, dims='aa')
    sigma = pm.HalfNormal('sigma', sigma=10, dims='aa')
    y = pm.Normal('y', mu=mu[idx], sigma=sigma[idx], observed=diff)

    idata_cs_nh = pm.sample()

Now we will build the hierarchical version of the model.

We add **two** hyperpriors:

- One for the mean of $\mu$
- One for the standard deviation of $\mu$
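
Written out, the hierarchical model might be summarized as follows. This is a sketch that assumes the same weakly informative scale of 10 used in the non-hierarchical model, writes $\sigma_\mu$ for the standard deviation of $\mu$, and uses $j[i]$ for the amino acid of observation $i$:

$$
\begin{aligned}
\mu_\mu &\sim \operatorname{Normal}(0, 10) \\
\sigma_\mu &\sim \operatorname{HalfNormal}(10) \\
\mu_j &\sim \operatorname{Normal}(\mu_\mu, \sigma_\mu) \\
\sigma_j &\sim \operatorname{HalfNormal}(10) \\
y_i &\sim \operatorname{Normal}(\mu_{j[i]}, \sigma_{j[i]})
\end{aligned}
$$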

We leave $\sigma$ **without** hyperpriors; that is, we assume that the
variance between observed and theoretical values should be unique
**for each group**. This choice is a **modelling choice**. Remember
that you may face a problem in which independent variances do not
seem reasonable. In this situation, feel free to add a hyperprior
for $\sigma$.

In [None]:
with pm.Model(coords=coords) as cs_h:
    # Hyper priors
    mu_mu = pm.Normal('mu_mu', mu=0, sigma=10)
    mu_sd = pm.HalfNormal('mu_sd', sigma=10)

    # Priors
    mu = pm.Normal('mu', mu=mu_mu, sigma=mu_sd, dims='aa')
    sigma = pm.HalfNormal('sigma', sigma=10, dims='aa')

    # Likelihood
    y = pm.Normal('y', mu=mu[idx], sigma=sigma[idx], observed=diff)
    idata_cs_h = pm.sample()

In [None]:
pm.model_to_graphviz(cs_nh)

In [None]:
pm.model_to_graphviz(cs_h)


We can compare results using the `plot_forest` function of `ArviZ`.
We can pass more than one model to this function.

Plotting multiple models is useful when we want to compare the values
of parameters from different models - like the current example.

The plot includes both the 94% HDI and the inter-quartile range. The
vertical dotted line is the global mean according to the hierarchical
model. This value is close to zero, as expected if the theoretical
values faithfully reproduce the experimental ones.
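
As a reminder of what these two intervals measure, the sketch below computes both from a set of draws using plain NumPy. The helper `hdi` is a simplified stand-in written for this example (it assumes a unimodal distribution), and the draws are synthetic. ArviZ's own implementation may differ in its details.

```python
import numpy as np

def hdi(samples, prob=0.94):
    """Return the narrowest interval containing `prob` of the samples."""
    sorted_samples = np.sort(np.asarray(samples))
    n = sorted_samples.size
    n_in = int(np.ceil(prob * n))  # number of samples inside the interval
    # Width of every candidate interval holding n_in consecutive samples
    widths = sorted_samples[n_in - 1:] - sorted_samples[: n - n_in + 1]
    start = int(np.argmin(widths))
    return sorted_samples[start], sorted_samples[start + n_in - 1]

rng = np.random.default_rng(seed=2)
draws = rng.normal(loc=0.0, scale=1.0, size=10_000)  # stand-in for posterior draws

low, high = hdi(draws)
q25, q75 = np.percentile(draws, [25, 75])  # bounds of the inter-quartile range
print(f"94% HDI: [{low:.2f}, {high:.2f}]")
print(f"IQR:     [{q25:.2f}, {q75:.2f}]")
```

The inter-quartile range always sits inside the 94% HDI here, which is why the forest plot draws it as the thicker inner segment of each line.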

In [None]:
axes = az.plot_forest(
    [idata_cs_nh, idata_cs_h],
    model_names=['non-hierarchical', 'hierarchical'],
    var_names='mu',
    combined=True,
    r_hat=False,
    ess=False,
    figsize=(10, 7),
    colors='cycle',
)
y_lims = axes[0].get_ylim()
axes[0].vlines(idata_cs_h.posterior['mu_mu'].mean(),
               *y_lims,
               color='k',
               ls=':')
plt.show()

The most relevant part of this plot is that the estimates from the
hierarchical model are pulled toward the partially pooled mean;
equivalently, they are shrunken in comparison to the unpooled estimates.
The effect is more pronounced for the groups farther away from the mean
(such as "PRO"). Additionally, the uncertainty of the hierarchical
estimates is on par with or smaller than the uncertainty from the
non-hierarchical model. The estimates are partially pooled because we
have one estimate for each group, but the estimates for individual
groups restrict each other through the hyperprior. Therefore, we get an
intermediate situation between having a single group with all the
chemical shifts together and having 20 separate groups, one per amino
acid.
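
The shrinkage effect can be seen in closed form in a simplified setting. The sketch below assumes a Gaussian model with a known within-group standard deviation and fixed hyperparameters, so the posterior mean of each group is a precision-weighted average of its own sample mean and the shared mean. All numbers are hypothetical. The PyMC model above instead estimates the hyperparameters from the data, so this shows only the mechanism, not a reproduction of its results.

```python
import numpy as np

# Hypothetical summaries for three groups (not taken from the chemical shift data)
group_means = np.array([-2.0, 0.1, 3.0])  # unpooled sample means
n_obs = np.array([5, 50, 5])              # observations per group
sigma = 2.0                               # assumed known within-group sd

# Assumed fixed hyperparameters: mean and sd of the distribution of group means
mu_mu, mu_sd = 0.0, 1.0

# Precision (inverse variance) contributed by the data and by the prior
data_precision = n_obs / sigma**2
prior_precision = 1.0 / mu_sd**2

# Posterior mean of each group: precision-weighted average of its own
# sample mean and the shared mean
weight = data_precision / (data_precision + prior_precision)
partially_pooled = weight * group_means + (1 - weight) * mu_mu

print(weight.round(3))            # about 0.556, 0.926, 0.556
print(partially_pooled.round(3))  # about -1.111, 0.093, 1.667
```

Groups with few observations, or with sample means far from the shared mean, move the most. The group with 50 observations barely moves.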