# Introduction

This is the exercise notebook, where you practice defining hierarchical models yourself. We will use the finches dataset again.

In [None]:
import pandas as pd
import pymc3 as pm
from data import load_finches_2012
from utils import despine_traceplot
import arviz as az

%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

In [None]:
df = load_finches_2012()
df.groupby('species').size()

In [None]:
df.sample(5)

In [None]:
df.groupby('species')['beak_depth'].describe()

In [None]:
fortis_filter = df['species'] == 'fortis'
scandens_filter = df['species'] == 'scandens'
unknown_filter = df['species'] == 'unknown'

**Exercise:** Define a hierarchical model for the finches' beak depths. For bonus points, use NumPy-like fancy indexing!
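
If fancy indexing is new to you, here is a minimal sketch of the idea in plain NumPy. The species labels and parameter values are made up for illustration, and the sorted label order produced by `np.unique` is an assumption; the `species_enc` column in the real dataframe may use a different encoding.

```python
import numpy as np

# Hypothetical species labels, one per bird (not the real finches data).
species = np.array(['fortis', 'scandens', 'fortis', 'unknown', 'scandens'])

# Encode each label as an integer code. np.unique sorts the labels,
# so here fortis -> 0, scandens -> 1, unknown -> 2.
labels, species_enc = np.unique(species, return_inverse=True)

# One made-up parameter value per species, in the same order as `labels`.
group_mean = np.array([9.0, 9.5, 8.0])

# Fancy indexing: look up each bird's group value in a single step.
per_bird_mean = group_mean[species_enc]

print(labels)         # fortis, scandens, unknown
print(species_enc)    # codes 0, 1, 0, 2, 1
print(per_bird_mean)  # values 9.0, 9.5, 9.0, 8.0, 9.5
```

Indexing a length-3 parameter vector with the per-bird codes yields one value per observation, which is the same pattern you can apply to a model's per-species parameters.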

If you'd like a hint, one possible model you can implement is shown below.

![](../images/darwins-finches-hierarchical-model.jpg)

In [None]:
with pm.Model() as beak_depth_model:
    # Standard deviations must be positive, so this hyperprior is constrained to > 0.
    # The same applies to the HalfCauchy scale parameters (betas).
    sd_hyper = pm.HalfCauchy('sd_hyper', beta=100)
    beta_hyper = pm.HalfCauchy('beta_hyper', beta=100)
    
    # Mean beak depth cannot be negative, so HalfNormal is
    # a reasonable, constrained prior.
    mean = pm.HalfNormal('mean', sd=sd_hyper, shape=(3,))
    sd = pm.HalfCauchy('sd', beta=beta_hyper, shape=(3,))
    nu = pm.Exponential('nu', lam=1/29.) + 1
    
    # Define the likelihood distribution for the data.
    like = pm.StudentT('likelihood', 
                       nu=nu,
                       mu=mean[df['species_enc']], 
                       sd=sd[df['species_enc']], 
                       observed=df['beak_depth'])
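
A quick aside on the `nu` prior used in the hint: an exponential distribution with rate `lam` has mean `1/lam`, so `Exponential(lam=1/29.) + 1` has a prior mean of 30 and can never drop below 1. The sketch below checks this by simulation with NumPy rather than PyMC3; note that NumPy parameterises the exponential by `scale = 1/lam`, and the seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# NumPy uses scale = 1 / lam, so lam = 1/29 becomes scale = 29.
nu_draws = rng.exponential(scale=29.0, size=100_000) + 1

# Shifting by 1 keeps every draw at or above 1.
print(nu_draws.min() >= 1)

# The sample mean should land close to 29 + 1 = 30.
print(nu_draws.mean())
```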

Sample from the posterior distribution!

In [None]:
with beak_depth_model:
    trace = pm.sample(2000, nuts_kwargs={'target_accept': 0.95})

Visualize the traceplots to check for convergence.

In [None]:
traces = az.plot_trace(trace, var_names=['mean'])
despine_traceplot(traces)
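
Eyeballing traceplots can be backed up with a number. The sketch below implements the classic (non-split) Gelman-Rubin R-hat in NumPy on simulated chains, purely to show what the statistic measures. The function name `gelman_rubin` and the simulated draws are illustrative assumptions; ArviZ's own `az.rhat` and the `r_hat` column of `az.summary` use a more robust variant, so prefer those on a real trace.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic (non-split) R-hat for draws shaped (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_draws = chains.shape[1]
    # W: average of the within-chain variances.
    within = chains.var(axis=1, ddof=1).mean()
    # B: n_draws times the variance of the chain means.
    between = n_draws * chains.mean(axis=1).var(ddof=1)
    # Pooled estimate of the posterior variance.
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return np.sqrt(pooled / within)

rng = np.random.default_rng(0)

# Four simulated chains that all sample the same distribution.
mixed = rng.normal(loc=9.0, scale=0.1, size=(4, 1000))

# The same draws, but with one chain shifted as if it were stuck elsewhere.
stuck = mixed.copy()
stuck[0] += 1.0

print(gelman_rubin(mixed))  # close to 1: chains agree
print(gelman_rubin(stuck))  # well above 1: chains disagree
```

Values near 1 indicate that the chains agree with one another; values clearly above 1 suggest the sampler has not converged.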

Visualize the posterior distributions using ArviZ's `plot_posterior` or `plot_forest` functions.

In [None]:
ax1, ax2, ax3 = az.plot_posterior(trace, var_names=['mean'])
ax1.set_title('fortis')
ax2.set_title('scandens')
ax3.set_title('unknown')

Now, repeat the model specification for beak length.

In [None]:
with pm.Model() as beak_length_model:
    # Standard deviations must be positive, so this hyperprior is constrained to > 0.
    # The same applies to the HalfCauchy scale parameters (betas).
    sd_hyper = pm.HalfCauchy('sd_hyper', beta=100)
    beta_hyper = pm.HalfCauchy('beta_hyper', beta=100)
    
    # Mean beak length cannot be negative, so HalfNormal is
    # a reasonable, constrained prior.
    mean = pm.HalfNormal('mean', sd=sd_hyper, shape=(3,))
    sd = pm.HalfCauchy('sd', beta=beta_hyper, shape=(3,))
    nu = pm.Exponential('nu', lam=1/29.) + 1
    
    # Define the likelihood distribution for the data.
    like = pm.StudentT('likelihood', 
                       nu=nu,
                       mu=mean[df['species_enc']], 
                       sd=sd[df['species_enc']], 
                       observed=df['beak_length'])

In [None]:
with beak_length_model:
    trace = pm.sample(2000, nuts_kwargs={'target_accept': 0.95})

In [None]:
traces = az.plot_trace(trace)
despine_traceplot(traces)

In [None]:
ax1, ax2, ax3 = az.plot_posterior(trace, var_names=['mean'])
ax1.set_title('fortis')
ax2.set_title('scandens')
ax3.set_title('unknown')

**Discuss:** 

- Are the estimates for the unknown species' beak depth and beak length more reasonable than you would expect without the hierarchical priors? How so?