## Week 5 Lecture 1 - Conditional Manatees

McElreath's lecture for today: https://www.youtube.com/watch?v=QhHfo6-Bx8o

McElreath's lectures for the whole book are available here: https://github.com/rmcelreath/statrethinking_winter2019

An R/Stan repo of code is available here: https://vincentarelbundock.github.io/rethinking2/

An excellent port to Python/PyMC code is available here: https://github.com/dustinstansbury/statistical-rethinking-2023

You are encouraged to work through both of these versions to reinforce what we're doing in class.

In [None]:
# Import python packages
%matplotlib inline
import pandas as pd
import numpy as np
import seaborn as sns
import scipy as sp
import scipy.stats  # ensures sp.stats is available regardless of SciPy version
import random as rd
import pymc as pm
import arviz as az
from matplotlib import pyplot as plt


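# Helper functions
def stdize(x):
    """Standardize x to mean 0 and standard deviation 1."""
    return (x-np.mean(x))/np.std(x)


def indexall(L):
    """Return the unique values of L (in order of first appearance) and an integer index for each element."""
    poo = []
    for p in L:
        if p not in poo:
            poo.append(p)
    Ix = np.array([poo.index(p) for p in L])
    return poo,Ix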
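A quick sketch of how these integer index codes behave. `indexall()` numbers labels in order of first appearance; `pd.factorize()` (used later in this notebook) does the same by default, but with `sort=True` it numbers them alphabetically instead. The labels below are made up for illustration.

```python
import pandas as pd

# Made-up labels, for illustration only
labels = pd.Series(['Non-African', 'African', 'Non-African', 'African'])

# Default: codes follow order of first appearance (the same codes indexall() gives)
codes_first, uniq_first = pd.factorize(labels)

# sort=True: codes follow the alphabetical order of the labels
codes_sorted, uniq_sorted = pd.factorize(labels, sort=True)

print(list(uniq_first), codes_first)    # first-appearance order
print(list(uniq_sorted), codes_sorted)  # alphabetical order
```

So the same label can get a different integer depending on which convention is used, which matters whenever you index a parameter vector with these codes.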

## Information-based model comparison

With the information criteria outlined last week, we can now see how to use them in practice to help diagnose problems in model specification. This is a really powerful result, so we'll take our time stepping through it.

# The Monkeys

We'll illustrate how to do this using the primate dataset:

In [None]:
mdata = pd.read_csv('Primates301.csv')
mdata['Spp'] = (mdata.genus+' '+mdata.species).values
mdata.head()

In [None]:
# Drop rows missing longevity, brain size, or body mass
mdata.dropna(subset=['longevity', 'brain', 'body'], inplace=True)
mdata.shape

In this situation we want to build a model of how body mass and brain size influence lifespan. We can represent the causal model using the `networkx` package:

In [None]:
import networkx as nx

In [None]:
# Create Monkey DAG
mDAG = nx.DiGraph()
#mDAG.add_edges_from([("M", "L"), ("M", "B"), ("B", "L"), ("U", "M"), ("U", "L")])
mDAG.add_edges_from([("M", "L"), ("M", "B"), ("B", "L")])

In [None]:
# Plot DAG
nx.draw_networkx(mDAG, arrows=True)
plt.tight_layout()
plt.savefig('primateDAG.jpg',dpi=300)

Here we're asserting that longer lifespans are caused by being bigger and having bigger brains (clever monkeys), and that body mass also influences brain size. We can fit three models to see which has the most WAIC-based support:

$$
M_{MB}: \enspace \enspace L \sim N(\beta_0+\beta_M M+\beta_B B, \sigma) \enspace \enspace
$$

$$
M_{M}: \enspace \enspace   L \sim N(\beta_0+\beta_M M, \sigma)
$$

$$
M_{B}: \enspace \enspace  L \sim N(\beta_0+\beta_B B, \sigma)
$$

Comparing these shows, piecewise, how much each predictor adds.

In [None]:
# Grab data

# Body mass
M = stdize(np.log(mdata.body.values))
# Brain size
B = stdize(np.log(mdata.brain.values))
# Lifespan
L = stdize(np.log(mdata.longevity.values))
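# Ratio of standardized log brain size to standardized log body mass
# (not a true brain/body ratio; only used below to scale point sizes)
R = B/M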

In [None]:
mdata.body.values

In [None]:
# Full monkey
with pm.Model() as M_MB:
    # Baseline intercept
    β0 = pm.Normal('Intercept', 0, 0.2)
    # Body mass effect
    β1 = pm.Normal('M', 0, 0.5)
    # Brain size
    β2 = pm.Normal('B', 0, 0.5)
    # Linear model
    #μ = pm.Deterministic('mu',β0+β1*M+β2*B)
    μ = β0+β1*M+β2*B
    # Error
    σ = pm.Uniform('SD_obs', 0, 10)
    # Likelihood
    Yi = pm.Normal('Yi', μ, σ, observed=L)

In [None]:
# Run sampler
with M_MB:
    trace_mb = pm.sample(1000, idata_kwargs={'log_likelihood':True})

In [None]:
# Big monkey
with pm.Model() as M_M:
    # Baseline intercept
    β0 = pm.Normal('Intercept', 0, 0.2)
    # Body mass effect
    β1 = pm.Normal('M', 0, 0.5)
    # Linear model
    #μ = pm.Deterministic('mu',β0+β1*M)
    μ = β0+β1*M
    # Error
    σ = pm.Uniform('SD_obs', 0, 10)
    # Likelihood
    Yi = pm.Normal('Yi', μ, σ, observed=L)

In [None]:
# Run sampler
with M_M:
    trace_m = pm.sample(1000, idata_kwargs={'log_likelihood':True})

In [None]:
# Smart monkey
with pm.Model() as M_B:
    # Baseline intercept
    β0 = pm.Normal('Intercept', 0, 0.2)
    # Brain size
    β1 = pm.Normal('B', 0, 0.5)
    # Linear model
    #μ = pm.Deterministic('mu',β0+β1*B)
    μ = β0+β1*B
    # Error
    σ = pm.Uniform('SD_obs', 0, 10)
    # Likelihood
    Yi = pm.Normal('Yi', μ, σ, observed=L)

In [None]:
# Run sampler
with M_B:
    trace_b = pm.sample(1000, idata_kwargs={'log_likelihood':True})

In [None]:
WAIC_compare = az.compare({'Full' : trace_mb,'Big' : trace_m,'Smart' : trace_b}, ic='waic', method='pseudo-BMA', scale='deviance')
WAIC_compare

This table gives lots of useful information for comparison, including the relative WAIC-based model weight. The weight can be interpreted as the WAIC-based probability that each model is the best model (lowest KL divergence) in the set of models compared. It is calculated as

$$
weight_{i} = \frac{\exp(-\frac{1}{2}\Delta_{i})}{\sum_{k=1}^{K}\exp(-\frac{1}{2}\Delta_{k})}
$$

where $\Delta_{i}$ is the `elpd_diff` column above: the difference in WAIC (on the deviance scale) between each model and the model with the lowest WAIC. From the table above we can use the `elpd_diff` values to do the calculation:

In [None]:
# Note the factor of 1/2 from the weight formula
(np.exp(-0.5*WAIC_compare.elpd_diff)/sum(np.exp(-0.5*WAIC_compare.elpd_diff))).round(2)

With the factor of $\frac{1}{2}$ included, these should closely match the `weight` column that ArviZ reports for `method='pseudo-BMA'` (up to rounding).
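The weight formula can also be written as a small function. This is a minimal sketch, not ArviZ's implementation: `pseudo_bma_weights` is a hypothetical helper that takes raw WAIC values on the deviance scale (lower is better) and subtracts the minimum itself, which makes it explicit that only the differences $\Delta_i$ matter.

```python
import numpy as np

def pseudo_bma_weights(waic_deviance):
    """Model weights from WAIC values on the deviance scale (lower is better)."""
    waic = np.asarray(waic_deviance, dtype=float)
    delta = waic - waic.min()      # Delta_i: difference from the best (lowest-WAIC) model
    rel = np.exp(-0.5 * delta)     # relative support for each model
    return rel / rel.sum()         # normalise so the weights sum to 1

# Made-up WAIC values for three models
w = pseudo_bma_weights([100.0, 102.0, 110.0])
print(w.round(3))
```

Subtracting the minimum before exponentiating also keeps the calculation numerically stable when the WAIC values themselves are large.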

In [None]:
az.plot_compare(WAIC_compare)
plt.savefig('waiccompare.jpg',dpi=300);

So if we look at the WAIC results, we get evidence that the full model (`M_MB`) and the smart model (`M_B`) have roughly equal support, while the body mass model (`M_M`) is far worse. What's going on? To figure it out, let's look at the posteriors from all three models:

In [None]:
az.plot_forest([trace_mb, trace_m, trace_b], model_names=['Full','Big','Smart'], figsize=(10, 5))
plt.axvline(0)
plt.savefig('forestape.jpg',dpi=300);

What you can see is that the effect of body mass, which is so strong in model `M_M` ('Big'), becomes negative in the full model (`M_MB`), while the effect of brain size stays positive in the full model, as it was in model `M_B` ('Smart'), although it is more uncertain. So why does body mass go negative in the joint model? To figure that out, we can compare the pointwise WAIC values of the Smart model and the full model:

In [None]:
# Full model pointwise WAIC estimates
pWAIC_mb = az.waic(trace_mb, scale='deviance', pointwise=True)
# Brain size model pointwise WAIC estimates
pWAIC_b = az.waic(trace_b, scale='deviance', pointwise=True)
# Difference between them (negative values: the full model fits that species better)
dWAIC = pWAIC_mb.waic_i.values-pWAIC_b.waic_i.values
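For reference, here is roughly what each `waic_i` value contains. This is a minimal sketch assuming a log-likelihood matrix `log_lik` of shape (posterior samples, observations); `pointwise_waic` is a hypothetical helper, and it uses the sample variance (`ddof=1`) for the penalty term, which may differ slightly from ArviZ's own convention.

```python
import numpy as np
from scipy.special import logsumexp

def pointwise_waic(log_lik):
    """Pointwise WAIC on the deviance scale.

    log_lik: array of shape (n_samples, n_obs) holding log p(y_i | theta_s).
    """
    n_samples = log_lik.shape[0]
    # log pointwise predictive density: log of the likelihood averaged over samples
    lppd_i = logsumexp(log_lik, axis=0) - np.log(n_samples)
    # effective-parameter penalty: variance of the log-likelihood across samples
    p_waic_i = np.var(log_lik, axis=0, ddof=1)
    return -2 * (lppd_i - p_waic_i)

# Toy check: identical samples give zero penalty, so WAIC_i = -2 * log-likelihood
toy = np.tile([-1.0, -2.0], (4, 1))
print(pointwise_waic(toy))
```

With the models above, something like `trace_mb.log_likelihood['Yi'].stack(sample=('chain', 'draw')).values.T` should give a matrix with that shape.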

In [None]:
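# Copy so R itself is not modified; replace negative values with the
# smallest positive value so every point gets a valid marker size
Rplot = R.copy()
Rplot[Rplot<0] = min(Rplot[Rplot>0])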

In [None]:
# Set figure size
plt.figure(figsize=(10, 5))
# Select species to label
sppx = ['Cebus albifrons','Cebus capucinus', 'Cebus olivaceus',
       'Gorilla gorilla', 'Lepilemur leucopus', 'Cacajao melanocephalus']
sindx = [list(mdata.Spp.values).index(s) for s in sppx]

plt.scatter(dWAIC, L, s=50+Rplot*30, facecolor='blue', alpha=0.3, edgecolor='black')
for l, d, s in zip(L[sindx], dWAIC[sindx], mdata.Spp.values[sindx]):
    plt.text(d, l, s)
plt.text(-0.01,-2.9,'<-- Full model better',horizontalalignment='right', weight='bold')
plt.text(0.01,-2.9,'Smart model better -->', weight='bold')
plt.xlabel('Pointwise difference in WAIC', fontsize=17)
plt.ylabel('log(longevity) (std)', fontsize=17)
plt.axvline(0,linestyle=':',c='black')
plt.axhline(0,linestyle=':',c='black')
plt.savefig('waicprime.jpg',dpi=300);
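Why can a predictor's sign flip once a correlated predictor is added? A minimal simulation, assuming a made-up system like the DAG above in which body mass M drives brain size B, and lifespan L increases with B but has a direct negative effect of M. The effect sizes and the least-squares helper `ols` are hypothetical, chosen only to illustrate the pattern, and plain least squares stands in for the Bayesian models.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated data (all effect sizes are made up)
M = rng.normal(size=n)                                  # body mass
B = M + rng.normal(scale=0.3, size=n)                   # brain size tracks body mass
L = 1.0 * B - 0.5 * M + rng.normal(scale=0.3, size=n)   # lifespan

def ols(predictors, y):
    """Least-squares coefficients [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0]

slope_M_alone = ols([M], L)[1]     # M only: also picks up its path through B
slopes_joint = ols([M, B], L)[1:]  # M and B together: direct effects
print(slope_M_alone, slopes_joint)
```

On its own, M looks beneficial because it carries B along with it; once B is in the model, the direct negative effect of M shows up. This mirrors the Big vs Full posteriors qualitatively, but it does not show that this is the mechanism in the primate data.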

# Conditionality

A classic example of conditioning on a collider comes from thinking about manatees:


<img src="manatees.jpg" width="600">

Because scars were observed on so many manatees, and so many were dying from boat strikes, Florida passed laws requiring cowlings over the propellers of speedboats to save manatees. However, the problem with this (other than reducing nasty scars) is that it doesn't deal with the cause of death in manatees, which is idiots running them over in shallow water and causing massive internal trauma. In other words, the propeller scars are seen on the manatees that survived, so propeller strikes are not what is killing them.


A second favourite example comes from WWII, where [Abraham Wald](https://en.wikipedia.org/wiki/Abraham_Wald) was tasked with figuring out where to place the limited armour that could be added to bombers and other Allied aircraft.


<img src="mkV.jpg" width="600">

In [a series of memos](http://people.ucsc.edu/~msmangel/Wald.pdf), Wald figured out that rather than putting armour on the areas of returning planes that had lots of bullet holes, the armour should go on the areas that did not: planes hit in those areas were the ones that never made it back.


<img src="holes.png" width="600">

As with the manatees, conditioning on survivors obscures the cause of death.
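Wald's reasoning can be illustrated with a tiny simulation. The assumptions are all made up: each plane takes exactly one hit, equally likely in the engine or the fuselage, and planes hit in the engine are much less likely to return. Looking only at planes that returned makes engine hits look rare, even though they are half of all hits.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Each plane is hit once, in a randomly chosen zone
hit = rng.choice(np.array(["engine", "fuselage"]), size=n)

# Hypothetical chance of returning home, depending on where the plane was hit
p_return = np.where(hit == "engine", 0.2, 0.9)
returned = rng.random(n) < p_return

frac_engine_all = np.mean(hit == "engine")                 # among all planes
frac_engine_returned = np.mean(hit[returned] == "engine")  # among survivors only
print(round(frac_engine_all, 2), round(frac_engine_returned, 2))
```

Survival acts as the collider here: it depends on where the plane was hit, so selecting on it distorts the apparent distribution of hits.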


# Interactions

Interactions let the effect of one predictor depend on the value of another predictor in the model. An interaction can change the intercept, the slope, or both.


Let's start with an interaction between a discrete variable and a continuous one. As an example, let's look at the ruggedness of African nations:

In [None]:
adata = pd.read_csv('rugged.csv', sep=";")
adata.dropna(subset=['rgdppc_2000'], inplace=True)
adata['Continent'] = np.array(['Non-African','African'])[adata.cont_africa.values]
adata.head()

In [None]:
adata.columns.values

If we look at the economic output of each country (measured by GDP per capita in 2000, which is problematic, but anyhow...), the relationship between terrain ruggedness and GDP is positive among African nations but negative everywhere else:

In [None]:
# GDP ratio relative to mean
GDP = np.log(adata.rgdppc_2000.values)/np.mean(np.log(adata.rgdppc_2000.values))
# Ruggedness
RUG = (adata.rugged/max(adata.rugged)).values
meanRUG = np.mean(RUG)
# Continent index - like indexall(adata['Continent']), but with labels sorted
# alphabetically: 0 = 'African', 1 = 'Non-African'
Ia, Continent = pd.factorize(adata['Continent'], sort=True)

In [None]:
Continent

In [None]:
# Grab function to plot a line
from numpy.polynomial.polynomial import polyfit

In [None]:
_, ax = plt.subplots(1,2, figsize=(10,4))

# New data
xnew = np.linspace(0,1,30)
# Fit with polyfit
b0, b1 = polyfit(RUG[Ia==0], GDP[Ia==0], 1)

ax[0].scatter(RUG[Ia==0],GDP[Ia==0])
ax[0].plot(xnew,b0+b1*xnew)
ax[0].set_ylim(0.7, 1.35)
ax[0].set_title('African nations', fontsize=16)
ax[0].set_ylabel('log(GDP) (ratio to mean)', fontsize=16)
ax[0].set_xlabel('Ruggedness (ratio)', fontsize=16)


# Fit with polyfit
b0, b1 = polyfit(RUG[Ia==1], GDP[Ia==1], 1)
ax[1].scatter(RUG[Ia==1],GDP[Ia==1])
ax[1].plot(xnew,b0+b1*xnew)
ax[1].set_ylim(0.7, 1.35)
ax[1].set_title('Non-African nations', fontsize=16)
ax[1].set_xlabel('Ruggedness (ratio)', fontsize=16)
plt.savefig('rugged.jpg',dpi=300);

So to put together an interaction model we can start by guessing at some relatively benign priors and adding them to a model that allows for different intercepts and slopes for African/non-African nations:

In [None]:
# Ruggedness model
with pm.Model(coords = {"Cont": Continent}) as rugged:
    # African/non intercepts
    β0 = pm.Normal('Intercept', 1, 1, dims='Cont')
    # Ruggedness effect
    β1 = pm.Normal('Ruggedness', 0, 1, dims='Cont')
    # Linear model (continent-specific intercept and slope)
    μ = β0[Ia]+β1[Ia]*RUG
    # Error
    σ = pm.Exponential('SD_obs', 1)
    # Likelihood
    Yi = pm.Normal('Yi', μ, σ, observed=GDP)
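The indexing `β0[Ia]` and `β1[Ia]` in this model is one way to write an interaction: each continent gets its own intercept and its own slope. A minimal sketch with made-up numbers shows that it gives the same linear predictor as the more traditional dummy-variable form with an explicit product term.

```python
import numpy as np

# Made-up intercepts and slopes for groups 0 and 1
b0 = np.array([0.9, 1.05])
b1 = np.array([0.15, -0.15])

x = np.linspace(0, 1, 5)          # e.g. ruggedness
g = np.array([0, 1, 0, 1, 1])     # group index for each observation

# Index-variable form, as in the model above
mu_index = b0[g] + b1[g] * x

# Dummy-variable form: D = 1 for group 1, plus a D * x interaction term
D = (g == 1).astype(float)
mu_dummy = b0[0] + (b0[1] - b0[0]) * D + b1[0] * x + (b1[1] - b1[0]) * D * x

print(np.allclose(mu_index, mu_dummy))
```

The index form is usually easier to put priors on, since each continent's intercept and slope get the same prior rather than one group acting as a baseline.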

In [None]:
# Sample from prior predictive distribution
ppd_ = pm.sample_prior_predictive(200, model=rugged)

In [None]:
_, ax = plt.subplots(1,2, figsize=(10,4))

# New data
xnew = np.linspace(0,1,30)

ax[0].scatter(RUG[Ia==0],GDP[Ia==0])

# Sample from prior predictive (index 0 = African)
[ax[0].plot(xnew,b0+b1*xnew, alpha=0.3, c='black') for b0,b1 in zip(ppd_.prior['Intercept'][0].values.T[0], ppd_.prior['Ruggedness'][0].values.T[0])]

ax[0].set_ylim(0.7, 1.35)
ax[0].set_title('African nations', fontsize=16)
ax[0].set_ylabel('log(GDP) (ratio to mean)', fontsize=16)
ax[0].set_xlabel('Ruggedness (ratio)', fontsize=16)


ax[1].scatter(RUG[Ia==1],GDP[Ia==1])
# Sample from prior predictive (index 1 = Non-African)
[ax[1].plot(xnew,b0+b1*xnew, alpha=0.3, c='black') for b0,b1 in zip(ppd_.prior['Intercept'][0].values.T[1], ppd_.prior['Ruggedness'][0].values.T[1])]
ax[1].set_ylim(0.7, 1.35)
ax[1].set_title('Non-African nations', fontsize=16)
ax[1].set_xlabel('Ruggedness (ratio)', fontsize=16)
plt.savefig('rugged_ppd.jpg',dpi=300);

Well, this is a mess. Let's try something tighter:

In [None]:
# Ruggedness model
with pm.Model(coords = {"Continent": Continent}) as rugged:
    # African/non intercepts
    β0 = pm.Normal('Intercept', 1, .1, dims='Continent')
    # Ruggedness effect
    β1 = pm.Normal('Ruggedness', 0, .1, dims='Continent')
    # Linear model (continent-specific intercept and slope)
    μ = β0[Ia]+β1[Ia]*RUG
    # Error
    σ = pm.Exponential('SD_obs', 1)
    # Likelihood
    Yi = pm.Normal('Yi', μ, σ, observed=GDP)

In [None]:
# Sample from prior predictive distribution
ppd_ = pm.sample_prior_predictive(200,model=rugged)

In [None]:
_, ax = plt.subplots(1,2, figsize=(10,4))

# New data
xnew = np.linspace(0,1,30)

ax[0].scatter(RUG[Ia==0],GDP[Ia==0])

# Sample from prior predictive (index 0 = African)
[ax[0].plot(xnew,b0+b1*xnew, alpha=0.3, c='black') for b0,b1 in zip(ppd_.prior['Intercept'][0].values.T[0], ppd_.prior['Ruggedness'][0].values.T[0])]

ax[0].set_ylim(0.7, 1.35)
ax[0].set_title('African nations', fontsize=16)
ax[0].set_ylabel('log(GDP) (ratio to mean)', fontsize=16)
ax[0].set_xlabel('Ruggedness (ratio)', fontsize=16)


ax[1].scatter(RUG[Ia==1],GDP[Ia==1])
# Sample from prior predictive (index 1 = Non-African)
[ax[1].plot(xnew,b0+b1*xnew, alpha=0.3, c='black') for b0,b1 in zip(ppd_.prior['Intercept'][0].values.T[1], ppd_.prior['Ruggedness'][0].values.T[1])]
ax[1].set_ylim(0.7, 1.35)
ax[1].set_title('Non-African nations', fontsize=16)
ax[1].set_xlabel('Ruggedness (ratio)', fontsize=16)
plt.savefig('rugged_ppd2.jpg',dpi=300);

Hmm, not bad, but maybe a little too tight (it misses stuff at the bottom). One more try:

In [None]:
# Ruggedness model
with pm.Model(coords = {"Continent": Continent}) as rugged:
    # African/non intercepts
    β0 = pm.Normal('Intercept', 1, 0.1, dims='Continent')
    # Ruggedness effect
    β1 = pm.Normal('Ruggedness', 0, 0.3, dims='Continent')
    # Linear model
    μ = β0[Ia]+β1[Ia]*RUG
    # Error
    σ = pm.Exponential('SD_obs', 1)
    # Likelihood
    Yi = pm.Normal('Yi', μ, σ, observed=GDP)

In [None]:
# Sample from prior predictive distribution
ppd_ = pm.sample_prior_predictive(200,model=rugged)

In [None]:
_, ax = plt.subplots(1,2, figsize=(10,4))

# New data
xnew = np.linspace(0,1,30)

ax[0].scatter(RUG[Ia==0],GDP[Ia==0])

# Sample from prior predictive (index 0 = African)
[ax[0].plot(xnew,b0+b1*xnew, alpha=0.3, c='black') for b0,b1 in zip(ppd_.prior['Intercept'][0].values.T[0], ppd_.prior['Ruggedness'][0].values.T[0])]

ax[0].set_ylim(0.7, 1.35)
ax[0].set_title('African nations', fontsize=16)
ax[0].set_ylabel('log(GDP) (ratio to mean)', fontsize=16)
ax[0].set_xlabel('Ruggedness (ratio)', fontsize=16)


ax[1].scatter(RUG[Ia==1],GDP[Ia==1])
# Sample from prior predictive (index 1 = Non-African)
[ax[1].plot(xnew,b0+b1*xnew, alpha=0.3, c='black') for b0,b1 in zip(ppd_.prior['Intercept'][0].values.T[1], ppd_.prior['Ruggedness'][0].values.T[1])]
ax[1].set_ylim(0.7, 1.35)
ax[1].set_title('Non-African nations', fontsize=16)
ax[1].set_xlabel('Ruggedness (ratio)', fontsize=16)
plt.savefig('rugged_ppd3.jpg',dpi=300);
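The three slope priors tried above (standard deviations 1, 0.1 and 0.3) can also be compared numerically. A minimal sketch, assuming (hypothetically) that a slope steeper than about ±0.6 is implausible, since that is roughly the full height of the plotted GDP axis across the whole ruggedness range. The cut-off is a judgement call, not something estimated from the data.

```python
from scipy.stats import norm

max_plausible_slope = 0.6   # hypothetical cut-off, see above

# Prior probability that |slope| exceeds the cut-off under Normal(0, sd)
p_extreme = {sd: 2 * norm.sf(max_plausible_slope, loc=0, scale=sd)
             for sd in [1.0, 0.1, 0.3]}

for sd, p in p_extreme.items():
    print(f"Normal(0, {sd}): P(|slope| > {max_plausible_slope}) = {p:.3f}")
```

Roughly half the lines under Normal(0, 1) are implausibly steep, essentially none under Normal(0, 0.1), and a few percent under Normal(0, 0.3), which matches the visual impression from the three plots.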

OK, this seems to strike the balance of being just a bit wider than is sensible. So let's put some data in and see what we get:

In [None]:
with rugged:
    trace_r = pm.sample(1000)

In [None]:
az.plot_posterior(trace_r)
plt.savefig('rugged_posterior.jpg',dpi=300);

In [None]:
_, ax = plt.subplots(1,2, figsize=(10,4))

# New data
xnew = np.linspace(0,1,30)

ax[0].scatter(RUG[Ia==0],GDP[Ia==0])

# Sample lines from the posterior (index 0 = African)
[ax[0].plot(xnew,b0+b1*xnew, alpha=0.01, c='black') for b0,b1 in zip(trace_r.posterior['Intercept'][0].values.T[0], trace_r.posterior['Ruggedness'][0].values.T[0])]

ax[0].set_ylim(0.7, 1.35)
ax[0].set_title('African nations', fontsize=16)
ax[0].set_ylabel('log(GDP) (ratio to mean)', fontsize=16)
ax[0].set_xlabel('Ruggedness (ratio)', fontsize=16)


ax[1].scatter(RUG[Ia==1],GDP[Ia==1])
# Sample lines from the posterior (index 1 = Non-African)
[ax[1].plot(xnew,b0+b1*xnew, alpha=0.01, c='black') for b0,b1 in zip(trace_r.posterior['Intercept'][0].values.T[1], trace_r.posterior['Ruggedness'][0].values.T[1])]
ax[1].set_ylim(0.7, 1.35)
ax[1].set_title('Non-African nations', fontsize=16)
ax[1].set_xlabel('Ruggedness (ratio)', fontsize=16)
plt.savefig('rugged_fits.jpg',dpi=300);

In [None]:
# Net difference in ruggedness slope: African minus Non-African (index 0 = African)
Net_rug = trace_r.posterior['Ruggedness'][0].values.T[0]-trace_r.posterior['Ruggedness'][0].values.T[1]

In [None]:
plt.hist(Net_rug)
plt.xlabel('Africa θ - Non-Africa θ')
plt.savefig('rugged_net.jpg',dpi=300);

In [None]:
# Posterior probability that the African ruggedness slope is larger
np.mean(Net_rug > 0)

So this says a few things:

1. In flat places, African nations have lower GDP than non-African nations (intercepts of roughly 0.86 vs 1.1 on the log GDP ratio-to-mean scale)
2. As ruggedness increases, African nations become wealthier while non-African nations become poorer
3. There is strong evidence that the ruggedness slope is more positive in African nations

Point (2) bears thinking about. Why does this happen? Well, it has a very dark answer that you'll explore in the homework this week.

An important point about interactions is that they are ALWAYS difficult to interpret. One of the most basic problems is that they are statistically, but not cognitively, symmetric:

- *The effect of ruggedness on a nation's GDP depends on what continent it is in*
- *The effect of continent depends on ruggedness*

If you read these carefully, one makes complete sense and the other sounds like nonsense. That is your causal brain recognizing that you can't move nations between continents.
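The statistical symmetry can be seen directly in the algebra. For the continent model above, the difference in expected (scaled) log GDP between an African (A) and a non-African (N) nation at ruggedness $R$ is

$$
\mu_{A}(R) - \mu_{N}(R) = (\beta_{0,A} - \beta_{0,N}) + (\beta_{1,A} - \beta_{1,N})\,R
$$

so the "effect of continent" changes with ruggedness by exactly the amount that the ruggedness slopes differ between continents. Likewise, for a continuous interaction $\mu = \beta_0 + \beta_1 W + \beta_2 S + \beta_3 W S$ (the form used for the tulips below),

$$
\frac{\partial \mu}{\partial W} = \beta_1 + \beta_3 S, \qquad \frac{\partial \mu}{\partial S} = \beta_2 + \beta_3 W
$$

The same coefficient $\beta_3$ appears in both, which is exactly the statistical symmetry; only our causal reading of it is asymmetric.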

# Bloomin' tulips

For a more comprehensive example, let's look at a continuous interaction: how combinations of water and shade affect tulip blooming.

First, let's import the data:

In [None]:
bdata = pd.read_csv('tulips.csv',sep=';')
bdata.head()

And knowing our flower husbandry, we can readily imagine that the quantities of water and light, as well as their combination, influence the blooming of flowers. First we rescale the variables:

In [None]:
# Response scaled to proportion of max
B = bdata.blooms.values/max(bdata.blooms.values)
# Water covariate - mean centred
W = bdata.water.values-np.mean(bdata.water.values)
# Shade covariate - mean centred
S = bdata.shade.values-np.mean(bdata.shade.values)

Next we need some priors. What would be sensible? Now that blooms (B) has been scaled to lie between 0 and 1, the intercept should mostly fall in this range. As a first guess, the intercept should be roughly 0.5 when water (W) and shade (S) are at their mean values. For the standard deviation, we can then check the cumulative probability at 0 and 1 under a $N(0.5, \sigma)$ distribution, finding $\sigma$ by trial and error:

In [None]:
# Start with Normal(0.5, 1) at zero
sp.stats.norm.cdf(0, 0.5, 1)

Well, this is too wide: more than 30% of the prior would be below zero. Maybe we should aim for the conventional 5% outside the range (2.5% below zero and the other 2.5% above 1). How about $N(0.5, 0.25)$?

In [None]:
# Normal(0.5, 0.25) at zero
sp.stats.norm.cdf(0, 0.5, 0.25)

In [None]:
# Normal(0.5, 0.25) at one
1-sp.stats.norm.cdf(1, 0.5, 0.25)

Looking good. OK, on to the slopes. What's reasonable here? Despite knowing there's likely an interaction effect, our ignorant, slightly wide prior should allow either water or shade to account for the full 0 to 1 range in blooms. To do this, we can check how wide the range is for each (using the funny `np.ptp()` function in `numpy`):

In [None]:
# Peak to peak range, the name of which comes from min/max calculations of waveforms
np.ptp(W),np.ptp(S)

So over a range of 2 units, the prior needs to be able to span the full range from 0 to 1. This implies a slope of -0.5 or 0.5 (i.e. $2 \times 0.5 = 1$). Setting $2\,SD = 0.5$ gives a sensible prior standard deviation of 0.25:

In [None]:
# Normal(0, 0.25) at -0.5
sp.stats.norm.cdf(-0.5, 0, 0.25)

In [None]:
# Normal(0, 0.25) at 0.5
1-sp.stats.norm.cdf(0.5, 0, 0.25)
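Rather than guessing, the standard deviation can be solved for directly: for a Normal prior to put 95% of its mass within $\pm h$ of its centre, $\sigma = h / z$, where $z \approx 1.96$ is the 97.5% quantile of the standard Normal. A minimal sketch; `sd_for_interval` is a hypothetical helper, not part of any library.

```python
from scipy.stats import norm

def sd_for_interval(half_width, coverage=0.95):
    """SD of a Normal whose central `coverage` interval is centre +/- half_width."""
    z = norm.ppf(0.5 + coverage / 2)   # about 1.96 for 95% coverage
    return half_width / z

sd_intercept = sd_for_interval(0.5)  # intercept: 0.5 +/- 0.5 covers blooms from 0 to 1
sd_slope = sd_for_interval(0.5)      # slope: 0 +/- 0.5 over the 2-unit range of W and S
print(round(sd_intercept, 3), round(sd_slope, 3))
```

Both come out at about 0.255, so the value of 0.25 found by trial and error is essentially the same answer.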

Perfect, now putting this all together:

In [None]:
# Basic bloom model
with pm.Model() as bloom:
    # Intercept (expected blooms at mean water and shade)
    β0 = pm.Normal('Intercept', 0.5, 0.25)
    # Water effect
    β1 = pm.Normal('Water', 0, 0.25)
    # Shade effect
    β2 = pm.Normal('Shade', 0, 0.25)
    # Linear model
    μ = β0+β1*W+β2*S
    # Error
    σ = pm.Exponential('SD_obs', 1)
    # Likelihood
    Yi = pm.Normal('Yi', μ, σ, observed=B)

In [None]:
# Interaction bloom model
with pm.Model() as ibloom:
    # Intercept (expected blooms at mean water and shade)
    β0 = pm.Normal('Intercept', 0.5, 0.25)
    # Water effect
    β1 = pm.Normal('Water', 0, 0.25)
    # Shade effect
    β2 = pm.Normal('Shade', 0, 0.25)
    # Interaction effect
    β3 = pm.Normal('ShadeWater', 0, 0.25)
    # Linear model
    μ = β0+β1*W+β2*S+β3*S*W
    # Error
    σ = pm.Exponential('SD_obs', 1)
    # Likelihood
    Yi = pm.Normal('Yi', μ, σ, observed=B)

In [None]:
# Sample from prior predictive distributions
ppd_ib = pm.sample_prior_predictive(200,model=ibloom)

In [None]:
ppd_ib.prior['Water'][0].shape

In [None]:
_, ax = plt.subplots(1,3, figsize=(15,4))

# New data
xnew = np.linspace(-1,1,30)

# Sample from prior predictive
ix = 0
sx = -1
[ax[ix].plot(xnew,b0+b1*xnew+b2*sx+b3*xnew*sx, alpha=0.3, c='black') for b0,b1,b2,b3 in zip(ppd_ib.prior['Intercept'][0].values, ppd_ib.prior['Water'][0].values, ppd_ib.prior['Shade'][0].values, ppd_ib.prior['ShadeWater'][0].values)]
ax[ix].set_ylim(0., 1.)
ax[ix].set_title('Shade='+str(sx), fontsize=16)
ax[ix].set_xlabel('Water', fontsize=16)
ax[ix].set_ylabel('Blooms (scaled)', fontsize=16)


# Sample from prior predictive
ix = 1
sx = 0
[ax[ix].plot(xnew,b0+b1*xnew+b2*sx+b3*xnew*sx, alpha=0.3, c='black') for b0,b1,b2,b3 in zip(ppd_ib.prior['Intercept'][0].values, ppd_ib.prior['Water'][0].values, ppd_ib.prior['Shade'][0].values, ppd_ib.prior['ShadeWater'][0].values)]
ax[ix].set_ylim(0., 1.)
ax[ix].set_title('Shade='+str(sx), fontsize=16)
ax[ix].set_xlabel('Water', fontsize=16)
ax[ix].set_ylabel('Blooms (scaled)', fontsize=16)

# Sample from prior predictive
ix = 2
sx = 1
[ax[ix].plot(xnew,b0+b1*xnew+b2*sx+b3*xnew*sx, alpha=0.3, c='black') for b0,b1,b2,b3 in zip(ppd_ib.prior['Intercept'][0].values, ppd_ib.prior['Water'][0].values, ppd_ib.prior['Shade'][0].values, ppd_ib.prior['ShadeWater'][0].values)]
ax[ix].set_ylim(0., 1.)
ax[ix].set_title('Shade='+str(sx), fontsize=16)
ax[ix].set_xlabel('Water', fontsize=16)
ax[ix].set_ylabel('Blooms (scaled)', fontsize=16);

Well these look like shit, maybe something a bit smaller?

In [None]:
# Interaction bloom model
with pm.Model() as ibloom:
    # Intercept (expected blooms at mean water and shade)
    β0 = pm.Normal('Intercept', 0.5, 0.25)
    # Water effect
    β1 = pm.Normal('Water', 0, 0.1)
    # Shade effect
    β2 = pm.Normal('Shade', 0, 0.1)
    # Interaction effect
    β3 = pm.Normal('ShadeWater', 0, 0.1)
    # Linear model
    μ = β0+β1*W+β2*S+β3*S*W
    # Error
    σ = pm.Exponential('SD_obs', 1)
    # Likelihood
    Yi = pm.Normal('Yi', μ, σ, observed=B)

In [None]:
# Sample from prior predictive distributions
ppd_ib = pm.sample_prior_predictive(200,model=ibloom)

In [None]:
_, ax = plt.subplots(1,3, figsize=(15,4))

# New data
xnew = np.linspace(-1,1,30)

# Sample from prior predictive
ix = 0
sx = -1
[ax[ix].plot(xnew,b0+b1*xnew+b2*sx+b3*xnew*sx, alpha=0.3, c='black') for b0,b1,b2,b3 in zip(ppd_ib.prior['Intercept'][0].values, ppd_ib.prior['Water'][0].values, ppd_ib.prior['Shade'][0].values, ppd_ib.prior['ShadeWater'][0].values)]
ax[ix].set_ylim(0., 1.)
ax[ix].set_title('Shade='+str(sx), fontsize=16)
ax[ix].set_xlabel('Water', fontsize=16)
ax[ix].set_ylabel('Blooms (scaled)', fontsize=16)


# Sample from prior predictive
ix = 1
sx = 0
[ax[ix].plot(xnew,b0+b1*xnew+b2*sx+b3*xnew*sx, alpha=0.3, c='black') for b0,b1,b2,b3 in zip(ppd_ib.prior['Intercept'][0].values, ppd_ib.prior['Water'][0].values, ppd_ib.prior['Shade'][0].values, ppd_ib.prior['ShadeWater'][0].values)]
ax[ix].set_ylim(0., 1.)
ax[ix].set_title('Shade='+str(sx), fontsize=16)
ax[ix].set_xlabel('Water', fontsize=16)
ax[ix].set_ylabel('Blooms (scaled)', fontsize=16)

# Sample from prior predictive
ix = 2
sx = 1
[ax[ix].plot(xnew,b0+b1*xnew+b2*sx+b3*xnew*sx, alpha=0.3, c='black') for b0,b1,b2,b3 in zip(ppd_ib.prior['Intercept'][0].values, ppd_ib.prior['Water'][0].values, ppd_ib.prior['Shade'][0].values, ppd_ib.prior['ShadeWater'][0].values)]
ax[ix].set_ylim(0., 1.)
ax[ix].set_title('Shade='+str(sx), fontsize=16)
ax[ix].set_xlabel('Water', fontsize=16)
ax[ix].set_ylabel('Blooms (scaled)', fontsize=16);
plt.savefig('tulip_ppd.jpg',dpi=300);

These seem a bit better, so let's run with them:

In [None]:
with bloom:
    trace_b = pm.sample(1000, idata_kwargs={'log_likelihood':True})
with ibloom:
    trace_ib = pm.sample(1000, idata_kwargs={'log_likelihood':True})

In [None]:
pm.summary(trace_b)

In [None]:
pm.summary(trace_ib)

In [None]:
_, ax = plt.subplots(2,3, figsize=(15,10))

# New data
xnew = np.linspace(-1,1,30)
# Number of samples to take
ns = 20
# Grab ns random samples
Ix = np.random.randint(0, len(trace_ib.posterior['Intercept'][0].values), ns)

# Posterior lines: basic model
ix = 0
iy = 0
sx = -1

[ax[ix,iy].plot(xnew,b0+b1*xnew+b2*sx, alpha=0.3, c='black') for b0,b1,b2 in zip(trace_b.posterior['Intercept'][0].values[Ix], trace_b.posterior['Water'][0].values[Ix], trace_b.posterior['Shade'][0].values[Ix])]
ax[ix,iy].set_ylim(0., 1.)
ax[ix,iy].set_title('Basic model: Shade='+str(sx), fontsize=16)
ax[ix,iy].set_xlabel('Water', fontsize=16)
ax[ix,iy].set_ylabel('Blooms (scaled)', fontsize=16)
ax[ix,iy].scatter(W[S==sx],B[S==sx])

# Posterior lines: basic model
iy = 1
sx = 0
[ax[ix,iy].plot(xnew,b0+b1*xnew+b2*sx, alpha=0.3, c='black') for b0,b1,b2 in zip(trace_b.posterior['Intercept'][0].values[Ix], trace_b.posterior['Water'][0].values[Ix], trace_b.posterior['Shade'][0].values[Ix])]
ax[ix,iy].set_ylim(0., 1.)
ax[ix,iy].set_title('Basic model: Shade='+str(sx), fontsize=16)
ax[ix,iy].set_xlabel('Water', fontsize=16)
ax[ix,iy].set_ylabel('Blooms (scaled)', fontsize=16)
ax[ix,iy].scatter(W[S==sx],B[S==sx])

# Posterior lines: basic model
iy = 2
sx = 1
[ax[ix,iy].plot(xnew,b0+b1*xnew+b2*sx, alpha=0.3, c='black') for b0,b1,b2 in zip(trace_b.posterior['Intercept'][0].values[Ix], trace_b.posterior['Water'][0].values[Ix], trace_b.posterior['Shade'][0].values[Ix])]
ax[ix,iy].set_ylim(0., 1.)
ax[ix,iy].set_title('Basic model: Shade='+str(sx), fontsize=16)
ax[ix,iy].set_xlabel('Water', fontsize=16)
ax[ix,iy].set_ylabel('Blooms (scaled)', fontsize=16)
ax[ix,iy].scatter(W[S==sx],B[S==sx])



# Posterior lines: interaction model
ix = 1
iy = 0
sx = -1

[ax[ix,iy].plot(xnew,b0+b1*xnew+b2*sx+b3*xnew*sx, alpha=0.3, c='black') for b0,b1,b2,b3 in zip(trace_ib.posterior['Intercept'][0].values[Ix], trace_ib.posterior['Water'][0].values[Ix], trace_ib.posterior['Shade'][0].values[Ix], trace_ib.posterior['ShadeWater'][0].values[Ix])]
ax[ix,iy].set_ylim(0., 1.)
ax[ix,iy].set_title('Interaction model: Shade='+str(sx), fontsize=16)
ax[ix,iy].set_xlabel('Water', fontsize=16)
ax[ix,iy].set_ylabel('Blooms (scaled)', fontsize=16)
ax[ix,iy].scatter(W[S==sx],B[S==sx])


# Posterior lines: interaction model
iy = 1
sx = 0
[ax[ix,iy].plot(xnew,b0+b1*xnew+b2*sx+b3*xnew*sx, alpha=0.3, c='black') for b0,b1,b2,b3 in zip(trace_ib.posterior['Intercept'][0].values[Ix], trace_ib.posterior['Water'][0].values[Ix], trace_ib.posterior['Shade'][0].values[Ix], trace_ib.posterior['ShadeWater'][0].values[Ix])]
ax[ix,iy].set_ylim(0., 1.)
ax[ix,iy].set_title('Interaction model: Shade='+str(sx), fontsize=16)
ax[ix,iy].set_xlabel('Water', fontsize=16)
ax[ix,iy].set_ylabel('Blooms (scaled)', fontsize=16)
ax[ix,iy].scatter(W[S==sx],B[S==sx])

# Posterior lines: interaction model
iy = 2
sx = 1
[ax[ix,iy].plot(xnew,b0+b1*xnew+b2*sx+b3*xnew*sx, alpha=0.3, c='black') for b0,b1,b2,b3 in zip(trace_ib.posterior['Intercept'][0].values[Ix], trace_ib.posterior['Water'][0].values[Ix], trace_ib.posterior['Shade'][0].values[Ix], trace_ib.posterior['ShadeWater'][0].values[Ix])]
ax[ix,iy].set_ylim(0., 1.)
ax[ix,iy].set_title('Interaction model: Shade='+str(sx), fontsize=16)
ax[ix,iy].set_xlabel('Water', fontsize=16)
ax[ix,iy].set_ylabel('Blooms (scaled)', fontsize=16)
ax[ix,iy].scatter(W[S==sx],B[S==sx])
plt.tight_layout()
plt.savefig('tulip_fits.jpg',dpi=300);

So, with just one interaction we've got a heck of a lot going on. Nervous yet? 
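One way to make the interaction easier to read is to compute, for each posterior draw, the implied slope of water at each shade level, $\beta_W + \beta_{SW} S$, from the interaction model. A minimal sketch; `conditional_water_slope` is a hypothetical helper and the draws below are made up.

```python
import numpy as np

def conditional_water_slope(water, shade_water, shade_levels):
    """Slope of mu with respect to W at each shade level, for every posterior draw.

    water, shade_water: 1-D arrays of posterior draws of beta_W and beta_SW.
    Returns an array of shape (len(shade_levels), n_draws).
    """
    water = np.asarray(water, dtype=float)
    shade_water = np.asarray(shade_water, dtype=float)
    s = np.asarray(shade_levels, dtype=float)[:, None]   # column of shade values
    return water[None, :] + shade_water[None, :] * s

# Made-up posterior draws, for illustration only
water_draws = np.array([0.20, 0.25, 0.15])
shade_water_draws = np.array([-0.10, -0.12, -0.08])

slopes = conditional_water_slope(water_draws, shade_water_draws, [-1, 0, 1])
print(slopes.mean(axis=1))        # average water slope at shade = -1, 0, 1
print((slopes > 0).mean(axis=1))  # share of draws where more water helps
```

With the fitted model, the inputs would be something like `trace_ib.posterior['Water'].values.ravel()` and `trace_ib.posterior['ShadeWater'].values.ravel()`; `(slopes > 0).mean(axis=1)` then gives the posterior probability that more water helps at each shade level.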