## Week 6 Lecture 2 - Binomial regression

McElreath's lecture for today: https://www.youtube.com/watch?v=hRJtKCIDTwc

McElreath's lectures for the whole book are available here: https://github.com/rmcelreath/statrethinking_winter2019

An R/Stan repo of code is available here: https://vincentarelbundock.github.io/rethinking2/

An excellent port to Python/PyMC code is available here: https://github.com/dustinstansbury/statistical-rethinking-2023

You are encouraged to work through both of these versions to reinforce what we're doing in class.

In [None]:
# Import python packages
%matplotlib inline
import pandas as pd
import numpy as np
import seaborn as sns
import scipy as sp 
import random as rd
import pdb
import pymc as pm
import arviz as az
import networkx as nx
from matplotlib import pyplot as plt
import dataframe_image as dfi


# Helper functions
def stdize(x):
    return (x-np.mean(x))/np.std(x)


def indexall(L):
    poo = []
    for p in L:
        if not p in poo:
            poo.append(p)
    Ix = np.array([poo.index(p) for p in L])
    return poo,Ix

def indexall_(L):
    Il, Ll = pd.factorize(L, sort=True)
    return Ll, Il
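
The two helpers above are not interchangeable. `indexall` numbers the levels in order of first appearance in the data, while `indexall_` (via `pd.factorize(..., sort=True)`) numbers them in sorted order. This matters whenever later code assumes a particular level order, e.g. that index 0 means R/N. A quick check on made-up labels (the helpers are copied so the snippet runs on its own):

```python
import numpy as np
import pandas as pd

# Copies of the two helpers defined above
def indexall(L):
    levels = []
    for p in L:
        if p not in levels:
            levels.append(p)
    return levels, np.array([levels.index(p) for p in L])

def indexall_(L):
    codes, levels = pd.factorize(L, sort=True)
    return levels, codes

# Hypothetical labels whose first-appearance order is not sorted order
labels = np.array(['RN', 'LN', 'LN', 'RP', 'LP', 'RN'])
lev1, idx1 = indexall(labels)    # levels ordered by first appearance
lev2, idx2 = indexall_(labels)   # levels ordered alphabetically
print(lev1, idx1)
print(list(lev2), idx2)
```

So the same label can get a different integer code depending on which helper is used.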

## Monkey hands

In chapter 11 (p. 325) there is a description of experiments by [Silk et al. 2005](https://www.nature.com/articles/nature04243), in which chimpanzees choose between food for themselves alone and the same quantity of food for themselves plus food for another chimp at the opposite end of a table. The study design looks like this:

![inline](chimps1.jpg)![inline](chimps2.jpg)

In the setup pictured, pulling the right-hand lever gives only the subject (the *actor* in the text) a grape, while pulling the left lever also gives the other chimp a grape. Which side carries the extra (prosocial) grape is randomized because chimps, like people, tend to be right- or left-handed.

In [None]:
# Grab chimp data
cdata = pd.read_csv('chimpanzees.csv',sep=';')
cdata.head()

Here we have four possible combinations of the two indicator columns:

- prosoc_left = 0 AND condition = 0 (Two food items on right and no partner)
- prosoc_left = 0 AND condition = 1 (Two food items on right and partner present)
- prosoc_left = 1 AND condition = 0 (Two food items on left and no partner)
- prosoc_left = 1 AND condition = 1 (Two food items on left and partner present)

which we can encode as a single treatment index from 1 to 4:

In [None]:
cdata['treatment'] = 1 + cdata.prosoc_left + 2*cdata.condition
cdata.treatment.values
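
To see that the formula `1 + prosoc_left + 2*condition` gives each combination its own code, we can apply it to the four combinations directly. This is a sketch that needs no data, and the label names are the ones used later in the notebook:

```python
import pandas as pd

# All four combinations of the two 0/1 indicator columns
combos = pd.DataFrame({'prosoc_left': [0, 0, 1, 1],
                       'condition':   [0, 1, 0, 1]})
combos['treatment'] = 1 + combos.prosoc_left + 2*combos.condition
# Side of the prosocial option (R/L) / partner absent or present (N/P)
combos['label'] = combos.treatment.map({1: 'R/N', 2: 'L/N', 3: 'R/P', 4: 'L/P'})
print(combos)
```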

Figuring out whether chimps behave prosocially, giving the other chimp a grape at no cost to themselves, requires a statistical model. To start, we'll build a simple model of what happens with the left-hand lever (i.e. 1 if the left lever was pulled, 0 if the right):

$$
\begin{aligned}
L_i & \sim \text{Binomial}(1,p_i) \\
\text{logit}(p_i) & = \beta_{\text{actor}[i]}+\beta_{\text{treatment}[i]}
\end{aligned}
$$

The next thing we need to do is specify some priors. For that we'll start with some reasonable values, then check them with prior predictive simulation. Nearly the full range of 0 to 1 corresponds to about -4 to 4 on the log-odds (logit) scale, so for a normal prior to span that range we'd need $2SD \approx 4$, i.e. $SD=2$. Let's try that:
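
The arithmetic behind "-4 to 4 covers nearly all of 0 to 1" can be checked directly. A small sketch using SciPy's `expit` (the inverse logit) and the normal CDF:

```python
from scipy.special import expit   # inverse logit
from scipy.stats import norm

# Probabilities at the ends of the (-4, 4) log-odds range
lo, hi = expit(-4), expit(4)
print(round(lo, 3), round(hi, 3))      # 0.018 0.982
# Share of a N(0, 2) prior that falls inside (-4, 4), i.e. within 2 SD
inside = norm.cdf(4, scale=2) - norm.cdf(-4, scale=2)
print(round(inside, 3))                # 0.954
```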

In [None]:
# Inverse-logit function
def invlogit(x):
    return np.exp(x)/(1+np.exp(x))

# Random samples from a N(0,2)
plt.hist(invlogit(np.random.normal(0,2,1000)))
plt.savefig('histo1.jpg',dpi=300);

Not quite flat: too much mass piles up near 0 and 1. McElreath uses $N(0, 1.5)$, so let's try that:

In [None]:
# Random samples from a N(0,1.5)
plt.hist(invlogit(np.random.normal(0,1.5,10000)))
plt.savefig('histo2.jpg',dpi=300);

Can we do better still? $N(0, 1.7)$?

In [None]:
# Random samples from a N(0,1.7)
plt.hist(invlogit(np.random.normal(0,1.7,10000)))
plt.savefig('histo3.jpg',dpi=300);
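
Rather than eyeballing histograms, we can compute exactly how much prior mass each candidate puts in each tenth of the interval [0, 1]: if $\text{logit}(p) \sim N(0,\sigma)$ then $P(a<p<b) = \Phi(\text{logit}(b)/\sigma) - \Phi(\text{logit}(a)/\sigma)$. A sketch (not part of the original notebook):

```python
import numpy as np
from scipy.special import logit
from scipy.stats import norm

edges = np.linspace(0, 1, 11)   # ten equal-width bins on the probability scale

def bin_probs(sd):
    """Exact prior mass in each bin when logit(p) ~ Normal(0, sd)."""
    # logit(0) = -inf and logit(1) = inf, so the outer bins reach the boundaries
    return np.diff(norm.cdf(logit(edges) / sd))

devs = {}
for sd in (2.0, 1.5, 1.7):
    probs = bin_probs(sd)
    # A perfectly flat prior would put 0.1 in every bin
    devs[sd] = np.abs(probs - 0.1).max()
    print(sd, np.round(probs, 3), 'max deviation from flat:', round(devs[sd], 3))
```

By this measure $N(0,2)$ puts too much mass in the outermost bins, $N(0,1.5)$ too little, and $N(0,1.7)$ is the closest of the three to flat.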

For the treatment effects, what we want to know is the difference in how often the left-hand lever is pulled with and without a prosocial grape for the partner, which we can represent as a shift in the log-odds of a left pull. Behavioural differences like these tend to be small in psychology, so we can pick a prior that favours small differences, something like $N(0,0.25)$:

In [None]:
n_ = 10000
# Baseline log-odds (actor intercept) and treatment shift, both on the logit scale
a = np.random.normal(0,1.7,n_)
b = np.random.normal(0,0.25,n_)
nt = invlogit(a)      # probability without the treatment
tn = invlogit(a+b)    # probability with the treatment: the shift is added to the log-odds, not to a probability
plt.hist(abs(nt-tn))
plt.savefig('betaT.jpg',dpi=300);

This isn't too bad: it favours small differences, rarely more than about 10 percentage points. This might seem too narrow, but remember that a normal prior can still take on extreme values (they're not prohibited, just unlikely), and with a lot of data this prior will be overwhelmed. Keep in mind what such regularization does: **good priors hurt the fit to sample but improve prediction**.
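
Why does $N(0,0.25)$ imply small differences? The slope of the inverse logit at probability $p$ is $p(1-p)$, which is at most $1/4$ (at $p=0.5$). So a log-odds shift $\beta$ moves the probability by roughly $p(1-p)\beta$, and never by more than $|\beta|/4$. A quick numerical check of that first-order approximation, using a one-SD shift and hypothetical baselines:

```python
import numpy as np
from scipy.special import expit   # inverse logit

beta = 0.25                         # a one-SD treatment shift under the N(0, 0.25) prior
results = []
for a in (-2.0, 0.0, 2.0):          # hypothetical actor intercepts (log-odds)
    p = expit(a)
    exact = expit(a + beta) - p     # exact change on the probability scale
    approx = p * (1 - p) * beta     # first-order approximation: d expit/dx = p(1 - p)
    results.append((p, exact, approx))
    print(f"p={p:.3f}  exact={exact:.4f}  approx={approx:.4f}")
```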

Ok, so let's build our model:

In [None]:
# Grab data
# Left pull - response
L = cdata.pulled_left.values
# Individual chimps
Actor,Ia = indexall(cdata.actor.values)
Chimp = ['Chimp '+str(a) for a in Actor]
nchimps = len(Actor)
# Treatment
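# Treatment: the codes 1-4 defined above map to R/N, L/N, R/P, L/P, so shift them to
# 0-based indices directly (indexall would order them by first appearance in the data)
It = cdata.treatment.values - 1
Treatment = ['R/N','L/N','R/P','L/P']
ntreat = len(Treatment)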

In [None]:
Treatment

In [None]:
with pm.Model(coords={'Chimp':Chimp, 'Treat': Treatment}) as Chimps:
    # Individual intercepts
    β0 = pm.Normal('Actor', 0, 1.7, dims='Chimp')
    # Treatment effects
    β1 = pm.Normal('Treatment', 0, 0.25, dims='Treat')

    # Linear model
    p = pm.invlogit(β0[Ia]+β1[It])

    # Likelihood
    Yi = pm.Binomial('Yi', 1, p,observed=L)

In [None]:
with Chimps:
    trace_c = pm.sample(1000, idata_kwargs={"log_likelihood": True})

In [None]:
tmp = pm.summary(trace_c)
tmp

In [None]:
dfi.export(tmp.style.background_gradient(), 'df_m1.png')

In [None]:
ChimpTrace = trace_c

In [None]:
from scipy.special import expit as logistic
az.plot_forest(ChimpTrace, var_names=['Actor'], transform=logistic, combined=True)
plt.tight_layout()
plt.savefig('m1forest.jpg',dpi=300);

From this we can see that the chimps differ considerably in handedness, with one chimp only ever pulling the left lever.

Let's move on to the treatments:


In [None]:
az.plot_forest(ChimpTrace,var_names=['Treatment'])
plt.axvline(0)
plt.tight_layout()
plt.savefig('m1forest2.jpg',dpi=300);

From this we can see that, while there are slight differences between R/N and R/P and between L/N and L/P, they're small, with considerable overlap for L and some overlap for R. We can also look directly at the posterior differences between treatments (on the log-odds scale):


In [None]:
# Grab posteriors, index by 0 to grab first of 4 chains
tmp = trace_c.posterior['Treatment'][0].T.values

In [None]:
plt.hist(tmp[0]-tmp[2])
plt.axvline(0,c='red',lw=4)
plt.title('Right P(effect)=R/N-R/P='+str(sum((tmp[0]-tmp[2])>0)/len(tmp[0])))
plt.savefig('m1RT.jpg',dpi=300);

In [None]:
plt.hist(tmp[1]-tmp[3])
plt.axvline(0,c='red',lw=4)
plt.title('Left P(effect)=L/N-L/P='+str(sum((tmp[1]-tmp[3])>0)/len(tmp[0])))
plt.savefig('m1LT.jpg',dpi=300);

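So both differences have more density above zero, suggestive of an effect, with stronger evidence when the prosocial option is on the right. Keep in mind that these differences are N-P (no partner minus partner) on the log-odds of pulling the *left* lever, so a positive value means chimps pull the left lever slightly less often when a partner is present. With the prosocial option on the right, that is a (weakly) more prosocial choice; with it on the left, it is a slightly less prosocial one. Either way, the effects are small.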
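
The histograms above use only the first chain (index 0). A small helper that pools all chains before computing the probability, as a sketch assuming PyMC's `(chain, draw, ...)` layout for `trace_c.posterior['Treatment'].values`; the samples below are made up purely to exercise it:

```python
import numpy as np

def prob_diff_positive(samples, i, j):
    """Posterior P(beta_i - beta_j > 0), pooling every chain.

    samples: array shaped (chains, draws, n_treatments); assumed to match
    trace_c.posterior['Treatment'].values.
    """
    pooled = samples.reshape(-1, samples.shape[-1])   # (chains*draws, n_treatments)
    return np.mean(pooled[:, i] - pooled[:, j] > 0)

# Made-up stand-in for a posterior: 4 chains x 1000 draws x 4 treatments
rng = np.random.default_rng(1)
fake = rng.normal(0, 0.2, size=(4, 1000, 4))
fake[..., 0] += 0.3                                   # shift "R/N" upward
print(prob_diff_positive(fake, 0, 2))                 # P(R/N - R/P > 0)
```

With the real trace, the equivalent call would be `prob_diff_positive(trace_c.posterior['Treatment'].values, 0, 2)`.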

To see what our model actually does here, we can push our estimates back through the model and output the expectations for each chimp. First, a look at the data itself (the proportion of left pulls per chimp and treatment):

In [None]:
# Set figure size
plt.figure(figsize=(10, 3))
oset = 1
for i in range(7):
    # Grab chimp
    l = L[Ia==i]
    t = It[Ia==i]
    # Grab data for each treatment
    rn = np.mean(l[t==0])
    ln = np.mean(l[t==1])
    rp = np.mean(l[t==2])
    lp = np.mean(l[t==3])
    # Plot rights
    plt.plot((oset,oset+2),(rn,rp),c='b',zorder=0)
    if i==0:
        plt.scatter(oset,rn, facecolors='white', edgecolors='b', label='Alone')
        plt.scatter(oset+2,rp, facecolors='b', edgecolors='b', label='Partner')
    else:
        plt.scatter(oset,rn, facecolors='white', edgecolors='b')
        plt.scatter(oset+2,rp, facecolors='b', edgecolors='b')
    # Plot lefts
    plt.plot((oset+1,oset+3),(ln,lp),c='b',zorder=0)
    plt.scatter(oset+1,ln, facecolors='white', edgecolors='b')
    plt.scatter(oset+3,lp, facecolors='b', edgecolors='b')
    oset += 4
[plt.axvline(x+0.5, c='grey') for x in [4,8,12,16,20,24]]
[plt.text(x-3+0.5,1.1,'Actor '+str(int(x/4))) for x in [4,8,12,16,20,24,28]]
plt.axhline(0.5,linestyle='--',c='grey')
plt.tick_params(bottom=False, labelbottom=False)
plt.legend()
plt.savefig('data.jpg',dpi=300);

Next, the model's predictions for each chimp (posterior median ± 2 SD on the probability scale):

In [None]:
a_trace = trace_c.posterior['Actor'][0].T
t_trace = trace_c.posterior['Treatment'][0].T

In [None]:
# Grab effects sizes for each treatment
t1 = t_trace[0]
t2 = t_trace[1]
t3 = t_trace[2]
t4 = t_trace[3]

# Set figure size
plt.figure(figsize=(10, 3))
oset = 1
for i in range(7):
    # Grab chimp intercept
    l = a_trace[i]
    # Calculate individual effects
    rn = np.median(invlogit(l+t1))
    rn2 = np.std(invlogit(l+t1))*2
    ln = np.median(invlogit(l+t2))
    ln2 = np.std(invlogit(l+t2))*2
    rp = np.median(invlogit(l+t3))
    rp2 = np.std(invlogit(l+t3))*2
    lp = np.median(invlogit(l+t4))
    lp2 = np.std(invlogit(l+t4))*2
    
    # Plot rights
    plt.plot((oset,oset+2),(rn,rp),c='b',zorder=0)
    plt.plot((oset,oset),(rn-rn2,rn+rn2),c='black',zorder=0)
    plt.plot((oset+2,oset+2),(rp-rp2,rp+rp2),c='black',zorder=0)

    plt.scatter(oset,rn, facecolors='white', edgecolors='b')
    plt.scatter(oset+2,rp, facecolors='b', edgecolors='b')
    
    # Plot lefts
    plt.plot((oset+1,oset+3),(ln,lp),c='b',zorder=0)
    plt.plot((oset+1,oset+1),(ln-ln2,ln+ln2),c='black',zorder=0)
    plt.plot((oset+3,oset+3),(lp-lp2,lp+lp2),c='black',zorder=0)
    plt.scatter(oset+1,ln, facecolors='white', edgecolors='b')
    plt.scatter(oset+3,lp, facecolors='b', edgecolors='b')
    oset += 4
[plt.axvline(x+0.5, c='grey') for x in [4,8,12,16,20,24]]
[plt.text(x-3+0.5,1.1,'Actor '+str(int(x/4))) for x in [4,8,12,16,20,24,28]]
plt.axhline(0.5,linestyle='--',c='grey')
plt.tick_params(bottom=False, labelbottom=False)
plt.savefig('model.jpg',dpi=300);

These results should convey two things:
1. The differences in handedness among chimps are the strongest effect.
2. The model as written finds very little effect of adding the partner (filled circles), partly by design given the tight $N(0,0.25)$ prior on the treatments.

We'll revisit this model later on when we look at multilevel models.

# Proportional odds

The results above give us absolute probabilities of each chimp pulling the left lever with and without a partner, but what about the relative change? Under the logit link the odds are $\exp(\beta_0+\beta_1 x_i)$, so the proportional change in odds (the proportional odds) for a one-unit increase in $x$, here the addition of a partner, is:

$$
\frac{\exp(\beta_0+\beta_1(x_i+1))}{\exp(\beta_0+\beta_1 x_i)}
$$

which, with a bit of algebra, reduces to

$$
\frac{\exp(\beta_0)\exp(\beta_1 x_i)\exp(\beta_1)}{\exp(\beta_0)\exp(\beta_1 x_i)} = \exp(\beta_1)
$$

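In the chimps model, the effect of adding a partner is the difference between the corresponding treatment coefficients (R/P minus R/N for the right, L/P minus L/N for the left), so the proportional odds are the exponential of that difference: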

In [None]:
# Right hand
RHpo = np.exp(t3-t1).values
plt.hist(RHpo)
plt.axvline(1,c='red',lw=4)
plt.title('Proportional odds (R) = '+str(np.round(np.mean(RHpo),2)))
plt.savefig('m1RTpo.jpg',dpi=300);

In [None]:
# Left hand
LHpo = np.exp(t4-t2).values
plt.hist(LHpo)
plt.axvline(1,c='red',lw=4)
plt.title('Proportional odds (L) = '+str(np.round(np.mean(LHpo),2)))
plt.savefig('m1LTpo.jpg',dpi=300);

So in both cases, adding a partner slightly reduces the odds of pulling the left lever (proportional odds below 1).
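
The algebra above says the odds ratio $\exp(\beta_1)$ is the same at every baseline, even though the change in probability is not. A numerical check with hypothetical values (the effect size and baselines are made up for illustration):

```python
import numpy as np
from scipy.special import expit   # inverse logit

beta1 = np.log(0.8)               # hypothetical partner effect: multiplies the odds by 0.8
rows = []
for b0 in (-2.0, 0.0, 2.0):       # hypothetical baselines on the log-odds scale
    p0, p1 = expit(b0), expit(b0 + beta1)
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    rows.append((p0, p1 - p0, odds_ratio))
    print(f"baseline p={p0:.2f}  change in p={p1 - p0:+.3f}  odds ratio={odds_ratio:.2f}")
```

The odds ratio is 0.8 in every row, while the absolute change in probability is largest at a baseline of 0.5 and shrinks towards the extremes.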

# Aggregated binomial

The data analysed so far are raw 0/1 outcomes for pulling the left lever, but binomials are about the number of successes in a given number of trials. So, provided there is nothing special about the order of the trials, we can condense the data into a table of successes per actor and treatment:

In [None]:
# Add side covariate
cdata['side'] = np.array(['right','left'])[cdata.prosoc_left]
# Label treatments
cdata['treatment'] = np.array(['RN','LN','RP','LP'])[cdata.treatment.values-1]
# Partner present
cdata['partner'] = cdata.condition

In [None]:
cdata.head()

In [None]:
cdata2 = pd.pivot_table(cdata, values='pulled_left', index=['side','actor','treatment','partner'], aggfunc='sum').reset_index()

In [None]:
dfi.export(cdata2.head(), 'aggdata.png')
cdata2.head()
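
The pivot table above keeps only the count of left pulls, and the model below hard-codes 18 trials per cell. A sketch that carries the number of trials along with the successes, using toy data whose column names mirror `cdata`, so that the binomial size could come from the data rather than a constant:

```python
import pandas as pd

# Toy long-format data (made up): one row per trial, 0/1 outcome
trials = pd.DataFrame({
    'actor':       [1, 1, 1, 1, 2, 2, 2, 2],
    'treatment':   ['RN', 'RN', 'LN', 'LN', 'RN', 'RN', 'LN', 'LN'],
    'pulled_left': [0, 1, 1, 1, 1, 1, 0, 1],
})
# Successes and number of trials per actor x treatment cell
agg = (trials.groupby(['actor', 'treatment'])['pulled_left']
             .agg(left_pulls='sum', n_trials='count')
             .reset_index())
print(agg)
```

The `n_trials` column could then be passed as the `n` argument of the binomial likelihood instead of the constant 18.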

In [None]:
# Grab data
# Number of left pulls
L = cdata2.pulled_left.values
# Individual chimps
Actor,Ia = indexall(cdata2.actor.values)
Chimp = ['Chimp '+str(a) for a in Actor]
nchimps = len(Actor)
# Treatment
Treatment,It = indexall(cdata2.treatment.values)
ntreat = len(Treatment)

In [None]:
with pm.Model(coords={'Chimp':Chimp, 'Treat': Treatment}) as Chimps2:
    # Individual intercepts
    β0 = pm.Normal('Actor', 0, 1.7, dims='Chimp')
    # Treatment effects
    β1 = pm.Normal('Treatment', 0, 0.25, dims='Treat')

    # Linear model
    p = pm.invlogit(β0[Ia]+β1[It])

    # Likelihood
    Yi = pm.Binomial('Yi', 18, p, observed=L)

In [None]:
with Chimps2:
    trace_c2 = pm.sample(1000, idata_kwargs={"log_likelihood": True})

In [None]:
tmp2 = pm.summary(trace_c2)
dfi.export(tmp2.style.background_gradient(), 'df_m2.png')
tmp2

In [None]:
axes = az.plot_forest(
    [trace_c, trace_c2], model_names=["Raw", "Aggregated"], var_names=["Actor"])
plt.tight_layout()
plt.savefig('chimps_models_actor.jpg',dpi=300);

In [None]:
axes = az.plot_forest(
    [trace_c, trace_c2], model_names=["Raw", "Aggregated"], var_names=["Treatment"])
plt.tight_layout()
plt.savefig('chimps_models_treat.jpg',dpi=300);

In [None]:
# pointwise=True keeps the per-observation Pareto k values used below
raw_loo = az.loo(trace_c, pointwise=True)
agg_loo = az.loo(trace_c2, pointwise=True)

In [None]:
pm.waic?

In [None]:
raw_waic = pm.waic(trace_c, scale='deviance', pointwise=True)
agg_waic = pm.waic(trace_c2, scale='deviance', pointwise=True)

In [None]:
raw_waic

In [None]:
agg_waic.waic_i

In [None]:
tmp = raw_loo
tmp

In [None]:
tmp2 = agg_loo
tmp2

In [None]:
np.array(Actor)[Ia[agg_loo.pareto_k.values>0.5]]

In [None]:
agg_loo.pareto_k.values>0.5

In [None]:
loofail = cdata2.iloc[agg_loo.pareto_k.values>0.5,]
dfi.export(loofail.style.background_gradient(), 'kgt05.png')
loofail

This is interesting, but it's hard to see why these particular data points have high Pareto k values, so let's plot them (flagged points in red):

In [None]:
# Set figure size
plt.figure(figsize=(10, 3))
oset = 1
for i in range(7):
    # Grab chimp
    l = L[Ia==i]
    t = It[Ia==i]
    # Grab data for each treatment
    # (look up indices by label, since indexall orders levels by first appearance)
    rn = l[t==Treatment.index('RN')]/18
    ln = l[t==Treatment.index('LN')]/18
    rp = l[t==Treatment.index('RP')]/18
    lp = l[t==Treatment.index('LP')]/18
    # Grab k value fails
    lf2 = loofail.iloc[loofail.actor.values==np.array(Actor)[i]]
    # Plot rights (open = alone, filled = partner, red = Pareto k > 0.5)
    plt.plot((oset,oset+2),(rn,rp),c='b',zorder=0)
    plt.scatter(oset,rn, facecolors='white', edgecolors='b', label='Alone' if i==0 else None)
    if 'RN' in lf2.treatment.values:
        plt.scatter(oset,rn, facecolors='white', edgecolors='r')
    plt.scatter(oset+2,rp, facecolors='b', edgecolors='b', label='Partner' if i==0 else None)
    if 'RP' in lf2.treatment.values:
        plt.scatter(oset+2,rp, facecolors='r', edgecolors='r')
    # Plot lefts
    plt.plot((oset+1,oset+3),(ln,lp),c='b',zorder=0)
    plt.scatter(oset+1,ln, facecolors='white', edgecolors='b')
    if 'LN' in lf2.treatment.values:
        plt.scatter(oset+1,ln, facecolors='white', edgecolors='r')
    plt.scatter(oset+3,lp, facecolors='b', edgecolors='b')
    if 'LP' in lf2.treatment.values:
        plt.scatter(oset+3,lp, facecolors='r', edgecolors='r')
    oset += 4
[plt.axvline(x+0.5, c='grey') for x in [4,8,12,16,20,24]]
[plt.text(x-3+0.5,1.1,'Actor '+str(int(x/4))) for x in [4,8,12,16,20,24,28]]
plt.axhline(0.5,linestyle='--',c='grey')
plt.tick_params(bottom=False, labelbottom=False)
plt.legend()
plt.savefig('aggdata.jpg',dpi=300);

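So by aggregating, we have collapsed each block of 18 trials (one actor-by-treatment cell) into a single observation, and this makes out-of-sample prediction harder: leaving out one aggregated row removes 18 trials at once. Rather than 'leave one out' cross-validation, it's more like 'leave 18 out' cross-validation, which strains the importance-sampling approximation and shows up as high Pareto k values.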
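
There is a second reason the two models' LOO/WAIC values are not directly comparable: the aggregated binomial likelihood includes the multiplicity term $\log\binom{n}{k}$ (the number of orders in which $k$ successes can occur in $n$ trials), which the 0/1 model does not. A small check with a hypothetical cell:

```python
import numpy as np
from scipy.special import comb
from scipy.stats import binom, bernoulli

n, k, p = 18, 12, 0.6                            # hypothetical cell: 12 left pulls in 18 trials
outcomes = np.array([1] * k + [0] * (n - k))

ll_bern = bernoulli.logpmf(outcomes, p).sum()    # disaggregated: sum over individual trials
ll_binom = binom.logpmf(k, n, p)                 # aggregated: a single observation
# The two differ only by the log multiplicity term log C(n, k)
print(ll_binom - ll_bern, np.log(comb(n, k)))
```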

# Aggregated admissions

Here is a classic example of aggregated binomial regression: graduate admissions data from UC Berkeley, used to look at gender bias in admissions. First, import the data:

In [None]:
bdata = pd.read_csv('UCBadmit.csv',sep=";")
bdata['gender'] = bdata['applicant.gender']
dfi.export(bdata.style.background_gradient(), 'UCBadmit.png')
bdata

The question here is: **is there systematic bias against female applicants at UC Berkeley?** Let's look at the proportion of applicants admitted, by sex:

In [None]:
bdata.admit[bdata.gender=='female'].sum()/bdata.applications[bdata.gender=='female'].sum()

In [None]:
bdata.admit[bdata.gender=='male'].sum()/bdata.applications[bdata.gender=='male'].sum()

So this looks bad: the admission rate for men is about 14 percentage points higher than for women, and at liberal old Berkeley! Let's build a model to estimate the size of this effect:

In [None]:
# Response
A = bdata.admit.values
N = bdata.applications.values

# Sex
Sex_,Is = indexall(bdata.gender.values)

In [None]:
with pm.Model(coords={'Sex_':Sex_}) as Admit:
    # log-odds of admission by sex
    β0 = pm.Normal('Sex', 0, 1.7, dims='Sex_')

    # Linear model
    p = pm.invlogit(β0[Is])

    # Likelihood
    Yi = pm.Binomial('Yi', N, p, observed=A)

In [None]:
with Admit:
    trace_ba = pm.sample(1000)

In [None]:
pm.summary(trace_ba)

Calculate the probability of admission by sex:

In [None]:
invlogit(trace_ba.posterior['Sex'][0].T).mean(axis=1).values

This matches the raw proportions. What's the difference in probability?

In [None]:
diff = (invlogit(trace_ba.posterior['Sex'][0].values.T[0])-invlogit(trace_ba.posterior['Sex'][0].values.T[1]))
np.mean(diff)

In [None]:
plt.hist(diff)
plt.axvline(0,c='red',lw=4)
plt.title('Berkeley Male vs Female acceptance probability')
plt.savefig('rawrates.jpg',dpi=300);

So we recover the roughly 14-percentage-point gap we saw earlier. Now let's push these estimates back through the model and compare the predictions with the data in each department (i.e. the **most important model check**):

In [None]:
for i in range(6):
    # Male/female per department
    x = 1 + 2 * i
    
    # Plot data
    y1 = bdata.admit[x] / bdata.applications[x]
    y2 = bdata.admit[x+1] / bdata.applications[x+1]
    if i==5:
        plt.plot([x, x+1], [y1, y2], '-C0o', lw=2, label='Data')
    else:
        plt.plot([x, x+1], [y1, y2], '-C0o', lw=2)
    plt.text(x + 0.25, (y1+y2)/2 + 0.05, bdata.dept[x])
    
    # Model predictions male (median and 95% interval)
    pmale = invlogit(trace_ba.posterior['Sex'][0].values.T[0])
    ynew1 = np.quantile(pmale,0.5)
    ynew1_lo = np.quantile(pmale,0.025)
    ynew1_hi = np.quantile(pmale,0.975)
    # Model predictions female (median and 95% interval)
    pfemale = invlogit(trace_ba.posterior['Sex'][0].values.T[1])
    ynew2 = np.quantile(pfemale,0.5)
    ynew2_lo = np.quantile(pfemale,0.025)
    ynew2_hi = np.quantile(pfemale,0.975)
    
    if i==5:
        plt.scatter([x, x+1], [ynew1,ynew2], c='black',label='Model')
        plt.plot([x, x+1], [ynew1,ynew2], c='grey')
        plt.plot([x,x],[ynew1_lo,ynew1_hi],c='black')
        plt.plot([x+1,x+1],[ynew2_lo,ynew2_hi],c='black')
    else:
        plt.scatter([x, x+1], [ynew1,ynew2], c='black')
        plt.plot([x, x+1], [ynew1,ynew2], c='grey')
        plt.plot([x,x],[ynew1_lo,ynew1_hi],c='black')
        plt.plot([x+1,x+1],[ynew2_lo,ynew2_hi],c='black')
    
plt.ylim(0, 1)
plt.ylabel('Acceptance rate')
plt.xlabel('Department')
plt.legend()
plt.savefig('rawmodelfit.jpg',dpi=300);

These are pretty shithouse predictions, and of course this example is a classic because it is an instance of Simpson's paradox. The key is to include a covariate for department: departments have wildly differing admission rates, and women applied disproportionately to the departments with low admission rates, so pooling across departments can show bias where none exists. Let's run another model that handles all this:
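
The mechanism is easy to reproduce with made-up numbers. In this hypothetical two-department example both sexes are admitted at exactly the same rate within each department, yet the pooled rates differ sharply because women mostly apply to the harder department:

```python
import pandas as pd

# Hypothetical admissions: identical within-department rates for both sexes,
# but women apply mostly to the hard department (Y)
toy = pd.DataFrame({
    'dept':         ['X', 'X', 'Y', 'Y'],
    'gender':       ['male', 'female', 'male', 'female'],
    'applications': [80, 20, 20, 80],
    'admit':        [64, 16, 4, 16],   # 80% admitted in X, 20% in Y
})
toy['rate'] = toy.admit / toy.applications
# Pool over departments, as the first model effectively does
pooled = toy.groupby('gender')[['admit', 'applications']].sum()
pooled['rate'] = pooled.admit / pooled.applications
print(toy)
print(pooled)
```

Pooled, men are admitted at 68% and women at 32%, even though no department treats them differently.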

In [None]:
# Department
Dept_,Id = indexall(bdata.dept.values)

In [None]:
with pm.Model(coords={'Sex_':Sex_, 'Dept_':Dept_}) as AdmitD:
    # log-odds of admission by sex
    β0 = pm.Normal('Sex', 0, 1.7, dims='Sex_')
    # log-odds of admission by department
    β1 = pm.Normal('Department', 0, 1.7, dims='Dept_')

    # Linear model
    p = pm.invlogit(β0[Is]+β1[Id])

    # Likelihood
    Yi = pm.Binomial('Yi', N, p, observed=A)

In [None]:
with AdmitD:
    trace_bad = pm.sample(1000)

In [None]:
az.plot_forest(trace_bad)
plt.tight_layout()
plt.savefig('rightmodel.jpg',dpi=300);

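So the difference between the sexes disappears, and if anything men are accepted at slightly lower rates. (Only the difference between the two Sex coefficients is well identified here: a constant can be added to both Sex coefficients and subtracted from every Department coefficient without changing the predictions.) We can use our DAG-building skills to see why: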

In [None]:
G = nx.DiGraph([('S','D'), ('D','A'), ('S','A')])

options = {
    "font_size": 36,
    "node_size": 3000,
    "node_color": "white",
    "edgecolors": "black",
    "linewidths": 5,
    "width": 4,
}
nx.draw_networkx(G, **options)
ax = plt.gca()
ax.margins(0.20)
plt.axis("off")
plt.savefig('UCB_dag.jpg',dpi=300);
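
networkx can enumerate the paths in this graph for you (`nx.all_simple_paths`), but a few lines of plain Python make the logic explicit. This is a minimal depth-first sketch that assumes the graph is acyclic, as a DAG is:

```python
# Edges of the DAG drawn above: S -> D, D -> A, S -> A
edges = [('S', 'D'), ('D', 'A'), ('S', 'A')]
children = {}
for u, v in edges:
    children.setdefault(u, []).append(v)

def directed_paths(start, end, path=None):
    """All directed paths from start to end (depth-first; assumes no cycles)."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for nxt in children.get(start, []):
        paths += directed_paths(nxt, end, path)
    return paths

print(directed_paths('S', 'A'))   # [['S', 'D', 'A'], ['S', 'A']]
```

There are two routes from sex to admission: a direct one and one through department.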

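In this DAG, sex affects admission directly (S → A) and indirectly through which department people apply to (S → D → A). Adding department to the model blocks the indirect path, so the Sex coefficients now estimate the direct effect. Let's see what the model predicts now: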

In [None]:
for i in range(6):
    # Male/female per department
    x = 1 + 2 * i
    
    # Plot data
    y1 = bdata.admit[x] / bdata.applications[x]
    y2 = bdata.admit[x+1] / bdata.applications[x+1]
    if i==5:
        plt.plot([x, x+1], [y1, y2], '-C0o', lw=2, label='Data')
    else:
        plt.plot([x, x+1], [y1, y2], '-C0o', lw=2)
    plt.text(x + 0.25, (y1+y2)/2 + 0.05, bdata.dept[x])
    
    # Model parameters
    bmale = trace_bad.posterior['Sex'][0].values.T[0].mean()
    bfemale = trace_bad.posterior['Sex'][0].values.T[1].mean()
    bdept = trace_bad.posterior['Department'][0].values.T[i].mean()
    ynew1 = invlogit(bmale+bdept)
    ynew2 = invlogit(bfemale+bdept)
    
    if i==5:
        plt.scatter([x, x+1], [ynew1,ynew2], c='black', label='Model')
    else:
        plt.scatter([x, x+1], [ynew1,ynew2], c='black')
    plt.plot([x, x+1], [ynew1,ynew2], c='grey')
    
plt.ylim(0, 1)
plt.ylabel('Acceptance rate')
plt.xlabel('Department')
plt.legend()
plt.savefig('rightmodelfit.jpg',dpi=300);

Doing much better, but Department A shows a much higher acceptance rate for women than the model predicts, something to be modelled later perhaps.