# Week 02: Examples for lecture
# Rock the Vote!

## Did TV advertisements affect voter turnout among 18-19 year olds?

## Load Libraries

In [None]:
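# for reading json files
import json

# numerical libraries
import numpy as np
import scipy as sp
import scipy.stats  # makes sp.stats available; `import scipy` alone may not load it
import pystan       # PyStan 2.x (this notebook uses the pystan.StanModel API)

# pandas!
import pandas as pd

# plotting libraries
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import seaborn as sns
%matplotlib inline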

In [None]:
sns.set(style="white")

## A function to print a long string nicely

In [None]:
def print_info(info,wpl=12):
    """
    nicely print a long paragraph, wpl words per line
    """
    
    long_info = info.split()
    
    # break the long string into lines of wpl words each;
    # stepping through the words (rather than rounding len/wpl)
    # ensures a final partial line is not dropped
    for start in range(0, len(long_info), wpl):
        print(' '.join(long_info[start:start + wpl]))

In [None]:
def print_vars(var_dict):
    """
    nicely print the information about each variable
    """
    # pad every name to the length of the longest variable name
    max_len = max(len(k) for k in var_dict.keys())
    
    for k, desc in var_dict.items():
        print(str(k).ljust(max_len + 1) + ' :::  ' + str(desc))

## Class Example 2: Rock the Vote

Jackman presents this example in Example 7.9 (pages 355-362) of his _Bayesian Analysis for the Social Sciences_. The exercise gives us an opportunity to model a binomial dependent variable and sets us up to revisit this example later when we discuss multilevel models. It is also a great opportunity to dive into Bayesian modeling in the context of a field experiment.

>Prior to the presidential election in November 2004, we assembled a nationwide list of cable systems that covered only a single zip code. Small cable TV systems are a fertile source of experimental data for social scientists because their small size makes them inexpensive and conducive to large-N randomized studies. In order to test the televised messages in an environment that would not be dominated by other election-related advertisements, we removed all cable systems in 16 states that the Los Angeles Times classified as presidential battlegrounds (closely contested states). We then excluded any systems that had no time available in prime time during the week before the election or that cost more than 15 dollars per 30-second advertisement on the USA television network. We excluded all systems in Mississippi because its voter file is very difficult to obtain. This left 85 cable systems for randomization.

>Random assignment of the cable systems took place as follows. Each system was matched with one or two other systems in the same state according to its past turnout rate in presidential elections. This procedure resulted in 40 strata containing the 85 cable systems. After sorting the list of 85 cable systems by strata and then by a random number, the first cable system in each stratum was assigned to the treatment condition, the others to control.

>People living within the treatment systems saw two different 30-second advertisements produced by Rock the Vote. Both advertisements used the same format. The first dealt with the draft and the second, with education. In the draft advertisement, a young couple dancing at a party is talking about the man’s new job. He is very excited to be working in promotions and hopes to start his own firm in 6 months. The woman interrupts him and says, ‘‘That’s if you don’t get drafted.’’ The man is puzzled. She clarifies, ‘‘Drafted, for the war?’’ He responds, ‘‘Would they do that?’’ The advertisement closes with everyone at the party looking into the camera and the words, ‘‘It’s up to you’’ on the screen. The voiceover says, ‘‘The Draft. One of the issues that will be decided this November. Remember to vote on November 2nd.’’ The closing image is of the Rock the Vote logo on a black screen.

>The second Rock the Vote advertisement dealt with education. A young man arrives at work with news that he has been accepted to college. His colleagues congratulate him and one of them asks, ‘‘Books, room, board, tuition ... how can you pay for all of that?’’ The advertisement closes with everyone looking out at the camera and the words, ‘‘It’s up to you’’ written on the screen. The voiceover is similar to the one above but with education substituted for draft. We showed both advertisements equally in all cable systems.

>Each cable system comprises several thousand voters, and the entire data set encompasses approximately 850,000 registered voters. Of special interest are the 23,869 voters who are 18 and 19 years of age, for whom this election represents the first federal election in which they are eligible to vote and to whom these ads were specifically addressed. The methodological question is what is the most efficient and reliable way to analyze these data? This question was particularly compelling since our previous mass-media turnout experiments suggested the effects of treatment were likely to be small in magnitude, but not zero (Vavreck and Green 2006).

In [None]:
# read json file into a dictionary;
# the with-block closes the file automatically when it exits
with open('data/rock_the_vote_data.json', 'r') as f:
    json_data = json.load(f)

In [None]:
# what's the source?
print(json_data['source'])

In [None]:
# where can i get these data?
print(json_data['url'])

In [None]:
# print some info about the dataset
print_info(json_data['info'])

In [None]:
# what variables are in the dataset?
print_vars(json_data['vars'])

In [None]:
# just give it to me in a dataframe
data = pd.DataFrame(json_data['data'])
data.head(15)

## What is the question?

We want to know whether the _Rock the Vote_ TV advertisements had any effect on voter turnout among 18-19 year olds. In statistical terms: is the distribution of the probability of voting in treated markets different from the distribution of the probability of voting in untreated markets? Here, "distribution of the probability" means the distribution of the binomial success parameter.

Let's look at the distribution of the data to see if there is a difference between treated and untreated markets.

In [None]:
plt.figure(figsize=(10,5))

sns.kdeplot(data[data.treated == 1]['p'],shade=True,label='treated',color='darksalmon')
sns.kdeplot(data[data.treated == 0]['p'],shade=True,label='untreated',color='cadetblue')

# make it pretty
sns.despine()

# label the figure
plt.title('Distribution of Turnout: Treated vs Untreated Markets',family='serif',size=14)
plt.xlabel('theta',family='serif',size=12)
plt.ylabel('density',family='serif',size=12)
plt.legend();

Because these data are binomial, we should look at the turnout rates in relation to the number of registered voters in each market. Do we see more noise in smaller markets? We would expect to: the observed success rate $r/n$ has standard error $\sqrt{p(1-p)/n}$, which shrinks as $n$ grows (the law of large numbers at work; the Central Limit Theorem adds that, for large $n$, the observed rate is approximately normal around $p$). If we see only a small number of draws from a binomial distribution, the observed rate of success is likely to vary quite a bit. If we are lucky enough to see lots of data from a market, the observed success rate is likely to be close to the true, unobserved rate.
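The standard-error formula can be checked directly. This short sketch evaluates $\sqrt{p(1-p)/n}$ at the same true rate ($p = 0.2$) and trial counts (10, 100, 500) used in the simulation cell below; `binomial_rate_se` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def binomial_rate_se(p, n):
    """Standard error of the observed success rate r / n when r ~ Binomial(n, p)."""
    return np.sqrt(p * (1 - p) / n)

# same true rate and trial counts as the simulation of success rates below
for n_trials in (10, 100, 500):
    print(f"n = {n_trials:>3}: standard error = {binomial_rate_se(0.2, n_trials):.3f}")
```

The standard errors come out to roughly 0.126, 0.040, and 0.018: a market with 10 voters can easily show a rate 0.25 away from the truth, while a market with 500 almost never will.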

In [None]:
plt.figure(figsize=(7,7))
plt.scatter(data.n,data.p)

# make it pretty
sns.despine()

# label the figure
plt.title('Turnout versus Size of Market',family='serif',size=14)
plt.xlabel('size of market',family='serif',size=12)
plt.ylabel('turnout',family='serif',size=12)

In [None]:
# generate data from various binomials when the underlying rate of success is the same
test_1 = sp.stats.binom.rvs(n=10,p=0.2,size=1000)/10
test_2 = sp.stats.binom.rvs(n=100,p=0.2,size=1000)/100
test_3 = sp.stats.binom.rvs(n=500,p=0.2,size=1000)/500

# Initialise the figure and a subplot axes.
num_rows = 1
num_cols = 3
fig, ax = plt.subplots(num_rows, num_cols, figsize=(12, 4))

# overall title
fig.suptitle('Why do we see lots of variation in small markets?',y=1.05,fontsize=16,fontfamily='serif')

# now plot the observed PDFs of the observed rate of success
sns.kdeplot(test_1,shade=True,color='cornflowerblue',ax=ax[0])
ax[0].set_title('Distribution of Success Rate 10 Trials',fontsize=12,fontfamily='serif')
ax[0].set_xlim(0,1)

sns.kdeplot(test_2,shade=True,color='cornflowerblue',ax=ax[1])
ax[1].set_title('Distribution of Success Rate 100 Trials',fontsize=12,fontfamily='serif')
ax[1].set_xlim(0,1)

sns.kdeplot(test_3,shade=True,color='cornflowerblue',ax=ax[2])
ax[2].set_title('Distribution of Success Rate 500 Trials',fontsize=12,fontfamily='serif')
ax[2].set_xlim(0,1)

# make the plot prettier
plt.tight_layout()
plt.show()

Pulling this thread a little further, suppose the true turnout rate is the same in all 85 markets and equals the observed mean turnout rate, 0.53. How much variation across markets would sampling alone produce, i.e., the ontological uncertainty introduced by random factors such as an 18-year-old getting into a wreck on the way to the voting booth?

In [None]:
# mean turnout rate across all 85 markets
np.mean(data.p)

In [None]:
# Initialise the figure and a subplot axes.
num_rows = 3
num_cols = 3
fig, ax = plt.subplots(num_rows, num_cols, figsize=(12, 12))

# overall title
fig.suptitle('Simulations show variability across markets due to sampling',
             y=1.05,fontsize=16,fontweight='bold',fontfamily='serif')

for i in range(num_rows):
    for j in range(num_cols):
        # generate samples
        sim_voters = sp.stats.binom.rvs(n=data.n,p=np.mean(data.p),size=len(data)) / data.n
        
        # plot simulated and actual
        ax[i,j].scatter(data.n,sim_voters,s=12,color='cornflowerblue')
        ax[i,j].scatter(data.n,data.p,s=6,alpha=0.75,color='salmon')
        # reference line at the common turnout rate used in the simulation
        ax[i,j].axhline(np.mean(data.p),alpha=0.25,lw=4,color='silver')
        sns.despine(ax=ax[i,j])

# make the plot prettier
plt.tight_layout()
plt.show()

Clearly there is more going on here than just sampling variation, but it does explain a good portion of overall variability in turnout rates across the 85 markets. So let's get to building a model.

## What is our model?

Let $i = 1, \ldots, N$ index the $N = 85$ markets in the data, and let $s = 1, \ldots, S$ index the $S = 40$ strata.

We are modeling a random variable, $\theta_i$, the probability of turnout in market $i$. The likelihood is binomial because each market contributes a series of Bernoulli trials, one for each registered voter who is 18 or 19 years old: "success" occurs when the registered voter votes, "failure" when they do not.

Translating to the binomial distribution:
* $n$ in the dataframe is the number of trials, i.e., the number of registered voters;
* $r$ in the dataframe is the number of registered voters who voted;
* $p$ in the dataframe is the observed rate of success, $r/n$, in market $i$.

The complication here is that some markets were treated with _Rock the Vote_ ads. Let $T_i$ be an indicator that equals 1 when market $i$ was treated and 0 when it was not. In the dataframe this variable is ``treated``.

Before we build the treatment into the model, we need to transform our random variable, $\theta$. Why? $\theta$ must lie in $(0,1)$, but we want to decompose $\theta$ into additive parts. That is much easier on an unconstrained scale, where adding components can never push the result outside $(0,1)$. So we write $\theta = f(\alpha)$, where $f(\cdot)$ takes an $\alpha$ anywhere on the real line and maps it to a value in $(0,1)$.

This transformation allows us to construct $\alpha$ out of components that capture various parts of the system. Here are some ideas:
1. Some markets are treated and some are not. We can specify a component of $\alpha$ that captures this.
2. Markets fall into different strata (groups of markets with similar turnout rates in past elections). How might this affect $\theta$?
3. If we knew which region each stratum is in, we could account for similarities within a region.
4. Suppose we knew past Republican vote totals for each market. Would these have an effect on $\theta$?

The only things we know about a market are which stratum it is in and whether it was treated. So to start, let's assume that $\alpha_i$ (the log-odds of turnout in market $i$) is a common baseline plus the effect of the treatment:

$\alpha_i = \alpha + \delta T_i$


#### Likelihood

$r_i \sim \textrm{Binomial}(n_i,\theta_i)$

where $\theta_i = f(\alpha_i)$
and $\alpha_i = \alpha + \delta T_i$.

What is $f(\cdot)$? It is the inverse-logit function:

$f(x) = \textrm{logit}^{-1}(x) = \frac{\exp(x)}{1 + \exp(x)}$
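Inverting $f$ shows which scale the model is linear on. Solving $\theta = f(\alpha)$ for $\alpha$:

$$
\theta = \frac{e^{\alpha}}{1 + e^{\alpha}}
\;\Longleftrightarrow\; \theta + \theta e^{\alpha} = e^{\alpha}
\;\Longleftrightarrow\; e^{\alpha} = \frac{\theta}{1 - \theta}
\;\Longleftrightarrow\; \alpha = \log\frac{\theta}{1 - \theta} = \textrm{logit}(\theta).
$$

So $\alpha_i$ is the log-odds of turnout in market $i$, and because $\alpha_i = \alpha + \delta T_i$,

$$
\delta = \textrm{logit}(\theta_{\textrm{treated}}) - \textrm{logit}(\theta_{\textrm{untreated}})
= \log \frac{\theta_{\textrm{treated}} / (1 - \theta_{\textrm{treated}})}{\theta_{\textrm{untreated}} / (1 - \theta_{\textrm{untreated}})},
$$

the log of the odds ratio of turnout in treated versus untreated markets.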

In [None]:
def inv_logit(x):
    return np.exp(x) / (1 + np.exp(x))
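The `inv_logit` above works well over the range plotted below, but `np.exp(x)` overflows for large positive `x`, giving `inf / inf = nan`. As a sketch of a more robust alternative, SciPy's `scipy.special.expit` computes the same function stably, and `scipy.special.logit` is its inverse:

```python
import numpy as np
from scipy.special import expit, logit

def inv_logit_naive(x):
    # same formula as inv_logit above: exp(x) / (1 + exp(x))
    return np.exp(x) / (1 + np.exp(x))

x = np.array([-800.0, -5.0, 0.0, 5.0, 800.0])

# exp(800) overflows to inf, and inf / inf is nan; silence the warnings here
with np.errstate(over='ignore', invalid='ignore'):
    naive = inv_logit_naive(x)

# SciPy's inverse logit stays finite for large |x|
stable = expit(x)

print(naive)
print(stable)

# logit undoes expit for moderate inputs
x_mid = np.linspace(-10, 10, 5)
print(np.allclose(logit(expit(x_mid)), x_mid))
```

The two versions agree everywhere except at `x = 800`, where the naive formula returns `nan` and `expit` returns 1.0.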

In [None]:
x = np.linspace(-10,10,100)
y = inv_logit(x)

plt.plot(x,y)
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
sns.despine();
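To make the likelihood concrete, here is a minimal forward simulation of it: pick $\alpha$ and $\delta$, build $\alpha_i = \alpha + \delta T_i$, push it through the inverse logit, and draw $r_i \sim \textrm{Binomial}(n_i, \theta_i)$. The market sizes, treatment indicators, and parameter values are hypothetical, not taken from the Rock the Vote data; `scipy.special.expit` plays the role of `inv_logit`.

```python
import numpy as np
from scipy.special import expit  # inverse logit

rng = np.random.default_rng(seed=2)

# hypothetical inputs for five markets (NOT the Rock the Vote data)
n = np.array([120, 450, 80, 300, 1000])   # registered 18-19 year olds per market
T = np.array([1, 0, 0, 1, 0])             # 1 = treated, 0 = control

# hypothetical parameter values on the log-odds scale
alpha, delta = 0.1, 0.1

alpha_i = alpha + delta * T     # linear predictor for each market
theta_i = expit(alpha_i)        # turnout probability in (0, 1)
r = rng.binomial(n, theta_i)    # simulated number of voters who turned out

print(np.round(theta_i, 3))
print(r, np.round(r / n, 3))
```

With only two parameters, every treated market shares one value of $\theta_i$ and every untreated market shares another; all variation in the simulated rates $r_i/n_i$ beyond those two values comes from binomial sampling.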

#### Priors

What are our parameters?

* $\alpha$ the common level of turnout
* $\delta$ the common treatment effect.

Any ideas for what prior to put on these values? What do we know?

* $\alpha$ and $\delta$ (and therefore $\alpha_i$) can be anywhere on the real line, so we do not want to restrict ourselves to distributions with positive support.
* They are continuous, so we need to focus on continuous probability distributions.

Let's start with a normal distribution for each. Do we have any reason _a priori_ to give either of these normal distributions a non-zero mean? Unless we can come up with one, let's set both means to zero and turn to the variances.

To get a sense for what happens when we go from a normal distribution through the inverse-logit function and out to $\theta \in (0,1)$, let's simulate some values.

In [None]:
# how does that translate to theta?

# Initialise the figure and a subplot axes.
num_rows = 2
num_cols = 2
fig, ax = plt.subplots(num_rows, num_cols, figsize=(10, 5))

# overall title
fig.suptitle('From normal to inverse logit',y=1.025,fontsize=14,fontweight='bold',fontfamily='serif')

# sample from a normal(0,2)
norm_samps = sp.stats.norm.rvs(loc=0,scale=2,size=1000)

sns.kdeplot(norm_samps,shade=True,ax=ax[0,0])
sns.despine(ax=ax[0,0])
ax[0,0].set_title('x ~ normal(0,2)',fontfamily='serif')
ax[0,0].set_xlim(-10,10)

sns.kdeplot(inv_logit(norm_samps),shade=True,ax=ax[0,1])
sns.despine(ax=ax[0,1])
ax[0,1].set_title('theta = invlogit(x)',fontfamily='serif')
ax[0,1].set_xlim(0,1)

# sample from a normal(0,1)
norm_samps = sp.stats.norm.rvs(loc=0,scale=1,size=1000)

sns.kdeplot(norm_samps,shade=True,ax=ax[1,0])
sns.despine(ax=ax[1,0])
ax[1,0].set_title('x ~ normal(0,1)',y=0.92,fontfamily='serif')
ax[1,0].set_xlim(-10,10)

sns.kdeplot(inv_logit(norm_samps),shade=True,ax=ax[1,1])
sns.despine(ax=ax[1,1])
ax[1,1].set_title('theta = invlogit(x)',y=0.92,fontfamily='serif')
ax[1,1].set_xlim(0,1)

# make the plot prettier
plt.tight_layout()
plt.show();

Based on these simulations, do we want to have large variances on our normal distributions? The larger we make them, the more prior weight we put on extreme values of $\theta$. Based on prior knowledge about overall election turnout, we can make a case for smaller variances.
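The same point can be made exactly rather than by simulation. This sketch computes the prior probability that $\theta$ lands near 0 or 1 when $x \sim \textrm{Normal}(0, \sigma)$ and $\theta = \textrm{logit}^{-1}(x)$, with $\sigma$ as the standard deviation (the `scale` argument used in the simulations above). The cutoffs 0.05 and 0.95 are an arbitrary choice for illustration, and `prior_mass_near_extremes` is a hypothetical helper name.

```python
import numpy as np
from scipy.special import logit
from scipy.stats import norm

def prior_mass_near_extremes(sigma, eps=0.05):
    """P(theta < eps or theta > 1 - eps) when x ~ Normal(0, sigma) and theta = inv_logit(x)."""
    # theta > 1 - eps exactly when x > logit(1 - eps), since inv_logit is increasing
    cutoff = logit(1 - eps)
    # the normal prior is symmetric about 0, so the two tails have equal mass
    return 2 * norm.sf(cutoff, loc=0, scale=sigma)

for sigma in (1, 2, 5, 10):
    print(f"sigma = {sigma:>2}: P(theta < 0.05 or theta > 0.95) = {prior_mass_near_extremes(sigma):.3f}")
```

With $\sigma = 1$ only a fraction of a percent of the prior mass falls outside $[0.05, 0.95]$; with $\sigma = 10$ most of it does, which is hard to defend for turnout rates.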

Let's check this with a prior predictive simulation in Stan.

In [None]:
# what's in this stan model anyway?
with open('rock_the_vote_prior.stan', 'r') as f:
    print(f.read())

In [None]:
# compile Stan model
pm = pystan.StanModel(file="rock_the_vote_prior.stan")

In [None]:
prior_data = {'N':len(data),'reg_vtr':data.n,'votes':data.r,'T':data.treated}

In [None]:
# conduct MCMC using Stan
pr_draws = pm.sampling(data=prior_data,iter=1000, chains=4)

In [None]:
# give us a dictionary containing prior predictive draws for each parameter in the model
pr_pd = pr_draws.extract(permuted=True)

In [None]:
pr_pd.keys()

In [None]:
pr_pd['sim_votes'].shape

In [None]:
?? np.random.choice

In [None]:
# pick random draws (rows of sim_votes), not markets
rd_idx = np.random.choice(pr_pd['sim_votes'].shape[0],num_rows*num_cols,replace=False)

In [None]:
rd_idx

In [None]:
# Initialise the figure and a subplot axes.
num_rows = 3
num_cols = 3
fig, ax = plt.subplots(num_rows, num_cols, figsize=(12, 12))

# overall title
fig.suptitle('Prior predictive checking: flip book',
             y=1.05,fontsize=16,fontweight='bold',fontfamily='serif')

# pick random draws (rows of sim_votes), not markets
rd_idx = np.random.choice(pr_pd['sim_votes'].shape[0],num_rows*num_cols,replace=False)

c = 0
for i in range(num_rows):
    for j in range(num_cols):
        # generate samples
        sim_voters = pr_pd['sim_votes'][rd_idx[c],:] / data.n
        
        # plot simulated and actual
        ax[i,j].scatter(data.n,sim_voters,s=12,color='cornflowerblue')
        ax[i,j].scatter(data.n,data.p,s=6,alpha=0.75,color='salmon')
        sns.despine(ax=ax[i,j])
        c += 1

# make the plot prettier
plt.tight_layout()
plt.show()

## Estimate the model

In [None]:
# compile Stan model
sm = pystan.StanModel(file="rock_the_vote.stan")

In [None]:
stan_data = {'N':len(data),'reg_vtr':data.n,'votes':data.r,'T':data.treated}

In [None]:
# conduct MCMC using Stan
pst_draws = sm.sampling(data=stan_data,iter=1000, chains=4)

In [None]:
# give us a dictionary containing posterior draws for each parameter in the model
pst_pd = pst_draws.extract(permuted=True)

In [None]:
pst_pd.keys()

In [None]:
print(pst_draws)

Looking at the $\theta$'s, the model estimates that turnout in markets treated with _Rock the Vote_ advertisements was about 2 percentage points higher than in untreated markets: 0.55 versus 0.53.

## Posterior Checks

In [None]:
# Initialise the figure and a subplot axes.
num_rows = 4
num_cols = 2
fig, ax = plt.subplots(num_rows, num_cols, figsize=(16, 16))

# overall title
fig.suptitle('Graphical depictions of the posterior',y=1.025,fontsize=18,fontfamily='serif')

# ___ROW ONE___
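# **trace plots**
# note: extract(permuted=True) merges the chains and shuffles the draws,
# so these show the pooled draws in random order, not each chain's sequence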
ax[0,0].plot(pst_pd['alpha'],lw=1,alpha=0.75,color='cornflowerblue')
ax[0,0].set_title('Trace plot of alpha',fontsize=12,fontfamily='serif')
ax[0,1].plot(pst_pd['delta'],lw=1,alpha=0.75,color='cornflowerblue')
ax[0,1].set_title('Trace plot of delta',fontsize=12,fontfamily='serif')

# ___ROW TWO___
# **joint distribution** of parameters
# pass x and y as keywords; newer seaborn versions reject a positional second variable
sns.kdeplot(x=pr_pd['alpha'], y=pr_pd['delta'],
            color='cornflowerblue', shade=True, shade_lowest=False,ax=ax[1,0])
sns.kdeplot(x=pst_pd['alpha'], y=pst_pd['delta'],
            color="salmon", shade=True, shade_lowest=False,ax=ax[1,1])

ax[1,0].set_title('Prior joint distribution of parameters',fontsize=12,fontfamily='serif')
ax[1,0].set_xlim(-2,2)
ax[1,0].set_ylim(-2,2)
ax[1,1].set_title('Posterior joint distribution of parameters',fontsize=12,fontfamily='serif')
ax[1,1].set_xlim(-2,2)
ax[1,1].set_ylim(-2,2)

# ___ROW THREE___
# prior and posterior of **alpha**
sns.kdeplot(pr_pd['alpha'],shade=True, lw=3,color='cornflowerblue',shade_lowest=False,ax=ax[2,0])
sns.kdeplot(pst_pd['alpha'],shade=True, lw=3,color='salmon',shade_lowest=False,ax=ax[2,1])

ax[2,0].set_title('Prior distribution of alpha',fontsize=12,fontfamily='serif')
ax[2,0].set_xlim(-2,2)
ax[2,1].set_title('Posterior distribution of alpha',fontsize=12,fontfamily='serif')
ax[2,1].set_xlim(-2,2)

# ___ROW FOUR___
# prior and posterior of **delta**
sns.kdeplot(pr_pd['delta'],shade=True, lw=3,color='cornflowerblue',shade_lowest=False,ax=ax[3,0])
sns.kdeplot(pst_pd['delta'],shade=True, lw=3,color='salmon',shade_lowest=False,ax=ax[3,1])

ax[3,0].set_title('Prior distribution of delta',fontsize=12,fontfamily='serif')
ax[3,0].set_xlim(-2,2)
ax[3,1].set_title('Posterior distribution of delta',fontsize=12,fontfamily='serif')
ax[3,1].set_xlim(-2,2)


# make the plot prettier
plt.tight_layout()
plt.show()

Now let's plot predicted against actual. We will sample from the posteriors for $\theta_i$ and then compare the draws against the observed turnout rates.

In [None]:
# Initialise the figure and a subplot axes.
num_rows = 3
num_cols = 3
fig, ax = plt.subplots(num_rows, num_cols,sharex=True, sharey=True, figsize=(12, 12))

# overall title
fig.suptitle('Posterior predictive checking: flip book',
             y=1.05,fontsize=16,fontweight='bold',fontfamily='serif')

c = 0
for i in range(num_rows):
    for j in range(num_cols):
        # draw theta for each market from a randomly chosen posterior iteration
        n_draws, n_mkts = pst_pd['theta'].shape
        row_idx = np.random.randint(low=0,high=n_draws,size=n_mkts)
        rnd_post = pst_pd['theta'][row_idx, np.arange(n_mkts)]
        ppc_df = pd.DataFrame({'sim_theta':rnd_post,
                               'obs_theta':data.p.values,
                               'treated':data.treated.values})

        # plot data
        sns.scatterplot(x="obs_theta", y="sim_theta", hue="treated", data=ppc_df,ax=ax[i,j])
  
# make the plot prettier
plt.tight_layout()
plt.show()
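The flip book above plots draws of $\theta_i$ against the observed rates. A stricter posterior predictive check would also add binomial sampling noise, drawing $r_i^{\textrm{rep}} \sim \textrm{Binomial}(n_i, \theta_i)$ for each posterior draw and asking whether the observed rate falls inside the replicated spread. A minimal sketch, with hypothetical stand-in arrays in place of `pst_pd['theta']`, `data.n`, and `data.r`:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# hypothetical stand-ins: 2000 posterior draws of theta for 4 markets,
# plus market sizes and observed votes (NOT the Rock the Vote data)
theta_draws = rng.beta(53, 47, size=(2000, 4))
n = np.array([60, 250, 900, 3000])
r_obs = np.array([40, 120, 470, 1580])

# one replicated dataset per posterior draw: r_rep[s, i] ~ Binomial(n[i], theta_draws[s, i])
r_rep = rng.binomial(n, theta_draws)
rate_rep = r_rep / n

# central 90% posterior predictive interval for each market's turnout rate
lower, upper = np.percentile(rate_rep, [5, 95], axis=0)
covered = (r_obs / n >= lower) & (r_obs / n <= upper)
print(np.round(lower, 3), np.round(upper, 3), covered)
```

Because the replicated rates include sampling noise, their intervals are wider for small markets, which matches the funnel shape in the turnout-versus-market-size scatter plot.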

The model comes nowhere near matching the observed data. With only two parameters it is heavily constrained, but there are easy ways to dramatically improve these fits! Stay tuned.

In case you still want to know: did _Rock the Vote_ work?

In [None]:
data.treated[:5]

In [None]:
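# since the model has only two estimated rates of turnout,
# we can compare the posterior of theta for one untreated
# market and one treated market
untreated_idx = np.where(data.treated == 0)[0][0]
treated_idx = np.where(data.treated == 1)[0][0]

plt.figure(figsize=(8,5))
sns.kdeplot(pst_pd['theta'][:,untreated_idx],shade=True,label='untreated')
sns.kdeplot(pst_pd['theta'][:,treated_idx],shade=True,label='treated')
plt.legend()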
plt.xlabel('voter turnout');
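Comparing two densities by eye is one answer; the posterior draws also support direct summaries. A minimal sketch, assuming 1-D arrays of draws shaped like `pst_pd['alpha']` and `pst_pd['delta']` (replaced here by hypothetical normal draws, not the fitted results): the posterior probability that $\delta > 0$, and the implied difference in turnout probability, $\textrm{logit}^{-1}(\alpha + \delta) - \textrm{logit}^{-1}(\alpha)$, computed draw by draw.

```python
import numpy as np
from scipy.special import expit  # inverse logit

rng = np.random.default_rng(seed=4)

# hypothetical stand-ins for pst_pd['alpha'] and pst_pd['delta'] (NOT fitted values)
alpha_draws = rng.normal(loc=0.12, scale=0.01, size=2000)
delta_draws = rng.normal(loc=0.08, scale=0.02, size=2000)

# posterior probability that the treatment raised the log-odds of turnout
prob_positive = np.mean(delta_draws > 0)

# treatment effect on the probability scale, computed draw by draw
effect_pp = expit(alpha_draws + delta_draws) - expit(alpha_draws)
lo, hi = np.percentile(effect_pp, [2.5, 97.5])

print(f"P(delta > 0) = {prob_positive:.3f}")
print(f"turnout difference: mean {effect_pp.mean():.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

Transforming each draw before summarizing matters because the inverse logit is nonlinear: plugging posterior means of $\alpha$ and $\delta$ into the formula does not in general give the posterior mean of the difference.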