# Modeling elections

In [1]:
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
import pystan

## Data

The `electoral_votes` variable is a dictionary containing the number of Electoral College votes for each state. For example
```
  >>> electoral_votes['Indiana']
  11
```
Data from [Wikipedia: United_States_Electoral_College](https://en.wikipedia.org/wiki/United_States_Electoral_College)

The `survey_results` variable is a dictionary mapping from states to an array of survey results for each candidate. Each row in a survey results array represents one survey and each column represents one candidate. There are 4 columns, representing Clinton, Trump, Johnson, and Stein in that order. In the example below, Clinton got 340 votes in the first survey, Trump got 258, Johnson got 27, and Stein got 13.
```
  >>> survey_results['Indiana']
  array([[340, 258,  27,  13],
         [240, 155,   5,   5],
         [235, 155,  50,  20],
         [308, 266,  49,  35],
         [222, 161,  80,  30]])
```
Data from [Wikipedia: Statewide opinion polling for the United States presidential election, 2016](https://en.wikipedia.org/wiki/Statewide_opinion_polling_for_the_United_States_presidential_election,_2016)


In [17]:
electoral_votes = {
    'Alabama': 9,
    'Alaska': 3,
    'Arizona': 11,
    'Arkansas': 6,
    'Colorado': 9,
}

survey_results = {
    'Alabama': np.array([], dtype=int).reshape(0, 4),
    'Alaska': np.array([400 * np.array([.47, .43, .07, .03]), 500 * np.array([.36, .37, .07, .03]), 500 * np.array([.34, .37, .10, .02]), 660 * np.array([.31, .36, .18, .06])], dtype=int),
    'Arizona': np.array([392 * np.array([.45, .47, .05, .02]), 550 * np.array([.39, .47, .04, .03]), 719 * np.array([.40, .45, .09, .03]), 769 * np.array([.44, .49, .05, .01]), 2229 * np.array([.45, .44, .07, .01]), 700 * np.array([.43, .47, .02, .02]), 550 * np.array([.41, .45, .03, .01]), 994 * np.array([.42, .44, .04, .01]), 550 * np.array([.40, .42, .05, .02]), 2385 * np.array([.48, .46, .05, .01]), 401 * np.array([.45, .46, .04, .01]), 550 * np.array([.41, .41, .05, .02]), 1538 * np.array([.39, .44, .06, .02]), 713 * np.array([.43, .38, .06, .01]), 400 * np.array([.39, .37, .08, .03]), 600 * np.array([.44, .42, .09, .01]), 718 * np.array([.42, .42, .05, .01]), 484 * np.array([.41, .46, .09, .01]), 649 * np.array([.38, .40, .12, .03])], dtype=int),
    'Arkansas': np.array([463 * np.array([.33, .56, .04, .02]), 831 * np.array([.34, .55, .03, .01]), 600 * np.array([.29, .57, .05, .03])], dtype=int),
    'Colorado': np.array([1150 * np.array([.45, .44, .05, .04]), 500 * np.array([.44, .38, .07, .02]), 550 * np.array([.39, .39, .05, .04]), 750 * np.array([.44, .41, .08, .04]), 685 * np.array([.45, .37, .10, .03]), 400 * np.array([.49, .38, .07, .03]), 602 * np.array([.44, .33, .10, .03]), 694 * np.array([.46, .40, .06, .02]), 784 * np.array([.41, .42, .13, .03]), 991 * np.array([.40, .39, .07, .02]), 644 * np.array([.44, .42, .10, .02]), 540 * np.array([.41, .34, .12, .03]), 600 * np.array([.38, .42, .13, .02]), 704 * np.array([.48, .43, .04, .02]), 605 * np.array([.43, .38, .07, .02]), 997 * np.array([.42, .39, .07, .02])], dtype=int),
}

states = sorted(survey_results.keys())
print 'Modeling', len(states), 'states with', sum(electoral_votes[s] for s in states), 'electoral college votes'

Modeling 5 states with 38 electoral college votes


## Generative model

1. For each state we generate an $\vec{\alpha}$ vector, which defines a Dirichlet distribution over the proportion of votes that go to each of the 4 candidates whenever we do a survey — including the final survey, namely the election itself which we want to predict. The prior over each component of $\vec{\alpha}$ is taken as a Cauchy distribution with location 0 and scale 1. Since the components of $\vec{\alpha}$ are positive, we actually use the positive half-Cauchy distribution.

2. For each survey in a state we generate a probability vector $\vec{p_i} \sim \text{Dirichlet}(\vec{\alpha})$ for the probability that a voter selects each of the 4 candidates.

3. For each survey, we then generate the number of votes going to each candidate as $\vec{k_i} \sim \text{Multinomial}(\vec{p_i})$.

### Tasks

* Use Stan to sample from the posterior distribution over $\alpha$ and visualize your results. There are 10 states, so you will have 10 posteriors.
* The posteriors over $\alpha$ show a lot of variation between different states. Explain the results you get in terms of the model and the data.

In [19]:
stan_code = '''
data {
int<lower=1> K;  // Number of candidates(categories)
int<lower=0> N;  // Number of surveys
real cauchy_loc;  // Hyperparameters for Cauchy
real<lower=0> cauchy_scale;  // Hyperparameters for Cauchy
int<lower=0> x[N,K];  // Survey results(observations)
}

parameters {
vector[K] alpha;  // Concentration parameters for Dirichlet
simplex[K] p[N];  // Probability parameters for Multinomial
}

model {
for (i in 1:K) {
    alpha[i] ~ cauchy(cauchy_loc, cauchy_scale);
}
for (i in 1:N) { // For each survey
    p[i,:] ~ dirichlet(fabs(alpha));
    x[i,:] ~ multinomial(p[i,:]);
}
}
'''

stan_model = pystan.StanModel(model_code=stan_code)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_8f3004cabe1cb659a1b84d3acf11a32f NOW.


In [20]:
for state, obs in survey_results.items():
    data = {
        'K':4,
        'N':len(obs),
        'cauchy_loc': 0.,
        'cauchy_scale': 1.,
        'x': obs
    }
    
    print state
    stan_output = stan_model.sampling(data=data)
    summary = stan_output.stansummary(pars=['alpha'])
    print summary

Alabama




Inference for Stan model: anon_model_8f3004cabe1cb659a1b84d3acf11a32f.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

           mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha[1]   0.17    0.48   9.79 -16.37   -1.1-4.8e-3   1.04   15.2  417.0   1.01
alpha[2]  -0.24    0.37   8.05 -14.57  -1.06  -0.03   0.93  13.31  473.0   1.01
alpha[3]  -0.09    0.31   8.51 -11.07  -1.02-1.9e-3   1.04  12.94  747.0    1.0
alpha[4]   0.07    0.42  11.67  -14.8  -1.07 7.5e-4   1.06  16.61  786.0   1.01

Samples were drawn using NUTS at Mon Oct 22 14:26:45 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
Arizona




Inference for Stan model: anon_model_8f3004cabe1cb659a1b84d3acf11a32f.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

           mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha[1] -88.99    0.98  24.85 -147.6 -102.5 -85.63 -71.56 -49.95  644.0   1.01
alpha[2]  46.97   56.88  83.79 -124.6  -2.09  79.38  100.3 150.56    2.0   3.48
alpha[3]    5.9    7.52  11.08  -17.0   0.04  10.29  12.94  19.11    2.0   3.46
alpha[4]   0.02     2.5   3.65  -5.32  -3.36  -0.05   3.42   5.34    2.0   3.94

Samples were drawn using NUTS at Mon Oct 22 14:27:58 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
Arkansas




Inference for Stan model: anon_model_8f3004cabe1cb659a1b84d3acf11a32f.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

           mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha[1]   1.93   16.03   31.8 -57.14 -14.65  -0.25  15.55  78.68    4.0   1.47
alpha[2]  10.53    12.8  54.92 -94.74 -17.66  14.08  34.44 129.62   18.0   1.15
alpha[3]  -0.13    2.07   4.01  -9.15  -2.06   0.02   1.94   7.58    4.0   1.51
alpha[4]   0.72    1.01   1.91  -3.37 9.1e-3   0.88   1.55   4.65    4.0   1.53

Samples were drawn using NUTS at Mon Oct 22 14:28:18 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
Alaska




Inference for Stan model: anon_model_8f3004cabe1cb659a1b84d3acf11a32f.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

           mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha[1]  -0.28    9.18  14.79 -26.87 -11.49  -0.02  10.59  27.63    3.0   2.06
alpha[2]   0.03     9.6  15.48 -28.35  -11.6  -0.12  11.33  29.21    3.0   2.06
alpha[3]   1.88    2.22   3.71  -5.85  -0.05   2.37   4.11   8.62    3.0   1.84
alpha[4]  -0.76     0.9   1.47  -3.22  -1.65  -1.05   0.03    2.3    3.0   2.01

Samples were drawn using NUTS at Mon Oct 22 14:28:29 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
Colorado




Inference for Stan model: anon_model_8f3004cabe1cb659a1b84d3acf11a32f.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

           mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha[1]  -0.06   64.36  94.22 -140.7 -86.91  -1.41  87.21 135.87    2.0   3.76
alpha[2]   0.25   58.59  85.82 -126.2 -79.21   2.19  79.81 127.12    2.0   3.74
alpha[3]  -0.06   11.84  17.35 -25.97 -16.01  -0.85  15.89  25.34    2.0   3.72
alpha[4] 7.7e-3    4.09    6.0  -8.94  -5.51   0.08   5.54   8.85    2.0   3.65

Samples were drawn using NUTS at Mon Oct 22 14:29:45 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).


## Simulation time

Use the posterior samples to predict the outcome of the presidential elections.

* Predict the probability that each candidate will win each state.
   * Use the posterior $\alpha$ samples to generate posterior predictive samples for $p$ — the proportion of votes each candidate would get in each state in an election.
   * Use these $p$ samples to estimate the probability that each candidate will win each state.
* Predict the probability that each candidate will win the presidential election.
   * Use the posterior predictive probability that each candidate will win each state to generate samples over the total number Electoral College votes each candidate would get in an election.
   * Use the total number of votes to generate samples over who would win the election.