In [1]:
from __future__ import division, print_function
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

import math
import numpy as np
import numpy.random
import scipy as sp
import scipy.stats

from ballot_comparison import findNmin_ballot_comparison_rates
from hypergeometric import trihypergeometric_optim, simulate_ballot_polling_power

# Example of a hybrid audit

There are two strata. One contains every CVR county and the other contains every no-CVR county.
There were 110,000 ballots cast in the election, 100,000 in the CVR stratum and 10,000 in the no-CVR stratum.
In the CVR stratum, there were 45,500 votes reported for candidate A, 49,500 votes for candidate B, and 5,000 invalid ballots.
In the no-CVR stratum, there were 7,500 votes reported for A, 1,500 votes for B, and 1,000 invalid ballots.
A won overall, with 53,000 votes to B's 51,000, but lost the CVR stratum.

The reported vote margin between A and B is 2,000 votes. Dividing by the total number of ballots cast gives a "diluted margin" of $2000/110000 \approx 1.8\%$.

For any $\lambda \in [0, 1]$, the reported outcome of the election is correct if the overstatement of the margin in the CVR stratum is less than $2000\lambda$ votes and the overstatement of the margin in the no-CVR stratum is less than $2000(1-\lambda)$ votes.
For this example, we set $\lambda = 0.3$ (`lambda1` in the code below), which allocates most of the allowable overstatement (1,400 of the 2,000 votes) to the no-CVR stratum.
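As a quick check of the arithmetic above, the following sketch recomputes the reported totals, the diluted margin, and the overstatement allowed in each stratum for a given $\lambda$. The dictionary layout and the helper `allowed_overstatement` are illustrative names, not part of the notebook's modules.

```python
# Reported tallies from the text; all names here are illustrative.
reported = {
    "cvr": {"A": 45500, "B": 49500, "invalid": 5000},
    "no_cvr": {"A": 7500, "B": 1500, "invalid": 1000},
}

total_ballots = sum(sum(s.values()) for s in reported.values())
votes_a = sum(s["A"] for s in reported.values())
votes_b = sum(s["B"] for s in reported.values())
overall_margin = votes_a - votes_b
diluted_margin = overall_margin / float(total_ballots)


def allowed_overstatement(lam, margin=overall_margin):
    """Split the overall margin into (CVR, no-CVR) overstatement allowances."""
    return lam * margin, (1 - lam) * margin


print(total_ballots, votes_a, votes_b, overall_margin)
print("diluted margin: {:.2%}".format(diluted_margin))
# Show how the split moves as lambda changes.
for lam in (0.1, 0.3, 0.5):
    print(lam, allowed_overstatement(lam))
```

Smaller values of $\lambda$ tolerate less overstatement in the CVR stratum and more in the no-CVR stratum.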

We want to limit the risk of certifying an incorrect outcome to at most $\alpha=10\%$.
We allocate risk unequally between the two strata: $\alpha_1 = 3\%$ in the CVR stratum and $\alpha_2 = 7\%$ in the no-CVR stratum. Because the samples from the two strata are independent, this gives an overall risk limit of $1-(1-0.03)(1-0.07) \approx 9.79\% < 10\%$.
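The combined figure can be checked directly. This is a minimal sketch of the expression in the text; the helper name `combined_risk_limit` is hypothetical.

```python
def combined_risk_limit(alpha_1, alpha_2):
    """Evaluate 1 - (1 - alpha_1)(1 - alpha_2), the expression used in the text."""
    return 1 - (1 - alpha_1) * (1 - alpha_2)


overall = combined_risk_limit(0.03, 0.07)
print("overall risk limit: {:.4f}".format(overall))

# Other splits of the same 10% budget give slightly different totals.
for a1, a2 in [(0.05, 0.05), (0.01, 0.09)]:
    print(a1, a2, round(combined_risk_limit(a1, a2), 4))
```

Any split with $\alpha_1 + \alpha_2 = 10\%$ stays below the 10% target, because the product term $\alpha_1 \alpha_2$ is subtracted.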

We test the following pair of null hypotheses, using independent samples from the two strata:

* the overstatement in the CVR stratum is at least $2000\lambda$. We test this at significance level (risk limit) $\alpha_1$ using a ballot-level comparison audit

* the overstatement in the no-CVR stratum is at least $2000(1-\lambda)$. We test this at significance level (risk limit) $\alpha_2$ using a ballot-polling audit

If either null is not rejected, we hand count the corresponding stratum completely, adjust the null
in the other stratum to reflect the now-known tally in the hand-counted stratum, and then determine whether
more auditing is needed in the stratum that was not fully hand counted.



In [2]:
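lambda1 = 0.3          # share of the overall margin allowed as overstatement in the CVR stratum
lambda2 = 1-lambda1    # share allowed in the no-CVR stratum
alpha1 = 0.03          # risk limit, CVR stratum
alpha2 = 0.07          # risk limit, no-CVR stratum
N1 = 100000            # ballots cast, CVR stratum
N2 = 10000             # ballots cast, no-CVR stratum
N_w1 = 45500           # reported votes for A (winner), CVR stratum
N_l1 = 49500           # reported votes for B (loser), CVR stratum
N_w2 = 7500            # reported votes for A, no-CVR stratum
N_l2 = 1500            # reported votes for B, no-CVR stratum
margin = (N_w1 + N_w2 - N_l1 - N_l2)  # overall reported margin, in votes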

# CVR stratum

We compute the sample size needed to confirm the election outcome, for a number of assumed rates of error in the population of ballots.

We take the chosen $\lambda$ from above and plug it in as the parameter `null_lambda` in the function below.

We set $\gamma = 1.03905$ as in "A Gentle Introduction to Risk-limiting Audits."

In [3]:
# Assuming that the audit will find no errors
findNmin_ballot_comparison_rates(alpha=alpha1, gamma=1.03905, 
                                r1=0, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1, null_lambda=lambda1)

1213.0

In [4]:
# Assuming that the audit will find 1-vote overstatements at rate 0.1%
findNmin_ballot_comparison_rates(alpha=alpha1, gamma=1.03905, 
                                r1=0.001, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1, null_lambda=lambda1)

1569.0
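The two sample sizes above are consistent with a Kaplan-Markov style calculation, in which each sampled ballot multiplies the $p$-value by a factor that depends on the error observed on that ballot. The sketch below is a standalone reconstruction under that assumption, not the source of `findNmin_ballot_comparison_rates`. The name `sample_size_from_rates` is hypothetical, and reading `s1`, `r2`, and `s2` as the rates of 1-vote understatements, 2-vote overstatements, and 2-vote understatements is an assumption (only `r1` is described in the notebook).

```python
import math


def sample_size_from_rates(alpha, gamma, r1, s1, r2, s2,
                           reported_margin, N, null_lambda=1.0):
    """Smallest n at which the expected p-value drops to alpha (a sketch).

    r1, r2: assumed rates of 1- and 2-vote overstatements per sampled ballot.
    s1, s2: assumed rates of 1- and 2-vote understatements per sampled ballot.
    null_lambda: fraction of the overall margin allocated to this stratum.
    """
    # Log of the per-ballot factor when the ballot shows no discrepancy.
    base = math.log(1 - null_lambda * reported_margin / float(N) / (2 * gamma))
    # Expected log factor per ballot once discrepancies occur at the given rates.
    log_factor = (base
                  - r1 * math.log(1 - 1 / (2 * gamma))
                  - r2 * math.log(1 - 1 / gamma)
                  - s1 * math.log(1 + 1 / (2 * gamma))
                  - s2 * math.log(1 + 1 / gamma))
    if log_factor >= 0:
        return float("nan")  # errors too frequent: the p-value never shrinks
    return int(math.ceil(math.log(alpha) / log_factor))


# Inputs taken from the cells above.
print(sample_size_from_rates(0.03, 1.03905, 0, 0, 0, 0, 2000, 100000, 0.3))
print(sample_size_from_rates(0.03, 1.03905, 0.001, 0, 0, 0, 2000, 100000, 0.3))
```

With the notebook's inputs this sketch gives 1213 and 1569, matching the outputs above. Overstatements push the required sample size up because each one partly undoes the evidence accumulated from error-free ballots.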

# No-CVR stratum

Below, we compute the sample size $n$ needed to confirm the election outcome.

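Define
$$
    c = \text{reported margin in the stratum} - \lambda_2 \times \text{overall reported margin},
$$
where $\lambda_2 = 1 - \lambda$ is the share of the overall margin allocated to the no-CVR stratum.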

The reported margin in the stratum could be large or small, but it is known. 

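$c$ defines the null hypothesis. We test the null that the actual margin in the stratum is less than or equal to $c$: $A_{w, 2} - A_{\ell, 2} \leq c$. Here $A_{w,2}$ and $A_{\ell,2}$ denote the actual numbers of votes for the reported winner and loser in the no-CVR stratum; their sum $A_{w, 2}+A_{\ell,2}$ is an unknown nuisance parameter.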

In practice, we will maximize the $p$-value over all possible pairs $(A_{w,2}, A_{\ell, 2})$ in the null.
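To see why $c$ is the right threshold, write $V_{w,2}$ and $V_{\ell,2}$ for the reported tallies in the no-CVR stratum and $V$ for the overall reported margin (notation introduced here for illustration only). The overstatement of the margin in the stratum is the reported margin minus the actual margin, so the condition that the overstatement is at least $\lambda_2 V$ rearranges into the null stated above:

$$
\begin{aligned}
(V_{w,2} - V_{\ell,2}) - (A_{w,2} - A_{\ell,2}) &\geq \lambda_2 V \\
\iff\quad A_{w,2} - A_{\ell,2} &\leq (V_{w,2} - V_{\ell,2}) - \lambda_2 V = c.
\end{aligned}
$$

With `lambda2 = 0.7` from the code cell above, $c = (7500 - 1500) - 0.7 \times 2000 = 4600$ votes.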

In [5]:
c = (N_w2-N_l2) - lambda2*margin
print("margin adjusted for allowable overstatement is ", c)
simulate_ballot_polling_power(N_w=N_w2, N_l=N_l2, N=N2, 
                              null_margin=c, n=250, alpha=alpha2,
                              reps=10000, verbose=False)

margin adjusted for allowable overstatement is  4600.0
threshold margin is  136


0.8933

In [7]:
c = (N_w2-N_l2) - lambda2*margin
print("margin adjusted for allowable overstatement is ", c)
simulate_ballot_polling_power(N_w=N_w2, N_l=N_l2, N=N2, 
                              null_margin=c, n=350, alpha=alpha2,
                              reps=10000, verbose=False)

margin adjusted for allowable overstatement is  4600.0
threshold margin is  186


0.96
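For intuition about what the simulation estimates, the sketch below draws ballot-polling samples without replacement from the reported no-CVR tallies and counts how often the sample margin reaches the printed threshold. It takes the thresholds 136 and 186 as given and assumes the test rejects when the number of sampled votes for A minus the number for B is at least the threshold. Both that rejection rule and the function name `estimate_power` are assumptions; this is not the implementation in `simulate_ballot_polling_power`. It relies on `Generator.multivariate_hypergeometric`, available in NumPy 1.18 and later.

```python
import numpy as np


def estimate_power(n_w, n_l, n_total, n, threshold, reps=10000, seed=0):
    """Monte Carlo estimate of P(sample margin >= threshold).

    Samples n ballots without replacement from a population with n_w votes
    for the winner, n_l for the loser, and the remainder invalid.
    """
    rng = np.random.default_rng(seed)
    colors = [n_w, n_l, n_total - n_w - n_l]
    # Each row holds the (winner, loser, invalid) counts in one simulated sample.
    draws = rng.multivariate_hypergeometric(colors, n, size=reps)
    sample_margin = draws[:, 0] - draws[:, 1]
    return float(np.mean(sample_margin >= threshold))


print(estimate_power(7500, 1500, 10000, n=250, threshold=136))
print(estimate_power(7500, 1500, 10000, n=350, threshold=186))
```

If the assumed rejection rule is right, the estimates should land near the 0.8933 and 0.96 reported above, up to Monte Carlo error.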