# Hybrid audit examples

In [1]:
from __future__ import division, print_function
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

import math
import numpy as np
import numpy.random
import scipy as sp
import scipy.stats

from ballot_comparison import findNmin_ballot_comparison_rates
from hypergeometric import trihypergeometric_optim, simulate_ballot_polling_power

# Example 1 - medium-sized election, close race

There are two strata. One contains every CVR county and the other contains every no-CVR county.
There were 110,000 ballots cast in the election, 100,000 in the CVR stratum and 10,000 in the no-CVR stratum.

In the CVR stratum, there were 45,500 votes reported for A, 49,500 votes for candidate B, and 5,000 invalid ballots.
In the no-CVR stratum, there were 7,500 votes reported for A, 1,500 votes for B, and 1,000 invalid ballots.
A won overall, with 53,000 votes to B's 51,000, but not in the CVR stratum.
The reported vote margin between A and B is 2,000 votes, a "diluted margin" of $2,000/110,000 = 1.8\%$.


Candidate | Stratum 1 | Stratum 2 | total 
---|---|---|---
A | 45,500 | 7,500 | 53,000
B | 49,500 | 1,500 | 51,000
Ballots | 100,000 | 10,000 | 110,000
Diluted margin | -4% | 60% | 1.8%
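
As a quick check on the table, the diluted margins can be recomputed from the reported tallies. This is a minimal sketch; the helper `diluted_margin` is a name introduced here for illustration and is not part of the audit code.

```python
def diluted_margin(votes_winner, votes_loser, ballots):
    """Reported margin in votes divided by the number of ballots cast."""
    return (votes_winner - votes_loser) / ballots

# Tallies copied from the table above
cvr = diluted_margin(45500, 49500, 100000)
no_cvr = diluted_margin(7500, 1500, 10000)
overall = diluted_margin(45500 + 7500, 49500 + 1500, 100000 + 10000)

print("CVR stratum: {:.1%}, no-CVR stratum: {:.1%}, overall: {:.1%}".format(
    cvr, no_cvr, overall))
```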




For any $\lambda$, the reported outcome of the election is correct if the overstatement of the margin in the CVR stratum is less than $2000\lambda$ votes and if the overstatement of the margin in the no-CVR stratum is less than $2000(1-\lambda)$ votes. 
For this example, we set $\lambda = 0.3$, which allocates most overstatement error to the no-CVR stratum.
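
To make the allocation concrete, the sketch below splits the 2,000-vote reported margin into the overstatement each stratum may absorb. It uses $\lambda = 0.3$, the value assigned to `lambda1` in the code cell for this example; the function name `allowable_overstatement` is hypothetical.

```python
def allowable_overstatement(margin, lam):
    """Split a reported margin (in votes) between the CVR and no-CVR strata."""
    return lam * margin, (1 - lam) * margin

cvr_allowance, no_cvr_allowance = allowable_overstatement(margin=2000, lam=0.3)
print("CVR stratum may absorb up to", cvr_allowance, "votes of overstatement")
print("no-CVR stratum may absorb up to", no_cvr_allowance, "votes of overstatement")
```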

We want to limit the risk of certifying an incorrect outcome to at most $\alpha=10\%$. 
We allocate risk unequally between the two strata: $\alpha_1 = 3\%$ in the CVR stratum and $\alpha_2 = 7\%$ in the no-CVR stratum; this gives an overall risk limit of $1-(1-.03)(1-.07) < 9.8\%$.
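
A small check of the arithmetic in the bound above, with `combined_risk` as an illustrative helper name rather than a function from the audit code:

```python
def combined_risk(alpha1, alpha2):
    """Bound used in the text: 1 - (1 - alpha1)(1 - alpha2)."""
    return 1 - (1 - alpha1) * (1 - alpha2)

overall = combined_risk(0.03, 0.07)
print("overall risk bound: {:.4f}".format(overall))
```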

We test the following pair of null hypotheses, using independent samples from the two strata:

* the overstatement in the CVR stratum is at least $2000\lambda$ votes. We test this at significance level (risk limit) $\alpha_1$ using a ballot-level comparison audit

* the overstatement in the no-CVR stratum is at least $2000(1-\lambda)$ votes. We test this at significance level (risk limit) $\alpha_2$ using a ballot-polling audit

If either null is not rejected, we hand count the corresponding stratum completely, adjust the null
in the other stratum to reflect the now-known tally in the hand-counted stratum, and then determine whether
more auditing is needed in the stratum that was not fully hand counted.



In [2]:
lambda1 = 0.3
lambda2 = 1-lambda1
alpha1 = 0.03
alpha2 = 0.07
N1 = 100000
N2 = 10000
N_w1 = 45500
N_l1 = 49500
N_w2 = 7500
N_l2= 1500
margin = (N_w1 + N_w2 - N_l1 - N_l2)

# CVR stratum

We compute the sample size needed to confirm the election outcome, for a number of assumed rates of error in the population of ballots.

We take the chosen $\lambda$ from above and plug it in as the parameter `null_lambda` in the function below.

We set $\gamma = 1.03905$ as in "A Gentle Introduction to Risk-limiting Audits."

In [3]:
# Assuming that the audit will find no errors
findNmin_ballot_comparison_rates(alpha=alpha1, gamma=1.03905, 
                                r1=0, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1, null_lambda=lambda1)

1213.0

In [4]:
# Assuming that the audit will find 1-vote overstatements at rate 0.1%
findNmin_ballot_comparison_rates(alpha=alpha1, gamma=1.03905, 
                                r1=0.001, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1, null_lambda=lambda1)

1569.0
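
The implementation of `findNmin_ballot_comparison_rates` is not shown in this notebook. The sketch below is an assumption about the kind of calculation involved: a Kaplan-Markov style bound in which each sampled ballot multiplies the $p$-value by $1 - \lambda\mu/(2\gamma)$, where $\mu$ is the reported margin divided by the number of ballots in the stratum, and each discrepancy contributes its own factor. The function `sketch_comparison_sample_size` is illustrative, not the library's code. With the inputs used above it returns 1213 and 1569, in line with the two outputs.

```python
import math

def sketch_comparison_sample_size(alpha, gamma, r1, s1, r2, s2,
                                  reported_margin, N, null_lambda=1.0):
    """Smallest n whose Kaplan-Markov p-value would be at most alpha if
    discrepancies occurred exactly at the assumed rates (illustrative only).

    r1, r2: assumed rates of 1- and 2-vote overstatements per sampled ballot.
    s1, s2: assumed rates of 1- and 2-vote understatements per sampled ballot.
    Returns NaN when the assumed rates are too high for any sample to suffice.
    """
    mu = null_lambda * reported_margin / N
    # Expected change in log p-value per sampled ballot
    log_step = (math.log(1 - mu / (2 * gamma))
                - r1 * math.log(1 - 1 / (2 * gamma))
                - r2 * math.log(1 - 1 / gamma)
                - s1 * math.log(1 + 1 / (2 * gamma))
                - s2 * math.log(1 + 1 / gamma))
    if log_step >= 0:
        return float("nan")
    return math.ceil(math.log(alpha) / log_step)

no_errors = sketch_comparison_sample_size(
    0.03, 1.03905, 0, 0, 0, 0,
    reported_margin=2000, N=100000, null_lambda=0.3)
some_errors = sketch_comparison_sample_size(
    0.03, 1.03905, 0.001, 0, 0, 0,
    reported_margin=2000, N=100000, null_lambda=0.3)
print(no_errors, some_errors)
```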

# No-CVR stratum

Below, we compute the sample size $n$ needed to confirm the election outcome.

Define
$$
    c = \text{reported margin in the stratum} - \lambda_2 \times \text{overall reported margin}.
$$

The reported margin in the stratum could be large or small, but it is known. 

$c$ defines the null hypothesis. We test the null that the actual margin in the stratum is less than or equal to $c$: $A_{w, 2} - A_{\ell, 2} \leq c$. Here, $A_{w, 2}+A_{\ell,2}$ is an unknown nuisance parameter.

In practice, we will maximize the $p$-value over all possible pairs $(A_{w,2}, A_{\ell, 2})$ in the null.
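
To illustrate the nuisance parameter, the sketch below computes $c$ for this example and enumerates the pairs $(A_{w,2}, A_{\ell,2})$ on the boundary of the null, $A_{w,2} - A_{\ell,2} = c$, that fit within the stratum's 10,000 ballots. It assumes $\lambda_2 = 0.7$ (the value implied by the code for this example), it does not compute any $p$-values, and `boundary_pairs` is a hypothetical helper.

```python
def boundary_pairs(c, n_ballots):
    """All integer (A_w, A_l) with A_w - A_l == c and A_w + A_l <= n_ballots."""
    pairs = []
    a_l = max(0, -c)          # keep both tallies non-negative
    while 2 * a_l + c <= n_ballots:
        pairs.append((a_l + c, a_l))
        a_l += 1
    return pairs

stratum_margin = 7500 - 1500          # reported margin in the no-CVR stratum
overall_margin = 2000                 # reported margin in the whole contest
c = int(round(stratum_margin - 0.7 * overall_margin))

pairs = boundary_pairs(c, n_ballots=10000)
print("c =", c)
print("first pair:", pairs[0], "last pair:", pairs[-1], "count:", len(pairs))
```

Each pair corresponds to a different total number of valid votes $A_{w,2}+A_{\ell,2}$, which is why the total acts as a nuisance parameter.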

In [5]:
c = (N_w2-N_l2) - lambda2*margin
print("margin adjusted for allowable overstatement is ", c)
power = simulate_ballot_polling_power(N_w=N_w2, N_l=N_l2, N=N2, 
                              null_margin=c, n=250, alpha=alpha2,
                              reps=10000, verbose=False)
print("In 10,000 simulations with a sample size of 250 ballots, the rate of stopping the audit is ", power)

margin adjusted for allowable overstatement is  4600.0
The critical value of the test is  136
In 10,000 simulations with a sample size of 250 ballots, the rate of stopping the audit is  0.8933


In [6]:
c = (N_w2-N_l2) - lambda2*margin
print("margin adjusted for allowable overstatement is ", c)
power = simulate_ballot_polling_power(N_w=N_w2, N_l=N_l2, N=N2, 
                              null_margin=c, n=450, alpha=alpha2,
                              reps=10000, verbose=False)
print("In 10,000 simulations with a sample size of 450 ballots, the rate of stopping the audit is ", power)

margin adjusted for allowable overstatement is  4600.0
The critical value of the test is  235
In 10,000 simulations with a sample size of 450 ballots, the rate of stopping the audit is  0.9872


# What if we could do a ballot-comparison audit for the entire contest?

With current technology, this isn't possible. We'll use a risk limit of 10% to be consistent with the example above.

In [7]:
alpha = 0.1

In [8]:
# Assuming that the audit will find no errors
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905, 
                                r1=0, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1+N2, 
                                null_lambda=1)

263.0

In [9]:
# Assuming that the audit will find 1-vote overstatements at rate 0.1%
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905, 
                                r1=0.001, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1+N2, 
                                null_lambda=1)

284.0

# Instead, what if we used a simple but inefficient approach?

In Section 2.3 of the paper, we suggest a simple-but-pessimistic approach: sample uniformly from all counties as if one were performing a ballot-level comparison audit everywhere, but treat any ballot selected from a legacy (no-CVR) county as a two-vote overstatement.

In this example, $10,000/110,000 \approx 9\%$ of ballots come from the no-CVR stratum. We find that no sample size suffices (the calculation returns `nan`), so we'd proceed to a full hand count.

In [10]:
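# Assuming that the audit finds no errors in the CVR counties; every ballot
# drawn from a no-CVR county is treated as a 2-vote overstatement, so r2 is
# the no-CVR share of all ballots, N2/(N1+N2)
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905,
                                r1=0, s1=0, r2=N2/(N1+N2), s2=0,
                                reported_margin=margin, N=N1+N2, null_lambda=1)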

nan

If, instead, the margin were a bit larger (in this example, let's say 10,000 votes) and the no-CVR counties made up only 1.2% of total ballots, things would be more favorable.

In [11]:
# Assuming that the audit will find no errors
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905, 
                                r1=0, s1=0, r2=0.012, s2=0,
                                reported_margin=10000, N=N1+N2, null_lambda=1)

430.0
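
One way to see why the first calculation fails and the second succeeds is to ask how high a rate of two-vote overstatements a comparison audit can tolerate before no sample size suffices. The sketch below assumes a Kaplan-Markov style bound in which each sampled ballot contributes a factor $1 - \mu/(2\gamma)$ and each two-vote overstatement a factor $1/(1 - 1/\gamma)$. This is an assumption about the underlying calculation, and `max_tolerable_two_vote_rate` is a hypothetical helper.

```python
import math

def max_tolerable_two_vote_rate(reported_margin, N, gamma=1.03905):
    """Rate of 2-vote overstatements at which the expected log p-value
    per sampled ballot stops decreasing (assumed Kaplan-Markov form)."""
    mu = reported_margin / N
    return math.log(1 - mu / (2 * gamma)) / math.log(1 - 1 / gamma)

N = 110000
tight = max_tolerable_two_vote_rate(2000, N)     # the 2,000-vote margin of Example 1
wide = max_tolerable_two_vote_rate(10000, N)     # the hypothetical 10,000-vote margin

print("margin  2,000: tolerable rate {:.2%}".format(tight))
print("margin 10,000: tolerable rate {:.2%}".format(wide))
```

Under this assumption, treating roughly 9% of sampled ballots as two-vote overstatements is far above the tolerable rate for a 2,000-vote margin, while 1.2% falls just under the tolerable rate for a 10,000-vote margin.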

# Example 2 - large election, large margin

There are two strata. One contains every CVR county and the other contains every no-CVR county.
There were 2 million ballots cast in the election, 1.9 million in the CVR stratum and 100,000 in the no-CVR stratum.

In the CVR stratum, the diluted margin was $21\%$: there were 1,102,000 votes reported for A, 703,000 votes reported for candidate B, and 95,000 invalid ballots.
In the no-CVR stratum, the diluted margin was $-10\%$: there were 42,500 votes reported for A, 52,500 votes for B, and 5,000 invalid ballots.
A won overall, with 1,144,500 votes to B's 755,500, but not in the no-CVR stratum.
The reported vote margin between A and B is 389,000 votes, a "diluted margin" of $389,000/2,000,000 = 19.45\%$.


Candidate | Stratum 1 | Stratum 2 | total 
---|---|---|---
A | 1,102,000 | 42,500 | 1,144,500
B | 703,000 |  52,500 | 755,500
Ballots | 1,900,000 | 100,000 | 2,000,000
Diluted margin | 21% | -10% | 19.45%



For any $\lambda$, the reported outcome of the election is correct if the overstatement of the margin in the CVR stratum is less than $389000\lambda$ votes and if the overstatement of the margin in the no-CVR stratum is less than $389000(1-\lambda)$ votes. 
For this example, we set $\lambda = 0.9$, which allocates most overstatement error to the CVR stratum.

We want to limit the risk of certifying an incorrect outcome to at most $\alpha=5\%$. 
We allocate risk unequally between the two strata.
The total risk is at most $1 - (1-\alpha_1)(1-\alpha_2)$.
One choice is to set $\alpha_1 = 1\%$ and to solve for the value of $\alpha_2$
which makes $1 - (1-\alpha_1)(1-\alpha_2) = \alpha$.
In this case, $\alpha_1=1\%$ and $\alpha_2 = 4.04\%$ achieve the desired risk limit.
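
Solving for $\alpha_2$ is a one-line rearrangement, $\alpha_2 = 1 - (1-\alpha)/(1-\alpha_1)$. A minimal sketch, with `solve_alpha2` as an illustrative name:

```python
def solve_alpha2(alpha, alpha1):
    """alpha2 such that 1 - (1 - alpha1)(1 - alpha2) == alpha."""
    return 1 - (1 - alpha) / (1 - alpha1)

alpha2 = solve_alpha2(alpha=0.05, alpha1=0.01)
print("alpha2 = {:.4%}".format(alpha2))

# Rounding alpha2 down (0.0404) keeps the combined bound at or below 5%
bound_with_rounded_alpha2 = 1 - (1 - 0.01) * (1 - 0.0404)
print("bound with alpha2 = 0.0404: {:.6f}".format(bound_with_rounded_alpha2))
```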

We test the following pair of null hypotheses, using independent samples from the two strata:

* the overstatement in the CVR stratum is at least $389000\lambda$ votes. We test this at significance level (risk limit) $\alpha_1$ using a ballot-level comparison audit

* the overstatement in the no-CVR stratum is at least $389000(1-\lambda)$ votes. We test this at significance level (risk limit) $\alpha_2$ using a ballot-polling audit

If either null is not rejected, we hand count the corresponding stratum completely, adjust the null
in the other stratum to reflect the now-known tally in the hand-counted stratum, and then determine whether
more auditing is needed in the stratum that was not fully hand counted.



In [12]:
lambda1 = 0.9
lambda2 = 1-lambda1
alpha1 = 0.01
alpha2 = 0.0404
N1 = 1900000
N2 = 100000
N_w1 = 1102000
N_l1 = 703000
N_w2 = 42500
N_l2= 52500
margin = (N_w1 + N_w2 - N_l1 - N_l2)

# CVR stratum

We compute the sample size needed to confirm the election outcome, for a number of assumed rates of error in the population of ballots.

We take the chosen $\lambda$ from above and plug it in as the parameter `null_lambda` in the function below.

We set $\gamma = 1.03905$ as in "A Gentle Introduction to Risk-limiting Audits."

In [13]:
# Assuming that the audit will find no errors
findNmin_ballot_comparison_rates(alpha=alpha1, gamma=1.03905, 
                                r1=0, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1, null_lambda=lambda1)

50.0

In [14]:
# Assuming that the audit will find 1-vote overstatements at rate 0.1%
findNmin_ballot_comparison_rates(alpha=alpha1, gamma=1.03905, 
                                r1=0.001, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1, null_lambda=lambda1)

50.0

# No-CVR stratum

Below, we compute the sample size $n$ needed to confirm the election outcome.

Define
$$
    c = \text{reported margin in the stratum} - \lambda_2 \times \text{overall reported margin}.
$$

The reported margin in the stratum could be large or small, but it is known. 

$c$ defines the null hypothesis. We test the null that the actual margin in the stratum is less than or equal to $c$: $A_{w, 2} - A_{\ell, 2} \leq c$. Here, $A_{w, 2}+A_{\ell,2}$ is an unknown nuisance parameter.

In practice, we will maximize the $p$-value over all possible pairs $(A_{w,2}, A_{\ell, 2})$ in the null.

In [15]:
# Assuming that the stratum reported margin is accurate

# We don't know N_w, N_\ell so maximize the p-value over all possibilities.
c = int((N_w2-N_l2) - lambda2*margin)
print("Margin that must be exceeded given the allowable overstatement is ", c)
power = simulate_ballot_polling_power(N_w=N_w2, N_l=N_l2, N=N2, 
                              null_margin=c, n=50, alpha=alpha2,
                              reps=10000, verbose=False)
print("In 10,000 simulations with a sample size of 50 ballots, the rate of stopping the audit is ", power)

Margin that must be exceeded given the allowable overstatement is  -48899
The critical value of the test is  -13
In 10,000 simulations with a sample size of 50 ballots, the rate of stopping the audit is  0.8955


In [16]:
c = int((N_w2-N_l2) - lambda2*margin)
print("Margin that must be exceeded given the allowable overstatement is ", c)
power = simulate_ballot_polling_power(N_w=N_w2, N_l=N_l2, N=N2, 
                              null_margin=c, n=100, alpha=alpha2,
                              reps=10000, verbose=False)
print("In 10,000 simulations with a sample size of 100 ballots, the rate of stopping the audit is ", power)

Margin that must be exceeded given the allowable overstatement is  -48899
The critical value of the test is  -33
In 10,000 simulations with a sample size of 100 ballots, the rate of stopping the audit is  0.993
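
The printed value $-48899$, rather than $-48900$, appears to be a floating-point artifact: `1 - 0.9` is slightly less than `0.1` in binary floating point, and `int()` truncates toward zero. A minimal sketch of the effect, with `round()` shown as one possible alternative rather than the notebook's choice:

```python
lambda2 = 1 - 0.9                         # stored as 0.09999999999999998, not 0.1
stratum_margin = 42500 - 52500            # reported margin in the no-CVR stratum
overall_margin = 389000                   # reported margin in the whole contest

raw = stratum_margin - lambda2 * overall_margin
truncated = int(raw)                      # truncates toward zero
rounded = int(round(raw))                 # rounds to the nearest integer

print(repr(lambda2), truncated, rounded)
```

The one-vote difference is negligible relative to a stratum of 100,000 ballots, but it is worth knowing where it comes from.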


# What if we could do a ballot-comparison audit for the entire contest?

With current technology, this isn't possible. We'll use a risk limit of 10%, as in the corresponding calculation for Example 1, rather than the 5% overall risk limit used for the stratified audit above.

In [17]:
alpha = 0.1

In [18]:
# Assuming that the audit will find no errors
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905, 
                                r1=0, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1+N2, null_lambda=1)

24.0

In [19]:
# Assuming that the audit will find 1-vote overstatements at rate 0.1%
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905, 
                                r1=0.001, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1+N2, null_lambda=1)

24.0

# Instead, what if we used a simple but inefficient approach?

In Section 2.3 of the paper, we suggest a simple-but-pessimistic approach: sample uniformly from all counties as if one were performing a ballot-level comparison audit everywhere, but treat any ballot selected from a legacy (no-CVR) county as a two-vote overstatement.

In this example, $100,000/2,000,000 = 5\%$ of ballots come from the no-CVR stratum. We find that we'd proceed to a full hand count.

In [20]:
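# Assuming that the audit finds no errors in the CVR counties; every ballot
# drawn from a no-CVR county is treated as a 2-vote overstatement, so r2 is
# the no-CVR share of all ballots, N2/(N1+N2)
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905,
                                r1=0, s1=0, r2=N2/(N1+N2), s2=0,
                                reported_margin=margin, N=N1+N2, null_lambda=1)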

nan