# Hybrid audit examples

In [1]:
from __future__ import division, print_function
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

import math
import numpy as np
import numpy.random
import scipy as sp
import scipy.stats

from ballot_comparison import findNmin_ballot_comparison_rates
from hypergeometric import trihypergeometric_optim, simulate_ballot_polling_power
from fishers_combination import simulate_fisher_combined_audit, calculate_lambda_range

# Example 1 - medium-sized election, close race

There are two strata. One contains every CVR county and the other contains every no-CVR county.
There were 110,000 ballots cast in the election, 100,000 in the CVR stratum and 10,000 in the no-CVR stratum.

In the CVR stratum, there were 45,500 votes reported for candidate A, 49,500 votes for candidate B, and 5,000 invalid ballots.
In the no-CVR stratum, there were 7,500 votes reported for A, 1,500 votes for B, and 1,000 invalid ballots.
A won overall, with 53,000 votes to B's 51,000, but not in the CVR stratum.
The reported vote margin between A and B is 2,000 votes, a "diluted margin" of $2,000/110,000 = 1.8\%$.


Candidate | Stratum 1 | Stratum 2 | total 
---|---|---|---
A | 45,500 | 7,500 | 53,000
B | 49,500 | 1,500 | 51,000
Ballots | 100,000 | 10,000 | 110,000
Diluted margin | -4% | 60% | 1.8%
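
The diluted margins in the table can be checked directly from the reported counts. The following is a small sketch for that purpose; the helper name `diluted_margin` is ours and is not part of the audit code, and the division assumes Python 3 semantics.

```python
def diluted_margin(votes_winner, votes_loser, ballots):
    """Reported margin in votes divided by the number of ballots cast."""
    return (votes_winner - votes_loser) / ballots

# Reported counts from the table above (A is the reported winner overall)
cvr = diluted_margin(45500, 49500, 100000)       # CVR stratum
no_cvr = diluted_margin(7500, 1500, 10000)       # no-CVR stratum
overall = diluted_margin(45500 + 7500, 49500 + 1500, 100000 + 10000)

print("CVR: {:.1%}, no-CVR: {:.1%}, overall: {:.1%}".format(cvr, no_cvr, overall))
# prints: CVR: -4.0%, no-CVR: 60.0%, overall: 1.8%
```

The negative value in the CVR stratum reflects that A is reported to have lost there, even though A is the reported winner overall.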


The reported outcome of the election is correct if, for every $\lambda$, either the overstatement of the margin in the CVR stratum is less than $2000\lambda$ votes or the overstatement of the margin in the no-CVR stratum is less than $2000(1-\lambda)$ votes. 
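
To spell out the reasoning behind this condition, write $\omega_1$ and $\omega_2$ for the true overstatements of the margin in the CVR and no-CVR strata, and $V = 2000$ for the overall reported margin in votes. These symbols are introduced here only for illustration. The reported outcome is wrong exactly when the total overstatement is at least $V$, and that can be rewritten in terms of $\lambda$:

$$
\omega_1 + \omega_2 \ge V
\quad\Longleftrightarrow\quad
\text{there is a } \lambda \text{ with } \omega_1 \ge \lambda V \ \text{ and } \ \omega_2 \ge (1-\lambda) V .
$$

For the forward direction, take $\lambda = \omega_1 / V$, so that $\omega_2 \ge V - \omega_1 = (1-\lambda)V$. For the reverse direction, add the two inequalities. Negating both sides gives the statement above: the outcome is correct if, for every $\lambda$, at least one of the two stratum-level inequalities fails.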

We want to limit the risk of certifying an incorrect outcome to at most $\alpha=10\%$. 

# Using Fisher's method to combine the audits

In [2]:
alpha = 0.1

N_w1 = 45500
N_l1 = 49500
N_w2 = 7500
N_l2 = 1500
N1 = 100000
N2 = 10000
margin = (N_w1 + N_w2 - N_l1 - N_l2)

In [3]:
calculate_lambda_range(N_w1, N_l1, N1, N_w2, N_l2, N2)

(-7.0, 3.0)
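
One way to see where an interval like this can come from, sketched under our own assumptions rather than taken from the source of `calculate_lambda_range`: the actual margin in a stratum with $N_s$ ballots lies between $-N_s$ and $N_s$ votes, so the overstatement of its reported margin $V_s$ lies between $V_s - N_s$ and $V_s + N_s$. Requiring both $\lambda V$ and $(1-\lambda)V$ to fall within those bounds limits $\lambda$. The helper `lambda_range_sketch` is hypothetical.

```python
def lambda_range_sketch(N_w1, N_l1, N1, N_w2, N_l2, N2):
    """Interval of lambda for which lambda*V and (1 - lambda)*V are both
    attainable overstatements (assumed reasoning, see the text above)."""
    V1 = N_w1 - N_l1      # reported margin in stratum 1, in votes
    V2 = N_w2 - N_l2      # reported margin in stratum 2, in votes
    V = V1 + V2           # overall reported margin, assumed positive
    # Stratum 1: V1 - N1 <= lambda * V <= V1 + N1
    lo1, hi1 = (V1 - N1) / V, (V1 + N1) / V
    # Stratum 2: V2 - N2 <= (1 - lambda) * V <= V2 + N2
    lo2, hi2 = 1 - (V2 + N2) / V, 1 - (V2 - N2) / V
    # Both constraints must hold, so intersect the two intervals
    return max(lo1, lo2), min(hi1, hi2)

print(lambda_range_sketch(45500, 49500, 100000, 7500, 1500, 10000))
```

With the counts from this example the sketch returns `(-7.0, 3.0)`, the same interval as above. Here the binding constraints both come from the smaller no-CVR stratum.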

By plotting Fisher's combined $p$-value along a grid, we determined that it is maximized somewhere on $[0, 1]$. The search below runs over the full range $(-7, 3)$ returned above, which contains that interval.
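
For reference, the combining step itself is short. For two independent $p$-values $p_1$ and $p_2$, Fisher's statistic is $-2(\ln p_1 + \ln p_2)$, which is referred to a chi-square distribution with 4 degrees of freedom. The sketch below shows only that step for a single value of $\lambda$; it is not the code in `fishers_combination`, the function name is ours, and the input $p$-values are made up for illustration.

```python
import math

def fisher_combined_pvalue(p1, p2):
    """Fisher's combined p-value for two independent p-values.

    The statistic -2 * (ln p1 + ln p2) is compared with a chi-square
    distribution with 4 degrees of freedom, whose survival function at x
    is exp(-x / 2) * (1 + x / 2).
    """
    if p1 <= 0 or p2 <= 0:
        return 0.0
    statistic = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-statistic / 2.0) * (1.0 + statistic / 2.0)

# Illustrative stratum-level p-values (made up, not taken from this audit)
print(fisher_combined_pvalue(0.05, 0.20))
```

Because the closed form for 4 degrees of freedom is used, the sketch needs only the standard library. In the audit, the stratum-level $p$-values depend on $\lambda$, and the combined $p$-value is maximized over $\lambda$.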

In [4]:
np.random.seed(20180514)

n1 = 750
n2 = 500

power = simulate_fisher_combined_audit(N_w1, N_l1, N1, N_w2, N_l2, N2, n1, n2, alpha,
    reps=10000, feasible_lambda_range=(-7.0, 3.0))
# Report the sample sizes from n1 and n2 so the message matches the values simulated
print("In 10000 simulations with a CVR stratum sample size of {} ballots and\n"
      "no-CVR stratum sample size of {} ballots, the rate of stopping the audit is {}"
      .format(n1, n2, power))

In 10000 simulations with a CVR stratum sample size of 750 ballots and
no-CVR stratum sample size of 500 ballots, the rate of stopping the audit is 0.9438
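
Because the stopping rate is estimated from a finite number of simulations, it carries some Monte Carlo error. A rough binomial standard error, assuming the replications are independent, can be computed as in this sketch (the helper name is ours):

```python
import math

def monte_carlo_se(rate, reps):
    """Binomial standard error of a proportion estimated from reps independent runs."""
    return math.sqrt(rate * (1.0 - rate) / reps)

se = monte_carlo_se(0.9438, 10000)
print("standard error of the estimated rate: {:.4f}".format(se))
# prints: standard error of the estimated rate: 0.0023
```

On that basis the estimated rate is stable to roughly the second decimal place.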


# What if we could do a ballot-comparison audit for the entire contest?

With current technology, this isn't possible, because the no-CVR counties have no cast vote records to compare ballots against. We'll use a risk limit of 10% to be consistent with the example above.

In [5]:
# Assuming that the audit will find no errors
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905, 
                                r1=0, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1+N2, 
                                null_lambda=1)

263.0

In [6]:
# Assuming that the audit will find 1-vote overstatements at rate 0.1%
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905, 
                                r1=0.001, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1+N2, 
                                null_lambda=1)

284.0
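
The two sample sizes above are consistent with a Kaplan-Markov style calculation, in which each sampled ballot multiplies the $p$-value by a factor that depends on the overstatement found. The sketch below is our own reconstruction under that assumption and is not the source of `findNmin_ballot_comparison_rates`; the function and parameter names are hypothetical, understatements are ignored, and the division assumes Python 3.

```python
import math

def comparison_sample_size_sketch(alpha, gamma, diluted_margin,
                                  rate_1vote_over=0.0, rate_2vote_over=0.0):
    """Approximate ballot-comparison sample size when overstatements occur
    at the given rates. Returns nan when the audit is not expected to stop
    short of a full hand count. Understatements are ignored in this sketch."""
    # Expected change in the log p-value per sampled ballot
    per_ballot = (math.log(1 - diluted_margin / (2 * gamma))
                  - rate_1vote_over * math.log(1 - 1 / (2 * gamma))
                  - rate_2vote_over * math.log(1 - 1 / gamma))
    if per_ballot >= 0:
        return float("nan")   # the p-value is not expected to shrink
    return math.ceil(math.log(alpha) / per_ballot)

mu = 2000 / 110000   # diluted margin in Example 1
print(comparison_sample_size_sketch(0.1, 1.03905, mu))
print(comparison_sample_size_sketch(0.1, 1.03905, mu, rate_1vote_over=0.001))
```

Under these assumptions the sketch gives 263 and 284 ballots, matching the two cells above.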

# Instead, what if we did an inefficient approach?

In Section 2.3 of the paper, we suggest a simple but pessimistic approach: sample uniformly from all counties as if one were performing a ballot-level comparison audit everywhere, but treat any ballot selected from a legacy (no-CVR) county as a two-vote overstatement.

In this example, $10,000/110,000 \approx 9.1\%$ of ballots come from the no-CVR stratum. With that many sampled ballots treated as two-vote overstatements, we find that we'd proceed to a full hand count.

In [7]:
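# Assuming that the audit will find no errors other than the ballots sampled from
# the no-CVR stratum, each of which is treated as a 2-vote overstatement.
# r2 is therefore the no-CVR share of ballots, N2/(N1+N2).
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905,
                                r1=0, s1=0, r2=N2/(N1+N2), s2=0,
                                reported_margin=margin, N=N1+N2, null_lambda=1)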

nan

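If, instead, the margin were larger (say 10,000 votes, a diluted margin of about 9.1%) and the no-CVR counties made up only 1.2% of total ballots, the pessimistic approach would not force a full hand count: assuming no other errors, the calculation below gives a finite sample size.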

In [8]:
# Assuming that the audit will find no errors
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905, 
                                r1=0, s1=0, r2=0.012, s2=0,
                                reported_margin=10000, N=N1+N2, null_lambda=1)

430.0
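
The contrast between the two cases comes down to the sign of one quantity. Assuming a Kaplan-Markov style $p$-value (our assumption, not something read from the source of `findNmin_ballot_comparison_rates`), each error-free ballot shrinks the $p$-value slightly and each two-vote overstatement inflates it considerably. The sketch below computes the expected change in the log $p$-value per sampled ballot; the function name is hypothetical.

```python
import math

GAMMA = 1.03905  # inflation factor used throughout this notebook

def expected_log_factor(diluted_margin, no_cvr_fraction, gamma=GAMMA):
    """Expected change in the log p-value per sampled ballot when every
    no-CVR ballot counts as a 2-vote overstatement and nothing else is wrong.
    The audit can only be expected to stop if this is negative."""
    return (math.log(1 - diluted_margin / (2 * gamma))
            - no_cvr_fraction * math.log(1 - 1 / gamma))

close = expected_log_factor(2000 / 110000, 10000 / 110000)
wider = expected_log_factor(10000 / 110000, 0.012)
print("margin 2,000, 9.1% no-CVR:", close)    # positive: no finite sample size
print("margin 10,000, 1.2% no-CVR:", wider)   # negative: the audit can stop
print("ballots needed in the second case:", math.ceil(math.log(0.1) / wider))
```

In the first case the penalty from the no-CVR ballots outweighs the evidence from the error-free ones, which is why no finite sample size exists. In the second case the sketch gives 430 ballots, in agreement with the cell above.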

# Example 2 - large election, large margin

There are two strata. One contains every CVR county and the other contains every no-CVR county.
There were 2 million ballots cast in the election, 1.9 million in the CVR stratum and 100,000 in the no-CVR stratum.

In the CVR stratum, the diluted margin was $21\%$: there were 1,102,000 votes reported for A, 703,000 votes reported for candidate B, and 95,000 invalid ballots.
In the no-CVR stratum, the diluted margin was $-10\%$: there were 42,500 votes reported for A, 52,500 votes for B, and 5,000 invalid ballots.
A won overall, with 1,144,500 votes to B's 755,500, but not in the no-CVR stratum.
The reported vote margin between A and B is 389,000 votes, a "diluted margin" of $389,000/2,000,000 = 19.45\%$.


Candidate | Stratum 1 | Stratum 2 | total 
---|---|---|---
A | 1,102,000 | 42,500 | 1,144,500
B | 703,000 |  52,500 | 755,500
Ballots | 1,900,000 | 100,000 | 2,000,000
Diluted margin | 21% | -10% | 19.45%



We want to limit the risk of certifying an incorrect outcome to at most $\alpha=5\%$. 


# Using Fisher's method to combine the audits

In [9]:
alpha = 0.05
N1 = 1900000
N2 = 100000
N_w1 = 1102000
N_l1 = 703000
N_w2 = 42500
N_l2 = 52500
margin = (N_w1 + N_w2 - N_l1 - N_l2)

In [10]:
calculate_lambda_range(N_w1, N_l1, N1, N_w2, N_l2, N2)

(0.7686375321336761, 1.2827763496143958)

By plotting Fisher's combined $p$-value along a grid, we determined that it is maximized somewhere on $[0.5, 1.5]$. The search below runs over the slightly wider interval $[0.5, 2]$, which contains that region.

In [11]:
np.random.seed(20180514)
n1 = 50
n2 = 25
power = simulate_fisher_combined_audit(N_w1, N_l1, N1, N_w2, N_l2, N2, n1, n2, alpha,
    reps=10000, feasible_lambda_range=(0.5, 2))
print("In 10,000 simulations with a CVR stratum sample size of 50 ballots \n \
and no-CVR stratum sample size of 25 ballots, the rate of stopping the audit is ", \
      power)

In 10,000 simulations with a CVR stratum sample size of 50 ballots 
 and no-CVR stratum sample size of 25 ballots, the rate of stopping the audit is  0.9328


# What if we could do a ballot-comparison audit for the entire contest?

With current technology, this isn't possible, because the no-CVR counties have no cast vote records to compare ballots against.

In [12]:
# Assuming that the audit will find no errors
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905, 
                                r1=0, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1+N2, null_lambda=1)

31.0

In [13]:
# Assuming that the audit will find 1-vote overstatements at rate 0.1%
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905, 
                                r1=0.001, s1=0, r2=0, s2=0,
                                reported_margin=margin, N=N1+N2, null_lambda=1)

31.0

# Instead, what if we did an inefficient approach?

In Section 2.3 of the paper, we suggest a simple but pessimistic approach: sample uniformly from all counties as if one were performing a ballot-level comparison audit everywhere, but treat any ballot selected from a legacy (no-CVR) county as a two-vote overstatement.

In this example, $100,000/2,000,000 = 5\%$ of ballots come from the no-CVR stratum. That is large enough that if we treat all ballots sampled from the no-CVR stratum as 2-vote overstatements, the audit would be expected to require a full hand count.

In [14]:
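# Assuming that the audit will find no errors other than the ballots sampled from
# the no-CVR stratum, each of which is treated as a 2-vote overstatement.
# r2 is therefore the no-CVR share of ballots, N2/(N1+N2).
findNmin_ballot_comparison_rates(alpha=alpha, gamma=1.03905,
                                r1=0, s1=0, r2=N2/(N1+N2), s2=0,
                                reported_margin=margin, N=N1+N2, null_lambda=1)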

nan
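
To get a sense of how small the no-CVR stratum would need to be for the pessimistic approach to work at this margin, one can solve for the break-even share of two-vote overstatements. The sketch below assumes a Kaplan-Markov style $p$-value and no other discrepancies; it is our own back-of-the-envelope calculation, and the function name is hypothetical.

```python
import math

def max_no_cvr_fraction(diluted_margin, gamma=1.03905):
    """Largest share of no-CVR ballots for which the pessimistic approach can
    still be expected to stop, assuming a Kaplan-Markov style p-value and no
    other discrepancies."""
    # Break-even point: shrinkage from an error-free ballot equals the
    # expected inflation from ballots counted as 2-vote overstatements
    return math.log(1 - diluted_margin / (2 * gamma)) / math.log(1 - 1 / gamma)

threshold = max_no_cvr_fraction(389000 / 2000000)
print("break-even no-CVR share: {:.1%}".format(threshold))
print("actual no-CVR share:     {:.1%}".format(100000 / 2000000))
```

Under these assumptions the break-even share is about 3%, below the 5% of ballots that actually come from the no-CVR stratum, so even a 19.45% diluted margin does not rescue the pessimistic approach here.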