In [3]:
from __future__ import division
import math
import numpy as np
import numpy.random
import scipy as sp
import scipy.stats
from scipy.optimize import minimize_scalar

from ballot_comparison import findNmin_ballot_comparison_rates
from hypergeometric import hypergeometric_optim

# Example of a hybrid audit

Suppose we have two strata, CVR counties and non-CVR counties.
There were 10,000 ballots cast in the election, 90% (9,000) in the CVR stratum and 10% (1,000) in the non-CVR stratum.

We're interested in a contest between winner $w$ and loser $\ell$ that appears on 50% of ballots (5,000). Suppose the contest was reported exactly correctly: $w$ received 2,600 votes and $\ell$ received 2,400. This gives a raw margin of 200 votes and a diluted margin of $200/10{,}000 = 2\%$.
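The margin arithmetic can be checked in a few lines. This is a minimal sketch in Python 3; the variable names are illustrative and are not part of the modules imported above.

```python
# Illustrative names; the figures come from the example in the text.
total_ballots = 10000
votes_w, votes_l = 2600, 2400

raw_margin = votes_w - votes_l                # margin in votes
diluted_margin = raw_margin / total_ballots   # margin as a share of all ballots cast

print(raw_margin, diluted_margin)
```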

For $\lambda \in [0,1]$, the outcome of the election was called correctly if the overstatement of the margin in the CVR stratum is less than $200\lambda$ votes and if the overstatement of the margin in the non-CVR stratum is less than $200(1-\lambda)$ votes. For this example, we'll set $\lambda = 0.9$ to reflect the sizes of the two strata.
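As a quick check of this allocation, the sketch below computes how much overstatement each stratum may absorb when $\lambda = 0.9$. The names `lam`, `cvr_threshold`, and `no_cvr_threshold` are illustrative, and the rounding is only there to hide floating-point noise.

```python
margin = 200   # overall raw margin, in votes
lam = 0.9      # share of the margin allotted to the CVR stratum

cvr_threshold = lam * margin            # CVR overstatement must stay below this
no_cvr_threshold = (1 - lam) * margin   # non-CVR overstatement must stay below this

print(round(cvr_threshold, 6), round(no_cvr_threshold, 6))
```

The two thresholds sum to the full margin, so staying below both keeps the total overstatement below 200 votes.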

We'd like to limit the audit risk to $\alpha=10\%$. We divide risk unequally between the two strata: $\alpha_1 = 3\%$ in the CVR stratum and $\alpha_2 = 7\%$ in the non-CVR stratum.

We do this by testing the following null hypotheses:
* in the CVR stratum, test the null that the overstatement is at least $200\lambda$ at risk level $\alpha_1$ using a ballot comparison test
* in the non-CVR stratum, test the null that the overstatement is at least $200(1-\lambda)$ at risk level $\alpha_2$ using a ballot polling test



In [4]:
lambda1 = 0.9
lambda2 = 1-lambda1
alpha1 = 0.03
alpha2 = 0.07
margin = 200
N1 = 9000
N2 = 1000

# CVR stratum

Below, we compute the sample size needed to confirm the election outcome, for varying rates of errors in the population of ballots.

We take the chosen $\lambda$ from above and plug it in as the parameter `null_lambda` in the function below.

We set $\gamma = 1.03905$ as in "A Gentle Introduction to Risk-limiting Audits".

In [5]:
# Assuming that the audit will find no errors
findNmin_ballot_comparison_rates(alpha=alpha1, gamma=1.03905, 
                                r1=0, s1=0, r2=0, s2=0,
                                reported_margin=200, N=N1, null_lambda=lambda1)

294.0

In [6]:
# Assuming that the audit will find 1-vote overstatements at rate 1%
findNmin_ballot_comparison_rates(alpha=alpha1, gamma=1.03905, 
                                r1=0.01, s1=0, r2=0, s2=0,
                                reported_margin=200, N=N1, null_lambda=lambda1)

651.0

In [9]:
# Assuming that the audit will find 1-vote understatements at rate 1%
findNmin_ballot_comparison_rates(alpha=alpha1, gamma=1.03905, 
                                r1=0, s1=0.01, r2=0, s2=0,
                                reported_margin=200, N=N1, null_lambda=lambda1)

220

# Non-CVR stratum

Below, we compute the $p$-value of the ballot polling test for a fixed sample of size $n = 100$, for varying values of the reported margin in the stratum.

Define
$$
    c = \text{reported margin in the stratum} - \lambda_2 \times \text{overall reported margin}.
$$

The reported margin in the stratum could be large or small, but it is known. Below, we will vary it just to see the effect.

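$c$ defines the null hypothesis. We test the null that the actual margin in the stratum is less than or equal to $c$: $A_{w, 2} - A_{\ell, 2} \leq c$. Here, $A_{w, 2}$ and $A_{\ell, 2}$ are the actual numbers of votes for $w$ and $\ell$ in the non-CVR stratum, and $A_{w, 2}$ is an unknown nuisance parameter.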
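To make the definition concrete, this sketch evaluates $c$ for a few hypothetical stratum margins, using $\lambda_2 = 0.1$, an overall margin of 200 votes, and 1,000 ballots in the stratum. The helper `null_margin` is illustrative and is not part of the imported modules.

```python
def null_margin(stratum_margin_rate, stratum_size, lambda2, overall_margin):
    """c = reported stratum margin (in votes) - lambda2 * overall reported margin."""
    return stratum_margin_rate * stratum_size - lambda2 * overall_margin

# Hypothetical reported margins in the stratum, as a fraction of its ballots
for rate in (0.01, 0.02, 0.05):
    c = null_margin(rate, stratum_size=1000, lambda2=0.1, overall_margin=200)
    print(rate, round(c, 6))
```

When the stratum's reported margin is smaller than its share of the overall margin, $c$ is negative: the null then says that $\ell$ actually leads in the stratum by at least $|c|$ votes.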

In practice, we will maximize the $p$-value over all possible values of $A_{w,2}$ and $A_{\ell, 2}$ under the null.
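The idea of maximizing over the nuisance parameter can be sketched with a brute-force search. This is not the implementation of `hypergeometric_optim`. It assumes that the test statistic is the sample difference between votes for $w$ and $\ell$, that ballots are drawn without replacement, and that the maximum is attained on the boundary $A_{w,2} - A_{\ell,2} = c$. The function names and the toy numbers are hypothetical, and the code needs Python 3.8+ for `math.comb`.

```python
from math import comb  # Python 3.8+

def tail_probability(n_w, n_l, n, N, A_w, A_l):
    """P(W - L >= n_w - n_l) when n ballots are drawn without replacement
    from N ballots holding A_w votes for w, A_l votes for l, and
    N - A_w - A_l ballots with no valid vote in the contest."""
    A_u = N - A_w - A_l
    observed = n_w - n_l
    favorable = 0
    for w in range(min(A_w, n) + 1):
        for l in range(min(A_l, n - w) + 1):
            if w - l >= observed:
                # comb returns 0 when more ballots are requested than exist
                favorable += comb(A_w, w) * comb(A_l, l) * comb(A_u, n - w - l)
    return favorable / comb(N, n)

def max_pvalue_on_boundary(n_w, n_l, n, N, c):
    """Largest tail probability over populations with A_w - A_l = c (c an integer)."""
    best = 0.0
    for A_w in range(max(c, 0), N + 1):
        A_l = A_w - c
        if A_w + A_l > N:
            break  # no larger A_w fits in the population
        best = max(best, tail_probability(n_w, n_l, n, N, A_w, A_l))
    return best

# Toy numbers (hypothetical): 40 ballots, a sample of 8 with 4 for w and 2 for l
print(max_pvalue_on_boundary(n_w=4, n_l=2, n=8, N=40, c=0))
```

The search is exhaustive, so it is only practical for small populations; it is meant to show the structure of the maximization rather than to be efficient.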

In [8]:
# Assuming that the audit finds exactly the right vote proportions
# and that the stratum reported margin (as a percentage) equals the overall margin (2%).
# Sample size = 100, 10% of the stratum size.

sample = np.array([0]*24 + [1]*26 + [np.nan]*50)
c = 0.02*N2 - lambda2*margin
hypergeometric_optim(sample, popsize=N2, null_margin=c)

0.44232495594329824
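The `sample` array appears to encode each sampled ballot as 1 (a vote for $w$), 0 (a vote for $\ell$), or `nan` (no valid vote in the contest). That reading is an assumption based on the counts used here. Under it, the sample tallies can be recovered as follows; the sketch uses a plain list in place of the NumPy array so that it has no dependencies.

```python
import math

# Same composition as the notebook's sample, held in a plain list
sample = [0] * 24 + [1] * 26 + [math.nan] * 50

n_w = sum(1 for x in sample if x == 1)            # ballots for the reported winner
n_l = sum(1 for x in sample if x == 0)            # ballots for the reported loser
n_other = sum(1 for x in sample if math.isnan(x)) # ballots with no valid vote in the contest

sample_margin = (n_w - n_l) / len(sample)
print(n_w, n_l, n_other, sample_margin)
```

The sample margin matches the 2% diluted margin of the reported results, which is what "exactly the right vote proportions" refers to.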

In [9]:
# Assuming that the audit finds exactly the right vote proportions
# and that the stratum reported margin is 1% -- less than the overall margin (2%).
# Sample size = 100, 10% of the stratum size.

sample = np.array([0]*24 + [1]*26 + [np.nan]*50)
c = 0.01*N2 - lambda2*margin
hypergeometric_optim(sample, popsize=N2, null_margin=c)

0.41219851732492452

In [10]:
# Assuming that the audit finds exactly the right vote proportions
# and that the stratum reported margin is 5% -- greater than the overall margin (2%).
# Sample size = 100, 10% of the stratum size.

sample = np.array([0]*24 + [1]*26 + [np.nan]*50)
c = 0.05*N2 - lambda2*margin
hypergeometric_optim(sample, popsize=N2, null_margin=c)

1.0