**Bayesian risk estimation for District Attorney-Public Administrator**
(Orange County, CA, 2018 Primary)

Based on an email from Neal McBurnett and the Orange County audit report.

Written by Ron Rivest 11/3/2018.  (MIT License for code.)

See [Bayesian Tabulation Audits Explained and Extended](https://arxiv.org/abs/1801.00528)

Four candidates: R, S, M, A (first initial of last name only).
Reported vote counts:

     R: 209,148
     S: 191,346
     M: 121,818
     A: 20,890
     total: 543,202  

**Voting Rule:** If one candidate wins a majority of the votes, that candidate wins outright.  Otherwise, the top two go on to the general election.


**Caveats:** The code makes some minor approximations that shouldn't matter
and that keep it short.  For example, it uses a Dirichlet approximation rather
than the Dirichlet-multinomial (OK since the sample of 60 ballots is very small
compared to the total number of votes).  It also uses a Haldane prior (all
pseudocounts set to 0), which I believe is OK since every candidate appears in
the sample.  The code could easily be tweaked for a slightly different Bayesian approach.
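
For comparison, the Dirichlet-multinomial variant mentioned in the caveats could look roughly like the sketch below. It draws vote shares from the Dirichlet posterior, draws the unsampled ballots from a multinomial with those shares, and adds the sample back in. The function name `dirichlet_multinomial_tally` and its `pseudocount` parameter are hypothetical and not part of the original notebook. Some NumPy versions reject zero Dirichlet parameters, so a pseudocount of 0 is only safe here because every candidate appears in the sample.

```python
from collections import Counter
import numpy as np

def dirichlet_multinomial_tally(sample_tally, total_ballots, pseudocount=0.0, rng=None):
    """
    Return one simulated full-election tally (hypothetical helper).

    sample_tally is a Counter mapping candidates to sampled vote counts.
    pseudocount is added to every candidate's count (0.0 = Haldane prior).
    """
    rng = np.random.default_rng() if rng is None else rng
    candidates = sorted(sample_tally)
    sample_counts = np.array([sample_tally[c] for c in candidates])
    # Posterior over vote shares: Dirichlet(sample counts + pseudocounts).
    shares = rng.dirichlet(sample_counts + pseudocount)
    # Fill in the ballots that were not sampled.
    n_unsampled = total_ballots - int(sample_counts.sum())
    unsampled = rng.multinomial(n_unsampled, shares)
    return Counter({c: int(s + u)
                    for c, s, u in zip(candidates, sample_counts, unsampled)})

sample = Counter({"R": 22, "S": 23, "M": 9, "A": 6})
print(dirichlet_multinomial_tally(sample, 543202, rng=np.random.default_rng(0)))
```

With 60 sampled ballots out of 543,202, the multinomial step adds very little spread beyond the Dirichlet draw itself, which is why the shorter approximation is used in the cells that follow.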


In [1]:
from collections import Counter
import numpy as np

In [2]:
def gamma(x):
    """
    Return a random draw from the gamma distribution with shape x (scale 1).

    Made 'safe' by returning 0 when x <= 0.
    """
    if x <= 0:
        return 0
    return np.random.gamma(x)

In [3]:
def contest_outcome(tally):
    """ 
    Return "top two" (aka runoff) outcome for given tally.
    
    tally is Counter mapping candidates to vote counts.
    """
    
    total_votes = sum(tally.values())
    top_two = tally.most_common(2)
    # A strict majority wins outright; otherwise the top two advance.
    if top_two[0][1] > total_votes/2:
        return (top_two[0][0],)
    return tuple(sorted([top_two[0][0], top_two[1][0]]))

In [4]:
# check outcome for actual election. Reported outcome was ('R', 'S')
tally = Counter({"R":209148, "S":191346, "M":121818, "A":20890})
print("Election tally:", tally)
reported_outcome = contest_outcome(tally)
print("Reported election outcome:", reported_outcome)
print()
# following sample tally from Neal's email
sample_tally = Counter({"R":22, "S":23, "M":9, "A":6})
print("Sample tally:", sample_tally)
print("Election outcome on sample:", contest_outcome(sample_tally))

Election tally: Counter({'R': 209148, 'S': 191346, 'M': 121818, 'A': 20890})
Reported election outcome: ('R', 'S')

Sample tally: Counter({'S': 23, 'R': 22, 'M': 9, 'A': 6})
Election outcome on sample: ('R', 'S')


In [5]:
# 100000 trials takes about 3 seconds
n_trials = 100000
counts = Counter()

for trial in range(n_trials):
    trial_tally = Counter({candidate: gamma(votes)
                    for candidate, votes in sample_tally.items()})
    trial_outcome = contest_outcome(trial_tally)
    counts[trial_outcome] += 1

print("Outcome         Percent")
for outcome, cnt in counts.most_common():
    print("  {:13s}  {:6.2%}"
          .format(str(outcome), cnt/n_trials))

Outcome         Percent
  ('R', 'S')     93.55%
  ('S',)          3.38%
  ('R',)          1.81%
  ('M', 'S')      0.71%
  ('M', 'R')      0.45%
  ('A', 'S')      0.07%
  ('A', 'R')      0.04%
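
As a cross-check, the same experiment can be sketched with NumPy's built-in Dirichlet sampler, since normalizing independent gamma draws gives a Dirichlet draw. This is only an illustrative variant, not the notebook's own code: the function name `simulate_outcomes` and the `seed` parameter are assumptions, and it relies on every sample count being positive.

```python
from collections import Counter
import numpy as np

def simulate_outcomes(sample_tally, n_trials=100000, seed=0):
    """Return a Counter of top-two outcomes over n_trials Dirichlet draws."""
    candidates = sorted(sample_tally)
    alpha = np.array([sample_tally[c] for c in candidates], dtype=float)
    rng = np.random.default_rng(seed)
    # One row of vote shares per trial; each row sums to 1.
    shares = rng.dirichlet(alpha, size=n_trials)
    # Candidate indices ordered from largest to smallest share in each row.
    order = np.argsort(-shares, axis=1)
    first, second = order[:, 0], order[:, 1]
    majority = shares[np.arange(n_trials), first] > 0.5
    counts = Counter()
    for i, j, has_majority in zip(first, second, majority):
        if has_majority:
            counts[(candidates[i],)] += 1
        else:
            counts[tuple(sorted((candidates[i], candidates[j])))] += 1
    return counts

check = simulate_outcomes(Counter({"R": 22, "S": 23, "M": 9, "A": 6}))
for outcome, cnt in check.most_common():
    print("  {:13s}  {:6.2%}".format(str(outcome), cnt / sum(check.values())))
```

The percentages should agree with the table above up to Monte Carlo noise.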


In [6]:
print("Bayesian risk estimate: {:.1%}".format(1.0 - (counts[reported_outcome] / n_trials)))

Bayesian risk estimate: 6.5%
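
The risk estimate is itself a Monte Carlo average, so it carries some simulation noise. A rough way to size that noise, assuming the trials are independent, is the binomial standard error sqrt(p(1-p)/n). The helper name `risk_standard_error` below is hypothetical; the inputs are the 93.55% and 100000 trials reported above.

```python
import math

def risk_standard_error(risk, n_trials):
    """Binomial standard error of a Monte Carlo proportion."""
    return math.sqrt(risk * (1.0 - risk) / n_trials)

risk = 1.0 - 0.9355      # from the outcome table above
n_trials = 100000
se = risk_standard_error(risk, n_trials)
print("Monte Carlo standard error: {:.3%}".format(se))
```

With these inputs the standard error comes out below 0.1 percentage points, so rerunning the simulation should not move the 6.5% figure by much.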
