**Bayesian risk estimation for Atty-Public Admin**
(Orange County, CA, 2018)

Based on an email from Neal McBurnett and the Orange County audit report.

Written by Ron Rivest 11/3/2018.  (MIT License for code.)

See [https://arxiv.org/abs/1801.00528] ("Bayesian Tabulation Audits Explained and Extended")

Four candidates: R, S, M, A (first initial of last name only).
Reported vote counts:

     R: 209,148
     S: 191,346
     M: 121,818
     A: 20,890
     total: 543,202
     
     

**Voting Rule:** If one candidate wins a majority, that candidate wins outright.  Otherwise, the top two advance to the general election.
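
As a quick check of the rule against the reported counts listed above, the following minimal sketch computes each candidate's share of the total and tests for a strict majority. The names `reported`, `shares`, and `has_majority` are illustrative only and are not part of the notebook code below.

```python
# Reported counts, copied from the list above (not audited values).
reported = {"R": 209148, "S": 191346, "M": 121818, "A": 20890}
total = sum(reported.values())

# Share of the total vote for each candidate, largest first.
shares = {c: v / total for c, v in reported.items()}
for c, s in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print("{}: {:5.2f}%".format(c, 100 * s))

# A majority requires strictly more than half of all votes cast.
has_majority = any(v > total / 2 for v in reported.values())
print("Total votes:", total)
print("Some candidate has a majority:", has_majority)
```

With no candidate above half of the total, the rule sends the two largest vote-getters to the general election.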


**Caveats:** The code makes some minor approximations that shouldn't matter
and that keep it short.  For example, it uses a Dirichlet approximation rather
than a Dirichlet-multinomial, which is OK because the sample is very small
compared to the total number of votes.  It also uses a Haldane prior (all
pseudocounts set to 0), which I believe is OK because the sample contains
every candidate.  The code could easily be tweaked for a slightly different
Bayesian approach.
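
To illustrate the kind of tweak meant here, the sketch below draws one vector of vote shares from a Dirichlet posterior by normalizing independent gamma variates, with the prior pseudocount exposed as a parameter. The function name `draw_posterior_shares` and its `pseudocount` and `rng` parameters are hypothetical and do not appear in the notebook; `pseudocount=0.0` corresponds to the Haldane prior, while `pseudocount=1.0` would be one possible alternative (a uniform prior).

```python
import numpy as np

def draw_posterior_shares(sample_tally, pseudocount=0.0, rng=None):
    """Draw one vector of vote shares from a Dirichlet posterior.

    Hypothetical helper: the posterior parameters are the sample counts
    plus the same prior pseudocount for every candidate.
    """
    rng = np.random.default_rng() if rng is None else rng
    candidates = list(sample_tally)
    shapes = np.array([sample_tally[c] + pseudocount for c in candidates],
                      dtype=float)
    positive = shapes > 0
    if not positive.any():
        raise ValueError("at least one candidate needs a positive parameter")

    # A zero parameter is treated as a draw of exactly 0, mirroring the
    # 'safe' gamma helper used in the notebook.
    draws = np.zeros(len(candidates))
    draws[positive] = rng.gamma(shapes[positive])

    # Normalizing independent gamma draws yields a Dirichlet draw.
    return dict(zip(candidates, draws / draws.sum()))

sample_tally = {"R": 22, "S": 23, "M": 9, "A": 6}
print(draw_posterior_shares(sample_tally))                   # Haldane prior
print(draw_posterior_shares(sample_tally, pseudocount=1.0))  # uniform prior
```

Because the voting rule depends only on the relative sizes of the draws, the normalization step is optional when only the contest outcome is needed.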


In [1]:
import numpy as np
def gamma(x):
    """ Return a random gamma variate with shape x (scale 1), made 'safe' by returning 0 when x <= 0. """
    if x<=0:
        return 0
    else:
        return np.random.gamma(x)

In [2]:
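def contest_outcome(tally):
    """ 
    Return outcome for given tally.
    
    tally is dict mapping candidates to vote counts.
    Outcome is either a single candidate (majority winner)
    or a sorted tuple of the top two candidates.
    """
    
    total_votes = sum(tally.values())
    # A candidate with a strict majority wins outright.
    for c in tally:
        if tally[c] > total_votes/2:
            return c
    # Otherwise the two candidates with the most votes advance;
    # sort their names so each pair has one canonical form.
    L = sorted([(tally[c],c) for c in tally], reverse=True)
    return tuple(sorted([L[0][1], L[1][1]]))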

In [3]:
# check outcome for actual election: should be ('R', 'S')
tally = {"R":209148, "S":191346, "M":121818, "A":20890}
print("Election tally:", tally)
print("Reported election outcome:", contest_outcome(tally))
print()
# following sample tally from Neal's email
sample_tally = {"R":22, "S":23, "M":9, "A":6}
print("Sample tally:", sample_tally)
print("Election outcome on sample:", contest_outcome(sample_tally))

Election tally: {'R': 209148, 'S': 191346, 'M': 121818, 'A': 20890}
Reported election outcome: ('R', 'S')

Sample tally: {'R': 22, 'S': 23, 'M': 9, 'A': 6}
Election outcome on sample: ('R', 'S')


In [7]:
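n_trials = 1000000
count = {}    # maps each outcome to the number of trials producing it
for trial in range(n_trials):
    # Independent gamma draws (one per candidate) are proportional to a
    # Dirichlet draw; contest_outcome depends only on their relative sizes.
    sample_tally2 = {c:gamma(sample_tally[c]) 
                     for c in sample_tally}
    trial_outcome = contest_outcome(sample_tally2)
    count[trial_outcome] = 1+count.get(trial_outcome, 0)
# list outcomes from most to least frequent
L = sorted([(count[outcome], outcome) for outcome in count], 
           reverse=True)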
print("Outcome         Percent")
for cnt, outcome in L:
    print("  {:13s}  {:5.2f}"
          .format(str(outcome), 100*cnt/n_trials))

Outcome         Percent
  ('R', 'S')     93.52
  S               3.38
  R               1.82
  ('M', 'S')      0.71
  ('M', 'R')      0.48
  ('A', 'S')      0.06
  ('A', 'R')      0.04
  ('A', 'M')      0.00
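
The table above implies that the probability of an outcome other than the reported ('R', 'S') runoff is roughly 100 - 93.52, or about 6.5 percent. As a cross-check, here is a minimal vectorized sketch of the same simulation that returns that single number directly. The function `estimate_upset_probability` and its parameters are hypothetical, not part of the original notebook. It assumes every sample count is positive and that the reported outcome is a two-candidate runoff. It uses NumPy's `Generator` API with a fixed seed, so its random stream differs from the loop above.

```python
import numpy as np

def estimate_upset_probability(sample_tally, reported_pair,
                               n_trials=200_000, seed=0):
    """Estimate the probability that the outcome differs from reported_pair.

    Hypothetical helper; assumes all sample counts are positive and that
    the reported outcome is a pair of candidates (no majority winner).
    """
    rng = np.random.default_rng(seed)
    candidates = sorted(sample_tally)
    shapes = np.array([sample_tally[c] for c in candidates], dtype=float)
    if np.any(shapes <= 0):
        raise ValueError("this sketch assumes all sample counts are positive")

    # One row per trial, one gamma draw per candidate.
    draws = rng.gamma(shapes, size=(n_trials, len(candidates)))

    # Majority winner in a trial: largest draw exceeds half the row total.
    has_majority = draws.max(axis=1) > draws.sum(axis=1) / 2

    # Indices of the two largest draws per trial, sorted to ignore order.
    top_two = np.sort(np.argsort(-draws, axis=1)[:, :2], axis=1)
    reported_idx = np.array(sorted(candidates.index(c) for c in reported_pair))

    # A trial agrees with the report if nobody has a majority and the
    # top two candidates are exactly the reported pair.
    same_outcome = ~has_majority & (top_two == reported_idx).all(axis=1)
    return 1.0 - same_outcome.mean()

risk = estimate_upset_probability({"R": 22, "S": 23, "M": 9, "A": 6}, ("R", "S"))
print("Estimated probability the outcome differs: {:.2%}".format(risk))
```

The estimate should land near the figure implied by the table, up to Monte Carlo noise.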
