# Orange County Risk-Limiting Audit for 2018 Primary Election

This Jupyter notebook demonstrates the analysis of the audits of five contests in the Orange County, California 2018 primary election.

Three of the five contests were used to drive a risk-limiting audit, and all achieved the risk limit of 20%: Assessor, Auditor-Controller, and Clerk-Recorder.

Two contests were audited opportunistically: District Attorney-Public Administrator and Sheriff-Coroner.

See the [README](../README.md) for the methodology.

In [99]:
%load_ext version_information
%version_information numpy, scipy, matplotlib

Software,Version
Python,3.6.6 64bit [GCC 8.0.1 20180414 (experimental) [trunk revision 259383]
IPython,5.5.0
OS,Linux 4.15.0 38 generic x86_64 with Ubuntu 18.04 bionic
numpy,1.13.3
scipy,0.19.1
matplotlib,2.1.1
Wed Oct 31 14:26:34 2018 MDT


In [2]:
import sys
sys.path.append("../src/rla_utils")

In [77]:
import json
import parse_hart
import analyze_rounds
from collections import namedtuple

from IPython.core.display import HTML, Markdown, display

In [78]:
def printmd(string):
    display(Markdown(string))

table = "|c1|c2|\n"
table += "|---|---|\n"

for e in [{'c1': 42, 'c2': "hello"}, {'c1': 17, 'c2': "world"}]:
    table += '| {}|{}|\n'.format(e['c1'], e['c2'])

printmd(table)
print(table)


|c1|c2|
|---|---|
| 42|hello|
| 17|world|


|c1|c2|
|---|---|
| 42|hello|
| 17|world|



In [32]:
class Struct(object):
    """Recursively wrap a dict so that its keys become attributes."""
    def __init__(self, data):
        for name, value in data.items():
            setattr(self, name, self._wrap(value))

    def _wrap(self, value):
        if isinstance(value, (tuple, list, set, frozenset)): 
            return type(value)([self._wrap(v) for v in value])
        else:
            return Struct(value) if isinstance(value, dict) else value

In [44]:
# From https://stackoverflow.com/questions/6578986/how-to-convert-json-data-into-a-python-object/15882054#15882054
def _json_object_hook(d): return namedtuple('X', d.keys())(*d.values())
def json2obj(data): return json.loads(data, object_hook=_json_object_hook)

## Download contest results from Orange County elections website

In [4]:
%%bash
wget -nv -N http://ocvote.com/fileadmin/live/pri2018/media.zip
unzip -o media.zip

Archive:  media.zip
  inflating: contest_table.txt       


## Parse the contest results
Generate `contests.json` and `cvr.csv` files.

The `cvr.csv` file contains only the contests identified by the `-C` option, i.e. the five auditable countywide contests in this election.

In [2]:
!python3 ../src/rla_utils/parse_hart.py -C '[12, 13, 14, 15, 16]' contest_table.txt > parse_hart.out

!mv /tmp/cvr.csv /tmp/contests.json .

## Run the audit
The results of the audit were exported from RLATool via `rla_export` and are stored in this repository as `all_contest_audit_details_by_cvr.json` in two directories, `initial-export` and `final-export`.

In [6]:
contests_file = "contests.json"

In [50]:
# Use a context manager so the file is closed after loading
with open(contests_file, "r") as f:
    contests = json.load(f)

In [33]:
# contests = Struct(json.load(open(contests_file, "r")))

In [47]:
# contests = json.load(open(contests_file, "r"), object_hook=_json_object_hook)

ValueError: Type names and field names must be valid identifiers: 'JOSH JONES'
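
The `ValueError` above arises because `namedtuple` field names must be valid Python identifiers, and the keys in `contests.json` include candidate names such as 'JOSH JONES', which contain spaces. A minimal sketch of that check follows; the helper `namedtuple_safe` is hypothetical and is not part of the notebook or of `rla_utils`.

```python
from collections import namedtuple

def namedtuple_safe(keys):
    """Return True if every key can be used as a namedtuple field name.

    Hypothetical helper, for illustration only.
    """
    try:
        namedtuple('X', list(keys))
        return True
    except ValueError:
        # Raised when a key is not a valid identifier (for example,
        # it contains a space), among other reasons.
        return False

print(namedtuple_safe(['name', 'votes']))
print(namedtuple_safe(['JOSH JONES']))
```

Plain dictionary access, as in `contests['Assessor']`, avoids the problem because dictionary keys may be arbitrary strings.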

`contests` is a dictionary of data on all the contests. We're only interested in the five which were selected for audit.

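Older exploratory cells, kept for reference. The first two assume `contests` is a `Struct`; the third iterates over the dictionary's keys rather than its values, so it finds nothing.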

In [42]:
contests.Assessor

<__main__.Struct at 0x7fb1f0588198>

In [38]:
repr(contests)

'<__main__.Struct object at 0x7fb1f15b7ef0>'

In [51]:
[contest for contest in contests if 'selected' in contest]

[]

Back to working code: iterate over the dictionary's values.

In [52]:
[contest for contest in contests.values() if 'selected' in contest]

[{'ballots': 635224,
  'choices': {'CLAUDE PARRISH': {'absentee_votes': 238395,
    'early_votes': 2115,
    'election_votes': 118124,
    'name': 'CLAUDE PARRISH',
    'votes': 358634},
   'NATHANIEL FERNANDEZ EPSTEIN': {'absentee_votes': 42540,
    'early_votes': 685,
    'election_votes': 27606,
    'name': 'NATHANIEL FERNANDEZ EPSTEIN',
    'votes': 70831},
   'RICHARD B. RAMIREZ': {'absentee_votes': 53562,
    'early_votes': 725,
    'election_votes': 36344,
    'name': 'RICHARD B. RAMIREZ',
    'votes': 90631}},
  'losers': ['NATHANIEL FERNANDEZ EPSTEIN'],
  'majority_margin': 0.37910693410447305,
  'margin': 0.038069894788654406,
  'name': 'Assessor',
  'precedence': 140,
  'precinct_count': 1561,
  'registered': 1481881,
  'selected': True,
  'winners': ['CLAUDE PARRISH', 'RICHARD B. RAMIREZ']},
 {'ballots': 635224,
  'choices': {'ERIC H. WOOLERY': {'absentee_votes': 243989,
    'early_votes': 2382,
    'election_votes': 123333,
    'name': 'ERIC H. WOOLERY',
    'votes': 36970

In [53]:
contest = contests['Assessor']

In [54]:
contest

{'ballots': 635224,
 'choices': {'CLAUDE PARRISH': {'absentee_votes': 238395,
   'early_votes': 2115,
   'election_votes': 118124,
   'name': 'CLAUDE PARRISH',
   'votes': 358634},
  'NATHANIEL FERNANDEZ EPSTEIN': {'absentee_votes': 42540,
   'early_votes': 685,
   'election_votes': 27606,
   'name': 'NATHANIEL FERNANDEZ EPSTEIN',
   'votes': 70831},
  'RICHARD B. RAMIREZ': {'absentee_votes': 53562,
   'early_votes': 725,
   'election_votes': 36344,
   'name': 'RICHARD B. RAMIREZ',
   'votes': 90631}},
 'losers': ['NATHANIEL FERNANDEZ EPSTEIN'],
 'majority_margin': 0.37910693410447305,
 'margin': 0.038069894788654406,
 'name': 'Assessor',
 'precedence': 140,
 'precinct_count': 1561,
 'registered': 1481881,
 'selected': True,
 'winners': ['CLAUDE PARRISH', 'RICHARD B. RAMIREZ']}

In [73]:
votes = 0
for choice in contest['choices'].values():
    votes += choice['votes']
    print("{:.2%} {:s}".format(choice['votes'] / contest['ballots'], choice['name']))

print("{:.2%} {:s}".format((contest['ballots'] - votes) / contest['ballots'], 'NO_VOTE'))

56.46% CLAUDE PARRISH
14.27% RICHARD B. RAMIREZ
11.15% NATHANIEL FERNANDEZ EPSTEIN
18.12% NO_VOTE


In [75]:
HTML('<a href="http://example.com">link</a>')

In [95]:
votes = 0
table = "|Vote share|Votes|Candidate|\n"
table += "|---|---|:---|\n"

for choice in contest['choices'].values():
    votes += choice['votes']
    table += "|{:.2%}|{:d}|{:s}|\n".format(choice['votes'] / contest['ballots'], choice['votes'], choice['name'])

residual = contest['ballots'] - votes
table += "|{:.2%}|{:d}|{:s}|\n".format(residual / contest['ballots'], residual, '<b>RESIDUAL VOTE</b>')

printmd(table)

|Vote share|Votes|Candidate|
|---|---|:---|
|56.46%|358634|CLAUDE PARRISH|
|14.27%|90631|RICHARD B. RAMIREZ|
|11.15%|70831|NATHANIEL FERNANDEZ EPSTEIN|
|18.12%|115128|<b>RESIDUAL VOTE</b>|


In [96]:
print(table)

|Vote share|Votes|Candidate|
|---|---|:---|
|56.46%|358634|CLAUDE PARRISH|
|14.27%|90631|RICHARD B. RAMIREZ|
|11.15%|70831|NATHANIEL FERNANDEZ EPSTEIN|
|18.12%|115128|<b>RESIDUAL VOTE</b>|




## Calculate sample sizes and risk levels
`analyze_rounds.py` serves two purposes. Before any ballot cards have been observed, it calculates the initial sample size. Once samples have been drawn and observed, it calculates the risk levels and any expansion of the audit that may be necessary.

The code currently assumes that all contests are California top-two primary contests: there is either a single outright winner, if one candidate received more than 50% of all ballots cast for a valid candidate, or two winners who advance to the general election.
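
The top-two rule described above can be sketched as follows. This is an illustration of the prose rule only, not the code in `analyze_rounds.py`; the function name `top_two_winners` is hypothetical, and ties are ignored. Note that the `winners` field in `contests.json` lists two names for the Assessor contest, so `parse_hart.py` evidently records winners under a different convention.

```python
def top_two_winners(votes):
    """Apply the top-two primary rule to a dict of candidate -> votes.

    Hypothetical helper: returns one name if the leader has more than
    50% of the votes cast for valid candidates, otherwise the top two.
    """
    total_valid = sum(votes.values())
    ranked = sorted(votes, key=votes.get, reverse=True)
    if votes[ranked[0]] * 2 > total_valid:
        return ranked[:1]   # outright winner
    return ranked[:2]       # two candidates advance

# Reported Assessor tallies from this notebook
assessor = {
    'CLAUDE PARRISH': 358634,
    'RICHARD B. RAMIREZ': 90631,
    'NATHANIEL FERNANDEZ EPSTEIN': 70831,
}
print(top_two_winners(assessor))

# Made-up tallies with no majority, for contrast
print(top_two_winners({'A': 40, 'B': 35, 'C': 25}))
```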

For each contest, it shows the reported votes and the votes observed in the sample. It then evaluates the various margins to be checked. There are two situations.

If the reported votes imply an outright winner, we audit only the margin between each candidate and a pool of all the other candidates. That is the case, for example, in the Assessor contest.

If there was no apparent outright winner, we are auditing two things: whether there should have been an outright winner, and whether the two reported winners actually beat each of the reported losers. We do pairwise comparisons for each of the corresponding margins.

In either situation, the "Max" line indicates which of the margins calls for the largest sample size, from this point forward.
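
For the outright-winner case, the margins reported in the output can be reproduced with a short sketch. It assumes each margin is (W - L) / (W + L), where W and L are the reported votes on the winning and losing side of the comparison; that formula is inferred from the printed figures rather than taken from `analyze_rounds.py`, and the name `pooled_margins` is hypothetical.

```python
def pooled_margins(votes):
    """Margin of each candidate against the pool of all other candidates.

    Hypothetical helper for the outright-winner case only.
    """
    total = sum(votes.values())
    leader = max(votes, key=votes.get)
    margins = {}
    for name, own in votes.items():
        pool = total - own
        # The leader is the reported winner against the pool; every other
        # candidate is the reported loser against the pool.
        w, l = (own, pool) if name == leader else (pool, own)
        margins[name] = (w - l) / (w + l)
    return margins

assessor = {
    'CLAUDE PARRISH': 358634,
    'RICHARD B. RAMIREZ': 90631,
    'NATHANIEL FERNANDEZ EPSTEIN': 70831,
}
margins = pooled_margins(assessor)
for name, m in margins.items():
    print("{:.2%} {}".format(m, name))

# The smallest margin needs the largest sample, so it drives the "Max" line.
tightest = min(margins, key=margins.get)
print(tightest)
```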

The actual sample size is chosen manually based on these calculations. The choice trades off the risk of ending up without convincing evidence, and thus needing to expand the audit to another round, against the possibility that a fortunate sample would let the audit finish early.

Note that if there are additional margins near the "Max" margin, samples unfavorable to confirming either one may drive the audit to expand, so a somewhat larger sample size than indicated is prudent.

### Initial sample sizes

Run `analyze_rounds.py` on the `initial-export` data to get initial sample sizes.

In [5]:
!python3 ../src/rla_utils/analyze_rounds.py contests.json initial-export

Contest: Assessor, with 3 candidates. 0 samples entered

  358634 reported votes, 0 sample votes for CLAUDE PARRISH
  90631 reported votes, 0 sample votes for RICHARD B. RAMIREZ
  70831 reported votes, 0 sample votes for NATHANIEL FERNANDEZ EPSTEIN

       Sample 25: Risk 100.00% with margin: 37.91%; counts W: 358634 L: 161462 w: 0 l: 0 for CLAUDE PARRISH vs pool
       Sample 9: Risk 100.00% with margin: 65.15%; counts W: 429465 L: 90631 w: 0 l: 0 for pool vs RICHARD B. RAMIREZ
       Sample 7: Risk 100.00% with margin: 72.76%; counts W: 449265 L: 70831 w: 0 l: 0 for pool vs NATHANIEL FERNANDEZ EPSTEIN

  Max: Sample 25: Risk 100.00% with margin: 37.91%; counts W: 358634 L: 161462 w: 0 l: 0 for CLAUDE PARRISH vs pool


Contest: Auditor-Controller, with 2 candidates. 0 samples entered

  369704 reported votes, 0 sample votes for ERIC H. WOOLERY
  127768 reported votes, 0 sample votes for TONI SMART

       Sample 15: Risk 100.00% with margin: 48.63%; counts W: 369704 

### Final risk levels and sample sizes for other contests

Run `analyze_rounds.py` on the `final-export` data to confirm that the risk limit was met for three contests, and to calculate how many additional samples might meet the risk limit for the other contests.
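
The risk figures in the output are consistent with a BRAVO-style ballot-polling test, in which the measured risk is the inverse of a sequential probability ratio. The sketch below is an assumption inferred from those figures, not a copy of `analyze_rounds.py`; the function name `ballot_polling_risk` and the cap at 100% are hypothetical.

```python
def ballot_polling_risk(W, L, w, l):
    """Risk level for one winner-vs-loser comparison.

    W, L: reported votes for the winning and losing side.
    w, l: sample votes observed for the winning and losing side.
    Hypothetical helper, assuming a BRAVO-style test statistic.
    """
    s = W / (W + L)  # reported winner share among votes for either side
    # Each sampled winner vote multiplies the ratio by s / 0.5,
    # each sampled loser vote by (1 - s) / 0.5.
    ratio = (s / 0.5) ** w * ((1 - s) / 0.5) ** l
    return min(1.0, 1.0 / ratio)

# Assessor, CLAUDE PARRISH vs pool, using the counts printed in the output
risk = ballot_polling_risk(358634, 161462, 40, 23)
print("{:.2%}".format(risk))  # 15.02%, under the 20% risk limit

# With no samples entered, the risk is 100%
print("{:.2%}".format(ballot_polling_risk(358634, 161462, 0, 0)))
```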

In [6]:
!python3 ../src/rla_utils/analyze_rounds.py contests.json final-export

Contest: Assessor, with 3 candidates. 160 samples entered

  358634 reported votes, 40 sample votes for CLAUDE PARRISH
  90631 reported votes, 11 sample votes for RICHARD B. RAMIREZ
  70831 reported votes, 12 sample votes for NATHANIEL FERNANDEZ EPSTEIN

       Sample 0: Risk 15.02% with margin: 37.91%; counts W: 358634 L: 161462 w: 40 l: 23 for CLAUDE PARRISH vs pool
       Sample 0: Risk 0.00% with margin: 65.15%; counts W: 429465 L: 90631 w: 52 l: 11 for pool vs RICHARD B. RAMIREZ
       Sample 0: Risk 0.00% with margin: 72.76%; counts W: 449265 L: 70831 w: 51 l: 12 for pool vs NATHANIEL FERNANDEZ EPSTEIN

  Max: Sample 0: Risk 15.02% with margin: 37.91%; counts W: 358634 L: 161462 w: 40 l: 23 for CLAUDE PARRISH vs pool


Contest: Auditor-Controller, with 2 candidates. 160 samples entered

  369704 reported votes, 46 sample votes for ERIC H. WOOLERY
  127768 reported votes, 17 sample votes for TONI SMART

       Sample 0: Risk 0.10% with margin: 48.63%; counts W: 3