# Assertion RLA

A tool to audit generalized assertions about contests, including the assertions used in risk-limiting audits (RLAs) of instant-runoff voting (IRV) contests.

This tool can audit any number of contests simultaneously using the same sample.
The contests can be audited to different risk limits.
The contests can have different social choice functions, including majority or super-majority,
plurality, multi-winner plurality, and IRV.

The audit is "simultaneous" across contests in the following sense: 

>If the reported outcome of a contest is incorrect, the chance that the audit stops without a full hand count is at most the risk limit for that contest. 

The sample is drawn as a random sample of individual ballots, with or without replacement, from a single pool of all ballots cast in the contest(s).

The tool requires as input:

+ audit-specific and contest-specific parameters, such as
    - whether to sample with or without replacement
    - a risk limit for each contest to be audited
    - the social choice function for each contest, including the number of winners
    - candidate identifiers
+ a ballot manifest
+ a random seed
+ a file of cast vote records
+ reported results for each contest
+ assertions about each contest
+ human reading of voter intent from the paper ballots selected for audit

The tool helps select ballots for audit, and reports when the audit has found sufficiently strong evidence to stop.

The tool exports a log of all the audit inputs except the CVR file, but including the auditors' manually determined voter intent from the audited ballots.

In [1]:
from __future__ import division, print_function

from ipywidgets import interact, interactive, fixed, interact_manual, Dropdown, Layout, Box
import ipywidgets as widgets
from IPython.display import display, HTML

from collections import OrderedDict
from itertools import product
import math
import json

import numpy as np
from ballot_comparison import ballot_comparison_pvalue
from assertion_audit_utils import \
    Assertion, Assorter, CVR, \
    check_audit_parameters, write_audit_parameters, write_ballots_sampled

from cryptorandom.cryptorandom import SHA256
from cryptorandom.sample import sample_by_index

from suite_tools import write_audit_results, \
        check_valid_vote_counts, \
        check_overvote_rates, find_winners_losers, print_reported_votes, \
        estimate_n, estimate_escalation_n, \
        parse_manifest, unique_manifest, find_ballot, \
        audit_contest

# Audit parameters.

* `seed`: the seed for the pseudo-random number generator used to draw the sample; it is typically a number but does not have to be
* `replacement`: whether to sample with replacement. If the sample is drawn with replacement, `gamma` must also be specified.
* `gamma`: the gamma parameter used in the ballot-level comparison method from Lindeman and Stark (2012), based on Stark (2010). Must satisfy gamma $\ge$ 1.
`gamma=1.03905` is a common value; it makes a 2-vote overstatement "cost" 5 times as much as a 1-vote overstatement. Smaller values yield smaller sample sizes when there are no 2-vote overstatement errors.
* `N_ballots`: an upper bound on the number of ballots cast in the contest. This should be derived independently of the voting system.
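
The factor of 5 quoted for `gamma=1.03905` can be checked numerically. The sketch below assumes the Kaplan-Markov form of the ballot-level comparison P-value, in which each 1-vote overstatement multiplies the P-value by 1/(1 - 1/(2 gamma)) and each 2-vote overstatement multiplies it by 1/(1 - 1/gamma). The function name `overstatement_cost_ratio` is made up for illustration and is not part of the tool.

```python
import math

def overstatement_cost_ratio(gamma):
    """Ratio of the log-scale penalty of one 2-vote overstatement to that of
    one 1-vote overstatement (assumed Kaplan-Markov form of the P-value)."""
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    if gamma == 1:
        # 1 - 1/gamma is zero: a single 2-vote overstatement has unbounded cost
        return math.inf
    cost_o1 = -math.log(1 - 1 / (2 * gamma))  # log penalty of a 1-vote overstatement
    cost_o2 = -math.log(1 - 1 / gamma)        # log penalty of a 2-vote overstatement
    return cost_o2 / cost_o1

print(overstatement_cost_ratio(1.03905))
```

Under that assumption the ratio comes out very close to 5 for `gamma=1.03905`, and it shrinks as gamma grows, which is the trade-off described above: a larger gamma tolerates 2-vote overstatements better at the price of a larger sample when no such errors occur.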


----

* `cvr_file`: filename for CVRs (input)
* `manifest_file`: filename for ballot manifest (input)
* `assertion_file`: filename of assertions for IRV contests, in RAIRE format
* `mvr_file`: filename for manually ascertained votes from sampled ballots (input)
* `log_file`: filename for audit log (output)

----

* `error_rates`: dict of expected error rates. The keys are
    + `o1_rate`: expected rate of 1-vote overstatements. Recommended value $\ge$ 0.001 if there are hand-marked ballots. Larger values increase the initial sample size, but make it more likely that the audit will conclude in a single round if the audit finds errors.
    + `o2_rate`: expected rate of 2-vote overstatements. Recommended value 0.
    + `u1_rate`: expected rate of 1-vote understatements. Recommended value 0.
    + `u2_rate`: expected rate of 2-vote understatements. Recommended value 0.

* `contests`: a dict of contest-specific data 
    + the keys are unique contest identifiers for contests under audit
    + the values are dicts with keys:
        - `risk_limit`: the risk limit for the audit of this contest
        - `ballots_cast`: an upper bound on the number of cast ballots that contain the contest
        - `choice_function`: `plurality`, `majority`, `super-majority`, or `IRV`
        - `n_winners`: number of winners for plurality contests. (Multi-winner IRV is not supported; multi-winner super-majority is not meaningful.)
        - `super_majority`: for super-majority contests, the fraction of valid votes required to win, e.g., 2/3. Super-majority contests can have at most two candidates.
        - `candidates`: list of names or identifiers of candidates
        - `reported_winners` : list of identifier(s) of candidate(s) reported to have won. Length should equal `n_winners`.
        - `assertions`: a set of Assertions (see technical documentation) that collectively imply the reported outcome is correct

In [2]:
seed = 12345678901234567890  # use, e.g., 20 rolls of a 10-sided die. Seed doesn't have to be numeric
replacement = True  # Sampling without replacement isn't implemented
gamma=1.03905
N_ballots = 300000

In [3]:
cvr_file = './Data/cvr.json'
manifest_file = './Data/manifest.csv'
assertion_file = './Data/assertion.csv'
mvr_file = './Data/mvr.csv'
log_file = './Data/log.json'

In [4]:
error_rates = {'o1_rate':0.002,      # expect two 1-vote overstatements per 1000 ballots
               'o2_rate':0,          # expect 0 2-vote overstatements
               'u1_rate':0,          # expect 0 1-vote understatements
               'u2_rate':0}          # expect 0 2-vote understatements

In [5]:
## TEST

votes = {"Alice": 4, "Bob": 1, "Candy": 0, "Dan": ''}

vv = {"Candy": 2, "Dan": True, "Ellie": 7, "Alice": 2}

votes.update(vv)

print(votes)

winr = "Alice"
winr_func = lambda c: int(bool(CVR.get_vote_from_votes(winr, c)))

print(winr_func(votes))

{'Alice': 2, 'Bob': 1, 'Candy': 2, 'Dan': True, 'Ellie': 7}
1


In [9]:
winners = ["Alice","Bob"]
losers = ["Candy","Dan"]
assns = Assertion.make_plurality_assertions(winners, losers)
print(assns)
share_to_win = 2/3
assn = Assertion.make_supermajority_assertion("Alice", losers, share_to_win)
print(assn)
print(assn['Alice v all'].assorter.assort(votes))
votes2 = {"Alice": True, "Bob": False, "Candy": 0, "Dan": ''}
print(assn['Alice v all'].assorter.assort(votes2))
votes3 = {"Alice": False, "Bob": 1, "Candy": 1, "Dan": ''}
print(assn['Alice v all'].assorter.assort(votes3))


{'Alice v Candy': <assertion_audit_utils.Assertion object at 0x11523f208>, 'Alice v Dan': <assertion_audit_utils.Assertion object at 0x11523fef0>, 'Bob v Candy': <assertion_audit_utils.Assertion object at 0x11523ff60>, 'Bob v Dan': <assertion_audit_utils.Assertion object at 0x11523ffd0>}
{'Alice v all': <assertion_audit_utils.Assertion object at 0x11523f9b0>}
1.1102230246251565e-16
1.0
-0.4999999999999999
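
The keys in the first output line show that one plurality assertion is created for every (winner, loser) pair. The sketch below illustrates that pairing together with one common choice of plurality assorter, which maps a ballot to 1 if it shows a vote for the winner only, 0 if it shows a vote for the loser only, and 1/2 otherwise. The helper names are hypothetical, and the `Assertion` and `Assorter` classes in `assertion_audit_utils` may be implemented differently.

```python
from itertools import product

def plurality_assorter(winner, loser):
    """Return a function mapping a dict of votes to a value in {0, 1/2, 1}."""
    def assort(votes):
        w = int(bool(votes.get(winner)))  # 1 if the ballot shows a vote for the winner
        l = int(bool(votes.get(loser)))   # 1 if the ballot shows a vote for the loser
        return (w - l + 1) / 2
    return assort

def make_plurality_assorters(winners, losers):
    """One assorter per (winner, loser) pair, keyed like 'Alice v Candy'."""
    return {f"{w} v {l}": plurality_assorter(w, l)
            for w, l in product(winners, losers)}

assorters = make_plurality_assorters(["Alice", "Bob"], ["Candy", "Dan"])
print(sorted(assorters))
```

With this choice of assorter, the assertion that a winner beat a loser is equivalent to the statement that the assorter's mean over all ballots exceeds 1/2.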


In [None]:
# contests to audit

contests = {'mayor':{'risk_limit':0.05,
                     'choice_function':'IRV',
                     'n_winners':1,
                     'candidates':['Alice','Bob','Cindy'],
                     'reported_winners' : ['Alice']
                    },
            'city_council':{'risk_limit':0.05,
                     'choice_function':'plurality',
                     'n_winners':3,
                     'candidates':['Doug','Emily','Frank','Gail','Harry'],
                     'reported_winners' : ['Doug', 'Emily', 'Frank']
                    },
            'measure_1':{'risk_limit':0.05,
                     'choice_function':'super-majority',
                     'super_majority':2/3,
                     'n_winners':1,
                     'candidates':['yes','no'],
                     'reported_winners' : ['yes']
                    }                  
           }

In [None]:
check_audit_parameters(gamma, error_rates, contests)
write_audit_parameters(log_file, seed, replacement, gamma, N_ballots, error_rates, contests)

## Read the CVRs and ballot manifest

In [None]:
# read the cast vote records
with open(cvr_file, 'r') as f:
    # do something
    # cvrs = ???
    pass

# read the ballot manifest
manifest = read_manifest_from_csv(manifest_file)

## Find audit parameters and conduct audit

* For each contest:
    - find claimed outcome by applying the social choice function (SCF) to the CVRs
    - complain if claimed outcome disagrees with reported outcome
    - construct assertions that imply contest outcome is correct
    - for each assertion:
        + find generalized diluted margin
        
* Find initial (incremental) sample size from smallest diluted margin, for the sampling plan
    - Complain if expected error rates imply any assertion is incorrect

* For each assertion:
    - Initialize discrepancy counts to zero (o1, o2, u1, u2)
    - Initialize measured risk to 1
* While measured risk for any assertion exceeds its risk limit:
    - expand sample by estimated increment
        + identify ballots in manifest
        + update the log file with incremental sample
    - import audit results when ballots have been audited
    - for each assertion:
        + for each sampled ballot:
            - increment discrepancy count for the assertion
        + find measured risk
    - update log file with new measured risks
    - if any measured risk exceeds its risk limit:
        + estimate incremental sample required to complete the audit
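
The "find measured risk" step can be sketched as follows. This is a minimal illustration that assumes sampling with replacement and the Kaplan-Markov P-value for ballot-level comparison audits; it is not the tool's `ballot_comparison_pvalue`, whose signature is not shown in this notebook. The function `measured_risk` and its parameter `diluted_margin` are hypothetical names.

```python
import math

def measured_risk(n, o1, o2, u1, u2, diluted_margin, gamma=1.03905):
    """Kaplan-Markov P-value from discrepancy counts (assumes gamma > 1).

    n: number of ballots audited so far
    o1, o2: counts of 1-vote and 2-vote overstatements
    u1, u2: counts of 1-vote and 2-vote understatements
    """
    log_p = (n * math.log(1 - diluted_margin / (2 * gamma))
             - o1 * math.log(1 - 1 / (2 * gamma))
             - o2 * math.log(1 - 1 / gamma)
             - u1 * math.log(1 + 1 / (2 * gamma))
             - u2 * math.log(1 + 1 / gamma))
    return min(1.0, math.exp(log_p))

def unconfirmed(assertion_risks, risk_limit):
    """Names of assertions whose measured risk still exceeds the risk limit."""
    return [name for name, p in assertion_risks.items() if p > risk_limit]
```

Each audited ballot without a discrepancy shrinks the measured risk, overstatements increase it, and understatements decrease it. The audit of a contest can stop once `unconfirmed` returns an empty list for that contest's assertions.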

In [None]:
# find contest results
for c in contests.keys():
    contests[c]['winners'] = find_winners(contests[c])

In [None]:
# expand the ballot manifest into a dict. keys are batches, values are ballot numbers.
cvr_manifest_parsed = parse_manifest(manifest)

In [None]:
# assign each ballot a unique ID
unique_cvr_manifest = unique_manifest(cvr_manifest_parsed)

In [None]:
# look up sample ballots

cvr_sample = []
for s in sample1:
    original_ballot_label, batch_label, which_ballot = find_ballot(
        s, unique_cvr_manifest, cvr_manifest_parsed)
    cvr_sample.append([s, batch_label, which_ballot])

# Sort by batch label, then by order within each batch
cvr_sample.sort(key=lambda x: (x[1], x[2]))
cvr_sample.insert(0,["sampled ballot", "batch label", "which ballot in batch"])

display(HTML(
    '<table><tr>{}</tr></table>'.format(
        '</tr><tr>'.join(
            '<td>{}</td>'.format('</td><td>'.join(str(_) for _ in row)) for row in cvr_sample)
        )
 ))
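
The lookup performed in the cell above can be sketched with the standard library alone. This illustration assumes the manifest is a list of `(batch_label, number_of_ballots)` pairs with unique labels; `locate_ballot` and `draw_and_locate` are hypothetical helpers, not the tool's `find_ballot`. The notebook imports `SHA256` and `sample_by_index` from `cryptorandom` for the actual draw; `random.Random` is used here only as a stand-in and is not suitable for a real audit.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def locate_ballot(index, batches):
    """Map a 1-based ballot index to (batch label, 1-based position in batch)."""
    cumulative = list(accumulate(count for _, count in batches))
    if not cumulative or not 1 <= index <= cumulative[-1]:
        raise ValueError("ballot index is outside the manifest")
    b = bisect_left(cumulative, index)  # first batch whose running total reaches index
    ballots_before = cumulative[b - 1] if b > 0 else 0
    return batches[b][0], index - ballots_before

def draw_and_locate(seed, n_draws, batches):
    """Draw ballots with replacement and return rows ordered for retrieval."""
    total = sum(count for _, count in batches)
    prng = random.Random(seed)  # stand-in PRNG, for illustration only
    sample = [prng.randint(1, total) for _ in range(n_draws)]
    rows = [(s,) + locate_ballot(s, batches) for s in sample]
    # Order by the batch's place in the manifest, then by position within the batch
    order = {label: i for i, (label, _) in enumerate(batches)}
    return sorted(rows, key=lambda r: (order[r[1]], r[2]))

batches = [("A", 3), ("B", 2), ("C", 4)]
for row in draw_and_locate(12345678901234567890, 5, batches):
    print(row)
```

Sorting the rows by batch and position keeps the retrieval order convenient for the people pulling paper ballots, while the first column preserves the sampled index for the log.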

# Enter the sample data

In [None]:
# Find audit p-values across assertions

In [None]:
# Identify assertions not yet confirmed

In [None]:
# Log the status of the audit 

# Escalation: how many more ballots should be drawn?

This tool estimates how many more ballots will need to be audited to confirm any remaining contests. The enlarged sample size is based on:

* ballots already sampled
* the assumption that overstatements and understatements will continue to occur at the rates observed in the sample so far
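
One way to turn these two ingredients into a number is sketched below. It assumes sampling with replacement and the Kaplan-Markov P-value, and projects the observed discrepancy rates forward until the P-value would fall to the risk limit. The function names and the `diluted_margin` parameter are hypothetical; this is not the tool's `estimate_escalation_n`.

```python
import math

def estimate_total_sample_size(risk_limit, diluted_margin, n_sampled,
                               o1, o2, u1, u2, gamma=1.03905):
    """Projected total sample size, or None if the observed error rates
    suggest the audit will not stop short of a full hand count.
    Assumes gamma > 1 and n_sampled > 0."""
    if n_sampled <= 0:
        raise ValueError("n_sampled must be positive")
    r_o1, r_o2, r_u1, r_u2 = (x / n_sampled for x in (o1, o2, u1, u2))
    # Expected change in log P-value per additional ballot at the observed rates
    drift = (math.log(1 - diluted_margin / (2 * gamma))
             - r_o1 * math.log(1 - 1 / (2 * gamma))
             - r_o2 * math.log(1 - 1 / gamma)
             - r_u1 * math.log(1 + 1 / (2 * gamma))
             - r_u2 * math.log(1 + 1 / gamma))
    if drift >= 0:
        return None  # the P-value is not expected to decrease
    return math.ceil(math.log(risk_limit) / drift)

def additional_ballots(risk_limit, diluted_margin, n_sampled,
                       o1, o2, u1, u2, gamma=1.03905):
    """Projected number of further ballots to draw (None if no finite estimate)."""
    total = estimate_total_sample_size(risk_limit, diluted_margin, n_sampled,
                                       o1, o2, u1, u2, gamma)
    return None if total is None else max(0, total - n_sampled)
```

A return value of `None` corresponds to the case where the observed error rates are too high for the sample to confirm the assertion, so a full hand count should be expected.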

In [None]:
sample_sizes_new = {}

# TBD
