# ONEAudit Demo

ONEAudit (refereed paper: https://link.springer.com/chapter/10.1007/978-3-031-48806-1_5; full version: https://arxiv.org/abs/2303.03335) is a way to use batch tally information efficiently in risk-limiting audits (RLAs).
It is vastly more efficient than batch-level comparison auditing.
For each SHANGRLA assertion, ONEAudit creates an "average" assorter value for each reporting batch.
That average plays the role of the CVR in a ballot-level comparison audit: for each ballot card the audit selects, the assorter applied to the manually ascertained vote (MVR, manual vote record) on that card is compared to the average assorter value for the batch to which the card belongs.
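A toy sketch of the averaging idea, assuming one pairwise plurality assorter (1 for a vote for the winner, 0 for the loser, 1/2 otherwise) and a single vote per card. The helper names are hypothetical, not SHANGRLA's API. The comparison assorter uses the standard SHANGRLA form (1 - ω/u)/(2 - v/u), where the overstatement ω is the pool's mean assorter value minus the assorter applied to the MVR, u is the assorter's upper bound, and v is the reported assorter margin.

```python
from statistics import mean

def plurality_assorter(vote, winner, loser):
    """Pairwise plurality assorter for 'winner beat loser'; its upper bound is u = 1."""
    if vote == winner:
        return 1.0
    if vote == loser:
        return 0.0
    return 0.5  # a vote for any other candidate, or no vote, is neutral

def pool_means(reported, winner, loser):
    """Mean reported assorter value per pool; `reported` holds (pool_id, vote) pairs."""
    values = {}
    for pool_id, vote in reported:
        values.setdefault(pool_id, []).append(plurality_assorter(vote, winner, loser))
    return {pool_id: mean(vals) for pool_id, vals in values.items()}

def oneaudit_assorter(mvr_vote, pool_id, means, margin, winner, loser, u=1.0):
    """Comparison assorter for one sampled card, using its pool mean in place of a CVR."""
    overstatement = means[pool_id] - plurality_assorter(mvr_vote, winner, loser)
    return (1 - overstatement / u) / (2 - margin / u)

# Toy reported votes in two pools; 'x' is a third candidate.
reported = [("A", "w"), ("A", "w"), ("A", "l"), ("A", "x"), ("B", "w"), ("B", "l")]
means = pool_means(reported, "w", "l")  # pool A: 0.625, pool B: 0.5
margin = 2 * mean(plurality_assorter(v, "w", "l") for _, v in reported) - 1  # 1/6
print(oneaudit_assorter("w", "A", means, margin, "w", "l"))  # about 0.75
```

When every MVR matches its reported vote, the overstatements cancel within each pool, so the ONEAudit assorter averages exactly 1/(2 - v), as in an error-free card-level comparison audit; pooling changes the spread of the individual values, which tends to increase the sample size.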

ONEAudit is useful in a variety of situations, including:

+ when batch tallies are available, whether or not the batches correspond to physical batches, in contrast to traditional batch-level comparison audits, which work only when reporting batches correspond to physical batches
+ when CVRs are available for batches of cards but there is no mapping from individual cards to individual CVRs (this is common for precinct-count optical scan systems)
+ when CVRs are available for some cards but not others (ONEAudit improves on SUITE in that situation, avoiding the need for stratification)

`SHANGRLA` currently infers batch information from Dominion CVRs in the format used by San Francisco (as of August 2024), as follows:

+ when reading the CVRs using `shangrla.formats.Dominion.read_cvrs()`, set `pool_groups` to be the list of `CountingGroupID`s that should be audited using ONEAudit CVRs derived from batch tallies.
    - For SF, that is `CountingGroupID == 1`.
    - If `pool_groups` is nonempty, the CVRs with `CountingGroupID in pool_groups` are marked as `pool`
+ apply `shangrla.core.Audit.CVR.check_tally_pools()`: ensure every CVR in each `tally_pool` has the same value of `pool`
+ apply `shangrla.core.Audit.CVR.add_pool_contests()`: ensure every CVR in each `tally_pool` for which `pool == True` has every contest in that `tally_pool`
+ create the assertions for the contests under audit using functions in `shangrla.core.Audit.Assertion`
+ apply `shangrla.core.Audit.Assorter.set_tally_pool_means()` to set the assorter means
+ estimate initial sample size
+ the audit then proceeds as a standard ballot-level comparison audit: the MVR for each inspected card is automatically compared to the ONEAudit CVR for the batch to which that card belongs
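
The two pool-consistency steps above can be mimicked on plain dictionaries to make their intent concrete. This is a hypothetical sketch, not SHANGRLA's implementation: the dict layout (`tally_pool`, `pool`, `votes` keys) is an assumption, and the real `check_tally_pools()` may repair inconsistent pools rather than reject them.

```python
from collections import defaultdict

def check_tally_pools(cvrs):
    """Raise ValueError if any tally pool mixes pooled and unpooled CVRs."""
    flags = defaultdict(set)
    for cvr in cvrs:
        flags[cvr["tally_pool"]].add(cvr["pool"])
    mixed = sorted(p for p, f in flags.items() if len(f) > 1)
    if mixed:
        raise ValueError(f"tally pools mixing pooled and unpooled CVRs: {mixed}")

def add_pool_contests(cvrs):
    """Give every pooled CVR an empty entry for each contest seen anywhere in its tally pool."""
    contests = defaultdict(set)
    for cvr in cvrs:
        if cvr["pool"]:
            contests[cvr["tally_pool"]].update(cvr["votes"])
    for cvr in cvrs:
        if cvr["pool"]:
            for contest_id in contests[cvr["tally_pool"]]:
                cvr["votes"].setdefault(contest_id, {})  # no recorded vote for this contest

cvrs = [
    {"id": "a", "tally_pool": "p1", "pool": True, "votes": {"1": {"5": 1}}},
    {"id": "b", "tally_pool": "p1", "pool": True, "votes": {"2": {"11": 1}}},
    {"id": "c", "tally_pool": "t9", "pool": False, "votes": {"1": {"4": 1}}},
]
check_tally_pools(cvrs)
add_pool_contests(cvrs)
print([sorted(c["votes"]) for c in cvrs])  # [['1', '2'], ['1', '2'], ['1']]
```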


## Workflow for a ONEAudit RLA

+ Read overall audit information (including the seed) and contest information
+ Read RAIRE assertions for IRV contests and construct assertions for all other contests
+ Read ballot manifest
+ For comparison audits, read cvrs. Every CVR should have a corresponding manifest entry. Set `pool` and `pool_group` for each CVR that contributes to a ONEAudit pooled CVR.
+ Prepare ~2EZ:
    - `N_phantoms = max_cards - cards_in_manifest`
    - If `N_phantoms < 0`: complain
    - Else: create `N_phantoms` phantom cards
    - For each contest `c`:
        + `N_c` is the input upper bound on the number of cards that contain `c`
        + If `N_c is None`: `N_c = max_cards - non_c_cvrs`, where `non_c_cvrs` is the number of CVRs that don't contain `c`
        + `C_c` is the number of CVRs that contain the contest
        + If `C_c > N_c`: complain
        + Else if `N_c - C_c > N_phantoms`: complain
        + Else:
            - Consider contest `c` to be on the first `N_c - C_c` phantom CVRs
            - Consider contest `c` to be on the first `N_c - C_c` phantom ballots
+ Create Assertions for every Contest. This also creates an Assorter and a `NonnegMean` test for every Assertion.
+ Calculate assorter margins for all assorters:
    - If `not use_style`: apply the Assorter to all cards and CVRs, including phantoms
    - Else: apply the assorter only to cards/cvrs reported to contain the contest, including phantoms that contain the contest
+ Create ONEAudit CVRs for groups of CVRs that should be pooled.
+ Set `assertion.test.u` to the appropriate value for each assertion: `assorter.upper_bound` for polling audits or `2/(2-assorter.margin/assorter.upper_bound)` for ballot-level comparison audits
+ Estimate starting sample size for the specified sampling design (w/ or w/o replacement, stratified, etc.), for chosen risk function, use of card-style information, etc.:
    - User-specified criterion, controlled by parameters. Examples:
        + expected sample size for completion, on the assumption that there are no errors
        + 90th percentile of sample size for completion, on the assumption that errors are not more frequent than specified
    - If `not use_style`: base estimate on sampling from the entire manifest, i.e., smallest assorter margin
    - Else: use consistent sampling:
        + Augment each CVR (including phantoms) with a probability of selection, `p`, initially 0
        + For each contest `c`:
            - Find sample size `n_c` that meets the criterion 
            - For each non-phantom CVR that contains the contest, set `p = max(p, n_c/N_c)` 
        + Estimated sample size is the sum of `p` over all non-phantom CVRs
+ Draw the random sample:
    - Use the specified design, including using consistent sampling for style information
    - Express sample cards in terms of the manifest
    - Export
+ Read manual interpretations of the cards (MVRs)
+ Calculate attained risk for each assorter
    - Use ~2EZ to deal with phantom CVRs or cards; the treatment depends on whether `use_style == True`
+ Report
+ Estimate incremental sample size if any assorter nulls have not been rejected
+ Draw the incremental sample, and so on
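
Two of the numeric steps in the workflow, written out in plain Python as a sketch: the comparison-audit value of `assertion.test.u`, and the consistent-sampling estimate of the sample size (the sum of `p` over non-phantom CVRs). The function names and the `(is_phantom, contest_ids)` input layout are hypothetical.

```python
def comparison_upper_bound(margin, upper_bound=1.0):
    """Upper bound of the comparison assorter: 2 / (2 - margin/upper_bound)."""
    return 2 / (2 - margin / upper_bound)

def expected_sample_size(cvrs, n_by_contest, N_by_contest):
    """Consistent-sampling estimate: sum over non-phantom CVRs of p = max_c n_c / N_c.

    cvrs: list of (is_phantom, contest_ids) pairs, one per CVR.
    n_by_contest: sample size n_c found for each contest under audit.
    N_by_contest: upper bound N_c on the number of cards containing each contest.
    """
    total = 0.0
    for is_phantom, contest_ids in cvrs:
        if is_phantom:
            continue  # phantoms are not counted toward the estimate
        p = 0.0
        for c in contest_ids:
            if c in n_by_contest:  # only contests under audit raise p
                p = max(p, n_by_contest[c] / N_by_contest[c])
        total += p
    return total

cvrs = [(False, {"1"})] * 4 + [(False, {"1", "2"})] * 2 + [(True, {"1"})]
print(expected_sample_size(cvrs, {"1": 3, "2": 2}, {"1": 6, "2": 2}))  # 4.0
print(comparison_upper_bound(0.2))  # about 1.111
```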

# Audit parameters

The overall audit involves information that is the same across contests, encapsulated in
a dict called `audit`:

* `seed`: the numeric seed for the pseudo-random number generator used to draw the sample (for the SHA256 PRNG)
* `sim_seed`: seed for simulations to estimate sample sizes (for the Mersenne Twister PRNG)
* `quantile`: quantile of the sample-size distribution to use for setting the initial sample size. Not used if `reps is None`
* `cvr_file`: filename for CVRs (input)
* `manifest_file`: filename for ballot manifest (input)
* `use_style`: Boolean. If True, use card-style information (inferred from CVRs) to target samples. If False, sample from all cards, regardless of the contest. This should come from external touchstones such as physical inventories or voter participation records, not from the voting system. In particular, it is dangerous to assume that every card containing a contest has a corresponding CVR that contains the contest.
* `sample_file`: filename for sampled card identifiers (output)
* `mvr_file`: filename for manually ascertained votes from sampled cards (input)
* `log_file`: filename for audit log (output)
* `error_rate_1`: expected rate of 1-vote overstatements. Recommended value $\ge$ 0.001 if there are hand-marked ballots. Larger values increase the initial sample size, but make it more likely that the audit will conclude after a single round even if the audit finds errors
* `error_rate_2`: expected rate of 2-vote overstatements. 2-vote overstatements should be extremely rare. Recommended value: 0. Larger values increase the initial sample size, but make it more likely that the audit will conclude after a single round even if the audit finds errors
* `reps`: number of replications to use to estimate sample sizes. If `reps is None`, a deterministic method is used
* `strata`: a dict describing the strata. Keys are stratum identifiers; values are dicts containing:
    + `max_cards`: an upper bound on the number of pieces of paper cast in the stratum. This should be derived independently of the voting system. A ballot consists of one or more cards.
    + `replacement`: whether to sample from this stratum with replacement.
    + `use_style`: True if the sample in that stratum uses card-style information.
    + `audit_type`: one of `Audit.AUDIT_TYPE.POLLING`, `Audit.AUDIT_TYPE.CARD_COMPARISON`, or `Audit.AUDIT_TYPE.ONEAUDIT` (the type used in this notebook). Batch-level comparison audits are not currently implemented.
    + `test`: the risk function used to measure risk (e.g., `NonnegMean.alpha_mart`). Options are `kaplan_markov`, `kaplan_wald`, `kaplan_kolmogorov`, `wald_sprt`, `kaplan_mart`, `alpha_mart`, `betting_mart`. Not all risk functions work with every social choice function or every sampling method.
    + `estim`: the estimator to be used by the `alpha_mart` risk function. Options:  
        - `fixed_alternative_mean` (default)
        - `shrink_trunc`
        - `optimal_comparison`
    + `bet`: the method to select the bet for the `betting_mart` risk function. Options:
        - `fixed_bet` (default)
        - `agrapa`
    + `test_kwargs`: keyword arguments for the risk function
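
To make the `test`/`estim` pairing concrete, here is a minimal sketch of an ALPHA-style supermartingale with a fixed alternative mean, in the spirit of `alpha_mart` with the `fixed_alternative_mean` estimator. It assumes sampling *with* replacement (the `replacement` option above suggests the library also handles sampling without replacement, which changes the null mean after each draw) and is not the `NonnegMean` code; `alpha_mart_fixed` is a hypothetical name.

```python
def alpha_mart_fixed(x, eta, u=1.0, mu=0.5):
    """P-value for H0: mean <= mu, for [0, u]-valued draws sampled with replacement.

    eta is a fixed alternative mean with mu < eta <= u (the 'fixed alternative' estimator).
    """
    if not mu < eta <= u:
        raise ValueError("need mu < eta <= u")
    t, t_max = 1.0, 1.0
    for xi in x:
        # ALPHA update: T_j = T_{j-1} * (xi*eta/mu + (u - xi)*(u - eta)/(u - mu)) / u
        t *= (xi * eta / mu + (u - xi) * (u - eta) / (u - mu)) / u
        t_max = max(t_max, t)
    # by Ville's inequality, 1 / max_j T_j is a valid p-value
    return min(1.0, 1.0 / t_max)

print(alpha_mart_fixed([1.0] * 20, eta=0.6))  # about 0.026
```

Each factor equals 1 when a draw equals the null mean and exceeds 1 when it is larger, so evidence against the null accumulates multiplicatively; an `estim` other than a fixed alternative would simply change `eta` from draw to draw.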

----

* `contests`: a dict of contest-specific data 
    + the keys are unique contest identifiers for contests under audit
    + the values are Contest objects with attributes:
        - `risk_limit`: the risk limit for the audit of this contest
        - `cards`: an upper bound on the number of cast cards that contain the contest
        - `choice_function`: `Audit.SOCIAL_CHOICE_FUNCTION.PLURALITY`, 
          `Audit.SOCIAL_CHOICE_FUNCTION.SUPERMAJORITY`, or `Audit.SOCIAL_CHOICE_FUNCTION.IRV`
        - `n_winners`: number of winners for plurality and supermajority contests. (Multi-winner IRV, aka STV, is not supported)
        - `share_to_win`: for super-majority contests, the fraction of valid votes required to win, e.g., 2/3.
           (share_to_win*n_winners must be less than 100%)
        - `candidates`: list of names or identifiers of candidates
        - `winner`: list of identifier(s) of candidate(s) reported to have won.
           Length should equal `n_winners`.
        - `assertion_file`: filename for a set of json descriptors of Assertions (see technical documentation) that collectively imply the reported outcome of the contest is correct. Required for IRV; ignored for other social choice functions
        - `audit_type`: the audit strategy. Currently `Audit.AUDIT_TYPE.POLLING` (ballot-polling),
           `Audit.AUDIT_TYPE.CARD_COMPARISON` (ballot-level comparison audits), and `Audit.AUDIT_TYPE.ONEAUDIT`
           are implemented. HYBRID and STRATIFIED are planned.
        - `test`: the risk function for the audit. Default is `NonnegMean.alpha_mart`, the alpha supermartingale test
        - `estim`: estimator for the alternative hypothesis for the test. Default is `NonnegMean.shrink_trunc`
        - `use_style`: True to use style information from CVRs to target the sample. False for polling audits or for sampling from all ballots for every contest.
        - other keys and values are added by the software, including `cvrs`, the number of CVRs that contain the contest, and `p`, the sampling fraction expected to be required to confirm the contest
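
Some of the constraints listed above can be checked mechanically before the audit starts. `check_contest` is a hypothetical helper (not SHANGRLA's `check_audit_parameters`) that reads the `winner` key, as the `contest_dict` cell below does.

```python
def check_contest(contest):
    """Return a list of problems with a contest dict; an empty list means none were found."""
    problems = []
    winners = contest.get("winner", [])
    if len(winners) != contest["n_winners"]:
        problems.append("number of reported winners differs from n_winners")
    if not set(winners) <= set(contest["candidates"]):
        problems.append("a reported winner is not among the candidates")
    share = contest.get("share_to_win")
    if share is not None and share * contest["n_winners"] >= 1:
        problems.append("share_to_win * n_winners must be less than 1")
    return problems

ok = {"n_winners": 1, "candidates": ["1", "2", "3"], "winner": ["2"]}
bad = {"n_winners": 2, "candidates": ["1", "2", "3"], "winner": ["4"], "share_to_win": 2 / 3}
print(check_contest(ok))        # []
print(len(check_contest(bad)))  # 3
```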

In [1]:
# if shangrla has not already been installed, install it then restart the kernel
# !pip install -e "../"

In [2]:
import math
import json
import warnings
import numpy as np
import pandas as pd
import csv
import copy

import glob
import os
import sys

from collections import OrderedDict
from IPython.display import display, HTML

from cryptorandom.cryptorandom import SHA256, int_from_hash
from cryptorandom.sample import sample_by_index

# make a local SHANGRLA checkout importable; this must run before the shangrla imports below
sys.path.append(os.path.realpath('./SHANGRLA'))

from shangrla.core.Audit import Audit, Assertion, Assorter, Contest, CVR, Stratum
from shangrla.core.NonnegMean import NonnegMean
from shangrla.formats.Dominion import Dominion

In [3]:
audit = Audit.from_dict({
         'seed':           12345678901234567890,
         'sim_seed':       314159265,
         'cvr_file':       './data/SF_CVR_Export_20240311150227/CvrExport_*.json',
         'manifest_file':  './data/SF_CVR_Export_20240311150227/ballotManifest-dummy.xlsx',
         'sample_file':    './data/SF_CVR_Export_20240311150227/sample.csv',
         'mvr_file':       './data/SF_CVR_Export_20240311150227/mvr.json',
         'log_file':       './data/SF_CVR_Export_20240311150227/log.json',
         'quantile':       0.8,
         'error_rate_1':   0.001,
         'error_rate_2':   0.0,
         'reps':           100,
         'strata':         {'stratum_1': {'max_cards':   443578, 
                                          'use_style':   True,
                                          'replacement': False,
                                          'audit_type':  Audit.AUDIT_TYPE.ONEAUDIT,
                                          'test':        NonnegMean.alpha_mart,
                                          'estimator':   NonnegMean.optimal_comparison,
                                          'test_kwargs': {}
                                         }
                           }
        })



In [4]:
# find upper bound on total cards across strata
audit.max_cards = np.sum([s.max_cards for s in audit.strata.values()])
audit.max_cards

443578

In [5]:
# contests to audit. Edit with details of your contest 
contest_dict = {
               '1':{
                   'name': 'PRESIDENT OF THE UNITED STATES-DEM',
                   'risk_limit':       0.05,
                   'cards': 152649,
                   'choice_function':  Contest.SOCIAL_CHOICE_FUNCTION.PLURALITY,
                   'n_winners': 1,
                   'candidates': ['1','2','3','4','5','6','7','8'],
                   'winner': ['5'],
                   'assertion_file':   None,
                   'audit_type':       Audit.AUDIT_TYPE.ONEAUDIT,
                   'test':             NonnegMean.alpha_mart,
                   'estim':            NonnegMean.optimal_comparison
                  },
               '2':{
                   'name':            'PRESIDENT OF THE UNITED STATES-REP',
                   'risk_limit':       0.05,
                   'cards':            15591,
                   'choice_function':  Contest.SOCIAL_CHOICE_FUNCTION.PLURALITY,
                   'n_winners':        1,
                   'candidates': ['10','11','12','13','14','15','16','17','9'],
                   'winner': ['11'],
                   'assertion_file':   None,
                   'audit_type':       Audit.AUDIT_TYPE.ONEAUDIT,
                   'test':             NonnegMean.alpha_mart,
                   'estim':            NonnegMean.optimal_comparison
                  }
               }

contests = Contest.from_dict_of_dicts(contest_dict)

In [6]:
# read the assertions for the IRV contest
for c in contests:
    if contests[c].choice_function == Contest.SOCIAL_CHOICE_FUNCTION.IRV:
        with open(contests[c].assertion_file, 'r') as f:
            contests[c].assertion_json = json.load(f)['audits'][0]['assertions']

In [7]:
# construct the dict of dicts of assertions for each contest
Assertion.make_all_assertions(contests)

True
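
For plurality contests, one assertion per (reported winner, reported loser) pair suffices: jointly, they say every reported winner beat every reported loser. The hypothetical helper below enumerates those pairs; with 8 candidates and one winner it yields the 7 assertions listed for contest 1 in the margins output later in this notebook.

```python
def plurality_assertion_pairs(candidates, winners):
    """(winner, loser) pairs: one 'winner beat loser' assertion per pair."""
    losers = [c for c in candidates if c not in winners]
    return [(w, l) for w in winners for l in losers]

pairs = plurality_assertion_pairs(['1', '2', '3', '4', '5', '6', '7', '8'], ['5'])
print(len(pairs))                             # 7
print([f"{w} v {l}" for w, l in pairs][:2])   # ['5 v 1', '5 v 2']
```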

In [8]:
audit.check_audit_parameters(contests)

## Read the ballot manifest

In [9]:
# special for Primary/Dominion manifest format
manifest = pd.read_excel(audit.manifest_file)

## Read the CVR data and create CVR objects

In [10]:
# for ballot-level comparison audits
cvr_list = []
for _fname in glob.glob(audit.cvr_file):
    cvr_list.extend(Dominion.read_cvrs(_fname, use_current=True, enforce_rules=True, include_groups=[1,2],
                                      pool_groups=[1]))
    
cvr_list = Dominion.raire_to_dominion(cvr_list)

In [11]:
# check that the CVR IDs are unique
unique_ids = len(set(c.id for c in cvr_list))
print(f'cvrs: {len(cvr_list)} unique IDs: {unique_ids}')
assert unique_ids == len(cvr_list), 'CVR IDs are not unique'

cvrs: 443578 unique IDs: 443578


In [12]:
# add lexicographic position in batch to CVRs from polling places, where actual position is unknown
_ = CVR.set_card_in_batch_lex(cvr_list)

In [13]:
# double-check whether the manifest accounts for every card
audit.max_cards, np.sum(manifest['Total Ballots'])

(443578, 443578)

In [14]:
# Check that there is a card in the manifest for every card (possibly) cast. If not, add phantoms.
manifest, manifest_cards, phantom_cards = Dominion.prep_manifest(manifest, audit.max_cards, len(cvr_list))
manifest

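| Unnamed: 0.1 | Unnamed: 0 | Tray # | Tabulator Number | Batch Number | Total Ballots | VBMCart.Cart number | cum_cards |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 267 | 68 | 1 | 68 |
| 1 | 1 | 2 | 919 | 0 | 98 | 2 | 166 |
| 2 | 2 | 3 | 6 | 287 | 4 | 3 | 170 |
| 3 | 3 | 4 | 2 | 86 | 83 | 4 | 253 |
| 4 | 4 | 5 | 12 | 204 | 91 | 5 | 344 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 6437 | 6437 | 6438 | 11 | 191 | 22 | 6438 | 443224 |
| 6438 | 6438 | 6439 | 1023 | 0 | 118 | 6439 | 443342 |
| 6439 | 6439 | 6440 | 13 | 202 | 75 | 6440 | 443417 |
| 6440 | 6440 | 6441 | 12 | 39 | 147 | 6441 | 443564 |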
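
The `cum_cards` column makes it easy to translate a card's position in the manifest into a batch and a position within that batch, which is what "express sample cards in terms of the manifest" amounts to. `locate_card` is a hypothetical sketch using 0-based indices; SHANGRLA's `Dominion.sample_from_cvrs` does its own bookkeeping.

```python
from bisect import bisect_right
from itertools import accumulate

def locate_card(card_index, batch_sizes):
    """Map a 0-based card index to (manifest row, 0-based position within that batch)."""
    cum = list(accumulate(batch_sizes))  # like the manifest's cum_cards column
    if not 0 <= card_index < cum[-1]:
        raise IndexError("card index outside the manifest (a phantom, if below max_cards)")
    row = bisect_right(cum, card_index)  # first batch whose cumulative count exceeds the index
    start = cum[row - 1] if row > 0 else 0
    return row, card_index - start

sizes = [68, 98, 4, 83]        # first four 'Total Ballots' values in the manifest above
print(locate_card(0, sizes))    # (0, 0)
print(locate_card(68, sizes))   # (1, 0)
print(locate_card(169, sizes))  # (2, 3)
```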


## Create CVRs for phantom cards

In [15]:
# For Comparison Audits (including ONEAudit) Only
#----------------------------

# If the sample draws a phantom card, these CVRs will be used in the comparison.
# phantom MVRs should be treated as zeros by the Assorter for every contest

# setting use_style = False to generate phantoms

cvr_list, phantom_vrs = CVR.make_phantoms(audit=audit, contests=contests, cvr_list=cvr_list, prefix='phantom-1-')
print(f"Created {phantom_vrs} phantom records")

Created 0 phantom records
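
The phantom bookkeeping from the "Prepare ~2EZ" step of the workflow, as a plain-Python sketch. `plan_phantoms` is a hypothetical name. In this notebook `max_cards` equals the number of cards in the manifest, so it would report zero phantoms, consistent with the output above.

```python
def plan_phantoms(max_cards, cards_in_manifest, total_cvrs, bounds, cvrs_with_contest):
    """~2EZ preparation: number of phantoms, and how many phantoms carry each contest.

    bounds: dict contest -> N_c (upper bound on cards containing c), or None if unknown.
    cvrs_with_contest: dict contest -> C_c (number of CVRs containing c).
    """
    n_phantoms = max_cards - cards_in_manifest
    if n_phantoms < 0:
        raise ValueError("the manifest lists more cards than max_cards")
    on_phantoms = {}
    for c, N_c in bounds.items():
        C_c = cvrs_with_contest[c]
        if N_c is None:
            N_c = max_cards - (total_cvrs - C_c)  # max_cards minus CVRs lacking c
        if C_c > N_c:
            raise ValueError(f"contest {c}: {C_c} CVRs exceed the bound N_c = {N_c}")
        if N_c - C_c > n_phantoms:
            raise ValueError(f"contest {c}: needs {N_c - C_c} phantoms, only {n_phantoms} exist")
        on_phantoms[c] = N_c - C_c  # contest c is on the first N_c - C_c phantoms
    return n_phantoms, on_phantoms

print(plan_phantoms(100, 95, 95, {"a": 60, "b": None}, {"a": 57, "b": 40}))
# (5, {'a': 3, 'b': 5})
```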


In [16]:
# find the mean of the assorters for the CVRs and check whether the assertions are met
min_margin = Assertion.set_all_margins_from_cvrs(audit=audit, contests=contests, cvr_list=cvr_list)

print(f'minimum assorter margin {min_margin}')
Contest.print_margins(contests)

minimum assorter margin 0.29034022535827586
margins in contest 1:
	assertion 5 v 4: 0.8080522680693274
	assertion 5 v 8: 0.8100366066034048
	assertion 5 v 1: 0.8077916385305233
	assertion 5 v 7: 0.7808342514601179
	assertion 5 v 3: 0.8068083543613984
	assertion 5 v 2: 0.8096041985049343
	assertion 5 v 6: 0.7784885856108801
margins in contest 2:
	assertion 11 v 14: 0.5938081172738212
	assertion 11 v 10: 0.5942457061590636
	assertion 11 v 13: 0.59451919921234
	assertion 11 v 16: 0.575976370200197
	assertion 11 v 12: 0.5804616562739306
	assertion 11 v 9: 0.29034022535827586
	assertion 11 v 17: 0.5850016409583196
	assertion 11 v 15: 0.594683295044306
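
For a pairwise plurality assorter, the margin 2*mean(A) - 1 simplifies to (votes for the winner - votes for the loser) / (number of cards the assorter is averaged over), so the smallest margin belongs to the strongest reported loser (here `11 v 9` in contest 2). A hypothetical sketch, assuming one vote per card and counting undervotes (`None`) as cards:

```python
from collections import Counter

def pairwise_margins(votes, winners, candidates):
    """Plurality assorter margins 2*mean(A) - 1 = (votes_w - votes_l) / n_cards per pair."""
    counts = Counter(votes)
    n_cards = len(votes)  # every card in the denominator, including undervotes
    return {
        f"{w} v {l}": (counts[w] - counts[l]) / n_cards
        for w in winners for l in candidates if l not in winners
    }

votes = ["w"] * 50 + ["l"] * 30 + ["x"] * 15 + [None] * 5
margins = pairwise_margins(votes, ["w"], ["w", "l", "x"])
print(margins)                        # {'w v l': 0.2, 'w v x': 0.35}
print(min(margins, key=margins.get))  # w v l
```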


In [17]:
audit.write_audit_parameters(contests=contests)

## Create ONEAudit Assorter means for each assertion for each tally pool

In [18]:
pools = set(c.tally_pool for c in cvr_list if c.pool)
len(pools)

501

In [19]:
# ensure every CVR in each `tally_pool` has the same value of `pool`
cvr_list = CVR.check_tally_pools(cvr_list)
len(cvr_list)

443578

In [20]:
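# find all contest IDs mentioned in the pooled CVRs
# group the CVRs by tally pool in one pass instead of rescanning cvr_list once per pool
cvrs_by_pool = {p: [] for p in pools}
for c in cvr_list:
    if c.tally_pool in cvrs_by_pool:
        cvrs_by_pool[c.tally_pool].append(c)
tally_pool = {p: CVR.pool_contests(cvrs_in_pool) for p, cvrs_in_pool in cvrs_by_pool.items()}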

In [21]:
# ensure every CVR in each `tally_pool` for which `pool == True` has every contest in the tally_pool
CVR.add_pool_contests(cvr_list, tally_pool)

True

In [23]:
# set pooled assorter means
for con in contests.values():
    for a in con.assertions.values():
        a.assorter.set_tally_pool_means(cvr_list=cvr_list, tally_pool=pools)

## Set up for sampling

## Find initial sample size

In [24]:
# find initial sample size 
sample_size = audit.find_sample_size(contests, cvrs=cvr_list)  
print(f'{sample_size=}\n{[(i, c.sample_size) for i, c in contests.items()]}')

sample_size=81
[('1', 7), ('2', 20)]
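
One way to turn `error_rate_1`, `error_rate_2`, `reps`, `quantile` and `sim_seed` into a sample size is to simulate the audit many times and take a quantile of the stopping times. The sketch below does that for a card-level comparison audit with a fixed-alternative ALPHA test. The function name and the `eta_fraction` choice are hypothetical, pooled ONEAudit cards (whose assorter values vary around the pool mean) are not modeled, and the result is not expected to reproduce the numbers above. Python's `random.Random` is a Mersenne Twister, the PRNG family `sim_seed` is described as seeding.

```python
import math
import random

def simulated_sample_size(margin, risk_limit, error_rate_1=0.001, error_rate_2=0.0,
                          reps=100, quantile=0.8, sim_seed=314159265,
                          eta_fraction=0.9, max_n=100_000):
    """Quantile of simulated stopping times for a card-level comparison audit.

    Each simulated card is a 2-vote overstatement with probability error_rate_2,
    a 1-vote overstatement with probability error_rate_1, and correct otherwise.
    """
    u = 2 / (2 - margin)                 # comparison assorter upper bound (original bound 1)
    mu = 0.5
    eta = mu + eta_fraction * (u - mu)   # hypothetical fixed alternative for ALPHA
    values = (0.0, 0.5 / (2 - margin), 1 / (2 - margin))  # 2-vote, 1-vote, no error
    rng = random.Random(sim_seed)
    stops = []
    for _ in range(reps):
        t, n = 1.0, 0
        while t < 1 / risk_limit and n < max_n:
            r = rng.random()
            if r < error_rate_2:
                x = values[0]
            elif r < error_rate_2 + error_rate_1:
                x = values[1]
            else:
                x = values[2]
            t *= (x * eta / mu + (u - x) * (u - eta) / (u - mu)) / u
            n += 1
        stops.append(n)
    stops.sort()
    return stops[min(reps - 1, max(0, math.ceil(quantile * reps) - 1))]

print(simulated_sample_size(0.29, 0.05))
```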


## Draw the first sample

In [25]:
# draw the initial sample using consistent sampling
prng = SHA256(audit.seed)
CVR.assign_sample_nums(cvr_list, prng)

True

In [26]:
sampled_cvr_indices = CVR.consistent_sampling(cvr_list=cvr_list, contests=contests)
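# indices are 0-based positions in cvr_list; phantom CVRs come after the manifest_cards real cards
n_sampled_phantoms = np.sum(np.asarray(sampled_cvr_indices) >= manifest_cards)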
print(f'The sample includes {n_sampled_phantoms} phantom cards.')

The sample includes 0 phantom cards.
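
Consistent sampling gives every card a fixed pseudo-random sample number and, for each contest, takes the cards containing that contest in increasing order of sample number until the contest's target is met; a card selected for one contest counts toward every contest it contains. Because the sample numbers never change, enlarging the targets later only adds cards. A minimal sketch, substituting Python's `random` for SHANGRLA's SHA256 sample numbers (an assumption for illustration; `consistent_sample` is a hypothetical name):

```python
import random

def consistent_sample(card_contests, sample_sizes, seed):
    """Indices of cards chosen by consistent sampling.

    card_contests: one set of contest IDs per card (from its CVR).
    sample_sizes: dict contest -> number of cards containing that contest to sample.
    """
    rng = random.Random(seed)  # stand-in for SHANGRLA's SHA256-based sample numbers
    sample_nums = [rng.random() for _ in card_contests]
    counts = {c: 0 for c in sample_sizes}
    chosen = []
    # visit cards in order of their fixed sample numbers
    for i in sorted(range(len(card_contests)), key=sample_nums.__getitem__):
        if all(counts[c] >= n for c, n in sample_sizes.items()):
            break  # every contest has enough cards
        on_card = [c for c in card_contests[i] if c in counts]
        if any(counts[c] < sample_sizes[c] for c in on_card):
            chosen.append(i)
            for c in on_card:
                counts[c] += 1  # a selected card counts toward every contest it contains
    return sorted(chosen)

cards = [{"1"}] * 10 + [{"1", "2"}] * 5 + [{"2"}] * 5
first = consistent_sample(cards, {"1": 3, "2": 2}, seed=2024)
second = consistent_sample(cards, {"1": 5, "2": 4}, seed=2024)
print(set(first) <= set(second))  # True: the larger sample contains the first
```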


In [27]:
len(cvr_list), manifest_cards, audit.max_cards

(443578, 443578, 443578)

In [28]:
# for comparison audit
cards_to_retrieve, sample_order, cvr_sample, mvr_phantoms_sample = \
    Dominion.sample_from_cvrs(cvr_list, manifest, sampled_cvr_indices)

# for polling audit
# cards_to_retrieve, sample_order, mvr_phantoms_sample = Dominion.sample_from_manifest(manifest, sample)

In [29]:
# write the sample
if os.path.exists(audit.sample_file):
    os.remove(audit.sample_file)
    
Dominion.write_cards_sampled(audit.sample_file, cards_to_retrieve, print_phantoms=False)

## Read the audited sample data

In [None]:
# for real data
with open(audit.mvr_file) as f:
    mvr_json = json.load(f)

mvr_sample = CVR.from_dict(mvr_json['ballots'])

In [None]:
# for simulated data, no errors
mvr_sample = cvr_sample.copy()

## Find measured risks for all assertions

In [None]:
CVR.prep_comparison_sample(mvr_sample, cvr_sample, sample_order)  # for comparison audit
# CVR.prep_polling_sample(mvr_sample, sample_order)  # for polling audit

In [None]:
p_max = Assertion.set_p_values(contests=contests, mvr_sample=mvr_sample, cvr_sample=cvr_sample)
print(f'maximum assertion p-value {p_max}')
done = audit.summarize_status(contests)

In [None]:
# Log the status of the audit 
audit.write_audit_parameters(contests)

# How many more cards should be audited?

Estimate how many more cards will need to be audited to confirm any remaining contests. The enlarged sample size is based on:

* cards already sampled
* the assumption that we will continue to see errors at the same rate observed in the sample
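
A rough deterministic estimate: plug the observed overstatement rates into the expected per-card log-growth of a fixed-alternative ALPHA martingale and divide log(1/risk limit) by it. This is a heuristic approximation, not SHANGRLA's `find_sample_size`; it estimates the total sample size, so subtract the cards already audited to get the increment. The function name and `eta_fraction` are hypothetical, and pooled ONEAudit cards are not modeled.

```python
import math

def approx_sample_size(margin, risk_limit, rate_1, rate_2, eta_fraction=0.9):
    """Approximate total sample size if overstatements continue at the given rates.

    Heuristic: n ~ log(1/risk_limit) / E[log factor], where factor is one step of an
    ALPHA martingale with a fixed alternative. Returns math.inf if the drift is not positive.
    """
    if not 0 < eta_fraction < 1:
        raise ValueError("eta_fraction must be strictly between 0 and 1")
    u = 2 / (2 - margin)                 # comparison assorter upper bound (original bound 1)
    mu = 0.5
    eta = mu + eta_fraction * (u - mu)   # hypothetical fixed alternative

    def log_factor(x):
        return math.log((x * eta / mu + (u - x) * (u - eta) / (u - mu)) / u)

    # comparison assorter values: correct card, 1-vote overstatement, 2-vote overstatement
    outcomes = [(1 - rate_1 - rate_2, 1 / (2 - margin)),
                (rate_1, 0.5 / (2 - margin)),
                (rate_2, 0.0)]
    drift = sum(p * log_factor(x) for p, x in outcomes if p > 0)
    if drift <= 0:
        return math.inf
    return math.ceil(math.log(1 / risk_limit) / drift)

print(approx_sample_size(0.29, 0.05, rate_1=0.01, rate_2=0.0))
```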

In [None]:
# Estimate sample size required to confirm the outcome, if errors continue
# at the same rate as already observed.

new_size = audit.find_sample_size(contests, cvrs=cvr_list, mvr_sample=mvr_sample, cvr_sample=cvr_sample)
print(f'{new_size=}\n{[(i, c.sample_size) for i, c in contests.items()]}')


In [None]:
# save the first sample
sampled_cvr_indices_old, cards_to_retrieve_old, sample_order_old, cvr_sample_old, mvr_phantoms_sample_old = \
    sampled_cvr_indices, cards_to_retrieve,     sample_order,     cvr_sample,     mvr_phantoms_sample

In [None]:
# draw the sample
sampled_cvr_indices = CVR.consistent_sampling(cvr_list=cvr_list, contests=contests)
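# indices are 0-based positions in cvr_list; phantom CVRs come after the manifest_cards real cards
n_sampled_phantoms = np.sum(np.asarray(sampled_cvr_indices) >= manifest_cards)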
print(f'The sample includes {n_sampled_phantoms} phantom cards.')

# for comparison audit
cards_to_retrieve, sample_order, cvr_sample, mvr_phantoms_sample = \
    Dominion.sample_from_cvrs(cvr_list, manifest, sampled_cvr_indices)

# for polling audit
# cards_to_retrieve, sample_order, mvr_phantoms_sample = Dominion.sample_from_manifest(manifest, sample)

# write the sample
# could write only the incremental sample using list(set(cards_to_retrieve) - set(cards_to_retrieve_old))
Dominion.write_cards_sampled(audit.sample_file, cards_to_retrieve, print_phantoms=False)

In [None]:
# for real data
with open(audit.mvr_file) as f:
    mvr_json = json.load(f)

mvr_sample = CVR.from_dict(mvr_json['ballots'])

# for simulated data, no errors: this overwrites the MVRs read above, so remove it for a real audit
mvr_sample = cvr_sample.copy()

## Find measured risks for all assertions

In [None]:
CVR.prep_comparison_sample(mvr_sample, cvr_sample, sample_order)  # for comparison audit
# CVR.prep_polling_sample(mvr_sample, sample_order)  # for polling audit

###### TEST
# For testing only: replace the MVRs with a copy of the CVRs, then permute the first
# n_errs MVRs among themselves to introduce discrepancies deliberately
mvr_sample = cvr_sample.copy()
n_errs = 5
errs = mvr_sample[0:n_errs].copy()
np.random.seed(12345678)
np.random.shuffle(errs)
mvr_sample[0:n_errs] = errs

In [None]:
p_max = Assertion.set_p_values(contests=contests, mvr_sample=mvr_sample, cvr_sample=cvr_sample)
print(f'maximum assertion p-value {p_max}')
done = audit.summarize_status(contests)