# ONEAudit Demo

ONEAudit is a way to use batch tally information efficiently in risk-limiting audits (RLAs). Refereed paper: https://link.springer.com/chapter/10.1007/978-3-031-48806-1_5 ; full version: https://arxiv.org/abs/2303.03335
It is vastly more efficient than batch-level comparison auditing.
For each SHANGRLA assertion, ONEAudit creates an "average" assorter value for each reporting batch.
Those averages are then used in a ballot-level comparison audit: for each ballot card the audit selects, the assorter applied to the manually ascertained vote (MVR, manual vote record) is compared to the average assorter value for the batch to which that card belongs.
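
A minimal sketch of the "average" assorter value for one batch, assuming a plurality assorter that assigns 1 to a vote for the reported winner, 0 to a vote for the reported loser, and 1/2 to anything else. The function name and the toy tally are hypothetical and are not part of `SHANGRLA`:

```python
def pool_mean_from_tally(tally, n_cards, winner, loser):
    """Average plurality-assorter value over one reporting batch.

    tally:   dict mapping candidate ID -> reported votes in the batch
    n_cards: number of cards in the batch that contain the contest
    """
    w = tally.get(winner, 0)
    l = tally.get(loser, 0)
    other = n_cards - w - l          # other candidates, undervotes, overvotes
    # winner votes count 1, loser votes count 0, everything else counts 1/2
    return (w + 0.5 * other) / n_cards

# hypothetical batch: 6 cards, 3 votes for '5', 1 for '3', 1 for '7', 1 undervote
batch_mean = pool_mean_from_tally({'5': 3, '3': 1, '7': 1}, n_cards=6,
                                  winner='5', loser='3')
print(batch_mean)
```

Every card in the batch is then treated as if its CVR had this assorter value for the assertion "5 beat 3".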

ONEAudit is useful in a variety of situations, including:

+ when batch tallies are available, whether or not the batches correspond to physical batches, in contrast to traditional batch-level comparison audits, which work only when reporting batches correspond to physical batches
+ when CVRs are available for batches of cards but there is no mapping from individual cards to individual CVRs (this is common for precinct-count optical scan systems)
+ when CVRs are available for some cards but not others (ONEAudit improves on SUITE in that situation, avoiding the need for stratification)

`SHANGRLA` infers batch information from Dominion CVRs in the format used by San Francisco as of August 2024, as follows:

+ when reading the CVRs using `shangrla.formats.Dominion.read_cvrs()`, set `pool_groups` to be the list of `CountingGroupID`s that should be audited using ONEAudit CVRs derived from batch tallies.
    - For SF, that is `CountingGroupID == 1`.
    - if `pool_groups` is nonempty, the CVRs with `CountingGroupID in pool_groups` are marked as `pool`
+ apply `shangrla.core.Audit.CVR.check_tally_pools()`: ensure every CVR in each `tally_pool` has the same value of `pool`
+ apply `shangrla.core.Audit.CVR.add_pool_contests()`: ensure every CVR in each `tally_pool` for which `pool == True` has every contest in the `tally_pool`
+ create the assertions for the contests under audit using functions in `shangrla.core.Audit.Assertion`
+ apply `shangrla.core.Audit.Assorter.set_tally_pool_means()` to set the assorter means
+ estimate the initial sample size
+ the audit then proceeds as a standard ballot-level comparison audit. The MVR for each inspected card is automatically compared to the ONEAudit CVR for the batch to which that card belongs.
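
The comparison in the last step can be sketched as follows, assuming the usual SHANGRLA overstatement-assorter form `(1 - overstatement/u) / (2 - v/u)`, where `u` is the assorter's upper bound and `v` is the assorter margin. For a card in a pooled batch, the "reported" value is the batch's average assorter value rather than the assorter applied to an individual CVR. The function name and numbers are illustrative only:

```python
def overstatement_assorter(reported, mvr_value, margin, upper_bound=1.0):
    """Value the comparison audit feeds to the risk-measuring function.

    reported:  assorter value implied by the (possibly pooled) CVR
    mvr_value: assorter applied to the manually ascertained vote
    margin:    assorter margin v for the assertion
    """
    overstatement = reported - mvr_value
    return (1 - overstatement / upper_bound) / (2 - margin / upper_bound)

# hypothetical pooled batch with mean 0.7 and an assertion with margin 0.4
for_winner = overstatement_assorter(0.7, 1.0, margin=0.4)   # MVR shows a vote for the winner
for_loser = overstatement_assorter(0.7, 0.0, margin=0.4)    # MVR shows a vote for the loser
print(for_winner, for_loser)
```

Because the pooled value rarely equals the assorter value of any single card, most sampled cards from a pooled batch show a nonzero "discrepancy" even when the tabulation is perfect; the test has to absorb that spread.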


## Workflow for a ONEAudit RLA

+ Read overall audit information (including the seed) and contest information, including an upper bound on the number of cards that contain each contest.
+ Read ballot manifest
+ Read CVRs.
    - Every CVR should have a corresponding manifest entry.
    - `pool` and `pool_group` should have been set for each CVR that contributes to a ONEAudit pooled CVR; for Dominion, these are inferred from group and batch information
+ Check that CVR IDs are unique.
+ Check that the number of CVRs that mention each contest does not exceed the upper bound for that contest. If it does for any contest, complain.
+ Dominion assigns pseudo-random values of the location in the batch to PCOS cards. Replace those with the _ranks_ of those values to establish a canonical ordering of the PCOS cards within each tally batch.
+ Prepare ~2EZ (phantoms to zombies):
    - find upper bound on total cards across strata
    - `N_phantoms = max_cards - cards_in_manifest`
    - If `N_phantoms < 0`: complain
    - Else: create `N_phantoms` phantom cards
    - For each contest `c`:
        + `N_c` is the input upper bound on the number of cards that contain `c`
        + If `N_c is None`: `N_c = max_cards - non_c_cvrs`, where `non_c_cvrs` is #CVRs that don't contain `c`
        + `C_c` is the number of CVRs that contain the contest
        + If `C_c > N_c`: complain
        + Else if `N_c - C_c > N_phantoms`: complain
        + Else:
            - Consider contest `c` to be on the first `N_c - C_c` phantom CVRs
            - Consider contest `c` to be on the first `N_c - C_c` phantom ballots
        + Create CVRs for phantom cards
+ Make assertions:
    - Read RAIRE assertions for IRV contests
    - Create Assertions for every Contest. This involves also creating an Assorter for every Assertion, and a `NonnegMean` test for every Assertion.
+ Modify CVRs for ONEAudit for groups of CVRs that should be pooled.
     - Within each pool group, check whether every card has every contest mentioned on any card in the group.
     - If a contest is missing from a card in the group, add the contest--without votes.
     - Update the number of cards that contain each contest to include any cards whose CVR did not originally list the contest but to which it was added for ONEAudit.
+ Find ONEAudit Assorter means for each assertion for each tally pool.
+ Calculate assorter margins for all assorters:
    - If `not use_style`: apply the Assorter to all cards and CVRs, including phantoms and ONEAudit CVRs
    - Else: apply the assorter only to cards/cvrs reported to contain the contest, including phantoms and ONEAudit CVRs that contain the contest
+ Set `assertion.test.u` to the appropriate value for each assertion:
      `2/(2-assorter.margin/assorter.upper_bound)` for ballot-level comparison audits, including ONEAudit
+ Estimate starting sample size for the specified sampling design, for chosen risk function, use of card-style information, etc.:
    - User-specified criterion, controlled by parameters. Examples:
        + percentile of sample size for completion, taking into account the difference between the assorter mean for a pool group and the assorter applied to individual CVRs in that group, on the assumption that _additional_ errors are not more frequent than specified
    - If `not use_style`: base estimate on sampling from the entire manifest, i.e., smallest assorter margin
    - Else: use consistent sampling:
        + Augment each CVR (including phantoms) with a probability of selection, `p`, initially 0
        + For each contest `c`:
            - Find sample size `n_c` that meets the criterion 
            - For each non-phantom CVR that contains the contest, set `p = max(p, n_c/N_c)` 
        + Estimated sample size is the sum of `p` over all non-phantom CVRs
+ Draw the random sample:
    - Use the specified design, including using consistent sampling for style information
    - Double-check that this particular sample would be adequate for the audit to stop on the assumption that the CVRs are accurate, then reset the p-values
    - Express sample cards in terms of the manifest
    - Export
+ Read manual interpretations of the cards (MVRs)
+ Calculate attained risk for each assorter
    - Use ~2EZ to deal with phantom CVRs or cards; the treatment depends on whether `use_style == True`
    - If a sampled card cannot be found/retrieved, use the phantom-to-zombie transformation for it
    - Use the pooled assorter means for cards in pooled batches
+ Report
+ Estimate incremental sample size if any assorter nulls have not been rejected
+ Draw incremental sample; etc
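
The consistent-sampling size estimate in the workflow above (set `p = max(p, n_c/N_c)` for each card, then sum `p`) can be sketched as below. This is a toy illustration under the assumption that each CVR is represented by the set of contest IDs it contains; it is not the `SHANGRLA` implementation:

```python
def estimated_sample_size(cards, n_needed, n_upper):
    """Expected number of cards drawn under consistent sampling.

    cards:    iterable of sets of contest IDs, one per non-phantom CVR
    n_needed: dict contest ID -> sample size n_c that meets the criterion
    n_upper:  dict contest ID -> upper bound N_c on cards containing the contest
    """
    total = 0.0
    for contests_on_card in cards:
        p = 0.0                                    # selection probability, initially 0
        for c in contests_on_card:
            p = max(p, n_needed[c] / n_upper[c])   # largest sampling fraction wins
        total += p
    return total

# hypothetical election: contest '1' on 10 cards, contest '2' on 4 of them
toy_cards = [{'1'}] * 6 + [{'1', '2'}] * 4
size = estimated_sample_size(toy_cards,
                             n_needed={'1': 2, '2': 2},
                             n_upper={'1': 10, '2': 4})
print(size)
```

Cards that contain several audited contests are counted once, at the largest sampling fraction among those contests, which is why the estimate can be smaller than the sum of the per-contest sample sizes.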

# Audit parameters.

The overall audit involves information that is the same across contests, encapsulated in
a dict called `audit`:

* `seed`: the numeric seed for the pseudo-random number generator used to draw the sample (for SHA256 PRNG)
* `sim_seed`: seed for simulations to estimate sample sizes (for Mersenne Twister PRNG)
* `quantile`: quantile of the sample size to use for setting initial sample size
* `cvr_file`: filename for CVRs (input)
* `manifest_file`: filename for ballot manifest (input)
* `use_style`: Boolean. If True, use card style information (inferred from CVRs) to target samples. If False, sample from all cards, regardless of the contest. This should come from external touchstones such as physical inventories or voter participation records, not from the voting system. In particular, it is dangerous to assume that every card containing a contest has a corresponding CVR that contains the contest.
* `sample_file`: filename for sampled card identifiers (output)
* `mvr_file`: filename for manually ascertained votes from sampled cards (input)
* `log_file`: filename for audit log (output)
* `error_rate_1`: expected rate of 1-vote overstatements. Recommended value $\ge$ 0.001 if there are hand-marked ballots. Larger values increase the initial sample size, but make it more likely that the audit will conclude after a single round even if the audit finds errors
* `error_rate_2`: expected rate of 2-vote overstatements. 2-vote overstatements should be extremely rare. Recommended value: 0. Larger values increase the initial sample size, but make it more likely that the audit will conclude after a single round even if the audit finds errors
* `reps`: number of replications to use to estimate sample sizes. If `reps is None`, uses a deterministic method
* `quantile`: quantile of sample size to estimate. Not used if `reps is None`
* `strata`: a dict describing the strata. Keys are stratum identifiers; values are dicts containing:
    + `max_cards`: an upper bound on the number of pieces of paper cast in the contest. This should be derived independently of the voting system. A ballot consists of one or more cards.
    + `replacement`: whether to sample from this stratum with replacement. 
    + `use_style`: True if the sample in that stratum uses card-style information.
    + `audit_type` in [Contest.POLLING, Contest.CARD_COMPARISON, Contest.ONEAUDIT, Contest.BATCH_COMPARISON]. BATCH_COMPARISON isn't currently implemented. 

----

* `contests`: a dict of contest-specific data 
    + the keys are unique contest identifiers for contests under audit
    + the values are Contest objects with attributes:
        - `risk_limit`: the risk limit for the audit of this contest
        - `cards`: an upper bound on the number of cast cards that contain the contest
        - `choice_function`: `Audit.SOCIAL_CHOICE_FUNCTION.PLURALITY`, 
          `Audit.SOCIAL_CHOICE_FUNCTION.SUPERMAJORITY`, or `Audit.SOCIAL_CHOICE_FUNCTION.IRV`
        - `n_winners`: number of winners for majority contests. (Multi-winner IRV, aka STV, is not supported)
        - `share_to_win`: for super-majority contests, the fraction of valid votes required to win, e.g., 2/3.
           (share_to_win*n_winners must be less than 100%)
        - `candidates`: list of names or identifiers of candidates
        - `reported_winners` : list of identifier(s) of candidate(s) reported to have won.
           Length should equal `n_winners`.
        - `assertion_file`: filename for a set of json descriptors of Assertions (see technical documentation) that collectively imply the reported outcome of the contest is correct. Required for IRV; ignored for other social choice functions
        - `audit_type`: the audit strategy. Currently `Audit.AUDIT_TYPE.POLLING` (ballot-polling),
           `Audit.AUDIT_TYPE.CARD_COMPARISON` (ballot-level comparison audits), and `Audit.AUDIT_TYPE.ONEAUDIT`
            are implemented. HYBRID and STRATIFIED are planned.
    + `test`: the name of the function to be used to measure risk. Options are `kaplan_markov`, `kaplan_wald`, `kaplan_kolmogorov`, `wald_sprt`, `kaplan_mart`, `alpha_mart`, `betting_mart`. Not all risk functions work with every social choice function or every sampling method. Default is `NonnegMean.alpha_mart`
    + `estim`: the estimator to be used by the `alpha_mart` risk function. Options:  
        - `fixed_alternative_mean` (default)
        - `shrink_trunc`
        - `optimal_comparison`
    + `bet`: the method to select the bet for the `betting_mart` risk function. Options:
        - `fixed_bet` (default)
        - `agrapa`
    + `test_kwargs`: keyword arguments for the risk function
        - `use_style`: True to use style information from CVRs to target the sample. False for polling audits or for sampling from all ballots for every contest.
        - other keys and values are added by the software, including `cvrs`, the number of CVRs that contain the contest, and `p`, the sampling fraction expected to be required to confirm the contest

In [1]:
# if shangrla has not already been installed, install it then restart the kernel
# !pip install -e "../"

In [2]:
import math
import json
import warnings
import numpy as np
import pandas as pd
import csv
import copy

import glob
import os, sys

from collections import OrderedDict
from IPython.display import display, HTML

from cryptorandom.cryptorandom import SHA256, int_from_hash
from cryptorandom.sample import sample_by_index

from shangrla.core.Audit import Audit, Assertion, Assorter, Contest, CVR, Stratum
from shangrla.core.NonnegMean import NonnegMean
from shangrla.formats.Dominion import Dominion

sys.path.append(os.path.realpath('./SHANGRLA'))

In [3]:
audit = Audit.from_dict({
         'seed':           12345678901234567890,
         'sim_seed':       314159265,
         'cvr_file':       './data/SF_CVR_Export_20240311150227/CvrExport_*.json',
         'manifest_file':  './data/SF_CVR_Export_20240311150227/ballotManifest-dummy.xlsx',
         'sample_file':    './data/SF_CVR_Export_20240311150227/sample.csv',
         'mvr_file':       './data/SF_CVR_Export_20240311150227/mvr.json',
         'log_file':       './data/SF_CVR_Export_20240311150227/log.json',
         'quantile':       0.8,
         'error_rate_1':   0.001,
         'error_rate_2':   0.0,
         'reps':           200,
         'strata':         {'stratum_1': {'max_cards':   443578, 
                                          'use_style':   True,
                                          'replacement': False
                                         }
                           }
        })

In [4]:
# contests to audit. Edit with details of your contest 
contest_dict = {
               '1':{
                   'name': 'PRESIDENT OF THE UNITED STATES-DEM',
                   'risk_limit':       0.05,
                   'cards':            168822,
                   'choice_function':  Contest.SOCIAL_CHOICE_FUNCTION.PLURALITY,
                   'n_winners':        1,
                   'candidates':       ['1','2','3','4','5','6','7','8'],
                   'winner':           ['5'],
                   'assertion_file':   None,
                   'audit_type':       Audit.AUDIT_TYPE.ONEAUDIT,
                   'test':             NonnegMean.alpha_mart,
                   'estim':            NonnegMean.shrink_trunc,
                   'test_kwargs':      {'d': 100, 'f': 0}
                  },
               '2':{
                   'name':            'PRESIDENT OF THE UNITED STATES-REP',
                   'risk_limit':       0.05,
                   'cards':            18282,
                   'choice_function':  Contest.SOCIAL_CHOICE_FUNCTION.PLURALITY,
                   'n_winners':        1,
                   'candidates':       ['10','11','12','13','14','15','16','17','9'],
                   'winner':           ['11'],
                   'assertion_file':   None,
                   'audit_type':       Audit.AUDIT_TYPE.ONEAUDIT,
                   'test':             NonnegMean.alpha_mart,
                   'estim':            NonnegMean.shrink_trunc,
                   'test_kwargs':      {'d': 100, 'f': 0}
                  }
               }

contests = Contest.from_dict_of_dicts(contest_dict)

## Read the ballot manifest

In [5]:
# special for Primary/Dominion manifest format
manifest = pd.read_excel(audit.manifest_file)

## Read the CVR data and create CVR objects

In [6]:
# for card comparison audits and ONEAudit
cvr_list = []
for _fname in glob.glob(audit.cvr_file):
    cvr_list.extend(Dominion.read_cvrs(_fname, use_current=True, enforce_rules=True, include_groups=[1,2],
                                      pool_groups=[1]))


In [7]:
# check that the CVR IDs are unique
unique_ids = len(set(c.id for c in cvr_list))
print(f'cvrs: {len(cvr_list)} unique IDs: {unique_ids}')
assert unique_ids == len(cvr_list), 'CVR IDs are not unique'

cvrs: 443578 unique IDs: 443578
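
If the uniqueness check fails, it helps to know which IDs are repeated. A small sketch using only the standard library; the ID strings are made up for illustration:

```python
from collections import Counter

def duplicated_ids(ids):
    """Return the sorted list of IDs that occur more than once."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)

# hypothetical CVR IDs
example_ids = ['1-267-1', '1-267-2', '1-267-1', '2-86-5']
dups = duplicated_ids(example_ids)
print(dups)
```

With the real data this could be called as `duplicated_ids(c.id for c in cvr_list)`.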


In [8]:
# check that the number of CVRs that mention each contest does not exceed the upper bound on the number of
# cards that contain the contest
Contest.check_cards(contests, cvr_list)

In [9]:
# add lexicographic position in batch to CVRs from polling places, where actual position is unknown
_ = CVR.set_card_in_batch_lex(cvr_list)
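
The idea of replacing reported positions with their ranks within each batch can be sketched as follows. This toy version sorts positions numerically and is only an illustration; `set_card_in_batch_lex` may order the values differently (for example lexicographically, as its name suggests):

```python
from collections import defaultdict

def rank_within_batch(cards):
    """Replace each card's reported position with its 1-based rank in its batch.

    cards: list of (batch_id, reported_position) pairs
    Returns a list of ranks in the same order as the input.
    """
    by_batch = defaultdict(list)
    for i, (batch, position) in enumerate(cards):
        by_batch[batch].append((position, i))
    ranks = [None] * len(cards)
    for members in by_batch.values():
        # sort by reported position; the input index breaks ties deterministically
        for rank, (_, i) in enumerate(sorted(members), start=1):
            ranks[i] = rank
    return ranks

# hypothetical cards: three in batch 'A', one in batch 'B'
ranks = rank_within_batch([('A', 9071), ('A', 112), ('B', 55), ('A', 4400)])
print(ranks)
```

The ranks give every card in a tally batch a well-defined position (first, second, ...) even though the reported positions are arbitrary.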

In [10]:
# find upper bound on total cards across strata
audit.max_cards = np.sum([s.max_cards for s in audit.strata.values()])
audit.max_cards

443578

In [11]:
# check whether the manifest accounts for every card
audit.max_cards, np.sum(manifest['Total Ballots'])

(443578, 443578)

In [12]:
# Check that there is a card in the manifest for every card (possibly) cast. If not, add phantoms.
manifest, manifest_cards, phantom_cards = Dominion.prep_manifest(manifest, audit.max_cards, len(cvr_list))
manifest

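| Unnamed: 0.1 | Unnamed: 0 | Tray # | Tabulator Number | Batch Number | Total Ballots | VBMCart.Cart number | cum_cards |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 267 | 68 | 1 | 68 |
| 1 | 1 | 2 | 919 | 0 | 98 | 2 | 166 |
| 2 | 2 | 3 | 6 | 287 | 4 | 3 | 170 |
| 3 | 3 | 4 | 2 | 86 | 83 | 4 | 253 |
| 4 | 4 | 5 | 12 | 204 | 91 | 5 | 344 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 6437 | 6437 | 6438 | 11 | 191 | 22 | 6438 | 443224 |
| 6438 | 6438 | 6439 | 1023 | 0 | 118 | 6439 | 443342 |
| 6439 | 6439 | 6440 | 13 | 202 | 75 | 6440 | 443417 |
| 6440 | 6440 | 6441 | 12 | 39 | 147 | 6441 | 443564 |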


## Create CVRs for phantom cards

In [13]:
# For Comparison Audits (including ONEAudit)
#----------------------------

# If the sample draws a phantom card, these CVRs will be used in the comparison.
# phantom MVRs should be treated as zeros by the Assorter for every contest

cvr_list, phantom_vrs = CVR.make_phantoms(audit=audit, contests=contests, cvr_list=cvr_list, prefix='phantom-1-')
print(f"Created {phantom_vrs} phantom records")

Created 0 phantom records


In [14]:
len(cvr_list), manifest_cards, audit.max_cards

(443578, 443578, 443578)
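
The phantom bookkeeping from the workflow can be sketched as two small checks. The function names are hypothetical, and the per-contest numbers in the example are invented; only the totals come from this notebook:

```python
def phantoms_needed(max_cards, cards_in_manifest):
    """Number of phantom cards required so the manifest covers max_cards."""
    n_phantoms = max_cards - cards_in_manifest
    if n_phantoms < 0:
        raise ValueError('manifest lists more cards than the upper bound')
    return n_phantoms

def phantoms_for_contest(n_c, c_c, n_phantoms):
    """Number of phantom cards that must be treated as containing contest c.

    n_c: upper bound on cards containing the contest
    c_c: number of CVRs that contain the contest
    """
    if c_c > n_c:
        raise ValueError('more CVRs contain the contest than the upper bound allows')
    if n_c - c_c > n_phantoms:
        raise ValueError('not enough phantom cards to account for the contest')
    return n_c - c_c

# totals from this notebook: the manifest accounts for every card
n_total = phantoms_needed(443578, 443578)
# invented contest: bound of 100 cards, 97 CVRs, 5 phantoms available
n_contest = phantoms_for_contest(100, 97, 5)
print(n_total, n_contest)
```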

# Set up assertions

In [15]:
# read the assertions for the IRV contest
for c in contests:
    if contests[c].choice_function == Contest.SOCIAL_CHOICE_FUNCTION.IRV:
        with open(contests[c].assertion_file, 'r') as f:
            contests[c].assertion_json = json.load(f)['audits'][0]['assertions']

In [16]:
# construct the dict of dicts of assertions for each contest
Assertion.make_all_assertions(contests)

True

In [17]:
audit.check_audit_parameters(contests)

# Add contests to pooled CVRs as needed

In [18]:
# ensure every CVR in each tally_pool has the same value of `pool`
cvr_list = CVR.check_tally_pools(cvr_list)
len(cvr_list)

443578

In [19]:
# find the set of tally_pools for which pool==True
pools = set(c.tally_pool for c in cvr_list if c.pool)
len(pools)

501

In [20]:
# make dict of all contest IDs mentioned in each tally_pool of CVRs for which pool==True
tally_pools = CVR.pool_contests(cvr_list)

# ensure every CVR in each tally_pool for which pool==True has every contest in that tally_pool
CVR.add_pool_contests(cvr_list, tally_pools)

True
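
What the two calls above accomplish can be illustrated with plain dicts. This is a sketch under the assumption that each card records its `tally_pool`, a `pool` flag, and a dict of votes keyed by contest ID; the helper names are hypothetical and differ from the `SHANGRLA` methods:

```python
from collections import defaultdict

def union_of_contests(cards):
    """Map each pooled tally_pool to the set of contests on any of its cards."""
    contests = defaultdict(set)
    for card in cards:
        if card['pool']:
            contests[card['tally_pool']].update(card['votes'])
    return dict(contests)

def pad_pooled_cards(cards, contests_by_pool):
    """Add every missing contest, with no votes, to each pooled card."""
    added = 0
    for card in cards:
        if card['pool']:
            for contest in contests_by_pool[card['tally_pool']]:
                if contest not in card['votes']:
                    card['votes'][contest] = {}   # contest present, no votes
                    added += 1
    return added

toy = [
    {'tally_pool': 'A', 'pool': True,  'votes': {'1': {'5': 1}}},
    {'tally_pool': 'A', 'pool': True,  'votes': {'2': {'11': 1}}},
    {'tally_pool': 'B', 'pool': False, 'votes': {'1': {'3': 1}}},
]
n_added = pad_pooled_cards(toy, union_of_contests(toy))
print(n_added, [sorted(c['votes']) for c in toy])
```

After padding, every card in a pooled batch contains the same contests, so the batch average is defined for each of them. That is also why the per-contest card counts need to be updated afterwards.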

In [21]:
# update no. cards that contain each contest to account for adding contests to some CVRs
Contest.check_cards(contests, cvr_list, force=True)



# Find ONEAudit Assorter means for each assertion for each tally pool

In [22]:
# set pooled assorter means
for con in contests.values():
    for a in con.assertions.values():
        a.assorter.set_tally_pool_means(cvr_list=cvr_list, tally_pools=tally_pools)

In [23]:
# find the mean of the assorters for the CVRs and check whether the assertions are met
min_margin = Assertion.set_all_margins_from_cvrs(audit=audit, contests=contests, cvr_list=cvr_list)

print(f'minimum assorter margin {min_margin}')
Contest.print_margins(contests)

minimum assorter margin 0.09245775997213035
margins in contest 1:
	assertion 5 v 3: 0.6961519393632736
	assertion 5 v 4: 0.6972252462217043
	assertion 5 v 6: 0.6717163198863316
	assertion 5 v 2: 0.6985643243022228
	assertion 5 v 7: 0.6737402699622299
	assertion 5 v 8: 0.6989374262101535
	assertion 5 v 1: 0.6970003628799379
margins in contest 2:
	assertion 11 v 12: 0.18484584567148588
	assertion 11 v 9: 0.09245775997213035
	assertion 11 v 14: 0.18909597631074715
	assertion 11 v 10: 0.1892353248562968
	assertion 11 v 13: 0.1893224176972652
	assertion 11 v 15: 0.18937467340184644
	assertion 11 v 17: 0.18629158683156244
	assertion 11 v 16: 0.1834175230796029
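
For reference, an assorter margin is twice the assorter mean minus 1; for a plurality assertion that works out to the winner's lead over the loser divided by the number of cards. The sketch below uses invented vote counts, and the second function applies the `2/(2-assorter.margin/assorter.upper_bound)` formula from the workflow; the function names are illustrative:

```python
def assorter_margin(assorter_mean):
    """Assorter margin v = 2 * (mean assorter value) - 1."""
    return 2 * assorter_mean - 1

def comparison_upper_bound(margin, assorter_upper_bound=1.0):
    """Upper bound used by the test in a ballot-level comparison audit."""
    return 2 / (2 - margin / assorter_upper_bound)

# invented tallies: 60 votes for the winner, 30 for the loser, 100 cards
winner_votes, loser_votes, n_cards = 60, 30, 100
mean_value = (winner_votes + 0.5 * (n_cards - winner_votes - loser_votes)) / n_cards
margin = assorter_margin(mean_value)
u = comparison_upper_bound(margin)
print(margin, u)
```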


In [24]:
audit.write_audit_parameters(contests=contests)

## Set up for sampling

## Find initial sample size

In [25]:
# find initial sample size 
sample_size = audit.find_sample_size(contests, cvrs=cvr_list)  
print(f'{sample_size=}\n{[(i, c.sample_size) for i, c in contests.items()]}')

sample_size=134
[('1', 9), ('2', 126)]


In [26]:
# how many cards have nonzero sampling probability?
sum([c.p > 0 for c in cvr_list])

211248

In [27]:
# how many cards contain every contest?
sum([all(c.has_contest(con) for con in contests) for c in cvr_list])

41819

## Draw the first sample

In [28]:
# draw the initial sample using consistent sampling
prng = SHA256(audit.seed)
CVR.assign_sample_nums(cvr_list, prng)

True

In [29]:
sampled_cvr_indices = CVR.consistent_sampling(cvr_list=cvr_list, contests=contests)
n_sampled_phantoms = np.sum(sampled_cvr_indices > manifest_cards)
print(f'Initial sample size {len(sampled_cvr_indices)} cards of which {n_sampled_phantoms} are phantom cards.')

Initial sample size 131 cards of which 0 are phantom cards.
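
The idea behind consistent sampling can be sketched as follows: every card gets one pseudo-random number, and for each contest the audit takes the cards containing that contest with the smallest numbers. The hashing scheme below is a stand-in chosen for illustration; it is not the scheme `cryptorandom` or `SHANGRLA` uses, and the function names are hypothetical:

```python
import hashlib

def sample_number(seed, card_id):
    """Deterministic pseudo-random number for a card (illustrative scheme)."""
    digest = hashlib.sha256(f'{seed},{card_id}'.encode()).hexdigest()
    return int(digest, 16)

def consistent_sample(cards, sample_sizes, seed):
    """Select cards so that each contest gets at least its required sample.

    cards:        dict card ID -> set of contest IDs on the card
    sample_sizes: dict contest ID -> number of cards needed for that contest
    """
    numbers = {cid: sample_number(seed, cid) for cid in cards}
    chosen = set()
    for contest, n in sample_sizes.items():
        eligible = sorted((cid for cid, cs in cards.items() if contest in cs),
                          key=numbers.get)
        chosen.update(eligible[:n])        # smallest numbers among eligible cards
    return sorted(chosen, key=numbers.get)

toy_cards = {'a': {'1'}, 'b': {'1', '2'}, 'c': {'2'}, 'd': {'1'}, 'e': {'1', '2'}}
toy_sizes = {'1': 2, '2': 2}
toy_sample = consistent_sample(toy_cards, toy_sizes, seed=12345)
print(toy_sample)
```

Because a single card can serve several contests, the number of cards drawn can be smaller than the sum of the per-contest sample sizes.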


In [30]:
# separate mvr phantoms; don't ask auditors to retrieve phantoms; merge retrieved MVR sample with sampled MVR phantoms

In [31]:
# for comparison audit
cards_to_retrieve, sample_order, cvr_sample, mvr_phantoms_sample = \
    Dominion.sample_from_cvrs(cvr_list, manifest, sampled_cvr_indices)

# for polling audit
# cards_to_retrieve, sample_order, mvr_phantoms_sample = Dominion.sample_from_manifest(manifest, sample)

In [32]:
# Sanity check to ensure that **this** sample will be adequate if the CVRs are accurate
mvr_sample = cvr_sample.copy()
p_max = Assertion.set_p_values(contests=contests, mvr_sample=mvr_sample, cvr_sample=cvr_sample)
print(f'maximum assertion p-value {p_max}')
done = audit.summarize_status(contests)

maximum assertion p-value 0.03977704308695613

p-values for assertions in contest 1
	5 v 3: 0.03472815874232628
	5 v 4: 0.03447558941953453
	5 v 6: 0.03977704308695613
	5 v 2: 0.033461569289599855
	5 v 7: 0.03970109536099454
	5 v 8: 0.03407604082765047
	5 v 1: 0.03432985564832535

contest 1 AUDIT COMPLETE at risk limit 0.05. Measured risk 0.03977704308695613

p-values for assertions in contest 2
	11 v 12: 0.00013345462914903094
	11 v 9: 0.029053377544232078
	11 v 14: 0.00010874368764037137
	11 v 10: 0.00010797534999911844
	11 v 13: 0.00010702437183118834
	11 v 15: 0.00010694179349176133
	11 v 17: 0.0001221557938486509
	11 v 16: 0.0001429167299686427

contest 2 AUDIT COMPLETE at risk limit 0.05. Measured risk 0.029053377544232078


In [33]:
# reset things before collecting data
mvr_sample = None
Assertion.reset_p_values(contests=contests)

True

In [34]:
# write the sample
if os.path.exists(audit.sample_file):
    os.remove(audit.sample_file)
    
Dominion.write_cards_sampled(audit.sample_file, cards_to_retrieve, print_phantoms=False)

# Read the audited sample data.

## Any ballot that cannot be retrieved should be marked as a "zombie" (treated in the least favorable way for every contest it might contain).

# for real data
# with open(audit.mvr_file) as f:
#    mvr_json = json.load(f)

# mvr_sample = CVR.from_dict(mvr_json['ballots'])

In [35]:
# Test: SIMULATED DATA WITH NO ERRORS
mvr_sample = cvr_sample.copy()

## Find measured risks for all assertions

In [36]:
CVR.prep_comparison_sample(mvr_sample, cvr_sample, sample_order)  # for comparison audit
# CVR.prep_polling_sample(mvr_sample, sample_order)  # for polling audit

In [37]:
p_max = Assertion.set_p_values(contests=contests, mvr_sample=mvr_sample, cvr_sample=cvr_sample)
print(f'maximum assertion p-value {p_max}')
done = audit.summarize_status(contests)

maximum assertion p-value 0.03977704308695613

p-values for assertions in contest 1
	5 v 3: 0.03472815874232628
	5 v 4: 0.03447558941953453
	5 v 6: 0.03977704308695613
	5 v 2: 0.033461569289599855
	5 v 7: 0.03970109536099454
	5 v 8: 0.03407604082765047
	5 v 1: 0.03432985564832535

contest 1 AUDIT COMPLETE at risk limit 0.05. Measured risk 0.03977704308695613

p-values for assertions in contest 2
	11 v 12: 0.00013345462914903094
	11 v 9: 0.029053377544232078
	11 v 14: 0.00010874368764037137
	11 v 10: 0.00010797534999911844
	11 v 13: 0.00010702437183118834
	11 v 15: 0.00010694179349176133
	11 v 17: 0.0001221557938486509
	11 v 16: 0.0001429167299686427

contest 2 AUDIT COMPLETE at risk limit 0.05. Measured risk 0.029053377544232078


In [38]:
# Log the status of the audit 
audit.write_audit_parameters(contests)

# How many more cards should be audited?

Estimate how many more cards will need to be audited to confirm any remaining contests. The enlarged sample size is based on:

* cards already sampled
* the assumption that we will continue to see errors at the same rate observed in the sample

In [39]:
# Estimate sample size required to confirm the outcome, if errors continue
# at the same rate as already observed.

new_size = audit.find_sample_size(contests, cvrs=cvr_list, mvr_sample=mvr_sample, cvr_sample=cvr_sample)
print(f'{new_size=}\n{[(i, c.sample_size) for i, c in contests.items()]}')


new_size=131
[('1', 0), ('2', 0)]


In [40]:
# save the first sample
sampled_cvr_indices_old, cards_to_retrieve_old, sample_order_old, cvr_sample_old, mvr_phantoms_sample_old = \
    sampled_cvr_indices, cards_to_retrieve,     sample_order,     cvr_sample,     mvr_phantoms_sample

In [41]:
# draw the sample
sampled_cvr_indices = CVR.consistent_sampling(cvr_list=cvr_list, contests=contests)
n_sampled_phantoms = np.sum(sampled_cvr_indices > manifest_cards)
print(f'The sample includes {n_sampled_phantoms} phantom cards.')

# for comparison audit
cards_to_retrieve, sample_order, cvr_sample, mvr_phantoms_sample = \
    Dominion.sample_from_cvrs(cvr_list, manifest, sampled_cvr_indices)

# for polling audit
# cards_to_retrieve, sample_order, mvr_phantoms_sample = Dominion.sample_from_manifest(manifest, sample)

# write the sample
# could write only the incremental sample using list(set(cards_to_retrieve) - set(cards_to_retrieve_old))
Dominion.write_cards_sampled(audit.sample_file, cards_to_retrieve, print_phantoms=False)

The sample includes 0 phantom cards.


In [43]:
# for real data
#with open(audit.mvr_file) as f:
#    mvr_json = json.load(f)
#mvr_sample = CVR.from_dict(mvr_json['ballots'])

# for simulated data, no errors
mvr_sample = cvr_sample.copy()

## Find measured risks for all assertions

In [46]:
CVR.prep_comparison_sample(mvr_sample, cvr_sample, sample_order)  # for comparison audit
# CVR.prep_polling_sample(mvr_sample, sample_order)  # for polling audit

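In [47]:
# Cell 39 reports that no contest needs more cards, so the incremental sample drawn above is most likely empty.
# Computing p-values from an empty sample fails with
# `IndexError: index -1 is out of bounds for axis 0 with size 0`,
# so only update the p-values if new cards were actually drawn.
if len(cvr_sample) > 0:
    p_max = Assertion.set_p_values(contests=contests, mvr_sample=mvr_sample, cvr_sample=cvr_sample)
    print(f'maximum assertion p-value {p_max}')
    done = audit.summarize_status(contests)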