# Explore optimal round schedules

Running a ballot-polling audit in distinct rounds has several practical benefits.
For example, when the ballots selected for each round are sorted by location, they can be retrieved more efficiently.

But some overall efficiency is lost, because the audit can stop only at the end of a round rather than as soon as its stopping condition is met.

This notebook calculates the average number of ballots audited when the audit is done in rounds, and explores the effects of different round schedules.
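
As a rough sketch of that calculation, in notation that is not part of the original notebook: suppose the rounds end at quantiles $q_1 < q_2 < \dots < q_k$ (in percent) of the stopping-size distribution, and let $n_q$ be the sample size at quantile $q$. A fraction $(q_i - q_{i-1})/100$ of audits stop in round $i$, after $n_{q_i}$ ballots have been examined, so the average is approximately

$$
\bar{n} \approx \sum_{i=1}^{k} \frac{q_i - q_{i-1}}{100}\, n_{q_i}, \qquad q_0 = 0 .
$$

The sum ignores audits that have not stopped by the last quantile $q_k$. For example, with rounds ending at the 50th, 75th, 90th and 99th percentiles, and the stopping sizes for a 50.5% winner share from the BRAVO table used in this notebook:

$$
0.50 \cdot 32547 + 0.25 \cdot 57838 + 0.15 \cdot 96411 + 0.09 \cdot 214491 = 64498.84
$$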

TODO:
* Try different margins
* **Bug fix**: incorporate the cost of final escalation to a full hand count, for a given total number of ballots. *Until then, it is probably best to always incorporate the 99th percentile.*
* Optimize by overall cost (assuming costs for pulling a ballot, pulling a new batch, etc.).
* Allow more generality by deriving (or calculating, if possible) functions for the stopping size at each quantile.

Prepare `quantiles` dict-of-dicts based on BRAVO results.

Import quantile table from BRAVO paper: https://www.usenix.org/system/files/conference/evtwote12/evtwote12-final27.pdf

In [1]:
bravo_table_1 = """True Share 25th 50th 75th 90th 99th Mean ASN
70% 12 22 38 60 131 30 30
65% 23 38 66 108 236 53 53
60% 49 84 149 244 538 119 119
58% 77 131 231 381 840 184 185
55% 193 332 587 974 2,157 469 469
54% 301 518 916 1,520 3,366 730 731
53% 531 914 1,619 2,700 5,980 1,294 1,295
52% 1,188 2,051 3,637 6,053 13,455 2,900 2,902
51% 4,725 8,157 14,486 24,149 53,640 11,556 11,562
50.5% 18,839 32,547 57,838 96,411 214,491 46,126 46,150"""

In [2]:
quantile_list = [25, 50, 75, 90, 99]

In [3]:
columnnames = quantile_list + ["mean", "asn"]

In [4]:
columnnames

[25, 50, 75, 90, 99, 'mean', 'asn']

In [5]:
# Develop and print quantiles by margin
quantiles = {}

for row in bravo_table_1.split('\n')[1:]:
    row = row.replace(",", "").replace("%", "")
    cols = row.split()
    winner_share = float(cols[0]) / 100
    vals = [int(v) for v in cols[1:]]
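    # print(winner_share, list(zip(columnnames, vals)))

    # Margin = winner's share minus loser's share in a two-candidate contest;
    # rounding avoids floating-point noise in the dict keys.
    margin = round(winner_share - (1.0 - winner_share), 2)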
    quantiles[margin] = dict(zip(columnnames, vals))
    print(margin, quantiles[margin])

0.4 {25: 12, 50: 22, 75: 38, 90: 60, 99: 131, 'mean': 30, 'asn': 30}
0.3 {25: 23, 50: 38, 75: 66, 90: 108, 99: 236, 'mean': 53, 'asn': 53}
0.2 {25: 49, 50: 84, 75: 149, 90: 244, 99: 538, 'mean': 119, 'asn': 119}
0.16 {25: 77, 50: 131, 75: 231, 90: 381, 99: 840, 'mean': 184, 'asn': 185}
0.1 {25: 193, 50: 332, 75: 587, 90: 974, 99: 2157, 'mean': 469, 'asn': 469}
0.08 {25: 301, 50: 518, 75: 916, 90: 1520, 99: 3366, 'mean': 730, 'asn': 731}
0.06 {25: 531, 50: 914, 75: 1619, 90: 2700, 99: 5980, 'mean': 1294, 'asn': 1295}
0.04 {25: 1188, 50: 2051, 75: 3637, 90: 6053, 99: 13455, 'mean': 2900, 'asn': 2902}
0.02 {25: 4725, 50: 8157, 75: 14486, 90: 24149, 99: 53640, 'mean': 11556, 'asn': 11562}
0.01 {25: 18839, 50: 32547, 75: 57838, 90: 96411, 99: 214491, 'mean': 46126, 'asn': 46150}


In [6]:
def mean_via_rounds(margin, round_quantiles):
    """For a given list of round sizes, specified by a list of quantiles,
    return the average number of ballots audited, as both a raw number
    and as a multiple of the mean value for auditing with rounds of size 1.

    FIXME: incorporate cost of final escalation to full hand count,
       adding a parameter for total number of ballots
    """
    last_q = 0
    total = 0.0
    for q in round_quantiles:
        total += (q - last_q) / 100 * quantiles[margin][q]
        last_q = q
    return (total, total / quantiles[margin]['asn'])
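
One possible way to address the FIXME is sketched below. It is not the notebook's method: the function name `mean_via_rounds_with_escalation`, the `row` and `total_ballots` parameters, and the cost model are all assumptions made for illustration. The sketch assumes that every audit still running after the last round escalates to a full hand count, and that an escalated audit costs `total_ballots` ballots. It takes one row of `quantiles` as an argument so that it can run on its own.

```python
def mean_via_rounds_with_escalation(row, round_quantiles, total_ballots):
    """Hypothetical variant of mean_via_rounds that charges for escalation.

    row: one entry of the quantiles dict, e.g. quantiles[0.01]
    round_quantiles: increasing list of quantiles (percent) at which rounds end
    total_ballots: ballots cast, used as the assumed cost of a full hand count
    """
    last_q = 0
    total = 0.0
    for q in round_quantiles:
        # Fraction of audits stopping in this round, times ballots examined by then
        total += (q - last_q) / 100 * row[q]
        last_q = q
    # Assumption: audits that have not stopped after the last round
    # escalate to a full hand count costing total_ballots.
    total += (100 - last_q) / 100 * total_ballots
    return (total, total / row['asn'])


# Row for a 1% margin, copied from the BRAVO table above
row_001 = {25: 18839, 50: 32547, 75: 57838, 90: 96411, 99: 214491,
           'mean': 46126, 'asn': 46150}

ballots, excess = mean_via_rounds_with_escalation(row_001, [50, 75, 90, 99], 1_000_000)
print("Normalized: %.3f, ballots: %d" % (excess, ballots))
```

Under this model, a schedule that ends before the 99th percentile pays the escalation cost on a larger share of audits, which is why the size of the contest matters when comparing schedules.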

In [7]:
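# Use the narrowest margin in the table (a 50.5% winner share). Set it explicitly
# rather than relying on the value left over from the loop that built `quantiles`.
margin = 0.01
for roundqs in [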
    [50, 75, 90, 99],
    [75, 90, 99],
    [25, 50, 75, 99],
    [25, 75, 99],
    [90, 99]]:
    ballots, excess = mean_via_rounds(margin, roundqs)
    print("Normalized: %.3f, ballots: %d for round schedule %s" % (excess, ballots, roundqs))

Normalized: 1.398, ballots: 64498 for round schedule [50, 75, 90, 99]
Normalized: 1.672, ballots: 77144 for round schedule [75, 90, 99]
Normalized: 1.707, ballots: 78783 for round schedule [25, 50, 75, 99]
Normalized: 1.844, ballots: 85106 for round schedule [25, 75, 99]
Normalized: 2.298, ballots: 106074 for round schedule [90, 99]
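
The results above are for a single margin. As a minimal sketch of the "try different margins" item in the TODO list, the same calculation can be repeated for each row of the table. The helper name `normalized_cost_by_margin` and the two-row `sample_quantiles` excerpt are assumptions introduced here so that the block runs on its own; in the notebook, the full `quantiles` dict could be passed instead.

```python
# Two rows copied from the BRAVO table above (40% and 1% margins)
sample_quantiles = {
    0.4: {25: 12, 50: 22, 75: 38, 90: 60, 99: 131, 'mean': 30, 'asn': 30},
    0.01: {25: 18839, 50: 32547, 75: 57838, 90: 96411, 99: 214491,
           'mean': 46126, 'asn': 46150},
}


def normalized_cost_by_margin(table, round_quantiles):
    """Return {margin: mean ballots / ASN} for one round schedule.

    Uses the same sum as mean_via_rounds, so audits that have not
    stopped by the last quantile are still ignored.
    """
    results = {}
    for margin, row in table.items():
        last_q = 0
        total = 0.0
        for q in round_quantiles:
            total += (q - last_q) / 100 * row[q]
            last_q = q
        results[margin] = total / row['asn']
    return results


schedule = [50, 75, 90, 99]
for margin, excess in sorted(normalized_cost_by_margin(sample_quantiles, schedule).items()):
    print("margin %.2f: normalized %.3f for round schedule %s" % (margin, excess, schedule))
```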
