# Investigation of anomalous undervote rate in Lt. Gov contest

## P.B. Stark and Kellie Ottoboni

If, within each piece of geography (county or precinct), each undervote in a statewide contest were equally likely to be in any of the seven statewide contests other than Governor (Lieutenant Governor, Secretary of State, Attorney General, Commissioner of Agriculture, Commissioner of Insurance, Commissioner of Labor, State School Superintendent), what is the chance that the highest undervote rate would be as high as or higher than the rate observed for the Lieutenant Governor contest?

Randomization model: within each piece of geography, each undervote is equally likely to be in any of the 7 statewide contests (not counting Governor), independently across undervotes and across geography.

Test statistic: largest undervote rate across the 7 contests.

Test for the entire election, and separately for polling-place ("Election Day") and vote-by-mail ("Absentee by Mail") ballots.

Data flow:

+ for each piece of geography, determine the total number of ballots cast from <ElectionVoterTurnout />
    - subtract total votes in each of the 7 contests from the number of ballots cast to get the number of undervotes in each of the contests in each piece of geography
    - pool the 7 contests to get the total number of undervotes for that piece of geography
    - aggregate over geography to get the total number of undervotes in each of the 7 contests, and the maximum undervote rate
+ Test
    - in each piece of geography, randomly assign each undervote a label of 1--7
    - aggregate across geography
    - record maximum undervote rate across the 7 contests
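
The test steps listed above can be sketched as follows. This is a minimal illustration rather than the code used later in this notebook: it draws the labels with NumPy's `Generator.multinomial`, and the function names, the toy inputs, and the seed are all assumptions made for the example.

```python
import numpy as np

def simulate_max_undervote_rate(undervotes_by_geog, ballots_by_geog,
                                n_contests=7, reps=1000, seed=0):
    """Simulate the maximum statewide undervote rate under the randomization model.

    undervotes_by_geog: pooled undervotes (all contests) in each unit of geography
    ballots_by_geog:    ballots cast in each unit of geography
    """
    rng = np.random.default_rng(seed)
    total_ballots = np.sum(ballots_by_geog)
    equal_chances = np.full(n_contests, 1.0 / n_contests)
    max_rates = np.empty(reps)
    for r in range(reps):
        # in each unit, every undervote independently gets one of the labels
        counts = np.array([rng.multinomial(u, equal_chances)
                           for u in undervotes_by_geog])
        statewide = counts.sum(axis=0)           # aggregate across geography
        max_rates[r] = statewide.max() / total_ballots
    return max_rates

def randomization_pvalue(simulated, observed):
    """Share of allocations at least as extreme as observed (observed counted once)."""
    return (1 + np.sum(simulated >= observed)) / (len(simulated) + 1)

# Hypothetical toy inputs: three units of geography
sims = simulate_max_undervote_rate([70, 140, 35], [1000, 2000, 500], reps=200)
print(randomization_pvalue(sims, observed=0.03))
```

Because the observed value is counted as one of the allocations, the smallest p-value this estimator can return is `1/(reps + 1)`.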

In [1]:
%matplotlib inline
import math
import numpy as np
import scipy as sp
import scipy.optimize
from scipy.stats import hypergeom, binom, norm, chi2
from scipy import special
from cryptorandom.cryptorandom import SHA256
from cryptorandom import sample
from permute.utils import binom_conf_interval
from permute.npc import fisher
import matplotlib.pyplot as plt
import pandas as pd
import csv

reps = 100

In [2]:
statewide_contests = np.array(["Lieutenant Governor", "Secretary Of State", \
                               "Attorney General", "Commissioner Of Agriculture", \
                               "Commissioner Of Insurance", "State School Superintendent", "Commissioner Of Labor"])

votes = pd.read_csv('../../Data/undervotes_by_county.csv')
votes.head()

Unnamed: 0,County,Vote type,Total ballots,Contest,Ballots cast,Undervotes
0,Appling,Absentee by Mail,530,Lieutenant Governor,523,7
1,Appling,Advance in Person,3298,Lieutenant Governor,3092,206
2,Appling,Election Day,2978,Lieutenant Governor,2768,210
3,Appling,Provisional,3,Lieutenant Governor,3,0
4,Atkinson,Absentee by Mail,88,Lieutenant Governor,88,0


In [3]:
total_votes = votes.groupby(["County", "Contest"]).agg(np.sum)
total_votes = total_votes.reset_index()
total_votes.head()

Unnamed: 0,County,Contest,Total ballots,Ballots cast,Undervotes
0,Appling,Attorney General,6809,6562,247
1,Appling,Commissioner Of Agriculture,6809,6577,232
2,Appling,Commissioner Of Insurance,6809,6607,202
3,Appling,Commissioner Of Labor,6809,6592,217
4,Appling,Lieutenant Governor,6809,6386,423
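
The rows shown above are consistent with the data-flow step that defines undervotes as ballots cast in the unit minus votes recorded in the contest. A small check on those five Appling County rows, with the values copied from the output above and the derived column name `Undervote rate` chosen only for this illustration:

```python
import pandas as pd

# First five aggregated rows for Appling County, copied from the output above
rows = pd.DataFrame({
    "Contest": ["Attorney General", "Commissioner Of Agriculture",
                "Commissioner Of Insurance", "Commissioner Of Labor",
                "Lieutenant Governor"],
    "Total ballots": [6809, 6809, 6809, 6809, 6809],
    "Ballots cast": [6562, 6577, 6607, 6592, 6386],
    "Undervotes": [247, 232, 202, 217, 423],
})

# undervotes = ballots cast in the county - votes recorded in the contest
recomputed = rows["Total ballots"] - rows["Ballots cast"]
assert (recomputed == rows["Undervotes"]).all()

# per-contest undervote rate within the county
rows["Undervote rate"] = rows["Undervotes"] / rows["Total ballots"]
print(rows[["Contest", "Undervote rate"]])
```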


## Calculate the number of undervotes in each county

In [4]:
def calculate_undervotes_by_geography(df, geography_name):
    geog_undervotes = df.groupby([geography_name]).agg({'Undervotes' : 'sum',
                                                     'Total ballots' : lambda x: x.iloc[0]})
    geog_undervotes = geog_undervotes.reset_index()
    return geog_undervotes

In [5]:
undervote_totals = calculate_undervotes_by_geography(total_votes, "County")

## Test statistics

In [6]:
def calculate_max_undervote_rate(df):
    statewide_votes = df.groupby(["Contest"]).agg(np.sum)
    rates = statewide_votes["Undervotes"]/statewide_votes["Total ballots"]
    return np.max(rates)


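def calculate_max_reversal2(undervotes):
    # Largest drop between consecutive contests, undervotes[k] - undervotes[k+1];
    # unlike calculate_max_reversal, non-adjacent pairs are not considered.
    return np.max(np.diff(np.flip(undervotes)))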

def calculate_max_reversal(undervotes):
    """Largest reversal undervotes[i] - undervotes[j] over all pairs i < j (0 if none is positive)."""
    maxu = 0
    u = len(undervotes)
    for i in range(u): 
        for j in range(i+1, u): 
            maxu = np.max([maxu, undervotes[i]-undervotes[j]])
    return maxu


def calculate_correlation(undervotes):
    return np.corrcoef(undervotes, np.arange(1, len(undervotes)+1))[0,1]
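
The largest reversal over all pairs i < j can also be computed without the double loop, by comparing each count with the running maximum of the counts before it. The sketch below is an illustrative alternative, not part of the original analysis. The name `max_reversal_vectorized` is hypothetical, and the inputs are the statewide counts reported later in this notebook.

```python
import numpy as np

def max_reversal_vectorized(undervotes):
    """Largest undervotes[i] - undervotes[j] over pairs i < j, or 0 if none is positive."""
    u = np.asarray(undervotes)
    if len(u) < 2:
        return 0
    # running_max[k] is the largest count among positions 0..k
    running_max = np.maximum.accumulate(u)[:-1]
    # compare each later count with the largest count that precedes it
    return max(0, np.max(running_max - u[1:]))

# Statewide undervote counts reported later in this notebook, in ballot order
all_2018 = [159038, 55748, 76972, 95862, 77717, 76878, 89892]
absentee_2018 = [2244, 3066, 3788, 4543, 3758, 4735, 5523]

print(max_reversal_vectorized(all_2018))       # 103290
print(max_reversal_vectorized(absentee_2018))  # 785
```

For seven contests the double loop is already cheap. The vectorized form mainly matters when the statistic is recomputed for every random allocation in every county.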

## Function to randomly allocate undervotes and compute a statistic

In [7]:
def permute_undervotes(vote_totals, prng=np.random):
    statewide_contests = np.array(["Lieutenant Governor", "Secretary Of State", \
                                   "Attorney General", "Commissioner Of Agriculture", \
                                   "Commissioner Of Insurance", "State School Superintendent", "Commissioner Of Labor"])
    
    permuted_undervotes = np.zeros(7)
    N = np.sum(vote_totals.loc[vote_totals["Contest"]==statewide_contests[0], "Total ballots"])
    U = np.sum(vote_totals["Undervotes"])
    slots = np.repeat(np.arange(7), N)
        
    # within this geographic unit, allocate the U undervotes among the 7 contests
    # by drawing U of the 7*N slots (N per contest) without replacement
    random_contests = sample.random_sample(slots, U, replace=False, prng=prng)
    for i in range(len(statewide_contests)):
        permuted_undervotes[i] = np.sum(random_contests == i)

    # Calculate the largest reversal U_i - U_j, for i < j
    return calculate_max_reversal(permuted_undervotes)


def loop_over_counties(vote_totals_df, reps, prng=np.random):
    counties = np.unique(vote_totals_df["County"])
    pvalues = {}
    
    # loop over counties, conduct test separately in each
    for county in counties:
        # Filter the data down to the county
        df = vote_totals_df[vote_totals_df["County"] == county]
        df = df.copy()
        
        # calculate observed undervotes for each contest
        undervotes_observed = [np.sum(df.loc[df["Contest"]==statewide_contests[i], "Undervotes"]) \
                       for i in range(7)]
        tst = calculate_max_reversal(undervotes_observed)
        
        # run test
        perm_distribution = np.array([permute_undervotes(df, prng=prng) for i in range(reps)])
        # the +1 terms count the observed allocation as one of the random allocations
        pvalues[county] = (1+np.sum(perm_distribution >= tst))/(reps+1)
    
    return pvalues

def calculate_combined_pvalue(pvalue_dict):
    fisher_val = fisher(list(pvalue_dict.values()))
    n = len(pvalue_dict)
    return chi2.sf(fisher_val, df=2*n)
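
The allocation step in `permute_undervotes` draws U slots without replacement from 7N slots, N per contest, so the vector of per-contest counts follows a multivariate hypergeometric distribution. As a sketch of an equivalent draw that avoids building the `slots` array, NumPy's `Generator.multivariate_hypergeometric` could be used. This swaps in a different pseudo-random number generator from the one used in this notebook, so it would not reproduce the results shown here. The function name and the value of U in the example are hypothetical.

```python
import numpy as np

def allocate_undervotes(total_ballots, total_undervotes, n_contests=7, rng=None):
    """Draw per-contest undervote counts: U slots chosen from n_contests * N slots."""
    rng = np.random.default_rng() if rng is None else rng
    # each contest contributes N slots, one per ballot cast in the unit
    slots_per_contest = np.full(n_contests, total_ballots, dtype=np.int64)
    return rng.multivariate_hypergeometric(slots_per_contest, total_undervotes)

# N = 6809 is the Appling County ballot total shown above; U = 1500 is made up
rng = np.random.default_rng(1234)
counts = allocate_undervotes(6809, 1500, rng=rng)
print(counts, counts.sum())
```

The returned counts can be passed directly to `calculate_max_reversal`.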

## Initialize the pseudo-random number generator

In [8]:
#seed = '2018CV313418 3463593937'  # case caption number [space] 10 rolls of 10-sided dice
#prng = SHA256(seed)

seed = 1234
prng = np.random
np.random.seed(seed)

# Results for the entire election

In [9]:
undervotes_observed = [np.sum(votes.loc[votes["Contest"]==statewide_contests[i], "Undervotes"]) \
                       for i in range(7)]
for i in range(7):
    print(statewide_contests[i], undervotes_observed[i])
tst = calculate_max_reversal(undervotes_observed)
print("Max reversal:", tst)

Lieutenant Governor 159038
Secretary Of State 55748
Attorney General 76972
Commissioner Of Agriculture 95862
Commissioner Of Insurance 77717
State School Superintendent 76878
Commissioner Of Labor 89892
Max reversal: 103290


In [10]:
pvalues = loop_over_counties(votes, reps=reps, prng=prng)
pvalues

{'Appling': 0.009900990099009901,
 'Atkinson': 0.009900990099009901,
 'Bacon': 0.009900990099009901,
 'Baker': 0.009900990099009901,
 'Baldwin': 0.009900990099009901,
 'Banks': 0.009900990099009901,
 'Barrow': 0.009900990099009901,
 'Bartow': 0.009900990099009901,
 'Ben Hill': 0.009900990099009901,
 'Berrien': 0.009900990099009901,
 'Bibb': 0.009900990099009901,
 'Bleckley': 0.009900990099009901,
 'Brantley': 0.009900990099009901,
 'Brooks': 0.009900990099009901,
 'Bryan': 0.009900990099009901,
 'Bulloch': 0.009900990099009901,
 'Burke': 0.009900990099009901,
 'Butts': 0.009900990099009901,
 'Calhoun': 0.009900990099009901,
 'Camden': 0.009900990099009901,
 'Candler': 0.009900990099009901,
 'Carroll': 0.009900990099009901,
 'Catoosa': 0.009900990099009901,
 'Charlton': 0.009900990099009901,
 'Chatham': 0.009900990099009901,
 'Chattahoochee': 0.0297029702970297,
 'Chattooga': 0.009900990099009901,
 'Cherokee': 0.009900990099009901,
 'Clarke': 0.009900990099009901,
 'Clay': 0.00990099009

In [11]:
calculate_combined_pvalue(pvalues)

2.720560400187251e-142
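
The combined value above comes from Fisher's combining function: with n county-level p-values, the statistic -2 Σ log p_i is referred to a chi-square distribution with 2n degrees of freedom. The sketch below reproduces that calculation with NumPy and SciPy only and cross-checks it against `scipy.stats.combine_pvalues`. It assumes that `permute.npc.fisher` returns the same -2 Σ log p_i statistic. The short p-value list is illustrative, not the full set of counties.

```python
import numpy as np
from scipy.stats import chi2, combine_pvalues

def fisher_combined_pvalue(pvalues):
    """Fisher's method: -2 * sum(log p) referred to chi-square with 2n degrees of freedom."""
    p = np.asarray(list(pvalues), dtype=float)
    statistic = -2.0 * np.sum(np.log(p))
    return chi2.sf(statistic, df=2 * len(p))

reps = 100
smallest_p = 1 / (reps + 1)   # 0.0099..., the floor seen for most counties above

# Illustrative values only: two counties at the floor, two others
example = [smallest_p, smallest_p, 0.0297029702970297, 1.0]

manual = fisher_combined_pvalue(example)
_, reference = combine_pvalues(example, method="fisher")
print(manual, reference)
```

Since no county can have a p-value below `1/(reps + 1)`, the combined value is limited by `reps` as well as by the data. A larger `reps` would allow smaller per-county p-values.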

# Results for the absentee votes

In [12]:
absentee_votes = votes[votes["Vote type"]=="Absentee by Mail"]
absentee_votes = absentee_votes.copy()

In [13]:
undervotes_observed_abs = [np.sum(absentee_votes.loc[absentee_votes["Contest"]==statewide_contests[i], "Undervotes"]) \
                       for i in range(7)]
for i in range(7):
    print(statewide_contests[i], undervotes_observed_abs[i])
tst_abs = calculate_max_reversal(undervotes_observed_abs)
print("Max reversal:", tst_abs)

Lieutenant Governor 2244
Secretary Of State 3066
Attorney General 3788
Commissioner Of Agriculture 4543
Commissioner Of Insurance 3758
State School Superintendent 4735
Commissioner Of Labor 5523
Max reversal: 785


In [14]:
pvalues_abs = loop_over_counties(absentee_votes, reps=reps, prng=prng)
pvalues_abs

{'Appling': 1.0,
 'Atkinson': 0.9801980198019802,
 'Bacon': 1.0,
 'Baker': 0.9900990099009901,
 'Baldwin': 1.0,
 'Banks': 0.8415841584158416,
 'Barrow': 1.0,
 'Bartow': 0.04950495049504951,
 'Ben Hill': 0.9504950495049505,
 'Berrien': 0.900990099009901,
 'Bibb': 0.06930693069306931,
 'Bleckley': 0.9801980198019802,
 'Brantley': 1.0,
 'Brooks': 0.9702970297029703,
 'Bryan': 0.900990099009901,
 'Bulloch': 0.6633663366336634,
 'Burke': 1.0,
 'Butts': 0.7425742574257426,
 'Calhoun': 1.0,
 'Camden': 0.9900990099009901,
 'Candler': 0.9207920792079208,
 'Carroll': 0.26732673267326734,
 'Catoosa': 0.594059405940594,
 'Charlton': 0.45544554455445546,
 'Chatham': 0.6534653465346535,
 'Chattahoochee': 1.0,
 'Chattooga': 0.27722772277227725,
 'Cherokee': 0.45544554455445546,
 'Clarke': 1.0,
 'Clay': 1.0,
 'Clayton': 0.019801980198019802,
 'Clinch': 0.5643564356435643,
 'Cobb': 0.9702970297029703,
 'Coffee': 0.9405940594059405,
 'Colquitt': 0.8415841584158416,
 'Columbia': 0.06930693069306931,
 'Co

In [15]:
calculate_combined_pvalue(pvalues_abs)

0.9999999999999878

# Results for the election day votes

In [16]:
electionday_votes = votes[votes["Vote type"]=="Election Day"]
electionday_votes = electionday_votes.copy()

In [17]:
undervotes_observed_ed = [np.sum(electionday_votes.loc[electionday_votes["Contest"]==statewide_contests[i], "Undervotes"]) \
                       for i in range(7)]
for i in range(7):
    print(statewide_contests[i], undervotes_observed_ed[i])
tst_ed = calculate_max_reversal(undervotes_observed_ed)
print("Max reversal:", tst_ed)

Lieutenant Governor 82293
Secretary Of State 27531
Attorney General 39451
Commissioner Of Agriculture 50152
Commissioner Of Insurance 41017
State School Superintendent 40077
Commissioner Of Labor 47062
Max reversal: 54762


In [18]:
pvalues_ed = loop_over_counties(electionday_votes, reps=reps, prng=prng)
pvalues_ed

{'Appling': 0.009900990099009901,
 'Atkinson': 0.009900990099009901,
 'Bacon': 0.009900990099009901,
 'Baker': 0.009900990099009901,
 'Baldwin': 0.009900990099009901,
 'Banks': 0.009900990099009901,
 'Barrow': 0.009900990099009901,
 'Bartow': 0.04950495049504951,
 'Ben Hill': 0.009900990099009901,
 'Berrien': 0.009900990099009901,
 'Bibb': 0.009900990099009901,
 'Bleckley': 0.009900990099009901,
 'Brantley': 0.009900990099009901,
 'Brooks': 0.009900990099009901,
 'Bryan': 0.009900990099009901,
 'Bulloch': 0.009900990099009901,
 'Burke': 0.009900990099009901,
 'Butts': 0.009900990099009901,
 'Calhoun': 0.009900990099009901,
 'Camden': 0.009900990099009901,
 'Candler': 0.009900990099009901,
 'Carroll': 0.009900990099009901,
 'Catoosa': 0.009900990099009901,
 'Charlton': 0.009900990099009901,
 'Chatham': 0.009900990099009901,
 'Chattahoochee': 0.07920792079207921,
 'Chattooga': 0.009900990099009901,
 'Cherokee': 0.009900990099009901,
 'Clarke': 0.009900990099009901,
 'Clay': 0.00990099009

In [19]:
calculate_combined_pvalue(pvalues_ed)

1.4909155050381697e-138

# Repeat everything with the 2014 data

In [20]:
votes_2014 = pd.read_csv('../../Data/undervotes_by_county_2014.csv')
votes_2014.head()

Unnamed: 0,County,Vote type,Total ballots,Contest,Ballots cast,Undervotes
0,Appling,Absentee by Mail,306,Lieutenant Governor,302,4
1,Appling,Advance in Person,976,Lieutenant Governor,970,6
2,Appling,Election Day,3122,Lieutenant Governor,3073,49
3,Appling,Provisional,4,Lieutenant Governor,4,0
4,Atkinson,Absentee by Mail,68,Lieutenant Governor,66,2


In [21]:
total_votes_2014 = votes_2014.groupby(["County", "Contest"]).agg(np.sum)
total_votes_2014 = total_votes_2014.reset_index()
total_votes_2014.head()

Unnamed: 0,County,Contest,Total ballots,Ballots cast,Undervotes
0,Appling,Attorney General,4408,4362,46
1,Appling,Commissioner Of Agriculture,4408,4373,35
2,Appling,Commissioner Of Insurance,4408,4341,67
3,Appling,Commissioner Of Labor,4408,4366,42
4,Appling,Lieutenant Governor,4408,4349,59


In [22]:
undervote_totals_2014 = calculate_undervotes_by_geography(total_votes_2014, "County")

## Results for the entire 2014 election

In [23]:
undervotes_observed_2014 = [np.sum(votes_2014.loc[votes_2014["Contest"]==statewide_contests[i], "Undervotes"]) \
                       for i in range(7)]
for i in range(7):
    print(statewide_contests[i], undervotes_observed_2014[i])
tst_2014 = calculate_max_reversal(undervotes_observed_2014)
print("Max reversal:", tst_2014)

Lieutenant Governor 23228
Secretary Of State 24635
Attorney General 28035
Commissioner Of Agriculture 42912
Commissioner Of Insurance 32429
State School Superintendent 28399
Commissioner Of Labor 44730
Max reversal: 14513


In [24]:
pvalues_2014 = loop_over_counties(total_votes_2014, reps=reps, prng=prng)
pvalues_2014

{'Appling': 0.009900990099009901,
 'Atkinson': 0.0297029702970297,
 'Bacon': 0.009900990099009901,
 'Baker': 0.009900990099009901,
 'Baldwin': 0.009900990099009901,
 'Banks': 0.019801980198019802,
 'Barrow': 0.10891089108910891,
 'Bartow': 0.5346534653465347,
 'Ben Hill': 0.009900990099009901,
 'Berrien': 0.009900990099009901,
 'Bibb': 0.009900990099009901,
 'Bleckley': 0.6534653465346535,
 'Brantley': 0.009900990099009901,
 'Brooks': 0.009900990099009901,
 'Bryan': 0.009900990099009901,
 'Bulloch': 0.009900990099009901,
 'Burke': 0.009900990099009901,
 'Butts': 0.8613861386138614,
 'Calhoun': 0.04950495049504951,
 'Camden': 0.009900990099009901,
 'Candler': 0.42574257425742573,
 'Carroll': 0.0594059405940594,
 'Catoosa': 0.009900990099009901,
 'Charlton': 0.009900990099009901,
 'Chatham': 0.009900990099009901,
 'Chattahoochee': 0.48514851485148514,
 'Chattooga': 0.009900990099009901,
 'Cherokee': 0.009900990099009901,
 'Clarke': 0.019801980198019802,
 'Clay': 0.10891089108910891,
 'Cl

In [25]:
calculate_combined_pvalue(pvalues_2014)

5.674362585810374e-92

## Results for the 2014 absentee votes

In [26]:
absentee_votes_2014 = votes_2014[votes_2014["Vote type"]=="Absentee by Mail"]
absentee_votes_2014 = absentee_votes_2014.copy()

In [27]:
undervotes_observed_abs_2014 = [np.sum(absentee_votes_2014.loc[absentee_votes_2014["Contest"]==statewide_contests[i], \
                                                               "Undervotes"]) for i in range(7)]
for i in range(7):
    print(statewide_contests[i], undervotes_observed_abs_2014[i])
tst_abs_2014 = calculate_max_reversal(undervotes_observed_abs_2014)
print("Max reversal:", tst_abs_2014)

Lieutenant Governor 1283
Secretary Of State 1855
Attorney General 2158
Commissioner Of Agriculture 2751
Commissioner Of Insurance 2537
State School Superintendent 2554
Commissioner Of Labor 3226
Max reversal: 214


In [28]:
pvalues_abs_2014 = loop_over_counties(absentee_votes_2014, reps=reps, prng=prng)
pvalues_abs_2014

{'Appling': 0.9702970297029703,
 'Atkinson': 1.0,
 'Bacon': 0.8316831683168316,
 'Baker': 0.9900990099009901,
 'Baldwin': 0.9702970297029703,
 'Banks': 0.9702970297029703,
 'Barrow': 0.9504950495049505,
 'Bartow': 0.9405940594059405,
 'Ben Hill': 0.8811881188118812,
 'Berrien': 0.9405940594059405,
 'Bibb': 0.49504950495049505,
 'Bleckley': 0.9900990099009901,
 'Brantley': 0.9306930693069307,
 'Brooks': 0.009900990099009901,
 'Bryan': 0.45544554455445546,
 'Bulloch': 1.0,
 'Burke': 0.1782178217821782,
 'Butts': 0.6732673267326733,
 'Calhoun': 0.7524752475247525,
 'Camden': 0.9801980198019802,
 'Candler': 1.0,
 'Carroll': 0.9702970297029703,
 'Catoosa': 0.8910891089108911,
 'Charlton': 0.9108910891089109,
 'Chatham': 0.16831683168316833,
 'Chattahoochee': 0.9801980198019802,
 'Chattooga': 0.504950495049505,
 'Cherokee': 0.7722772277227723,
 'Clarke': 0.9702970297029703,
 'Clay': 0.5544554455445545,
 'Clayton': 0.6435643564356436,
 'Clinch': 0.8613861386138614,
 'Cobb': 0.297029702970297,

In [29]:
calculate_combined_pvalue(pvalues_abs_2014)

1.0

## Results for the 2014 election day votes

In [30]:
electionday_votes_2014 = votes_2014[votes_2014["Vote type"]=="Election Day"]
electionday_votes_2014 = electionday_votes_2014.copy()

In [31]:
undervotes_observed_ed_2014 = [np.sum(electionday_votes_2014.loc[electionday_votes_2014["Contest"]==statewide_contests[i], \
                                                                 "Undervotes"]) for i in range(7)]
for i in range(7):
    print(statewide_contests[i], undervotes_observed_ed_2014[i])
tst_ed_2014 = calculate_max_reversal(undervotes_observed_ed_2014)
print("Max reversal:", tst_ed_2014)

Lieutenant Governor 15106
Secretary Of State 16374
Attorney General 18324
Commissioner Of Agriculture 26828
Commissioner Of Insurance 21500
State School Superintendent 18787
Commissioner Of Labor 30206
Max reversal: 8041


In [32]:
pvalues_ed_2014 = loop_over_counties(electionday_votes_2014, reps=reps, prng=prng)
pvalues_ed_2014

{'Appling': 0.009900990099009901,
 'Atkinson': 0.09900990099009901,
 'Bacon': 0.009900990099009901,
 'Baker': 0.009900990099009901,
 'Baldwin': 0.009900990099009901,
 'Banks': 0.04950495049504951,
 'Barrow': 0.44554455445544555,
 'Bartow': 0.9504950495049505,
 'Ben Hill': 0.009900990099009901,
 'Berrien': 0.009900990099009901,
 'Bibb': 0.009900990099009901,
 'Bleckley': 0.25742574257425743,
 'Brantley': 0.009900990099009901,
 'Brooks': 0.009900990099009901,
 'Bryan': 0.009900990099009901,
 'Bulloch': 0.009900990099009901,
 'Burke': 0.009900990099009901,
 'Butts': 0.8613861386138614,
 'Calhoun': 0.019801980198019802,
 'Camden': 0.009900990099009901,
 'Candler': 0.7920792079207921,
 'Carroll': 0.06930693069306931,
 'Catoosa': 0.0297029702970297,
 'Charlton': 0.009900990099009901,
 'Chatham': 0.009900990099009901,
 'Chattahoochee': 0.09900990099009901,
 'Chattooga': 0.07920792079207921,
 'Cherokee': 0.009900990099009901,
 'Clarke': 0.27722772277227725,
 'Clay': 0.07920792079207921,
 'Clay

In [33]:
calculate_combined_pvalue(pvalues_ed_2014)

9.787767078248564e-80