# Crossover Analysis for the Statewide 2022 General Election Primary

According to Niesse<sup>1</sup>, Georgia Republican legislators claim that the primary was tainted by their opponents. The essential claim is that Democrats influenced the outcome of the Republican Party primary by pulling a Republican ballot rather than a Democratic one. The merits of this claim can be tested using the voting history provided by the Georgia Secretary of State. The essence of the test is to count how many voters with a history of voting mostly in Democratic primaries pulled a Republican Party primary ballot. One then compares that count to the total number of ballots cast in the Republican primary. The count is significant only if it is larger than the margin of victory.
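The comparison described above reduces to a little arithmetic. The following is a minimal sketch, not the notebook's own method: the function name `crossover_summary` and the example figures are hypothetical and serve only to illustrate the test.

```python
def crossover_summary(crossover_count, total_ballots, margin_of_victory):
    """Compare a crossover count with the primary's size and the winning margin.

    All three inputs are ballot counts. The function name and the example
    figures below are hypothetical and only illustrate the comparison.
    """
    share = crossover_count / total_ballots
    # Crossover voters can only have changed the outcome if they outnumber
    # the margin separating the winner from the runner-up.
    could_decide = crossover_count > margin_of_victory
    return share, could_decide


# Hypothetical race: 1,000,000 ballots, 50,000 crossover voters, 200,000-vote margin.
share, could_decide = crossover_summary(50_000, 1_000_000, 200_000)
print(f"crossover share: {share:.1%}, could decide the race: {could_decide}")
```

Note that the margin differs from race to race, so the same crossover count can matter in a close contest and be irrelevant in a lopsided one.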

> **_Note:_** The voting history was retrieved on June 9th, 2022 from the Georgia Secretary of State website. The voting history is still being updated, so it is not yet fully in sync with the vote tallies. The voter list was retrieved in March 2022, so some voters will have registered and voted after the voter list was retrieved. The number of unaccounted-for voters will be noted below.
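One way to quantify the mismatch described in the note is a left merge with pandas' `indicator=True` option, which flags history rows that have no match in the voter list. The sketch below uses invented toy data; only the `voter_id` column name comes from this notebook.

```python
import pandas as pd

# Toy stand-ins for the two files; the column name voter_id matches the
# notebook, everything else here is made up for illustration.
voter_list = pd.DataFrame({"voter_id": ["001", "002", "003"]})
voting_history = pd.DataFrame({"voter_id": ["002", "003", "004", "005"]})

# A left merge with indicator=True labels each history row as 'both'
# (found in the voter list) or 'left_only' (missing from the voter list).
merged = voting_history.merge(voter_list, on="voter_id", how="left", indicator=True)
unaccounted = merged[merged["_merge"] == "left_only"]

print(f"{len(unaccounted)} of {len(voting_history)} voters are not in the voter list")
```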

In [6]:
%load_ext autoreload
%autoreload 2
import numpy as np
from segmentation.voter_segmentation import VoterSegmentation

Get an instance of the voter segmentation class.

In [7]:
root_dir = '~/Documents/data'
vs = VoterSegmentation(root_dir)

Get a summary of the voter history. For the crossover analysis, voters are scored after excluding the 5/24 election. Scoring will yield an observed party for each voter among other factors.

In [8]:
hist = vs.voter_history_summary()

In [21]:
score = vs.score_voters(end=len(hist.columns)-1)

Load Voter History Summary Time: 29.1
Compute Ops Time: 1141.3
Compute max_ballots_cast Time: 4.7
Compute ballots_cast Time: 11.1
Compute gn_max Time: 12.6
Compute pn_max Time: 15.8
Compute gn Time: 11.9
Compute rn Time: 10.7
Compute dn Time: 11.5
Compute gr Time: 3.9
Compute pr Time: 3.6
Compute ra Time: 3.5
Reorder Time: 2.1
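Excluding the 5/24 election matters because otherwise the ballot under test would feed into the score it is compared against. How `score_voters` uses `end` internally is not shown here; the sketch below simply assumes `end` is an exclusive column bound and that the history holds one column per election date with `'RP'`/`'DP'` ballot codes. The earlier dates and the voters are invented.

```python
import pandas as pd

# Toy history: one row per voter, one column per election date.
# 'RP' and 'DP' mark Republican and Democratic primary ballots as in the
# notebook; the earlier dates and the voters are invented.
hist_toy = pd.DataFrame(
    {
        "2018-05-22": ["DP", "RP", None],
        "2020-06-09": ["DP", "RP", None],
        "2022-05-24": ["RP", "RP", "RP"],
    },
    index=pd.Index(["a", "b", "c"], name="voter_id"),
)

# Drop the last column so the 5/24 ballot cannot influence the score.
prior = hist_toy.iloc[:, : len(hist_toy.columns) - 1]

# Count earlier Republican and Democratic primary ballots per voter.
prior_counts = pd.DataFrame(
    {"rn": (prior == "RP").sum(axis=1), "dn": (prior == "DP").sum(axis=1)}
)
print(prior_counts)
```

In this toy data, voter `a` has only Democratic primaries in the prior history and voter `c` has none, even though all three pulled a Republican ballot on 5/24.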


The first step is to get a list of the voters that pulled a Republican ballot according to the most recent (June 9th) update of the voting history.

In [3]:
hist_r = hist[hist.loc[:, '2022-05-24']=='RP']
print(f'No. of voters that pulled a Republican Party ballot: {len(hist_r.index)}')


In [2]:
score_r = hist_r.merge(score, on='voter_id', how='inner')


The score includes a primary participation rate. Let's get a count of the number of voters that never participated in a primary.

In [24]:
no_history_count = len(score_r[score_r.pr == 0].index)
print(f'No. voters in Republican primary that never participated in a primary is {no_history_count}')
print(f'which is {no_history_count/len(hist_r.index)*100:.1f}% of the Republican primary participants.')


No. voters in Republican primary that never participated in a primary is 305395
which is 25.7% of the Republican primary participants.


In [25]:
rep_history_count = len(score_r[score_r.ra > .5].index)
rep_history_count

798749

In [26]:
dem_history_count = len(score_r[(score_r.ra < .5)].index)
dem_history_count

57234

In [28]:
first_rep_primary_count = len(score_r[(score_r.ra < .5) & (score_r.rn == 0)].index)
first_rep_primary_count

46412

In [38]:
weak_dem_count = len(score_r[(score_r.ra < .5) & (score_r.dn == 1)].index)
weak_dem_count

27033

In [62]:
ind_history_count = len(score_r[np.isclose(score_r.ra, .5) & (score_r.pr > 0)].index)
ind_history_count

25302
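The counts above partition the Republican primary voters by prior behaviour. As a hedged alternative to separate filters, the same partition could be written as one labelling step with `np.select`. The sketch assumes `pr` is the primary participation rate and that `ra` above 0.5 indicates a mostly Republican primary history, as the variable names in the cells above suggest; the data and the label names are invented.

```python
import numpy as np
import pandas as pd

# Toy scores using the notebook's column names: pr is the primary
# participation rate; ra is assumed to lean Republican above 0.5.
toy = pd.DataFrame(
    {"pr": [0.0, 0.6, 0.4, 0.4, 0.2], "ra": [0.5, 0.8, 0.3, 0.5, 0.6]}
)

conditions = [
    toy.pr == 0,   # never voted in a primary
    toy.ra > 0.5,  # mostly Republican primaries
    toy.ra < 0.5,  # mostly Democratic primaries
]
labels = ["no_history", "republican", "democratic"]

# np.select takes the first matching condition; what remains is an even
# split between the parties (ra == 0.5 with pr > 0).
toy["segment"] = np.select(conditions, labels, default="even_split")
print(toy.segment.value_counts())
```

Because every voter receives exactly one label, the segment counts are guaranteed to sum to the number of rows, which makes it easy to check that no voter was dropped or counted twice.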

In [64]:
no_history = score_r[score_r.pr == 0]
np.mean(2022-no_history.year_of_birth)

52.37524646994166

In [65]:
rep_history = score_r[score_r.ra > .5]
np.mean(2022-rep_history.year_of_birth)

62.86274706213514

In [66]:
dem_history = score_r[(score_r.ra < .5)]
np.mean(2022-dem_history.year_of_birth)

59.020706693683124

In [67]:
ind_history = score_r[np.isclose(score_r.ra, .5) & (score_r.pr > 0)]
np.mean(2022-ind_history.year_of_birth)

63.03821832266224
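The four mean ages above could also be computed in one pass with a `groupby`, given a per-voter label column. In this sketch the `segment` column is hypothetical and the voters are invented; only `year_of_birth` is a column name taken from the cells above.

```python
import pandas as pd

# Invented voters; year_of_birth matches the column used above and
# segment is a hypothetical label column.
toy = pd.DataFrame(
    {
        "segment": ["no_history", "no_history", "republican", "democratic"],
        "year_of_birth": [1990, 1970, 1960, 1962],
    }
)

# Age as of the election year, then one mean per segment.
toy["age"] = 2022 - toy.year_of_birth
mean_age = toy.groupby("segment").age.mean()
print(mean_age)
```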

In [21]:
1165907+720603

1886510

In [22]:
1178625+731594

1910219

In [23]:
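# `vs` here appears to be the scored voter table (one row per voter, as in
# `score` above) rather than the VoterSegmentation instance created earlier.
vs_f = vs[vs.county_code == '060']
vs_f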

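| | voter_id | county_code | max_ballots_cast | ballots_cast | gn_max | pn_max | gn | rn | dn | gr | pr | ra |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3266137 | 00001338 | 060 | 9 | 3 | 4 | 5 | 1 | 1 | 1 | 0.25 | 0.4 | 0.5 |
| 3266138 | 12814671 | 060 | 2 | 2 | 1 | 1 | 1 | 1 | 0 | 1.00 | 1.0 | 1.0 |
| 3266139 | 00001993 | 060 | 9 | 7 | 4 | 5 | 4 | 1 | 2 | 1.00 | 0.6 | 0.4 |
| 3266140 | 00002362 | 060 | 9 | 4 | 4 | 5 | 3 | 1 | 0 | 0.75 | 0.2 | 0.6 |
| 3266141 | 00002694 | 060 | 9 | 2 | 4 | 5 | 1 | 1 | 0 | 0.25 | 0.2 | 0.6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3811350 | 13257219 | 060 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0.00 | 1.0 | 0.0 |
| 3811351 | 13257231 | 060 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0.00 | 1.0 | 0.0 |
| 3811352 | 13257258 | 060 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0.00 | 1.0 | 0.0 |
| 3811353 | 13257884 | 060 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0.00 | 1.0 | 0.0 |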
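The ratio columns in the rows shown above are consistent with three simple formulas: `gr = gn / gn_max`, `pr = (rn + dn) / pn_max`, and `ra = 0.5 + (rn - dn) / (2 * pn_max)`. These are inferred from the displayed rows only, not taken from the `score_voters` source, so treat the sketch below as an assumption. The helper name `ratios` and the handling of zero denominators are likewise hypothetical.

```python
def ratios(gn_max, pn_max, gn, rn, dn):
    """Return (gr, pr, ra) under the inferred formulas.

    Zero denominators fall back to 0.0 (gr, pr) or 0.5 (ra); the ra
    fallback is an assumption, the gr fallback matches the displayed rows.
    """
    gr = gn / gn_max if gn_max else 0.0
    pr = (rn + dn) / pn_max if pn_max else 0.0
    ra = 0.5 + (rn - dn) / (2 * pn_max) if pn_max else 0.5
    return round(gr, 2), round(pr, 2), round(ra, 2)


# Rows copied from the table above: (gn_max, pn_max, gn, rn, dn), (gr, pr, ra)
rows = [
    ((4, 5, 1, 1, 1), (0.25, 0.4, 0.5)),
    ((1, 1, 1, 1, 0), (1.0, 1.0, 1.0)),
    ((4, 5, 4, 1, 2), (1.0, 0.6, 0.4)),
    ((4, 5, 3, 1, 0), (0.75, 0.2, 0.6)),
    ((0, 1, 0, 0, 1), (0.0, 1.0, 0.0)),
]
for inputs, expected in rows:
    assert ratios(*inputs) == expected
print("inferred formulas reproduce all displayed rows")
```

Under this reading, `ra` equals 0.5 exactly when a voter has cast as many Republican as Democratic primary ballots, which includes voters with no primary history at all. That would explain why the even-split filter earlier also requires `pr > 0`.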


In [24]:
sum(vs_f.ra < .5)

211119

In [25]:
sum(vs_f.ra > .5)

83649

In [26]:
sum(vs_f.ra == .5)

250216

In [50]:
# `sut` is not defined in the cells shown here; it must come from another session.
vh = sut.db.voter_history_for_date(2022, 5, 24)

In [52]:
vh_r = vh[vh.party=='R']
vh_r

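| | voter_id | date | type | party | county_id | absentee | provisional | supplemental |
|---|---|---|---|---|---|---|---|---|
| 0 | 03391038 | 20220524 | 001 | R | 023 | 1 | 0 | 0 |
| 5 | 11039600 | 20220524 | 001 | R | 060 | 1 | 0 | 0 |
| 10 | 12416618 | 20220524 | 001 | R | 060 | 1 | 0 | 0 |
| 17 | 11720072 | 20220524 | 001 | R | 060 | 1 | 0 | 0 |
| 19 | 12828687 | 20220524 | 001 | R | 060 | 1 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1969068 | 00067954 | 20220524 | 001 | R | 155 | 1 | 0 | 0 |
| 1969075 | 01303222 | 20220524 | 001 | R | 159 | 1 | 0 | 0 |
| 1969076 | 01646483 | 20220524 | 001 | R | 154 | 1 | 0 | 0 |
| 1969077 | 02782651 | 20220524 | 001 | R | 154 | 1 | 0 | 0 |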


In [42]:
hist_fulton = hist[hist.county_code == '060']
hist_fulton_r = hist_fulton[hist_fulton.loc[:, '2022-05-24']=='RP']
len(hist_fulton_r.index)

69086

In [43]:
hist_fulton = hist[hist.county_code == '060']
hist_fulton_d = hist_fulton[hist_fulton.loc[:, '2022-05-24']=='DP']
len(hist_fulton_d.index)

114673

In [44]:
(69086+114673)/737975

0.24900437006673667
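The last cell divides the combined county `060` primary ballots by 737975. The notebook does not say what that denominator is; the sketch below assumes it is the number of registered voters in the county, which would make the result a primary turnout rate. The helper name `primary_turnout` is hypothetical.

```python
def primary_turnout(rep_ballots, dem_ballots, eligible_voters):
    """Fraction of eligible voters who cast a ballot in either party primary."""
    return (rep_ballots + dem_ballots) / eligible_voters


# Figures from the cells above; reading 737975 as the number of
# registered voters in county 060 is an assumption.
turnout = primary_turnout(69086, 114673, 737975)
rep_share = 69086 / (69086 + 114673)
print(f"turnout {turnout:.1%}, Republican share of ballots {rep_share:.1%}")
```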