From [VEST 2020](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/K7760H) documentation:

"Election results and precinct shapefile from the NC State Board of Elections (https://www.ncsbe.gov/results-data)

Buncombe 681, Henderson CV, Wake 01-07A, 07-07A are missing from the 20201018 shapefile. They are added from the 20190827 shapefile.

North Carolina produces two sets of election results data. The precinct results are the unaltered results as initially reported by the counties. Many counties report early votes by vote center while provisional and other nonstandard ballots may be reported countywide. The precinct-sorted results are then produced within 30 days after the election. In the precinct-sorted data nearly all votes are assigned to precincts regardless of the manner by which the ballots were cast. However, North Carolina law requires the addition of statistical "noise" to the precinct-sorted data wherever any given vote by any specific voter may otherwise be deduced via cross referencing the various election-related data sets produced by the SBE.

For the 2020 general election 51 counties reported all votes by precinct in their initial precinct results. The precinct-sorted data set was used instead for the counties listed below.

Alleghany, Avery, Beaufort, Bertie*, Bladen, Buncombe, Cabarrus, Caldwell, Camden, Currituck, Dare, Davidson*, Davie, Duplin*, Durham*, Edgecombe, Guilford, Halifax*, Harnett, Haywood, Henderson, Hertford, Hyde, Johnston, Jones, Lee, Lincoln, Macon, Martin, Mecklenburg*, Moore, Nash, New Hanover*, Northampton*, Orange, Pasquotank, Pitt*, Polk, Richmond, Scotland, Stokes*, Surry*, Tyrrell*, Wake, Washington, Watauga, Wayne, Wilkes*, Yadkin

In counties marked by asterisk some votes were still reported by vote center or countywide in the precinct-sorted data. These were distributed by candidate to precincts based on the precinct-level reported vote. The precinct-sorted results were further adjusted to match the certified countywide totals based on the precinct-level vote by candidate."

Note that RDH checked which counties had precinct names containing keywords indicating that sorting was required (absentee, provisional, one-stop, and similar), and found that the 2022 list of counties needing sorting was down to 46 from 51. The 2022 list mostly overlaps with the 2020 list above, with a few counties added and a few removed. For more information, see the code below.

**2022 RDH Processing:** 

Absentee and voting center votes were allocated proportionally to precincts, by share of precinct-reported vote.
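The allocation itself is done by `allocate_absentee` from the `pdv_functions` module imported in the code below, and its source is not shown here. The following is only a minimal sketch of proportional allocation under stated assumptions, not that function. It assumes one countywide row per county (as produced by the county groupby further down) and allocates candidate by candidate, which leaves fractional votes. `allocate_proportionally` is a hypothetical name.

```python
import pandas as pd


def allocate_proportionally(precincts: pd.DataFrame, countywide: pd.DataFrame,
                            vote_cols: list, county_col: str = "County") -> pd.DataFrame:
    """Hypothetical sketch: spread each county's countywide/vote-center votes over
    that county's precincts in proportion to each precinct's reported vote."""
    out = precincts.copy()
    # assumes exactly one countywide row per county
    cw = countywide.set_index(county_col)
    for col in vote_cols:
        # each precinct's share of its county's precinct-reported vote for this candidate
        county_totals = out.groupby(county_col)[col].transform("sum")
        share = (out[col] / county_totals).fillna(0)
        # countywide votes for this candidate, looked up by county (0 if the county has none)
        extra = out[county_col].map(cw[col]).fillna(0)
        out[col] = out[col] + share * extra
    return out
```

One limitation of this sketch: if a candidate has countywide votes in a county but zero precinct-reported votes there, the shares are all zero and those votes are dropped. A production version would need a fallback, such as allocating by total precinct turnout.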

The precinct shapefile available [here](https://www.nconemap.gov/datasets/voting-precincts/explore?location=35.097107%2C-79.888900%2C7.41) was last updated in March 2023, so some of its precinct names are missing or do not match the November 2022 election results. We reached out to the NCSBE and received the following response. Using the file they recommended, all but 5 precinct names matched between the shapefile and the election results:

*I’m not sure which file that site is displaying/making available for download. But I would suggest using this one: https://s3.amazonaws.com/dl.ncsbe.gov/PrecinctMaps/SBE_PRECINCTS_20220831.zip, as it is the data closest to the Nov 2022 election while also being before the election.*

*We provide shapefiles on our ftp site, which is linked to on our Voting Maps/Redistricting page: https://www.ncsbe.gov/results-data/voting-maps-redistricting*

# Load packages and data

In [1]:
import pandas as pd
import geopandas as gp
import os
from pdv_functions import *
pd.options.display.max_columns = 100
'''
Sources:
precinct shp: https://s3.amazonaws.com/dl.ncsbe.gov/PrecinctMaps/SBE_PRECINCTS_20220831.zip
precinct election results: https://www.ncsbe.gov/results-data/election-results/historical-election-results-data
'''


In [2]:
#gdf = gp.read_file("./raw-from-source/Voting_Precincts/Voting_Precincts.shp")
gdf = gp.read_file("./raw-from-source/SBE_PRECINCTS_20220831/SBE_PRECINCTS_20220831.shp")
df = pd.read_table("./raw-from-source/results_pct_20221108 (1).zip", sep = "\t")
sorted_prec = pd.read_csv("./raw-from-source/sorted_precincts/AllCounties.txt", sep="\t")

#Build upper-cased COUNTY+PRECINCT keys so the two sources can be compared case-insensitively
gdf_keys = gdf.county_nam.str.upper() + gdf.prec_id.str.upper()
df_keys = df.County.str.upper() + df.Precinct.str.upper()

print("# prec ids in gdf not in df: ", len(set(gdf_keys) - set(df_keys)))
print("# prec ids in df not in gdf: ", len(set(df_keys) - set(gdf_keys)))
print("shape df: ", df_keys.nunique(), "\nshape gdf: ", gdf_keys.nunique())


# prec ids in gdf not in df:  0
# prec ids in df not in gdf:  325
shape df:  2977 
shape gdf:  2652


# Process unsorted election results

From Ballotpedia:
- To include: US Senate, US House, State Senate, State House, State Supreme Court 
    - 'US SENATE'
    - 'US HOUSE OF REPRESENTATIVES DISTRICT XX'
    - 'NC STATE SENATE DISTRICT XX'
    - 'NC HOUSE OF REPRESENTATIVES DISTRICT XXX'
    - 'NC SUPREME COURT ASSOCIATE JUSTICE SEAT XX'
    
- Not sure: Intermediate Appellate Courts
    - 'NC COURT OF APPEALS JUDGE SEAT XX'
- Not to include: School boards, Municipal government, local ballot measures

## Grab info for column dictionaries

In [5]:
#Set party col: map NCSBE party labels to one-letter codes; anything else becomes "N"
party_dict = {'DEM':'D','LIB':'L','REP':'R','UNA':'U','GRE':'G', "na":"N"}
df["col_party"] = df.loc[df['Choice Party'].isin(party_dict.keys()), "Choice Party"].map(party_dict)
df.loc[df["col_party"].isna(), 'col_party'] = "N"


#Set last-name abbreviation (first three letters, upper-cased) - will need to edit
df["col_last_name"] = "na"
#NOTE: str.contains treats ". " as a regex (any character followed by a space), so this matches
#nearly every multi-word name; two-word names get NaN from .str[2] and are filled from the
#second word further below
df.loc[df["Choice"].str.contains(". "), "col_last_name"] = df["Choice"].str.split(pat=" ").str[2].str.slice(stop=3).str.upper()

#Correct unique instances (exact-string matches, so any variation in the Choice text is silently skipped)
#With a trailing "Jr." the third word is the suffix, so use the second word instead
jr_names = ['Ted Davis, Jr.', 'Gettys Cohen, Jr.', 'Paul Lowe, Jr.', 'Howard Penny, Jr.']
df.loc[df["Choice"].isin(jr_names), "col_last_name"] = df["Choice"].str.split(pat=" ").str[1].str.slice(stop=3).str.upper()
#A middle name/initial plus a nickname in parentheses puts the last name in the fourth word
nickname_names = ['Philip E. (Phil) Berger', 'Mary Price (Pricey) Harrison', 'Susan Lee (Susie) Scott', 'Milton F. (Toby) Fitch']
df.loc[df["Choice"].isin(nickname_names), "col_last_name"] = df["Choice"].str.split(pat=" ").str[3].str.slice(stop=3).str.upper()
df.loc[df["Choice"]=="Michael Greer O'Shea", "col_last_name"] = "OSH"

df.loc[df["col_last_name"].isna(), "col_last_name"] = df["Choice"].str.split(pat=" ").str[1].str.slice(stop=3).str.upper()
df.loc[df["Choice"] == "Write-In (Miscellaneous)", "col_last_name"] = "OWR"


#Set contest
general_office_dict = {"US SENATE":"USS", "US HOUSE": "CON", "STATE SENATE":"SU", "NC HOUSE OF REPRESENTATIVES": "SL", 
                       "NC SUPREME COURT": "SSC", "NC COURT OF APPEALS JUDGE":"IA"}
df["col_office"]='na'
df.loc[(df["Contest Name"].str.contains("US SENATE")), "col_office"] = "US SENATE"
df.loc[(df["Contest Name"].str.contains("US HOUSE")), "col_office"] = "US HOUSE"
df.loc[(df["Contest Name"].str.contains("STATE SENATE")), "col_office"] = "STATE SENATE"
df.loc[(df["Contest Name"].str.contains("NC HOUSE OF REPRESENTATIVES")), "col_office"] = "NC HOUSE OF REPRESENTATIVES"
df.loc[(df["Contest Name"].str.contains("NC SUPREME COURT")), "col_office"] = "NC SUPREME COURT"
df.loc[(df["Contest Name"].str.contains("NC COURT OF APPEALS JUDGE")), "col_office"] = "NC COURT OF APPEALS JUDGE"
df["office_abr"] = df["col_office"].map(general_office_dict)


#Set districts
#Get CONG DIST
df["col_cong_dist"] = "na"
df.loc[df["Contest Name"].str.contains("US HOUSE"), "col_cong_dist"] = df["Contest Name"].str.split(pat=" ").str[-1]
#GET state sen dist
df["col_su_dist"] = "na"
df.loc[df["Contest Name"].str.contains("STATE SENATE"), "col_su_dist"] = df["Contest Name"].str.split(pat=" ").str[-1]
#GET state house dist
df["col_sl_dist"] = "na"
df.loc[df["Contest Name"].str.contains("NC HOUSE OF REPRESENTATIVES"), "col_sl_dist"] = df["Contest Name"].str.split(pat=" ").str[-1]
#GET ssc seat
df["col_ssc_seat"] = "na"
df.loc[df["Contest Name"].str.contains("NC SUPREME COURT"), "col_ssc_seat"] = df["Contest Name"].str.split(pat=" ").str[-1]
#GET court of appeals dist
df["col_ia_seat"] = "na"
df.loc[df["Contest Name"].str.contains("NC COURT OF APPEALS JUDGE"), "col_ia_seat"] = df["Contest Name"].str.split(pat=" ").str[-1]


#Create column names
df["full_col_names"] = "na"
#cong
df.loc[df["Contest Name"].str.contains("US HOUSE"), "full_col_names"] = "G" + df["office_abr"] + df['col_cong_dist'] + df['col_party'] + df['col_last_name']
#us sen
df.loc[df["Contest Name"].str.contains("US SENATE"), "full_col_names"] = "G22" + df["office_abr"] + df['col_party'] + df['col_last_name']
#state sen
df.loc[df["Contest Name"].str.contains("STATE SENATE"), "full_col_names"] = "G" + df["office_abr"] + df['col_su_dist'].str.zfill(2) + df['col_party'] + df['col_last_name']
#state house
df.loc[df["Contest Name"].str.contains("NC HOUSE OF REPRESENTATIVES"), "full_col_names"] = "G" + df["office_abr"] + df['col_sl_dist'].str.zfill(3) + df['col_party'] + df['col_last_name']
#state ssc
df.loc[df["Contest Name"].str.contains("NC SUPREME COURT"), "full_col_names"] = "G22" + df["office_abr"] + df["col_ssc_seat"].str.zfill(2) + df['col_last_name']
#IA court
df.loc[df["Contest Name"].str.contains("NC COURT OF APPEALS JUDGE"), "full_col_names"] = "G22" + df["office_abr"] + df["col_ia_seat"].str.zfill(2) + df['col_last_name']


#filter
df = df[~df["office_abr"].isna()]
#Make dict
unsorted_df_column_name_dict = pd.Series((df["Choice"]+", "+df["Contest Name"]).values, index=df["full_col_names"]).to_dict()
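The exact-string exceptions above are fragile: the State Senate checks later in this notebook show `GSU04D(TO` instead of Fitch and `GSU10DJR.` instead of Cohen. A minimal sketch of a rule-based alternative, assuming nicknames always appear in parentheses and suffixes come from a small fixed list. `last_name_code` and `SUFFIXES` are hypothetical names, and names with particles or unusual spellings may still need manual exceptions.

```python
import re

# Generational suffixes to skip (assumed list; extend as needed)
SUFFIXES = {"JR", "SR", "II", "III", "IV"}


def last_name_code(choice: str, width: int = 3) -> str:
    """Return the first `width` letters of a candidate's last name, upper-cased."""
    if choice.startswith("Write-In"):
        return "OWR"
    # drop parenthesized nicknames such as "(Toby)"
    no_nicknames = re.sub(r"\([^)]*\)", " ", choice)
    # split on whitespace and commas, discarding empty pieces
    tokens = [t for t in re.split(r"[\s,]+", no_nicknames) if t]
    # drop generational suffixes such as "Jr." or "IV" (compared without periods)
    tokens = [t for t in tokens if t.replace(".", "").upper() not in SUFFIXES]
    # keep letters only, so "O'Shea" becomes "OShea"
    last = re.sub(r"[^A-Za-z]", "", tokens[-1])
    return last[:width].upper()
```

With this helper, something like `df["col_last_name"] = df["Choice"].map(last_name_code)` could replace the split-and-override steps.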

## Pivot

In [6]:
## PIVOT RESULTS
df_pivot = df.pivot_table(index = ['County','Precinct'],
                         columns = ['full_col_names'],
                        values = ['Total Votes'],
                         aggfunc = 'sum')


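#Clean up the indices: move County/Precinct out of the row index and copy them
#into second-level column names so they survive the flattening below
df_pivot.reset_index(inplace = True,drop=False)
df_pivot[('County', 'County')] = df_pivot[('County', '')]
df_pivot[('Precinct', 'Precinct')] = df_pivot[('Precinct', '')]


#Rename the columns: keep only the second level of each column tuple,
#e.g. ('Total Votes', 'G22USSRBUD') -> 'G22USSRBUD'; the original ('County', '')
#and ('Precinct', '') columns are left with empty names
df_pivot.columns = df_pivot.columns.map(pd.Series([col[1] for col in df_pivot.columns], index = [col for col in df_pivot.columns]).to_dict())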
df_pivot = df_pivot.fillna(0)


df_pivot["UNIQUE_ID"] = df_pivot["County"] + "---" + df_pivot["Precinct"]
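As a possibly simpler alternative to the reset-and-rename steps, passing `values` to `pivot_table` as a single string rather than a list makes pandas return single-level columns directly. A minimal sketch on made-up toy data; the precinct names and vote counts are hypothetical.

```python
import pandas as pd

# Toy long-format results (hypothetical values) with the same columns used above
long = pd.DataFrame({
    "County": ["WAKE", "WAKE", "WAKE", "LEE"],
    "Precinct": ["01-07", "01-07", "07-07", "X1"],
    "full_col_names": ["G22USSDBEA", "G22USSRBUD", "G22USSDBEA", "G22USSRBUD"],
    "Total Votes": [10, 20, 5, 7],
})

# values given as a string (not a list) -> no ('Total Votes', ...) column level
wide = (
    long.pivot_table(index=["County", "Precinct"], columns="full_col_names",
                     values="Total Votes", aggfunc="sum", fill_value=0)
    .reset_index()
)
wide.columns.name = None  # drop the leftover "full_col_names" label on the columns
wide["UNIQUE_ID"] = wide["County"] + "---" + wide["Precinct"]
```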

## Re-allocate absentee votes
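Reporting units whose precinct names contain absentee, provisional, one-stop (early voting), curbside, or similar keywords are summed by county below and then reallocated to precincts. (Write-in candidates were assigned the last-name code OWR above.)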

In [7]:
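#Keywords marking non-precinct reporting units: absentee, provisional, one-stop (OS) early voting, curbside, etc.
#NOTE: these are substring matches, so short keys such as 'PROV' also catch ordinary precinct names
#(e.g., Caswell's PROVI/Providence precinct, which shows up as left_only in the merge check further down)
searchfor = ['ABS', 'PROVISIONAL','PROVISIOINAL','PROVI ', 'PROV',
             'ONE STOP','ONESTOP','OS ','OS-',' OS','OSAP','OSCA',
             'OSCH','OSKD','OSLL','OSLOB','OSNR','OSOP','OSTA','OSWA',
             'OSDU','-OS','OSAV','OSBOE','OSGR','OSHS','OSJB','OSSE','OSWD',
             'OSCS','OSHT','MAOS','DBOS',
             'CURBSIDE','TRANS','LEE COUNTY BOE', 'MCSWAIN CENTER'
            ]
#Sum the matched units by county: these are the votes to be reallocated to precincts
in_sos = df_pivot[df_pivot["Precinct"].str.contains('|'.join(searchfor))]
in_sos = in_sos.groupby(by=["County"]).sum().reset_index()
in_sos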

Unnamed: 0,County,G22IA08FLO,G22IA08THO,G22IA09SAL,G22IA09STR,G22IA10ADA,G22IA10TYS,G22IA11JAC,G22IA11STA,G22SSC03DIE,G22SSC03INM,G22SSC05ALL,G22SSC05ERV,G22USSDBEA,G22USSGHOH,G22USSLBRA,G22USSNLEW,G22USSNOWR,G22USSRBUD,GCON01DDAV,GCON01RSMI,GCON02DROS,GCON02RVIL,GCON03DGAS,GCON03RMUR,GCON04DFOU,GCON04RGEE,GCON05DPAR,GCON05RFOX,GCON06DMAN,GCON06LWAT,GCON06RCAS,GCON07DGRA,GCON07RROU,GCON08DHUF,GCON08RBIS,GCON09DCLA,GCON09RHUD,GCON10DGEN,GCON10NJIM,GCON10NOWR,GCON10RMCH,GCON11DBEA,GCON11LCOA,GCON11REDW,GCON12DADA,GCON12RLEE,GCON13DNIC,GCON13RHIN,GCON14DJAC,...,GSU22LUBI,GSU22RCOL,GSU23DMEY,GSU23RWOO,GSU24DGIB,GSU24RBRI,GSU25DEWI,GSU25RGAL,GSU26NOWR,GSU26NROB,GSU26RBER,GSU27DGAR,GSU27RSES,GSU28DROB,GSU28RSCH,GSU29DCRU,GSU29RCRA,GSU30DJOH,GSU30RJAR,GSU31RKRA,GSU32DLOW,GSU32RWAR,GSU33DHOR,GSU33RFOR,GSU34DSAN,GSU34RNEW,GSU35RJOH,GSU36RSET,GSU37RSAW,GSU38DMOH,GSU39DSAL,GSU39RROB,GSU40DWAD,GSU40RSHI,GSU41DMAR,GSU41RLEO,GSU42DHUN,GSU42RRUS,GSU43ROVE,GSU44RALE,GSU45RPRO,GSU46DMAR,GSU46RDAN,GSU47RHIS,GSU48DCAR,GSU48RMOF,GSU49DMAY,GSU49RAND,GSU50DMCC,GSU50RCOR
0,ALLEGHANY,2199.0,967.0,889.0,2283.0,949.0,2206.0,936.0,2204.0,2221.0,998.0,2193.0,1017.0,963.0,20.0,48.0,0.0,2.0,2211.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,920.0,2328.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2413.0,0.0,0.0,0.0,0.0,0.0,0.0
1,AVERY,1533.0,759.0,731.0,1563.0,756.0,1531.0,750.0,1536.0,1531.0,767.0,1502.0,799.0,777.0,12.0,32.0,0.0,1.0,1500.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,755.0,1559.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1663.0,0.0,0.0,0.0,0.0,0.0,0.0
2,BEAUFORT,6158.0,3829.0,3656.0,6362.0,3793.0,6203.0,3779.0,6183.0,6289.0,3781.0,6186.0,3878.0,3897.0,55.0,134.0,2.0,1.0,6038.0,0.0,0.0,0.0,0.0,3768.0,6330.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,BERTIE,1267.0,2073.0,2101.0,1292.0,2106.0,1275.0,2099.0,1275.0,1316.0,2093.0,1282.0,2124.0,2137.0,15.0,35.0,0.0,1.0,1257.0,2216.0,1218.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BLADEN,3508.0,3218.0,3148.0,3577.0,3156.0,3547.0,3143.0,3493.0,3545.0,3228.0,3567.0,3184.0,3248.0,48.0,79.0,0.0,2.0,3476.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3207.0,3621.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,BUNCOMBE,22790.0,46789.0,44971.0,24560.0,46556.0,22969.0,46563.0,22899.0,23425.0,46454.0,22945.0,46944.0,46494.0,596.0,692.0,6.0,22.0,22347.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,46492.0,810.0,22753.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12271.0,7699.0,0.0,0.0,0.0,34541.0,15146.0,0.0,0.0
6,CABARRUS,18023.0,18115.0,17478.0,18712.0,18096.0,18044.0,17914.0,18162.0,18148.0,18137.0,18022.0,18247.0,18362.0,241.0,426.0,0.0,15.0,17471.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2538.0,4025.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15738.0,14018.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,17528.0,17632.0,734.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,CALDWELL,12803.0,4428.0,4282.0,12975.0,4413.0,12818.0,4425.0,12784.0,12860.0,4435.0,12571.0,4747.0,4474.0,103.0,204.0,0.0,11.0,12601.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3558.0,10681.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,764.0,0.0,7.0,2300.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10301.0,0.0,0.0,3339.0,0.0,0.0,0.0,0.0,0.0,0.0
8,CAMDEN,1542.0,654.0,609.0,1597.0,646.0,1557.0,627.0,1566.0,1574.0,631.0,1563.0,651.0,631.0,15.0,45.0,1.0,3.0,1549.0,0.0,0.0,0.0,0.0,632.0,1596.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,CASWELL,308.0,171.0,161.0,319.0,165.0,313.0,171.0,310.0,311.0,171.0,308.0,175.0,169.0,1.0,5.0,0.0,0.0,305.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,176.0,3.0,303.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,164.0,321.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
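The keyword filter can remove real precincts (Caswell's PROVI, for example). A minimal sketch of one possible guard, assuming every real precinct has a polygon in the precinct shapefile: a keyword match only counts as a countywide/vote-center unit if its County+Precinct key has no shape. `countywide_mask` is a hypothetical helper, not part of `pdv_functions`.

```python
import re

import pandas as pd


def countywide_mask(results: pd.DataFrame, shapes: pd.DataFrame, keywords: list) -> pd.Series:
    """Flag reporting units that match a keyword AND have no precinct shape of their own."""
    # escape keywords so characters like "-" are matched literally
    pattern = "|".join(re.escape(k) for k in keywords)
    keyword_hit = results["Precinct"].str.upper().str.contains(pattern, regex=True)
    # County|Precinct keys for every precinct that has a polygon
    shape_keys = set(shapes["county_nam"].str.upper() + "|" + shapes["prec_id"].str.upper())
    result_keys = results["County"].str.upper() + "|" + results["Precinct"].str.upper()
    return keyword_hit & ~result_keys.isin(shape_keys)
```

The same mask would need to drive both the `in_sos` selection and the precinct-level `election_results` filter so that no votes are counted twice or dropped.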


In [8]:
election_results = df_pivot[~df_pivot["Precinct"].str.contains('|'.join(searchfor))]
election_results = allocate_absentee(election_results,in_sos,df_pivot.columns[df_pivot.columns.str.startswith("G")],'County')
print('Done')

Done


# GDF

- reverse clip union
- gp.overlay with how='symmetric_difference': use the symmetric difference to make a file with holes to put the missing precincts into, rather than concatenating and ending up with two overlapping shapes; cut out the spots to fill in, then concatenate the symmetric-difference result with the missing precinct shapes
- check the result by dissolving on the unique precinct id
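A minimal sketch of this cut-and-fill idea on toy rectangles built with shapely's `box` (all coordinates hypothetical): the symmetric difference punches a hole in the 2022 shape, the missing VEST precinct is concatenated into the hole, and dissolving on the precinct id should give one row per precinct whose areas add up to the original. It assumes the missing precinct lies entirely inside one 2022 shape; when it spills into neighbors (as CV does in the outputs further down), those neighbors need the same treatment.

```python
import geopandas as gp
import pandas as pd
from shapely.geometry import box

# Hypothetical toy layers: one 2022 precinct (2 x 1 rectangle) and a 2020 VEST
# precinct that occupies its left half but is missing from the 2022 file
old = gp.GeoDataFrame({"prec_id": ["01-07"]}, geometry=[box(0, 0, 2, 1)])
missing = gp.GeoDataFrame({"PREC_ID": ["01-07A"]}, geometry=[box(0, 0, 1, 1)])

# 1. symmetric difference: the 2022 shape with a hole where the missing precinct goes
holed = gp.overlay(old, missing, how="symmetric_difference")
# keep only pieces that came from the 2022 layer (pieces from `missing` have no prec_id)
holed = holed[holed["prec_id"].notna()]

# 2. concatenate the holed layer with the missing precinct shapes
filled = gp.GeoDataFrame(
    pd.concat(
        [holed[["prec_id", "geometry"]],
         missing.rename(columns={"PREC_ID": "prec_id"})[["prec_id", "geometry"]]],
        ignore_index=True,
    ),
    geometry="geometry",
)

# 3. check: dissolving on the precinct id should give one row per precinct
check = filled.dissolve(by="prec_id")
areas = check.area
```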

In [9]:
assert not gdf['geometry'].isna().any(), "some precinct geometries are missing"
#gdf.plot()

# Check VEST file for shapes missing from 2022 shp where the name is in 2022 election results

In [10]:
#check VEST file
vest = gp.read_file("./raw-from-source/vest_nc_2020/nc_2020.shp")

In [11]:
vest[vest['PREC_ID'].isin(['CV', '01-07A', '07-07A'])]

Unnamed: 0,PREC_ID,ENR_DESC,COUNTY_NAM,COUNTY_ID,G20PRERTRU,G20PREDBID,G20PRELJOR,G20PREGHAW,G20PRECBLA,G20PREOWRI,G20USSRTIL,G20USSDCUN,G20USSLBRA,G20USSCHAY,G20GOVRFOR,G20GOVDCOO,G20GOVLDIF,G20GOVCPIS,G20LTGRROB,G20LTGDHOL,G20ATGRONE,G20ATGDSTE,G20TRERFOL,G20TREDCHA,G20SOSRSYK,G20SOSDMAR,G20AUDRSTR,G20AUDDWOO,G20AGRRTRO,G20AGRDWAD,G20INSRCAU,G20INSDGOO,G20LABRDOB,G20LABDHOL,G20SPIRTRU,G20SPIDMAN,G20SSCRNEW,G20SSCDBEA,G20SSCRBER,G20SSCDINM,G20SSCRBAR,G20SSCDDAV,G20SACRWOO,G20SACDSHI,G20SACRGOR,G20SACDCUB,G20SACRDIL,G20SACDSTY,G20SACRCAR,G20SACDYOU,G20SACRGRI,G20SACDBRO,geometry
2652,CV,CV_CAROLINA VILLAGE,HENDERSON,45,213,243,1,2,1,0,221,216,4,6,175,283,0,1,228,222,224,227,237,193,212,224,215,224,250,190,229,201,229,203,225,212,217,219,225,209,235,198,231,198,222,205,234,191,224,203,222,202,"POLYGON ((973321.651 597985.741, 973249.732 59..."
2657,01-07A,01-07A,WAKE,92,44,194,0,0,0,0,38,187,5,6,34,199,2,3,38,193,34,196,38,189,35,198,36,193,45,186,39,188,37,191,40,189,38,192,40,191,35,193,41,189,41,185,35,195,38,193,36,194,"MULTIPOLYGON (((2104589.500 741442.312, 210458..."
2660,07-07A,07-07A,WAKE,92,182,335,1,1,0,2,184,312,4,3,142,371,1,0,183,310,173,323,217,267,159,338,161,324,300,187,193,279,187,281,186,286,190,299,189,301,195,281,193,279,191,279,212,265,193,270,185,278,"POLYGON ((2096537.911 775663.634, 2096510.875 ..."


In [12]:
vest[vest['PREC_ID'].isin(['CV', '01-07A', '07-07A'])].to_file("missing_precincts.shp")

## Remove/Add shapes

- Remove 01-07 from the 2022 file and replace it with 01-07A and 01-07 from the VEST file; do the same for 07-07 and 07-07A.
- Combine the pieces with `gp.GeoDataFrame(pd.concat([precinct1, precinct2]), crs=precinct_crs)`, where `precinct1`, `precinct2`, and `precinct_crs` are placeholders.
- Use `.explode` on 'CV' to pull out that piece, then `pd.concat` it with the VEST shape.
- Remove all 2022 attribute data and use the 2020 data instead. Add a dummy field so the two rows share a value to dissolve on, keeping the name from the 2020 row; dissolving on it gives the shape.
- With that shape to put in, run the symmetric difference again.
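A minimal sketch of the explode, concat, and dissolve steps on hypothetical toy shapes. Here the piece belonging to CV is picked out with a spatial `touches` test; in the real data it was identified in QGIS. Stamping the piece with the 2020 `PREC_ID` serves as the shared dummy key, so the dissolve yields a single CV polygon.

```python
import geopandas as gp
import pandas as pd
from shapely.geometry import MultiPolygon, box

# Hypothetical toy shapes: a 2022 multipolygon whose second part really belongs to CV,
# and the 2020 VEST CV polygon sitting next to that part
shape_2022 = gp.GeoDataFrame(
    {"prec_id": ["HV-2"]},
    geometry=[MultiPolygon([box(0, 0, 1, 1), box(3, 0, 4, 1)])],
)
cv_2020 = gp.GeoDataFrame({"PREC_ID": ["CV"]}, geometry=[box(2, 0, 3, 1)])

# 1. explode the multipolygon into single-part rows
parts = shape_2022.explode(index_parts=False).reset_index(drop=True)
# 2. pull out the part that borders the VEST shape
cv_piece = parts[parts.touches(cv_2020.geometry.iloc[0])]
# 3. drop the 2022 attributes and stamp the piece with the 2020 id as the dissolve key
cv_piece = cv_piece[["geometry"]].assign(PREC_ID="CV")
# 4. concatenate with the VEST shape and dissolve on the shared id
combined = gp.GeoDataFrame(pd.concat([cv_2020, cv_piece], ignore_index=True), geometry="geometry")
cv_full = combined.dissolve(by="PREC_ID")
```

The remaining 2022 part (the one that does not touch CV) would keep its `HV-2` id.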

In [14]:
#In QGIS, check which GDF shapes contain the "missing_precincts" shapes: they are HV-2, 07-07, and 01-07
#NOTE: matching on prec_id alone also selects CABARRUS 01-07 (see symm_diff_subset below), not just WAKE 01-07
shps_contains_missing_prec = gdf[gdf['prec_id'].isin(['HV-2','07-07', '01-07'])]
#isolate precincts of interest from 2020 VEST file
missing_precincts = vest[['PREC_ID', 'ENR_DESC', 'COUNTY_NAM','geometry']][vest['PREC_ID'].isin(['CV', '01-07A', '07-07A'])]
#overlay 2020 VEST subset prec shapes with 2022 shapes that contain those - ENR_
symm_diff_subset = gp.overlay(shps_contains_missing_prec,missing_precincts,how='symmetric_difference')

gdf_w_cutouts = gdf[~gdf['prec_id'].isin(['HV-2','07-07', '01-07'])]
#TODO: do a pd.concat here instead of an overlay union
subset_reform = gp.overlay(missing_precincts, symm_diff_subset, how='union')
gdf_reform = gp.overlay(subset_reform, gdf_w_cutouts, how='union')

#os.mkdir('./reformed_shp')
#gdf_reform[(gdf_reform['prec_id_1'].isin(['HV-2','07-07', '01-07']))|(gdf_reform['PREC_ID_1'].isin(['01-07A', '07-07A', 'CV']))].to_file('reformed_shp/reformed_shp.shp')
#gdf_reform.to_file('reformed_shp/reformed_shp.shp')



In [18]:
gdf_reform

Unnamed: 0,PREC_ID,ENR_DESC,COUNTY_NAM,geometry
2652,CV,CV_CAROLINA VILLAGE,HENDERSON,"POLYGON ((973321.651 597985.741, 973249.732 59..."
2657,01-07A,01-07A,WAKE,"MULTIPOLYGON (((2104589.500 741442.312, 210458..."
2660,07-07A,07-07A,WAKE,"POLYGON ((2096537.911 775663.634, 2096510.875 ..."


In [19]:
symm_diff_subset

Unnamed: 0,id,prec_id,enr_desc,county_nam,of_prec_id,county_id,PREC_ID,ENR_DESC,COUNTY_NAM,geometry
0,107.0,01-07,01-07,WAKE,,92.0,,,,"POLYGON ((2106279.121 739420.501, 2106272.357 ..."
1,108.0,01-07,01-07,CABARRUS,,13.0,,,,"POLYGON ((1515403.666 565410.319, 1515411.388 ..."
2,510.0,07-07,07-07,WAKE,,92.0,,,,"POLYGON ((2091828.429 772912.417, 2091548.173 ..."
3,1968.0,HV-2,HENDERSONVILLE-2,HENDERSON,,45.0,,,,"MULTIPOLYGON (((971978.045 581028.943, 972010...."
4,,,,,,,CV,CV_CAROLINA VILLAGE,HENDERSON,"MULTIPOLYGON (((970943.419 597325.475, 970943...."


In [20]:
gdf_w_cutouts

Unnamed: 0,id,prec_id,enr_desc,county_nam,of_prec_id,county_id,geometry
0,1,0003,ALBEMARLE NUMBER 3,STANLY,,84,"POLYGON ((1644857.853 584760.831, 1644768.728 ..."
1,2,0003,DREXEL 03,BURKE,,12,"POLYGON ((1220715.101 726879.358, 1220723.026 ..."
2,22,0019,LINVILLE 01,BURKE,,12,"POLYGON ((1142032.036 735283.149, 1141893.094 ..."
3,23,0019,RIDENHOUR,STANLY,,84,"POLYGON ((1590551.101 597789.851, 1590587.761 ..."
4,71,007,007,MECKLENBURG,,60,"POLYGON ((1465943.013 524872.128, 1465762.759 ..."
...,...,...,...,...,...,...,...
2647,10150,JMV,JAMESVILLE,MARTIN,,58,"POLYGON ((2656570.989 769941.315, 2656623.057 ..."
2648,10151,GRF,GRIFFINS,MARTIN,,58,"POLYGON ((2572997.648 730116.690, 2572954.935 ..."
2649,1962,GSN,GOOSE NEST,MARTIN,,58,"POLYGON ((2489121.775 774862.002, 2489165.442 ..."
2650,2386,RBV,ROBERSONVILLE,MARTIN,,58,"POLYGON ((2492739.958 755657.848, 2492576.655 ..."


In [32]:
gdf_reform

Unnamed: 0,PREC_ID_1,ENR_DESC_1,COUNTY_NAM_1,id_1,prec_id_1,enr_desc_1,county_nam_1,of_prec_id_1,county_id_1,PREC_ID_2,ENR_DESC_2,COUNTY_NAM_2,id_2,prec_id_2,enr_desc_2,county_nam_2,of_prec_id_2,county_id_2,geometry
0,CV,CV_CAROLINA VILLAGE,HENDERSON,,,,,,,CV,CV_CAROLINA VILLAGE,HENDERSON,2134.0,NE,NORTHEAST,HENDERSON,,45.0,"MULTIPOLYGON (((970943.422 597325.475, 970955...."
1,CV,CV_CAROLINA VILLAGE,HENDERSON,,,,,,,CV,CV_CAROLINA VILLAGE,HENDERSON,,,,,,,"POLYGON ((970913.231 599176.888, 970913.231 59..."
2,CV,CV_CAROLINA VILLAGE,HENDERSON,,,,,,,,,,,,,,,,"POLYGON ((973281.472 597898.662, 973299.701 59..."
3,01-07A,01-07A,WAKE,,,,,,,,,,,,,,,,"MULTIPOLYGON (((2103902.867 739811.109, 210389..."
4,07-07A,07-07A,WAKE,,,,,,,,,,,,,,,,"POLYGON ((2096510.875 775663.062, 2096507.624 ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2652,,,,,,,,,,,,,10150.0,JMV,JAMESVILLE,MARTIN,,58.0,"POLYGON ((2656570.989 769941.315, 2656623.057 ..."
2653,,,,,,,,,,,,,10151.0,GRF,GRIFFINS,MARTIN,,58.0,"POLYGON ((2572997.648 730116.690, 2572954.935 ..."
2654,,,,,,,,,,,,,1962.0,GSN,GOOSE NEST,MARTIN,,58.0,"POLYGON ((2489121.775 774862.002, 2489165.442 ..."
2655,,,,,,,,,,,,,2386.0,RBV,ROBERSONVILLE,MARTIN,,58.0,"POLYGON ((2492739.958 755657.848, 2492576.655 ..."


In [59]:
gdf_reform[['ENR_DESC_1','ENR_DESC_2']][(gdf_reform['ENR_DESC_1']!=gdf_reform['ENR_DESC_2'])&(~gdf_reform['ENR_DESC_2'].isna())]#|(~gdf_reform['ENR_DESC_2'].isna()))]
#CUT COL ENR_DESC_1

Unnamed: 0,ENR_DESC_1,ENR_DESC_2


In [51]:
gdf_reform[['PREC_ID_2','ENR_DESC_1','ENR_DESC_2','enr_desc_1','enr_desc_2']][(gdf_reform['enr_desc_2']!=gdf_reform['enr_desc_1'])&((~gdf_reform['enr_desc_1'].isna())&(~gdf_reform['enr_desc_2'].isna()))]

Unnamed: 0,PREC_ID_2,ENR_DESC_1,ENR_DESC_2,enr_desc_1,enr_desc_2


In [48]:
gdf_reform[['PREC_ID_1','PREC_ID_2','ENR_DESC_1','ENR_DESC_2','enr_desc_1','enr_desc_2']][(gdf_reform['PREC_ID_1']!=gdf_reform['PREC_ID_2'])&((~gdf_reform['PREC_ID_1'].isna())|(~gdf_reform['PREC_ID_2'].isna()))]

Unnamed: 0,PREC_ID_1,PREC_ID_2,ENR_DESC_1,ENR_DESC_2,enr_desc_1,enr_desc_2
2,CV,,CV_CAROLINA VILLAGE,,,
3,01-07A,,01-07A,,,
4,07-07A,,07-07A,,,



## Merge precinct boundaries with precinct election results from unsorted data, votes allocated by RDH

In [15]:
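#NOTE: after the two overlay unions, gdf_reform only has suffixed columns (county_nam_1, county_nam_2, ...), hence the KeyError
set(df_pivot['County']) - set(gdf_reform['county_nam'])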

KeyError: 'county_nam'
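One way to get usable `county_nam`/`prec_id` columns back is to coalesce the suffixed columns row by row. A minimal sketch, assuming the 2020 VEST ids (upper-case `PREC_ID_*`) should win for the re-inserted precincts and the 2022 ids are the fallback; `coalesce` is a hypothetical helper, and the toy rows copy a few values from the `gdf_reform` output above. Coalescing ids does not remove overlapping pieces (the CV row that also carries NE), which still need to be dissolved.

```python
import pandas as pd


def coalesce(frame: pd.DataFrame, cols: list) -> pd.Series:
    """Row-wise first non-null value across `cols`, tried in the given order."""
    out = frame[cols[0]]
    for col in cols[1:]:
        out = out.fillna(frame[col])
    return out


# Toy rows mimicking gdf_reform's suffixed columns
reform = pd.DataFrame({
    "PREC_ID_1": ["CV", None],
    "prec_id_1": [None, None],
    "PREC_ID_2": ["CV", None],
    "prec_id_2": ["NE", "JMV"],
    "COUNTY_NAM_1": ["HENDERSON", None],
    "county_nam_1": [None, None],
    "COUNTY_NAM_2": ["HENDERSON", None],
    "county_nam_2": ["HENDERSON", "MARTIN"],
})

# Prefer the 2020 VEST ids, then fall back to the 2022 ids
reform["prec_id"] = coalesce(reform, ["PREC_ID_1", "prec_id_1", "PREC_ID_2", "prec_id_2"])
reform["county_nam"] = coalesce(reform, ["COUNTY_NAM_1", "county_nam_1", "COUNTY_NAM_2", "county_nam_2"])
```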

In [19]:
merge_attempt2 = pd.merge(gdf_reform, election_results.fillna(value=0), left_on=['county_nam','prec_id'], right_on=["County","Precinct"], how='outer', indicator=True)
export_attempt2 = merge_attempt2[merge_attempt2['_merge']!='both']
export_attempt2[['county_nam', 'prec_id', 'enr_desc', 'of_prec_id', 'County', 'Precinct', '_merge']].to_csv('./merge_attempt2.csv')


'''
	county_nam	prec_id	enr_desc	of_prec_id	County	Precinct	_merge
2248	CASWELL	PROVI	PROVIDENCE				left_only
2652					HENDERSON	CV	right_only -- seems to be relatively normal precinct: https://www.hendersoncountync.gov/elections/page/carolina-village-cv
2653					LEE	MCSWAIN CENTER	right_only -- confirmed this is a OS location: https://leecountync.gov/departments/elections/polling_sites.php
2654					WAKE	01-07A	right_only
2655					WAKE	07-07A	right_only


'''


# Checks

- Check vote totals against the MEDSL election results file (possibly redundant if both come from the same source); check at the county and statewide levels
- Check vote totals for the precincts that did not match
- Look at the VEST 2020 shapefile in QGIS and check out the 4 mismatched precincts
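Because reallocation should only move votes between reporting units within a county, a county-level check can compare per-candidate county totals before and after allocation. A minimal sketch, assuming both frames keep a `County` column; `county_totals_mismatch` is a hypothetical helper, and the tolerance allows for fractional allocated votes.

```python
import pandas as pd


def county_totals_mismatch(before: pd.DataFrame, after: pd.DataFrame, vote_cols: list,
                           county_col: str = "County", tol: float = 0.5) -> pd.DataFrame:
    """Return the counties whose per-candidate totals changed by more than `tol` votes."""
    b = before.groupby(county_col)[vote_cols].sum()
    # counties missing after allocation count as zero votes
    a = after.groupby(county_col)[vote_cols].sum().reindex(b.index, fill_value=0)
    diff = (b - a).abs()
    return diff[(diff > tol).any(axis=1)]
```

Applied here, the call would look something like `county_totals_mismatch(df_pivot, election_results, vote_cols)` with `vote_cols` being the `G...` columns; an empty result means no county gained or lost votes.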

In [20]:
#Do the unmatched precincts contain votes?
merge_attempt2[merge_attempt2.columns[merge_attempt2.columns.str.startswith('G')]][merge_attempt2['Precinct'].isin(['CV', 'MCSWAIN CENTER', '01-07A', '07-07A'])]
#Yes. The number of votes in Lee County's MCSWAIN CENTER suggests it is a voting center, so its votes should be distributed to precincts

Unnamed: 0,G22IA08FLO,G22IA08THO,G22IA09SAL,G22IA09STR,G22IA10ADA,G22IA10TYS,G22IA11JAC,G22IA11STA,G22SSC03DIE,G22SSC03INM,...,GSU45RPRO,GSU46DMAR,GSU46RDAN,GSU47RHIS,GSU48DCAR,GSU48RMOF,GSU49DMAY,GSU49RAND,GSU50DMCC,GSU50RCOR
2652,182.0,202.0,180.0,207.0,196.0,192.0,196.0,188.0,206.0,185.0,...,0.0,0.0,0.0,0.0,191.0,199.0,0.0,0.0,0.0,0.0
2653,5570.0,3942.0,4187.0,5410.0,3935.0,5575.0,3897.0,5598.0,5461.0,4115.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2654,42.0,166.0,160.0,40.0,153.0,38.0,153.0,40.0,42.0,159.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2655,248.0,644.0,572.0,304.0,621.0,258.0,609.0,265.0,231.0,698.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### State total checks - visually compare with NYT reported totals

In [25]:
#Check US Senate totals against NYT: https://www.nytimes.com/interactive/2022/11/08/us/elections/results-north-carolina-us-senate.html?action=click&pgtype=Article&state=default&module=election-results&context=election_recirc&region=RaceLink
merge_attempt2[merge_attempt2.columns[merge_attempt2.columns.str.startswith("G22USS")]].sum()

G22USSDBEA    1784049.0
G22USSGHOH      29934.0
G22USSLBRA      51640.0
G22USSNLEW        137.0
G22USSNOWR       2378.0
G22USSRBUD    1905786.0
dtype: float64
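Rather than comparing totals by eye, a small table of differences can make mismatches easier to spot. A minimal sketch, assuming the reference totals are typed in by hand from NYT, Ballotpedia, or NCSBE; `compare_to_reference` is a hypothetical helper. The example reuses the State Senate District 4 totals recorded in this notebook's Ballotpedia check, with `GSU04DFIT` standing in for the intended Fitch column.

```python
import pandas as pd


def compare_to_reference(computed: pd.Series, reference: dict) -> pd.DataFrame:
    """Line up computed column sums with hand-entered reference totals."""
    ref = pd.Series(reference, dtype=float)
    table = pd.DataFrame({"computed": computed.reindex(ref.index), "reference": ref})
    table["diff"] = table["computed"] - table["reference"]
    table["pct_diff"] = 100 * table["diff"] / table["reference"]
    return table


# State Senate District 4: computed sums vs. the Ballotpedia totals noted in this notebook
computed = pd.Series({"GSU04DFIT": 28543.0, "GSU04RNEW": 38638.0})
ballotpedia = {"GSU04DFIT": 25840, "GSU04RNEW": 34744}
table = compare_to_reference(computed, ballotpedia)
```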

In [189]:
#Check US House totals against NYT: https://www.nytimes.com/interactive/2022/11/08/us/elections/results-north-carolina-us-house-district-10.html (change 10 to district number)
merge_attempt2[merge_attempt2.columns[merge_attempt2.columns.str.startswith("GCON")]].sum()

GCON01DDAV    134996.0
GCON01RSMI    122780.0
GCON02DROS    190714.0
GCON02RVIL    104155.0
GCON03DGAS     82378.0
GCON03RMUR    166520.0
GCON04DFOU    194983.0
GCON04RGEE     96442.0
GCON05DPAR    102269.0
GCON05RFOX    175279.0
GCON06DMAN    139553.0
GCON06LWAT      2810.0
GCON06RCAS    116635.0
GCON07DGRA    120222.0
GCON07RROU    164047.0
GCON08DHUF     79192.0
GCON08RBIS    183998.0
GCON09DCLA    101202.0
GCON09RHUD    131453.0
GCON10DGEN     73174.0
GCON10NJIM       110.0
GCON10NOWR       242.0
GCON10RMCH    194681.0
GCON11DBEA    144165.0
GCON11LCOA      5515.0
GCON11REDW    174232.0
GCON12DADA    140494.0
GCON12RLEE     83414.0
GCON13DNIC    143090.0
GCON13RHIN    134256.0
GCON14DJAC    148738.0
GCON14RHAR    109014.0
dtype: float64

In [43]:
#Check State Senate against Ballotpedia: https://ballotpedia.org/North_Carolina_State_Senate_District_1
merge_attempt2[merge_attempt2.columns[merge_attempt2.columns.str.startswith("GSU01")]].sum()
#--- Does not match! -- 61034
# DOES MATCH WIKI

GSU01RSAN    61486.0
dtype: float64

In [44]:
#Check State Senate against Ballotpedia: https://ballotpedia.org/North_Carolina_State_Senate_District_2
merge_attempt2[merge_attempt2.columns[merge_attempt2.columns.str.startswith("GSU02")]].sum()
#--- Does not match! -- 52889
# DOES MATCH WIKI

GSU02RPER    53067.0
dtype: float64

In [42]:
#Check State Senate against Ballotpedia: https://ballotpedia.org/North_Carolina_State_Senate_District_3 
merge_attempt2[merge_attempt2.columns[merge_attempt2.columns.str.startswith("GSU03")]].sum()
#--- Does not match! -- 34146, 37822
# DOES MATCH WIKI

GSU03DJOR    34320.0
GSU03RHAN    37984.0
dtype: float64

In [47]:
#Check State Senate against Ballotpedia: https://ballotpedia.org/North_Carolina_State_Senate_District_4
merge_attempt2[merge_attempt2.columns[merge_attempt2.columns.str.startswith("GSU04")]].sum()
#--- Does not match Ballotpedia! -- 25840, 34744
# DOES MATCH WIKI
#+ NOTE NAME MESSED UP -- should be FIT for Fitch

GSU04D(TO    28543.0
GSU04RNEW    38638.0
dtype: float64

In [None]:
#^See that percent share of vote is essentially the same even though vote totals do not match.
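The vote-share comparison can be made explicit. A minimal sketch using the State Senate District 3 figures recorded in this notebook, assuming the Ballotpedia numbers in that check's comment are listed in the same order as the columns (Democrat, then Republican); `vote_shares` is a hypothetical helper.

```python
import pandas as pd


def vote_shares(totals: pd.Series) -> pd.Series:
    """Each candidate's fraction of the total vote in a single contest."""
    return totals / totals.sum()


# State Senate District 3: our sums vs. the Ballotpedia totals noted in the check
ours = pd.Series({"GSU03DJOR": 34320.0, "GSU03RHAN": 37984.0})
ballotpedia = pd.Series({"GSU03DJOR": 34146.0, "GSU03RHAN": 37822.0})

vote_gap = (ours - ballotpedia).abs()                             # 174 and 162 votes
share_gap = (vote_shares(ours) - vote_shares(ballotpedia)).abs()  # about 0.02 percentage points
```

The raw totals differ by well over a hundred votes, while each candidate's share moves by only about 0.0002, which is consistent with the observation that the percent share is essentially unchanged.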

In [51]:
#Check against Wikipedia and results match: https://en.wikipedia.org/wiki/2022_North_Carolina_Senate_election
merge_attempt2[merge_attempt2.columns[merge_attempt2.columns.str.startswith("GSU0")]].sum()
# NOTE: candidate abbreviation is wrong in district 4; "(TO" should be "FIT" for Fitch
# NOTE: results do not match Ballotpedia for any of these districts: https://ballotpedia.org/North_Carolina_State_Senate_District_4
# Ballotpedia cites NCSBE as its source, and the NCSBE results do match ours: https://er.ncsbe.gov/?election_dt=11/08/2022&county_id=0&office=NCS&contest=0

GSU01RSAN    61486.0
GSU02RPER    53067.0
GSU03DJOR    34320.0
GSU03RHAN    37984.0
GSU04D(TO    28543.0
GSU04RNEW    38638.0
GSU05DSMI    36557.0
GSU05RKOZ    33432.0
GSU06RLAZ    33339.0
GSU07DMOR    43198.0
GSU07RLEE    44908.0
GSU08RRAB    67693.0
GSU09RJAC    50252.0
dtype: float64
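The district-by-district checks above all follow the same pattern: sum the columns with a given prefix and compare them with a published total. A minimal sketch of a reusable helper, assuming the reference totals are typed in by hand from the NCSBE contest page; `compare_to_reference` and the toy frame are hypothetical.

```python
import pandas as pd

def compare_to_reference(results: pd.DataFrame, prefix: str, reference: dict) -> pd.DataFrame:
    """Sum every column whose name starts with `prefix` and compare it to reference totals.

    `reference` maps column names (e.g. "GSU03DJOR") to the officially reported total.
    Returns one row per column with our total, the reference total, and the difference.
    """
    ours = results.loc[:, results.columns.str.startswith(prefix)].sum()
    ref = pd.Series(reference, dtype="float64")
    report = pd.DataFrame({"ours": ours, "reference": ref})  # aligned on column name
    report["diff"] = report["ours"] - report["reference"]
    return report

# Toy precinct-level frame (two precincts); values are made up so the sums match
toy = pd.DataFrame({
    "GSU03DJOR": [20000.0, 14320.0],
    "GSU03RHAN": [17984.0, 20000.0],
    "GSU04RNEW": [1.0, 2.0],
})
report = compare_to_reference(toy, "GSU03", {"GSU03DJOR": 34320, "GSU03RHAN": 37984})
print(report)
```

A non-zero `diff` (or a `NaN` from a name that exists on only one side, such as a mis-abbreviated candidate) flags the column to investigate.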

In [52]:
#Check against NCSBE results sum: https://er.ncsbe.gov/?election_dt=11/08/2022&county_id=0&office=NCS&contest=0
merge_attempt2[merge_attempt2.columns[merge_attempt2.columns.str.startswith("GSU1")]].sum()
#NOTE NAME MESSED UP -- should be COH for Cohen, not JR.

GSU10DJR.    27165.0
GSU10RSAW    48083.0
GSU11DSPE    34333.0
GSU11RBAR    41701.0
GSU12DCHA    20914.0
GSU12RBUR    36304.0
GSU13DGRA    50937.0
GSU13LMUN     2769.0
GSU13RBAN    28001.0
GSU14DBLU    45020.0
GSU14LLAS     1875.0
GSU14RBAK    18378.0
GSU15DCHA    52472.0
GSU15LBRO     2463.0
GSU15RPRI    22776.0
GSU16DADC    49204.0
GSU16GTRU     1348.0
GSU16LWAT     1771.0
GSU16RPOW    23161.0
GSU17DBAT    45279.0
GSU17LBOW     1922.0
GSU17RCAV    40167.0
GSU18DBOD    42783.0
GSU18LBRO     2219.0
GSU18RSYK    38296.0
GSU19DAPP    30755.0
GSU19RMER    27601.0
dtype: float64

### Compare with how the ERJ NC 2020 results were checked: https://github.com/nonpartisan-redistricting-datahub/erj-nc
Spencer distinguishes the unsorted and precinct-sorted results and notes that the precinct-sorted results should be used for the counties that need sorting:

North Carolina produces two sets of election results data. The precinct results are the unaltered results as initially reported by the counties. Many counties report early votes by vote center while provisional and other nonstandard ballots may be reported countywide. The precinct-sorted results are then produced within 30 days after the election. In the precinct-sorted data nearly all votes are assigned to precincts regardless of the manner by which the ballots were cast. However, North Carolina law requires the addition of statistical "noise" to the precinct-sorted data wherever any given vote by any specific voter may otherwise be deduced via cross referencing the various election-related data sets produced by the SBE.

For the 2020 general election 51 counties reported all votes by precinct in their initial precinct results. The precinct-sorted data set was used instead for the counties listed below.

Alleghany, Avery, Beaufort, Bertie*, Bladen, Buncombe, Cabarrus, Caldwell, Camden, Currituck, Dare, Davidson*, Davie, Duplin*, Durham*, Edgecombe, Guilford, Halifax*, Harnett, Haywood, Henderson, Hertford, Hyde, Johnston, Jones, Lee, Lincoln, Macon, Martin, Mecklenburg*, Moore, Nash, New Hanover*, Northampton*, Orange, Pasquotank, Pitt*, Polk, Richmond, Scotland, Stokes*, Surry*, Tyrrell*, Wake, Washington, Watauga, Wayne, Wilkes*, Yadkin

In counties marked by asterisk some votes were still reported by vote center or countywide in the precinct-sorted data. These were distributed by candidate to precincts based on the precinct-level reported vote. The precinct-sorted results were further adjusted to match the certified countywide totals based on the precinct-level vote by candidate.
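The distribution step described above (assigning a candidate's vote-center or countywide votes to precincts in proportion to that candidate's precinct-level vote) can be sketched as follows. This is an illustrative largest-remainder allocation under our own assumptions: the documentation does not say how fractional votes were rounded, and `allocate_countywide` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def allocate_countywide(precinct_votes: pd.Series, countywide_votes: int) -> pd.Series:
    """Distribute one candidate's countywide (non-precinct) votes across precincts
    in proportion to that candidate's precinct-level votes, keeping integer counts.

    Largest-remainder rounding makes the allocated votes sum exactly to
    `countywide_votes`.
    """
    total = precinct_votes.sum()
    if total == 0:
        # No precinct-level vote to weight by; leave the votes unallocated
        return pd.Series(0, index=precinct_votes.index, dtype="int64")
    exact = precinct_votes / total * countywide_votes
    floored = np.floor(exact).astype("int64")
    leftover = int(countywide_votes - floored.sum())
    # Hand the remaining votes to the precincts with the largest fractional parts
    order = (exact - floored).sort_values(ascending=False, kind="stable").index
    floored.loc[order[:leftover]] += 1
    return floored

precinct_votes = pd.Series([3, 3, 1], index=["P1", "P2", "P3"])  # toy precinct totals
allocated = allocate_countywide(precinct_votes, 10)
print(allocated)
```

The allocated votes would then be added to each precinct's reported total for that candidate; running it per candidate keeps each candidate's county total intact.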

G20PRERTRU - Donald J. Trump (Republican Party)
G20PREDBID - Joseph R. Biden (Democratic Party)

In [54]:
## PIVOT RESULTS
county_pivot = df.pivot_table(index = ['County'],
                         columns = ['full_col_names'],
                        values = ['Total Votes'],
                         aggfunc = 'sum')


#Clean up the indices
county_pivot.reset_index(inplace = True,drop=False)
county_pivot[('County', 'County')] = county_pivot[('County', '')]
#df_pivot[('Precinct', 'Precinct')] = df_pivot[('Precinct', '')]


#Flatten the MultiIndex columns: keep only the second level (the contest/candidate name,
#or 'County' for the copy made above)
county_pivot.columns = [col[1] for col in county_pivot.columns]
county_pivot = county_pivot.fillna(0)


#county_pivot["UNIQUE_ID"] = county_pivot["County"] + "---" + county_pivot["Precinct"]
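The column clean-up above (copying `('County', '')` to `('County', 'County')`, then keeping each column's second level) is needed because `values` is passed as a list, which makes `pivot_table` return MultiIndex columns. A sketch of a simpler alternative on a made-up frame that reuses `df`'s column names: passing `values` as a single string produces flat columns directly. The toy vote counts are invented.

```python
import pandas as pd

# Toy long-format results with the same column names as `df` (values are made up)
toy = pd.DataFrame({
    "County": ["ALAMANCE", "ALAMANCE", "ALEXANDER", "ALEXANDER"],
    "full_col_names": ["G22USSRBUD", "G22USSDBEA", "G22USSRBUD", "G22USSDBEA"],
    "Total Votes": [100, 80, 50, 20],
})

# A scalar `values` gives single-level columns (one per contest/candidate)
wide = toy.pivot_table(index="County", columns="full_col_names",
                       values="Total Votes", aggfunc="sum", fill_value=0)
wide.columns.name = None   # drop the leftover "full_col_names" label
wide = wide.reset_index()  # move County back into a regular column
print(wide)
```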

In [62]:
pd.options.display.max_rows = 100
county_pivot[['G22USSRBUD','G22USSDBEA','G22USSLBRA','G22USSGHOH','G22USSNOWR','G22USSNLEW']]

|   | G22USSRBUD | G22USSDBEA | G22USSLBRA | G22USSGHOH | G22USSNOWR | G22USSNLEW |
|---|---|---|---|---|---|---|
| 0 | 32866.0 | 25866.0 | 837.0 | 412.0 | 32.0 | 0.0 |
| 1 | 11833.0 | 3031.0 | 243.0 | 140.0 | 18.0 | 0.0 |
| 2 | 3648.0 | 1249.0 | 85.0 | 37.0 | 2.0 | 0.0 |
| 3 | 3711.0 | 3324.0 | 87.0 | 35.0 | 6.0 | 0.0 |
| 4 | 8371.0 | 3172.0 | 187.0 | 80.0 | 7.0 | 0.0 |
| 5 | 5089.0 | 1595.0 | 92.0 | 54.0 | 4.0 | 0.0 |
| 6 | 12338.0 | 6245.0 | 283.0 | 103.0 | 5.0 | 2.0 |
| 7 | 2781.0 | 3492.0 | 59.0 | 27.0 | 1.0 | 0.0 |
| 8 | 6745.0 | 4799.0 | 149.0 | 83.0 | 7.0 | 0.0 |
| 9 | 44911.0 | 26685.0 | 920.0 | 522.0 | 54.0 | 1.0 |


# Process state-sorted election results

In [244]:
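#Drop over-vote and under-vote rows ("ER VOTES" matches both the OVER and UNDER vote labels)
#.copy() makes sorted_prec an independent frame, so the column assignments below do not raise SettingWithCopyWarning
sorted_prec = sorted_prec[~sorted_prec['result_type_desc'].str.contains("ER VOTES")].copy()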
sorted_prec['result_type_desc'].value_counts()

<NORMAL>    306787
WRITE-IN     22274
Name: result_type_desc, dtype: int64
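`str.contains("ER VOTES")` is a compact way to catch both over-vote and under-vote rows. A more explicit filter names the excluded categories, so a new label cannot be dropped silently. Sketched on a toy frame; the exact labels `OVER VOTES` and `UNDER VOTES` are an assumption about the NCSBE file, so confirm them with `value_counts()` before filtering.

```python
import pandas as pd

toy = pd.DataFrame({"result_type_desc": ["<NORMAL>", "WRITE-IN", "OVER VOTES", "UNDER VOTES", "<NORMAL>"]})

# Assumed label set; check sorted_prec['result_type_desc'].value_counts() first
excluded = {"OVER VOTES", "UNDER VOTES"}
kept = toy[~toy["result_type_desc"].isin(excluded)].copy()
print(kept["result_type_desc"].value_counts())
```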

## Grab info for column dictionaries

In [28]:
#Set party code column; any party label not in party_dict (or missing) becomes "N"
party_dict = {'DEM':'D','LIB':'L','REP':'R','UNA':'U','GRE':'G', "na":"N"}
sorted_prec["col_party"] = sorted_prec.loc[sorted_prec['candidate_party_lbl'].isin(party_dict.keys()), 'candidate_party_lbl'].map(party_dict)
sorted_prec.loc[sorted_prec["col_party"].isna(), 'col_party'] = "N"


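#Set 3-letter last-name abbreviation (special cases are corrected below)
sorted_prec["col_last_name"] = "na"
#General case: take the third word (e.g. "First M. Last" -> "LAS").
#Note: str.contains treats ". " as a regex, where "." matches any character, so this selects every name
#containing a space; two-word names get NaN here and are filled by the fallback further down.
sorted_prec.loc[sorted_prec["candidate_name"].str.contains(". "), "col_last_name"] = sorted_prec["candidate_name"].str.split(pat=" ").str[2].str.slice(stop=3).str.upper()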
#Correcting for unique instances
sorted_prec.loc[(sorted_prec["candidate_name"]=='Ted Davis, Jr.')|(sorted_prec["candidate_name"]=='Gettys Cohen, Jr.')|(sorted_prec["candidate_name"]=='Paul Lowe, Jr.')|
                (sorted_prec["candidate_name"]=='Howard Penny, Jr.'), "col_last_name"] = sorted_prec["candidate_name"].str.split(pat=" ").str[1].str.slice(stop=3).str.upper()
#Names with a parenthetical nickname: take the fourth word (assigned as a Series so values align row by row)
sorted_prec.loc[(sorted_prec["candidate_name"]=='Philip E. (Phil) Berger')|(sorted_prec["candidate_name"]=='Mary Price (Pricey) Harrison')|(sorted_prec["candidate_name"]=='Susan Lee (Susie) Scott')|
                (sorted_prec["candidate_name"]=='Milton F. (Toby) Fitch'), "col_last_name"] = sorted_prec["candidate_name"].str.split(pat=" ").str[3].str.slice(stop=3).str.upper()
sorted_prec.loc[sorted_prec["candidate_name"]=="Michael Greer O'Shea", "col_last_name"] = "OSH"
sorted_prec.loc[sorted_prec["col_last_name"].isna(), "col_last_name"] = sorted_prec["candidate_name"].str.split(pat=" ").str[1].str.slice(stop=3).str.upper()
sorted_prec.loc[sorted_prec["candidate_name"] == "Write-In (Miscellaneous)", "col_last_name"] = "OWR"


#Set contest
general_office_dict = {"US SENATE":"USS", "US HOUSE": "CON", "STATE SENATE":"SU", "NC HOUSE OF REPRESENTATIVES": "SL", 
                       "NC SUPREME COURT": "SSC", "NC COURT OF APPEALS JUDGE":"IA"}
sorted_prec["col_office"]='na'
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("US SENATE")), "col_office"] = "US SENATE"
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("US HOUSE")), "col_office"] = "US HOUSE"
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("STATE SENATE")), "col_office"] = "STATE SENATE"
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("NC HOUSE OF REPRESENTATIVES")), "col_office"] = "NC HOUSE OF REPRESENTATIVES"
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("NC SUPREME COURT")), "col_office"] = "NC SUPREME COURT"
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("NC COURT OF APPEALS JUDGE")), "col_office"] = "NC COURT OF APPEALS JUDGE"
sorted_prec["office_abr"] = sorted_prec["col_office"].map(general_office_dict)


#Set districts
#Get CONG DIST
sorted_prec["col_cong_dist"] = "na"
sorted_prec.loc[sorted_prec["contest_title"].str.contains("US HOUSE"), "col_cong_dist"] = sorted_prec["contest_title"].str.split(pat=" ").str[-1]
#GET state sen dist
sorted_prec["col_su_dist"] = "na"
sorted_prec.loc[sorted_prec["contest_title"].str.contains("STATE SENATE"), "col_su_dist"] = sorted_prec["contest_title"].str.split(pat=" ").str[-1]
#GET state house dist
sorted_prec["col_sl_dist"] = "na"
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC HOUSE OF REPRESENTATIVES"), "col_sl_dist"] = sorted_prec["contest_title"].str.split(pat=" ").str[-1]
#GET ssc seat
sorted_prec["col_ssc_seat"] = "na"
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC SUPREME COURT"), "col_ssc_seat"] = sorted_prec["contest_title"].str.split(pat=" ").str[-1]
#GET court of appeals dist
sorted_prec["col_ia_seat"] = "na"
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC COURT OF APPEALS JUDGE"), "col_ia_seat"] = sorted_prec["contest_title"].str.split(pat=" ").str[-1]


#Create column names
sorted_prec["full_col_names"] = "na"
#cong
sorted_prec.loc[sorted_prec["contest_title"].str.contains("US HOUSE"), "full_col_names"] = "G" + sorted_prec["office_abr"] + sorted_prec['col_cong_dist'] + sorted_prec['col_party'] + sorted_prec['col_last_name']
#us sen
sorted_prec.loc[sorted_prec["contest_title"].str.contains("US SENATE"), "full_col_names"] = "G22" + sorted_prec["office_abr"] + sorted_prec['col_party'] + sorted_prec['col_last_name']
#state sen
sorted_prec.loc[sorted_prec["contest_title"].str.contains("STATE SENATE"), "full_col_names"] = "G" + sorted_prec["office_abr"] + sorted_prec['col_su_dist'].str.zfill(2) + sorted_prec['col_party'] + sorted_prec['col_last_name']
#state house
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC HOUSE OF REPRESENTATIVES"), "full_col_names"] = "G" + sorted_prec["office_abr"] + sorted_prec['col_sl_dist'].str.zfill(3) + sorted_prec['col_party'] + sorted_prec['col_last_name']
#state ssc
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC SUPREME COURT"), "full_col_names"] = "G22" + sorted_prec["office_abr"] + sorted_prec["col_ssc_seat"].str.zfill(2) + sorted_prec['col_last_name']
#IA court
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC COURT OF APPEALS JUDGE"), "full_col_names"] = "G22" + sorted_prec["office_abr"] + sorted_prec["col_ia_seat"].str.zfill(2) + sorted_prec['col_last_name']


#filter
sorted_prec = sorted_prec[~sorted_prec["office_abr"].isna()]
#Set column key dict
sorted_df_column_name_dict = pd.Series((sorted_prec["candidate_name"]+", "+sorted_prec["contest_title"]).values, index=sorted_prec["full_col_names"]).to_dict()
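#Set COUNTYFP: NCSBE's county_id numbers the 100 counties alphabetically (1-100), and NC county FIPS
#codes are the odd numbers 1-199 in the same order, so FIPS = 2*county_id - 1 (stored as an integer)
sorted_prec["COUNTYFP"] = 2*sorted_prec["county_id"]-1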
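The last-name rules above depend on word positions plus a list of special cases, which is where the `JR.` (Cohen) and `(TO` (Fitch) abbreviations flagged in the State Senate checks come from. A sketch of a position-independent alternative: remove parenthetical nicknames and generational suffixes, take the final word, keep letters only, and truncate to three characters. The helper names and the suffix list are our own assumptions, not part of the notebook; `make_col_name` follows the district-based pattern used for US House, State Senate and State House columns.

```python
import re

# Assumed set of generational suffixes to ignore
SUFFIXES = {"JR", "SR", "II", "III", "IV"}

def last_name_abbrev(candidate_name: str) -> str:
    """Return a 3-letter upper-case last-name abbreviation for a column name."""
    if candidate_name == "Write-In (Miscellaneous)":
        return "OWR"
    # Remove parenthetical nicknames such as "(Toby)"
    name = re.sub(r"\([^)]*\)", " ", candidate_name)
    # Split on whitespace/commas and keep only the letters of each word
    tokens = [re.sub(r"[^A-Za-z]", "", t).upper() for t in re.split(r"[\s,]+", name)]
    tokens = [t for t in tokens if t and t not in SUFFIXES]
    return tokens[-1][:3]

def make_col_name(office_abr: str, district: str, width: int, party: str, last: str) -> str:
    """Assemble e.g. 'G' + 'SU' + '04' + 'D' + 'FIT' -> 'GSU04DFIT'."""
    return "G" + office_abr + district.zfill(width) + party + last

for name in ["Gettys Cohen, Jr.", "Milton F. (Toby) Fitch", "Michael Greer O'Shea"]:
    print(name, "->", last_name_abbrev(name))
```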

## Pivot and filter for sorted counties

In [145]:
## PIVOT RESULTS
sorted_prec_pivot = sorted_prec.pivot_table(index = ['county','COUNTYFP','precinct_name'],
                         columns = ['full_col_names'],
                        values = ['vote_ct'],
                         aggfunc = 'sum')


#Clean up the indices
sorted_prec_pivot.reset_index(inplace = True,drop=False)
sorted_prec_pivot[('county', 'county')] = sorted_prec_pivot[('county', '')]
sorted_prec_pivot[('COUNTYFP', 'COUNTYFP')] = sorted_prec_pivot[('COUNTYFP', '')]
sorted_prec_pivot[('precinct_name', 'precinct_name')] = sorted_prec_pivot[('precinct_name', '')]


#Rename the columns
sorted_prec_pivot.columns = sorted_prec_pivot.columns.map(pd.Series([col[1] for col in sorted_prec_pivot.columns], index = [col for col in sorted_prec_pivot.columns]).to_dict())
sorted_prec_pivot = sorted_prec_pivot.fillna(0)


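#Keep only the counties for which the precinct-sorted results are used
#(.copy() so the columns added below modify independent frames rather than slices)
sorted_counties = set(in_sos["County"].values)
df_sorted = sorted_prec_pivot[sorted_prec_pivot['county'].isin(sorted_counties)].copy()
gdf_sorted = gdf_reform[gdf_reform['county_nam'].isin(sorted_counties)].copy()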

df_sorted["precinct_name"] = df_sorted["precinct_name"].str.strip() 
df_sorted["county"] = df_sorted["county"].str.strip()
df_sorted["UNIQUE_ID"] = df_sorted["county"].astype(str) + "---" + df_sorted["precinct_name"].astype(str)



## Joining sorted county df and gdf precinct
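The cell below first tries two shapefile identifiers (`enr_desc`, `prec_id`) and then patches individual precinct names by hand. The general pattern (normalize both key columns the same way, outer-merge with `indicator=True`, then inspect the rows that are not `both`) can be sketched on toy data as follows. `normalize_precinct` is a hypothetical helper whose rules (upper-case, drop `#`, collapse whitespace) mirror the replacements applied in the cell; the toy precinct names are invented.

```python
import pandas as pd

def normalize_precinct(names: pd.Series) -> pd.Series:
    """Upper-case, remove '#', collapse runs of whitespace, and strip the ends."""
    return (names.astype(str).str.upper()
                 .str.replace("#", "", regex=False)
                 .str.replace(r"\s+", " ", regex=True)
                 .str.strip())

shapes = pd.DataFrame({"county_nam": ["PITT", "PITT"], "enr_desc": ["GREENVILLE  13A", "Ayden"]})
votes = pd.DataFrame({"county": ["PITT", "PITT"], "precinct_name": ["Greenville 13A", "#AYDEN"], "vote_ct": [10, 5]})

shapes["prec"] = normalize_precinct(shapes["enr_desc"])
votes["prec"] = normalize_precinct(votes["precinct_name"])
merged = pd.merge(shapes, votes, left_on=["county_nam", "prec"], right_on=["county", "prec"],
                  how="outer", indicator=True)
# Rows found on only one side still need a manual name fix
unmatched = merged[merged["_merge"] != "both"]
print(merged["_merge"].value_counts())
```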

In [122]:
#Testing which precinct identifier column better matches
enr_merge = pd.merge(gdf_sorted, df_sorted.fillna(value=0), left_on=['county_nam','enr_desc'], right_on=['county','precinct_name'], how='outer', indicator=True)
prec_id_merge = pd.merge(gdf_sorted, df_sorted.fillna(value=0), left_on=['county_nam','prec_id'], right_on=['county','precinct_name'], how='outer', indicator=True)
#enr_merge[["county_nam", "enr_desc","prec_id", "county","precinct_name","_merge"]][enr_merge['_merge']!="both"].to_csv("./enr_merge.csv")
#prec_id_merge[["county_nam", "enr_desc","prec_id", "county","precinct_name","_merge"]][prec_id_merge['_merge']!="both"].to_csv("./prec_id_merge.csv")


#Normalize precinct names so the sorted results (df_sorted) and the shapefile (gdf_sorted) match for the isolated counties
gdf_sorted["prec"] = gdf_sorted["enr_desc"].str.upper()
gdf_sorted.loc[(gdf_sorted["county_nam"]=="BUNCOMBE")|(gdf_sorted["county_nam"]=="DURHAM"),"prec"] = gdf_sorted["prec_id"]
gdf_sorted.loc[(gdf_sorted["county_nam"]=="NORTHAMPTON")&(gdf_sorted["enr_desc"]=="GARYSBURG/PLEASA_PLEASANT HILL"),"prec"] = "GARYSBURG/PLEASANT HILL"
gdf_sorted.loc[(gdf_sorted["county_nam"]=="ORANGE")&(gdf_sorted["enr_desc"]=="HillsboroughEast"),"prec"] = "HILLSBOROUGH EAST"
gdf_sorted.loc[gdf_sorted["county_nam"]=="MECKLENBURG", "prec"] = "PCT "+gdf_sorted["enr_desc"].str.zfill(3)
gdf_sorted.loc[(gdf_sorted["county_nam"]=="MECKLENBURG")&(gdf_sorted["enr_desc"].str.endswith(".1")), "prec"] = "PCT "+gdf_sorted["enr_desc"].str.slice(stop=-2)
gdf_sorted.loc[(gdf_sorted["county_nam"]=="AVERY")&(gdf_sorted["prec_id"]=="14"), "prec"] = "NEWLAND 1"
gdf_sorted.loc[(gdf_sorted["county_nam"]=="AVERY")&(gdf_sorted["prec_id"]=="15"), "prec"] = "NEWLAND 2"
gdf_sorted.loc[(gdf_sorted["county_nam"]=="CASWELL")&(gdf_sorted["enr_desc"]=="YANCEYVILLE 2"), "prec"] = "YANCEYVILLE"
gdf_sorted.loc[(gdf_sorted["county_nam"]=="PITT")&(gdf_sorted["enr_desc"]=="GREENVILE  13A"), "prec"] = "GREENVILLE 13A"
gdf_sorted.loc[(gdf_sorted["county_nam"]=="JOHNSTON")&(gdf_sorted["prec_id"]=="PR25"), "prec"] = "WEST SELMA"
gdf_sorted.loc[gdf_sorted["county_nam"]=="WAKE", "prec"] = "PRECINCT "+gdf_sorted["enr_desc"].str.zfill(4)
gdf_sorted.loc[gdf_sorted["county_nam"]=="LEE", "prec"] = "PRECINCT "+gdf_sorted["enr_desc"]
df_sorted["prec"] = df_sorted["precinct_name"].str.upper()
df_sorted["prec"] = df_sorted["prec"].str.replace("#", "", regex=False)
df_sorted["prec"] = df_sorted["prec"].astype(str).str.replace("  ", " ")
gdf_sorted["prec"] = gdf_sorted["prec"].astype(str).str.replace("  ", " ")


#Re-merge for max match
enr_prec_merge = pd.merge(gdf_sorted, df_sorted.fillna(value=0), left_on=['county_nam','prec'], right_on=['county','prec'], how='outer', indicator=True)
enr_prec_merge[["county_nam", "enr_desc","prec_id", "county","precinct_name","prec","_merge"]][enr_prec_merge["_merge"]!="both"].to_csv("./enr_prec_merge.csv")


# Check precinct counts

In [15]:
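#Count precincts per county in the unsorted (initially reported) results
sos_unsorted = df_pivot.copy()
sos_unsorted['unsprec_counter'] = 1
sos_unsorted_county = sos_unsorted.groupby("County", as_index=False)['unsprec_counter'].sum()
#Count precincts per county in the precinct-sorted results (this frame uses lowercase 'county')
sos_sorted = df_sorted.copy()
sos_sorted['sprec_counter'] = 1
sos_sorted_county = sos_sorted.groupby("county", as_index=False)['sprec_counter'].sum().rename(columns={"county": "County"})
#Compare the per-county counts and show the counties where they differ
prec_count_compare = pd.merge(sos_unsorted_county, sos_sorted_county, on="County")
prec_count_compare[prec_count_compare['unsprec_counter']!=prec_count_compare['sprec_counter']]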
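Because each row of the pivoted frames is one precinct, the per-county precinct counts can also be read straight from `value_counts()` and lined up side by side. A sketch on toy frames, assuming the county column is `County` in the unsorted pivot and `county` in the sorted pivot, as in the notebook:

```python
import pandas as pd

unsorted_toy = pd.DataFrame({"County": ["ASHE", "ASHE", "AVERY"]})
sorted_toy = pd.DataFrame({"county": ["ASHE", "ASHE", "AVERY", "AVERY"]})

counts = pd.concat(
    [unsorted_toy["County"].value_counts().rename("unsorted"),
     sorted_toy["county"].value_counts().rename("sorted")],
    axis=1,
).fillna(0).astype(int)  # a county missing from one file counts as 0 precincts there
# Counties whose precinct counts differ between the two files
mismatch = counts[counts["unsorted"] != counts["sorted"]]
print(mismatch)
```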

In [18]:
sos_county[[ "unsprec_counter"]]

| County | prec_counter |
|---|---|
| ALAMANCE | 38 |
| ALEXANDER | 10 |
| ALLEGHANY | 7 |
| ANSON | 9 |
| ASHE | 17 |
| ... | ... |
| WAYNE | 28 |
| WILKES | 33 |
| WILSON | 24 |
| YADKIN | 16 |


# Combine unsorted and sorted results

To do: check the unsorted counties against the same counties in the sorted file and see whether the results match. The hurdle is that precinct names in the sorted df were only reconciled for the counties that need sorting, not for the others.
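One way to carry out the combination: keep the unsorted rows only for counties that reported everything by precinct, take the precinct-sorted rows for the listed counties, and let `pd.concat` align the contest columns, treating contests absent from a county as zero votes. A sketch on toy frames; `combine_results` is a hypothetical helper and assumes both inputs share a `County` column and a `UNIQUE_ID` key, with election columns starting with `G`.

```python
import pandas as pd

def combine_results(unsorted: pd.DataFrame, sorted_: pd.DataFrame, sorted_counties: set) -> pd.DataFrame:
    """Use sorted results for `sorted_counties` and unsorted results everywhere else."""
    keep_unsorted = unsorted[~unsorted["County"].isin(sorted_counties)]
    keep_sorted = sorted_[sorted_["County"].isin(sorted_counties)]
    combined = pd.concat([keep_unsorted, keep_sorted], ignore_index=True)
    # Contests present in only one input become NaN after concat; count them as 0 votes
    vote_cols = [c for c in combined.columns if c.startswith("G")]
    combined[vote_cols] = combined[vote_cols].fillna(0)
    if combined["UNIQUE_ID"].duplicated().any():
        raise ValueError("duplicate UNIQUE_ID after combining")
    return combined

unsorted = pd.DataFrame({"County": ["ASHE", "WAKE"], "UNIQUE_ID": ["ASHE---A", "WAKE---W1"],
                         "G22USSRBUD": [10, 99]})
sorted_ = pd.DataFrame({"County": ["WAKE"], "UNIQUE_ID": ["WAKE---W1"],
                        "G22USSRBUD": [12], "GSL011DSMI": [7]})
combined = combine_results(unsorted, sorted_, {"WAKE"})
print(combined)
```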

In [166]:
df_unsorted.shape[0]+df_sorted.shape[0]

2655

In [None]:
cnts = set(df_pivot['County'].unique())

In [177]:
ucnts = cnts.difference(sorted_counties)

In [185]:
#Isolate counties from unsorted df that do not need sorting
df_unsorted = df_pivot[~df_pivot['County'].isin(sorted_counties)]
df_sorted["County"] = df_sorted["county"]  #same county column name as the unsorted frame
#The columns the two frames have in common are the election columns and UNIQUE_ID



In [218]:
df_sorted_no_dup_columns = df_unsorted[list(set(df_unsorted.columns))]

In [217]:
df_unsorted = df_unsorted[list(set(df_unsorted.columns))]
df_sorted = df_sorted[list(set(df_sorted.columns))]
#How is df[list(set(df.columns))] different from just df?

dff = pd.concat([df_unsorted, df_sorted], ignore_index=True)#, keys=['County', 'UNIQUE_ID'])
dff.shape

(2655, 567)
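On the question in the cell above: `df[list(set(df.columns))]` returns the same columns as `df`, only in arbitrary (hash) order, and it does not remove duplicated labels, because selecting a duplicated label returns every column carrying it. Passing a bare `set` as an indexer is also deprecated in recent pandas and rejected in pandas 2.x, hence `list(...)`. If the goal is to drop duplicate columns (for example the blank `''` labels that the column-flattening step appears to leave behind), `df.columns.duplicated()` does that. A small demonstration on a made-up row:

```python
import pandas as pd

# Two columns share the blank label "" (toy data)
df = pd.DataFrame([["ALLEGHANY", "GAP CIVIL", 643.0]], columns=["", "", "G22SSC05ERV"])

same_columns = df[list(set(df.columns))]       # both "" columns are still selected
deduped = df.loc[:, ~df.columns.duplicated()]  # keeps only the first "" column

print(same_columns.shape, deduped.shape)
```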

In [223]:
dff['UNIQUE_ID'].nunique()

2655

In [228]:
dff['County'].nunique()

100

In [230]:
dff["County"].isna().any()

False
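The three checks above (unique `UNIQUE_ID`, 100 counties, no missing county) can be bundled into one helper and rerun after any later change to `dff`. A sketch; `validate_combined` is a hypothetical name, the default of 100 is North Carolina's county count, and the two vote-column checks are additions of ours.

```python
import pandas as pd

def validate_combined(frame: pd.DataFrame, expected_counties: int = 100) -> None:
    """Raise AssertionError if the combined precinct table fails basic sanity checks."""
    assert frame["UNIQUE_ID"].is_unique, "UNIQUE_ID values are not unique"
    assert not frame["County"].isna().any(), "some rows have no County"
    n_counties = frame["County"].nunique()
    assert n_counties == expected_counties, f"expected {expected_counties} counties, found {n_counties}"
    vote_cols = [c for c in frame.columns if c.startswith("G")]
    assert not frame[vote_cols].isna().any().any(), "missing vote counts"
    assert (frame[vote_cols] >= 0).all().all(), "negative vote counts"

toy = pd.DataFrame({"UNIQUE_ID": ["ASHE---A", "WAKE---W1"], "County": ["ASHE", "WAKE"],
                    "G22USSRBUD": [10, 12]})
validate_combined(toy, expected_counties=2)  # passes silently
```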

In [226]:
df_sorted[list(set(df_sorted.columns))]

|   |   |   | GSU14RBAK | GSL099DMAJ | G22SSC05ERV | GSL048DPIE | GSU41RLEO | G22IA11STA | GSL113RJOH | GSL107RCOO | ... | GSL094DHUB | GCON07NROU | GSU22LUBI | GSU22NVOT | GSL090NVOT | GSL021DLIU | GCON10NJIM | GSU33NVOT | GSL019RMIL | GSL043NVOT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48 | ALLEGHANY | CHERRY LANE ... | 0.0 | 0.0 | 220.0 | 0.0 | 0.0 | 632.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 49 | ALLEGHANY | GAP CIVIL ... | 0.0 | 0.0 | 643.0 | 0.0 | 0.0 | 1525.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50 | ALLEGHANY | GLADE CREEK ... | 0.0 | 0.0 | 203.0 | 0.0 | 0.0 | 718.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 51 | ALLEGHANY | PRATHERS CREEK ... | 0.0 | 0.0 | 266.0 | 0.0 | 0.0 | 771.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 78 | AVERY | ALTAMONT ... | 0.0 | 0.0 | 136.0 | 0.0 | 0.0 | 416.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2747 | YADKIN | NORTH LIBERTY ... | 0.0 | 0.0 | 393.0 | 0.0 | 0.0 | 1492.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2748 | YADKIN | SOUTH BUCK SHOALS ... | 0.0 | 0.0 | 82.0 | 0.0 | 0.0 | 380.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2749 | YADKIN | SOUTH FALL CREEK ... | 0.0 | 0.0 | 155.0 | 0.0 | 0.0 | 860.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2750 | YADKIN | SOUTH KNOBS ... | 0.0 | 0.0 | 65.0 | 0.0 | 0.0 | 631.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [278]:
df.columns[~df.columns.str.startswith("G")]

Index(['County', 'Election Date', 'Precinct', 'Contest Group ID',
       'Contest Type', 'Contest Name', 'Choice', 'Choice Party', 'Vote For',
       'Election Day', 'One Stop', 'Absentee by Mail', 'Provisional',
       'Total Votes', 'Real Precinct', 'Unnamed: 15', 'col_party',
       'col_last_name', 'col_office', 'office_abr', 'col_cong_dist',
       'col_su_dist', 'col_sl_dist', 'col_ssc_seat', 'col_ia_seat',
       'full_col_names'],
      dtype='object')

In [None]:
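dff['PRECINCT'] = dff['PREC']
#Use 'County', which is filled for both unsorted and sorted rows ('county' comes only from the sorted frame)
dff['COUNTYNM'] = dff['County']
#Sort the election columns alphabetically
election_cols = dff.columns[dff.columns.str.startswith("G")].sort_values()
#Contests that a county does not have are NaN after the concat; fill with 0 before casting to int
dff[election_cols] = dff[election_cols].fillna(0).astype(int)
#Reorder: identifier columns first, then the election columns
dff = dff[['UNIQUE_ID','COUNTYFP', 'COUNTYNM','PRECINCT']+list(election_cols)]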
dff

In [None]:
#Make floats int (every column except the identifier columns)
id_cols = {'UNIQUE_ID','PRECINCT','cty_file_number','COUNTYFP','COUNTYNM'}
vote_cols = list(set(final.columns) - id_cols)
final[vote_cols] = final[vote_cols].astype(int)
#Sort the election columns alphabetically
election_cols = final.columns[final.columns.str.startswith("G")].sort_values()
#Reorder: identifier columns first, then the election columns
final = final[['UNIQUE_ID','COUNTYFP', 'COUNTYNM','PRECINCT']+list(election_cols)]
final
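Both cells above cast the election columns with `astype(int)`, which raises if any value is `NaN`; after concatenating counties with different contest lists, a contest absent from a county is `NaN`. A sketch of the final clean-up that fills those gaps with 0 first, then puts identifiers before the alphabetically sorted election columns. `finalize_columns` is a hypothetical helper; the identifier list is taken from the cells above and the toy row is invented.

```python
import pandas as pd

ID_COLS = ["UNIQUE_ID", "COUNTYFP", "COUNTYNM", "PRECINCT"]

def finalize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return identifier columns first, then election columns sorted alphabetically, as ints."""
    election_cols = sorted(c for c in frame.columns if c.startswith("G"))
    out = frame[ID_COLS + election_cols].copy()
    out[election_cols] = out[election_cols].fillna(0).astype(int)
    return out

toy = pd.DataFrame({
    "UNIQUE_ID": ["ASHE---A"], "COUNTYFP": [9], "COUNTYNM": ["ASHE"], "PRECINCT": ["A"],
    "GSL011DSMI": [float("nan")], "G22USSRBUD": [10.0],
})
result = finalize_columns(toy)
print(result)
```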