From [VEST 2020](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/K7760H) documentation:

"Election results and precinct shapefile from the NC State Board of Elections (https://www.ncsbe.gov/results-data)

Buncombe 681, Henderson CV, Wake 01-07A, 07-07A are missing from the 20201018 shapefile. They are added from the 20190827 shapefile.

North Carolina produces two sets of election results data. The precinct results are the unaltered results as initially reported by the counties. Many counties report early votes by vote center while provisional and other nonstandard ballots may be reported countywide. The precinct-sorted results are then produced within 30 days after the election. In the precinct-sorted data nearly all votes are assigned to precincts regardless of the manner by which the ballots were cast. However, North Carolina law requires the addition of statistical "noise" to the precinct-sorted data wherever any given vote by any specific voter may otherwise be deduced via cross referencing the various election-related data sets produced by the SBE.

For the 2020 general election 51 counties reported all votes by precinct in their initial precinct results. The precinct-sorted data set was used instead for the counties listed below.

Alleghany, Avery, Beaufort, Bertie*, Bladen, Buncombe, Cabarrus, Caldwell, Camden, Currituck, Dare, Davidson*, Davie, Duplin*, Durham*, Edgecombe, Guilford, Halifax*, Harnett, Haywood, Henderson, Hertford, Hyde, Johnston, Jones, Lee, Lincoln, Macon, Martin, Mecklenburg*, Moore, Nash, New Hanover*, Northampton*, Orange, Pasquotank, Pitt*, Polk, Richmond, Scotland, Stokes*, Surry*, Tyrrell*, Wake, Washington, Watauga, Wayne, Wilkes*, Yadkin

In counties marked by asterisk some votes were still reported by vote center or countywide in the precinct-sorted data. These were distributed by candidate to precincts based on the precinct-level reported vote. The precinct-sorted results were further adjusted to match the certified countywide totals based on the precinct-level vote by candidate."

Note that the RDH searched the 2022 precinct names for keywords indicating that votes were not reported by precinct (for example absentee, provisional, or one-stop labels) and found that 46 counties needed the precinct-sorted data in 2022, down from the 49 listed above for 2020. The 2022 list mostly overlaps with the one above, with a few counties added and removed. For more information, see the code below.
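
As a rough illustration of that keyword check, the sketch below flags a county when any of its precinct labels contains one of a few keywords. The keyword subset, the sample rows, and the helper name `counties_needing_sorted_results` are assumptions made for illustration; the notebook's own, much longer keyword list appears in the code further down.

```python
import re

# Illustrative subset of keywords; the notebook uses a longer list.
KEYWORDS = ["ABS", "PROVISIONAL", "ONE STOP", "CURBSIDE"]
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS))


def counties_needing_sorted_results(rows):
    """Return the counties with at least one precinct label matching a keyword.

    rows is an iterable of (county, precinct_label) pairs.
    """
    return sorted({county for county, label in rows if PATTERN.search(label)})


# Hypothetical sample rows, not taken from the NCSBE file.
sample = [
    ("ALPHA", "01"),
    ("ALPHA", "ABSENTEE BY MAIL"),
    ("BETA", "02"),
    ("GAMMA", "ONE STOP 1"),
]
flagged = counties_needing_sorted_results(sample)
print(flagged)
```

A county is flagged as soon as one label matches, so a single countywide absentee row is enough to route the whole county to the precinct-sorted file.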

**2022 RDH Processing:** 

Absentee and voting center votes were allocated proportionally to precincts, by share of precinct-reported vote.
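
The arithmetic behind a proportional allocation can be sketched as follows for a single candidate. This is a minimal illustration, not the RDH implementation: the function name, the sample numbers, and the largest-remainder rounding used to keep whole votes are all assumptions.

```python
def allocate_proportionally(precinct_votes, unassigned):
    """Add `unassigned` votes to precincts in proportion to their reported votes.

    precinct_votes maps precinct -> votes reported for one candidate.
    Whole votes are kept via largest-remainder rounding (an assumption of this sketch).
    """
    total = sum(precinct_votes.values())
    if total == 0:
        # No precinct-level votes to base shares on; return the data unchanged.
        return dict(precinct_votes)
    shares = {p: unassigned * v / total for p, v in precinct_votes.items()}
    extra = {p: int(s) for p, s in shares.items()}
    leftover = unassigned - sum(extra.values())
    # Hand the remaining votes to the precincts with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda p: shares[p] - extra[p], reverse=True)
    for p in by_remainder[:leftover]:
        extra[p] += 1
    return {p: precinct_votes[p] + extra[p] for p in precinct_votes}


# Hypothetical numbers: 7 countywide votes spread over three precincts.
reported = {"A": 60, "B": 30, "C": 10}
allocated = allocate_proportionally(reported, 7)
print(allocated)
```

The rounding step ensures that the allocated votes sum exactly to the reported precinct total plus the countywide votes being distributed.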

The precinct shapefile available [here](https://www.nconemap.gov/datasets/voting-precincts/explore?location=35.097107%2C-79.888900%2C7.41) was last updated in March 2023, so some precinct names are missing and others do not match the November 2022 election results. After we reached out to the NCSBE, we received the following response, which led to all but 5 precinct names matching between the two files:

*I’m not sure which file that site is displaying/making available for download. But I would suggest using this one: https://s3.amazonaws.com/dl.ncsbe.gov/PrecinctMaps/SBE_PRECINCTS_20220831.zip, as it is the data closest to the Nov 2022 election while also being before the election.*

*We provide shapefiles on our ftp site, which is linked to on our Voting Maps/Redistricting page: https://www.ncsbe.gov/results-data/voting-maps-redistricting*

# Load packages and data

In [1]:
import pandas as pd
import geopandas as gp
import os
from pdv_functions import *
pd.options.display.max_columns = 100
#Sources:
#precinct shp: https://s3.amazonaws.com/dl.ncsbe.gov/PrecinctMaps/SBE_PRECINCTS_20220831.zip
#precinct election results: https://www.ncsbe.gov/results-data/election-results/historical-election-results-data

In [2]:
#gdf = gp.read_file("./raw-from-source/Voting_Precincts/Voting_Precincts.shp")
gdf = gp.read_file("./raw-from-source/SBE_PRECINCTS_20220831/SBE_PRECINCTS_20220831.shp")
df = pd.read_table("./raw-from-source/results_pct_20221108.txt", sep = "\t")
sorted_prec = pd.read_csv("./raw-from-source/sorted_precincts/AllCounties.txt", sep="\t")

print("# prec ids in gdf not in df: ",len(set((gdf.county_nam.str.upper()+gdf.prec_id.str.upper()))-set(df.County.str.upper()+df.Precinct.str.upper())))
print("# prec ids in df not in gdf: ", len(set(df.County.str.upper()+df.Precinct.str.upper())-set(gdf.county_nam.str.upper()+gdf.prec_id.str.upper())))
print("shape df: ", (df.County.str.upper()+df.Precinct.str.upper()).nunique(), "\nshape gdf: ", (gdf.county_nam.str.upper()+gdf.prec_id.str.upper()).nunique())


# prec ids in gdf not in df:  0
# prec ids in df not in gdf:  325
shape df:  2977 
shape gdf:  2652


# Process unsorted election results

From Ballotpedia:
- To include: US Senate, US House, State Senate, State House, State Supreme Court 
    - 'US SENATE'
    - 'US HOUSE OF REPRESENTATIVES DISTRICT XX'
    - 'NC STATE SENATE DISTRICT XX'
    - 'NC HOUSE OF REPRESENTATIVES DISTRICT XXX'
    - 'NC SUPREME COURT ASSOCIATE JUSTICE SEAT XX'
    
- Not sure: Intermediate Appellate Courts
    - 'NC COURT OF APPEALS JUDGE SEAT XX'
- Not to include: School boards, Municipal government, local ballot measures
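
The contest patterns above feed the column names built in the next cell. The sketch below shows one way to read that naming scheme, in simplified form: the helper name `build_column_name` is hypothetical, the last name is passed in directly rather than parsed from the ballot choice, and the congressional district is zero-padded here although the notebook uses it as given.

```python
OFFICE_CODES = {
    "US SENATE": "USS",
    "US HOUSE": "CON",
    "STATE SENATE": "SU",
    "NC HOUSE OF REPRESENTATIVES": "SL",
    "NC SUPREME COURT": "SSC",
    "NC COURT OF APPEALS JUDGE": "IA",
}
# Zero-padding width of the district or seat number for each office code.
DISTRICT_WIDTH = {"CON": 2, "SU": 2, "SL": 3, "SSC": 2, "IA": 2}
PARTY_CODES = {"DEM": "D", "LIB": "L", "REP": "R", "UNA": "U", "GRE": "G"}


def build_column_name(contest_name, party, last_name):
    """Return a column name such as GSU26RBER, or None for excluded contests."""
    for pattern, code in OFFICE_CODES.items():
        if pattern in contest_name:
            break
    else:
        return None  # e.g. school boards, municipal offices, ballot measures
    party_code = PARTY_CODES.get(party, "O")  # anything else counts as "other"
    name_code = last_name[:3].upper()
    if code == "USS":
        # The statewide contest carries the election year instead of a district.
        return "G22" + code + party_code + name_code
    district = contest_name.split(" ")[-1].zfill(DISTRICT_WIDTH[code])
    return "G" + code + district + party_code + name_code


print(build_column_name("NC STATE SENATE DISTRICT 26", "REP", "Berger"))
```

Contests that match none of the patterns return `None`, which corresponds to the rows the notebook filters out before pivoting.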

## Grab info for column dictionaries

In [3]:
#Set party col
potential_party = df['Choice Party']
party_dict = {'DEM':'D','LIB':'L','REP':'R','UNA':'U','GRE':'G', "na":"O"}
df["col_party"] = df.loc[df['Choice Party'].isin(party_dict.keys()), "Choice Party"].map(party_dict)
df.loc[df["col_party"].isna(), 'col_party'] = "O"


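#Set last name abbreviation (first three letters, upper case) - edge cases are corrected below
df["col_last_name"] = "na"
#Note: ". " is read as a regex (any character followed by a space), so this matches every multi-word Choice and takes its third token;
#two-word names yield NaN here and are filled from the second token further down
df.loc[df["Choice"].str.contains(". "), "col_last_name"] = df["Choice"].str.split(pat=" ").str[2].str.slice(stop=3).str.upper()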

#Correcting for unique instances
df.loc[(df["Choice"]=='Ted Davis, Jr.')|(df["Choice"]=='Gettys Cohen, Jr.')|(df["Choice"]=='Paul Lowe, Jr.')|(df["Choice"]=='Howard Penny, Jr.'), "col_last_name"] = df["Choice"].str.split(pat=" ").str[1].str.slice(stop=3).str.upper()
df.loc[(df["Choice"]=='Philip E. (Phil) Berger')|(df["Choice"]=='Mary Price (Pricey) Harrison')|(df["Choice"]=='Susan Lee (Susie) Scott')|(df["Choice"]=='Milton F. (Toby) Fitch')|(df["Choice"]=='Ives Brizuela de Sholar'), "col_last_name"] = df["Choice"].str.split(pat=" ").str[3].str.slice(stop=3).str.upper()
df.loc[df["Choice"]=="Michael Greer O'Shea", "col_last_name"] = "OSH"
df.loc[df["col_last_name"].isna(), "col_last_name"] = df["Choice"].str.split(pat=" ").str[1].str.slice(stop=3).str.upper()
#Write-ins: literal substring match on "Write-In"
df.loc[(df["Choice"] == "Write-In (Miscellaneous)")|(df["Choice"].str.contains("Write-In", regex=False)), "col_last_name"] = "WRI"


#Set contest
general_office_dict = {"US SENATE":"USS", "US HOUSE": "CON", "STATE SENATE":"SU", "NC HOUSE OF REPRESENTATIVES": "SL", 
                       "NC SUPREME COURT": "SSC", "NC COURT OF APPEALS JUDGE":"IA"}
df["col_office"]='na'
df.loc[(df["Contest Name"].str.contains("US SENATE")), "col_office"] = "US SENATE"
df.loc[(df["Contest Name"].str.contains("US HOUSE")), "col_office"] = "US HOUSE"
df.loc[(df["Contest Name"].str.contains("STATE SENATE")), "col_office"] = "STATE SENATE"
df.loc[(df["Contest Name"].str.contains("NC HOUSE OF REPRESENTATIVES")), "col_office"] = "NC HOUSE OF REPRESENTATIVES"
df.loc[(df["Contest Name"].str.contains("NC SUPREME COURT")), "col_office"] = "NC SUPREME COURT"
df.loc[(df["Contest Name"].str.contains("NC COURT OF APPEALS JUDGE")), "col_office"] = "NC COURT OF APPEALS JUDGE"
df["office_abr"] = df["col_office"].map(general_office_dict)


#Set districts
#Get CONG DIST
df["col_cong_dist"] = "na"
df.loc[df["Contest Name"].str.contains("US HOUSE"), "col_cong_dist"] = df["Contest Name"].str.split(pat=" ").str[-1]
#GET state sen dist
df["col_su_dist"] = "na"
df.loc[df["Contest Name"].str.contains("STATE SENATE"), "col_su_dist"] = df["Contest Name"].str.split(pat=" ").str[-1]
#GET state house dist
df["col_sl_dist"] = "na"
df.loc[df["Contest Name"].str.contains("NC HOUSE OF REPRESENTATIVES"), "col_sl_dist"] = df["Contest Name"].str.split(pat=" ").str[-1]
#GET ssc seat
df["col_ssc_seat"] = "na"
df.loc[df["Contest Name"].str.contains("NC SUPREME COURT"), "col_ssc_seat"] = df["Contest Name"].str.split(pat=" ").str[-1]
#GET court of appeals dist
df["col_ia_seat"] = "na"
df.loc[df["Contest Name"].str.contains("NC COURT OF APPEALS JUDGE"), "col_ia_seat"] = df["Contest Name"].str.split(pat=" ").str[-1]


#Create column names
df["full_col_names"] = "na"
#cong
df.loc[df["Contest Name"].str.contains("US HOUSE"), "full_col_names"] = "G" + df["office_abr"] + df['col_cong_dist'] + df['col_party'] + df['col_last_name']
#us sen
df.loc[df["Contest Name"].str.contains("US SENATE"), "full_col_names"] = "G22" + df["office_abr"] + df['col_party'] + df['col_last_name']
#state sen
df.loc[df["Contest Name"].str.contains("STATE SENATE"), "full_col_names"] = "G" + df["office_abr"] + df['col_su_dist'].str.zfill(2) + df['col_party'] + df['col_last_name']
#state house
df.loc[df["Contest Name"].str.contains("NC HOUSE OF REPRESENTATIVES"), "full_col_names"] = "G" + df["office_abr"] + df['col_sl_dist'].str.zfill(3) + df['col_party'] + df['col_last_name']
#state ssc
df.loc[df["Contest Name"].str.contains("NC SUPREME COURT"), "full_col_names"] = "G" + df["office_abr"] + df["col_ssc_seat"].str.zfill(2) + df['col_party'] + df['col_last_name']
#IA court
df.loc[df["Contest Name"].str.contains("NC COURT OF APPEALS JUDGE"), "full_col_names"] = "G" + df["office_abr"] + df["col_ia_seat"].str.zfill(2) + df['col_party'] + df['col_last_name']


#filter
df = df[~df["office_abr"].isna()]
#Make dict
unsorted_df_column_name_dict = pd.Series((df["Choice"]+", "+df["Contest Name"]).values, index=df["full_col_names"]).to_dict()


## Pivot
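
The pivot step turns one row per precinct, contest, and choice into one row per precinct with a vote column per candidate. A dependency-free sketch of that reshaping is shown below; the helper name `pivot_votes` and the sample rows are hypothetical, and the notebook itself uses `pivot_table` with `aggfunc='sum'` followed by `fillna(0)`.

```python
from collections import defaultdict


def pivot_votes(rows):
    """Reshape (county, precinct, column, votes) rows into a wide table.

    Returns {(county, precinct): {column: summed votes}}, with 0 for
    columns that have no rows in a given precinct.
    """
    totals = defaultdict(int)
    columns = set()
    for county, precinct, column, votes in rows:
        totals[(county, precinct, column)] += votes  # same role as aggfunc='sum'
        columns.add(column)
    precincts = sorted({(c, p) for c, p, _ in totals})
    return {
        key: {col: totals.get(key + (col,), 0) for col in sorted(columns)}
        for key in precincts
    }


# Hypothetical long-format rows.
long_rows = [
    ("ALPHA", "01", "G22USSDEXA", 10),
    ("ALPHA", "01", "G22USSDEXA", 5),
    ("ALPHA", "01", "G22USSREXB", 7),
    ("ALPHA", "02", "G22USSDEXA", 3),
]
wide = pivot_votes(long_rows)
print(wide[("ALPHA", "02")])
```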

In [4]:
## PIVOT RESULTS
df_pivot = df.pivot_table(index = ['County','Precinct'],
                         columns = ['full_col_names'],
                        values = ['Total Votes'],
                         aggfunc = 'sum')


#Clean up the indices
df_pivot.reset_index(inplace = True,drop=False)
df_pivot[('County', 'County')] = df_pivot[('County', '')]
df_pivot[('Precinct', 'Precinct')] = df_pivot[('Precinct', '')]


#Rename the columns
df_pivot.columns = df_pivot.columns.map(pd.Series([col[1] for col in df_pivot.columns], index = [col for col in df_pivot.columns]).to_dict())
df_pivot = df_pivot.fillna(0)


df_pivot["UNIQUE_ID"] = df_pivot["County"] + "---" + df_pivot["Precinct"]

## Separate out counties to use sorted results

In [5]:
searchfor = ['ABS', 'PROVISIONAL','PROVISIOINAL','PROVI ', 'PROV',
             'ONE STOP','ONESTOP','OS ','OS-',' OS','OSAP','OSCA',
             'OSCH','OSKD','OSLL','OSLOB','OSNR','OSOP','OSTA','OSWA',
             'OSDU','-OS','OSAV','OSBOE','OSGR','OSHS','OSJB','OSSE','OSWD',
             'OSCS','OSHT','MAOS','DBOS',
             'CURBSIDE','TRANS','LEE COUNTY BOE', 'MCSWAIN CENTER' 
            ]
in_sos =  df_pivot[df_pivot["Precinct"].str.contains('|'.join(searchfor))]
in_sos = in_sos.groupby(by=["County"]).sum().reset_index()
in_sos

Unnamed: 0,County,G22USSDBEA,G22USSGHOH,G22USSLBRA,G22USSOWRI,G22USSRBUD,GCON01DDAV,GCON01RSMI,GCON02DROS,GCON02RVIL,GCON03DGAS,GCON03RMUR,GCON04DFOU,GCON04RGEE,GCON05DPAR,GCON05RFOX,GCON06DMAN,GCON06LWAT,GCON06RCAS,GCON07DGRA,GCON07RROU,GCON08DHUF,GCON08RBIS,GCON09DCLA,GCON09RHUD,GCON10DGEN,GCON10OWRI,GCON10RMCH,GCON11DBEA,GCON11LCOA,GCON11REDW,GCON12DADA,GCON12RLEE,GCON13DNIC,GCON13RHIN,GCON14DJAC,GCON14RHAR,GIA08DTHO,GIA08RFLO,GIA09DSAL,GIA09RSTR,GIA10DADA,GIA10RTYS,GIA11DJAC,GIA11RSTA,GSL001RGOO,GSL002DJEF,GSL002LBEL,GSL002RYAR,GSL003RTYS,...,GSU22DWOO,GSU22LUBI,GSU22RCOL,GSU23DMEY,GSU23RWOO,GSU24DGIB,GSU24RBRI,GSU25DEWI,GSU25RGAL,GSU26OWRI,GSU26RBER,GSU27DGAR,GSU27RSES,GSU28DROB,GSU28RSCH,GSU29DCRU,GSU29RCRA,GSU30DJOH,GSU30RJAR,GSU31RKRA,GSU32DLOW,GSU32RWAR,GSU33DHOR,GSU33RFOR,GSU34DSAN,GSU34RNEW,GSU35RJOH,GSU36RSET,GSU37RSAW,GSU38DMOH,GSU39DSAL,GSU39RROB,GSU40DWAD,GSU40RSHI,GSU41DMAR,GSU41RLEO,GSU42DHUN,GSU42RRUS,GSU43ROVE,GSU44RALE,GSU45RPRO,GSU46DMAR,GSU46RDAN,GSU47RHIS,GSU48DCAR,GSU48RMOF,GSU49DMAY,GSU49RAND,GSU50DMCC,GSU50RCOR
0,ALLEGHANY,963.0,20.0,48.0,2.0,2211.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,920.0,2328.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,967.0,2199.0,889.0,2283.0,949.0,2206.0,936.0,2204.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2413.0,0.0,0.0,0.0,0.0,0.0,0.0
1,AVERY,777.0,12.0,32.0,1.0,1500.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,755.0,1559.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,759.0,1533.0,731.0,1563.0,756.0,1531.0,750.0,1536.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1663.0,0.0,0.0,0.0,0.0,0.0,0.0
2,BEAUFORT,3897.0,55.0,134.0,3.0,6038.0,0.0,0.0,0.0,0.0,3768.0,6330.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3829.0,6158.0,3656.0,6362.0,3793.0,6203.0,3779.0,6183.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,BERTIE,2137.0,15.0,35.0,1.0,1257.0,2216.0,1218.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2073.0,1267.0,2101.0,1292.0,2106.0,1275.0,2099.0,1275.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BLADEN,3248.0,48.0,79.0,2.0,3476.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3207.0,3621.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3218.0,3508.0,3148.0,3577.0,3156.0,3547.0,3143.0,3493.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,BUNCOMBE,46494.0,596.0,692.0,28.0,22347.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,46492.0,810.0,22753.0,0.0,0.0,0.0,0.0,0.0,0.0,46789.0,22790.0,44971.0,24560.0,46556.0,22969.0,46563.0,22899.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12271.0,7699.0,0.0,0.0,0.0,34541.0,15146.0,0.0,0.0
6,CABARRUS,18362.0,241.0,426.0,15.0,17471.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2538.0,4025.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15738.0,14018.0,0.0,0.0,0.0,0.0,18115.0,18023.0,17478.0,18712.0,18096.0,18044.0,17914.0,18162.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,17528.0,17632.0,734.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,CALDWELL,4474.0,103.0,204.0,11.0,12601.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3558.0,10681.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,764.0,7.0,2300.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4428.0,12803.0,4282.0,12975.0,4413.0,12818.0,4425.0,12784.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10301.0,0.0,0.0,3339.0,0.0,0.0,0.0,0.0,0.0,0.0
8,CAMDEN,631.0,15.0,45.0,4.0,1549.0,0.0,0.0,0.0,0.0,632.0,1596.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,654.0,1542.0,609.0,1597.0,646.0,1557.0,627.0,1566.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,CASWELL,169.0,1.0,5.0,0.0,305.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,176.0,3.0,303.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,171.0,308.0,161.0,319.0,165.0,313.0,171.0,310.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,164.0,321.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# Process state-sorted election results

In [6]:
#cut out over/under votes
sorted_prec = sorted_prec[~sorted_prec['result_type_desc'].str.contains("ER VOTES")]
sorted_prec['result_type_desc'].value_counts()

<NORMAL>    622877
WRITE-IN    158201
Name: result_type_desc, dtype: int64

## Grab info for column dictionaries

In [7]:
#Set party col
potential_party = sorted_prec['candidate_party_lbl']
party_dict = {'DEM':'D','LIB':'L','REP':'R','GRE':'G', "NON":"O"}
sorted_prec["col_party"] = sorted_prec.loc[sorted_prec['candidate_party_lbl'].str.upper().isin(party_dict.keys()), 'candidate_party_lbl'].str.upper().map(party_dict)
sorted_prec.loc[sorted_prec["col_party"].isna(), 'col_party'] = "O"


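#Set last name abbreviation (first three letters, upper case) - edge cases are corrected below
sorted_prec["col_last_name"] = "na"
#General cases: ". " is read as a regex (any character followed by a space), so this takes the third token of every multi-word name;
#two-word names yield NaN here and are filled from the second token further down
sorted_prec.loc[sorted_prec["candidate_name"].str.contains(". "), "col_last_name"] = sorted_prec["candidate_name"].str.split(pat=" ").str[2].str.slice(stop=3).str.upper()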
#Correcting for unique instances
sorted_prec.loc[(sorted_prec["candidate_name"]=='Ted Davis, Jr.')|(sorted_prec["candidate_name"]=='Gettys Cohen, Jr.')|(sorted_prec["candidate_name"]=='Paul Lowe, Jr.')|(sorted_prec["candidate_name"]=='Howard Penny, Jr.'), "col_last_name"] = sorted_prec["candidate_name"].str.split(pat=" ").str[1].str.slice(stop=3).str.upper()
sorted_prec.loc[(sorted_prec["candidate_name"]=='Philip E. (Phil) Berger')|(sorted_prec["candidate_name"]=='Mary Price (Pricey) Harrison')|(sorted_prec["candidate_name"]=='Susan Lee (Susie) Scott')|(sorted_prec["candidate_name"]=='Milton F. (Toby) Fitch')|(sorted_prec["candidate_name"]=='Ives Brizuela de Sholar'), "col_last_name"] = sorted_prec["candidate_name"].str.split(pat=" ").str[3].str.slice(stop=3).str.upper()
sorted_prec.loc[sorted_prec["candidate_name"]=="Michael Greer O'Shea", "col_last_name"] = "OSH"
sorted_prec.loc[sorted_prec["col_last_name"].isna(), "col_last_name"] = sorted_prec["candidate_name"].str.split(pat=" ").str[1].str.slice(stop=3).str.upper()
sorted_prec.loc[sorted_prec["candidate_name"] == "Write-In (Miscellaneous)", "col_last_name"] = "WRI"
sorted_prec.loc[sorted_prec["result_type_desc"] == "WRITE-IN", "col_last_name"] = "WRI"


#Set contest
general_office_dict = {"US SENATE":"USS", "US HOUSE": "CON", "STATE SENATE":"SU", "NC HOUSE OF REPRESENTATIVES": "SL", 
                       "NC SUPREME COURT": "SSC", "NC COURT OF APPEALS JUDGE":"IA"}
sorted_prec["col_office"]='na'
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("US SENATE")), "col_office"] = "US SENATE"
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("US HOUSE")), "col_office"] = "US HOUSE"
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("STATE SENATE")), "col_office"] = "STATE SENATE"
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("NC HOUSE OF REPRESENTATIVES")), "col_office"] = "NC HOUSE OF REPRESENTATIVES"
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("NC SUPREME COURT")), "col_office"] = "NC SUPREME COURT"
sorted_prec.loc[(sorted_prec["contest_title"].str.contains("NC COURT OF APPEALS JUDGE")), "col_office"] = "NC COURT OF APPEALS JUDGE"
sorted_prec["office_abr"] = sorted_prec["col_office"].map(general_office_dict)


#Set districts
#Get CONG DIST
sorted_prec["col_cong_dist"] = "na"
sorted_prec.loc[sorted_prec["contest_title"].str.contains("US HOUSE"), "col_cong_dist"] = sorted_prec["contest_title"].str.split(pat=" ").str[-1]
#GET state sen dist
sorted_prec["col_su_dist"] = "na"
sorted_prec.loc[sorted_prec["contest_title"].str.contains("STATE SENATE"), "col_su_dist"] = sorted_prec["contest_title"].str.split(pat=" ").str[-1]
#GET state house dist
sorted_prec["col_sl_dist"] = "na"
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC HOUSE OF REPRESENTATIVES"), "col_sl_dist"] = sorted_prec["contest_title"].str.split(pat=" ").str[-1]
#GET ssc seat
sorted_prec["col_ssc_seat"] = "na"
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC SUPREME COURT"), "col_ssc_seat"] = sorted_prec["contest_title"].str.split(pat=" ").str[-1]
#GET court of appeals dist
sorted_prec["col_ia_seat"] = "na"
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC COURT OF APPEALS JUDGE"), "col_ia_seat"] = sorted_prec["contest_title"].str.split(pat=" ").str[-1]


#Create column names
sorted_prec["full_col_names"] = "na"
#cong
sorted_prec.loc[sorted_prec["contest_title"].str.contains("US HOUSE"), "full_col_names"] = "G" + sorted_prec["office_abr"] + sorted_prec['col_cong_dist'] + sorted_prec['col_party'] + sorted_prec['col_last_name']
#us sen
sorted_prec.loc[sorted_prec["contest_title"].str.contains("US SENATE"), "full_col_names"] = "G22" + sorted_prec["office_abr"] + sorted_prec['col_party'] + sorted_prec['col_last_name']
#state sen
sorted_prec.loc[sorted_prec["contest_title"].str.contains("STATE SENATE"), "full_col_names"] = "G" + sorted_prec["office_abr"] + sorted_prec['col_su_dist'].str.zfill(2) + sorted_prec['col_party'] + sorted_prec['col_last_name']
#state house
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC HOUSE OF REPRESENTATIVES"), "full_col_names"] = "G" + sorted_prec["office_abr"] + sorted_prec['col_sl_dist'].str.zfill(3) + sorted_prec['col_party'] + sorted_prec['col_last_name']
#state ssc
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC SUPREME COURT"), "full_col_names"] = "G" + sorted_prec["office_abr"] + sorted_prec["col_ssc_seat"].str.zfill(2) + sorted_prec['col_party'] + sorted_prec['col_last_name']
#IA court
sorted_prec.loc[sorted_prec["contest_title"].str.contains("NC COURT OF APPEALS JUDGE"), "full_col_names"] = "G" + sorted_prec["office_abr"] + sorted_prec["col_ia_seat"].str.zfill(2) + sorted_prec['col_party'] + sorted_prec['col_last_name']


#filter
sorted_prec = sorted_prec[~sorted_prec["office_abr"].isna()]
#Set column key dict
sorted_df_column_name_dict = pd.Series((sorted_prec["candidate_name"]+", "+sorted_prec["contest_title"]).values, index=sorted_prec["full_col_names"]).to_dict()
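#Set COUNTYFP: NC county FIPS codes are the odd numbers 001-199 in alphabetical order,
#so 2*county_id-1 assumes county_id numbers the counties 1-100 alphabetically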
sorted_prec["COUNTYFP"] = 2*sorted_prec["county_id"]-1
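
The `COUNTYFP` line at the end of the cell above relies on North Carolina's county FIPS codes being the odd numbers from 001 to 199, assigned alphabetically. A small sketch of that mapping follows, assuming (as the notebook does) that `county_id` numbers the 100 counties from 1 to 100 in alphabetical order. The helper name is hypothetical, and it returns a zero-padded string, whereas the notebook keeps the value as an integer.

```python
def county_id_to_fips(county_id):
    """Map a 1-100 alphabetical county index to its three-digit county FIPS code."""
    if not 1 <= county_id <= 100:
        raise ValueError("county_id must be between 1 and 100")
    return f"{2 * county_id - 1:03d}"


print(county_id_to_fips(1), county_id_to_fips(92), county_id_to_fips(100))
```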

## Pivot and filter for sorted counties

In [8]:
## PIVOT RESULTS
sorted_prec_pivot = sorted_prec.pivot_table(index = ['county','COUNTYFP','precinct_name'],
                         columns = ['full_col_names'],
                        values = ['vote_ct'],
                         aggfunc = 'sum')


#Clean up the indices
sorted_prec_pivot.reset_index(inplace = True,drop=False)
sorted_prec_pivot[('county', 'county')] = sorted_prec_pivot[('county', '')]
sorted_prec_pivot[('COUNTYFP', 'COUNTYFP')] = sorted_prec_pivot[('COUNTYFP', '')]
sorted_prec_pivot[('precinct_name', 'precinct_name')] = sorted_prec_pivot[('precinct_name', '')]


#Rename the columns
sorted_prec_pivot.columns = sorted_prec_pivot.columns.map(pd.Series([col[1] for col in sorted_prec_pivot.columns], index = [col for col in sorted_prec_pivot.columns]).to_dict())
sorted_prec_pivot = sorted_prec_pivot.fillna(0)
sorted_prec_pivot.drop([''], axis=1, inplace=True)

#separate out counties sorted by the state
sorted_counties = set(in_sos["County"].values)
#.copy() so that the column assignments below modify independent frames rather than slices
df_sorted = sorted_prec_pivot[sorted_prec_pivot['county'].isin(sorted_counties)].copy()
gdf_sorted = gdf[gdf['county_nam'].isin(sorted_counties)].copy()

df_sorted["precinct_name"] = df_sorted["precinct_name"].str.strip() 
df_sorted["county"] = df_sorted["county"].str.strip()
df_sorted["UNIQUE_ID"] = df_sorted["county"].astype(str) + "---" + df_sorted["precinct_name"].astype(str)



## Joining sorted county df and gdf precinct
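
The join below depends on precinct labels agreeing between the shapefile and the results once county-specific quirks are ironed out. The generic part of that cleanup, and a check for labels that still fail to match, can be sketched as follows. The helper names and sample labels are hypothetical, and this sketch collapses every run of spaces and strips the ends, whereas the notebook replaces double spaces once.

```python
import re


def normalize_precinct(name):
    """Upper-case a precinct label, drop '#' and collapse repeated spaces."""
    cleaned = name.upper().replace("#", "")
    return re.sub(r" {2,}", " ", cleaned).strip()


def unmatched_keys(shape_keys, result_keys):
    """Return (only_in_shapefile, only_in_results) after normalization.

    Each argument is an iterable of (county, precinct_label) pairs.
    """
    shape_set = {(c, normalize_precinct(p)) for c, p in shape_keys}
    result_set = {(c, normalize_precinct(p)) for c, p in result_keys}
    return sorted(shape_set - result_set), sorted(result_set - shape_set)


# Hypothetical labels, not taken from the NCSBE files.
shapes = [("ALPHA", "North  #1"), ("ALPHA", "South 2")]
results = [("ALPHA", "NORTH 1"), ("ALPHA", "SOUTH 3")]
only_shape, only_results = unmatched_keys(shapes, results)
print(only_shape, only_results)
```

The two returned lists play the same role as the `left_only` and `right_only` rows of an outer merge with `indicator=True`.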

In [9]:
#Testing which precinct identifier column better matches
enr_merge = pd.merge(gdf_sorted, df_sorted.fillna(value=0), left_on=['county_nam','enr_desc'], right_on=['county','precinct_name'], how='outer', indicator=True)
prec_id_merge = pd.merge(gdf_sorted, df_sorted.fillna(value=0), left_on=['county_nam','prec_id'], right_on=['county','precinct_name'], how='outer', indicator=True)
#enr_merge[["county_nam", "enr_desc","prec_id", "county","precinct_name","_merge"]][enr_merge['_merge']!="both"].to_csv("./enr_merge.csv")
#prec_id_merge[["county_nam", "enr_desc","prec_id", "county","precinct_name","_merge"]][prec_id_merge['_merge']!="both"].to_csv("./prec_id_merge.csv")


#Correct precinct names to make them match between sorted df for isolated counties and gdf for isolated counties
gdf_sorted["prec"] = gdf_sorted["enr_desc"].str.upper()
gdf_sorted.loc[(gdf_sorted["county_nam"]=="BUNCOMBE")|(gdf_sorted["county_nam"]=="DURHAM"),"prec"] = gdf_sorted["prec_id"]
gdf_sorted.loc[(gdf_sorted["county_nam"]=="NORTHAMPTON")&(gdf_sorted["enr_desc"]=="GARYSBURG/PLEASA_PLEASANT HILL"),"prec"] = "GARYSBURG/PLEASANT HILL"
gdf_sorted.loc[(gdf_sorted["county_nam"]=="ORANGE")&(gdf_sorted["enr_desc"]=="HillsboroughEast"),"prec"] = "HILLSBOROUGH EAST"
gdf_sorted.loc[gdf_sorted["county_nam"]=="MECKLENBURG", "prec"] = "PCT "+gdf_sorted["enr_desc"].str.zfill(3)
gdf_sorted.loc[(gdf_sorted["county_nam"]=="MECKLENBURG")&(gdf_sorted["enr_desc"].str.endswith(".1")), "prec"] = "PCT "+gdf_sorted["enr_desc"].str.slice(stop=-2)
gdf_sorted.loc[(gdf_sorted["county_nam"]=="AVERY")&(gdf_sorted["prec_id"]=="14"), "prec"] = "NEWLAND 1"
gdf_sorted.loc[(gdf_sorted["county_nam"]=="AVERY")&(gdf_sorted["prec_id"]=="15"), "prec"] = "NEWLAND 2"
gdf_sorted.loc[(gdf_sorted["county_nam"]=="CASWELL")&(gdf_sorted["enr_desc"]=="YANCEYVILLE 2"), "prec"] = "YANCEYVILLE"
gdf_sorted.loc[(gdf_sorted["county_nam"]=="PITT")&(gdf_sorted["enr_desc"]=="GREENVILE  13A"), "prec"] = "GREENVILLE 13A"
gdf_sorted.loc[(gdf_sorted["county_nam"]=="JOHNSTON")&(gdf_sorted["prec_id"]=="PR25"), "prec"] = "WEST SELMA"
gdf_sorted.loc[gdf_sorted["county_nam"]=="WAKE", "prec"] = "PRECINCT "+gdf_sorted["enr_desc"].str.zfill(4)
gdf_sorted.loc[gdf_sorted["county_nam"]=="LEE", "prec"] = "PRECINCT "+gdf_sorted["enr_desc"]
gdf_sorted.loc[gdf_sorted["county_nam"]=="SCOTLAND", "prec"] = gdf_sorted["prec_id"].str.slice(stop=2)
df_sorted["prec"] = df_sorted["precinct_name"].str.upper()
df_sorted["prec"] = df_sorted["prec"].str.replace("#","")
df_sorted["prec"] = df_sorted["prec"].astype(str).str.replace("  ", " ")
gdf_sorted["prec"] = gdf_sorted["prec"].astype(str).str.replace("  ", " ")


#Re-merge for max match
enr_prec_merge = pd.merge(gdf_sorted, df_sorted.fillna(value=0), left_on=['county_nam','prec'], right_on=['county','prec'], how='outer', indicator=True)
enr_prec_merge[["county_nam", "enr_desc","prec_id", "county","precinct_name","prec","_merge"]][enr_prec_merge["_merge"]!="both"].to_csv("./enr_prec_merge.csv")


# Combine 

## Join unsorted and sorted results

In [10]:
#Isolate counties from unsorted df that do not need sorting
df_pivot.drop([''], axis=1, inplace=True)
df_unsorted = df_pivot[~df_pivot['County'].isin(sorted_counties)]
df_sorted["County"] = df_sorted["county"]
countyfp_dict = pd.Series(sorted_prec_pivot['COUNTYFP'].values, index=sorted_prec_pivot["county"]).to_dict()
#common columns are election columns and UNIQUE_ID


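#Put the columns of each frame in a stable, sorted order (pd.concat aligns on column names regardless)
df_sorted_no_dup_columns = df_unsorted[sorted(df_unsorted.columns)]
df_unsorted = df_unsorted[sorted(df_unsorted.columns)]
df_sorted = df_sorted[sorted(df_sorted.columns)]
#df[set(df.columns)] would return the same columns in arbitrary order, and newer pandas versions no longer accept a set as an indexer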

dff = pd.concat([df_unsorted, df_sorted], ignore_index=True)#, keys=['County', 'UNIQUE_ID'])
dff["COUNTYFP"] = dff["County"].map(countyfp_dict)
dff.shape



(2655, 365)

In [11]:
df_sorted["County"].unique()

array(['ALLEGHANY', 'AVERY', 'BEAUFORT', 'BERTIE', 'BLADEN', 'BUNCOMBE',
       'CABARRUS', 'CALDWELL', 'CAMDEN', 'CASWELL', 'CURRITUCK', 'DARE',
       'DAVIDSON', 'DAVIE', 'DURHAM', 'EDGECOMBE', 'GUILFORD', 'HALIFAX',
       'HAYWOOD', 'HENDERSON', 'HOKE', 'HYDE', 'JOHNSTON', 'JONES', 'LEE',
       'MACON', 'MARTIN', 'MECKLENBURG', 'MOORE', 'NASH', 'NEW HANOVER',
       'NORTHAMPTON', 'ORANGE', 'PERQUIMANS', 'PITT', 'POLK', 'RICHMOND',
       'SCOTLAND', 'STOKES', 'SURRY', 'TYRRELL', 'WAKE', 'WASHINGTON',
       'WATAUGA', 'WILKES', 'YADKIN'], dtype=object)

In [12]:
df_unsorted["County"].unique()

array(['ALAMANCE', 'ALEXANDER', 'ANSON', 'ASHE', 'BRUNSWICK', 'BURKE',
       'CARTERET', 'CATAWBA', 'CHATHAM', 'CHEROKEE', 'CHOWAN', 'CLAY',
       'CLEVELAND', 'COLUMBUS', 'CRAVEN', 'CUMBERLAND', 'DUPLIN',
       'FORSYTH', 'FRANKLIN', 'GASTON', 'GATES', 'GRAHAM', 'GRANVILLE',
       'GREENE', 'HARNETT', 'HERTFORD', 'IREDELL', 'JACKSON', 'LENOIR',
       'LINCOLN', 'MADISON', 'MCDOWELL', 'MITCHELL', 'MONTGOMERY',
       'ONSLOW', 'PAMLICO', 'PASQUOTANK', 'PENDER', 'PERSON', 'RANDOLPH',
       'ROBESON', 'ROCKINGHAM', 'ROWAN', 'RUTHERFORD', 'SAMPSON',
       'STANLY', 'SWAIN', 'TRANSYLVANIA', 'UNION', 'VANCE', 'WARREN',
       'WAYNE', 'WILSON', 'YANCEY'], dtype=object)

## Join combined results with updated gdf

In [13]:
gdf_reform = gp.read_file("./raw-from-source/full_nc22_sn/full_nc22.shp")


#Correct precinct names to make them match between sorted df for isolated counties and gdf for isolated counties
gdf_reform["prec"] = gdf_reform["prec_id"].str.upper()
gdf_reform.loc[gdf_reform["county_nam"].isin(sorted_counties),"prec"] = gdf_reform["enr_desc"].str.upper()
gdf_reform.loc[(gdf_reform["county_nam"]=="BUNCOMBE")|(gdf_reform["county_nam"]=="DURHAM"),"prec"] = gdf_reform["prec_id"]
gdf_reform.loc[(gdf_reform["county_nam"]=="NORTHAMPTON")&(gdf_reform["enr_desc"]=="GARYSBURG/PLEASA_PLEASANT HILL"),"prec"] = "GARYSBURG/PLEASANT HILL"
gdf_reform.loc[(gdf_reform["county_nam"]=="ORANGE")&(gdf_reform["enr_desc"]=="HillsboroughEast"),"prec"] = "HILLSBOROUGH EAST"
gdf_reform.loc[gdf_reform["county_nam"]=="MECKLENBURG", "prec"] = "PCT "+gdf_reform["enr_desc"].str.zfill(3)
gdf_reform.loc[(gdf_reform["county_nam"]=="MECKLENBURG")&(gdf_reform["enr_desc"].str.endswith(".1")), "prec"] = "PCT "+gdf_reform["enr_desc"].str.slice(stop=-2)
gdf_reform.loc[(gdf_reform["county_nam"]=="AVERY")&(gdf_reform["prec_id"]=="14"), "prec"] = "NEWLAND 1"
gdf_reform.loc[(gdf_reform["county_nam"]=="AVERY")&(gdf_reform["prec_id"]=="15"), "prec"] = "NEWLAND 2"
gdf_reform.loc[(gdf_reform["county_nam"]=="CASWELL")&(gdf_reform["enr_desc"]=="YANCEYVILLE 2"), "prec"] = "YANCEYVILLE"
gdf_reform.loc[(gdf_reform["county_nam"]=="PITT")&(gdf_reform["enr_desc"]=="GREENVILE  13A"), "prec"] = "GREENVILLE 13A"
gdf_reform.loc[(gdf_reform["county_nam"]=="JOHNSTON")&(gdf_reform["prec_id"]=="PR25"), "prec"] = "WEST SELMA"
gdf_reform.loc[gdf_reform["county_nam"]=="WAKE", "prec"] = "PRECINCT "+gdf_reform["enr_desc"].str.zfill(4)
gdf_reform.loc[gdf_reform["county_nam"]=="LEE", "prec"] = "PRECINCT "+gdf_reform["enr_desc"]
gdf_reform.loc[gdf_reform["county_nam"]=="SCOTLAND", "prec"] = gdf_reform["prec_id"].str.slice(stop=2)
gdf_reform.loc[gdf_reform["enr_desc"]=="CV_CAROLINA VILLAGE", "prec"] = "CAROLINA VILLAGE"
df_sorted["prec"] = df_sorted["precinct_name"].str.upper()
#Remove "#" from the upper-cased names
df_sorted["prec"] = df_sorted["prec"].str.replace("#","")
df_sorted["prec"] = df_sorted["prec"].astype(str).str.replace("  ", " ")
gdf_reform["prec"] = gdf_reform["prec"].astype(str).str.replace("  ", " ")


dff.loc[dff["precinct_name"].isna(),"prec"] = dff['Precinct']
dff['UNIQUE_ID'] = dff["County"] + "---" + dff["prec"]
gdf_reform['UNIQUE_ID'] = gdf_reform["county_nam"] + "---" + gdf_reform["prec"]
merge = pd.merge(gdf_reform, dff.fillna(value=0), on=['UNIQUE_ID'], how='outer', indicator=True)
merge[merge["_merge"]!="both"]

Unnamed: 0,DIS_COL,prec_id,enr_desc,county_nam,geometry,prec_x,UNIQUE_ID,GIA11RSTA,GSL078DDAV,GSU02RPER,GSL106DCUN,GSL064DOSB,GSU30DJOH,GSU27RSES,GSL069RARP,GSL054RPET,GSL065DDON,GSU17RCAV,GSL103RBRA,GSL053RPEN,GSL073RECH,GSL021DLIU,GSL044DSMI,GSL094DHUB,GSU14LLAS,GSL014RCLE,GSU44RALE,GSL062DGRA,GSU06RLAZ,GSL111RMOO,GSL101DLOG,GCON01RSMI,GSU03RHAN,GSL087DKIR,GSL093DMAS,GIA09DSAL,GSU01RSAN,GSL007RWIN,GIA10RTYS,GSL074RZEN,GSU12DCHA,GIA11DJAC,GSU25DEWI,GSL054DREI,GSL115RBHA,GSL003RTYS,GSU46DMAR,GSL092RROB,GSU24DGIB,GSL047RLOW,...,GSL020RDAV,GSL034DLON,GSL095DKOT,GSL037RPAR,GSL042RCAR,GSL077RHOW,GCON12DADA,GSU10RSAW,GSU10DCOH,GSU37RSAW,GSL057DCLE,GSU15RPRI,GSU07DMOR,GSU42RRUS,GSL025LTAY,GSL028RSTR,GSL085RGRE,GSU26OWRI,GSL035LSER,GSL006RPIK,GSL025RCHE,GSL050DPRI,GSL048DPIE,GCON06DMAN,GSL075RLAM,GSL089RSET,GCON03RMUR,GSL069DCOU,GSL008DBRO,GSL021RFAL,GSL018DBUT,GSSC05DERV,GIA08RFLO,GSL112DCOT,GSL047DTOW,GSL026RWHI,GSL023DWIL,GSL097RSAI,GSU14DBLU,GSL115DPRA,GSU28DROB,GSL038DJON,GSL021LMOR,GCON11REDW,GSL104RPOM,precinct_name,COUNTYFP,prec_y,county,_merge
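
The empty table above means that every row matched on `UNIQUE_ID`. The sketch below shows the same pattern on toy data: normalize a precinct label with `str.zfill`, build the `county---precinct` key, then use an outer merge with `indicator=True` to list rows found on only one side. The frames `shapes` and `results` and their values are made up for illustration.

```python
import pandas as pd

# Toy shapefile attributes: Mecklenburg-style numeric precinct labels.
shapes = pd.DataFrame({
    "county_nam": ["MECKLENBURG", "MECKLENBURG"],
    "enr_desc": ["1", "12"],
})
# Pad to three digits and add the "PCT " prefix, as the notebook does.
shapes["prec"] = "PCT " + shapes["enr_desc"].str.zfill(3)
shapes["UNIQUE_ID"] = shapes["county_nam"] + "---" + shapes["prec"]

# Toy results: one matching precinct and one that has no shape.
results = pd.DataFrame({
    "UNIQUE_ID": ["MECKLENBURG---PCT 001", "MECKLENBURG---PCT 013"],
    "votes": [100, 200],
})

# The _merge column reports "both", "left_only" or "right_only" per row.
merged = pd.merge(shapes, results, on="UNIQUE_ID", how="outer", indicator=True)
unmatched = merged[merged["_merge"] != "both"]
```

Here `unmatched` holds `PCT 012` (shape without results) and `PCT 013` (results without a shape), which is the kind of mismatch the county-specific corrections in the cell above are meant to eliminate.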


## Format

In [14]:
merge['PRECINCT'] = merge['prec_x']
merge['COUNTYNM'] =  merge['County']
#Sort election columns alphabetically
election_cols = merge.columns[merge.columns.str.startswith("G")].sort_values()
merge[election_cols] = merge[election_cols].astype(int)
#Reorder: identifier columns first, then election columns, then geometry
merge = merge[['UNIQUE_ID','COUNTYFP', 'COUNTYNM','PRECINCT']+list(election_cols)+['geometry']]
#merge = merge.drop(columns=['G22USSNBEA','G22USSNBRA','G22USSNBUD','G22USSNHOH','GCON07NGRA','GCON07NROU','GSL017NILE','GSL017NTER','GSL019NMIL','GSU08NRAB','GSU26NROB'])
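
The column selection in this cell relies on `str.startswith("G")` being case-sensitive: the uppercase election codes are picked up, while the lowercase `geometry` column is not. A minimal sketch on a one-row toy frame (column names follow the notebook's pattern; the `GSU01RSAN` value is made up):

```python
import pandas as pd

# Toy frame with columns deliberately out of order.
df = pd.DataFrame({
    "GSU01RSAN": [3.0],
    "UNIQUE_ID": ["BURKE---0001"],
    "geometry": ["POLYGON (...)"],
    "G22USSDBEA": [437.0],
    "COUNTYFP": [23],
})

# Uppercase "G" selects the election columns only; "geometry" is skipped.
election_cols = df.columns[df.columns.str.startswith("G")].sort_values()
df[election_cols] = df[election_cols].astype(int)

# Identifier columns first, then sorted election columns, then geometry.
df = df[["UNIQUE_ID", "COUNTYFP"] + list(election_cols) + ["geometry"]]
```

Because digits sort before uppercase letters, `G22USSDBEA` lands ahead of `GSU01RSAN` in the final order.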

# Check vote totals

In [15]:
for col in merge.columns[merge.columns.str.startswith("G")]:
    if merge[col].sum() == 0:
        print(col)

In [16]:
statewide_totals_check(df_pivot, "df_unsorted", merge, "final df", election_cols)

***Statewide Totals Check***
G22USSDBEA has a difference of 18.0 votes
	df_unsorted: 1784049.0 votes
	final df: 1784031 votes
G22USSGHOH has a difference of -428.0 votes
	df_unsorted: 29934.0 votes
	final df: 30362 votes
G22USSLBRA has a difference of 6.0 votes
	df_unsorted: 51640.0 votes
	final df: 51634 votes
G22USSOWRI has a difference of 1.0 votes
	df_unsorted: 2515.0 votes
	final df: 2514 votes
G22USSRBUD has a difference of 201.0 votes
	df_unsorted: 1905786.0 votes
	final df: 1905585 votes
GCON01DDAV has a difference of 194.0 votes
	df_unsorted: 134996.0 votes
	final df: 134802 votes
GCON01RSMI has a difference of 25.0 votes
	df_unsorted: 122780.0 votes
	final df: 122755 votes
GCON02DROS has a difference of -8.0 votes
	df_unsorted: 190714.0 votes
	final df: 190722 votes
GCON02RVIL has a difference of -13.0 votes
	df_unsorted: 104155.0 votes
	final df: 104168 votes
GCON03DGAS has a difference of -29.0 votes
	df_unsorted: 82378.0 votes
	final df: 82407 votes
GCON03RMUR has a differ
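
`statewide_totals_check` is defined elsewhere and its implementation is not shown here. As a rough sketch of what such a comparison might do, the hypothetical helper below sums each column in both frames and reports the non-zero differences as source minus final, which is the sign convention the printed output above appears to follow. All data in the example is made up.

```python
import pandas as pd

def compare_statewide_totals(source_df, final_df, columns):
    """Return {column: source_total - final_total} for columns whose totals differ."""
    diffs = {}
    for col in columns:
        diff = int(source_df[col].sum()) - int(final_df[col].sum())
        if diff != 0:
            diffs[col] = diff
    return diffs

# Toy data: column "A" differs by 3 votes, column "B" agrees.
source = pd.DataFrame({"A": [10, 20], "B": [5, 5]})
final = pd.DataFrame({"A": [10, 17], "B": [4, 6]})
diffs = compare_statewide_totals(source, final, ["A", "B"])
```

A positive value means the source frame has more votes than the final frame, a negative value the reverse.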

In [17]:
df_pivot["COUNTYNM"] = df_pivot["County"]
county_totals_check(df_pivot, "df_unsorted", merge, "final df", election_cols,"COUNTYNM",full_print=False)

***Countywide Totals Check***

G22USSDBEA contains differences in these counties:
	ALLEGHANY has a difference of -6.0 votes
		df_unsorted: 1249.0 votes
		final df: 1255 votes
	BEAUFORT has a difference of -9.0 votes
		df_unsorted: 6245.0 votes
		final df: 6254 votes
	BERTIE has a difference of -1.0 votes
		df_unsorted: 3492.0 votes
		final df: 3493 votes
	BUNCOMBE has a difference of -26.0 votes
		df_unsorted: 73807.0 votes
		final df: 73833 votes
	CABARRUS has a difference of 9.0 votes
		df_unsorted: 32372.0 votes
		final df: 32363 votes
	CALDWELL has a difference of -17.0 votes
		df_unsorted: 6404.0 votes
		final df: 6421 votes
	CASWELL has a difference of -10.0 votes
		df_unsorted: 3121.0 votes
		final df: 3131 votes
	CURRITUCK has a difference of -15.0 votes
		df_unsorted: 2739.0 votes
		final df: 2754 votes
	DARE has a difference of -7.0 votes
		df_unsorted: 7097.0 votes
		final df: 7104 votes
	DAVIDSON has a difference of -36.0 votes
		df_unsorted: 14643.0 votes
		final df: 14679
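
The county-level differences listed above fall in counties that use the precinct-sorted data, so they likely reflect the statistical noise described in the documentation quoted at the top rather than a merge error. `county_totals_check` itself is not shown; the sketch below is a hypothetical equivalent that groups both frames by county and keeps the counties whose totals differ. County names and vote counts are made up.

```python
import pandas as pd

def compare_county_totals(source_df, final_df, column, county_col="COUNTYNM"):
    """Return a Series of (source - final) totals for counties that differ."""
    source = source_df.groupby(county_col)[column].sum()
    final = final_df.groupby(county_col)[column].sum()
    # fill_value=0 treats a county missing from one side as zero votes.
    diff = source.sub(final, fill_value=0)
    return diff[diff != 0]

# Toy data: county "A" differs by -6 votes, county "B" agrees.
source = pd.DataFrame({"COUNTYNM": ["A", "B"], "votes": [100, 50]})
final = pd.DataFrame({"COUNTYNM": ["A", "A", "B"], "votes": [60, 46, 50]})
county_diffs = compare_county_totals(source, final, "votes")
```

Running the same function over each entry of `election_cols` would give a per-race list of counties to review.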

In [18]:
merge['COUNTYFP'].unique()

array([ 23, 167, 119, 179, 185, 199, 183,  25, 165,  21, 147,  65, 191,
        67, 155,  93,  11, 121, 189,  35, 159,  63, 171,   5, 169,  69,
        59, 111,   1, 153,  71,   9,  49, 161,  57, 193, 177,  37, 137,
        41,  73, 105,  91,  31,  87, 151,  51, 173,  61, 145,  33,  39,
         7,  77,  89,  79, 109,  13, 163,  55,   3, 175,  97, 133,  99,
       113, 143, 125,  45, 115, 123,  95, 197,  43,  83,  15, 135,  19,
        53, 181, 107, 117, 129, 141, 149,  81,  29, 131, 157, 139,  75,
       187, 127,  47, 103,  17,  27,  85, 101, 195])
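
The array above holds `COUNTYFP` as plain integers. County FIPS codes are conventionally written as three-digit, zero-padded strings (for example `023` rather than `23`). Whether the final file should store them that way is an assumption, not something the notebook states; if it should, a minimal sketch of the conversion is:

```python
import pandas as pd

# Three of the integer codes shown in the array above.
countyfp = pd.Series([23, 167, 1])

# Convert to string and left-pad with zeros to three characters.
countyfp_str = countyfp.astype(int).astype(str).str.zfill(3)
```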

# File Creation

In [19]:
#os.mkdir("./nc_2022_gen_prec_draft/")
merge.to_file("./nc_2022_gen_prec/nc_2022_gen_prec.shp")
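
Two things can be checked before writing the shapefile. First, `to_file` writes into `./nc_2022_gen_prec/`, while the commented-out `os.mkdir` names a different draft directory, so the write would likely fail if the target directory does not already exist. Second, the shapefile's attribute table stores field names of at most 10 characters; codes such as `G22USSDBEA` are exactly 10. The helpers below are hypothetical and use a temporary directory in place of the real output path.

```python
import os
import tempfile

def long_field_names(columns, limit=10):
    """Return column names longer than the shapefile field-name limit."""
    return [c for c in columns if len(c) > limit]

def ensure_output_dir(path):
    """Create the output directory if it does not exist and return its path."""
    os.makedirs(path, exist_ok=True)
    return path

# Attribute columns as named in the notebook (geometry is stored separately).
columns = ["UNIQUE_ID", "COUNTYFP", "COUNTYNM", "PRECINCT", "G22USSDBEA"]
too_long = long_field_names(columns)

# A temporary directory stands in for "./nc_2022_gen_prec/".
with tempfile.TemporaryDirectory() as tmp:
    out_dir = ensure_output_dir(os.path.join(tmp, "nc_2022_gen_prec"))
    created = os.path.isdir(out_dir)
```

An empty `too_long` list means no column name would need to be shortened on export.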

In [20]:
def create_fields_table(race_field_header_0, fields_dict_0,
                        add_race_field_header_1 = '', fields_dict_1 = None,
                        add_race_field_header_2 = '', fields_dict_2 = None,
                        add_race_field_header_3 = '', fields_dict_3 = None):
    '''Purpose: Create fields table used in readme based on field dictionary created separately
    Arguments:
        race_field_header_0: include asterisks "***text***" and label first set of fields
        fields_dict_0: the default dictionary for the primary file (statewide)
        add_race_field_header_1: include asterisks to draw attention to section - ex: "***additional_race_file_name_fields***"
        fields_dict_1: additional fields to go under add_race_field_header_1 header
        add_race_field_header_2 and _3: same use as add_race_field_header_1 - include as needed
        fields_dict_2 and _3: same use as fields_dict_1 - include as needed associated with corresponding add_race_field_header section
    '''
    #Default the optional dictionaries to empty dicts (avoids mutable default arguments)
    fields_dict_1 = fields_dict_1 or {}
    fields_dict_2 = fields_dict_2 or {}
    fields_dict_3 = fields_dict_3 or {}
    fields_table_data = {'Field Name': ['',race_field_header_0]  + list(fields_dict_0.keys()) +
                         ['',add_race_field_header_1] + list(fields_dict_1.keys()) +
                         ['',add_race_field_header_2] + list(fields_dict_2.keys()) +
                         ['',add_race_field_header_3] + list(fields_dict_3.keys()),
                         'Description': ['',''] + list(fields_dict_0.values()) +
                         ['',''] + list(fields_dict_1.values()) +
                         ['',''] + list(fields_dict_2.values()) +
                         ['',''] + list(fields_dict_3.values())}
    fields_table = pd.DataFrame(fields_table_data)
    return fields_table


#create_fields_table('Field Name: ', readme_column_dict)
#Column dictionary for readme
#dfcol#["new_col"]
#readme_column_dict = pd.Series(dfcol["og_column"].str.replace("_", " ").values, index = dfcol["new_col"]).to_dict()

unsorted_df_column_name_dict["UNIQUE_ID"] = "Unique precinct-identifier combining county and precinct names"
unsorted_df_column_name_dict["COUNTYFP"] = "State County FIPS code"
unsorted_df_column_name_dict["COUNTYNM"] = "County name"
unsorted_df_column_name_dict["PRECINCT"] = "County precinct identifier"
unsorted_df_column_name_dict["geometry"] = "geospatial geometry"
fields_table = create_fields_table('', unsorted_df_column_name_dict).sort_values(["Field Name"])

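#Left-align both columns, padding each to the length of its longest entry
name_width = fields_table['Field Name'].str.len().max()
desc_width = fields_table['Description'].str.len().max()
readme_fields = fields_table.to_string(
    formatters={'Description': '{{:<{}s}}'.format(desc_width).format,
                'Field Name': '{{:<{}s}}'.format(name_width).format},
    index=False)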

print(readme_fields)

Field Name                                                            Description
                                                                                 
                                                                                 
                                                                                 
                                                                                 
                                                                                 
                                                                                 
                                                                                 
                                                                                 
COUNTYFP   State County FIPS code                                                
COUNTYNM   County name                                                           
G22USSDBEA Cheri Beasley, US SENATE                                              
G22USSGHOH Matth
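
The formatter expression passed to `to_string` in this cell is compact and can be hard to read. The doubled braces `{{` and `}}` survive the first `.format` call as literal braces, so `'{{:<{}s}}'.format(12)` produces the template `'{:<12s}'`; taking `.format` of that template without calling it yields a function that left-aligns any string to that width. A small standalone sketch (the width of 12 is arbitrary):

```python
width = 12

# First .format fills in the width and unescapes the braces.
template = "{{:<{}s}}".format(width)

# The bound .format method acts as a one-argument formatting function.
left_align = template.format

padded = left_align("COUNTYFP")
```

The eight blank rows at the top of the printed table come from the empty header entries that `create_fields_table` adds for each of its four sections; because they are empty strings, `sort_values` places them first.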

In [21]:
merge

Unnamed: 0,UNIQUE_ID,COUNTYFP,COUNTYNM,PRECINCT,G22USSDBEA,G22USSGHOH,G22USSLBRA,G22USSOWRI,G22USSRBUD,GCON01DDAV,GCON01RSMI,GCON02DROS,GCON02RVIL,GCON03DGAS,GCON03RMUR,GCON04DFOU,GCON04RGEE,GCON05DPAR,GCON05RFOX,GCON06DMAN,GCON06LWAT,GCON06RCAS,GCON07DGRA,GCON07RROU,GCON08DHUF,GCON08RBIS,GCON09DCLA,GCON09RHUD,GCON10DGEN,GCON10OWRI,GCON10RMCH,GCON11DBEA,GCON11LCOA,GCON11REDW,GCON12DADA,GCON12RLEE,GCON13DNIC,GCON13RHIN,GCON14DJAC,GCON14RHAR,GIA08DTHO,GIA08RFLO,GIA09DSAL,GIA09RSTR,GIA10DADA,GIA10RTYS,GIA11DJAC,GIA11RSTA,GSL001RGOO,GSL002DJEF,...,GSU22LUBI,GSU22RCOL,GSU23DMEY,GSU23RWOO,GSU24DGIB,GSU24RBRI,GSU25DEWI,GSU25RGAL,GSU26OWRI,GSU26RBER,GSU27DGAR,GSU27RSES,GSU28DROB,GSU28RSCH,GSU29DCRU,GSU29RCRA,GSU30DJOH,GSU30RJAR,GSU31RKRA,GSU32DLOW,GSU32RWAR,GSU33DHOR,GSU33RFOR,GSU34DSAN,GSU34RNEW,GSU35RJOH,GSU36RSET,GSU37RSAW,GSU38DMOH,GSU39DSAL,GSU39RROB,GSU40DWAD,GSU40RSHI,GSU41DMAR,GSU41RLEO,GSU42DHUN,GSU42RRUS,GSU43ROVE,GSU44RALE,GSU45RPRO,GSU46DMAR,GSU46RDAN,GSU47RHIS,GSU48DCAR,GSU48RMOF,GSU49DMAY,GSU49RAND,GSU50DMCC,GSU50RCOR,geometry
0,BURKE---0001,23,BURKE,0001,437,14,21,0,1081,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,439,2,1108,0,0,0,0,0,0,0,0,0,438,1095,427,1111,435,1099,445,1093,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,424,1121,0,0,0,0,0,0,0,"POLYGON ((1233595.464 737538.312, 1233589.172 ..."
1,STANLY---0003,167,STANLY,0003,503,14,16,2,646,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,494,680,0,0,0,0,0,0,0,0,0,0,0,0,0,0,498,672,478,689,495,672,498,669,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,492,683,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"POLYGON ((1644857.853 584760.831, 1644768.728 ..."
2,BURKE---0003,23,BURKE,0003,129,4,8,0,492,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,134,1,498,0,0,0,0,0,0,0,0,0,135,491,129,499,131,496,132,492,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,126,502,0,0,0,0,0,0,0,"POLYGON ((1220715.101 726879.358, 1220723.026 ..."
3,STANLY---0007,167,STANLY,0007,379,13,21,0,709,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,381,739,0,0,0,0,0,0,0,0,0,0,0,0,0,0,396,723,359,762,386,730,380,738,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,383,733,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"POLYGON ((1650578.509 584607.573, 1650484.945 ..."
4,STANLY---0008,167,STANLY,0008,478,5,11,0,79,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,479,89,0,0,0,0,0,0,0,0,0,0,0,0,0,0,483,82,479,88,483,84,485,82,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,483,84,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"POLYGON ((1646657.855 579294.538, 1646677.799 ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2650,ORANGE---WESTWOOD,135,ORANGE,WESTWOOD,1358,10,9,0,97,0,0,0,0,0,0,1353,113,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1345,116,1302,157,1347,111,1351,109,0,0,...,0,0,1353,108,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"POLYGON ((1980358.648 779536.674, 1980323.701 ..."
2651,ALEXANDER---W,3,ALEXANDER,W,207,12,14,2,920,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,206,1,943,0,0,0,0,0,0,0,0,0,209,928,195,942,207,930,200,936,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1003,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"POLYGON ((1328877.683 764648.104, 1328959.697 ..."
2652,LENOIR---W,107,LENOIR,W,93,8,7,0,544,0,0,0,0,76,577,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,91,551,66,582,85,560,82,562,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"POLYGON ((2419504.837 485228.828, 2419439.358 ..."
2653,HAYWOOD---WAYNESVILLE WEST,87,HAYWOOD,WAYNESVILLE WEST,379,10,26,0,504,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,380,17,517,0,0,0,0,0,0,393,510,376,527,386,514,384,520,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,388,520,"POLYGON ((808739.181 660103.212, 808721.741 66..."
