MGGG says the shapefile contains the 2011 wards that were in place during the 2012 to 2016 elections.

I think they mean the 2012 and 2016 presidential elections.

The LTSB file also includes results from other elections, such as 2018, but MGGG appears to have removed these from their file.

In [1]:
import pandas as pd
import geopandas as gp

PGP OpenPrecincts links to MGGG WI GitHub:
https://openprecincts.org/wi/

https://github.com/mggg-states/WI-shapefiles

According to the MGGG WI folder description, MGGG retrieved the file from the Wisconsin Legislative Technology Services Bureau (LTSB) GIS open data portal:
https://data-ltsb.opendata.arcgis.com/datasets/2018-2012-election-data-with-2011-wards

The file was already cleaned by LTSB using their own methodology. We do not have access to that process, but the dataset description covers it in more detail.

This code only checks that the MGGG file matches the LTSB file.

In [2]:
ltsb = gp.read_file("./2018-2012_Election_Data_with_2011_Wards-shp/2018-2012_Election_Data_with_2011_Wards.shp")
mggg = gp.read_file("./WI_wards_12_16/WI_ltsb_corrected_final.shp")

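# Reproject both layers to the MGGG CRS so the geometries are comparable
proj = mggg.crs
ltsb = ltsb.to_crs(proj)
mggg = mggg.to_crs(proj)  # no-op: mggg is already in proj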

Check columns and headers to see if they match up

In [3]:
ltsbcol = ltsb.columns.tolist()
ltsb.head()

Unnamed: 0,OBJECTID,GEOID10,LABEL,MCD_NAME,CTV,WARD_ID,CNTY_NAME,CNTY_FIPS,COUSUBFP,MCD_FIPS,...,WSASCT12,WSSTOT12,WSSDEM12,WSSREP12,WSSREP212,WSSCON12,WSSIND12,WSSSCT12,WSSAME12,geometry
0,1,55001002750001,Adams - C 1,Adams,C,1,Adams,55001,275,5500100275,...,0,230,137,92,0,0,0,1,0,"POLYGON ((-9998455.508 5459185.977, -9998453.5..."
1,2,55001002750002,Adams - C 2,Adams,C,2,Adams,55001,275,5500100275,...,0,224,135,89,0,0,0,0,0,"POLYGON ((-9997184.128 5458891.224, -9997183.0..."
2,3,55001002750003,Adams - C 3,Adams,C,3,Adams,55001,275,5500100275,...,0,100,60,40,0,0,0,0,0,"POLYGON ((-9998460.739 5459468.521, -9998466.8..."
3,4,55001002750004,Adams - C 4,Adams,C,4,Adams,55001,275,5500100275,...,0,125,75,50,0,0,0,0,0,"POLYGON ((-9998455.508 5459185.977, -9998599.2..."
4,5,55001003000001,ADAMS - T 1,ADAMS,T,1,Adams,55001,300,5500100300,...,0,335,185,150,0,0,0,0,0,"MULTIPOLYGON (((-10007403.702 5461404.809, -10..."


In [4]:
mgggcol = mggg.columns.tolist() 
mggg.head()

Unnamed: 0,GEOID10,OBJECTID,NAME,ASM,SEN,CON,CNTY_NAME,PERSONS,WHITE,BLACK,...,WSASCT12,WSSTOT12,WSSDEM12,WSSREP12,WSSREP212,WSSCON12,WSSIND12,WSSSCT12,WSSAME12,geometry
0,55001002750001,1,Adams - C 1,41,14,3,Adams,661,620,17,...,0,230,137,92,0,0,0,1,0,"POLYGON ((-9998455.263 5459186.119, -9998453.2..."
1,55001002750002,2,Adams - C 2,41,14,3,Adams,652,599,6,...,0,224,135,89,0,0,0,0,0,"POLYGON ((-9997183.884 5458891.368, -9997182.7..."
2,55001002750003,3,Adams - C 3,41,14,3,Adams,288,278,6,...,0,100,60,40,0,0,0,0,0,"POLYGON ((-9998460.495 5459468.662, -9998466.6..."
3,55001002750004,4,Adams - C 4,41,14,3,Adams,366,350,2,...,0,125,75,50,0,0,0,0,0,"POLYGON ((-9998455.263 5459186.119, -9998598.9..."
4,55001003000001,5,ADAMS - T 1,41,14,3,Adams,902,847,2,...,0,335,185,150,0,0,0,0,0,"MULTIPOLYGON (((-10007403.458 5461404.936, -10..."


The LTSB file has 238 columns, while the MGGG file has 180.

Presumably MGGG dropped the columns they did not find relevant.

The column names that appear in both previews match.

Check that both files have the same number of rows

In [5]:
ltsb.shape

(6634, 238)

In [6]:
mggg.shape

(6634, 180)

In [7]:
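# Split the MGGG columns by whether they also appear in the LTSB file
validate = [i for i in mggg.columns if i in ltsb.columns]
missing = [i for i in mggg.columns if i not in ltsb.columns]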

In [8]:
len(validate)

168

Validate should find all 180 MGGG columns in the original LTSB file, but it found only 168.

Check which MGGG columns are missing from the LTSB file

In [9]:
missing

['NAME',
 'USSTOT14',
 'USSDEM14',
 'USSREP14',
 'USSIND14',
 'USSSCT14',
 'WAGTOT12',
 'WAGDEM12',
 'WAGDEM212',
 'WAGREP12',
 'WAGIND12',
 'WAGSCT12']

MGGG includes votes for US Senate in 2014 and Attorney General in 2012.

I checked whether these columns were named differently in the LTSB file; they do not appear to be included.
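One way to make the "named differently" check repeatable is to search the LTSB column names for anything that shares the office prefix or the year suffix of a missing column. This is only a sketch: `ltsb_columns` is a small hypothetical subset of names and `candidates` is an illustrative helper, not something from either file.

```python
# Hypothetical subset of LTSB column names, for illustration only.
ltsb_columns = ["GEOID10", "LABEL", "WAGTOT14", "WAGDEM14",
                "WSSTOT14", "USSTOT16", "PRETOT12"]
mggg_only = ["USSTOT14", "WAGTOT12"]

def candidates(name, columns):
    """Columns sharing the 3-letter office prefix or the 2-digit year suffix."""
    prefix, year = name[:3], name[-2:]
    return [c for c in columns if c.startswith(prefix) or c.endswith(year)]

for name in mggg_only:
    print(name, candidates(name, ltsb_columns))
```

With the real data, `ltsbcol` would take the place of `ltsb_columns` and `missing` the place of `mggg_only`.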

Check to see if MGGG NAME column matches up with LTSB LABEL column

In [10]:
validate = pd.merge(mggg, ltsb, on=['GEOID10'], how='left')
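A left merge silently keeps MGGG rows that have no LTSB partner, and duplicates rows if a key repeats. A hedged sketch of how that could be guarded against, using the `validate` and `indicator` options of `pd.merge` on toy frames (the GEOID10 values and vote counts below are made up):

```python
import pandas as pd

# Toy stand-ins for the two ward tables (hypothetical values).
left = pd.DataFrame({"GEOID10": ["a", "b", "c"], "PRETOT16": [1, 2, 3]})
right = pd.DataFrame({"GEOID10": ["a", "b", "d"], "PRETOT16": [1, 2, 4]})

# validate="one_to_one" raises if GEOID10 repeats on either side;
# indicator=True adds a _merge column showing where each row came from.
checked = pd.merge(left, right, on="GEOID10", how="left",
                   validate="one_to_one", indicator=True)

unmatched = checked.loc[checked["_merge"] == "left_only", "GEOID10"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would mean every MGGG ward found exactly one LTSB ward.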

The LABEL and NAME columns match, so the only additions in the MGGG file that are missing from the LTSB file are the USS14 and WAG12 results

In [11]:
validate[['LABEL', 'NAME']].head(10)

Unnamed: 0,LABEL,NAME
0,Adams - C 1,Adams - C 1
1,Adams - C 2,Adams - C 2
2,Adams - C 3,Adams - C 3
3,Adams - C 4,Adams - C 4
4,ADAMS - T 1,ADAMS - T 1
5,ADAMS - T 2,ADAMS - T 2
6,ADAMS - T 3,ADAMS - T 3
7,BIG FLATS - T 1,BIG FLATS - T 1
8,BIG FLATS - T 2,BIG FLATS - T 2
9,COLBURN - T 1,COLBURN - T 1
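The preview only shows ten rows, so a full comparison of the two columns would be a stronger check. A minimal sketch on a toy frame, where the third row is a deliberately made-up mismatch:

```python
import pandas as pd

# Toy merged frame; the last NAME value is a hypothetical mismatch.
merged_demo = pd.DataFrame({
    "LABEL": ["Adams - C 1", "ADAMS - T 1", "BIG FLATS - T 1"],
    "NAME":  ["Adams - C 1", "ADAMS - T 1", "Big Flats - T 1"],
})

# Keep only the rows where the two label columns disagree.
mismatch = merged_demo[merged_demo["LABEL"] != merged_demo["NAME"]]
print(len(mismatch), "of", len(merged_demo), "rows differ")
```

On the real data the same filter applied to `validate` should return zero rows if LABEL and NAME are truly identical.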


In [12]:
validate.columns.tolist()

['GEOID10',
 'OBJECTID_x',
 'NAME',
 'ASM_x',
 'SEN_x',
 'CON_x',
 'CNTY_NAME_x',
 'PERSONS_x',
 'WHITE_x',
 'BLACK_x',
 'HISPANIC_x',
 'ASIAN_x',
 'AMINDIAN_x',
 'PISLAND_x',
 'OTHER_x',
 'OTHERMLT_x',
 'PERSONS18_x',
 'WHITE18_x',
 'BLACK18_x',
 'HISPANIC18_x',
 'ASIAN18_x',
 'AMINDIAN18_x',
 'PISLAND18_x',
 'OTHER18_x',
 'OTHERMLT18_x',
 'CDATOT16_x',
 'CDADEM16_x',
 'CDADEM216_x',
 'CDAREP16_x',
 'CDAIND16_x',
 'CDASCT16_x',
 'PRETOT16_x',
 'PREDEM16_x',
 'PREREP16_x',
 'PREGRN16_x',
 'PRELIB16_x',
 'PRECON16_x',
 'PREIND16_x',
 'PREIND216_x',
 'PREIND316_x',
 'PREIND416_x',
 'PREIND516_x',
 'PREIND616_x',
 'PREIND716_x',
 'PREIND816_x',
 'PREIND916_x',
 'PREIND1016_x',
 'PREIND1116_x',
 'PRESCT16_x',
 'USHTOT16_x',
 'USHDEM16_x',
 'USHDEM216_x',
 'USHREP16_x',
 'USHGRN16_x',
 'USHLIB16_x',
 'USHIND16_x',
 'USHSCT16_x',
 'USSTOT16_x',
 'USSDEM16_x',
 'USSREP16_x',
 'USSREP216_x',
 'USSLIB16_x',
 'USSSCT16_x',
 'WSATOT16_x',
 'WSADEM16_x',
 'WSAREP16_x',
 'WSALIB16_x',
 'WSAIND16_x'

In [13]:
validate[['PRETOT16_x','PRETOT16_y']].head()

Unnamed: 0,PRETOT16_x,PRETOT16_y
0,258,258
1,241,241
2,113,113
3,136,136
4,414,414


In [14]:
validate['vote_diff_PRETOT16'] = validate.PRETOT16_x - validate.PRETOT16_y
validate[validate.vote_diff_PRETOT16.abs()<10].shape[0] / validate.shape[0]

1.0
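The same difference-and-threshold pattern is repeated for each column below, so it could be wrapped in a small helper. This is a sketch only: `match_rate` is a hypothetical name, and the default tolerance of 10 votes mirrors the `abs()<10` used in the cells. Note that a rate of 1.0 at that tolerance does not prove the columns are identical; `tol=1` tests for exact equality of integer counts.

```python
import pandas as pd

def match_rate(df, col, tol=10, left="_x", right="_y"):
    """Share of rows where |df[col+left] - df[col+right]| < tol."""
    diff = (df[col + left] - df[col + right]).abs()
    return float((diff < tol).mean())

# Toy data: two identical rows, one off by 3 votes, one off by 14.
demo = pd.DataFrame({"PRETOT16_x": [258, 241, 113, 136],
                     "PRETOT16_y": [258, 241, 110, 150]})

print(match_rate(demo, "PRETOT16"))         # within 10 votes
print(match_rate(demo, "PRETOT16", tol=1))  # exact matches only
```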

In [15]:
validate[['USSTOT16_x','USSTOT16_y']].head()

Unnamed: 0,USSTOT16_x,USSTOT16_y
0,245,245
1,239,239
2,107,107
3,133,133
4,404,404


In [16]:
validate['vote_diff_USSTOT16'] = validate.USSTOT16_x - validate.USSTOT16_y
validate[validate.vote_diff_USSTOT16.abs()<10].shape[0] / validate.shape[0]

1.0

In [17]:
validate[['PRETOT12_x','PRETOT12_y']].head()

Unnamed: 0,PRETOT12_x,PRETOT12_y
0,272,272
1,262,262
2,117,117
3,147,147
4,427,427


In [18]:
validate['vote_diff_PRETOT12'] = validate.PRETOT12_x - validate.PRETOT12_y
validate[validate.vote_diff_PRETOT12.abs()<10].shape[0] / validate.shape[0]

1.0

In [19]:
validate[['WSSTOT12_x','WSSTOT12_y']].head()

Unnamed: 0,WSSTOT12_x,WSSTOT12_y
0,230,230
1,224,224
2,100,100
3,125,125
4,335,335


In [20]:
validate['vote_diff_WSSTOT12'] = validate.WSSTOT12_x - validate.WSSTOT12_y
validate[validate.vote_diff_WSSTOT12.abs()<10].shape[0] / validate.shape[0]

1.0

In [21]:
validate[['WAGTOT14_x','WAGTOT14_y']].head(10)

Unnamed: 0,WAGTOT14_x,WAGTOT14_y
0,175,182
1,167,177
2,86,80
3,109,98
4,328,323
5,113,120
6,41,39
7,133,130
8,253,256
9,102,102


In [22]:
validate['vote_diff_WAGTOT14'] = validate.WAGTOT14_x - validate.WAGTOT14_y
validate[validate.vote_diff_WAGTOT14.abs()<10].shape[0] / validate.shape[0]

0.7188724751281278

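Decided to check the Wisconsin Attorney General results, because the 2012 Attorney General columns appear in MGGG but not LTSB. WAGTOT14 itself is present in both files, which is why it can be compared here.

The validation rate is about 72%.

Not sure why. The vote sums are slightly off too.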
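To dig into where the two files disagree, it may help to rank wards by the size of the difference and look at the net difference. A sketch using the first few WAGTOT14 pairs from the output above; the `GEOID10` labels here are placeholders, not real ward IDs.

```python
import pandas as pd

# Vote pairs copied from the preview above; GEOID10 values are placeholders.
demo = pd.DataFrame({
    "GEOID10": ["w1", "w2", "w3", "w4"],
    "WAGTOT14_x": [175, 167, 86, 102],   # MGGG
    "WAGTOT14_y": [182, 177, 80, 102],   # LTSB
})
demo["vote_diff"] = demo["WAGTOT14_x"] - demo["WAGTOT14_y"]

# Order rows by absolute difference, largest first.
order = demo["vote_diff"].abs().sort_values(ascending=False).index
worst = demo.reindex(order)

print(worst[["GEOID10", "vote_diff"]])
print("net difference:", demo["vote_diff"].sum())
```

If the large differences cluster within particular counties or municipalities, that would hint at a different disaggregation of reporting-unit totals rather than random noise.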

In [23]:
#get MGGG WAGTOT14 sum
validate['WAGTOT14_x'].sum()

2343440

In [24]:
#get LTSB WAGTOT14 sum
validate['WAGTOT14_y'].sum()

2350325

Took a look at the 2014 Attorney General vote totals from:
https://elections.wi.gov/elections-voting/results/2014/fall-general

It reports a vote total of 2,350,325, which matches the LTSB sum.

The MGGG WAGTOT14 values are less accurate; it is unclear what source they used.
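For scale, the gap between the MGGG sum and the official total can be worked out directly from the numbers above. The variable names are only for illustration.

```python
mggg_total = 2_343_440      # sum of WAGTOT14_x (MGGG), from cell 23
ltsb_total = 2_350_325      # sum of WAGTOT14_y (LTSB), from cell 24
official_total = 2_350_325  # statewide total from the results page

shortfall = official_total - mggg_total
print(shortfall, f"{shortfall / official_total:.2%}")
```

So MGGG is short by 6,885 votes statewide, roughly 0.29% of the total, even though about 28% of wards differ by 10 or more votes. That suggests most of the ward-level differences largely cancel out.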

Took a brief look at the Attorney General of Wisconsin elections; the office has a 4-year term:
https://en.wikipedia.org/wiki/Attorney_General_of_Wisconsin

It is not clear that there should be a WAGTOT12 column. The 2012 results page on the Wisconsin Elections Commission website gives no indication of an attorney general election that year.

https://elections.wi.gov/elections-voting/results/2012/fall-general

There was an attorney general election in 2010, but its vote total of 2,112,485 does not match what MGGG has either:
https://elections.wi.gov/sites/elections.wi.gov/files/2010%20Fall%20General%20Election%20Results%20Summary.pdf

In [25]:
validate['WAGTOT12'].sum()

2189676

The MGGG file includes a mistake from an older version of the LTSB file:

The shapefile has results for 2014 that use the LTSB code for US Senate (USS). 
This is a mistake that the LTSB has fixed in later versions of this shapefile. 
There was no US Senate election in Wisconsin in 2014. 
These are the results for the Wisconsin state senate. 
This is reflected in the descriptions of the variables.

Validate the 2014 columns across the two names (MGGG USS*14 against LTSB WSS*14)

In [26]:
validate[['USSTOT14','WSSTOT14']].head()

Unnamed: 0,USSTOT14,WSSTOT14
0,0,0
1,0,0
2,0,0
3,0,0
4,0,0


In [27]:
validate['vote_diff_WSSTOT14'] = validate.USSTOT14 - validate.WSSTOT14
validate[validate.vote_diff_WSSTOT14.abs()<10].shape[0] / validate.shape[0]

0.8460958697618329

In [28]:
validate['vote_diff_WSSDEM14'] = validate.USSDEM14 - validate.WSSDEM14
validate[validate.vote_diff_WSSDEM14.abs()<10].shape[0] / validate.shape[0]

0.9389508592101297

In [29]:
validate['vote_diff_WSSREP14'] = validate.USSREP14 - validate.WSSREP14
validate[validate.vote_diff_WSSREP14.abs()<10].shape[0] / validate.shape[0]

0.9223696110943623

In [30]:
validate['vote_diff_WSSIND14'] = validate.USSIND14 - validate.WSSIND14
validate[validate.vote_diff_WSSIND14.abs()<10].shape[0] / validate.shape[0]

0.9996985227615315

In [31]:
validate['vote_diff_WSSSCT14'] = validate.USSSCT14 - validate.WSSSCT14
validate[validate.vote_diff_WSSSCT14.abs()<10].shape[0] / validate.shape[0]

0.9995477841422973
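The five cells above differ only in the column suffix, so the renamed pairs could be checked in one pass. A sketch on made-up numbers; only the USS/WSS naming and the tolerance of 10 come from the notebook.

```python
import pandas as pd

suffixes = ["TOT", "DEM", "REP", "IND", "SCT"]

# Toy frame: every pair agrees on two rows and is off by 20 on the third.
demo = pd.DataFrame({
    **{f"USS{s}14": [0, 5, 20] for s in suffixes},   # MGGG naming
    **{f"WSS{s}14": [0, 5, 40] for s in suffixes},   # LTSB naming
})

# Share of rows within 10 votes, per column pair.
rates = {
    s: float(((demo[f"USS{s}14"] - demo[f"WSS{s}14"]).abs() < 10).mean())
    for s in suffixes
}
print(rates)
```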

Validate some population columns as well, as a double check

In [32]:
validate[['PERSONS_x','PERSONS_y']].head()

Unnamed: 0,PERSONS_x,PERSONS_y
0,661,661
1,652,652
2,288,288
3,366,366
4,902,902


In [33]:
validate['pop_diff_PERSONS'] = validate.PERSONS_x - validate.PERSONS_y
validate[validate.pop_diff_PERSONS.abs()<10].shape[0] / validate.shape[0]

1.0

In [34]:
validate[['HISPANIC18_x','HISPANIC18_y']].head()

Unnamed: 0,HISPANIC18_x,HISPANIC18_y
0,8,8
1,7,7
2,2,2
3,2,2
4,21,21


In [35]:
validate['pop_diff_HISPANIC18'] = validate.HISPANIC18_x - validate.HISPANIC18_y
validate[validate.pop_diff_HISPANIC18.abs()<10].shape[0] / validate.shape[0]

1.0
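Rather than spot-checking a handful of columns, every column that the merge suffixed with `_x` and `_y` could be compared for exact equality. A sketch on a toy frame with made-up ward IDs; on the real data the geometry pair would need to be excluded first, since it is not a plain numeric column.

```python
import pandas as pd

# Toy merged frame (ward IDs are placeholders).
merged_demo = pd.DataFrame({
    "GEOID10": ["w1", "w2"],
    "PERSONS_x": [661, 652], "PERSONS_y": [661, 652],
    "WAGTOT14_x": [175, 167], "WAGTOT14_y": [182, 177],
    "NAME": ["Adams - C 1", "Adams - C 2"],
})

# Base names of columns that exist with both suffixes.
shared = [c[:-2] for c in merged_demo.columns
          if c.endswith("_x") and c[:-2] + "_y" in merged_demo.columns]

# True where the MGGG and LTSB versions agree on every row.
exact = {c: bool((merged_demo[c + "_x"] == merged_demo[c + "_y"]).all())
         for c in shared}
print(exact)
```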