# Deriving "Precinct" level results for Wisconsin

### tl;dr: We'll show how to join Wisconsin election results to ward (precinct) shapefiles

> What we in Wisconsin call a ward is referred to as a precinct in some states or a voting district by the
Census Bureau. Wards do not constitute election districts from which municipal officials are elected, and
thus are not subject to the “one person, one vote” requirement which governs the formation of election
districts. Instead, wards are intended to serve as administrative subunits that are aggregated into election
districts of equal population. Cities, villages, and towns form municipal wards by combining whole
census blocks...Once established, wards serve as the building blocks used by the legislature, counties, and cities in redistricting their respective election districts. -- <cite>[Wisconsin Elections Commission](https://docs.legis.wisconsin.gov/misc/lrb/redistricting_information/guidelines_2020.pdf)</cite>

The [OpenElections Project](http://www.openelections.net) is compiling a set of standardized, precinct-level results for national- and state-level elections going back to 2000. It's a great project, and when finished, the dataset will be very useful for journalists, academics, campaigns, and armchair political scientists. 

The quality of the data OpenElections obtains varies greatly by state, and even within a state it can be an adventure. Sometimes there's a nice spreadsheet in one county, but in some extreme cases volunteers have to go to county clerks' offices, photograph the election results, and transcribe them by hand. 

Wisconsin is on the easier end. The Wisconsin Elections Commission and its predecessor, the Government Accountability Board, have been good about [providing statewide election results](http://elections.wi.gov/elections-voting/results) within a few weeks of an election, in machine-readable formats (usually Excel). They're not consistent about how those Excel files are formatted, but the files are not too hard to reason about on a year-by-year basis. 

Wards are the atomic base unit for most elections, and all voters in a ward get the same ballot. Wards often have their own polling places, but in some cases multiple wards vote at the same polling location. The districts built from wards are not in a strict hierarchy: a State Assembly district made up of 50 wards might well be split by a Congressional district, with 25 wards in one Congressional district and 25 in another. Each ward is always contained within a single county and a single municipality. 

Because wards are built from whole Census blocks, demographic information is available for each ward.

Wards may grow, or new wards may be created, because of annexations and other changes to municipal boundaries. However, once a ward is created at the start of a redistricting cycle it is never deleted, and ward ID numbers stay stable. The Legislative Technology Services Bureau (LTSB) publishes a map and shapefiles for the wards and gives each ward the equivalent of a FIPS code that can be used for a database/GIS join. The maps are published twice a year to reflect municipal boundary changes. For best accuracy, election results need to be paired with the matching shapefile. Using a 2016 shapefile with the 2012 election results will be off along municipal boundaries, though most wards will still look the same. 
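The LTSB ward IDs shown later in this notebook appear to be 14 characters long. For example, `55001002750001` is Ward 1 of the City of Adams. It seems to break down into three parts:

- a 5-digit county code (state `55` plus county `001`);
- a 5-digit municipality (county subdivision) code;
- a 4-character ward number, which can end in a letter, as in `001A`.

Here's a small sketch that splits an ID along those lines. The layout is inferred from the data in this notebook, not from an LTSB specification. `split_ward_geoid` and `ward_label` are hypothetical helpers.

```python
from typing import NamedTuple


class WardGeoid(NamedTuple):
    county_fips: str  # state + county, e.g. "55001"
    cousub_fips: str  # 5-digit municipality (county subdivision) code
    ward: str         # 4-character ward, zero-padded, may end in a letter


def split_ward_geoid(geoid: str) -> WardGeoid:
    """Split a 14-character Wisconsin ward GEOID (assumed layout: 5 + 5 + 4)."""
    if len(geoid) != 14 or not geoid.startswith("55"):
        raise ValueError(f"unexpected ward GEOID: {geoid!r}")
    return WardGeoid(geoid[:5], geoid[5:10], geoid[10:])


def ward_label(geoid: str) -> str:
    """Strip the zero padding from the ward part, e.g. '001A' -> '1A'."""
    return split_ward_geoid(geoid).ward.lstrip("0") or "0"


print(split_ward_geoid("55001002750001"))
print(ward_label("5508141000001A"))
```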

Unfortunately for users of Wisconsin election data, Wisconsin votes by ward but does not necessarily report by ward. Only cities with more than 35,000 people are required to report each ward individually. Smaller municipalities are permitted to combine the results of multiple wards into ["reporting units"](http://elections.wi.gov/sites/default/files/publication/65/ea_wards_districts_reporting_units_annexations_f_18339.pdf). All wards in a reporting unit must share the same districts. Because different elections cover different districts, the reporting units may vary from election to election, though they're typically the same. There are usually around 3,600 reporting units per election; by comparison, Wisconsin had 6,634 wards in the 2014 fall election. 
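The rule that every ward in a reporting unit shares the same districts can be checked mechanically, given ward-level district assignments. Here's a minimal pandas sketch. It assumes a hypothetical table with one row per ward and columns `reporting_unit`, `assembly`, and `congress`; the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical ward-level table: one row per ward, with the reporting unit it
# was reported in and the districts it belongs to.
wards = pd.DataFrame(
    {
        "reporting_unit": ["Wards 1-2", "Wards 1-2", "Ward 3", "Wards 4,6", "Wards 4,6"],
        "ward": ["1", "2", "3", "4", "6"],
        "assembly": [41, 41, 72, 41, 72],
        "congress": [3, 3, 3, 3, 3],
    }
)

# Count distinct districts per reporting unit. Any count above 1 means wards
# with different ballots were combined, which the reporting rules forbid.
district_counts = wards.groupby("reporting_unit")[["assembly", "congress"]].nunique()
violations = district_counts[(district_counts > 1).any(axis=1)]
print(violations)
```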

The "reporting units" do not get any sort of geographical identifier like a FIPS code. In the reported data collected by the Elections Commission, the reporting unit is an unstructured string. However, there is a prescribed format for [how clerks should name reporting units](http://elections.wi.gov/node/1298) that in theory contains all of the information necessary to decide what wards are included in a reporting unit. 

Sadly, reporting units are created by humans. Wisconsin has 1,927 different clerks who may be creating reporting-unit names. Few of them think about "how can a computer parse this?", and not all of them follow the Elections Commission's guidance exactly. 

The remainder of this notebook shows how we use a mix of code and hand-editing to create a reference file for users of OpenElections data. It lets them look up which wards (and which GEOIDs) are associated with a given row of the election results data.

We'll start with the preliminaries: bring in pandas and set some blogging-friendly display defaults.

In [1]:
```python
import pandas as pd
import re
pd.options.display.max_rows = 999
pd.options.display.max_columns = 999
```

Next, we'll bring in the reporting units used for the Fall 2016 elections. Reporting units from other elections could be merged in too. The end result would then be a superset covering multiple elections. That's OK: if a reporting unit has the same name in different elections, it covers the same wards.
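A minimal way to build that superset is to stack the reporting-unit files from each election and drop repeated (county, municipality, reporting unit) triples. The sketch below uses two small stand-in frames in place of `pd.read_excel(...)` results; the rows in `spring` are hypothetical.

```python
import pandas as pd

# Stand-ins for reporting-unit files from two elections (hypothetical rows).
fall = pd.DataFrame(
    {
        "County": ["Adams County", "Adams County"],
        "Muni": ["CITY OF ADAMS", "TOWN OF ADAMS"],
        "ReportingUnit": ["Ward 1-4", "Wards 1-3"],
    }
)
spring = pd.DataFrame(
    {
        "County": ["Adams County", "Adams County"],
        "Muni": ["CITY OF ADAMS", "CITY OF ADAMS"],
        "ReportingUnit": ["Ward 1-4", "Ward 5"],
    }
)

KEY = ["County", "Muni", "ReportingUnit"]

# Keep only the identifying columns (district columns differ between
# elections), stack the files, and keep one row per distinct reporting unit.
superset = (
    pd.concat([fall[KEY], spring[KEY]], ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)
print(superset)
```

Keying on the county and municipality as well as the name matters, since a name like "Ward 1" shows up in many municipalities.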

In [2]:
```python
reportingunits2016 = pd.read_excel("http://elections.wi.gov/sites/default/files/page/2016_general_election_reporting_units_xlsx_79857.xlsx")
```

In [3]:
```python
reportingunits2016.head()
```

| | County | Muni | ReportingUnit | CongressionalDistrict | AssemblyDistrict | SenateDistrict |
|---|---|---|---|---|---|---|
| 0 | Adams County | CITY OF ADAMS | Ward 1-4 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 |
| 1 | Adams County | CITY OF WISCONSIN DELLS | Wards 5,9 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 |
| 2 | Adams County | TOWN OF ADAMS | Wards 1-3 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 |
| 3 | Adams County | TOWN OF BIG FLATS | Ward 1-2 | Congressional - District 3 | Assembly - District 72 | State Senate - District 24 |
| 4 | Adams County | TOWN OF COLBURN | Ward 1 | Congressional - District 3 | Assembly - District 72 | State Senate - District 24 |


In [4]:
```python
len(reportingunits2016)
```

```
3638
```

We don't actually care about the different districts in use here (we'll rediscover those in the election results anyway). The good clerks of Adams County above have done a nice job of following the instructions, and it looks like these names might be relatively straightforward to parse with a regular expression. (You know what quote this calls for, of course.)

> Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. <cite>--jwz</cite>

It will turn out that there are too many exceptions to reasonably parse with a regex, so we'll take the coward's way out: we'll parse a bunch with a regex, and just fix the rest by hand. 

First, we'll use a regex to cut the problem down to size. We'll run it over each row and add two new columns to our DataFrame: the kind of match (`type`) and the matched ward text (`data`).

In [5]:
```python
def processReportingUnit(ward):
    # Four alternatives. Each one starts with a non-digit prefix ("Ward ",
    # "Wards ") and must run to the end of the string:
    #   hypen  - a range such as "1-4"
    #   single - one ward number such as "1"
    #   amp    - two wards joined by "&", such as "1 & 2"
    #   comma  - exactly two wards joined by ",", such as "5,9"
    # ("hypen" is spelled that way throughout the notebook.)
    x = re.search(r'((?:^\D+)(?P<hypen>(\d+)(?:\s*)-(?:\s*)(\d+$)))'
                  r'|((?:^\D+)(?P<single>(\d+$)))'
                  r'|((?:^\D+)(?P<amp>(\d+)(?:\s*)&(?:\s*)(\d+$)))'
                  r'|((?:^\D+)(?P<comma>(\d+)(?:\s*),(?:\s*)(\d+$)))', ward)
    if x is None:
        return {"type": "unmatched", "data": ward}
    # Exactly one named group is set: the alternative that matched.
    for kind in ("single", "hypen", "amp", "comma"):
        if x.group(kind):
            return {"type": kind, "data": x.group(kind)}
    raise ValueError("regex matched but no named group was set: %r" % ward)

processed = pd.concat(
    [reportingunits2016,
     reportingunits2016.ReportingUnit.apply(lambda s: pd.Series(processReportingUnit(s)))],
    axis=1)
print("done")
```

```
done
```
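`processReportingUnit` packs four patterns into one alternation, which is hard to read and to extend. A roughly equivalent sketch tries each pattern in order with `re.fullmatch`. `PATTERNS` and `classify_reporting_unit` are illustrative names, not part of the original notebook. One small difference: `$` in the original also accepts a trailing newline, while `fullmatch` does not.

```python
import re

# Ordered (label, pattern) pairs mirroring the alternation in processReportingUnit.
# "hypen" keeps the notebook's spelling so the labels still line up with lookup().
PATTERNS = [
    ("hypen", re.compile(r"\D+(\d+\s*-\s*\d+)")),
    ("single", re.compile(r"\D+(\d+)")),
    ("amp", re.compile(r"\D+(\d+\s*&\s*\d+)")),
    ("comma", re.compile(r"\D+(\d+\s*,\s*\d+)")),
]


def classify_reporting_unit(name):
    """Return (label, matched ward text), or ("unmatched", name) if nothing fits."""
    for label, pattern in PATTERNS:
        match = pattern.fullmatch(name)
        if match:
            return label, match.group(1)
    return "unmatched", name


print(classify_reporting_unit("Wards 5,9"))
```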


In [6]:
```python
processed.head()
```

| | County | Muni | ReportingUnit | CongressionalDistrict | AssemblyDistrict | SenateDistrict | data | type |
|---|---|---|---|---|---|---|---|---|
| 0 | Adams County | CITY OF ADAMS | Ward 1-4 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-4 | hypen |
| 1 | Adams County | CITY OF WISCONSIN DELLS | Wards 5,9 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 5,9 | comma |
| 2 | Adams County | TOWN OF ADAMS | Wards 1-3 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-3 | hypen |
| 3 | Adams County | TOWN OF BIG FLATS | Ward 1-2 | Congressional - District 3 | Assembly - District 72 | State Senate - District 24 | 1-2 | hypen |
| 4 | Adams County | TOWN OF COLBURN | Ward 1 | Congressional - District 3 | Assembly - District 72 | State Senate - District 24 | 1 | single |


Let's look at a couple of examples:

In [7]:
```python
processed[processed['type']=='hypen'].head(5)
```

| | County | Muni | ReportingUnit | CongressionalDistrict | AssemblyDistrict | SenateDistrict | data | type |
|---|---|---|---|---|---|---|---|---|
| 0 | Adams County | CITY OF ADAMS | Ward 1-4 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-4 | hypen |
| 2 | Adams County | TOWN OF ADAMS | Wards 1-3 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-3 | hypen |
| 3 | Adams County | TOWN OF BIG FLATS | Ward 1-2 | Congressional - District 3 | Assembly - District 72 | State Senate - District 24 | 1-2 | hypen |
| 5 | Adams County | TOWN OF DELL PRAIRIE | Ward 1-3 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-3 | hypen |
| 6 | Adams County | TOWN OF EASTON | Wards 1-2 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-2 | hypen |


In [8]:
```python
processed[processed['type']=='comma'].head(5)
```

| | County | Muni | ReportingUnit | CongressionalDistrict | AssemblyDistrict | SenateDistrict | data | type |
|---|---|---|---|---|---|---|---|---|
| 1 | Adams County | CITY OF WISCONSIN DELLS | Wards 5,9 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 5,9 | comma |
| 115 | Brown County | CITY OF DE PERE | Wards 9,18 | Congressional - District 8 | Assembly - District 2 | State Senate - District 1 | 9,18 | comma |
| 197 | Brown County | VILLAGE OF HOWARD | Wards 1,12 | Congressional - District 8 | Assembly - District 89 | State Senate - District 30 | 1,12 | comma |
| 426 | Crawford County | CITY OF PRAIRIE DU CHIEN | Wards 2,7 | Congressional - District 3 | Assembly - District 96 | State Senate - District 32 | 2,7 | comma |
| 616 | Dane County | CITY OF VERONA | Wards 1,5 | Congressional - District 2 | Assembly - District 79 | State Senate - District 27 | 1,5 | comma |


Overall, only 157 reporting units fit none of the patterns. That's not too bad. Let's check the counts and then look at a few of the leftovers:

In [9]:
```python
processed['type'].value_counts()
```

```
single       2198
hypen        1236
unmatched     157
comma          47
Name: type, dtype: int64
```

In [10]:
```python
processed[processed['type']=='unmatched'].head(5)
```

| | County | Muni | ReportingUnit | CongressionalDistrict | AssemblyDistrict | SenateDistrict | data | type |
|---|---|---|---|---|---|---|---|---|
| 173 | Brown County | TOWN OF LEDGEVIEW | Wards 1-3,8-10 | Congressional - District 8 | Assembly - District 88 | State Senate - District 30 | Wards 1-3,8-10 | unmatched |
| 198 | Brown County | VILLAGE OF HOWARD | Wards 2,8,11 | Congressional - District 8 | Assembly - District 4 | State Senate - District 2 | Wards 2,8,11 | unmatched |
| 199 | Brown County | VILLAGE OF HOWARD | Wards 3-4,6 | Congressional - District 8 | Assembly - District 89 | State Senate - District 30 | Wards 3-4,6 | unmatched |
| 202 | Brown County | VILLAGE OF HOWARD | Wards 9-10,18 | Congressional - District 8 | Assembly - District 4 | State Senate - District 2 | Wards 9-10,18 | unmatched |
| 205 | Brown County | VILLAGE OF PULASKI | Wards 1-3,6 | Congressional - District 8 | Assembly - District 6 | State Senate - District 2 | Wards 1-3,6 | unmatched |


Yeah, let's not try to figure all those variations out. Let's just make ourselves a nice dictionary to edit. 

In [11]:
```python
def convert(row):
    # Print a dictionary entry whose value is just the "Ward"/"Wards" prefix;
    # the ward numbers get filled in by hand in the next cell.
    x = re.search(r'(^\D+)', row)
    print("\"%s\": \"%s \"," % (row, x.group(1).strip()))

# A plain loop, since we only want the printed side effect.
for unit in processed[processed['type'] == 'unmatched'].ReportingUnit:
    convert(unit)
```

```
"Wards 1-3,8-10": "Wards ",
"Wards 2,8,11": "Wards ",
"Wards 3-4,6": "Wards ",
"Wards 9-10,18": "Wards ",
"Wards 1-3,6": "Wards ",
"Wards 7-8,10-12": "Wards ",
"Wards 5-6,10": "Wards ",
"Wards 7-9,14": "Wards ",
"Ward 3A": "Ward ",
"Ward 7A": "Ward ",
"Wards 1-4,6-7": "Wards ",
"Wards 1,9-10": "Wards ",
"Wards 2-3,5": "Wards ",
"Wards 4,6-8": "Wards ",
"Wards 1-3,6": "Wards ",
"Wards 15,18-19": "Wards ",
"Wards 1-4,9": "Wards ",
"Wards 3-4,12": "Wards ",
"Wards 1-2,4-5,7": "Wards ",
"Ward 1,3,5": "Ward ",
"Ward 2,4,6": "Ward ",
"Ward 1,3-6,15": "Ward ",
"Ward 7-10,12": "Ward ",
"Wards 14,16-17": "Wards ",
"Wards 1,5-6,11": "Wards ",
"Wards 2-4,12": "Wards ",
"Wards 1,3,5": "Wards ",
"Wards 7,12-13": "Wards ",
"Wards 1-4,7-11": "Wards ",
"Wards 1,3-7": "Wards ",
"Wards 1-6,22-24,29": "Wards ",
"Wards 7-10,18-21,25-27,30": "Wards ",
"Wards 11-17,28": "Wards ",
"Wards 1-3,7-8": "Wards ",
"Wards 4-6,9-12": "Wards ",
"Wards 13-16,20-22": "Wards ",
"Wards 17-19,30-32": "Wards ",
"Ward 9B": "
```


Let's paste this into a new cell and fill in the ward lists by hand. It doesn't take long.

In [12]:
```python
# Hand-expanded ward lists for the reporting units the regex couldn't parse.
# A few names repeat (they occur in more than one municipality); the repeats
# map to identical values, so the duplicate keys are harmless.
manual = {
    "Wards 1-3,8-10": "Wards 1,2,3,8,9,10",
    "Wards 2,8,11": "Wards 2,8,11",
    "Wards 3-4,6": "Wards 3,4,6",
    "Wards 9-10,18": "Wards 9,10,18",
    "Wards 1-3,6": "Wards 1,2,3,6",
    "Wards 7-8,10-12": "Wards 7,8,10,11,12",
    "Wards 5-6,10": "Wards 5,6,10",
    "Wards 7-9,14": "Wards 7,8,9,14",
    "Ward 3A": "Ward 3A",
    "Ward 7A": "Ward 7A",
    "Wards 1-4,6-7": "Wards 1,2,3,4,6,7",
    "Wards 1,9-10": "Wards 1,9,10",
    "Wards 2-3,5": "Wards 2,3,5",
    "Wards 4,6-8": "Wards 4,6,7,8",
    "Wards 1-3,6": "Wards 1,2,3,6",
    "Wards 15,18-19": "Wards 15,18,19",
    "Wards 1-4,9": "Wards 1,2,3,4,9",
    "Wards 3-4,12": "Wards 3,4,12",
    "Wards 1-2,4-5,7": "Wards 1,2,4,5,7",
    "Ward 1,3,5": "Ward 1,3,5",
    "Ward 2,4,6": "Ward 2,4,6",
    "Ward 1,3-6,15": "Ward 1,3,4,5,6,15",
    "Ward 7-10,12": "Ward 7,8,9,10,12",
    "Wards 14,16-17": "Wards 14,16,17",
    "Wards 1,5-6,11": "Wards 1,5,6,11",
    "Wards 2-4,12": "Wards 2,3,4,12",
    "Wards 1,3,5": "Wards 1,3,5",
    "Wards 7,12-13": "Wards 7,12,13",
    "Wards 1-4,7-11": "Wards 1,2,3,4,7,8,9,10,11",
    "Wards 1,3-7": "Wards 1,3,4,5,6,7",
    "Wards 1-6,22-24,29": "Wards 1,2,3,4,5,6,22,23,24,29",
    "Wards 7-10,18-21,25-27,30": "Wards 7,8,9,10,18,19,20,21,25,26,27,30",
    "Wards 11-17,28": "Wards 11,12,13,14,15,16,17,28",
    "Wards 1-3,7-8": "Wards 1,2,3,7,8",
    "Wards 4-6,9-12": "Wards 4,5,6,9,10,11,12",
    "Wards 13-16,20-22": "Wards 13,14,15,16,20,21,22",
    "Wards 17-19,30-32": "Wards 17,18,19,30,31,32",
    "Ward 9B": "Ward 9B",
    "Ward 1A": "Ward 1A",
    "Ward 2-3,5": "Ward 2,3,5",
    "Ward 1-5,10": "Ward 1,2,3,4,5,10",
    "Wards 1-3,8": "Wards 1,2,3,8",
    "Wards 5-6,9-10": "Wards 5,6,9,10",
    "Wards 1-6,8": "Wards 1,2,3,4,5,6,8",
    "Wards 3-4,22": "Wards 3,4,22",
    "Wards 17-18,21,23-26": "Wards 17,18,21,23,24,25,26",
    "Wards 19-20,27": "Wards 19,20,27",
    "Wards 12,20-21,24": "Wards 12,20,21,24",
    "Wards 1-2,6-7": "Wards 1,2,6,7",
    "Wards 1-2,4-5": "Wards 1,2,4,5",
    "Wards 1,3,5": "Wards 1,3,5",
    "Wards 2,4,6": "Wards 2,4,6",
    "Wards 1,3-5": "Wards 1,3,4,5",
    "Ward 15A": "Ward 15A",
    "Ward 15B": "Ward 15B",
    "Ward 22A": "Ward 22A",
    "Ward 22B": "Ward 22B",
    "Wards 2,8S": "Wards 2,8S",
    "Ward 11S": "Ward 11S",
    "Wards 1S,3S": "Wards 1S,3S",
    "Ward 1-2,5": "Ward 1,2,5",
    "Ward 3-4,6": "Ward 3,4,6",
    "Wards 13-18,20": "Wards 13,14,15,16,17,18,20",
    "Ward 5B": "Ward 5B",
    "Wards 12-16,18": "Wards 12,13,14,15,16,18",
    "Wards 17,19-20": "Wards 17,19,20",
    "Wards 1A-3A": "Wards 1A,2A,3A",
    "Wards 1B-3B": "Wards 1B,2B,3B",
    "Wards 4-6,9-10": "Wards 4,5,6,9,10",
    "Wards 1-3,5,7-8": "Wards 1,2,3,5,7,8",
    "Wards 1,4-5,14": "Wards 1,4,5,14",
    "Wards 2,6,8,12-13": "Wards 2,6,8,12,13",
    "Wards 3,9-11": "Wards 3,9,10,11",
    "Wards 5,7B": "Wards 5,7B",
    "Wards 6-7A": "Wards 6,7A",
    "Ward 5-6,10": "Ward 5,6,10",
    "Wards 1-2,5": "Wards 1,2,5",
    "Wards 1,6-7": "Wards 1,6,7",
    "Ward 2A": "Ward 2A",
    "Ward 2B": "Ward 2B",
    "Wards 35,40,43": "Wards 35,40,43",
    "Wards 36,38,41": "Wards 36,38,41",
    "Wards 1-2,4": "Wards 1,2,4",
    "Ward 3S": "Ward 3S",
    "Wards 9-10,12-13": "Wards 9,10,12,13",
    "Wards 11,14-15,17": "Wards 11,14,15,17",
    "Wards 16,18-19": "Wards 16,18,19",
    "Wards 1,17,20": "Wards 1,17,20",
    "Wards 2-4,11": "Wards 2,3,4,11",
    "Wards 9,13-14": "Wards 9,13,14",
    "Wards 10,12,15-16": "Wards 10,12,15,16",
    "Wards 19,21-22": "Wards 19,21,22",
    "Wards 1-4,6": "Wards 1,2,3,4,6",
    "Wards 1-3,13": "Wards 1,2,3,13",
    "Wards 4,6,14": "Wards 4,6,14",
    "Wards 5,7-9": "Wards 5,7,8,9",
    "Wards 1-2,9": "Wards 1,2,9",
    "Wards 1-4,15": "Wards 1,2,3,4,15",
    "Wards 1-2,7-9,11-14": "Wards 1,2,7,8,9,11,12,13,14",
    "Wards 3-6,10": "Wards 3,4,5,6,10",
    "Wards 1-4,6": "Wards 1,2,3,4,6",
    "Wards 6,9-10,15-17,20,23-25,28": "Wards 6,9,10,15,16,17,20,23,24,25,28",
    "Wards 11-14,21-22,26-27": "Wards 11,12,13,14,21,22,26,27",
    "Wards 9-10,32": "Wards 9,10,32",
    "Wards 11-14,28": "Wards 11,12,13,14,28",
    "Wards 23-24,26": "Wards 23,24,26",
    "Wards 1-2,8": "Wards 1,2,8",
    "Wards 1,8,10-11": "Wards 1,8,10,11",
    "Wards 2,5-7": "Wards 2,5,6,7",
    "Wards 3-4,9,16-17": "Wards 3,4,9,16,17",
    "Wards 1-5,7": "Wards 1,2,3,4,5,7",
    "Wards 1-3,13": "Wards 1,2,3,13",
    "Wards 1,3-4,10": "Wards 1,3,4,10",
    "Wards 2,5-8": "Wards 2,5,6,7,8",
    "Wards 1-2,5-6": "Wards 1,2,5,6",
    "Wards 1,3,5,9": "Wards 1,3,5,9",
    "Wards 2,4,10": "Wards 2,4,10",
    "Ward 1-3,7-9": "Ward 1,2,3,7,8,9",
    "Ward 4-6,10-11": "Ward 4,5,6,10,11",
    "Wards 1,7-8": "Wards 1,7,8",
    "Wards 2-3,9-11": "Wards 2,3,9,10,11",
    "Wards 1-2,4": "Wards 1,2,4",
    "Wards 3,6-7": "Wards 3,6,7",
    "Wards 5,8-9": "Wards 5,8,9",
    "Ward 1,8-11": "Ward 1,8,9,10,11",
    "Wards 7,9-11": "Wards 7,9,10,11",
    "Wards 3-4,8": "Wards 3,4,8",
    "Wards 1A-2,4,7": "Wards 1A,2,4,7",
    "Ward 1B": "Ward 1B",
    "Wards 3,14-15,30": "Wards 3,14,15,30",
    "Ward 5B": "Ward 5B",
    "Wards 5A-6,8-9,23-29,31-35,38": "Wards 5A,6,8,9,23,24,25,26,27,28,29,31,32,33,34,35,38",
    "Wards 10-13,21-22,36-37": "Wards 10,11,12,13,21,22,36,37",
    "Ward 22B": "Ward 22B",
    "Ward 22A": "Ward 22A",
    "Ward 23A": "Ward 23A",
    "Ward 23B": "Ward 23B",
    "Ward 25A": "Ward 25A",
    "Ward 25B": "Ward 25B",
    "Ward 28B": "Ward 28B",
    "Ward 28A": "Ward 28A",
    "Ward 29A": "Ward 29A",
    "Ward 29B": "Ward 29B",
    "Wards 1-2,7-10": "Wards 1,2,7,8,9,10",
    "Wards 1A,2-5": "Wards 1A,2,3,4,5",
    "Ward 1B": "Ward 1B",
    "Wards 1A-2": "Wards 1A,2",
    "Ward 1B": "Ward 1B",
    "Wards 1-2A": "Wards 1,2A",
    "Wards 2B,2C": "Wards 2B,2C",
    "Wards 1-2,4,7": "Wards 1,2,4,7",
    "Wards 3,5-6": "Wards 3,5,6",
    "Wards 6,17,25-26": "Wards 6,17,25,26",
    "Wards 7,16,27": "Wards 7,16,27",
    "Wards 8,19,22-23": "Wards 8,19,22,23",
    "Wards 6-15,24,26-29": "Wards 6,7,8,9,10,11,12,13,14,15,24,26,27,28,29",
    "Wards 16-23,25": "Wards 16,17,18,19,20,21,22,23,25",
}
```

Now we can run over the DataFrame and create a new `mapped` column. It expands hyphenated ranges and falls back to our hand-edited dictionary where necessary.

In [13]:
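```python
def lookup(row):
    if row['type'] == 'hypen':
        # Expand a range such as "1-4" into "Wards 1,2,3,4".
        search = re.search(r'(\d+)(?:\s*)-(?:\s*)(\d+$)', row['data'])
        if search is None:
            raise ValueError("unparseable range: %r" % row['data'])
        start, end = int(search.group(1)), int(search.group(2))
        return "Wards %s" % ",".join(str(x) for x in range(start, end + 1))
    elif row['type'] in ('comma', 'single'):
        # Already an explicit list of wards; keep the original name.
        return row['ReportingUnit']
    elif row['type'] == 'amp':
        # No "&" units turned up in the 2016 data, so this case is left unhandled.
        raise NotImplementedError("'&' reporting units are not handled: %r" % row['ReportingUnit'])
    elif row['type'] == 'unmatched':
        # Fall back to the hand-edited dictionary.
        return manual[row['ReportingUnit']]

processed['mapped'] = processed.apply(lookup, axis=1)
```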
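Hand-typed expansions are easy to get wrong. For entries that use only plain ward numbers, a small expander can regenerate the ward list and flag any disagreement with the `manual` dictionary. `expand_wards` and `check_manual` are hypothetical helpers, not part of the original workflow. Lettered ranges such as `1A-3A` are deliberately skipped because expanding them takes judgment.

```python
import re


def expand_wards(spec):
    """Expand a name such as "Wards 1-3,8-10" into ["1", "2", "3", "8", "9", "10"].

    Returns None when an item is a range involving lettered wards ("1A-3A"),
    since those still need a human decision.
    """
    _prefix, _, wards = spec.strip().partition(" ")
    result = []
    for item in wards.split(","):
        item = item.strip()
        if re.fullmatch(r"\d+\s*-\s*\d+", item):
            start, end = (int(n) for n in item.split("-"))
            result.extend(str(n) for n in range(start, end + 1))
        elif re.fullmatch(r"\d+[A-Z]?", item):
            # A single ward, possibly with a letter suffix such as "3A".
            result.append(item)
        else:
            return None
    return result


def check_manual(table):
    """Yield (key, expected, actual) for entries whose hand-typed value
    disagrees with expand_wards(key); entries it can't expand are skipped."""
    for key, value in table.items():
        expected = expand_wards(key)
        if expected is None:
            continue
        actual = [w.strip() for w in value.split(" ", 1)[1].split(",")]
        if actual != expected:
            yield key, expected, actual
```

Running `list(check_manual(manual))` should return an empty list once every entry is correct.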

In [14]:
```python
processed.head()
```

| | County | Muni | ReportingUnit | CongressionalDistrict | AssemblyDistrict | SenateDistrict | data | type | mapped |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Adams County | CITY OF ADAMS | Ward 1-4 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-4 | hypen | Wards 1,2,3,4 |
| 1 | Adams County | CITY OF WISCONSIN DELLS | Wards 5,9 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 5,9 | comma | Wards 5,9 |
| 2 | Adams County | TOWN OF ADAMS | Wards 1-3 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-3 | hypen | Wards 1,2,3 |
| 3 | Adams County | TOWN OF BIG FLATS | Ward 1-2 | Congressional - District 3 | Assembly - District 72 | State Senate - District 24 | 1-2 | hypen | Wards 1,2 |
| 4 | Adams County | TOWN OF COLBURN | Ward 1 | Congressional - District 3 | Assembly - District 72 | State Senate - District 24 | 1 | single | Ward 1 |


In [15]:
```python
processed[processed['ReportingUnit'] == 'Wards 1A-3A']
```

| | County | Muni | ReportingUnit | CongressionalDistrict | AssemblyDistrict | SenateDistrict | data | type | mapped |
|---|---|---|---|---|---|---|---|---|---|
| 2208 | Monroe County | TOWN OF LA GRANGE | Wards 1A-3A | Congressional - District 7 | Assembly - District 70 | State Senate - District 24 | Wards 1A-3A | unmatched | Wards 1A,2A,3A |


Now let's add some geographic data. The [LTSB Open Data Portal](http://data.ltsb.opendata.arcgis.com/) is based on ArcGIS and lets you download the ward data as a CSV file, a shapefile, or a KML file. To keep it simple, let's download the CSV version of the Fall 2016 wards.

In [16]:
```python
# also at http://data.ltsb.opendata.arcgis.com/datasets/6497103b939d41268a48905631f84de5_0.csv
ltsbwards = pd.read_csv("http://data.ltsb.opendata.arcgis.com/datasets/6497103b939d41268a48905631f84de5_0.csv")
ltsbwards.head()
```

| | OBJECTID | GEOID | CNTY_FIPS | COUSUBFP | WARDID | WARD_FIPS | SUPERID | SUPER_FIPS | ALDERID | ALDER_FIPS | CNTY_NAME | MCD_NAME | MCD_FIPS | CONTACT | DATE_SUB | CTV | NOTES | SCHOOLID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 55001002750001 | 55001 | 275 | 1 | 55001002750001 | 11 | 5500111 | 1.0 | 550010027501.0 | ADAMS | Adams | 5500100275 | afaust@ncwrpc.org | 7/6/2016 1:02:14 PM | C | | |
| 1 | 2 | 55001002750002 | 55001 | 275 | 2 | 55001002750002 | 12 | 5500112 | 2.0 | 550010027502.0 | ADAMS | Adams | 5500100275 | afaust@ncwrpc.org | 7/6/2016 1:02:14 PM | C | | |
| 2 | 3 | 55001002750003 | 55001 | 275 | 3 | 55001002750003 | 12 | 5500112 | 3.0 | 550010027503.0 | ADAMS | Adams | 5500100275 | afaust@ncwrpc.org | 7/6/2016 1:02:14 PM | C | | |
| 3 | 4 | 55001002750004 | 55001 | 275 | 4 | 55001002750004 | 11 | 5500111 | 4.0 | 550010027504.0 | ADAMS | Adams | 5500100275 | afaust@ncwrpc.org | 7/6/2016 1:02:14 PM | C | | |
| 4 | 5 | 55001003000001 | 55001 | 300 | 1 | 55001003000001 | 8 | 5500108 | | | ADAMS | ADAMS | 5500100300 | afaust@ncwrpc.org | 7/6/2016 1:02:14 PM | T | | |


Now we need to join the LTSB shapefile data with the GAB/OpenElex results data to bring GEOIDs to the reporting units. The join is on the (county, municipality) pair. Two counties might each have a municipality with the same name (and a city might even lie in two counties!), so we need to join on both. Alas, the county names are not identical in the two datasets, and there are quirks in the municipality names, so we need to build join keys that match across both datasets. 

We'll first fix up the "MCD" (a "minor civil division" in Census terms, but what we'd normally call a city, village, or town). Then we'll build a county key in the "ADAMS COUNTY" form, which we'll also derive from the GAB/CFIS county names.

We'll also carry along the FIPS code for each county + MCD pair; we'll use the ward numbers later to build out the full GEOID.

In [17]:
```python
def ltsb_reporting_mcd(row):
    # Rebuild the GAB-style municipality name ("CITY OF ADAMS") from the
    # LTSB type code (C = city, T = town, V = village) and the MCD name.
    prefixes = {'C': 'CITY OF ', 'T': 'TOWN OF ', 'V': 'VILLAGE OF '}
    ctv = prefixes.get(row['CTV'], "")
    if not ctv:
        print("Unexpected CTV %r for OBJECTID %s" % (row['CTV'], row['OBJECTID']))
    return "%s%s" % (ctv, row['MCD_NAME'].upper())

ltsbwards['JoinMcd'] = ltsbwards.apply(ltsb_reporting_mcd, axis=1)
```

In [18]:
```python
def partial_fips(row):
    # County FIPS (e.g. 55001) + municipality code zero-padded to 5 digits.
    return("%s%s" % (row['CNTY_FIPS'], '{0:05d}'.format(int(row['COUSUBFP']))))

ltsbwards['PARTIALFIPS'] = ltsbwards.apply(partial_fips, axis=1)
```
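`partial_fips` calls a Python function once per row. The same key can be built with pandas string methods over whole columns, which is typically faster on a file with thousands of wards. Here's a sketch on a small stand-in frame; the values are consistent with the outputs shown in this notebook.

```python
import pandas as pd

# A few rows shaped like the LTSB ward file.
wards = pd.DataFrame({"CNTY_FIPS": [55001, 55001, 55001], "COUSUBFP": [275, 300, 7300]})

# Vectorized equivalent of partial_fips(): county FIPS followed by the
# municipality code left-padded with zeros to five digits.
wards["PARTIALFIPS"] = (
    wards["CNTY_FIPS"].astype(str)
    + wards["COUSUBFP"].astype(int).astype(str).str.zfill(5)
)
print(wards["PARTIALFIPS"].tolist())
```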

In [19]:
```python
ltsb = ltsbwards[['CNTY_NAME', 'JoinMcd', 'PARTIALFIPS']].drop_duplicates()
```

In [20]:
```python
ltsb['JoinCounty'] = ltsb['CNTY_NAME'].astype(str) + " COUNTY"
```

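Now that we've finished processing the LTSB ward data, we need to build the same two join columns for the GAB reporting-unit data. This is mostly a matter of converting names to upper case and replacing the spaces in multi-word county names with underscores, plus a handful of one-off spelling fixes. 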

In [21]:
```python
def mapgabreportingcounty(cnty):
    # "Fond du Lac County" -> "FOND_DU_LAC COUNTY", matching the LTSB key.
    if cnty == "LaCrosse County":
        return "LA_CROSSE COUNTY"
    if cnty == "St. Croix County":
        return "ST_CROIX COUNTY"
    tokens = cnty.split()
    cntyname = "_".join(tokens[:-1])
    return "%s %s" % (cntyname.upper(), "COUNTY")

processed['JoinCounty'] = processed['County'].map(mapgabreportingcounty)
```

In [22]:
```python
def mapgabmcd(mcd):
    # One-off spelling differences between GAB and LTSB municipality names.
    if mcd == "TOWN OF GRAND VIEW":
        return 'TOWN OF GRANDVIEW'
    elif mcd == "VILLAGE OF Windsor":
        return 'VILLAGE OF WINDSOR'
    elif mcd == 'VILLAGE OF FONTANA':
        return 'VILLAGE OF FONTANA-ON-GENEVA LAKE'
    elif mcd == 'TOWN OF SAINT LAWRENCE':
        return 'TOWN OF ST. LAWRENCE'
    elif mcd == 'TOWN OF LAND O-LAKES':
        return 'TOWN OF LAND O\'LAKES'
    elif mcd == 'VILLAGE OF LAVALLE':
        return 'VILLAGE OF LA VALLE'
    elif mcd == 'VILLAGE OF Maine':
        return 'VILLAGE OF MAINE'
    elif mcd == 'VILLAGE OF MT. STERLING':
        return 'VILLAGE OF MOUNT STERLING'
    else:
        return mcd

processed['JoinMcd'] = processed['Muni'].map(mapgabmcd)
```
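The `elif` chain in `mapgabmcd` is a lookup table in disguise. Keeping the fixes in a dictionary turns each new exception into a one-line addition. Here's a sketch; `MCD_FIXES` and `map_gab_mcd` are illustrative names, and the entries are copied from `mapgabmcd`.

```python
# The one-off spelling fixes from mapgabmcd(), expressed as a lookup table.
MCD_FIXES = {
    "TOWN OF GRAND VIEW": "TOWN OF GRANDVIEW",
    "VILLAGE OF Windsor": "VILLAGE OF WINDSOR",
    "VILLAGE OF FONTANA": "VILLAGE OF FONTANA-ON-GENEVA LAKE",
    "TOWN OF SAINT LAWRENCE": "TOWN OF ST. LAWRENCE",
    "TOWN OF LAND O-LAKES": "TOWN OF LAND O'LAKES",
    "VILLAGE OF LAVALLE": "VILLAGE OF LA VALLE",
    "VILLAGE OF Maine": "VILLAGE OF MAINE",
    "VILLAGE OF MT. STERLING": "VILLAGE OF MOUNT STERLING",
}


def map_gab_mcd(mcd):
    """Return the LTSB spelling of a GAB municipality name (unchanged if no fix)."""
    return MCD_FIXES.get(mcd, mcd)


print(map_gab_mcd("VILLAGE OF FONTANA"))
```

It drops in the same way as before: `processed['JoinMcd'] = processed['Muni'].map(map_gab_mcd)`.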

In [23]:
```python
processed.head()
```

| | County | Muni | ReportingUnit | CongressionalDistrict | AssemblyDistrict | SenateDistrict | data | type | mapped | JoinCounty | JoinMcd |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adams County | CITY OF ADAMS | Ward 1-4 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-4 | hypen | Wards 1,2,3,4 | ADAMS COUNTY | CITY OF ADAMS |
| 1 | Adams County | CITY OF WISCONSIN DELLS | Wards 5,9 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 5,9 | comma | Wards 5,9 | ADAMS COUNTY | CITY OF WISCONSIN DELLS |
| 2 | Adams County | TOWN OF ADAMS | Wards 1-3 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-3 | hypen | Wards 1,2,3 | ADAMS COUNTY | TOWN OF ADAMS |
| 3 | Adams County | TOWN OF BIG FLATS | Ward 1-2 | Congressional - District 3 | Assembly - District 72 | State Senate - District 24 | 1-2 | hypen | Wards 1,2 | ADAMS COUNTY | TOWN OF BIG FLATS |
| 4 | Adams County | TOWN OF COLBURN | Ward 1 | Congressional - District 3 | Assembly - District 72 | State Senate - District 24 | 1 | single | Ward 1 | ADAMS COUNTY | TOWN OF COLBURN |


In [24]:
```python
ltsb.head()
```

| | CNTY_NAME | JoinMcd | PARTIALFIPS | JoinCounty |
|---|---|---|---|---|
| 0 | ADAMS | CITY OF ADAMS | 5500100275 | ADAMS COUNTY |
| 4 | ADAMS | TOWN OF ADAMS | 5500100300 | ADAMS COUNTY |
| 7 | ADAMS | TOWN OF BIG FLATS | 5500107300 | ADAMS COUNTY |
| 9 | ADAMS | TOWN OF COLBURN | 5500116075 | ADAMS COUNTY |
| 10 | ADAMS | TOWN OF DELL PRAIRIE | 5500119575 | ADAMS COUNTY |


In [25]:
```python
# Both sides use the same key names, so `on=` is enough.
joined = pd.merge(processed, ltsb, how='left', on=['JoinMcd', 'JoinCounty'])
```

In [26]:
```python
joined.head()
```

| | County | Muni | ReportingUnit | CongressionalDistrict | AssemblyDistrict | SenateDistrict | data | type | mapped | JoinCounty | JoinMcd | CNTY_NAME | PARTIALFIPS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adams County | CITY OF ADAMS | Ward 1-4 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-4 | hypen | Wards 1,2,3,4 | ADAMS COUNTY | CITY OF ADAMS | ADAMS | 5500100275 |
| 1 | Adams County | CITY OF WISCONSIN DELLS | Wards 5,9 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 5,9 | comma | Wards 5,9 | ADAMS COUNTY | CITY OF WISCONSIN DELLS | ADAMS | 5500188150 |
| 2 | Adams County | TOWN OF ADAMS | Wards 1-3 | Congressional - District 3 | Assembly - District 41 | State Senate - District 14 | 1-3 | hypen | Wards 1,2,3 | ADAMS COUNTY | TOWN OF ADAMS | ADAMS | 5500100300 |
| 3 | Adams County | TOWN OF BIG FLATS | Ward 1-2 | Congressional - District 3 | Assembly - District 72 | State Senate - District 24 | 1-2 | hypen | Wards 1,2 | ADAMS COUNTY | TOWN OF BIG FLATS | ADAMS | 5500107300 |
| 4 | Adams County | TOWN OF COLBURN | Ward 1 | Congressional - District 3 | Assembly - District 72 | State Senate - District 24 | 1 | single | Ward 1 | ADAMS COUNTY | TOWN OF COLBURN | ADAMS | 5500116075 |


In [27]:
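```python
joined[joined['CNTY_NAME'].isnull()]
```

| | County | Muni | ReportingUnit | CongressionalDistrict | AssemblyDistrict | SenateDistrict | data | type | mapped | JoinCounty | JoinMcd | CNTY_NAME | PARTIALFIPS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

The result is empty, so every reporting unit found a matching municipality in the LTSB data.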
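Checking `CNTY_NAME` for nulls works because that column only comes from the right-hand side of the merge. pandas can make this kind of check explicit. `indicator=True` labels each row's match status. `validate="many_to_one"` raises an error if a (county, municipality) pair appears more than once in the LTSB keys, which would otherwise silently duplicate reporting units. The sketch below uses toy data; `TOWN OF NOWHERE` is a made-up municipality that is deliberately left unmatched.

```python
import pandas as pd

# Stand-ins for the reporting units (left) and the LTSB municipality keys (right).
units = pd.DataFrame(
    {
        "JoinCounty": ["ADAMS COUNTY", "ADAMS COUNTY", "ADAMS COUNTY"],
        "JoinMcd": ["CITY OF ADAMS", "TOWN OF ADAMS", "TOWN OF NOWHERE"],
        "ReportingUnit": ["Ward 1-4", "Wards 1-3", "Ward 1"],
    }
)
ltsb_keys = pd.DataFrame(
    {
        "JoinCounty": ["ADAMS COUNTY", "ADAMS COUNTY"],
        "JoinMcd": ["CITY OF ADAMS", "TOWN OF ADAMS"],
        "PARTIALFIPS": ["5500100275", "5500100300"],
    }
)

# validate="many_to_one" raises pandas.errors.MergeError if a key repeats on
# the right; indicator=True adds a _merge column ("both" / "left_only").
checked = pd.merge(
    units,
    ltsb_keys,
    how="left",
    on=["JoinMcd", "JoinCounty"],
    validate="many_to_one",
    indicator=True,
)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["JoinCounty", "JoinMcd", "ReportingUnit"]])
```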


In [28]:
```python
# .copy() makes `finished` an independent DataFrame, so adding a column below
# doesn't trigger pandas' SettingWithCopyWarning.
finished = joined[['County', 'Muni', 'ReportingUnit', 'mapped', 'PARTIALFIPS']].copy()
```

In [29]:
```python
def expandfips(row):
    # "Wards 1,2,3" -> "55001002750001|55001002750002|55001002750003":
    # the municipality's partial FIPS plus each ward left-padded to 4 characters.
    _, wardsstr = row['mapped'].split(maxsplit=1)
    wardlist = [x.strip() for x in wardsstr.split(',')]
    return "|".join("%s%s" % (row['PARTIALFIPS'], x.rjust(4, '0')) for x in wardlist)

finished['EXPANDEDGEO'] = finished.apply(expandfips, axis=1)
```


Let's take a look at a few samples, including one with a lettered ward ID:

In [30]:
```python
finished.head()
```

| | County | Muni | ReportingUnit | mapped | PARTIALFIPS | EXPANDEDGEO |
|---|---|---|---|---|---|---|
| 0 | Adams County | CITY OF ADAMS | Ward 1-4 | Wards 1,2,3,4 | 5500100275 | 55001002750001\|55001002750002\|55001002750003\|5... |
| 1 | Adams County | CITY OF WISCONSIN DELLS | Wards 5,9 | Wards 5,9 | 5500188150 | 55001881500005\|55001881500009 |
| 2 | Adams County | TOWN OF ADAMS | Wards 1-3 | Wards 1,2,3 | 5500100300 | 55001003000001\|55001003000002\|55001003000003 |
| 3 | Adams County | TOWN OF BIG FLATS | Ward 1-2 | Wards 1,2 | 5500107300 | 55001073000001\|55001073000002 |
| 4 | Adams County | TOWN OF COLBURN | Ward 1 | Ward 1 | 5500116075 | 55001160750001 |


In [31]:
```python
finished[finished['ReportingUnit']== 'Wards 1A-3A']
```

| | County | Muni | ReportingUnit | mapped | PARTIALFIPS | EXPANDEDGEO |
|---|---|---|---|---|---|---|
| 2208 | Monroe County | TOWN OF LA GRANGE | Wards 1A-3A | Wards 1A,2A,3A | 5508141000 | 5508141000001A\|5508141000002A\|5508141000003A |


In [32]:
```python
finished.to_csv("2016_WI_Reporting_Units_To_GEOID.csv")
```
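The `EXPANDEDGEO` column packs several GEOIDs into one pipe-delimited string. For a database or GIS join, it can be handier to have one row per ward. Here's a sketch using pandas' `str.split` and `explode` (available since pandas 0.25) on two rows copied from the sample output above.

```python
import pandas as pd

# Two rows shaped like the `finished` table (values from the sample output).
finished = pd.DataFrame(
    {
        "County": ["Adams County", "Adams County"],
        "Muni": ["CITY OF WISCONSIN DELLS", "TOWN OF BIG FLATS"],
        "ReportingUnit": ["Wards 5,9", "Ward 1-2"],
        "EXPANDEDGEO": [
            "55001881500005|55001881500009",
            "55001073000001|55001073000002",
        ],
    }
)

# One row per ward: split the pipe-delimited GEOIDs into lists, then explode.
ward_to_unit = (
    finished.assign(GEOID=finished["EXPANDEDGEO"].str.split("|"))
    .explode("GEOID")
    .drop(columns="EXPANDEDGEO")
    .reset_index(drop=True)
)
print(ward_to_unit)
```

Each `GEOID` can then be matched against the `GEOID` column of the LTSB ward data, so results reported for `Wards 5,9` attach to both ward polygons.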