# Presidential Election Notebook
This notebook takes the raw Presidential Elections spreadsheet in elections.csv and the electoral college sheet in electoral_college.csv and converts them into a number of tables suitable for the presidential elections dashboard.  This involves:
1. Splitting the combined field \<candidateName> - \<party> into two fields, candidate and party
2. Converting years to integers and putting in the missing years (converting '2016', '', '' to 2016, 2016, 2016)
3. Collecting the cells of a particular state and year into a structure, with the individual candidates as a list
4. Converting the votes into integers, and then, for each result, adding a percentage float
5. Generating the individual records (state, year, candidate, party, votes, percentage) as a list
6. Creating the data table as a Galyleo Table and sending it to the dashboard.  This will form the basis of the Candidate Votes by State and Year and Party Percent by State and Year charts.
7. Selecting the rows of this table for Nationwide results.  This will be the basis of the pie chart for national share of the vote.  Send this to the dashboard.
8. Forming a pivot table of percentage of the vote for each party, by state and year.  This will form the basis of the Vote History line chart.  Send this to the dashboard.
9. Converting the pivot table to a margin table, which will form the basis of the colored map.  Send this to the dashboard.
10. Finally, reading in the electoral college results from the CSV file and turning this into a set of records (Year, EV-Democrat, EV-Republican, EV-Other).  Send this to the dashboard.
All in, we compute five tables to send to the dashboard.  These will be filtered using widgets on the dashboard to form six graphs, which respond to the filters to show results for a specific state and year.

Step 0: read in the table from the CSV file.  After this, the data will be in the variable rows.

In [1]:
import csv
# Read the raw spreadsheet; the csv module documentation recommends newline='' for files passed to csv.reader
with open('elections.csv', 'r', newline='') as f:
    rows = list(csv.reader(f))
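
The rest of the notebook relies on the layout of elections.csv: row 0 holds the years, row 1 the candidate headers, and each later row one state (or the "Nationwide" total).  Here is a sketch of that layout, parsed from a hypothetical in-memory sample with `io.StringIO`.  The sample's contents, including what sits in the first column of rows 0 and 1, are assumptions rather than the real file, and the `sample_` names keep it from overwriting `rows`.

```python
import csv
import io

# Hypothetical miniature of the assumed elections.csv layout (made-up names and numbers):
# row 0: years, filled in only at the start of each group of columns
# row 1: candidate headers "<name> - <party>", plus a "Total" column
# rows 2+: one row per state (or "Nationwide"); counts may carry thousands separators
SAMPLE_CSV = '''State,2016,,
,Candidate A - Democratic,Candidate B - Republican,Total
Nationwide,"1,200","1,000","2,200"
Example State,300,,300
'''

sample_rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
sample_years, sample_headers, sample_state_rows = sample_rows[0], sample_rows[1], sample_rows[2:]
print(sample_years)          # ['State', '2016', '', '']
print(sample_state_rows[0])  # ['Nationwide', '1,200', '1,000', '2,200']
```

Note that the quoted fields keep their commas as text; they are only converted to integers later.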

In the raw data, the candidate field is \<name> - \<party>.  Parse each one into a dictionary {"Name": \<name>, "Party": \<party>}.  If the string is "Total", then the name and party are both "Total".  If there is no dash, there is no name and the party is "Other".

In [2]:
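def clean_candidate(raw):
    # The "Total" column has no candidate, so both fields are "Total"
    if raw == 'Total':
        return {"Name": "Total", "Party": "Total"}
    # "<name> - <party>" splits into exactly two parts; anything else is an unnamed "Other"
    parsed = raw.split(' - ')
    return {"Name": parsed[0], "Party": parsed[1]} if len(parsed) == 2 else {"Name": "", "Party": "Other"}

# Row 1 holds the candidate headers; column 0 is the state column, so skip it
candidates = [clean_candidate(entry) for entry in rows[1][1:]]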

Parties have gone by various aliases over the years.  Moreover, our dataset goes back to 1828, but the Republican Party wasn't formed until 1854 from the Whig Party, which was itself a descendant of the Federalists.  As a result, we consolidate parties using this function, and then make sure that every candidate's party is canonicalized.

In [3]:
def canonical_name(party):
    # Historical names and factions, mapped onto the party they are consolidated into
    party_aliases = {'National Republican': "Republican", 'National Union (Republican)': "Republican", 'Whig': "Republican",
                     'Liberal Republican/Democratic': "Democratic", '(Northern) Democratic': "Democratic",
                     'Progressive "Bull Moose"': 'Progressive'}
    return party_aliases.get(party, party)

for candidate in candidates:
    candidate["Party"] = canonical_name(candidate["Party"])

In the header row, a year appears only in the first column of each group of candidates; the other columns are blank.  This leads to the following messy bit of code, which carries each year forward so that every record gets one.

In [4]:
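class YearCanonizer:
    """Assign a year to every column by carrying the last non-blank year forward."""
    def __init__(self, years):
        self.prev_year = None  # no year seen yet; a leading blank cell maps to None
        self.years = [self.canonize_year(year) for year in years]

    def canonize_year(self, year):
        if year != "":
            self.prev_year = int(year)
        return self.prev_year

# Row 0 holds the years; skip column 0, the state column
canonizer = YearCanonizer(rows[0][1:])
for candidate, year in zip(candidates, canonizer.years):
    candidate["Year"] = year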
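
Filling in the years is a forward fill.  A more compact sketch uses `itertools.accumulate`, which threads the previous result through the list.  `forward_fill_years` is a hypothetical name, not part of the notebook; a leading blank, which has no year to inherit, comes back as `None`.

```python
from itertools import accumulate

def forward_fill_years(cells):
    """Hypothetical helper: carry the most recent non-blank year forward.

    A blank cell takes the previous result; a leading blank has no year to
    inherit and comes back as None.
    """
    # accumulate calls the lambda with (previous result, current cell)
    filled = accumulate(cells, lambda prev, cell: cell if cell != "" else prev)
    return [int(year) if year else None for year in filled]

print(forward_fill_years(['2016', '', '', '2012', '']))  # [2016, 2016, 2016, 2012, 2012]
```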

Given the row for a state and the candidate records {"Name", "Party", "Year"}, compute records {"Name", "Party", "Year", "State", "Votes"}, using the fact that the vote columns are in the same order as the candidates.

In [5]:
# First, convert a string which may be blank or contain commas to a number
def compute_int_from_delimited_string(string):
    string = string.strip()
    string = string.replace(',', '')
    return int(string) if len(string) > 0 else 0

def compute_state_record(state_row):
    state_name = state_row[0]
    votes = state_row[1:]
    result = [candidate.copy() for candidate in candidates]
    for candidate in result: candidate["State"] = state_name
    for i in range(len(result)): result[i]["Votes"] = compute_int_from_delimited_string(votes[i])
    return result
state_records = [compute_state_record(row) for row in rows[2:]]
state_list = []
for record in state_records: state_list.extend(record)
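
The last two lines flatten a list of per-state record lists into one list.  A sketch of the same step with `itertools.chain.from_iterable`, on hypothetical records; `parse_votes` is a hypothetical stand-in that repeats the rule of `compute_int_from_delimited_string` so the block runs on its own.

```python
from itertools import chain

def parse_votes(cell):
    """Hypothetical stand-in for compute_int_from_delimited_string:
    strip whitespace, drop thousands separators, and treat a blank cell as 0."""
    cell = cell.strip().replace(',', '')
    return int(cell) if cell else 0

# Hypothetical per-state lists, shaped like the output of compute_state_record
sample_state_records = [
    [{"State": "Example State", "Votes": parse_votes("1,200")},
     {"State": "Example State", "Votes": parse_votes("")}],
    [{"State": "Nationwide", "Votes": parse_votes(" 3,400 ")}],
]

# chain.from_iterable flattens one level of nesting, like the extend loop
sample_state_list = list(chain.from_iterable(sample_state_records))
print([record["Votes"] for record in sample_state_list])  # [1200, 0, 3400]
```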

Drop the records with 0 votes, then separate out the "Total" records and build a table of the total vote for each state and year.

In [6]:
state_list = [record for record in state_list if record["Votes"] > 0]
total_records = [record for record in state_list if record["Name"]  == "Total"]
party_records = [record for record in state_list if record["Name"]  != "Total"]
totals = {}
for record in total_records: totals[(record["State"], record["Year"])] = record["Votes"]

Compute each candidate's percentage of the total vote for that state and year.

In [7]:
for record in party_records: record["Percentage"] = 100* record["Votes"]/totals[(record["State"], record["Year"])]
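
A worked miniature of the last two cells, on hypothetical records with made-up numbers, shows how the (State, Year) key ties each candidate to the matching "Total" record.  The dictionary comprehension builds the same kind of totals table as the loop above.

```python
# Hypothetical records shaped like state_list (made-up names and numbers)
sample_list = [
    {"State": "Example State", "Year": 2000, "Name": "Candidate A", "Votes": 600},
    {"State": "Example State", "Year": 2000, "Name": "Candidate B", "Votes": 400},
    {"State": "Example State", "Year": 2000, "Name": "Candidate C", "Votes": 0},
    {"State": "Example State", "Year": 2000, "Name": "Total", "Votes": 1000},
]

# Drop zero-vote entries, then split the "Total" rows from the candidate rows
sample_list = [r for r in sample_list if r["Votes"] > 0]
sample_totals = {(r["State"], r["Year"]): r["Votes"] for r in sample_list if r["Name"] == "Total"}
sample_party_records = [r for r in sample_list if r["Name"] != "Total"]

# Percentage = 100 * candidate votes / total votes for the same (state, year)
for r in sample_party_records:
    r["Percentage"] = 100 * r["Votes"] / sample_totals[(r["State"], r["Year"])]

print([(r["Name"], r["Percentage"]) for r in sample_party_records])
# [('Candidate A', 60.0), ('Candidate B', 40.0)]
```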

Create the first table for the dashboard, with one row per candidate, state, and year.

In [8]:
from galyleo.galyleo_table import GalyleoTable
from galyleo.galyleo_constants import GALYLEO_STRING, GALYLEO_NUMBER
table = GalyleoTable("presidential_vote")
schema = [("Year", GALYLEO_NUMBER), ("State", GALYLEO_STRING), ("Name", GALYLEO_STRING), ("Party", GALYLEO_STRING), ("Votes", GALYLEO_NUMBER), ("Percentage", GALYLEO_NUMBER)]
data = [[record["Year"], record["State"], record["Name"], record["Party"], record["Votes"], record["Percentage"]] for record in party_records]
table.load_from_schema_and_data(schema, data)

Send the first table to the dashboard

In [9]:
from galyleo.galyleo_jupyterlab_client import GalyleoClient
client = GalyleoClient()
client.send_data_to_dashboard(table)

A filtered table, nationwide vote only -- this will drive a pie chart with the national percentage of the vote.

In [35]:
nationwide_records = [[record[0], record[3], record[5]] for record in data if record[1] == "Nationwide"]
nationwide_schema = [("Year", GALYLEO_NUMBER), ("Party", GALYLEO_STRING), ("Percentage", GALYLEO_NUMBER)]
table = GalyleoTable("nationwide_vote")
table.load_from_schema_and_data(nationwide_schema, nationwide_records)
client.send_data_to_dashboard(table)

Time to form the pivot and margin tables, which we will use for the map and the history graph.  One note is that there have been a _lot_ of parties in American history; the Cook database shows 26, and even after removing 7 as aliases above, 19 remain.  This makes for a busy history chart, so we choose a party list and fold everything else into "Other".  The party list is a matter of taste; it will of course include Republican and Democratic, but the remainder are personal preference.  I'm using "Progressive", "Socialist", and "Reform", since they showed well in two or more elections and/or captured 20% of the national vote in one.

In [36]:
parties = ['Democratic', 'Republican', 'Progressive', 'Socialist', 'Reform']


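Pivot on percentage to break out the vote by party.  The idea here is to create records of the form {State, Year, P1, ..., Pn}, where each Pi is the name of a party and its value is that party's percentage of the vote.  Parties that are not on the list go into "Other", which keeps the largest of their percentages rather than their sum.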

In [37]:
# A function which creates an empty pivot record for state and year
def pivot_record(state, year):
    result = {"State": state, "Year": year}
    for party in parties: result[party] = 0 
    result["Other"] = 0
    return result
# Compute the pivot table.  For each record in party_records, store its percentage in the
# entry for its state and year, creating the entry first if none exists.  Parties outside
# the list share the "Other" column, which keeps the largest of their percentages.
pivot_table = {}
party_set = set(parties)
for record in party_records:
    key = (record["State"], record["Year"])
    if key not in pivot_table:
        pivot_table[key] = pivot_record(record["State"], record["Year"])
    entry = pivot_table[key]
    if record["Party"] in party_set:
        entry[record["Party"]] = record["Percentage"]
    else:
        entry["Other"] = max(record["Percentage"], entry["Other"])
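
The same pivot can be sketched with `collections.defaultdict`, which creates the zero-filled entry on first access and so replaces the explicit membership check.  The records and the party names "Party X" and "Party Y" are hypothetical; unlike `pivot_record`, the entries here leave State and Year in the key instead of copying them into each entry.

```python
from collections import defaultdict

sample_parties = ['Democratic', 'Republican', 'Progressive', 'Socialist', 'Reform']

# Hypothetical party_records for one state and year (made-up percentages)
sample_records = [
    {"State": "Example State", "Year": 2000, "Party": "Democratic", "Percentage": 40.0},
    {"State": "Example State", "Year": 2000, "Party": "Progressive", "Percentage": 30.0},
    {"State": "Example State", "Year": 2000, "Party": "Republican", "Percentage": 20.0},
    {"State": "Example State", "Year": 2000, "Party": "Party X", "Percentage": 6.0},
    {"State": "Example State", "Year": 2000, "Party": "Party Y", "Percentage": 4.0},
]

# Each new (state, year) key starts at 0 for every listed party and for "Other"
sample_pivot = defaultdict(lambda: dict.fromkeys(sample_parties + ["Other"], 0))
for record in sample_records:
    entry = sample_pivot[(record["State"], record["Year"])]
    if record["Party"] in sample_parties:
        entry[record["Party"]] = record["Percentage"]
    else:
        # Unlisted parties share the "Other" column, which keeps the best showing
        entry["Other"] = max(entry["Other"], record["Percentage"])

print(sample_pivot[("Example State", 2000)])
```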

The pivot table is now complete, so prepare the table and send it to the dashboard.  Add "Other" to the list of parties, form the schema ("State" is a string, everything else is a number), create the table, load it with the data, and send it to the dashboard.

In [38]:
parties.append("Other")
pivot_schema = [("State", GALYLEO_STRING), ("Year", GALYLEO_NUMBER)] + [(party, GALYLEO_NUMBER) for party in parties]
pivot_data = [[record["State"], record["Year"]] + [record[party] for party in parties] for record in pivot_table.values()]
galyleo_pivot_table = GalyleoTable("presidential_vote_history")
galyleo_pivot_table.load_from_schema_and_data(pivot_schema, pivot_data)
client.send_data_to_dashboard(galyleo_pivot_table)


Prepare the margin table.  This will drive a red/blue/green map, where a Democratic victory is on the scale 5 to 10, a Republican victory on the scale -5 to -10, and a victory by any other party is 0.  We only need three parties for this one, Democratic, Republican, and Other, so we consolidate each pivot record down to those three, with "Other" taking the best showing among the remaining parties.

In [40]:
def consolidate(record):
    # "Other" is the best showing among every party after Democratic and Republican
    # (parties[2:], which by now also includes "Other")
    other = max(record[party] for party in parties[2:])
    result = {"Other": other}
    for field in ["State", "Year", "Democratic", "Republican"]:
        result[field] = record[field]
    return result
margin_party_records = [consolidate(record) for record in pivot_table.values()]

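Now we have the margin records, and we want to convert each one into a map value: 5 to 10 for a state that is lightly to heavily Democratic, -5 to -10 for a state that is lightly to heavily Republican, and 0 for a state won by another party.  We'll just use a linear scale on the margin of victory, one step per 2 percentage points and capped at 10 points, so a margin of 0-2% is light, 2-4% is the next step, and 10% or above is heavy.  The step size and cap are adjustable.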

In [41]:
def compute_margin(margin_party_record):
    democratic = margin_party_record["Democratic"]
    republican = margin_party_record["Republican"]
    # A state carried by a third party is drawn as 0
    if margin_party_record["Other"] > max(democratic, republican):
        return 0
    # One step per 2 points of margin, capped at 5 steps (a margin of 10 points or more)
    steps = min(int(abs(democratic - republican) // 2), 5)
    # Take the sign from the raw difference, so narrow Republican wins stay negative
    return 5 + steps if democratic >= republican else -5 - steps
margins = [[record["State"], record["Year"], compute_margin(record)] for record in margin_party_records]
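
A standalone sketch of the scale as described above (5 to 10 Democratic, -5 to -10 Republican, 0 for another party; one step per two points, capped at a 10-point margin) makes the buckets easy to check.  `map_value`, `step`, and `cap` are hypothetical names, and ties go to the Democratic side, as in the notebook's code.

```python
def map_value(democratic, republican, other, step=2.0, cap=5):
    """Hypothetical helper: map vote percentages onto the map's color scale.

    Returns 0 when a third party leads, 5..10 for a Democratic lead and
    -5..-10 for a Republican lead, one step per `step` points of margin,
    capped at `cap` steps (a 10-point margin with the defaults).
    """
    if other > max(democratic, republican):
        return 0
    steps = min(int(abs(democratic - republican) // step), cap)
    return 5 + steps if democratic >= republican else -5 - steps

for d, r, o in [(51.0, 49.0, 0.0), (49.5, 50.5, 0.0), (60.0, 30.0, 10.0), (20.0, 30.0, 50.0)]:
    print((d, r, o), map_value(d, r, o))
```

The sign comes from the unrounded difference, so a 1-point Republican lead maps to -5.  Taking the sign from a rounded half-margin instead would put a Republican lead of about a point or less on the Democratic side.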

In [42]:
margin_table = GalyleoTable('presidential_margins')
margin_table.load_from_schema_and_data([('State', GALYLEO_STRING), ('Year', GALYLEO_NUMBER), ('Margin', GALYLEO_NUMBER)], margins)
client.send_data_to_dashboard(margin_table)

Finally, get the Electoral College results.  This is very simple: create one record per year, holding the year and three vote fields: Democratic, Republican, and Other.

In [43]:
ec_aliases = {'Republican': {'National Republican', 'Whig', 'Republican'}, 'Democratic': {'Democratic/Liberal Republican', 'Independent-Democratic', 'Democratic', 'Democrat'}}
with open('electoral_college.csv', 'r', newline='') as f:
    rows = list(csv.reader(f))
class EC_Record:
    def __init__(self, year):
        self.year = int(year)
        self.republican = 0
        self.democratic = 0
        self.other = 0
    
    def add_record(self, record):
        # Column 2 holds the party and column 3 its electoral votes
        value = int(record[3])
        if record[2] in ec_aliases['Republican']: self.republican += value
        elif record[2] in ec_aliases['Democratic']: self.democratic += value
        else: self.other += value
    
    def as_list(self):
        return [self.year, self.democratic, self.republican, self.other]
    
    def __repr__(self):
        l1 = self.as_list()
        return ', '.join([str(elt) for elt in l1])
    
ec_records = {}
for row in rows[1:]:
    year = row[0]
    if year not in ec_records: ec_records[year] = EC_Record(year)
    ec_records[year].add_record(row)
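
The `EC_Record` class is a small per-year accumulator.  The same aggregation can be sketched with `collections.defaultdict`.  The sample rows, the header names, and the meaning of column 1 are assumptions (the notebook only reads columns 0, 2, and 3), and the alias sets are copied from `ec_aliases`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the layout EC_Record assumes: a header row, then
# column 0 = year, column 2 = party, column 3 = electoral votes (made-up values;
# the header names and column 1 are assumptions, since the notebook never reads them)
SAMPLE_EC_CSV = '''Year,Candidate,Party,Votes
2000,Candidate A,Democratic,10
2000,Candidate B,Republican,12
2000,Candidate C,Independent,1
2004,Candidate D,Democrat,15
'''

# Alias sets copied from ec_aliases
REPUBLICAN_ALIASES = {'National Republican', 'Whig', 'Republican'}
DEMOCRATIC_ALIASES = {'Democratic/Liberal Republican', 'Independent-Democratic', 'Democratic', 'Democrat'}

def ec_bucket(party):
    """Consolidate a party name into one of the three output columns."""
    if party in REPUBLICAN_ALIASES:
        return "Republican"
    if party in DEMOCRATIC_ALIASES:
        return "Democratic"
    return "Other"

# Year -> column -> electoral votes; each new year starts at 0 in every column
ec_totals = defaultdict(lambda: {"Democratic": 0, "Republican": 0, "Other": 0})
for sample_row in list(csv.reader(io.StringIO(SAMPLE_EC_CSV)))[1:]:  # skip the header row
    ec_totals[int(sample_row[0])][ec_bucket(sample_row[2])] += int(sample_row[3])

# Same column order as EC_Record.as_list: Year, Democratic, Republican, Other
ec_sample_records = [[year, t["Democratic"], t["Republican"], t["Other"]] for year, t in ec_totals.items()]
print(ec_sample_records)  # [[2000, 10, 12, 1], [2004, 15, 0, 0]]
```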


Load the Electoral College records into a table and send it to the dashboard

In [44]:
schema = [("Year", GALYLEO_NUMBER), ("Democratic", GALYLEO_NUMBER), ("Republican", GALYLEO_NUMBER), ("Other", GALYLEO_NUMBER)]
ec_table = GalyleoTable("electoral_college")
ec_table.load_from_schema_and_data(schema, [record.as_list() for record in ec_records.values()])
client.send_data_to_dashboard(ec_table)