# Presidential Election Notebook
This notebook takes the raw Presidential Elections spreadsheet in elections.csv and the electoral college sheet in electoral_college.csv and converts them into a number of tables suitable for the presidential elections dashboard.  This involves:
1. Splitting the combined field \<candidateName> - \<party> into two fields, candidate and party
2. Converting years to integers and filling in the missing years (converting '2016', '', '' to 2016, 2016, 2016)
3. Collecting the cells of a particular state and year into a structure, with the individual candidates as a list
4. Converting the votes into integers, and then, for each result, adding a percentage float
5. Generating the individual records (state, year, candidate, party, votes, percentage) as a list
6. Creating the data table as a Galyleo Table and sending it to the dashboard





Step 0: read in the table from the CSV file.  After this, the data will be in the variable rows.

In [1]:
import csv
# Read every row of the spreadsheet; the with-block closes the file even if reading fails
with open('elections.csv', 'r', newline='') as f:
    election_reader = csv.reader(f)
    rows = [row for row in election_reader]

In the raw data, the candidate field is \<name> - \<party>.  Parse each field into a dictionary {"Name": \<name>, "Party": \<party>}.  If the string is "Total", then the name and party are both "Total".  If there is no dash, the name is empty and the party is "Other".

In [2]:
def clean_candidate(raw):
    if (raw == 'Total'): return {"Name": "Total", "Party": "Total"}
    else: 
        parsed = raw.split(' - ')
        return {"Name": parsed[0], "Party": parsed[1]} if len(parsed) == 2 else {"Name": "", "Party": "Other"}
candidates = [clean_candidate(entry) for entry in rows[1][1:]]
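
As a quick check of the three cases described above, here is a minimal sketch that applies the same parsing rule to a few sample header cells. The sample strings are hypothetical illustrations of the format, not values copied from elections.csv.

```python
def clean_candidate(raw):
    """Split '<name> - <party>' into a {"Name", "Party"} dictionary."""
    if raw == 'Total':
        return {"Name": "Total", "Party": "Total"}
    parsed = raw.split(' - ')
    if len(parsed) == 2:
        return {"Name": parsed[0], "Party": parsed[1]}
    # No ' - ' separator (or more than one): no name, party is "Other"
    return {"Name": "", "Party": "Other"}

# Hypothetical header cells in the format the notebook describes
samples = ['Abraham Lincoln - Republican', 'Total', 'Other']
cleaned = [clean_candidate(sample) for sample in samples]
for sample, result in zip(samples, cleaned):
    print(sample, '->', result)
```

Note that a cell containing more than one ' - ' separator also falls into the "Other" case, because the split then yields more than two parts.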

Parties have gone by various aliases throughout the years.  Moreover, our dataset goes back to 1828, but the Republican Party wasn't formed until 1854 from the Whig Party, which was itself a descendant of the Federalists.  As a result, we consolidate parties using this function, and then make sure that every candidate's party is replaced by its canonical name.

In [3]:
def canonical_name(party):
    party_aliases = {'National Republican': "Republican", 'National Union (Republican)': "Republican", 'Whig': "Republican",
                     'Liberal Republican/Democratic': "Democratic", '(Northern) Democratic': "Democratic", 
                    'Progressive "Bull Moose"': 'Progressive'}
    return party_aliases[party] if party in party_aliases else party
    
for candidate in candidates: candidate["Party"] = canonical_name(candidate["Party"])
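
The alias lookup can also be written with `dict.get`, which returns the party unchanged when it has no alias. This is only an alternative sketch of the same mapping; the module-level name `PARTY_ALIASES` is an assumption introduced here, and 'Libertarian' is just an example of a party with no alias.

```python
# Same alias table as in the cell above, hoisted to module level (hypothetical name)
PARTY_ALIASES = {
    'National Republican': 'Republican',
    'National Union (Republican)': 'Republican',
    'Whig': 'Republican',
    'Liberal Republican/Democratic': 'Democratic',
    '(Northern) Democratic': 'Democratic',
    'Progressive "Bull Moose"': 'Progressive',
}

def canonical_name(party):
    # Fall back to the party itself when no alias is registered
    return PARTY_ALIASES.get(party, party)

examples = [canonical_name(p) for p in ['Whig', 'Progressive "Bull Moose"', 'Libertarian']]
print(examples)
```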

The years are blank except for the first column in every group, so the following slightly messy bit of code carries the most recent year forward to assign a year to every record.

In [4]:
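class YearCanonizer:
    def __init__(self, years):
        # No year has been seen yet; a blank first cell yields None instead of an AttributeError
        self.prev_year = None
        self.years = [self.canonize_year(year) for year in years]
    def canonize_year(self, year):
        # A non-blank cell starts a new group; a blank cell reuses the previous year
        if (year != ""): self.prev_year = int(year)
        return self.prev_year
canonizer = YearCanonizer(rows[0][1:])
for i in range(len(candidates)): candidates[i]["Year"] = canonizer.years[i]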
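
The same forward-fill idea can be sketched as a plain function, which may be easier to test in isolation. The function name `fill_years` and the sample header row are assumptions made for illustration.

```python
def fill_years(raw_years):
    """Replace blank year cells with the most recent non-blank year."""
    filled = []
    current = None
    for cell in raw_years:
        if cell.strip() != "":
            current = int(cell)
        if current is None:
            # Nothing to carry forward yet
            raise ValueError("the first year cell must not be blank")
        filled.append(current)
    return filled

# Hypothetical header row: each year appears once, followed by blanks
years = fill_years(['2012', '', '2016', '', ''])
print(years)
```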

The following code takes the row for a state and the candidate records {"Name", "Party", "Year"} and computes records of the form {"Name", "Party", "Year", "State", "Votes"}, using the fact that the votes are in the same order as the candidates.

In [5]:
# First, convert a string which may be blank or contain commas to a number
def compute_int_from_delimited_string(string):
    string = string.strip()
    string = string.replace(',', '')
    return int(string) if len(string) > 0 else 0

def compute_state_record(state_row):
    state_name = state_row[0]
    votes = state_row[1:]
    result = [candidate.copy() for candidate in candidates]
    for candidate in result: candidate["State"] = state_name
    for i in range(len(result)): result[i]["Votes"] = compute_int_from_delimited_string(votes[i])
    return result
state_records = [compute_state_record(row) for row in rows[2:]]
state_list = []
for record in state_records: state_list.extend(record)
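
To illustrate how the vote cells line up with the candidates, here is a small sketch that pairs them with `zip`. The candidate records, the state row, and the helper name `parse_votes` are all hypothetical stand-ins for the notebook's data.

```python
def parse_votes(cell):
    """Convert a cell such as ' 1,234 ' or '' to an integer (blank means 0)."""
    cleaned = cell.strip().replace(',', '')
    return int(cleaned) if cleaned else 0

# Hypothetical candidate records, in the same column order as the vote cells
candidates = [
    {"Name": "Candidate A", "Party": "Democratic", "Year": 2016},
    {"Name": "Candidate B", "Party": "Republican", "Year": 2016},
    {"Name": "Total", "Party": "Total", "Year": 2016},
]
state_row = ['Example State', '1,234,567', ' 987,654 ', '']

# dict(candidate, ...) copies the record, so the shared candidate list is not modified
records = [
    dict(candidate, State=state_row[0], Votes=parse_votes(cell))
    for candidate, cell in zip(candidates, state_row[1:])
]
print(records)
```

Unlike indexing with `votes[i]`, `zip` stops silently at the shorter sequence, so a short row would drop candidates rather than raise an IndexError.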

Trim the records with 0 votes and then get the totals for each state and year

In [6]:
state_list = [record for record in state_list if record["Votes"] > 0]
total_records = [record for record in state_list if record["Name"]  == "Total"]
party_records = [record for record in state_list if record["Name"]  != "Total"]
totals = {}
for record in total_records: totals[(record["State"], record["Year"])] = record["Votes"]

Compute the percentages

In [7]:
for record in party_records: record["Percentage"] = 100* record["Votes"]/totals[(record["State"], record["Year"])]
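
As a worked example of the percentage step, the sketch below uses made-up vote counts. It also guards against a missing total, which can occur when a "Total" record was trimmed for having 0 votes; storing `None` in that case is an assumption of this sketch, not what the cell above does.

```python
# Hypothetical records for one state and year
party_records = [
    {"State": "Example State", "Year": 2016, "Name": "Candidate A", "Votes": 600},
    {"State": "Example State", "Year": 2016, "Name": "Candidate B", "Votes": 400},
]
totals = {("Example State", 2016): 1000}

for record in party_records:
    total = totals.get((record["State"], record["Year"]))
    # Only divide when a non-zero total exists for this state and year
    record["Percentage"] = 100 * record["Votes"] / total if total else None

percentages = [record["Percentage"] for record in party_records]
print(percentages)
```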

Create the first table for the dashboard

In [8]:
from galyleo.galyleo_table import GalyleoTable
from galyleo.galyleo_constants import GALYLEO_STRING, GALYLEO_NUMBER
table = GalyleoTable("presidential_vote")
schema = [("Year", GALYLEO_NUMBER), ("State", GALYLEO_STRING), ("Name", GALYLEO_STRING), ("Party", GALYLEO_STRING), ("Votes", GALYLEO_NUMBER), ("Percentage", GALYLEO_NUMBER)]
data = [[record["Year"], record["State"], record["Name"], record["Party"], record["Votes"], record["Percentage"]] for record in party_records]
table.load_from_schema_and_data(schema, data)

Send the first table to the dashboard

In [11]:
from galyleo.galyleo_jupyterlab_client import GalyleoClient
client = GalyleoClient()
client.send_data_to_dashboard(table)

Strip out the trivial records, those with < 10% of the vote

In [21]:
stripped_list = [record for record in party_records if record["Percentage"] >= 10]

Pivot on percentage to break out by party.  The idea here is to create records of the form {State, Year, P1,...,Pn} where each Pi is the name of a party and the value is the percentage of the vote

In [29]:
parties = list(set([record["Party"] for record in stripped_list]))
# A function which creates an empty pivot record for state and year
def pivot_record(state, year):
    result = {"State": state, "Year": year}
    for party in parties: result[party] = 0 
    return result
# Compute the pivot table.  For each record in stripped_list, add the vote to the entry for state and year.  If
# none exists, create the record first
pivot_table = {}
for record in stripped_list:
    if not ((record["State"], record["Year"]) in pivot_table):
        pivot_table[(record["State"], record["Year"])] = pivot_record(record["State"], record["Year"])
    pivot_table[(record["State"], record["Year"])][record["Party"]] = record["Percentage"]
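
The pivot can be sketched on a few made-up records to show the shape of the result. This variant uses `dict.setdefault` in place of the explicit membership test; the percentages are hypothetical.

```python
# Hypothetical records that survived the 10% cutoff
stripped_list = [
    {"State": "Example State", "Year": 2016, "Party": "Democratic", "Percentage": 48.0},
    {"State": "Example State", "Year": 2016, "Party": "Republican", "Percentage": 46.0},
    {"State": "Example State", "Year": 1912, "Party": "Progressive", "Percentage": 30.0},
]
parties = sorted({record["Party"] for record in stripped_list})

pivot_table = {}
for record in stripped_list:
    key = (record["State"], record["Year"])
    # Create a row with every party at 0 the first time this key is seen
    empty_row = {"State": record["State"], "Year": record["Year"], **{p: 0 for p in parties}}
    row = pivot_table.setdefault(key, empty_row)
    row[record["Party"]] = record["Percentage"]

for row in pivot_table.values():
    print(row)
```

Each (state, year) pair ends up with one row, and parties that did not pass the cutoff in that year keep the value 0.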

In [65]:
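from functools import cmp_to_key
# Order the parties with Democratic first, Republican second, and the rest alphabetically,
# so that the first two percentage columns of every pivot row are the two major parties
def cmp(party1, party2):
    if (party1 == party2): return 0
    if (party1 == 'Democratic'): return -1
    if (party2 == 'Democratic'): return 1
    if (party1 == 'Republican'): return -1
    if (party2 == 'Republican'): return 1
    return -1 if party1 < party2 else 1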
party_order = sorted(parties, key=cmp_to_key(cmp))
pivot_schema = [("State", GALYLEO_STRING), ("Year", GALYLEO_NUMBER)] + [(party, GALYLEO_NUMBER) for party in party_order]
pivot_data = [[record["State"], record["Year"]] + [record[party] for party in party_order] for record in pivot_table.values()]
galyleo_pivot_table = GalyleoTable("presidential_vote_history")
galyleo_pivot_table.load_from_schema_and_data(pivot_schema, pivot_data)
client.send_data_to_dashboard(galyleo_pivot_table)
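
The same column ordering (Democratic, then Republican, then the remaining parties alphabetically) can also be expressed with a sort key instead of a comparison function. This is an alternative sketch; the function name `party_sort_key` and the sample party list are assumptions.

```python
def party_sort_key(party):
    # Rank the two major parties ahead of everything else, then sort by name
    rank = {"Democratic": 0, "Republican": 1}
    return (rank.get(party, 2), party)

# Hypothetical set of parties left after the 10% cutoff
parties = ["Progressive", "Republican", "Other", "Democratic"]
party_order = sorted(parties, key=party_sort_key)
print(party_order)
```

Tuples compare element by element, so the rank decides first and the name breaks ties among the remaining parties.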


In [63]:
def max_index(array):
    result = 0
    top = array[0]
    for i in range(len(array)):
        if (array[i] > top):
            top = array[i]
            result = i
    return result

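def margin(record_array):
    # record_array holds the percentages in party_order: Democratic first, Republican second
    maxIndex = max_index(record_array)
    if (maxIndex == 1):
        # Republican led: score between -5 and -10
        raw = (record_array[0] - record_array[1])/2 - 5
        return min(-5, max(-10, round(raw)))
    elif (maxIndex == 0):
        # Democratic led: score between 5 and 10
        raw = (record_array[0] - record_array[1])/2 + 5
        return max(5, min(10, round(raw)))
    else:
        # Some other party led: score 0
        return 0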

def margin_record(record):
    pct_margin = margin(record[2:])
    return [record[0], record[1], pct_margin]

margins = [margin_record(record) for record in pivot_data]
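
A few worked values may make the margin score easier to read: half the Democratic-minus-Republican gap is shifted by 5 and clamped to [5, 10] when the Democratic column leads, or to [-10, -5] when the Republican column leads. The sketch below restates the same rule compactly; the input percentages are hypothetical, and it assumes at least two columns in the order Democratic, Republican.

```python
def margin(percentages):
    """Score a row of percentages ordered Democratic, Republican, then others."""
    # Index of the leading party; ties go to the earliest column
    leader = max(range(len(percentages)), key=lambda i: percentages[i])
    gap = (percentages[0] - percentages[1]) / 2
    if leader == 0:
        return max(5, min(10, round(gap + 5)))
    if leader == 1:
        return min(-5, max(-10, round(gap - 5)))
    return 0

examples = [
    [48.0, 46.0],        # narrow Democratic lead
    [55.0, 43.0],        # wide Democratic lead, clamped at 10
    [46.0, 50.0],        # narrow Republican lead
    [40.0, 58.0],        # wide Republican lead, clamped at -10
    [30.0, 25.0, 40.0],  # a third party leads
]
scores = [margin(row) for row in examples]
print(scores)
```

Because of the clamp, any lead of 10 percentage points or more maps to the same extreme score.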
    

In [64]:
margin_table = GalyleoTable('presidential_margins')
margin_table.load_from_schema_and_data([('State', GALYLEO_STRING), ('Year', GALYLEO_NUMBER), ('Margin', GALYLEO_NUMBER)], margins)
client.send_data_to_dashboard(margin_table)