This notebook takes the raw presidential elections spreadsheet in elections.csv and converts it into a Google DataTable, written out as the file presidential_data_google.json.  This involves:
1. Splitting the combined field \<candidateName> - \<party> into two fields, candidate and party
2. Converting years to integers and filling in the missing years (converting '2016', '', '' to 2016, 2016, 2016)
3. Collecting the cells of a particular state and year into a structure, with the individual candidates as a list
4. Converting the votes into integers, and then, for each result, adding a percentage float
5. Generating the individual records (state, year, candidate, party, votes, percentage) as a list
6. Adding the table description (the column schema)
7. Creating the data table
8. Writing this out as a JSON file

Step 1: read in the CSV.

In [40]:
import csv
# The with block closes the file even if reading fails.
with open('elections.csv', 'r') as f:
    election_reader = csv.reader(f)
    rows = [row for row in election_reader]

In the raw file, the candidate and party share one field, separated by ' - '.  Split them into two lists.  A field with no separator (such as the Total column) is used as both the candidate and the party.

In [41]:
candidate_row = rows[1]
candidate_fields = [field.split(' - ') for field in candidate_row]
candidates = [field[0] for field in candidate_fields]
parties = [field[1] if len(field) == 2 else field[0] for field in candidate_fields]
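
As a quick check of the split logic, here is a minimal sketch on a made-up header row. The names in sample_row are invented for illustration and are not taken from elections.csv.

```python
# Hypothetical header row; the real values come from rows[1] of elections.csv.
sample_row = ['State', 'Total', 'Jane Doe - Independent', 'John Roe - Example Party']

sample_fields = [field.split(' - ') for field in sample_row]
sample_candidates = [field[0] for field in sample_fields]
# A field with no ' - ' separator has a single part, so it is reused as the party.
sample_parties = [field[1] if len(field) == 2 else field[0] for field in sample_fields]

print(sample_candidates)
print(sample_parties)
```

Note that a name which itself contains ' - ' would split into three parts and fall through to the single-part branch, so the party would be lost. Whether that occurs in the real file is not something this sketch checks.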


There are missing years in the data: the year appears only in the first column of each election's group of columns.  Fill in the rest by carrying the last seen year forward.  Also, convert each actual year to an integer.

In [42]:
years = rows[0]
last_year = years[0]
for i in range(1, len(years)):
    if years[i] == '':
        years[i] = last_year
    else:
        last_year = years[i]
years = ['Years'] + [int(year) for year in years[1:]]
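
The fill-forward step can also be written as a small function that leaves its input untouched. This is only a sketch: fill_years and sample_header are hypothetical names, and the header values are invented.

```python
def fill_years(header):
    """Return a copy of header with blank cells replaced by the last non-blank cell."""
    filled = []
    last_seen = ''
    for cell in header:
        if cell != '':
            last_seen = cell
        filled.append(last_seen)
    return filled

# Hypothetical header: a label column, then two elections spanning several columns.
sample_header = ['Years', '2016', '', '', '2012', '']
filled = fill_years(sample_header)
sample_years = [filled[0]] + [int(year) for year in filled[1:]]
print(sample_years)
```

Unlike the cell above, this version does not modify rows[0] in place, which can make re-running the notebook cell by cell less surprising.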


A record for each state and year.  We (1) create the record for the state and year; (2) add the votes for each candidate to the record, storing the value of the Total column in total; and (3) compute the percentages once everything has been read.  Records with a zero total are trimmed before the percentages are computed, which also avoids dividing by zero.  As a side effect, add_percentages prepends the state and year to each candidate entry, because that is the row layout we want in the output.

In [43]:
class StateAndYear:
    def __init__(self, state, year):
        self.year = int(year)
        self.state = state
        self.candidates = []
        self.total = 0
        
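    def add_candidate(self, candidate, party, votes):
        # The 'Total' column carries the state's total vote, not a candidate.
        if candidate == 'Total':
            self.total = votes
        else:
            self.candidates.append([candidate, party, votes])

    def add_percentages(self):
        # Expand each [candidate, party, votes] entry into a full output row.
        # Call this only when self.total > 0.
        self.candidates = [
            [self.state, self.year, cand[0], cand[1], cand[2],
             round(1.0 * cand[2] / self.total, 3)]
            for cand in self.candidates
        ]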
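
To make the percentage step concrete, here is a standalone sketch of the same arithmetic. It assumes Python 3 division; percentage_rows is a hypothetical helper, and the state name and vote counts are invented.

```python
def percentage_rows(state, year, candidates, total):
    """Build [state, year, candidate, party, votes, pct] rows from [candidate, party, votes] entries."""
    if total <= 0:
        # Mirrors the trimming of zero-total records: nothing to report.
        return []
    return [
        [state, year, name, party, votes, round(votes / total, 3)]
        for name, party, votes in candidates
    ]

sample_rows = percentage_rows(
    'Example State', 2016,
    [['Jane Doe', 'Independent', 600], ['John Roe', 'Example Party', 400]],
    1000,
)
for row in sample_rows:
    print(row)
```

The percentage is a fraction between 0 and 1 rounded to three places, not a value out of 100.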

Create the state and year records, storing them in a dictionary keyed by (state, year).  Note that votes are converted to int before being added to the record; cells that are not valid integers (for example, blank cells) are skipped.

In [44]:
state_and_year_dictionary = {}
year_set = set(years[1:])
for row in rows[2:]:
    state = row[0]
    for year in year_set:
        state_and_year_dictionary[(state, year)] = StateAndYear(state, year)
    for index in range(2, len(row)):
        try:
            votes = int(row[index])
        except ValueError:
            # Blank or non-numeric cell: no result for this candidate here.
            continue
        state_and_year_dictionary[(state, years[index])].add_candidate(
            candidates[index], parties[index], votes)

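Trim the records with a zero total, add the percentages, and collect the rows into a single list.  Sorting and the column header are handled when the data table is built and serialized in the following cells.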

In [45]:
record_list = [record for record in state_and_year_dictionary.values() if record.total > 0]
for record in record_list:
    record.add_percentages()
    
data = []
for record in record_list:
    # extend appends in place instead of rebuilding the list on every pass.
    data.extend(record.candidates)
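
Collecting the rows is a plain list flattening, so itertools.chain.from_iterable from the standard library is one alternative. The sketch below uses invented rows; per_record_rows stands in for the candidates lists of the records.

```python
from itertools import chain

# Hypothetical per-record row lists, shaped like record.candidates after add_percentages.
per_record_rows = [
    [['Example State', 2016, 'Jane Doe', 'Independent', 600, 0.6],
     ['Example State', 2016, 'John Roe', 'Example Party', 400, 0.4]],
    [['Other State', 2016, 'Jane Doe', 'Independent', 10, 1.0]],
]

# Flatten one level: each inner list of rows is concatenated in order.
flat_rows = list(chain.from_iterable(per_record_rows))
print(len(flat_rows))
```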

In [46]:
import gviz_api
schema = [("State", "string"), ("Year", "number"), ("Candidate", "string"), ("Party", "string"), ("Votes", "number"), ("Pct", "number")]
# At some point try ("Year", "number", "Year", {f: "####"}) to get proper formatting
data_table = gviz_api.DataTable(schema)
data_table.LoadData(data)


In [47]:
import json
google_table = data_table.ToJSon(columns_order=("State", "Year", "Candidate", "Party", "Votes", "Pct"),
                           order_by=( "Year", "State", "Party"))
# Wrap the table JSON in an outer object; the with block closes the file.
with open('presidential_data_google.json', 'w') as f:
    f.write('{"name":"presidential_vote", "table": %s}' % google_table)
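
Because the outer object is built by string formatting, it may be worth checking that the result parses as JSON. The sketch below does this on a hand-written stand-in for the table string, written to a temporary file. It assumes the string returned by ToJSon is strict JSON; sample_table is invented and only approximates the real output.

```python
import json
import os
import tempfile

# Hand-written stand-in for the string returned by data_table.ToJSon(...).
sample_table = ('{"cols": [{"id": "State", "label": "State", "type": "string"}], '
                '"rows": [{"c": [{"v": "Example State"}]}]}')
wrapped = '{"name":"presidential_vote", "table": %s}' % sample_table

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample_google.json')
    with open(path, 'w') as f:
        f.write(wrapped)
    # json.load raises a ValueError subclass if the wrapper is malformed.
    with open(path) as f:
        loaded = json.load(f)

print(loaded['name'])
print(loaded['table']['cols'][0]['id'])
```

The same json.load call can be pointed at presidential_data_google.json after the cell above has run.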