# Chicago City Council Voting Records
### 2 April 2017

As part of the Data for Democracy "Chicago Lobbyists" project, I've scraped historical voting records for each of the city's alderpersons.

The outputs of this notebook are at https://data.world/stephen-hoover/chicago-city-council-votes . 

You can find the Chicago Lobbyists project at https://data.world/lilianhj/chicago-lobbyists . 

To learn more about Data for Democracy, go to https://github.com/Data4Democracy/read-this-first .

In [496]:
from collections import Counter
from glob import glob
import os
import string
import subprocess
import sys
import tempfile
import urllib.request  # urlopen and urlretrieve live in the request submodule

import bs4
import pandas as pd

from typing import Dict, List, Tuple

In [495]:
WORKING_DIR = os.path.join(os.path.expanduser('~'), 'projects', 'd4d', 'chicago-lobbyists')

# Download Data

Chicago City Council voting records are published as PDFs on the city clerk's website. The votes all seem to be in PDFs titled "Attendance and Divided Roll Call Vote" (or a close variant), so download only those. The website is paginated (as of April 2017). We can find the number of pages by inspecting the "Go to last page" link.

The table on the city clerk's website includes a meeting date with each download link. Use that to create the file name.
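
As a minimal sketch of the two small steps described above, the snippet below reads the page index out of a "Go to last page" link and builds a local file name from a meeting date. The helper names `last_page_from_href` and `local_name_for` are hypothetical and exist only for this illustration; the notebook itself does the same work inline.

```python
from urllib.parse import parse_qs, urlsplit


def last_page_from_href(href: str) -> int:
    """Read the zero-based index of the last page from a pager link."""
    query = parse_qs(urlsplit(href).query)
    return int(query['page'][0])


def local_name_for(date: str) -> str:
    """Turn a 'YYYY-MM-DD' meeting date into a PDF file name."""
    return date.replace('-', '') + '_roll_call_report.pdf'


href = ('/legislation-records/journals-and-reports/council-meeting-reports'
        '?field_publish_date_value[value]=&page=27')
print(last_page_from_href(href))     # 27
print(local_name_for('2017-03-29'))  # 20170329_roll_call_report.pdf
```

Parsing the query string by parameter name, rather than splitting on `=`, should keep working if the site ever reorders its query parameters.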

In [497]:
base = 'http://www.chicityclerk.com/'
suffix = 'legislation-records/journals-and-reports/council-meeting-reports'
page_query = '?field_publish_date_value[value]&page={num}'
resp = urllib.request.urlopen(base + suffix + page_query.format(num=0))
soup = bs4.BeautifulSoup(resp.read(), "html5lib")

In [498]:
last_page_link = [l for l in soup.find_all('a') if l.attrs.get('title') == 'Go to last page'][0].attrs['href']
print(last_page_link)
last_page = int(last_page_link.split('=')[-1])
print(last_page)

/legislation-records/journals-and-reports/council-meeting-reports?field_publish_date_value[value]=&page=27
27


In [499]:
download_links = []
for page_num in range(last_page + 1):
    resp = urllib.request.urlopen(base + suffix + page_query.format(num=page_num))
    soup = bs4.BeautifulSoup(resp.read(), "html5lib")
    for link in soup.find_all('a', string='Download'):
        link_info = {'title': list(link.parent.parent.parent.children)[3].text.strip(),
                     'date': list(link.parent.parent.parent.children)[1].text.strip(),
                     'href': link.attrs['href']}
        download_links.append(link_info)
len(download_links)

1097

In [500]:
roll_call_votes = [l for l in download_links if 'roll call' in l['title'].lower()]
roll_call_votes

[{'date': '2017-03-29',
  'href': '/file/7189/download?token=bxMpAQh6',
  'title': 'City Council - Attendance and Divided Roll Call Vote 3-29-2017'},
 {'date': '2017-02-22',
  'href': '/file/7165/download?token=kgouFPr6',
  'title': 'City Council - Attendance and Divided Roll Call Vote 2-22-2017'},
 {'date': '2017-01-25',
  'href': '/file/7139/download?token=WbxqcBhj',
  'title': 'City Council - Attendance and Divided Roll Call Vote 1-25-2017'},
 {'date': '2016-12-14',
  'href': '/file/7070/download?token=gQTk6kbQ',
  'title': 'City Council - Attendance and Divided Roll Call Vote 12-14-2016'},
 {'date': '2016-09-14',
  'href': '/file/6986/download?token=LaaZKUeq',
  'title': 'Attendance and Divided Roll Call Report 09/14/2016'},
 {'date': '2016-06-23',
  'href': '/file/6888/download?token=0xaUDVRg',
  'title': 'Attendance and Divided Roll Call Report 06/22/2016'},
 {'date': '2016-04-13',
  'href': '/file/7150/download?token=5GN2as0G',
  'title': 'Attendance and Divided Roll Call Report

In [501]:
print(len(roll_call_votes))

153


In [502]:
def download_link(doc_link):
    # Name each file after its meeting date, e.g. 20170329_roll_call_report.pdf
    local_name = doc_link['date'].replace('-', '') + '_roll_call_report.pdf'
    path = os.path.join(WORKING_DIR, 'roll_calls', local_name)
    url = base + doc_link['href'].lstrip('/')
    try:
        print('Downloading "{}" from {}'.format(doc_link['title'], url))
        return urllib.request.urlretrieve(url, filename=path)
    except Exception as exc:
        # Report the failure and move on; the caller gets None back.
        print("Failed! {}".format(exc))

In [None]:
download_resp = []
for doc_link in roll_call_votes:
    local_name = doc_link['date'].replace('-', '') + '_roll_call_report.pdf'
    path = os.path.join(WORKING_DIR, 'roll_calls', local_name)
    url = base + doc_link['href'].lstrip('/')
    try:
        print("Downloading {}".format(url))
        download_resp.append(urllib.request.urlretrieve(url, filename=path))
    except Exception as exc:
        print("Failed: {}".format(exc))

In [None]:
download_resp

# Parse Voting Records

There are two formats of voting records. The older format has a single table indexed by ward number and alderman name, with a matrix of votes. The column names are the record numbers of the measures considered that day; the title of each measure is in the block of text above the votes. I've called this format the "vote table". Parsing is controlled by the "parse_file_with_table" function.
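
To make the "vote table" layout concrete, here is a minimal sketch that splits one row of such a table into ward, name, and votes. The row text is a made-up illustration, and `split_table_row` is a hypothetical helper, not the notebook's `parse_table_line`; in particular, this sketch returns the votes in left-to-right column order and does not handle a vote fused onto the end of a name.

```python
from typing import List, Tuple

VOTE_TOKENS = {'Y', 'N', 'A', 'NV', 'V', 'E'}


def split_table_row(line: str) -> Tuple[int, str, List[str]]:
    """Split '1st Ward: Name V1 V2 ...' into (ward, name, votes)."""
    tokens = line.split()
    ward = int(tokens[0][:-2])  # drop the "st"/"nd"/"rd"/"th" suffix
    rest = tokens[2:]           # skip the literal "Ward:" token
    votes = []
    # Votes sit at the end of the row, so peel them off from the right.
    while rest and rest[-1] in VOTE_TOKENS:
        votes.append(rest.pop())
    votes.reverse()             # restore left-to-right column order
    return ward, ' '.join(rest), votes


print(split_table_row('1st Ward: Manuel Flores Y N'))
# (1, 'Manuel Flores', ['Y', 'N'])
```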

The newer format has one vote per page. Each page has the record name, the title, and a block of ward numbers and alderman names with votes, laid out in two columns. I've called this the "vote block" format. Parsing is controlled by the "parse_file_with_blocks" function: first I locate each block (one per measure considered), then parse that block. Some blocks translated cleanly from PDF, and these use the "parse_vote_block_clean" function. Some got a bit scrambled, and those use the "parse_vote_block_dirty" function.
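
For the well-formatted case of the "vote block" layout, a single line of text carries two (ward, name, vote) groups side by side. The sketch below shows one way to split such a line; `split_block_line` is a hypothetical helper and the names in the sample line are invented, so treat it as an illustration of the idea rather than the notebook's `parse_vote_block_clean`.

```python
from typing import Any, Dict, List

VOTE_TOKENS = {'Y', 'N', 'A', 'NV', 'V', 'E'}


def split_block_line(line: str) -> List[Dict[str, Any]]:
    """Split 'WARD NAME VOTE WARD NAME VOTE' into one record per group."""
    records, current = [], []
    for token in line.split():
        current.append(token)
        # A vote token closes the current (ward, name..., vote) group.
        if len(current) >= 3 and token in VOTE_TOKENS:
            records.append({'Ward': int(current[0]),
                            'Alderman': ' '.join(current[1:-1]),
                            'Vote': current[-1]})
            current = []
    return records


for rec in split_block_line('1 Smith Y 26 A. Jones NV'):
    print(rec)
```

A scrambled line, for example one where a vote is glued to the following name, fails this simple grouping, which is why a separate fallback parser is needed.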

For both formats, I've special-cased a few documents that didn't parse cleanly for one reason or another.

In [503]:
# This cell contains the code for converting vote blocks 
# of the newer format into DataFrames of voting records.

columns = ['Ward', 'Alderman', 'Vote']
VOTE_TOKENS = ['Y', 'N', 'A', 'NV', 'V', 'E']

def is_number(token: str):
    try:
        int(token)
        return True
    except ValueError:
        return False

    
def next_index_with_condition(line: List[str], condition):
    for i_token, token in enumerate(line):
        if condition(token):
            return i_token

def vote_from_token_end(token: str) -> Tuple[str, str]:
    if token[-1] in VOTE_TOKENS:
        return token[-1], token[:-1]
    elif token[-2:] in VOTE_TOKENS:
        return token[-2:], token[:-2]
    else:
        return None, token
    

# Special casing for corner cases and single document errors.
# Use these to clean lines.
SPECIAL_LINE_CLEANING = [('YO’Connor', 'Y O’Connor'),
                         ('V41', 'V 41'),
                        ('Brookins45 Cappleman YY', 'Brookins 45 Cappleman Y Y'),
                        ('PaY  war', 'Pawar Y'),
                        ('OstermaY n', 'Osterman Y'),
                        ('Jackson32', 'Jackson 32'),
                        ('Thompson36', 'Thompson 36'),
                        ('denas37', 'denas 37'),
                        ('Napolitano NY', 'Napolitano N Y'),
                        ('Napolitano YY', 'Napolitano Y Y'),
                        ('34 Y', 'Y 34'),
                        ('Brookins45 Capple                                Y                man          Y',
                         'Brookins 45 Cappleman Y Y')]

def parse_vote_block_dirty(lines: List[str]) -> pd.DataFrame:
    # This works on a vote block with messed up formatting.
    # Assume lines are organized as
    # WARD1 NAME1 WARD2 NAME2 VOTE1 VOTE2
    # The "VOTE1" may or may not have whitespace between it and NAME2
    # Possibly VOTE1 is between NAME1 and WARD2 for some lines.
    
    votes = []
    for i, line in enumerate(lines):
        if not line:
            break
        #print(line)
        for special in SPECIAL_LINE_CLEANING:
            line = line.replace(*special)
        #print(line)
        tokens = line.split()
        clean_tokens = [], []
        stage = 0
        clean_tokens[0].append(tokens.pop(0))
        
        i_next_ward = next_index_with_condition(tokens, is_number)
        name_0 = ' '.join(tokens[:i_next_ward]).strip()
        vote_0, name_0 = vote_from_token_end(name_0)
        clean_tokens[0].append(name_0.strip())
        clean_tokens[1].append(tokens[i_next_ward])
        
        tokens = tokens[i_next_ward + 1:]  # Remove consumed tokens
        vote_1 = tokens.pop(len(tokens) - 1)  # Last token is the second vote
        
        # Remaining tokens are a combination of first vote and second name
        if not vote_0:
            vote_0, tokens[-1] = vote_from_token_end(tokens[-1])
        if not vote_0:
            vote_0 = tokens.pop(1)
        clean_tokens[0].append(vote_0)
        clean_tokens[1].append(' '.join(tokens).strip())
        clean_tokens[1].append(vote_1)
        
        for cleaned in clean_tokens:
            votes.append(dict(zip(columns, cleaned)))
    df = pd.DataFrame.from_records(votes)
    df['Ward'] = df['Ward'].astype(int)
    return df.sort_values(by='Ward').reset_index(drop=True)


def parse_vote_block_clean(lines: List[str]) -> pd.DataFrame:
    # This works on a well-formatted vote block
    votes = []
    # Go over enough lines to be sure you have it all. Break when done.
    for i, line in enumerate(lines):
        if not line:
            break
        tokens = line.split()
        clean_tokens = []
        stage = 0
        name_tokens = []
        for t in tokens:
            if stage == 0:
                clean_tokens.append(t)
                stage += 1
            elif stage == 1:
                if t in VOTE_TOKENS:
                    clean_tokens.append(' '.join(name_tokens))
                    clean_tokens.append(t)
                    votes.append(dict(zip(columns, clean_tokens)))
                    clean_tokens, name_tokens = [], []
                    stage = 0
                else:
                    name_tokens.append(t)
    df = pd.DataFrame.from_records(votes)
    df['Ward'] = df['Ward'].astype(int)
    return df.sort_values(by='Ward').reset_index(drop=True)


In [504]:
# This cell finds the portions of data in the newer format -- 
# record indicator, title, and voting data -- and cleans and 
# wraps everything into one dictionary per vote.

def startswith_index(page, i_start, substr):
    for i, l in enumerate(page[i_start:]):
        if l.strip().lower().startswith(substr.lower()):
            return i + i_start

def select_chunks(page):
    chunks, starts, stops = {}, {}, {}
    end_index = None
    
    i_roll_call_page = startswith_index(page, 0, 'Roll Call Vote')
    if not i_roll_call_page:
        return chunks, end_index
    
    starts['record'] = startswith_index(page, i_roll_call_page, 'Record')
    stops['record'] = starts['record'] + 1
    starts['title'] = startswith_index(page, starts['record'], 'Title')
    stops['title'] = startswith_index(page, starts['title'], 'Vote')
    starts['votes'] = startswith_index(page, stops['title'], 'Ward') + 1
    if page[starts['votes']].strip().startswith('Ward'):
        # Sometimes the table header splits onto two lines
        # E.g. roll_calls/20131211_roll_call_report.txt
        starts['votes'] += 1
    for i in range(starts['votes'], len(page)):
        if not page[i].strip() or page[i].strip()[0] not in string.digits:
            stops['votes'] = i
            break
    else:
        raise RuntimeError("Couldn't find the end of the vote block.")
    
    for name in starts:
        chunks[name] = page[starts[name]: stops[name]]
        
    return chunks, stops['votes'] + 1

   
def clean_record(lines: List[str]) -> str:
    # The record chunk is a single line; the record number is its last token.
    if len(lines) != 1:
        raise ValueError('Expected one line!')
    return lines[0].split()[-1]

def clean_title(lines: List[str]) -> str:
    line = ' '.join(lines)
    tokens = line.split()
    return ' '.join(tokens[1:])
    #return line[len('Title/Description:'):].strip()

def clean_votes(lines: List[str]) -> pd.DataFrame:
    try:
        return parse_vote_block_clean(lines)
    except (ValueError, KeyError):
        return parse_vote_block_dirty(lines)

CLEAN = {'record': clean_record, 'title': clean_title, 'votes': clean_votes}


def read_page(file_name: str) -> List[str]:
    with open(file_name) as _fin:
        page = [l.strip() for l in _fin.readlines()]
    return page


def parse_file_with_blocks(file_name: str, date_str: str) -> List[Dict]:
    records = []
    page = read_page(file_name)
    index = 0
    while True:
        this_rec, next_index = select_chunks(page[index:])
        if this_rec:
            records.append({k: CLEAN[k](v) for k, v in this_rec.items()})
            records[-1]['date'] = date_str
            index += next_index
        else:
            break
    return records

In [505]:
# This cell parses voting records in the older style.

def select_vote_table(lines: List[str], votes_only: bool=False) -> List[str]:
    # Search for a unified table of votes
    i_start, i_stop = None, None
    for i_line, line in enumerate(lines):
        if ('issue:' in line.strip().lower() and 
                (lines[i_line + 1].strip().startswith('1st') or
                 lines[i_line + 2].strip().startswith('1st'))):
            i_start = i_line
        elif votes_only and line.strip().startswith('1st'):
            # We already have the header, so we only need the block of votes
            i_start = i_line
        elif i_start and line.strip().startswith('50th'):
            i_stop = i_line + 1
            break
    else:
        return []
    return lines[i_start: i_stop]

def parse_table_line(line: str, n_issues: int) -> Tuple[str, str, str]:
    tokens = line.split()
    if len(tokens) < 3:
        return None
    ward = tokens.pop(0)[:-2]  # Remove "st", "nd", "th", etc.
    tokens.pop(0)  # Should be "Ward:"
    
    votes = []
    while tokens[-1] in VOTE_TOKENS:
        votes.append(tokens.pop(len(tokens) - 1))
    additional_vote, tokens[-1] = vote_from_token_end(tokens[-1])
    if additional_vote:
        votes.append(additional_vote)
        
    name = ' '.join(tokens)
    if name.lower() == 'vacant' and not votes:
        votes = n_issues * ['V']
    return [ward, name] + votes


def get_title(page: List[str], issue: str) -> str:
    if issue.startswith('ADJOURN'):
        return "Motion to adjourn"
    if issue.lower().startswith('case of'):
        return issue
    issue = issue.replace(';', ',')
    for i_line, line in enumerate(page):
        if line.strip().startswith(issue):
            break
    blob = ' '.join([l.strip() for l in page[i_line: i_line+6]])
    i_end = blob.find('Click here for entire text')
    title = blob[: i_end]
    title = title[len(issue) + 1:].strip().lstrip('-').strip()
    title = title.rstrip('(').strip()
    return title


def parse_issues(lines: List[str]) -> List[str]:
    # Convert lines of text with issue record names into a list of names
    # Need to handle line breaks.
    # See e.g. roll_calls/20071113_roll_call_report.txt

    # Differently-broken:
    # roll_calls/20061115_roll_call_report.txt
    
    # Remove spaces from "Motion to Adjourn" to aid in processing
    lines[0] = lines[0].replace('Motion to Adjourn', 'ADJOURN')
    
    if len(lines) == 1:
        return ''.join([l.strip() for l in lines]).split()[1:]
    if len(lines) == 2:
        if ';' in lines[0]:
            lines[0] = lines[0].replace('; ', ';')  # str.replace returns a new string
        elif Counter(lines[0])['-'] > len(lines[0].split()) - 1:
            lines[0] = lines[0].replace('-', '- ').replace(' - ', ' ')
        lines[1] = lines[1].replace(' (', '(')
        tokens1 = lines[0].split()[1:]  # Remove "Issue:"
        tokens2 = lines[1].split()
        if len(tokens2) == 1:
            return ''.join([l.strip() for l in lines]).split()[1:]
        elif len(tokens2) == len(tokens1):
            return [''.join(pair) for pair in zip(tokens1, tokens2)]
    raise RuntimeError('Unable to parse issues: \n{}'.format(lines))

        
def parse_file_with_table(file_name: str, date_str: str, issues: List[str]=None) -> List[Dict]:
    records = []
    page = read_page(file_name)
    
    vote_table = select_vote_table(page, votes_only=(issues is not None))
    if not vote_table:
        return
    
    if not issues:
        i_start = [i for i, l in enumerate(vote_table) if l.startswith('1st')][0]
        issues = parse_issues(vote_table[: i_start])
    else:
        i_start = 0
    parsed_lines = [parse_table_line(l, len(issues)) for l in vote_table[i_start:] if len(l.split()) >= 3]
    df = pd.DataFrame.from_records(parsed_lines, columns=['Ward', 'Alderman'] + issues)
    assert len(df) == 50
    for issue in issues:
        record = {'record': issue,
                  'title': get_title(page, issue),
                  'date': date_str,
                  'votes': df[['Alderman', issue, 'Ward']].rename(columns={issue: 'Vote'})}
        records.append(record)
    return records

In [506]:
# Hand-inspected records, verified to have no votes that day (attendance only)
known_empty = ['/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20110309_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20110413_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20110504_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20110518_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20110706_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20111005_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20111012_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20111102_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20111109_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20111214_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20120314_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20120509_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20121003_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20130117_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20130213_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20130313_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20130717_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20131016_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20140402_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20141210_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20150121_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20150415_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20150506_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20150520_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20151014_roll_call_report.txt',
 '/Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20151021_roll_call_report.txt']

# There are oddities in these files that keep them from being parsed easily.
# Hand-code the issue headers. (These are for old-style formatted documents only.)
issues_for_date = {'2009-10-07': ['SO2009-5597','SO2009-5542', 
                                  'PO2009-4114: Motion to lay on table', 
                                  'PO2009-4114: Motion to Re-refer'],
                   '2006-11-19': ['O2008-6775', 'O2008-6776', 'O2008-6782', 'O2008-6778', 'SO2008-6777'],
                   '2010-09-08': ['Case of Gary Kamen v. City of Chicago.'],
                   '2011-02-09': ['SO2010-7086; O2010-6824']}

In [507]:
# Use GhostScript to convert all of the pdfs into text files,
# then parse the text files.
cmd = 'gs -sDEVICE=txtwrite -o {output} {input}'

vote_records = []
empty = []
unknown = []
success = []
fail = []
val = None
pdfs = glob(os.path.abspath('roll_calls/*.pdf'))
for i_name, fname in enumerate(pdfs):
    out_fname = os.path.splitext(fname)[0] + '.txt'
    if not os.path.exists(out_fname):
        # Only reprocess the pdfs if the outputs don't exist already.
        retval = subprocess.run(cmd.format(output=out_fname, input=fname),
                                shell=True, check=False,
                                stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    #print(retval.stdout, retval.stderr)
    print("Parsing {} from {}".format(fname, out_fname))
    try:
        with open(out_fname) as _fin:
            full_page = _fin.read()
            
        # Many documents have only attendance and no votes. Skip those.
        if (out_fname in known_empty or
                'There were no divided roll call votes' in full_page or
                'There was no divided roll call' in full_page):
            empty.append(out_fname)
            continue
            
        # First try to parse the document as if it were in the new-style format.
        yyyymmdd = os.path.basename(out_fname).split('_')[0]
        date_str = "{}-{}-{}".format(yyyymmdd[:4], yyyymmdd[4:6], yyyymmdd[6:])
        parsed = parse_file_with_blocks(out_fname, date_str=date_str)
        if not parsed:
            # If that didn't work, try again with the older format.
            parsed = parse_file_with_table(out_fname, date_str=date_str, 
                                           issues=issues_for_date.get(date_str))
            
        # Record success or failure.
        if not parsed:
            unknown.append(out_fname)
        else:
            success.append(out_fname)
            vote_records.extend(parsed)
    except Exception as exc:
        fail.append((out_fname, exc))
        print(exc)

Parsing /Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20060524_roll_call_report.pdf from /Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20060524_roll_call_report.txt
Parsing /Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20060628_roll_call_report.pdf from /Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20060628_roll_call_report.txt
Parsing /Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20060726_roll_call_report.pdf from /Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20060726_roll_call_report.txt
Parsing /Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20060913_roll_call_report.pdf from /Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20060913_roll_call_report.txt
Parsing /Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20061004_roll_call_report.pdf from /Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/20061004_roll_call_report.txt
Parsing /Users/shoover/projects/d4d/chicago-lobbyists/roll_calls/

In [508]:
print('{} vote records successfully parsed from {} files.'.format(len(vote_records), len(success)))
print(len(empty), ' files have no votes.')
print(len(fail), ' files had an error in parsing.')
print(len(unknown), ' files I don\'t know how to parse.')

217 vote records successfully parsed from 90 files.
58  files have no votes.
0  files had an error in parsing.
0  files I don't know how to parse.
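
The conversion step above builds one command string and runs it through the shell, which works as long as no path contains spaces or shell metacharacters. As a hedged alternative, the sketch below passes the same Ghostscript options (`-sDEVICE=txtwrite -o`) as an argument list, so paths are never re-parsed by a shell. The helper names `gs_text_command` and `convert_pdf` are hypothetical, and `convert_pdf` assumes a `gs` executable is on the PATH.

```python
import subprocess
from typing import List


def gs_text_command(pdf_path: str, txt_path: str) -> List[str]:
    """Build the Ghostscript argument list for a PDF-to-text conversion."""
    return ['gs', '-sDEVICE=txtwrite', '-o', txt_path, pdf_path]


def convert_pdf(pdf_path: str, txt_path: str) -> bool:
    """Run Ghostscript; return True when it exits with status 0."""
    result = subprocess.run(gs_text_command(pdf_path, txt_path),
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode == 0


# Only build the command here; nothing is executed.
print(gs_text_command('roll calls/a.pdf', 'roll calls/a.txt'))
```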


In [509]:
vote_records[0]

{'date': '2006-07-26',
 'record': 'SO2006-3086',
 'title': 'Amendment of Title 2, Chapter 8, Section 041 of Municipal Code of Chicago by Establishment of New Aldermanic Compensation Schedule.',
 'votes':                  Alderman Vote Ward
 0           Manuel Flores    Y    1
 1   Madeline L. Haithcock    N    2
 2      Dorothy J. Tillman    N    3
 3        Toni Preckwinkle    Y    4
 4         Leslie Hairston   NV    5
 5         Freddrenna Lyle   NV    6
 6      William M. Beavers    N    7
 7            Todd Stroger    N    8
 8           Anthony Beale    N    9
 9            John A. Pope    Y   10
 10        James A. Balcer    N   11
 11     George A. Cardenas    N   12
 12         Frank J. Olivo    Y   13
 13        Edward M. Burke    Y   14
 14        Theodore Thomas    N   15
 15     Shirley A. Coleman    N   16
 16     LaTasha R. ThomasN    N   17
 17      Thomas W. MurphyN    N   18
 18      Virginia A. Rugai    N   19
 19        Arenda Troutman    N   20
 20   Howard Brookin

In [483]:
# Inspect pages
read_page(empty[4])

['Attendance and Divided Roll Call                               Close',
 'Vote',
 'Attendance for the February 7th, 2007 Meeting of the Chicago City',
 'Council',
 'Present - The Honorable Richard M. Daley, Mayor, and Aldermen Flores, Haithcock,',
 'Tillman, Preckwinkle, Hairston, Lyle, Beavers, Stroger, Beale, Pope, Balcer, Cardenas,',
 'Olivo, Burke, T. Thomas, Coleman, L.Thomas, Murphy, Rugai, Troutman, Brookins,',
 'Munoz, Zalewski, Chandler, Solis, Ocasio, Burnett, E. Smith, Carothers, Reboyras,',
 "Suarez, Matlak, Mell, Austin, Colon, Banks, Mitts, Allen, Laurino, O'Connor, Doherty,",
 'Natarus, Daley, Tunney, Levar, Shiller, Schulter, M. Smith, Moore, Stone.',
 'Absent - None.',
 'Divided Roll Call Voting February 7th, 2007 Meeting of the Chicago City',
 'Council',
 'There were no divided roll call votes in the February 7th, 2007 meeting of the Chicago City',
 'Council.']

In [None]:
# Error checking / debugging

fname = fail[0][0]
print(fname)
page = read_page(fname)
votes = select_chunks(page)
print(votes[1])
parse_vote_block_dirty(votes[0]['votes'])
#parse_file_with_table(fname, 'yyyy')

# Output Data

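Flatten the parsed records into two tables, one with a row per alderman per vote and one with a row per measure, and write both to disk as CSV files.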

In [484]:
df_records_list = []
titles = []
for record in vote_records:
    _df = record['votes'].copy()
    _df['Date'] = record['date']
    _df['Record'] = record['record']
    df_records_list.append(_df)
    titles.append([record['date'], record['record'], record['title']])
    
df_records = pd.concat(df_records_list)
df_titles = pd.DataFrame(titles, columns=['Date', 'Record', 'Title'])

In [488]:
df_records.to_csv(os.path.join(WORKING_DIR, 'alderman_votes.csv'), index=False)
df_titles.to_csv(os.path.join(WORKING_DIR, 'legislation_titles.csv'), index=False)
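
A minimal sketch of how the two output tables could be recombined downstream, assuming only the column names written above. The rows here are tiny invented stand-ins, not real voting data. The join uses both `Date` and `Record`, since a record label on its own may repeat across meetings (the older format can label a measure simply as `ADJOURN`, for example).

```python
import pandas as pd

# Invented stand-ins with the same columns as the two CSV files.
votes = pd.DataFrame({
    'Alderman': ['A. Example', 'B. Example', 'A. Example'],
    'Vote': ['Y', 'N', 'NV'],
    'Ward': [1, 2, 1],
    'Date': ['2006-07-26', '2006-07-26', '2007-01-01'],
    'Record': ['R-1', 'R-1', 'R-2'],
})
titles = pd.DataFrame({
    'Date': ['2006-07-26', '2007-01-01'],
    'Record': ['R-1', 'R-2'],
    'Title': ['First measure', 'Second measure'],
})

# Attach each measure's title to every individual vote on it.
merged = votes.merge(titles, on=['Date', 'Record'], how='left')

# Count how many of each vote type every measure received.
tally = merged.groupby(['Record', 'Vote']).size().unstack(fill_value=0)
print(tally)
```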