A Python script to scrape City of Austin campaign finance reports (PDFs + metadata), 2009-2016.

statesman/austin-campaign-finance-scraper

Austin campaign finance scraper

Requirements

virtualenv
pip

Setup

Clone this repo, create a virtual environment inside it, activate the environment, and install the requirements.

$ git clone git@github.com:statesman/austin-campaign-finance-scraper.git
$ virtualenv austin-campaign-finance-scraper
$ cd austin-campaign-finance-scraper
$ source bin/activate
$ pip install -r requirements.txt

Run the script

$ fab scrapeEm

Results

After running the script, you should end up with:

  1. A directory for each year from 2009 through 2016, containing PDF scans of that year's campaign finance reports. Reports are slugged {year}-{month}-{day}-{filer_name}.pdf; corrected reports are slugged {year}-{month}-{day}-{filer_name}-corrected.pdf.
  2. A JSON metadata file.
  3. A ZIP file containing (1) and (2).
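The report naming scheme in item 1 can be sketched in Python as follows. This is not the scraper's own code: `slugify_filer`, `report_filename`, `report_path` and `parse_report_filename` are hypothetical helpers. The README doesn't say how `filer_name` is normalized or whether months and days are zero-padded, so the lowercase-and-hyphenate rule and the two-digit padding below are assumptions.

```python
import re
from datetime import date
from pathlib import Path
from typing import Tuple


def slugify_filer(name: str) -> str:
    """Lowercase a filer name and collapse runs of non-alphanumerics into hyphens.

    ASSUMPTION: the real scraper may normalize filer names differently.
    """
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def report_filename(filed: date, filer_name: str, corrected: bool = False) -> str:
    """Build {year}-{month}-{day}-{filer_name}[-corrected].pdf.

    ASSUMPTION: month and day are zero-padded to two digits.
    """
    stem = f"{filed:%Y-%m-%d}-{slugify_filer(filer_name)}"
    if corrected:
        stem += "-corrected"
    return stem + ".pdf"


def report_path(filed: date, filer_name: str, corrected: bool = False,
                root: str = ".") -> Path:
    """Place a report in its year directory under `root`, e.g. 2016/2016-01-15-jane-doe.pdf."""
    return Path(root) / str(filed.year) / report_filename(filed, filer_name, corrected)


# The filer group is non-greedy so an optional trailing "-corrected" is captured separately.
_FILENAME_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+?)(-corrected)?\.pdf$")


def parse_report_filename(filename: str) -> Tuple[date, str, bool]:
    """Split a slugged filename back into (filing date, filer slug, corrected flag)."""
    match = _FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a slugged report filename: {filename!r}")
    year, month, day, filer, corrected = match.groups()
    return date(int(year), int(month), int(day)), filer, corrected is not None
```

Parsing filenames back into a date, a filer and a corrected flag makes it straightforward to pair each corrected report with the original it replaces. One caveat of the scheme as sketched: a filer slug that itself ends in `-corrected` would be indistinguishable from a corrected report.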

Git-ing the ZIP file

You'll need Git Large File Storage (Git LFS) to commit the ZIP file: at ~650MB, it is well over the 100 MiB limit GitHub enforces on individual files.
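The usual LFS setup is to run `git lfs install` once, then `git lfs track "*.zip"`, which records the pattern in `.gitattributes` (commit that file along with the ZIP). If you want to check whether a file is large enough to need LFS before committing it, a small helper like the one below will do. `needs_lfs`, `size_mib` and `GITHUB_FILE_LIMIT_BYTES` are hypothetical names, not part of this repo; the 100 MiB threshold reflects GitHub's per-file limit.

```python
import os

# ASSUMPTION: GitHub rejects pushes containing any file larger than 100 MiB.
GITHUB_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def size_mib(path: str) -> float:
    """Return the file's size in mebibytes."""
    return os.path.getsize(path) / (1024 * 1024)


def needs_lfs(path: str, limit: int = GITHUB_FILE_LIMIT_BYTES) -> bool:
    """Return True if the file at `path` is larger than `limit` bytes.

    Files over the limit can't be pushed to GitHub as regular Git objects
    and should be tracked with Git LFS instead.
    """
    return os.path.getsize(path) > limit
```

Called on the ~650MB archive, `needs_lfs` returns True, since the archive is several times larger than the limit.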

Todo

Refactor for DRY-ness.
