A Python script to scrape City of Austin campaign finance reports (PDFs + metadata), 2009-2016.
Requirements:

- virtualenv
- pip

Clone this repo, create and activate a virtual environment, and install the requirements:
```
$ git clone git@github.com:statesman/austin-campaign-finance-scraper.git
$ virtualenv austin-campaign-finance-scraper
$ cd austin-campaign-finance-scraper
$ source bin/activate
$ pip install -r requirements.txt
```
Then run the scraper:

```
$ fab scrapeEm
```
After running the script, you should end up with:
1. A directory for each year, 2009-2016, each containing PDF scans of campaign finance reports for that year. Reports are slugged `{year}-{month}-{day}-{filer_name}.pdf`; corrected reports are slugged `{year}-{month}-{day}-{filer_name}-corrected.pdf`.
2. A JSON metadata file.
3. A zipfile containing (1) and (2).
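As a rough sketch of the naming scheme described above (the repo's actual implementation may differ, and the `slugify` and `report_slug` helpers here are hypothetical):

```python
import re
from datetime import date

def slugify(text):
    """Lowercase and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def report_slug(filed, filer_name, corrected=False):
    """Build a {year}-{month}-{day}-{filer_name}.pdf filename."""
    suffix = "-corrected" if corrected else ""
    return "{:%Y-%m-%d}-{}{}.pdf".format(filed, slugify(filer_name), suffix)

# e.g. report_slug(date(2014, 7, 15), "Steve Adler")
# -> "2014-07-15-steve-adler.pdf"
```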
You'll need Git Large File Storage (LFS) to commit the ZIP file, which clocks in at ~650 MB.
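If you haven't used Git LFS in this repo before, the setup looks roughly like this (the zipfile name below is illustrative; use whatever the script actually produces):

```shell
$ git lfs install                # one-time setup per machine
$ git lfs track "*.zip"          # writes a tracking rule to .gitattributes
$ git add .gitattributes
$ git add reports.zip            # hypothetical zipfile name
$ git commit -m "Add zipped reports via Git LFS"
```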
To do: refactor the scraper for DRY-ness.