A Selenium-based scraper for Welsh Assembly Members' public expenses records

assembly-expenses-scraper

Scraping Expenses!

Requirements

  • selenium - to drive the web browser
  • chromedriver - by default the script assumes this is in the same folder as the script; if not, supply the path with the -d/--driver command-line argument
  • tqdm - for the lovely progress bars on the command line
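The chromedriver lookup described above can be sketched as follows. This is a minimal illustration, not the code in expenses.py: `resolve_driver_path` is a hypothetical helper, and appending `.exe` on Windows is an assumption about how the driver binary is named there.

```python
import os
import sys
from typing import Optional


# Hypothetical helper: expenses.py may resolve the path differently.
def resolve_driver_path(cli_path: Optional[str] = None,
                        script_dir: Optional[str] = None) -> str:
    """Return the chromedriver path to hand to Selenium.

    If a path was supplied on the command line (-d/--driver), use it as-is;
    otherwise assume chromedriver sits in the same folder as the script.
    """
    if cli_path:
        return cli_path
    if script_dir is None:
        # Folder containing the running script.
        script_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
    # Assumption: the binary is called chromedriver.exe on Windows.
    name = "chromedriver.exe" if os.name == "nt" else "chromedriver"
    return os.path.join(script_dir, name)
```

With Selenium 4 the returned path would typically be wrapped in `selenium.webdriver.chrome.service.Service` and passed as `webdriver.Chrome(service=...)`; older Selenium 3 code passed it as `executable_path=` instead.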

Usage

usage: expenses.py [-h] -y [YEARS [YEARS ...]] [-s START] [-t TO] [-p PAUSE] [-d DRIVER]

required arguments:

-y [YEARS [YEARS ...]], --years [YEARS [YEARS ...]] list of years to check

optional arguments:

-h, --help show this help message and exit

-s START, --start START month to start scraping from

-t TO, --to TO month to scrape to

-p PAUSE, --pause PAUSE pause to add between requests to the server, in seconds (default: 1/3 of a second)

-d DRIVER, --driver DRIVER path to the chromedriver
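A rough sketch of how these options could fit together, assuming months are given as integers 1 to 12 and default to the whole year (neither is stated above). `build_parser`, `year_months` and `scrape_all` are illustrative names, and `fetch` is a hypothetical stand-in for the Selenium code that scrapes one month.

```python
import argparse
import time
from typing import Callable, Iterator, List, Tuple

try:
    from tqdm import tqdm  # progress bars, as listed under Requirements
except ImportError:  # fall back to a plain iterable if tqdm isn't installed
    def tqdm(iterable, **kwargs):
        return iterable


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented options (types and defaults assumed)."""
    parser = argparse.ArgumentParser(prog="expenses.py")
    required = parser.add_argument_group("required arguments")
    required.add_argument("-y", "--years", nargs="*", type=int, required=True,
                          help="list of years to check")
    parser.add_argument("-s", "--start", type=int, default=1,
                        help="month to start scraping from")
    parser.add_argument("-t", "--to", type=int, default=12,
                        help="month to scrape to")
    parser.add_argument("-p", "--pause", type=float, default=1 / 3,
                        help="pause between requests to the server, in seconds")
    parser.add_argument("-d", "--driver", default=None,
                        help="path to the chromedriver")
    return parser


def year_months(years: List[int], start: int, to: int) -> Iterator[Tuple[int, int]]:
    """Yield every (year, month) pair from month `start` to `to` inclusive."""
    for year in years:
        for month in range(start, to + 1):
            yield year, month


def scrape_all(args: argparse.Namespace,
               fetch: Callable[[int, int], str]) -> List[str]:
    """Call the hypothetical `fetch` for each month, pausing between requests."""
    results = []
    for year, month in tqdm(list(year_months(args.years, args.start, args.to)),
                            desc="months"):
        results.append(fetch(year, month))
        time.sleep(args.pause)  # be polite to the Assembly web server
    return results
```

For example, `expenses.py -y 2016 2017 -s 3 -t 4` would visit March and April of each year, sleeping for the pause (1/3 of a second by default) after every request.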

TODO

  • caching would be nice (for the Assembly web server's sake - I couldn't care less)
  • more detailed searches (the site allows searching by AM, by type, etc.)
  • read input parameters from file for more detailed searches
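One possible approach to the last TODO item, offered as an assumption rather than the planned design: argparse's built-in `fromfile_prefix_chars` option lets `expenses.py @params.txt` read arguments from a file, one argument per line by default. `build_parser`, `parse_from_file` and `params.txt` are hypothetical names used only for illustration.

```python
import argparse
import os
import tempfile
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # fromfile_prefix_chars="@" makes argparse expand "@path" into the
    # arguments stored in that file (one argument per line by default).
    parser = argparse.ArgumentParser(prog="expenses.py", fromfile_prefix_chars="@")
    parser.add_argument("-y", "--years", nargs="*", type=int, required=True)
    parser.add_argument("-s", "--start", type=int, default=1)
    parser.add_argument("-t", "--to", type=int, default=12)
    parser.add_argument("-p", "--pause", type=float, default=1 / 3)
    return parser


def parse_from_file(lines: List[str]) -> argparse.Namespace:
    """Write `lines` to a temporary params file and parse it via "@file"."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
        fh.write("\n".join(lines) + "\n")
        path = fh.name
    try:
        return build_parser().parse_args(["@" + path])
    finally:
        os.remove(path)  # clean up the temporary params file
```

Keeping argparse's own file syntax avoids inventing a separate config format, though searches by AM or type would still need new options of their own.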