
How to write a scraper

rkiddy edited this page Jul 24, 2011 · 3 revisions

The main steps in writing a new OpenStates scraper are:

  1. Learn Python. Many books and web-based tutorials are available.
  2. Install Python 2.7 and the OpenStates supporting libraries (on Windows, *nix, or Mac OS X).
  3. Install git, learn the basics of using it, and check out the OpenStates source code from GitHub.
  4. Review the information at http://openstates.sunlightlabs.com/contributing/
  5. Choose a state, locate the online information for its legislature, and work out how that information is structured.
  6. Find the lists of all bills, legislators, and committees. Note how many of each there are, and check whether any are missing from the lists.
  7. Once you know what to look for, locate the detailed information for each bill, legislator, and committee.
  8. Override the methods in the OpenStates scripts that save bills, legislators, and committees, mapping the information you have found to the fields of the OpenStates objects.
  9. Check your code in to a branch you have set up on GitHub, or issue a pull request against the main repository.
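
Steps 6 to 8 can be rehearsed before touching the OpenStates code itself. The following is a minimal sketch, not the OpenStates API: the sample HTML, the `/bills/` URL pattern, the `legislature.example` host, the `BillLinkParser` class, and the record field names are all assumptions made up for illustration. It uses only the Python standard library, pulls a list of bills out of a listing page, checks the list for duplicates, and maps each entry to a simple record of the kind a scraper would hand on for saving.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7

# Hypothetical listing page. A real scraper would download this
# from the legislature's web site instead.
SAMPLE_LISTING = """
<table id="bills">
  <tr><td><a href="/bills/HB1">HB 1</a></td><td>An act relating to schools</td></tr>
  <tr><td><a href="/bills/HB2">HB 2</a></td><td>An act relating to roads</td></tr>
  <tr><td><a href="/bills/SB1">SB 1</a></td><td>An act relating to water</td></tr>
</table>
"""


class BillLinkParser(HTMLParser):
    """Collect (bill_id, url) pairs from links whose href starts with /bills/."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.bills = []
        self._href = None  # href of the bill link we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("/bills/"):
                self._href = href

    def handle_data(self, data):
        # The first non-blank text inside a bill link is taken as the bill id.
        if self._href is not None and data.strip():
            self.bills.append((data.strip(), self._href))
            self._href = None

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def list_bills(html):
    """Step 6: return every (bill_id, url) pair found on a listing page."""
    parser = BillLinkParser()
    parser.feed(html)
    parser.close()
    return parser.bills


def find_duplicates(bills):
    """Step 6: return bill ids that appear more than once in the list."""
    seen, dupes = set(), set()
    for bill_id, _url in bills:
        if bill_id in seen:
            dupes.add(bill_id)
        seen.add(bill_id)
    return sorted(dupes)


def to_record(bill_id, url, base="http://legislature.example"):
    """Step 8: map scraped values to a record (field names are assumptions)."""
    chamber = "upper" if bill_id.startswith("S") else "lower"
    return {"bill_id": bill_id, "chamber": chamber, "source": base + url}


bills = list_bills(SAMPLE_LISTING)
duplicates = find_duplicates(bills)
records = [to_record(bill_id, url) for bill_id, url in bills]
```

In a real scraper, the listing would come from the state's site, each `source` URL would be fetched for the detail in step 7, and the records would be passed to whatever save methods the OpenStates scripts provide. Comparing `len(bills)` against the count the legislature itself reports is a quick way to spot entries the listing missed.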
