source for Open States's scrapers

The Open States project collects and makes available data about state legislative activities, including bill summaries, votes, sponsorships and state legislator information. This data is gathered directly from the states and made available in a common format for interested developers, through a JSON API and data dumps.

Getting Started
---------------

We use Docker to provide a reproducible development environment. Make sure you have Docker installed.

Inside your Open States repository directory, run a scrape for a specific state by running:

docker-compose run --rm scrape <state postal code>

You can also choose to scrape only specific data types for a state. The data types available vary from state to state; look at the scrapers listed in the state's __init__.py for a list. For example, Tennessee (with a state postal code of tn) has:

scrapers = {
    'bills': TNBillScraper,
    'committees': TNCommitteeScraper,
    'events': TNEventScraper,
    'people': TNPersonScraper,
}

So you can limit a Tennessee scrape to only committees and people using:

docker-compose run --rm scrape tn committees people
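Conceptually, the positional data-type arguments just select entries from the state's scrapers dict. A minimal sketch of that filtering (the dict values here are placeholder strings standing in for the real scraper classes, and select_scrapers is an illustrative helper, not actual project code):

```python
# Hypothetical sketch of how requested data types select scrapers from a
# state's `scrapers` dict. Values are placeholder strings standing in for
# the real scraper classes (e.g. TNCommitteeScraper).
scrapers = {
    'bills': 'TNBillScraper',
    'committees': 'TNCommitteeScraper',
    'events': 'TNEventScraper',
    'people': 'TNPersonScraper',
}

def select_scrapers(scrapers, requested):
    """Return only the scrapers matching the requested data types.

    Raises KeyError on an unknown data type rather than silently
    skipping it, in keeping with the fail-fast approach described below.
    """
    unknown = [name for name in requested if name not in scrapers]
    if unknown:
        raise KeyError("unknown data type(s): " + ", ".join(unknown))
    return {name: scrapers[name] for name in requested}

# e.g. `scrape tn committees people` would run only these two:
selected = select_scrapers(scrapers, ['committees', 'people'])
```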

After retrieving everything from the state, scrape imports the data into a PostgreSQL database. If you want to skip this step, include a --scrape flag at the end of your command, like so:

docker-compose run --rm scrape tn committees people --scrape

If you do want to import data into Postgres, start a Postgres service using Docker Compose:

docker-compose up postgres

Then run database migrations and import jurisdictions, thus initializing the database contents:

docker-compose run --rm dbinit

Now you can run the scrape service without the --scrape flag, and data will be imported into Postgres. You can connect to the database and inspect data using psql (credentials are set in docker-compose.yml):

psql postgres://postgres:secret@localhost:5432/openstates

After you run scrape (with or without the Postgres import), it will leave one JSON file in the _data subdirectory for each entity that was scraped. These JSON files contain the transformed, scraped data, and are very useful for debugging.

Check out the writing scrapers guide to understand more about how the scrapers work, and how you can contribute.

Testing
-------

Our scraping framework, Pupa, has a strong test harness and requires well-structured data on ingest. Furthermore, Open States scrapers should be written to fail when they encounter unexpected data, rather than guessing at its format and possibly ingesting bad data. Together, this means that unit tests for individual Open States scrapers offer little benefit relative to their upkeep cost.

Occasionally, though, states do have unit tests for specific structural cases. To run all tests:

docker-compose run --rm --entrypoint=nosetests scrape /srv/openstates-web/openstates
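Such tests are typically plain functions that nosetests discovers by name. A hypothetical example of the kind of structural case worth testing; parse_bill_id is an illustrative helper, not real Open States code:

```python
# Hypothetical example of a structural unit test a state scraper might
# carry. `parse_bill_id` is an illustrative helper (not real project
# code) that normalizes raw identifiers like "hb  0001" into "HB 1".
import re

def parse_bill_id(raw):
    """Normalize a raw bill identifier into 'TYPE NUMBER' form."""
    match = re.match(r'\s*([A-Za-z]+)\s*0*(\d+)\s*$', raw)
    if not match:
        raise ValueError('unparseable bill id: {!r}'.format(raw))
    return '{} {}'.format(match.group(1).upper(), match.group(2))

def test_parse_bill_id():
    # nosetests collects functions named test_* automatically
    assert parse_bill_id('hb  0001') == 'HB 1'
    assert parse_bill_id(' SJR 12 ') == 'SJR 12'
```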

API Keys
--------

A few states require credentials to access their APIs. If you want to run code for these states, you will need to obtain your own credentials from those states.
