Writing Scrapers

A state scraper is implemented by providing classes derived from ~billy.scrape.bills.BillScraper, ~billy.scrape.legislators.LegislatorScraper, ~billy.scrape.votes.VoteScraper, and ~billy.scrape.committees.CommitteeScraper.

Derived scraper classes should override the scrape method, which is responsible for creating ~billy.scrape.bills.Bill, ~billy.scrape.legislators.Legislator, ~billy.scrape.votes.Vote, and ~billy.scrape.committees.Committee objects as appropriate.

Example state scraper directory structure:

./ex/__init__.py      # metadata for "ex" state scraper
./ex/bills.py         # contains EXBillScraper (also scrapes Votes)
./ex/legislators.py   # contains EXLegislatorScraper
./ex/committees.py    # contains EXCommitteeScraper
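
The following is a minimal sketch of the derive-and-override pattern, roughly what ./ex/bills.py might contain. So that it runs without billy installed, BillScraper and Bill here are hypothetical stand-ins rather than the real billy.scrape.bills classes; the scrape(chamber, session) signature, the Bill constructor arguments and the save_bill hook are assumptions for illustration, not billy's documented API.

```python
# Hypothetical stand-ins: these are NOT billy's classes, only enough to show
# how a state scraper derives from a base class and overrides scrape().


class Bill:
    """Stand-in for billy.scrape.bills.Bill (constructor arguments assumed)."""

    def __init__(self, session, chamber, bill_id, title):
        self.session = session
        self.chamber = chamber
        self.bill_id = bill_id
        self.title = title


class BillScraper:
    """Stand-in base class: subclasses are expected to override scrape()."""

    def __init__(self):
        self.saved = []

    def scrape(self, chamber, session):
        raise NotImplementedError("state scrapers must override scrape()")

    def save_bill(self, bill):
        # Assumed hook; this sketch simply keeps saved objects in memory.
        self.saved.append(bill)


class EXBillScraper(BillScraper):
    """What ./ex/bills.py might define for the example "ex" state."""

    def scrape(self, chamber, session):
        # A real scraper would fetch and parse pages here; the rows are
        # hard-coded so the sketch stays self-contained.
        rows = [("HB 1", "An Act relating to examples"),
                ("HB 2", "An Act relating to more examples")]
        for bill_id, title in rows:
            self.save_bill(Bill(session, chamber, bill_id, title))


scraper = EXBillScraper()
scraper.scrape("lower", "2011")
print([b.bill_id for b in scraper.saved])  # ['HB 1', 'HB 2']
```

The base class only fixes the contract (a scrape method that creates and saves objects); everything state-specific lives in the subclass.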

billy.scrape

Scraper

The most useful method on the base Scraper class is urlopen(url, method='GET', body=None). Scraper.urlopen opens a URL and returns a string-like object that can then be parsed with a library such as lxml.

This method has advantages over the built-in urlopen functions: the underlying Scraper class can be configured to support rate limiting and caching, and it provides robust error handling.

Note

For advanced usage see scrapelib which provides the basis for billy.scrape.Scraper.
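
To make the rate-limiting, caching and error-handling claims concrete, here is an illustrative stand-in with the same urlopen(url, method='GET', body=None) signature. It is not how Scraper or scrapelib are implemented: the injected fetch callable, the min_interval and retries parameters, and the choice to cache only GET requests are all assumptions made so the sketch runs offline.

```python
import time


class CachingRateLimitedFetcher:
    """Illustrative stand-in, NOT the Scraper/scrapelib implementation.

    fetch is a hypothetical callable (url, method, body) -> str that performs
    the real request; injecting it keeps the sketch runnable offline.
    """

    def __init__(self, fetch, min_interval=1.0, retries=2,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval = min_interval  # minimum seconds between requests
        self._retries = retries            # extra attempts after a failure
        self._clock = clock
        self._sleep = sleep
        self._last_request = None
        self._cache = {}                   # url -> body of earlier GET responses

    def _throttle(self):
        # Rate limiting: wait until min_interval has passed since the last request.
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()

    def urlopen(self, url, method="GET", body=None):
        # Caching: repeated GETs are answered without touching the network.
        if method == "GET" and url in self._cache:
            return self._cache[url]
        # Error handling: retry transient network errors, then re-raise.
        for attempt in range(self._retries + 1):
            self._throttle()
            try:
                result = self._fetch(url, method, body)
                break
            except OSError:
                if attempt == self._retries:
                    raise
        if method == "GET":
            self._cache[url] = result
        return result


# Usage with a fake transport that fails once and then succeeds.
attempts = []


def flaky_fetch(url, method, body):
    attempts.append(url)
    if len(attempts) == 1:
        raise OSError("temporary network error")
    return "<html>" + url + "</html>"


sleeps = []
# sleep=sleeps.append records the requested waits instead of actually sleeping.
fetcher = CachingRateLimitedFetcher(flaky_fetch, min_interval=10.0,
                                    sleep=sleeps.append)
page = fetcher.urlopen("http://example.com/bills")   # one failure, one retry
again = fetcher.urlopen("http://example.com/bills")  # answered from the cache
```

Caching only GET requests is a deliberate choice in this sketch, since replaying a POST from a cache may not reflect what the server would return for a fresh submission.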

Logging

The base class also configures a Python logger instance and provides several shortcuts for logging at various log levels:

log(msg, *args, **kwargs): log a message with level logging.INFO
debug(msg, *args, **kwargs): log a message with level logging.DEBUG
warning(msg, *args, **kwargs): log a message with level logging.WARNING

Note

It is also possible to access the self.logger object directly.
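
A sketch of how such shortcuts can delegate to a standard logging.Logger. The LoggingMixin and ListHandler classes and the logger name are hypothetical; only the shortcut names (log, debug, warning) and the self.logger attribute come from the description above.

```python
import logging


class LoggingMixin:
    """Hypothetical stand-in showing shortcut methods wrapping self.logger."""

    def __init__(self, name):
        self.logger = logging.getLogger(name)

    def log(self, msg, *args, **kwargs):
        self.logger.info(msg, *args, **kwargs)     # level logging.INFO

    def debug(self, msg, *args, **kwargs):
        self.logger.debug(msg, *args, **kwargs)    # level logging.DEBUG

    def warning(self, msg, *args, **kwargs):
        self.logger.warning(msg, *args, **kwargs)  # level logging.WARNING


class ListHandler(logging.Handler):
    """Collects (level, message) pairs so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))


scraper = LoggingMixin("billy.ex")  # logger name is an assumption
handler = ListHandler()
scraper.logger.addHandler(handler)
scraper.logger.setLevel(logging.DEBUG)

scraper.log("scraping %s bills for session %s", "lower", "2011")
scraper.debug("fetched %d pages", 3)
scraper.warning("no votes found for %s", "HB 1")
scraper.logger.error("direct logger access also works")  # using self.logger
```

Passing the format arguments separately, rather than pre-formatting the string, lets the logging module skip the formatting work for messages below the configured level.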

billy.scrape.Scraper

SourcedObject

billy.scrape.SourcedObject

Exceptions

billy.scrape.ScrapeError

billy.scrape.NoDataForPeriod

billy.scrape.bills

Bills

BillScraper

BillScraper implementations should gather and save ~billy.scrape.bills.Bill objects.

Sometimes it is easiest to gather ~billy.scrape.votes.Vote objects in a BillScraper as well; these can be attached to ~billy.scrape.bills.Bill objects via the add_vote method.
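
A minimal sketch of attaching a Vote to a Bill with add_vote. Bill and Vote are hypothetical stand-ins for the billy classes so the example runs on its own; their constructor arguments and the yes/no helper methods are assumptions for illustration.

```python
from datetime import date


class Vote:
    """Hypothetical stand-in for billy.scrape.votes.Vote (fields assumed)."""

    def __init__(self, chamber, vote_date, motion, passed):
        self.chamber = chamber
        self.date = vote_date
        self.motion = motion
        self.passed = passed
        self.yes_votes = []
        self.no_votes = []

    def yes(self, name):
        self.yes_votes.append(name)

    def no(self, name):
        self.no_votes.append(name)


class Bill:
    """Hypothetical stand-in for billy.scrape.bills.Bill (fields assumed)."""

    def __init__(self, session, chamber, bill_id, title):
        self.session = session
        self.chamber = chamber
        self.bill_id = bill_id
        self.title = title
        self.votes = []

    def add_vote(self, vote):
        # Attach a Vote gathered while scraping this bill's pages.
        self.votes.append(vote)


bill = Bill("2011", "lower", "HB 1", "An Act relating to examples")
vote = Vote("lower", date(2011, 3, 1), "Final passage", passed=True)
for name in ("Smith", "Jones"):
    vote.yes(name)
vote.no("Doe")
bill.add_vote(vote)
```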

billy.scrape.bills.BillScraper

Bill

billy.scrape.bills.Bill

billy.scrape.votes

Votes

VoteScraper

VoteScraper implementations should gather and save ~billy.scrape.votes.Vote objects.

If a state's BillScraper gathers votes, it is not necessary to provide a VoteScraper implementation.

billy.scrape.votes.VoteScraper

Vote

billy.scrape.votes.Vote

billy.scrape.legislators

Legislators

LegislatorScraper implementations should gather and save ~billy.scrape.legislators.Legislator objects.

Sometimes it is easiest to also gather committee memberships at the same time as legislators. Committee memberships can be attached to ~billy.scrape.legislators.Legislator objects via the add_role method.
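
A minimal sketch of recording committee memberships with add_role. Legislator is a hypothetical stand-in for billy.scrape.legislators.Legislator; the constructor fields, the "committee member" role string and the chamber/committee keyword arguments are assumptions for illustration.

```python
class Legislator:
    """Hypothetical stand-in for billy.scrape.legislators.Legislator."""

    def __init__(self, term, chamber, district, full_name):
        self.term = term
        self.chamber = chamber
        self.district = district
        self.full_name = full_name
        self.roles = []

    def add_role(self, role, term, **kwargs):
        # Each role is stored as a dict; extra details go in as keywords.
        entry = {"role": role, "term": term}
        entry.update(kwargs)
        self.roles.append(entry)


leg = Legislator("2011-2012", "upper", "7", "Jane Example")
leg.add_role("committee member", "2011-2012", chamber="upper",
             committee="Finance")
leg.add_role("committee member", "2011-2012", chamber="upper",
             committee="Judiciary")

# Recover the memberships gathered alongside the legislator.
committees = [r["committee"] for r in leg.roles
              if r["role"] == "committee member"]
```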

LegislatorScraper

billy.scrape.legislators.LegislatorScraper

Person

billy.scrape.legislators.Person

Legislator

billy.scrape.legislators.Legislator

billy.scrape.committees

Committees

CommitteeScraper implementations should gather and save ~billy.scrape.committees.Committee objects.

If a state's LegislatorScraper gathers committee memberships, it is not necessary to provide a CommitteeScraper implementation.

CommitteeScraper

billy.scrape.committees.CommitteeScraper

Committee

billy.scrape.committees.Committee

Events

EventScraper implementations should gather and save ~billy.scrape.events.Event objects.

Relevant bills, documents, and participants can be attached to ~billy.scrape.events.Event objects via the add_related_bill, add_document, and add_participant methods, respectively.
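
A minimal sketch of the three attachment methods. Event is a hypothetical stand-in for billy.scrape.events.Event; the constructor fields, each method's argument list and the "committee:meeting" type string are assumptions for illustration, not billy's documented signatures.

```python
from datetime import datetime


class Event:
    """Hypothetical stand-in for billy.scrape.events.Event."""

    def __init__(self, session, when, event_type, description, location):
        self.session = session
        self.when = when
        self.type = event_type
        self.description = description
        self.location = location
        self.related_bills = []
        self.documents = []
        self.participants = []

    def add_related_bill(self, bill_id, description=""):
        self.related_bills.append({"bill_id": bill_id,
                                   "description": description})

    def add_document(self, name, url):
        self.documents.append({"name": name, "url": url})

    def add_participant(self, participant_type, name):
        self.participants.append({"type": participant_type,
                                  "participant": name})


event = Event("2011", datetime(2011, 3, 1, 9, 30), "committee:meeting",
              "Finance Committee hearing", "Room 101")
event.add_related_bill("HB 1", description="Bill up for hearing")
event.add_document("Agenda", "http://example.com/agenda.pdf")
event.add_participant("host", "Finance Committee")
```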

EventScraper

billy.scrape.events.EventScraper

Event

billy.scrape.events.Event
