This repository has been archived by the owner on Feb 1, 2020. It is now read-only.

Writing Scrapers

A state scraper is implemented by providing classes derived from ~billy.scrape.bills.BillScraper, ~billy.scrape.legislators.LegislatorScraper, ~billy.scrape.votes.VoteScraper, and ~billy.scrape.committees.CommitteeScraper.

Derived scraper classes should override the scrape method, which is responsible for creating ~billy.scrape.bills.Bill, ~billy.scrape.legislators.Legislator, ~billy.scrape.votes.Vote, and ~billy.scrape.committees.Committee objects as appropriate.

Example state scraper directory structure:

./ex/__init__.py      # metadata for "ex" state scraper
./ex/bills.py         # contains EXBillScraper (also scrapes Votes)
./ex/legislators.py   # contains EXLegislatorScraper
./ex/committees.py    # contains EXCommitteeScraper
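As a rough illustration of the override pattern described above, the sketch below shows what ./ex/bills.py might contain. So that it runs without billy installed, it defines small stand-in classes in place of the real Bill and BillScraper; the save_bill helper, the scrape(chamber, session) signature, and the bill fields are assumptions made for this example and should be checked against the billy.scrape.bills reference.

```python
# Stand-ins so the sketch runs without billy installed. They are NOT the
# real billy classes; names and signatures here are assumptions.
class Bill(dict):
    """Hypothetical minimal bill record (a plain dict of fields)."""


class BillScraper(object):
    """Hypothetical base class: subclasses are expected to override scrape()."""

    def __init__(self):
        self.saved = []

    def save_bill(self, bill):
        # Assumed helper: a real scraper would persist the bill somewhere.
        self.saved.append(bill)

    def scrape(self, chamber, session):
        raise NotImplementedError("subclasses must override scrape")


class EXBillScraper(BillScraper):
    """The class that would live in ./ex/bills.py for the "ex" state."""

    def scrape(self, chamber, session):
        # A real implementation would fetch and parse pages here; this
        # sketch builds a single hard-coded bill instead.
        bill = Bill(session=session, chamber=chamber,
                    bill_id="HB 1", title="An example bill")
        self.save_bill(bill)


ex_scraper = EXBillScraper()
ex_scraper.scrape("lower", "2011")
```

The other scraper types would follow the same shape: derive from the matching base class and create the matching objects inside scrape.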

billy.scrape


Scraper

The most useful method on the base Scraper class is urlopen(url, method='GET', body=None). Scraper.urlopen opens a URL and returns a string-like object that can then be parsed by a library like lxml.

This method has advantages over the built-in urlopen functions: the underlying Scraper class can be configured to support rate limiting and caching, and it provides robust error handling.
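To make those three features concrete, here is a minimal standard-library sketch of a fetcher that caches responses, spaces out requests, and retries on failure. It is not how billy or scrapelib implement urlopen; PoliteFetcher, its parameters, and the injected fetch function are all hypothetical names chosen for this example.

```python
import time


class PoliteFetcher(object):
    """Hypothetical wrapper adding caching, rate limiting, and retries."""

    def __init__(self, fetch, min_interval=1.0, retries=2,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch                # callable taking a URL, returning text
        self._min_interval = min_interval  # minimum seconds between requests
        self._retries = retries            # extra attempts after a failure
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_request = None

    def urlopen(self, url):
        # Caching: a URL that was fetched before is never requested again.
        if url in self._cache:
            return self._cache[url]
        last_error = None
        for _ in range(self._retries + 1):
            self._wait_for_turn()
            try:
                body = self._fetch(url)
            except IOError as exc:
                # Error handling: remember the failure and try again.
                last_error = exc
                continue
            self._cache[url] = body
            return body
        raise last_error

    def _wait_for_turn(self):
        # Rate limiting: sleep until min_interval has passed since the
        # previous request.
        if self._last_request is not None:
            remaining = self._min_interval - (self._clock() - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
        self._last_request = self._clock()


# Usage with a fake fetch function that fails once, then succeeds.
calls = []


def fake_fetch(url):
    calls.append(url)
    if len(calls) == 1:
        raise IOError("temporary failure")
    return "<html>%s</html>" % url


fetcher = PoliteFetcher(fake_fetch, min_interval=0.0)
first = fetcher.urlopen("http://example.com/bills")
second = fetcher.urlopen("http://example.com/bills")  # served from the cache
```

Injecting the fetch function, clock, and sleep keeps the sketch testable without network access or real delays.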

Note

For advanced usage see scrapelib which provides the basis for billy.scrape.Scraper.

Logging

The base class also configures a Python logger instance and provides several shortcuts for logging at various log levels:

log(msg, *args, **kwargs)

log a message with level logging.INFO

debug(msg, *args, **kwargs)

log a message with level logging.DEBUG

warning(msg, *args, **kwargs)

log a message with level logging.WARNING

Note

It is also possible to access the self.logger object directly.
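The shortcuts listed above can be pictured as thin wrappers around a standard logging.Logger. The sketch below is one way such delegation could look, not billy's actual code; LoggingShortcuts, ListHandler, and the logger name are hypothetical and exist only for this example.

```python
import logging


class LoggingShortcuts(object):
    """Hypothetical stand-in showing how the shortcuts could delegate."""

    def __init__(self, name="billy.example"):
        self.logger = logging.getLogger(name)

    def log(self, msg, *args, **kwargs):
        self.logger.info(msg, *args, **kwargs)

    def debug(self, msg, *args, **kwargs):
        self.logger.debug(msg, *args, **kwargs)

    def warning(self, msg, *args, **kwargs):
        self.logger.warning(msg, *args, **kwargs)


class ListHandler(logging.Handler):
    """Collects (level name, formatted message) pairs for inspection."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.records = []

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))


shortcuts = LoggingShortcuts()
handler = ListHandler()
shortcuts.logger.addHandler(handler)
shortcuts.logger.setLevel(logging.DEBUG)

shortcuts.log("saved %d bills", 3)
shortcuts.debug("fetching %s", "http://example.com")
shortcuts.warning("missing sponsor for %s", "HB 1")
```

Because the arguments are passed straight through, messages use the logging module's lazy %-style formatting.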

billy.scrape.Scraper

SourcedObject

billy.scrape.SourcedObject

Exceptions

billy.scrape.ScrapeError

billy.scrape.NoDataForPeriod

billy.scrape.bills

Bills

BillScraper

BillScraper implementations should gather and save ~billy.scrape.bills.Bill objects.

Sometimes it is easiest to gather ~billy.scrape.votes.Vote objects in a BillScraper as well; these can be attached to ~billy.scrape.bills.Bill objects via the add_vote method.
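The following sketch illustrates attaching votes to a bill as it is built. Only the add_vote method name comes from the text above; the Bill and Vote classes are dict-based stand-ins, and the field names and the build_bill_with_votes helper are assumptions made for this example.

```python
# Stand-ins for billy.scrape.bills.Bill and billy.scrape.votes.Vote so the
# sketch runs on its own. Field names are illustrative assumptions.
class Vote(dict):
    """Hypothetical vote record."""


class Bill(dict):
    """Hypothetical bill record that collects votes."""

    def __init__(self, **fields):
        super(Bill, self).__init__(**fields)
        self["votes"] = []

    def add_vote(self, vote):
        self["votes"].append(vote)


def build_bill_with_votes(bill_fields, vote_rows):
    """Create a bill and attach one Vote per scraped vote row."""
    bill = Bill(**bill_fields)
    for row in vote_rows:
        bill.add_vote(Vote(motion=row["motion"],
                           passed=row["yes"] > row["no"],
                           yes_count=row["yes"],
                           no_count=row["no"]))
    return bill


example_bill = build_bill_with_votes(
    {"bill_id": "HB 1", "title": "An example bill"},
    [{"motion": "Third reading", "yes": 30, "no": 12}],
)
```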

billy.scrape.bills.BillScraper

Bill

billy.scrape.bills.Bill

billy.scrape.votes

Votes

VoteScraper

VoteScraper implementations should gather and save ~billy.scrape.votes.Vote objects.

If a state's BillScraper gathers votes, it is not necessary to provide a VoteScraper implementation.

billy.scrape.votes.VoteScraper

Vote

billy.scrape.votes.Vote

billy.scrape.legislators

Legislators

LegislatorScraper implementations should gather and save ~billy.scrape.legislators.Legislator objects.

Sometimes it is easiest to also gather committee memberships at the same time as legislators. Committee memberships can be attached to ~billy.scrape.legislators.Legislator objects via the add_role method.
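A sketch of recording committee memberships while scraping a legislator might look like the following. Only the add_role method name comes from the text above; the Legislator class is a dict-based stand-in, and the add_role(role, term, **kwargs) signature, the "committee member" role string, and the build_legislator helper are assumptions to verify against the Legislator reference.

```python
# Stand-in for billy.scrape.legislators.Legislator so the sketch runs on
# its own. The add_role signature is an assumption.
class Legislator(dict):
    """Hypothetical legislator record that collects roles."""

    def __init__(self, **fields):
        super(Legislator, self).__init__(**fields)
        self["roles"] = []

    def add_role(self, role, term, **kwargs):
        self["roles"].append(dict(role=role, term=term, **kwargs))


def build_legislator(term, full_name, committee_names):
    """Create a legislator and attach one role per committee membership."""
    legislator = Legislator(full_name=full_name)
    for name in committee_names:
        legislator.add_role("committee member", term, committee=name)
    return legislator


member = build_legislator("2011-2012", "Jane Example", ["Finance", "Rules"])
```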

LegislatorScraper

billy.scrape.legislators.LegislatorScraper

Person

billy.scrape.legislators.Person

Legislator

billy.scrape.legislators.Legislator

billy.scrape.committees

Committees

CommitteeScraper implementations should gather and save ~billy.scrape.committees.Committee objects.

If a state's LegislatorScraper gathers committee memberships, it is not necessary to provide a CommitteeScraper implementation.

CommitteeScraper

billy.scrape.committees.CommitteeScraper

Committee

billy.scrape.committees.Committee

Events

EventScraper implementations should gather and save ~billy.scrape.events.Event objects.

Relevant bills, documents, and participants can be attached to ~billy.scrape.events.Event objects via the add_related_bill, add_document, and add_participant methods, respectively.
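The three methods named above might be used together as in this sketch. The method names come from the text; the Event class is a dict-based stand-in, and every argument list and field name shown is an assumption for illustration rather than the real billy signature.

```python
# Stand-in for billy.scrape.events.Event so the sketch runs on its own.
# Argument lists for the add_* methods are illustrative assumptions.
class Event(dict):
    """Hypothetical event record."""

    def __init__(self, description, **fields):
        super(Event, self).__init__(description=description, **fields)
        self["related_bills"] = []
        self["documents"] = []
        self["participants"] = []

    def add_related_bill(self, bill_id):
        self["related_bills"].append({"bill_id": bill_id})

    def add_document(self, name, url):
        self["documents"].append({"name": name, "url": url})

    def add_participant(self, role, name):
        self["participants"].append({"type": role, "participant": name})


hearing = Event("Finance Committee hearing", location="Room 101")
hearing.add_related_bill("HB 1")
hearing.add_document("Agenda", "http://example.com/agenda.pdf")
hearing.add_participant("host", "Finance Committee")
```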

EventScraper

billy.scrape.events.EventScraper

Event

billy.scrape.events.Event