A state scraper is implemented by providing classes derived from ~billy.scrape.bills.BillScraper, ~billy.scrape.legislators.LegislatorScraper, ~billy.scrape.votes.VoteScraper, and ~billy.scrape.committees.CommitteeScraper.
Derived scraper classes should override the scrape method, which is responsible for creating ~billy.scrape.bills.Bill, ~billy.scrape.legislators.Legislator, ~billy.scrape.votes.Vote, and ~billy.scrape.committees.Committee objects as appropriate.
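As a rough illustration of this pattern, the sketch below overrides scrape in a derived class. So that it runs without billy installed, BillScraper and Bill here are simplified stand-ins rather than the real classes. The scrape(chamber, session) arguments, the Bill constructor arguments, and the save_bill helper are assumptions made for this example and should be checked against the class reference.

```python
class Bill(dict):
    """Stand-in for billy.scrape.bills.Bill (assumed here to act like a dict)."""

    def __init__(self, session, chamber, bill_id, title):
        super().__init__(session=session, chamber=chamber,
                         bill_id=bill_id, title=title)


class BillScraper:
    """Stand-in for billy.scrape.bills.BillScraper."""

    def __init__(self):
        self.saved = []  # in-memory list used only by this sketch

    def save_bill(self, bill):
        # Assumed helper name; here it only collects objects in memory.
        self.saved.append(bill)

    def scrape(self, chamber, session):
        raise NotImplementedError("derived scrapers must override scrape()")


class EXBillScraper(BillScraper):
    """Scraper for the hypothetical state "ex"."""

    def scrape(self, chamber, session):
        # A real scraper would fetch and parse pages here; this sketch uses
        # hard-coded rows in place of scraped data.
        rows = [("HB 1", "An example act"), ("HB 2", "Another example act")]
        for bill_id, title in rows:
            self.save_bill(Bill(session, chamber, bill_id, title))


scraper = EXBillScraper()
scraper.scrape("lower", "2011")
print([bill["bill_id"] for bill in scraper.saved])
```

The base class raises NotImplementedError in this sketch so that a derived class that forgets to override scrape fails loudly.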
Example state scraper directory structure:
./ex/__init__.py # metadata for "ex" state scraper
./ex/bills.py # contains EXBillScraper (also scrapes Votes)
./ex/legislators.py # contains EXLegislatorScraper
./ex/committees.py # contains EXCommitteeScraper
billy.scrape
The most useful method on the base Scraper class is urlopen(url, method='GET', body=None). Scraper.urlopen opens a URL and returns a string-like object that can then be parsed by a library like lxml.
This method has advantages over built-in urlopen methods: the underlying Scraper class can be configured to support rate-limiting and caching, and it provides robust error handling.
Note
For advanced usage see scrapelib, which provides the basis for billy.scrape.Scraper.
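To make rate-limiting, caching, and error handling concrete, here is a minimal sketch of what a fetcher with those three features might look like. It is not scrapelib's or billy's implementation. ThrottledFetcher, its parameters, and the injected fake_fetch function are all hypothetical names chosen for illustration, and the network call is injected so the example can run offline.

```python
import time


class ThrottledFetcher:
    """Hypothetical illustration only; not the scrapelib or billy API."""

    def __init__(self, fetch, min_interval=0.0, retries=1,
                 clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch                # callable(url) -> str
        self.min_interval = min_interval  # minimum seconds between requests
        self.retries = retries            # extra attempts after a failure
        self.clock = clock
        self.sleep = sleep
        self.cache = {}
        self.last_request = None

    def urlopen(self, url):
        # Caching: a URL that was already fetched is served from memory.
        if url in self.cache:
            return self.cache[url]
        last_error = None
        for _ in range(self.retries + 1):
            self._throttle()
            try:
                body = self.fetch(url)
            except OSError as exc:
                # Error handling: remember the failure and try again.
                last_error = exc
                continue
            self.cache[url] = body
            return body
        raise last_error

    def _throttle(self):
        # Rate limiting: wait until min_interval has passed since the
        # previous request before issuing the next one.
        now = self.clock()
        if self.last_request is not None:
            wait = self.min_interval - (now - self.last_request)
            if wait > 0:
                self.sleep(wait)
        self.last_request = self.clock()


calls = []


def fake_fetch(url):
    # Offline stand-in for a real HTTP request; fails once, then succeeds.
    calls.append(url)
    if len(calls) == 1:
        raise OSError("temporary failure")
    return "<html><body>%s</body></html>" % url


fetcher = ThrottledFetcher(fake_fetch, min_interval=0.0, retries=2)
first = fetcher.urlopen("http://example.com/bills")
second = fetcher.urlopen("http://example.com/bills")  # served from the cache
```

In this sketch the first request fails once and is retried, and the second call never reaches fake_fetch because the response is already cached.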
The base class also configures a Python logger instance and provides several shortcuts for logging at various log levels:
log(msg, *args, **kwargs)
    log a message with level logging.INFO
debug(msg, *args, **kwargs)
    log a message with level logging.DEBUG
warning(msg, *args, **kwargs)
    log a message with level logging.WARNING
Note
It is also possible to access the self.logger object directly.
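These shortcuts amount to thin wrappers around a standard logging.Logger. The following sketch shows one way such wrappers could be written. LoggingBase is a hypothetical stand-in, not billy's Scraper class, and the logger name and the ListHandler helper are assumptions made for this example.

```python
import logging


class LoggingBase:
    """Hypothetical stand-in showing the shortcut pattern, not billy's Scraper."""

    def __init__(self, name="example-scraper"):
        self.logger = logging.getLogger(name)

    def log(self, msg, *args, **kwargs):
        # Shortcut for level logging.INFO
        return self.logger.info(msg, *args, **kwargs)

    def debug(self, msg, *args, **kwargs):
        # Shortcut for level logging.DEBUG
        return self.logger.debug(msg, *args, **kwargs)

    def warning(self, msg, *args, **kwargs):
        # Shortcut for level logging.WARNING
        return self.logger.warning(msg, *args, **kwargs)


class ListHandler(logging.Handler):
    """Collects (level, message) pairs so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelno, record.getMessage()))


base = LoggingBase()
handler = ListHandler()
base.logger.addHandler(handler)
base.logger.setLevel(logging.DEBUG)

base.log("scraping %s", "bills")
base.debug("fetched %d rows", 3)
base.warning("missing sponsor")
base.logger.error("direct access to the logger also works")
```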
billy.scrape.Scraper
billy.scrape.SourcedObject
billy.scrape.ScrapeError
billy.scrape.NoDataForPeriod
billy.scrape.bills
BillScraper implementations should gather and save ~billy.scrape.bills.Bill objects.
Sometimes it is easiest to gather ~billy.scrape.votes.Vote objects in a BillScraper as well; these can be attached to ~billy.scrape.bills.Bill objects via the add_vote method.
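Attaching a vote to a bill can be sketched as follows. Bill and Vote below are simplified dict-based stand-ins so the example runs without billy. Only the add_vote method name comes from this documentation; the constructor arguments and the "votes" key are assumptions.

```python
class Vote(dict):
    """Stand-in for billy.scrape.votes.Vote; fields are assumptions."""

    def __init__(self, chamber, motion, passed, yes_count, no_count):
        super().__init__(chamber=chamber, motion=motion, passed=passed,
                         yes_count=yes_count, no_count=no_count)


class Bill(dict):
    """Stand-in for billy.scrape.bills.Bill; fields are assumptions."""

    def __init__(self, session, chamber, bill_id, title):
        super().__init__(session=session, chamber=chamber,
                         bill_id=bill_id, title=title, votes=[])

    def add_vote(self, vote):
        # Attach a vote gathered while scraping this bill.
        self["votes"].append(vote)


bill = Bill("2011", "lower", "HB 1", "An example act")
bill.add_vote(Vote("lower", "Third reading", True, yes_count=60, no_count=40))
```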
billy.scrape.bills.BillScraper
billy.scrape.bills.Bill
billy.scrape.votes
VoteScraper implementations should gather and save ~billy.scrape.votes.Vote objects.
If a state's BillScraper gathers votes, it is not necessary to provide a VoteScraper implementation.
billy.scrape.votes.VoteScraper
billy.scrape.votes.Vote
billy.scrape.legislators
LegislatorScraper implementations should gather and save ~billy.scrape.legislators.Legislator objects.
Sometimes it is easiest to gather committee memberships at the same time as legislators. Committee memberships can be attached to ~billy.scrape.legislators.Legislator objects via the add_role method.
billy.scrape.legislators.LegislatorScraper
billy.scrape.legislators.Person
billy.scrape.legislators.Legislator
billy.scrape.committees
CommitteeScraper implementations should gather and save ~billy.scrape.committees.Committee objects.
If a state's LegislatorScraper gathers committee memberships, it is not necessary to provide a CommitteeScraper implementation.
billy.scrape.committees.CommitteeScraper
billy.scrape.committees.Committee
billy.scrape.events
EventScraper implementations should gather and save ~billy.scrape.events.Event objects.
Relevant bills, documents, and participants can be attached to ~billy.scrape.events.Event objects via the add_related_bill, add_document, and add_participant methods, respectively.
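The three attachment methods can be sketched with a dict-based stand-in. Event below is not the real billy class. The method names come from this documentation, but the constructor arguments, the method parameters, and the dictionary keys are assumptions made so the example runs on its own.

```python
import datetime


class Event(dict):
    """Stand-in for billy.scrape.events.Event; fields are assumptions."""

    def __init__(self, session, when, description, location):
        super().__init__(session=session, when=when, description=description,
                         location=location, related_bills=[], documents=[],
                         participants=[])

    def add_related_bill(self, bill_id):
        # Link a bill that is on the agenda for this event.
        self["related_bills"].append({"bill_id": bill_id})

    def add_document(self, name, url):
        # Link a document such as an agenda or the minutes.
        self["documents"].append({"name": name, "url": url})

    def add_participant(self, role, name):
        # Record a person or committee taking part in the event.
        self["participants"].append({"type": role, "participant": name})


event = Event("2011", datetime.datetime(2011, 3, 1, 9, 30),
              "Committee hearing", "Room 101")
event.add_related_bill("HB 1")
event.add_document("Agenda", "http://example.com/agenda.pdf")
event.add_participant("host", "Example Committee")
```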
billy.scrape.events.EventScraper
billy.scrape.events.Event