Skip to content
rkiddy edited this page Jul 24, 2011 · 1 revision

Sources:

(Note: a POST with a comma delimited list of bill numbers passed as the 'hListBills' form parameter to the following URL provides information on bills) http://dirac.rilin.state.ri.us/billstatus/WebClass1.ASP?WCI=BillStatus&WCE=ifrmBillStatus&WCU

legislators.py=

Progress: finished

Data scraped:

  • term
  • chamber
  • district_name
  • full_name
  • party
  • office_address
  • town_represented (comma delimited list of towns)
  • email
Notes: The source information does not provide separate first/last name information, so only full name is scraped.

committees.py

Progress: finished

Data scraped:

  • chamber
  • membernames
  • roles
Notes: It seems like the committee links from http://www.rilin.state.ri.us/Sitemap.html are intermittently timing-out. There is a try/except clause to handle these timeouts.

bills.py

Progress: some

Implementation plan: Currently, bills are requested one at a time and the results are parsed for the relevant data. It would potentially be quicker to request multiple bill details all at once, but there is a difficulty in that the list of bills contains bills with irregular IDs (they end with letters, e.g. 5002Aaa) which would need to be removed (the request page only accepts numerical bill IDs). Additionally, bill actions are present in the results, but they are currently not parsed. For vote information, the plan is to look into the House and Senate Journals for the relevant days to get the roll call. Unfortunately, the journals are only available as PDFs.

Data scraped:

  • session
  • chamber
  • bill_id
  • title
  • type
  • sponsors and cosponsors
Note: Currently, a try/except clause handles irregular bill IDs. It would be better to exclude these before sending the POST request.

Clone this wiki locally