James Turk edited this page Mar 28, 2013 · 5 revisions

What we're scraping and why


We're currently using the senate api for:

  * the master list of active bill_id
  * the short bill title

We're currently scraping the senate website for:

  * bill "subjects", which on the senate site means the law compilation the bill primarily affects
  * senate committee and floor votes
  * senate sponsors' memoranda
  * committee meetings

We're currently scraping the assembly for:

  * bill sponsors, actions, and summaries
  * assembly votes
  * assembly sponsors' memoranda
  * version urls
  * assembly events

Senate API issues:

  * Bill actions are sometimes mangled or truncated in "uni bills"
  * Bill action ordering is mangled, with an errant date sort occurring somewhere in the api
  * Sponsors are getting mangled on some bills
  * Bills have no session attribute
  * Assembly same-as ids aren't consistently displayed on the senate site

Weirdness Of NY Companion Bills


NY has "same-as" bills, which are companion bills. It also has "uni bills", which are same-as bills that can only be amended if the amendment passes both houses. In other respects, though, they're two separate bills.

Other Vagaries of NY Legislative Info


The assembly site doesn't publish senate votes. The senate site doesn't publish assembly votes.

So it seems the only way to get all votes for a bill that has a companion is to scrape both sites, then merge each bill's votes into the other, but only if the other hasn't already been substituted and killed. AAAAAAAAARGGH!!
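
A rough sketch of that merge step, in Python. Everything here is hypothetical and for illustration only: the `Bill` class, its `same_as` and `substituted` fields, and the `merge_companion_votes` function are not taken from the actual scraper, and votes are plain strings rather than real vote objects. It assumes both sites have already been scraped into one dict keyed by bill_id.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bill:
    # Hypothetical stand-in for a scraped bill, not the scraper's real model.
    bill_id: str
    chamber: str                      # "senate" or "assembly"
    votes: list = field(default_factory=list)
    same_as: Optional[str] = None     # bill_id of the companion, if any
    substituted: bool = False         # assumed flag: bill was substituted and is dead


def merge_companion_votes(bills):
    """Copy each bill's votes onto its companion, skipping dead companions."""
    # Snapshot the votes first so merged votes are never copied back again.
    original = {bill_id: list(bill.votes) for bill_id, bill in bills.items()}
    for bill in bills.values():
        companion = bills.get(bill.same_as) if bill.same_as else None
        # Skip bills with no known companion, or whose companion was substituted.
        if companion is None or companion.substituted:
            continue
        for vote in original[bill.bill_id]:
            if vote not in companion.votes:
                companion.votes.append(vote)
    return bills
```

Taking the snapshot before merging means the result doesn't depend on the order the bills are visited in. How a substituted bill is actually detected (most likely from its actions) is left out here.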