e.g. https://legis.delaware.gov/json/BillDetail/GenerateHtmlDocument?legislationId=26010&legislationTypeId=1&docTypeId=2&legislationName=HB248 (ocd-bill/5793a6bb-6dcd-496d-b970-419568789932) has struck-through text (marked by CSS class, frustratingly) and underlined text to indicate deletions and additions, respectively. What do we want to extract? I'm thinking we would exclude the deletions from our conception of "bill text", but include the existing law (for context, so the result is semi-readable) and the additions. What do you all think?
It would increase complexity, but another option is to output two text files: one that includes the deletions in the bill text and one that excludes them.
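To make the two-output idea concrete, here is a rough sketch using only Python's standard-library `html.parser`. It is not the scraper's actual code. The class name `strikethrough` is a placeholder, since the real CSS class Delaware uses isn't named in this thread, and `BillTextExtractor` and `extract_bill_text` are hypothetical names. It also assumes the markup is reasonably well nested.

```python
from html.parser import HTMLParser

# Assumption: placeholder for whatever CSS class marks struck-through text.
DELETION_CLASSES = {"strikethrough"}
# Void elements never get a closing tag, so they are kept off the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class BillTextExtractor(HTMLParser):
    """Collects text, optionally skipping anything inside a deletion element."""

    def __init__(self, include_deletions):
        super().__init__()
        self.include_deletions = include_deletions
        self.stack = []   # one bool per open element: does it mark a deletion?
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(bool(classes & DELETION_CLASSES))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep the text unless some enclosing element marks a deletion.
        if self.include_deletions or not any(self.stack):
            self.chunks.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by removed spans.
        return " ".join("".join(self.chunks).split())


def extract_bill_text(html, include_deletions=False):
    parser = BillTextExtractor(include_deletions)
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    '<p>The fee is <span class="strikethrough">$10</span> '
    "<u>$15</u> per year.</p>"
)
without_deletions = extract_bill_text(sample)
with_deletions = extract_bill_text(sample, include_deletions=True)
```

Each variant could then be written to its own file. Underlined additions need no special handling here because they are ordinary text as far as the parser is concerned.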
I can see us wanting to include the deletion text if this is, for instance, feeding Elasticsearch. A bill that deletes a word from statute should still come up in a search for that word. But for machine-learning purposes, it might not...