klondike edited this page Dec 9, 2011 · 14 revisions

Crosslaws is a set of parsing scripts that produce bulk data related to federal laws, statutes, the US Code, the Code of Federal Regulations, and proposed regulations as they go through the rulemaking process.


The legislative process, complicated as it is, is fundamentally understood by much of the public, and a large constellation of tools is available for professionals and citizens to actively engage with legislation as it is created, debated, and voted upon.

The regulatory process receives much less attention, is more poorly understood, and has less structured data available to work with. It is nonetheless an extremely important process, which by convention and by statute is designed to be public.

Citizens, journalists, and other professionals who follow an issue often anchor their research to the pieces of legislation that made that issue prominent. People who follow major legislation on financial sector regulation, health-care reform, copyright, or deficit reduction should be able to easily find the proposed and final regulations relevant to their interests.

Most importantly, they should be able to find them at a time when their input is still relevant, whether through the media, public comment, activist campaigns, or any other venue in which people attempt to influence the regulatory process.


The primary and founding goal is to establish the data necessary to draw a link all the way from a proposed federal regulation back to the bill from which it originally derived.

To do this, we will more generally bring some order to the various identifiers, sources, and stages of the legal and regulatory process -- and create a foundation for people to build apps, websites, and tools that make use of this data.

All of this data will be published in bulk and available in the public domain.

More information

We're compiling a list of related resources: websites and bulk data relevant to the project.

This code began with an informal working group during the International Open Data Hackathon Day in Washington, DC. We've uploaded the group's shared notes from the day.


Currently being parsed

  • The Parallel Table of Rules and Authorities, an annually produced compendium that links sections of the US Code and Statutes at Large to sections of the Code of Federal Regulations.

    • At the Open Data Hack day, we made a proof-of-concept site showing the links we've created so far between the CFR and the USC from this table.
    • This parser needs to be expanded to understand ranges within the hierarchy of the USC and the CFR, and to output data in bulk. (Right now, it's hardcoded to deposit the data in a MongoDB collection.)
  • The Office of the Law Revision Counsel's Table III tool for the US Code. This table relates every section and subsection of every bill passed into public law to its equivalent section of the US Code, and its page in the Statutes at Large.

    • This parser is in its first draft stage, and may require bug fixes and optimization. At a minimum, it needs to be robust to interruption.
  • Information from Regulations.gov: every proposed and final regulation on the site, with basic metadata including the RIN, CFR Part, and Docket number. This scraper is a large, separate project.

    • This data is continuously collected and is integrated into Influence Explorer, a project of the Sunlight Foundation. It is not yet available in bulk or via an API.
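
As a rough illustration of the bulk-output direction for the Parallel Table parser, here is a minimal sketch in Python. It is hypothetical: it assumes table rows have already been reduced to pairs of citation strings (the real table's layout and citation forms are messier than this regex admits), parses the USC side into a structured record, and writes JSON lines rather than depositing into a hardcoded MongoDB collection.

```python
import json
import re

# Hypothetical citation pattern. Real USC citations include hyphenated
# section numbers like "1973aa-1a", which would need smarter handling
# to disambiguate from a range hyphen.
CITATION_RE = re.compile(
    r"(?P<title>\d+)\s+U\.S\.C\.\s+(?P<start>[\w.]+)"
    r"(?:\s*[-–]\s*(?P<end>[\w.]+))?"   # optional range endpoint
    r"(?P<etseq>\s+et\s+seq\.)?"        # optional "et seq." modifier
)

def parse_usc_citation(text):
    """Parse a USC citation string into a structured record.

    Ranges are kept as (start, end) endpoints rather than expanded,
    since expansion requires a structured USC hierarchy."""
    m = CITATION_RE.search(text)
    if not m:
        return None
    return {
        "title": int(m.group("title")),
        "section_start": m.group("start"),
        "section_end": m.group("end"),          # None if not a range
        "et_seq": bool(m.group("etseq")),
    }

def emit_bulk(pairs, out):
    """Write one JSON line per (USC citation, CFR citation) link:
    bulk output decoupled from any particular datastore."""
    for usc_text, cfr_text in pairs:
        record = parse_usc_citation(usc_text)
        if record is None:
            continue
        record["cfr"] = cfr_text
        out.write(json.dumps(record) + "\n")
```

JSON lines are just one possible bulk format; the point is that parsing and storage stay separate, so MongoDB (or anything else) becomes a downstream consumer.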

Needs to be started

  • Info from Reginfo.gov: metadata on every regulation with an RIN, especially its legal authority citations. We should obtain this archivally, and also keep it up to date on a reasonable basis (probably daily).

  • Information from GPO: every public slip law number and its associated bill code, and potentially its page in the Statutes at Large. This can be found in GPO's FDSys collection of Public and Private Laws, where a given law's MODS metadata links it back to its original bill code.

  • The hierarchy of the US Code, in structured data. This is necessary to intelligently understand US Code citations that are expressed as ranges, or with modifiers such as "et seq". The contents of the US Code are not as necessary.

    • Cornell provides the US Code in XML for public download and use (with proper attribution, of course), though this is a direct XML translation of the "GPO locator codes", an unpleasant typesetting format. We could use this XML to produce a hierarchy of the USC, or work more closely with Cornell to see how they've done so.
    • GPO provides extensive MODS metadata in its FDSys collection of the US Code, per-title, per-chapter, and per-section. For example, the MODS file for 17 USC is long and has many public law and statutes-at-large citations, and seems to describe the various sections within 17 USC.
    • By using both sources, we may be able to derive a more precise structure of the code by checking one against the other. The same cross-checks could verify the hierarchy of the US Code as derived from the GPO locator files, and the linkages between statutes, slip law, and the code that the OLRC publishes in its Table III.
  • The hierarchy of the Code of Federal Regulations, in structured data. This is necessary to intelligently understand CFR Part citations that are expressed as ranges. The contents of the CFR are not as necessary. The CFR is published by GPO in XML form and can probably be parsed into a hierarchy.
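
Once such a hierarchy exists, range and "et seq." citations can be expanded mechanically. A minimal sketch, assuming a hypothetical hand-built fragment of the hierarchy (the real mapping would be derived from GPO's CFR XML, and the same approach applies to USC sections):

```python
# Hypothetical fragment of the CFR hierarchy: title -> ordered list of
# part numbers actually in use. Note the gaps (no parts 151, 154, 157),
# which is why naive numeric iteration over a range would be wrong.
CFR_PARTS = {
    40: ["150", "152", "153", "155", "156", "158"],
}

def expand_part_range(title, start, end=None, et_seq=False):
    """Expand a CFR Part citation into the concrete parts it covers.

    A range like "40 CFR 152-156" covers only the parts that exist
    between the endpoints; "et seq." runs to the end of the title."""
    parts = CFR_PARTS[title]
    i = parts.index(start)
    if et_seq:
        return parts[i:]
    if end is None:
        return [start]
    j = parts.index(end)
    return parts[i:j + 1]
```

The design point is that expansion is a lookup against an ordered hierarchy, not arithmetic on part numbers, since the CFR (like the USC) has gaps, reserved parts, and lettered identifiers.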
