nickbabcock/register

Register

Register is a project that distills the Federal Register data into a more digestible format, with an emphasis on reproducibility for others interested in the data. It takes the 7GB of XML data from 2005 to the current year and condenses it into a 70MB CSV. For more details and background, see the introductory blog post: Back to the Register: Distilling the Federal Register for All (the sample.R script generates the graphs for that post).

See Releases for the latest CSV data. The columns are:

  • date: The date the document appeared in the Federal Register
  • type: Presidential / rule / proposed-rule / notice
  • agency: The agency that issued the document (e.g. Department of transportation)
  • sub agency: The sub agency that issued the document. For instance, while the agency may be "Health and human services", the sub agency may be "Food and drug administration"
  • subject: The subject / title of the document
  • names: List of names associated with the document (semicolon delimited)
  • rin: List of Regulation Identifier Numbers associated with the document (semicolon delimited)
  • docket: The identifier for the document, as shown in the last column of the sample below (e.g. 2013-06361)

Here's a sample of the data (with the subject column removed, as Federal Register titles are quite long):

date        type    agency                             names               rin            docket
2013-03-20  notice  Department of transportation       G. Kelly Leone                     2013-06361
2015-04-02  notice  Department of veterans affairs     Rebecca Schiller                   2015-07509
2012-11-14  notice  Department of commerce             Gwellnar Banks                     2012-27621
2013-07-22  notice  Federal communications commission  Marlene H. Dortch                  2013-17626
2005-10-19  notice  Environmental protection agency    Vicki A. Simons                    05-20709
2016-02-09  notice  Office of personnel management     Beth F. Cobert                     2016-02615
2013-09-19  rule    Department of the interior         Stephen Guertin     RIN 1018-AY52  2013-22702
2009-05-05  notice  Department of labor                Elliott S. Kushner                 E9-10237
2010-08-03  notice  Small business administration      Karen G. Mills                     2010-19068
2007-09-05  notice  Environmental protection agency    James B. Gulliford                 E7-17542
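As a rough illustration of working with the CSV, the sketch below counts documents per type and splits a semicolon-delimited cell using only the Python standard library. It assumes the CSV header names match the column list above; the in-memory sample and the helper names (count_by_type, split_multi) are hypothetical and exist only for this example.

```python
import csv
import io
from collections import Counter


def count_by_type(lines):
    """Count documents per `type` value from an iterable of CSV lines."""
    reader = csv.DictReader(lines)
    return Counter(row["type"] for row in reader)


def split_multi(value):
    """Split a semicolon-delimited cell (e.g. names or rin) into a list."""
    return [part.strip() for part in value.split(";") if part.strip()]


# Tiny in-memory stand-in for the real file, built from rows in the sample
# above. The header names here are an assumption based on the column list.
sample = io.StringIO(
    "date,type,agency,names,rin\n"
    "2013-03-20,notice,Department of transportation,G. Kelly Leone,\n"
    "2013-09-19,rule,Department of the interior,Stephen Guertin,RIN 1018-AY52\n"
)

counts = count_by_type(sample)
print(counts)
```

For the real data, the same function should work on an open file handle, for example one opened with newline="" and encoding="utf-8" on the CSV downloaded from Releases.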

Generating the data

There are two ways to download, parse, and generate the data shown above: Docker (easiest) or installing the prerequisites yourself (still not that bad).

Docker

Assuming Docker is installed:

docker build -t nickbabcock/register .
docker run -v "$(pwd)/data":/register/data --rm -ti nickbabcock/register

The CSV data will be in the data directory.

Prerequisites

If you'd rather not use Docker, you'll need:

  • A bash shell (a Linux machine; macOS may also work)
  • Java
  • Python 3

After the above are installed, run the scripts below, which do the following:

  • setup.sh:
    • Download the java library for XQuery files into a saxon directory
    • Download the Federal Register data into the data directory
  • run_conversion.sh:
    • Run the XQuery transformation (transform.xql), which outputs JSON lines
    • Pipe the JSON lines into the Python script (to_csv.py), which outputs a CSV file
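To make the last step more concrete, here is a minimal sketch of turning JSON lines into CSV with the Python standard library. It is not the repository's to_csv.py: the field list, the assumption that multi-valued fields arrive as JSON arrays, and the function name jsonl_to_csv are all hypothetical choices made for illustration.

```python
import csv
import io
import json

# Assumed field names, taken from the column list in this README.
FIELDS = ["date", "type", "agency", "subject", "names", "rin"]


def jsonl_to_csv(lines, out, fields=FIELDS):
    """Read one JSON object per line and write matching CSV rows to `out`."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        row = {}
        for field in fields:
            value = record.get(field, "")
            # Assumption: list values become semicolon-delimited cells.
            if isinstance(value, list):
                value = ";".join(str(v) for v in value)
            row[field] = value
        writer.writerow(row)


# Small in-memory demonstration with a made-up record.
demo = ['{"date": "2013-09-19", "type": "rule", "names": ["A", "B"]}']
buf = io.StringIO()
jsonl_to_csv(demo, buf)
print(buf.getvalue())
```

In a pipeline like the one described above, a script along these lines could call jsonl_to_csv(sys.stdin, sys.stdout) so that it reads the piped JSON lines and writes CSV to standard output.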

Contributing

Only a subset of fields available in the Federal Register are extracted into the CSV. If there is a field missing that you want to see, please open an issue or create a pull request.