Register is a project that attempts to distill the Federal Register data into a more digestible format, with an emphasis on reproducibility for those also interested in the data. This project takes the 7GB of XML data from 2005 to the current year and condenses it into a 70MB CSV. For more details and background, see the introductory blog post, Back to the Register: Distilling the Federal Register for All (the sample.R script is used to generate graphs for that post).
See Releases for the latest CSV data. The headers are:
- date: The date the document appeared in the Federal Register
- type: Presidential / rule / proposed-rule / notice
- agency: Which agency issued this document (e.g. Department of transportation)
- sub agency: What sub agency issued the document. For instance, while the agency may be "Health and human services", the sub agency may be "Food and drug administration"
- subject: What is the subject / title of this document
- names: List of names associated with the document (semi-colon delimited)
- rin: List of Regulation Identifier Numbers associated with the document (semi-colon delimited)
Here's a sample of the data (with subject column removed as Federal Register titles are quite long):

| date | type | agency | names | rin | docket |
| --- | --- | --- | --- | --- | --- |
| 2013-03-20 | notice | Department of transportation | G. Kelly Leone | | 2013-06361 |
| 2015-04-02 | notice | Department of veterans affairs | Rebecca Schiller | | 2015-07509 |
| 2012-11-14 | notice | Department of commerce | Gwellnar Banks | | 2012-27621 |
| 2013-07-22 | notice | Federal communications commission | Marlene H. Dortch | | 2013-17626 |
| 2005-10-19 | notice | Environmental protection agency | Vicki A. Simons | | 05-20709 |
| 2016-02-09 | notice | Office of personnel management | Beth F. Cobert | | 2016-02615 |
| 2013-09-19 | rule | Department of the interior | Stephen Guertin | RIN 1018-AY52 | 2013-22702 |
| 2009-05-05 | notice | Department of labor | Elliott S. Kushner | | E9-10237 |
| 2010-08-03 | notice | Small business administration | Karen G. Mills | | 2010-19068 |
| 2007-09-05 | notice | Environmental protection agency | James B. Gulliford | | E7-17542 |

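
Since the names and rin columns are semi-colon delimited, they usually need to be split before analysis. The following is a minimal sketch of one way to do that with Python's standard csv module. It assumes the CSV header names match the column names listed above; the inline sample, including its second row, is invented for illustration and is not taken from the released data.

```python
import csv
import io

# Hypothetical excerpt: the header names and the second row are assumptions
# made for illustration, not values copied from the released CSV.
SAMPLE = """date,type,agency,names,rin,docket
2013-09-19,rule,Department of the interior,Stephen Guertin,RIN 1018-AY52,2013-22702
2016-01-01,notice,Example agency,Jane Doe;John Roe,,0000-00000
"""


def split_multi(value):
    """Split a semi-colon delimited cell into a list, dropping empty items."""
    return [item.strip() for item in value.split(";") if item.strip()]


def load_rows(handle):
    """Read CSV rows and expand the multi-valued names and rin columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["names"] = split_multi(row["names"])
        row["rin"] = split_multi(row["rin"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["date"], row["names"], row["rin"])
```

To read a downloaded release instead, pass an open file handle to load_rows in place of the in-memory sample.
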
There are two ways to download, parse, and generate the data you see above: with Docker (easiest) or by installing the prerequisites (still not that bad).
Assuming Docker is installed:
docker build -t nickbabcock/register .
docker run -v "$(pwd)/data":/register/data --rm -ti nickbabcock/register
The CSV data will be in the data directory.
If not interested in the docker solution, you'll need:
- Bash shell (Linux machine, potentially macOS)
- Java
- Python 3
After the above are installed, run the below scripts, which will do the following:
setup.sh:
- Download the Java library for XQuery files into a saxon directory
- Download the Federal Register data into the data directory

run_conversion.sh:
- Run the XQuery transformation (transform.xql), which outputs JSON lines
- Pipe the JSON lines into the Python script (to_csv.py), which outputs a CSV file

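
The last step of the pipeline, turning JSON lines into CSV, can be sketched roughly as follows. This is not the project's to_csv.py: the field names, their order, and the shape of the input record are assumptions based on the headers listed in this README, and the function name json_lines_to_csv is hypothetical.

```python
import csv
import io
import json

# Assumed column order, based on the headers described in this README; the
# real to_csv.py may use different names or ordering.
FIELDS = ["date", "type", "agency", "names", "rin", "docket"]


def json_lines_to_csv(lines, out):
    """Write one CSV row per JSON line, joining list values with ';'."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        row = {}
        for field in FIELDS:
            value = record.get(field, "")
            if isinstance(value, list):
                # Multi-valued fields become semi-colon delimited cells.
                value = ";".join(value)
            row[field] = value
        writer.writerow(row)


# Hypothetical input record in the assumed shape.
example = [
    '{"date": "2013-09-19", "type": "rule", '
    '"agency": "Department of the interior", '
    '"names": ["Stephen Guertin"], "rin": ["RIN 1018-AY52"], '
    '"docket": "2013-22702"}'
]
buffer = io.StringIO()
json_lines_to_csv(example, buffer)
print(buffer.getvalue())
```

Because the function accepts any iterable of lines and any writable text stream, the same sketch would work in a shell pipeline by calling json_lines_to_csv(sys.stdin, sys.stdout).
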
Only a subset of fields available in the Federal Register are extracted into the CSV. If there is a field missing that you want to see, please open an issue or create a pull request.