Robb Shecter edited this page Mar 11, 2022 · 24 revisions

Information Architecture

The key idea is to split parsing into two stages, analogous to the lexer and parser pair in a compiler. Dividing the work this way lets each stage be simpler.

The first stage (this repo) crawls original sources and converts them to JSON. The JSON schema mirrors the original content as closely as possible, so each type of source produces very different-looking JSON. But because it is all JSON (instead of PDF, HTML, etc.), the next stage can read every source easily and focus on converting the source schema to a particular app's needs.
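As a rough illustration of the first stage, the sketch below converts a small glossary-style HTML fragment into JSON that keeps the shape of the source. It is not this repo's actual code: the sample HTML, the `GlossaryParser` class, the `glossary_to_json` function, and the field names `phrase`, `definition`, and `source_url` are all assumptions made for the example, and it uses only Python's standard library.

```python
import json
from html.parser import HTMLParser

# Hypothetical glossary fragment standing in for a crawled page.
SAMPLE_HTML = """
<dl>
  <dt>Affidavit</dt><dd>A written statement made under oath.</dd>
  <dt>Bail</dt><dd>Release of an accused person before trial.</dd>
</dl>
"""


class GlossaryParser(HTMLParser):
    """Collects <dt>/<dd> pairs in the order they appear in the source."""

    def __init__(self):
        super().__init__()
        self.entries = []
        # "phrase" while inside <dt>, "definition" while inside <dd>.
        self._field = None

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self.entries.append({"phrase": "", "definition": ""})
            self._field = "phrase"
        elif tag == "dd" and self.entries:
            self._field = "definition"

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._field = None

    def handle_data(self, data):
        if self._field and self.entries:
            self.entries[-1][self._field] += data


def glossary_to_json(html: str, source_url: str) -> str:
    """Return JSON that mirrors the source: one object per glossary entry."""
    parser = GlossaryParser()
    parser.feed(html)
    parser.close()
    entries = [{k: v.strip() for k, v in e.items()} for e in parser.entries]
    return json.dumps({"source_url": source_url, "entries": entries}, indent=2)


print(glossary_to_json(SAMPLE_HTML, "https://example.org/glossary"))
```

Note that the output makes no attempt to fit an application's data model. A different kind of source, such as a set of administrative rules, would get its own parser and its own JSON shape.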

Public Law Data Flow (Horizontal)

The second stage transforms the JSON into an app's internal representation. That code is outside the scope of this repo because many apps by different developers can use the same source data. For example, Public.Law imports the JSON data into a Postgres database and Netlify static pages. That particular code isn't yet open source.
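To make the second stage concrete, here is a minimal sketch of what an importer might look like. It is purely hypothetical and is not Public.Law's code: the JSON document, the `glossary_terms` table, the `slug` column, and the `import_glossary` function are invented for the example, and an in-memory SQLite database stands in for Postgres so the snippet runs with the standard library alone.

```python
import json
import sqlite3

# Hypothetical stage-one output; the field names are assumptions.
STAGE_ONE_JSON = """
{
  "source_url": "https://example.org/glossary",
  "entries": [
    {"phrase": "Affidavit", "definition": "A written statement made under oath."},
    {"phrase": "Bail", "definition": "Release of an accused person before trial."}
  ]
}
"""


def import_glossary(conn: sqlite3.Connection, raw_json: str) -> int:
    """Map source-shaped JSON onto an app-specific table; return the row count."""
    doc = json.loads(raw_json)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS glossary_terms ("
        "slug TEXT PRIMARY KEY, "
        "phrase TEXT NOT NULL, "
        "definition TEXT NOT NULL, "
        "source_url TEXT NOT NULL)"
    )
    rows = [
        (
            # App-specific detail: a URL-friendly key derived from the phrase.
            entry["phrase"].lower().replace(" ", "-"),
            entry["phrase"],
            entry["definition"],
            doc["source_url"],
        )
        for entry in doc["entries"]
    ]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO glossary_terms VALUES (?, ?, ?, ?)", rows
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
count = import_glossary(conn, STAGE_ONE_JSON)
print(f"imported {count} terms")
```

All app-specific decisions (the slug format, the table layout) live in this stage, which is why the same JSON can feed several unrelated applications.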

Example: U.S.A. / Oregon Administrative Rules

Public Law Data Flow Example - OAR

Example: Canada / Department of Justice Legal Glossaries

Public Law Data Flow Example - DoJ Glossaries