(Francis Avila, April 2013)
Wikisourceify takes CatoXML-enhanced XML (US Congressional bill XML with semantic tagging extensions) from the Deepbills Cato project and produces hyperlinked Wikitext intended for publication on Wikisource.
It also contains crosswalks between government entity ID systems and Wikipedia pages.
The XML source and final built TXT files are also included in the repository.
Makefile builds everything. (It requires Python to build anything.)
lookups contains lookup tables (produced by Cato interns) that map common unique identifiers to the corresponding Wikipedia pages. The source is an Excel spreadsheet, which is exported to CSV and then processed by a Python script into an XML file. It includes crosswalks for:
- Federal Bodies (e.g., agencies and bureaus) using NIST SP800-87 codes
- Congressional committees
- Federal Elective Officials (congressmen) using Bioguide IDs
- Public Laws
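The CSV-to-XML step described above can be sketched roughly as follows. This is a minimal illustration, not the actual wikilookup.py: the column names (`id`, `wikipedia`), the element names (`lookup`, `item`), and the sample rows are all hypothetical, and the real script's input and output layout may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_lookup_xml(csv_text):
    """Turn CSV rows of (id, wikipedia) into a small XML lookup table.

    Column and element names here are assumptions made for illustration.
    """
    root = ET.Element("lookup")  # hypothetical root element name
    for row in csv.DictReader(io.StringIO(csv_text)):
        ident = row["id"].strip()
        page = row["wikipedia"].strip()
        if not ident or not page:
            continue  # skip rows the spreadsheet left incomplete
        item = ET.SubElement(root, "item", {"id": ident})
        item.text = page
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data; the second row has no page and is skipped.
sample = "id,wikipedia\nA000001,Example Person\nB000002,\n"
print(csv_to_lookup_xml(sample))
```

Skipping incomplete rows is one possible design choice; a real crosswalk might instead report them so that missing Wikipedia pages can be filled in.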
xml contains the source CatoXML-enhanced XML (see the CatoXML namespace documentation).
wiki contains the target wikitext.
dumpxml.xq gets source XML from the deepbills BaseX database using XQuery.
wikilookup.py is a Python script that produces the XML lookup tables.
xml2wiki.xsl is an XSLT 1.0 script that produces the wikitext from the source XML. (It uses EXSLT's strings:tokenize extension, but is otherwise vanilla XSLT 1.0.)
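To make the "hyperlinked wikitext" idea concrete, here is a small Python sketch of how a tagged entity might become a wiki link once its ID is found in a lookup table. It is not a port of xml2wiki.xsl: the `wikilink` function, the plain dictionary standing in for the XML lookup tables, and the sample ID are all assumptions made for illustration.

```python
def wikilink(lookup, entity_id, text):
    """Return wikitext for `text`, linked to a page if the ID is known.

    `lookup` is a plain dict standing in for the XML lookup tables.
    """
    page = lookup.get(entity_id)
    if page is None:
        return text  # no crosswalk entry: leave the text unlinked
    if page == text:
        return "[[%s]]" % page  # link target and label are the same
    return "[[%s|%s]]" % (page, text)  # piped link: target|label


# Hypothetical lookup entry and ID.
lookup = {"X0001": "Example Agency"}
print(wikilink(lookup, "X0001", "the Agency"))
print(wikilink(lookup, "X9999", "the Bureau"))
```

Falling back to the unlinked text keeps the output readable when an entity has no crosswalk entry.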
How to Build
- First, ensure you have source XML; bills the Cato interns have completed are already included. If you want the latest data, run make xmlsource (this requires a running deepbills BaseX server).
- make will build the lookup XML and the wiki/*.txt files. (This is easy and safe to parallelize with make -j 100.)
- make clean will remove