A set of tools for interacting with data from NERC's Research Outputs Database
Python JavaScript
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
static
templates
.gitignore
README.md
csv2sql.py
viewer.py

README.md

How to use these tools

Create and populate the database:

$ cat rod-schema.sql | sqlite3 rod.db
$ bzcat rod-complete.sql.bz2 | sqlite3 rod.db

Install Flask and run the viewer:

$ virtualenv --distribute pyenv
$ source pyenv/bin/activate
$ pip install flask
$ python viewer.py

Initial extraction

The data has been extracted from an MS Access (JET) database, with the help of mdbtools.

First extract a basic schema. This is a rudimentary conversion, and the end result required extensive editing to make the output SQLite-compatible. PRIMARY KEY and FOREIGN KEY constraints were also added where obvious.

$ mdb-schema rod.mdb > rod-schema.sql

Now get a list of tables we're interested in:

$ ROD_TABLES="$(mdb-tables rod-snap-20100426.mdb | tr ' ' '\n' | grep ROD)"

Extract each of these tables to CSV:

$ echo $ROD_TABLES | while read table; do mdb-export rod-snap-20100426.mdb $table > $table.csv; done

You can optionally convert these CSVs to SQL "INSERT" statement files using csv2sql.py:

$ echo $ROD_TABLES | while read table; do python csv2sql.py $table < $table.csv | bzip2 > $table.sql.bz2; done

Note that you will need to ensure you load the tables into the SQL database in the correct order to ensure FOREIGN KEY constraints are met. In particular, load ROD_GRANTS first, and ROD_OUTPUTS second.

NB: The foreign key constraints won't be met by the data at the moment. The apparent absence of the concept of such constraints from Access means this is a rather messy dataset.