


WALS (World Atlas of Language Structures)

A PostgreSQL 9.1 dump of the WALS database is available:

curl | gzip -d | sed -e 's/robert/esteele/g' > ~/Code/language_explorer/data/wals.sql

Loading is done in a subsequent step (see "Loading WALS and Ethnologue RCEM" below).
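On a machine without sed, the decompress-and-substitute part of the download pipeline can be reproduced in Python. This is a minimal sketch, not part of the project: rewrite_dump is a hypothetical helper, and it assumes the dump is UTF-8 text. Like the sed expression, it replaces every occurrence of robert with esteele, not only role names.

```python
import gzip
from typing import BinaryIO, TextIO


def rewrite_dump(compressed: BinaryIO, out: TextIO,
                 old: str = "robert", new: str = "esteele") -> int:
    """Decompress a gzipped SQL dump and replace every `old` with `new`.

    Equivalent to `gzip -d | sed -e 's/robert/esteele/g'` for a literal
    pattern. Streams line by line so the dump never has to fit in memory.
    Returns the number of lines written.
    """
    count = 0
    # gzip.open accepts an open binary file object; "rt" yields text lines.
    # Assumption: the dump is UTF-8 encoded.
    with gzip.open(compressed, "rt", encoding="utf-8") as dump:
        for line in dump:
            out.write(line.replace(old, new))
            count += 1
    return count
```

For example, with the download saved under a hypothetical name such as wals.sql.gz, open it in binary mode ("rb"), open wals.sql for writing in text mode, and pass both file objects to rewrite_dump.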

Australian Census 2011

Data comes from the Australian Bureau of Statistics (ABS) TableBuilder:

  1. Select the "2011 Census - Cultural and Language Diversity" Database
  2. Under Language Spoken At Home (LANP), drill down to find "Australian Indigenous Languages"
  3. Select all at the "LANP - 4 digit level" (237 items)
  4. Click "Add to Row"
  5. Under Proficiency in Spoken English (ENGLP), select all at level "ENGLP" (7 items). (We chose ENGLP rather than ENGP because we are interested in those who do not speak English at home …)
  6. Click "Add to Column"
  7. Click "Retrieve Data"
  8. Download Table as type "Comma Separated Value (.csv)"

The downloaded file is saved as data/census_2011_LANP_ENGLP.csv and is read in place, so no further action is required.
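A minimal sketch of reading this CSV into a per-language lookup follows. The exact TableBuilder export layout is not described here, so the function assumes (hypothetically) some free-text title lines, a header row of ENGLP category labels, and then one row per language with integer counts; read_tablebuilder_csv is an illustrative name, not the project's code.

```python
import csv
import io
from typing import Dict, List


def read_tablebuilder_csv(text: str) -> Dict[str, Dict[str, int]]:
    """Return {language: {english_proficiency_category: count}}.

    Assumed layout: title/footer lines, a header row of column labels,
    then rows whose first cell is a language and whose other cells are counts.
    """
    header: List[str] = []
    table: Dict[str, Dict[str, int]] = {}
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row]
        while cells and not cells[-1]:  # drop stray trailing commas
            cells.pop()
        if len(cells) < 2:
            continue  # title, footer, blank or section-label line
        label, values = cells[0], cells[1:]
        is_data = (bool(label)
                   and all(v == "" or v.isdigit() for v in values)
                   and any(v.isdigit() for v in values))
        if not is_data:
            header = values  # most recent text row is taken as column labels
            continue
        if len(values) > len(header):
            raise ValueError(f"row {label!r} has more cells than the header")
        # Assumption: empty cells mean zero.
        padded = values + [""] * (len(header) - len(values))
        table[label] = {cat: int(v) if v else 0 for cat, v in zip(header, padded)}
    return table
```

The result is keyed by language name, then by proficiency category. Treating empty cells as zero is a guess and should be checked against a real export.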

Ethnologue Retired Code Element Mappings

This file contains language codes that have been retired or split.


The downloaded file is data/. The schema file is adapted for PostgreSQL from the documentation page above.

Loading is done in the next step, below.
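Once loaded, the retirement mappings are most useful as a lookup from a retired code to its current code, following chains where a replacement was itself retired later. The sketch below assumes a tab-separated file with a header row containing Id and Change_To columns; these names are modeled on the ISO 639-3 retirements code table and may differ from the Ethnologue file, so check them against the schema file. load_retirements and resolve are hypothetical helpers. Split codes have no single successor, so this sketch maps them to None.

```python
import csv
import io
from typing import Dict, Optional


def load_retirements(text: str) -> Dict[str, Optional[str]]:
    """Map each retired code to its single replacement, or None if it has none.

    Assumed format: tab-separated, header row with Id and Change_To columns.
    An empty Change_To (e.g. for a split) becomes None.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return {row["Id"].strip(): (row.get("Change_To") or "").strip() or None
            for row in reader}


def resolve(code: str, retired: Dict[str, Optional[str]]) -> Optional[str]:
    """Follow replacement chains to a current code.

    Returns the code unchanged if it was never retired, and None if the
    chain ends at a code with no single successor.
    """
    seen = set()
    while code in retired:
        if code in seen:
            raise ValueError(f"cycle in retirement mappings at {code!r}")
        seen.add(code)
        successor = retired[code]
        if successor is None:
            return None
        code = successor
    return code
```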

Loading WALS and Ethnologue RCEM

  1. cd data
  2. ./ (ignore all JPHarvest-related errors)
  3. Proceed to the Joshua Project (JPHarvest) step below

Joshua Project

  1. Make sure mdbtools is installed (the database is in MS Access format).
  2. Download the database from and unzip it (the output file is JPHarvestFieldDataOnly.mdb).
  3. Run ./data/ JPHarvestFieldDataOnly.mdb

Note (8 Sep 2015): The tblGEO*, tblLnkPEOtoGEOLocationInCountry and tblLnkPEOtoGEOStateProvince tables seem to have been removed since 2013. I don't think I use them, so this probably isn't a problem (just remove them from jpharvest-table-insertion-order.txt?).
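Instead of editing jpharvest-table-insertion-order.txt by hand whenever Joshua Project drops a table, missing tables could be filtered out at load time. This is a minimal sketch assuming the order file lists one table name per line; list_mdb_tables, plan_load and export_command are hypothetical names, while mdb-tables -1 (one table name per line) and mdb-export (table as CSV on stdout) are real mdbtools commands.

```python
import subprocess
from typing import Iterable, List, Tuple


def list_mdb_tables(mdb_path: str) -> List[str]:
    """Table names in an Access file, via mdbtools' `mdb-tables -1`."""
    out = subprocess.run(["mdb-tables", "-1", mdb_path],
                         capture_output=True, text=True, check=True).stdout
    return [name.strip() for name in out.splitlines() if name.strip()]


def plan_load(order: Iterable[str],
              available: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split the insertion order into tables to load and tables that are gone.

    Preserves the original order and skips blank lines.
    """
    present = set(available)
    to_load: List[str] = []
    missing: List[str] = []
    for name in (line.strip() for line in order):
        if not name:
            continue
        (to_load if name in present else missing).append(name)
    return to_load, missing


def export_command(mdb_path: str, table: str) -> List[str]:
    """Command that writes one table as CSV to stdout."""
    return ["mdb-export", mdb_path, table]
```

Passing the lines of the order file and list_mdb_tables("JPHarvestFieldDataOnly.mdb") to plan_load yields the tables to export in order, plus the names that no longer exist, which can be reported as a warning rather than an error.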

Creating and loading a data bundle

The data loading process is scripted, but relies on the resources above being available. Once they have been downloaded, edit the locations in data/ and run the script. The script prints the location of the created data bundle upon completion. Copy the bundle to the target machine and unpack it. Once unpacked, run the script that is inside the bundle.
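The bundle format is not described here. Purely as an illustration, assuming the bundle is a gzip-compressed tarball of a single data directory, creating it and unpacking it on the target machine could look like this; make_bundle and unpack_bundle are hypothetical names, not the project's scripts.

```python
import tarfile
from pathlib import Path


def make_bundle(src_dir: Path, bundle_path: Path) -> Path:
    """Pack src_dir (recursively) into a .tar.gz, rooted at src_dir's name."""
    with tarfile.open(bundle_path, "w:gz") as tar:
        tar.add(src_dir, arcname=src_dir.name)
    return bundle_path


def unpack_bundle(bundle_path: Path, dest_dir: Path) -> Path:
    """Unpack a bundle and return its top-level directory."""
    dest = dest_dir.resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(bundle_path, "r:gz") as tar:
        members = tar.getmembers()
        # Refuse members that would land outside dest (absolute paths, "..").
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in bundle: {member.name}")
        if hasattr(tarfile, "data_filter"):  # Python versions with extraction filters
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest / Path(members[0].name).parts[0]
```

The path check matters because the bundle is copied between machines; newer Python versions additionally apply the "data" extraction filter.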