
This repository maintains the data underlying the LexiRumah CLLD database, together with the pylexirumah Python package, which provides an API for accessing, manipulating, and publishing the database content.


The cldf directory contains the dataset in CLDF 1.0 Wordlist format. The forms are cross-linked to lects (with Glottolog IDs) and to concepts (with Concepticon references). Beyond the forms, the dataset includes cognate judgements (automatically coded for the time being; manual changes will be documented) and a borrowing table.
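As an illustration of how forms link to lects and concepts, here is a minimal sketch that groups forms by concept using only the Python standard library. It does not use the pylexirumah API. The column names `ID`, `Language_ID`, `Parameter_ID`, and `Form` are the CLDF Wordlist defaults and are assumed here; the actual column names in this dataset may differ, so check the metadata file in the cldf directory. The sample rows are placeholders, not LexiRumah data.

```python
import csv
import io
from collections import defaultdict


def forms_by_concept(lines):
    """Group forms by concept ID, keeping the lect each form belongs to.

    `lines` is any iterable of CSV lines, e.g. an open file or io.StringIO.
    Column names are the CLDF Wordlist defaults (an assumption).
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(lines):
        grouped[row["Parameter_ID"]].append((row["Language_ID"], row["Form"]))
    return dict(grouped)


# Placeholder rows in the assumed layout; not real LexiRumah data.
SAMPLE = """ID,Language_ID,Parameter_ID,Form
1,lect-a,concept-1,form-x
2,lect-b,concept-1,form-y
3,lect-a,concept-2,form-z
"""

result = forms_by_concept(io.StringIO(SAMPLE))
for concept, forms in result.items():
    print(concept, forms)
```

With a local checkout, the same function could be given an open file instead, for example `open("cldf/forms.csv", newline="", encoding="utf-8")`; the file name `forms.csv` is the CLDF default and is likewise an assumption.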


In addition to the CLDF dataset, we retain data that has not (yet) been merged into it. The noncldf folder contains the sociolinguistic profiles of many of the speakers who contributed word lists as informants.

The keraf subfolder contains the original digitizations of the word lists from Keraf (1978) for reference. The corresponding forms in the cldf directory may have been normalized to IPA, and some concepts have been merged with close-but-not-perfect synonyms.

The sulawesi subfolder contains word lists from South-East Sulawesi, provided by David Mead, as well as a draft script for importing these lects into LexiRumah.




The tests directory contains tests for functionality in pylexirumah.