This repository maintains the data underlying the LexiRumah
CLLD database, together with the
pylexirumah Python package, which provides an API for
accessing, manipulating and publishing the database content.
The cldf directory contains the dataset in
CLDF 1.0 Wordlist
format. Beyond the forms, which are cross-linked to lects (with Glottolog IDs)
and concepts (with Concepticon references), it includes
cognate judgements (automatically coded for the time being, but manual changes will
be documented) and a borrowing table.
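As a rough illustration of how the cross-linked tables fit together, the sketch below groups forms by cognate set using only the Python standard library. It does not use pylexirumah. The column names (`ID`, `Language_ID`, `Parameter_ID`, `Form`, `Form_ID`, `Cognateset_ID`) are the CLDF default names and are assumed here; the actual files in the cldf directory may name or arrange them differently. The sample rows are invented for the example and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows standing in for a CLDF form table and cognate table.
# Column names follow the CLDF defaults; the real dataset may differ.
FORMS_CSV = """ID,Language_ID,Parameter_ID,Form
f1,lect-a,hand,lima
f2,lect-b,hand,lima
f3,lect-a,stone,watu
"""

COGNATES_CSV = """ID,Form_ID,Cognateset_ID
c1,f1,hand-1
c2,f2,hand-1
c3,f3,stone-1
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def forms_by_cognateset(forms, cognates):
    """Map each cognate set ID to the (lect, form) pairs judged to belong to it."""
    form_index = {row["ID"]: row for row in forms}
    grouped = defaultdict(list)
    for judgement in cognates:
        form = form_index[judgement["Form_ID"]]
        grouped[judgement["Cognateset_ID"]].append(
            (form["Language_ID"], form["Form"])
        )
    return dict(grouped)


cognate_sets = forms_by_cognateset(read_rows(FORMS_CSV), read_rows(COGNATES_CSV))
for cogset_id, members in sorted(cognate_sets.items()):
    print(cogset_id, members)
```

To try this against real files, the in-memory strings could be replaced by the contents of the corresponding CSV files, once their actual names and columns have been checked against the dataset's metadata.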
In addition to the CLDF dataset, we retain data which has not (yet) been merged into
the dataset. The
noncldf folder contains the sociolinguistic profile of many of
the speakers who contributed word lists as informants.
contains the original digitizations of the word lists from Keraf (1978) for
reference. The forms in the cldf directory may have been normalized to IPA, and some
concepts have been merged with close-but-not-perfect synonyms.
The sulawesi subfolder contains wordlists from South-East Sulawesi, provided by
David Mead, as well as a draft script for importing these lects into LexiRumah.
The tests directory contains tests for functionality in