Skip to content

lessersunda/lexirumah-data

Repository files navigation

lexiruma-data

The data underlying the LexiRumah CLLD database is maintained and edited here, as well as the pylexirumah python package, which provides an API for accessing, manipulating and publishing the database content.

CLDF

The cldf directory contains the dataset in CLDF 1.0 Wordlist format. Included beyond forms, which are cross-linked to lects (with Glottolog IDs) and concepts (with Concepticon references), are cognate judgements (automatically coded for the time being, but manual changes will be documented) and a borrowing table.

Non-CLDF

In addition to the CLDF dataset, we retain data which has not (yet) been merged into the dataset. The noncldf folder contains the sociolinguistic profile of many of the speakers who contributed word lists as informants.

The keraf subfolder contains the original digitizations of the word lists from Keraf (1978) for reference. The forms in the cldf may have been normalized to IPA and some concepts have been merged with close-but-not-perfect synonyms.

The sulawesi subfolder contains wordlists from South-East Sulawesi, provided by David Mead, as well as the draft for a script to import these lects into LexiRumah.

pylexirumah

Build Status

tests

The tests directory contains tests for functionality in pylexirumah.