CLDF dataset derived from Greenhill et al.'s "Austronesian Basic Vocabulary Database" from 2020 focusing on Oceanic languages

How to cite

If you use these data, please cite:

- the original source:
  Greenhill, S.J., Blust, R., & Gray, R.D. (2008). The Austronesian Basic Vocabulary Database: From Bioinformatics to Lexomics. Evolutionary Bioinformatics, 4:271-283.
- the derived dataset, using the DOI of the particular released version you used.

Description

This dataset is licensed under a CC-BY-4.0 license

Available online at https://abvd.shh.mpg.de/austronesian/

Conceptlists in Concepticon:

Blust-2008-210

Notes

Making a Nexus File:

You will need to have this lexibank dataset installed. It is probably best to set up the environment outside the dataset directory:

# set up and install a virtual environment
python -m venv env
source ./env/bin/activate

# clone git repository
git clone https://github.com/lexibank/abvdoceanic

# or, if already cloned, update the repository
cd abvdoceanic
git checkout main
git pull
cd ..

# install dataset
cd abvdoceanic
pip install -e .
cd ..
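
To confirm the installation worked before running any commands, a quick import check can help. This is a minimal sketch, not part of the dataset's tooling: the helper `is_installed` is hypothetical, and the module name `lexibank_abvdoceanic` is assumed from the `lexibank_abvdoceanic.py` file in the repository.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names are assumptions; adjust them if the package layout differs.
    for name in ("cldfbench", "lexibank_abvdoceanic"):
        print(name, "found" if is_installed(name) else "MISSING")
```

Run it with the virtual environment activated; a MISSING line suggests that `pip install -e .` was run in a different environment.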

To make a nexus file, use the custom abvdoceanic.nexus command in cldfbench. The parameters are:

--output=/path/to/filename.nex - the output file to write.
--ascertainment={token} - add BEAST's ascertainment correction (optional). Tokens:
- overall - one ascertainment character added for overall correction.
- word - per-word ascertainment correction.
--removecombined={int} - the level at which to filter combined cognates.
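
To picture the difference between the two --ascertainment modes, the following minimal Python sketch shows where the extra all-absent characters would sit in a binary cognate matrix. It is not the code behind abvdoceanic.nexus: the function name `add_ascertainment`, the character labels, and the rule that a taxon with no data for a word gets `?` in that word's ascertainment character are assumptions made for illustration.

```python
def add_ascertainment(characters, taxa, mode="overall"):
    """Return a list of (label, {taxon: state}) with ascertainment characters added.

    characters: list of (word, cognate_set_id, {taxon: state}) tuples,
                where state is "1", "0" or "?".
    mode:       "overall" adds one all-zero character in front of everything;
                "word" adds one ascertainment character in front of each word.
    """
    if mode not in ("overall", "word"):
        raise ValueError("mode must be 'overall' or 'word'")

    out = []
    if mode == "overall":
        out.append(("_ascertainment", {t: "0" for t in taxa}))
        out.extend((f"{w}_{c}", dict(s)) for w, c, s in characters)
        return out

    # Group cognate sets by word, keeping the original word order.
    by_word = {}
    for word, cogset, states in characters:
        by_word.setdefault(word, []).append((cogset, states))

    for word, sets in by_word.items():
        asc = {}
        for t in taxa:
            # Assumption: a taxon with no data for this word is marked missing.
            missing = all(states.get(t, "?") == "?" for _, states in sets)
            asc[t] = "?" if missing else "0"
        out.append((f"{word}_0ascertainment", asc))
        out.extend((f"{word}_{c}", dict(s)) for c, s in sets)
    return out
```

With three cognate sets spread over two words, "overall" yields four characters and "word" yields five, one extra per word.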

# make a nexus file, with combined cognates removed above level 2:
cldfbench abvdoceanic.nexus --removecombined 2 --output abvdoceanic.nex

# ...with per-word ascertainment correction:
cldfbench abvdoceanic.nexus --ascertainment=word --removecombined 2 --output abvdoceanic.nex
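
After generating the file, it can be useful to sanity-check its dimensions. The sketch below reads the `ntax` and `nchar` values from a nexus file using only the standard library. The helper `nexus_dimensions` is hypothetical, and it assumes the file declares its dimensions in the usual `dimensions ntax=... nchar=...;` form.

```python
import re


def nexus_dimensions(text):
    """Return (ntax, nchar) as integers, or None for a value that is not declared."""
    ntax = re.search(r"\bntax\s*=\s*(\d+)", text, re.IGNORECASE)
    nchar = re.search(r"\bnchar\s*=\s*(\d+)", text, re.IGNORECASE)
    return (
        int(ntax.group(1)) if ntax else None,
        int(nchar.group(1)) if nchar else None,
    )


def nexus_file_dimensions(path):
    """Read a nexus file from disk and return its declared (ntax, nchar)."""
    with open(path, encoding="utf-8") as handle:
        return nexus_dimensions(handle.read())
```

For example, `nexus_file_dimensions("abvdoceanic.nex")` returns the declared number of taxa and characters, which can then be compared between runs with and without ascertainment correction.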

Statistics

Varieties: 418
Concepts: 191
Lexemes: 78,515
Sources: 0
Synonymy: 1.14
Cognacy: 74,236 cognates in 9,490 cognate sets (2,308 singletons)
Cognate Diversity: 0.12
Invalid lexemes: 0
Tokens: 392,172
Segments: 430 (0 BIPA errors, 0 CLTS sound class errors, 429 CLTS modified)
Inventory size (avg): 30.58
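
A few ratios can be derived from the counts above. The snippet below is simple arithmetic on the listed figures; the interpretation of each ratio (for example, reading "cognates" as cognate judgements) is an assumption, and none of these are the formulas used to produce the Synonymy or Cognate Diversity values.

```python
# Counts copied from the Statistics list.
lexemes = 78_515
cognates = 74_236
cognate_sets = 9_490
singletons = 2_308

# Derived ratios (interpretation is an assumption, see the note above).
judgements_per_lexeme = cognates / lexemes
mean_set_size = cognates / cognate_sets
singleton_share = singletons / cognate_sets

print(f"cognate judgements per lexeme: {judgements_per_lexeme:.3f}")
print(f"mean cognates per cognate set: {mean_set_size:.2f}")
print(f"share of singleton sets:       {singleton_share:.1%}")
```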

Possible Improvements:

Entries missing sources: 78515/78515 (100.00%)

Contributors

Name	GitHub user	Description	Role
Simon J. Greenhill	@SimonGreenhill	maintainer	Author
Johann-Mattis List	@lingulist	orthography profiles	Other

CLDF Datasets

The following CLDF datasets are available in cldf:

CLDF Wordlist at cldf/cldf-metadata.json
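
As a quick way to inspect the wordlist without extra dependencies, the forms table can be read with Python's csv module. This is a minimal sketch under assumptions: it presumes the forms are stored in `cldf/forms.csv` with a `Language_ID` column, which is the common default for CLDF Wordlists; the actual file and column names are declared in `cldf/cldf-metadata.json`. The helper name `count_forms_per_language` is hypothetical.

```python
import csv
from collections import Counter


def count_forms_per_language(forms_csv, language_column="Language_ID"):
    """Count rows of a CLDF forms table per language identifier."""
    with open(forms_csv, newline="", encoding="utf-8") as handle:
        return Counter(row[language_column] for row in csv.DictReader(handle))


if __name__ == "__main__":
    import os

    path = os.path.join("cldf", "forms.csv")  # assumed default location
    if os.path.exists(path):
        for language, n in count_forms_per_language(path).most_common(5):
            print(language, n)
```

For anything beyond counting, the pycldf package reads the metadata file directly and resolves the table and column names for you.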
