Lexibank Analysed

How to cite

If you use these data, please cite:

  • the original source

    List, Johann-Mattis; Forkel, Robert; Greenhill, Simon J.; Rzymski, Christoph; Englisch, Johannes; and Russell D. Gray (2021): Lexibank: A publicly available repository of standardized lexical datasets with automatically computed phonological and lexical features for more than 2000 language varieties [Dataset, Version 1.0]. Geneva: Zenodo.

  • the derived dataset, using the DOI of the particular release you used

Description

This dataset is licensed under a CC-BY-4.0 license

Available online at https://lexibank.clld.org

Statistics

Glottolog: 100%, Concepticon: 100%, Source: 100%, BIPA: 100%, CLTS SoundClass: 100%

  • Varieties: 2,029
  • Concepts: 3,033
  • Lexemes: 709,638
  • Sources: 75
  • Synonymy: 1.14
  • Invalid lexemes: 0
  • Tokens: 3,857,425
  • Segments: 1,486 (0 BIPA errors, 0 CLTS sound class errors, 1,478 CLTS modified)
  • Inventory size (avg): 37.46

Possible Improvements:

Contributors

Name                GitHub user      Description  Role
Johann-Mattis List  @LinguList       maintainer   Author
Robert Forkel       @xrotwang        maintainer   Author
Simon J. Greenhill  @simongreenhill  maintainer   Author
Christoph Rzymski   @chrzyki         maintainer   Author
Johannes Englisch   @johenglisch     maintainer   Author
Russell D. Gray                      maintainer   Author

CLDF Datasets

The following CLDF datasets are available in cldf: