lexibank/wold

CLDF dataset derived from Haspelmath and Tadmor's "World Loanword Database" from 2009

How to cite

If you use these data, please cite:

  • the original source

    Haspelmath, Martin & Tadmor, Uri (eds.) 2009. World Loanword Database. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at http://wold.clld.org)

  • the derived dataset, using the DOI of the specific released version you used

Description

This dataset is licensed under CC-BY-4.0.

Available online at http://wold.clld.org

Conceptlists in Concepticon:

Notes

The World Loanword Database, edited by Martin Haspelmath and Uri Tadmor, is a scientific publication by the Max Planck Institute for Evolutionary Anthropology, Leipzig (2009).

It provides vocabularies (mini-dictionaries of about 1000-2000 entries) of 41 languages from around the world, with comprehensive information about the loanword status of each word. It allows users to find loanwords, source words and donor languages in each of the 41 languages, but also makes it easy to compare loanwords across languages.

Each vocabulary was contributed by an expert on the language and its history. An accompanying book has been published by De Gruyter Mouton (Loanwords in the World's Languages: A Comparative Handbook, edited by Martin Haspelmath & Uri Tadmor).

The World Loanword Database consists of vocabularies contributed by 41 different authors or author teams. When citing material from the database, please cite the corresponding vocabulary (or vocabularies).

The World Loanword Database is the result of a collaborative project coordinated by Uri Tadmor and Martin Haspelmath between 2004 and 2008, called the Loanword Typology Project (LWT). Most of the contributors took part in workshops at which the procedures for selecting and annotating words were discussed extensively. The list of 1460 meanings on which the vocabularies are based is called the Loanword Typology meaning list, and it is in turn based on the list of the Intercontinental Dictionary Series.

Statistics

CLDF validation · Glottolog: 100% · Concepticon: 99% · Source: 100% · BIPA: 100% · CLTS SoundClass: 100%

  • Varieties: 41
  • Concepts: 1,814
  • Lexemes: 64,289
  • Sources: 41
  • Synonymy: 1.20
  • Invalid lexemes: 0
  • Tokens: 365,462
  • Segments: 631 (0 BIPA errors, 0 CLTS sound class errors, 626 CLTS modified)
  • Inventory size (avg): 54.68
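
The synonymy figure above is not defined in this README. One plausible reading, assumed here rather than confirmed by the dataset's documentation, is the average number of lexemes per (variety, concept) pair. The sketch below computes that ratio over a few rows shaped like a CLDF FormTable. The function name `synonymy_index` and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def synonymy_index(forms):
    """Average number of lexemes per (language, concept) pair.

    Assumption: this is how a synonymy statistic could be defined;
    the README itself does not state the formula.
    """
    per_slot = defaultdict(int)
    for form in forms:
        # Count every lexeme recorded for this variety/concept combination.
        per_slot[(form["Language_ID"], form["Parameter_ID"])] += 1
    if not per_slot:
        return 0.0
    return sum(per_slot.values()) / len(per_slot)


# Hypothetical rows for illustration only (not real WOLD entries).
sample_forms = [
    {"Language_ID": "lang_a", "Parameter_ID": "water", "Form": "aqua"},
    {"Language_ID": "lang_a", "Parameter_ID": "water", "Form": "unda"},
    {"Language_ID": "lang_a", "Parameter_ID": "fire", "Form": "ignis"},
    {"Language_ID": "lang_b", "Parameter_ID": "water", "Form": "vesi"},
    {"Language_ID": "lang_b", "Parameter_ID": "fire", "Form": "tuli"},
]

# 5 lexemes spread over 4 (language, concept) pairs
print(synonymy_index(sample_forms))  # 1.25
```

Under this reading, a value of 1.20 would mean that a filled concept slot in a variety carries 1.2 lexemes on average.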

Contributors

Name | GitHub user | Description | Role
Tiago Tresoldi | @tresoldi | patron, maintainer, orthographic profiles | Other
Robert Forkel | @xrotwang | maintainer | Other
Natalia Morozova | @natalia-morozova | orthographic profiles | Other
Martin Haspelmath | | publication editor | Author, Editor
Uri Tadmor | | publication editor | Author, Editor

CLDF Datasets

The following CLDF datasets are available in cldf: