langdoc/four-battles-corpus

Creative Commons License

Four Battles corpus

This corpus contains line-by-line aligned parallel text in several Uralic languages. The organisation of the materials is still ongoing. In particular, the Erzya data still needs to be converted to CoNLL-U format, and the language tags still need to be imported into the CoNLL-U files of the other languages.
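As a rough illustration of how line-by-line aligned CoNLL-U data could be paired up, here is a minimal sketch. It relies only on the general CoNLL-U conventions (ten tab-separated columns per token, `#` comment lines, a blank line between sentences). The function names `read_conllu` and `align_by_position` are hypothetical, and the assumption that the N-th sentence in one file corresponds to the N-th sentence in another is ours; check it against the actual files in this repository before relying on it.

```python
from typing import Dict, Iterator, List, Tuple


def read_conllu(text: str) -> Iterator[Dict[str, object]]:
    """Yield one dict per sentence with its comment metadata and token rows."""
    meta: Dict[str, str] = {}
    tokens: List[List[str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                yield {"meta": meta, "tokens": tokens}
            meta, tokens = {}, []
        elif line.startswith("#"):
            # Comment lines such as "# sent_id = 1" become key/value metadata.
            key, sep, value = line[1:].partition("=")
            if sep:
                meta[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            if len(cols) == 10:  # CoNLL-U token lines have ten columns
                tokens.append(cols)
    if tokens:  # last sentence when the text lacks a trailing blank line
        yield {"meta": meta, "tokens": tokens}


def sentence_forms(sentence: Dict[str, object]) -> str:
    """Join the FORM column, skipping multiword ranges (1-2) and empty nodes (1.1)."""
    return " ".join(tok[1] for tok in sentence["tokens"] if tok[0].isdigit())


def align_by_position(text_a: str, text_b: str) -> List[Tuple[str, str]]:
    """Pair sentences by position (assumption: both sides are in the same order)."""
    sents_a = list(read_conllu(text_a))
    sents_b = list(read_conllu(text_b))
    if len(sents_a) != len(sents_b):
        raise ValueError(
            f"sentence counts differ: {len(sents_a)} vs {len(sents_b)}"
        )
    return [(sentence_forms(a), sentence_forms(b)) for a, b in zip(sents_a, sents_b)]
```

Both functions take the file contents as strings, so they can be fed whatever files you read from disk. A mismatch in sentence counts raises an error instead of silently truncating, since a misaligned pair would corrupt every pair after it.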

The corpus has been used in the following works:

Bradley Jeremy, Kellner Alexandra & Partanen Niko 2018: Variation in word order in Permic and Mari varieties: a corpus-based investigation. Proceedings of the symposium "Language contacts of the nations of Volga-Ural region", Cheboksary, 21–24.5.2018.

Janurik Boglarka, Kantele Simo & Partanen Niko 2017: Three Uralic languages walk into a bar. Presentation at SLE 2017, Zurich.

The links to the original data in the National Library of Finland's Fenno-Ugrica collection are as follows:

The materials in Fenno-Ugrica are in the Public Domain.

Some of the Komi annotations are also included in the Universal Dependencies Komi-Zyrian Lattice treebank. Those annotations are under a CC-BY-SA license. However, the texts themselves are entirely copyright free.

The Russian translations are available from the publ.lib.ru archive, where they are released under a non-commercial license.