This is an experimental lab for using the morphological paradigm extractor tool pextract together with the Lexical Markup Framework (LMF) to automatically create:
- a Votic morphological dictionary
- human-readable documentation of the extracted morphological paradigms (in LMF)
- source code for the Votic morphology module in the Grammatical Framework
The main idea is to use LMF as a wrapper and an extension for pextract. Paradigm extraction itself does not need much information, but a linguist does. Both the input and the output of pextract are supplemented by LMF. The input is modelled on top of the Extensional Morphology module and can carry information about corpus attestations of concrete word forms. The output is modelled as LMF Morphological Patterns. See below for links to example files.
lab:init-lab()
(: LMF GlobalInformation :)
let $language-code := "vot"
let $label := "Votic automatically extracted morphological paradigms"
let $comment := "The morphological paradigms have been extracted with the pextract tool."
let $author := "Kristian Kankainen"
let $languageCoding := "ISO 639-3"
return lab:init-language-resource(
  $language-code,
  $label,
  $comment,
  $author,
  $languageCoding
)
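For orientation, the metadata passed above could plausibly end up as LMF feature elements. The following is only a minimal sketch under that assumption: the element layout follows the common LMF `feat att/val` convention, it uses a subset of the values, and it is not taken from the actual implementation of lab:init-language-resource, which may store the values differently.

```xquery
xquery version "3.1";

(: Hypothetical illustration only: the real lab:init-language-resource
   may build its output differently. The attribute names mirror the
   variable names used in the call above. :)
let $features := (
  ["label", "Votic automatically extracted morphological paradigms"],
  ["author", "Kristian Kankainen"],
  ["languageCoding", "ISO 639-3"]
)
return
  <LexicalResource>
    <GlobalInformation>{
      (: one feat element per attribute/value pair :)
      for $f in $features
      return <feat att="{$f(1)}" val="{$f(2)}"/>
    }</GlobalInformation>
    <Lexicon>
      <feat att="language" val="vot"/>
    </Lexicon>
  </LexicalResource>
```

Evaluating this expression in an XQuery 3.1 processor should return a single LexicalResource element, which can be compared against what lab:get-lexical-resource("vot") actually returns.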
let $base := file:parent(static-base-uri())
let $pfiles := (
  $base || "pextract-votic/pextract/vot-commonNoun.p",
  $base || "pextract-votic/pextract/vot-personalPronoun.p"
)
return lab:insert-morphological-patterns-from-pfiles("vot", $pfiles)
lab:get-lexical-resource("vot")
The output can be seen in examples/generated-vot-lmf.xml.
Work in Progress! The file examples/MorphoVot.gf is generated by pextract-lmf2gf.xq. The code has many drawbacks and is still incomplete.
Work in Progress! Much of this effort goes into creating a web frontend that helps build a morphological database using the ideas presented here.