Bootstrap the Votic GF morphology resource using pextract

keeleleek/pextract2gf-votic

Paradigm Extract and LMF Lab

This is an experimental lab for using the morphological paradigm extractor tool pextract together with the Lexical Markup Framework to automatically create:

  • a Votic morphological dictionary
  • human-readable documentation of the extracted morphological paradigms (in LMF)
  • source code for the Votic morphology module in the Grammatical Framework

The main idea is to use LMF as a wrapper and extension for pextract. Paradigm extraction itself needs little information, but a linguist needs more. Both the input and the output of pextract are supplemented by LMF. The input is modelled on top of the Extensional Morphology module and can carry information about corpus attestations of concrete word forms. The output is modelled as LMF Morphological Patterns. See below for links to example files.

Usage demonstration

Initialize the whole lab

lab:init-lab()

Initialize a language resource for Votic

(: LMF GlobalInformation :)
let $language-code := "vot"
let $label   := "Votic automatically extracted morphological paradigms"
let $comment := "The morphological paradigms have been extracted with the pextract tool."
let $author  := "Kristian Kankainen"
let $languageCoding := "ISO 639-3"

return lab:init-language-resource(
  $language-code,
  $label,
  $comment,
  $author,
  $languageCoding
)

Add paradigms from pfiles

let $base := file:parent(static-base-uri())
let $pfiles := (
  $base || "pextract-votic/pextract/vot-commonNoun.p",
  $base || "pextract-votic/pextract/vot-personalPronoun.p"
)

return lab:insert-morphological-patterns-from-pfiles("vot", $pfiles)

Output the built Votic LMF language resource

lab:get-lexical-resource("vot")

The output can be seen in examples/generated-vot-lmf.xml.
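
To keep a copy of the generated resource on disk, the result can be passed to the File Module's file:write function. This is only a sketch. It assumes the query runs in BaseX with the lab module already available under the lab prefix (as in the snippets above), and that lab:get-lexical-resource returns XML nodes. The output file name is a hypothetical choice for illustration.

```xquery
(: Sketch only: assumes the "lab" prefix is bound as in the snippets above
   and that lab:get-lexical-resource returns XML node(s). :)
let $base := file:parent(static-base-uri())
(: hypothetical output file name, written next to the query file :)
let $out  := $base || "generated-vot-lmf.xml"
return file:write(
  $out,
  lab:get-lexical-resource("vot"),
  map { "method": "xml", "indent": "yes" }  (: serialize as indented XML :)
)
```

Writing to a path relative to the query file mirrors how the pfiles are located in the earlier step.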

Generate source code for the Grammatical Framework paradigm functions

Work in Progress! The file examples/MorphoVot.gf is generated by pextract-lmf2gf.xq. The code has many drawbacks and is still incomplete.

View the lab's web frontend

Work in Progress! Much of this effort goes into creating a web frontend that helps build a morphological database using the ideas presented here.