Morphotactics Model

This document outlines the morpheme inventory and tagsets used to define the morphotactics model (the affixation models per part-of-speech). The morphotactics model is defined in the text files under //src/analyzer/morphotactics/model.

Structure of a source morphotactics file

The morphotactics model consists of 19 text files, each of which defines the inflectional and derivational morphemes and the agglutination patterns of one part-of-speech. The files follow these conventions:

  • Morphotactic model files should use the '.txt' file extension.
  • Morphotactic model files should be named [coarse_pos_tag].txt, where [coarse_pos_tag] is the part-of-speech for which the rewrite rules are defined.
  • Morphotactic model files should end with an empty line (\n).
  • Morphotactic model files can only contain comment lines or lines that define FST rewrite rules.
  • Lines that start with '#' are comment lines; they are disregarded when the morphotactics FST is compiled.
  • Every non-comment line defines an FST rewrite rule. The format of a rewrite rule is similar to the AT&T FSM format: it should contain 4 whitespace separated strings. The first two are the names of the source and destination states. The last two are the input and output labels. Note that the morphotactics FST is inverted while compiling the morphological analyzer FST, so the input and output labels switch sides (see //src/analyzer/build.sh).
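The conventions above can be sketched as a small line parser. This is a minimal illustration in Python, not the project's actual compilation code. The names RewriteRule, parse_rule_line and invert are hypothetical, and so are the state names and labels in the sample text. It assumes that anything other than a comment line or a 4-field rule line is an error.

```python
from typing import NamedTuple, Optional


class RewriteRule(NamedTuple):
    source: str        # name of the source state
    destination: str   # name of the destination state
    input_label: str
    output_label: str


def parse_rule_line(line: str) -> Optional[RewriteRule]:
    """Parses one line of a model file; returns None for a comment line."""
    if line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 whitespace separated fields: {line!r}")
    return RewriteRule(*fields)


def invert(rule: RewriteRule) -> RewriteRule:
    """Swaps the input and output labels, mirroring FST inversion."""
    return rule._replace(input_label=rule.output_label,
                         output_label=rule.input_label)


# Hypothetical file content; real state names and labels will differ.
text = "# plural marker\nSTATE-A STATE-B <eps> +lAr\n"
rules = [r for r in map(parse_rule_line, text.splitlines()) if r is not None]
print(rules[0])
print(invert(rules[0]))
```

The second print shows the same arc with its labels swapped, which is the effect of the inversion step on each rule.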

Morpheme inventory

The sections below present the morpheme inventory (the set of morphemes that define Turkish morphology) used across the morphotactics model. The inventory is presented in three separate sections:

  • Inflectional morphemes: morphemes that define the inflectional paradigm of a part-of-speech. Inflectional morphemes are represented with a preceding + markup. Some inflectional features are not realized in the surface form; their meta-morphemes are specified as <eps>.
  • Derivational morphemes: morphemes that alter the part-of-speech of the word when affixed. Derivational morphemes are represented with a preceding - markup. They are always realized in the surface form (no zero-derivations!), so each of them always has a corresponding meta-morpheme.
  • Others: these are not really suffixes, but additional tags that mark certain syntactic agreement, semantic and segmentation features. They are helpful in implementing models for morphological disambiguation, part-of-speech tagging and syntactic parsing.
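The examples in the derivational table use the same markup on surface forms, e.g. düş-eyaz+dı, where - precedes a derivational segment and + precedes an inflectional one. Below is a minimal Python sketch of splitting such a marked-up example. The function name split_segments is hypothetical, and the sketch assumes that neither the root nor the suffixes contain a literal + or - character.

```python
import re
from typing import List, Tuple

# Maps the boundary markup to the kind of morpheme that follows it.
KIND = {"+": "inflectional", "-": "derivational"}


def split_segments(marked: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Splits 'root-suffix+suffix' into the root and (kind, suffix) pairs."""
    # The capturing group keeps the boundary markers in the result list.
    parts = re.split(r"([+-])", marked)
    root = parts[0]
    suffixes = [(KIND[parts[i]], parts[i + 1])
                for i in range(1, len(parts), 2)]
    return root, suffixes


print(split_segments("düş-eyaz+dı"))
print(split_segments("yap-ıl+dı"))
```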

Meta-morphemes are composed of fully realized phonemes (written in lowercase; e.g. {c, s, n} are the fully realized phonemes in cAsHnA) and meta-phonemes (written in uppercase; {H, A} are the meta-phonemes in cAsHnA). Fully realized phonemes occur unchanged in the surface form. Meta-phonemes stand for allophones, and the morphophonemics model resolves each of them to a concrete phoneme based on its context. The set of meta-phonemes and the morphophonemic processes that resolve them are implemented in self-explanatory Thrax grammars under //src/analyzer/morphophonemics.
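The case convention can be illustrated with a short Python sketch that separates the two kinds of symbols in a meta-morpheme. The function name split_phonemes is hypothetical. The sketch assumes that letter case alone distinguishes the two kinds, and it ignores non-letter symbols such as ' and . as well as the leading + or - markup. It does not resolve meta-phonemes to surface phonemes; that is the job of the Thrax grammars.

```python
from typing import List, Tuple


def split_phonemes(meta_morpheme: str) -> Tuple[List[str], List[str]]:
    """Returns (fully realized phonemes, meta-phonemes) in order of occurrence."""
    body = meta_morpheme.lstrip("+-")  # drop the inflection/derivation markup
    realized = [ch for ch in body if ch.islower()]
    meta = [ch for ch in body if ch.isupper()]
    return realized, meta


print(split_phonemes("-cAsHnA"))  # (['c', 's', 'n'], ['A', 'H', 'A'])
print(split_phonemes("+YmHş"))    # (['m', 'ş'], ['Y', 'H'])
```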

Inflectional morphemes

Feature Category Feature Value Meta-Morphemes Description Applies To Categories
Case Abl +DAn, +NDAn Ablative case ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP
Case Acc +YH, +NH Accusative case ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP
Case Bare <eps> Caseless ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP
Case Dat +YA, +NA Dative case ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP
Case Gen +NHn Genitive case ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP
Case Ins +YlA Instrumental / Comitative case ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP
Case Loc +DA, +NDA Locative case ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP
Case Nom <eps> Nominative case ADD, IN, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRP$, PRR, VN, WP
Contrast True +YsA Contrastive ADD, IN, NN, NNP, PRD, PRF, PRI, PRP, PRP$, PRR, RB, VN, WRB, WP
Copula CndCop +YA, +YsA Conditional copula NOMP, VB
Copula EvCop +YmHş Evidential copula NOMP, VB
Copula GenCop +DHr Generalizing copula NOMP, VB
Copula PastCop +YDH Past copula NOMP, VB
Copula PresCop <eps> Present copula NOMP, VB
NumberType Dist +SAr Distributive CD
NumberType Ord +., +HncH Ordinal CD, NN (only number roots)
PersonNumber A3pl +lAr 3rd person plural (marked on nominals) ADD, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRR, VN
PersonNumber A3sg <eps> 3rd person singular (marked on nominals) ADD, NN, NNP, NOMP, PRD, PRF, PRI, PRP, PRR, VN
PersonNumber V1pl +YHz, +k, +lHm 1st person plural (marked on verbals) NOMP, VB
PersonNumber V2pl +sHnHz, +nHz, +YHn, +sAnHzA 2nd person plural (marked on verbals) NOMP, VB
PersonNumber V3pl <eps>, +lAr, +sHn, +sHnlAr 3rd person plural (marked on verbals) NOMP, VB
PersonNumber V1sg +YHm, +m 1st person singular (marked on verbals) NOMP, VB
PersonNumber V2sg <eps>, +sHn, +n, +sAnA 2nd person singular (marked on verbals) NOMP, VB
PersonNumber V3sg <eps>, +sHn 3rd person singular (marked on verbals) NOMP, VB
Polarity Neg +mA Negative polarity VB
Polarity Pos <eps> Positive polarity VB
Possessive P2pl +HnHz 2nd person plural possessive ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP
Possessive P1pl +HmHz 1st person plural possessive ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP
Possessive P3pl +lArH 3rd person plural possessive ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP
Possessive P1sg +Hm 1st person singular possessive ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP
Possessive P2sg +Hn, +HnHz 2nd person singular possessive ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP
Possessive P3sg +SH 3rd person singular possessive ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRR, VJ, VN, WP
Possessive Pnon <eps> No possessive ADD, NN, NNP, IN, NOMP, PRD, PRF, PRI, PRP, PRR, VJ, VN, WP
TenseAspectMood Aor +Ar, +Hr, +r, +z Aorist tense VB
TenseAspectMood Desr +sA Desire / Past Auxiliary VB
TenseAspectMood Fut +YAcAk Future tense VB
TenseAspectMood Imp <eps> Imperative VB
TenseAspectMood Nar +mHş Narrative past tense / Perfective-Evidential VB
TenseAspectMood Nec +mAlH Necessitative / Obligative VB
TenseAspectMood Opt +YA Optative VB
TenseAspectMood Past +DH Past tense / Perfective VB
TenseAspectMood Prog1 +Hyor Progressive tense 1 / Imperfective 1 VB
TenseAspectMood Prog2 +mAktA Progressive tense 2 / Imperfective 2 VB

Derivational morphemes

Feature Category Feature Value Meta-Morphemes Description Example Derives From Category - To Category
Derivation Able -YAbil, -YA Ability gel-ebil-ir VB-to-VB
Derivation Acq -lAn Acquire yeşil-len {ADD | NN | NNP | VN}-to-VB
Derivation Act -NCA According to ben-ce {PRP | PRD}-to-RB
Derivation Aff -CHl Affinity et-çil {ADD | NN | NNP}-to-{JJ | NN | NOMP}
Derivation After -YHp After doing so gel-ip VB-to-CRB
Derivation Agt -CH Agentive koşu-cu {ADD | NN | NNP | VN}-to-{JJ | NN | NOMP}
Derivation Alm -YAyaz Almost düş-eyaz+dı VB-to-VB
Derivation AorNom -Hr, -Ar, -r, -z Aorist Nominalizer gel-ir (e.g. elde edilen gelirler) VB-to-{VN | NOMP}
Derivation AorPart -Hr, -Ar, -r, -z Aorist Participle tükenme-z (e.g. tükenmez kalem) VB-to-VJ
Derivation Apostrophe -' Apostrophe Yüzüklerin Efendisi-'+nden {ADD | CC | CD | CRB | DT | DUP | EP | EX | FW | GW | IN | JJ | LS | NFP | NN | NNP | NOMP | OP | PDT | PFX | PRD | PRF | PRI | PRP | PRP$ | PRR | RB | RPC | RPNEG | RPQ | SYM | UH | VB | VJ | VN | WRD | WRB | WP}-to-NN
Derivation As -DHkçA As git-tikçe VB-to-CRB
Derivation AsIf -cAsHnA As if koşar-casına VB-to-{CRB | NOMP}
Derivation Bcm -lAş Become iyi-leş {ADD | NN | NNP}-to-VB
Derivation By -NCA By aklı-nca {ADD | NN | NNP | VN}-to-RB
Derivation Cau -DHr, -Hr, -Ht, -t Causative yap-tır VB-to-VB
Derivation Coll -CA, -CAk, -CAnAk Collective toplu-ca, toplu-cak, toplu-canak {ADD | NN | VN}-to-RB
Derivation Dim -CHk, -cAğHz Diminutive kitap-çık {ADD | NN | VN}-to-NN, {ADD | NN | NNP | VN}-to-NOMP
Derivation Doct -izm Doctrine fütur-izm {ADD | NN | NNP}-to-{NN | NOMP}
Derivation Ever -YAgel Ever sür-egel+en VB-to-VB
Derivation Fam -gil, -lAr Family annem-gil {ADD | NN | NNP}-to-{NN | NOMP}
Derivation Foll -ist, -st Follower fütur-ist {ADD | NN | NNP}-to-{JJ | NN | NOMP}
Derivation For -lHk For kitap-lık, saat-lik {ADD | NN | VN}-to-{JJ | NN | NOMP}
Derivation From -lH From Ankara'-lı {ADD | NN | NNP}-to-{JJ | NN | NOMP}
Derivation FutNom -YAcAk Future Nominalizer yak-acak (e.g. yakacağımız bitti) VB-to-{VN | NOMP}
Derivation FutPart -YAcAk Future Participle yak-acak (e.g. yakacak malzeme) VB-to-VJ
Derivation Ger -YArAk, -DAn Gerund koş-arak (e.g. koşarak geldim), koş+ma-dan (e.g. koşmadan geldim) VB-to-CRB
Derivation Haste -YHver Haste koş-uver VB-to-VB
Derivation Inf -mAk Infinitive koş-mak VB-to-{NOMP | VN}
Derivation Inh -YHcH Inherent del-ici VB-to-{NN | NOMP | VJ}
Derivation Inter -ara Inter kıtalar-ara+sı {ADD | NN}-to-{JJ | NOMP}
Derivation Lang -CA Language Alman-ca {ADD | NN | NNP}-to-{NN | NOMP}
Derivation Like -CA Like insan-ca {ADD | NN | VN}-to-NN, {ADD | NN | NNP | VN}-to-{JJ | NOMP}
Derivation Ly -CA, -CAsHnA Adverbial aptal-casına (e.g. aptalcasına davranmak) JJ-to-{JJ | NOMP | RB}, {ADD | NN | NNP}-to-NN
Derivation Make -lA Make işaret-le {ADD | NN | NNP | VN}-to-VB
Derivation Ness -lHk Ness insan-lık {ADD | NN | NNP | VN}-to-{NN | NOMP}
Derivation Nonf -mA, -YHş Nonfinite konuş-ma, bak-ış VB-to-{NOMP | VN}
Derivation Of -lArcA Of ton-larca {ADD | NN}-to-{JJ | NN | NOMP}
Derivation Pass -Hl, -Hn Passive yap-ıl+dı VB-to-VB
Derivation PastNom -DHk Past Nominalizer yap-tık+larım VB-to-{NOMP | VN}
Derivation PastPart -DHk Past Participle yap-tığ+ım (e.g. yaptığım şeyler) VB-to-VJ
Derivation PerNom -mHş Perfective Nominalizer gör-müş (e.g. görmüş geçirmiş) VB-to-{NOMP | VN}
Derivation PerPart -mHş Perfective Participle büyü-müş (e.g. büyümüş çocuk) VB-to-VJ
Derivation PresNom -YAn Present Nominalizer gel-en+ler VB-to-{NOMP | VN}
Derivation PresPart -YAn Present Participle kazan-an (e.g. kazanan yarışmacılar) VB-to-VJ
Derivation Pron -ki Pronominalizer evde-ki (e.g. evdekilerin yeri) {ADD | IN | NN | NNP | PRD | PRF | PRI | PRP | PRP$ | PRR | RB | VN | WP}-to-PRF
Derivation ProNom -YAsH Progressive Nominalizer acı-yası VB-to-{NOMP | VN}
Derivation ProPart -YAsH Progressive Participle gülün-esi (e.g. gülünesi şakalar) VB-to-VJ
Derivation Rcp -Hş Reciprocal gül-üş VB-to-VB
Derivation Rel -ki, -kH Relativizer okulda-ki (e.g. okuldaki öğrenciler) {ADD | IN | NN | NNP | PRD | PRI | PRP | PRR | VN | WP}-to-JJ, {ADD | IN | JJ | NN | PRD | PRI | PRP | PRP$ | PRR | RB | VN | WP}-to-NOMP
Derivation Rfx -Hn Reflexive yıka-n VB-to-VB
Derivation Rpt -YAdur Repetitive yürü-yedur VB-to-VB
Derivation Rtd -sAl Related bilim-sel {ADD | NN | NNP | VN}-to-{JJ | NN | NNP | NOMP | VN}
Derivation Sim -HmsHm, -sH, -sHl, -vari, -Hmtrak Similar sarı-msı, sarı-mtrak {ADD | NN | NNP | VN}-to-{JJ | NN | NNP | NOMP | VN}
Derivation Sincb -YAlH Since before yap-alı VB-to-CRB
Derivation Since -DHr Since zaman-dır NN-to-RB
Derivation Snd -lA, -dA Sound fokur-da DUP-to-VB
Derivation Start -YAkoy Start pişir-ekoy VB-to-VB
Derivation Stay -YAkal Stay uyu-yakal VB-to-VB
Derivation When -YHncA When uyu-yunca VB-to-{CRB | NOMP}
Derivation While -Yken, -ken While uyur-ken VB-to-{CRB | NOMP}, {ADD | NN | NNP}-to-RB
Derivation With -lH, -HlH With uyku-lu {ADD | CD | NN | NNP | VB | VN}-to-{NN | NOMP}, {ADD | CD | NN | NNP | VN}-to-JJ, {ADD | NN | NNP | VN}-to-RB, VB-to-VJ
Derivation Wout -sHz Without uyku-suz {ADD | NN | NNP | VN}-to-{JJ | NN | NOMP | RB}

Other tags (Semantic, Syntactic and Segmentation)

Feature Category Feature Value Meta-Morphemes Description Applies To Categories
Apostrophe True +' Apostrophe separating root and inflections ADD, CD, NN (only abbreviated and number roots), NNP, NOMP
ComplementType CAbl <eps> (Postposition has) ablative case marked complement IN
ComplementType CAcc <eps> (Postposition has) accusative case marked complement IN
ComplementType CBare <eps> (Postposition has) caseless complement IN
ComplementType CDat <eps> (Postposition has) dative case marked complement IN
ComplementType CFin <eps> (Postposition has) finite complement IN
ComplementType CGen <eps> (Postposition has) genitive case marked complement IN
ComplementType CIns <eps> (Postposition has) instrumental case marked complement IN
ComplementType CNum <eps> (Postposition has) numeric complement IN
ConjunctionType Adv <eps> Adverbial conjunction CC
ConjunctionType Coor <eps> Coordinating conjunction CC
ConjunctionType Par <eps> Parallel conjunction CC
ConjunctionType Sub <eps> Subordinating conjunction CC
DeterminerType Def <eps> Definite (determiner) DT
DeterminerType Dem <eps> Demonstrative (determiner) DT
DeterminerType Dir <eps> Directional (determiner) DT
DeterminerType Ind <eps> Indefinite (determiner) DT
Proper False <eps> Inflectional group is not a part of proper noun RB, NN
Proper True <eps> Inflectional group is a part of proper noun RB, NN
Temporal True <eps> Temporal ADD, CC, CD, CRB, DT, DUP, EP, EX, FW, GW, IN, JJ, LS, NFP, NN, NNP, NOMP, OP, PDT, PFX, PRD, PRF, PRI, PRP, PRP$, PRR, RB, RPC, RPNEG, RPQ, SYM, UH, VB, VJ, VN, WRB, WDT, WP