## Morphological analysis with GT (Giellatekno) categories

By default, EstNLTK uses Vabamorf's morphological analysis categories, which are described in the [Vabamorf tagset documentation](https://github.com/Filosoft/vabamorf/blob/master/doc/tagset.html) (in Estonian). Vabamorf's categories can be converted to GT (Giellatekno) categories, described in the [GT category list](http://www2.keeleveeb.ee/dict/corpus/shared/categories.html), which provide minor fixes for noun forms and more fine-grained categories for verb forms.

The whole conversion process is handled by `GTMorphConverter`:

In [1]:
from estnltk.text import Text
from estnltk.taggers import GTMorphConverter
gt_converter = GTMorphConverter()
gt_converter

name,output layer,output attributes,input layers
GTMorphConverter,gt_morph_analysis,"('normalized_text', 'lemma', 'root', 'root_tokens', 'ending', 'clitic', 'form', 'partofspeech')","('words', 'sentences', 'morph_analysis', 'clauses')"

0,1
disambiguate_neg,True
disambiguate_sid_ksid,True


After a text has been morphologically analysed and clause-tagged, we can use `GTMorphConverter` to convert its morphological categories to the GT format. The conversion adds a new layer called `'gt_morph_analysis'`:

In [2]:
text = Text('Rändur võttis istet.')
text.tag_layer(['morph_analysis', 'clauses'])
gt_converter.tag( text )
text

text
Rändur võttis istet.

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,1
tokens,,,,False,4
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,True,4
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,4
clauses,clause_type,,words,False,1
gt_morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",morph_analysis,,True,4


In [3]:
text['gt_morph_analysis']

layer name,attributes,parent,enveloping,ambiguous,span count
gt_morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",morph_analysis,,True,4

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Rändur,Rändur,rändur,rändur,"('rändur',)",0,,Sg Nom,S
võttis,võttis,võtma,võt,"('võt',)",is,,Pers Prt Ind Sg 3 Aff,V
istet,istet,iste,iste,"('iste',)",t,,Sg Par,S
.,.,.,.,"('.',)",,,,Z


The attributes `'normalized_text', 'lemma', 'root', 'root_tokens', 'ending', 'clitic', 'partofspeech'` are exactly the same in `'gt_morph_analysis'` and `'morph_analysis'`. Only the categories listed under `'form'` differ.
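
Because the GT `'form'` value is printed as a space-separated string of category labels, it can be split into individual tags for filtering. The following is a minimal sketch in plain Python (it does not call EstNLTK); the rows are copied from the output table above, and `split_form` and `words_with_tag` are hypothetical helper names introduced only for this illustration.

```python
# (text, form, partofspeech) values copied from the table above.
rows = [
    ("Rändur", "Sg Nom", "S"),
    ("võttis", "Pers Prt Ind Sg 3 Aff", "V"),
    ("istet", "Sg Par", "S"),
    (".", "", "Z"),
]


def split_form(form):
    """Split a GT form string into its individual category labels."""
    return form.split()


def words_with_tag(rows, tag):
    """Return the texts of all rows whose form contains the given label."""
    return [text for text, form, _pos in rows if tag in split_form(form)]


prt_words = words_with_tag(rows, "Prt")  # words carrying the 'Prt' label
sg_words = words_with_tag(rows, "Sg")    # words carrying the 'Sg' label
print(prt_words)
print(sg_words)
```

Matching on whole labels rather than on substrings of the form string avoids accidental hits when one label happens to be contained in another.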

#### Disambiguation

The conversion to `'gt_morph_analysis'` can introduce some additional analyses, because some of Vabamorf's verb categories have more than one corresponding GT category. By default, `GTMorphConverter` also resolves these ambiguities automatically, based on contextual clues.

The first type of ambiguity that is resolved is the ambiguity between the negative and the imperative form of a verb. This step can be switched off with the flag `disambiguate_neg`:

In [4]:
from estnltk.taggers import GTMorphConverter
gt_converter = GTMorphConverter(disambiguate_neg=False) # Switch off disambiguation of negative verbs

text = Text('Sa ei peatu ega vaata ringi. Palun peatu!')
text.tag_layer(['morph_analysis', 'clauses'])
gt_converter.tag( text )
text['gt_morph_analysis']

layer name,attributes,parent,enveloping,ambiguous,span count
gt_morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",morph_analysis,,True,10

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Sa,Sa,sina,sina,"('sina',)",0,,Sg Nom,P
ei,ei,ei,ei,"('ei',)",0,,Neg,V
peatu,peatu,peatuma,peatu,"('peatu',)",0,,Pers Prs Imprt Sg 2,V
,peatu,peatuma,peatu,"('peatu',)",0,,Pers Prs Ind Neg,V
ega,ega,ega,ega,"('ega',)",0,,,J
vaata,vaata,vaatama,vaata,"('vaata',)",0,,Pers Prs Imprt Sg 2,V
,vaata,vaatama,vaata,"('vaata',)",0,,Pers Prs Ind Neg,V
ringi,ringi,ringi,ringi,"('ringi',)",0,,,D
.,.,.,.,"('.',)",,,,Z
Palun,Palun,paluma,palu,"('palu',)",n,,Pers Prs Ind Sg 1 Aff,V


In [5]:
from estnltk.taggers import GTMorphConverter
gt_converter = GTMorphConverter(disambiguate_neg=True) # Switch on disambiguation of negative verbs (default)

text = Text('Sa ei peatu ega vaata ringi. Palun peatu!')
text.tag_layer(['morph_analysis', 'clauses'])
gt_converter.tag( text )
text['gt_morph_analysis']

layer name,attributes,parent,enveloping,ambiguous,span count
gt_morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",morph_analysis,,True,10

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Sa,Sa,sina,sina,"('sina',)",0,,Sg Nom,P
ei,ei,ei,ei,"('ei',)",0,,Neg,V
peatu,peatu,peatuma,peatu,"('peatu',)",0,,Pers Prs Ind Neg,V
ega,ega,ega,ega,"('ega',)",0,,,J
vaata,vaata,vaatama,vaata,"('vaata',)",0,,Pers Prs Ind Neg,V
ringi,ringi,ringi,ringi,"('ringi',)",0,,,D
.,.,.,.,"('.',)",,,,Z
Palun,Palun,paluma,palu,"('palu',)",n,,Pers Prs Ind Sg 1 Aff,V
peatu,peatu,peatuma,peatu,"('peatu',)",0,,Pers Prs Imprt Sg 2,V
!,!,!,!,"('!',)",,,,Z
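
The difference between the two outputs above can be imitated with a toy rule. This is only an illustration of the idea of using context, not EstNLTK's actual algorithm, which is not described here. The rule itself is an assumption: keep the negative reading if `ei` or `ega` occurred earlier in the same sentence, otherwise keep the imperative reading. The first sentence's readings are copied from the `disambiguate_neg=False` output. The two readings of the second `peatu` are not visible in that output, so they are assumed to match those of the first `peatu`.

```python
NEG = "Pers Prs Ind Neg"
IMPRT = "Pers Prs Imprt Sg 2"
BOTH = [IMPRT, NEG]  # the two competing readings

# Each sentence is a list of (word, [form, ...]) pairs.
sentences = [
    [("Sa", ["Sg Nom"]), ("ei", ["Neg"]), ("peatu", BOTH), ("ega", [""]),
     ("vaata", BOTH), ("ringi", [""]), (".", [""])],
    # Assumption: the second 'peatu' carries the same two readings.
    [("Palun", ["Pers Prs Ind Sg 1 Aff"]), ("peatu", BOTH), ("!", [""])],
]


def resolve_neg_imprt(sentence):
    """Toy rule (hypothetical): pick NEG after a negator, else IMPRT."""
    resolved = []
    seen_negator = False
    for word, forms in sentence:
        if NEG in forms and IMPRT in forms:
            forms = [NEG] if seen_negator else [IMPRT]
        resolved.append((word, list(forms)))
        if word.lower() in ("ei", "ega"):
            seen_negator = True
    return resolved


resolved = [resolve_neg_imprt(sentence) for sentence in sentences]
for sentence in resolved:
    for word, forms in sentence:
        print(word, forms)
```

On this input the toy rule leaves `peatu` and `vaata` in the first sentence with the negative reading and `peatu` in the second sentence with the imperative reading, which matches the `disambiguate_neg=True` output shown above.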


The second type of ambiguity that is automatically resolved is the ambiguity between the verb forms `'Pers Prt Ind Pl 3 Aff'` and `'Pers Prt Ind Sg 2 Aff'` (flag `disambiguate_sid_ksid`):

In [6]:
from estnltk.taggers import GTMorphConverter
gt_converter = GTMorphConverter(disambiguate_sid_ksid=True) # Switch on disambiguation of -sid, -ksid, -nuksid verbs (default)

text = Text('Sa läksid ära, aga nemad tulid tagasi.')
text.tag_layer(['morph_analysis', 'clauses'])
gt_converter.tag( text )
text['gt_morph_analysis']

layer name,attributes,parent,enveloping,ambiguous,span count
gt_morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",morph_analysis,,True,9

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Sa,Sa,sina,sina,"('sina',)",0,,Sg Nom,P
läksid,läksid,minema,mine,"('mine',)",sid,,Pers Prt Ind Sg 2 Aff,V
ära,ära,ära,ära,"('ära',)",0,,,D
",",",",",",",","(',',)",,,,Z
aga,aga,aga,aga,"('aga',)",0,,,J
nemad,nemad,tema,tema,"('tema',)",d,,Pl Nom,P
tulid,tulid,tulema,tule,"('tule',)",id,,Pers Prt Ind Pl 3 Aff,V
tagasi,tagasi,tagasi,tagasi,"('tagasi',)",0,,,D
.,.,.,.,"('.',)",,,,Z


Note: if the flag `disambiguate_sid_ksid` is switched off, the input layer `'clauses'` is no longer required, as it is only needed for resolving the _sid/ksid_ type of ambiguity.
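
To see why clause boundaries help with the _sid/ksid_ ambiguity, consider the toy sketch below. It is not EstNLTK's implementation. The rule is an assumption made for illustration: within one clause, a `sina` pronoun in `Sg Nom` selects the `Sg 2` reading, and a `tema` pronoun in `Pl Nom` selects the `Pl 3` reading. The clause split at the comma is also assumed. Lemmas and pronoun forms are copied from the output table above.

```python
SG2 = "Pers Prt Ind Sg 2 Aff"
PL3 = "Pers Prt Ind Pl 3 Aff"

# Each clause is a list of (word, lemma, [form, ...]) records.
# Assumption: the sentence splits into two clauses at the comma.
clauses = [
    [("Sa", "sina", ["Sg Nom"]), ("läksid", "minema", [SG2, PL3]),
     ("ära", "ära", [""]), (",", ",", [""])],
    [("aga", "aga", [""]), ("nemad", "tema", ["Pl Nom"]),
     ("tulid", "tulema", [SG2, PL3]), ("tagasi", "tagasi", [""]),
     (".", ".", [""])],
]


def has_pronoun(clause, lemma, form):
    """True if the clause contains the given pronoun lemma in the given form."""
    return any(l == lemma and form in forms for _w, l, forms in clause)


def resolve_sid(clause):
    """Toy rule (hypothetical): choose the reading that agrees with the pronoun."""
    sg2 = has_pronoun(clause, "sina", "Sg Nom")
    pl3 = has_pronoun(clause, "tema", "Pl Nom")
    resolved = []
    for word, lemma, forms in clause:
        if SG2 in forms and PL3 in forms:
            if sg2 and not pl3:
                forms = [SG2]
            elif pl3 and not sg2:
                forms = [PL3]
            # Otherwise the clue is missing or conflicting: stay ambiguous.
        resolved.append((word, lemma, list(forms)))
    return resolved


per_clause = [resolve_sid(clause) for clause in clauses]
# Without clause boundaries both pronouns compete and nothing is resolved.
whole_sentence = resolve_sid(clauses[0] + clauses[1])
```

Applied clause by clause, the rule picks `Sg 2` for `läksid` and `Pl 3` for `tulid`. Applied to the whole sentence at once, both pronouns are in scope and both verbs stay ambiguous, which is the kind of situation that clause information helps to avoid.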