# <span style="color:purple">Syntax preprocessing</span>

This tutorial describes the tools needed to convert morphologically analysed Estonian text into a format to which syntactic parsing can be applied.
Note that EstNLTK supports multiple syntactic parsers, and each parser typically has multiple models, so preprocessing setups vary and no single setup is used in all cases.
Here we introduce the commonly used preprocessing steps; the parsers' tutorials give details about which preprocessing each parser needs.

## MorphExtendedTagger and 'morph_extended' layer

Layer 'morph_extended' is an extended version of Vabamorf's 'morph_analysis' layer, which has been augmented with additional morpho-syntactic information, such as pronoun type, verb form and verb subcategorization information. 
It is used as an input by Visl CG3-based syntactic analysis, and also by MaltParser's, UDPipe's, and stanza's models.

You can create the 'morph_extended' layer directly via the default resolver:

In [1]:
from estnltk import Text

text = Text('laulma hüplev tantsija').tag_layer('morph_extended')

text['morph_extended']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, punctuation_type, pronoun_type, letter_case, fin, verb_extension_suffix, subcat",morph_analysis,,True,3

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech,punctuation_type,pronoun_type,letter_case,fin,verb_extension_suffix,subcat
laulma,laulma,laulma,laul,['laul'],ma,,mod sup ps ill,V,,,,False,[],"['NGP-P', 'El', 'All']"
,laulma,laulma,laul,['laul'],ma,,aux sup ps ill,V,,,,False,[],"['NGP-P', 'El', 'All']"
,laulma,laulma,laul,['laul'],ma,,main sup ps ill,V,,,,False,[],"['NGP-P', 'El', 'All']"
hüplev,hüplev,hüplev,hüplev,['hüplev'],0,,pos sg nom,A,,,,,[],
tantsija,tantsija,tantsija,tantsija,['tantsija'],0,,com sg nom,S,,,,,[],


Note that the 'morph_extended' layer is _ambiguous_ as the conversion process does not involve disambiguation.

The 'morph_extended' layer is created by **MorphExtendedTagger**. 
Taggers and retaggers that are used in the process are described here:

|tagger or retagger|source attributes|target attributes|output values|
|------------------|-----------------|-----------------|------|
|MorphToSyntaxMorphRetagger |_Vabamorf's attributes_ | _Vabamorf's attributes_ | partofspeech & form tags from here: https://www.cl.ut.ee/korpused/morfliides/seletus/ |
|PunctuationTypeRetagger | partofspeech, root|punctuation_type| ```None```, 'Fst', 'Com', 'Col', 'Ell', 'Els', 'Scl', 'Int', 'Exc', 'Dsd', 'Dsh', 'Opr', 'Cpr', 'Quo', 'Oqu', 'Cqu', 'Grt', 'Sml', 'Osq', 'Csq', 'Sla', 'crd' |
|PronounTypeRetagger|root, ending, clitic, partofspeech|pronoun_type| ```None```, ('det',), ('pers ps3',), ('pos', 'det', 'refl'), ... [(based on this lexicon)](https://github.com/estnltk/estnltk/blob/main/estnltk/estnltk/taggers/standard/syntax/preprocessing/rules_files/pronouns.csv) |
|LetterCaseRetagger|word_text|letter_case|```None```, 'cap'|
|RemoveAdpositionAnalysesRetagger|partofspeech, form|&nbsp;|&nbsp;|
|FiniteFormRetagger|partofspeech, form|fin|```None```, ```True```, ```False```|
|VerbExtensionSuffixRetagger|root|verb_extension_suffix|```None```,'tud','nud','mine','nu','tu','du','v','tav','dav','mata','ja'|
|SubcatRetagger|root, partofspeech, form|subcat|```None```, 'Intr', 'Part', 'gen', ...[(based on this lexicon)](https://github.com/estnltk/estnltk/blob/main/estnltk/estnltk/taggers/standard/syntax/preprocessing/rules_files/abileksikon06utf.lx) |

In the following, we'll exemplify the usage of these taggers in detail.

### MorphToSyntaxMorphRetagger
MorphToSyntaxMorphRetagger is the first preprocessing step, which converts morphological categories from Vabamorf's format to the CG format described here: https://www.cl.ut.ee/korpused/morfliides/seletus/ .
(For a comparison of different Estonian morpho-syntactic category systems, please see the document https://cl.ut.ee/ressursid/morfo-systeemid/ .)

MorphToSyntaxMorphRetagger can be used both as a tagger (to create a new layer), and as a retagger (to change an existing layer).

Usage example (creating a new layer):

In [2]:
from estnltk.taggers.standard.syntax.preprocessing.morph_to_syntax_morph_retagger import MorphToSyntaxMorphRetagger
base_morph_converter = MorphToSyntaxMorphRetagger()

text = Text('Kumb, sina või mina?').tag_layer('morph_analysis')
base_morph_converter.tag(text)
text['morph_extended']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",morph_analysis,,True,6

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Kumb,Kumb,kumb,kumb,['kumb'],0.0,,sg nom,P
",",",",",",",","[',']",,,,Z
sina,sina,sina,sina,['sina'],0.0,,sg nom,P
või,või,või,või,['või'],0.0,,sub crd,J
mina,mina,mina,mina,['mina'],0.0,,com sg nom,S
?,?,?,?,['?'],,,,Z


### PunctuationTypeRetagger

PunctuationTypeRetagger adds information about punctuation types:

In [3]:
from estnltk.taggers.standard.syntax.preprocessing.punctuation_type_retagger import PunctuationTypeRetagger

retagger = PunctuationTypeRetagger()

text = Text('Kumb, sina või mina?').tag_layer('morph_analysis')
base_morph_converter.tag(text)

retagger.retag(text)
text['morph_extended']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, punctuation_type",morph_analysis,,True,6

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech,punctuation_type
Kumb,Kumb,kumb,kumb,['kumb'],0.0,,sg nom,P,
",",",",",",",","[',']",,,,Z,Com
sina,sina,sina,sina,['sina'],0.0,,sg nom,P,
või,või,või,või,['või'],0.0,,sub crd,J,
mina,mina,mina,mina,['mina'],0.0,,com sg nom,S,
?,?,?,?,['?'],,,,Z,Int
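
As a rough illustration of what this step does, the sketch below maps punctuation symbols to type codes. It includes only the four symbol-code pairs that are visible in this tutorial's outputs (`,` as `Com`, `?` as `Int`, `!` as `Exc`, `.` as `Fst`). The helper `punctuation_type` and the dictionary `PUNCTUATION_TYPES` are hypothetical names and not part of EstNLTK; the actual retagger uses its own, more complete rules.

```python
# Hypothetical, simplified lookup; NOT EstNLTK's actual implementation.
# Only the pairs observable in this tutorial's outputs are listed.
PUNCTUATION_TYPES = {',': 'Com', '?': 'Int', '!': 'Exc', '.': 'Fst'}

def punctuation_type(root, partofspeech):
    """Return a punctuation type code for punctuation ('Z') analyses, else None."""
    if partofspeech != 'Z':
        return None
    # Unknown symbols fall back to None in this sketch
    return PUNCTUATION_TYPES.get(root)

for root, pos in [('kumb', 'P'), (',', 'Z'), ('?', 'Z')]:
    print(repr(root), punctuation_type(root, pos))
```

Non-punctuation analyses get `None`, mirroring the empty `punctuation_type` cells in the table above.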


### PronounTypeRetagger

PronounTypeRetagger adds information about pronoun types, based on [this lexicon](https://github.com/estnltk/estnltk/blob/main/estnltk/estnltk/taggers/standard/syntax/preprocessing/rules_files/pronouns.csv).  Usage example:

In [4]:
from estnltk.taggers import PronounTypeRetagger

retagger = PronounTypeRetagger()

text = Text('Kumb, sina või mina?').tag_layer('morph_analysis')
base_morph_converter.tag(text)

retagger.retag(text)
text['morph_extended']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, pronoun_type",morph_analysis,,True,6

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech,pronoun_type
Kumb,Kumb,kumb,kumb,['kumb'],0.0,,sg nom,P,"('rel',)"
",",",",",",",","[',']",,,,Z,
sina,sina,sina,sina,['sina'],0.0,,sg nom,P,"('ps2',)"
või,või,või,või,['või'],0.0,,sub crd,J,
mina,mina,mina,mina,['mina'],0.0,,com sg nom,S,
?,?,?,?,['?'],,,,Z,


### LetterCaseRetagger

LetterCaseRetagger simply marks words starting with a capital letter:

In [5]:
from estnltk.taggers.standard.syntax.preprocessing.letter_case_retagger import LetterCaseRetagger

retagger = LetterCaseRetagger()

text = Text('Kumb, Sina või mina?').tag_layer('morph_analysis')
base_morph_converter.tag(text)

retagger.retag(text)
text['morph_extended']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, letter_case",morph_analysis,,True,6

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech,letter_case
Kumb,Kumb,kumb,kumb,['kumb'],0.0,,sg nom,P,cap
",",",",",",",","[',']",,,,Z,
Sina,Sina,sina,sina,['sina'],0.0,,sg nom,P,cap
või,või,või,või,['või'],0.0,,sub crd,J,
mina,mina,mina,mina,['mina'],0.0,,com sg nom,S,
?,?,?,?,['?'],,,,Z,
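
The marking logic can be approximated in a few lines. This is a minimal sketch, assuming that only the first character of the word text matters; the function name `letter_case` is hypothetical and the real retagger may handle further cases.

```python
def letter_case(word_text):
    """Return 'cap' if the word starts with an uppercase letter, else None.

    Simplified illustration; not EstNLTK's implementation.
    """
    if word_text and word_text[0].isupper():
        return 'cap'
    return None

for word in ['Kumb', ',', 'Sina', 'või', 'mina', '?']:
    print(word, letter_case(word))
```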


### RemoveAdpositionAnalysesRetagger

RemoveAdpositionAnalysesRetagger removes duplicate adposition analyses (a lexicon-specific preprocessing step):

In [6]:
# Convert morph annotations
text = Text('suve eel, talve järel').tag_layer('morph_analysis')
base_morph_converter.tag(text)
# Observe duplicate adposition (K) analyses
text['morph_extended']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",morph_analysis,,True,5

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
suve,suve,suvi,suvi,['suvi'],0.0,,com sg gen,S
eel,eel,eel,eel,['eel'],0.0,,post,K
,eel,eel,eel,['eel'],0.0,,pre,K
",",",",",",",","[',']",,,,Z
talve,talve,talv,talv,['talv'],0.0,,com sg gen,S
,talve,tali,tali,['tali'],0.0,,com sg gen,S
järel,järel,järel,järel,['järel'],0.0,,post,K
,järel,järel,järel,['järel'],0.0,,pre,K


In [7]:
from estnltk.taggers.standard.syntax.preprocessing.remove_adposition_analyses_retagger import RemoveAdpositionAnalysesRetagger
retagger = RemoveAdpositionAnalysesRetagger(allow_to_delete_all=False)
# Remove adposition duplicates with the retagger
retagger.retag(text)
text['morph_extended']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",morph_analysis,,True,5

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
suve,suve,suvi,suvi,['suvi'],0.0,,com sg gen,S
eel,eel,eel,eel,['eel'],0.0,,post,K
",",",",",",",","[',']",,,,Z
talve,talve,talv,talv,['talv'],0.0,,com sg gen,S
,talve,tali,tali,['tali'],0.0,,com sg gen,S
järel,järel,järel,järel,['järel'],0.0,,post,K


The flag `allow_to_delete_all` controls whether RemoveAdpositionAnalysesRetagger is allowed to remove all analyses of a word.
While the original implementation of syntactic preprocessing allowed this, EstNLTK does not support words/spans without analyses, so this option should normally be kept switched off. Switch it on at your own risk.
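
The effect seen in the example above can be imitated with a small filter. The rule used here (drop `pre` adposition analyses when a `post` analysis of the same word exists) is an assumption inferred only from that example output; the real retagger is lexicon-specific and may apply different conditions. The function name `remove_pre_if_post` is hypothetical.

```python
def remove_pre_if_post(analyses):
    """Drop 'pre' adposition (K) analyses when a 'post' analysis is also present.

    `analyses` is a list of dicts with 'partofspeech' and 'form' keys.
    Assumed, simplified rule; not EstNLTK's implementation.
    """
    k_forms = {a['form'] for a in analyses if a['partofspeech'] == 'K'}
    if not {'pre', 'post'} <= k_forms:
        # Nothing to remove unless both variants are present
        return list(analyses)
    return [a for a in analyses
            if not (a['partofspeech'] == 'K' and a['form'] == 'pre')]

eel = [{'partofspeech': 'K', 'form': 'post'},
       {'partofspeech': 'K', 'form': 'pre'}]
print(remove_pre_if_post(eel))
```

Because a `post` analysis always survives under this simplified rule, the sketch can never empty a word's analyses, so it does not need to model `allow_to_delete_all`.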

### FiniteFormRetagger
FiniteFormRetagger marks whether a verb form is finite or non-finite. Non-verb analyses are marked with _None_.

Usage example:

In [8]:
from estnltk.taggers.standard.syntax.preprocessing.finite_form_retagger import FiniteFormRetagger

retagger = FiniteFormRetagger()

text = Text('laulma hüpelnud tantsija').tag_layer('morph_analysis')
base_morph_converter.tag(text)

retagger.retag(text)
text['morph_extended']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, fin",morph_analysis,,True,3

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech,fin
laulma,laulma,laulma,laul,['laul'],ma,,mod sup ps ill,V,False
,laulma,laulma,laul,['laul'],ma,,aux sup ps ill,V,False
,laulma,laulma,laul,['laul'],ma,,main sup ps ill,V,False
hüpelnud,hüpelnud,hüplema,hüple,['hüple'],nud,,mod indic impf ps neg,V,True
,hüpelnud,hüplema,hüple,['hüple'],nud,,mod partic past ps,V,False
,hüpelnud,hüplema,hüple,['hüple'],nud,,aux indic impf ps neg,V,True
,hüpelnud,hüplema,hüple,['hüple'],nud,,aux partic past ps,V,False
,hüpelnud,hüplema,hüple,['hüple'],nud,,main indic impf ps neg,V,True
,hüpelnud,hüplema,hüple,['hüple'],nud,,main partic past ps,V,False
,hüpelnud,hüpelnud,hüpel=nud,['hüpelnud'],0,,pos,A,
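
The three-valued `fin` attribute can be pictured with the following sketch. It is deliberately narrow: it treats a verb analysis as finite only if its form contains the `indic` tag, because that is the only finite mood appearing in the example above. Other finite moods would have to be added to the (hypothetical) `FINITE_MOOD_TAGS` set, and the real retagger's rules may differ.

```python
# Assumption: only the mood observed in the example output is listed here.
FINITE_MOOD_TAGS = {'indic'}

def finite_form(partofspeech, form):
    """Return True/False for verb analyses and None for non-verbs.

    Simplified illustration; not EstNLTK's implementation.
    """
    if partofspeech != 'V':
        return None
    return any(tag in FINITE_MOOD_TAGS for tag in form.split())

for pos, form in [('V', 'mod sup ps ill'),
                  ('V', 'main indic impf ps neg'),
                  ('V', 'main partic past ps'),
                  ('A', 'pos')]:
    print(pos, form, finite_form(pos, form))
```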


### VerbExtensionSuffixRetagger
VerbExtensionSuffixRetagger annotates suffixes of (non-finite) verb forms. Example:

In [9]:
from estnltk.taggers import VerbExtensionSuffixRetagger

retagger = VerbExtensionSuffixRetagger('morph_extended')

text = Text('Laulev hüpelnud tantsija').tag_layer('morph_analysis')
base_morph_converter.tag(text)

retagger.retag(text)
text['morph_extended']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, verb_extension_suffix",morph_analysis,,True,3

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech,verb_extension_suffix
Laulev,Laulev,laulev,laulev,['laulev'],0,,pos sg nom,A,()
hüpelnud,hüpelnud,hüplema,hüple,['hüple'],nud,,mod indic impf ps neg,V,()
,hüpelnud,hüplema,hüple,['hüple'],nud,,mod partic past ps,V,()
,hüpelnud,hüplema,hüple,['hüple'],nud,,aux indic impf ps neg,V,()
,hüpelnud,hüplema,hüple,['hüple'],nud,,aux partic past ps,V,()
,hüpelnud,hüplema,hüple,['hüple'],nud,,main indic impf ps neg,V,()
,hüpelnud,hüplema,hüple,['hüple'],nud,,main partic past ps,V,()
,hüpelnud,hüpelnud,hüpel=nud,['hüpelnud'],0,,pos,A,"('nud',)"
,hüpelnud,hüpelnud,hüpel=nud,['hüpelnud'],0,,pos sg nom,A,"('nud',)"
,hüpelnud,hüpelnud,hüpel=nud,['hüpelnud'],d,,pos pl nom,A,"('nud',)"
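
In the output above, a suffix is reported only for analyses whose root contains a `=` boundary (`hüpel=nud`). The sketch below reproduces that behaviour under the assumption that the suffix is the part of the root after the last `=` and must be one of the values listed in the table of retaggers. The names `KNOWN_SUFFIXES` and `verb_extension_suffix` are hypothetical.

```python
# Suffix values as listed for VerbExtensionSuffixRetagger in the table above
KNOWN_SUFFIXES = ('tud', 'nud', 'mine', 'nu', 'tu', 'du',
                  'v', 'tav', 'dav', 'mata', 'ja')

def verb_extension_suffix(root):
    """Return a 1-tuple with the suffix after the last '=' in root, or ().

    Assumed, simplified rule; not EstNLTK's implementation.
    """
    if '=' not in root:
        return ()
    candidate = root.rsplit('=', 1)[1]
    return (candidate,) if candidate in KNOWN_SUFFIXES else ()

for root in ['laulev', 'hüple', 'hüpel=nud']:
    print(root, verb_extension_suffix(root))
```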


### SubcatRetagger

SubcatRetagger marks verb and adposition subcategorization information, based on [this lexicon / rule file](https://github.com/estnltk/estnltk/blob/main/estnltk/estnltk/taggers/standard/syntax/preprocessing/rules_files/abileksikon06utf.lx).  Usage example:

In [10]:
from estnltk.taggers import SubcatRetagger

retagger = SubcatRetagger()

text = Text('Järel juurduma').tag_layer('morph_analysis')
base_morph_converter.tag(text)

retagger.retag(text)
text['morph_extended']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, subcat",morph_analysis,,True,2

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech,subcat
Järel,Järel,järel,järel,['järel'],0,,post,K,"('gen',)"
,Järel,järel,järel,['järel'],0,,pre,K,
juurduma,juurduma,juurduma,juurdu,['juurdu'],ma,,mod sup ps ill,V,"('Intr',)"
,juurduma,juurduma,juurdu,['juurdu'],ma,,aux sup ps ill,V,"('Intr',)"
,juurduma,juurduma,juurdu,['juurdu'],ma,,main sup ps ill,V,"('Intr',)"


### MorphExtendedTagger

MorphExtendedTagger glues together all the aforementioned taggers and retaggers into a single preprocessing tagger, which produces the 'morph_extended' layer.

In [11]:
from estnltk.taggers import MorphExtendedTagger

tagger = MorphExtendedTagger()
tagger

name,output layer,output attributes,input layers
MorphExtendedTagger,morph_extended,"('normalized_text', 'lemma', 'root', 'root_tokens', 'ending', 'clitic', 'form', 'partofspeech', 'punctuation_type', 'pronoun_type', 'letter_case', 'fin', 'verb_extension_suffix', 'subcat')","('morph_analysis',)"

0,1
punctuation_type_retagger,"PunctuationTypeRetagger(('morph_extended',)->morph_extended)"
morph_to_syntax_morph_retagger,"MorphToSyntaxMorphRetagger(('morph_analysis',)->morph_extended)"
pronoun_type_retagger,"PronounTypeRetagger(('morph_extended',)->morph_extended)"
letter_case_retagger,"LetterCaseRetagger(('morph_extended',)->morph_extended)"
remove_adposition_analyses_retagger,"RemoveAdpositionAnalysesRetagger(('morph_extended',)->morph_extended)"
finite_form_retagger,"FiniteFormRetagger(('morph_extended',)->morph_extended)"
verb_extension_suffix_retagger,"VerbExtensionSuffixRetagger(('morph_extended',)->morph_extended)"
subcat_retagger,"SubcatRetagger(('morph_extended',)->morph_extended)"


In [12]:
text = Text('Ta on rääkinud!').tag_layer()
tagger.tag(text)
text['morph_extended']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, punctuation_type, pronoun_type, letter_case, fin, verb_extension_suffix, subcat",morph_analysis,,True,4

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech,punctuation_type,pronoun_type,letter_case,fin,verb_extension_suffix,subcat
Ta,Ta,tema,tema,['tema'],0,,sg nom,P,,['ps3'],cap,,[],
on,on,olema,ole,['ole'],0,,mod indic pres ps3 sg ps af,V,,,,True,[],['Intr']
,on,olema,ole,['ole'],0,,aux indic pres ps3 sg ps af,V,,,,True,[],['Intr']
,on,olema,ole,['ole'],0,,main indic pres ps3 sg ps af,V,,,,True,[],['Intr']
,on,olema,ole,['ole'],0,,mod indic pres ps3 pl ps af,V,,,,True,[],['Intr']
,on,olema,ole,['ole'],0,,aux indic pres ps3 pl ps af,V,,,,True,[],['Intr']
,on,olema,ole,['ole'],0,,main indic pres ps3 pl ps af,V,,,,True,[],['Intr']
rääkinud,rääkinud,rääkima,rääki,['rääki'],nud,,mod indic impf ps neg,V,,,,True,[],"['Part-P', 'El']"
,rääkinud,rääkima,rääki,['rääki'],nud,,mod partic past ps,V,,,,False,[],"['Part-P', 'El']"
,rääkinud,rääkima,rääki,['rääki'],nud,,aux indic impf ps neg,V,,,,True,[],"['Part-P', 'El']"


## CG3 exporter

CG3 exporter can be used to export Text with 'morph_extended' layer into the [Visl CG format](https://visl.sdu.dk/cg3/single/#stream-vislcg). Example:

In [13]:
from estnltk.converters import export_CG3
text = Text('Lähme! Ta on rääkinud.')
text.tag_layer(['sentences','morph_extended'])
export_CG3(text)

['"<s>"',
 '"<Lähme>"',
 '    "mine" Lme V mod indic pres ps1 pl ps af cap <FinV>',
 '    "mine" Lme V aux indic pres ps1 pl ps af cap <FinV>',
 '    "mine" Lme V main indic pres ps1 pl ps af cap <FinV>',
 '"<!>"',
 '    "!" Z Exc',
 '"</s>"',
 '"<s>"',
 '"<Ta>"',
 '    "tema" L0 P pers ps3 sg nom cap',
 '"<on>"',
 '    "ole" L0 V mod indic pres ps3 sg ps af <FinV> <Intr>',
 '    "ole" L0 V aux indic pres ps3 sg ps af <FinV> <Intr>',
 '    "ole" L0 V main indic pres ps3 sg ps af <FinV> <Intr>',
 '    "ole" L0 V mod indic pres ps3 pl ps af <FinV> <Intr>',
 '    "ole" L0 V aux indic pres ps3 pl ps af <FinV> <Intr>',
 '    "ole" L0 V main indic pres ps3 pl ps af <FinV> <Intr>',
 '"<rääkinud>"',
 '    "rääki" Lnud V mod indic impf ps neg <FinV> <Part-P> <El>',
 '    "rääki" Lnud V mod partic past ps <Part-P> <El>',
 '    "rääki" Lnud V aux indic impf ps neg <FinV> <Part-P> <El>',
 '    "rääki" Lnud V aux partic past ps <Part-P> <El>',
 '    "rääki" Lnud V main indic impf ps neg <FinV> <Par

This is useful when we want to test the Visl CG3 parser without EstNLTK's interface.
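
To make the line layout of the exported readings easier to follow, here is a small formatter that rebuilds reading lines like the ones shown above from 'morph_extended'-style values. The field order is inferred from the example output only; the function `format_cg3_reading` is hypothetical, it is not how `export_CG3` is implemented, and it leaves out `verb_extension_suffix`, whose placement is not visible in the example.

```python
def format_cg3_reading(root, ending, partofspeech, form='', pronoun_type='',
                       punctuation_type='', letter_case='', fin=False,
                       subcat=()):
    """Build one indented reading line in the style of the export above.

    Sketch based on the example output; not EstNLTK's implementation.
    """
    parts = ['"{}"'.format(root)]
    if ending:
        parts.append('L' + ending)      # ending is prefixed with 'L'
    parts.append(partofspeech)
    for value in (pronoun_type, form, punctuation_type, letter_case):
        if value:
            parts.append(value)
    if fin:
        parts.append('<FinV>')          # finite verb marker
    parts.extend('<{}>'.format(tag) for tag in subcat)
    return '    ' + ' '.join(parts)

print(format_cg3_reading('ole', '0', 'V', form='main indic pres ps3 sg ps af',
                         fin=True, subcat=('Intr',)))
print(format_cg3_reading('!', '', 'Z', punctuation_type='Exc'))
```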

## ConllMorphTagger and 'conll_morph' layer

At its simplest, the 'conll_morph' layer is a representation of morpho-syntactic information using fields of the [CoNLL format](https://universaldependencies.org/format.html).
In this way, both 'morph_analysis' and 'morph_extended' layers can be converted to a 'conll_morph' layer (which has fields _id, form, lemma, upostag, xpostag, feats, head, deprel, deps, misc_), while keeping their original morphological categories.
In a more complex preprocessing setup, the 'morph_extended' layer can be processed with the Visl CG3-based syntactic parser, which produces a variant of the 'conll_morph' layer with extended categories.

The 'conll_morph' layer is used as an input by MaltParser's and UDPipe's models.

In the following, we'll exemplify the usages of ConllMorphTagger.

### ConllMorphTagger on 'morph_analysis' layer

ConllMorphTagger can be used to convert Vabamorf's 'morph_analysis' attributes to CoNLL-U attributes (without changing the categories):

In [14]:
from estnltk import Text
from estnltk.taggers import ConllMorphTagger

text = Text("Ta on rääkinud.").tag_layer('morph_analysis')

conll_morph = ConllMorphTagger( morph_extended_layer='morph_analysis',
                                no_visl=True )
conll_morph.tag(text)
text['conll_morph']

layer name,attributes,parent,enveloping,ambiguous,span count
conll_morph,"id, form, lemma, upostag, xpostag, feats, head, deprel, deps, misc",morph_analysis,,True,4

text,id,form,lemma,upostag,xpostag,feats,head,deprel,deps,misc
Ta,1,Ta,tema,P,P,sg|n,_,_,_,_
on,2,on,olema,V,V,b,_,_,_,_
rääkinud,3,rääkinud,rääkima,V,V,pl|n,_,_,_,_
.,4,.,.,Z,Z,_,_,_,_,_


*Note:* 'conll_morph' does not have ambiguous annotations. This is a result of the "random pick" disambiguation strategy, which always picks the first part of speech and a random _form_ in case of ambiguity.
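
The strategy described in the note can be sketched as follows: the part of speech comes from the first analysis, while the form is drawn at random from all analyses of the word. The function name `random_pick` and the sample analyses are hypothetical and only illustrate the idea; they are not taken from EstNLTK's code.

```python
import random

def random_pick(analyses, rng=random):
    """Return (partofspeech, form): first analysis' POS and a random form.

    Sketch of the strategy described above; not EstNLTK's implementation.
    """
    partofspeech = analyses[0]['partofspeech']
    form = rng.choice([a['form'] for a in analyses])
    return partofspeech, form

# Illustrative analyses of one ambiguous word (hypothetical values)
analyses = [{'partofspeech': 'V', 'form': 'nud'},
            {'partofspeech': 'A', 'form': 'sg n'},
            {'partofspeech': 'A', 'form': 'pl n'}]

# A seeded generator makes the pick reproducible
print(random_pick(analyses, rng=random.Random(0)))
```

Since the two choices are independent, the result can pair the part of speech of one analysis with the form of another, as with `rääkinud` in the output above (`V` combined with `pl|n`).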

### ConllMorphTagger on 'morph_extended' layer (without Visl CG3 parsing)

ConllMorphTagger can be used to convert 'morph_extended' attributes to CoNLL-U attributes (without changing the categories):

In [15]:
from estnltk import Text
from estnltk.taggers import ConllMorphTagger

text = Text("Ta on rääkinud.").tag_layer('morph_extended')

conll_morph = ConllMorphTagger( morph_extended_layer='morph_extended',
                                no_visl=True )
conll_morph.tag(text)
text['conll_morph']

layer name,attributes,parent,enveloping,ambiguous,span count
conll_morph,"id, form, lemma, upostag, xpostag, feats, head, deprel, deps, misc",morph_extended,,True,4

text,id,form,lemma,upostag,xpostag,feats,head,deprel,deps,misc
Ta,1,Ta,tema,P,P,sg|nom,_,_,_,_
on,2,on,olema,V,V,aux|indic|pres|ps3|sg|ps|af,_,_,_,_
rääkinud,3,rääkinud,rääkima,V,V,pos|sg|nom,_,_,_,_
.,4,.,.,Z,Z,_,_,_,_,_


*Note:* Again, 'conll_morph' layer does not have ambiguous annotations, but this is a result of a random pick from ambiguous forms and keeping the first part of speech on each ambiguous word.
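
Comparing the outputs above, the `feats` field appears to be the space-separated _form_ value with `|` as the separator, and `_` when the form is empty. A minimal sketch of that conversion, with the hypothetical helper name `form_to_feats`:

```python
def form_to_feats(form):
    """Join space-separated form tags with '|'; use '_' for an empty form.

    Inferred from the example outputs; not EstNLTK's implementation.
    """
    tags = form.split()
    return '|'.join(tags) if tags else '_'

for form in ['sg nom', 'aux indic pres ps3 sg ps af', '']:
    print(repr(form), '->', form_to_feats(form))
```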

### ConllMorphTagger on 'morph_extended' layer (with Visl CG3 parsing)

ConllMorphTagger can be used to process 'morph_extended' layer with VislCG3 parser's Estonian rules, which involves rule-based morphological disambiguation and updating morphological categories for syntax.

Note: this requires that the **VISLCG3 parser is installed on the system and accessible via the system's PATH environment variable**. The information about the parser is distributed in the [Constraint Grammar's Google Group](https://groups.google.com/g/constraint-grammar), and this is also the place to look for the most compact guide about [getting & installing the parser](https://groups.google.com/g/constraint-grammar/c/fNMkpAb_g3U).

You can check the availability of the VISLCG3 parser by typing:

In [16]:
!vislcg3 -V

VISL CG-3 Disambiguator version 1.3.7.13892
Copyright (C) 2007-2021 GrammarSoft ApS. Licensed under GPLv3+
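
The same check can be done from plain Python, which is handy outside a notebook. The sketch uses the standard library's `shutil.which` and assumes the executable is named `vislcg3`, as in the command above; the wrapper `vislcg3_available` is a hypothetical helper.

```python
import shutil

def vislcg3_available(executable='vislcg3'):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None

print('vislcg3 found on PATH:', vislcg3_available())
```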


Once you have the parser installed and available, you can apply ConllMorphTagger to produce an enhanced 'conll_morph' layer:

In [17]:
from estnltk import Text
from estnltk.taggers import ConllMorphTagger

text = Text("Ta on rääkinud.").tag_layer('morph_extended')

conll_morph = ConllMorphTagger(morph_extended_layer='morph_extended')
conll_morph.tag(text)
text['conll_morph']

layer name,attributes,parent,enveloping,ambiguous,span count
conll_morph,"id, form, lemma, upostag, xpostag, feats, head, deprel, deps, misc",morph_extended,,True,4

text,id,form,lemma,upostag,xpostag,feats,head,deprel,deps,misc
Ta,1,Ta,tema,P,Ppers,ps3|sg|nom,_,_,_,_
on,2,on,ole,V,Vaux,indic|pres|ps3|sg,_,_,_,_
rääkinud,3,rääkinud,rääki,V,V,ppast,_,_,_,_
.,4,.,.,Z,Z,Fst,_,_,_,_


## CONLL exporter

Text with 'conll_morph' layer can also be exported into the [CoNLL format](https://universaldependencies.org/format.html) string. Example:

In [18]:
from estnltk.taggers.standard.syntax.conll_morph_to_str import conll_to_str

from estnltk import Text
from estnltk.taggers import ConllMorphTagger

text = Text("Lähme! Ta on rääkinud.").tag_layer('morph_extended')

conll_morph = ConllMorphTagger( morph_extended_layer='morph_extended',
                                no_visl=True )
conll_morph.tag(text)

print(conll_to_str(text))

1	Lähme	minema	V	V	aux|indic|pres|ps1|pl|ps|af	_	_	_	_	
2	!	!	Z	Z	_	_	_	_	_	

1	Ta	tema	P	P	sg|nom	_	_	_	_	
2	on	olema	V	V	main|indic|pres|ps3|pl|ps|af	_	_	_	_	
3	rääkinud	rääkima	V	V	mod|indic|impf|ps|neg	_	_	_	_	
4	.	.	Z	Z	_	_	_	_	_	




This is useful when we want to test MaltParser or UDPipe without EstNLTK's interface.
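
For readers unfamiliar with the format, the following sketch shows how such a string is laid out: ten tab-separated fields per token and a blank line after each sentence. It is a standalone illustration with the hypothetical function `to_conll_str`; it is not the implementation of `conll_to_str` and does not try to reproduce its exact trailing whitespace.

```python
def to_conll_str(sentences):
    """Serialize sentences (lists of 10-field token tuples) as CoNLL text."""
    lines = []
    for sentence in sentences:
        for fields in sentence:
            lines.append('\t'.join(str(f) for f in fields))
        lines.append('')  # blank line terminates each sentence
    return '\n'.join(lines) + '\n'

# Token values copied from the first sentence of the output above
sentences = [[
    (1, 'Lähme', 'minema', 'V', 'V', 'aux|indic|pres|ps1|pl|ps|af',
     '_', '_', '_', '_'),
    (2, '!', '!', 'Z', 'Z', '_', '_', '_', '_', '_'),
]]
print(to_conll_str(sentences))
```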