## <span style="color:purple">Dependency syntactic analysis with Visl CG3</span>

The VISL project provides tools for rule-based language analysis using the [Constraint Grammar (CG)](https://visl.sdu.dk/constraint_grammar.html) approach. 
The [Estonian CG syntactic parser](https://github.com/EstSyntax/EstCG) has thousands of handcrafted, Estonian-specific rules for tagging syntactic functions and dependencies.
EstNLTK contains [a version of EstCG's rules](https://github.com/estnltk/estnltk/blob/main/estnltk/estnltk/taggers/standard/syntax/files/readme.txt) and `VislTagger`, which applies these rules for syntactic parsing.

#### Requirements

**VISLCG3 executable**. To use VISLCG3-based syntactic analysis, VISLCG3 must be installed on the system. 
Information about the VISLCG3 parser is distributed via the [Constraint Grammar's Google Group](https://groups.google.com/g/constraint-grammar), which is also where you can find a compact guide to [getting & installing the parser](https://groups.google.com/g/constraint-grammar/c/fNMkpAb_g3U).

By default, EstNLTK expects the directory containing the VISLCG3 parser's executable (`vislcg3` on UNIX, `vislcg3.exe` on Windows) to be listed in the system's `PATH` environment variable. 
If this requirement is satisfied, EstNLTK should be able to use the parser.

You can check the availability of the VISLCG3 parser by typing:

In [1]:
!vislcg3 -V

VISL CG-3 Disambiguator version 1.3.7.13892
Copyright (C) 2007-2021 GrammarSoft ApS. Licensed under GPLv3+
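
The same check can also be made from Python, outside a notebook shell escape. The sketch below only looks the executable up on `PATH` with the standard library's `shutil.which`; the helper name `find_vislcg3` is made up for this illustration and is not part of EstNLTK.

```python
import shutil


def find_vislcg3(executable="vislcg3"):
    """Return the full path of the executable if it is found on PATH, else None."""
    return shutil.which(executable)


path = find_vislcg3()
if path is None:
    print("vislcg3 was not found on PATH")
else:
    print("vislcg3 found at:", path)
```

If the lookup returns `None`, the directory that contains the executable is most likely missing from `PATH`.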


#### Preprocessing for VislTagger

The parser needs more information than Vabamorf's morphological analyser provides, e.g. information about pronoun types, verb subcategorization, etc.
This information is provided by EstNLTK's syntax preprocessing layer `morph_extended` (see [this tutorial](01_syntax_preprocessing.ipynb) for details).

Therefore, to use VislTagger, we first need to add the `morph_extended` layer to our Text object. 
This can be done with the default resolver:

In [2]:
from estnltk import Text
text = Text('Ta on ise tee esimesel poolel.')
# Add preprocessing for syntactic parsing
text.tag_layer('morph_extended')

text
Ta on ise tee esimesel poolel.

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,1
tokens,,,,False,7
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,True,7
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,7
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, punctuation_type, pronoun_type, letter_case, fin, verb_extension_suffix, subcat",morph_analysis,,True,7


If we compare the standard `morph_analysis` layer with `morph_extended`, we can see that `morph_extended` has more refined labels under the `form` attribute, as well as features such as pronoun type, punctuation type and letter case. These extra features are needed because the VislCG3 grammar rules use them. 

In addition, the `morph_extended` layer is more ambiguous than the standard `morph_analysis` layer because it is more detailed. E.g. in our example sentence, the word 'on' gets 6 different analyses. As the first step of VislTagger is morphological disambiguation, the extra analyses are removed and do not propagate to the syntactic analysis layer.

In [3]:
text.morph_analysis

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,7

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Ta,Ta,tema,tema,['tema'],0,,sg n,P
on,on,olema,ole,['ole'],0,,b,V
,on,olema,ole,['ole'],0,,vad,V
ise,ise,ise,ise,['ise'],0,,sg n,P
,ise,ise,ise,['ise'],0,,pl n,P
tee,tee,tee,tee,['tee'],0,,sg n,S
esimesel,esimesel,esimene,esimene,['esimene'],l,,sg ad,O
poolel,poolel,pool,pool,['pool'],l,,sg ad,S
.,.,.,.,['.'],,,,Z


In [4]:
text.morph_extended

layer name,attributes,parent,enveloping,ambiguous,span count
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, punctuation_type, pronoun_type, letter_case, fin, verb_extension_suffix, subcat",morph_analysis,,True,7

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech,punctuation_type,pronoun_type,letter_case,fin,verb_extension_suffix,subcat
Ta,Ta,tema,tema,['tema'],0,,sg nom,P,,['ps3'],cap,,[],
on,on,olema,ole,['ole'],0,,mod indic pres ps3 sg ps af,V,,,,True,[],['Intr']
,on,olema,ole,['ole'],0,,aux indic pres ps3 sg ps af,V,,,,True,[],['Intr']
,on,olema,ole,['ole'],0,,main indic pres ps3 sg ps af,V,,,,True,[],['Intr']
,on,olema,ole,['ole'],0,,mod indic pres ps3 pl ps af,V,,,,True,[],['Intr']
,on,olema,ole,['ole'],0,,aux indic pres ps3 pl ps af,V,,,,True,[],['Intr']
,on,olema,ole,['ole'],0,,main indic pres ps3 pl ps af,V,,,,True,[],['Intr']
ise,ise,ise,ise,['ise'],0,,sg nom,P,,"['pos', 'det', 'refl']",,,[],
,ise,ise,ise,['ise'],0,,pl nom,P,,"['pos', 'det', 'refl']",,,[],
tee,tee,tee,tee,['tee'],0,,com sg nom,S,,,,,[],
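
To see where the 6 analyses of 'on' come from, the small sketch below splits the `form` strings copied from the `morph_extended` output above. It works on plain strings rather than on the layer itself, and it assumes that the first tag of the form is the verb type and the fifth tag is the number, as in the rows shown; `split_form` is a helper made up for this illustration.

```python
from collections import Counter

# 'form' values of the word 'on', copied from the morph_extended output above
on_forms = [
    "mod indic pres ps3 sg ps af",
    "aux indic pres ps3 sg ps af",
    "main indic pres ps3 sg ps af",
    "mod indic pres ps3 pl ps af",
    "aux indic pres ps3 pl ps af",
    "main indic pres ps3 pl ps af",
]


def split_form(form):
    """Return (verb type, number), assuming the tag order seen in the rows above."""
    tags = form.split()
    return tags[0], tags[4]


verb_types = Counter(split_form(form)[0] for form in on_forms)
numbers = Counter(split_form(form)[1] for form in on_forms)

print(len(on_forms))        # 6
print(sorted(verb_types))   # ['aux', 'main', 'mod']
print(sorted(numbers))      # ['pl', 'sg']
```

The six analyses are thus every combination of three verb-type tags and two number tags; all other tags are identical across the rows.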


#### Basic usage

Once you have correctly installed the VISLCG3 parser's executable (see "Requirements" above), EstNLTK should be able to execute the parser, and we can parse our example sentence as follows:

In [5]:
from estnltk.taggers import VislTagger

visl_tagger = VislTagger()
visl_tagger.tag(text)

text.visl

layer name,attributes,parent,enveloping,ambiguous,span count
visl,"id, lemma, ending, partofspeech, subtype, mood, tense, voice, person, inf_form, number, case, polarity, number_format, capitalized, finiteness, subcat, clause_boundary, deprel, head",morph_extended,,True,7

text,id,lemma,ending,partofspeech,subtype,mood,tense,voice,person,inf_form,number,case,polarity,number_format,capitalized,finiteness,subcat,clause_boundary,deprel,head
Ta,1,tema,0,P,pers,_,_,_,ps3,_,sg,nom,_,_,cap,_,_,_,@SUBJ,2
on,2,ole,0,V,main,indic,pres,ps,ps3,_,sg,_,af,_,_,_,_,_,@FMV,0
ise,3,ise,0,P,"['pos', 'det', 'refl']",_,_,_,_,_,sg,nom,_,_,_,_,_,_,@ADVL,2
tee,4,tee,0,S,com,_,_,_,_,_,sg,nom,_,_,_,_,_,_,@PRD,2
esimesel,5,esimene,l,N,ord,_,_,_,_,_,sg,ad,_,l,_,_,_,_,@AN>,6
poolel,6,pool,l,S,com,_,_,_,_,_,sg,ad,_,_,_,_,_,_,"['@<NN', '@ADVL']",2
.,7,.,_,Z,Fst,_,_,_,_,_,_,_,_,_,_,_,_,CLB,_,6


**Interpreting the output**. The parser assigns each word a syntactic label (`deprel`) and a syntactic head (`head`), which is the id of the word's governing word in the sentence. For example, '@SUBJ' stands for subject; see the [documentation](https://korpused.keeleressursid.ee/syntaks/dokumendid/syntaksiliides_en.pdf) for details on the labels. NB! As the example shows, word ids start from 1, not 0. A `head` value of 0 marks the current word as the root node of the tree.
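
The relation between `id` and `head` can be made concrete with a few lines of plain Python. The sketch below does not call EstNLTK; it uses (id, word, head) triples copied from the output above and groups the word ids by their head, so that key 0 holds the root. The function name `build_children` is made up for this illustration.

```python
from collections import defaultdict

# (id, word, head) triples copied from the visl layer output above
rows = [
    (1, "Ta", 2),
    (2, "on", 0),
    (3, "ise", 2),
    (4, "tee", 2),
    (5, "esimesel", 6),
    (6, "poolel", 2),
    (7, ".", 6),
]


def build_children(rows):
    """Map each head id to the ids of the words it governs; key 0 holds the root."""
    children = defaultdict(list)
    for word_id, _, head in rows:
        children[head].append(word_id)
    return dict(children)


children = build_children(rows)
words = {word_id: word for word_id, word, _ in rows}

print([words[i] for i in children[0]])  # ['on']
print([words[i] for i in children[2]])  # ['Ta', 'ise', 'tee', 'poolel']
print([words[i] for i in children[6]])  # ['esimesel', '.']
```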

VislCG3 is based on the [Constraint Grammar](http://visl.sdu.dk/constraint_grammar.html) formalism: it first adds all possible syntactic labels and then removes the ones that the constraints rule out. As a result, some syntactic labels remain ambiguous. In our example sentence, the word 'poolel' gets both the analysis of a complement (@<NN) and that of an adverbial (@ADVL). Despite this, each word still has only one syntactic head.
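
Words with leftover label ambiguity can be picked out mechanically. The following sketch works on (word, deprel) pairs copied from the output above, not on the layer itself. Because the displayed `deprel` is either a single string or a list of strings, the made-up helper `as_label_list` normalises both cases; how the values are actually stored in the layer is not assumed here.

```python
def as_label_list(deprel):
    """Normalise a deprel value to a list, whether it is one string or a list."""
    if isinstance(deprel, str):
        return [deprel]
    return list(deprel)


# (word, deprel) pairs copied from the visl layer output above
rows = [
    ("Ta", "@SUBJ"),
    ("on", "@FMV"),
    ("ise", "@ADVL"),
    ("tee", "@PRD"),
    ("esimesel", "@AN>"),
    ("poolel", ["@<NN", "@ADVL"]),
    (".", "_"),
]

# Keep only the words that still carry more than one syntactic label
ambiguous = {
    word: as_label_list(deprel)
    for word, deprel in rows
    if len(as_label_list(deprel)) > 1
}

print(ambiguous)  # {'poolel': ['@<NN', '@ADVL']}
```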

Note: By default, VislTagger post-corrects the original VISLCG3 parser output and removes self-references (that is, situations where a word's `id` equals its `head`). You can use the constructor parameter `fix_selfreferences` to turn off these post-corrections.
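
To illustrate what a self-reference looks like, the sketch below detects words whose `head` equals their own `id`. It only shows the detection step on plain (id, head) pairs and does not reproduce how VislTagger repairs such cases. The first data set is hypothetical, not actual parser output, and `find_self_references` is a name made up for this example.

```python
def find_self_references(rows):
    """Return the ids of words whose head points back at the word itself."""
    return [word_id for word_id, head in rows if word_id == head]


# Hypothetical (id, head) pairs, NOT actual parser output: word 3 governs itself
raw_rows = [(1, 2), (2, 0), (3, 3)]
print(find_self_references(raw_rows))  # [3]

# (id, head) pairs of the example sentence above: no self-references
example_rows = [(1, 2), (2, 0), (3, 2), (4, 2), (5, 6), (6, 2), (7, 6)]
print(find_self_references(example_rows))  # []
```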

**How to navigate over syntactic relations?** 
You can use `SyntaxDependencyRetagger` to post-process the syntax layer and add the attributes `parent_span` and `children`.
These attributes let you navigate from a span to its syntactic parent and children.

Example:

In [6]:
from estnltk.taggers import SyntaxDependencyRetagger

SyntaxDependencyRetagger('visl').retag(text)

text.visl

layer name,attributes,parent,enveloping,ambiguous,span count
visl,"id, lemma, ending, partofspeech, subtype, mood, tense, voice, person, inf_form, number, case, polarity, number_format, capitalized, finiteness, subcat, clause_boundary, deprel, head, parent_span, children",morph_extended,,True,7

text,id,lemma,ending,partofspeech,subtype,mood,tense,voice,person,inf_form,number,case,polarity,number_format,capitalized,finiteness,subcat,clause_boundary,deprel,head,parent_span,children
Ta,1,tema,0,P,pers,_,_,_,ps3,_,sg,nom,_,_,cap,_,_,_,@SUBJ,2,"Span('on', [{'id': 2, 'lemma': 'ole', 'ending': '0', 'partofspeech': 'V', 'subty ..., type: <class 'estnltk_core.layer.span.Span'>",()
on,2,ole,0,V,main,indic,pres,ps,ps3,_,sg,_,af,_,_,_,_,_,@FMV,0,,"(""Span('Ta', [{'id': 1, 'lemma': 'tema', 'ending': '0', 'partofspeech': 'P', 'su ..., type: <class 'tuple'>, length: 4"
ise,3,ise,0,P,"['pos', 'det', 'refl']",_,_,_,_,_,sg,nom,_,_,_,_,_,_,@ADVL,2,"Span('on', [{'id': 2, 'lemma': 'ole', 'ending': '0', 'partofspeech': 'V', 'subty ..., type: <class 'estnltk_core.layer.span.Span'>",()
tee,4,tee,0,S,com,_,_,_,_,_,sg,nom,_,_,_,_,_,_,@PRD,2,"Span('on', [{'id': 2, 'lemma': 'ole', 'ending': '0', 'partofspeech': 'V', 'subty ..., type: <class 'estnltk_core.layer.span.Span'>",()
esimesel,5,esimene,l,N,ord,_,_,_,_,_,sg,ad,_,l,_,_,_,_,@AN>,6,"Span('poolel', [{'id': 6, 'lemma': 'pool', 'ending': 'l', 'partofspeech': 'S', ' ..., type: <class 'estnltk_core.layer.span.Span'>",()
poolel,6,pool,l,S,com,_,_,_,_,_,sg,ad,_,_,_,_,_,_,"['@<NN', '@ADVL']",2,"Span('on', [{'id': 2, 'lemma': 'ole', 'ending': '0', 'partofspeech': 'V', 'subty ..., type: <class 'estnltk_core.layer.span.Span'>","(""Span('esimesel', [{'id': 5, 'lemma': 'esimene', 'ending': 'l', 'partofspeech': ..., type: <class 'tuple'>, length: 2"
.,7,.,_,Z,Fst,_,_,_,_,_,_,_,_,_,_,_,_,CLB,_,6,"Span('poolel', [{'id': 6, 'lemma': 'pool', 'ending': 'l', 'partofspeech': 'S', ' ..., type: <class 'estnltk_core.layer.span.Span'>",()


The span `poolel` has a parent span and two child spans:

In [7]:
span = text.visl[5]
span

text,id,lemma,ending,partofspeech,subtype,mood,tense,voice,person,inf_form,number,case,polarity,number_format,capitalized,finiteness,subcat,clause_boundary,deprel,head,parent_span,children
poolel,6,pool,l,S,com,_,_,_,_,_,sg,ad,_,_,_,_,_,_,"['@<NN', '@ADVL']",2,"Span('on', [{'id': 2, 'lemma': 'ole', 'ending': '0', 'partofspeech': 'V', 'subty ..., type: <class 'estnltk_core.layer.span.Span'>","(""Span('esimesel', [{'id': 5, 'lemma': 'esimene', 'ending': 'l', 'partofspeech': ..., type: <class 'tuple'>, length: 2"


To get the parent span, write:

In [8]:
span.annotations[0].parent_span

text,id,lemma,ending,partofspeech,subtype,mood,tense,voice,person,inf_form,number,case,polarity,number_format,capitalized,finiteness,subcat,clause_boundary,deprel,head,parent_span,children
on,2,ole,0,V,main,indic,pres,ps,ps3,_,sg,_,af,_,_,_,_,_,@FMV,0,,"(""Span('Ta', [{'id': 1, 'lemma': 'tema', 'ending': '0', 'partofspeech': 'P', 'su ..., type: <class 'tuple'>, length: 4"


To iterate over all children, write:

In [9]:
for child in span.annotations[0].children:
    display(child)

text,id,lemma,ending,partofspeech,subtype,mood,tense,voice,person,inf_form,number,case,polarity,number_format,capitalized,finiteness,subcat,clause_boundary,deprel,head,parent_span,children
esimesel,5,esimene,l,N,ord,_,_,_,_,_,sg,ad,_,l,_,_,_,_,@AN>,6,"Span('poolel', [{'id': 6, 'lemma': 'pool', 'ending': 'l', 'partofspeech': 'S', ' ..., type: <class 'estnltk_core.layer.span.Span'>",()


text,id,lemma,ending,partofspeech,subtype,mood,tense,voice,person,inf_form,number,case,polarity,number_format,capitalized,finiteness,subcat,clause_boundary,deprel,head,parent_span,children
.,7,.,_,Z,Fst,_,_,_,_,_,_,_,_,_,_,_,_,CLB,_,6,"Span('poolel', [{'id': 6, 'lemma': 'pool', 'ending': 'l', 'partofspeech': 'S', ' ..., type: <class 'estnltk_core.layer.span.Span'>",()
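
Following parents repeatedly leads from any word up to the root of the tree. The sketch below shows that idea on plain data rather than on spans: it uses the `id` and `head` values copied from the output above and follows `head` until it reaches 0. The function `path_to_root` is made up for this illustration; the `seen` set is a safety guard that stops the walk if the heads ever form a cycle.

```python
# id -> (word, head), copied from the visl layer output above
rows = {
    1: ("Ta", 2),
    2: ("on", 0),
    3: ("ise", 2),
    4: ("tee", 2),
    5: ("esimesel", 6),
    6: ("poolel", 2),
    7: (".", 6),
}


def path_to_root(word_id, rows):
    """Return the words on the way from word_id up to the root (head 0)."""
    path = []
    seen = set()
    while word_id != 0 and word_id not in seen:
        seen.add(word_id)
        word, head = rows[word_id]
        path.append(word)
        word_id = head
    return path


print(path_to_root(5, rows))  # ['esimesel', 'poolel', 'on']
print(path_to_root(2, rows))  # ['on']
```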
