## <span style="color:purple">Dependency syntactic analysis with MaltParser</span>

EstNLTK provides wrappers for several syntactic analysers (see [this document](04_syntactic_analysers_and_utils.ipynb) for details), but only the MaltParser-based syntactic analyser is available by default.

[MaltParser](http://www.maltparser.org) is a data-driven parser that has been trained on the [Estonian Dependency Treebank](https://github.com/EstSyntax/EDT).
To use MaltParser, you need:
  * Java SE Runtime Environment (version >= 1.8) installed on the system;
  * the `java` executable reachable via the PATH environment variable.
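
The two requirements above can be checked from Python before running the parser. The following is a minimal sketch that uses only the standard library; the helper names `parse_java_major` and `find_java` are illustrative and are not part of EstNLTK. It assumes that `java -version` prints a banner containing `version "..."`, which is the usual behaviour of common JREs.

```python
import re
import shutil
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8.0_281" means Java 8; modern scheme: "17.0.1" means Java 17.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def find_java():
    """Return (path, major_version); either value may be None if detection fails."""
    path = shutil.which("java")
    if path is None:
        return None, None
    # `java -version` conventionally prints its banner to stderr.
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    return path, parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    path, major = find_java()
    if path is None:
        print("java was not found on PATH")
    elif major is None or major < 8:
        print(f"java found at {path}, but version 1.8 or newer could not be confirmed")
    else:
        print(f"java found at {path}, major version {major}")
```

If the check reports that `java` is missing, install a JRE or adjust PATH before tagging the `maltparser_syntax` layer.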
  

_The default model._ The EstNLTK package includes a model that predicts syntax from the `morph_analysis` layer and outputs UD dependency relations.
The easiest way to use MaltParser's syntactic analysis is to tag the layer `maltparser_syntax` via the default resolver:

In [1]:
from estnltk import Text

text = Text('Ilus suur karvane kass nurrus punasel diivanil')

text.tag_layer('maltparser_syntax')

text
Ilus suur karvane kass nurrus punasel diivanil

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,1
tokens,,,,False,7
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,True,7
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,7
maltparser_conll_morph,"id, form, lemma, upostag, xpostag, feats, head, deprel, deps, misc",morph_analysis,,True,7
maltparser_syntax,"id, lemma, upostag, xpostag, feats, head, deprel, deps, misc, parent_span, children",,,False,7


This automatically creates all the prerequisite layers: the segmentation layers and the linguistic analysis layers `morph_analysis` and `maltparser_conll_morph`.

The resulting `maltparser_syntax` layer uses [UD tags for the _deprel_ attribute](https://universaldependencies.org/u/dep/index.html) and [Vabamorf's tagset](https://github.com/estnltk/estnltk/blob/main/tutorials/nlp_pipeline/B_morphology/00_tables_of_morphological_categories.ipynb) for other linguistic attributes (_upostag_, _xpostag_, _feats_).

In [2]:
text.maltparser_syntax

layer name,attributes,parent,enveloping,ambiguous,span count
maltparser_syntax,"id, lemma, upostag, xpostag, feats, head, deprel, deps, misc, parent_span, children",,,False,7

text,id,lemma,upostag,xpostag,feats,head,deprel,deps,misc,parent_span,children
Ilus,1,ilus,A,A,"{'sg': '', 'n': ''}",4,amod,,,"Span('kass', [{'id': 4, 'lemma': 'kass', 'upostag': 'S', 'xpostag': 'S', 'feats' ..., type: <class 'estnltk_core.layer.span.Span'>",()
suur,2,suur,A,A,"{'sg': '', 'n': ''}",4,amod,,,"Span('kass', [{'id': 4, 'lemma': 'kass', 'upostag': 'S', 'xpostag': 'S', 'feats' ..., type: <class 'estnltk_core.layer.span.Span'>",()
karvane,3,karvane,A,A,"{'sg': '', 'n': ''}",4,amod,,,"Span('kass', [{'id': 4, 'lemma': 'kass', 'upostag': 'S', 'xpostag': 'S', 'feats' ..., type: <class 'estnltk_core.layer.span.Span'>",()
kass,4,kass,S,S,"{'sg': '', 'n': ''}",5,nsubj,,,"Span('nurrus', [{'id': 5, 'lemma': 'nurruma', 'upostag': 'V', 'xpostag': 'V', 'f ..., type: <class 'estnltk_core.layer.span.Span'>","(""Span('Ilus', [{'id': 1, 'lemma': 'ilus', 'upostag': 'A', 'xpostag': 'A', 'feat ..., type: <class 'tuple'>, length: 3"
nurrus,5,nurruma,V,V,{'s': ''},0,root,,,,"(""Span('kass', [{'id': 4, 'lemma': 'kass', 'upostag': 'S', 'xpostag': 'S', 'feat ..., type: <class 'tuple'>, length: 2"
punasel,6,punane,A,A,"{'sg': '', 'ad': ''}",7,amod,,,"Span('diivanil', [{'id': 7, 'lemma': 'diivan', 'upostag': 'S', 'xpostag': 'S', ' ..., type: <class 'estnltk_core.layer.span.Span'>",()
diivanil,7,diivan,S,S,"{'sg': '', 'ad': ''}",5,obl,,,"Span('nurrus', [{'id': 5, 'lemma': 'nurruma', 'upostag': 'V', 'xpostag': 'V', 'f ..., type: <class 'estnltk_core.layer.span.Span'>","(""Span('punasel', [{'id': 6, 'lemma': 'punane', 'upostag': 'A', 'xpostag': 'A', ..., type: <class 'tuple'>, length: 1"


### Dependency relations

Attributes _id_ and _head_ indicate syntactic dependency relations between words. 
Each word has a unique index within a sentence (_id_), and attribute _head_ points to its parent in the sentence.
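
To make the relation between _id_ and _head_ concrete, the sketch below rebuilds parent and child links from plain `(word, id, head)` triples copied from the output above. It is plain Python and does not use the EstNLTK API; the variable names are illustrative.

```python
from collections import defaultdict

# (word, id, head) triples copied from the maltparser_syntax output above
rows = [
    ("Ilus", 1, 4),
    ("suur", 2, 4),
    ("karvane", 3, 4),
    ("kass", 4, 5),
    ("nurrus", 5, 0),
    ("punasel", 6, 7),
    ("diivanil", 7, 5),
]

word_by_id = {word_id: word for word, word_id, _ in rows}

# head == 0 marks the root, which has no parent (mapped to None here)
parent_of = {word: word_by_id.get(head) for word, _, head in rows}

# Collect the dependents of every head in sentence order
children_of = defaultdict(list)
for word, _, head in rows:
    if head != 0:
        children_of[word_by_id[head]].append(word)

print(parent_of["kass"])      # prints: nurrus
print(children_of["kass"])    # prints: ['Ilus', 'suur', 'karvane']
print(children_of["nurrus"])  # prints: ['kass', 'diivanil']
```

Keying the dictionaries by word form works here only because every word in the example sentence is distinct; for real text, key by _id_ instead.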

Attributes _parent_span_ and _children_ provide convenient access to the same information: you can get the parent span or all child spans of a word:

In [3]:
# get the span corresponding to the word 'kass'
span = text.maltparser_syntax[3]
span

text,id,lemma,upostag,xpostag,feats,head,deprel,deps,misc,parent_span,children
kass,4,kass,S,S,"{'sg': '', 'n': ''}",5,nsubj,,,"Span('nurrus', [{'id': 5, 'lemma': 'nurruma', 'upostag': 'V', 'xpostag': 'V', 'f ..., type: <class 'estnltk_core.layer.span.Span'>","(""Span('Ilus', [{'id': 1, 'lemma': 'ilus', 'upostag': 'A', 'xpostag': 'A', 'feat ..., type: <class 'tuple'>, length: 3"


In [4]:
# get the first dependant of 'kass'
span.annotations[0].children[0]

text,id,lemma,upostag,xpostag,feats,head,deprel,deps,misc,parent_span,children
Ilus,1,ilus,A,A,"{'sg': '', 'n': ''}",4,amod,,,"Span('kass', [{'id': 4, 'lemma': 'kass', 'upostag': 'S', 'xpostag': 'S', 'feats' ..., type: <class 'estnltk_core.layer.span.Span'>",()


In [5]:
# get the second dependant of 'kass'
span.annotations[0].children[1]

text,id,lemma,upostag,xpostag,feats,head,deprel,deps,misc,parent_span,children
suur,2,suur,A,A,"{'sg': '', 'n': ''}",4,amod,,,"Span('kass', [{'id': 4, 'lemma': 'kass', 'upostag': 'S', 'xpostag': 'S', 'feats' ..., type: <class 'estnltk_core.layer.span.Span'>",()


In [6]:
# get the third dependant of 'kass'
span.annotations[0].children[2]

text,id,lemma,upostag,xpostag,feats,head,deprel,deps,misc,parent_span,children
karvane,3,karvane,A,A,"{'sg': '', 'n': ''}",4,amod,,,"Span('kass', [{'id': 4, 'lemma': 'kass', 'upostag': 'S', 'xpostag': 'S', 'feats' ..., type: <class 'estnltk_core.layer.span.Span'>",()


In [7]:
# get the parent word of 'kass'
span.annotations[0].parent_span

text,id,lemma,upostag,xpostag,feats,head,deprel,deps,misc,parent_span,children
nurrus,5,nurruma,V,V,{'s': ''},0,root,,,,"(""Span('kass', [{'id': 4, 'lemma': 'kass', 'upostag': 'S', 'xpostag': 'S', 'feat ..., type: <class 'tuple'>, length: 2"
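
Following parent links repeatedly leads from any word to the root of the sentence. The sketch below shows that traversal on plain `id`/`head` values copied from the example sentence; it is an illustration in plain Python rather than EstNLTK code, and the function name `path_to_root` is hypothetical.

```python
# id -> (word, head) for the example sentence; head 0 marks the root
tokens = {
    1: ("Ilus", 4),
    2: ("suur", 4),
    3: ("karvane", 4),
    4: ("kass", 5),
    5: ("nurrus", 0),
    6: ("punasel", 7),
    7: ("diivanil", 5),
}


def path_to_root(token_id, tokens):
    """Follow head pointers from a token up to the root and return the words visited."""
    path = []
    seen = set()
    while token_id != 0:
        if token_id in seen:
            # A well-formed dependency tree never revisits a token
            raise ValueError("cycle in head pointers")
        seen.add(token_id)
        word, head = tokens[token_id]
        path.append(word)
        token_id = head
    return path


print(path_to_root(6, tokens))  # prints: ['punasel', 'diivanil', 'nurrus']
print(path_to_root(1, tokens))  # prints: ['Ilus', 'kass', 'nurrus']
```

The length of the returned list minus one is the depth of the word in the dependency tree.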


---

### MaltParserTagger

You can also use `MaltParserTagger` directly to create the `maltparser_syntax` layer.
This allows you to change the parsing model and the tagset; however, the non-default models need to be downloaded separately, and each model has specific preprocessing requirements that must be met.

The following table gives an overview of EstNLTK's MaltParser models:

|             | UD + morph_analysis | UD + morph_extended | CG + morph_analysis | CG + morph_extended | 
| ----------- | ------------- | --------- | -------------- | -------- |
| **required preprocessing** | `words`, `sentences`, `morph_analysis`, `conll_morph` (from `morph_analysis`)  | `words`, `sentences`, `morph_extended`, `conll_morph` (from `morph_extended`) | `words`, `sentences`, `morph_analysis`, `conll_morph` (from `morph_analysis`) | `words`, `sentences`, `morph_extended`, `conll_morph` (from `morph_extended`) |
| **model name**  | morph_analysis_conllu.mco   |  morph_extended_conllu.mco      | model_morph.mco          | model_morph_ext.mco   |
| **Needs to be downloaded?** | No | Yes | Yes | Yes |
| **Dependency relations** | [UD tags for _deprel_](https://universaldependencies.org/u/dep/index.html) | [UD tags for  _deprel_](https://universaldependencies.org/u/dep/index.html) | [CG tags for _deprel_](https://korpused.keeleressursid.ee/syntaks/dokumendid/syntaksiliides_en.pdf) |  [CG tags for _deprel_](https://korpused.keeleressursid.ee/syntaks/dokumendid/syntaksiliides_en.pdf) |
| **upostag, xpostag, feats** | [Vabamorf's tagset](https://github.com/estnltk/estnltk/blob/main/tutorials/nlp_pipeline/B_morphology/00_tables_of_morphological_categories.ipynb) | [morph_extended tags](01_syntax_preprocessing.ipynb) | [Vabamorf's tagset](https://github.com/estnltk/estnltk/blob/main/tutorials/nlp_pipeline/B_morphology/00_tables_of_morphological_categories.ipynb) | [morph_extended tags](01_syntax_preprocessing.ipynb) |


#### Preprocessing for MaltParserTagger

Use `ConllMorphTagger` with the setting `no_visl=True` in all preprocessing setups of `MaltParserTagger`.
This converts the annotations of the preprocessing layer into CONLL format fields, as described [here](01_syntax_preprocessing.ipynb).
The parameter `morph_extended_layer` selects the input layer: `morph_analysis` or `morph_extended`.

In [8]:
from estnltk.taggers import ConllMorphTagger

# create preprocessing tagger
conll_tagger = ConllMorphTagger(output_layer='conll_morph',       # default: 'conll_morph'
                          morph_extended_layer='morph_analysis',  # default: 'morph_extended'
                          no_visl=True
                          )

# Create text and preprocess for Maltparser syntax
from estnltk import Text
text = Text('Ta on ise tee esimesel poolel.')
text.tag_layer('morph_analysis')
conll_tagger.tag(text)

text
Ta on ise tee esimesel poolel.

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,1
tokens,,,,False,7
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,True,7
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,7
conll_morph,"id, form, lemma, upostag, xpostag, feats, head, deprel, deps, misc",morph_analysis,,True,7


#### Models

By default, the EstNLTK package contains only the model that predicts syntax from the `morph_analysis` layer with the UD tagset. Other models can be downloaded in either of the following ways:

* If you create a new instance of `MaltParserTagger` with settings that require a non-default model that has not been downloaded yet, you will be asked for permission to download the models;
* Alternatively, you can pre-download the models manually via the `download` function:

```python
from estnltk import download
download('maltparsertagger')
```

#### Basic usage

You can change `MaltParserTagger`'s model via a combination of the parameters `version` and `input_type`.
The parameter `version='conllu'` selects a model with the UD deprel tags (default), and `version='conllx'` selects a model with the CG deprel tags.
The parameter `input_type` specifies the input preprocessing layer; possible values are `input_type='morph_analysis'` (default) and `input_type='morph_extended'`.
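
The four parameter combinations line up with the four columns of the model overview table. The lookup below is a small plain-Python sketch of that correspondence; `describe_model` is a hypothetical helper, not an EstNLTK function, and the file names are copied from the table.

```python
# Model file names as listed in the overview table of this tutorial
MODEL_FILES = {
    ("conllu", "morph_analysis"): "morph_analysis_conllu.mco",
    ("conllu", "morph_extended"): "morph_extended_conllu.mco",
    ("conllx", "morph_analysis"): "model_morph.mco",
    ("conllx", "morph_extended"): "model_morph_ext.mco",
}

# According to the table, only this combination ships with the EstNLTK package
BUNDLED = {("conllu", "morph_analysis")}


def describe_model(version="conllu", input_type="morph_analysis"):
    """Summarise which model a (version, input_type) pair refers to."""
    key = (version, input_type)
    if key not in MODEL_FILES:
        raise ValueError(f"unsupported combination: {key}")
    return {
        "model": MODEL_FILES[key],
        # 'conllu' stands for UD deprel tags, 'conllx' for CG deprel tags
        "deprel_tagset": "UD" if version == "conllu" else "CG",
        "needs_download": key not in BUNDLED,
    }


print(describe_model())
print(describe_model(version="conllx", input_type="morph_extended"))
```

Whichever combination you choose, the `conll_morph` preprocessing layer must be built from the same layer that `input_type` names.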

In [9]:
from estnltk.taggers import MaltParserTagger

maltparser_tagger = MaltParserTagger(input_type='morph_analysis', version='conllu')

maltparser_tagger.tag( text )

text.maltparser_syntax

layer name,attributes,parent,enveloping,ambiguous,span count
maltparser_syntax,"id, lemma, upostag, xpostag, feats, head, deprel, deps, misc, parent_span, children",,,False,7

text,id,lemma,upostag,xpostag,feats,head,deprel,deps,misc,parent_span,children
Ta,1,tema,P,P,"{'sg': '', 'n': ''}",6,nsubj:cop,,,"Span('poolel', [{'id': 6, 'lemma': 'pool', 'upostag': 'S', 'xpostag': 'S', 'feat ..., type: <class 'estnltk_core.layer.span.Span'>",()
on,2,olema,V,V,{'b': ''},6,cop,,,"Span('poolel', [{'id': 6, 'lemma': 'pool', 'upostag': 'S', 'xpostag': 'S', 'feat ..., type: <class 'estnltk_core.layer.span.Span'>",()
ise,3,ise,P,P,"{'pl': '', 'n': ''}",6,advmod,,,"Span('poolel', [{'id': 6, 'lemma': 'pool', 'upostag': 'S', 'xpostag': 'S', 'feat ..., type: <class 'estnltk_core.layer.span.Span'>",()
tee,4,tee,S,S,"{'sg': '', 'n': ''}",6,nsubj:cop,,,"Span('poolel', [{'id': 6, 'lemma': 'pool', 'upostag': 'S', 'xpostag': 'S', 'feat ..., type: <class 'estnltk_core.layer.span.Span'>",()
esimesel,5,esimene,O,O,"{'sg': '', 'ad': ''}",6,amod,,,"Span('poolel', [{'id': 6, 'lemma': 'pool', 'upostag': 'S', 'xpostag': 'S', 'feat ..., type: <class 'estnltk_core.layer.span.Span'>",()
poolel,6,pool,S,S,"{'sg': '', 'ad': ''}",0,root,,,,"(""Span('Ta', [{'id': 1, 'lemma': 'tema', 'upostag': 'P', 'xpostag': 'P', 'feats' ..., type: <class 'tuple'>, length: 6"
.,7,.,Z,Z,,6,punct,,,"Span('poolel', [{'id': 6, 'lemma': 'pool', 'upostag': 'S', 'xpostag': 'S', 'feat ..., type: <class 'estnltk_core.layer.span.Span'>",()


Internally, MaltParserTagger also applies SyntaxDependencyRetagger to add `parent_span` and `children` attributes to spans.
You can disable this behaviour with the flag `add_parent_and_children`:

```python
maltparser_tagger = MaltParserTagger( add_parent_and_children=False )
```

Note: if you are developing your own models and want to test them with `MaltParserTagger`, you can pass the folder containing the models as `MaltParserTagger`'s parameter `resources_path`. However, this model directory must also contain the MaltParser jar and a subfolder `lib` with the additional jars required by MaltParser (for details, see https://www.maltparser.org/install.html).
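
Before passing a custom directory as `resources_path`, it can help to verify its layout. The sketch below checks the requirements named in the note using only `pathlib`. The helper `check_resources_dir` is hypothetical, and the file name pattern `maltparser*.jar` is an assumption based on how MaltParser distributions usually name the jar; adjust it if your jar is named differently.

```python
from pathlib import Path


def check_resources_dir(path):
    """Return a list of problems found in a candidate model directory (empty if none)."""
    root = Path(path)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not any(root.glob("*.mco")):
        problems.append("no .mco model file found")
    # Assumption: the MaltParser jar is named like 'maltparser-<version>.jar'
    if not any(root.glob("maltparser*.jar")):
        problems.append("no maltparser jar found")
    lib = root / "lib"
    if not lib.is_dir():
        problems.append("missing 'lib' subfolder")
    elif not any(lib.glob("*.jar")):
        problems.append("'lib' subfolder contains no jars")
    return problems


if __name__ == "__main__":
    # 'my_models' is a placeholder path used only for illustration
    for problem in check_resources_dir("my_models"):
        print("problem:", problem)
```

An empty result only confirms the directory layout; it does not guarantee that the model file itself is compatible with the chosen `version` and `input_type`.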

---

**TODO:** In addition to the default model, EstNLTK has more syntactic analysis models available for MaltParser, but these models need to be downloaded separately.
There are also alternative syntactic parsers available, such as `UDPipeTagger` and `StanzaSyntaxTagger`.
For details on how to download models and how to use the alternative parsers, see [this tutorial](04_syntactic_analysers_and_utils.ipynb).