# Neural Morphological Tagger / Disambiguator

EstNLTK contains a neural morphological tagger / disambiguator that was introduced by [Tkachenko and Sirts (2018)](https://arxiv.org/pdf/1810.06908.pdf). 
It takes Vabamorf's analyses as input and predicts morphological tags (`partofspeech` and `form`) with better accuracy than Vabamorf. 
The tagger also employs a morphological tagset that extends Vabamorf's tags towards [UD](https://universaldependencies.org/guidelines.html)'s morphological features. 
One limitation is that the tagger does not resolve lemma ambiguities.

## Running as a web tagger

The easiest way to use the neural morphological tagger is via EstNLTK's web service:

In [1]:
from estnltk import Text
from estnltk.web_taggers import NeuralMorphDisambWebTagger

neural_morph_tagger = NeuralMorphDisambWebTagger(url='http://127.0.0.1:5000/estnltk/tagger/neural_morph_disamb')
neural_morph_tagger

name,output layer,output attributes,input layers
NeuralMorphDisambWebTagger,neural_morph_disamb,"('morphtag', 'pos', 'form')","('words', 'sentences', 'morph_analysis')"

0,1
url,http://127.0.0.1:5000/estnltk/tagger/neural_morph_disamb
batch_layer,words
batch_layer_max_size,125
batch_enveloping_layer,sentences


In [2]:
# Create input text
text = Text('Kiirelt võetud pangalaen on kärmelt kulunud.').tag_layer('morph_analysis')
# Add neural morph layer
neural_morph_tagger.tag(text)
text['neural_morph_disamb']

layer name,attributes,parent,enveloping,ambiguous,span count
neural_morph_disamb,"morphtag, pos, form",words,,False,7

text,morphtag,pos,form
Kiirelt,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=abl,S,sg abl
võetud,POS=A|DEGREE=pos,A,
pangalaen,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom,S,sg n
on,POS=V|VERB_TYPE=aux|MOOD=indic|TENSE=pres|PERSON=ps3|NUMBER=sg|VERB_PS=ps|VERB_POLARITY=af,V,b
kärmelt,POS=D,D,
kulunud,POS=V|VERB_TYPE=main|VERB_FORM=partic|TENSE=past|VERB_PS=ps,V,nud
.,POS=Z|PUNCT_TYPE=Fst,Z,


In the output layer, the attributes `'pos'` and `'form'` are based on Vabamorf's categories and can be used to disambiguate the input `morph_analysis` layer. The attribute `'morphtag'` contains model-specific morphological features (Vabamorf's categories extended towards [UD](https://universaldependencies.org/guidelines.html) categories).
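To illustrate how predicted `pos` and `form` values can narrow down ambiguous analyses, here is a minimal sketch in plain Python. It does not use EstNLTK: the function `filter_analyses`, the dict structure, and the sample analyses are hypothetical and hand-written for illustration. How the actual tagger treats words without a matching analysis is not described here, so the fallback of keeping all analyses is an assumption.

```python
def filter_analyses(analyses, pos, form):
    """Keep only the analyses that agree with the predicted pos and form.

    `analyses` is a list of dicts with 'partofspeech' and 'form' keys,
    mimicking the ambiguous analyses of one word (hypothetical structure).
    If no analysis matches, all analyses are kept (an assumption).
    """
    matching = [a for a in analyses
                if a['partofspeech'] == pos and a['form'] == form]
    return matching or list(analyses)


# Hand-written ambiguous analyses for a single word (illustration only)
candidates = [
    {'lemma': 'olema', 'partofspeech': 'V', 'form': 'b'},
    {'lemma': 'olema', 'partofspeech': 'V', 'form': 'vad'},
]

# Suppose the neural model predicted pos='V' and form='b' for this word
kept = filter_analyses(candidates, pos='V', form='b')
print(kept)
```

Only the analysis whose `partofspeech` and `form` both match the prediction survives; the lemma is carried along unchanged, which is consistent with the tagger not resolving lemma ambiguities.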

You can also use `NeuralMorphDisambWebTagger` directly as **a disambiguator of the input morph analysis layer**. 
For this, set the tagger's `output_layer` to `'morph_analysis'` and then use the tagger as a retagger:

In [3]:
# Initialize neural morph tagger as retagger
neural_morph_retagger = NeuralMorphDisambWebTagger(url='http://127.0.0.1:5000/estnltk/tagger/neural_morph_disamb', 
                                                   output_layer='morph_analysis')
# Create input text
text = Text('Kiirelt võetud pangalaen on kärmelt kulunud.').tag_layer('morph_analysis')
# Disambiguate morph layer
neural_morph_retagger.retag(text)
text['morph_analysis']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,7

text,normalized_text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Kiirelt,Kiirelt,kiir,kiir,['kiir'],lt,,sg abl,S
võetud,võetud,võetud,võetud,['võetud'],0,,,A
pangalaen,pangalaen,pangalaen,panga_laen,"['panga', 'laen']",0,,sg n,S
on,on,olema,ole,['ole'],0,,b,V
kärmelt,kärmelt,kärmelt,kärmelt,['kärmelt'],0,,,D
kulunud,kulunud,kuluma,kulu,['kulu'],nud,,nud,V
.,.,.,.,['.'],,,,Z


If you need to process large datasets, it is recommended to use neural morphological tagging locally.

## Running locally

*Note: to run neural morphological tagging locally, you need to install the [estnltk_neural](https://github.com/estnltk/estnltk/tree/main/estnltk_neural) package. Be aware that the implementation requires an old `tensorflow` version (below 2.0, such as 1.15.5) and is not compatible with newer `tensorflow` releases.*
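Before creating a tagger, it may help to check which `tensorflow` version is installed. The following is a minimal sketch, not part of EstNLTK: the helper name `is_supported_tensorflow` is hypothetical, and it only tests the major version number against the "below 2.0" requirement stated above.

```python
def is_supported_tensorflow(version):
    """Return True if a version string such as '1.15.5' is below 2.0."""
    major = int(version.split('.')[0])
    return major < 2


# Look up the installed version, if tensorflow is available at all
try:
    import tensorflow as tf
    installed = tf.__version__
except ImportError:
    installed = None

if installed is None:
    print('tensorflow is not installed')
elif is_supported_tensorflow(installed):
    print('tensorflow', installed, 'is a 1.x release')
else:
    print('tensorflow', installed, 'is too new; a release below 2.0 is required')
```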

There are 4 morphological tagging models available, and for each of these models, there is a tagger class defined in `estnltk_neural.taggers`:

* `SoftmaxEmbTagSumTagger()`
* `SoftmaxEmbCatSumTagger()`
* `Seq2SeqEmbTagSumTagger()`
* `Seq2SeqEmbCatSumTagger()`

Note that the models are not distributed with the estnltk\_neural package. You can download them in the following ways:
* If you create a new tagger instance (`SoftmaxEmbTagSumTagger()`, `Seq2SeqEmbTagSumTagger()` etc.) and the model has not been downloaded yet, you will be asked for permission to download the model;
* Alternatively, you can pre-download models manually via the `download` function:

```python
from estnltk import download
# download model for SoftmaxEmbCatSumTagger
download('softmaxembcatsumtagger')
# download model for Seq2SeqEmbTagSumTagger
download('seq2seqembtagsumtagger')
...
```

### `SoftmaxEmbTagSumTagger`

In [1]:
from estnltk import Text
from estnltk_neural.taggers import SoftmaxEmbTagSumTagger

text = Text("See on lause.")
text.tag_layer(['morph_analysis'])
        
tagger = SoftmaxEmbTagSumTagger('morph_softmax_emb_tag_sum')
tagger.tag(text)

This requires downloading resource 'neural_morph_softmax_emb_tag_sum_2019-08-23' (size: 354M). Proceed with downloading? [Y/n] y


Downloading neural_morph_softmax_emb_tag_sum_2019-08-23: 349031it [00:29, 11877.95it/s]


Unpacked resource into subfolder 'neural_morph_disamb/softmax_emb_tag_sum_2019-08-23/' of the resources dir.
Loaded analyses: 341 from file C:\Programmid\Miniconda3\envs\py37_estnltk_neural\lib\site-packages\estnltk\estnltk_resources\neural_morph_disamb\softmax_emb_tag_sum_2019-08-23\output\data\analysis.txt



The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
Instructions for updating:
Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which 

text
See on lause.

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,1
tokens,,,,False,4
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,True,4
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,4
morph_softmax_emb_tag_sum,"morphtag, pos, form",words,,False,4


Now the text object has a layer named `morph_softmax_emb_tag_sum`, which contains three attributes for every word: `morphtag` (the original tag predicted by the neural model), and `pos` and `form` (the morphtag converted into Vabamorf's format).
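Judging from the outputs shown in this tutorial, a `morphtag` value is a sequence of `NAME=value` pairs joined by `|`. Assuming that format holds for all tags (and that names and values never contain `|` or `=`), a tag string can be split into a dictionary with a few lines of plain Python. The helper `parse_morphtag` below is a hypothetical name, not an EstNLTK function.

```python
def parse_morphtag(morphtag):
    """Split a tag like 'POS=S|NUMBER=sg' into {'POS': 'S', 'NUMBER': 'sg'}."""
    features = {}
    for pair in morphtag.split('|'):
        # partition() splits on the first '=' only
        name, _, value = pair.partition('=')
        features[name] = value
    return features


features = parse_morphtag('POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom')
print(features['POS'], features['CASE'])   # prints: S nom
```

A dictionary makes it straightforward to select words by a single feature, for example all words whose `CASE` is `nom`.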

In [2]:
text['morph_softmax_emb_tag_sum']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_softmax_emb_tag_sum,"morphtag, pos, form",words,,False,4

text,morphtag,pos,form
See,POS=P|NUMBER=sg|CASE=nom,P,sg n
on,POS=V|VERB_TYPE=main|MOOD=indic|TENSE=pres|PERSON=ps3|NUMBER=sg|VERB_PS=ps|VERB_POLARITY=af,V,b
lause,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom,S,sg n
.,POS=Z|PUNCT_TYPE=Fst,Z,


### `SoftmaxEmbCatSumTagger`

If you want to load a new tagger with a different neural model, you need to reset the previously loaded one with the `reset` method.
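If you switch between models repeatedly, the reset-then-create pattern can be wrapped in a small helper. This is only a sketch: `switch_tagger` is a hypothetical function, not part of `estnltk_neural`, and it assumes nothing about the tagger beyond the `reset` method mentioned above.

```python
def switch_tagger(current_tagger, make_new_tagger):
    """Reset the currently loaded tagger, then build and return a new one.

    `make_new_tagger` is any zero-argument callable that creates the next
    tagger, e.g. lambda: SoftmaxEmbCatSumTagger('morph_softmax_emb_cat_sum').
    """
    if current_tagger is not None:
        # Reset the previously loaded model before loading another one
        current_tagger.reset()
    return make_new_tagger()


# Usage (requires estnltk_neural and a downloaded model):
# tagger = switch_tagger(
#     tagger, lambda: SoftmaxEmbCatSumTagger('morph_softmax_emb_cat_sum'))
```

Passing a callable rather than an already constructed tagger ensures that the new model is created only after the old one has been reset.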

In [3]:
from estnltk_neural.taggers import SoftmaxEmbCatSumTagger

tagger.reset()
tagger = SoftmaxEmbCatSumTagger('morph_softmax_emb_cat_sum')
tagger.tag(text)
text['morph_softmax_emb_cat_sum']




INFO:model.py:79: Initializing tf session
INFO:model.py:92: Reloading the latest trained model...
INFO:saver.py:1284: Restoring parameters from C:\Programmid\Miniconda3\envs\py37_estnltk_neural\lib\site-packages\estnltk\estnltk_resources\neural_morph_disamb\softmax_emb_cat_sum_2019-08-23\output\results\model.weights


layer name,attributes,parent,enveloping,ambiguous,span count
morph_softmax_emb_cat_sum,"morphtag, pos, form",words,,False,4

text,morphtag,pos,form
See,POS=P|NUMBER=sg|CASE=nom,P,sg n
on,POS=V|VERB_TYPE=main|MOOD=indic|TENSE=pres|PERSON=ps3|NUMBER=sg|VERB_PS=ps|VERB_POLARITY=af,V,b
lause,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom,S,sg n
.,POS=Z|PUNCT_TYPE=Fst,Z,


### `Seq2SeqEmbTagSumTagger`

In [4]:
from estnltk_neural.taggers import Seq2SeqEmbTagSumTagger

tagger.reset()
tagger = Seq2SeqEmbTagSumTagger('morph_seq2seq_emb_tag_sum')
tagger.tag(text)
text['morph_seq2seq_emb_tag_sum']

Instructions for updating:
dim is deprecated, use axis instead
INFO:model.py:147: Initializing tf session
INFO:model.py:160: Reloading the latest trained model...
INFO:tf_logging.py:115: Restoring parameters from /home/paul/Projects/estnltk/estnltk/taggers/neural_morph/new_neural_morph/seq2seq_emb_tag_sum/output/results/model.weights


layer name,attributes,parent,enveloping,ambiguous,span count
morph_seq2seq_emb_tag_sum,"morphtag, pos, form",words,,False,4

text,morphtag,pos,form
See,POS=P|NUMBER=sg|CASE=nom,P,sg n
on,POS=V|VERB_TYPE=main|MOOD=indic|TENSE=pres|PERSON=ps3|NUMBER=sg|VERB_PS=ps|VERB_POLARITY=af,V,b
lause,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom,S,sg n
.,POS=Z|PUNCT_TYPE=Fst,Z,


### `Seq2SeqEmbCatSumTagger`

In [5]:
from estnltk_neural.taggers import Seq2SeqEmbCatSumTagger

tagger.reset()
tagger = Seq2SeqEmbCatSumTagger('morph_seq2seq_emb_cat_sum')
tagger.tag(text)
text['morph_seq2seq_emb_cat_sum']

INFO:model.py:147: Initializing tf session
INFO:model.py:160: Reloading the latest trained model...
INFO:tf_logging.py:115: Restoring parameters from /home/paul/Projects/estnltk/estnltk/taggers/neural_morph/new_neural_morph/seq2seq_emb_cat_sum/output/results/model.weights


layer name,attributes,parent,enveloping,ambiguous,span count
morph_seq2seq_emb_cat_sum,"morphtag, pos, form",words,,False,4

text,morphtag,pos,form
See,POS=P|NUMBER=sg|CASE=nom,P,sg n
on,POS=V|VERB_TYPE=main|MOOD=indic|TENSE=pres|PERSON=ps3|NUMBER=sg|VERB_PS=ps|VERB_POLARITY=af,V,b
lause,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom,S,sg n
.,POS=Z|PUNCT_TYPE=Fst,Z,
