# Neural Morphological Tagger

Performs neural morphological tagging. The tagger takes vabamorf analyses as input and predicts morphological tags with better accuracy than vabamorf, but it uses a different tag set. A tagger can be obtained by calling one of the following, defined in `estnltk.taggers.neural_morph.new_neural_morph.neural_morph_tagger`; each one loads a neural model with a different configuration:

* `SoftmaxEmbTagSumTagger()`
* `SoftmaxEmbCatSumTagger()`
* `Seq2SeqEmbTagSumTagger()`
* `Seq2SeqEmbCatSumTagger()`

## Setup

Before using neural morphological tagging, you need to obtain one of the models and make the tagger aware of the model.
Models are not distributed with EstNLTK because of their large size.

1. Download the model zip file from:
https://entu.keeleressursid.ee/public-document/entity-7645

2. Find the location of EstNLTK's installation in your `conda` virtual environment. For instance, on Ubuntu it might look something like this: <br>
   `/home/{my_user}/conda_envs/py36/lib/python3.6/site-packages/estnltk-1.6.xb0-py3.6-linux-x86_64.egg/estnltk`

3. In EstNLTK's installation folder, there should be a subdirectory corresponding to the model. For instance, if the model was `softmax_emb_cat_sum`, then the directory should be: <br>
`.../taggers/neural_morph/new_neural_morph/softmax_emb_cat_sum` <br>
Unpack the zip file and copy its contents (namely, the `output` directory) to this directory.

4. You can use only one model at a time. Set the environment variable `NEURAL_MORPH_TAGGER_CONFIG` to point to the directory that contains the model and its configuration. For example, if the model is `softmax_emb_cat_sum`, then in an Ubuntu terminal you could do something like this: <br>
`export NEURAL_MORPH_TAGGER_CONFIG=/home/{my_user}/conda_envs/py36/lib/python3.6/site-packages/estnltk-1.6.xb0-py3.6-linux-x86_64.egg/estnltk/taggers/neural_morph/new_neural_morph/softmax_emb_cat_sum`

5. Setup is now complete. Below, you can try out the tagging example that corresponds to your model.
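
Steps 2 to 4 can also be sketched in Python. The helper names below (`model_config_dir`, `has_unpacked_model`, `configure_model`) are hypothetical and not part of EstNLTK; the sketch only builds the directory path described in step 3, checks that the `output` directory has been copied there, and sets `NEURAL_MORPH_TAGGER_CONFIG` for the current process. It assumes the tagger reads the variable after it has been set; if that does not hold in your installation, use `export` as in step 4.

```python
import os
from pathlib import Path

ENV_VAR = "NEURAL_MORPH_TAGGER_CONFIG"


def model_config_dir(estnltk_root, model_name):
    """Return the model directory described in step 3 (hypothetical helper)."""
    return (Path(estnltk_root) / "taggers" / "neural_morph"
            / "new_neural_morph" / model_name)


def has_unpacked_model(config_dir):
    """Check that the 'output' directory from the zip file is in place."""
    return (Path(config_dir) / "output").is_dir()


def configure_model(estnltk_root, model_name):
    """Point the environment variable at the model directory (step 4)."""
    config_dir = model_config_dir(estnltk_root, model_name)
    os.environ[ENV_VAR] = str(config_dir)
    return config_dir
```

If EstNLTK is importable, `estnltk_root` could be taken from `Path(estnltk.__file__).parent`. Calling `has_unpacked_model(configure_model(estnltk_root, "softmax_emb_cat_sum"))` before creating a tagger gives an early hint if step 3 was skipped.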

## `SoftmaxEmbTagSumTagger`

In [1]:
from estnltk import Text
from estnltk.taggers.neural_morph.new_neural_morph.neural_morph_tagger import SoftmaxEmbTagSumTagger

text = Text("See on lause.")
text.tag_layer(['morph_analysis'])
        
tagger = SoftmaxEmbTagSumTagger('morph_softmax_emb_tag_sum')
tagger.tag(text)

Loaded analyses: 341 from file /home/paul/Projects/estnltk/estnltk/taggers/neural_morph/new_neural_morph/softmax_emb_tag_sum/output/data/analysis.txt
Instructions for updating:
seq_dim is deprecated, use seq_axis instead
Instructions for updating:
batch_dim is deprecated, use batch_axis instead


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


INFO:model.py:140: Initializing tf session
INFO:model.py:153: Reloading the latest trained model...
INFO:tf_logging.py:115: Restoring parameters from /home/paul/Projects/estnltk/estnltk/taggers/neural_morph/new_neural_morph/softmax_emb_tag_sum/output/results/model.weights


text
See on lause.

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,1
tokens,,,,False,4
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,False,4
morph_analysis,"lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,4
morph_softmax_emb_tag_sum,"morphtag, pos, form",words,,False,4


Now the text object has a layer named `morph_softmax_emb_tag_sum`, which contains three attributes for every word: `morphtag` (the original tag predicted by the neural model), and `pos` and `form` (the morphtag converted into vabamorf format).

In [2]:
text['morph_softmax_emb_tag_sum']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_softmax_emb_tag_sum,"morphtag, pos, form",words,,False,4

text,morphtag,pos,form
See,POS=P|NUMBER=sg|CASE=nom,P,sg n
on,POS=V|VERB_TYPE=main|MOOD=indic|TENSE=pres|PERSON=ps3|NUMBER=sg|VERB_PS=ps|VERB_POLARITY=af,V,b
lause,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom,S,sg n
.,POS=Z|PUNCT_TYPE=Fst,Z,
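
Judging from the output above, a `morphtag` value is a sequence of `KEY=VALUE` pairs separated by `|`. The following is a minimal sketch of splitting such a string into a dictionary; `parse_morphtag` is a hypothetical helper, not an EstNLTK function, and it assumes every tag follows the pattern seen in this example.

```python
def parse_morphtag(morphtag):
    """Split a morphtag such as 'POS=P|NUMBER=sg|CASE=nom' into a dict."""
    features = {}
    for pair in morphtag.split("|"):
        if not pair:
            continue  # skip empty pieces, e.g. from an empty tag string
        # partition keeps everything after the first '=' as the value
        key, _, value = pair.partition("=")
        features[key] = value
    return features


noun = parse_morphtag("POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom")
print(noun["POS"], noun["CASE"])
```

A dictionary makes it easy to filter words by a single feature, for example selecting all words whose `CASE` is `nom`.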


## `SoftmaxEmbCatSumTagger`

If you want to load a new tagger with a different neural model, you first need to reset the previously loaded one by calling its `reset` method.

In [3]:
from estnltk.taggers.neural_morph.new_neural_morph.neural_morph_tagger import SoftmaxEmbCatSumTagger

tagger.reset()
tagger = SoftmaxEmbCatSumTagger('morph_softmax_emb_cat_sum')
tagger.tag(text)
text['morph_softmax_emb_cat_sum']

INFO:model.py:140: Initializing tf session
INFO:model.py:153: Reloading the latest trained model...
INFO:tf_logging.py:115: Restoring parameters from /home/paul/Projects/estnltk/estnltk/taggers/neural_morph/new_neural_morph/softmax_emb_cat_sum/output/results/model.weights


layer name,attributes,parent,enveloping,ambiguous,span count
morph_softmax_emb_cat_sum,"morphtag, pos, form",words,,False,4

text,morphtag,pos,form
See,POS=P|NUMBER=sg|CASE=nom,P,sg n
on,POS=V|VERB_TYPE=main|MOOD=indic|TENSE=pres|PERSON=ps3|NUMBER=sg|VERB_PS=ps|VERB_POLARITY=af,V,b
lause,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom,S,sg n
.,POS=Z|PUNCT_TYPE=Fst,Z,


## `Seq2SeqEmbTagSumTagger`

In [4]:
from estnltk.taggers.neural_morph.new_neural_morph.neural_morph_tagger import Seq2SeqEmbTagSumTagger

tagger.reset()
tagger = Seq2SeqEmbTagSumTagger('morph_seq2seq_emb_tag_sum')
tagger.tag(text)
text['morph_seq2seq_emb_tag_sum']

Instructions for updating:
dim is deprecated, use axis instead
INFO:model.py:147: Initializing tf session
INFO:model.py:160: Reloading the latest trained model...
INFO:tf_logging.py:115: Restoring parameters from /home/paul/Projects/estnltk/estnltk/taggers/neural_morph/new_neural_morph/seq2seq_emb_tag_sum/output/results/model.weights


layer name,attributes,parent,enveloping,ambiguous,span count
morph_seq2seq_emb_tag_sum,"morphtag, pos, form",words,,False,4

text,morphtag,pos,form
See,POS=P|NUMBER=sg|CASE=nom,P,sg n
on,POS=V|VERB_TYPE=main|MOOD=indic|TENSE=pres|PERSON=ps3|NUMBER=sg|VERB_PS=ps|VERB_POLARITY=af,V,b
lause,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom,S,sg n
.,POS=Z|PUNCT_TYPE=Fst,Z,


## `Seq2SeqEmbCatSumTagger`

In [5]:
from estnltk.taggers.neural_morph.new_neural_morph.neural_morph_tagger import Seq2SeqEmbCatSumTagger

tagger.reset()
tagger = Seq2SeqEmbCatSumTagger('morph_seq2seq_emb_cat_sum')
tagger.tag(text)
text['morph_seq2seq_emb_cat_sum']

INFO:model.py:147: Initializing tf session
INFO:model.py:160: Reloading the latest trained model...
INFO:tf_logging.py:115: Restoring parameters from /home/paul/Projects/estnltk/estnltk/taggers/neural_morph/new_neural_morph/seq2seq_emb_cat_sum/output/results/model.weights


layer name,attributes,parent,enveloping,ambiguous,span count
morph_seq2seq_emb_cat_sum,"morphtag, pos, form",words,,False,4

text,morphtag,pos,form
See,POS=P|NUMBER=sg|CASE=nom,P,sg n
on,POS=V|VERB_TYPE=main|MOOD=indic|TENSE=pres|PERSON=ps3|NUMBER=sg|VERB_PS=ps|VERB_POLARITY=af,V,b
lause,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom,S,sg n
.,POS=Z|PUNCT_TYPE=Fst,Z,
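
The last three examples repeat the same steps: reset the current tagger, then create the next one. A small wrapper can keep these steps together. This is only a sketch: `switch_tagger` is a hypothetical helper, and the only behaviour it relies on is the `reset` method and the single layer-name argument used in the examples above.

```python
def switch_tagger(current_tagger, make_tagger, layer_name):
    """Reset the currently loaded tagger (if any) and create a new one.

    make_tagger is any callable that accepts a layer name, such as
    Seq2SeqEmbCatSumTagger in the examples above.
    """
    if current_tagger is not None:
        # Release the previously loaded neural model first.
        current_tagger.reset()
    return make_tagger(layer_name)


# Intended usage (requires EstNLTK and a configured model):
# tagger = switch_tagger(tagger, Seq2SeqEmbCatSumTagger,
#                        'morph_seq2seq_emb_cat_sum')
```

Passing `None` as `current_tagger` covers the first tagger created in a session, when nothing has been loaded yet.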
