# Neural Morphological Tagger

The neural morphological tagger performs morphological analysis and disambiguation. Unlike `VabamorfTagger`, which outputs ambiguous results in some cases, the neural tagger always returns exactly one analysis per word.

The default model was trained on the Morphologically Disambiguated Corpus [1] and achieves an accuracy of 98.02%. As a consequence, the neural tagger uses a tag set [2] that is not compatible with Vabamorf's own.

For technical details, see the paper [3], where the current model is referred to as the multiclass (MC) model.

1. Morphologically Disambiguated Corpus: http://www.cl.ut.ee/korpused/morfkorpus/index.php?lang=en
2. Morpho-syntactic categories: http://www.cl.ut.ee/korpused/morfliides/seletus
3. Tkachenko, A. and Sirts, K. (2018, September). Neural Morphological Tagging for Estonian. In BalticHLT.


## Usage


Preliminary steps:
* Install TensorFlow 1.4.0:
    * using conda: `conda install -c conda-forge tensorflow==1.4.0`
    * using pip: `pip install tensorflow==1.4.0`
* Download the pre-trained model from `http://kodu.ut.ee/~distorti/estnltk/neural_morph_tagger/models/md-mc-emb-tag.tar.gz` and unpack the archive.
* Provide a configuration file that specifies the location of your model: make a copy of the default configuration file `estnltk/estnltk/neural_morph/config.py` and edit its `model_dir` attribute.
* Set the environment variable `NEURAL_MORPH_TAGGER_CONFIG` to the path of your configuration file.
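The download, unpacking, and environment-variable steps above can also be scripted. The following is a minimal sketch, not part of EstNLTK: the helper names `download_model`, `extract_model`, and `point_tagger_at_config` are hypothetical, and the layout of the unpacked archive is not assumed, so you still need to edit `model_dir` in your copy of the configuration file by hand.

```python
import os
import tarfile
import urllib.request
from pathlib import Path

# URL of the pre-trained model, as given in the preliminary steps.
MODEL_URL = ("http://kodu.ut.ee/~distorti/estnltk/neural_morph_tagger/"
             "models/md-mc-emb-tag.tar.gz")


def download_model(url, archive_path):
    """Download the model archive unless it is already on disk (hypothetical helper)."""
    archive_path = Path(archive_path)
    if not archive_path.exists():
        archive_path.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(archive_path))
    return archive_path


def extract_model(archive_path, target_dir):
    """Unpack a .tar.gz archive into target_dir; only use this on archives you trust."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "r:gz") as archive:
        archive.extractall(str(target_dir))
    return target_dir


def point_tagger_at_config(config_path):
    """Set NEURAL_MORPH_TAGGER_CONFIG to an existing configuration file."""
    config_path = Path(config_path).expanduser().resolve()
    if not config_path.is_file():
        raise FileNotFoundError("Configuration file not found: {}".format(config_path))
    os.environ["NEURAL_MORPH_TAGGER_CONFIG"] = str(config_path)
    return str(config_path)
```

A typical sequence would be to call `download_model(MODEL_URL, ...)`, then `extract_model(...)`, then edit `model_dir` in your configuration copy, and finally call `point_tagger_at_config(...)` before creating the tagger.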

Finally, you can run the tagger:

In [1]:
import os
import pandas as pd
from estnltk.taggers.neural_morph.old_neural_morph.neural_morph_tagger import NeuralMorphTagger
from estnltk import Text

# Point the tagger at your configuration file (adjust the path to your own copy).
os.environ['NEURAL_MORPH_TAGGER_CONFIG'] = os.path.expanduser('~/neural_morph_tagger_config.py')

In [2]:
tagger = NeuralMorphTagger()

Instructions for updating:
seq_dim is deprecated, use seq_axis instead
Instructions for updating:
batch_dim is deprecated, use batch_axis instead


  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


INFO:model.py:476: Initializing tf session
INFO:model.py:489: Reloading the latest trained model...
INFO:tf_logging.py:115: Restoring parameters from /home/paul/Projects/estnltk_neural_morph_model/model/results/model.weights


In [3]:
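text = Text("Eberhardt'i poolt katsetatud prototüübi baasil loodud masin")
# Run Vabamorf's morphological analysis first; the neural layer is attached to it.
text.tag_layer(["morph_analysis"])
# Add the neural_morph_analysis layer.
tagger.tag(text)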

text
Eberhardt'i poolt katsetatud prototüübi baasil loodud masin

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,1
tokens,,,,False,9
compound_tokens,"type, normalized",,tokens,False,1
words,normalized_form,,,False,7
morph_analysis,"lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,7
neural_morph_analysis,morphtag,morph_analysis,,False,7


The morphological tags can now be accessed using the `morphtag` attribute:

In [4]:
pd.DataFrame(data=[(w.text, w.neural_morph_analysis.morphtag, w.partofspeech, w.form) for w in text.words],
             columns=["word", "morphtag", "partofspeech", "form"])

,word,morphtag,partofspeech,form
0,Eberhardt'i,_S_|prop|sg|gen,(H),(sg g)
1,poolt,_K_|post,(K),()
2,katsetatud,_A_|pos,"(A, A, A, V)","(, sg n, pl n, tud)"
3,prototüübi,_S_|com|sg|gen,(S),(sg g)
4,baasil,_S_|com|sg|ad,(S),(sg ad)
5,loodud,_A_|pos,"(A, A, V, A)","(, sg n, tud, pl n)"
6,masin,_S_|com|sg|nom,(S),(sg n)


Note that the neural tagger correctly analysed the words *katsetatud* and *loodud*, for which Vabamorf produced ambiguous results.
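For downstream processing it can be handy to split a `morphtag` string into its parts. The sketch below assumes, based only on the tags shown in the table above, that fields are separated by `|` and that the first field is the part of speech wrapped in underscores; consult the tag set description [2] for the authoritative format. The names `ParsedTag`, `parse_morphtag`, and `is_ambiguous` are hypothetical helpers, not part of EstNLTK.

```python
from typing import List, NamedTuple, Sequence


class ParsedTag(NamedTuple):
    partofspeech: str
    features: List[str]


def parse_morphtag(morphtag: str) -> ParsedTag:
    """Split a tag such as '_S_|com|sg|gen' into part of speech and features."""
    fields = morphtag.split("|")
    # Assumption: the first field is the part of speech, e.g. '_S_' -> 'S'.
    partofspeech = fields[0].strip("_")
    return ParsedTag(partofspeech, fields[1:])


def is_ambiguous(analyses: Sequence[str]) -> bool:
    """True if Vabamorf returned more than one analysis for a word."""
    return len(analyses) > 1


print(parse_morphtag("_S_|prop|sg|gen"))
# ParsedTag(partofspeech='S', features=['prop', 'sg', 'gen'])
print(parse_morphtag("_A_|pos"))
# ParsedTag(partofspeech='A', features=['pos'])

# Vabamorf's part-of-speech values for 'katsetatud' and 'masin' from the table above.
print(is_ambiguous(("A", "A", "A", "V")))  # True
print(is_ambiguous(("S",)))                # False
```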

## Training
    
To train your own model, first, download the training data from http://kodu.ut.ee/~distorti/estnltk/neural_morph_tagger/data/md/data.tar.gz.

Second, download pre-trained fastText word embeddings from https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md.

Third, create a configuration file: make a copy of the default configuration file `estnltk/estnltk/neural_morph/config.py` and edit the following attributes:
* `data_dir` - the folder containing the train/dev/test files (e.g. `resources/data/md`)
* `embeddings_file` - the path to the embeddings file
* `model_dir` - the directory where the model will be saved
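Before starting a long training run, it may be worth checking that the three paths you entered actually exist. The following is a minimal sketch under the assumption that you pass in the same values you wrote into the configuration file; `check_training_paths` is a hypothetical helper, and it deliberately does not assume any particular file names inside `data_dir`.

```python
from pathlib import Path
from typing import List


def check_training_paths(data_dir, embeddings_file, model_dir) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not Path(data_dir).is_dir():
        problems.append("data_dir is not a directory: {}".format(data_dir))
    if not Path(embeddings_file).is_file():
        problems.append("embeddings_file is not a file: {}".format(embeddings_file))
    model_path = Path(model_dir)
    # model_dir may not exist yet, but it must not be an existing regular file.
    if model_path.exists() and not model_path.is_dir():
        problems.append("model_dir exists but is not a directory: {}".format(model_dir))
    return problems
```

For example, `check_training_paths("resources/data/md", "embeddings.vec", "model")` returns an empty list when everything is in place; the embeddings file name here is only a placeholder.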

Finally, you can train the model:

    python estnltk/estnltk/neural_morph/scripts/train.py --config <configuration-file>

To evaluate the model on the test set, run:
    
    python estnltk/estnltk/neural_morph/scripts/evaluate.py --config <configuration-file> --test
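If you prefer to drive both steps from Python, the two commands above can be wrapped with `subprocess`. This is a sketch only: `build_command` and `train_then_evaluate` are hypothetical helpers, the script paths are copied from the commands above and assume you run from the same working directory, and only the `--config` and `--test` flags shown in this document are used.

```python
import subprocess
import sys

TRAIN_SCRIPT = "estnltk/estnltk/neural_morph/scripts/train.py"
EVALUATE_SCRIPT = "estnltk/estnltk/neural_morph/scripts/evaluate.py"


def build_command(script, config_file, test=False):
    """Build the argument list for one of the scripts, using the current interpreter."""
    command = [sys.executable, script, "--config", str(config_file)]
    if test:
        # --test is shown above only for the evaluation script.
        command.append("--test")
    return command


def train_then_evaluate(config_file):
    """Train the model, then evaluate it on the test set; stop on the first failure."""
    subprocess.run(build_command(TRAIN_SCRIPT, config_file), check=True)
    subprocess.run(build_command(EVALUATE_SCRIPT, config_file, test=True), check=True)
```

With `check=True`, a non-zero exit code from training raises `subprocess.CalledProcessError`, so evaluation is not attempted on a model that failed to train.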

## Unit Tests

To run the unit tests for the morphological tagger, set the environment variable `NEURAL_MORPH_TAGGER_CONFIG` as explained above (otherwise the tests will be skipped) and run the following command:
    
    python -m unittest discover estnltk.tests.test_taggers.test_neural_morf_tagger -vvv
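The skipping behaviour mentioned above can be illustrated with the standard `unittest` module. The sketch below is not the actual EstNLTK test code; it only shows one common way to skip a test when `NEURAL_MORPH_TAGGER_CONFIG` is not set. The class `ExampleConfigTest` and the function `run_example_tests` are hypothetical.

```python
import os
import unittest

CONFIG_VARIABLE = "NEURAL_MORPH_TAGGER_CONFIG"


class ExampleConfigTest(unittest.TestCase):
    def setUp(self):
        # Checked at run time, so the outcome reflects the current environment.
        if CONFIG_VARIABLE not in os.environ:
            self.skipTest("{} is not set".format(CONFIG_VARIABLE))

    def test_config_file_exists(self):
        self.assertTrue(os.path.isfile(os.environ[CONFIG_VARIABLE]))


def run_example_tests():
    """Run the example test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleConfigTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)
```

When the variable is unset, the result reports the test as skipped rather than failed, which is why a run with only skipped tests still finishes successfully.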