# API for taggers and retaggers
## Tagger
Taggers are used to create layers. A tagger is a properly implemented subclass of the `Tagger` class. There are several taggers in [estnltk/taggers](https://github.com/estnltk/estnltk/tree/devel_1.6/tutorials/taggers), but anyone can write their own. To create a tagger, take the following steps:
1. create a subclass of `Tagger`,
1. list all the configuration attribute names of the tagger in `conf_param: Sequence[str]`,
1. store the output layer name in `output_layer: str`,
1. list all attribute names of the output layer in `output_attributes: Sequence[str]`,
1. list all layer names that are needed by the tagger as an input in `input_layers: Sequence[str]`,
1. define `__init__(self, ...)` initializing all attributes in `conf_param`,
1. define `_make_layer(self, text, layers: Mapping[str, Layer], status: dict=None) -> Layer`, where `text` is the `Text` object being tagged (the examples below read the raw string from `text.text`).

Note that the **status parameter is deprecated**. To store any metadata use `layer.meta` instead.

The mapping `layers` is assumed to contain all `input_layers`, but it may contain other layers too.
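The steps above follow a template-method pattern: the base class does the bookkeeping and each subclass only supplies its configuration and `_make_layer`. The following library-free sketch mimics that contract so it can be run without EstNLTK. `ToyLayer`, `ToyTagger`, `CapitalWordTagger` and the missing-input-layer check are hypothetical simplifications, not EstNLTK's actual implementation.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


class ToyLayer:
    """Hypothetical stand-in for estnltk.Layer: a name, attributes and spans."""

    def __init__(self, name: str, attributes: Sequence[str] = ()):
        self.name = name
        self.attributes = tuple(attributes)
        self.spans: List[Tuple[int, int]] = []
        self.meta: Dict[str, str] = {}


class ToyTagger:
    """Hypothetical base class mimicking the Tagger contract."""

    conf_param: Sequence[str] = ()
    output_layer: str = ''
    output_attributes: Sequence[str] = ()
    input_layers: Sequence[str] = ()

    def make_layer(self, raw_text: str, layers: Mapping[str, ToyLayer]) -> ToyLayer:
        # Every input layer must be present; extra layers are simply ignored.
        missing = [name for name in self.input_layers if name not in layers]
        if missing:
            raise ValueError('missing input layers: {}'.format(missing))
        return self._make_layer(raw_text, layers)

    def _make_layer(self, raw_text: str, layers: Mapping[str, ToyLayer]) -> ToyLayer:
        raise NotImplementedError  # step 7: each subclass supplies this


class CapitalWordTagger(ToyTagger):
    """Keeps the words that start with an upper-case letter."""

    conf_param = ('min_length',)            # step 2
    output_layer = 'capital_words'          # step 3
    output_attributes = ()                  # step 4
    input_layers = ('words',)               # step 5

    def __init__(self, min_length: int = 1):
        self.min_length = min_length        # step 6: initialise every conf_param

    def _make_layer(self, raw_text, layers):
        layer = ToyLayer(self.output_layer, self.output_attributes)
        for start, end in layers['words'].spans:
            word = raw_text[start:end]
            if word[:1].isupper() and len(word) >= self.min_length:
                layer.spans.append((start, end))
        layer.meta['source'] = 'words'      # metadata goes to meta, not status
        return layer


text = 'Tere Mari ja Jüri'
words = ToyLayer('words')
words.spans = [(0, 4), (5, 9), (10, 12), (13, 17)]
# The extra 'other' layer is allowed: only the input layers are required.
layer = CapitalWordTagger().make_layer(text, {'words': words, 'other': ToyLayer('other')})
print(layer.spans)  # [(0, 4), (5, 9), (13, 17)]
```

Keeping the input-layer check in the base class means no subclass has to repeat it; a subclass only describes what it produces and how.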

Suppose we have an initialized tagger `tagger` and a `Text` object `text` that has all the necessary layers.
Then
```python
tagger.tag(text)
# or just
tagger(text)
```
creates a new layer and adds it to `text`. To create a new layer without adding it to the `text` object, write
```python
layer = tagger.make_layer(text, layers)
```
where 
```python
layers = text.layers
```
or any other `dict` of layers, not necessarily attached to `text`.
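The only difference between `tag` and `make_layer` is whether the result is attached to the text. The sketch below models that split without EstNLTK: `ToyText`, `DigitTagger` and the span-list representation of a layer are hypothetical stand-ins, and a plain `dict` plays the role of `text.layers`.

```python
from typing import Dict, List, Mapping, Tuple

SpanList = List[Tuple[int, int]]  # hypothetical: a layer reduced to its spans


class ToyText:
    """Hypothetical stand-in for estnltk.Text."""

    def __init__(self, text: str):
        self.text = text
        self.layers: Dict[str, SpanList] = {}


class DigitTagger:
    """Hypothetical tagger that marks every digit."""

    output_layer = 'digits'

    def _make_layer(self, text: ToyText, layers: Mapping[str, SpanList]) -> SpanList:
        return [(i, i + 1) for i, c in enumerate(text.text) if c.isdigit()]

    def make_layer(self, text: ToyText, layers: Mapping[str, SpanList]) -> SpanList:
        # Builds the layer from the given mapping; text.layers is not modified.
        return self._make_layer(text, layers)

    def tag(self, text: ToyText) -> ToyText:
        # Builds the layer from text.layers and attaches it to the text.
        text.layers[self.output_layer] = self.make_layer(text, text.layers)
        return text

    __call__ = tag  # tagger(text) behaves like tagger.tag(text)


text = ToyText('a1b22')
tagger = DigitTagger()
detached = tagger.make_layer(text, dict(text.layers))
print('digits' in text.layers)  # False
tagger(text)
print(text.layers['digits'])    # [(1, 2), (3, 4), (4, 5)]
```

Passing a copy such as `dict(text.layers)` to `make_layer` is a convenient way to experiment with alternative input layers without touching the text itself.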

The pretty-printed representation of a tagger object starts with the first non-empty line of its docstring. The attributes in `conf_param` are printed in the given order; protected attributes (whose names start with `_`) are not included.
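The printing rule can be restated as a small function. This is only a sketch of the rule as described, not EstNLTK's actual `repr` code; `describe` and `ExampleTagger` are hypothetical names used for illustration.

```python
from typing import List, Tuple


def describe(tagger) -> Tuple[str, List[Tuple[str, object]]]:
    """Return (title, [(param, value), ...]) following the printing rule."""
    doc = type(tagger).__doc__ or ''
    # First non-empty line of the docstring, stripped of indentation.
    title = next((line.strip() for line in doc.splitlines() if line.strip()), '')
    # conf_param in declared order, skipping protected names that start with '_'.
    params = [(name, getattr(tagger, name))
              for name in tagger.conf_param if not name.startswith('_')]
    return title, params


class ExampleTagger:
    """
    Tags example things.

    The rest of the docstring is not shown.
    """
    conf_param = ('pattern', '_cache', 'lowercase')

    def __init__(self):
        self.pattern = r'\d+'
        self._cache = {}
        self.lowercase = True


print(describe(ExampleTagger()))
# ('Tags example things.', [('pattern', '\\d+'), ('lowercase', True)])
```

This is why a docstring that begins with a blank line, as in `MinimalTagger`, still produces a meaningful title.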

### Minimal tagger example

This tagger creates an empty layer named `minimal`, with no attributes and no spans.

```python
from estnltk import Span, Layer, Text
from estnltk.taggers import Tagger

class MinimalTagger(Tagger):
    """
    Minimal tagger example.
    """
    conf_param = ()
    output_layer = 'minimal'
    output_attributes = ()
    input_layers = ()

    def __init__(self):
        pass

    def _make_layer(self, text, layers, status=None):
        return Layer(name=self.output_layer, text_object=text)

minimal_tagger = MinimalTagger()
minimal_tagger
```

| name | output layer | output attributes | input layers |
|---|---|---|---|
| MinimalTagger | minimal | () | () |

```python
text = Text('tere')
minimal_tagger.tag(text)

text
```

tere

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| minimal |  |  |  | False | 0 |

```python
text.minimal
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| minimal |  |  |  | False | 0 |

### Longer example

This tagger finds integers, optionally preceded by a minus sign, in the text and records a message in the layer metadata.

```python
import regex as re


class NumberTagger(Tagger):
    """Tags numbers."""

    conf_param = ['regex']

    def __init__(self,
                 output_layer='numbers',
                 output_attributes=(),
                 input_layers=()
                ):
        self.output_layer = output_layer
        self.output_attributes = output_attributes
        self.input_layers = input_layers
        # Raw string: '\d' is an invalid escape sequence in a plain string literal
        self.regex = re.compile(r'-?\d+')

    def _make_layer(self, text, layers, status=None):
        layer = Layer(self.output_layer, text_object=text)
        # One span per match; (start, end) are character offsets in the raw text
        for m in self.regex.finditer(text.text):
            layer.add_annotation((m.start(), m.end()))
        # Metadata is stored in layer.meta (the status parameter is deprecated)
        layer.meta['NumberTagger message'] = 'successfully created {!r} layer'.format(self.output_layer)
        return layer

number_tagger = NumberTagger()
number_tagger
```

| name | output layer | output attributes | input layers |
|---|---|---|---|
| NumberTagger | numbers | () | () |

| configuration parameter | value |
|---|---|
| regex | `<Regex -?\d+>` |

```python
text = Text('-123,45')
number_tagger(text)
text.numbers
```

| meta key | value |
|---|---|
| NumberTagger message | successfully created 'numbers' layer |

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| numbers |  |  |  | False | 2 |

| text |
|---|
| -123 |
| 45 |
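To see which spans `NumberTagger` adds, the pattern can be checked on its own. This sketch uses the standard-library `re` module instead of the third-party `regex` package imported above; for this simple pattern both produce the same matches.

```python
import re

pattern = re.compile(r'-?\d+')  # same pattern as NumberTagger.regex
text = '-123,45'
spans = [(m.start(), m.end()) for m in pattern.finditer(text)]
print(spans)                          # [(0, 4), (5, 7)]
print([text[s:e] for s, e in spans])  # ['-123', '45']
```

Because the comma is not part of the pattern, `-123,45` is split into two numbers rather than read as a decimal fraction, which is why the layer above has two spans.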

## Retagger
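A **retagger** modifies an existing layer instead of creating a new one. It is a subclass of `Retagger` that defines `_change_layer` in place of `_make_layer`, and it is applied with `retag`. The following retagger adds a `value` attribute to the `numbers` layer.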
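The key difference from a tagger is that a retagger edits the layer in place and returns nothing. The library-free sketch below mirrors what `EvaluatingRetagger` does; `ToySpan`, `ToyLayer` and `change_layer` are hypothetical stand-ins, not EstNLTK classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToySpan:
    start: int
    end: int
    annotation: Dict[str, object] = field(default_factory=dict)


@dataclass
class ToyLayer:
    name: str
    attributes: Tuple[str, ...] = ()
    spans: List[ToySpan] = field(default_factory=list)
    meta: Dict[str, str] = field(default_factory=dict)


def change_layer(raw_text: str, layer: ToyLayer) -> None:
    """Adds a 'value' attribute to an existing layer in place."""
    layer.attributes += ('value',)  # extend the attribute list first
    for span in layer.spans:
        span.annotation['value'] = int(raw_text[span.start:span.end])
    layer.meta['message'] = "successfully added 'value' attribute"


text = '-123,45'
numbers = ToyLayer('numbers', spans=[ToySpan(0, 4), ToySpan(5, 7)])
result = change_layer(text, numbers)
print(result)  # None: the layer was changed in place
print(numbers.attributes, [s.annotation['value'] for s in numbers.spans])
# ('value',) [-123, 45]
```

Since the same layer object is modified, any earlier metadata (such as the tagger's message) is kept alongside the new entry.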

```python
from estnltk.taggers import Retagger


class EvaluatingRetagger(Retagger):
    """Evaluates parsed numbers in the input layer."""
    conf_param = ()

    def __init__(self, output_layer='numbers', input_layers=('numbers',)):
        self.output_layer = output_layer
        self.input_layers = input_layers
        self.output_attributes = ['value']

    def _change_layer(self, raw_text, layers, status):
        # The layer is changed in place; nothing is returned
        layer = layers[self.output_layer]
        # Register the new attribute before assigning values to spans
        layer.attributes += tuple(self.output_attributes)
        for span in layers[self.input_layers[0]]:
            span.value = int(span.text)
        layer.meta['EvaluatingRetagger message'] = "successfully added 'value' attribute"


evaluating_retagger = EvaluatingRetagger('numbers')
evaluating_retagger
```

| name | output layer | output attributes | input layers |
|---|---|---|---|
| EvaluatingRetagger | numbers | ('value',) | ('numbers',) |

```python
evaluating_retagger.retag(text)

text
```

-123,45

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| numbers | value |  |  | False | 2 |

```python
text.numbers
```

| meta key | value |
|---|---|
| EvaluatingRetagger message | successfully added 'value' attribute |
| NumberTagger message | successfully created 'numbers' layer |

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| numbers | value |  |  | False | 2 |

| text | value |
|---|---|
| -123 | -123 |
| 45 | 45 |
