# Gap tagging
Gaps in layers can be found using ``GapTagger`` or ``EnvelopingGapTagger``.
## GapTagger
``GapTagger`` tags the gaps of its input layers, which can be of any type. The resulting gaps layer is a simple layer of text spans. A gap is a maximal span of consecutive characters that are not covered by any span of any input layer. A character is covered by a span if it lies between the start and end of that span. Consequently, for an enveloping layer, the text between the spans enveloped by a single enveloping span counts as covered and is not reported as a gap.
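
The definition above can be sketched in plain Python without EstNLTK. This is only an illustration of the gap definition, not the library's implementation: `find_gaps` is a hypothetical helper, and spans are represented as bare `(start, end)` offset pairs that mirror the two test layers built in the cells below.

```python
from typing import List, Tuple

def find_gaps(spans: List[Tuple[int, int]], text_length: int) -> List[Tuple[int, int]]:
    """Return the maximal (start, end) ranges not covered by any span.

    Hypothetical helper for illustration; not part of EstNLTK.
    """
    gaps = []
    cursor = 0  # first character not yet known to be covered
    for start, end in sorted(spans):
        if start > cursor:
            # Nothing covers the characters between cursor and start.
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < text_length:
        gaps.append((cursor, text_length))
    return gaps

raw_text = 'Üks kaks kolm neli viis kuus seitse.'
test_1 = [(4, 8), (9, 13), (24, 28)]   # same offsets as layer test_1
test_2 = [(4, 8), (9, 18), (35, 36)]   # same offsets as layer test_2

# Spans of all input layers are pooled before looking for gaps.
gaps = find_gaps(test_1 + test_2, len(raw_text))
gap_texts = [raw_text[start:end] for start, end in gaps]
print(gaps)
print(gap_texts)
```

Pooling the spans of all layers before the sweep reflects the requirement that a gap must be uncovered in every input layer.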

The gaps can be trimmed using a trim function and annotated using a decorator function.
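
As a rough sketch of how trimming and decorating could interact, the plain-Python snippet below shrinks each gap to its trimmed text, drops gaps that become empty, and attaches the decorator's attributes. It assumes that the trim function returns a contiguous substring of the gap text; `trim_and_decorate` is a hypothetical helper and the gap offsets are written out by hand for the sample sentence used in this section, so this should not be read as the tagger's actual code.

```python
from typing import Callable, Dict, List, Tuple

def trim_and_decorate(raw_text: str,
                      gaps: List[Tuple[int, int]],
                      trim: Callable[[str], str],
                      decorator: Callable[[str], Dict]) -> List[Tuple[int, int, Dict]]:
    """Hypothetical helper: trim each gap, then annotate what is left."""
    result = []
    for start, end in gaps:
        gap_text = raw_text[start:end]
        trimmed = trim(gap_text)
        if not trimmed:
            continue  # the gap vanished after trimming
        offset = gap_text.find(trimmed)
        if offset < 0:
            # Assumption violated: trimmed text is not a substring of the gap.
            raise ValueError('trim must return a substring of its input')
        new_start = start + offset
        new_end = new_start + len(trimmed)
        result.append((new_start, new_end, decorator(raw_text[new_start:new_end])))
    return result

raw_text = 'Üks kaks kolm neli viis kuus seitse.'
untrimmed_gaps = [(0, 4), (8, 9), (18, 24), (28, 35)]  # hand-written offsets

trimmed_gaps = trim_and_decorate(raw_text,
                                 untrimmed_gaps,
                                 trim=str.strip,
                                 decorator=lambda t: {'gap_length': len(t)})
for start, end, attributes in trimmed_gaps:
    print(repr(raw_text[start:end]), attributes)
```

In this sketch the gap that consists of a single space disappears after trimming, which is why four untrimmed gaps yield only three trimmed ones.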

In [1]:
from estnltk import Text, Layer

text = Text('Üks kaks kolm neli viis kuus seitse.')
layer_1 = Layer('test_1')
layer_1.add_annotation((4, 8))
layer_1.add_annotation((9, 13))
layer_1.add_annotation((24, 28))
text.add_layer(layer_1)

layer_2 = Layer('test_2')
layer_2.add_annotation((4, 8))
layer_2.add_annotation((9, 18))
layer_2.add_annotation((35, 36))
text.add_layer(layer_2)

### Example 1

In [2]:
from estnltk.taggers import GapTagger
gap_tagger = GapTagger('simple_gaps', ['test_1', 'test_2'])
gap_tagger.tag(text)
text.simple_gaps

layer name,attributes,parent,enveloping,ambiguous,span count
simple_gaps,,,,False,4

text
"Üks "
" "
" viis "
" seitse"


The following diagram shows the input layers together with the output of Example 1 (`simple_gaps`, untrimmed) and Example 2 (`gaps`, trimmed).

    text:           'Üks kaks kolm neli viis kuus seitse.'
    test_1:             'kaks'kolm'         'kuus'      
    test_2:             'kaks'kolm neli'               '.'
    simple_gaps:    'Üks '  ' '       ' viis '  ' seitse'
    gaps:           'Üks'              'viis'    'seitse'

### Example 2

In [3]:
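def trim(t: str) -> str:
    # Strip leading and trailing whitespace from the gap text.
    return t.strip()

def decorator(text: str) -> dict:
    # Annotate each gap with the length of its trimmed text.
    return {'gap_length': len(text)}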

gap_tagger = GapTagger(output_layer='gaps',
                       input_layers=['test_1', 'test_2'],
                       trim=trim,
                       decorator=decorator,
                       output_attributes=['gap_length'])
gap_tagger

name,output layer,output attributes,input layers
GapTagger,gaps,"('gap_length',)","('test_1', 'test_2')"

0,1
decorator,<function __main__.decorator>
trim,<function __main__.trim>
ambiguous,False


In [4]:
gap_tagger.tag(text)
text.gaps

layer name,attributes,parent,enveloping,ambiguous,span count
gaps,gap_length,,,False,3

text,gap_length
Üks,3
viis,4
seitse,6


## EnvelopingGapTagger

``EnvelopingGapTagger`` tags the gaps of enveloping layers. All input layers must envelop the same layer, and they can be ambiguous or unambiguous. The resulting gaps layer is an unambiguous enveloping layer. A gap is a maximal ``SpanList`` of consecutive spans of the enveloped layer that are not enveloped by any input layer.

The gaps can be annotated using a decorator function.
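
The enveloping case can be sketched in plain Python by using word indices in place of spans. This is a minimal illustration of the definition, not the library's implementation: `find_enveloping_gaps` is a hypothetical helper, and the two input lists mirror the `test_3` and `test_4` layers built in the cells below.

```python
from typing import List

def find_enveloping_gaps(n_spans: int, layers: List[List[List[int]]]) -> List[List[int]]:
    """Return maximal runs of consecutive span indices enveloped by no layer.

    Hypothetical helper for illustration; not part of EstNLTK.
    """
    covered = set()
    for layer in layers:
        for enveloping_span in layer:
            covered.update(enveloping_span)

    gaps, current = [], []
    for index in range(n_spans):
        if index in covered:
            if current:
                gaps.append(current)  # a covered span closes the running gap
                current = []
        else:
            current.append(index)
    if current:
        gaps.append(current)
    return gaps

words = ['Üks', 'kaks', 'kolm', 'neli', 'viis', 'kuus', 'seitse', '.']
test_3 = [[0, 1], [3]]   # envelops 'Üks kaks' and 'neli'
test_4 = [[3, 4]]        # envelops 'neli viis'

gap_indices = find_enveloping_gaps(len(words), [test_3, test_4])
gap_words = [[words[i] for i in gap] for gap in gap_indices]
# A decorator receives the spans of one gap and returns its attributes.
gap_attributes = [{'gap_word_count': len(gap)} for gap in gap_words]
print(gap_words)
print(gap_attributes)
```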

In [5]:
from estnltk import EnvelopingSpan

text = Text('Üks kaks kolm neli viis kuus seitse.')
text.tag_layer(['words'])

layer = Layer('test_3', enveloping='words', text_object=text)

layer.add_annotation(text.words[0:2])
layer.add_annotation(text.words[3:4])

text.add_layer(layer)
text.test_3

layer name,attributes,parent,enveloping,ambiguous,span count
test_3,,,words,False,2

text
"['Üks', 'kaks']"
['neli']


In [6]:
layer = Layer('test_4', enveloping='words', ambiguous=True)

layer.add_annotation(text.words[3:5])

text.add_layer(layer)
text.test_4

layer name,attributes,parent,enveloping,ambiguous,span count
test_4,,,words,True,1

text
"['neli', 'viis']"


In [7]:
from estnltk.taggers import EnvelopingGapTagger

def decorator(spans):
    # Annotate each gap with the number of enveloped words it contains.
    return {'gap_word_count': len(spans)}

gap_tagger = EnvelopingGapTagger(output_layer='gaps',
                                 layers_with_gaps=['test_3', 'test_4'],
                                 enveloped_layer='words',
                                 decorator=decorator,
                                 output_attributes=['gap_word_count'])
gap_tagger

name,output layer,output attributes,input layers
EnvelopingGapTagger,gaps,"('gap_word_count',)","('test_3', 'test_4', 'words')"

0,1
decorator,<function __main__.decorator>
layers_with_gaps,"['test_3', 'test_4']"
enveloped_layer,words


In [8]:
gap_tagger.tag(text)
text.gaps

layer name,attributes,parent,enveloping,ambiguous,span count
gaps,gap_word_count,,words,False,2

text,gap_word_count
['kolm'],1
"['kuus', 'seitse', '.']",3
