# Layer operations
## Excerpt

Create a text object.

In [1]:
from estnltk import Text

text = Text('Tere, maailm!').analyse('morphology')
text

text
"Tere, maailm!"

layer name,attributes,parent,enveloping,ambiguous,span count
words,normalized_form,,,False,4
morph_analysis,"lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,4


Excerpt the first 9 characters from the text.

In [2]:
from estnltk.layer_operations import excerpt

excerpt(text, 0, 9)

text
"Tere, maa"

layer name,attributes,parent,enveloping,ambiguous,span count
words,normalized_form,,,False,2
morph_analysis,"lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,2


This is equivalent to writing
```python
excerpt(text=text,
        start=0,
        end=9,
        layers_to_keep=None,
        trim_overlapping=False)
```
where<br>
**text** is a Text object<br>
**start** is the index of the first character of the excerpt in the text<br>
**end** is the index of the first character after the excerpt in the text<br>
**layers_to_keep** is a tuple of names of the layers to be kept.
        Dependencies must also be included: if a layer in the tuple
        has a parent or is enveloping, then the parent or enveloped layer
        must also be in this tuple. If `None` (the default), all layers are kept<br>
**trim_overlapping** controls what happens to spans that cross the excerpt boundaries.
If `False` (the default), such spans are not kept in the excerpted text.
If `True`, they are trimmed to fit the boundaries.
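
The dependency rule for `layers_to_keep` can be checked by hand before calling `excerpt`. The following is a minimal sketch in plain Python, not part of EstNLTK: the helper `missing_dependencies` and the `LAYER_INFO` dictionary are hypothetical, and the metadata is transcribed by hand from the `parent` and `enveloping` columns of layer tables such as those shown on this page.

```python
# Hypothetical metadata, copied by hand from the layer tables:
# layer name -> (parent, enveloping)
LAYER_INFO = {
    'paragraphs': (None, 'sentences'),
    'sentences': (None, 'words'),
    'words': (None, None),
    'morph_analysis': ('words', None),
}


def missing_dependencies(layers_to_keep, layer_info=LAYER_INFO):
    """Return the layers that layers_to_keep depends on but does not include."""
    keep = set(layers_to_keep)
    missing = set()
    for name in keep:
        parent, enveloping = layer_info[name]
        for dependency in (parent, enveloping):
            if dependency is not None and dependency not in keep:
                missing.add(dependency)
    return sorted(missing)


print(missing_dependencies(('words', 'morph_analysis')))  # []
print(missing_dependencies(('morph_analysis',)))          # ['words']
print(missing_dependencies(('paragraphs', 'words')))      # ['sentences']
```

An empty result means the tuple is closed under the parent and enveloping relations, which is what the description of `layers_to_keep` asks for.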

Here the span count 2 means that 'Tere' and ',' are tagged as words, but the letters 'maa' are not covered by any span because they are part of the longer word 'maailm'.

In the next example the span of 'maailm' is trimmed to cover the letters 'maa'. That gives a strange result where the analysis of 'maailm' is attached to the partial word 'maa'. So, use the trimming option with caution.

In [3]:
excerpt(text, 0, 9, ('words', 'morph_analysis'), True)['morph_analysis']

layer name,attributes,parent,enveloping,ambiguous,span count
morph_analysis,"lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,3

text,lemma,root,root_tokens,ending,clitic,form,partofspeech
Tere,tere,tere,"(tere,)",0.0,,,I
",",",",",","(,,)",,,,Z
maa,maailm,maa_ilm,"(maa, ilm)",0.0,,sg n,S
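
The difference between the two settings of `trim_overlapping` can be modelled on bare `(start, end)` pairs. This is only an illustrative sketch under the assumption that spans behave like half-open character intervals; `excerpt_spans` is a hypothetical helper, not the EstNLTK implementation, and the word spans are written out by hand for 'Tere, maailm!'.

```python
def excerpt_spans(spans, start, end, trim_overlapping=False):
    """Model which spans survive in the excerpt raw[start:end].

    Spans are (start, end) pairs; the result is relative to the excerpt.
    """
    result = []
    for s, e in spans:
        if e <= start or s >= end:
            continue  # entirely outside the excerpt
        if start <= s and e <= end:
            result.append((s - start, e - start))  # entirely inside
        elif trim_overlapping:
            # crosses a boundary: cut it down to the excerpt
            result.append((max(s, start) - start, min(e, end) - start))
    return result


raw = 'Tere, maailm!'
word_spans = [(0, 4), (4, 5), (6, 12), (12, 13)]  # Tere , maailm !

kept = excerpt_spans(word_spans, 0, 9)
trimmed = excerpt_spans(word_spans, 0, 9, trim_overlapping=True)

print(kept)     # [(0, 4), (4, 5)]
print(trimmed)  # [(0, 4), (4, 5), (6, 9)]
print([raw[0:9][s:e] for s, e in trimmed])  # ['Tere', ',', 'maa']
```

The span counts 2 and 3 match the tables above; the model says nothing about annotations, which is exactly why the trimmed 'maa' span still carries the analysis of 'maailm'.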


A more practical use case of `trim_overlapping=True` would be trimming a span of a paragraph while leaving out the last sentence of a text.

## Splitting
Now let's create a text with three sentences.

In [4]:
t = '''Esimene lause.

Teine lõik. Kolmas lause.'''

from estnltk import Text
text = Text(t)
text.analyse('all')
text

text
Esimene lause. Teine lõik. Kolmas lause.

layer name,attributes,parent,enveloping,ambiguous,span count
paragraphs,,,sentences,False,2
sentences,,,words,False,3
tokens,,,,False,9
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,False,9
morph_analysis,"lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,9
morph_extended,"lemma, root, root_tokens, ending, clitic, form, partofspeech, punctuation_type, pronoun_type, letter_case, fin, verb_extension_suffix, subcat",morph_analysis,,True,9


### `split_by_sentences`
Using the `split_by_sentences` function, we can turn the text object into a list of three new text objects, each containing one sentence of the original text.

In [5]:
from estnltk.layer_operations import split_by_sentences
texts = split_by_sentences(text)
texts

[Text(text="Esimene lause."),
 Text(text="Teine lõik."),
 Text(text="Kolmas lause.")]

Here is the second sentence.

In [6]:
texts[1]

text
Teine lõik.

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,1
words,normalized_form,,,False,3
morph_analysis,"lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,3
morph_extended,"lemma, root, root_tokens, ending, clitic, form, partofspeech, punctuation_type, pronoun_type, letter_case, fin, verb_extension_suffix, subcat",morph_analysis,,True,3


### `split_by`
Using the `split_by` function, the text object can be split into pieces by any layer. Here, for instance, we split the text by words and print the first one.

In [7]:
from estnltk.layer_operations import split_by
words = split_by(text, 'words', layers_to_keep=('words','morph_analysis'))
words[0]

text
Esimene

layer name,attributes,parent,enveloping,ambiguous,span count
words,normalized_form,,,False,1
morph_analysis,"lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,1
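
Conceptually, splitting by a layer resembles taking one excerpt per span of that layer. The sketch below models this on plain strings and `(start, end)` pairs; it is an assumption-laden illustration, not the EstNLTK implementation. The helper `split_by_spans` is hypothetical, and the two regular expressions are crude stand-ins for the `sentences` and `words` layers.

```python
import re

raw = 'Esimene lause.\n\nTeine lõik. Kolmas lause.'

# Stand-ins for the 'sentences' and 'words' layers (assumption: regex tokenisation)
sentence_spans = [m.span() for m in re.finditer(r'\S[^.\n]*\.', raw)]
word_spans = [m.span() for m in re.finditer(r'\w+|[^\w\s]', raw)]


def split_by_spans(raw, sections, inner_spans):
    """Cut raw into one piece per section and re-index the spans inside it."""
    pieces = []
    for start, end in sections:
        kept = [(s - start, e - start)
                for s, e in inner_spans
                if start <= s and e <= end]
        pieces.append((raw[start:end], kept))
    return pieces


pieces = split_by_spans(raw, sentence_spans, word_spans)
for piece_text, piece_words in pieces:
    print(repr(piece_text), len(piece_words))
# 'Esimene lause.' 3
# 'Teine lõik.' 3
# 'Kolmas lause.' 3
```

Each piece starts its own character offsets at 0, which mirrors how every Text object returned by the splitting functions is a standalone text.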


## Rebase

The parent of the `morph_extended` layer is `morph_analysis`. So, if one deletes the `morph_analysis` layer, then the `morph_extended` layer is also deleted. To avoid this, the `parent` attribute of `morph_extended` can be changed to `words` using the `rebase` function.

This can be done because the `_base` attribute of both layers is the same:

In [8]:
text['morph_analysis']._base, text['morph_extended']._base

('words', 'words')

(In the future, it might be a good idea to replace the `parent` attribute with the `_base` attribute.)

In [9]:
from estnltk.layer_operations import rebase
rebase(text, 'morph_extended', 'words')

text
Esimene lause. Teine lõik. Kolmas lause.

layer name,attributes,parent,enveloping,ambiguous,span count
paragraphs,,,sentences,False,2
sentences,,,words,False,3
tokens,,,,False,9
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,False,9
morph_analysis,"lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,9
morph_extended,"lemma, root, root_tokens, ending, clitic, form, partofspeech, punctuation_type, pronoun_type, letter_case, fin, verb_extension_suffix, subcat",words,,True,9


In [10]:
del text.morph_analysis
text

text
Esimene lause. Teine lõik. Kolmas lause.

layer name,attributes,parent,enveloping,ambiguous,span count
paragraphs,,,sentences,False,2
sentences,,,words,False,3
tokens,,,,False,9
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,False,9
morph_extended,"lemma, root, root_tokens, ending, clitic, form, partofspeech, punctuation_type, pronoun_type, letter_case, fin, verb_extension_suffix, subcat",words,,True,9
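
The effect of rebasing on deletion can be modelled with a plain dictionary of parent links. This is a toy sketch, not the EstNLTK implementation: `base_of`, `toy_rebase` and `delete_layer` are hypothetical helpers, and the assumptions are that the base of a layer is the root of its parent chain and that deleting a layer also deletes its descendants.

```python
def base_of(layer, parents):
    """Follow parent links up to the root layer (assumed meaning of _base)."""
    while parents[layer] is not None:
        layer = parents[layer]
    return layer


def toy_rebase(parents, layer, new_parent):
    """Re-attach layer to new_parent if both share the same base."""
    if base_of(layer, parents) != base_of(new_parent, parents):
        raise ValueError('layers do not share a base')
    parents[layer] = new_parent


def delete_layer(parents, layer):
    """Delete a layer together with every layer that descends from it."""
    children = [name for name, parent in parents.items() if parent == layer]
    for child in children:
        delete_layer(parents, child)
    del parents[layer]


parents = {'words': None,
           'morph_analysis': 'words',
           'morph_extended': 'morph_analysis'}

without_rebase = dict(parents)
delete_layer(without_rebase, 'morph_analysis')
print(sorted(without_rebase))  # ['words']

with_rebase = dict(parents)
toy_rebase(with_rebase, 'morph_extended', 'words')
delete_layer(with_rebase, 'morph_analysis')
print(sorted(with_rebase))  # ['morph_extended', 'words']
```

In the toy model, as in the notebook cells above, `morph_extended` survives the deletion only when it has been rebased onto `words` first.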
