# Introduction to GrammarParsingTagger

**GrammarParsingTagger** is a tool that lets us write a context-free grammar and apply it to a Text object, creating a new layer that contains all the matches found by the grammar. In other words, we can define the sequences of symbols that we want to extract from the text. In this tutorial, we present examples of extracting addresses from a text, but the tool can of course be used for many other purposes.

First, we need to have a Text object. Let's create one containing an address:

```python
from estnltk import Text

text = Text('Jüri Homenja kontsert toimub E, 22. mai kl 18:00 kultuurimajas Veski 5, Elva, Tartumaa.')
```

```python
text
```

| text |
|---|
| Jüri Homenja kontsert toimub E, 22. mai kl 18:00 kultuurimajas Veski 5, Elva, Tartumaa. |


Next, we need to tag the **symbols** whose sequences we want to search for with our grammar. You can read about different taggers [here](https://github.com/estnltk/estnltk/tree/devel_1.6/tutorials/taggers), but to keep the GrammarParsingTagger example simple, we use an existing tagger called *AddressPartTagger*. *AddressPartTagger* needs the text to be segmented into words, which is why we tag the *words* layer before applying the tagger.

```python
from estnltk.taggers import AddressPartTagger

address_part_tagger = AddressPartTagger()
text.tag_layer(['words'])
address_part_tagger.tag(text)
text.address_parts
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| address_parts | grammar_symbol, type | | | True | 11 |

| text | grammar_symbol | type |
|---|---|---|
| Jüri | ASULA | asula |
| | TÄNAV | tänav |
| Homenja kontsert toimub E | RANDOM_TEXT | |
| 22 | MAJA | |
| mai kl | RANDOM_TEXT | |
| 18 | MAJA | |
| 00 | MAJA | |
| kultuurimajas | RANDOM_TEXT | |
| Veski | ASULA | asula |
| | TÄNAV | tänav |


We can see that different **symbols** have been tagged on the text in the layer called *address_parts*. Some of the tagged symbols are in fact parts of the address, while others are not. To decide whether a symbol is part of an address, we have to define the sequences of symbols that make up an address. These sequences are called grammar **rules**.

### Rules and Grammar

To define rules and a grammar, we first need to import the classes Rule and Grammar.

```python
from estnltk.finite_grammar import Rule, Grammar
```

Then we can start defining the **rules**. A rule consists of a **left side** (a non-terminal), a **right side** (a sequence of non-terminals and terminals), and optional parameters. In the following example, the left side is 'ADDRESS' and the right side is 'TÄNAV MAJA ASULA'. The rule says that if the symbols 'TÄNAV', 'MAJA', and 'ASULA' occur in this order, they form an 'ADDRESS'.

```python
Rule('ADDRESS', 'TÄNAV MAJA ASULA')
```

```
ADDRESS -> TÄNAV MAJA ASULA	: 0, val: default_validator, dec: default_decorator, scoring: default_scoring
```
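To make the idea of a rule concrete, here is a minimal pure-Python sketch (not EstNLTK's parser) that looks for the right side of a rule as a contiguous run in a flat list of grammar symbols. The helper `find_rule_matches` and the symbol list are hypothetical; the sketch ignores ambiguity, nested non-terminals and gaps, all of which GrammarParsingTagger handles on its own.

```python
def find_rule_matches(symbols, right_side):
    """Return (start, end) index pairs where right_side occurs contiguously in symbols.

    Hypothetical helper: a simplified stand-in for applying one rule,
    not the EstNLTK implementation.
    """
    rhs = right_side.split()
    n = len(rhs)
    return [(i, i + n) for i in range(len(symbols) - n + 1)
            if symbols[i:i + n] == rhs]

# A made-up, unambiguous sequence of grammar symbols
symbols = ['RANDOM_TEXT', 'TÄNAV', 'MAJA', 'ASULA']
print(find_rule_matches(symbols, 'TÄNAV MAJA ASULA'))  # [(1, 4)]
print(find_rule_matches(symbols, 'TÄNAV MAJA'))        # [(1, 3)]
```

Note that both rules match at the same position; this kind of overlap is exactly what the conflict resolution discussed later in this tutorial deals with.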

To apply the rules, we need to create a **grammar**:

```python
grammar = Grammar(start_symbols=['ADDRESS'], 
                  depth_limit=float('inf'), # the default
                  width_limit=float('inf'), # the default
                  legal_attributes=None # the default
                  )
```

And then we need to **add rules to the grammar**. Let's add two rules to keep things simple:

```python
grammar.add_rule('ADDRESS', 'TÄNAV MAJA ASULA')
grammar.add_rule('ADDRESS', 'TÄNAV MAJA')
grammar
```

```
Grammar:
	start: ADDRESS
	terminals: ASULA, MAJA, TÄNAV
	nonterminals: ADDRESS
	legal attributes: frozenset()
	depth_limit: inf
	width_limit: inf
Rules:
	ADDRESS -> TÄNAV MAJA ASULA	: 0, val: default_validator, dec: default_decorator, scoring: default_scoring
	ADDRESS -> TÄNAV MAJA	: 0, val: default_validator, dec: default_decorator, scoring: default_scoring
```

To apply the grammar to our text, we need to **create a tagger**, i.e. a *GrammarParsingTagger* object. The grammar is passed via the *grammar* parameter, *layer_of_tokens* is the name of the layer that we want to apply the grammar to, and *output_layer* is the name of the layer that the *GrammarParsingTagger* creates.

```python
from estnltk.taggers import GrammarParsingTagger

tagger = GrammarParsingTagger(grammar=grammar,
                              layer_of_tokens='address_parts',
                              output_layer='addresses', 
                              output_ambiguous=True # default False, True recommended
                              )
tagger
```

| name | output layer | output attributes | input layers |
|---|---|---|---|
| GrammarParsingTagger | addresses | () | ('address_parts',) |

| parameter | value |
|---|---|
| grammar | `\nGrammar:\n\tstart: ADDRESS\n\tterminals: ASULA, MAJA, TÄNAV\n\tnonterminals: ADDRESS\n ..., type: <class 'estnltk.finite_grammar.grammar.Grammar'>` |
| name_attribute | grammar_symbol |
| output_nodes | {'ADDRESS'} |
| resolve_support_conflicts | True |
| resolve_start_end_conflicts | True |
| resolve_terminals_conflicts | True |
| ambiguous | True |
| gap_validator | |
| debug | False |
| force_resolving_by_priority | False |


Then we can use the tagger to **tag the text**:

```python
tagger.tag(text)
```

| text |
|---|
| Jüri Homenja kontsert toimub E, 22. mai kl 18:00 kultuurimajas Veski 5, Elva, Tartumaa. |

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| tokens | | | | False | 21 |
| compound_tokens | type, normalized | | tokens | False | 2 |
| words | normalized_form | | | True | 18 |
| address_parts | grammar_symbol, type | | | True | 11 |
| addresses | | | address_parts | True | 2 |


```python
text.addresses
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| addresses | | | address_parts | True | 2 |

| text |
|---|
| ['Veski', '5'] |
| ['Veski', '5', 'Elva'] |


The address is indeed found, but we are probably not completely happy with the results.

### Validating gaps

A potential problem is that symbols which are far apart in the original text may still be combined into an address. The grammar only looks at the sequence of spans in the *address_parts* layer, so if nothing were tagged between, say, *Jüri* (TÄNAV) and *22* (MAJA), the rule 'TÄNAV MAJA' could match them even though they are not next to each other in the text. To prevent this, we can use a **gap_validator** function that defines **what kind of strings we allow** between our tagged symbols: we would probably want to accept a space or a comma between the parts of an address, but not long sequences of words or whole sentences. Let's define one that accepts only spaces and commas:

```python
import re

def gap_validator(s):
    if re.fullmatch('[, ]+', s):
        return True
    return False
```
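To see which strings such a validator accepts, the sketch below computes the substrings between consecutive character spans and runs them through an equivalent, more compact version of `gap_validator`. The helper `gaps_between` and the hand-computed spans are hypothetical illustrations; we assume (in line with the description above) that the validator receives exactly the text lying between two tagged symbols.

```python
import re

def gap_validator(s):
    # Accept the gap only if it consists entirely of spaces and/or commas
    return re.fullmatch('[, ]+', s) is not None

def gaps_between(text, spans):
    """Return the substrings lying between consecutive (start, end) spans."""
    return [text[prev_end:next_start]
            for (_, prev_end), (next_start, _) in zip(spans, spans[1:])]

text = ('Jüri Homenja kontsert toimub E, 22. mai kl 18:00 '
        'kultuurimajas Veski 5, Elva, Tartumaa.')

# Character spans of 'Veski', '5' and 'Elva' (offsets within 'Veski 5, Elva')
v = text.index('Veski')
address_gaps = gaps_between(text, [(v, v + 5), (v + 6, v + 7), (v + 9, v + 13)])
print(address_gaps)                           # [' ', ', ']
print(all(map(gap_validator, address_gaps)))  # True

# 'Jüri' and '22' are far apart, so the gap between them contains whole words
start_22 = text.index('22')
far_gaps = gaps_between(text, [(0, 4), (start_22, start_22 + 2)])
print(far_gaps)                               # [' Homenja kontsert toimub E, ']
print(gap_validator(far_gaps[0]))             # False
```

Note that an empty gap (two symbols directly adjacent) is also rejected by this pattern, because `+` requires at least one space or comma.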

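So, when we define a new GrammarParsingTagger that uses the *gap_validator()* function, only symbols separated by spaces and commas can be combined into an address. (In this example, the output does not change, because the parts of *Veski 5, Elva* are separated by exactly such gaps.)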

```python
tagger2 = GrammarParsingTagger(grammar=grammar,
                              layer_of_tokens='address_parts',
                              output_layer='addresses2', # default: 'parse'
                              output_ambiguous=True, # default False, True recommended
                              gap_validator=gap_validator
                              )
```

```python
tagger2.tag(text)
```

| text |
|---|
| Jüri Homenja kontsert toimub E, 22. mai kl 18:00 kultuurimajas Veski 5, Elva, Tartumaa. |

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| tokens | | | | False | 21 |
| compound_tokens | type, normalized | | tokens | False | 2 |
| words | normalized_form | | | True | 18 |
| address_parts | grammar_symbol, type | | | True | 11 |
| addresses | | | address_parts | True | 2 |
| addresses2 | | | address_parts | True | 2 |


```python
text.addresses2
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| addresses2 | | | address_parts | True | 2 |

| text |
|---|
| ['Veski', '5'] |
| ['Veski', '5', 'Elva'] |


### Priorities

The next problem is that both ['Veski', '5'] and ['Veski', '5', 'Elva'] are tagged. Of course, we could just remove the rule for the first case from the grammar, but since an address is sometimes expressed only by a street name and a house number, we would prefer to keep both rules and only receive the match of the longer rule if both match.

For this, we can use the optional parameters *group* and *priority* when defining our rules. The parameter *group* is the name of the group that the rule belongs to. It can be any name, but all rules that we want to put into one group must have the same *group* value. *priority* determines which of the rules in the *group* is applied if several of them match. **NB!** The **higher** the value, the **lower** the priority. Therefore, if two rules with priorities 2 and 3 both match, the rule with priority 2 is applied.
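The "lower number wins" convention can be illustrated with a tiny pure-Python sketch. The `Match` record and the `pick_per_group` helper are hypothetical, not EstNLTK classes, and the sketch makes the simplifying assumption that all matches within a group compete with each other (e.g. because they cover overlapping text), so exactly one survives per group.

```python
from collections import namedtuple

# Hypothetical record of a rule match (not an EstNLTK class)
Match = namedtuple('Match', ['right_side', 'group', 'priority'])

def pick_per_group(matches):
    """For each group, keep the match with the lowest priority number."""
    best = {}
    for match in matches:
        current = best.get(match.group)
        if current is None or match.priority < current.priority:
            best[match.group] = match
    return best

matches = [Match('TÄNAV MAJA', 'g0', 3), Match('TÄNAV MAJA ASULA', 'g0', 2)]
print(pick_per_group(matches)['g0'].right_side)  # TÄNAV MAJA ASULA
```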

To see this in action, let's create a new grammar and add the rules with the *group* and *priority* parameters:

```python
grammar3 = Grammar(start_symbols=['ADDRESS'], 
                  depth_limit=float('inf'), # the default
                  width_limit=float('inf'), # the default
                  legal_attributes=None # the default
                  )

grammar3.add_rule('ADDRESS', 'TÄNAV MAJA ASULA', group='g0', priority=2)
grammar3.add_rule('ADDRESS', 'TÄNAV MAJA',       group='g0', priority=3)
```

```python
tagger3 = GrammarParsingTagger(grammar=grammar3,
                              layer_of_tokens='address_parts',
                              name_attribute='grammar_symbol', # the default
                              output_layer='addresses3',
                              output_ambiguous=True, # default False
                              gap_validator=gap_validator
                              )
tagger3.tag(text)
text.addresses3
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| addresses3 | | | address_parts | True | 1 |

| text |
|---|
| ['Veski', '5', 'Elva'] |


### Decorators

Now we have succeeded in tagging only the match that we wanted. However, we only have a list of strings, with no information about which one is the street name, which one is the house number, etc. To **add information to the layer** that we are tagging with a GrammarParsingTagger, we can use **decorator** functions. Like *group* and *priority*, decorators are optional parameters of the grammar rules.

Let's define a decorator that adds the attributes 'ASULA', 'TÄNAV', 'INDEKS', 'MAAKOND' and 'MAJA' to the tagger matches:

```python
def address_decorator(nodes):
    asula = ''
    maakond = ''
    t2nav = ''
    indeks = ''
    maja = ''
    for node in nodes:
        if node.name == 'ASULA':
            asula = node.text
        elif node.name == 'TÄNAV':
            t2nav = node.text
        elif node.name == 'MAAKOND':
            maakond = node.text
        elif node.name == 'MAJA':
            maja = node.text
        elif node.name == 'INDEKS':
            indeks = node.text
    return {'grammar_symbol': 'ADDRESS',
            'ASULA': asula,
            'TÄNAV': t2nav,
            'INDEKS': indeks,
            'MAAKOND': maakond,
            'MAJA': maja}
```
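The same decorator logic can be written more compactly with a dictionary and tried out without a parser. In the sketch below, `Node` is a hypothetical stand-in for the parse-tree nodes that the grammar passes to a decorator; it models only the `.name` and `.text` attributes used above, and the real node objects may carry more.

```python
from collections import namedtuple

# Hypothetical stand-in for parse-tree nodes (only .name and .text are modelled)
Node = namedtuple('Node', ['name', 'text'])

ADDRESS_PARTS = ('ASULA', 'TÄNAV', 'INDEKS', 'MAAKOND', 'MAJA')

def address_decorator(nodes):
    # Start with empty strings for every address part, then fill in the ones present
    attributes = dict.fromkeys(ADDRESS_PARTS, '')
    for node in nodes:
        if node.name in attributes:
            attributes[node.name] = node.text
    attributes['grammar_symbol'] = 'ADDRESS'
    return attributes

result = address_decorator([Node('TÄNAV', 'Veski'), Node('MAJA', '5'), Node('ASULA', 'Elva')])
print(result)
# {'ASULA': 'Elva', 'TÄNAV': 'Veski', 'INDEKS': '', 'MAAKOND': '', 'MAJA': '5', 'grammar_symbol': 'ADDRESS'}
```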

If we want to add attributes with a decorator, we have to **allow those attributes in our grammar** - for this there is the parameter *legal_attributes*. 

```python
grammar4 = Grammar(start_symbols=['ADDRESS'], 
                  depth_limit=float('inf'), # the default
                  width_limit=float('inf'), # the default
                  legal_attributes=['grammar_symbol', 'INDEKS', 'MAJA', 'TÄNAV', 'MAAKOND', 'ASULA']
                  )
```

Let's add the **rules with decorators** to the new grammar:

```python
grammar4.add_rule('ADDRESS', 'TÄNAV MAJA ASULA', group='g0', priority=2, decorator=address_decorator)
grammar4.add_rule('ADDRESS', 'TÄNAV MAJA',       group='g0', priority=3, decorator=address_decorator)
```

Then we can define a new tagger, where we also have to **specify the attributes** that we want to tag with the GrammarParsingTagger; the *attributes* parameter is used for this.

```python
tagger4 = GrammarParsingTagger(grammar=grammar4,
                              layer_of_tokens='address_parts',
                              name_attribute='grammar_symbol',
                              output_layer='addresses4',
                              attributes=('grammar_symbol', 'INDEKS', 'MAJA', 'TÄNAV', 'MAAKOND', 'ASULA'),
                              output_ambiguous=False, # default False
                              gap_validator = gap_validator
                              )
```
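Since the attribute names now appear in three places (the decorator's output, the grammar's *legal_attributes* and the tagger's *attributes*), a small pre-flight check can catch typos before tagging. The helper `attribute_problems` below is hypothetical and not part of EstNLTK. Its first check encodes the requirement stated above (decorator keys must be legal attributes); its second check (every tagger attribute should be produced by the decorator) is our own assumption.

```python
def attribute_problems(decorator_keys, legal_attributes, tagger_attributes):
    """List mismatches between decorator output, grammar and tagger attributes."""
    problems = []
    for key in sorted(set(decorator_keys) - set(legal_attributes)):
        problems.append(f'decorator key {key!r} is not in legal_attributes')
    # Assumption: an attribute the decorator never sets would have no value to fill in
    for attr in sorted(set(tagger_attributes) - set(decorator_keys)):
        problems.append(f'tagger attribute {attr!r} is not produced by the decorator')
    return problems

legal_attributes = ['grammar_symbol', 'INDEKS', 'MAJA', 'TÄNAV', 'MAAKOND', 'ASULA']
tagger_attributes = ('grammar_symbol', 'INDEKS', 'MAJA', 'TÄNAV', 'MAAKOND', 'ASULA')
decorator_keys = ['grammar_symbol', 'ASULA', 'TÄNAV', 'INDEKS', 'MAAKOND', 'MAJA']

print(attribute_problems(decorator_keys, legal_attributes, tagger_attributes))  # []
# Forgetting 'ASULA' in legal_attributes is reported:
print(attribute_problems(decorator_keys, legal_attributes[:-1], tagger_attributes))
# ["decorator key 'ASULA' is not in legal_attributes"]
```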

```python
tagger4.tag(text)
text.addresses4
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| addresses4 | grammar_symbol, INDEKS, MAJA, TÄNAV, MAAKOND, ASULA | | address_parts | False | 1 |

| text | grammar_symbol | INDEKS | MAJA | TÄNAV | MAAKOND | ASULA |
|---|---|---|---|---|---|---|
| ['Veski', '5', 'Elva'] | ADDRESS | | 5 | Veski | | Elva |


Now we can easily get the tagged parts of the address:

```python
print(text.addresses4.TÄNAV)
```

```
['Veski']
```

```python
print(text.addresses4.MAJA)
```

```
['5']
```

```python
print(text.addresses4.ASULA)
```

```
['Elva']
```


### Validators

We have nicely tagged an address in the text, but let's now assume that we have some more information about possible addresses in Estonia that we also want to take into account. Namely, let's pretend that we know which streets exist in which towns, and while tagging addresses, we want to make sure that the street actually exists in the given town. So, let's take the following huge dataset of towns and their street names:

```python
town_streets = {'Elva': {'Veski', 'Tuletõrje'},
                'Tartu': {'Veski', 'Ülikooli'}}
```

And let's take an example that has this kind of problem:

```python
# Adjacent string literals are joined without introducing extra whitespace
text = Text('Inimesed, kes töötavad Tartus Ülikooli 5, Elva haiglas '
            'ja Tõravere observatooriumis, söövad esmaspäeviti õunu.').tag_layer(['words'])
address_part_tagger.tag(text)
```

| text |
|---|
| Inimesed, kes töötavad Tartus Ülikooli 5, Elva haiglas ja Tõravere observatooriumis, söövad esmaspäeviti õunu. |

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| tokens | | | | False | 18 |
| compound_tokens | type, normalized | | tokens | False | 0 |
| words | normalized_form | | | True | 18 |
| address_parts | grammar_symbol, type | | | True | 7 |


```python
tagger4.tag(text)
```

| text |
|---|
| Inimesed, kes töötavad Tartus Ülikooli 5, Elva haiglas ja Tõravere observatooriumis, söövad esmaspäeviti õunu. |

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| tokens | | | | False | 18 |
| compound_tokens | type, normalized | | tokens | False | 0 |
| words | normalized_form | | | True | 18 |
| address_parts | grammar_symbol, type | | | True | 7 |
| addresses4 | grammar_symbol, INDEKS, MAJA, TÄNAV, MAAKOND, ASULA | | address_parts | False | 1 |


```python
text.addresses4
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| addresses4 | grammar_symbol, INDEKS, MAJA, TÄNAV, MAAKOND, ASULA | | address_parts | False | 1 |

| text | grammar_symbol | INDEKS | MAJA | TÄNAV | MAAKOND | ASULA |
|---|---|---|---|---|---|---|
| ['Ülikooli', '5', 'Elva'] | ADDRESS | | 5 | Ülikooli | | Elva |


As we can see, our previous tagger produces an incorrect address, which we could detect by checking the data in the *town_streets* dictionary. To perform such additional checks, we can use **validator** functions. Validators are added to the rules just like **decorators**, but they must return either True or False: the match is tagged only if the validator returns True.

```python
def validator(nodes):
    # nodes correspond to the right side of the rule: TÄNAV MAJA ASULA
    street = nodes[0].text
    town = nodes[2].text
    if town in town_streets:
        if street in town_streets[town]:
            return True
    return False
```
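The validator can be tested in isolation before it is attached to a rule. The sketch below uses a hypothetical `Node` stand-in (only `.text` is read by the validator) and an equivalent one-line formulation that uses `dict.get` with an empty set as fallback for unknown towns.

```python
from collections import namedtuple

# Hypothetical stand-in for parse-tree nodes; the validator only reads .text
Node = namedtuple('Node', ['name', 'text'])

town_streets = {'Elva': {'Veski', 'Tuletõrje'},
                'Tartu': {'Veski', 'Ülikooli'}}

def validator(nodes):
    # nodes follow the rule's right side: TÄNAV MAJA ASULA
    street, town = nodes[0].text, nodes[2].text
    return street in town_streets.get(town, set())

print(validator([Node('TÄNAV', 'Ülikooli'), Node('MAJA', '5'), Node('ASULA', 'Elva')]))  # False
print(validator([Node('TÄNAV', 'Veski'), Node('MAJA', '5'), Node('ASULA', 'Elva')]))     # True
```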

```python
grammar5 = Grammar(start_symbols=['ADDRESS'], 
                  legal_attributes=['INDEKS', 'grammar_symbol', 'MAJA', 'TÄNAV', 'MAAKOND', 'ASULA']
                  )

grammar5.add_rule('ADDRESS', 'TÄNAV MAJA ASULA', group='g0', priority=2, 
                 decorator=address_decorator, validator=validator)

grammar5.add_rule('ADDRESS', 'TÄNAV MAJA',       group='g0', priority=3, 
                 decorator=address_decorator) # We cannot use the validator here as this rule doesn't tag the town

tagger5 = GrammarParsingTagger(grammar=grammar5,
                              layer_of_tokens='address_parts',
                              name_attribute='grammar_symbol',
                              output_layer='addresses5',
                              attributes=('INDEKS', 'grammar_symbol', 'MAJA', 'TÄNAV', 'MAAKOND', 'ASULA'),
                              output_ambiguous=True,
                              gap_validator = gap_validator
                              )
```

```python
tagger5.tag(text)
text.addresses5
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| addresses5 | INDEKS, grammar_symbol, MAJA, TÄNAV, MAAKOND, ASULA | | address_parts | True | 1 |

| text | INDEKS | grammar_symbol | MAJA | TÄNAV | MAAKOND | ASULA |
|---|---|---|---|---|---|---|
| ['Ülikooli', '5'] | | ADDRESS | 5 | Ülikooli | | |


Now we successfully got the match ['Ülikooli', '5'] because ['Ülikooli', '5', 'Elva'] was not a valid address according to our validator.

### SEQ and MSEQ rules

There are also special types of rules: **SEQ** and **MSEQ** rules, which allow a symbol to be repeated. Let's see an example where several houses on the same street are mentioned:

```python
text = Text('Veekatkestus Veski 3, 5, 7 majades kestab 8. juunil kl 12-15.').tag_layer(['words'])
address_part_tagger.tag(text)
```

| text |
|---|
| Veekatkestus Veski 3, 5, 7 majades kestab 8. juunil kl 12-15. |

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| tokens | | | | False | 17 |
| compound_tokens | type, normalized | | tokens | False | 2 |
| words | normalized_form | | | True | 15 |
| address_parts | grammar_symbol, type | | | True | 9 |


```python
tagger5.tag(text)
```

| text |
|---|
| Veekatkestus Veski 3, 5, 7 majades kestab 8. juunil kl 12-15. |

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| tokens | | | | False | 17 |
| compound_tokens | type, normalized | | tokens | False | 2 |
| words | normalized_form | | | True | 15 |
| address_parts | grammar_symbol, type | | | True | 9 |
| addresses5 | INDEKS, grammar_symbol, MAJA, TÄNAV, MAAKOND, ASULA | | address_parts | True | 1 |


```python
text.addresses5
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| addresses5 | INDEKS, grammar_symbol, MAJA, TÄNAV, MAAKOND, ASULA | | address_parts | True | 1 |

| text | INDEKS | grammar_symbol | MAJA | TÄNAV | MAAKOND | ASULA |
|---|---|---|---|---|---|---|
| ['Veski', '3'] | | ADDRESS | 3 | Veski | | |


As expected, our grammar tags only the first house number, because we have not added rules for the others. We could add the rules one by one, like
```
'TÄNAV MAJA'
'TÄNAV MAJA MAJA'
'TÄNAV MAJA MAJA MAJA'
...
```
but we cannot know in advance how many of them we would need. Therefore, it is easier to use a SEQ or an MSEQ rule. A SEQ rule finds matches of all possible lengths, while an MSEQ rule finds only the longest possible match. So, let's define a new grammar again:

```python
grammar6 = Grammar(start_symbols=['ADDRESS'], 
                  legal_attributes=['grammar_symbol', 'INDEKS', 'MAJA', 'TÄNAV', 'MAAKOND', 'ASULA']
                  )
```

Note that if we want to use the decorator, we have to redefine it so that it can handle the SEQ/MSEQ rule:

```python
def address_decorator2(nodes):
    asula = ''
    maakond = ''
    t2nav = ''
    indeks = ''
    maja = ''
    for node in nodes:
        if node.name == 'ASULA':
            asula = node.text
        elif node.name == 'TÄNAV':
            t2nav = node.text
        elif node.name == 'MAAKOND':
            maakond = node.text
        elif node.name == 'MAJA':
            maja = node.text
        elif node.name == 'MSEQ(MAJA)' or node.name == 'SEQ(MAJA)':
            maja = [n.text for n in node.support]
        elif node.name == 'INDEKS':
            indeks = node.text
    return {'grammar_symbol': 'ADDRESS',
            'ASULA': asula,
            'TÄNAV': t2nav,
            'INDEKS': indeks,
            'MAAKOND': maakond,
            'MAJA': maja}
```
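The only new branch in `address_decorator2` is the one that reads the repeated house numbers from `node.support`. The sketch below isolates that branch and tries it on hypothetical `Node` stand-ins: we assume, as the decorator above does, that a SEQ/MSEQ node keeps its repeated children in `.support`. The `house_numbers` helper is illustrative only.

```python
from collections import namedtuple

# Hypothetical stand-in for parse-tree nodes; a SEQ/MSEQ node is assumed
# to keep its repeated children in .support (defaults to an empty tuple)
Node = namedtuple('Node', ['name', 'text', 'support'], defaults=[()])

def house_numbers(nodes):
    """Extract the MAJA value the same way address_decorator2 does."""
    maja = ''
    for node in nodes:
        if node.name == 'MAJA':
            maja = node.text
        elif node.name in ('MSEQ(MAJA)', 'SEQ(MAJA)'):
            maja = [child.text for child in node.support]
    return maja

single = [Node('TÄNAV', 'Veski'), Node('MAJA', '3')]
# The .text of the MSEQ node is left empty here because it is not used
repeated = [Node('TÄNAV', 'Veski'),
            Node('MSEQ(MAJA)', '', support=(Node('MAJA', '3'), Node('MAJA', '5'), Node('MAJA', '7')))]

print(house_numbers(single))    # 3
print(house_numbers(repeated))  # ['3', '5', '7']
```

Note that the resulting *MAJA* attribute is a string for a plain MAJA node but a list for a SEQ/MSEQ node, which matches the layer output shown further below.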

Let's try the MSEQ rule:

```python
# Lower number = higher priority, so the longer rule wins if both match (as in grammar3)
grammar6.add_rule('ADDRESS', 'TÄNAV MSEQ(MAJA) ASULA', group='g0', priority=2, 
                 decorator=address_decorator2, validator=validator)

grammar6.add_rule('ADDRESS', 'TÄNAV MSEQ(MAJA)',       group='g0', priority=3, 
                 decorator=address_decorator2) # We cannot use the validator here as this rule doesn't tag the town

tagger6 = GrammarParsingTagger(grammar=grammar6,
                              layer_of_tokens='address_parts',
                              name_attribute='grammar_symbol',
                              output_layer='addresses6',
                              attributes=('grammar_symbol', 'INDEKS', 'MAJA', 'TÄNAV', 'MAAKOND', 'ASULA'),
                              output_ambiguous=True,
                              gap_validator = gap_validator
                              )
```

```python
tagger6.tag(text)
```

| text |
|---|
| Veekatkestus Veski 3, 5, 7 majades kestab 8. juunil kl 12-15. |

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| tokens | | | | False | 17 |
| compound_tokens | type, normalized | | tokens | False | 2 |
| words | normalized_form | | | True | 15 |
| address_parts | grammar_symbol, type | | | True | 9 |
| addresses5 | INDEKS, grammar_symbol, MAJA, TÄNAV, MAAKOND, ASULA | | address_parts | True | 1 |
| addresses6 | grammar_symbol, INDEKS, MAJA, TÄNAV, MAAKOND, ASULA | | address_parts | True | 1 |


```python
text.addresses6
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| addresses6 | grammar_symbol, INDEKS, MAJA, TÄNAV, MAAKOND, ASULA | | address_parts | True | 1 |

| text | grammar_symbol | INDEKS | MAJA | TÄNAV | MAAKOND | ASULA |
|---|---|---|---|---|---|---|
| ['Veski', '3', '5', '7'] | ADDRESS | | ['3', '5', '7'] | Veski | | |


As we can see, we were now able to match all the house numbers instead of only the first one. If we had used a SEQ rule, we would also have got the matches ['Veski', '3'] and ['Veski', '3', '5'], which is not the desired outcome in this example.
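The difference between SEQ and MSEQ can be mimicked on a plain list of symbols. In this pure-Python sketch (hypothetical helpers, not EstNLTK's implementation), `seq_runs` returns every contiguous run of a symbol, as a SEQ rule is described to do, while `mseq_runs` keeps only the maximal runs, as an MSEQ rule does. The symbol list is an assumed reading of *Veski 3, 5, 7 majades*.

```python
def seq_runs(symbols, target):
    """All contiguous runs (start, end) of `target`, of every length (SEQ-like)."""
    runs = []
    for start in range(len(symbols)):
        end = start
        while end < len(symbols) and symbols[end] == target:
            end += 1
            runs.append((start, end))
    return runs

def mseq_runs(symbols, target):
    """Only the maximal contiguous runs of `target` (MSEQ-like)."""
    return [(s, e) for (s, e) in seq_runs(symbols, target)
            if (s == 0 or symbols[s - 1] != target)
            and (e == len(symbols) or symbols[e] != target)]

symbols = ['TÄNAV', 'MAJA', 'MAJA', 'MAJA', 'RANDOM_TEXT']  # assumed symbols
print(seq_runs(symbols, 'MAJA'))   # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(mseq_runs(symbols, 'MAJA'))  # [(1, 4)]

# With 'TÄNAV' at index 0 in front, only runs starting at index 1 extend the street:
# these correspond to ['Veski', '3'], ['Veski', '3', '5'] and ['Veski', '3', '5', '7']
after_street = [run for run in seq_runs(symbols, 'MAJA') if run[0] == 1]
print(after_street)                # [(1, 2), (1, 3), (1, 4)]
```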

### Combining regular rules with SEQ and MSEQ rules [ experimental ]

In some situations, we need to combine regular rules (i.e. rules without SEQ or MSEQ) with rules containing SEQ or MSEQ.
Let us consider an example where we want to extract potential _noun phrase candidates_ (i.e. sequences of nouns or proper nouns combined with conjunctions) from a text. The example text snippet is the following:

```python
from estnltk import Text

text = Text('Lendur. Lenduri diivani tugijalg. Lenduri diivani tugi ja jalg. '
            'Lendur Leo. Lendur Leo diivan. Leo ja Matthias.').tag_layer(['morph_analysis'])
```

In order to extract noun phrase candidates, we combine different types of rules and also assign priorities to rules, trying to construct rules in a way that longer phrases will obtain higher priority:

```python
from estnltk.finite_grammar import Grammar, Rule
from estnltk.taggers import GrammarParsingTagger

grammar = Grammar(start_symbols=['NOUN_PHRASE'], 
                  depth_limit=float('inf'), # the default
                  width_limit=float('inf'), # the default
                  )
grammar.add_rule('NOUN', 'S', group='g0', priority=4)
grammar.add_rule('NOUN', 'H', group='g0', priority=4)
grammar.add_rule('NOUN_PHRASE', 'NOUN',        group='g0', priority=4)
grammar.add_rule('NOUN_PHRASE', 'MSEQ(NOUN)',  group='g0', priority=3)
grammar.add_rule('NOUN_PHRASE', 'NOUN J NOUN', group='g0', priority=2)
grammar.add_rule('NOUN_PHRASE', 'MSEQ(NOUN) J NOUN', group='g0', priority=1)
grammar
```

```
Grammar:
	start: NOUN_PHRASE
	terminals: H, J, MSEQ(NOUN), S
	nonterminals: NOUN, NOUN_PHRASE
	legal attributes: frozenset()
	depth_limit: inf
	width_limit: inf
Rules:
	NOUN -> S	: 4, val: default_validator, dec: default_decorator, scoring: default_scoring
	NOUN -> H	: 4, val: default_validator, dec: default_decorator, scoring: default_scoring
	NOUN_PHRASE -> NOUN	: 4, val: default_validator, dec: default_decorator, scoring: default_scoring
	NOUN_PHRASE -> MSEQ(NOUN)	: 3, val: default_validator, dec: default_decorator, scoring: default_scoring
	NOUN_PHRASE -> NOUN J NOUN	: 2, val: default_validator, dec: default_decorator, scoring: default_scoring
	NOUN_PHRASE -> MSEQ(NOUN) J NOUN	: 1, val: default_validator, dec: default_decorator, scoring: default_scoring
```
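To get a feel for why these rules overlap, the sketch below translates the NOUN_PHRASE right sides into regular expressions over a string of one-letter part-of-speech tags (S = noun, H = proper noun, J = conjunction, as used in the grammar above). This regex translation and the `matching_rules` helper are a hypothetical approximation for illustration, not how the EstNLTK parser works internally; in particular, the postag strings passed in are assumed.

```python
import re

# Hypothetical regex translation of the NOUN_PHRASE rules; NOUN stands for [SH]
NOUN = '[SH]'
PATTERNS = {
    'NOUN': NOUN,
    'MSEQ(NOUN)': NOUN + '+',
    'NOUN J NOUN': NOUN + 'J' + NOUN,
    'MSEQ(NOUN) J NOUN': NOUN + '+J' + NOUN,
}

def matching_rules(postags):
    """Return the right sides whose pattern matches the whole postag string."""
    return [rhs for rhs, pattern in PATTERNS.items() if re.fullmatch(pattern, postags)]

print(matching_rules('SSSJS'))  # ['MSEQ(NOUN) J NOUN']
print(matching_rules('S'))      # ['NOUN', 'MSEQ(NOUN)']
print(matching_rules('HJH'))    # ['NOUN J NOUN', 'MSEQ(NOUN) J NOUN']
```

Even a single noun is matched by two rules, and a phrase like *Leo ja Matthias* (HJH) by two more, so the grammar produces many competing candidates that must be resolved by priority.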

Now, if we apply **GrammarParsingTagger**, we run into a problem: there are too many phrases in the output, and many of them overlap.
It seems that conflict resolving does not work as intended:

```python
# Create & apply GrammarParsingTagger
grammar_tagger = GrammarParsingTagger(grammar=grammar,
                                      name_attribute='partofspeech',
                                      layer_of_tokens='morph_analysis',
                                      output_layer='noun_phrases')
grammar_tagger.tag(text)
# Browse results
text.noun_phrases
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| noun_phrases | | | morph_analysis | False | 18 |

| text |
|---|
| ['Lendur'] |
| ['Lenduri'] |
| ['Lenduri', 'diivani', 'tugijalg'] |
| ['diivani'] |
| ['tugijalg'] |
| ['Lenduri'] |
| ['Lenduri', 'diivani', 'tugi', 'ja', 'jalg'] |
| ['diivani'] |
| ['jalg'] |
| ['Lendur'] |


The reason for this result is that, in the default conflict resolving strategy, the priorities of SEQ and MSEQ rules _do not compete_ with the priorities of regular rules.
As a result, you get unexpected results: overlaps remain where they should have been resolved.

The situation can be fixed with the **GrammarParsingTagger** flag `force_resolving_by_priority`.
This (experimental) option forces all conflicts to be resolved afterwards based on the *priority* attributes of the grammar rules.

If `force_resolving_by_priority` is switched on, then the grammar is first used to create an ambiguous layer retaining all the conflicting (overlapping) annotations.
After the layer has been successfully created, the function [`resolve_conflicts`](https://github.com/estnltk/estnltk/blob/version_1.6/estnltk/layer_operations/conflict_resolver.py#L45) is applied for final resolving of the conflicts based on the *priority* attributes of grammar rules.

```python
# Create & apply GrammarParsingTagger with force_resolving_by_priority=True
grammar_tagger = GrammarParsingTagger(grammar=grammar,
                                      name_attribute='partofspeech',
                                      layer_of_tokens='morph_analysis',
                                      output_layer='noun_phrases2',
                                      force_resolving_by_priority=True)
grammar_tagger.tag(text)
# Browse results
text.noun_phrases2
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| noun_phrases2 | | | morph_analysis | False | 6 |

| text |
|---|
| ['Lendur'] |
| ['Lenduri', 'diivani', 'tugijalg'] |
| ['Lenduri', 'diivani', 'tugi', 'ja', 'jalg'] |
| ['Lendur', 'Leo'] |
| ['Lendur', 'Leo', 'diivan'] |
| ['Leo', 'ja', 'Matthias'] |


```python
text.noun_phrases2.display()
```
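The idea of resolving all overlaps purely by priority can be sketched as a greedy procedure: sort candidate spans by priority (lower number first, longer span first on ties) and keep a span only if it does not overlap any span already kept. The function `resolve_by_priority` below is a hypothetical simplification written for illustration; it is not the code of EstNLTK's `resolve_conflicts`. The candidate spans are hand-made token positions for *Lenduri diivani tugi ja jalg* (0 to 4), using the rule priorities from the grammar above.

```python
def resolve_by_priority(annotations):
    """Keep non-overlapping (start, end, priority) spans, preferring lower priority numbers.

    Hypothetical simplification: ties are broken in favour of longer spans, and a span
    is dropped if it overlaps any span that has already been kept.
    """
    kept = []
    # Sort by priority, then by negative length (start - end), so longer spans come first
    for start, end, priority in sorted(annotations, key=lambda a: (a[2], a[0] - a[1])):
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, priority))
    return sorted(kept)

candidates = [
    (0, 1, 4), (1, 2, 4), (2, 3, 4), (4, 5, 4),  # NOUN_PHRASE -> NOUN
    (0, 3, 3),                                   # NOUN_PHRASE -> MSEQ(NOUN)
    (2, 5, 2),                                   # NOUN_PHRASE -> NOUN J NOUN
    (0, 5, 1),                                   # NOUN_PHRASE -> MSEQ(NOUN) J NOUN
]
print(resolve_by_priority(candidates))  # [(0, 5, 1)]
```

Only the whole phrase survives, which agrees with the single ['Lenduri', 'diivani', 'tugi', 'ja', 'jalg'] span in the *noun_phrases2* layer above.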

---