## <span style="color:purple"> Experimental: noun phrase chunker </span>

EstNLTK includes an experimental noun phrase chunker, which detects non-overlapping noun phrases in text.

You can run noun phrase chunking directly via the default resolver, which handles all the necessary preprocessing for you:

In [1]:
from estnltk import Text

text = Text('Suur karvane kass nurrus punasel diivanil, väike hiir aga hiilis temast mööda.')

text.tag_layer('np_chunks')

text.np_chunks

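| layer name | attributes | parent | enveloping | ambiguous | span count |
|------------|------------|--------|------------|-----------|------------|
| np_chunks  |            |        | words      | False     | 4          |

| text |
|------|
| ['Suur', 'karvane', 'kass'] |
| ['punasel', 'diivanil'] |
| ['väike', 'hiir'] |
| ['temast'] |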


You can use `enclosing_text` to obtain the exact strings corresponding to the chunks:

In [2]:
# Get phrase strings
[chunk.enclosing_text for chunk in text.np_chunks]

['Suur karvane kass', 'punasel diivanil', 'väike hiir', 'temast']
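
Because the result is a plain Python list of strings, ordinary list processing applies to it. As a minimal sketch, the snippet below starts from the phrase strings printed above (copied in by hand so that it runs without EstNLTK) and keeps only the multi-word phrases. The variable names are illustrative and not part of EstNLTK.

```python
# Phrase strings copied from the output above; in a live session this list
# would come from [chunk.enclosing_text for chunk in text.np_chunks].
phrases = ['Suur karvane kass', 'punasel diivanil', 'väike hiir', 'temast']

# Keep only phrases made of more than one whitespace-separated token
multiword_phrases = [p for p in phrases if len(p.split()) > 1]
print(multiword_phrases)
```

Splitting on whitespace is a simplification here; for anything beyond a quick filter, counting the words of each chunk in the `np_chunks` layer itself is likely more reliable.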

As `np_chunks` is an enveloping layer around `words`, you can iterate over all words of each chunk, and you can also access the lemmas of these words via the `morph_analysis` layer:

In [3]:
# Get lemmas of the words from chunks
for chunk in text.np_chunks:
    for word in chunk:
        print(word.text, word.lemma)
    print()

Suur ['suur']
karvane ['karvane']
kass ['kass']

punasel ['punane']
diivanil ['diivan']

väike ['väike']
hiir ['hiir']

temast ['tema']
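
Each `word.lemma` above is printed as a list. The following hedged sketch shows one way to turn every chunk into a lemmatized phrase string by taking the first lemma of each word. The helper name `lemmatized_phrases` is hypothetical, it assumes that `word.lemma` can be indexed like a list (as the printed output suggests), and it is demonstrated on stand-in objects rather than on a real `Text` so that it runs without EstNLTK.

```python
from types import SimpleNamespace

def lemmatized_phrases(chunks):
    """Join the first lemma of every word in each chunk (hypothetical helper)."""
    return [' '.join(word.lemma[0] for word in chunk) for chunk in chunks]

# Stand-in words mirroring the output above. With EstNLTK you would pass
# text.np_chunks instead (assumption: word.lemma supports indexing).
def _w(text, lemma):
    return SimpleNamespace(text=text, lemma=[lemma])

demo_chunks = [
    [_w('punasel', 'punane'), _w('diivanil', 'diivan')],
    [_w('temast', 'tema')],
]
print(lemmatized_phrases(demo_chunks))
```

Taking only the first lemma is a design shortcut: if a word carries several lemmas, the others are silently ignored.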



**_Technical note_**: as the default noun phrase chunker relies on MaltParser syntactic analysis, you need to have Java installed on the system to use the chunker.
See [the syntactic analysis tutorial](https://github.com/estnltk/estnltk/blob/113cec7af026597d8e45ec9bf06e8492ab3d24e9/tutorials/nlp_pipeline/C_syntax/03_syntactic_analysis_with_maltparser.ipynb) for details.
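
A quick way to check the Java requirement before running the chunker is to look for a `java` executable on the `PATH`. This is only a rough sketch using the Python standard library: it does not verify the Java version, and EstNLTK may locate Java by other means than the `PATH`.

```python
import shutil

# Look for a `java` executable on the PATH; the result is None when not found
java_path = shutil.which('java')
if java_path is None:
    print('Java was not found on PATH; MaltParser-based chunking will likely fail.')
else:
    print('Java found at:', java_path)
```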

## Using NounPhraseChunker directly

The chunker uses Vabamorf's morphological analyses and dependency syntactic relations for detecting potential noun phrases.

In the following example, we use `MaltParserTagger` for creating the prerequisite syntactic analysis layer, but you can use any [dependency syntactic layer](https://github.com/estnltk/estnltk/tree/113cec7af026597d8e45ec9bf06e8492ab3d24e9/tutorials/nlp_pipeline/C_syntax) that has `'deprel'` and `'head'` attributes marking the relations:

In [4]:
from estnltk import Text
from estnltk.taggers import ConllMorphTagger
from estnltk.taggers import MaltParserTagger

conll_morph_tagger = ConllMorphTagger( no_visl=True,  morph_extended_layer='morph_analysis' )
maltparser_tagger = MaltParserTagger( input_conll_morph_layer='conll_morph', 
                                      input_type='morph_analysis', 
                                      version='conllu', add_parent_and_children=False )

In [5]:
# Create text for analysis
text = Text('Autojuhi lapitekk pälvis linna koduleheküljel paljude kodanike tähelepanu.')
# Add prerequisite layers
text.tag_layer('morph_analysis')
conll_morph_tagger.tag( text )
maltparser_tagger.tag( text )
text.layers

{'compound_tokens',
 'conll_morph',
 'maltparser_syntax',
 'morph_analysis',
 'sentences',
 'tokens',
 'words'}
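
Since the chunker needs a syntax layer with `'deprel'` and `'head'` attributes, it can be worth checking a layer's attribute names before handing it to the chunker. The sketch below is a plain-Python check that runs without EstNLTK. The helper name `missing_syntax_attributes` is hypothetical, the attribute tuples are made-up examples, and reading the names from `text['maltparser_syntax'].attributes` is an assumption about how the layer exposes them.

```python
REQUIRED_ATTRIBUTES = {'deprel', 'head'}

def missing_syntax_attributes(layer_attributes):
    """Return the required attribute names absent from the layer, sorted."""
    return sorted(REQUIRED_ATTRIBUTES - set(layer_attributes))

# With EstNLTK, one would pass text['maltparser_syntax'].attributes here
# (assumption: the layer exposes its attribute names this way).
print(missing_syntax_attributes(('id', 'lemma', 'head', 'deprel')))  # nothing missing
print(missing_syntax_attributes(('id', 'lemma')))                    # both missing
```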

Now we can use `NounPhraseChunker`. The tagger must be initialized with the name of the syntax layer:

In [6]:
from estnltk.taggers.miscellaneous.np_chunker import NounPhraseChunker

np_chunker = NounPhraseChunker('maltparser_syntax')
np_chunker.tag(text)
text.np_chunks

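| layer name | attributes | parent | enveloping | ambiguous | span count |
|------------|------------|--------|------------|-----------|------------|
| np_chunks  |            |        | words      | False     | 3          |

| text |
|------|
| ['Autojuhi', 'lapitekk'] |
| ['linna', 'koduleheküljel'] |
| ['paljude', 'kodanike', 'tähelepanu'] |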


In [7]:
# Get phrase strings
[chunk.enclosing_text for chunk in text.np_chunks]

['Autojuhi lapitekk', 'linna koduleheküljel', 'paljude kodanike tähelepanu']

### Chunking based on VislTagger

In the following example, we use `VislTagger` to provide the input syntax layer required for chunking:

In [8]:
from estnltk import Text
from estnltk.taggers import VislTagger

# Create text for analysis
text = Text('Juunikuu suveseiklused ootavad Sind juba täna meie uues reisiportaalis.')
# Add prerequisite layers
text.tag_layer(['morph_extended'])
syntactic_parser = VislTagger()
syntactic_parser.tag(text)
text.layers

{'compound_tokens',
 'morph_analysis',
 'morph_extended',
 'sentences',
 'tokens',
 'visl',
 'words'}

In [9]:
# Create NP chunker based on vislcg3 syntactic analysis
from estnltk.taggers.miscellaneous.np_chunker import NounPhraseChunker
np_chunker = NounPhraseChunker('visl')
np_chunker.tag(text)
text.np_chunks

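| layer name | attributes | parent | enveloping | ambiguous | span count |
|------------|------------|--------|------------|-----------|------------|
| np_chunks  |            |        | words      | False     | 4          |

| text |
|------|
| ['Juunikuu', 'suveseiklused'] |
| ['Sind'] |
| ['meie'] |
| ['uues', 'reisiportaalis'] |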


---