# Verb chain detector

The verb chain detector identifies main verbs and their extensions (verb chains) in clauses. 
This is somewhat similar to syntactic analysis, as it involves finding (grammatical) relations between words.
However, unlike syntactic analysis, the verb chain detector focuses only on the detection of main verbs and their structure (leaving out other syntactic components of the clause), and its patterns for detecting relations between words are atheoretical (largely based on empirical corpus investigations).

Verb chain detection requires that the input text has sentence and clause boundary annotations, and morphological annotations.

First, let's consider a simple example: verb chain tagging can be used to detect positive (affirmative) and negative statements:

```python
from estnltk import Text
from estnltk.taggers import VerbChainDetector

# Create text and tag prerequisite layers
text = Text('A: Nii, pool rehkendust on nüüd tehtud! '
            'B: Eriti palju te küll pole seal teinud.')
text.tag_layer(['words', 'sentences', 'morph_analysis', 'clauses'])

# Detect verb chains
vc_detector = VerbChainDetector()
vc_detector.tag(text)

# Browse results
text.verb_chains
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| verb_chains | pattern, roots, word_ids, mood, polarity, tense, voice, remaining_verbs |  | words | False | 2 |

| text | pattern | roots | word_ids | mood | polarity | tense | voice | remaining_verbs |
|---|---|---|---|---|---|---|---|---|
| ['on', 'tehtud'] | ['ole', 'verb'] | ['ole', 'tege'] | [6, 8] | indic | POS | perfect | impersonal | False |
| ['pole', 'teinud'] | ['pole', 'verb'] | ['ole', 'tege'] | [16, 18] | indic | NEG | perfect | personal | False |

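Since `polarity` is one of the chain attributes, separating affirmative from negative statements boils down to a lookup on that value. The sketch below uses a hypothetical helper, `classify_statements` (not part of EstNLTK); to stay runnable without EstNLTK, it works on plain dictionaries whose values are copied by hand from the table above. Mapping the `'??'` value (used when it is unclear whether *ära* starts a negated chain) to `'uncertain'` is an assumption of the sketch.

```python
# Hypothetical helper: maps the `polarity` attribute of verb chains to
# statement labels. Records are hand-copied from the output table above.
POLARITY_LABELS = {"POS": "affirmative", "NEG": "negative", "??": "uncertain"}


def classify_statements(chains):
    """Return (chain text, label) pairs based on each chain's polarity."""
    return [
        (" ".join(chain["text"]), POLARITY_LABELS.get(chain["polarity"], "unknown"))
        for chain in chains
    ]


chains = [
    {"text": ["on", "tehtud"], "polarity": "POS"},
    {"text": ["pole", "teinud"], "polarity": "NEG"},
]
for chain_text, label in classify_statements(chains):
    print(f"{chain_text}: {label}")
```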

_One-word "chains" vs multiword chains._ While both verb chains in the previous example were multiword units, affirmative verb chains can also consist of a single word; in practice, most of them do. So, to re-emphasize: by "verb chains" we actually mean main verbs (together with their extensions):

```python
# Create text and tag prerequisite layers
text = Text('Tulin, nägin ja tema võitis.')
text.tag_layer(['words', 'sentences', 'morph_analysis', 'clauses'])

# Detect verb chains
vc_detector.tag(text)

# Browse results
text.verb_chains
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| verb_chains | pattern, roots, word_ids, mood, polarity, tense, voice, remaining_verbs |  | words | False | 3 |

| text | pattern | roots | word_ids | mood | polarity | tense | voice | remaining_verbs |
|---|---|---|---|---|---|---|---|---|
| ['Tulin'] | ['verb'] | ['tule'] | [0] | indic | POS | imperfect | personal | False |
| ['nägin'] | ['verb'] | ['näge'] | [2] | indic | POS | imperfect | personal | False |
| ['võitis'] | ['verb'] | ['võit'] | [5] | indic | POS | imperfect | personal | False |

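To check how many chains are one-word units in your own data, it is enough to look at the length of each chain's `word_ids` list. A minimal sketch with a hypothetical helper, `one_word_share`; the input lists are copied from the `word_ids` columns of the two example outputs above rather than read from EstNLTK.

```python
def one_word_share(word_ids_per_chain):
    """Return the fraction of chains that consist of a single word."""
    if not word_ids_per_chain:
        return 0.0
    single = sum(1 for ids in word_ids_per_chain if len(ids) == 1)
    return single / len(word_ids_per_chain)


# `word_ids` values copied from the two example outputs above
chains = [[6, 8], [16, 18], [0], [2], [5]]
print(one_word_share(chains))  # 0.6
```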

The current version of VerbChainDetector detects the following verb chain constructions:
  * basic main verbs:
    * (affirmative) single non-*olema* main verbs (example: Pidevalt **uurivad** asjade seisu ka hollandlased);
    * (affirmative) single *olema* main verbs (e.g. Raha **on** alati vähe) and two-word *olema* verb chains (**Oleme** sellist kino ennegi **näinud**);
    * negated main verbs: *ei/ära/pole/ega* + verb (e.g. Helistasin korraks Carmenile, kuid ta **ei vastanud**.);
  
  * verb chain extensions:
    * `verb + verb` : the chain is extended with a non-finite verb if the last verb of the chain subcategorizes for it, e.g. the verb *kutsuma* is extended with *ma*-verb arguments (for example: Kevadpäike **kutsub** mind **suusatama**) and the verb *püüdma* is extended with *da*-verb arguments (for example: Aita **ei püüdnudki** Leenat **mõista**);
    * `verb + nom/adv + verb` : the last verb of the chain is extended with nominal/adverb arguments which subcategorize for a non-finite verb, e.g. the verb *otsima* forms a multiword unit with the nominal *võimalust* which, in turn, takes a non-finite *da*-verb as an argument (for example: Seepärast **otsisimegi võimalust** kusagilt mõned ilvesed **hankida**).

If no ambiguities are encountered during the process, VerbChainDetector's algorithm can extend a single verb chain multiple times, which may result in a rather long chain (such as the chain `oleks => pidanud => olema => õigus => kutsuda` detected from the sentence _'Minul oleks pidanud olema õigus ise endale külalisi kutsuda'_).


### Attributes of a verb chain

By default, each verb chain has the following attributes filled in: 
  * `pattern` - the general pattern of the chain: for each word in the chain, lists whether it is *'ega'*, *'ei'*, *'ära'*, *'pole'*, *'ole'*, *'&'* (conjunction: ja/ning/ega/või), *'verb'* (verb different from *'ole'*) or *'nom/adv'* (nominal/adverb); Note: words in this list are ordered by subcategorization relations;
  
  * `roots` -- for each word in the chain, lists its corresponding 'root' value from the (Vabamorf's) morphological analysis; Note: words in this list are ordered by subcategorization relations;
  
  * `word_ids` -- for each word in the chain, lists its index in the layer 'words'; Note: words in this list are ordered by subcategorization relations;

  * `mood` - mood of the finite verb. Possible values: *'indic'* (indicative), *'imper'* (imperative), *'condit'* (conditional), *'quotat'* (quotative) or *'??'* (undetermined);
  
  * `polarity` - grammatical polarity of the finite verb. Possible values: *'POS'*, *'NEG'* or *'??'*. Value *'NEG'* means that the chain begins with a negation word *ei/pole/ega/ära*. Value *'??'* is reserved for cases where it is uncertain whether *ära* forms a negated verb chain or not;
  
  * `tense` - tense of the finite verb. Possible values depend on the mood value. Tenses of indicative: *'present'*, *'imperfect'*, *'perfect'*, *'pluperfect'*; tense of imperative: *'present'*; tenses of conditional and quotative: *'present'* and *'past'*. Additionally, the tense may remain undetermined (*'??'*);
    
  * `voice` - voice of the finite verb. Possible values: *'personal'*, *'impersonal'*, *'??'* (undetermined);
  
  * `remaining_verbs` - a boolean value showing whether there are any remaining verbs in the context that can be potentially added to the verb chain; if _True_, then the chain could be broken or incomplete (due to ambiguities, or missing subcategorization information);
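The `tense` description implies that only certain mood/tense combinations are possible. A sanity check over exported chain attributes could encode those combinations as a lookup table. The helper below, `tense_matches_mood`, is hypothetical (not part of EstNLTK), and accepting `'??'` in either attribute as valid is an assumption of this sketch.

```python
# Allowed tense values per mood, as listed in the `tense` description above.
ALLOWED_TENSES = {
    "indic": {"present", "imperfect", "perfect", "pluperfect"},
    "imper": {"present"},
    "condit": {"present", "past"},
    "quotat": {"present", "past"},
}


def tense_matches_mood(mood, tense):
    """Check whether a (mood, tense) pair is one of the documented combinations."""
    # Assumption: undetermined values ('??') are treated as acceptable.
    if mood == "??" or tense == "??":
        return True
    return tense in ALLOWED_TENSES.get(mood, set())


print(tense_matches_mood("indic", "perfect"))  # True
print(tense_matches_mood("imper", "past"))     # False
```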

### Word order in verb chains

In the attribute `text`, the words of the chain are listed in the same order as they appear in the text. 
However, all other list attributes (`pattern`, `roots`, `word_ids`) rearrange the words so that they appear in the order of grammatical relations, which may differ from their order in the text. An example:

```python
# Create text and tag prerequisite layers
text = Text('Plaanis on lihtsalt hängida seal.')
text.tag_layer(['words', 'sentences', 'morph_analysis', 'clauses'])

# Detect verb chains
vc_detector.tag(text)

# Browse results
text.verb_chains
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| verb_chains | pattern, roots, word_ids, mood, polarity, tense, voice, remaining_verbs |  | words | False | 1 |

| text | pattern | roots | word_ids | mood | polarity | tense | voice | remaining_verbs |
|---|---|---|---|---|---|---|---|---|
| ['Plaanis', 'on', 'hängida'] | ['ole', 'nom/adv', 'verb'] | ['ole', 'plaan', 'hängi'] | [1, 0, 3] | indic | POS | present | personal | False |

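Because `text` keeps the textual order while `word_ids` follows the grammatical order, the surface words can be rearranged into grammatical order by aligning the sorted indices with the `text` list. The sketch below uses a hypothetical helper, `grammatical_order`, on values copied from the output above. It assumes that a chain's `text` list holds exactly one word per index in `word_ids`, and it renders the result with the `=>` notation used earlier for long chains.

```python
def grammatical_order(text_words, word_ids):
    """Rearrange chain words (given in text order) into the order of word_ids."""
    # Word indices grow with position in the text, so the sorted ids line up
    # one-to-one with `text_words`.
    position_to_word = dict(zip(sorted(word_ids), text_words))
    return [position_to_word[i] for i in word_ids]


words = grammatical_order(["Plaanis", "on", "hängida"], [1, 0, 3])
print(" => ".join(words))  # on => Plaanis => hängida
```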

_Semantic modifiers vs content verbs._ The first word of the chain is the finite verb (main verb) of the clause (except in negation constructions, where the first word is typically a negation word), and each following word is governed by the previous word in the chain. 
As a result of this arrangement, the first words usually modify the semantics of the statement (e.g. specify negation or the possibility of an event), and only the last word bears the content of the statement (what is being negated or considered possible).

Note that there can also be multiple content verbs at the end of the chain; in that case, all of these non-finite verbs can be considered as being governed by the same preceding word in the chain (in the example below, both *kallutada* and *ajada* are governed by *õigust*). An example:

```python
# Create text and tag prerequisite layers
text = Text('Kuid see ei anna õigust kaalu kallutada ja kaussi kummuli ajada.')
text.tag_layer(['words', 'sentences', 'morph_analysis', 'clauses'])

# Detect verb chains
vc_detector.tag(text)

# Browse results
text.verb_chains
```

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| verb_chains | pattern, roots, word_ids, mood, polarity, tense, voice, remaining_verbs |  | words | False | 1 |

| text | pattern | roots | word_ids | mood | polarity | tense | voice | remaining_verbs |
|---|---|---|---|---|---|---|---|---|
| ['ei', 'anna', 'õigust', 'kallutada', 'ja', 'ajada'] | ['ei', 'verb', 'nom/adv', 'verb', '&', 'verb'] | ['ei', 'and', 'õigus', 'kalluta', 'ja', 'aja'] | [2, 3, 4, 6, 7, 10] | indic | NEG | present | personal | False |

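The split into semantic modifiers and content verbs can be roughly approximated from `pattern` alone, since `pattern` and `roots` are both in grammatical order. The helper below, `split_modifiers_and_content`, is hypothetical: it is a heuristic derived from the description above, and such a split is not one of the attributes VerbChainDetector outputs. It treats the last word as a content verb and, walking backwards, also collects any `'verb'` joined to it by a conjunction (`'&'`); everything before that is treated as a modifier.

```python
def split_modifiers_and_content(pattern, roots):
    """Split a chain (in grammatical order) into modifier roots and content-verb roots.

    Heuristic: the last word is a content verb; a 'verb' linked to it by '&'
    is a content verb too (repeated backwards). Earlier words are modifiers.
    """
    if not pattern:
        return [], []
    start = len(pattern) - 1
    while start >= 2 and pattern[start - 1] == "&" and pattern[start - 2] == "verb":
        start -= 2
    modifiers = roots[:start]
    # Drop the conjunctions that join the content verbs
    content = [root for tag, root in zip(pattern[start:], roots[start:]) if tag != "&"]
    return modifiers, content


pattern = ["ei", "verb", "nom/adv", "verb", "&", "verb"]
roots = ["ei", "and", "õigus", "kalluta", "ja", "aja"]
print(split_modifiers_and_content(pattern, roots))
# (['ei', 'and', 'õigus'], ['kalluta', 'aja'])
```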

### VerbChainDetector's constructor arguments

`VerbChainDetector`'s constructor can take the following arguments:
  * `output_layer` -- Name of the verb chains layer (in case you want to change the name); (default: 'verb_chains')


  * `input_words_layer` -- Name of the (prerequisite) words layer; (default: 'words')


  * `input_clauses_layer` -- Name of the (prerequisite) clauses layer; (default: 'clauses')


  * `input_sentences_layer` -- Name of the (prerequisite) sentences layer; (default: 'sentences')


  * `input_morph_analysis_layer` -- Name of the (prerequisite) morph analysis layer; (default: 'morph_analysis')


  * `resources_dir` -- the path to the resource files (subcategorization lexicon files); defaults to [PACKAGE_PATH / taggers / verb_chains / v1_4_1 / res](https://github.com/estnltk/estnltk/tree/3c45bd5623ab6c54dfd41c3e512821f7fa2ab7f7/estnltk/taggers/verb_chains/v1_4_1/res);


  * `add_morph_attr` -- boolean specifying if the attribute `'morph'` will be added to the output layer. This attribute adds detailed morphological information (Vabamorf's part-of-speech + form) for each of the words in the chain; (default: False)


  * `add_analysis_ids_attr` -- boolean specifying if the attribute `'analysis_ids'` will be added to the output layer. For each word in the chain, this attribute tells exactly which of Vabamorf's morphological analyses (analysis indices) of the word possessed the features of the verb chain; (default: False)


  * `expand2ndTime` -- boolean specifying if regular verb chains (chains not ending with 'olema') will be expanded twice. Note that this is an experimental feature: expanding twice can give you somewhat longer chains, but also more mistakes on verb chain detection; (default: False)


  * `breakOnPunctuation` -- boolean specifying if the expansion of regular verb chains (chains not ending with 'olema') will be broken in case of intervening punctuation; (default: False)


  * `removeSingleAraEi` -- boolean specifying if verb chains consisting of a single negation word, _'ära'_ or _'ei'_, will be removed; (default: True)


  * `vc_detector` -- If set, overrides the default verb chain detector with the given VerbChainDetector instance. Note that only a VerbChainDetector instance based on [the version 1.4.1 source](https://github.com/estnltk/estnltk/tree/3c45bd5623ab6c54dfd41c3e512821f7fa2ab7f7/estnltk/taggers/verb_chains/v1_4_1) can be used here; (default: None)
  


### VerbChainDetector's coverage and detection examples

The corpus coverage of VerbChainDetector's patterns has been measured on the [Balanced corpus of Estonian](https://www.cl.ut.ee/korpused/grammatikakorpus/index.php?lang=en), and a summary of the statistics (along with one example per pattern) is available in the file [tasak_verb_chain_examples.html](https://github.com/estnltk/estnltk/blob/a8f5520b1c4d26fd58223ffc3f0a565778b3d99f/docs/tutorials/_static/tasak_verb_chain_examples.html) (for an online view, use [nbviewer](https://nbviewer.jupyter.org/github/estnltk/estnltk/blob/a8f5520b1c4d26fd58223ffc3f0a565778b3d99f/docs/tutorials/_static/tasak_verb_chain_examples.html)).
Note, however, that these results were obtained with an older version of EstNLTK, and EstNLTK v1.6 may give different results due to changes in tokenization logic.