## <span style="color:purple"> Experimental: Pre-annotation of PropBank semantic roles </span>

PropBankPreannotator provides lexicon-based tagging of PropBank semantic roles [(Palmer et al. 2005)](https://www.cs.rochester.edu/~gildea/palmer-propbank-cl.pdf) in text.
The lexicon has been derived automatically from manually crafted resources of the EKTB75 project.

Before using PropBankPreannotator, you'll need to download the lexicon, as it is not distributed with the EstNLTK package.

* If you create a new instance of PropBankPreannotator and the lexicon is missing, you will be asked for permission to download it;
* Alternatively, you can download the lexicon in advance with the `download` function:

```python
from estnltk import download
download("propbankpreannotator")
```

Before PropBankPreannotator can be applied, the text must first be syntactically annotated.
Use [StanzaSyntaxTagger](../C_syntax/03_syntactic_analysis_with_stanza.ipynb) from `estnltk_neural` to provide syntactic annotations:

In [1]:
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)
# Initialize Stanza-based syntactic parser
from estnltk_neural.taggers import StanzaSyntaxTagger
syntax_tagger = StanzaSyntaxTagger(input_type='morph_extended', input_morph_layer='morph_extended')

In [2]:
# Initialize PropBankPreannotator working on 'stanza_syntax' layer
from estnltk.taggers.miscellaneous.propbank.preannotator import PropBankPreannotator
propbank_annotator = PropBankPreannotator(input_syntax_layer='stanza_syntax')

In [3]:
# Create example input Text
from estnltk import Text
text = Text('''Ants rääkis asjast Arvole. Seega, ta edastas sõnumi edukalt adressaadile. Küsisin IT-juhilt abi. ''')
# Tag necessary layers
text.tag_layer(['sentences', 'morph_extended'])
syntax_tagger.tag(text)
# Tag semantic role pre-annotations
propbank_annotator.tag(text)

text
"Ants rääkis asjast Arvole. Seega, ta edastas sõnumi edukalt adressaadile. Küsisin IT-juhilt abi."

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,3
tokens,,,,False,19
compound_tokens,"type, normalized",,tokens,False,1
words,normalized_form,,,True,17
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,17
morph_extended,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech, punctuation_type, pronoun_type, letter_case, fin, verb_extension_suffix, subcat",morph_analysis,,True,17
stanza_syntax,"id, lemma, upostag, xpostag, feats, head, deprel, deps, misc, parent_span, children",morph_extended,,False,17

layer name,span_names,attributes,enveloping,ambiguous,relation count
pre_semantic_roles,"verb, arg0, arg1, arg2, arg3, arg4, arg5, argm_mnr, argm_tmp, argm_loc",sense_id,stanza_syntax,True,3


In [4]:
# Examine results
text['pre_semantic_roles']

layer name,span_names,attributes,enveloping,ambiguous,relation count
pre_semantic_roles,"verb, arg0, arg1, arg2, arg3, arg4, arg5, argm_mnr, argm_tmp, argm_loc",sense_id,stanza_syntax,True,3

index,verb,sense_id,arg0,arg1,arg2,arg3,arg4,arg5,argm_mnr,argm_tmp,argm_loc
0,['rääkis'],rääkima_1,['Ants'],,['Arvole'],['asjast'],,,,,
1,['edastas'],edastama_1,['ta'],,['adressaadile'],,,,,,
2,['Küsisin'],küsima_1,,['abi'],['IT-juhilt'],,,,,,


In [5]:
# Visualize results
text['pre_semantic_roles'].display()

#### Tagger's parameters

PropBankPreannotator lets you set the following parameters via its constructor:

* `output_flat_layer` -- Whether the output layer will be formatted as a flat layer (that is, a layer without any dependencies). By default, the output layer is not a flat layer but an enveloping layer around the input syntax layer, because the lexicon builds upon nodes of the syntax layer. (Default: False)
* `discard_frames_wo_args` -- Whether frames without any arguments will be discarded from the output. (Default: True)
* `add_verb_class` -- Whether the output layer has an extra attribute `verb_class` conveying the verb class from the lexicon. (Default: False)
* `add_arg_descriptions` -- Whether the output layer contains extra attributes with argument descriptions, named `arg0_desc`, `arg1_desc`, ... , `arg5_desc`. (Default: False)
* `add_arg_feats` -- Whether information about the arguments' morphological and syntactic features will be added to the output layer as attributes named `arg0_feats`, `arg1_feats`, ... , `argm_loc_feats`. These are the features that were used in the frame extraction. (Default: False)
* `skip_compound_prt` -- Whether compound verbs will be excluded from frame detection (that is, verbs that have a child with the `compound:prt` deprel will be skipped). (Default: True)
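
To make the `skip_compound_prt` condition concrete, here is a minimal sketch of the check it describes: does a verb have a child whose deprel is `compound:prt`? The `Node` class, the `has_compound_prt_child` helper and the toy parse are hypothetical illustrations written for this note; they are not part of EstNLTK, and the tagger's actual implementation may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Toy dependency node (hypothetical; not an EstNLTK span)."""
    id: int
    text: str
    head: int    # id of the syntactic head; 0 marks the root
    deprel: str


def has_compound_prt_child(verb: Node, nodes: List[Node]) -> bool:
    """True if any node attached to `verb` carries the compound:prt relation."""
    return any(n.head == verb.id and n.deprel == 'compound:prt' for n in nodes)


# Hand-written toy parse of "Ta võttis asja ette." (an assumption, not parser output)
nodes = [
    Node(1, 'Ta', 2, 'nsubj'),
    Node(2, 'võttis', 0, 'root'),
    Node(3, 'asja', 2, 'obj'),
    Node(4, 'ette', 2, 'compound:prt'),
]
verb = nodes[1]
skip_compound_prt = True  # mirrors the documented default

# With the flag on, a particle verb like this one would not be looked up
if skip_compound_prt and has_compound_prt_child(verb, nodes):
    decision = 'skipped'
else:
    decision = 'looked up in lexicon'
print(verb.text, '->', decision)
```

Setting `skip_compound_prt=False` in the constructor would presumably let such verbs through to frame detection, at the cost of matching the bare verb's frames against a particle verb.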

Example: set `add_arg_descriptions` to reveal manually provided argument descriptions (if any):

In [6]:
propbank_annotator = PropBankPreannotator(input_syntax_layer='stanza_syntax', add_arg_descriptions=True)
# Remove old layer
text.pop_layer(propbank_annotator.output_layer)
# Add layer with new settings
propbank_annotator.tag(text)
# Examine results
text['pre_semantic_roles']

layer name,span_names,attributes,enveloping,ambiguous,relation count
pre_semantic_roles,"verb, arg0, arg1, arg2, arg3, arg4, arg5, argm_mnr, argm_tmp, argm_loc","sense_id, arg0_desc, arg1_desc, arg2_desc, arg3_desc, arg4_desc, arg5_desc",stanza_syntax,True,3

index,verb,sense_id,arg0_desc,arg0,arg1_desc,arg1,arg2_desc,arg2,arg3_desc,arg3,arg4_desc,arg4,arg5_desc,arg5,argm_mnr,argm_tmp,argm_loc
0,['rääkis'],rääkima_1,rääkija,['Ants'],,,adressaat: kellele/kellega,['Arvole'],millegi kohta/millest,['asjast'],,,,,,,
1,['edastas'],edastama_1,edastaja,['ta'],,,adressaat,['adressaadile'],,,,,,,,,
2,['Küsisin'],küsima_1,,,mida/et,['abi'],kelleltki/kellegi käest,['IT-juhilt'],,,,,,,,,


Example: set `add_arg_feats` to reveal morphosyntactic features triggering each argument:

In [7]:
propbank_annotator = PropBankPreannotator(input_syntax_layer='stanza_syntax', add_arg_feats=True)
# Remove old layer
text.pop_layer(propbank_annotator.output_layer)
# Add layer with new settings
propbank_annotator.tag(text)
# Examine results
text['pre_semantic_roles']

layer name,span_names,attributes,enveloping,ambiguous,relation count
pre_semantic_roles,"verb, arg0, arg1, arg2, arg3, arg4, arg5, argm_mnr, argm_tmp, argm_loc","sense_id, arg0_feats, arg1_feats, arg2_feats, arg3_feats, arg4_feats, arg5_feats, argm_mnr_feats, argm_tmp_feats, argm_loc_feats",stanza_syntax,True,3

index,verb,sense_id,arg0,arg0_feats,arg1,arg1_feats,arg2,arg2_feats,arg3,arg3_feats,arg4,arg4_feats,arg5,arg5_feats,argm_mnr,argm_mnr_feats,argm_tmp,argm_tmp_feats,argm_loc,argm_loc_feats
0,['rääkis'],rääkima_1,['Ants'],['deprel=nsubj'],,,['Arvole'],"['deprel=obl', 'case=All']",['asjast'],"['deprel=obl', 'case=Ela']",,,,,,,,,,
1,['edastas'],edastama_1,['ta'],['deprel=nsubj'],,,['adressaadile'],"['deprel=obl', 'case=All']",,,,,,,,,,,,
2,['Küsisin'],küsima_1,,,['abi'],['deprel=obj'],['IT-juhilt'],"['deprel=obl', 'case=Abl']",,,,,,,,,,,,
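
The `*_feats` values suggest how a lexicon-based lookup might assign roles: each argument of a frame lists required features, and a child of the verb receives the role whose requirements it satisfies. The sketch below illustrates that idea in plain Python. The `FRAME` dictionary and the `matching_role` helper are hypothetical; only the feature strings for `rääkima_1` are copied from the output above, and the real lexicon format and matching logic are not documented here and may differ.

```python
from typing import Dict, Optional

# Hypothetical frame entry; feature strings copied from the rääkima_1 row above
FRAME = {
    'sense_id': 'rääkima_1',
    'args': {
        'arg0': ['deprel=nsubj'],
        'arg2': ['deprel=obl', 'case=All'],
        'arg3': ['deprel=obl', 'case=Ela'],
    },
}


def matching_role(child_feats: Dict[str, str], frame: dict) -> Optional[str]:
    """Return the first role whose required features all hold for the child."""
    for role, required in frame['args'].items():
        pairs = (feat.split('=', 1) for feat in required)
        if all(child_feats.get(key) == value for key, value in pairs):
            return role
    return None


# Children of the verb 'rääkis' with assumed deprel/case values
children = {
    'Ants': {'deprel': 'nsubj', 'case': 'Nom'},
    'asjast': {'deprel': 'obl', 'case': 'Ela'},
    'Arvole': {'deprel': 'obl', 'case': 'All'},
}
roles = {word: matching_role(feats, FRAME) for word, feats in children.items()}
print(roles)
```

Under these assumptions the assignment agrees with the first row of the table: `Ants` maps to `arg0`, `Arvole` to `arg2` and `asjast` to `arg3`.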


<div class="alert alert-block alert-warning">
<h4><i>PropBankPreannotator's limitations</i></h4>
The output of PropBankPreannotator can be incomplete and erroneous, as it builds upon work-in-progress resources. 
For commonly used verbs, PropBankPreannotator can output many ambiguous sense interpretations, as no disambiguation has been implemented yet.
<br><br>
</div>
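
Since no disambiguation is applied, it can help to group competing sense candidates per verb occurrence before manual review. The following is a minimal sketch on plain tuples, not on EstNLTK relations; the rows and the `verb_a_1`-style sense identifiers are placeholders invented for illustration, and how ambiguous senses are actually laid out in the output layer is an assumption here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows: (verb start offset, verb text, sense_id); placeholders only
rows = [
    (5, 'verb_a', 'verb_a_1'),
    (5, 'verb_a', 'verb_a_2'),
    (20, 'verb_b', 'verb_b_1'),
]


def group_senses(rows: List[Tuple[int, str, str]]) -> Dict[Tuple[int, str], List[str]]:
    """Collect all candidate sense ids for each verb occurrence."""
    grouped = defaultdict(list)
    for start, verb, sense_id in rows:
        grouped[(start, verb)].append(sense_id)
    return dict(grouped)


candidates = group_senses(rows)
# Keep only occurrences with more than one competing sense
ambiguous = {key: senses for key, senses in candidates.items() if len(senses) > 1}
print(ambiguous)
```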