# RelationLayer

RelationLayer is a new type of layer introduced in EstNLTK v1.7.2. 
It stores information about relations between entities mentioned in a text, such as coreference relations between names and pronouns, or the semantic roles/argument structures of verbs.

Current limitations:
* you cannot access attributes of foreign layers (such as lemmas from `morph_analysis`) directly via the spans of a relation layer;
* `estnltk_core.layer_operations` does not support RelationLayer;
* `estnltk.storage.postgres` does not support RelationLayer.


### Example 1: coreference relations

Essentially, a RelationLayer is a collection of relations, and each Relation is a set of named spans and annotations. 

In [1]:
from estnltk import Text
from estnltk_core import RelationLayer

In [2]:
text = Text('Mari kirjeldas õhinal, kuidas ta väiksena "Sipsikut" luges: '+\
'"Ma ei suutnud seda raamatut kohe kuidagi käest ära panna! Nii põnev oli see!"').tag_layer('words')

In [3]:
# Get word span locations:
#text.words[['start', 'end', 'text']]

When creating a RelationLayer, you need to define names for the spans and, optionally, names for the attributes.
Span names are required; attribute names can be omitted:

In [4]:
coref_layer = RelationLayer('coreference', span_names=['mention', 'entity'], text_object=text)

Use the `add_annotation` method to add a new relation to the layer:

In [5]:
# Add relation based on a dictionary
coref_layer.add_annotation( {'mention': (30, 32), 'entity': (0, 4)} )
coref_layer.add_annotation( {'mention': (61, 63), 'entity': (0, 4)} )
# Or add relation by keyword arguments
coref_layer.add_annotation( mention=(75, 88), entity=(42, 52) )
coref_layer.add_annotation( mention=(133, 136), entity=(42, 52) )
coref_layer

layer name,span_names,attributes,ambiguous,relation count
coreference,"mention, entity",,False,4

mention,entity
ta,Mari
Ma,Mari
seda raamatut,"""Sipsikut"""
see,"""Sipsikut"""
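
The `(start, end)` tuples passed to `add_annotation` are character offsets into the raw text. When annotating by hand, a small lookup helper can save you from counting characters. The following is a minimal plain-Python sketch: `find_span` is a hypothetical helper, not part of EstNLTK, and it assumes that a phrase should match only where it is not attached to neighbouring word characters.

```python
import re

RAW_TEXT = ('Mari kirjeldas õhinal, kuidas ta väiksena "Sipsikut" luges: '
            '"Ma ei suutnud seda raamatut kohe kuidagi käest ära panna! Nii põnev oli see!"')

def find_span(raw_text, phrase, occurrence=0):
    """Return (start, end) of the n-th standalone occurrence of phrase, or None.

    Hypothetical helper for illustration; not an EstNLTK function.
    """
    # Lookarounds reject matches inside longer words (e.g. 'Ma' in 'Mari')
    pattern = r'(?<!\w)' + re.escape(phrase) + r'(?!\w)'
    for i, match in enumerate(re.finditer(pattern, raw_text)):
        if i == occurrence:
            return match.start(), match.end()
    return None

print(find_span(RAW_TEXT, 'ta'))             # (30, 32)
print(find_span(RAW_TEXT, 'Ma'))             # (61, 63)
print(find_span(RAW_TEXT, 'seda raamatut'))  # (75, 88)
print(find_span(RAW_TEXT, '"Sipsikut"'))     # (42, 52)
```

The returned tuples can be compared against the offsets used in the `add_annotation` calls above.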


#### Visualizing layer

In [6]:
# Display named spans with their respective relation IDs
# (This is available from EstNLTK v1.7.3)
coref_layer.display()

#### Accessing layer

In [7]:
# Access span names
coref_layer.span_names

('mention', 'entity')

In [8]:
# Access attribute names (if defined)
coref_layer.attributes

()

In [9]:
# use numeric indexes to access relations
coref_layer[0]

Relation([NamedSpan(mention: 'ta'), NamedSpan(entity: 'Mari')], [{}])

In [10]:
# or iterate over all relations
for relation in coref_layer:
    print(relation)

Relation([NamedSpan(mention: 'ta'), NamedSpan(entity: 'Mari')], [{}])
Relation([NamedSpan(mention: 'Ma'), NamedSpan(entity: 'Mari')], [{}])
Relation([NamedSpan(mention: 'seda raamatut'), NamedSpan(entity: '"Sipsikut"')], [{}])
Relation([NamedSpan(mention: 'see'), NamedSpan(entity: '"Sipsikut"')], [{}])


In [11]:
coref_layer[0].mention

NamedSpan(mention: 'ta')

In [12]:
coref_layer[0].mention.text

'ta'

In [13]:
coref_layer[0].mention.start, coref_layer[0].mention.end

(30, 32)

In [14]:
coref_layer[0].text

['ta', 'Mari']

In [15]:
# get all mentions
coref_layer[['mention']]

[[NamedSpan(mention: 'ta')],
 [NamedSpan(mention: 'Ma')],
 [NamedSpan(mention: 'seda raamatut')],
 [NamedSpan(mention: 'see')]]

In [16]:
# get all entities
coref_layer[['entity']]

[[NamedSpan(entity: 'Mari')],
 [NamedSpan(entity: 'Mari')],
 [NamedSpan(entity: '"Sipsikut"')],
 [NamedSpan(entity: '"Sipsikut"')]]
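
Because several relations can point to the same entity span, it is often convenient to group mentions into coreference chains. The sketch below shows one way to do this in plain Python. It works on `(mention_text, entity_span)` pairs, which are assumed to have been collected beforehand (for instance from `relation.mention.text` and the `start`/`end` of `relation.entity`). `group_mentions` is a hypothetical helper, not part of EstNLTK.

```python
from collections import defaultdict

def group_mentions(pairs):
    """Group mention texts by the (start, end) span of the entity they refer to."""
    chains = defaultdict(list)
    for mention_text, entity_span in pairs:
        chains[entity_span].append(mention_text)
    return dict(chains)

# Values copied from the relations added to the coreference layer above
pairs = [('ta', (0, 4)), ('Ma', (0, 4)),
         ('seda raamatut', (42, 52)), ('see', (42, 52))]

chains = group_mentions(pairs)
for entity_span, mentions in chains.items():
    print(entity_span, '<-', mentions)
```

Grouping by span offsets rather than by entity text keeps two different entities with the same surface form apart.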

#### Accessing other layers

In [17]:
# Add morph_analysis layer
text = text.tag_layer('morph_analysis')

Currently, you can access annotations from other layers via base spans:

In [18]:
# Get lemma of first mention
text['morph_analysis'].get( coref_layer[0].mention.base_span ).lemma

Unnamed: 0,lemma
0,tema


In [19]:
# Get lemma of first entity
text['morph_analysis'].get( coref_layer[0].entity.base_span ).lemma

Unnamed: 0,lemma
0,mari


However, this only works for named spans that correspond exactly to a single span on the other layer. For the others, the lookup finds nothing, as the first attempt below shows:

In [20]:
# Get lemmas for all relations (first attempt)
for relation in coref_layer:
    mention_morph_span = text['morph_analysis'].get( relation.mention.base_span )
    entity_morph_span  = text['morph_analysis'].get( relation.entity.base_span )
    if mention_morph_span is not None:
        print(relation.mention, '->', mention_morph_span.lemma, end='  | ')
    else:
        print(relation.mention, '-> lemma not found', end='  | ')
    if entity_morph_span is not None:
        print(relation.entity, '->', entity_morph_span.lemma)
    else:
        print(relation.entity, '-> lemma not found',)

NamedSpan(mention: 'ta') -> ['tema']  | NamedSpan(entity: 'Mari') -> ['mari']
NamedSpan(mention: 'Ma') -> ['mina']  | NamedSpan(entity: 'Mari') -> ['mari']
NamedSpan(mention: 'seda raamatut') -> lemma not found  | NamedSpan(entity: '"Sipsikut"') -> lemma not found
NamedSpan(mention: 'see') -> ['see']  | NamedSpan(entity: '"Sipsikut"') -> lemma not found


If a named span covers multiple spans on the other layer, the covered spans need to be found by comparing the start/end indexes of the spans:

In [21]:
def get_overlapping_spans( named_span, morph_layer ):
    # Collect the morph_analysis spans that lie entirely inside the named span
    return [span for span in morph_layer if named_span.start <= span.start and span.end <= named_span.end]

# Get lemmas for all relations (second attempt -- also handles named spans that cover several words)
for relation in coref_layer:
    mention_morph_spans = get_overlapping_spans( relation.mention, text['morph_analysis'] )
    entity_morph_spans  = get_overlapping_spans( relation.entity, text['morph_analysis'] )
    if mention_morph_spans:
        print(relation.mention, '->', [sp.lemma[0] for sp in mention_morph_spans], end='  | ')
    else:
        print(relation.mention, '-> lemma not found', end='  | ')
    if entity_morph_spans:
        print(relation.entity, '->', [sp.lemma[0] for sp in entity_morph_spans])
    else:
        print(relation.entity, '-> lemma not found',)

NamedSpan(mention: 'ta') -> ['tema']  | NamedSpan(entity: 'Mari') -> ['mari']
NamedSpan(mention: 'Ma') -> ['mina']  | NamedSpan(entity: 'Mari') -> ['mari']
NamedSpan(mention: 'seda raamatut') -> ['see', 'raamat']  | NamedSpan(entity: '"Sipsikut"') -> ['"', 'sipsik', '"']
NamedSpan(mention: 'see') -> ['see']  | NamedSpan(entity: '"Sipsikut"') -> ['"', 'sipsik', '"']
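
Note that the check above keeps only spans that lie entirely inside the named span. If spans that merely intersect the named span should also count, the comparison has to change. A minimal plain-Python sketch of both tests follows, using `(start, end)` tuples as stand-ins for real spans; the function names are illustrative and not part of the EstNLTK API.

```python
def is_contained(inner, outer):
    """True if inner lies entirely inside outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """True if a and b share at least one character."""
    return a[0] < b[1] and b[0] < a[1]

# Word offsets of 'seda', 'raamatut' and 'kohe' in the example text
words = [(75, 79), (80, 88), (89, 93)]

phrase = (75, 88)   # 'seda raamatut'
inside_phrase = [w for w in words if is_contained(w, phrase)]
print(inside_phrase)      # [(75, 79), (80, 88)]

partial = (77, 85)  # a span that cuts through both words
inside_partial = [w for w in words if is_contained(w, partial)]
touching_partial = [w for w in words if overlaps(w, partial)]
print(inside_partial)     # []
print(touching_partial)   # [(75, 79), (80, 88)]
```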


### Example 2: temporal relations

In [22]:
from estnltk import Text
from estnltk_core import RelationLayer

In [23]:
text = Text('Pühapäeva varahommikul kutsuti politsei Riia mäele. '+\
'Seal oli ühelt noorelt mehelt ära võetud nahktagi ja käekell. '+\
'Juhtumi kohta algatati uurimine.').tag_layer('words')

In [24]:
# Get word span locations:
#text.words[['start', 'end', 'text']]

In [25]:
tlinks_layer = RelationLayer('temporal_relations', span_names=['entity_a', 'entity_b'], 
                                                   attributes=['rel_type'], text_object=text)

In [26]:
# Access span names
tlinks_layer.span_names

('entity_a', 'entity_b')

In [27]:
# Access attribute names
tlinks_layer.attributes

('rel_type',)

In [28]:
# Add relation based on a dictionary
tlinks_layer.add_annotation( {'entity_a': (0, 22),  'entity_b': (23, 30),   'rel_type': 'INCLUDES'} )
tlinks_layer.add_annotation( {'entity_a': (82, 92), 'entity_b': (23, 30),   'rel_type': 'BEFORE'} )
# Or add relation by keyword arguments
tlinks_layer.add_annotation( entity_a=(82, 92), entity_b=(114, 121), rel_type='IDENTITY' )
tlinks_layer.add_annotation( entity_a=(82, 92), entity_b=(128, 145), rel_type='BEFORE' )
tlinks_layer

layer name,span_names,attributes,ambiguous,relation count
temporal_relations,"entity_a, entity_b",rel_type,False,4

entity_a,entity_b,rel_type
Pühapäeva varahommikul,kutsuti,INCLUDES
ära võetud,kutsuti,BEFORE
ära võetud,Juhtumi,IDENTITY
ära võetud,algatati uurimine,BEFORE
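
Attribute values make it easy to summarise or filter relations. As a minimal sketch, the snippet below counts relation types and selects the `BEFORE` links. It uses the plain dictionaries passed to `add_annotation` above as stand-ins, so it runs without EstNLTK. With a real layer, the `['rel_type']` lookup demonstrated on `tlinks_layer[0]` should apply to each relation in the same way, but that is an assumption and is not verified here.

```python
from collections import Counter

# Same values as in the add_annotation calls above
relations = [
    {'entity_a': (0, 22),  'entity_b': (23, 30),   'rel_type': 'INCLUDES'},
    {'entity_a': (82, 92), 'entity_b': (23, 30),   'rel_type': 'BEFORE'},
    {'entity_a': (82, 92), 'entity_b': (114, 121), 'rel_type': 'IDENTITY'},
    {'entity_a': (82, 92), 'entity_b': (128, 145), 'rel_type': 'BEFORE'},
]

# Count how often each relation type occurs
type_counts = Counter(rel['rel_type'] for rel in relations)

# Keep only the relations of one type
before_links = [rel for rel in relations if rel['rel_type'] == 'BEFORE']

print(dict(type_counts))
print([rel['entity_b'] for rel in before_links])
```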


In [29]:
tlinks_layer[0]

Relation([NamedSpan(entity_a: 'Pühapäeva varahommikul'), NamedSpan(entity_b: 'kutsuti')], [{'rel_type': 'INCLUDES'}])

In [30]:
tlinks_layer[0]['rel_type']

'INCLUDES'

In [31]:
# Display named spans with their respective relation IDs
tlinks_layer.display()

### Example 3: semantic roles

In [32]:
from estnltk import Text
from estnltk_core import RelationLayer

In [33]:
text = Text('President Bush kohtus temaga privaatselt Valges Majas teisipäeval. '+\
'Aga John ja Mari kohtusid hoopis kokteilipeol. '+\
'Mari ei ostnud Johnile kokteili.').tag_layer('words')

In [34]:
# Get word span locations:
#text.words[['start', 'end', 'text']]

A layer can define more span names than any single relation uses. 
Some of the named spans are filled in only in specific contexts. 
Next, we follow the English PropBank guidelines and define a layer of semantic roles with slots/placeholders for different arguments. 
Whether an argument is realized depends on the context:

In [35]:
sem_roles_layer = RelationLayer('semantic_roles', span_names=['arg0', 'arg1', 'arg2', 'arg3', 
                                                              'arg4', 'argm_mnr', 'argm_tmp', 
                                                              'argm_loc'], 
                                                  attributes=['rel'], text_object=text)
# Based on PropBank English guidelines:
# ARG0 -- agent
# ARG1 -- patient 
# ARG2 -- instrument, benefactive, attribute 
# ARG3 -- starting point, benefactive, attribute
# ARG4 -- ending point, beneficiary
# ARGM -- modifier (manner, time, location)

In [36]:
sem_roles_layer.add_annotation( {'arg0': (0, 14),  'arg1': (22, 28), 'argm_mnr': (29, 40), 
                                 'argm_loc': (41,53), 'argm_tmp': (54, 65), 'rel': 'kohtumine'} )
sem_roles_layer.add_annotation( {'arg0': (71, 75), 'arg1': (79, 83), 'rel': 'kohtumine'} )
sem_roles_layer.add_annotation( {'arg0': (114, 118), 'arg1': (137, 145), 'arg4': (129, 136), 
                                 'rel': 'ostmine-NEG'} )
sem_roles_layer

layer name,span_names,attributes,ambiguous,relation count
semantic_roles,"arg0, arg1, arg2, arg3, arg4, argm_mnr, argm_tmp, argm_loc",rel,False,3

arg0,arg1,arg2,arg3,arg4,argm_mnr,argm_tmp,argm_loc,rel
President Bush,temaga,,,,privaatselt,teisipäeval,Valges Majas,kohtumine
John,Mari,,,,,,,kohtumine
Mari,kokteili,,,Johnile,,,,ostmine-NEG


In [37]:
# Display named spans with their respective relation IDs
sem_roles_layer.display()

In [38]:
sem_roles_layer[0]

Relation([NamedSpan(arg0: 'President Bush'), NamedSpan(arg1: 'temaga'), NamedSpan(argm_mnr: 'privaatselt'), NamedSpan(argm_tmp: 'teisipäeval'), NamedSpan(argm_loc: 'Valges Majas')], [{'rel': 'kohtumine'}])

In [39]:
sem_roles_layer[0].spans

[NamedSpan(arg0: 'President Bush'),
 NamedSpan(arg1: 'temaga'),
 NamedSpan(argm_mnr: 'privaatselt'),
 NamedSpan(argm_tmp: 'teisipäeval'),
 NamedSpan(argm_loc: 'Valges Majas')]

In [40]:
for relation in sem_roles_layer:
    print(relation)

Relation([NamedSpan(arg0: 'President Bush'), NamedSpan(arg1: 'temaga'), NamedSpan(argm_mnr: 'privaatselt'), NamedSpan(argm_tmp: 'teisipäeval'), NamedSpan(argm_loc: 'Valges Majas')], [{'rel': 'kohtumine'}])
Relation([NamedSpan(arg0: 'John'), NamedSpan(arg1: 'Mari')], [{'rel': 'kohtumine'}])
Relation([NamedSpan(arg0: 'Mari'), NamedSpan(arg1: 'kokteili'), NamedSpan(arg4: 'Johnile')], [{'rel': 'ostmine-NEG'}])
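
Hand-entered offsets are easy to get wrong, especially when only some slots are filled, so it can help to preview a relation dictionary before adding it. A minimal plain-Python sketch follows: `preview_relation` is a hypothetical helper (not part of EstNLTK) that slices the raw text for every span name present in the dictionary and ignores attributes such as `rel`.

```python
RAW_TEXT = ('President Bush kohtus temaga privaatselt Valges Majas teisipäeval. '
            'Aga John ja Mari kohtusid hoopis kokteilipeol. '
            'Mari ei ostnud Johnile kokteili.')

SPAN_NAMES = ['arg0', 'arg1', 'arg2', 'arg3', 'arg4',
              'argm_mnr', 'argm_tmp', 'argm_loc']

def preview_relation(raw_text, relation_dict, span_names):
    """Map each filled span name to the text it covers; raise on bad offsets."""
    preview = {}
    for name in span_names:
        if name not in relation_dict:
            continue  # this slot is not filled in the relation
        start, end = relation_dict[name]
        if not 0 <= start < end <= len(raw_text):
            raise ValueError(f'{name}: offsets ({start}, {end}) are out of range')
        preview[name] = raw_text[start:end]
    return preview

relation = {'arg0': (114, 118), 'arg1': (137, 145), 'arg4': (129, 136),
            'rel': 'ostmine-NEG'}
print(preview_relation(RAW_TEXT, relation, SPAN_NAMES))
# {'arg0': 'Mari', 'arg1': 'kokteili', 'arg4': 'Johnile'}
```

If the preview shows the expected words, the same dictionary can be passed to `add_annotation`.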


### RelationLayer and Text object

Use the `add_layer` method to attach a relation layer to a Text object:

In [41]:
from estnltk import Text
from estnltk_core import RelationLayer

In [42]:
text = Text('Mari kirjeldas õhinal, kuidas ta väiksena "Sipsikut" luges: '+\
'"Ma ei suutnud seda raamatut kohe kuidagi käest ära panna! Nii põnev oli see!"').tag_layer('words')

In [43]:
coref_layer = RelationLayer('coreference', span_names=['mention', 'entity'], text_object=text)
# Add relation based on a dictionary
coref_layer.add_annotation( {'mention': (30, 32), 'entity': (0, 4)} )
coref_layer.add_annotation( {'mention': (61, 63), 'entity': (0, 4)} )
# Or add relation by keyword arguments
coref_layer.add_annotation( mention=(75, 88), entity=(42, 52) )
coref_layer.add_annotation( mention=(133, 136), entity=(42, 52) )
text.add_layer(coref_layer)

Now, when you display the Text object, you will also see a table of its relation layers:

In [44]:
text

text
"Mari kirjeldas õhinal, kuidas ta väiksena ""Sipsikut"" luges: ""Ma ei suutnud seda raamatut kohe kuidagi käest ära panna! Nii põnev oli see!"""

layer name,attributes,parent,enveloping,ambiguous,span count
tokens,,,,False,30
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,True,30

layer name,span_names,attributes,ambiguous,relation count
coreference,"mention, entity",,False,4


In [45]:
# list span layers
text.layers

{'compound_tokens', 'tokens', 'words'}

In [46]:
# list relation layers
text.relation_layers

{'coreference'}

Use `pop_layer` to remove a relation layer. The method returns the removed layer:

In [47]:
text.pop_layer('coreference')

layer name,span_names,attributes,ambiguous,relation count
coreference,"mention, entity",,False,4

mention,entity
ta,Mari
Ma,Mari
seda raamatut,"""Sipsikut"""
see,"""Sipsikut"""


In [48]:
# list relation layers
text.relation_layers

set()

---
## Technical comparisons

---

## Comparison: Layer vs RelationLayer

|       | (Base)Layer | RelationLayer |
|-------|-------|-------|
| **attributes** | name<br><br> default_values<br><br>---<br><br> attributes<br><br> parent<br><br> enveloping<br><br>ambiguous<br><br> text_object<br><br> serialisation_module<br><br> secondary_attributes<br><br> meta | name<br><br>--- (not impl, but could be added?)<br><br>span_names<br><br> attributes<br><br> ---<br><br> ---<br><br> ambiguous<br><br> text_object<br><br> serialisation_module<br><br> secondary_attributes<br><br> meta |
| **@properties** | layer _(why?!)_<br><br> start<br><br> end<br><br> spans<br><br> span_level<br><br> text<br><br> enclosing_text | ---<br><br> ---<br><br> ---<br><br> relations<br><br> span_level<br><br> ---<br><br> --- |
| **overridden built-in methods** | `__deepcopy__`<br><br> `__setattr__`<br><br> `__setitem__`<br><br> `__getattr__` (call raises exception)<br><br> `__getitem__`<br><br> `__delitem__`<br><br> `__iter__`<br><br> `__len__`<br><br> `__eq__`<br><br> `__repr__`<br><br> `_repr_html_`<br><br> | `__deepcopy__`<br><br> `__setattr__`<br><br> `__setitem__`<br><br> `__getattr__` (call raises exception)<br><br> `__getitem__`<br><br> `__delitem__`<br><br> `__iter__`<br><br> `__len__`<br><br> `__eq__`<br><br> `__repr__`<br><br> `_repr_html_`<br><br> |
| **indexing calls (`__getitem__`)** | `layer[index]`<br><br> `layer[ parent_layer[0].base_span ]`<br><br> `layer[attribute(s)]`<br><br> `layer[indexes, attributes]`<br><br> `layer[start:end]`<br><br>`layer[list_of_bools]`<br><br>`layer[list_of_indexes]`<br><br>`layer[list_of_base_span]`<br><br> `layer[selector_function]`  |  `layer[index]`<br><br> ---<br><br> `layer[list_of_span_names_and_attributes]`<br><br> --- (should be added?) <br><br> `layer[start:end]`<br><br>---<br><br>---<br><br>---<br><br> --- |
| **span / annotation manipulation / access methods** | `add_span(span)`<br><br> `add_annotation(base_span, attribute_dict)`<br><br> `remove_span(span)`<br><br> `clear_spans()`<br><br> `get(span(s))`<br><br> ` attribute_values(attributes)`<br><br> | --- <br><br> `add_annotation(relation_dict)`<br><br> `remove_relation(relation)`<br><br> `clear_relations()` <br><br> `get(named_spans)`<br><br> --- <br><br> |
| **get(self, item)** | Finds and returns Span (or EnvelopingSpan) corresponding to the given (Base)Span item(s). If this layer is empty, returns None. If the parameter item is a sequence of BaseSpans, then returns a new Layer populated with specified spans and bearing the same configuration as this layer. | Finds and returns a single Relation corresponding to the given list of NamedSpan(s). Alternatively, list of tuples (span_name, BaseSpan) can be the input parameter. If this layer is empty or Relation was not found, returns None. |
| **other methods** | `check_span_consistency()`<br><br> `diff(other)`<br><br> `ancestor_layers()`<br><br> `descendant_layers()`<br><br> `count_values(...)`<br><br> `group_by(...)`<br><br> `rolling(...)`<br><br> `resolve_attribute(...)`<br><br> `display(...)`<br><br> |  --- (TODO)<br><br> `diff(other)`<br><br>  --- <br><br> ---<br><br> ---<br><br> ---<br><br> ---<br><br> --- (TODO?)<br><br> `display(...)` (from EstNLTK v1.7.3)<br><br> |

---

## Comparison: (Enveloping)Span vs Relation

Preamble:
* each Relation must have at least one NamedSpan and at least one RelationAnnotation;
* a Relation does not need to fill all the spans defined by its RelationLayer: some spans (but not all) can be empty/unassigned.


|       | (Enveloping)Span | Relation |
|-------|-------|-------|
| **@properties** | `annotations`<br><br> `spans` (only if enveloping)<br><br> ---<br><br> ---<br><br> `parent`<br><br> `layer`<br><br> ---<br><br> `legal_attribute_names`<br><br> `start`<br><br> `end`<br><br> `base_span`<br><br> `base_spans`<br><br> `text`<br><br> `enclosing_text`<br><br> `text_object`<br><br> `raw_text` (this is text_object.text) | `annotations`<br><br> `spans` (only assigned spans)<br><br> `span_names` (only assigned span names)<br><br> `span_level`<br><br> ---<br><br> `relation_layer`<br><br> `legal_span_names`<br><br> `legal_attribute_names`<br><br> ---<br><br> ---<br><br> ---<br><br> `base_spans`<br><br> `text`<br><br> ---<br><br> `text_object`<br><br> --- |
| **overridden built-in methods** | `__deepcopy__`<br><br> `__setattr__`<br><br> ---<br><br> `__getattr__`<br><br> `__getitem__` (get annotation(s))<br><br>  `__iter__` (only if enveloping)<br><br> `__len__` (only if enveloping)<br><br> `__contains__` (only if enveloping)<br><br> `__lt__`<br><br> `__eq__`<br><br> `__repr__`<br><br> `_repr_html_`<br><br> | `__deepcopy__`<br><br> `__setattr__`<br><br> `__setitem__` (set named span only)<br><br> `__getattr__` (get named span only)<br><br> `__getitem__` (get annotations and/or named spans)<br><br>  `__iter__` (over assigned spans, TODO: is it a good idea?)<br><br> `__len__` (number of assigned spans, TODO: is it a good idea?)<br><br>  `__contains__` (check for existence of named span)<br><br> ---<br><br> `__eq__`<br><br> `__repr__`<br><br> --- (TODO)<br><br> |
| **span / annotation manipulation / access methods** | ---<br><br> ---<br><br> `add_annotation(attribute_dict)`<br><br> `del_annotation(annotation)`<br><br> `clear_annotations()`<br><br> `resolve_attribute(item)`<br><br> | `set_span(name, base_span)`<br><br> `remove_span(name)`<br><br> `add_annotation(attribute_dict)`<br><br> `del_annotation(annotation)`<br><br> `clear_annotations()`<br><br> --- (TODO?)<br><br>  |

---

## Comparison: (Enveloping)Span vs NamedSpan

* the main difference between (Enveloping)Span and NamedSpan is that NamedSpan does not have annotations: annotations belong to the relation, not to the span.

|       | (Enveloping)Span | NamedSpan |
|-------|-------|-------|
| **@properties** | ---<br><br> `annotations`<br><br> `spans` (only if enveloping)<br><br> `parent`<br><br> ---<br><br> `layer`<br><br>  `legal_attribute_names`<br><br> `start`<br><br> `end`<br><br> `base_span`<br><br> `base_spans`<br><br> `text`<br><br> `enclosing_text`<br><br> `text_object`<br><br> `raw_text` (this is text_object.text) <br><br> --- <br><br> | `name`<br><br> ---<br><br> --- <br><br> ---<br><br> `relation`<br><br> `relation_layer`<br><br> ---<br><br> `start`<br><br> `end`<br><br> `base_span`<br><br> `base_spans`<br><br> `text`<br><br> `enclosing_text`<br><br> `text_object`<br><br> `raw_text` (this is text_object.text) <br><br> `as_tuple` (returns: name, base_span)<br><br> |
| **overridden built-in methods** | `__deepcopy__`<br><br> `__setattr__`<br><br> `__getattr__`<br><br> `__getitem__` (get annotation(s))<br><br>  `__iter__` (only if enveloping)<br><br> `__len__` (only if enveloping)<br><br> `__contains__` (only if enveloping)<br><br> `__lt__`<br><br> `__eq__`<br><br> `__repr__`<br><br> `_repr_html_`<br><br> | `__deepcopy__`<br><br> `__setattr__`<br><br> `__getattr__` (raises exception)<br><br> ---<br><br>  ---<br><br> ---<br><br>  ---<br><br> `__lt__`<br><br> `__eq__`<br><br> `__repr__`<br><br> --- (TODO)<br><br> |
| **span / annotation manipulation / access methods** | `add_annotation(attribute_dict)`<br><br> `del_annotation(annotation)`<br><br> `clear_annotations()`<br><br> `resolve_attribute(item)`<br><br> |  ---<br><br> ---<br><br> ---<br><br> --- (this could be useful, TODO)<br><br>  |

---

## Comparison: Annotation vs RelationAnnotation

|       | Annotation | RelationAnnotation |
|-------|-------|-------|
| **@properties** | `span` <br><br>  `layer`<br><br>  `legal_attribute_names`<br><br> `start`<br><br> `end`<br><br> `text`<br><br> `text_object`<br><br> | `relation`<br><br> `relation_layer`<br><br> `legal_attribute_names` <br><br> ---<br><br> ---<br><br> ---<br><br> `text_object`<br><br> |
| **overridden built-in methods** | `__deepcopy__`<br><br> `__setattr__`<br><br> `__setitem__`<br><br> `__getitem__`<br><br>  `__iter__`<br><br> `__len__`<br><br> `__contains__`<br><br> `__delattr__`<br><br> `__delitem__`<br><br> `__eq__`<br><br> `__repr__`<br><br> | `__deepcopy__`<br><br> `__setattr__`<br><br> `__setitem__`<br><br> `__getitem__`<br><br>  `__iter__`<br><br> `__len__`<br><br> `__contains__`<br><br> `__delattr__`<br><br> `__delitem__`<br><br> `__eq__`<br><br> `__repr__`<br><br> |