# Tutorial for RelExtractor
This tutorial describes the main steps and options for using RelExtractor, which is part of the BioTermCategorizer library.

### Importing the library
To import the library and all its functions, run the following code:

In [1]:
import sys, os, re

#set the path to the library
general_path = os.getcwd().split("BioTermCategorizer")[0]+"BioTermCategorizer/"
sys.path.append(general_path+'biotermcategorizer/')
general_path

'/mnt/c/Users/Sergi/Documents/BioTermCategorizer/'
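The string split above matches the first occurrence of `BioTermCategorizer` anywhere in the working directory, including inside longer folder names. As an alternative, the following minimal sketch uses `pathlib` to walk up the directory tree until it reaches a folder with exactly that name. The helper `find_project_root` is hypothetical and not part of the library, and `PurePosixPath` is used only so that the example runs on any operating system; in a notebook, `Path.cwd()` would be the natural starting point.

```python
from pathlib import PurePosixPath

def find_project_root(start, project_name="BioTermCategorizer"):
    """Return `start` or its closest ancestor whose folder name is `project_name`."""
    path = PurePosixPath(start)
    for candidate in (path, *path.parents):
        if candidate.name == project_name:
            return candidate
    raise ValueError(f"No folder named {project_name!r} found above {start}")

# Example starting point inside the repository (path taken from the cells above)
start = "/mnt/c/Users/Sergi/Documents/BioTermCategorizer/biotermcategorizer"
root = find_project_root(start)
print(root)                         # /mnt/c/Users/Sergi/Documents/BioTermCategorizer
print(root / "biotermcategorizer")  # folder that would be appended to sys.path
```

Unlike the string split, this version fails loudly with a `ValueError` when the notebook is started outside the repository.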

Additionally, the main class `RelExtractor` must be imported:

In [2]:
from RelExtractor import RelExtractor



### Using RelExtractor's basic functions
#### The RelExtractor object
First, create an instance of the class `RelExtractor` and assign it to a variable, as shown below. The constructor accepts the following parameters:
- `relation_method` (str): Method used to classify the relation between mentions, either `"transformers"` or `"setfit"`. Default value: `"transformers"`.
- `language` (str): Language for text processing. Only Spanish is currently implemented. Default value: `"spanish"`.
- `n` (int): Maximum number of labels for a single relation. Default value: `1`.
- `thr_setfit` (float): Threshold applied when relating terms with SetFit. Default value: `0.5`.
- `thr_transformers` (float): Threshold applied when relating terms with Transformers. Default value: `-1`.
- `all_combinations` (bool): Whether to compute the relations of all the mentions in source with all the mentions in target. Default value: `False`.
- `model_path` (str): Path to the relator model. Default value: `None`.
- `**kwargs` (dict): Additional keyword arguments. Default value: `None`.
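To make the interplay between `n` and the thresholds more concrete, here is a minimal sketch of one plausible selection rule: rank the labels by score, keep at most `n` of them, and drop any whose score falls below the threshold. This is an assumption for illustration only. The tutorial does not show how the library combines these parameters internally, and both the function `select_labels` and the scores are hypothetical.

```python
def select_labels(scores, n=1, threshold=0.5):
    """Keep at most `n` labels whose score is at least `threshold`, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, score in ranked[:n] if score >= threshold]

# Hypothetical scores for a single source/target pair
scores = {"BROAD": 0.62, "NARROW": 0.30, "EXACT": 0.08}

print(select_labels(scores, n=1, threshold=0.5))   # ['BROAD']
print(select_labels(scores, n=2, threshold=0.05))  # ['BROAD', 'NARROW']
print(select_labels(scores, n=1, threshold=0.7))   # []
```

Under this rule, a high threshold can leave a relation with an empty label list, and raising `n` only has an effect when more than one label clears the threshold.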

An instance of `RelExtractor` with the default parameters is created below:

In [3]:
relextractor = RelExtractor()

Once the instance is created, the following attributes of the `RelExtractor` object are available through dot notation:
- `relation_method` (str): Selected relation method.
- `rel_extractor` (Relator): The `Relator` object that is used to classify the relations.
- `all_combinations` (bool): Value passed for the `all_combinations` parameter.

In [4]:
#attributes with the default settings
print("The default settings use the relation extractor:", str(relextractor.relation_method))
print("The selected relator is", str(relextractor.rel_extractor))
print("The parameter all_combinations is", str(relextractor.all_combinations))

The default settings use the relation extractor: transformers
The selected relator is <relators.TransformersRelator.TransformersRelator object at 0x7f231912ed70>
The parameter all_combinations is False


When the `RelExtractor` object is called on two mentions or lists of mentions, it extracts their relations and stores them in the attribute `RelExtractor.relations`, which is a list of `Relation` objects. The source and target mentions can each be a string, a `Keyword` object, or a list of either. For further information on the `Keyword` class, refer to the tutorial on the use of TermExtractor.

In [5]:
source = ["cancer", "enfermedad de pulmón", "mastectomía radical izquierda"]
target = ["cancer de mama", "enfermedad pulmonar", "mastectomía"]

relextractor(source, target)
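Because the call accepts a string, a `Keyword` object, or a list of either, the inputs have to be brought to a common shape at some point. The sketch below shows one way this could be done. The `Keyword` dataclass is a hypothetical stand-in that models only a `text` attribute, and `as_mention_list` is not a function of the library.

```python
from dataclasses import dataclass

@dataclass
class Keyword:
    """Hypothetical stand-in for the library's Keyword class; only `text` is modelled."""
    text: str

def as_mention_list(mentions):
    """Wrap a single mention in a list and convert plain strings to Keyword objects."""
    if isinstance(mentions, (str, Keyword)):
        mentions = [mentions]
    return [m if isinstance(m, Keyword) else Keyword(m) for m in mentions]

# A single string, a single Keyword and a mixed list all end up as lists of Keyword
print(as_mention_list("cancer"))
print(as_mention_list(Keyword("mastectomía")))
print(as_mention_list(["cancer", Keyword("enfermedad pulmonar")]))
```

Normalizing the inputs this way means the rest of the code can always read `mention.text`, whatever the caller passed in.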

#### The Relation class
Objects of type `Relation` also have attributes that can be accessed through dot notation:
- `source` (str, list or Keyword): The input source mention(s).
- `target` (str, list or Keyword): The input target mention(s).
- `rel_type` (list): Label or labels assigned to the relation between mentions.
- `relation_method` (str): The relation classification method used to find the relation between mentions.

The user can either access the attributes of each relation individually or print the relation directly, which returns all the attributes in a structured manner.

In [6]:
#printing the relations directly
print("Printing the relations directly:\n", relextractor.relations)

#printing attributes individually
relations = [(rel.source.text,rel.target.text,rel.rel_type,rel.relation_method) for rel in relextractor.relations]
print("\nPrinting the relations individually:\n", relations)

Printing the relations directly:
 [<Relation(source mention='cancer', target mention='cancer de mama', relation type='['BROAD']', relation method='transformers')>, <Relation(source mention='enfermedad de pulmón', target mention='enfermedad pulmonar', relation type='['EXACT']', relation method='transformers')>, <Relation(source mention='mastectomía radical izquierda', target mention='mastectomía', relation type='['NARROW']', relation method='transformers')>]

Printing the relations individually:
 [('cancer', 'cancer de mama', ['BROAD'], 'transformers'), ('enfermedad de pulmón', 'enfermedad pulmonar', ['EXACT'], 'transformers'), ('mastectomía radical izquierda', 'mastectomía', ['NARROW'], 'transformers')]
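Once the relations are available as tuples, they are easy to reorganize for later processing. As a small illustration, the sketch below groups the mention pairs by assigned label. The tuples are copied from the output above, and `group_by_label` is a hypothetical helper, not part of the library.

```python
from collections import defaultdict

# Tuples copied from the output above: (source, target, rel_type, relation_method)
relations = [
    ("cancer", "cancer de mama", ["BROAD"], "transformers"),
    ("enfermedad de pulmón", "enfermedad pulmonar", ["EXACT"], "transformers"),
    ("mastectomía radical izquierda", "mastectomía", ["NARROW"], "transformers"),
]

def group_by_label(relations):
    """Map each label to the list of (source, target) pairs that received it."""
    grouped = defaultdict(list)
    for source, target, labels, _method in relations:
        for label in labels:  # rel_type is a list, so a pair may carry several labels
            grouped[label].append((source, target))
    return dict(grouped)

by_label = group_by_label(relations)
print(by_label["BROAD"])  # [('cancer', 'cancer de mama')]
```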


### Use of different relators
The mention relation implemented in `RelExtractor` is performed in a supervised manner. Two relation classification models are available: SetFit, which builds on the Sentence Transformers library, and `AutoModelForSequenceClassification`, from the Hugging Face Transformers library. In addition, the parameters `n`, `thr_setfit`, `thr_transformers` and `all_combinations` can be modified.

Some examples of these options are shown below:

In [7]:
#relextractor using default values (transformers model, with n=1 and thr=-1) and all_combinations=False
relextractor1 = RelExtractor()

#relextractor using the setfit model, with n=2 and thr=0.05 and all_combinations=False
relextractor2 = RelExtractor(relation_method="setfit", n=2, thr_setfit=0.05)

#relextractor using default values (transformers model, with n=1 and thr=-1) and all_combinations=True
relextractor3 = RelExtractor(all_combinations=True)
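The effect of `all_combinations` on which mentions are compared can be sketched with the standard library. With `False`, the outputs in this tutorial pair each source mention with the target mention at the same position; with `True`, the parameter description says every source is related to every target. The helper `pair_mentions` is hypothetical, and how the library treats lists of unequal length is not covered here (`zip` simply stops at the shorter one).

```python
from itertools import product

def pair_mentions(source, target, all_combinations=False):
    """Return the (source, target) pairs that would be classified."""
    if all_combinations:
        return list(product(source, target))  # every source with every target
    return list(zip(source, target))          # position-wise pairs only

source = ["cancer", "enfermedad de pulmón", "mastectomía radical izquierda"]
target = ["cancer de mama", "enfermedad pulmonar", "mastectomía"]

print(len(pair_mentions(source, target)))                         # 3
print(len(pair_mentions(source, target, all_combinations=True)))  # 9
```

The number of pairs grows as the product of the two list lengths, so `all_combinations=True` can become costly for long lists.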



The relations extracted by each of these relextractor instances are shown below:

In [8]:
#with transformers model and default values
relextractor1(source, target)
print("Using relextractor1:\n", relextractor1.relations, "\n")

#with setfit model, n=2 and thr=0.05
relextractor2(source, target)
print("Using relextractor2:\n", relextractor2.relations, "\n")

#with default values and all_combinations=True
relextractor3(source, target)
print("Using relextractor3:\n", relextractor3.relations, "\n")

Using relextractor1:
 [<Relation(source mention='cancer', target mention='cancer de mama', relation type='['BROAD']', relation method='transformers')>, <Relation(source mention='enfermedad de pulmón', target mention='enfermedad pulmonar', relation type='['EXACT']', relation method='transformers')>, <Relation(source mention='mastectomía radical izquierda', target mention='mastectomía', relation type='['NARROW']', relation method='transformers')>] 

Using relextractor2:
 [<Relation(source mention='cancer', target mention='cancer de mama', relation type='['NARROW']', relation method='setfit')>, <Relation(source mention='enfermedad de pulmón', target mention='enfermedad pulmonar', relation type='['NARROW']', relation method='setfit')>, <Relation(source mention='mastectomía radical izquierda', target mention='mastectomía', relation type='['NARROW']', relation method='setfit')>] 

Using relextractor3:
 [<Relation(source mention='cancer', target mention='cancer de mama', relation type='['BROA
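The two methods do not always agree, as the first two outputs above show. A quick way to spot the differences is to compare the labels pair by pair. The sketch below uses the labels copied from the `relextractor1` and `relextractor2` outputs; with live objects, the same lists could be read from `rel.rel_type` of each extractor's `relations`. The helper `find_disagreements` is hypothetical.

```python
# (source, target) pairs and labels copied from the outputs above
pairs = [
    ("cancer", "cancer de mama"),
    ("enfermedad de pulmón", "enfermedad pulmonar"),
    ("mastectomía radical izquierda", "mastectomía"),
]
transformers_labels = [["BROAD"], ["EXACT"], ["NARROW"]]
setfit_labels = [["NARROW"], ["NARROW"], ["NARROW"]]

def find_disagreements(pairs, labels_a, labels_b):
    """Return (pair, labels_a, labels_b) for every pair labelled differently."""
    return [
        (pair, a, b)
        for pair, a, b in zip(pairs, labels_a, labels_b)
        if set(a) != set(b)  # compare as sets so label order does not matter
    ]

for pair, a, b in find_disagreements(pairs, transformers_labels, setfit_labels):
    print(pair, "transformers:", a, "setfit:", b)
```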

In [10]:
#relextractors using the default transformers method, each loading a model from a local path
relextractor1 = RelExtractor(model_path='/mnt/c/Users/Sergi/Desktop/BSC/modelos_entrenados/transformers_rel1')

relextractor2 = RelExtractor(model_path='/mnt/c/Users/Sergi/Desktop/BSC/modelos_entrenados/transformers_rel2')

relextractor3 = RelExtractor(model_path='/mnt/c/Users/Sergi/Desktop/BSC/modelos_entrenados/transformers_rel3')

In [12]:
source = ['laparoscopia','enfermedad pulmonar obstructiva crónica','enfermedad pulmonar obstructiva crónica','neoplasia globular de mama triple negativo','anestesia general', 'pneumonía unilateral','cirurgía torácica']
target = ['mastectomía radical izquierda','colonoscopia','pneumonía bilateral','endoscopia','anestesia local','pneumonía bilateral','angiografía biplanar']

#with the model loaded from transformers_rel1
relextractor1(source, target)
print("Using relextractor1:\n", relextractor1.relations, "\n")

#with the model loaded from transformers_rel2
relextractor2(source, target)
print("Using relextractor2:\n", relextractor2.relations, "\n")

#with the model loaded from transformers_rel3
relextractor3(source, target)
print("Using relextractor3:\n", relextractor3.relations, "\n")

Using relextractor1:
 [<Relation(source mention='laparoscopia', target mention='mastectomía radical izquierda', relation type='[]', relation method='transformers')>, <Relation(source mention='enfermedad pulmonar obstructiva crónica', target mention='colonoscopia', relation type='[]', relation method='transformers')>, <Relation(source mention='enfermedad pulmonar obstructiva crónica', target mention='pneumonía bilateral', relation type='['BROAD']', relation method='transformers')>, <Relation(source mention='neoplasia globular de mama triple negativo', target mention='endoscopia', relation type='[]', relation method='transformers')>, <Relation(source mention='anestesia general', target mention='anestesia local', relation type='['EXACT']', relation method='transformers')>, <Relation(source mention='pneumonía unilateral', target mention='pneumonía bilateral', relation type='['EXACT']', relation method='transformers')>, <Relation(source mention='cirurgía torácica', target mention='angiograf