# Tutorial for RelExtractor
This tutorial describes the main steps and options for using RelExtractor, part of the KeyCARE library.

### Importing the library
First, make sure the KeyCARE library is installed on your device:

In [None]:
%pip install keycare

You can then import the main classes from the KeyCARE library. This tutorial focuses on the `RelExtractor` class:

In [1]:
from keycare.RelExtractor import RelExtractor

### Using RelExtractor's basic functions
#### The RelExtractor object
First, create an instance of the `RelExtractor` class and assign it to a variable, as shown below. The constructor accepts the following parameters:
- `relation_method` (str): Method used to classify the relation between mentions, either `"transformers"` or `"setfit"`. Default value: `"transformers"`.
- `language` (str): Language of the input text. Only Spanish is currently implemented. Default value: `"spanish"`.
- `n` (int): Maximum number of labels assigned to a single relation. Default value: `1`.
- `thr_setfit` (float): Threshold for the SetFit relation classifier. Default value: `0.5`.
- `thr_transformers` (float): Threshold for the Transformers relation classifier. Default value: `-1`.
- `all_combinations` (bool): Whether to relate every mention in `source` with every mention in `target`. If `False`, mentions are paired by position (`source[i]` with `target[i]`). Default value: `False`.
- `model_path` (str): Path to a trained relator model (a Hugging Face Hub identifier also works, as shown at the end of this tutorial). Default value: `None`.
- `**kwargs` (dict): Additional keyword arguments. Default value: `None`.
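
The list above does not spell out how `n` and the two thresholds interact. As a rough mental model only, the sketch below assumes that a relator produces one score per label, keeps the labels whose score reaches the threshold, and returns at most `n` of them in decreasing score order. The function `select_labels` and the scores are hypothetical and not part of KeyCARE; the selection logic inside the actual relators may differ.

```python
from typing import Dict, List


def select_labels(scores: Dict[str, float], n: int = 1, thr: float = 0.5) -> List[str]:
    """Hypothetical label selection (NOT KeyCARE's implementation).

    Keeps labels whose score is >= thr, sorted from best to worst,
    and truncates the result to at most n labels.
    """
    # Rank labels from highest to lowest score
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    # Drop labels below the threshold, then keep the n best
    return [label for label, score in ranked if score >= thr][:n]


# Made-up scores for a single source/target pair
scores = {"BROAD": 0.62, "EXACT": 0.30, "NARROW": 0.05, "NO_RELATION": 0.03}

print(select_labels(scores, n=1, thr=0.5))   # ['BROAD']
print(select_labels(scores, n=2, thr=0.05))  # ['BROAD', 'EXACT']
print(select_labels(scores, n=2, thr=-1))    # ['BROAD', 'EXACT']
```

Under this assumed model, a negative threshold such as the `thr_transformers` default of `-1` admits every score in [0, 1], so only `n` limits the number of labels.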

An instance of `RelExtractor` with the default parameters is created below:

In [2]:
relextractor = RelExtractor()

Once the extractor has been created, the following attributes can be accessed through dot notation:
- `relation_method` (str): The selected relation method.
- `rel_extractor` (Relator): The `Relator` object that performs the classification (with the default settings, a `TransformersRelator`).
- `all_combinations` (bool): The value passed for the `all_combinations` parameter.

In [3]:
# Inspect the attributes of an extractor created with the default settings
print("The default settings use the relation extractor:", str(relextractor.relation_method))
print("The selected relator is", str(relextractor.rel_extractor))
print("The parameter all_combinations is", str(relextractor.all_combinations))

The default settings use the relation extractor: transformers
The selected relator is <keycare.relators.TransformersRelator.TransformersRelator object at 0x7fa65059b070>
The parameter all_combinations is False


When a `RelExtractor` object is called on two mentions or two lists of mentions, it extracts the relations between them and stores them in the `relations` attribute, a list of `Relation` objects. Each mention in `source` and `target` can be a string or a `Keyword` object, and either argument can also be a list of them. For more information on the `Keyword` class, refer to the TermExtractor tutorial.
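
The outputs later in this tutorial suggest how `source` and `target` are paired: with `all_combinations=False` the i-th source mention is related to the i-th target mention, while with `all_combinations=True` every source mention is related to every target mention, in source-major order. The plain-Python sketch below reproduces that pairing on strings. `pair_mentions` is a hypothetical helper, not a KeyCARE function, and how the library handles lists of different lengths is not documented here (`zip` simply stops at the shorter list).

```python
from itertools import product
from typing import List, Tuple


def pair_mentions(source: List[str], target: List[str],
                  all_combinations: bool = False) -> List[Tuple[str, str]]:
    """Hypothetical helper mirroring the pairing seen in the tutorial outputs."""
    if all_combinations:
        # Cartesian product: every source mention with every target mention
        return list(product(source, target))
    # Position-wise pairing: source[i] with target[i]
    return list(zip(source, target))


src = ["cáncer", "enfermedad de pulmón"]
tgt = ["cáncer de mama", "enfermedad pulmonar"]

print(pair_mentions(src, tgt))
# [('cáncer', 'cáncer de mama'), ('enfermedad de pulmón', 'enfermedad pulmonar')]
print(len(pair_mentions(src, tgt, all_combinations=True)))  # 4
```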

In [4]:
source = ["cáncer", "enfermedad de pulmón", "mastectomía radical izquierda", "laparoscopia"]
target = ["cáncer de mama", "enfermedad pulmonar", "mastectomía", "Streptococus pneumoniae"]

relextractor(source, target)

#### The Relation class
`Relation` objects expose the following attributes through dot notation:
- `source` (str, list or Keyword): The source mention(s).
- `target` (str, list or Keyword): The target mention(s).
- `rel_type` (list): Label or labels assigned to the relation between the mentions.
- `relation_method` (str): The relation classification method used to label the relation.

You can either access these attributes individually for each relation or print the relations directly, which shows all the attributes in a structured form.

In [5]:
# Print the relations directly
print("Printing the relations directly:\n", relextractor.relations)

# Print the attributes of each relation individually
relations = [(rel.source.text,rel.target.text,rel.rel_type,rel.relation_method) for rel in relextractor.relations]
print("\nPrinting the relations individually:\n", relations)

Printing the relations directly:
 [<Relation(source mention='cáncer', target mention='cáncer de mama', relation type='['BROAD']', relation method='transformers')>, <Relation(source mention='enfermedad de pulmón', target mention='enfermedad pulmonar', relation type='['EXACT']', relation method='transformers')>, <Relation(source mention='mastectomía radical izquierda', target mention='mastectomía', relation type='['NARROW']', relation method='transformers')>, <Relation(source mention='laparoscopia', target mention='Streptococus pneumoniae', relation type='['NO_RELATION']', relation method='transformers')>]

Printing the relations individually:
 [('cáncer', 'cáncer de mama', ['BROAD'], 'transformers'), ('enfermedad de pulmón', 'enfermedad pulmonar', ['EXACT'], 'transformers'), ('mastectomía radical izquierda', 'mastectomía', ['NARROW'], 'transformers'), ('laparoscopia', 'Streptococus pneumoniae', ['NO_RELATION'], 'transformers')]
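
Because each relation exposes `source.text`, `target.text` and `rel_type`, the results are easy to post-process with plain Python. The sketch below flattens a list of relations into `(source, target, labels)` rows, joining multiple labels with `|`. To stay runnable without downloading a model, it uses `types.SimpleNamespace` stand-ins built from the output above; in the notebook you would pass `relextractor.relations` instead. `relations_to_rows` and `fake_relation` are hypothetical helpers, not part of KeyCARE.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def relations_to_rows(relations: Iterable) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: flatten relations into (source, target, labels) rows.

    Assumes each relation has `source.text`, `target.text` and a list `rel_type`,
    as in the attribute-access example above.
    """
    return [(rel.source.text, rel.target.text, "|".join(rel.rel_type)) for rel in relations]


def fake_relation(src: str, tgt: str, labels: List[str]) -> SimpleNamespace:
    # Stand-in object with the same attribute layout used above for a Relation
    return SimpleNamespace(source=SimpleNamespace(text=src),
                           target=SimpleNamespace(text=tgt),
                           rel_type=labels,
                           relation_method="transformers")


demo = [fake_relation("cáncer", "cáncer de mama", ["BROAD"]),
        fake_relation("enfermedad de pulmón", "enfermedad pulmonar", ["EXACT"])]

# Print one tab-separated row per relation
for row in relations_to_rows(demo):
    print("\t".join(row))
```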


### Use of different relators
Mention relation in `RelExtractor` is performed in a supervised manner, using one of two relation classification models: SetFit (from the `setfit` library, which builds on Sentence Transformers), selected with `relation_method="setfit"`, or a Hugging Face Transformers `AutoModelForSequenceClassification`, selected with `relation_method="transformers"`. The parameters `n`, `thr_setfit`, `thr_transformers` and `all_combinations` can also be adjusted.

The examples below show some of these options and the resulting relations:

In [6]:
# Default values: transformers model, n=1, thr_transformers=-1 and all_combinations=False
relextractor = RelExtractor()
relextractor(source, target)
print("Relations:\n", relextractor.relations, "\n")

Relations:
 [<Relation(source mention='cáncer', target mention='cáncer de mama', relation type='['BROAD']', relation method='transformers')>, <Relation(source mention='enfermedad de pulmón', target mention='enfermedad pulmonar', relation type='['EXACT']', relation method='transformers')>, <Relation(source mention='mastectomía radical izquierda', target mention='mastectomía', relation type='['NARROW']', relation method='transformers')>, <Relation(source mention='laparoscopia', target mention='Streptococus pneumoniae', relation type='['NO_RELATION']', relation method='transformers')>] 



In [7]:
# SetFit model with n=2, thr_setfit=0.05 and all_combinations=False
relextractor = RelExtractor(relation_method="setfit", n=2, thr_setfit=0.05)
relextractor(source, target)
print("Relations:\n", relextractor.relations, "\n")

Relations:
 [<Relation(source mention='cáncer', target mention='cáncer de mama', relation type='['BROAD']', relation method='setfit')>, <Relation(source mention='enfermedad de pulmón', target mention='enfermedad pulmonar', relation type='['EXACT']', relation method='setfit')>, <Relation(source mention='mastectomía radical izquierda', target mention='mastectomía', relation type='['NARROW']', relation method='setfit')>, <Relation(source mention='laparoscopia', target mention='Streptococus pneumoniae', relation type='['NO_RELATION']', relation method='setfit')>] 



In [8]:
# Default transformers model (n=1, thr_transformers=-1) with all_combinations=True
relextractor = RelExtractor(all_combinations=True)
relextractor(source, target)
print("Relations:\n", relextractor.relations, "\n")

Relations:
 [<Relation(source mention='cáncer', target mention='cáncer de mama', relation type='['BROAD']', relation method='transformers')>, <Relation(source mention='cáncer', target mention='enfermedad pulmonar', relation type='['NO_RELATION']', relation method='transformers')>, <Relation(source mention='cáncer', target mention='mastectomía', relation type='['BROAD']', relation method='transformers')>, <Relation(source mention='cáncer', target mention='Streptococus pneumoniae', relation type='['NO_RELATION']', relation method='transformers')>, <Relation(source mention='enfermedad de pulmón', target mention='cáncer de mama', relation type='['NO_RELATION']', relation method='transformers')>, <Relation(source mention='enfermedad de pulmón', target mention='enfermedad pulmonar', relation type='['EXACT']', relation method='transformers')>, <Relation(source mention='enfermedad de pulmón', target mention='mastectomía', relation type='['NO_RELATION']', relation method='transformers')>, <Rela
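
With `all_combinations=True`, the number of relations grows to `len(source) * len(target)` (4 × 4 = 16 pairs here), and many of them are labeled `NO_RELATION`. A minimal sketch of keeping only the informative pairs and counting labels is shown below. It assumes, as in the outputs above, that `rel_type` is a list of label strings. The dataclasses `Mention` and `FakeRelation` are hypothetical stand-ins for KeyCARE objects, and `informative` and `label_counts` are hypothetical helpers; in the notebook you would pass `relextractor.relations`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Mention:
    text: str


@dataclass
class FakeRelation:
    # Stand-in with the same attribute names used above for a Relation
    source: Mention
    target: Mention
    rel_type: List[str]


def informative(relations: Iterable) -> List:
    """Keep relations that carry at least one label other than NO_RELATION."""
    return [rel for rel in relations if any(label != "NO_RELATION" for label in rel.rel_type)]


def label_counts(relations: Iterable) -> Counter:
    """Count how often each label appears across all relations."""
    return Counter(label for rel in relations for label in rel.rel_type)


# A few pairs taken from the all_combinations=True output above
demo = [
    FakeRelation(Mention("cáncer"), Mention("cáncer de mama"), ["BROAD"]),
    FakeRelation(Mention("cáncer"), Mention("enfermedad pulmonar"), ["NO_RELATION"]),
    FakeRelation(Mention("cáncer"), Mention("mastectomía"), ["BROAD"]),
    FakeRelation(Mention("enfermedad de pulmón"), Mention("enfermedad pulmonar"), ["EXACT"]),
]

print([(r.source.text, r.target.text) for r in informative(demo)])
print(label_counts(demo))
```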

A previously trained model can also be loaded by passing its path, or a Hugging Face Hub identifier as in this example, to `model_path`:

In [9]:
relextractor = RelExtractor(model_path='BSC-NLP4BIA/biomedical-semantic-relation-classifier')
relextractor(source, target)
print("Relations:\n", relextractor.relations, "\n")

Relations:
 [<Relation(source mention='cáncer', target mention='cáncer de mama', relation type='['BROAD']', relation method='transformers')>, <Relation(source mention='enfermedad de pulmón', target mention='enfermedad pulmonar', relation type='['EXACT']', relation method='transformers')>, <Relation(source mention='mastectomía radical izquierda', target mention='mastectomía', relation type='['NARROW']', relation method='transformers')>, <Relation(source mention='laparoscopia', target mention='Streptococus pneumoniae', relation type='['NO_RELATION']', relation method='transformers')>] 

