In this exercise, we are going to fine-tune a model on a supervised **relation extraction** dataset.

The goal of the model is to predict, given a sentence and the character spans of two entities within the sentence, the relationship between the entities.

For example, given the sentence:


**John, who played last night, is Doe's father.**


Given this sentence and the spans of the entities John and Doe, the model has to predict the relation between John and Doe from a set of predefined relations. In this case the relation is parents. (Please note that some of the relations are one-way relations.)

The dataset we will use is a subset of the [TACRED](https://nlp.stanford.edu/projects/tacred/) dataset, a supervised relation extraction dataset by Stanford University. 


As you may realize by now, the straightforward supervised approach is to take one of the transformers and feed it the sentence and the entity spans as input. However, in this exercise we will try a different approach using **Question Answering (QA)**.

Instead of just using the entity spans and the sentence, we will train a model to answer questions such as "Who are the parents of Doe?" and "Who are the children of John?". If the question answering model is able to answer the question successfully, then we can conclude that the relation between the two entities exists.

In general, **for each relation** we will need to come up with template questions. In the example above, the template questions corresponding to the parents relation are:

*   Who are the parents of E1?
*   Who are the children of E2?

where E1 and E2 are the entities.
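As a minimal sketch of how such templates can be filled in, the snippet below keeps one pair of questions per relation and substitutes the entity names. The `TEMPLATES` table, the `build_questions` helper, and the entity names `Ann` and `Bob` are hypothetical and only serve as an illustration.

```python
# Hypothetical template table: one pair of questions per relation.
TEMPLATES = {
    "parents": (
        "Who are the parents of {e1}?",   # question about the first entity
        "Who are the children of {e2}?",  # question about the second entity
    ),
}


def build_questions(relation, e1, e2):
    """Return the two questions for `relation`, one per entity."""
    q1, q2 = TEMPLATES[relation]
    return q1.format(e1=e1), q2.format(e2=e2)


# Placeholder entity names, chosen only for this example.
questions = build_questions("parents", e1="Ann", e2="Bob")
print(questions)
```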



# Your part

You are required to fine-tune a model for relation extraction using the question answering framework.

Notes:


* In previous lectures we saw a demo notebook that fine-tunes a transformer on SQuAD; **from a technical perspective we are doing the same thing here.**
*   For each of the seven relations in the dataset you will need to find appropriate questions. Please note that the questions must be SQuAD-like: if an answer exists, it must appear in the sentence as a contiguous span.
* There are several issues that you will need to consider. Please provide a brief explanation whenever you tackle one of them.
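The SQuAD-like constraint can be checked mechanically. The sketch below builds an `answers` dictionary in the SQuAD v2 layout (empty lists mean "no answer") and verifies that every answer is a contiguous slice of the context. Both helpers are hypothetical, and `str.find` is used under the assumption that the first occurrence of the answer text is the intended one.

```python
def make_squad_answer(context, answer_text):
    """Build a SQuAD-style `answers` dict; empty lists mean 'no answer'."""
    # Assumption: the first occurrence of the answer text is the intended span.
    start = context.find(answer_text) if answer_text else -1
    if start == -1:
        return {"answer_start": [], "text": []}
    return {"answer_start": [start], "text": [answer_text]}


def is_valid_squad_answer(context, answers):
    """Check that each answer is a contiguous slice of the context."""
    return all(
        context[start:start + len(text)] == text
        for start, text in zip(answers["answer_start"], answers["text"])
    )


context = "John, who played last night, is Doe's father."
answers = make_squad_answer(context, "Doe")
no_answer = make_squad_answer(context, "Mary")
print(answers, is_valid_squad_answer(context, answers))
print(no_answer)
```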


## Data

Let's download the data from the web, hosted on Dropbox.

In [1]:
%%capture
!pip install -q transformers datasets wandb

In [2]:
import requests, zipfile, io

def download_data():
    url = "https://www.dropbox.com/s/izi2x4sjohpzoot/relation_extraction_dataset.zip?dl=1"
    r = requests.get(url)
    z = zipfile.ZipFile(io.BytesIO(r.content))
    z.extractall()

download_data()

Each row in the dataframe consists of a news article and a sentence in which a certain relationship was found (such as "invested_in" or "founded_by"). The data was gathered using patterns, so it might contain some noise.

In [3]:
import pandas as pd

df = pd.read_pickle("relation_extraction_dataset.pkl")
df.reset_index(drop=True, inplace=True)
df.head()

Unnamed: 0,end_idx,entities,entity_spans,match,original_article,sentence,start_idx,string_id
0,1024,"[Lilium, Baillie Gifford]","[[3, 9], [151, 166]]",raising $35,Happy Friday!\n\nWe sincerely hope you and you...,"3) Lilium, a German startup that’s making an a...",1013,invested_in
1,1762,"[Facebook ’s, Giphy]","[[92, 102], [148, 153]]",acquisition,Happy Friday!\n\nWe sincerely hope you and you...,"Meanwhile, the UK’s watchdog on Friday announc...",1751,acquired_by
2,2784,"[Global-e, Vitruvian Partners]","[[27, 35], [94, 112]]",raised $60,Happy Friday!\n\nWe sincerely hope you and you...,Israeli e-commerce startup Global-e has raised...,2774,invested_in
3,680,"[Joris Van Der Gucht, Silverfin]","[[0, 19], [35, 44]]",founder,Hg is a leading investor in tax and accounting...,"Joris Van Der Gucht, co-founder at Silverfin c...",673,founded_by
4,2070,"[Tim Vandecasteele, Silverfin]","[[0, 17], [71, 80]]",founder,Hg is a leading investor in tax and accounting...,"Tim Vandecasteele, co-founder added: ""We want ...",2063,founded_by


Let's create 2 dictionaries, one that maps each label to a unique integer, and one that does it the other way around.

In [4]:
id2label = dict()
for idx, label in enumerate(df.string_id.value_counts().index):
  id2label[idx] = label

As we can see, there are 7 labels (7 unique relationships):

In [5]:
id2label

{0: 'founded_by',
 1: 'acquired_by',
 2: 'invested_in',
 3: 'CEO_of',
 4: 'subsidiary_of',
 5: 'partners_with',
 6: 'owned_by'}

In [6]:
label2id = {v:k for k,v in id2label.items()}
label2id

{'CEO_of': 3,
 'acquired_by': 1,
 'founded_by': 0,
 'invested_in': 2,
 'owned_by': 6,
 'partners_with': 5,
 'subsidiary_of': 4}

## Prepare train and test datasets with sentences, questions and answers

Let's create a dataset with questions and answers. For each relation we ask about, a sentence becomes two questions, one per entity.

In [7]:
questions_about_entity1 = {
    0: 'Which company was founded by {}?',
    1: 'Which company was aquired by {}?',
    2: 'Who invested in {}?',
    3: 'Who is the CEO of {}?',
    4: 'Which company is the subsidiary of {}?',
    5: 'Which company partners with {}?',
    6: 'Which company is owned by {}?'}

questions_about_entity2 = {
    0: 'Who founded {}?',
    1: 'Which company has aquired {}?',
    2: 'Which company has received investition from {}?',
    3: '{} is CEO of which company?',
    4: 'Which company is parent company of {}?',
    5: 'Which company partners with {}?',
    6: 'Which company owns {}?'}

In [8]:
import uuid

# Build a SQuAD-style dataset: two questions (one per entity) for each relation asked about a sentence
def create_dataset(df, is_train=True):
  dataset = []
  for row_idx, row in df.iterrows():
    relation_gt_id = label2id[row["string_id"]]
    entities = row["entities"]
    entity_spans = row["entity_spans"]
    # one id per sentence, shared by every question generated from it
    example_id = uuid.uuid4().hex
    # Asking about all 7 relations (`for relation_id in range(7)`) meant about 4 hours of training,
    # so each sentence is only asked about relations 1 and 2 plus its ground-truth relation.
    list_relations = [1,2]
    if relation_gt_id not in list_relations:
      list_relations.append(relation_gt_id)
    for relation_id in list_relations:
        # questions about any relation other than the ground truth have no answer
        incorrect = relation_gt_id!=relation_id

        # Note: the labelled answer is the same entity that is inserted into the question.
        q1 = questions_about_entity1[relation_id].format(entities[0])
        answers1 = {'answer_start': [entity_spans[0][0]] if not incorrect else [], 'text': [entities[0]] if not incorrect else []}
        dataset.append({
            "context":row["sentence"], 
            "question":q1, 
            "answers":answers1, 
            "relation_id": relation_id,
            "relation_gt_id":relation_gt_id,
            "id": example_id
            })
        
        q2 = questions_about_entity2[relation_id].format(entities[1])
        answers2 = {'answer_start': [entity_spans[1][0]] if not incorrect else [], 'text': [entities[1]] if not incorrect else []}
        # keep the SQuAD dataset format
        dataset.append({
                "context":row["sentence"], 
                "question":q2, 
                "answers":answers2, 
                "relation_id": relation_id,
                "relation_gt_id":relation_gt_id,
                "id": example_id
            })

  result = pd.DataFrame(dataset)
  return result

In [9]:
# A single train/validation split is used for simplicity; K-fold cross-validation would be better in this case
from sklearn.model_selection import train_test_split 
from sklearn.utils import shuffle

In [10]:
question_df = create_dataset(df)
question_df.head(5)

Unnamed: 0,context,question,answers,relation_id,relation_gt_id,id
0,"3) Lilium, a German startup that’s making an a...",Which company was aquired by Lilium?,"{'answer_start': [], 'text': []}",1,2,3b36e7255ee04f45a3560ca9b2c56758
1,"3) Lilium, a German startup that’s making an a...",Which company has aquired Baillie Gifford?,"{'answer_start': [], 'text': []}",1,2,3b36e7255ee04f45a3560ca9b2c56758
2,"3) Lilium, a German startup that’s making an a...",Who invested in Lilium?,"{'answer_start': [3], 'text': ['Lilium']}",2,2,3b36e7255ee04f45a3560ca9b2c56758
3,"3) Lilium, a German startup that’s making an a...",Which company has received investition from Ba...,"{'answer_start': [151], 'text': ['Baillie Giff...",2,2,3b36e7255ee04f45a3560ca9b2c56758
4,"Meanwhile, the UK’s watchdog on Friday announc...",Which company was aquired by Facebook ’s?,"{'answer_start': [92], 'text': ['Facebook ’s']}",1,1,6ac23091545544ff9ab2f89908e7e3a9


In [53]:
# test_df = question_df[int(len(question_df)*0.9):]
test_df = question_df[55984:].reset_index(drop=True) # the test set starts at row 55984, where a new group of questions begins

In [12]:
train_df = question_df[:55984].reset_index(drop=True)

In [13]:
test_df.columns

Index(['context', 'question', 'answers', 'relation_id', 'relation_gt_id',
       'id'],
      dtype='object')

In [14]:
train_df, val_df = train_test_split(train_df, test_size=0.2, stratify=train_df["relation_id"])
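`train_test_split` above works on individual question rows, so questions generated from the same sentence (same `id`) can end up in both the training and the validation split. If that kind of overlap is a concern, one possible alternative is to split on the `id` values instead. The sketch below uses only the standard library and toy records; `split_by_group` is a hypothetical helper, not something the notebook uses.

```python
import random


def split_by_group(records, group_key="id", test_fraction=0.2, seed=0):
    """Split records so that all rows sharing `group_key` stay on the same side."""
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_held_out = max(1, int(len(groups) * test_fraction))
    held_out = set(groups[:n_held_out])
    train = [r for r in records if r[group_key] not in held_out]
    val = [r for r in records if r[group_key] in held_out]
    return train, val


# Toy data: 10 sentences with 4 questions each.
records = [{"id": f"s{i}", "question": f"q{j}"} for i in range(10) for j in range(4)]
train, val = split_by_group(records)
print(len(train), len(val))
```

With a pandas dataframe, the same idea could be applied by splitting `df["id"].unique()` and filtering rows by membership.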

## Training

In [15]:
from datasets import Dataset, DatasetDict
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [16]:
Dataset.from_pandas(train_df).remove_columns('__index_level_0__')

Dataset({
    features: ['context', 'question', 'answers', 'relation_id', 'relation_gt_id', 'id'],
    num_rows: 49760
})

In [55]:
datasets =  DatasetDict(
    dict(train=Dataset.from_pandas(train_df).remove_columns('__index_level_0__'), 
         validation=Dataset.from_pandas(val_df).remove_columns('__index_level_0__'),
         test=Dataset.from_pandas(test_df)))

In [18]:
squad_v2 = True

In [19]:
LOAD_ALREADY_TRAINED = False

if LOAD_ALREADY_TRAINED:
   # we provide already trained model for fast checking
   !gdown https://drive.google.com/uc?id=1-8KWfSFqmX4dZ4jesdLUmv4AYTRxDd72
   !unzip tacred-qa-trained.zip
   model_checkpoint = "tacred-qa-trained"
else:
   model_checkpoint = "bert-base-uncased" # "distilbert-base-uncased"
batch_size = 16

In [20]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

In [21]:
max_length = 384 # The maximum length of a feature (question and context)
doc_stride = 128 # The allowed overlap between two parts of the context when it has to be split.

In [22]:
for i, example in enumerate(datasets["train"]):
    if len(tokenizer(example["question"], example["context"])["input_ids"]) > 384 and len(example["answers"]["answer_start"])>0:
        break
example = datasets["train"][i]  
len(tokenizer(example["question"], example["context"])["input_ids"])

391

In [23]:
tokenized_example = tokenizer(
    example["question"],
    example["context"],
    max_length=max_length,
    truncation="only_second",
    return_overflowing_tokens=True,
    stride=doc_stride
)

In [24]:
[len(x) for x in tokenized_example["input_ids"]]

[384, 148]
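The two feature lengths above can be reproduced with simple arithmetic. The sketch below assumes 13 non-context tokens per feature (`[CLS]`, the 10 question tokens, and two `[SEP]` tokens), which leaves 378 context tokens out of the 391 counted earlier. `sliding_windows` is a hypothetical helper that mimics the overlap behaviour; it is not part of the tokenizer API.

```python
def sliding_windows(n_context_tokens, capacity, stride):
    """Return (start, end) context-token ranges; consecutive windows overlap by `stride` tokens."""
    assert capacity > stride, "each window must advance by at least one token"
    windows = []
    start = 0
    while True:
        end = min(start + capacity, n_context_tokens)
        windows.append((start, end))
        if end == n_context_tokens:
            break
        start = end - stride
    return windows


max_length, doc_stride = 384, 128
overhead = 13                     # assumption: [CLS] + 10 question tokens + 2 x [SEP]
n_context_tokens = 391 - overhead
windows = sliding_windows(n_context_tokens, max_length - overhead, doc_stride)
lengths = [end - start + overhead for start, end in windows]
print(windows, lengths)
```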

In [25]:
for x in tokenized_example["input_ids"][:2]:
    print(tokenizer.decode(x))

[CLS] loeb. nyc is ceo of which company? [SEP] lucas timberlake, fintech south innovation chairman co - chair & partner at fintech venture fund growing nest eggs from acorns : fintech, regulation & financial health with kate rooney, markets reporter for cnbc ; manning field, member, cfpb consumer advisory board and chief operating officer for acorns and kathy kraninger, director of the consumer financial protection bureau staying steady for working class americans with dawn kissi, co - founder and editor of emerging market media ; michael loeb, founder and ceo of loeb. nyc and co - founder and chairman of steady and adam roseman, co - founder and ceo of steady cloud for good : how tech is enabling regulatory initiatives to support minority banks & communities with youssef sneifer, strategist and advisory counsel for perkins coie and bill borden, corporate vice president, worldwide financial services for microsoft digital wealth : using digital tools to expand access & improve client ou

In [26]:
tokenized_example = tokenizer(
    example["question"],
    example["context"],
    max_length=max_length,
    truncation="only_second",
    return_overflowing_tokens=True,
    return_offsets_mapping=True,
    stride=doc_stride
)
print(tokenized_example["offset_mapping"][0][:100])

[(0, 0), (0, 2), (2, 4), (4, 5), (5, 8), (9, 11), (12, 15), (16, 18), (19, 24), (25, 32), (32, 33), (0, 0), (0, 5), (6, 12), (12, 16), (16, 17), (18, 21), (21, 25), (26, 31), (32, 42), (43, 51), (52, 54), (54, 55), (55, 60), (61, 62), (63, 70), (71, 73), (74, 77), (77, 81), (82, 89), (90, 94), (95, 102), (103, 107), (108, 112), (113, 117), (118, 120), (120, 123), (123, 124), (124, 125), (126, 129), (129, 133), (133, 134), (135, 145), (146, 147), (148, 157), (158, 164), (165, 169), (170, 174), (175, 181), (181, 182), (183, 190), (191, 199), (200, 203), (204, 206), (206, 208), (208, 209), (210, 217), (218, 223), (223, 224), (225, 231), (231, 232), (233, 235), (235, 236), (236, 237), (238, 246), (247, 255), (256, 261), (262, 265), (266, 271), (272, 281), (282, 289), (290, 293), (294, 296), (296, 299), (299, 300), (301, 304), (305, 310), (311, 312), (312, 316), (316, 320), (320, 321), (322, 330), (331, 333), (334, 337), (338, 346), (347, 356), (357, 367), (368, 374), (375, 382), (383, 389)

In [27]:
first_token_id = tokenized_example["input_ids"][0][1]
offsets = tokenized_example["offset_mapping"][0][1]
print(tokenizer.convert_ids_to_tokens([first_token_id])[0], example["question"][offsets[0]:offsets[1]])

lo Lo


In [28]:
sequence_ids = tokenized_example.sequence_ids()
print(sequence_ids)

[None, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, None, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

In [29]:
answers = example["answers"]
start_char = answers["answer_start"][0]
end_char = start_char + len(answers["text"][0])

# Start token index of the current span in the text.
token_start_index = 0
while sequence_ids[token_start_index] != 1:
    token_start_index += 1

# End token index of the current span in the text.
token_end_index = len(tokenized_example["input_ids"][0]) - 1
while sequence_ids[token_end_index] != 1:
    token_end_index -= 1

# Detect if the answer is out of the span (in which case this feature is labeled with the CLS index).
offsets = tokenized_example["offset_mapping"][0]
if (offsets[token_start_index][0] <= start_char and offsets[token_end_index][1] >= end_char):
    # Move the token_start_index and token_end_index to the two ends of the answer.
    # Note: we could go after the last offset if the answer is the last word (edge case).
    while token_start_index < len(offsets) and offsets[token_start_index][0] <= start_char:
        token_start_index += 1
    start_position = token_start_index - 1
    while offsets[token_end_index][1] >= end_char:
        token_end_index -= 1
    end_position = token_end_index + 1
    print(start_position, end_position)
else:
    print("The answer is not in this feature.")

117 120


In [30]:
print(tokenizer.decode(tokenized_example["input_ids"][0][start_position: end_position+1]))
print(answers["text"][0])

loeb. nyc
Loeb.nyc
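The character-to-token mapping performed above can be condensed into a small function. This is a sketch on hand-made offsets rather than real tokenizer output; `char_span_to_token_span` is a hypothetical helper that follows the same rule (last context token starting at or before the answer start, first context token ending at or after the answer end).

```python
def char_span_to_token_span(offsets, sequence_ids, start_char, end_char, context_id=1):
    """Return inclusive (start_token, end_token), or None if the span is not fully inside the feature."""
    context_tokens = [i for i, s in enumerate(sequence_ids) if s == context_id]
    if not context_tokens:
        return None
    first, last = context_tokens[0], context_tokens[-1]
    if offsets[first][0] > start_char or offsets[last][1] < end_char:
        return None  # the answer lies outside this feature
    start_token = max(i for i in context_tokens if offsets[i][0] <= start_char)
    end_token = min(i for i in context_tokens if offsets[i][1] >= end_char)
    return start_token, end_token


# Toy feature: [CLS] who [SEP] Acme bought Beta [SEP], context = "Acme bought Beta"
toy_offsets = [(0, 0), (0, 3), (0, 0), (0, 4), (5, 11), (12, 16), (0, 0)]
toy_sequence_ids = [None, 0, None, 1, 1, 1, None]

beta_span = char_span_to_token_span(toy_offsets, toy_sequence_ids, 12, 16)
acme_span = char_span_to_token_span(toy_offsets, toy_sequence_ids, 0, 4)
outside = char_span_to_token_span(toy_offsets, toy_sequence_ids, 20, 24)
print(beta_span, acme_span, outside)
```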


In [31]:
pad_on_right = tokenizer.padding_side == "right"

In [32]:
def prepare_train_features(examples):
    # Tokenize our examples with truncation and padding, but keep the overflows using a stride. This results
    # in one example possibly giving several features when a context is long, each of those features having a
    # context that overlaps a bit with the context of the previous feature.
    tokenized_examples = tokenizer(
        examples["question" if pad_on_right else "context"],
        examples["context" if pad_on_right else "question"],
        truncation="only_second" if pad_on_right else "only_first",
        max_length=max_length,
        stride=doc_stride,
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )

    # Since one example might give us several features if it has a long context, we need a map from a feature to
    # its corresponding example. This key gives us just that.
    sample_mapping = tokenized_examples.pop("overflow_to_sample_mapping")
    # The offset mappings will give us a map from token to character position in the original context. This will
    # help us compute the start_positions and end_positions.
    offset_mapping = tokenized_examples.pop("offset_mapping")

    # Let's label those examples!
    tokenized_examples["start_positions"] = []
    tokenized_examples["end_positions"] = []

    for i, offsets in enumerate(offset_mapping):
        # We will label impossible answers with the index of the CLS token.
        input_ids = tokenized_examples["input_ids"][i]
        cls_index = input_ids.index(tokenizer.cls_token_id)

        # Grab the sequence corresponding to that example (to know what is the context and what is the question).
        sequence_ids = tokenized_examples.sequence_ids(i)

        # One example can give several spans, this is the index of the example containing this span of text.
        sample_index = sample_mapping[i]
        answers = examples["answers"][sample_index]
        # If no answers are given, set the cls_index as answer.
        if len(answers["answer_start"]) == 0:
            tokenized_examples["start_positions"].append(cls_index)
            tokenized_examples["end_positions"].append(cls_index)
        else:
            # Start/end character index of the answer in the text.
            start_char = answers["answer_start"][0]
            end_char = start_char + len(answers["text"][0])

            # Start token index of the current span in the text.
            token_start_index = 0
            while sequence_ids[token_start_index] != (1 if pad_on_right else 0):
                token_start_index += 1

            # End token index of the current span in the text.
            token_end_index = len(input_ids) - 1
            while sequence_ids[token_end_index] != (1 if pad_on_right else 0):
                token_end_index -= 1

            # Detect if the answer is out of the span (in which case this feature is labeled with the CLS index).
            if not (offsets[token_start_index][0] <= start_char and offsets[token_end_index][1] >= end_char):
                tokenized_examples["start_positions"].append(cls_index)
                tokenized_examples["end_positions"].append(cls_index)
            else:
                # Otherwise move the token_start_index and token_end_index to the two ends of the answer.
                # Note: we could go after the last offset if the answer is the last word (edge case).
                while token_start_index < len(offsets) and offsets[token_start_index][0] <= start_char:
                    token_start_index += 1
                tokenized_examples["start_positions"].append(token_start_index - 1)
                while offsets[token_end_index][1] >= end_char:
                    token_end_index -= 1
                tokenized_examples["end_positions"].append(token_end_index + 1)

    return tokenized_examples

In [33]:
features = prepare_train_features(datasets['train'][:5])

In [34]:
datasets["train"].column_names

['context', 'question', 'answers', 'relation_id', 'relation_gt_id', 'id']

In [35]:
cols = ['context',
 'question',
 'answers',
 'relation_id',
 'relation_gt_id',
 'id']

In [36]:
tokenized_datasets = datasets.map(prepare_train_features, batched=True, remove_columns=cols)

In [37]:
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer

model = AutoModelForQuestionAnswering.from_pretrained(model_checkpoint)

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForQuestionAnswering: ['cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased a

In [38]:
args = TrainingArguments(
    "test-squad",
    evaluation_strategy = "epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=3,
    weight_decay=0.01,
    report_to="wandb",
    run_name="bert-based-uncase-tacred-qa",
    load_best_model_at_end=True,
    save_total_limit=2
)

In [39]:
from transformers import default_data_collator

data_collator = default_data_collator

In [40]:
trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

In [41]:
import wandb

if not LOAD_ALREADY_TRAINED:
  wandb.login()

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [42]:
%env WANDB_ENTITY=vova_ydata
%env WANDB_PROJECT=tacred_qa
%env WANDB_LOG_MODEL=true

env: WANDB_ENTITY=vova_ydata
env: WANDB_PROJECT=tacred_qa
env: WANDB_LOG_MODEL=true


In [43]:
if not LOAD_ALREADY_TRAINED:
  trainer.train()

***** Running training *****
  Num examples = 49781
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 9336
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: vova_ydata (use `wandb login --relogin` to force relogin)


Epoch,Training Loss,Validation Loss
1,0.0515,0.04121
2,0.0218,0.035791
3,0.011,0.035531


***** Running Evaluation *****
  Num examples = 12455
  Batch size = 16
Saving model checkpoint to test-squad/checkpoint-3112
Configuration saved in test-squad/checkpoint-3112/config.json
Model weights saved in test-squad/checkpoint-3112/pytorch_model.bin
tokenizer config file saved in test-squad/checkpoint-3112/tokenizer_config.json
Special tokens file saved in test-squad/checkpoint-3112/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 12455
  Batch size = 16
Saving model checkpoint to test-squad/checkpoint-6224
Configuration saved in test-squad/checkpoint-6224/config.json
Model weights saved in test-squad/checkpoint-6224/pytorch_model.bin
tokenizer config file saved in test-squad/checkpoint-6224/tokenizer_config.json
Special tokens file saved in test-squad/checkpoint-6224/special_tokens_map.json
***** Running Evaluation *****
  Num examples = 12455
  Batch size = 16
Saving model checkpoint to test-squad/checkpoint-9336
Configuration saved in test-squad/checkpoi

TrainOutput(global_step=9336, training_loss=0.06348104211848929, metrics={'train_runtime': 3742.6932, 'train_samples_per_second': 39.903, 'train_steps_per_second': 2.494, 'total_flos': 3.746865041694259e+16, 'train_loss': 0.06348104211848929, 'epoch': 3.0})
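The step counts in the log follow directly from the number of training features and the batch size. The short sketch below redoes that arithmetic, assuming a single device, no gradient accumulation, and that the last partial batch of each epoch is kept.

```python
import math

num_examples = 49781   # "Num examples" reported by the trainer
batch_size = 16
num_epochs = 3

# One optimization step per batch; the final, smaller batch still counts as a step.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 3112 9336
```

These values match the checkpoint names (`checkpoint-3112`, `checkpoint-6224`, `checkpoint-9336`) and the reported total of 9336 optimization steps.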

The training logs are available at https://wandb.ai/vova_ydata/tacred_qa/runs/oqvqiras?workspace=user-vova_ydata

In [44]:
trainer.save_model("tacred-qa-trained")

Saving model checkpoint to tacred-qa-trained
Configuration saved in tacred-qa-trained/config.json
Model weights saved in tacred-qa-trained/pytorch_model.bin
tokenizer config file saved in tacred-qa-trained/tokenizer_config.json
Special tokens file saved in tacred-qa-trained/special_tokens_map.json


In [None]:
# %load_ext tensorboard

In [None]:
# %tensorboard --logdir=runs

## Inference

In [45]:
import numpy as np
from transformers import pipeline

In [59]:
record = datasets["test"][15]
record

{'answers': {'answer_start': [42], 'text': ['MuleSoft']},
 'context': 'He joined in 2018 after Salesforce bought MuleSoft for $6.5 billion, where he had been chairman and CEO since 2009.',
 'id': '27e19d74117b42b6918577e014a7578a',
 'question': 'Which company has aquired MuleSoft?',
 'relation_gt_id': 1,
 'relation_id': 1}

In [63]:
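# use the first GPU when available, otherwise fall back to the CPU (-1)
question_answering = pipeline("question-answering", model=model, tokenizer=tokenizer, device=0 if torch.cuda.is_available() else -1)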

In [64]:
question_answering(
    context=record["context"],
    question=record["question"],
    handle_impossible_answer=True)

{'answer': 'MuleSoft', 'end': 50, 'score': 0.999967634677887, 'start': 42}

Let's check a case where there is no answer.

In [65]:
record = datasets["test"][21]
record

{'answers': {'answer_start': [], 'text': []},
 'context': "MuleSoft has become a key part of Salesforce's product portfolio.",
 'id': '07d63e00d34c4a6f8b0821b07aff86f4',
 'question': 'Which company has received investition from Salesforce?',
 'relation_gt_id': 4,
 'relation_id': 2}

In [66]:
question_answering(
    context=record["context"],
    question=record["question"],
    handle_impossible_answer=True)

{'answer': '', 'end': 0, 'score': 0.999984622001648, 'start': 0}

In [67]:
from tqdm import tqdm

In [68]:
preds = []
for i, record in tqdm(test_df.iterrows(), total=len(test_df)):
  answer_pred = question_answering(
    context=record["context"],
    question=record["question"],
    handle_impossible_answer=True)['answer']
  if answer_pred:
    preds.append(1)
  else:
    preds.append(0)

100%|██████████| 6216/6216 [01:14<00:00, 83.81it/s]


In [69]:
copy_test_df = test_df.copy()

In [70]:
copy_test_df["answer_pred"] = preds

In [75]:
copy_test_df[copy_test_df["id"]=="6e6b4d4013d14e0f8b679d14ca5ec882"]

Unnamed: 0,context,question,answers,relation_id,relation_gt_id,id,answer_pred
0,"As of Nov. 27, CRM stock trades below that ent...",Which company was aquired by CRM?,"{'answer_start': [15], 'text': ['CRM']}",1,1,6e6b4d4013d14e0f8b679d14ca5ec882,1
1,"As of Nov. 27, CRM stock trades below that ent...",Which company has aquired Slack?,"{'answer_start': [125], 'text': ['Slack']}",1,1,6e6b4d4013d14e0f8b679d14ca5ec882,1
2,"As of Nov. 27, CRM stock trades below that ent...",Who invested in CRM?,"{'answer_start': [], 'text': []}",2,1,6e6b4d4013d14e0f8b679d14ca5ec882,0
3,"As of Nov. 27, CRM stock trades below that ent...",Which company has received investition from Sl...,"{'answer_start': [], 'text': []}",2,1,6e6b4d4013d14e0f8b679d14ca5ec882,0


In [88]:
y_true = []
y_pred = []
for example_id in copy_test_df["id"].unique():
  # one slot per relation (7 in total); a slot is set to 1 when the relation holds for this sentence
  result = np.zeros(7)
  result_gt = np.zeros(7)
  _df = copy_test_df[copy_test_df["id"]==example_id]
  for i, row in _df.iterrows():
    if len(row["answers"]["answer_start"])>0:
      result_gt[row["relation_id"]] = 1
    result[row["relation_id"]] = row["answer_pred"]
  y_pred.append(result)
  y_true.append(result_gt)
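In the loop above, each sentence contributes two questions per relation, and the prediction of the second question overwrites that of the first. One possible alternative, sketched here as an assumption rather than the method used in this notebook, marks a relation as predicted when either question returns an answer. The `aggregate` helper and the toy rows are hypothetical.

```python
from collections import defaultdict

NUM_RELATIONS = 7


def aggregate(rows, num_relations=NUM_RELATIONS):
    """rows: iterable of (group_id, relation_id, has_gold_answer, answer_pred)."""
    y_true = defaultdict(lambda: [0] * num_relations)
    y_pred = defaultdict(lambda: [0] * num_relations)
    for group_id, relation_id, has_gold, pred in rows:
        # OR-combine the two questions that belong to the same relation
        y_true[group_id][relation_id] |= int(has_gold)
        y_pred[group_id][relation_id] |= int(pred)
    ids = sorted(y_true)
    return [y_true[i] for i in ids], [y_pred[i] for i in ids]


toy_rows = [
    ("a", 1, True, 1), ("a", 1, True, 0),    # relation 1: only the first question is answered
    ("a", 2, False, 0), ("a", 2, False, 0),  # relation 2: no answer expected, none predicted
]
toy_true, toy_pred = aggregate(toy_rows)
print(toy_true, toy_pred)
```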

In [96]:
y_true[42]

array([0., 1., 0., 0., 0., 0., 0.])

In [95]:
y_pred[42]

array([0., 1., 0., 0., 0., 0., 0.])

In [89]:
from sklearn.metrics import f1_score, accuracy_score

In [92]:
f1_score(y_true, y_pred, average='micro')

0.9987282746926665

In [93]:
f1_score(y_true, y_pred, average='macro')

0.9985772908187747

In [97]:
accuracy_score(y_true, y_pred)

0.9974597798475868
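For multi-hot targets like `y_true` and `y_pred`, micro-averaged F1 pools true positives, false positives and false negatives over all relation slots, while `accuracy_score` counts a sentence as correct only when its whole vector matches (subset accuracy). The sketch below recomputes both quantities in plain Python on toy vectors; the helper names and the toy data are illustrative only.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every (sentence, relation) slot."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def subset_accuracy(y_true, y_pred):
    """Fraction of sentences whose full relation vector is predicted exactly."""
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


toy_true = [[0, 1, 0], [1, 0, 0]]
toy_pred = [[0, 1, 0], [0, 0, 1]]
toy_f1 = micro_f1(toy_true, toy_pred)
toy_acc = subset_accuracy(toy_true, toy_pred)
print(toy_f1, toy_acc)
```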