<a href="https://colab.research.google.com/github/studio-ousia/luke/blob/master/notebooks/huggingface_tacred.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Reproducing experimental results of LUKE on TACRED Using Hugging Face Transformers

This notebook shows how to reproduce the state-of-the-art results on the [TACRED relation classification dataset](https://nlp.stanford.edu/projects/tacred/) reported in [this paper](https://arxiv.org/abs/2010.01057) using the Transformers library and the [fine-tuned model checkpoint](https://huggingface.co/studio-ousia/luke-large-finetuned-tacred) available on the Model Hub.
The source code used in the experiments is also available [here](https://github.com/studio-ousia/luke/tree/master/examples/relation_classification).

There are two other related notebooks:

* [Reproducing experimental results of LUKE on Open Entity Using Hugging Face Transformers](https://github.com/studio-ousia/luke/blob/master/notebooks/huggingface_open_entity.ipynb)
* [Reproducing experimental results of LUKE on CoNLL-2003 Using Hugging Face Transformers](https://github.com/studio-ousia/luke/blob/master/notebooks/huggingface_conll_2003.ipynb)

In [1]:
# Currently, LUKE is only available on the master branch
!pip install git+https://github.com/huggingface/transformers.git

Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to /tmp/pip-req-build-l7qi_wn0
  Running command git clone -q https://github.com/huggingface/transformers.git /tmp/pip-req-build-l7qi_wn0
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/ae/04/5b870f26a858552025a62f1649c20d29d2672c02ff3c3fb4c688ca46467a/tokenizers-0.10.2-cp37-cp37m-manylinux2010_x86_64.whl (3.3MB)
Collecting huggingface-hub==0.0.8
  Do

In [2]:
import json
import torch
from tqdm import trange
from transformers import LukeTokenizer, LukeForEntityPairClassification

## Loading the dataset

**The TACRED dataset is not publicly available.** In the cell below, we copy the test set (test.json) that we uploaded to our Google Drive into the working directory. **Please obtain the dataset by following the instructions on the [TACRED web site](https://nlp.stanford.edu/projects/tacred/), and replace the cell below with code that places the test.json file in the current directory.**

In [3]:
from google.colab import drive
drive.mount('/content/drive')
!cp /content/drive/MyDrive/projects/luke/data/tacred/test.json test.json

Mounted at /content/drive
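
Once test.json is in place, it can be worth checking that it has the shape the rest of this notebook relies on. The following is only a sketch and not part of the original experiments: `check_tacred_file` is a hypothetical helper, and the list of required keys is an assumption based solely on the fields that the loader in this notebook reads.

```python
import json

# Fields read by this notebook's loader. The helper name and the checks
# themselves are illustrative assumptions, not part of the TACRED tooling.
REQUIRED_KEYS = {"token", "subj_start", "subj_end", "obj_start", "obj_end", "relation"}


def check_tacred_file(path):
    """Return the number of records, raising ValueError if one looks malformed."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of records")
    for i, item in enumerate(data):
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
        num_tokens = len(item["token"])
        for name in ("subj", "obj"):
            start, end = item[f"{name}_start"], item[f"{name}_end"]
            # The loader adds 1 to the end index, so it is treated as inclusive here.
            if not 0 <= start <= end < num_tokens:
                raise ValueError(f"record {i} has an invalid {name} span: ({start}, {end})")
    return len(data)
```

Calling `check_tacred_file("test.json")` should return the number of test examples if every record passes these checks.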


In [4]:
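def load_examples(dataset_file):
    """Convert TACRED records into texts with character-level entity spans.

    Each returned example holds the space-joined text, the (start, end)
    character spans of the subject and object entities, and the gold label.
    """
    with open(dataset_file, "r") as f:
        data = json.load(f)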

    examples = []
    for i, item in enumerate(data):
        tokens = item["token"]
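        # The end indices in the file are inclusive; convert them to end-exclusive.
        token_spans = dict(
            subj=(item["subj_start"], item["subj_end"] + 1),
            obj=(item["obj_start"], item["obj_end"] + 1)
        )

        # Process the two entities in the order in which they appear in the sentence.
        if token_spans["subj"][0] < token_spans["obj"][0]:
            entity_order = ("subj", "obj")
        else:
            entity_order = ("obj", "subj")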

        text = ""
        cur = 0
        char_spans = {}
        for target_entity in entity_order:
            token_span = token_spans[target_entity]
            text += " ".join(tokens[cur : token_span[0]])
            if text:
                text += " "
            char_start = len(text)
            text += " ".join(tokens[token_span[0] : token_span[1]])
            char_end = len(text)
            char_spans[target_entity] = (char_start, char_end)
            text += " "
            cur = token_span[1]
        text += " ".join(tokens[cur:])
        text = text.rstrip()

        examples.append(dict(
            text=text,
            entity_spans=[tuple(char_spans["subj"]), tuple(char_spans["obj"])],
            label=item["relation"]
        ))

    return examples

In [5]:
test_examples = load_examples("test.json")
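
To make the token-to-character conversion concrete, here is a small worked example. It is a simplified stand-in for the same idea rather than the loader used above: `to_text_and_spans` is a hypothetical helper, and the record is made up for illustration (only its field names follow the loader). Tokens are joined with single spaces, and each inclusive token span is mapped to an end-exclusive character span.

```python
# Hypothetical TACRED-style record; the sentence is invented for illustration.
item = {
    "token": ["Beyoncé", "lives", "in", "Los", "Angeles", "."],
    "subj_start": 0, "subj_end": 0,  # inclusive token indices
    "obj_start": 3, "obj_end": 4,
    "relation": "per:cities_of_residence",
}


def to_text_and_spans(item):
    """Join tokens with single spaces and return (text, [subj_span, obj_span])."""
    tokens = item["token"]

    # Character offset at which each token starts in the space-joined text.
    starts = []
    position = 0
    for token in tokens:
        starts.append(position)
        position += len(token) + 1  # +1 for the separating space

    def char_span(start_token, end_token):
        # end_token is inclusive; the returned character span is end-exclusive.
        return (starts[start_token], starts[end_token] + len(tokens[end_token]))

    text = " ".join(tokens)
    spans = [
        char_span(item["subj_start"], item["subj_end"]),
        char_span(item["obj_start"], item["obj_end"]),
    ]
    return text, spans


text, spans = to_text_and_spans(item)
for start, end in spans:
    print(repr(text[start:end]))
```

Slicing the text with each span should give back the entity mention, which is a quick way to check the spans before they are passed to the tokenizer.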

## Loading the fine-tuned model and tokenizer

We construct the model and tokenizer using the [fine-tuned model checkpoint](https://huggingface.co/studio-ousia/luke-large-finetuned-tacred).

In [6]:
# Load the model checkpoint
model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
model.eval()
model.to("cuda")

# Load the tokenizer
tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")

Some weights of the model checkpoint at studio-ousia/luke-large-finetuned-tacred were not used when initializing LukeForEntityPairClassification: ['luke.embeddings.position_ids']
- This IS expected if you are initializing LukeForEntityPairClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LukeForEntityPairClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


## Measuring performance

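We classify the relation between the entity pair of every example in the test set and measure the performance of the model.
Precision, recall, and F1 are computed over all relations other than `no_relation`, and the resulting scores reproduce the performance reported in the [original paper](https://arxiv.org/abs/2010.01057).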

In [7]:
batch_size = 128

num_predicted = 0
num_gold = 0
num_correct = 0

for batch_start_idx in trange(0, len(test_examples), batch_size):
    batch_examples = test_examples[batch_start_idx:batch_start_idx + batch_size]
    texts = [example["text"] for example in batch_examples]
    entity_spans = [example["entity_spans"] for example in batch_examples]
    gold_labels = [example["label"] for example in batch_examples]

    inputs = tokenizer(texts, entity_spans=entity_spans, return_tensors="pt", padding=True)
    inputs = inputs.to("cuda")
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_indices = outputs.logits.argmax(-1)
    predicted_labels = [model.config.id2label[index.item()] for index in predicted_indices]
    for predicted_label, gold_label in zip(predicted_labels, gold_labels):
        if predicted_label != "no_relation":
            num_predicted += 1
        if gold_label != "no_relation":
            num_gold += 1
            if predicted_label == gold_label:
                num_correct += 1

precision = num_correct / num_predicted
recall = num_correct / num_gold
f1 = 2 * precision * recall / (precision + recall)

print(f"\n\nprecision: {precision} recall: {recall} f1: {f1}")

100%|██████████| 122/122 [07:24<00:00,  3.64s/it]



precision: 0.7034638130104196 recall: 0.7512781954887218 f1: 0.7265852239674229
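
The counting logic in the evaluation loop can be checked on a handful of labels. The sketch below wraps the same counts in a hypothetical helper, `micro_prf`. The toy labels are made up, and the guards against division by zero are an addition that the loop above does not have.

```python
def micro_prf(predicted_labels, gold_labels, negative_label="no_relation"):
    """Micro-averaged precision, recall, and F1 that ignore the negative class."""
    num_predicted = sum(p != negative_label for p in predicted_labels)
    num_gold = sum(g != negative_label for g in gold_labels)
    num_correct = sum(
        g != negative_label and p == g
        for p, g in zip(predicted_labels, gold_labels)
    )
    # Return 0.0 instead of dividing by zero when a count is empty.
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration.
gold = ["per:title", "no_relation", "org:founded_by", "per:title", "no_relation"]
pred = ["per:title", "per:title", "no_relation", "per:title", "org:founded_by"]

precision, recall, f1 = micro_prf(pred, gold)
print(f"precision: {precision} recall: {recall} f1: {f1}")
```

Here the model predicts a relation for four examples, three examples have a gold relation, and two predictions are correct, so precision is 2/4, recall is 2/3, and F1 is 4/7. Correctly predicting `no_relation` earns no credit, which is why the two counts differ.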





## Detecting a relation between a pair of entities

Finally, we detect a relation between a pair of entities in a text using the [fine-tuned model](https://huggingface.co/studio-ousia/luke-large-finetuned-tacred).

In [8]:
text = "Beyoncé lives in Los Angeles."
entity_spans = [(0, 7), (17, 28)]  # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"

inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
inputs = inputs.to("cuda")
with torch.no_grad():  # inference only, so gradients are not needed
    outputs = model(**inputs)

predicted_class_idx = outputs.logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])

Predicted class: per:cities_of_residence
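
Counting character offsets by hand is error-prone, so the spans can also be derived from the mention strings. This is a minimal sketch under the assumption that each mention appears verbatim in the text; `find_span` is a hypothetical helper and not part of the Transformers API.

```python
def find_span(text, mention, start=0):
    """Return the (start, end) character span of the first occurrence of `mention`."""
    begin = text.find(mention, start)
    if begin == -1:
        raise ValueError(f"{mention!r} not found in text")
    return begin, begin + len(mention)


text = "Beyoncé lives in Los Angeles."
entity_spans = [find_span(text, "Beyoncé"), find_span(text, "Los Angeles")]

for span_start, span_end in entity_spans:
    print((span_start, span_end), repr(text[span_start:span_end]))
```

Because only the first occurrence is returned, a mention that appears more than once needs the `start` argument to select the intended occurrence.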
