<a href="https://colab.research.google.com/github/morleyd/morleyd.github.io/blob/master/David_Morley_Riley_Challenge.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Riley Challenge
David C. Morley

This work applies the [TACRED relation classification dataset](https://nlp.stanford.edu/projects/tacred/), as used in [this paper](https://arxiv.org/abs/2010.01057), to the needs of Riley. It builds on pretrained models hosted on the Hugging Face Hub and loaded through the Transformers library. For relation extraction, I use the [fine-tuned model checkpoint](https://huggingface.co/studio-ousia/luke-large-finetuned-tacred) from LUKE (["LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention"](https://aclanthology.org/2020.emnlp-main.523)). The NER is handled by a BERT token-classification checkpoint, [dslim/bert-base-NER-uncased](https://huggingface.co/dslim/bert-base-NER-uncased), which is what the code below loads.


## Setup Environment

In [None]:
# Currently, LUKE is only available on the master branch
!pip install git+https://github.com/huggingface/transformers.git

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to /tmp/pip-req-build-m_hnhrzt
  Running command git clone -q https://github.com/huggingface/transformers.git /tmp/pip-req-build-m_hnhrzt
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
    Preparing wheel metadata ... done


In [None]:
import json
import torch
from tqdm import trange
from transformers import LukeTokenizer, LukeForEntityPairClassification
from transformers import AutoTokenizer, AutoModel

## Loading the dataset
Download from source

In [None]:
!gdown --id 1-BJPEGPCkcEBgqlEq0WqvwECbOqM0bH3
!tar xvzf tacred_LDC2018T24.tgz
# Clean up.
!rm tacred_LDC2018T24.tgz

Downloading...
From: https://drive.google.com/uc?id=1-BJPEGPCkcEBgqlEq0WqvwECbOqM0bH3
To: /content/tacred_LDC2018T24.tgz
100% 62.1M/62.1M [00:00<00:00, 208MB/s]
tacred/
tacred/data/
tacred/data/conll/
tacred/data/conll/dev.conll
tacred/data/conll/test.conll
tacred/data/conll/train.conll
tacred/data/gold/
tacred/data/gold/test.gold
tacred/data/gold/dev.gold
tacred/data/gold/train.gold
tacred/data/json/
tacred/data/json/test.json
tacred/data/json/dev.json
tacred/data/json/train.json
tacred/docs/
tacred/docs/tacred_stats.tsv
tacred/docs/README.md
tacred/docs/file.tbl
tacred/docs/zhang2017tacred.pdf
tacred/tools/
tacred/tools/generate_json.py
tacred/tools/score.py
tacred/index.html


### Apply the patch from [TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task](https://github.com/DFKI-NLP/tacrev)

In [None]:
!git clone https://github.com/DFKI-NLP/tacrev
# !pip install -r tacrev/requirements.txt  # only necessary for notebooks

fatal: destination path 'tacrev' already exists and is not an empty directory.


In [None]:
!python tacrev/scripts/apply_tacred_patch.py \
  --dataset-file ./tacred/data/json/dev.json \
  --patch-file ./tacrev/patch/dev_patch.json \
  --output-file ./tacred/data/json/dev_rev.json

!python tacrev/scripts/apply_tacred_patch.py \
  --dataset-file ./tacred/data/json/test.json \
  --patch-file ./tacrev/patch/test_patch.json \
  --output-file ./tacred/data/json/test_rev.json

06/17/2022 13:08:22 - INFO - __main__ - Number of examples in TACRED dataset: 22631
06/17/2022 13:08:22 - INFO - __main__ - Number of unique relations in TACRED dataset: 42
06/17/2022 13:08:22 - INFO - __main__ - Relation counts in TACRED dataset: [('no_relation', 17195), ('per:title', 919), ('org:top_members/employees', 534), ('per:employee_of', 375), ('org:alternate_names', 338), ('per:age', 243), ('per:countries_of_residence', 226), ('per:origin', 210), ('per:date_of_death', 206), ('per:cities_of_residence', 179), ('org:country_of_headquarters', 177), ('per:cause_of_death', 168), ('per:spouse', 159), ('per:city_of_death', 118), ('org:subsidiaries', 113), ('org:city_of_headquarters', 109), ('per:charges', 105), ('per:children', 99), ('org:parents', 96), ('org:website', 86), ('org:members', 85), ('per:other_family', 80), ('org:founded_by', 76), ('per:stateorprovinces_of_residence', 72), ('org:stateorprovince_of_headquarters', 70), ('per:parents', 56), ('org:shareholders', 55), ('per:r

### Process Data

Modified from [LUKE's source code](https://github.com/studio-ousia/luke/blob/master/examples/relation_classification/reader.py)

In [None]:
def load_examples(dataset_file):
    with open(dataset_file, "r") as f:
        data = json.load(f)

    examples = []
    for i, item in enumerate(data):
        tokens = item["token"]
        token_spans = dict(
            subj=(item["subj_start"], item["subj_end"] + 1),
            obj=(item["obj_start"], item["obj_end"] + 1)
        )

        if token_spans["subj"][0] < token_spans["obj"][0]:
            entity_order = ("subj", "obj")
        else:
            entity_order = ("obj", "subj")

        text = ""
        cur = 0
        char_spans = {}
        for target_entity in entity_order:
            token_span = token_spans[target_entity]
            text += " ".join(tokens[cur : token_span[0]])
            if text:
                text += " "
            char_start = len(text)
            text += " ".join(tokens[token_span[0] : token_span[1]])
            char_end = len(text)
            char_spans[target_entity] = (char_start, char_end)
            text += " "
            cur = token_span[1]
        text += " ".join(tokens[cur:])
        text = text.rstrip()

        examples.append(dict(
            text=text,
            entity_spans=[tuple(char_spans["subj"]), tuple(char_spans["obj"])],
            label=item["relation"]
        ))

    return examples
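
The core of `load_examples` is converting TACRED's token indices into character offsets in the space-joined sentence. The sketch below isolates that step on a made-up record; `token_span_to_char_span` is a hypothetical helper written for illustration, not part of the LUKE reader, and it assumes tokens are joined with single spaces.

```python
def token_span_to_char_span(tokens, start, end):
    """Return (char_start, char_end) of tokens[start:end] in " ".join(tokens).

    `end` is exclusive, i.e. TACRED's inclusive `subj_end` plus one.
    """
    # Characters before the entity: each preceding token plus one space.
    char_start = sum(len(t) + 1 for t in tokens[:start])
    entity_text = " ".join(tokens[start:end])
    return char_start, char_start + len(entity_text)


# Toy record in the TACRED JSON layout (made up for illustration).
item = {
    "token": ["Ben", "Fischer", "works", "at", "XYZ", "."],
    "subj_start": 0, "subj_end": 1,
    "obj_start": 4, "obj_end": 4,
}
text = " ".join(item["token"])
subj = token_span_to_char_span(item["token"], item["subj_start"], item["subj_end"] + 1)
obj = token_span_to_char_span(item["token"], item["obj_start"], item["obj_end"] + 1)
print(text[subj[0]:subj[1]], "|", text[obj[0]:obj[1]])
```

Slicing `text` with the returned offsets gives back the entity strings, which is the property the LUKE tokenizer relies on when it receives `entity_spans`.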

In [None]:
test_examples = load_examples("/content/tacred/data/json/test_rev.json")

In [None]:
# dataset_file = "train.json"
# with open(dataset_file, "r") as f:
#     data = json.load(f)

## Loading the fine-tuned model and tokenizer


### First get the relation extraction model
We construct the model and tokenizer using the [fine-tuned model checkpoint](https://huggingface.co/studio-ousia/luke-large-finetuned-tacred).

In [None]:
# Load the model checkpoint
rel_model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
rel_model.eval()
rel_model.to("cuda")

# Load the tokenizer
rel_tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")

Some weights of the model checkpoint at studio-ousia/luke-large-finetuned-tacred were not used when initializing LukeForEntityPairClassification: ['luke.embeddings.position_ids']
- This IS expected if you are initializing LukeForEntityPairClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LukeForEntityPairClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


### Now get the NER model
We construct the NER model and tokenizer from the [dslim/bert-base-NER-uncased](https://huggingface.co/dslim/bert-base-NER-uncased) checkpoint and wrap them in a token-classification pipeline.

In [None]:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

ner_tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER-uncased")
ner_model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER-uncased")
ner_model.eval()
ner_model.to("cuda")

ner = pipeline("ner", model=ner_model, tokenizer=ner_tokenizer, device=0)

## Putting both models together

In [None]:
from itertools import permutations
from collections import namedtuple
import numpy as np
from typing import List, Dict, Tuple

In [None]:
Entity = namedtuple('Entity', ['e_type', 'span', 'text', 'score'])
SPECIAL_TOKENS = {'FLA': '','LCB':'{', 'LRB':'(', 'LSB':'[', 'RCB':'}', 'RRB':')', 'RSB':']'}

mean = lambda x: sum(x) / len(x)

def clean_text(input_text: str) -> str:
    for k,v in SPECIAL_TOKENS.items():
        input_text = input_text.replace(f"-{k}-",v)
    return input_text

def merge_ents(ner_results: List[Dict]) -> List[Dict]:
    ner_results = sorted(ner_results, key=lambda x: x['start'], reverse=False)
    prev_e_type = ''
    out = [{'entity':'', 'end': 0, 'start': 1, 'score':0, 'word':''}]
    for entity in ner_results:
        try:
            e_loc, e_type = entity['entity'].split('-')
            entity['entity'] = e_type
            if e_loc == 'B':
                out.append(entity)
            elif e_loc == 'I':
                out[-1]['end'] = entity['end']
                out[-1]['word'] = ' '.join([out[-1]['word'], entity['word']])
                out[-1]['score'] = mean([entity['score'], out[-1]['score']])
            out[-1]['word'] = out[-1]['word'].replace(' ##', '')
            if out[-1]['word'] in SPECIAL_TOKENS.keys():
                out.pop()
        except ValueError:
            if entity['end'] > out[-1]['end']:
                out.append(entity)
    return out

def ent_objects(entities: List[Dict]) -> List[Entity]:
    out = []
    for e in entities:
        out.append(Entity(e['entity'], (e['start'], e['end']), e['word'], e['score']))
    return out

def ent_spans(entities: List[Dict]) -> List[Tuple]:
    out = []
    for e in entities:
        if e['end'] - e['start'] > 1:
            out.append((e['start'], e['end']))
    return out

def get_span(text: str, span: Tuple[int]) -> str:
    return text[span[0]: span[1]]

def extract_relations(text: str, ner_results: List[Dict]) -> List[Dict]:
    out = []
    entities = ent_objects(merge_ents(ner_results))
    for e1, e2 in permutations(entities, 2):
        inputs = rel_tokenizer(text, entity_spans=[e1.span, e2.span], return_tensors="pt")
        inputs = inputs.to("cuda")
        # Inference only: skip gradient tracking, as the evaluation loops do.
        with torch.no_grad():
            outputs = rel_model(**inputs)

        predicted_class_idx = outputs.logits.argmax(-1).item()
        predicted_label = rel_model.config.id2label[predicted_class_idx]
        out.append({'text': text, 'entity_spans': [e1.span, e2.span], 'label': predicted_label})
    return out

def get_all_spans(ner_results: List[List[Dict]]):  
    pred_spans = []
    for result in ner_results:
        entities = ent_spans(merge_ents(result))
        perms = [list(perm) for perm in permutations(entities, 2)]
        pred_spans.append(perms)
    return pred_spans 
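
To make the two steps above concrete (merging token-level tags into entities, then pairing entities), here is a simplified, self-contained sketch. It is not the `merge_ents` implementation: `merge_bio` is a hypothetical helper, the token predictions are hand-written stand-ins for real NER output, and word-piece cleanup and score averaging are left out.

```python
from itertools import permutations


def merge_bio(tokens):
    """Merge token-level BIO predictions into entity-level character spans.

    Each token is a dict with 'entity' (e.g. 'B-PER', 'I-PER'), 'start', 'end'.
    """
    merged = []
    for tok in sorted(tokens, key=lambda t: t["start"]):
        prefix, _, e_type = tok["entity"].partition("-")
        continues = (
            prefix == "I"
            and merged
            and merged[-1]["entity"] == e_type
        )
        if continues:
            # Extend the current entity to cover this token.
            merged[-1]["end"] = tok["end"]
        else:
            # A 'B-' tag (or a stray 'I-' tag) starts a new entity.
            merged.append({"entity": e_type, "start": tok["start"], "end": tok["end"]})
    return merged


text = "Ben Fischer works at XYZ"
# Hand-written token predictions standing in for real NER output.
token_preds = [
    {"entity": "B-PER", "start": 0, "end": 3},
    {"entity": "I-PER", "start": 4, "end": 11},
    {"entity": "B-ORG", "start": 21, "end": 24},
]
entities = merge_bio(token_preds)
spans = [(e["start"], e["end"]) for e in entities]
# Relations are directional, so both (a, b) and (b, a) are candidates.
pairs = list(permutations(spans, 2))
for head, tail in pairs:
    print(repr(text[head[0]:head[1]]), "->", repr(text[tail[0]:tail[1]]))
```

Using `permutations` rather than `combinations` matters because TACRED relations have a direction: `per:employee_of` holds from person to organization, while the reverse pair may be `org:top_members/employees`. Recent versions of the Transformers token-classification pipeline also accept an `aggregation_strategy` argument that groups tokens into entities; whether it is available depends on the installed version.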

## Measuring performance

### Base Model
This reproduces the evaluation reported in the [original paper](https://arxiv.org/abs/2010.01057), here run on the revised test set (`test_rev.json`). It classifies relations between the gold entity pairs in the test set and measures the model's precision, recall, and F1.

In [None]:
batch_size = 128

num_predicted = 0
num_gold = 0
num_correct = 0

for batch_start_idx in trange(0, len(test_examples), batch_size):
    batch_examples = test_examples[batch_start_idx:batch_start_idx + batch_size]
    texts = [example["text"] for example in batch_examples]
    entity_spans = [example["entity_spans"] for example in batch_examples]
    gold_labels = [example["label"] for example in batch_examples]
    
    inputs = rel_tokenizer(texts, entity_spans=entity_spans, return_tensors="pt", padding=True)
    inputs = inputs.to("cuda")
    with torch.no_grad():
        outputs = rel_model(**inputs)
    predicted_indices = outputs.logits.argmax(-1)
    predicted_labels = [rel_model.config.id2label[index.item()] for index in predicted_indices]
    for predicted_label, gold_label in zip(predicted_labels, gold_labels):
        if predicted_label != "no_relation":
            num_predicted += 1
        if gold_label != "no_relation":
            num_gold += 1
            if predicted_label == gold_label:
                num_correct += 1

precision = num_correct / num_predicted
recall = num_correct / num_gold
f1 = 2 * precision * recall / (precision + recall)

print(f"\n\nprecision: {precision} recall: {recall} f1: {f1}")

100%|██████████| 122/122 [07:42<00:00,  3.79s/it]



precision: 0.7661131438221221 recall: 0.8715978226064681 f1: 0.8154583582983823
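
The scoring used above is micro-averaged over all relation types and ignores `no_relation`: a prediction counts toward precision only if it is not `no_relation`, and a gold label counts toward recall only if it is not `no_relation`. A minimal sketch of the same computation on made-up labels follows; `tacred_micro_f1` is a hypothetical helper name, and the zero-denominator guards are an addition that the loop above does not have.

```python
def tacred_micro_f1(predicted, gold, negative="no_relation"):
    """Micro-averaged precision, recall and F1 that ignore the negative class."""
    num_predicted = sum(1 for p in predicted if p != negative)
    num_gold = sum(1 for g in gold if g != negative)
    num_correct = sum(
        1 for p, g in zip(predicted, gold) if g != negative and p == g
    )
    # Guard against empty denominators (an addition for this sketch).
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels for illustration only.
gold = ["per:title", "no_relation", "per:spouse", "no_relation"]
pred = ["per:title", "per:age", "no_relation", "no_relation"]
precision, recall, f1 = tacred_micro_f1(pred, gold)
print(precision, recall, f1)
```

In the toy data, one of the two non-negative predictions is right and one of the two gold relations is recovered, so precision, recall, and F1 all come out to 0.5.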





### My Model
The main difference here is that I mimic the data that you'll provide me by not including the spans of the entities. Instead, I use NER to predict the location of the entities and return all the relations between those entities. This has the benefit of giving more than one possible relation per utterance. Unfortunately, it also means the results are not directly comparable with those on the original test set. I mitigate this by comparing the spans that I predict with the gold spans. If any predicted pair whose spans overlap the gold subject and object spans (overlap score above 0.5) receives the gold relation, I count it as a positive example.

In [None]:
def span_overlap(s1: Tuple[int], s2: Tuple[int]) -> float:
    # print(repr(s1))
    b1, e1 = s1
    b2, e2 = s2
    # Check no overlap
    if e1 < b2 or b1 > e2:
        return 0.
    else:
        return abs(1 - abs((e2 - e1 + b2 - b1) / (max(e2, e1) - min(b2, b1))))

def matching_spans(pred_spans: List[List[Tuple]], gold_spans: List[Tuple], k: int) -> List[int]:
    """For a list of predicted span pairs (one entry of get_all_spans' output),
    return the indices of the pairs whose k-th span (0 = subject, 1 = object)
    overlaps the k-th gold span with a score above 0.5.
    """
    matches = [i for i, pred in enumerate(pred_spans) if span_overlap(pred[k], gold_spans[k]) > .5]
    # return sorted(matches, key=lambda x: span_overlap(pred_spans[x]['entity_spans'][k], gold_spans[k]), reverse=True)
    return matches

def get_pred_label(pred_spans: Tuple[Tuple], gold_spans: List[Tuple], text):
    global TICKER
    beg_spans = matching_spans(pred_spans, gold_spans, 0)
    end_spans = matching_spans(pred_spans, gold_spans, 1)
    if not beg_spans or not end_spans:
        TICKER += 1
        return ["no_relation"]
    else:
        intersection = np.intersect1d(beg_spans, end_spans)
        filtered_spans = [list(pred_spans[i]) for i in intersection]
        n = len(filtered_spans)

        try:
            inputs = rel_tokenizer([text]*n, entity_spans=filtered_spans, return_tensors="pt",padding=True,truncation=True)
            inputs = inputs.to("cuda")
            with torch.no_grad():
                outputs = rel_model(**inputs)

            predicted_indices = outputs.logits.argmax(-1)
            predicted_labels = [rel_model.config.id2label[index.item()] for index in predicted_indices]
        except ValueError:
            print(filtered_spans)
            predicted_labels = ["no_relation"]
        return predicted_labels
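
For comparison, a common way to score span agreement is intersection over union (IoU). The sketch below is an alternative, not the `span_overlap` function used in this notebook; `span_iou` is a hypothetical name and it assumes half-open `(start, end)` character spans. The two measures agree for partially overlapping spans, but as far as I can tell `span_overlap` returns 1.0 when one span is nested inside the other, e.g. `(0, 10)` and `(2, 8)`, where IoU gives 0.6.

```python
def span_iou(s1, s2):
    """Intersection over union of two half-open character spans (start, end)."""
    b1, e1 = s1
    b2, e2 = s2
    intersection = max(0, min(e1, e2) - max(b1, b2))
    union = (e1 - b1) + (e2 - b2) - intersection
    return intersection / union if union > 0 else 0.0


print(span_iou((0, 10), (0, 10)))   # identical spans
print(span_iou((0, 10), (5, 15)))   # partial overlap
print(span_iou((0, 10), (2, 8)))    # nested span
print(span_iou((0, 10), (10, 20)))  # touching but disjoint
```

Swapping this in would change which predicted pairs pass the 0.5 threshold, so the numbers reported below would not carry over unchanged.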

### First Model Eval

Note the high precision and low recall. This is caused by my method of extracting spans: I only look for names, organizations, and locations, while the TACRED dataset covers more types of relations. However, the relations that I do select are mostly correct. These numbers are computed only over the samples that have a relation (~20% of the total). My overall accuracy is much higher.

In [None]:
TICKER = 0
batch_size = 128

num_predicted = 0
num_gold = 0
num_correct = 0

total = 0
num_same = 0

for batch_start_idx in trange(0, len(test_examples), batch_size):
    batch_examples = test_examples[batch_start_idx:batch_start_idx + batch_size]
    texts = [example["text"] for example in batch_examples]
    entity_spans = [example["entity_spans"] for example in batch_examples]
    gold_labels = [example["label"] for example in batch_examples]

    ner_results = ner(texts)
    new_spans = get_all_spans(ner_results)

    for pred_spans, gold_label, gold_span, text in zip(new_spans, gold_labels, entity_spans, texts):
        total += 1
        pred_labels = get_pred_label(pred_spans, gold_span, text)
        if gold_label in pred_labels:
            num_same += 1
        if gold_label != "no_relation":
            num_gold += 1
            if gold_label in pred_labels:
                num_correct += 1
        num_predicted += sum(1 for label in pred_labels if label != "no_relation")

precision = num_correct / num_predicted
recall = num_correct / num_gold
f1 = 2 * precision * recall / (precision + recall)

print(f"\n\nprecision: {precision} recall: {recall} f1: {f1}")
TICKER

100%|██████████| 122/122 [06:00<00:00,  2.95s/it]



precision: 0.7490774907749077 recall: 0.45501120717259047 f1: 0.5661354581673307





9846

Approx. 86.5% of my predictions matched the gold label, even though almost 2/3 of them defaulted to no relation because the spans I predicted weren't the ones that were annotated (correctly or otherwise). To me, this illustrates why this is a poor dataset.

In [None]:
num_same / total, TICKER/total

(0.8651105809529951, 0.6348571797021084)

## Sandbox

In [None]:
import textwrap

In [None]:
wrapper = textwrap.TextWrapper(width=100)

def print_relations(text):
    if not isinstance(text, list):
        text = [text]
    ner_results = ner(text)
    new_spans = get_all_spans(ner_results)
    for doc, result in zip(text, new_spans):
        found_relation = False
        print('-'*100)
        print(wrapper.fill(clean_text(doc)))
        print()
        
        n = len(result)
        inputs = rel_tokenizer([doc]*n, entity_spans=result, return_tensors="pt", padding=True)
        inputs = inputs.to("cuda")
        with torch.no_grad():
            outputs = rel_model(**inputs)

        predicted_indices = outputs.logits.argmax(-1)
        predicted_labels = [rel_model.config.id2label[index.item()] for index in predicted_indices]
        for span, label in zip(result, predicted_labels):
            if label != 'no_relation':
                found_relation = True
                print(f"{repr(get_span(doc, span[0]))} -> {repr(get_span(doc, span[1]))}:\t{label}")

        if not found_relation:
            print("***No Relations Found***")

In [None]:
texts = [
        'Tom Thabane resigned in October last year to form the All Basotho Convention -LRB- ABC -RRB- , crossing the floor with 17 members of parliament , causing constitutional monarch King Letsie III to dissolve parliament and call the snap election .',
        "I just went for a jog and ran into Ben Fischer who is the chief revenue officer of XYZ company.",
        "Beyoncé lives in Los Angeles.",
        "This was among a batch of paperback Oxford World 's Classics that I was given as a reward for reading and commenting on a manuscript for OUP .",
        "The latest investigation was authorized after the Supreme Court in 2007 found DCC and its founder , Jim Flavin , guilty of selling DCC 's -LRB- EURO -RRB- 106 million -LRB- then $ 130 million -RRB- stake in Fyffes after Flavin --also a Fyffes director at the time -- received inside information about bad Fyffes news in the pipeline .",
        "I just went for a jog and ran into Jimmy who just started at XYZ company.",
        ]

print_relations(texts)

----------------------------------------------------------------------------------------------------
Tom Thabane resigned in October last year to form the All Basotho Convention ( ABC ) , crossing the
floor with 17 members of parliament , causing constitutional monarch King Letsie III to dissolve
parliament and call the snap election .

'Tom Thabane' -> 'All Basotho':	per:employee_of
'All Basotho' -> 'Tom Thabane':	org:founded_by
'All Basotho' -> 'LRB':	org:alternate_names
'All Basotho' -> 'ABC':	org:alternate_names
'All Basotho' -> 'RRB':	org:alternate_names
'LRB' -> 'All Basotho':	org:alternate_names
'ABC' -> 'Tom Thabane':	org:founded_by
----------------------------------------------------------------------------------------------------
I just went for a jog and ran into Ben Fischer who is the chief revenue officer of XYZ company.

'Ben Fischer' -> 'XYZ':	per:employee_of
'XYZ' -> 'Ben Fischer':	org:top_members/employees
---------------------------------------------------------------

In [None]:
print_relations("I met nate cohen and he's the president of the united states of America and his wife joyce is an engineer at google")

----------------------------------------------------------------------------------------------------
I met nate cohen and he's the president of the united states of America and his wife joyce is an
engineer at google

'nate cohen' -> 'united states of America':	per:countries_of_residence
'nate cohen' -> 'joyce':	per:spouse
'united states of America' -> 'nate cohen':	org:top_members/employees
'united states of America' -> 'joyce':	org:top_members/employees
'joyce' -> 'nate cohen':	per:spouse
'joyce' -> 'united states of America':	per:countries_of_residence
'joyce' -> 'google':	per:employee_of


In [None]:
def extract_span(doc, word):
    start = doc.index(word)
    end = start + len(word)
    return start, end

def check_relation(text, e1, e2):
    e1_index = extract_span(text, e1)
    e2_index = extract_span(text, e2)
    inputs = rel_tokenizer(text, entity_spans=[e1_index, e2_index], return_tensors="pt")
    inputs = inputs.to("cuda")
    with torch.no_grad():
        outputs = rel_model(**inputs)

    predicted_index = outputs.logits.argmax(-1)
    return rel_model.config.id2label[predicted_index.item()]
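
One caveat with `extract_span`: `str.index` returns only the first occurrence and raises `ValueError` when the phrase is absent. A minimal sketch of a more forgiving lookup, assuming exact (case-sensitive) string matches are what is wanted; `find_all_spans` is a hypothetical helper, not part of the notebook.

```python
import re


def find_all_spans(doc, phrase):
    """Return (start, end) character spans for every occurrence of `phrase`."""
    # re.escape keeps characters such as '.' or '(' from acting as regex syntax.
    return [m.span() for m in re.finditer(re.escape(phrase), doc)]


doc = "Fyffes news hurt Fyffes shares ."
spans = find_all_spans(doc, "Fyffes")
print(spans)
print(find_all_spans(doc, "DCC"))
```

An empty list signals a missing phrase, and repeated mentions (as with "Fyffes" in one of the sandbox sentences) each get their own span, so the caller can decide which mention to pass to the relation model.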

In [None]:
text = "I just went for a jog and ran into Ben Fischer who is the chief revenue officer of XYZ company."
check_relation(text, "Ben Fischer", "chief revenue officer")

'per:title'

In [None]:
text = "I went jogging with mike tyson who works as a boxer at Sony and he lives in waltham and travels to miami"
check_relation(text, "mike tyson", "boxer")

'per:title'

# Ideas for Future Experimentation

## Biographical

It could be interesting to combine these results with a model trained on [Biographical: A Semi-Supervised Relation Extraction Dataset](https://plumaj.github.io/biographical/). This dataset is annotated for personal information, including: birthdate, birthplace, deathdate, deathplace, occupation, ofParent, educatedAt, hasChild, sibling, other.



In [None]:
!gdown --id 1i2Gz_evbO0uXAluoKXOG3C0yrcZvauUs
# !tar xvzf tacred_LDC2018T24.tgz

## Question Answering (QA)
The paper [Supervised Relation Classification as Two-way Span-Prediction](https://arxiv.org/pdf/2010.04829.pdf) reframes relation classification as span prediction in a QA setting. Given a document, they ask a predefined set of questions, e.g. "What is his role in the company?" or "What company does he work for?". Combined with NER, this could provide a very flexible set of extracted relations.
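
As a rough illustration of how NER output could feed such a QA setup, the sketch below turns one detected entity into a set of relation-specific questions. The templates and the `build_questions` helper are hypothetical and are not taken from the cited paper; only the relation names come from TACRED.

```python
# Hypothetical question templates; not taken from the cited paper.
QUESTION_TEMPLATES = {
    "per:title": "What is the job title of {entity}?",
    "per:employee_of": "Which organization does {entity} work for?",
    "per:cities_of_residence": "Which city does {entity} live in?",
}


def build_questions(entity, templates=QUESTION_TEMPLATES):
    """Return (relation, question) pairs for one entity mention."""
    return [(rel, tpl.format(entity=entity)) for rel, tpl in templates.items()]


questions = build_questions("Ben Fischer")
for relation, question in questions:
    print(f"{relation}: {question}")
```

Each question would then be passed, together with the document, to an extractive QA model, and a confident answer span would be read as the object of that relation. Adding a new relation would only require a new template rather than new labeled pairs, which is where the flexibility would come from.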

## Knowledge Graph
Nodes with similar relations can be connected together or nodes with equal entities can be used to predict the presence of other relations. For example, a graph that has two people working at the same company could suggest a coworker relation or propose insights about the two people together.
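
A minimal sketch of the coworker idea, assuming the extracted relations are available as `(head, relation, tail)` triples. The triples are made up, and `coworker_of` is a hypothetical label that is not part of the TACRED relation set.

```python
from collections import defaultdict
from itertools import combinations


def coworker_candidates(triples):
    """Suggest coworker pairs from (head, relation, tail) triples."""
    # Group people by the organization they are employed by.
    employees = defaultdict(set)
    for head, relation, tail in triples:
        if relation == "per:employee_of":
            employees[tail].add(head)
    suggestions = []
    for org, people in sorted(employees.items()):
        # Every unordered pair of people at the same organization is a candidate.
        for a, b in combinations(sorted(people), 2):
            suggestions.append((a, "coworker_of", b, org))
    return suggestions


# Made-up triples for illustration.
triples = [
    ("Ben Fischer", "per:employee_of", "XYZ"),
    ("Jimmy", "per:employee_of", "XYZ"),
    ("joyce", "per:employee_of", "google"),
]
suggestions = coworker_candidates(triples)
print(suggestions)
```

In practice the entity strings would need to be normalized first (for example "XYZ" versus "XYZ company") so that mentions of the same organization land on the same node.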

## Dygie

[DyGIE++](https://github.com/dwadden/dygiepp.git), the implementation of the paper "Entity, Relation, and Event Extraction with Contextualized Span Representations", provides an end-to-end framework (usable through spaCy) with models that treat span prediction as a step toward relation extraction.