# Testing FewRel few-shot relation extraction

* goal is to make it like the demo at http://opennre.thunlp.ai/#/fewshot_re - in the demo it seems to work pretty well
* based on https://github.com/thunlp/FewRel
* use torch 1.3.1
* copy val_wiki.json as test_wiki.json in the data folder to make it work
* python requirements same as opennre. use the requirements.txt from there.
* the checkpoint files are very big, so they aren't in the repository. (5-way 1-shot) checkpoint is at https://drive.google.com/file/d/1yiz3q3xNz-llsY55g5OdodxiH1RThYuz/view?usp=sharing
* 5-way 3-shot checkpoint: https://drive.google.com/open?id=1mzeMyu4yjcLXWSi1CjFEvsrtE6EaAqv7
* 5-way 3-shot with N/A checkpoint: https://drive.google.com/open?id=1C9tn_vpFf4tDcopZTZwBN2lwnqc3sKtF
* 6-way 4-shot with N/A checkpoint: https://drive.google.com/open?id=1hdl81PcWIaA6CyJw38VGTcRs6MQ1ipfH
* can't train on prof song's gpus, not enough vram. the hpc computers have enough vram, but i couldn't get pytorch to work on them because they run a 10 year old version of linux.
* however, testing with this code does work on the gpu. you need a cuda build of pytorch (refer to pytorch.org), and wherever you use the model do `model = model.cuda()`, and likewise `tensor = tensor.cuda()` for the input tensors. refer to test_script.py
* on my laptop cpu this code takes about 2 seconds per query, on the gpu it's a lot faster, runs in under 1 second. the speed seems fine, but if there are a lot of queries it will take a decent amount of time to run.
* we should probably make our own benchmark in order to compare different models to each other. 
* the maximum length is in terms of the number of bert tokens, not in terms of the number of characters in the sentence. increasing it seems to work fine
* make sure that csv files use utf-8 encoding. open the file in sublime and resave it to ensure this.
* pytorch doesn't seem to be well optimized on macos, running this code makes the laptop basically unusable. *I highly recommend that you only run this code on the server. you can follow the directions in the knowcomp manual to make jupyter lab work correctly from the server*
* I investigated if there would be any issues because the maximum number of bert tokens allowed is 128. I don't think that there would be many problems due to this. From my testing, the 128 token limit is only reached with extremely long sentences, which are definitely not the norm anyway.
* make sure that there are no spaces in unexpected places, because they can cause problems since I am assuming that the strings match up exactly always.
* also you should only pass 1 sentence at a time.
* tried out the 5-1 and 5-3-na3 models on the more realistic dataset - the 5-3 and 5-3-na3 models are the ones which work the best for sure. 5-3-na3 may be slightly better than the other one imo. at this point both seem to reach only around 50% accuracy.
* which entity is the head and which one is the tail definitely matters, the results are different if they are swapped. The softmax results can also be significantly different. I'm not sure if this is due to how the model was trained or due to the support dataset.
* also if there are multiple relations in the same sentence additional processing will have to be done to figure out the actual results. but at least in this model you have a chance to actually figure out the different relations, if you use a classifier model you definitely won't be able to.
* the choice of support sentences definitely seems to have a significant effect on the output of the model. simpler sentences seem to work better.
* which words are used as the head and the tail probably matters too. one problem with the spacy ner model is that it doesn't detect long phrases as entities. e.g. in "Trump had opposed President Obama's invasion of Iraq", i think the entities should be "Trump" and "President Obama's invasion of Iraq", but the model detects "Trump", "Obama" and "Iraq" instead - this probably affects the accuracy of the output. I could try to test this.
* need to test how much things like verb tense affect the output.
* i was actually accidentally running it on the cpu on the server. it runs pretty fast there, about a quarter of a second per prediction, and it doesn't seem to run much faster on the gpu. however, when running in this jupyter notebook the gpu sometimes runs out of memory, which is very weird; this didn't happen when using the script. it outputs some predictions, then runs out of memory and crashes. i guess we'll just have to use the cpu.
* with 4-shot rather than 3-shot it seems to work better. with 5-shot it seems to work about the same as 4-shot. the na3 model definitely seems to work better with 5-shot.
* noticed that using the exact same word as in the relation support dataset doesn't seem to matter much, synonyms seem to produce pretty much the same output tensor.
* it's important that the head and tail are the name of the full entity - ex. in "Trump hacked Obama's laptop", the tail should be "Obama's laptop", not "Obama". Then it works A LOT better, and it is usually accurate.
* the 5-3-na3 model tested using 5 way 5 shot seems to work pretty damn well.....
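
The notes above about stray spaces and full entity names could be enforced before anything is tokenized. A minimal sketch, assuming each support or query row is just a sentence, head and tail string. `normalize_example` is a hypothetical helper (not part of FewRel or the fyp_detection code), and it checks at the character level rather than on spacy tokens:

```python
import re

def normalize_example(sentence, head, tail):
    """Hypothetical helper: collapse whitespace and check that head and tail occur in the sentence."""
    def clean(text):
        # replace any run of whitespace (including non-breaking spaces) with a single space
        return re.sub(r"\s+", " ", text).strip()

    sentence, head, tail = clean(sentence), clean(head), clean(tail)
    for role, entity in (("head", head), ("tail", tail)):
        if entity not in sentence:
            raise ValueError("{} {!r} not found in sentence {!r}".format(role, entity, sentence))
    return sentence, head, tail

print(normalize_example("Trump  hacked Obama's laptop ", " Trump", "Obama's laptop"))
```

Running something like this over every csv row before loading should surface mismatched heads and tails early, instead of failing later inside the tokenization loop.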

## The current code is at the very bottom. The rest of the code is just for testing purposes.

In [None]:
checkpoint_path = "checkpoint/pair-bert-train_wiki-val_wiki-5-1.pth.tar"
bert_pretrained_checkpoint = 'bert-base-uncased'
max_length = 128

In [None]:
from fewshot_re_kit.data_loader import FewRelDatasetPair, get_loader_pair
from fewshot_re_kit.framework import FewShotREFramework
from fewshot_re_kit.sentence_encoder import BERTPAIRSentenceEncoder
from models.pair import Pair
import os
import torch

from spacy.tokenizer import Tokenizer
from spacy.lang.en import English
import spacy
import neuralcoref


In [None]:
sentence_encoder = BERTPAIRSentenceEncoder(
                    bert_pretrained_checkpoint,
                    max_length)

In [None]:
# meow_loader = get_loader_pair('val_wiki', sentence_encoder,
#                 N=5, K=1, Q=1, na_rate=0, batch_size=1, encoder_name='bert')

val_data_loader = iter(FewRelDatasetPair('val_wiki', sentence_encoder, N=5, K=1, Q=1, na_rate=0, root='./data', encoder_name='bert'))


In [None]:
model = Pair(sentence_encoder, hidden_size=768)
if torch.cuda.is_available():
    model = model.cuda()

In [None]:
type(next(val_data_loader)[0]['word'])

In [None]:
def __load_model__(ckpt):
    '''
    ckpt: Path of the checkpoint
    return: Checkpoint dict
    '''
    if os.path.isfile(ckpt):
        checkpoint = torch.load(ckpt, map_location='cpu')  # load onto the cpu so gpu-saved checkpoints also open on cpu-only machines
        print("Successfully loaded checkpoint '%s'" % ckpt)
        return checkpoint
    else:
        raise Exception("No checkpoint found at '%s'" % ckpt)

        
def item(x):
    '''
    PyTorch before and after 0.4
    '''
    torch_version = torch.__version__.split('.')
    if int(torch_version[0]) == 0 and int(torch_version[1]) < 4:
        return x[0]
    else:
        return x.item()
    
def bert_tokenize(tokens, head_indices, tail_indices):
    word = sentence_encoder.tokenize(tokens,
            head_indices,
            tail_indices)
    return word

In [None]:
nlp = spacy.load("en_core_web_sm")
# Create a Tokenizer with the default settings for English
# including punctuation rules and exceptions
tokenizer = nlp.Defaults.create_tokenizer(nlp)
list(map(str, tokenizer("""hello meow. meow is donald trump's friend""")))

In [None]:
# loading from the model checkpoint state

model.eval()
state_dict = __load_model__(checkpoint_path)['state_dict']
own_state = model.state_dict()
for name, param in state_dict.items():
    if name not in own_state:
        continue
    own_state[name].copy_(param)

In [None]:
#evaluating on the wikidata dataset, which is what they have already implemented.

N = 5
K = 1
Q = 1
na_rate = 0
with torch.no_grad():
    for it in range(10):
        batch, label = next(val_data_loader)
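        label = torch.tensor(label)
        for key in ('word', 'seg', 'mask'):
            batch[key] = torch.stack(batch[key])
        if torch.cuda.is_available():   # the model was moved to the gpu above, so the inputs have to follow
            batch = {key: batch[key].cuda() for key in ('word', 'seg', 'mask')}
            label = label.cuda()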
        logits, pred = model(batch, N, K, Q * N + Q * na_rate)
        print(pred, label)
        right = model.accuracy(pred, label)
        print(item(right.data))

In [None]:
N = 5
K = 2
Q = 1
na_rate = 0
# names have to be capitalized, otherwise they are not detected by NER
example_relation_data = [
    {'name':'love',
    'examples':[
        {'sentence':'Meow loves Mo', 'head':'Meow', 'tail':'Mo'},
        {'sentence':'Tom is in love with Jull', 'head':'Tom', 'tail':'Jull'}
    ]},
    {'name':'hate',
    'examples':[
        {'sentence':'Trump hates the Mooch', 'head':'Trump', 'tail':'Mooch'},
        {'sentence':'Ivanka and Jared dislike each other intensely', 'head':'Ivanka', 'tail':'Jared'}
    ]},
    {'name':'spouse',
    'examples':[
        {'sentence':'Trump is married to Ivanka', 'head':'Trump', 'tail':'Ivanka'},
        {'sentence':"Bill went out with his wife Jill on saturday", 'head':'Bill', 'tail':'Jill'}
    ]},
    {'name':'insult',
    'examples':[
        {'sentence':'The President said that Michael Cohen is a rat', 'head':'The President', 'tail':'Michael'},
        {'sentence':'Meow and Tom threw jabs at each other', 'head':'Meow', 'tail':'Tom'}
    ]},
    {'name':'capital',
    'examples':[
        {'sentence':'Austin is the capital of Texas', 'head':'Austin', 'tail':'Texas'},
        {'sentence':"the capital of China is located in Beijing", 'head':'Beijing', 'tail':"China"}
    ]}
    
]

queries = [{
    'sentence':'Cohen and Fluffy are very loving to each other','head':'Cohen','tail':'Fluffy'
},
{
    'sentence':"""The US's capital is Washington""",'head':'Washington','tail':'US'
}]

## Using already specified entities in the query sentence
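The cell in this section packs every support/query pair into one fixed-length BERT input: `[CLS] support [SEP] query [SEP]`, zero-padded, with a mask and a segment vector. A minimal sketch of that layout with plain lists instead of tensors, so it can be checked by hand. `build_pair_input` is a hypothetical name and the ids in the example are placeholders, not real vocabulary lookups. Like the cell, it starts segment 1 at the first `[SEP]`:

```python
def build_pair_input(support_ids, query_ids, cls_id, sep_id, max_length=128):
    """Hypothetical helper: return (word, mask, seg) lists of length max_length for one pair."""
    ids = [cls_id] + support_ids + [sep_id] + query_ids + [sep_id]
    used = min(max_length, len(ids))                 # anything past max_length is cut off
    word = ids[:used] + [0] * (max_length - used)    # zero-padded token ids
    mask = [1] * used + [0] * (max_length - used)    # 1 for real tokens, 0 for padding
    first = min(max_length, len(support_ids) + 1)    # [CLS] plus the support tokens
    seg = [0] * first + [1] * (max_length - first)   # everything after that is segment 1
    return word, mask, seg

# placeholder ids, chosen only to make the layout visible
word, mask, seg = build_pair_input([11, 12], [21], cls_id=101, sep_id=102, max_length=8)
print(word)
print(mask)
print(seg)
```

Note that truncation simply drops the tail of the query when the pair is longer than `max_length`, which is the same behaviour as the `min(max_length, len(new_word))` loop in the cell.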

In [None]:
nlp = spacy.load("en_core_web_sm")

def spacy_tokenize(sentence):
    return list(map(str, nlp(sentence)))

max_length = 128   # max length in terms of the number of bert tokens, not characters
for q in queries:
    fusion_set = {'word': [], 'mask': [], 'seg': []}
#     tokens = q['sentence'].split(" ")  #TODO: generalize, make it tokenize like in the example wikidata, would probably need to use some nlp library to do it
    tokens = spacy_tokenize(q['sentence'])
    tokenized_head = spacy_tokenize(q['head'])
    tokenized_tail = spacy_tokenize(q['tail'])
    head_indices = list(range(tokens.index(tokenized_head[0]), tokens.index(tokenized_head[0])+len(tokenized_head)))   #TODO: make it work with multi-word entities
    tail_indices = list(range(tokens.index(tokenized_tail[0]), tokens.index(tokenized_tail[0])+len(tokenized_tail)))
    bert_query_tokens = bert_tokenize(tokens, head_indices, tail_indices)
    for relation in example_relation_data:
        for ex in relation['examples']:
#             tokens = ex['sentence'].split(" ")  #TODO: generalize
            tokens = spacy_tokenize(ex['sentence'])
            tokenized_head = spacy_tokenize(ex['head'])
            tokenized_tail = spacy_tokenize(ex['tail'])
            head_indices = list(range(tokens.index(tokenized_head[0]), tokens.index(tokenized_head[0])+len(tokenized_head)))
            tail_indices = list(range(tokens.index(tokenized_tail[0]), tokens.index(tokenized_tail[0])+len(tokenized_tail)))
            bert_relation_example_tokens = bert_tokenize(tokens, head_indices, tail_indices)
            
            SEP = sentence_encoder.tokenizer.convert_tokens_to_ids(['[SEP]'])
            CLS = sentence_encoder.tokenizer.convert_tokens_to_ids(['[CLS]'])
            word_tensor = torch.zeros((max_length)).long()
            
            new_word = CLS + bert_relation_example_tokens + SEP + bert_query_tokens + SEP
            for i in range(min(max_length, len(new_word))):
                word_tensor[i] = new_word[i]
            mask_tensor = torch.zeros((max_length)).long()
            mask_tensor[:min(max_length, len(new_word))] = 1
            seg_tensor = torch.ones((max_length)).long()
            seg_tensor[:min(max_length, len(bert_relation_example_tokens) + 1)] = 0
            fusion_set['word'].append(word_tensor)
            fusion_set['mask'].append(mask_tensor)
            fusion_set['seg'].append(seg_tensor)
    
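    fusion_set['word'] = torch.stack(fusion_set['word'])
    fusion_set['seg'] = torch.stack(fusion_set['seg'])
    fusion_set['mask'] = torch.stack(fusion_set['mask'])
    if torch.cuda.is_available():   # keep the inputs on the same device as the model
        fusion_set = {k: v.cuda() for k, v in fusion_set.items()}
    with torch.no_grad():   # inference only, no gradients needed
        logits, pred = model(fusion_set, N, K, 1)
    print(pred, logits)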
    
            
            

In [None]:
nlp = spacy.load("en_core_web_sm")
# Create a Tokenizer with the default settings for English
# including punctuation rules and exceptions
tokenizer = nlp.Defaults.create_tokenizer(nlp)
neuralcoref.add_to_pipe(nlp)

# ex = "hello Meow. Meow is Donald Trump's friend"
ex = """The US's capital is Washington"""
doc = nlp(ex)
print(doc._.coref_resolved)
doc = nlp(doc._.coref_resolved)

In [None]:
print([(X.text, X.label_) for X in doc.ents])

In [None]:
list(map(str, doc))

## Using entities detected in the query sentence
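The cell in this section pairs the detected entities with `itertools.combinations` over a set, so each pair is only tried in one, arbitrary, direction. Since the notes at the top observe that swapping head and tail changes the prediction, one option is to enumerate both directions. A minimal sketch, assuming the entities are plain strings; `ordered_entity_pairs` is a hypothetical name:

```python
from itertools import permutations

def ordered_entity_pairs(entities):
    """Hypothetical helper: every (head, tail) ordering of the distinct entities."""
    unique = sorted(set(entities))   # sorted so the order is reproducible across runs
    return list(permutations(unique, 2))

pairs = ordered_entity_pairs(["Trump", "Obama", "Congress", "Trump"])
for head, tail in pairs:
    print(head, "->", tail)
```

This doubles the number of model calls per sentence compared to `combinations`, which may matter given the per-query timings noted at the top.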

In [None]:
from itertools import combinations
nlp = spacy.load("en_core_web_sm")
neuralcoref.add_to_pipe(nlp)

def spacy_tokenize(sentence):
    doc = nlp(sentence)
    return list(map(str, nlp(doc._.coref_resolved)))

def get_head_tail_pairs(sentence):
    acceptable_entity_types = ['PERSON', 'NORP', 'ORG', 'GPE', 'PRODUCT', 'EVENT', 'LAW', 'LOC', 'FAC']
    doc = nlp(sentence)
    doc = nlp(doc._.coref_resolved)
    entity_info = [(X.text, X.label_) for X in doc.ents]
    entity_info = set(map(lambda x:x[0], filter(lambda x:x[1] in acceptable_entity_types, entity_info)))

    return combinations(entity_info, 2)
    

max_length = 128   # max length in terms of the number of bert tokens - by default it was 128, seems to work with longer lengths also though.
# the actual length of the sentence doesn't matter, only the number of bert tokens which are created. and this is managed automatically. 

for q in queries:
    query_tokens = spacy_tokenize(q['sentence'])   # kept separate from `tokens`, which the support loop below reuses

    for head, tail in get_head_tail_pairs(q['sentence']):  #iterating through all possible combinations of 2 named entities
        fusion_set = {'word': [], 'mask': [], 'seg': []}   # fresh batch per pair, since it is stacked into tensors below
        tokenized_head = spacy_tokenize(head)
        tokenized_tail = spacy_tokenize(tail)
        head_start = query_tokens.index(tokenized_head[0])
        tail_start = query_tokens.index(tokenized_tail[0])
        head_indices = list(range(head_start, head_start + len(tokenized_head)))
        tail_indices = list(range(tail_start, tail_start + len(tokenized_tail)))
        bert_query_tokens = bert_tokenize(query_tokens, head_indices, tail_indices)
        for relation in example_relation_data:
            for ex in relation['examples']:
                tokens = spacy_tokenize(ex['sentence'])
                tokenized_head = spacy_tokenize(ex['head'])  #head and tail spelling and punctuation should match the corefered output exactly
                tokenized_tail = spacy_tokenize(ex['tail'])
                head_indices = list(range(tokens.index(tokenized_head[0]), tokens.index(tokenized_head[0])+len(tokenized_head)))
                tail_indices = list(range(tokens.index(tokenized_tail[0]), tokens.index(tokenized_tail[0])+len(tokenized_tail)))
                bert_relation_example_tokens = bert_tokenize(tokens, head_indices, tail_indices)

                SEP = sentence_encoder.tokenizer.convert_tokens_to_ids(['[SEP]'])
                CLS = sentence_encoder.tokenizer.convert_tokens_to_ids(['[CLS]'])
                word_tensor = torch.zeros((max_length)).long()

                new_word = CLS + bert_relation_example_tokens + SEP + bert_query_tokens + SEP
                for i in range(min(max_length, len(new_word))):
                    word_tensor[i] = new_word[i]
                mask_tensor = torch.zeros((max_length)).long()
                mask_tensor[:min(max_length, len(new_word))] = 1
                seg_tensor = torch.ones((max_length)).long()
                seg_tensor[:min(max_length, len(bert_relation_example_tokens) + 1)] = 0
                fusion_set['word'].append(word_tensor)
                fusion_set['mask'].append(mask_tensor)
                fusion_set['seg'].append(seg_tensor)

        fusion_set['word'] = torch.stack(fusion_set['word'])
        fusion_set['seg'] = torch.stack(fusion_set['seg'])
        fusion_set['mask'] = torch.stack(fusion_set['mask'])
        
        if torch.cuda.is_available():
            fusion_set['word'] = fusion_set['word'].cuda()
            fusion_set['seg'] = fusion_set['seg'].cuda()
            fusion_set['mask'] = fusion_set['mask'].cuda()
        
        logits, pred = model(fusion_set, N, K, 1)
        print('Sentence: \"{}\", head: \"{}\", tail: \"{}\", prediction: {}'.format(q['sentence'], head, tail, example_relation_data[pred.item()]['name']))   #TODO: handle na case, which would be out of bounds
    
    
            
            

In [None]:
from fyp_detection_functions import Detector
d = Detector(chpt_path="checkpoint/pair-bert-train_wiki-val_wiki-5-3.pth.tar")
d.run_on_sample_data()

# Testing Mueller connections dataset
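The validation cell in this section matches the full token sequence of the head and tail, whereas the earlier cells only look up the first entity token with `tokens.index(...)`. A minimal sketch of that search as a standalone helper that the earlier loops could reuse. `find_span` is a hypothetical name, and the token list in the example is written by hand rather than produced by spacy:

```python
def find_span(tokens, entity_tokens):
    """Hypothetical helper: indices of the first occurrence of entity_tokens in tokens, or None."""
    n = len(entity_tokens)
    if n == 0:
        return None
    for start in range(len(tokens) - n + 1):
        # compare the whole window, not just the first token
        if tokens[start:start + n] == entity_tokens:
            return list(range(start, start + n))
    return None

# hand-written tokens, assumed to resemble what the tokenizer would produce
tokens = ["Trump", "hacked", "Obama", "'s", "laptop", "in", "2017", "."]
print(find_span(tokens, ["Obama", "'s", "laptop"]))
print(find_span(tokens, ["Obama", "laptop"]))
```

Returning `None` instead of raising lets the caller decide whether a missing entity is an error (as in the dataset check) or just a pair to skip.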

In [None]:
import pandas as pd

In [None]:
#this code can be used to go through the dataset and make sure that the heads and tails are accurate.

#use utf 8 encoding if there are errors. can open the csv in sublime and resave it to use utf 8 encoding.
df = pd.read_csv("connections_Mueller_cleaned.csv")   #this csv file is heavily modified from the original, it has been cleaned to make sure that the heads and tails actually exist.
#it could be cleaned even further by getting rid of useless relations etc.
dfi = df[['sentence', 'head', 'tail', 'reldescription']].copy()

nlp = spacy.load("en_core_web_sm")    #no coref being done here, the assumption is that no coref will be done on the support/training data, only on test data.

def spacy_tokenize(sentence):
    doc = nlp(sentence)
    return list(map(str, doc))

for ind, row in dfi.iterrows():
#     if ind < 304:
#         continue
    head = row['head']
    tail = row['tail']
    sentence = row['sentence']
    
    tokens = spacy_tokenize(sentence)
    
#     print(ind)
    tokenized_head = spacy_tokenize(head)
    tokenized_tail = spacy_tokenize(tail)
    
    head_indices = None
    tail_indices = None
    for i in range(len(tokens)):
        if tokens[i] == tokenized_head[0] and tokens[i:i+len(tokenized_head)] == tokenized_head:
            head_indices = list(range(i,i+len(tokenized_head)))
            break
    for i in range(len(tokens)):
        if tokens[i] == tokenized_tail[0] and tokens[i:i+len(tokenized_tail)] == tokenized_tail:
            tail_indices = list(range(i,i+len(tokenized_tail)))
            break
    if head_indices is None or tail_indices is None:
        print(sentence)
        print(head)
        print(tail)
        raise ValueError
    

In [None]:
import pandas as pd
df = pd.read_csv("connections_Mueller_cleaned.csv")
df['reldescription'].value_counts()

In [None]:
cnt = dfi['reldescription'].value_counts()
cnt = cnt[cnt >= 3]
dfi = dfi[dfi['reldescription'].isin(cnt.index)].copy().reset_index(drop=True)
dft = dfi[dfi['reldescription'] == 'media platform']
dft.sample(3, replace=True, random_state=2025)

In [None]:
from fyp_detection_framework import DataLoader

In [None]:
DataLoader.load_relation_support_csv("connections_Mueller_cleaned.csv")

# Testing the current version (run from here down only)
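The logit tensors printed at the end of this section have six entries while only five support relations are loaded. A minimal sketch of mapping such a row to a relation name, under the assumption (not confirmed anywhere in this notebook) that the extra last entry is the N/A score of the `na3` checkpoints. `predicted_relation` is a hypothetical name and is not part of `DetectionFramework`:

```python
def predicted_relation(logits, relation_names, na_label="N/A"):
    """Hypothetical helper: name of the highest-scoring relation, or na_label for the extra slot."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    if best < len(relation_names):
        return relation_names[best]
    return na_label   # assumed: an index past the support relations means "none of the above"

names = ['coordination', 'contact', 'oppose', 'assistance', 'part_of']
# first logit row from the d.detect() output in this section
print(predicted_relation([0.5422, 0.3769, 1.9786, 2.7187, 1.4845, 0.4444], names))
```

With this kind of lookup an N/A prediction becomes a label rather than an out-of-bounds index into the support list.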

In [1]:
from fyp_detection_framework import DetectionFramework
d = DetectionFramework(ckpt_path="checkpoint/pair-bert-train_wiki-val_wiki-5-3-na3.pth.tar")
# d = DetectionFramework(ckpt_path="checkpoint/pair-bert-train_wiki-val_wiki-6-4-na3.pth.tar")
# d = DetectionFramework(ckpt_path="checkpoint/pair-bert-train_wiki-val_wiki-5-3.pth.tar")

Successfully loaded checkpoint 'checkpoint/pair-bert-train_wiki-val_wiki-5-3-na3.pth.tar'


In [16]:
d.clear_support_queries()
# d.load_support("connections_Mueller_cleaned.csv", min_instance=9)
# d.load_support("test_relation_support_dataset.csv", K=5)
d.load_support("test_relation_support_dataset_2.csv", K=5)
# d.load_support("test_relation_support_dataset_3.csv", K=5)
# d.load_queries_csv("test_queries.csv")
d.load_queries_predefined_head_tail_csv("test_queries_with_head_tail.csv")

In [17]:
# len(list(r['name'] for r in d.support))
list(r['name'] for r in d.support)
# d.support = [i for i in d.support if i['name'] != 'oppose']
# d.support

['coordination', 'contact', 'oppose', 'assistance', 'part_of']

In [19]:
d.detect()

Sentence: "The dean of HKUST interviewed Carrie Lam, but it was inconclusive.", head: "dean of HKUST", tail: "Carrie Lam", prediction: assistance
Sentence: "Trump hacked Obama's laptop in 2017.", head: "Obama's laptop", tail: "Trump", prediction: oppose
Sentence: "Trump had had contact, including a meeting in 2010, with Obama before he became President. ", head: "Trump", tail: "Obama", prediction: coordination
Sentence: "Trump had gotten funding from Congress to investigate Obama.", head: "Trump", tail: "Obama", prediction: coordination
Sentence: "Trump had gotten funding from Congress to investigate Obama.", head: "Trump", tail: "Congress", prediction: part_of
Sentence: "Trump opposed President Obama's invasion of Iraq.", head: "invasion of Iraq", tail: "Trump", prediction: oppose
Sentence: "Trump had gotten funding from Congress to investigate Obama.", head: "Congress", tail: "Trump", prediction: assistance
Sentence: "Trump objected to President Obama's invasion of Iraq.", head: "inv

[('The dean of HKUST interviewed Carrie Lam, but it was inconclusive.',
  'dean of HKUST',
  'Carrie Lam',
  'assistance',
  tensor([[[0.5422, 0.3769, 1.9786, 2.7187, 1.4845, 0.4444]]],
         grad_fn=<CatBackward>)),
 ("Trump hacked Obama's laptop in 2017.",
  "Obama's laptop",
  'Trump',
  'oppose',
  tensor([[[ 0.4807,  1.4791,  2.7131,  1.8642, -0.5507,  0.4212]]],
         grad_fn=<CatBackward>)),
 ('Trump had had contact, including a meeting in 2010, with Obama before he became President. ',
  'Trump',
  'Obama',
  'coordination',
  tensor([[[1.9689, 1.7125, 0.1885, 0.5415, 0.6385, 0.4719]]],
         grad_fn=<CatBackward>)),
 ('Trump had gotten funding from Congress to investigate Obama.',
  'Trump',
  'Obama',
  'coordination',
  tensor([[[2.0353, 1.8534, 1.1228, 1.5320, 1.1279, 0.4638]]],
         grad_fn=<CatBackward>)),
 ('Trump had gotten funding from Congress to investigate Obama.',
  'Trump',
  'Congress',
  'part_of',
  tensor([[[-0.2632,  0.7425, -0.0563,  1.7721,  2.

In [18]:
d.support[4]

{'name': 'part_of',
 'examples': [{'sentence': 'Zeng Qun, the deputy head of ShanghaiÃ•s Civil Affairs Bureau, said at a news conference on Saturday that aerosol transmission is among the ways the novel coronavirus can be spread.',
   'head': 'Zeng Qun',
   'tail': 'ShanghaiÃ•s Civil Affairs Bureau'},
  {'sentence': "Taylor Swift played a song at Trump's inauguration ceremony.",
   'head': 'Taylor Swift',
   'tail': 'inauguration ceremony.'},
  {'sentence': 'The team will be led by Bruce Aylward, a Canadian physician and epidemiologist who has previously overseen international campaigns to fight Ebola and polio, the organizationÃ•s director general, Tedros Adhanom Ghebreyesus, announced on Sunday in Geneva.',
   'head': 'Bruce Aylward',
   'tail': 'The team'},
  {'sentence': 'Shen Yinzhong, the medical director of the Shanghai Public Health Clinical Center, told The Paper, a Shanghai newspaper, the coronavirus can spread through the air in theory, a confirmation requires further resear