# **Assignment 1**
Initial exploration of graph machine learning concepts using the "Nations" dataset.

- The TransE paper: [*Translating Embeddings for Modeling Multi-relational Data*](https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf)

## **Setup**

Download and process the “Nations” baseline, which you can find on GitHub, e.g.: ZhenfengLei/KGDatasets


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import pandas as pd
from torch import nn
import torch
import torch.optim as optim
import numpy as np
import random
import math

In [None]:
# Filepaths and constants
user = input('Emilio or Clare? ').strip().lower()
if user == 'emilio':
  folder_path = '/content/drive/MyDrive/Graph ML/data/nations/'
elif user == 'clare':
  folder_path = '/content/drive/MyDrive/College/College Year 3/Oxford/Graph Machine Learning/data/Nations/'
else:
  # Fail early instead of hitting a NameError on folder_path below
  raise ValueError(f'Unknown user: {user!r}')
entity_filepath = folder_path + 'entity2id.txt'
relation_filepath = folder_path + 'relation2id.txt'
train_filepath = folder_path + 'train.txt'
valid_filepath = folder_path + 'valid.txt'
test_filepath = folder_path + 'test.txt'

Emilio or Clare? emilio


In [None]:
# Function to reverse a dictionary (flip keys and values)
def reverse_dict(old_dict, new_dict):
    for k in old_dict.keys():
        new_dict[old_dict[k]] = k

In [None]:
# Dictionary mappings for entity and relation IDs
entity_2_id = {}
id_2_entity = {}

relation_2_id = {}
id_2_relation = {}

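# Column 0 holds the names; to_dict()[0] maps row index -> name.
# sep=r'\s+' splits on any whitespace (delim_whitespace is deprecated in recent pandas).
id_2_entity = pd.read_table(entity_filepath, header=None, sep=r'\s+').to_dict()[0]
id_2_relation = pd.read_table(relation_filepath, header=None, sep=r'\s+').to_dict()[0]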

reverse_dict(id_2_entity, entity_2_id)
reverse_dict(id_2_relation, relation_2_id)

In [None]:
# Define a new class for facts - encapsulate necessary functionality
class Fact:
    def __init__(self, head, relation, tail):
        self.head = head
        self.relation = relation
        self.tail = tail

    def equals_head(self, other):
        return self.head == other.head

    def equals_tail(self, other):
        return self.tail == other.tail

    def equals_relation(self, other):
        return self.relation == other.relation

    def __eq__(self, other):
        # Only compare against other facts
        if not isinstance(other, Fact):
            return NotImplemented
        return self.equals_head(other) and self.equals_tail(other) and self.equals_relation(other)

    def __hash__(self):
        return hash((self.head, self.relation, self.tail))

    def __repr__(self):
        # Unambiguous representation for debugging
        return f"Fact({self.head!r}, {self.relation!r}, {self.tail!r})"

    def __str__(self):
        return self.head + " " + self.relation + " " + self.tail

In [None]:
# Function to translate raw data into fact objects
def get_the_facts(content):
    list_of_facts = []
    for triple in content:
        # Each line has the form "head relation tail"; split it only once
        head, relation, tail = triple.split()[:3]
        list_of_facts.append(Fact(head, relation, tail))
    return list_of_facts

In [None]:
# Read in facts from train set (the context manager closes the file)
with open(train_filepath, "r") as train_file:
    train_content = train_file.readlines()
train_data = get_the_facts(train_content)

# Checking format of facts
for t in train_data[0:10]:
  print(t)
print()

# Misc testing to ensure everything is set up correctly
print(train_data[1])
print(train_data[2])
print(train_data[1].equals_head(train_data[1]))
print(train_data[1].equals_tail(train_data[2]))
print(train_data[1].equals_relation(train_data[2]))
print(train_data[1] == train_data[1])

netherlands militaryalliance uk
egypt intergovorgs3 usa
jordan relbooktranslations usa
poland timesincewar ussr
uk negativebehavior ussr
poland relintergovorgs uk
usa weightedunvote india
china accusation india
uk unweightedunvote egypt
poland embassy netherlands

egypt intergovorgs3 usa
jordan relbooktranslations usa
True
True
False
True


In [None]:
# Read in validation and test facts (the context managers close the files)
with open(valid_filepath, "r") as valid_file:
    valid_content = valid_file.readlines()
valid_data = get_the_facts(valid_content)

with open(test_filepath, "r") as test_file:
    test_content = test_file.readlines()
test_data = get_the_facts(test_content)

## **Question 1a**

Compute the number of entities, relations, and facts, and compare with the reported numbers (sanity checks)


In [None]:
print(f'The number of entities is {len(entity_2_id)}.')
print(f'The number of relations is {len(relation_2_id)}.')
print(f'The number of facts is {len(train_data)+len(valid_data)+len(test_data)}.')

The number of entities is 14.
The number of relations is 55.
The number of facts is 1992.


## **Question 1b**

Conduct a brief data analysis, and identify the most popular relations, entities, and briefly describe the dataset structure, i.e., how are entities connected, their types, what does the dataset describe? Etc.

---

The three most popular entities are 'usa', 'uk', and 'ussr'. The three most popular relations are 'embassy', 'commonbloc1' ("common bloc 1"), and 'timesinceally' ("time since ally").

This dataset describes socio-political relationships among a small sample of nations during the Cold War (hence the inclusion of the USSR). Most of the relations appear to concern political structures and diplomacy.
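
As a compact illustration of the frequency analysis, the sketch below counts entity and relation occurrences with `collections.Counter`. The ten triples are the first ten training facts printed in the Setup section; the variable names (`sample_triples`, `entity_counts`, `relation_counts`) are our own and not part of the notebook's pipeline, and the counts cover only this small sample rather than the full dataset.

```python
from collections import Counter

# First ten training facts, as printed in the Setup section
sample_triples = [
    ("netherlands", "militaryalliance", "uk"),
    ("egypt", "intergovorgs3", "usa"),
    ("jordan", "relbooktranslations", "usa"),
    ("poland", "timesincewar", "ussr"),
    ("uk", "negativebehavior", "ussr"),
    ("poland", "relintergovorgs", "uk"),
    ("usa", "weightedunvote", "india"),
    ("china", "accusation", "india"),
    ("uk", "unweightedunvote", "egypt"),
    ("poland", "embassy", "netherlands"),
]

entity_counts = Counter()
relation_counts = Counter()
for head, relation, tail in sample_triples:
    # An entity is counted once per appearance, as head or as tail
    entity_counts.update([head, tail])
    relation_counts[relation] += 1

print(entity_counts.most_common(3))
print(len(relation_counts), "distinct relations in the sample")
```

`Counter.most_common` replaces the manual dictionary bookkeeping and the later sort, at the cost of hiding the per-key logic.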


In [None]:
# Putting together train, val, and test set
full_facts = list(set(train_data + valid_data + test_data))

count_entities = {}
count_relations = {}

# Compute entity/relation frequencies
for f in full_facts:
    if f.head in count_entities:
        count_entities[f.head] += 1
    else:
        count_entities[f.head] = 1

    if f.tail in count_entities:
        count_entities[f.tail] += 1
    else:
        count_entities[f.tail] = 1

    if f.relation in count_relations:
        count_relations[f.relation] += 1
    else:
        count_relations[f.relation] = 1

In [None]:
# Get entity counts
ent_df = pd.DataFrame.from_dict(count_entities, orient='index', columns = ['count'])
ent_df = ent_df.sort_values('count', ascending=False)
ent_df

entity,count
usa,514
uk,462
ussr,331
netherlands,313
india,302
poland,287
egypt,284
brazil,260
china,249
israel,243


In [None]:
# Get relation counts
rel_df = pd.DataFrame.from_dict(count_relations, orient='index', columns = ['count'])
rel_df = rel_df.sort_values('count', ascending=False)
rel_df.head(10) # only show the first 10

relation,count
embassy,141
commonbloc1,97
timesinceally,95
relintergovorgs,94
intergovorgs3,93
ngoorgs3,92
relngo,91
reldiplomacy,87
intergovorgs,84
independence,78


## **Establishing, Training, and Evaluating TransE**

**(Contains questions 2a and 2b)**

Using a machine learning framework of your choice (TensorFlow, PyTorch, etc.), implement the basic TransE model and train it on the "Nations". To do this, your implementation should include: 
1. Entity processing and mapping to embeddings (You can use 100-dimensional embeddings, for instance, but the choice of model size is not critical. It just should be reasonably sized)
2. Scoring function and loss (My recommendation for loss is negative sampling loss (from the RotatE paper). I also suggest not using the self-adversarial parameter alpha, i.e., setting it to 0, for a start)
3. (Uniform) Negative sampling and training loop
4. Evaluation metrics (mean rank, mean reciprocal rank, Hits@K)
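
For reference, the negative sampling loss recommended in item 2 can be sketched as follows with the self-adversarial parameter set to 0, so that all negatives are weighted uniformly: $L = -\log\sigma(\gamma - d(h, r, t)) - \frac{1}{n}\sum_{i=1}^{n}\log\sigma(d(h'_i, r, t'_i) - \gamma)$. This is only a plain-Python illustration on precomputed distances; the function names and the toy distances are assumptions, and the notebook below uses the margin-based loss from the TransE paper instead.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def negative_sampling_loss(pos_distance, neg_distances, gamma):
    """Loss for one positive triple and its uniformly weighted negatives.

    pos_distance:  d(h, r, t) for the true triple
    neg_distances: d(h', r, t') for each corrupted triple
    gamma:         the margin
    """
    pos_term = -log_sigmoid(gamma - pos_distance)
    neg_term = -sum(log_sigmoid(d - gamma) for d in neg_distances) / len(neg_distances)
    return pos_term + neg_term

# Toy distances (made up): a well-separated case and a badly ranked case
well_separated = negative_sampling_loss(0.5, [6.0, 7.0], gamma=3.0)
badly_ranked = negative_sampling_loss(6.0, [0.5, 1.0], gamma=3.0)
print(well_separated < badly_ranked)  # True
```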


**TransE Notes**

- Underlying assumption of TransE: "In TransE, relationships are represented as translations in the embedding space: if (h, l, t) holds, then the embedding of the tail entity t should be close to the embedding of the head entity h plus some vector that depends on the relationship l. Our approach relies on a reduced set of parameters as it learns only one low-dimensional vector for each entity and each relationship." --> an improvement over other multi-relational models, which are complex, require lots of computing power, tend to overfit, and often barely do better than a simple linear model
- Why does this work? Hierarchy is super common in knowledge bases (like our "Nations" dataset)
- Research suggests that entities of different types could also be represented by translations in the embedding space (I think the embedding space is the vector space in which these translations happen --> its dimension is the hyperparameter k)
- Goal: the vector embedding of h + l is close to t if (h, l, t) is a true triple and farther away otherwise (h, l, and t are vectors once embedded) --> "t should be a nearest neighbor of h + l"
- "L2-norm of the embeddings of the entities is 1" --> relations are not regularized except at initialization; entities are constrained so the model cannot drive down the loss (and massively overfit) simply by inflating vector norms
- "At each main iteration of the algorithm, the embedding vectors of the entities are first normalized"
- Full algorithm description: "All embeddings for entities and relationships are first initialized following the random procedure proposed in [4]. At each main iteration of the algorithm, the embedding vectors of the entities are first normalized. Then, a small set of triplets is sampled from the training set, and will serve as the training triplets of the minibatch. For each such triplet, we then sample a single corrupted triplet. The parameters are then updated by taking a gradient step with constant learning rate. The algorithm is stopped based on its performance on a validation set"
- The distance function can be the squared Euclidean distance

In [None]:
# Function to normalize an array in-place (rescales it to unit L2 norm)
def normalize_vector(my_array):
    norm = torch.linalg.vector_norm(my_array)
    my_array /= norm

In [None]:
# Function to initialize normalized vectors for entity/relation embeddings
def initialize_normal_vectors(d, items):
    my_list = []

    for i in items:
        new_array = torch.FloatTensor(d).uniform_(-6/d**(1/2), 6/d**(1/2)) # Generate randomized vector from uniform distribution
        normalize_vector(new_array) # Normalize vector
        new_array.requires_grad_()
        my_list.append(new_array)

    return my_list

In [None]:
# Generate a list of negative facts by passing in all possible entities, a fact to corrupt, and a list of all true facts
def get_list_of_negative_facts(all_entities, a_fact, all_facts):
    all_corrupted_facts = []
    all_facts_set = set(all_facts)
    for a in all_entities:
        
        # Change just the head and add if that fact isn't in the set of true facts
        if a != a_fact.head:
            new_fact = Fact(a, a_fact.relation, a_fact.tail)
            if new_fact not in all_facts_set: # only as exhaustive as the facts passed in (the training loop passes just the current batch)
                all_corrupted_facts.append(new_fact)
                
        # Change just the tail and add if that fact isn't in the set of true facts
        if a != a_fact.tail:
            new_fact = Fact(a_fact.head, a_fact.relation, a)
            if new_fact not in all_facts_set:
                all_corrupted_facts.append(new_fact)
      
    return all_corrupted_facts

In [None]:
# Dissimilarity measure - L2 norm (||h+r-t||_2)
def score_L2(h, r, t):
    return torch.linalg.vector_norm(h+r-t)

In [None]:
# Create positive/negative fact pairs - one neg sampled for each pos, as described in paper
def get_pos_neg_fact_pairs(all_entities, all_facts):
    a_set = set()
    for a in all_facts:
        neg_fact = random.choice(get_list_of_negative_facts(all_entities, a, all_facts))
        a_set.add((a, neg_fact))
    return a_set
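
The sampling step can also be sketched on plain tuples, with the set of known true facts passed in explicitly so that the filter does not depend on the batch. This is a stand-alone illustration rather than the notebook's implementation: `sample_negative`, `known_facts`, and the toy facts are hypothetical names and data.

```python
import random

def sample_negative(fact, entities, known_facts, rng):
    """Corrupt the head or the tail of `fact`, skipping corruptions that are known to be true."""
    head, relation, tail = fact
    candidates = []
    for e in entities:
        if e != head and (e, relation, tail) not in known_facts:
            candidates.append((e, relation, tail))   # head replaced
        if e != tail and (head, relation, e) not in known_facts:
            candidates.append((head, relation, e))   # tail replaced
    # Uniform choice over all valid corruptions
    return rng.choice(candidates)

# Toy knowledge graph (made up for illustration)
entities = ["usa", "uk", "ussr", "egypt"]
known_facts = {
    ("uk", "embassy", "usa"),
    ("egypt", "embassy", "usa"),
    ("uk", "embassy", "ussr"),
}

rng = random.Random(0)
fact = ("uk", "embassy", "usa")
negatives = [sample_negative(fact, entities, known_facts, rng) for _ in range(50)]
print(sorted(set(negatives)))
```

Building `known_facts` once from train, validation, and test facts and reusing it for every call avoids both false negatives and the cost of rebuilding the set per fact.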

In [None]:
# Compute TransE loss, as described in original paper
def transe_loss(t_batch, margin, ent_2_vec, rel_2_vec, entity_embeddings, relation_embeddings):
    total_loss = 0
    for t in t_batch:
        pos_fact = t[0]
        neg_fact = t[1]
        
        pos_fact_score = score_L2(entity_embeddings[ent_2_vec[pos_fact.head]], 
                                  relation_embeddings[rel_2_vec[pos_fact.relation]], 
                                  entity_embeddings[ent_2_vec[pos_fact.tail]])
        neg_fact_score = score_L2(entity_embeddings[ent_2_vec[neg_fact.head]], 
                                  relation_embeddings[rel_2_vec[neg_fact.relation]], 
                                  entity_embeddings[ent_2_vec[neg_fact.tail]])

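        # Hinge term; torch.relu keeps the result a tensor even when the margin is already satisfied
        total_loss = total_loss + torch.relu(margin + pos_fact_score - neg_fact_score)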
    return total_loss

In [None]:
# Get rank of the target entity in a numpy array of embedding scores
def get_rank(a_list, an_index):
    temp = a_list.argsort()
    rank = np.empty_like(temp)
    rank[temp] = np.arange(len(a_list))
    return rank[an_index]+1

In [None]:
# Get rank of observed heads/tails against corrupted alternatives
def create_rank_arrays(model, dataset):
    entity_embeddings = model[0]
    relation_embeddings = model[1]
    ent_2_vec = model[2]
    rel_2_vec = model[3]

    head_rank_list = np.zeros(len(dataset))
    tail_rank_list = np.zeros(len(dataset))
    for idxv, v in enumerate(dataset):  # iterate over the dataset passed in, not the global valid_data
        head_embedding_index = ent_2_vec.get(v.head)
        relation_embedding = relation_embeddings[rel_2_vec.get(v.relation)]
        tail_embedding_index = ent_2_vec.get(v.tail)

        heads_corrupted = np.zeros(len(entity_embeddings))
        tails_corrupted = np.zeros(len(entity_embeddings))
        for idxe, e in enumerate(entity_embeddings):
            heads_corrupted[idxe] = score_L2(e, relation_embedding, entity_embeddings[tail_embedding_index]).item()
            tails_corrupted[idxe] = score_L2(entity_embeddings[head_embedding_index], relation_embedding, e).item()

        head_rank_list[idxv] = get_rank(heads_corrupted, head_embedding_index)
        tail_rank_list[idxv] = get_rank(tails_corrupted, tail_embedding_index)
    
    return head_rank_list, tail_rank_list

In [None]:
# Mean rank metric
def compute_mean_rank(head_rank_list, tail_rank_list, dataset_length):
    return (sum(head_rank_list)+sum(tail_rank_list))/(2*dataset_length)

In [None]:
# Mean reciprocal rank metric
def compute_mean_reciprocal_rank(head_rank_list, tail_rank_list, dataset_length):
    return (np.sum(1/head_rank_list)+np.sum(1/tail_rank_list))/(2*dataset_length)

In [None]:
# Hits@k metric
def compute_hits_at_k(head_rank_list, tail_rank_list, k, dataset_length):
    return (np.count_nonzero(head_rank_list <= k)+np.count_nonzero(tail_rank_list <= k))/(2*dataset_length) * 100
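
As a sanity check on the three metrics, here is a tiny worked example in plain Python on a made-up list of ranks (head and tail ranks pooled into one list). The function names are ours and do not replace the NumPy versions above.

```python
def mean_rank(ranks):
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    return sum(1 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Percentage of ranks that fall in the top k."""
    return 100 * sum(r <= k for r in ranks) / len(ranks)

ranks = [1, 2, 4, 8]  # hypothetical ranks, 1 = best

print(mean_rank(ranks))             # 3.75
print(mean_reciprocal_rank(ranks))  # 0.46875
print(hits_at_k(ranks, 3))          # 50.0
```

Mean rank is dominated by the worst predictions, whereas MRR and Hits@k mostly reward getting the correct entity near the top, so the three can move in different directions.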

In [None]:
# Evaluate a model (entity/relation embeddings) using all 3 metrics
def eval_model(model, dataset, k):
  head_rank_list, tail_rank_list = create_rank_arrays(model, dataset)  # rank on the dataset being evaluated

  return (compute_mean_rank(head_rank_list, tail_rank_list, len(dataset)), 
          compute_mean_reciprocal_rank(head_rank_list, tail_rank_list, len(dataset)), 
          compute_hits_at_k(head_rank_list, tail_rank_list, k, len(dataset)))

In [None]:
# Callback to perform early stopping
#   - if current val mean rank isn't better than any of last "how_many_to_check" epochs, stop the training
def early_stopping_callback(model, dataset, k, mr_array, how_many_to_check):
    head_rank_list, tail_rank_list = create_rank_arrays(model, dataset)

    cur_mr = compute_mean_rank(head_rank_list, tail_rank_list, len(dataset))
    mr_array.append(cur_mr)
    if len(mr_array) > how_many_to_check + 1:
        recent_epoch_mrs = np.array(mr_array[-(how_many_to_check+1) : -1])

        if np.count_nonzero(recent_epoch_mrs > cur_mr) > 0:
            return False
        print('Final validation mean rank: ', cur_mr)
        return True

    return False
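
The stopping rule can be restated in isolation: training stops once the current validation mean rank fails to beat every one of the previous `patience` epochs. The sketch below mirrors that logic on a plain list; `should_stop`, `patience`, and the mean-rank histories are illustrative assumptions, not recorded results.

```python
def should_stop(history, patience):
    """history: validation mean ranks, most recent last (lower is better)."""
    if len(history) <= patience + 1:
        return False  # not enough epochs recorded yet
    current = history[-1]
    recent = history[-(patience + 1):-1]
    # Keep training if the current value beats at least one recent epoch
    return not any(previous > current for previous in recent)

still_improving = [7.9, 7.5, 7.2, 7.0, 6.9]                # made-up values
plateaued = [7.9, 7.5, 7.2, 7.0, 7.1, 7.0, 7.05, 7.2]      # made-up values

print(should_stop(still_improving, patience=3))  # False
print(should_stop(plateaued, patience=3))        # True
```

Because the rule only requires beating one of the recent epochs, it tolerates noisy validation curves, but it returns the final embeddings rather than the best ones seen.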

In [None]:
# Train the TransE model - output is entity/relation embeddings in d-dimensional vector space
def train(entities, relations, facts, d, margin, epochs, batch_size, learning_rate, valid_dataset=None, k=5, how_many_to_check=5, verbose=True):
    entity_embeddings = initialize_normal_vectors(d, entities)
    relation_embeddings = initialize_normal_vectors(d, relations)
    
    ent_2_vec = {}
    rel_2_vec = {}

    # Create a dictionary mapping for easy embedding lookup for entities
    for e in range(len(entities)):
        ent_2_vec[entities[e]] = e

    # Create a dictionary mapping for easy embedding lookup for relations
    for r in range(len(relations)):
        rel_2_vec[relations[r]] = r
    
    mr_array = []

    # Define our optimizer to use stochastic gradient descent
    optimizer = optim.SGD(entity_embeddings + relation_embeddings, lr = learning_rate)
    
    for e in range(epochs):
        if e%5 == 0 and verbose:
            print('Epoch:', e)
        for i in range(int(len(facts) / batch_size)):

            # Select a batch from all facts
            fact_sample = random.sample(facts, batch_size)
            t_batch = get_pos_neg_fact_pairs(entities, fact_sample)
            
            # Only normalize the entity vector embeddings
            with torch.no_grad():
                for an_ent in entity_embeddings:
                    normalize_vector(an_ent)
            
            # Compute transe loss
            loss = transe_loss(t_batch, margin, ent_2_vec, rel_2_vec, entity_embeddings, relation_embeddings)    
            
            # Update embeddings
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Evaluate model at end of epoch
        if valid_dataset is not None:  # check the argument, not the global valid_data
            model = entity_embeddings, relation_embeddings, ent_2_vec, rel_2_vec
            if early_stopping_callback(model, valid_dataset, k, mr_array, how_many_to_check):
                print('Stopping at epoch:', e)
                break
          
    return entity_embeddings, relation_embeddings, ent_2_vec, rel_2_vec

In [None]:
# Check model training w/arbitrarily chosen hyperparams
entities = list(entity_2_id.keys())
relations = list(relation_2_id.keys())
d = 100
margin = 0.5
epochs = 1000
batch_size = 32
learning_rate = 0.001
k = 5
validation_num_to_check = 5

print('Length of training set: ', len(train_data))
model = train(entities, relations, train_data, d, margin, epochs, batch_size, learning_rate, valid_data, k, validation_num_to_check, verbose = True)

Length of training set:  1592
Epoch: 0
Epoch: 5
Epoch: 10
Epoch: 15
Epoch: 20
Epoch: 25
Epoch: 30
Final validation mean rank:  6.922110552763819
Stopping at epoch: 34


In [None]:
# Formal hyperparameter tuning using same grid-search values as TransE paper (we added 100 for latent dim)
lr_list = [0.001, 0.01, 0.1]
margin_list = [1, 2, 10]
latent_dim = [20, 50, 100]
all_models = {}

for lr in lr_list:
    for m in margin_list:
        for ld in latent_dim:
            print('Learning rate:', lr, 'Margin:', m, 'Latent dimension:', ld)
            all_models[(lr, m, ld)] = train(entities, relations, train_data, ld, m, epochs, batch_size, lr, valid_data, k, validation_num_to_check, False)

Learning rate: 0.001 Margin: 1 Latent dimension: 20
Final validation mean rank:  6.71356783919598
Stopping at epoch: 68
Learning rate: 0.001 Margin: 1 Latent dimension: 50
Final validation mean rank:  6.665829145728643
Stopping at epoch: 49
Learning rate: 0.001 Margin: 1 Latent dimension: 100
Final validation mean rank:  6.673366834170854
Stopping at epoch: 57
Learning rate: 0.001 Margin: 2 Latent dimension: 20
Final validation mean rank:  6.902010050251256
Stopping at epoch: 46
Learning rate: 0.001 Margin: 2 Latent dimension: 50
Final validation mean rank:  6.9045226130653266
Stopping at epoch: 47
Learning rate: 0.001 Margin: 2 Latent dimension: 100
Final validation mean rank:  6.768844221105527
Stopping at epoch: 29
Learning rate: 0.001 Margin: 10 Latent dimension: 20
Final validation mean rank:  6.85678391959799
Stopping at epoch: 49
Learning rate: 0.001 Margin: 10 Latent dimension: 50
Final validation mean rank:  6.751256281407035
Stopping at epoch: 67
Learning rate: 0.001 Margin: 

In [None]:
# Calculating test performance for all saved models from hyperparam tuning
for key in all_models.keys():
  print(key)
  cur_model = all_models.get(key)
  metrics = np.round(eval_model(cur_model, test_data, k), 2)  # named metrics to avoid shadowing the built-in eval
  print('Average rank:', metrics[0])
  print('Mean reciprocal rank:', metrics[1])
  print(f'Hits @ {k}:', str(metrics[2])+'%')
  print()

(0.001, 1, 20)
Average rank: 6.65
Mean reciprocal rank: 0.23
Hits @ 5: 42.04%

(0.001, 1, 50)
Average rank: 6.6
Mean reciprocal rank: 0.23
Hits @ 5: 41.79%

(0.001, 1, 100)
Average rank: 6.61
Mean reciprocal rank: 0.22
Hits @ 5: 45.02%

(0.001, 2, 20)
Average rank: 6.83
Mean reciprocal rank: 0.22
Hits @ 5: 39.8%

(0.001, 2, 50)
Average rank: 6.84
Mean reciprocal rank: 0.21
Hits @ 5: 39.05%

(0.001, 2, 100)
Average rank: 6.7
Mean reciprocal rank: 0.2
Hits @ 5: 40.55%

(0.001, 10, 20)
Average rank: 6.79
Mean reciprocal rank: 0.22
Hits @ 5: 40.3%

(0.001, 10, 50)
Average rank: 6.68
Mean reciprocal rank: 0.21
Hits @ 5: 42.04%

(0.001, 10, 100)
Average rank: 6.79
Mean reciprocal rank: 0.21
Hits @ 5: 43.03%

(0.01, 1, 20)
Average rank: 6.94
Mean reciprocal rank: 0.2
Hits @ 5: 36.32%

(0.01, 1, 50)
Average rank: 6.82
Mean reciprocal rank: 0.22
Hits @ 5: 40.55%

(0.01, 1, 100)
Average rank: 6.7
Mean reciprocal rank: 0.22
Hits @ 5: 41.04%

(0.01, 2, 20)
Average rank: 6.7
Mean reciprocal rank: 0

### **Evaluation & Analysis**
Evaluation protocol from the paper: "For evaluation, we use the same ranking procedure as in [3]. For each test triplet, the head is removed and replaced by each of the entities of the dictionary in turn. Dissimilarities (or energies) of those corrupted triplets are first computed by the models and then sorted by ascending order; the rank of the correct entity is finally stored. This whole procedure is repeated while removing the tail instead of the head. We report the mean of those predicted ranks and the hits@10, i.e. the proportion of correct entities ranked in the top 10."

"These metrics are indicative but can be flawed when some corrupted triplets end up being valid ones, from the training set for instance."

The evaluation protocol in the paper differs from the implementation shown in the class slides.
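
The flaw mentioned in the second quote is what the paper's "filtered" setting addresses: candidates that would form another known true triple are removed before ranking. Below is a minimal sketch of raw versus filtered rank for a single query; the scores, the entity set, and the function name are made up for illustration, and ties are resolved optimistically here, which may differ from an `argsort`-based rank.

```python
def raw_and_filtered_rank(scores, target, known_true):
    """Rank of `target` among candidate entities (lower dissimilarity is better).

    scores:     dict mapping candidate entity -> dissimilarity
    target:     the correct entity for this query
    known_true: other entities that also complete the triple correctly
    """
    target_score = scores[target]
    better = [e for e, s in scores.items() if e != target and s < target_score]
    raw = 1 + len(better)
    filtered = 1 + sum(1 for e in better if e not in known_true)
    return raw, filtered

# Hypothetical tail scores for one query (head, relation, ?)
scores = {"usa": 0.2, "uk": 0.4, "ussr": 0.5, "egypt": 0.9}
raw, filtered = raw_and_filtered_rank(scores, target="ussr", known_true={"usa"})
print(raw, filtered)  # 3 2
```

In the raw setting the model is penalized for ranking `usa` first even though that answer is also correct; filtering removes this penalty.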

#### **Question 2a**
How does TransE perform on this task?

---

The cells below show TransE's test set performance. The trained TransE model performs *substantially* better than random embeddings (the second cell below), which shows that the model did indeed learn during training, even if the performance isn't stellar. One factor that may constrain performance is the negative sampling scheme from the TransE paper, which is very simple--only one corrupted fact is used for each real fact, which is substantially less informative than more sophisticated sampling methods.

How can we explain the facts that TransE can and cannot predict well? 
- Poorly: from class, we know that TransE cannot do well on symmetric relations, and this dataset has many relations with inherent symmetry (e.g. countries are allies both ways).
- Well: being able to embody a composition pattern as described in class seems intuitively good, as approximate transitivity seems prevalent in this dataset.
- Hypothesis: the model had some difficulty properly capturing the political allies and enemies in this dataset, which seems to focus on Cold War political relations (see for example embassy relations). This may have to do with the inference patterns that TransE can and cannot capture (mentioned above).
  - Difficult to say this for sure, since we're not super familiar with cold war era political relations.
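
The difficulty with symmetric relations follows from a short argument: since $(h + r - t) + (t + r - h) = 2r$, the triangle inequality gives $\|h + r - t\| + \|t + r - h\| \geq 2\|r\|$, so both directions can score well only if $r$ is close to zero. The toy vectors below (invented for illustration) show the extreme case where the forward direction is fit perfectly and the backward direction pays the full penalty.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def score(h, r, t):
    """TransE dissimilarity ||h + r - t||_2."""
    return norm([hi + ri - ti for hi, ri, ti in zip(h, r, t)])

# Toy unit-norm entity embeddings (made up)
h = [0.6, 0.8]
t = [1.0, 0.0]
# Choose r so that the forward triple (h, r, t) is fit perfectly
r = [ti - hi for hi, ti in zip(h, t)]

forward = score(h, r, t)    # essentially zero by construction
backward = score(t, r, h)   # the reversed triple
lower_bound = 2 * norm(r)

print(forward, backward, lower_bound)
```

Here the backward score equals the bound $2\|r\|$, so the only way to make both directions of a symmetric relation look plausible is to shrink $r$, which in turn pushes $h$ and $t$ together.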

In [None]:
# Compute mean rank, mean reciprocal rank, and hits@k on the test set
best_model = all_models.get((0.001, 1, 100)) # getting the best model from the tuning process

metrics = np.round(eval_model(best_model, test_data, k), 2)  # named metrics to avoid shadowing the built-in eval
print('Average rank:', metrics[0])
print('Mean reciprocal rank:', metrics[1])
print(f'Hits@{k}:', str(metrics[2])+'%')

Average rank: 6.61
Mean reciprocal rank: 0.22
Hits@5: 45.02%


In [None]:
# Calculating the performance of a randomly-initialized model... for comparison against trained model!
mr = []
mrr = []
hak = []

for i in range(20): # averaging over 20 random initializations for embeddings vectors
  random_model = train(entities, relations, train_data, 100, 1, 0, batch_size, 0.001, test_data, k, validation_num_to_check, False) # zero epochs == random embeddings
  metrics = np.round(eval_model(random_model, test_data, k), 2)  # named metrics to avoid shadowing the built-in eval
  mr.append(metrics[0])
  mrr.append(metrics[1])
  hak.append(metrics[2])

print('Average rank:', round(np.mean(mr), 2))
print('Mean reciprocal rank:', round(np.mean(mrr), 2))
print(f'Hits@{k}: {round(np.mean(hak), 2)}%')

Average rank: 7.92
Mean reciprocal rank: 0.17
Hits@5: 30.15%


In [None]:
# Return a list of facts that the trained model performed well on under head and tail corruption
def get_good_facts(model, dataset, how_good):
    head_rank_list, tail_rank_list = create_rank_arrays(model, dataset)
    mask_head = head_rank_list < how_good
    mask_tail = tail_rank_list < how_good
    mask = mask_head & mask_tail
    dataset_np = np.array(dataset)
    
    return dataset_np[mask.nonzero()]

In [None]:
# Return a list of facts that the trained model performed poorly on under head and tail corruption
def get_bad_facts(model, dataset, how_bad, entities):
    head_rank_list, tail_rank_list = create_rank_arrays(model, dataset)
    mask_head = head_rank_list > (len(entities) - how_bad)
    mask_tail = tail_rank_list > (len(entities) - how_bad)
    mask = mask_head & mask_tail
    dataset_np = np.array(dataset)
    
    return dataset_np[mask.nonzero()]

In [None]:
# Find facts the model did well and poorly on in the test set
how_good = 4
how_bad = 4

print('List of test data facts model did well on: ')
for fact in get_good_facts(best_model, test_data, how_good):
    print(fact)
print()
print()

print('List of test data facts model did poorly on: ')
for fact in get_bad_facts(best_model, test_data, how_bad, entities):
    print(fact)
print()

List of test data facts model did well on: 
israel commonbloc1 cuba
usa independence china
burma relintergovorgs usa
india commonbloc1 poland
china relexports cuba
usa unweightedunvote israel
ussr relbooktranslations usa
uk tourism3 egypt
poland reltreaties ussr
israel intergovorgs3 egypt
indonesia reldiplomacy egypt
egypt embassy cuba
jordan relintergovorgs poland
israel relngo india
egypt reldiplomacy usa
indonesia ngoorgs3 india
cuba timesincewar usa
indonesia relngo usa
netherlands relexportbooks uk
burma ngoorgs3 israel
usa embassy brazil
netherlands commonbloc2 uk
burma relintergovorgs egypt
egypt commonbloc1 brazil
india intergovorgs burma
indonesia commonbloc1 netherlands
brazil intergovorgs poland
jordan relngo usa
israel commonbloc1 netherlands
jordan ngoorgs3 netherlands


List of test data facts model did poorly on: 
egypt embassy uk
indonesia violentactions uk
usa officialvisits indonesia
china officialvisits indonesia
uk intergovorgs3 brazil
poland reldiplomacy cuba
indon

#### **Question 2b**

Hyperparameter tuning is performed above. In the end, the best choice of hyperparameters was a learning rate of 0.001, a margin of $\gamma=$ 1, and an embedding dimension of 100 (slightly higher performance, as measured by hits@k, than alternative embedding dimensions).

- Margin: the model doesn't seem too sensitive to the choice of margin--we tried [1,2,10] and model performance didn't seem to generally be directly affected by this choice.
- Learning rate: 0.001 and 0.01 seemed to work substantially better than 0.1... mean rank deteriorates when `lr=0.1`.
- Embedding dimensionality: hits@k often improved when increasing embedding dimensionality, but mean rank is relatively consistent across choices of these hyperparameters.
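
For illustration, the grid can be enumerated with `itertools.product`, and a configuration can be picked by validation mean rank. The dictionary below holds only the eight final validation mean ranks that are visible in the tuning output above (rounded to three decimals; the rest of that output is cut off), so the selection shown is restricted to those eight and is not a statement about the full grid. The variable names are our own.

```python
from itertools import product

lr_list = [0.001, 0.01, 0.1]
margin_list = [1, 2, 10]
latent_dim = [20, 50, 100]

# Same order as the nested loops: learning rate, then margin, then dimension
grid = list(product(lr_list, margin_list, latent_dim))
print(len(grid))  # 27

# Final validation mean ranks recorded in the tuning output (rounded)
recorded_valid_mr = {
    (0.001, 1, 20): 6.714,
    (0.001, 1, 50): 6.666,
    (0.001, 1, 100): 6.673,
    (0.001, 2, 20): 6.902,
    (0.001, 2, 50): 6.905,
    (0.001, 2, 100): 6.769,
    (0.001, 10, 20): 6.857,
    (0.001, 10, 50): 6.751,
}

# Lower mean rank is better
best_recorded = min(recorded_valid_mr, key=recorded_valid_mr.get)
print(best_recorded)  # (0.001, 1, 50)
```

Selecting on a validation metric keeps the test set untouched until the final evaluation; which metric to select on (mean rank, MRR, or Hits@k) is a separate choice that can change the winner.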

## **Misc Testing**

Code below is not for the model but for our own benefit. 

In [None]:
hi = np.zeros(4)
hi[0] = 4
hi[1] = 2
hi[2] = 7
hi[3] = 1

print(get_rank(hi, 3))
print(np.count_nonzero(hi < 10))
print(np.sum(1/hi))

1
4
1.8928571428571428


In [None]:
np.array([1, 0, 1]) & np.array([1, 0, 0])

array([1, 0, 0])

#### Sources
1. https://stackoverflow.com/questions/12021730/can-pandas-handle-variable-length-whitespace-as-column-delimiters
2. https://www.geeksforgeeks.org/how-to-read-text-files-with-pandas/#:~:text=We%20can%20read%20data%20from,of%20a%20comma%20by%20default
3. https://realpython.com/pandas-sort-python/#:~:text=To%20sort%20the%20DataFrame%20based,not%20modify%20the%20original%20DataFrame.
4. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.from_dict.html
5. https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
6. https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html
7. [Paper describing TransE](https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf)
