# Knowledge Base Completion (KBC) / Link Prediction with RDF2Vec

## Prerequisites (Linux/MacOS)

- install kbc_rdf2vec ([https://github.com/janothan/kbc_rdf2vec](https://github.com/janothan/kbc_rdf2vec))
- install kbc_evaluation ([https://github.com/janothan/kbc_evaluation/](https://github.com/janothan/kbc_evaluation/))
- install jRDF2Vec and add the shell script to your path (see [here](https://github.com/dwslab/jRDF2Vec/blob/master/src/main/bin/jrdf2vec.sh)); alternatively, adapt the `jrdf2vec` commands in this notebook to your installation path.

The only thing you have to do before running the whole notebook is to set your `work_dir` in the cell below.

In [16]:
# TODO: Set the directory that everything is written to (requires more than 5 GB of disk space)
work_dir = "/work/jportisc/kbc_rdf2vec/strategy_grid_2/notebook_files"
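
Since the notebook needs more than 5 GB of disk space, it can help to check the free space before starting. The following is a minimal sketch that uses only the standard library; the helper name `has_enough_space` and its default threshold are illustrative assumptions and not part of kbc_rdf2vec.

```python
import os
import shutil


def has_enough_space(directory: str, required_gb: float = 5.0) -> bool:
    """Return True if the file system holding `directory` has more than `required_gb` GB free."""
    # Walk up to the closest existing parent so that the check also works
    # before the work directory has been created.
    probe = os.path.abspath(directory)
    while not os.path.exists(probe):
        probe = os.path.dirname(probe)
    free_gb = shutil.disk_usage(probe).free / 1024 ** 3
    return free_gb > required_gb
```

A call such as `has_enough_space(work_dir)` could be placed right after the assignment of `work_dir`.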

## Let's Transform WN18 and FB15k Into NT Files

In [18]:
import sys
from kbc_rdf2vec.dataset import DataSet
import os

# create the directory if it does not exist yet
nt_dir = os.path.join(work_dir, "nt_files")
os.makedirs(nt_dir, exist_ok=True)

DataSet.write_training_file_nt(data_set=DataSet.WN18, file_to_write=os.path.join(nt_dir, "WN18.nt"))
DataSet.write_training_file_nt(data_set=DataSet.FB15K, file_to_write=os.path.join(nt_dir, "FB15k.nt"))
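
To sanity-check the generated files, you can peek at the first triples. The sketch below assumes that each line has the plain N-Triples shape `<subject> <predicate> <object> .` with URI-only terms. The helper `read_first_triples` is hypothetical and not part of kbc_rdf2vec, and the exact URIs written by `write_training_file_nt` are not shown here.

```python
from typing import List, Tuple


def read_first_triples(nt_path: str, limit: int = 5) -> List[Tuple[str, str, str]]:
    """Parse up to `limit` triples from an N-Triples file with URI-only terms."""
    triples = []
    with open(nt_path, encoding="utf-8") as nt_file:
        for line in nt_file:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Drop the trailing "." and split into at most three terms.
            parts = line.rstrip(".").strip().split(None, 2)
            if len(parts) != 3:
                continue  # not a simple triple, ignored in this sketch
            subject, predicate, obj = (term.strip().strip("<>") for term in parts)
            triples.append((subject, predicate, obj))
            if len(triples) >= limit:
                break
    return triples
```

For example, `read_first_triples(os.path.join(nt_dir, "WN18.nt"))` would return the first five triples of the WN18 training file.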

## Let's Train Embeddings with jRDF2Vec

Train embeddings for WN18 by running the following line:
```
!jrdf2vec -graph "./WN18.nt" -numberOfWalks 250 -threads 20 -depth 4 -walkDirectory <set manually or use generated statement> -trainingMode sg -dimension 200 -window 2 -epochs 25
```

Train embeddings for FB15k by running the following line:
```
!jrdf2vec -graph "./FB15k.nt" -numberOfWalks 250 -threads 20 -depth 4 -walkDirectory <set manually or use generated statement> -trainingMode sg -dimension 200 -window 2 -epochs 25
```

The commands above are shown for reference only: the cells below build all paths from `work_dir` and call jRDF2Vec for you, so you only have to run them.
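
If you prefer to assemble the call in Python instead of using the `!` shell escape, it can be sketched as follows. The flags mirror those used in this notebook and the defaults follow the executed cells. The helpers `build_jrdf2vec_command` and `run_jrdf2vec` are illustrative assumptions, and the sketch assumes that `jrdf2vec` is on your path.

```python
import subprocess
from typing import List


def build_jrdf2vec_command(graph: str, walk_directory: str, number_of_walks: int = 250,
                           threads: int = 20, depth: int = 4, dimension: int = 200,
                           window: int = 2, epochs: int = 25) -> List[str]:
    """Assemble the jrdf2vec call as an argument list (no shell quoting needed)."""
    return [
        "jrdf2vec",
        "-graph", graph,
        "-numberOfWalks", str(number_of_walks),
        "-threads", str(threads),
        "-depth", str(depth),
        "-walkDirectory", walk_directory,
        "-trainingMode", "sg",
        "-dimension", str(dimension),
        "-window", str(window),
        "-epochs", str(epochs),
    ]


def run_jrdf2vec(command: List[str]) -> None:
    """Run the assembled command and raise an error if jrdf2vec exits with a failure code."""
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids quoting problems when `work_dir` contains spaces.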

In [22]:
wn18_walk_path = os.path.join(work_dir, "wn18_walks")
wn_nt_path = os.path.join(nt_dir, "WN18.nt")

!jrdf2vec -graph $wn_nt_path -numberOfWalks 250 -threads 20 -depth 4 -walkDirectory $wn18_walk_path -trainingMode sg -dimension 200 -window 2 -epochs 25

The specified walk directory does not exist. Trying to make the directory.
Using 20 threads for walk generation and training.
Using vector dimension: 200
Using depth 4
Generating 250 walks per entity.
RDF2Vec Classic
 INFO [main] (ParserManager.java:53) - Using NxParser.
 INFO [main] (ParserManager.java:88) - Model read into memory.
walkGeneration mode is null... Using default: RANDOM_WALKS_DUPLICATE_FREE
 INFO [pool-1-thread-1] (WalkGenerator.java:320) - TOTAL PROCESSED ENTITIES: 1000
 INFO [pool-1-thread-1] (WalkGenerator.java:321) - TOTAL NUMBER OF PATHS : 236230
 INFO [pool-1-thread-6] (WalkGenerator.java:320) - TOTAL PROCESSED ENTITIES: 2000
 INFO [pool-1-thread-6] (WalkGenerator.java:321) - TOTAL NUMBER OF PATHS : 472839
 INFO [pool-1-thread-4] (WalkGenerator.java:320) - TOTAL PROCESSED ENTITIES: 3000
 INFO [pool-1-thread-4] (WalkGenerator.java:321) - TOTAL NUMBER OF PATHS : 709585
 INFO [pool-1-thread-2] (WalkGenerator.java:320) - TOTAL PROCESSED ENTITIES: 4000
 INFO [pool-1-thr

DEBUG [main] (PoolingHttpClientConnectionManager.java:409) - Connection manager is shutting down
DEBUG [main] (PoolingHttpClientConnectionManager.java:415) - Connection manager shut down
DEBUG [main] (RequestAddCookies.java:123) - CookieSpec selected: default
DEBUG [main] (RequestAuthCache.java:77) - Auth cache not set in the context
DEBUG [main] (PoolingHttpClientConnectionManager.java:266) - Connection request: [route: {}->http://127.0.0.1:1808][total kept alive: 0; route allocated: 0 of 2; total allocated: 0 of 20]
DEBUG [main] (PoolingHttpClientConnectionManager.java:310) - Connection leased: [id: 1][route: {}->http://127.0.0.1:1808][total kept alive: 0; route allocated: 1 of 2; total allocated: 1 of 20]
DEBUG [main] (MainClientExec.java:234) - Opening connection {}->http://127.0.0.1:1808
DEBUG [main] (DefaultHttpClientConnectionOperator.java:139) - Connecting to /127.0.0.1:1808
DEBUG [main] (LoggingManagedHttpClientConnection.java:96) - http-outgoing-1: Shutdown connection
DEBUG [

DEBUG [main] (Wire.java:73) - http-outgoing-5 << "HTTP/1.0 200 OK[\r][\n]"
DEBUG [main] (Wire.java:73) - http-outgoing-5 << "Content-Type: text/html; charset=utf-8[\r][\n]"
DEBUG [main] (Wire.java:73) - http-outgoing-5 << "Content-Length: 4[\r][\n]"
DEBUG [main] (Wire.java:73) - http-outgoing-5 << "Server: Werkzeug/1.0.1 Python/3.8.3[\r][\n]"
DEBUG [main] (Wire.java:73) - http-outgoing-5 << "Date: Thu, 14 Jan 2021 09:03:50 GMT[\r][\n]"
DEBUG [main] (Wire.java:73) - http-outgoing-5 << "[\r][\n]"
DEBUG [main] (Wire.java:87) - http-outgoing-5 << "True"
DEBUG [main] (LoggingManagedHttpClientConnection.java:122) - http-outgoing-5 << HTTP/1.0 200 OK
DEBUG [main] (LoggingManagedHttpClientConnection.java:125) - http-outgoing-5 << Content-Type: text/html; charset=utf-8
DEBUG [main] (LoggingManagedHttpClientConnection.java:125) - http-outgoing-5 << Content-Length: 4
DEBUG [main] (LoggingManagedHttpClientConnection.java:125) - http-outgoing-5 << Server: Werkzeug/1.0.1 Python/3.8.3
DEBUG [main] (L

In [30]:
fb15k_walk_path = os.path.join(work_dir, "fb15k_walks")
fb15k_nt_path = os.path.join(nt_dir, "FB15k.nt")

!jrdf2vec -graph $fb15k_nt_path -numberOfWalks 250 -threads 20 -depth 4 -walkDirectory $fb15k_walk_path -trainingMode sg -dimension 200 -window 2 -epochs 25

The specified walk directory does not exist. Trying to make the directory.
Using 20 threads for walk generation and training.
Using vector dimension: 200
Using depth 4
Generating 250 walks per entity.
RDF2Vec Classic
 INFO [main] (ParserManager.java:53) - Using NxParser.
 INFO [main] (ParserManager.java:88) - Model read into memory.
walkGeneration mode is null... Using default: RANDOM_WALKS_DUPLICATE_FREE
 INFO [pool-1-thread-14] (WalkGenerator.java:320) - TOTAL PROCESSED ENTITIES: 1000
 INFO [pool-1-thread-14] (WalkGenerator.java:321) - TOTAL NUMBER OF PATHS : 250000
 INFO [pool-1-thread-7] (WalkGenerator.java:320) - TOTAL PROCESSED ENTITIES: 2000
 INFO [pool-1-thread-7] (WalkGenerator.java:321) - TOTAL NUMBER OF PATHS : 499109
 INFO [pool-1-thread-7] (WalkGenerator.java:320) - TOTAL PROCESSED ENTITIES: 3000
 INFO [pool-1-thread-7] (WalkGenerator.java:321) - TOTAL NUMBER OF PATHS : 748678
 INFO [pool-1-thread-18] (WalkGenerator.java:320) - TOTAL PROCESSED ENTITIES: 4000
 INFO [pool-1-

DEBUG [main] (Wire.java:73) - http-outgoing-2 << "HTTP/1.0 200 OK[\r][\n]"
DEBUG [main] (Wire.java:73) - http-outgoing-2 << "Content-Type: text/html; charset=utf-8[\r][\n]"
DEBUG [main] (Wire.java:73) - http-outgoing-2 << "Content-Length: 4[\r][\n]"
DEBUG [main] (Wire.java:73) - http-outgoing-2 << "Server: Werkzeug/1.0.1 Python/3.8.3[\r][\n]"
DEBUG [main] (Wire.java:73) - http-outgoing-2 << "Date: Thu, 14 Jan 2021 14:56:55 GMT[\r][\n]"
DEBUG [main] (Wire.java:73) - http-outgoing-2 << "[\r][\n]"
DEBUG [main] (Wire.java:87) - http-outgoing-2 << "True"
DEBUG [main] (LoggingManagedHttpClientConnection.java:122) - http-outgoing-2 << HTTP/1.0 200 OK
DEBUG [main] (LoggingManagedHttpClientConnection.java:125) - http-outgoing-2 << Content-Type: text/html; charset=utf-8
DEBUG [main] (LoggingManagedHttpClientConnection.java:125) - http-outgoing-2 << Content-Length: 4
DEBUG [main] (LoggingManagedHttpClientConnection.java:125) - http-outgoing-2 << Server: Werkzeug/1.0.1 Python/3.8.3
DEBUG [main] (L

## Let's Check the Embeddings

```
!jrdf2vec -analyzeVocab ./wn18_walks/model.kv ./WN18.nt &> wn_analysis.txt
!jrdf2vec -analyzeVocab ./fb15k_walks/model.kv ./FB15k.nt &> fb_analysis.txt

```
The reports can be quite long, so they are written to files (`wn_analysis.txt`/`fb_analysis.txt`) instead of being printed.
The cells below place the reports in the respective walk directories.
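
Conceptually, such a check compares the terms that occur in the graph with the vocabulary of the trained model. The sketch below illustrates that idea only; it is not how jRDF2Vec implements `-analyzeVocab`. It assumes URI-only N-Triples lines and a vocabulary that stores terms without angle brackets. `vocabulary` stands for any collection of embedded terms, and `find_terms_without_vector` is a hypothetical helper.

```python
from typing import Iterable, Set


def find_terms_without_vector(nt_path: str, vocabulary: Iterable[str]) -> Set[str]:
    """Return all subjects, predicates, and objects of the graph that are missing in `vocabulary`."""
    known = set(vocabulary)
    missing = set()
    with open(nt_path, encoding="utf-8") as nt_file:
        for line in nt_file:
            parts = line.strip().rstrip(".").split()
            if len(parts) != 3:
                continue  # skip blank or malformed lines in this sketch
            for term in parts:
                term = term.strip("<>")
                if term not in known:
                    missing.add(term)
    return missing
```

An empty result means that every term of the graph has a vector.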

In [26]:
wn18_kv_path = os.path.join(wn18_walk_path, "model.kv")
wn18_analysis_path = os.path.join(wn18_walk_path, "wn_analysis.txt")

!jrdf2vec -analyzeVocab $wn18_kv_path $wn_nt_path &> $wn18_analysis_path

In [31]:
fb15k_kv_path = os.path.join(fb15k_walk_path, "model.kv")
fb15k_analysis = os.path.join(fb15k_walk_path, "fb_analysis.txt")

!jrdf2vec -analyzeVocab $fb15k_kv_path $fb15k_nt_path &> $fb15k_analysis

## Let's predict!
We start by generating the files containing the predictions.
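
The strategy labels used later in the evaluation (for example `most_similar(H + L)`) suggest vector arithmetic on the head vector `H` and the predicate vector `L`. As a rough illustration, and not the kbc_rdf2vec implementation, tail prediction by addition could look like this. The toy vectors and the helper names are made up for the example.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors of equal length (0.0 if one of them is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_tails_by_addition(head: str, predicate: str,
                              vectors: Dict[str, List[float]]) -> List[str]:
    """Rank all other terms by their similarity to H + L."""
    target = [h + l for h, l in zip(vectors[head], vectors[predicate])]
    # Assumption: the head itself (reflexive match) and the predicate are not valid answers.
    candidates = [term for term in vectors if term not in (head, predicate)]
    return sorted(candidates, key=lambda term: cosine(vectors[term], target), reverse=True)


toy_vectors = {
    "berlin": [1.0, 0.0],
    "capital_of": [0.0, 1.0],
    "germany": [1.0, 1.0],
    "paris": [1.0, -1.0],
}
print(predict_tails_by_addition("berlin", "capital_of", toy_vectors))  # ['germany', 'paris']
```

In this toy space, `berlin + capital_of` points exactly at `germany`, so it is ranked first.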

In [None]:
from kbc_rdf2vec.dataset import DataSet
from kbc_rdf2vec.prediction import PredictionFunctionEnum, PredictionFunction
from kbc_rdf2vec.rdf2vec_kbc import Rdf2vecKbc

import os


def generate_prediction_files() -> None:
    wn_vector_file = wn18_kv_path
    wn_nt_file = wn_nt_path
    fb15k_vector_file = fb15k_kv_path
    fb15k_nt_file = fb15k_nt_path

    # let's make a directory if it does not exist yet
    prediction_path = os.path.join(work_dir, "predictions")
    if not os.path.exists(prediction_path):
        os.makedirs(prediction_path)
    
    # ANN WN
    kbc = Rdf2vecKbc(
        model_path=wn_vector_file,
        data_set=DataSet.WN18,
        n=None,
        prediction_function=PredictionFunctionEnum.ANN,
        file_for_predicate_exclusion=wn_nt_file,
        is_reflexive_match_allowed=False,
    )
    kbc.predict(os.path.join(prediction_path, "wn_ann.txt"))

    # ANN FB
    kbc = Rdf2vecKbc(
        model_path=fb15k_vector_file,
        data_set=DataSet.FB15K,
        n=None,
        prediction_function=PredictionFunctionEnum.ANN,
        file_for_predicate_exclusion=fb15k_nt_file,
        is_reflexive_match_allowed=False,
    )
    kbc.predict(os.path.join(prediction_path, "fb15k_ann.txt"))
    

    # most similar WN
    kbc = Rdf2vecKbc(
        model_path=wn_vector_file,
        n=None,
        data_set=DataSet.WN18,
        file_for_predicate_exclusion=wn_nt_file,
        is_reflexive_match_allowed=False,
        prediction_function=PredictionFunctionEnum.MOST_SIMILAR,
    )
    kbc.predict(os.path.join(prediction_path, "wn_most_similar.txt"))
    
    # most similar FB
    kbc = Rdf2vecKbc(
        model_path=fb15k_vector_file,
        n=None,
        data_set=DataSet.FB15K,
        file_for_predicate_exclusion=fb15k_nt_file,
        is_reflexive_match_allowed=False,
        prediction_function=PredictionFunctionEnum.MOST_SIMILAR,
    )
    kbc.predict(os.path.join(prediction_path, "fb15k_most_similar.txt"))
    
    # avg most similar WN
    kbc = Rdf2vecKbc(
        model_path=wn_vector_file,
        n=None,
        data_set=DataSet.WN18,
        file_for_predicate_exclusion=wn_nt_file,
        is_reflexive_match_allowed=False,
        prediction_function=PredictionFunctionEnum.PREDICATE_AVERAGING_MOST_SIMILAR,
    )
    kbc.predict(os.path.join(prediction_path, "wn_averaged_most_similar.txt"))

    # avg most similar FB
    kbc = Rdf2vecKbc(
        model_path=fb15k_vector_file,
        n=None,
        data_set=DataSet.FB15K,
        file_for_predicate_exclusion=fb15k_nt_file,
        is_reflexive_match_allowed=False,
        prediction_function=PredictionFunctionEnum.PREDICATE_AVERAGING_MOST_SIMILAR,
    )
    kbc.predict(os.path.join(prediction_path, "fb15k_averaged_most_similar.txt")) 
    
    # addition WN
    kbc = Rdf2vecKbc(
        model_path=wn_vector_file,
        n=None,
        data_set=DataSet.WN18,
        file_for_predicate_exclusion=wn_nt_file,
        is_reflexive_match_allowed=False,
        prediction_function=PredictionFunctionEnum.ADDITION,
    )
    kbc.predict(os.path.join(prediction_path, "wn_addition.txt"))

    # addition FB
    kbc = Rdf2vecKbc(
        model_path=fb15k_vector_file,
        n=None,
        data_set=DataSet.FB15K,
        file_for_predicate_exclusion=fb15k_nt_file,
        is_reflexive_match_allowed=False,
        prediction_function=PredictionFunctionEnum.ADDITION,
    )
    kbc.predict(os.path.join(prediction_path, "fb15k_addition.txt"))
    
    # addition FB with reflexive matches allowed
    kbc = Rdf2vecKbc(
        model_path=fb15k_vector_file,
        n=None,
        data_set=DataSet.FB15K,
        file_for_predicate_exclusion=fb15k_nt_file,
        is_reflexive_match_allowed=True,
        prediction_function=PredictionFunctionEnum.ADDITION,
    )
    kbc.predict(os.path.join(prediction_path, "fb15k_reflexive_addition.txt"))
    
    # avg addition WN
    kbc = Rdf2vecKbc(
        model_path=wn_vector_file,
        n=None,
        data_set=DataSet.WN18,
        file_for_predicate_exclusion=wn_nt_file,
        is_reflexive_match_allowed=False,
        prediction_function=PredictionFunctionEnum.PREDICATE_AVERAGING_ADDITION,
    )
    kbc.predict(os.path.join(prediction_path, "wn_averaged_addition.txt"))

    # avg addition FB
    kbc = Rdf2vecKbc(
        model_path=fb15k_vector_file,
        n=None,
        data_set=DataSet.FB15K,
        file_for_predicate_exclusion=fb15k_nt_file,
        is_reflexive_match_allowed=False,
        prediction_function=PredictionFunctionEnum.PREDICATE_AVERAGING_ADDITION,
    )
    kbc.predict(os.path.join(prediction_path, "fb15k_averaged_addition.txt"))
    
    # avg addition FB with reflexive matches allowed
    kbc = Rdf2vecKbc(
        model_path=fb15k_vector_file,
        n=None,
        data_set=DataSet.FB15K,
        file_for_predicate_exclusion=fb15k_nt_file,
        is_reflexive_match_allowed=True,
        prediction_function=PredictionFunctionEnum.PREDICATE_AVERAGING_ADDITION,
    )
    kbc.predict(os.path.join(prediction_path, "fb15k_reflexive_averaged_addition.txt"))
    

generate_prediction_files()

2021-01-13 15:22:10,626 - kbc_rdf2vec.rdf2vec_kbc - INFO - Gensim vector file detected.
2021-01-13 15:22:10,628 - gensim.utils - INFO - loading Word2VecKeyedVectors object from ./wn18_walks/model.kv
2021-01-13 15:22:10,993 - gensim.utils - INFO - setting ignored attribute vectors_norm to None
2021-01-13 15:22:10,994 - gensim.utils - INFO - loaded ./wn18_walks/model.kv
Predicting Tails and Heads
  0%|          | 0/5000 [00:00<?, ?it/s]2021-01-13 15:22:11,270 - gensim.models.keyedvectors - INFO - precomputing L2-norms of word weight vectors
100%|██████████| 5000/5000 [08:20<00:00,  9.99it/s]
2021-01-13 15:30:31,536 - kbc_rdf2vec.rdf2vec_kbc - INFO - Erroneous Triples: 0
2021-01-13 15:30:31,789 - kbc_rdf2vec.rdf2vec_kbc - INFO - Gensim vector file detected.
2021-01-13 15:30:31,790 - gensim.utils - INFO - loading Word2VecKeyedVectors object from ./fb15k_walks/model.kv
2021-01-13 15:30:31,917 - gensim.utils - INFO - setting ignored attribute vectors_norm to None
2021-01-13 15:30:31,919 - ge

  return _methods._mean(a, axis=axis, dtype=dtype,
  ret = ret.dtype.type(ret / rcount)


 48%|████▊     | 28627/59071 [44:49<47:02, 10.79it/s]  2021-01-13 17:03:45,627 - kbc_rdf2vec.rdf2vec_kbc - ERROR - Could not process the triple: ['/m/018zsw', '/organization/organization/parent./organization/organization_relationship/parent', '/m/018zqj']
 59%|█████▉    | 34941/59071 [55:12<43:00,  9.35it/s]  2021-01-13 17:14:08,100 - kbc_rdf2vec.rdf2vec_kbc - ERROR - Could not process the triple: ['/m/01qm7', '/education/school_category/schools_of_this_kind', '/m/02gn8s']
 96%|█████████▋| 56991/59071 [1:30:00<03:12, 10.81it/s]  2021-01-13 17:48:56,585 - kbc_rdf2vec.rdf2vec_kbc - ERROR - Could not process the triple: ['/m/0cv72h', '/american_football/football_player/receiving./american_football/player_receiving_statistics/season', '/m/03gqdq7']
100%|██████████| 59071/59071 [1:33:14<00:00, 10.56it/s]
2021-01-13 17:52:10,307 - kbc_rdf2vec.rdf2vec_kbc - ERROR - Erroneous Triples: 3
2021-01-13 17:52:10,672 - kbc_rdf2vec.rdf2vec_kbc - INFO - Gensim vector file detected.
2021-01-13 17:52:10,
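
The twelve nearly identical blocks in `generate_prediction_files()` differ only in the data set, the prediction function, the output file name, and the reflexive flag. One possible way to make that grid explicit is to describe it as data first. The sketch below only builds the list of settings and does not call kbc_rdf2vec; `PredictionRun` and `build_prediction_runs` are hypothetical names.

```python
from typing import List, NamedTuple


class PredictionRun(NamedTuple):
    data_set: str             # "WN18" or "FB15K"
    prediction_function: str  # name of a PredictionFunctionEnum member
    reflexive: bool           # value for is_reflexive_match_allowed
    file_name: str            # output file inside the predictions directory


def build_prediction_runs() -> List[PredictionRun]:
    """List the twelve prediction runs of this notebook."""
    functions = {
        "ANN": "ann",
        "MOST_SIMILAR": "most_similar",
        "PREDICATE_AVERAGING_MOST_SIMILAR": "averaged_most_similar",
        "ADDITION": "addition",
        "PREDICATE_AVERAGING_ADDITION": "averaged_addition",
    }
    runs = []
    for function, suffix in functions.items():
        for data_set, prefix in (("WN18", "wn"), ("FB15K", "fb15k")):
            runs.append(PredictionRun(data_set, function, False, f"{prefix}_{suffix}.txt"))
        # The addition-based functions are also run on FB15k with reflexive matches allowed.
        if function.endswith("ADDITION"):
            runs.append(PredictionRun("FB15K", function, True, f"fb15k_reflexive_{suffix}.txt"))
    return runs
```

Each entry could then be turned into one `Rdf2vecKbc(...)` call in a loop, for instance by looking up the member with `PredictionFunctionEnum[run.prediction_function]`, assuming that it is a standard Python `Enum`.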

## Let's evaluate!
Now we evaluate the prediction files generated above.
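
The evaluation reports mean rank and Hits@n in a raw and a filtered setting. In link prediction, the filtered setting usually means that other known correct answers are removed from the ranked list before the rank of the expected answer is determined. The sketch below illustrates that common definition with made-up data. It is not the kbc_evaluation code, and details such as how unpredicted answers are ranked are assumptions.

```python
from typing import List, Optional, Set


def rank_of(expected: str, ranked: List[str], known_correct: Optional[Set[str]] = None) -> int:
    """1-based rank of `expected`; with `known_correct`, other true answers are filtered out first."""
    if known_correct:
        ranked = [c for c in ranked if c == expected or c not in known_correct]
    # Assumption: an answer that was not predicted at all is ranked behind every prediction.
    return ranked.index(expected) + 1 if expected in ranked else len(ranked) + 1


def hits_at_n(ranks: List[int], n: int = 10) -> int:
    """Number of predictions whose expected answer is among the top n."""
    return sum(1 for rank in ranks if rank <= n)


def mean_rank(ranks: List[int]) -> float:
    """Arithmetic mean of the given ranks."""
    return sum(ranks) / len(ranks)


ranked_tails = ["b", "c", "a", "d"]            # made-up prediction list
print(rank_of("a", ranked_tails))              # raw rank: 3
print(rank_of("a", ranked_tails, {"a", "b"}))  # filtered rank: 2 ("b" is another true answer)
```

Filtering can only improve or preserve a rank, which is why filtered mean ranks are never worse than raw ones.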

In [None]:
from kbc_evaluation.evaluator import Evaluator, EvaluatorResult
from kbc_rdf2vec.dataset import DataSet
from typing import Dict, List, Optional
import os

# work_dir is set in the first cell of this notebook
prediction_path = os.path.join(work_dir, "predictions")

def evaluate_files() -> Dict[str, List[Optional[EvaluatorResult]]]:
    
    result_map = {}
    
    # Let's make a directory for the evaluation results
    evaluation_path = os.path.join(work_dir, "evaluation")
    if not os.path.exists(evaluation_path):
        os.makedirs(evaluation_path)
    
    # evaluation of WN 18
    
    file_to_be_written=os.path.join(evaluation_path, "wn_ann_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "wn_ann.txt"),
        data_set=DataSet.WN18,
        n=10,
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["ANN"] = [results]
    
    
    file_to_be_written=os.path.join(evaluation_path, "wn_most_similar_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "wn_most_similar.txt"),
        data_set=DataSet.WN18,
        n=10,
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["most_similar(H, L)"] = [results]

    
    file_to_be_written=os.path.join(evaluation_path, "wn_averaged_most_similar_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "wn_averaged_most_similar.txt"),
        data_set=DataSet.WN18,
        n=10,
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["most_similar(H, AVG(T-H))"] = [results]

    
    file_to_be_written=os.path.join(evaluation_path, "wn_addition_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "wn_addition.txt"),
        data_set=DataSet.WN18,
        n=10
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["most_similar(H + L)"] = [results]
    
    
    file_to_be_written=os.path.join(evaluation_path, "wn_averaged_addition_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "wn_averaged_addition.txt"),
        data_set=DataSet.WN18,
        n = 10
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["most_similar(H + AVG(T-H))"] = [results]
    
    
    # evaluation of fb15k
    

    file_to_be_written=os.path.join(evaluation_path, "fb15k_ann_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "fb15k_ann.txt"),
        data_set=DataSet.FB15K,
        n=10,
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["ANN"].append(results)

    
    file_to_be_written=os.path.join(evaluation_path, "fb15k_most_similar_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "fb15k_most_similar.txt"),
        data_set=DataSet.FB15K,
        n=10
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["most_similar(H, L)"].append(results)

    
    file_to_be_written=os.path.join(evaluation_path, "fb15k_averaged_most_similar_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "fb15k_averaged_most_similar.txt"),
        data_set=DataSet.FB15K,
        n=10,
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["most_similar(H, AVG(T-H))"].append(results)

    
    file_to_be_written=os.path.join(evaluation_path, "fb15k_addition_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "fb15k_addition.txt"),
        data_set=DataSet.FB15K,
        n=10
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["most_similar(H + L)"].append(results)
    

    file_to_be_written=os.path.join(evaluation_path, "fb15k_averaged_addition_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "fb15k_averaged_addition.txt"),
        data_set=DataSet.FB15K,
        n=10
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["most_similar(H + AVG(T-H))"].append(results)
    

    file_to_be_written=os.path.join(evaluation_path, "fb15k_reflexive_addition_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "fb15k_reflexive_addition.txt"),
        data_set=DataSet.FB15K,
        n=10
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["most_similar(H + L) reflexive"] = [None, results]
    

    file_to_be_written=os.path.join(evaluation_path, "fb15k_reflexive_averaged_addition_result.txt")
    results = Evaluator.calculate_results(
        file_to_be_evaluated=os.path.join(prediction_path, "fb15k_reflexive_averaged_addition.txt"),
        data_set=DataSet.FB15K,
        n=10
    )
    Evaluator.write_result_object_to_file(file_to_be_written=file_to_be_written, result_object=results)
    result_map["most_similar(H + AVG(T-H)) reflexive"] = [None, results]

    
    return result_map

    
result_map = evaluate_files()

Reading provided file...
2021-01-15 15:26:39,713 - root - INFO - Hits@10 Heads: 2069
2021-01-15 15:26:39,716 - root - INFO - Hits@10 Tails: 670
2021-01-15 15:26:39,717 - root - INFO - Hits@10 Total: 2739
Calculating Mean Rank
2021-01-15 15:26:40,451 - root - INFO - Mean Head Rank: 668.1978 (0 ignored lines)
2021-01-15 15:26:40,452 - root - INFO - Mean Tail Rank: 3208.1964 (0 ignored lines)
2021-01-15 15:26:40,453 - root - INFO - Mean rank: 1938.1971; rounded: 1938
Reading provided file...
Apply Filtering
100%|██████████| 5000/5000 [00:46<00:00, 108.40it/s]
2021-01-15 15:27:53,157 - root - INFO - Hits@10 Heads: 2069
2021-01-15 15:27:53,161 - root - INFO - Hits@10 Tails: 670
2021-01-15 15:27:53,162 - root - INFO - Hits@10 Total: 2739
Calculating Mean Rank
2021-01-15 15:27:53,844 - root - INFO - Mean Head Rank: 668.0548 (0 ignored lines)
2021-01-15 15:27:53,846 - root - INFO - Mean Tail Rank: 3208.0692 (0 ignored lines)
2021-01-15 15:27:53,846 - root - INFO - Mean rank: 1938.062; rounded:
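
The aggregated numbers can be reproduced from the log lines above. Judging from the logged values, the total mean rank is the average of the head and tail mean ranks, and the relative Hits@10 appears to be the total hit count divided by the number of predictions (two per test triple: one head and one tail). This is an inference from the output, not from the kbc_evaluation source.

```python
# Values copied from the WN18 log output above (raw setting).
hits_heads, hits_tails = 2069, 670
mean_head_rank, mean_tail_rank = 668.1978, 3208.1964
test_triples = 5000  # size of the progress bar in the filtering step

hits_total = hits_heads + hits_tails
mean_rank = (mean_head_rank + mean_tail_rank) / 2
relative_hits = hits_total / (2 * test_triples)

print(hits_total)               # 2739
print(round(mean_rank, 4))      # 1938.1971
print(round(relative_hits, 4))  # 0.2739
```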

## Let's Render our Evaluation Results
The individual evaluation files have already been written to disk. Now, let's quickly render an HTML table.

In [89]:
from IPython.display import display, HTML
from typing import Dict, List, Optional

def transform_result_list_to_html(result_map: Dict[str, List[Optional[EvaluatorResult]]]) -> str:
    
    first_entry = next(iter(result_map.values()))
    
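    # an entry is None if a setting was not evaluated on that data set
    if first_entry[0] is not None and first_entry[0].n is not None:
        n = first_entry[0].n
    elif first_entry[1] is not None and first_entry[1].n is not None:
        n = first_entry[1].n
    else:
        n = "?"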
    
    
    result = f"""
        <table style="border: 1px solid black;">
            <tr>
                <td>&nbsp;</td>
                <td colspan="6"><center><b>WN18</b></center></td>
                <td colspan="6"><center><b>FB15k</b></center></td>
            </tr>
            <tr>
                <td>Metric</td>
                <td colspan="2"><center>Mean Rank</center></td>
                <td colspan="2"><center>HITS@{n}</center></td>
                <td colspan="2"><center>RelativeHITS@{n}</center></td>
                <td colspan="2"><center>Mean Rank</center></td>
                <td colspan="2"><center>HITS@{n}</center></td>
                <td colspan="2"><center>RelativeHITS@{n}</center></td>
            </tr>
            <tr>
                <td>Evaluation Setting</td>
                <td>Raw</td>
                <td>Filtered</td>
                <td>Raw</td>
                <td>Filtered</td>
                <td>Raw</td>
                <td>Filtered</td>
                <td>Raw</td>
                <td>Filtered</td>
                <td>Raw</td>
                <td>Filtered</td>
                <td>Raw</td>
                <td>Filtered</td>
            </tr>
        """    
    
    for setting, entry in result_map.items():
        result = result + f"""
            <tr>
                <td>{setting}</td>
            """
        if entry[0] is not None:
            result = result + f"""
                <td>{entry[0].non_filtered_mean_rank_all}</td>
                <td>{entry[0].filtered_mean_rank_all}</td>
                <td>{entry[0].non_filtered_hits_at_n_all}</td>
                <td>{entry[0].filtered_hits_at_n_all}</td>
                <td>{round(entry[0].non_filtered_hits_at_n_relative, 4)}</td>
                <td>{round(entry[0].filtered_hits_at_n_relative, 4)}</td>
            """
        else:
            result = result + f"""
                <td>&nbsp;</td>
                <td>&nbsp;</td>
                <td>&nbsp;</td>
                <td>&nbsp;</td>
                <td>&nbsp;</td>
                <td>&nbsp;</td>
            """
        if entry[1] is not None:
            result = result + f"""
                <td>{entry[1].non_filtered_mean_rank_all}</td>
                <td>{entry[1].filtered_mean_rank_all}</td>
                <td>{entry[1].non_filtered_hits_at_n_all}</td>
                <td>{entry[1].filtered_hits_at_n_all}</td>
                <td>{round(entry[1].non_filtered_hits_at_n_relative, 4)}</td>
                <td>{round(entry[1].filtered_hits_at_n_relative, 4)}</td>
            """
        else:
            result = result + f"""
                <td>&nbsp;</td>
                <td>&nbsp;</td>
                <td>&nbsp;</td>
                <td>&nbsp;</td>
                <td>&nbsp;</td>
                <td>&nbsp;</td>
            """
        
        result = result + "</tr>"
        
    
    result = result + "\n</table>"
    return result


display(HTML(transform_result_list_to_html(result_map)))

| Evaluation Setting | WN18 Mean Rank (Raw) | WN18 Mean Rank (Filtered) | WN18 HITS@10 (Raw) | WN18 HITS@10 (Filtered) | WN18 RelativeHITS@10 (Raw) | WN18 RelativeHITS@10 (Filtered) | FB15k Mean Rank (Raw) | FB15k Mean Rank (Filtered) | FB15k HITS@10 (Raw) | FB15k HITS@10 (Filtered) | FB15k RelativeHITS@10 (Raw) | FB15k RelativeHITS@10 (Filtered) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ANN | 1938 | 1938 | 2739 | 2739 | 0.2739 | 0.2739 | 346 | 343 | 43229 | 43614 | 0.3659 | 0.3692 |
| most_similar(H + L) | 728 | 728 | 3705 | 3705 | 0.3705 | 0.3705 | 346 | 343 | 43229 | 43614 | 0.3659 | 0.3692 |
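
Because a setting may have no result for one of the two data sets, every row needs padding so that all rows have the same number of cells. One way to keep this consistent is a small helper that always emits six cells per data set. This is a sketch with hypothetical names (`result_cells`, `render_row`); it takes plain value lists instead of `EvaluatorResult` objects.

```python
from html import escape
from typing import List, Optional, Sequence

CELLS_PER_DATA_SET = 6  # mean rank, Hits@n, relative Hits@n; each raw and filtered


def result_cells(values: Optional[Sequence[object]]) -> str:
    """Render six <td> cells; emit blank cells if there is no result."""
    if values is None:
        return "<td>&nbsp;</td>" * CELLS_PER_DATA_SET
    if len(values) != CELLS_PER_DATA_SET:
        raise ValueError(f"expected {CELLS_PER_DATA_SET} values, got {len(values)}")
    return "".join(f"<td>{escape(str(value))}</td>" for value in values)


def render_row(setting: str, per_data_set: List[Optional[Sequence[object]]]) -> str:
    """Render one table row: the setting name followed by the cells of each data set."""
    cells = "".join(result_cells(values) for values in per_data_set)
    return f"<tr><td>{escape(setting)}</td>{cells}</tr>"


# WN18 values of the ANN row; the second data set is left empty for illustration only.
row = render_row("ANN", [[1938, 1938, 2739, 2739, 0.2739, 0.2739], None])
```

Escaping the setting name also keeps labels that contain characters such as `<` or `&` from breaking the table markup.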

