# Evaluating frequency-based baselines for link prediction

Some knowledege graphs come with particularly frequent instances (either relations, or entities), that a model can use to learn spurious correlations that lead to high ranking metrics, due to the calculation of micro-averages.
A sanity check thus consists of running a baseline that simply uses counts, which can be compared with models that are supposed to generalize much better.

In [5]:
import os.path as osp

from pykeen.models.baseline import MarginalDistributionBaseline
from pykeen.triples import TriplesFactory
from pykeen.evaluation import RankBasedEvaluator, evaluate
import torch

## Data loading

In [6]:
graph_path = osp.join('..', 'data', 'biokgb', 'graph')
train_triples = 'biokg.links-train.csv'
valid_triples = 'biokg.links-valid.csv'
test_triples = 'biokg.links-test.csv'

train, valid, test = [TriplesFactory.from_path(osp.join(graph_path, f)) for f in (train_triples, valid_triples, test_triples)]

## Instantiating a frequency-based baseline

PyKEEN comes with a set of interesting baselines that, ideally, any machine learning model should outperform. Here we will use the [`MarginalDistributionBaseline`](https://pykeen.readthedocs.io/en/stable/api/pykeen.models.MarginalDistributionBaseline.html).

When predicting the tail for a triple (h, r, t), the model scores each possible tail t as the probability that t co-occurs with r times the probability that t co-occurs with h:

$$
P(t\vert h, r) = P(t\vert r) P(t\vert h)
$$

In [7]:
model = MarginalDistributionBaseline(train)
# An ugly hack to add a dummy parameter to this non-parametric baseline
# so that evaluation works as for models with learnable parameters
model.foo = torch.nn.Embedding(1, 2)

## Evaluation

We now get the ranking metrics on the test set, using triples in the training, validation, and test sets for filtering.

**Warning:** the next cell can take around half an hour to run.

In [10]:
evaluator = RankBasedEvaluator()
results = evaluate(model, test.mapped_triples, evaluator, batch_size=1024, mode=None, device=torch.device('cpu'),
                   additional_filter_triples=[train.mapped_triples, valid.mapped_triples, test.mapped_triples])

Evaluating on cpu:   0%|          | 0.00/185k [00:00<?, ?triple/s]

In [17]:
metrics = ['both.inverse_harmonic_mean_rank',
           'both.hits_at_1',
           'both.hits_at_3',
           'both.hits_at_10']

for m in metrics:
    print(f'{m:<40}{results.get_metric(m) * 100:.2f}')

both.inverse_harmonic_mean_rank         0.07
both.hits_at_1                          0.07
both.hits_at_3                          0.07
both.hits_at_10                         0.07
