### Relational Graph Convolutional Networks

Here are some experiments with R-GCNs, based on the paper *Modeling Relational Data with Graph Convolutional Networks* [[Schlichtkrull *et al.*]](https://arxiv.org/pdf/1703.06103.pdf).

Load the [FB15K-237](https://www.microsoft.com/en-us/download/details.aspx?id=52312) dataset, from [[Toutanova et al. EMNLP 2015]](http://dx.doi.org/10.18653/v1/D15-1174). You can pass `download_if_absent=True` for automatic downloading.

In [1]:
import numpy as np
import torch.nn as nn
import torch


import fb15k

train, val, test = fb15k.load("train", "valid", "test")
print(f"{len(train)} training triples found.")

272115 training triples found.


Build the adjacency tensor from the training data. Let $R = |\mathcal{R}|$ and $E = |\mathcal{E}|$ be the numbers of relations and entities in the knowledge graph. The adjacency tensor is a list of matrices $\mathcal{T} = (T^{(1)}, \ldots, T^{(R)})$, with $T^{(i)} \in \mathbb{R}^{E \times E}$ defined by:
$$
T^{(i)} = (D^{(i)})^{-1} \times A^{(i)}
$$
With $A$ the canonical adjacency tensor:
\begin{equation}
    A_{kj}^{(i)} =
    \begin{cases}
      1, & \text{if}\ (e_k, r_i, e_j) \text{ is in the graph} \\
      0, & \text{otherwise}
    \end{cases}
  \end{equation}
and $D^{(i)}$ a diagonal matrix containing the in-degree of each entity in the graph, as described in [[Kipf & Welling 2016]](https://arxiv.org/abs/1609.02907).
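
To make the definition concrete, here is a minimal sketch on a toy graph. It is not the code of `graph.build_adjacency_tensor` (which is not shown here); the function name `build_normalized_adjacency` and the integer triple format `(head, relation, tail)` are assumptions made for illustration. It also assumes that $D^{(i)}_{kk}$ is the sum of row $k$ of $A^{(i)}$, so that every non-empty row of $T^{(i)}$ sums to 1.

```python
import torch


def build_normalized_adjacency(triples, n_entities, n_relations):
    """Return a list of R sparse (E x E) tensors T[i] = D[i]^-1 A[i].

    Hypothetical helper: triples are (head, relation, tail) integer ids.
    """
    tensors = []
    for r in range(n_relations):
        # (row, column) pairs of relation r; duplicate triples are dropped
        pairs = sorted({(k, j) for (k, rel, j) in triples if rel == r})
        if pairs:
            idx = torch.tensor(pairs, dtype=torch.long).t()
        else:
            idx = torch.empty((2, 0), dtype=torch.long)
        rows = idx[0]
        # Assumed degree: number of non-zero entries in each row of A[r]
        deg = torch.bincount(rows, minlength=n_entities).float()
        # Every non-zero entry of row k becomes 1 / deg[k]; empty rows stay zero
        values = 1.0 / deg[rows]
        T_r = torch.sparse_coo_tensor(idx, values, (n_entities, n_entities))
        tensors.append(T_r.coalesce())
    return tensors


# Toy graph: 3 entities, 2 relations
toy_triples = [(0, 0, 1), (0, 0, 2), (1, 0, 2), (2, 1, 0)]
toy_T = build_normalized_adjacency(toy_triples, n_entities=3, n_relations=2)
print(toy_T[0].to_dense())
```

Storing each $T^{(i)}$ as a sparse tensor matters at the scale of FB15K-237, where a dense $E \times E$ matrix per relation would be mostly zeros.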

In [2]:
from utils import graph

T, e2c, e2i, r2i = graph.build_adjacency_tensor(train)
n_relations = len(T)
n_entities = T[0].shape[0]

print(f"{n_entities} entities and {n_relations} relations found.")

14505 entities and 237 relations found.


In [4]:
# Supervised setting: each entity has a class. Here we build the ground truth, that is the expected output tensor
# Give a unique identifier to each class
classes = {c: i for i, c in enumerate(set(e2c))}
n_classes = len(classes)
y_true = [classes[c] for c in e2c]
y_true = torch.LongTensor(y_true)
print(f"{n_classes} distinct classes found.")

29 distinct classes found.


Now let's build the R-GCN network. The equation for each convolution layer is:
$$H_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)}H_j^{(l)} + W_0^{(l)} H_i^{(l)} \right)$$

with $H_i^{(l)} \in \mathbb{R}^{d^{(l)}}$ the hidden state of node $e_i$.
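
The propagation rule can be sketched with dense tensors as follows. This is an illustration under stated assumptions, not the `RGCNLayer` implementation used in this notebook: the function name `rgcn_layer` is hypothetical, $\sigma$ is taken to be ReLU, and the normalization $1/c_{i,r}$ is assumed to be already folded into the adjacency matrices.

```python
import torch


def rgcn_layer(T, H, W, W0):
    """One R-GCN propagation step (dense sketch).

    T:  (R, E, E) normalized adjacency, T[r, i, j] = 1/c_{i,r} if j is in N_i^r
    H:  (E, d_in) hidden states, one row per node
    W:  (R, d_out, d_in) relation-specific weights W_r
    W0: (d_out, d_in) self-connection weight W_0
    """
    # Self-connection term W_0 H_i, computed for all nodes at once
    out = H @ W0.t()
    for r in range(T.shape[0]):
        # T[r] @ H sums the normalized neighbour states; W[r] maps them to d_out
        out = out + T[r] @ H @ W[r].t()
    # Assumption: sigma is ReLU
    return torch.relu(out)


# Toy check: 1 relation, 3 nodes, identity weights
toy_adj = torch.tensor([[[0.0, 0.5, 0.5],
                         [0.0, 0.0, 1.0],
                         [0.0, 0.0, 0.0]]])
toy_H = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
toy_W = torch.eye(2).unsqueeze(0)
toy_W0 = torch.eye(2)
toy_out = rgcn_layer(toy_adj, toy_H, toy_W, toy_W0)
print(toy_out)
```

With identity weights, each output row is the node's own state plus the average of its neighbours' states, which makes the two terms of the equation easy to check by hand.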

We use *basis decomposition* rather than *block-diagonal decomposition*. Each weight matrix is a linear combination of $B$ basis matrices:
$$
W_r^{(l)} = \sum_{b=1}^B A_{rb}^{(l)} V_b^{(l)}
$$

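where the $A_{rb}^{(l)}$ are scalar coefficients (not to be confused with the adjacency tensor $A$ above) and the $V_b^{(l)} \in \mathbb{R}^{d^{(l+1)} \times d^{(l)}}$ are basis matrices shared across all relations.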
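
As an illustration, the decomposition amounts to a single `torch.einsum` call. The variable names below are assumptions made for this sketch, and the sizes (237 relations, 20 bases, a 64-to-32 layer) are borrowed from this notebook only as an example.

```python
import torch

n_rel, n_basis, d_in, d_out = 237, 20, 64, 32

coeffs = torch.randn(n_rel, n_basis)       # scalar coefficients A_{rb}
bases = torch.randn(n_basis, d_out, d_in)  # basis matrices V_b

# W[r] = sum_b coeffs[r, b] * bases[b], for all relations at once
W = torch.einsum("rb,boi->roi", coeffs, bases)

# Parameter counts with and without the decomposition
full_params = n_rel * d_out * d_in
basis_params = n_basis * d_out * d_in + n_rel * n_basis
print(W.shape, full_params, basis_params)
```

For these sizes the layer needs 45,700 parameters instead of 485,376, which is the point of sharing bases across relations.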

In [6]:
from rgcn import RGCN

rgcn = RGCN(T,
            n_classes=n_classes,
            hidden_sizes=[64, 32, 16],
            n_basis=20
            )

print(rgcn)

RGCN(
  (convolutions): ModuleList(
    (0): RGCNLayer()
    (1): RGCNLayer()
    (2): RGCNLayer()
    (3): RGCNLayer()
  )
  (softmax): Softmax(dim=0)
)


To train the network, we feed it the featureless representation of the nodes, which is simply the identity matrix in $\mathbb{R}^{E \times E}$. We use an Adam optimizer with cross-entropy loss.

⚠️ *The use of `torch.sparse.LongTensor` yields the following error: `RuntimeError: sparse_.is_sparse() INTERNAL ASSERT FAILED`, both on CPU and GPU.*
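
A sparse identity matrix can be built with `torch.sparse_coo_tensor`, as in the sketch below. This `sparse_eye` is a hypothetical stand-in written for illustration; the actual `tensor.sparse_eye` from `utils` is not shown here and may differ. The sketch uses floating-point values; whether that avoids the error reported above has not been checked.

```python
import torch


def sparse_eye(n):
    """Sparse n x n identity matrix (hypothetical stand-in for utils.tensor.sparse_eye)."""
    idx = torch.arange(n)
    indices = torch.stack([idx, idx])  # shape (2, n): row and column of each non-zero
    values = torch.ones(n)             # float32 ones on the diagonal
    return torch.sparse_coo_tensor(indices, values, (n, n))


eye5 = sparse_eye(5)
weights = torch.randn(5, 3)
# Multiplying by the identity returns the weight matrix itself: in the
# featureless setting, the first layer acts as an embedding lookup
projected = torch.sparse.mm(eye5, weights)
print(projected.shape)
```

Only $E$ non-zero values are stored, rather than the $E^2$ entries of a dense identity matrix.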

In [None]:
from utils import tensor

optim = torch.optim.Adam(rgcn.parameters(recurse=True), lr=0.001)
cross_entropy = torch.nn.CrossEntropyLoss()
# We're in the featureless setting, so each entity is one-hot encoded, hence
# the input data is simply the identity matrix of dim N_entities x N_entities
I = tensor.sparse_eye(n_entities)

#
# Training
#
EPOCHS = 10
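for epoch in range(EPOCHS):
    # Reset the gradients accumulated during the previous epoch
    optim.zero_grad()
    # Full-batch forward pass over all entities
    y_pred = rgcn(I)
    loss = cross_entropy(y_pred, y_true)
    loss.backward()
    optim.step()
    # Report the training loss so that progress is visible
    print(f"Epoch {epoch+1}/{EPOCHS} - loss: {loss.item():.4f}")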
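
A note on the loss used in the loop above: `torch.nn.CrossEntropyLoss` expects raw, unnormalized scores of shape `(N, C)` and integer class indices of shape `(N,)`, and applies a log-softmax over the class dimension internally. The toy values below are illustrative only. The printed model lists a `Softmax` module; whether its `forward` applies it before the loss is not shown here, but if it does, the scores are normalized twice.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 29)            # 4 entities, 29 classes (raw scores)
targets = torch.tensor([0, 5, 28, 5])  # class indices (int64)

loss = torch.nn.CrossEntropyLoss()(logits, targets)
# Equivalent formulation: log-softmax over the class dimension, then NLL
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(loss, manual))
```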