## **Introducing the Dataset**
The dataset used is the Facebook Page-Page dataset, which was created using the Facebook Graph API in November 2017. Each of its 22470 nodes represents an official Facebook page, and two pages are connected when there are mutual likes between them. Node features (128-dimensional vectors) are created from the textual descriptions written by the owners of these pages. The goal is to classify each node into one of four categories: politicians, companies, television shows and governmental organisations.

In [9]:
import torch  # must already be imported so that {torch.__version__} resolves in the command below
!pip install -q torch-scatter~=2.1.0 torch-sparse~=0.6.16 torch-cluster~=1.6.0 torch-spline-conv~=1.2.1 torch-geometric==2.2.0 -f https://data.pyg.org/whl/torch-{torch.__version__}.html

In [25]:
import torch
from torch_geometric.utils import to_dense_adj
from torch_geometric.datasets import FacebookPagePage
dataset = FacebookPagePage(root = ".")
data = dataset[0]

print(f'Dataset: {dataset}')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of nodes: {data.x.shape[0]}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

Dataset: FacebookPagePage()
Number of graphs: 1
Number of nodes: 22470
Number of features: 128
Number of classes: 4


In [26]:
# Create training, validation and testing masks
data.train_mask = torch.zeros(data.num_nodes, dtype=torch.bool)
data.train_mask[:18000] = True

data.val_mask = torch.zeros(data.num_nodes, dtype=torch.bool)
data.val_mask[18000:20000] = True

data.test_mask = torch.zeros(data.num_nodes, dtype=torch.bool)
data.test_mask[20000:22470] = True
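
The masks above follow the stored node order. If that order were correlated with the labels, a shuffled split would be the safer choice. The sketch below shows one possible way to build such masks; the helper name `make_random_masks`, the seed and the toy sizes are illustrative assumptions and are not part of the original notebook.

```python
import torch

def make_random_masks(num_nodes, num_train, num_val, seed=0):
    """Return disjoint boolean train/val/test masks over a shuffled node order."""
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=generator)

    train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    val_mask = torch.zeros(num_nodes, dtype=torch.bool)
    test_mask = torch.zeros(num_nodes, dtype=torch.bool)

    # Assign shuffled node indices to the three splits in turn
    train_mask[perm[:num_train]] = True
    val_mask[perm[num_train:num_train + num_val]] = True
    test_mask[perm[num_train + num_val:]] = True
    return train_mask, val_mask, test_mask

# Toy sizes; the split used in this notebook would correspond to (22470, 18000, 2000)
train_mask, val_mask, test_mask = make_random_masks(10, 6, 2)
print(train_mask.sum().item(), val_mask.sum().item(), test_mask.sum().item())  # 6 2 2
```

Every node lands in exactly one mask, so the three splits stay disjoint regardless of the seed.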

## **Classifying Nodes with Vanilla Graph Neural Networks**
- A basic neural network layer applies a linear transformation $h_A = x_AW^T$, where $x_A$ is the input vector of node $A$ and $W$ is the weight matrix.

- If $N_A$ is the set of neighbours of node $A$, a graph linear layer sums the transformed features of those neighbours:
$$h_A = \sum_{i \in N_A} x_iW^T$$
- Let $A$ now denote the adjacency matrix (the letter is reused and no longer refers to node $A$), which records the connections between every pair of nodes in the graph. Multiplying the input matrix $X$ by the adjacency matrix sums the neighbouring node features for all nodes at once.
- To include the central node itself in the sum, we add self-loops: $\tilde{A} = A + I$. The layer for the whole graph is then

$$H = \tilde{A}^TXW^T$$

For an undirected graph such as this one, where links are mutual, $\tilde{A}$ is symmetric, so $\tilde{A}^T = \tilde{A}$.
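
To make the aggregation step concrete, here is a small worked example on a hypothetical 3-node path graph (edges 0-1 and 1-2). It uses plain Python lists so that the arithmetic stays visible, and it leaves out the weight matrix (equivalent to taking $W = I$); the graph and feature values are made up for illustration.

```python
# Toy undirected graph: 0 - 1 - 2 (illustrative, not taken from the dataset)
edges = [(0, 1), (1, 2)]
num_nodes = 3

# Build the adjacency matrix, then add self-loops: A_tilde = A + I
A_tilde = [[0.0] * num_nodes for _ in range(num_nodes)]
for i, j in edges:
    A_tilde[i][j] = 1.0
    A_tilde[j][i] = 1.0
for i in range(num_nodes):
    A_tilde[i][i] += 1.0

# One 2-dimensional feature vector per node (the rows of X)
X = [[1.0, 0.0],
     [0.0, 2.0],
     [3.0, 1.0]]

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

aggregated = matmul(A_tilde, X)
print(aggregated)  # [[1.0, 2.0], [4.0, 3.0], [3.0, 3.0]]
```

Row 1 of the result is $x_0 + x_1 + x_2$ because node 1 is connected to both other nodes, while rows 0 and 2 each sum only two feature vectors.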

In [13]:
adjacency = to_dense_adj(data.edge_index)[0]
adjacency += torch.eye(len(adjacency)) # Self-loops
adjacency

tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]])
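
A dense 22470 × 22470 `float32` matrix takes roughly 2 GB of memory, almost all of it zeros. One alternative is to store only the nonzero entries in a sparse tensor. The following is a minimal sketch under that assumption; the helper name `sparse_adjacency_with_self_loops` is hypothetical, and the toy `edge_index` stands in for `data.edge_index`.

```python
import torch

def sparse_adjacency_with_self_loops(edge_index, num_nodes):
    """Build A + I as a sparse COO tensor from a [2, num_edges] edge index."""
    loops = torch.arange(num_nodes).repeat(2, 1)       # (i, i) for every node
    indices = torch.cat([edge_index, loops], dim=1)
    values = torch.ones(indices.shape[1])
    adj = torch.sparse_coo_tensor(indices, values, (num_nodes, num_nodes))
    return adj.coalesce()                               # merge duplicate entries by summing

# Toy undirected graph 0 - 1 - 2, with each edge stored in both directions
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
adj = sparse_adjacency_with_self_loops(edge_index, num_nodes=3)

x = torch.tensor([[1., 0.], [0., 2.], [3., 1.]])
out = torch.sparse.mm(adj, x)   # sparse @ dense -> dense
print(out.tolist())             # [[1.0, 2.0], [4.0, 3.0], [3.0, 3.0]]
```

As with the dense version, an edge that already appears as a self-loop in `edge_index` would end up with a value of 2 on the diagonal after coalescing.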

In [22]:
from torch.nn import Linear
import torch.nn.functional as F

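class VanillaGNNLayer(torch.nn.Module):
  """Graph linear layer: transform node features, then sum over neighbours."""
  def __init__(self, dim_in, dim_out):
    super().__init__()
    # Computes x @ W^T; no bias term, matching the formula above
    self.linear = Linear(dim_in, dim_out, bias = False)

  def forward(self, x, adjacency):
    # Linear transformation of every node's features
    x = self.linear(x)
    # Aggregation: row i becomes the sum of the transformed features of
    # node i's neighbours (and of node i itself, thanks to the self-loops)
    x = torch.sparse.mm(adjacency, x)
    return x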
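
The layer applies the linear transformation before the aggregation, whereas the formula reads as aggregation followed by transformation. Matrix multiplication is associative, so both orders give the same result; transforming first can be cheaper when the output dimension is smaller than the input dimension, as in the 128 → 16 layer used later. A small check on made-up values (the 3-node adjacency matrix and the dimensions are assumptions for illustration):

```python
import torch
from torch.nn import Linear

torch.manual_seed(0)

# A + I for a toy path graph 0 - 1 - 2
adj = torch.tensor([[1., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 1.]])
x = torch.randn(3, 4)               # 3 nodes, 4 input features
linear = Linear(4, 2, bias=False)   # 4 -> 2, no bias as in VanillaGNNLayer

with torch.no_grad():
    transform_then_aggregate = adj @ linear(x)   # order used by the layer
    aggregate_then_transform = linear(adj @ x)   # order suggested by the formula

same = torch.allclose(transform_then_aggregate, aggregate_then_transform, atol=1e-6)
print(same)
```

The equivalence relies on the layer having no bias: with a bias term, aggregating after the transformation would add the bias once per neighbour.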

In [21]:
def accuracy(y_pred, y_true):
  return torch.sum(y_pred == y_true) / len(y_true)
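
A quick check of `accuracy` on hand-picked labels (the tensors below are illustrative). Note that the function returns a zero-dimensional tensor rather than a Python float.

```python
import torch

def accuracy(y_pred, y_true):
    # Same definition as in the cell above
    return torch.sum(y_pred == y_true) / len(y_true)

y_pred = torch.tensor([0, 1, 2, 1])
y_true = torch.tensor([0, 1, 1, 1])

acc = accuracy(y_pred, y_true)   # 3 of 4 predictions match
print(acc)                       # tensor(0.7500)
print(acc.item())                # 0.75
```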

In [28]:
class VanillaGNN(torch.nn.Module):
  def __init__(self, dim_in, dim_h, dim_out):
    super().__init__()
    self.gnn1 = VanillaGNNLayer(dim_in, dim_h)
    self.gnn2 = VanillaGNNLayer(dim_h, dim_out)

  def forward(self, x, adjacency):
    h = self.gnn1(x, adjacency)
    h = torch.relu(h)
    h = self.gnn2(h, adjacency)
    return F.log_softmax(h, dim = 1)

  def fit(self, data, epochs):
    # The model already returns log-probabilities, so the log_softmax applied
    # inside CrossEntropyLoss leaves them unchanged (NLLLoss is the more usual pairing)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(self.parameters(), lr = 0.01, weight_decay = 5e-4)

    self.train()
    for epoch in range(epochs + 1):
      optimizer.zero_grad()
      # `adjacency` is the global matrix (with self-loops) built in an earlier cell
      out = self(data.x, adjacency)
      loss = criterion(out[data.train_mask], data.y[data.train_mask])
      acc = accuracy(out[data.train_mask].argmax(dim = 1), data.y[data.train_mask])
      loss.backward()
      optimizer.step()

      # Every 20 epochs, report validation metrics from the same forward pass
      if (epoch % 20 == 0):
        val_loss = criterion(out[data.val_mask], data.y[data.val_mask])
        val_acc = accuracy(out[data.val_mask].argmax(dim=1), data.y[data.val_mask])
        print(f'Epoch {epoch:>3} | Train Loss: {loss:.3f} | Train Acc:'
              f' {acc*100:>5.2f}% | Val Loss: {val_loss:.2f} | '
              f'Val Acc: {val_acc*100:.2f}%')

  @torch.no_grad()
  def test(self, data):
    self.eval()
    # Uses the same global `adjacency` as fit()
    out = self(data.x, adjacency)
    acc = accuracy(out.argmax(dim = 1)[data.test_mask], data.y[data.test_mask])
    return acc
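
The model's `forward` ends with `log_softmax`, and `CrossEntropyLoss` applies `log_softmax` again internally. Because log-probabilities already have a log-sum-exp of zero, the second application does not change them, so the loss matches what raw logits or `NLLLoss` would give. The sketch below checks this on random toy values (the tensor shapes and targets are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 4)                 # 5 samples, 4 classes
targets = torch.tensor([0, 3, 1, 2, 2])

log_probs = F.log_softmax(logits, dim=1)

loss_from_logits = F.cross_entropy(logits, targets)        # usual pairing
loss_from_log_probs = F.cross_entropy(log_probs, targets)  # pairing used in this notebook
loss_nll = F.nll_loss(log_probs, targets)                  # log_softmax + NLLLoss

double_is_harmless = torch.allclose(loss_from_logits, loss_from_log_probs, atol=1e-6)
nll_matches = torch.allclose(loss_from_logits, loss_nll, atol=1e-6)
print(double_is_harmless, nll_matches)
```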

In [29]:
gnn = VanillaGNN(dataset.num_features, 16, dataset.num_classes)
print(gnn)

# Training
gnn.fit(data, epochs = 100)

# Testing
acc = gnn.test(data)
print(f'\nGNN test accuracy: {acc*100:.2f}%')

VanillaGNN(
  (gnn1): VanillaGNNLayer(
    (linear): Linear(in_features=128, out_features=16, bias=False)
  )
  (gnn2): VanillaGNNLayer(
    (linear): Linear(in_features=16, out_features=4, bias=False)
  )
)
Epoch   0 | Train Loss: 159.498 | Train Acc: 30.09% | Val Loss: 164.61 | Val Acc: 30.00%
Epoch  20 | Train Loss: 3.251 | Train Acc: 77.18% | Val Loss: 2.41 | Val Acc: 78.25%
Epoch  40 | Train Loss: 1.759 | Train Acc: 80.95% | Val Loss: 1.38 | Val Acc: 82.45%
Epoch  60 | Train Loss: 1.122 | Train Acc: 81.43% | Val Loss: 0.92 | Val Acc: 83.45%
Epoch  80 | Train Loss: 0.814 | Train Acc: 81.98% | Val Loss: 0.72 | Val Acc: 84.40%
Epoch 100 | Train Loss: 0.672 | Train Acc: 83.22% | Val Loss: 0.62 | Val Acc: 85.35%

GNN test accuracy: 83.48%
