# Graph Neural Networks
## What are Graph Neural Networks (GNNs)?

In [24]:
#import the basics
import os
import torch
import torch_geometric
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline

In [25]:
torch_geometric.__version__

'2.6.1'

In [26]:
# Let's verify what device we are working with
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("You are using device: %s" % device)

You are using device: cuda


Graph Neural Networks (GNNs) are a class of "geometric deep learning" models built on pairwise message passing. A typical architecture combines three types of layers. From [Wikipedia](https://en.wikipedia.org/wiki/Graph_neural_network):
1. Permutation equivariant: a permutation equivariant layer maps a representation of a graph into an updated representation of the same graph. In the literature, permutation equivariant layers are implemented via **pairwise message passing between graph nodes**. Intuitively, in a message passing layer, nodes update their representations by aggregating the messages received from their immediate neighbours. As such, each message passing layer increases the receptive field of the GNN by one hop.
2. Local pooling: a local pooling layer coarsens the graph via downsampling. Local pooling is used to increase the receptive field of a GNN, in a similar fashion to pooling layers in convolutional neural networks. Examples include k-nearest neighbours pooling, top-k pooling, and self-attention pooling.
3. Global pooling: a global pooling layer, also known as readout layer, provides fixed-size representation of the whole graph. The global pooling layer must be permutation invariant, such that permutations in the ordering of graph nodes and edges do not alter the final output. Examples include element-wise sum, mean or maximum.
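
The first and third layer types can be sketched in a few lines of plain PyTorch. This is a minimal illustration, not PyTorch Geometric's implementation: the toy graph, the choice of sum aggregation, and the helper names `message_passing_step` and `global_sum_pool` are assumptions made for this example.

```python
import torch

def message_passing_step(x, edge_index):
    """One round of sum-aggregation message passing (hypothetical helper).

    x:          [num_nodes, num_features] node features
    edge_index: [2, num_edges], row 0 = source nodes, row 1 = target nodes
    """
    src, dst = edge_index
    aggregated = torch.zeros_like(x)
    # Each target node sums the features of its incoming neighbours.
    aggregated.index_add_(0, dst, x[src])
    # Combine each node's own features with the aggregated messages.
    return x + aggregated

def global_sum_pool(x):
    """Permutation-invariant readout: element-wise sum over all nodes."""
    return x.sum(dim=0)

# Toy path graph 0 - 1 - 2, stored as directed edges in both directions.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
x = torch.tensor([[1.0], [10.0], [100.0]])

h1 = message_passing_step(x, edge_index)   # information from 1 hop away
h2 = message_passing_step(h1, edge_index)  # information from 2 hops away
graph_embedding = global_sum_pool(h1)      # fixed-size graph representation
```

After one step, node 0 has only seen node 1; after two steps its value also depends on node 2, two hops away, which is the receptive-field growth described in item 1. Because the readout is a sum, relabelling the nodes leaves `graph_embedding` unchanged.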

## Attributes
- [T]he preprocessing step first “squashes” the graph structured data into a vector of reals and then deals with the preprocessed data using a list-based data processing technique. However, important information, e.g., the topological dependency of information on each node may be lost during the preprocessing stage and the final result may depend, in an unpredictable manner, on the details of the preprocessing algorithm. [1] **GNNs preserve the structure of the graph they operate on.**
- It will be shown that the GNN is an extension of both recursive neural networks and random walk models and that it retains their characteristics. The model extends recursive neural networks since it can process a more general class of graphs including cyclic, directed, and undirected graphs, and it can deal with node-focused applications without any preprocessing steps. The approach extends random walk theory by the introduction of a learning algorithm and by enlarging the class of processes that can be modeled. [1]
- Weights are shared: within a layer, the same message and update functions are applied at every node.

### What is message passing?
From [Wikipedia](https://en.wikipedia.org/wiki/Graph_neural_network#Message_passing_layers):
<br>
![img](./img/notebook/messagePassing.png)
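
The message passing layer described on the linked Wikipedia page can be written as a single update rule. Here $\mathbf{x}_u$ is the feature vector of node $u$, $N_u$ is its neighbourhood, $\mathbf{e}_{uv}$ are optional edge features, $\psi$ is the message function, $\bigoplus$ is a permutation-invariant aggregator (for example sum, mean, or max), and $\phi$ is the update function. In practice $\phi$ and $\psi$ are usually small neural networks.

$$
\mathbf{h}_u = \phi\left(\mathbf{x}_u,\ \bigoplus_{v \in N_u} \psi\left(\mathbf{x}_u, \mathbf{x}_v, \mathbf{e}_{uv}\right)\right)
$$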

## Computation Graph
"The neighbour of a node defines its computation graph" - @12:34 https://www.youtube.com/watch?v=JtDgmmQ60x8&ab_channel=AntonioLonga

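One way to read the quote: with $k$ message passing layers, a node's representation can only depend on nodes reachable within $k$ hops. The sketch below unrolls that dependency level by level in plain Python. The toy graph and the helper name `computation_graph` are assumptions for illustration only.

```python
from collections import defaultdict

def computation_graph(edges, root, num_layers):
    """Return, per hop, the set of nodes whose features can reach `root`.

    edges:      iterable of undirected (u, v) pairs
    root:       the node whose representation we want to compute
    num_layers: number of message passing layers (hops)
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    levels = [{root}]
    for _ in range(num_layers):
        # Every node in the current level receives messages from its neighbours.
        frontier = set()
        for node in levels[-1]:
            frontier |= neighbours[node]
        levels.append(frontier)
    return levels

# Toy graph: A-B, A-C, C-D, D-E
edges = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "E")]
levels = computation_graph(edges, root="A", num_layers=2)
```

With two layers, node E (three hops from A) never appears in any level, so its features cannot influence A's representation. Node A itself reappears at the second level because its neighbours also receive messages from A.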


# Data
Heterogeneous graphs are a natural fit for recommendation systems, since users and items are different node types connected by interaction edges. Let's examine a dataset from PyTorch Geometric to understand the basics of the data.

### Datasets:
"AmazonBook" - A subset of the AmazonBook rating dataset from the "LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation" paper.

"MovieLens" - A heterogeneous rating dataset, assembled by GroupLens Research from the MovieLens website, consisting of nodes of type "movie" and "user". User ratings for movies are available as ground truth labels for the edges between the users and the movies ("user", "rates", "movie").

In [59]:
from torch_geometric.datasets import AmazonBook, MovieLens
from torch_geometric.transforms import Compose, ToDevice, ToUndirected

transform = Compose([ToDevice(device), ToUndirected()])

# amazon_dataset = AmazonBook(root="./data/AmazonBook")
movielens_dataset = MovieLens(root="./data/MovieLens", transform=transform, model_name='all-MiniLM-L6-v2')

print(f"Dataset: {movielens_dataset}")
print(f"Number of graphs in dataset: {len(movielens_dataset)}")
print(f"Number of features of dataset: {movielens_dataset.num_features}")

Dataset: MovieLens()
Number of graphs in dataset: 1
Number of features of dataset: {'movie': 404, 'user': 0}


In [60]:
data = movielens_dataset[0]
print(f"data = Dataset[0]: {data}")

print(f"Number of node features of data: {data.num_features}")
print(f"Number of edge features of data: {data.num_edge_features}")
print(f"Number of nodes of data: {data.num_nodes}")
print(f"Number of edges of data: {data.num_edges}")
print(f"data is directed?: {data.is_directed()}")
print(f"data has isolated nodes?: {data.has_isolated_nodes()}")
print(f"data contains self loops?: {data.has_self_loops()}")
print(f"data node types: {data.node_types}")
print(f"data edge types: {data.edge_types}")
print(f"data is on device: {'CUDA' if data.is_cuda else 'CPU'}")

data = Dataset[0]: HeteroData(
  movie={ x=[9742, 404] },
  user={ num_nodes=610 },
  (user, rates, movie)={
    edge_index=[2, 100836],
    edge_label=[100836],
    time=[100836],
  },
  (movie, rev_rates, user)={
    edge_index=[2, 100836],
    edge_label=[100836],
    time=[100836],
  }
)
Number of node features of data: {'movie': 404, 'user': 0}
Number of edge features of data: {('user', 'rates', 'movie'): 0, ('movie', 'rev_rates', 'user'): 0}
Number of nodes of data: 10352
Number of edges of data: 201672
data is directed?: False
data has isolated nodes?: True
data contains self loops?: False
data node types: ['movie', 'user']
data edge types: [('user', 'rates', 'movie'), ('movie', 'rev_rates', 'user')]
data is on device: CUDA


In [61]:
data.validate(raise_on_error=True)

True

In [30]:
# print(f"Number of training edges: {data['user','rates','book'].edge_index.shape[1]}")
# print(f"Number of testing edges: {data['user','rates','book'].edge_label_index.shape[1]}")

In [31]:
# print(f"{torch.min(data['user', 'rates', 'book'].edge_index[0])}")
# print(f"{torch.max(data['user', 'rates', 'book'].edge_index[0])}")

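Movie nodes carry a 404-dimensional feature vector, but user nodes have no features, and edges have no feature vectors, only a rating label (`edge_label`) and a timestamp (`time`). Everything the model learns about users therefore has to come from the connections (edges) themselves.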

- Link Prediction on MovieLens.ipynb - https://colab.research.google.com/drive/1xpzn1Nvai1ygd_P5Yambc_oe4VBPK_ZT?usp=sharing#scrollTo=JMGYv83WzSRr
- [1] Neural Graph Collaborative Filtering - https://dl.acm.org/doi/pdf/10.1145/3331184.3331267

In [62]:
torch_geometric.seed_everything(1234)

## Naive Graph Neural Network



In [189]:
from torch_geometric.nn import MLP, MessagePassing, Linear
from torch.nn import Embedding
from torch.nn import functional as F

"""
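    GNN layer consists of:
        1. A message function: an MLP applied to each edge's concatenated endpoint features.
        2. An aggregation: a permutation-invariant reduction ('add' by default) over each node's incoming messages.
        3. An update function: an MLP applied to each node's aggregated message.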
"""

class GNNLayer(MessagePassing):
    def __init__(self, hidden_channels=12800, aggr='add'):
        super(GNNLayer, self).__init__(aggr=aggr)
        self.message_function = MLP(in_channels= 2*404,
                                    hidden_channels=hidden_channels,
                                    out_channels=hidden_channels, num_layers=2) # default activation is ReLU
        self.update_function = MLP(in_channels=hidden_channels,
                                   hidden_channels=hidden_channels,
                                   out_channels=2*404, num_layers=2) # default activation is ReLU

        self.reset_parameters()

    def reset_parameters(self):
        self.message_function.reset_parameters()
        self.update_function.reset_parameters()

    def forward(self, edge_index, x):
        return self.propagate(edge_index=edge_index, x=x)

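    def message(self, x_j, x_i):
        # With the default flow ('source_to_target'), messages travel from source j to target i.
        # x_j: source-node features, one row per edge   [num_edges, num_of_features]
        # x_i: target-node features, one row per edge   [num_edges, num_of_features]
        return self.message_function(torch.concat([x_i, x_j], dim=1))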

    def update(self, aggr_out):
        return self.update_function(aggr_out)

class GNN(torch.nn.Module):
    def __init__(self, hidden_channels=1280, aggr='add'):
        super(GNN, self).__init__()
        self.gnn_layer_1 = GNNLayer(hidden_channels=hidden_channels, aggr=aggr)

    def forward(self, edge_index, x):
        out = self.gnn_layer_1(edge_index, x)

        edge_feat_user = out[edge_index[0]]
        edge_feat_movie = out[edge_index[1]]

        return (edge_feat_user * edge_feat_movie).sum(dim=1)


In [190]:
from torch_geometric.transforms import RandomLinkSplit

# Let's split the data into training, validation, and test sets
transform = RandomLinkSplit(
    edge_types=('user', 'rates', 'movie'),
    rev_edge_types=('movie', 'rev_rates', 'user')
)

train_data, val_data, test_data = transform(data)

In [132]:
train_data

HeteroData(
  movie={ x=[9742, 404] },
  user={ num_nodes=610 },
  (user, rates, movie)={
    edge_index=[2, 70586],
    time=[70586],
    edge_label=[141172],
    edge_label_index=[2, 141172],
  },
  (movie, rev_rates, user)={
    edge_index=[2, 70586],
    edge_label=[70586],
    time=[70586],
  }
)

In [65]:
val_data

HeteroData(
  movie={ x=[9742, 404] },
  user={ num_nodes=610 },
  (user, rates, movie)={
    edge_index=[2, 70586],
    time=[70586],
    edge_label=[20166],
    edge_label_index=[2, 20166],
  },
  (movie, rev_rates, user)={
    edge_index=[2, 70586],
    edge_label=[70586],
    time=[70586],
  }
)

In [66]:
test_data

HeteroData(
  movie={ x=[9742, 404] },
  user={ num_nodes=610 },
  (user, rates, movie)={
    edge_index=[2, 80669],
    time=[80669],
    edge_label=[40334],
    edge_label_index=[2, 40334],
  },
  (movie, rev_rates, user)={
    edge_index=[2, 80669],
    edge_label=[80669],
    time=[80669],
  }
)
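
The sizes printed above can be reproduced with a few lines of arithmetic. This sketch assumes the split ran with `RandomLinkSplit`'s default ratios (`num_val=0.1`, `num_test=0.2`, `neg_sampling_ratio=1.0`), since the cell above does not override them, and that the validation and test counts are truncated to integers. It is a consistency check on the outputs, not the library's code.

```python
num_edges = 100836          # ('user', 'rates', 'movie') edges in the full graph
num_val_ratio = 0.1         # assumed RandomLinkSplit default
num_test_ratio = 0.2        # assumed RandomLinkSplit default
neg_sampling_ratio = 1.0    # assumed RandomLinkSplit default

num_val = int(num_val_ratio * num_edges)
num_test = int(num_test_ratio * num_edges)
num_train = num_edges - num_val - num_test

# Message passing edges (edge_index) visible in each split:
# validation reuses the training edges, test also sees the validation edges.
train_mp_edges = num_train
val_mp_edges = num_train
test_mp_edges = num_train + num_val

# Supervision edges (edge_label): positives plus sampled negatives.
train_labels = int(num_train * (1 + neg_sampling_ratio))
val_labels = int(num_val * (1 + neg_sampling_ratio))
test_labels = int(num_test * (1 + neg_sampling_ratio))

print(train_mp_edges, val_mp_edges, test_mp_edges)
print(train_labels, val_labels, test_labels)
```

The results match the `edge_index` and `edge_label` shapes shown for `train_data`, `val_data`, and `test_data`.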

In [191]:
from torch_geometric.loader import LinkNeighborLoader, NeighborLoader

# Define a mini-batch loader
num_samples = [20, 10]
num_hops = 2
batch_size = 128

loader = LinkNeighborLoader(
    train_data,
    num_neighbors= num_samples,
    batch_size=batch_size,
    edge_label_index=(train_data.edge_types[0], train_data[train_data.edge_types[0]].edge_label_index),
    edge_label=train_data[train_data.edge_types[0]]['edge_label'],
    neg_sampling_ratio=2,
    shuffle=True
)

# loader = NeighborLoader(
#     train_data,
#     num_neighbors=num_samples * num_hops,
#     batch_size=batch_size
# )

In [74]:
sampled_data = next(iter(loader))
print(sampled_data)

HeteroData(
  movie={
    x=[2796, 404],
    n_id=[2796],
    num_sampled_nodes=[3],
  },
  user={
    num_nodes=607,
    n_id=[607],
    num_sampled_nodes=[3],
  },
  (user, rates, movie)={
    edge_index=[2, 17910],
    time=[17910],
    edge_label=[384],
    edge_label_index=[2, 384],
    e_id=[17910],
    num_sampled_edges=[2],
    input_id=[128],
  },
  (movie, rev_rates, user)={
    edge_index=[2, 7879],
    edge_label=[7879],
    time=[7879],
    e_id=[7879],
    num_sampled_edges=[2],
  }
)


In [75]:
len(loader)

1103
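
The loader length and the per-batch label count follow from the numbers above. A quick check, assuming the last partial batch is kept and that the loader adds `neg_sampling_ratio` negatives per seed edge:

```python
import math

num_supervision_edges = 141172  # columns of edge_label_index in train_data
batch_size = 128
neg_sampling_ratio = 2          # negatives added by the loader per seed edge

# Number of mini-batches per epoch (last batch may be smaller).
num_batches = math.ceil(num_supervision_edges / batch_size)

# Each full batch holds the seed edges plus their sampled negatives.
labels_per_batch = batch_size * (1 + neg_sampling_ratio)

print(num_batches, labels_per_batch)
```

This agrees with `len(loader)` and with the `edge_label=[384]` and `input_id=[128]` shapes in the sampled batch.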

In [186]:
print(len(sampled_data["user", "rates", "movie"]['edge_label']))
sampled_data["user", "rates", "movie"]['edge_label']

384


tensor([1, 1, 1, 5, 1, 4, 6, 6, 1, 1, 3, 5, 7, 1, 2, 5, 5, 1, 1, 4, 6, 1, 6, 1,
        5, 1, 1, 1, 1, 6, 1, 1, 1, 1, 1, 6, 6, 3, 1, 5, 1, 6, 5, 5, 1, 1, 1, 1,
        1, 1, 6, 1, 1, 1, 1, 1, 5, 6, 5, 1, 4, 6, 1, 1, 5, 4, 1, 1, 4, 1, 1, 1,
        1, 6, 4, 6, 5, 5, 4, 6, 1, 1, 6, 1, 1, 1, 1, 1, 6, 1, 1, 7, 5, 1, 1, 6,
        7, 1, 6, 6, 1, 1, 3, 6, 1, 6, 1, 1, 4, 1, 1, 6, 1, 3, 5, 6, 6, 1, 1, 6,
        1, 1, 5, 7, 4, 6, 1, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

In [185]:
sampled_data["user", "rates", "movie"]['edge_label_index']

tensor([[ 98,  60, 145, 127, 147, 201, 140,  77, 275, 197, 201, 141,  17,  34,
          70, 134, 134, 278, 220,  84, 111, 202,  72,  88, 248,  31, 268,  69,
         147, 125, 101, 254, 226, 224, 179,  48,  37, 182, 222,  70,  71, 158,
         242,  52, 212, 259, 206, 154, 214,  67, 141, 199, 227, 186, 237, 133,
         259, 279, 225, 184,  95, 115,   6, 252, 115,  53, 267,  24, 182,  30,
           4, 119,  29, 272, 260,  57, 213, 182, 259, 259,  11, 211,  48,  50,
         107,  47, 148, 130, 103,  35, 122,   1, 169, 280, 233, 137, 183,  76,
         164,  85,  25, 213, 200, 152, 276, 115,  64, 269, 262, 224,  91, 281,
         138, 229, 200,  96,  29, 265,   0,  39, 204, 173,  32,   1, 217,  84,
         172, 241, 235,   3,  99, 210, 194, 244, 223, 142,  46,  54, 135, 178,
         273,   8, 255, 122, 227, 248,  79, 258,   5,  42,  84, 216, 166,  62,
         271, 111,  96, 180, 167,   9, 195,  82, 149, 102,  59,  31, 123, 187,
         259, 208, 189, 171, 262,  19,  63, 109, 274

In [192]:
num_of_epochs = 10
model = GNN()
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

print(model)

GNN(
  (gnn_layer_1): GNNLayer()
)


In [195]:
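def train_model(model, batch):
    optimizer.zero_grad()

    batch = batch.to(device)

    # Ratings are integer labels; cast to float32 to match the model's output dtype
    target = batch['user', 'rates', 'movie'].edge_label.to(torch.float32)

    pred = model(batch['user', 'rates', 'movie'].edge_label_index, batch["movie"].x)

    # torch.nn.functional exposes mean squared error as mse_loss (there is no F.mse)
    loss = F.mse_loss(pred, target)
    loss.backward()
    optimizer.step()

    return pred, loss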

In [196]:
train_model(model, sampled_data)

(tensor([ 1.1088e+14,  2.2519e+14,  3.9070e+12,  3.1812e+14,  1.5204e+13,
          4.4266e+14,  5.6005e+13,  2.4818e+14,  2.4148e+12,  1.0819e+14,
          5.1214e+14,  1.0722e+14,  2.6487e+14,  1.6070e+14,  4.2657e+14,
          4.4054e+13,  1.5941e+14,  2.0803e+14,  3.0544e+13,  5.7222e+13,
          4.5635e+14,  1.5069e+14,  1.7184e+14,  2.7676e+14,  5.0512e+14,
          8.9223e+14,  5.9364e+13,  1.2541e+13,  8.6922e+12,  4.2534e+14,
          1.1235e+14,  6.5642e+13,  4.3145e+13,  3.6124e+13,  2.0093e+13,
          8.4702e+13,  8.6053e+14,  8.6622e+13,  4.3207e+13,  9.3353e+12,
          6.8015e+13,  1.2616e+15,  9.5460e+13,  4.4821e+14,  6.0252e+13,
          5.0503e+13,  1.8754e+13,  7.0613e+13,  9.7298e+12,  6.8861e+13,
          1.8519e+14,  3.6206e+14,  1.7288e+13, -2.4414e+09,  3.4143e+13,
          1.1475e+14,  1.0412e+14,  5.6317e+12,  1.8534e+14,  7.6671e+13,
          1.2252e+13,  6.6338e+12,  1.0331e+15,  1.0099e+14,  1.0050e+13,
          7.6862e+14,  1.0103e+13,  2.

In [194]:
import tqdm
from torch.nn import functional as F

model.train()
for epoch in range(num_of_epochs):
    total_loss = total_examples = 0.0
    for batch in tqdm.tqdm(loader):
        pred, loss = train_model(model, batch)
        total_loss += float(loss) * pred.numel()
        total_examples += pred.numel()
    print(f"Epoch: {epoch:03d}, loss: {total_loss/total_examples:.4f}")


100%|██████████| 1103/1103 [00:12<00:00, 87.08it/s]


Epoch: 000, loss: -213068739744.4834


100%|██████████| 1103/1103 [00:12<00:00, 88.35it/s]


Epoch: 001, loss: -2642091958241.2642


100%|██████████| 1103/1103 [00:12<00:00, 86.06it/s]


Epoch: 002, loss: -11126041355296.2852


100%|██████████| 1103/1103 [00:12<00:00, 88.27it/s]


Epoch: 003, loss: -29832610421403.9219


100%|██████████| 1103/1103 [00:12<00:00, 88.30it/s]


Epoch: 004, loss: -62593995361581.1172


100%|██████████| 1103/1103 [00:12<00:00, 88.91it/s]


Epoch: 005, loss: -114272206584038.6094


 34%|███▍      | 377/1103 [00:04<00:08, 87.55it/s]


KeyboardInterrupt: 