# Graph Neural Networks
## What are Graph Neural Networks (GNNs)?

In [1]:
#import the basics
import os
import torch
import torch_geometric
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from networkx.classes import number_of_nodes
%matplotlib inline

In [2]:
torch_geometric.__version__

'2.6.1'

In [3]:
# Let's verify what device we are working with
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("You are using device: %s" % device)

You are using device: cuda


Graph Neural Networks are a type of "geometric deep learning" models that use pairwise message passing. They typically have an architecture consisting of 3 types of layers. From [wikipedia](https://en.wikipedia.org/wiki/Graph_neural_network):
1. Permutation equivariant: a permutation equivariant layer maps a representation of a graph into an updated representation of the same graph. In the literature, permutation equivariant layers are implemented via **pairwise message passing between graph nodes**. Intuitively, in a message passing layer, nodes update their representations by aggregating the messages received from their immediate neighbours. As such, each message passing layer increases the receptive field of the GNN by one hop.
2. Local pooling: a local pooling layer coarsens the graph via downsampling. Local pooling is used to increase the receptive field of a GNN, in a similar fashion to pooling layers in convolutional neural networks. Examples include k-nearest neighbours pooling, top-k pooling, and self-attention pooling.
3. Global pooling: a global pooling layer, also known as readout layer, provides fixed-size representation of the whole graph. The global pooling layer must be permutation invariant, such that permutations in the ordering of graph nodes and edges do not alter the final output. Examples include element-wise sum, mean or maximum.

## Attributes
- [T]he preprocessing step first
“squashes” the graph structured data into a vector of reals and
then deals with the preprocessed data using a list-based data
processing technique. However, important information, e.g., the
topological dependency of information on each node may be
lost during the preprocessing stage and the final result may depend, in an unpredictable manner, on the details of the preprocessing algorith [1] **GNNS preserve the structure of the graph it is based on.**
- It will be shown that the GNN
is an extension of both recursive neural networks and random
walk models and that it retains their characteristics. The model
extends recursive neural networks since it can process a more
general class of graphs including cyclic, directed, and undirected graphs, and it can deal with node-focused applications
without any preprocessing steps. The approach extends random
walk theory by the introduction of a learning algorithm and by
enlarging the class of processes that can be modeled. [1]
- Weights are shared across layer structures

### What is message passing?
From [wikipedia](https://en.wikipedia.org/wiki/Graph_neural_network#Message_passing_layers):
<br>
![img](./img/notebook/messagePassing.png)

## Computation Graph
"The neighbour of a node defines its computation graph" - @12:34 https://www.youtube.com/watch?v=JtDgmmQ60x8&ab_channel=AntonioLonga



# Data
Heterogeneous graphs are perfect for recommendation systems. Let's examine a data set from pytorch geometric to understand some basics about the data.

### Datasets:
"AmazonBook" - A subset of the AmazonBook rating dataset from the "LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation" paper.

"MovieLens" - A heterogeneous rating dataset, assembled by GroupLens Research from the MovieLens website, consisting of nodes of type "movie" and "user". User ratings for movies are available as ground truth labels for the edges between the users and the movies ("user", "rates", "movie").

In [65]:
from torch_geometric.datasets import AmazonBook, MovieLens
from torch_geometric.transforms import Compose, ToDevice, ToUndirected

transform = Compose([ToUndirected(), ToDevice(device)])

# amazon_dataset = AmazonBook(root="./data/AmazonBook")
movielens_dataset = MovieLens(root="./data/MovieLens", transform=transform, model_name='all-MiniLM-L6-v2')

print(f"Dataset: {movielens_dataset}")
print(f"Number of graphs in dataset: {len(movielens_dataset)}")
print(f"Number of features of dataset: {movielens_dataset.num_features}")

Dataset: MovieLens()
Number of graphs in dataset: 1
Number of features of dataset: {'movie': 404, 'user': 0}


In [66]:
data = movielens_dataset[0]
print(f"data = Dataset[0]: {data}")

print(f"Number of node features of data: {data.num_features}")
print(f"Number of edge features of data: {data.num_edge_features}")
print(f"Number of nodes of data: {data.num_nodes}")
print(f"Number of user nodes of data: {data['user'].num_nodes}")
print(f"Number of movie nodes of data: {data['movie'].num_nodes}")
print(f"Number of edges of data: {data.num_edges}")
print(f"data is directed?: {data.is_directed()}")
print(f"data has isolated nodes?: {data.has_isolated_nodes()}")
print(f"data contains self loops?: {data.has_self_loops()}")
print(f"data node types: {data.node_types}")
print(f"data edge types: {data.edge_types}")
print(f"'user', 'rates', 'movie' first 5 labels: {data['user', 'rates', 'movie'].edge_label[0:5]}")
print(f"data is on device: {'CUDA' if data.is_cuda else 'CPU'}")

data = Dataset[0]: HeteroData(
  movie={ x=[9742, 404] },
  user={ num_nodes=610 },
  (user, rates, movie)={
    edge_index=[2, 100836],
    edge_label=[100836],
    time=[100836],
  },
  (movie, rev_rates, user)={
    edge_index=[2, 100836],
    edge_label=[100836],
    time=[100836],
  }
)
Number of node features of data: {'movie': 404, 'user': 0}
Number of edge features of data: {('user', 'rates', 'movie'): 0, ('movie', 'rev_rates', 'user'): 0}
Number of nodes of data: 10352
Number of user nodes of data: 610
Number of movie nodes of data: 9742
Number of edges of data: 201672
data is directed?: False
data has isolated nodes?: True
data contains self loops?: False
data node types: ['movie', 'user']
data edge types: [('user', 'rates', 'movie'), ('movie', 'rev_rates', 'user')]
'user', 'rates', 'movie' first 5 labels: tensor([4, 4, 4, 5, 5], device='cuda:0')
data is on device: CUDA


In [67]:
data.validate(raise_on_error=True)

True

In [7]:
# print(f"Number of training edges: {data['user','rates','book'].edge_index.shape[1]}")
# print(f"Number of testing edges: {data['user','rates','book'].edge_label_index.shape[1]}")

In [8]:
# print(f"{torch.min(data['user', 'rates', 'book'].edge_index[0])}")
# print(f"{torch.max(data['user', 'rates', 'book'].edge_index[0])}")

Because there are no features on the edges or the nodes, the connections (edges) themselves are what we are training on.

- Link Prediction on MovieLens.ipynb - https://colab.research.google.com/drive/1xpzn1Nvai1ygd_P5Yambc_oe4VBPK_ZT?usp=sharing#scrollTo=JMGYv83WzSRr
- [1] Neural Graph Collaborative Filtering - https://dl.acm.org/doi/pdf/10.1145/3331184.3331267

In [70]:
torch_geometric.seed_everything(1234)

## Naive Graph Neural Network



In [71]:
from torch_geometric.transforms import RandomLinkSplit, ToDevice, ToUndirected, Compose

# lets split the data into training, validation, and test sets
transform = Compose([RandomLinkSplit(
    is_undirected=True,
    edge_types=('user', 'rates', 'movie'),
    rev_edge_types=('movie', 'rev_rates', 'user')
), ToDevice(device)])

train_data, val_data, test_data = transform(data)

In [72]:
train_data

HeteroData(
  movie={ x=[9742, 404] },
  user={ num_nodes=610 },
  (user, rates, movie)={
    edge_index=[2, 70586],
    time=[70586],
    edge_label=[141172],
    edge_label_index=[2, 141172],
  },
  (movie, rev_rates, user)={
    edge_index=[2, 70586],
    edge_label=[70586],
    time=[70586],
  }
)

In [73]:
val_data

HeteroData(
  movie={ x=[9742, 404] },
  user={ num_nodes=610 },
  (user, rates, movie)={
    edge_index=[2, 70586],
    time=[70586],
    edge_label=[20166],
    edge_label_index=[2, 20166],
  },
  (movie, rev_rates, user)={
    edge_index=[2, 70586],
    edge_label=[70586],
    time=[70586],
  }
)

In [74]:
test_data

HeteroData(
  movie={ x=[9742, 404] },
  user={ num_nodes=610 },
  (user, rates, movie)={
    edge_index=[2, 80669],
    time=[80669],
    edge_label=[40334],
    edge_label_index=[2, 40334],
  },
  (movie, rev_rates, user)={
    edge_index=[2, 80669],
    edge_label=[80669],
    time=[80669],
  }
)

In [75]:
print(f"train data # of nodes: {train_data.num_nodes}")
print(f"train data # of edges: {train_data.num_edges}")
print(f"train data # of user nodes: {train_data['user'].num_nodes}")
print(f"train data # of movie nodes: {train_data['movie'].num_nodes}")
print(f"train data # of 'user', 'rates', 'movies' edges: {train_data['user','rates','movie'].num_edges}")
print(f"train data # of 'movie', 'rev_rates', 'user' edges: {train_data['movie','rev_rates','user'].num_edges}")
print(f"train data # of 'user', 'rates', 'movies' edge_label size: {train_data['user','rates','movie'].edge_label.size()}")
print(f"train data # of 'user', 'rates', 'movies' edge_label_index size: {train_data['user','rates','movie'].edge_label_index.size()}")
print(f"train data # of 'user', 'rates', 'movies' edge_label first 5: {train_data['user','rates','movie'].edge_label[0:5]}")
print(f"train data # of 'user', 'rates', 'movies' edge_label_index first 5: {train_data['user','rates','movie'].edge_label_index[0,0:5]}")
print(f"train data # of 'user', 'rates', 'movies' edge_label_index first 5: {train_data['user','rates','movie'].edge_label_index[1,0:5]}")
print(f"train data # of 'user', 'rates', 'movies' edge_label min: {train_data['user','rates','movie'].edge_label.min()}")
print(f"train data # of 'user', 'rates', 'movies' edge_label max: {train_data['user','rates','movie'].edge_label.max()}")
print(f"train data size of edge_index first 5: {train_data['user','rates','movie'].edge_index[0,0:5]}")
print(f"train data size of edge_index first 5: {train_data['user','rates','movie'].edge_index[1,0:5]}")
print(f"train data size of edge_index: {train_data['user','rates','movie'].edge_index.size()}")
print(f"train data size of edge_label_index: {train_data['user','rates','movie'].edge_label_index.size()}")
70586*2

train data # of nodes: 10352
train data # of edges: 141172
train data # of user nodes: 610
train data # of movie nodes: 9742
train data # of 'user', 'rates', 'movies' edges: 70586
train data # of 'movie', 'rev_rates', 'user' edges: 70586
train data # of 'user', 'rates', 'movies' edge_label size: torch.Size([141172])
train data # of 'user', 'rates', 'movies' edge_label_index size: torch.Size([2, 141172])
train data # of 'user', 'rates', 'movies' edge_label first 5: tensor([4, 6, 4, 6, 4], device='cuda:0')
train data # of 'user', 'rates', 'movies' edge_label_index first 5: tensor([425,  19, 560, 606, 437], device='cuda:0')
train data # of 'user', 'rates', 'movies' edge_label_index first 5: tensor([1777, 1593, 6770,  701, 5819], device='cuda:0')
train data # of 'user', 'rates', 'movies' edge_label min: 0
train data # of 'user', 'rates', 'movies' edge_label max: 6
train data size of edge_index first 5: tensor([425,  19, 560, 606, 437], device='cuda:0')
train data size of edge_index first 5

141172

In [76]:
train_data['user', 'rates', 'movie'].edge_label[train_data['user', 'rates', 'movie'].edge_label==0].size()

torch.Size([70586])

In [82]:
from torch_geometric.loader import LinkNeighborLoader, NeighborLoader

# Define a mini-batch loader
num_hops = 1
num_samples = {node_type: ([30] * num_hops) for node_type in train_data.edge_types}
batch_size = 500

print(num_samples)

edge_type = ('user', 'rates', 'movie')
edge_label_index = train_data['user', 'rates', 'movie'].edge_label_index # edge indices for which neighbors are sampled to create mini-batch
edge_label= train_data['user', 'rates', 'movie'].edge_label

loader = LinkNeighborLoader(
    train_data,
    num_neighbors= num_samples,
    edge_label_index=(edge_type, edge_label_index),
    edge_label=edge_label,
    batch_size=batch_size
)

# loader = NeighborLoader(
#     train_data,
#     num_neighbors=num_samples * num_hops,
#     batch_size=batch_size
# )

{('user', 'rates', 'movie'): [30], ('movie', 'rev_rates', 'user'): [30]}


In [83]:
sampled_data = next(iter(loader))
print(sampled_data)

HeteroData(
  movie={
    x=[2706, 404],
    n_id=[2706],
    num_sampled_nodes=[2],
  },
  user={
    num_nodes=601,
    n_id=[601],
    num_sampled_nodes=[2],
  },
  (user, rates, movie)={
    edge_index=[2, 8872],
    time=[8872],
    edge_label=[500],
    edge_label_index=[2, 500],
    e_id=[8872],
    num_sampled_edges=[1],
    input_id=[500],
  },
  (movie, rev_rates, user)={
    edge_index=[2, 6914],
    edge_label=[6914],
    time=[6914],
    e_id=[6914],
    num_sampled_edges=[1],
  }
)


In [84]:
len(loader)

283

In [92]:
print(sampled_data['user']['n_id'][0:5])
print(sampled_data['movie']['n_id'][0:5])
print(sampled_data['user','rates','movie']['e_id'][0:5])
print(sampled_data['user','rates','movie']['edge_index'][:,0:5])

tensor([ 5,  6,  7,  9, 10])
tensor([ 0,  4,  6, 10, 31])
tensor([12008, 42975,  3260, 38015, 51713])
tensor([[237,  61, 238,  86, 239],
        [  0,   0,   0,   0,   0]], device='cuda:0')


In [10]:
from torch_geometric.nn import MLP, MessagePassing, Linear
from torch.nn import Parameter
from torch.nn import Embedding
from torch.nn import functional as F

class EmbeddingPropLayer(torch.nn.Module):
    def __init__(self, emb_size, hidden_channels=128):
        super(EmbeddingPropLayer, self).__init__()

        self.W1 = Linear(emb_size, hidden_channels, bias=False)
        self.W2 = Linear(emb_size, hidden_channels, bias=False)

    def reset_parameters(self):
        self.W1.reset_parameters()
        self.W2.reset_parameters()

    def forward(self, e_i, e_u):
        m_uu, m_ui = self.message_aggregation(e_i, e_u)
        return F.leaky_relu(m_uu + m_ui.sum(dim=1))

    def message_construction(self, e_i, e_u, p_ui):
        return p_ui * (self.W1(e_i) + self.W2(torch.mul(e_i, e_u)))

    def message_aggregation(self,e_i, e_u):
        p_ui = 1
        m_ui = self.message_construction(e_i, e_u, p_ui)
        m_uu = self.W1(e_u)
        return m_uu, m_ui


class EmbeddingLayer(torch.nn.Module):
    """
        user u and item i with an embedding vector e_u that's an element of d real numbers
        and e_i that's an element of a d real numbers
    """
    def __init__(self, num_user_nodes, num_movie_nodes, hidden_channels=128):
        super(EmbeddingLayer, self).__init__()
        self.user_embedding(num_user_nodes, hidden_channels)
        self.movie_embedding(num_movie_nodes, hidden_channels)

    def forward(self, user_nodes, movie_nodes):
        e_u = self.user_embedding(user_nodes)
        e_i = self.movie_embedding(movie_nodes)

        return e_u, e_i

class NGCF(torch.nn.Module):
    def __init__(self, num_user_nodes, num_movie_nodes, hidden_channels=128, aggr='mean'):
        super(NGCF, self).__init__()

        self.embeddings = EmbeddingLayer(num_user_nodes, num_movie_nodes, hidden_channels)
        self.embedding_prop_1 = EmbeddingPropLayer(hidden_channels, hidden_channels)
        self.embedding_prop_2 = EmbeddingPropLayer(hidden_channels, hidden_channels)
        self.embedding_prop_3 = EmbeddingPropLayer(hidden_channels, hidden_channels)



    def forward(self, ):
        e_u, e_i = self.embeddings()
        e_u1, e_i1 = self.embedding_prop_1(e_u, e_i)
        e_u2, e_i2 = self.embedding_prop_1(e_u1, e_i1)
        e_u3, e_i3 = self.embedding_prop_1(e_u2, e_i2)

        torch.stack
        return


In [54]:
#print(len(sampled_data["user", "rates", "movie"]['edge_label']))
print(sampled_data["user", "rates", "movie"].edge_label_index[:,0:5])
#print(sampled_data["user", "rates", "movie"].edge_label[0:5])

tensor([[ 54,  28, 219, 113, 108],
        [204, 298,  59, 325, 266]])


In [241]:
torch.concat([torch.tensor([[1,2],[3,4]]),torch.tensor([[5,6],[7,8]])], dim=1)

tensor([[1, 2, 5, 6],
        [3, 4, 7, 8]])

In [55]:
num_of_epochs = 10
model = GNN()
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

print(model)

NameError: name 'GNN' is not defined

In [288]:
def train_model(model, batch):
    optimizer.zero_grad()

    batch.to(device)

    target = batch['user', 'rates', 'movie'].edge_label.to(device).to(torch.float64)

    pred = model(batch['user','rates', 'movie'].edge_label_index, batch["movie"].x)

    loss = F.binary_cross_entropy_with_logits(pred, target)
    loss.backward()
    optimizer.step()

    return pred, loss

In [277]:
pred, loss = train_model(model, sampled_data)

x_i size: torch.Size([1500, 404])


In [266]:
torch.nn.functional.softmax(pred, dim=0).max()

tensor(1., device='cuda:0', grad_fn=<MaxBackward1>)

In [282]:
import tqdm
from torch.nn import functional as F

model.train()
for epoch in range(num_of_epochs):
    total_loss = total_examples = 0.0
    for batch in tqdm.tqdm(loader):
        pred, loss = train_model(model, batch)
        total_loss += float(loss) * pred.numel()
        total_examples += pred.numel()
    print(f"Epoch: {epoch:03d}, loss: {total_loss/total_examples:.4f}")


100%|██████████| 283/283 [00:04<00:00, 59.10it/s]


Epoch: 000, loss: -276054893.4609


100%|██████████| 283/283 [00:04<00:00, 64.23it/s]


Epoch: 001, loss: -6965468880.1423


100%|██████████| 283/283 [00:04<00:00, 64.58it/s]


Epoch: 002, loss: -43552658599.4177


100%|██████████| 283/283 [00:04<00:00, 63.90it/s]


Epoch: 003, loss: -149708702269.1843


100%|██████████| 283/283 [00:04<00:00, 64.20it/s]


Epoch: 004, loss: -374746060748.0128


100%|██████████| 283/283 [00:04<00:00, 63.67it/s]


Epoch: 005, loss: -783696045250.5245


100%|██████████| 283/283 [00:04<00:00, 64.46it/s]


Epoch: 006, loss: -1409574596453.5586


100%|██████████| 283/283 [00:04<00:00, 64.42it/s]


Epoch: 007, loss: -2346320333454.2905


100%|██████████| 283/283 [00:04<00:00, 64.39it/s]


Epoch: 008, loss: -3637393403807.4053


100%|██████████| 283/283 [00:04<00:00, 62.57it/s]

Epoch: 009, loss: -5352262381109.0234



