# Graph Neural Networks (GNNs)

## Introduction

Graph Neural Networks (GNNs) are a class of neural networks that operate on graph-structured data. They have gained significant attention due to their ability to model relationships and interactions in data represented as graphs. Applications of GNNs span social network analysis, recommendation systems, biological networks, and more.

In this tutorial, we'll explore how to apply neural networks to graph-structured data. We'll delve into the mathematical foundations of GNNs, implement GNNs for tasks like node classification and link prediction, and discuss the latest developments in the field.

## Table of Contents

1. [Understanding Graph Neural Networks](#1)
   - [What are Graphs?](#1.1)
   - [Neural Networks on Graphs](#1.2)
2. [Mathematical Foundations](#2)
   - [Graph Convolutional Networks (GCNs)](#2.1)
   - [Message Passing Neural Networks](#2.2)
3. [Implementing GNNs](#3)
   - [Node Classification with GCNs](#3.1)
   - [Link Prediction with Graph Autoencoders](#3.2)
4. [Latest Developments in GNNs](#4)
   - [Graph Attention Networks (GAT)](#4.1)
   - [Graph Isomorphism Networks (GIN)](#4.2)
5. [Conclusion](#5)
6. [References](#6)


<a id="1"></a>
# 1. Understanding Graph Neural Networks

<a id="1.1"></a>
## 1.1 What are Graphs?

A graph is a data structure consisting of nodes (or vertices) and edges connecting pairs of nodes. Graphs are used to represent relational data and can model complex interactions in data such as social networks, molecules, transportation networks, and more.

**Formally**, a graph $G$ is defined as $G = (V, E)$, where:

- $V$ is the set of nodes.
- $E \subseteq V \times V$ is the set of edges, each connecting a pair of nodes.
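As a concrete illustration, here is one way to store a small undirected graph $G = (V, E)$ in plain Python: an edge list, from which we derive an adjacency list and a dense adjacency matrix. The four-node graph is a made-up toy example.

```python
# Toy undirected graph G = (V, E); nodes and edges are made up for illustration.
V = [0, 1, 2, 3]
E = [(0, 1), (0, 2), (1, 2), (2, 3)]

def adjacency_list(nodes, edges):
    """Map each node to the sorted list of its neighbors (edges are undirected)."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return {v: sorted(ns) for v, ns in adj.items()}

def adjacency_matrix(nodes, edges):
    """Dense |V| x |V| matrix A with A[u][v] = A[v][u] = 1 for every edge."""
    index = {v: i for i, v in enumerate(nodes)}
    A = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        A[index[u]][index[v]] = 1
        A[index[v]][index[u]] = 1
    return A

print(adjacency_list(V, E))  # {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
for row in adjacency_matrix(V, E):
    print(row)
```

The adjacency matrix $A$ built here is the same object that appears in the GCN formulas of Section 2.1; because the graph is undirected, it is symmetric.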

<a id="1.2"></a>
## 1.2 Neural Networks on Graphs

Traditional neural networks (e.g., CNNs, RNNs) are designed for data with grid-like structures (e.g., images, sequences). GNNs extend neural networks to operate on graph-structured data by aggregating and transforming information from a node's neighbors.

**Key Idea**: Each node aggregates information from its neighbors to update its representation.
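A minimal sketch of this idea, assuming scalar node features and mean aggregation over each node's neighbors plus the node itself. Mean aggregation is only one common choice; the architectures discussed below (GCN, MPNN, GAT, GIN) differ mainly in how they weight and combine neighbor information. The graph and feature values are hypothetical.

```python
# Hypothetical toy graph as an adjacency list, with one scalar feature per node.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
features = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

def aggregate_mean(neighbors, features):
    """One round of aggregation: each node's new feature is the mean of
    its own feature and its neighbors' features."""
    updated = {}
    for v, ns in neighbors.items():
        group = [features[v]] + [features[u] for u in ns]
        updated[v] = sum(group) / len(group)
    return updated

h1 = aggregate_mean(neighbors, features)
print(h1)  # {0: 2.0, 1: 2.0, 2: 2.5, 3: 3.5}
# A second round lets information from 2-hop neighbors reach each node.
h2 = aggregate_mean(neighbors, h1)
```

Stacking $k$ such rounds lets each node see information from nodes up to $k$ hops away, which is why GNNs are usually built from several layers.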

<a id="2"></a>
# 2. Mathematical Foundations

<a id="2.1"></a>
## 2.1 Graph Convolutional Networks (GCNs)

Graph Convolutional Networks [[1]](#ref1) generalize convolutional neural networks to graphs. The convolution operation in GCNs involves aggregating feature information from a node's local neighborhood.

### GCN Layer Definition

The forward propagation rule for a single GCN layer is:

$$
H^{(l+1)} = \sigma\left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right)
$$

Where:

- $H^{(l)}$: Node representations at layer $l$.
- $H^{(0)} = X$: Input feature matrix.
- $\tilde{A} = A + I$: Adjacency matrix with added self-connections.
- $\tilde{D}$: Diagonal degree matrix of $\tilde{A}$, with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.
- $W^{(l)}$: Weight matrix at layer $l$.
- $\sigma$: Activation function (e.g., ReLU).

### Intuition

- **Normalization**: The term $\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ divides each entry $\tilde{A}_{ij}$ by $\sqrt{\tilde{D}_{ii}\tilde{D}_{jj}}$, so high-degree nodes do not produce disproportionately large sums. This keeps feature magnitudes stable across stacked layers and avoids numerical instabilities.
- **Aggregation**: Each node aggregates features from its neighbors and, through the self-loops in $\tilde{A}$, from itself.
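A dependency-free sketch of the layer rule above, written in pure Python with lists as matrices so each step of the formula is visible. The three-node path graph, the features, and the weights are made up, and the sketch omits the bias term; in practice you would use PyTorch Geometric's `GCNConv` (Section 3), which adds self-loops and applies this symmetric normalization by default.

```python
import math

def matmul(A, B):
    """Dense matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def gcn_layer(A, H, W):
    """One GCN layer without bias: ReLU(D~^{-1/2} (A + I) D~^{-1/2} H W)."""
    n = len(A)
    # A~ = A + I: add a self-loop to every node
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # D~_ii = sum_j A~_ij: degree of each node, counting its self-loop
    deg = [sum(row) for row in A_tilde]
    # Symmetric normalization: entry (i, j) is divided by sqrt(d_i * d_j)
    A_hat = [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    # Aggregate neighbor features, apply the weights, then the ReLU
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, z) for z in row] for row in Z]

A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]              # path graph 0 - 1 - 2
H = [[1.0], [0.0], [0.0]]    # a single input feature; only node 0 is nonzero
W = [[1.0]]                  # 1 x 1 identity weights, for readability
print(gcn_layer(A, H, W))    # approximately [[0.5], [0.408], [0.0]]
```

After one layer, node 0's signal has reached its direct neighbor (node 1, scaled by $1/\sqrt{2 \cdot 3}$) but not node 2; a second layer would propagate it one more hop.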

<a id="2.2"></a>
## 2.2 Message Passing Neural Networks

Message Passing Neural Networks (MPNNs) [[2]](#ref2) provide a general framework that expresses many GNN variants, including GCNs, as rounds of message exchange between neighboring nodes.

### Message Passing Framework

1. **Message Function**: Compute messages from neighbors.

   $$
   m_{v}^{(l+1)} = \sum_{u \in \mathcal{N}(v)} M^{(l)}(h_{v}^{(l)}, h_{u}^{(l)}, e_{uv})
   $$

2. **Update Function**: Update node representations.

   $$
   h_{v}^{(l+1)} = U^{(l)}(h_{v}^{(l)}, m_{v}^{(l+1)})
   $$

Where:

- $h_{v}^{(l)}$: Hidden state of node $v$ at layer $l$.
- $e_{uv}$: Edge features between nodes $u$ and $v$.
- $\mathcal{N}(v)$: Neighbors of node $v$.
- $M^{(l)}$, $U^{(l)}$: Learnable functions (e.g., neural networks).

### Intuition

- Nodes exchange messages with their neighbors and update their states based on the aggregated messages.
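The two steps above can be sketched generically in pure Python, with the message function $M$ and update function $U$ passed in as ordinary functions and messages summed per receiving node. The concrete `message` and `update` below are hypothetical linear stand-ins for the learnable networks, chosen so the output is easy to check by hand. PyTorch Geometric's `MessagePassing` base class is organized around the same message, aggregate, and update steps.

```python
def mpnn_layer(h, edges, message_fn, update_fn):
    """One message passing step with sum aggregation.

    h: dict mapping node -> hidden state h_v.
    edges: dict mapping (u, v) -> edge feature e_uv; u sends a message to v.
    """
    # m_v = sum over neighbors u of M(h_v, h_u, e_uv); isolated nodes get 0.0
    m = {v: 0.0 for v in h}
    for (u, v), e_uv in edges.items():
        m[v] += message_fn(h[v], h[u], e_uv)
    # h_v' = U(h_v, m_v)
    return {v: update_fn(h[v], m[v]) for v in h}

# Hypothetical stand-ins for the learnable functions M and U
def message(h_v, h_u, e_uv):
    return e_uv * h_u          # neighbor state weighted by the edge feature

def update(h_v, m_v):
    return h_v + m_v           # residual-style update

h = {0: 1.0, 1: 2.0, 2: 3.0}
# Undirected path 0 - 1 - 2, stored as messages in both directions
edges = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 0.5, (2, 1): 0.5}
print(mpnn_layer(h, edges, message, update))  # {0: 3.0, 1: 4.5, 2: 4.0}
```

Swapping in different `message_fn` and `update_fn` choices recovers different architectures; for example, a message scaled by normalized degrees followed by a linear map and ReLU gives a GCN-style layer.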

<a id="3"></a>
# 3. Implementing GNNs

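We'll implement a GCN for node classification on the Cora dataset, a citation network in which nodes represent papers and edges represent citations between them. Each paper has a bag-of-words feature vector and belongs to one of seven subject classes.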

<a id="3.1"></a>
## 3.1 Node Classification with GCNs

### Setup

We'll use the PyTorch Geometric library, which provides utilities for GNNs.

```python
# Install PyTorch Geometric (uncomment in a fresh environment such as Google Colab).
# Since PyG 2.3, the optional extension packages (torch_scatter, torch_sparse,
# torch_cluster, torch_spline_conv) are not required for the layers used here.
# !pip install torch
# !pip install torch_geometric

import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv
```

### Load the Dataset

We'll load the Cora dataset using PyTorch Geometric's built-in loader.

```python
# Load the Cora dataset
dataset = Planetoid(root='/tmp/Cora', name='Cora')
print(f'Dataset: {dataset}')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features: {dataset.num_node_features}')
print(f'Number of classes: {dataset.num_classes}')

data = dataset[0]  # Cora has only one graph
print(data)

# Data attributes
print(f'Number of nodes: {data.num_nodes}')
print(f'Number of edges: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Has isolated nodes: {data.has_isolated_nodes()}')
print(f'Has self-loops: {data.has_self_loops()}')
print(f'Is undirected: {data.is_undirected()}')
```
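Why does `data.num_edges / data.num_nodes` give the average degree? PyTorch Geometric stores an undirected graph such as Cora with every edge listed twice in `edge_index`, once per direction, so the number of stored edges equals the sum of all node degrees. A small pure-Python check of that identity on a made-up graph:

```python
# Hypothetical 4-node undirected graph, stored PyG-style: each undirected edge
# appears twice in the edge list, once as (u, v) and once as (v, u).
undirected_edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
edge_index = [(u, v) for u, v in undirected_edges] + [(v, u) for u, v in undirected_edges]
num_nodes = 4

# Degree of each node = number of directed entries leaving it
degree = [0] * num_nodes
for u, _ in edge_index:
    degree[u] += 1

print(len(edge_index) / num_nodes)  # 2.0
print(sum(degree) / num_nodes)      # 2.0, the same value
```

If the edges were stored only once per pair, the same ratio would be half the true average degree.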

### Define the GCN Model

```python
class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Two GCN layers: input features -> 16 hidden units -> one score per class
        self.conv1 = GCNConv(dataset.num_node_features, 16)
        self.conv2 = GCNConv(16, dataset.num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = F.relu(x)
        # Dropout (p=0.5 by default) is active only in training mode
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)

        # Log-probabilities per class, paired with F.nll_loss during training
        return F.log_softmax(x, dim=1)

model = GCN()
print(model)
```

**Explanation:**

- **GCNConv** layers perform the graph convolution operations.
- We use ReLU activation and dropout for regularization.
- The output layer uses a log-softmax activation for classification.

### Training the Model

```python
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    if epoch % 20 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item():.4f}')
```

**Explanation:**

- **Train Mask**: A boolean mask selecting the labeled nodes used for training (140 nodes in Cora's standard split); the loss is computed only on these nodes.
- **Negative Log-Likelihood Loss**: Used for multi-class classification.
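Because the model's last step is `F.log_softmax`, applying `F.nll_loss` to its output amounts to the usual cross-entropy loss on the raw class scores (`F.nll_loss` then averages it over the selected nodes). The pure-Python sketch below spells out both computations for a single node with made-up logits; it mirrors the math only, not PyTorch's implementation.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax: subtract the max before exponentiating."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]

def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class, given log-probabilities."""
    return -log_probs[target]

def cross_entropy(logits, target):
    """Cross-entropy computed directly from softmax probabilities."""
    exps = [math.exp(x) for x in logits]
    return -math.log(exps[target] / sum(exps))

logits = [2.0, 0.5, -1.0]  # hypothetical raw scores for 3 classes
target = 0                 # hypothetical true class
print(nll_loss(log_softmax(logits), target))
print(cross_entropy(logits, target))  # same value up to floating-point rounding
```

Splitting the computation into `log_softmax` plus `nll_loss` keeps the model's output interpretable as log-probabilities, which the evaluation cell reuses when taking the most likely class.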

### Evaluating the Model

```python
model.eval()
with torch.no_grad():  # gradients are not needed for evaluation
    pred = model(data).argmax(dim=1)
correct = int((pred[data.test_mask] == data.y[data.test_mask]).sum())
accuracy = correct / int(data.test_mask.sum())
print(f'Test Accuracy: {accuracy:.4f}')
```

**Results:**

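- The GCN typically reaches a test accuracy of around 80% on Cora's standard split (Kipf & Welling [[1]](#ref1) report 81.5%); the exact value varies between runs because of random weight initialization and dropout.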

<a id="3.2"></a>
## 3.2 Link Prediction with Graph Autoencoders

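Link prediction is the task of predicting whether an edge exists between two nodes. We'll implement a Variational Graph Autoencoder (VGAE), a probabilistic variant of the Graph Autoencoder (GAE) that learns node embeddings with a GCN encoder and scores node pairs with an inner-product decoder.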

```python
from torch_geometric.utils import train_test_split_edges

# Prepare data for link prediction: split the edges into train/val/test
# positives and sample val/test negatives. This sets data.edge_index to None
# (use data.train_pos_edge_index etc. instead), so re-run the dataset cell
# before reusing `data` for node classification. In recent PyG releases this
# helper is deprecated in favor of torch_geometric.transforms.RandomLinkSplit.
data = train_test_split_edges(data)

class VariationalGraphAutoEncoder(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Shared first GCN layer, then separate heads for the mean and log-std
        self.conv1 = GCNConv(in_channels, 2 * out_channels)
        self.conv_mu = GCNConv(2 * out_channels, out_channels)
        self.conv_logstd = GCNConv(2 * out_channels, out_channels)

    def encode(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv_mu(x, edge_index), self.conv_logstd(x, edge_index)

    def reparameterize(self, mu, logstd):
        # Sample z = mu + std * eps during training; use the mean at inference
        if self.training:
            std = torch.exp(logstd)
            return mu + std * torch.randn_like(std)
        return mu

    def decode(self, z, edge_index):
        # Inner-product decoder: edge probability sigmoid(z_u . z_v)
        return torch.sigmoid((z[edge_index[0]] * z[edge_index[1]]).sum(dim=1))

    def forward(self, data):
        mu, logstd = self.encode(data.x, data.train_pos_edge_index)
        z = self.reparameterize(mu, logstd)
        return self.decode(z, data.train_pos_edge_index), mu, logstd

model = VariationalGraphAutoEncoder(dataset.num_node_features, 16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
```

**Explanation:**

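- **VGAE**: The Variational Graph Autoencoder extends the GAE with probabilistic latent variables: the GCN encoder outputs a mean (`conv_mu`) and a log standard deviation (`conv_logstd`) for each node's embedding.
- **Inner-Product Decoder**: `decode` scores a node pair $(u, v)$ as $\text{sigmoid}(z_u^\top z_v)$, the estimated probability that the edge exists.
- **Reparameterization Trick**: Writing the sample as $z = \mu + \exp(\text{logstd}) \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ makes it a differentiable function of $\mu$ and $\text{logstd}$, allowing backpropagation through the stochastic encoder.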
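The reparameterization trick writes a Gaussian sample as $z = \mu + \sigma \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$ and $\sigma = \exp(\text{logstd})$, so all randomness sits in $\epsilon$ and gradients can flow through $\mu$ and $\text{logstd}$. The pure-Python sketch below (one latent dimension, hypothetical values) draws samples this way and also evaluates the per-dimension KL divergence to a standard normal prior, $-\tfrac{1}{2}(1 + 2\,\text{logstd} - \mu^2 - \sigma^2)$, which is the quantity summed inside `kl_loss` in the training cell.

```python
import math
import random

def reparameterize(mu, logstd, rng):
    """Draw z ~ N(mu, exp(logstd)^2) as mu + std * eps with eps ~ N(0, 1).
    The noise eps does not depend on mu or logstd, so z is a differentiable
    function of both."""
    eps = rng.gauss(0.0, 1.0)
    return mu + math.exp(logstd) * eps

def kl_to_standard_normal(mu, logstd):
    """KL(N(mu, sigma^2) || N(0, 1)) for one dimension, sigma = exp(logstd).
    Equal to -0.5 * (1 + 2*logstd - mu^2 - sigma^2)."""
    return 0.5 * (mu ** 2 + math.exp(2 * logstd) - 1 - 2 * logstd)

rng = random.Random(0)
samples = [reparameterize(1.0, math.log(0.5), rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(mean, std)                        # close to mu = 1.0 and sigma = 0.5
print(kl_to_standard_normal(0.0, 0.0))  # 0.0: N(0, 1) is the prior itself
print(kl_to_standard_normal(1.0, 0.0))  # 0.5: shifting the mean by 1 costs mu^2 / 2
```

The KL term pulls each node's posterior toward $\mathcal{N}(0, 1)$, acting as a regularizer on the embedding space.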

### Training the VGAE

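```python
from torch_geometric.utils import negative_sampling

def train():
    model.train()
    optimizer.zero_grad()
    # Encode with the training edges only and sample latent embeddings
    mu, logstd = model.encode(data.x, data.train_pos_edge_index)
    z = model.reparameterize(mu, logstd)
    # Positive edges: observed training edges should score close to 1
    pos_pred = model.decode(z, data.train_pos_edge_index)
    # Negative edges: random node pairs should score close to 0. Without this
    # term the loss is minimized trivially by predicting 1 for every pair.
    neg_edge_index = negative_sampling(
        edge_index=data.train_pos_edge_index,
        num_nodes=data.num_nodes,
        num_neg_samples=data.train_pos_edge_index.size(1),
    )
    neg_pred = model.decode(z, neg_edge_index)
    # Reconstruction loss: binary cross-entropy on positive and negative edges
    pos_loss = -torch.log(pos_pred + 1e-15).mean()
    neg_loss = -torch.log(1 - neg_pred + 1e-15).mean()
    # KL divergence to a standard normal prior, scaled by 1 / num_nodes
    kl_loss = -0.5 / data.num_nodes * torch.mean(
        torch.sum(1 + 2 * logstd - mu**2 - torch.exp(2 * logstd), dim=1))
    loss = pos_loss + neg_loss + kl_loss
    loss.backward()
    optimizer.step()
    return loss.item()

for epoch in range(100):
    loss = train()
    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss:.4f}')
```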

### Evaluating the VGAE

```python
from sklearn.metrics import roc_auc_score

model.eval()
with torch.no_grad():
    # In eval mode, reparameterize() returns the mean embedding mu
    z = model.reparameterize(*model.encode(data.x, data.train_pos_edge_index))

    # Held-out positive edges
    pos_edge_index = data.test_pos_edge_index
    pos_pred = model.decode(z, pos_edge_index)

    # Held-out negative edges created by train_test_split_edges. Unlike fresh
    # negative sampling against the training edges only, these are guaranteed
    # not to be edges of the original graph (e.g., not test positives).
    neg_edge_index = data.test_neg_edge_index
    neg_pred = model.decode(z, neg_edge_index)

    # AUC over the combined positive/negative scores
    preds = torch.cat([pos_pred, neg_pred]).cpu().numpy()
    labels = torch.cat([torch.ones(pos_pred.size(0)),
                        torch.zeros(neg_pred.size(0))]).cpu().numpy()
    auc = roc_auc_score(labels, preds)
    print(f'Test AUC: {auc:.4f}')
```

**Results:**

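- The printed test AUC measures how well the VGAE ranks held-out true edges above non-edges: 0.5 corresponds to random guessing and 1.0 to a perfect ranking. The exact value varies between runs because of the random edge split, initialization, and sampling.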
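ROC AUC has a simple probabilistic reading: it is the probability that a randomly chosen positive (true) edge receives a higher score than a randomly chosen negative edge, with ties counted as one half. The pure-Python sketch below computes AUC directly from that definition on made-up scores. It takes time proportional to the number of positive/negative pairs; `sklearn.metrics.roc_auc_score`, used in the evaluation cell, computes the same quantity from the area under the ROC curve without that pairwise loop.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC = P(score of random positive > score of random negative),
    counting ties as 1/2. O(P * N) time, fine for small examples."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

pos = [0.9, 0.8, 0.4]  # hypothetical scores for held-out true edges
neg = [0.7, 0.3, 0.2]  # hypothetical scores for non-edges
print(pairwise_auc(pos, neg))  # 8 of 9 pairs ranked correctly, i.e. 8/9
```

Because AUC depends only on the ranking of scores, applying the sigmoid to the decoder outputs does not change it.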

<a id="4"></a>
# 4. Latest Developments in GNNs

<a id="4.1"></a>
## 4.1 Graph Attention Networks (GAT)

Graph Attention Networks [[3]](#ref3) introduce attention mechanisms to GNNs, allowing nodes to weigh the importance of their neighbors.

### GAT Layer Definition

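The attention coefficient between node $i$ and its neighbor $j$ is:

$$
\alpha_{ij} = \frac{\exp\left( \text{LeakyReLU}\left(a^{\top}[W h_i \parallel W h_j]\right) \right)}{\sum_{k \in \mathcal{N}(i)} \exp\left( \text{LeakyReLU}\left(a^{\top}[W h_i \parallel W h_k]\right) \right)}
$$

Each node's new representation is then the attention-weighted sum of its neighbors' transformed features:

$$
h_i' = \sigma\left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij} W h_j \right)
$$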

Where:

- $h_i$: Feature vector of node $i$.
- $W$: Weight matrix.
- $a$: Learnable attention vector.
- $\parallel$: Concatenation operator.
- $\mathcal{N}(i)$: Neighbors of node $i$; in the GAT paper's experiments this includes $i$ itself.

### Intuition

- Nodes attend over their neighbors, assigning higher weights to more important ones.
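A minimal single-head sketch of GAT attention in pure Python: it computes the scores $\text{LeakyReLU}(a^\top[W h_i \parallel W h_j])$, normalizes them with a softmax over each neighborhood to get $\alpha_{ij}$, and returns the $\alpha$-weighted sum of the neighbors' transformed features $W h_j$. Everything concrete here is an illustrative assumption: the toy graph, `W`, `a`, a single attention head, and no final nonlinearity. The LeakyReLU negative slope of 0.2 follows the GAT paper.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gat_layer(h, neighbors, W, a):
    """Single-head GAT step without a final nonlinearity.

    h: dict node -> feature list; neighbors: dict node -> list of neighbors
    (including the node itself); W: weight matrix; a: attention vector of
    length 2 * output_dim.
    """
    Wh = {v: matvec(W, x) for v, x in h.items()}
    out, alphas = {}, {}
    for i, ns in neighbors.items():
        # Unnormalized scores e_ij = LeakyReLU(a^T [W h_i || W h_j])
        e = {j: leaky_relu(sum(ak * zk for ak, zk in zip(a, Wh[i] + Wh[j])))
             for j in ns}
        # Softmax over the neighborhood (max subtracted for stability) -> alpha_ij
        m = max(e.values())
        exp_e = {j: math.exp(s - m) for j, s in e.items()}
        total = sum(exp_e.values())
        alphas[i] = {j: x / total for j, x in exp_e.items()}
        # Attention-weighted sum of transformed neighbor features
        dim = len(Wh[i])
        out[i] = [sum(alphas[i][j] * Wh[j][k] for j in ns) for k in range(dim)]
    return out, alphas

h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}  # self-loops included
W = [[1.0, 0.0], [0.0, 1.0]]  # identity, for readability
a = [0.0, 0.0, 1.0, 0.0]      # scores depend only on the first entry of W h_j
out, alphas = gat_layer(h, neighbors, W, a)
print(alphas[1])  # nodes 0 and 2 get equal weights, larger than node 1's
```

With an all-zero attention vector every score is equal and the layer reduces to a plain mean over the neighborhood; learning `a` is what lets GAT depart from uniform weighting.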

<a id="4.2"></a>
## 4.2 Graph Isomorphism Networks (GIN)

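Graph Isomorphism Networks [[4]](#ref4) are designed to maximize the discriminative power of message-passing GNNs: they can tell apart neighborhood structures that mean- or max-based aggregators map to the same representation.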

### GIN Layer Definition

$$
h_v^{(k)} = \text{MLP}^{(k)} \left( (1 + \epsilon^{(k)}) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right)
$$

Where:

- $\text{MLP}^{(k)}$: Multi-layer perceptron at layer $k$.
- $\epsilon^{(k)}$: Learnable or fixed scalar.

### Intuition

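- Xu et al. show that message-passing GNNs are at most as powerful as the Weisfeiler-Lehman (WL) graph isomorphism test at distinguishing graphs, and that GIN, whose sum aggregation followed by an MLP can represent injective functions on multisets of neighbor features, reaches this upper bound.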
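The choice of sum aggregation is what makes this work. In the pure-Python sketch below (scalar features, $\epsilon = 0$ by default, and a hypothetical fixed affine function standing in for the trained MLP), two nodes whose neighborhoods are the multisets $\{1, 1\}$ and $\{1\}$ receive identical representations when the neighbor sum is replaced by a mean, but different ones under the GIN update. This mirrors the argument in Xu et al. that mean and max aggregators are not injective on multisets.

```python
def gin_update(h_v, neighbor_feats, mlp, eps=0.0):
    """GIN update: MLP((1 + eps) * h_v + sum of neighbor features)."""
    return mlp((1 + eps) * h_v + sum(neighbor_feats))

def mean_update(h_v, neighbor_feats, mlp):
    """Same structure, but with the mean of the neighbor features."""
    return mlp(h_v + sum(neighbor_feats) / len(neighbor_feats))

def mlp(x):
    """Hypothetical stand-in for a trained MLP; any fixed function works here."""
    return 2.0 * x + 1.0

# Two nodes with the same own feature but different neighbor multisets
h_a, nbrs_a = 1.0, [1.0, 1.0]  # two neighbors with feature 1
h_b, nbrs_b = 1.0, [1.0]       # one neighbor with feature 1

print(mean_update(h_a, nbrs_a, mlp), mean_update(h_b, nbrs_b, mlp))  # 5.0 5.0
print(gin_update(h_a, nbrs_a, mlp), gin_update(h_b, nbrs_b, mlp))    # 7.0 5.0
```

The mean sees only the distribution of neighbor features, so it cannot count how many neighbors share a feature; the sum preserves that count.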

<a id="5"></a>
# 5. Conclusion

Graph Neural Networks extend deep learning to graph-structured data by letting each node update its representation from its neighbors' features. We explored GCNs for node classification, a variational graph autoencoder (VGAE) for link prediction, and two more advanced architectures, GAT and GIN. The field continues to evolve rapidly, with ongoing research extending what GNNs can model.

<a id="6"></a>
# 6. References

1. <a id="ref1"></a>Kipf, T. N., & Welling, M. (2016). *Semi-Supervised Classification with Graph Convolutional Networks*. [arXiv:1609.02907](https://arxiv.org/abs/1609.02907)
2. <a id="ref2"></a>Gilmer, J., et al. (2017). *Neural Message Passing for Quantum Chemistry*. [arXiv:1704.01212](https://arxiv.org/abs/1704.01212)
3. <a id="ref3"></a>Veličković, P., et al. (2017). *Graph Attention Networks*. [arXiv:1710.10903](https://arxiv.org/abs/1710.10903)
4. <a id="ref4"></a>Xu, K., et al. (2018). *How Powerful are Graph Neural Networks?* [arXiv:1810.00826](https://arxiv.org/abs/1810.00826)

---

This notebook provides an in-depth exploration of Graph Neural Networks. You can run the code cells to see how GNNs are implemented and experiment with different architectures and datasets.