# Assignment 4: Graph Neural Networks

Contact: [William Cappelletti](mailto:william.cappelletti@epfl.ch), [Ying Cao](mailto:ying.cao@epfl.ch)

Note that the classification sections are interchangeable, which means you do not have to run them sequentially to proceed.
You can run each section independently, and only train the corresponding models.

## Setup

In [1]:
#!pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.11.0+cu113.html

#!pip install torchmetrics

In [2]:
from typing import Callable, List, Optional

import networkx as nx
import torch
import torchmetrics
import torch_geometric as pyg
from torch import nn
from torchvision import transforms
from torchvision.datasets import MNIST
from torch_geometric.data import Dataset, Data
from torch_geometric.loader import DataLoader
from torch_geometric.datasets import GNNBenchmarkDataset, Planetoid
from torch_geometric.utils import from_networkx, to_networkx, get_laplacian
from torch_geometric.nn.conv import MessagePassing
from tqdm.notebook import tqdm

import matplotlib.pyplot as plt

To activate a GPU in Colab, open the *Runtime* drop-down menu and click *Change runtime type*, then choose GPU as the hardware accelerator.

In [3]:
if torch.cuda.is_available():
    device = 'cuda'
else:
    print("No GPU :(")
    device = 'cpu'


## Introduction

Prerequisite: [Tutorial 3]()

### Learning outcomes

- Implement a GNN, from convolution to pooling layers
- Use loss functions and training loops to train NNs on graph and node tasks
- Understand the usual deep learning pipeline

### Description

In this assignment we will implement and test the building blocks of Graph Neural Networks.
We will see how different elements work together to create embeddings out of network data.
Then, we shall use those embeddings to solve a few tasks.

The idea is to define a few (at least two) different embedding blocks and use them as the core of GNNs for the main network problems, such as node and graph classification.

1. Graph classification: Graph MNIST
2. Node classification: Cora citation dataset


### Structure of the notebook:

#### 1. Graph convolution

1. Implement the *Laplacian-polynomial convolution*
2. *Test it* on a sample graph and comment on input/output

#### 2. Node embedding

- *Define modules* to compute node embeddings.
    These will be the building blocks of the specific GNNs.

#### 3. For each of the two tasks

##### 3.1 Pooling (completing the GNN)

- Define one GNN for each task. Each will use the node-embedding modules as inner blocks and have its own readout layer.

##### 3.2 Training

- Two training loops shall be implemented: graph classification iterates over distinct graphs, while node classification works on a single graph and uses disjoint masks to extract the train/validation/test splits.

##### 3.3 Evaluation

- Write an evaluation loop
- Compute training (and validation) accuracy

##### 3.4 Discussion

- Evaluate your architectures
- Try some hyperparameters
- Compare.

This exercise will be an open question where you discuss how different models and parameters perform.
It should include at least the following points:

- *Learning rate*: how does it affect training and evaluation performances and why?
- *Embedding dimension*: does the performance improve with size?
- *Depth*:
    - How does the number of layers affect fitting and generalization?
    - How does depth affect the information flow over a graph?

### Expected output

You will have coding and theoretical questions. Coding exercises shall be solved within the specified space:
```python
# Your solution here ###########################################################
...
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
Anything outside shall not be touched, except if otherwise stated.

Theoretical questions shall be answered between the two horizontal lines (`---`) following the statement.
Take care to leave a blank line before the triple dash, or Markdown will interpret the preceding line as a title.

For instance:

---
*Your answer here*

---

## Datasets

Nothing to do here, just defining a dataset that we will use later 😄

In [4]:
class GraphMNIST(Dataset):
    def __init__(
        self,
        root: str,
        dist_p: float = 2,
        train: bool = True,
        download: bool = False,
    ):
        super().__init__()
        self.dist_p = dist_p
        self.img_size = 28

        self._mnist_transform = transforms.ToTensor()
        self._mnist = MNIST(root, transform=self._mnist_transform, train=train, download=download)

        self._pos_perm = torch.tensor([[0.,-1.], [1.,0.]], dtype=torch.float) 
        self._pos_off = torch.tensor([0., self.img_size], dtype=torch.float)
    
    def get(self, index: int) -> Data:
        x, y = self._mnist[index]
        x = x[0]

        nz = x.nonzero()
        # x = x[nz[:,0], nz[:,1]]

        nz = nz.to(torch.float)

        # Manhattan ngbs
        # edge_index = (torch.cdist(nz, nz, p=1) == 1).nonzero().T
        # Euclidean ngbs
        cdists = torch.cdist(nz, nz, p=self.dist_p)
        edge_index = ((0 < cdists) & (cdists < 2)).nonzero().T

        # We force pos to be between -1 and 1
        pos = (
            nz @ self._pos_perm
            + self._pos_off
        ) / (self.img_size / 2) - 1 

        return Data(
            x=pos,
            edge_index=edge_index,
            y=y,
        )

    def len(self) -> int:
        return len(self._mnist)
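
To make the graph construction in `GraphMNIST.get` more concrete, here is a minimal sketch that applies the same steps to a toy image. The 4x4 image `img` and the variable `img_size` are assumptions made for illustration; the real dataset uses 28x28 MNIST digits.

```python
import torch

# Toy 4x4 "image" (an assumption for illustration) with a 2x2 block of non-zero pixels.
img_size = 4
img = torch.zeros(img_size, img_size)
img[1:3, 1:3] = 1.0

# Nodes are the non-zero pixels, identified by their (row, column) coordinates.
nz = img.nonzero().to(torch.float)

# Two distinct pixels are linked when their Euclidean distance is below 2,
# i.e. when they touch horizontally, vertically or diagonally.
cdists = torch.cdist(nz, nz, p=2)
edge_index = ((0 < cdists) & (cdists < 2)).nonzero().T  # shape (2, num_edges)

# Same rotation, offset and rescaling as in the dataset class:
# (row, col) becomes (col, img_size - row), then is mapped into [-1, 1].
pos_perm = torch.tensor([[0., -1.], [1., 0.]])
pos_off = torch.tensor([0., float(img_size)])
pos = (nz @ pos_perm + pos_off) / (img_size / 2) - 1

print(nz.shape[0], "nodes,", edge_index.shape[1], "directed edges")
print(pos)
```

Each undirected link appears twice in `edge_index`, once per direction, which is the convention PyTorch Geometric expects.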


## 1. Graph convolution

In this section we will implement and analyze a particular example of convolution on graphs, building on what you have seen in class.

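The **Laplacian Polynomial convolution** of order $K$ is defined as
$$ f(X) = \sum_{i=0}^{K} L^i X \theta_i, $$
where $L$ is the graph Laplacian, $X$ is the matrix of node features, and the $\theta_i$ are learnable weights.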
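
As a quick sanity check of the formula, the sketch below evaluates the polynomial with dense matrices on an 8-cycle. The helper names `cycle_laplacian` and `laplacian_polynomial` and the all-ones weights are assumptions made for illustration; this is not the layer you are asked to implement.

```python
import torch


def cycle_laplacian(n: int) -> torch.Tensor:
    """Combinatorial Laplacian L = D - A of an n-cycle, as a dense matrix."""
    idx = torch.arange(n)
    A = torch.zeros(n, n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return torch.diag(A.sum(dim=1)) - A


def laplacian_polynomial(L: torch.Tensor, X: torch.Tensor, thetas) -> torch.Tensor:
    """Evaluate sum_i L^i X theta_i, with thetas[i] of shape (in_features, out_features)."""
    out = torch.zeros(X.shape[0], thetas[0].shape[1])
    LiX = X  # L^0 X
    for theta in thetas:
        out = out + LiX @ theta
        LiX = L @ LiX  # move on to the next power of L
    return out


L = cycle_laplacian(8)
X = torch.zeros(8, 1)
X[0, 0] = 1.0  # signal concentrated on node 0

# Illustrative weights theta_0, theta_1, theta_2 (all ones), i.e. a degree-2 polynomial.
thetas = [torch.ones(1, 1) for _ in range(3)]
out = laplacian_polynomial(L, X, thetas)

# Nodes whose output is non-zero: the source and everything within two hops.
reached = out.flatten().nonzero().flatten().tolist()
print(reached)
```

With a degree-2 polynomial, only node 0 and the nodes at most two hops away from it receive a non-zero value.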

### Exercise

It is time to implement the convolution as a neural network layer, to be used with PyTorch Geometric data.

Our layer will be a subclass of [`torch_geometric.nn.conv.MessagePassing`][messagepass], itself a subclass of [`torch.nn.Module`][nn.module] that implements message propagation in graph neural networks.
As we have seen in the tutorial, we only need to implement the `__init__` and the `forward` methods.
The forward method must follow the PyTorch Geometric API, i.e. it should expect features `x` and the edge list from the `Data` class.

We provide the backbone of the class; it's your turn to populate the methods!

[messagepass]: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_gnn.html#the-messagepassing-base-class
[nn.module]: https://pytorch.org/docs/stable/generated/torch.nn.Module.html

In [6]:
class LaplacianConv(MessagePassing):
    def __init__(self, in_features, out_features, K) -> None:
        """ Convolution defined as a polynomial of the Laplacian.

        Arguments:
            in_features (int): dimension of the input features
            out_features (int): dimension of the output features
            K (int): order of the Laplacian polynomial
        """
        super().__init__(aggr="add")
        # Your solution here ###################################################
        self.in_features = in_features
        self.out_features = out_features
        self._K = K

        # Init weights: one matrix theta_i per power L^i, for i = 0, ..., K
        self.lins = nn.ModuleList([
            pyg.nn.Linear(in_features, out_features, bias=False,
                          weight_initializer='glorot') for _ in range(K + 1)
        ])
        # What about bias?

        #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor):
        # Your solution here ###################################################
        # hint: you may use the get_laplacian function and the propagate method from the torch_geometric package.
        # feel free to use other methods without the need to use MessagePassing

        # Combinatorial Laplacian L = D - A as a weighted edge list
        # (the self-loops added by get_laplacian carry the node degrees).
        lap_index, lap_weight = get_laplacian(edge_index, num_nodes=x.size(0))

        Lx = x  # L^0 x
        out = self.lins[0](Lx)

        # propagate_type: (x: Tensor, norm: Tensor)
        for lin in self.lins[1:]:
            # Each propagation multiplies by L once more: L^i x = L (L^{i-1} x)
            Lx = self.propagate(lap_index, x=Lx, norm=lap_weight, size=None)
            out = out + lin(Lx)

        return out
        #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    def message(self, x_j, norm):
        # Your solution here ###################################################
        # Weight each neighbor's features by the corresponding Laplacian entry
        return norm.view(-1, 1) * x_j
        #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

### Exercise

Let's study the effect of this convolution on a sample input. Run the following cell a few times, playing with the parameters. You should understand how this convolution affects each node representation and answer the following questions with your intuitions.

In [None]:
NB_NODES = 8
G = nx.cycle_graph(n=NB_NODES)
pos = nx.circular_layout(G)

sample_cycle = from_networkx(G)
_x = torch.zeros((NB_NODES, 1))
_x[0,0] = 1
sample_cycle.x = _x

fig, axes = plt.subplots(2, 4, figsize=(12,6))
nx.draw(G, pos=pos, node_color=_x.flatten().tolist(), ax=axes[0,0], vmin=-1, vmax=1)

L = nx.laplacian_matrix(G)
axes[1,0].spy(L**0)
axes[1,0].set(title="$L^0$", xticks=[], yticks=[])

for k in range(1, 4):
    conv = LaplacianConv(1, 1, k)
    _x = conv(sample_cycle.x, sample_cycle.edge_index)

    nx.draw(G, pos=pos, node_color=_x.flatten().tolist(), ax=axes[0, k], vmin=-1, vmax=1)
    axes[0, k].set(title=f"Conv(k={k})")

    axes[1,k].spy(L**k)
    axes[1,k].set(title=f"$L^{k}$", xticks=[], yticks=[])

plt.show()

The figure above shows, in the first row, the effect of the convolution you coded on a simple input graph (an 8-cycle with its signal concentrated in a single point). 
The second row shows various powers of the Laplacian.

**Question:** What does the degree of the Laplacian polynomial represent, in terms of information propagation?

---
The degree of the Laplacian polynomial defines how far information propagates from each node within a single graph convolution layer. For instance, K=2 means that each convolution takes into account the 1- and 2-hop neighbors when propagating the message.

---

**Question:** Explain in your own words what is the effect of this convolution.


---
*Your solution here*

---

## 2. Node embeddings

As you have seen in class, GNNs are parametrized functions of node features.
The output of a GNN layer is almost always a new set of features for the same
graph. By composing layers one after the other, our function becomes more
expressive.
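
To illustrate how layers compose, the sketch below stacks two dense layers on the 8-cycle. `DenseGraphLayer` is a hypothetical layer written for this example (it combines each node's own features with the sum of its neighbors' features); it is not part of PyTorch Geometric and only hints at what a real block could look like.

```python
import torch
from torch import nn


class DenseGraphLayer(nn.Module):
    """Hypothetical layer: mixes a node's features with the sum of its neighbors' features."""

    def __init__(self, in_dim: int, out_dim: int) -> None:
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # adj @ x sums the features of each node's neighbors
        return self.lin_self(x) + self.lin_neigh(adj @ x)


# Dense adjacency matrix of an 8-cycle
n = 8
idx = torch.arange(n)
adj = torch.zeros(n, n)
adj[idx, (idx + 1) % n] = 1.0
adj[(idx + 1) % n, idx] = 1.0

x = torch.zeros(n, 1)
x[0, 0] = 1.0

layer1 = DenseGraphLayer(1, 16)
layer2 = DenseGraphLayer(16, 16)

# Each layer returns new features for the same nodes, so layers can be chained.
h = layer1(x, adj).relu()
h = layer2(h, adj).relu()
print(h.shape)
```

The number of rows (nodes) is preserved from layer to layer; only the feature dimension changes.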


### Exercise

In the following cells, define at least two GNN blocks based on the PyTorch Geometric API. 
You can add some hyperparameters to the constructor, to allow for more flexibility.


In [None]:
class GNNBlock1(nn.Module):
    def __init__(self, nb_features: int, embedding_dim: int) -> None:
        super().__init__()

        # Your solution here ###################################################
        # Feel free to add arguments if you need them, but provide default
        # values so that our examples still work
        self.conv1 = pyg.nn.GraphConv(...)
        ...

        #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    def forward(self, x, edge_index):
        # Your solution here ###################################################
        x = self.conv1(x, edge_index).relu()
        ...

        #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

        return x



In [None]:
class GNNBlock2(nn.Module):
    # Your solution here #######################################################
    ...

    #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's test our GNN blocks on the sample cycle we defined before.
We choose an embedding dimension of 16, and thus we expect the output to be of shape `(8, 16)`, which represents the embedding of the eight nodes.

In [None]:
gnn_block1 = GNNBlock1(nb_features=1, embedding_dim=16)

out = gnn_block1(sample_cycle.x, sample_cycle.edge_index)
assert out.shape == (8,16)

## 3. Graph Classification

In this section we will use the two GNN blocks that we just defined to build a graph classifier.
First, we construct the model, then we define a training loop, and finally we evaluate it.

### Exercise : Pooling

Complete the following module by pooling the node representations with one of the methods defined in [`torch_geometric.nn`][pyg.nn].
We already imported `torch_geometric` as `pyg`.

[pyg.nn]: https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html
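
To build intuition for what a readout does, here is a sketch of mean pooling written in plain PyTorch. The helper `mean_pool` and the toy tensors are assumptions made for illustration; in the exercise you would rather pick a ready-made operator such as `pyg.nn.global_mean_pool`.

```python
import torch


def mean_pool(x: torch.Tensor, batch: torch.Tensor, num_graphs: int) -> torch.Tensor:
    """Average node features per graph; batch[i] is the graph index of node i."""
    sums = torch.zeros(num_graphs, x.shape[1]).index_add_(0, batch, x)
    counts = torch.bincount(batch, minlength=num_graphs).clamp(min=1).unsqueeze(1)
    return sums / counts


# Three nodes in total: the first two belong to graph 0, the last one to graph 1.
x = torch.tensor([[1., 2.], [3., 4.], [10., 20.]])
batch = torch.tensor([0, 0, 1])

pooled = mean_pool(x, batch, num_graphs=2)
print(pooled)  # one row per graph
```

The `batch` vector plays the same role as the one provided by PyTorch Geometric's `DataLoader`: it tells the pooling layer which nodes belong to which graph.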

In [None]:
class GraphClassifier(nn.Module):
    def __init__(self, gnn_block: nn.Module, embedding_dim: int, num_classes: int) -> None:
        super().__init__()
        self.gnn_block = gnn_block
        self.classifier = nn.Sequential(
            nn.Linear(embedding_dim, embedding_dim),
            nn.ReLU(),
            nn.Linear(embedding_dim, num_classes),
        )
        # Your solution here ###################################################
        self.pooling = ...

        #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    
    def forward(self, x, edge_index, batch) -> torch.Tensor:
        x = self.gnn_block(x, edge_index)
        x = self.pooling(x, batch)

        return self.classifier(x)

Again, let's verify the output of the graph classifier on the sample cycle that we defined above.
The following cell defines a graph classifier using the `GNNBlock1`, to predict over two classes.
We expect the output to be of shape `(1,2)`, which represents the logits for two classes of a single graph.


In [None]:
EMBEDDING_DIM = 5

graph_gnn = GraphClassifier(
    GNNBlock1(sample_cycle.x.shape[1], EMBEDDING_DIM),
    embedding_dim=EMBEDDING_DIM,
    num_classes=2,
)

out = graph_gnn(sample_cycle.x, sample_cycle.edge_index, batch=None)
print(out)
assert out.shape == (1,2)


### Exercise : Training

Having defined the model, we now have to train it for some task.
We will work with a graph version of the MNIST handwritten-digit dataset.
Each sample is a graph given by adjacent colored pixels (the nodes), as you can see in the image below.

In [None]:
dataset_tr = GraphMNIST(".", dist_p=2, train=True, download=True)        
dataset_te = GraphMNIST(".", dist_p=2, train=False, download=True)        

In [None]:
fig, ax = plt.subplots(1,5, figsize=(26,4))
for i in range(5):
    g = dataset_tr[i]
    G = to_networkx(g, to_undirected=True)

    nx.draw(
        G,
        pos=g.x.numpy(),
        node_shape='.',
        ax=ax[i],
    )


Define a function to train graph-level tasks.
It should take as input a model, a data loader, a loss function, an optimizer and a number of epochs.

Then, complete the next cell by declaring all missing variables.
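
For reference, the generic shape of a supervised training loop is sketched below on plain tensors. The function `train_sketch`, the toy data and the hyperparameters are assumptions made for illustration. With a PyTorch Geometric loader, each batch is a single object exposing `x`, `edge_index`, `batch` and `y`, so the forward call has to be adapted.

```python
import torch
import torch.utils.data as tud
from torch import nn


def train_sketch(model, loader, loss_fn, optimizer, nb_epochs, device="cpu"):
    """Generic loop: forward pass, loss, backward pass, parameter update."""
    model.to(device)
    model.train()
    history = []
    for _ in range(nb_epochs):
        epoch_loss = 0.0
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()                    # reset accumulated gradients
            loss = loss_fn(model(inputs), targets)   # forward pass and loss
            loss.backward()                          # backpropagation
            optimizer.step()                         # parameter update
            epoch_loss += loss.item() * len(targets)
        history.append(epoch_loss / len(loader.dataset))
    return history


# Toy, linearly separable problem (not graph data).
torch.manual_seed(0)
features = torch.randn(64, 4)
labels = (features[:, 0] > 0).long()
toy_loader = tud.DataLoader(tud.TensorDataset(features, labels), batch_size=16, shuffle=True)

toy_model = nn.Linear(4, 2)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=0.1)
history = train_sketch(toy_model, toy_loader, nn.CrossEntropyLoss(), toy_optimizer, nb_epochs=5)
print(history)
```

Tracking the average loss per epoch, as done in `history`, is a simple way to check that the optimization is making progress.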

In [None]:
def train(
    model: nn.Module,
    loader: DataLoader,
    loss_fn: nn.Module,
    optimizer: torch.optim.Optimizer,
    nb_epochs: int,
):
    # Your solution here #######################################################
    ...

    #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


In [None]:
# Your solution here ###########################################################
BATCH_SIZE = 64

train_loader = ...

model = GraphClassifier(
    ...
)

loss_fn = ...

optimizer = ...

nb_epochs = ...

train(model, train_loader, loss_fn, optimizer, nb_epochs)

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

### Exercise : Evaluation

Let's evaluate the trained model(s).
For this purpose we can use metrics from the [TorchMetrics][TorchMetrics] package. 

Let's write a function that evaluates a trained model for a given metric, over a certain data loader.
Fill the body in the next cell.

[TorchMetrics]: https://torchmetrics.readthedocs.io/
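
TorchMetrics objects accumulate statistics batch by batch with `update` and return the final value with `compute`. The class `RunningAccuracy` below is a hypothetical, minimal stand-in written in plain PyTorch to illustrate that pattern; it is not the TorchMetrics implementation.

```python
import torch


class RunningAccuracy:
    """Hypothetical minimal metric following the update/compute pattern."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, logits: torch.Tensor, target: torch.Tensor) -> None:
        preds = logits.argmax(dim=-1)  # predicted class per sample
        self.correct += (preds == target).sum().item()
        self.total += target.numel()

    def compute(self) -> float:
        return self.correct / max(self.total, 1)


metric = RunningAccuracy()
# First batch: both predictions are correct.
metric.update(torch.tensor([[2.0, 0.1], [0.3, 1.5]]), torch.tensor([0, 1]))
# Second batch: only the second prediction is correct.
metric.update(torch.tensor([[0.2, 0.9], [1.1, 0.4]]), torch.tensor([0, 0]))

accuracy = metric.compute()
print(accuracy)
```

Accumulating counts rather than averaging per-batch accuracies gives the right result even when the last batch is smaller than the others.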

In [None]:
def evaluate(model: nn.Module, metric: torchmetrics.Metric, loader: DataLoader):
    model.eval()  # Deactivate dropout
    with torch.no_grad():
        # Your solution here ###################################################
        ...

        #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    return metric.compute().item()


Now, let's compute the training accuracy.

In [None]:
accuracy_fn_tr = torchmetrics.Accuracy().to(device)
print("Train accuracy:", evaluate(model, accuracy_fn_tr, train_loader))


### Exercise: Discussion

At this point we should have a complete pipeline!

We can play with hyperparameters such as the *learning rate* and the *embedding dimension*.
Also, we can change the architecture of our model, for instance by changing the embedding block with `GNNBlock2`.

**Question**: How do different parameters affect your training scores? Which architecture is the best?

---
*Your answer here*

---


Now, let's test our final model!

In [None]:
test_loader = DataLoader(dataset_te, batch_size=BATCH_SIZE)
accuracy_fn_te = torchmetrics.Accuracy().to(device)

print("Test accuracy:", evaluate(model, accuracy_fn_te, test_loader))

## 4. Node classification

In this section, instead of having multiple graphs with a label each, we have a single graph and node-wise annotations.
We will work with a subset of the Cora dataset, available in [`torch_geometric.datasets.Planetoid`][planetoid].

[planetoid]: https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html#torch_geometric.datasets.Planetoid

In [None]:
dataset = Planetoid(".", "Cora")

Each node is a scientific paper of one of 7 classes.
Links are given by citations.
The graph has 2708 nodes with 1433 features.
Feature vectors represent the embedding of the content of each paper.

In [None]:
data = dataset[0]
print(data)
print(data.num_node_features)


### Exercise : Embedding

Similarly to the previous section, we will define a model containing our GNN block.
The main difference will be in the pooling layer, since now we want to extract the representation of each node.

Think about the GNN block. What is the shape of its output? What does it represent?

In the following cell, you will find a class that lacks its body; fill it in.

In [None]:
class NodeClassifier(nn.Module):
    # Your solution here #######################################################

    def __init__(self, gnn_block: nn.Module, embedding_dim: int, num_classes: int) -> None:
        super().__init__()
        ...

    
    def forward(self, x, edge_index) -> torch.Tensor:
        ...

        return x

    #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

### Exercise : Training

As we have seen in the tutorial, to split the dataset into train and test sets we have to use masks.
Luckily, standard datasets come with predefined splits, which we can access with `data.train_mask` and `data.test_mask`.

As before, write a function that trains the model on a given graph, by using the training mask.
Then, use this function to train a node classifier on the Cora graph.
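
The effect of a mask on the loss can be checked on a toy example. In the sketch below, the tensors `logits`, `y` and the two masks are assumptions made for illustration; on Cora you would use the model output together with `data.y` and `data.train_mask`.

```python
import torch
from torch import nn

torch.manual_seed(0)
num_nodes, num_classes = 6, 3

# Stand-in for the model output: one row of logits per node.
logits = torch.randn(num_nodes, num_classes, requires_grad=True)
y = torch.tensor([0, 1, 2, 0, 1, 2])

# Boolean masks select disjoint subsets of the nodes of the same graph.
train_mask = torch.tensor([True, True, True, False, False, False])
test_mask = ~train_mask

# The loss is computed on the training nodes only.
loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits[train_mask], y[train_mask])
loss.backward()

# Per-node gradient magnitude: zero for every node outside the training mask.
grad_rows = logits.grad.abs().sum(dim=1)
print(grad_rows)
```

The forward pass still runs on the whole graph, so held-out nodes contribute to message passing through their features, but their labels never enter the loss.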

In [None]:
def train(
    model: nn.Module,
    data: Data,
    loss_fn: nn.Module,
    optimizer: torch.optim.Optimizer,
    nb_epochs: int,
):
    # Your solution here #######################################################
    ...

    #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


In [None]:
# Your solution here ###########################################################
model = NodeClassifier(
    ...
)

loss_fn = ...

optimizer = ...

nb_epochs = ...

train(model, data, loss_fn, optimizer, nb_epochs)

#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

### Exercise: Evaluation

Again, we have to revisit our evaluation function: it should now accept a mask that selects the nodes to evaluate on.

In [None]:
def evaluate(model: nn.Module, metric: torchmetrics.Metric, data: Data, mask: torch.Tensor):
    model.eval()  # Deactivate dropout
    with torch.no_grad():
        # Your solution here ###################################################
        ...
        
        #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    return metric.compute().item()


Now, let's compute the accuracy on both the training and validation splits.

In [None]:
# Your solution here ###########################################################

accuracy_train = evaluate(model, ..., data, ...)
accuracy_val = ...

print("Train accuracy:", accuracy_train)
print("Val accuracy:", accuracy_val)
#^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

### Exercise: Discussion

Again, we have a complete pipeline. Play again with your hyperparameters and architecture, then answer the question.

**Question**: How do different parameters affect your training scores? Which architecture is the best?

---
*Your answer here*

---


Let's test our final model!

In [None]:
print("Test accuracy:", evaluate(model, torchmetrics.Accuracy().to(device), data, data.test_mask))