# Introduction
Welcome to Practical 3 for Graph Representation Learning.  In this practical, you are expected to first implement two functions from scratch: **Graph Mini-Batching** and **Global Pooling**. Then, you need to incorporate these two functions into a graph neural network model to solve a **Graph Classification** task.

- **Graph Mini-Batching** A mini-batch groups a set of graphs into a unified representation where it can efficiently be processed in parallel.
- **Global Pooling** Obtain the graph feature based on all node features in the graph, in which you can use different operations such as summation, mean and max.

We will be using [PyTorch](https://pytorch.org/docs/stable/index.html) and [PyG](https://pytorch-geometric.readthedocs.io/en/latest/) for experiments.

The notebook is divided into sections, each of which comes with complete or partially completed code. Before each snippet of code there will be a description of what we are about to implement. The sections of code you need to complete are marked as **Tasks**.

Please ensure that you operate within the framework given in the notebook and bring any questions you may have to the practical demonstrators. We suggest that you **DO NOT** edit code that is a part of the framework, since this will make it more difficult for demonstrators to assist if your code is broken.

Since we are working in a Jupyter Notebook, the code is very interactive. When you're stuck on something, try adding a new block of code below what you're working on and using it to debug your code.

## Installing dependencies
First of all, we advise you to enable GPU acceleration for your notebook. This can be done by navigating to `Runtime > Change runtime type > Hardware accelerator (GPU) > Save`. You may getting an error explaining that no GPUs are currently available. This is fine, you don't really need them for this practical, however they'll make your computations significantly faster.

Some other tips & tricks:
- press `Shift + Enter` to run a cell and move to the next one (`Ctrl + Enter` to only run it)
- when you execute a cell, the variables you create are saved into a global namespace. As a consequence, changes in the code will not take effect until you re-run that specific cell.
- remember to save your notebook every once in a while!

In [1]:
# Download the corresponding PyTorch Geometric module
# %%capture
!pip install git+https://github.com/pyg-team/pytorch_geometric.git

Collecting git+https://github.com/pyg-team/pytorch_geometric.git
  Cloning https://github.com/pyg-team/pytorch_geometric.git to /private/var/folders/4x/1dlwhbps0mgf3z6747y6zf6h0000gn/T/pip-req-build-534v54gc
  Running command git clone --filter=blob:none --quiet https://github.com/pyg-team/pytorch_geometric.git /private/var/folders/4x/1dlwhbps0mgf3z6747y6zf6h0000gn/T/pip-req-build-534v54gc
  Resolved https://github.com/pyg-team/pytorch_geometric.git to commit 46705844b39ededc0fcef1de90e73923480a6446
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone


# Imports

Run the following blocks of code to install and import and the necessary python packages.

In [2]:
# Let's first import all the things we are gonna need for this task
import torch
import time
import torch.nn as nn
import numpy as np
import random
import torch.nn.functional as F
from torch_geometric.utils import scatter
from torch_geometric.nn import MessagePassing
import torch_geometric.utils as U
# torch_geometric only used to load the Cora dataset
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import DataLoader
from torch_geometric.data.data import Data
import torch_geometric.utils as U
import torch
from sklearn.metrics import roc_auc_score
from torch_geometric.datasets import TUDataset


# Dataset

## Loading MUTAG from TUDataset
We will use TUDataset to load the MUTAG dataset, which contains molecular graphs of 188 chemical compounds divided into two classes according to their mutagenic effect on a bacterium.

We construct three lists: A, X, Y, which correspond to a list of adj_matrix, a list of node features and a list of graph labels, respectively.

Please **DO NOT** modify any part of the following block.

In [3]:
# please **DO NOT** modify  any part of the following code  in this cell
raw_dataset = TUDataset(root='data/TUDataset', name='MUTAG') # raw_dataset is an instance of class TUDataset
batch_size = 32
A = []
X = []
Y = []
# raw_dataset satisfies the Python iterable protocol (need to check details)
for graph in raw_dataset: 
    adj_matrix = U.to_dense_adj(graph.edge_index).squeeze(0)
    A.append(adj_matrix)
    X.append(graph.x)
    Y.append(graph.y)

**Notes on data structure**

The next code cell outputs `Data(edge_index=[2, 38], x=[17, 7], edge_attr=[38, 4], y=[1])`, the form of a single graph in this dataset.
- `edge_index=[2, 38]` is the edge list in COO (Coordinate) format where
  - $2$ indicates the rows: one row for the source nodes and one for the destination nodes;
  - $38$ is the total number of edges in the graph.
- `x=[17, 7]` is the node feature matrix where
  - $17$ is the number of nodes in this particular graph;
  - $7$ is the feature dimension of all nodes.
- `edge_attr=[38, 4]` is the edge feature matrix where
  - $38$ is the total number of edges in the graph;
  - $4$ is the feature dimension of all edges.
- `y=[1]` is the binary label of this graph.

In [4]:
for graph in raw_dataset:
    print(graph)
    break

Data(edge_index=[2, 38], x=[17, 7], edge_attr=[38, 4], y=[1])


We can then run the following cell to output the first one element in the three constructed lists.

In [5]:
print(A[0]) # a single adjacency matrix
print(X[0]) # a single matrix of node features
print(Y[0]) # a single number of graph label

tensor([[0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1.,

In [6]:
# Note that the initial feature dimension is 7
# the first number e.g. 17, 13, etc denote the number of nodes in this particular graph
print(A[0].size())
print(X[0].size())
print(X[1].size())
print(X[2].size())
print(X[187].size())

torch.Size([17, 17])
torch.Size([17, 7])
torch.Size([13, 7])
torch.Size([13, 7])
torch.Size([16, 7])


In [7]:
num_adj = len(A)
num_adj # molecular graphs of 188 chemical compounds divided into two classes

188

## Task 1
Define an iterator `graph_mini_batch` that takes in a list of adj_matrix (A), a list of node features (X), a list of graph labels (Y) and a batch_size B=64, and outputs four items $A_B$, $X_B$, $Y_B$ and $\textsf{Batch}$ each time, such that the batched Adjacency matrices  $A_B$ are stacked in a diagonal fashion (creating a giant graph that holds multiple isolated subgraphs), and the batched node features list $X_B$ and the batched graph label list $Y_B$ are simply concatenated in the node dimension, i.e.,

\begin{split}\mathbf{A_B} = \begin{bmatrix} \mathbf{A}_1 & & \\ & \ddots & \\ & & \mathbf{A}_n \end{bmatrix}, \qquad \mathbf{X_B} = \begin{bmatrix} \mathbf{X}_1 \\ \vdots \\ \mathbf{X}_n \end{bmatrix}, \qquad \mathbf{Y_B} = \begin{bmatrix} \mathbf{Y}_1 \\ \vdots \\ \mathbf{Y}_n \end{bmatrix}.\end{split}

Furthermore, you are expected to output a  **`Batch` vector**, which maps each node to its respective graph in the batch:

$$
\textrm{Batch} = [ 0, \ldots, 0, 1, \ldots, 1, 2, \ldots, n, \ldots, n]
$$

**Hints:**
1. use the keyword **yield** to make your function be an iterator.
2. note that the last batch might not have the enough items satisfying the specified batch size, so your function should be able to deal with such a case and have a correct output.

*More on the `Batch` vector*

Suppose we have three graphs, each with the following nodes:
- Graph 0: 3 nodes
- Graph 1: 2 nodes
- Graph 2: 4 nodes

Nodes:  [0, 1, 2, 3, 4, 5, 6, 7, 8]

Graphs: [0, 0, 0, 1, 1, 2, 2, 2, 2]

Graph 0:   ●---●---●      Batch = [0, 0, 0]

Graph 1:   ●---●          Batch = [1, 1]

Graph 2:   ●---●---●---●  Batch = [2, 2, 2, 2]



In [8]:
# Check list slicing
my_list = [1, 2, 3, 4]
new_list = my_list[0 : 2]
new_list

[1, 2]

In [9]:
# Check `torch.block_diag`
A_ = torch.tensor([[0, 1], [1, 0]])
B_ = torch.tensor([[3, 4, 5], [6, 7, 8]])
C_ = torch.tensor(7)
D_ = torch.tensor([1, 2, 3])
E_ = torch.tensor([[4], [5], [6]])
torch.block_diag(A_, B_, C_, D_, E_) # Note block_diag expects separate tensors and that's why later we use unpacking operator *

tensor([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 3, 4, 5, 0, 0, 0, 0, 0],
        [0, 0, 6, 7, 8, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 7, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 2, 3, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 4],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 6]])

In [10]:
# Check `torch.full`
tensor_a = torch.full((5,), 3.141592)
tensor_a

tensor([3.1416, 3.1416, 3.1416, 3.1416, 3.1416])

In [11]:
def graph_mini_batch(adj_matrix_list,  x_list,  y_list, batch_size=64):
     # Implement the function here
     num_graphs = len(adj_matrix_list)
     for i in range(0, num_graphs, batch_size):
        # Get the batch
        A_batch = adj_matrix_list[i : i + batch_size]
        X_batch = x_list[i : i + batch_size]
        Y_batch = y_list[i : i + batch_size]
        
        # Compute the batched adjacency matrix (block-diagonal)
        A_B = torch.block_diag(*A_batch) # Use unpacking operator * to unpact list into tensors
        
        # Concatenate node features and labels
        X_B = torch.cat(X_batch, dim=0) # Concatenate them vertically
        Y_B = torch.cat(Y_batch, dim=0)
        
        # Create the Batch vector
        batch_vector = torch.cat([
            torch.full((x.size(0),), idx, dtype=torch.long)  # Creates [idx, idx, ..., idx]
            for idx, x in enumerate(X_batch)
        ])
        
        # Yield the batch
        yield A_B, X_B, Y_B, batch_vector

In [12]:
# Test your batch function here
for adj_matrix, x, y, batch in graph_mini_batch(A, X, Y, batch_size):
     print(adj_matrix.size())
     print(x.size())
     print(y.size())
     print(batch.size())

torch.Size([585, 585])
torch.Size([585, 7])
torch.Size([32])
torch.Size([585])
torch.Size([583, 583])
torch.Size([583, 7])
torch.Size([32])
torch.Size([583])
torch.Size([589, 589])
torch.Size([589, 7])
torch.Size([32])
torch.Size([589])
torch.Size([590, 590])
torch.Size([590, 7])
torch.Size([32])
torch.Size([590])
torch.Size([498, 498])
torch.Size([498, 7])
torch.Size([32])
torch.Size([498])
torch.Size([526, 526])
torch.Size([526, 7])
torch.Size([28])
torch.Size([526])


## Task 2
Define a function `global_sum_pool` which takes a batch of node features and a Batch vector mapping each node to its respective graph in the batch, and outputs a batch of graph representation vectors by summing all node features in a graph.

**Hints:** You are allowed to use the function scatter from torch_scatter library. See [here](https://pytorch-scatter.readthedocs.io/en/latest/functions/scatter.html#torch_scatter.scatter) for a detailed introduction about the usage.

In [13]:
def global_sum_pool(x, batch):
   # Implement the function here
   # Use scatter to sum node features for each graph
   """
    Perform global sum pooling on a batch of graphs.

    Args:
        x (torch.Tensor): Node feature matrix of shape (N, F),
                          where N is the number of nodes in a batch, and F is the feature dimension.
        batch (torch.Tensor): Batch vector of shape (N,), mapping nodes to graphs.

    Returns:
        torch.Tensor: Graph-level representation matrix of shape (B, F),
                      where B is the number of graphs in the batch.
    """
   graph_representations = scatter(x, batch, dim=0, reduce='sum')
   return graph_representations

In [14]:
# Test your pooling function, assuming you are given a mini-batch of node features and a batch vector
# Note that 5 * 32 + 28 = 188
for _, x, _ , batch in graph_mini_batch(A, X, Y, batch_size):
      sum_graph_rep =  global_sum_pool(x, batch)
      print(sum_graph_rep.size())

torch.Size([32, 7])
torch.Size([32, 7])
torch.Size([32, 7])
torch.Size([32, 7])
torch.Size([32, 7])
torch.Size([28, 7])


Here is a more detailed explanation of the above `global_sum_pool` function to illustrate the idea.

For the input:
- Graph 0: Nodes $0,1 \rightarrow$ Sum features: $[1+4,2+5,3+6]=[5,7,9]$
- Graph 1: Nodes 2, $3 \rightarrow$ Sum features: $[7+10,8+11,9+12]=[17,19,21]$
- Graph 2: Node $4 \rightarrow$ Features remain unchanged: $[13,14,15]$

In [15]:
# Example node features (5 nodes, 3 features each)
x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0],
                  [10.0, 11.0, 12.0],
                  [13.0, 14.0, 15.0]])

# Example batch vector (5 nodes belonging to 3 graphs)
batch = torch.tensor([0, 0, 1, 1, 2])  # Graph 0: 2 nodes, Graph 1: 2 nodes, Graph 2: 1 node

# Perform global sum pooling
graph_representations = global_sum_pool(x, batch)

print("Graph Representations:\n", graph_representations)

Graph Representations:
 tensor([[ 5.,  7.,  9.],
        [17., 19., 21.],
        [13., 14., 15.]])


# Model
We now implement a GNN model, `GIN`, which is used to do the graph classification. **DO NOT** change the following block.

In [16]:
class GINConv(MessagePassing):
    def __init__(self, emb_dim):
        '''
            emb_dim (int): node embedding dimensionality
        '''
        super(GINConv, self).__init__(aggr = "add")

        self.mlp = torch.nn.Sequential(torch.nn.Linear(emb_dim, 2*emb_dim), torch.nn.BatchNorm1d(2*emb_dim), torch.nn.ReLU(), torch.nn.Linear(2*emb_dim, emb_dim))
        self.eps = torch.nn.Parameter(torch.Tensor([0]))

    def forward(self, x, edge_index):
        out = self.mlp((1 + self.eps) *x + self.propagate(edge_index, x=x))
        return out

    def message(self, x_j):
        return F.relu(x_j)

    def update(self, aggr_out):
        return aggr_out


### GNN to generate node embedding
class GIN(torch.nn.Module):
    """
    Output:
        node representations
    """
    def __init__(self, num_layer, emb_dim, hidden_dim, drop_ratio = 0, JK = "last", residual = False):
        '''
            emb_dim (int): node embedding dimensionality
            num_layer (int): number of GNN message passing layers
        '''
        super(GIN, self).__init__()
        self.num_layer = num_layer
        self.drop_ratio = drop_ratio
        self.JK = JK
        ### add residual connection or not
        self.residual = residual
        if self.num_layer < 2:
            raise ValueError("Number of GNN layers must be greater than 1.")

        self.embed = torch.nn.Linear(emb_dim, hidden_dim)
        self.convs = torch.nn.ModuleList()
        self.batch_norms = torch.nn.ModuleList()
        for layer in range(num_layer):
            self.convs.append(GINConv(hidden_dim))
            self.batch_norms.append(torch.nn.BatchNorm1d(hidden_dim))

    def forward(self, x, edge_index):
        h_list = [self.embed(x)]
        for layer in range(self.num_layer):
            h = self.convs[layer](h_list[layer], edge_index)
            h = self.batch_norms[layer](h)
            if layer == self.num_layer - 1:
                #remove relu for the last layer
                h = F.dropout(h, self.drop_ratio, training = self.training)
            else:
                h = F.dropout(F.relu(h), self.drop_ratio, training = self.training)

            if self.residual:
                h += h_list[layer]

            h_list.append(h)
        ### Different implementations of Jk-concat
        if self.JK == "last":
            node_representation = h_list[-1]
        elif self.JK == "sum":
            node_representation = 0
            for layer in range(self.num_layer + 1):
                node_representation += h_list[layer]

        return node_representation

### Define the hyperparameters we are going to use

In [17]:
params = {
    "input_features": 7,
    "hidden_features": 50,
    "num_layers": 3,
    "learning_rate": 1e-4,
    "weight_decay": 0,
    "num_epochs": 100,
    "num_classes": 2,
    "batch_size": 32
}


In [18]:
# Check dict value
params["num_layers"]

3

Now, we come to the most exciting part, that is, training and evaluating the model based on the graph mini-batch and the graph pooling functions you implemented above.

By checking the training time,  you can see the power of the graph batching by changing the batch_size, i.e., from 1 to 64.

In [19]:
begin_time = time.time()

model = GIN(params["num_layers"], params["input_features"], params["hidden_features"])
pooling = global_sum_pool
graph_pred_linear = torch.nn.Linear(params["hidden_features"], params["num_classes"])
loss_fn = nn.CrossEntropyLoss()
model_param_group = [{"params": model.parameters(), "lr": params["learning_rate"]}]
if graph_pred_linear is not None:
        model_param_group.append(
            {"params": graph_pred_linear.parameters(), "lr": params["learning_rate"]}
        )

optimizer = torch.optim.AdamW(model_param_group,
                                lr=params["learning_rate"],
                                weight_decay=params["weight_decay"])
for epoch in range(params["num_epochs"]):
  model.train()
  for adj_matrix, x, y, batch in graph_mini_batch(A, X, Y, params["batch_size"]):
      optimizer.zero_grad()
      edge_index = U.dense_to_sparse(adj_matrix)[0]
      nodes = model(x, edge_index)
      graph_reps = pooling(nodes, batch)
      pred = graph_pred_linear(graph_reps)
      loss = loss_fn(pred, y)
      loss.backward()
      optimizer.step()
  if epoch % 10 == 0:
      model.eval()
      correct = 0
      total_num = 0
      for adj_matrix, x, y, batch in graph_mini_batch(A, X, Y, params["batch_size"]):
        optimizer.zero_grad()
        edge_index = U.dense_to_sparse(adj_matrix)[0]
        nodes = model(x, edge_index)
        graph_reps = pooling(nodes, batch)
        pred = graph_pred_linear(graph_reps)
        correct += (pred.argmax(dim=-1) == y).sum()
        total_num += len(y)
      print("epoch={}, loss={}, accuracy={}".format(epoch, loss.item(), correct/total_num))
print("time={}".format(time.time()-begin_time))

epoch=0, loss=1.972123384475708, accuracy=0.664893627166748
epoch=10, loss=0.1613469272851944, accuracy=0.7765957713127136
epoch=20, loss=0.07435809075832367, accuracy=0.9095744490623474
epoch=30, loss=0.03826948627829552, accuracy=0.9095744490623474
epoch=40, loss=0.021557671949267387, accuracy=0.936170220375061
epoch=50, loss=0.013883644714951515, accuracy=0.9308510422706604
epoch=60, loss=0.009876693598926067, accuracy=0.9308510422706604
epoch=70, loss=0.007313788402825594, accuracy=0.9308510422706604
epoch=80, loss=0.005458567291498184, accuracy=0.9308510422706604
epoch=90, loss=0.004191061481833458, accuracy=0.936170220375061
time=9.226132869720459


### Task 3 (Optional)
This is an optional task. You are expected to implement 10-Fold Cross-validation. The general procedure is as follows:

1. Shuffle the dataset randomly.
2. Split the dataset into $k$ groups.
For each unique group:
  - Take the group as a hold out or test data set
  - Take the remaining groups as a training data set
  - Fit a model on the training set and evaluate it on the test set.
  Retain the evaluation score and discard the model
  - Summarize the skill of the model using the sample of model evaluation scores

In [20]:
# Check KFold
from sklearn.model_selection import KFold
import numpy as np

# Initialize random seed and dataset
np.random.seed(42)
number = np.random.permutation(10)

# Set up KFold
kf = KFold(n_splits=2, shuffle=False)

# Iterate through the splits and print training/test indices
for train_index, test_index in kf.split(number):
    print("Train indices:", train_index)
    print("Test indices:", test_index)


Train indices: [5 6 7 8 9]
Test indices: [0 1 2 3 4]
Train indices: [0 1 2 3 4]
Test indices: [5 6 7 8 9]


In [21]:
# implement your solutions here
from sklearn.model_selection import KFold
import numpy as np
import torch

def cross_validation(dataset, model_class, params, k=10):
    """
    Perform k-fold cross-validation.
    
    Args:
        dataset: A tuple of (A_list, X_list, Y_list) representing adjacency matrices, node features, and labels.
        model_class: The model class (e.g., GIN) to be trained.
        params: A dictionary containing hyperparameters for the model.
        k: Number of folds (default: 10).
        
    Returns:
        mean_score: Mean evaluation score across folds.
        std_score: Standard deviation of evaluation scores across folds.
    """
    A_list, X_list, Y_list = dataset
    n_graphs = len(A_list)

    # Ensure reproducibility by seeding
    np.random.seed(42)
    indices = np.random.permutation(n_graphs) # A numpy array containing shuffled indices

    # Prepare k-fold splitting
    kf = KFold(n_splits=k, shuffle=False) # kf is an instance of KFold class

    scores = []

    for fold, (train_idx, test_idx) in enumerate(kf.split(indices)):
        print(f"Processing Fold {fold + 1}/{k}...")

        # Split dataset into training and test sets
        train_A = [A_list[i] for i in train_idx]
        train_X = [X_list[i] for i in train_idx]
        train_Y = [Y_list[i] for i in train_idx]

        test_A = [A_list[i] for i in test_idx]
        test_X = [X_list[i] for i in test_idx]
        test_Y = [Y_list[i] for i in test_idx]

        # Initialize model, optimizer, and loss function
        model = model_class(params["num_layers"], params["input_features"], params["hidden_features"])
        pooling = global_sum_pool
        graph_pred_linear = torch.nn.Linear(params["hidden_features"], params["num_classes"])
        loss_fn = nn.CrossEntropyLoss()

        model_param_group = [{"params": model.parameters(), "lr": params["learning_rate"]}]
        model_param_group.append({"params": graph_pred_linear.parameters(), "lr": params["learning_rate"]})

        optimizer = torch.optim.AdamW(model_param_group, 
                                      lr=params["learning_rate"], 
                                      weight_decay=params["weight_decay"])

        # Train model
        for epoch in range(params["num_epochs"]):
            model.train()
            for adj_matrix, x, y, batch in graph_mini_batch(train_A, train_X, train_Y, params["batch_size"]):
                optimizer.zero_grad()
                edge_index = U.dense_to_sparse(adj_matrix)[0]
                nodes = model(x, edge_index)
                graph_reps = pooling(nodes, batch)
                pred = graph_pred_linear(graph_reps)
                loss = loss_fn(pred, y)
                loss.backward()
                optimizer.step()

        # Evaluate model on the test set
        model.eval()
        correct = 0
        total_num = 0
        with torch.no_grad():
            for adj_matrix, x, y, batch in graph_mini_batch(test_A, test_X, test_Y, params["batch_size"]):
                edge_index = U.dense_to_sparse(adj_matrix)[0]
                nodes = model(x, edge_index)
                graph_reps = pooling(nodes, batch)
                pred = graph_pred_linear(graph_reps)
                correct += (pred.argmax(dim=-1) == y).sum().item()
                total_num += len(y)

        accuracy = correct / total_num
        print(f"Fold {fold + 1} Accuracy: {accuracy:.4f}")
        scores.append(accuracy)

    # Compute mean and standard deviation of scores
    mean_score = np.mean(scores)
    std_score = np.std(scores)

    print(f"\nCross-Validation Results: Mean Accuracy = {mean_score:.4f}, Std = {std_score:.4f}")
    return mean_score, std_score

In [22]:
dataset = (A, X, Y) 
mean_acc, std_acc = cross_validation(dataset, GIN, params, k=10)

Processing Fold 1/10...
Fold 1 Accuracy: 0.8947
Processing Fold 2/10...
Fold 2 Accuracy: 0.7895
Processing Fold 3/10...
Fold 3 Accuracy: 0.7895
Processing Fold 4/10...
Fold 4 Accuracy: 0.7895
Processing Fold 5/10...
Fold 5 Accuracy: 0.8947
Processing Fold 6/10...
Fold 6 Accuracy: 0.7368
Processing Fold 7/10...
Fold 7 Accuracy: 0.9474
Processing Fold 8/10...
Fold 8 Accuracy: 0.6842
Processing Fold 9/10...
Fold 9 Accuracy: 0.9444
Processing Fold 10/10...
Fold 10 Accuracy: 0.7778

Cross-Validation Results: Mean Accuracy = 0.8249, Std = 0.0852
