# **GCN - Lab Practice**

## **Setup**

Building and training a GCN requires PyTorch and PyG (PyTorch Geometric). If you use Colab, PyTorch is probably already installed.

In [12]:
!pip install torch
!pip install torch_geometric



In [13]:
import os

import torch
import torch.nn as nn

import torch_geometric
print("PyTorch has version {}".format(torch.__version__))
print("PyG has version {}".format(torch_geometric.__version__))

PyTorch has version 2.4.1+cu121
PyG has version 2.6.1


# **Task 0: PyTorch Geometric (Datasets and Data)**

PyTorch Geometric provides two modules for storing graphs and converting them to tensor format. One is `torch_geometric.datasets`, which contains a variety of common graph datasets. The other is `torch_geometric.data`, which provides the data structures for handling graphs as PyTorch tensors.

In this section, we will learn how to use `torch_geometric.datasets`.

## PyG Datasets

The `torch_geometric.datasets` module contains many common graph datasets. Here we explore its usage through one example dataset, `KarateClub`.


In [14]:
from torch_geometric.datasets import KarateClub

dataset = KarateClub()

In [15]:
# The dataset object is an instance of a class defined in torch_geometric.datasets;
# check its type
print(type(dataset))
print(dataset)

print(f"dataset has {dataset.num_classes} classes")
print(f"dataset has {dataset.num_features} features")

<class 'torch_geometric.datasets.karate.KarateClub'>
KarateClub()
dataset has 4 classes
dataset has 34 features


In [16]:
print(dataset[0])

Data(x=[34, 34], edge_index=[2, 156], y=[34], train_mask=[34])


In [17]:
graph = dataset[0]

## PyG Graph Data
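A PyG graph generally has the following attributes, where *N* is the number of nodes, *F* the number of features per node, and *E* the number of edge entries:

|Attribute|Meaning|Tensor shape|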
|-----|-----|-----|
|graph.x|node feature|[ *N x F* ]|
|graph.edge_index|edge information|[ *2 x E* ]|
|graph.y|node's label|[ *N* ]|
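
To make the three shapes concrete, here is a minimal sketch that builds the same kind of tensors by hand for a toy graph. It uses plain PyTorch only; the 3-node graph and its labels are made up for illustration. Note that an undirected edge appears twice in `edge_index`, once per direction, which is why KarateClub reports 156 edge entries for its 78 undirected edges.

```python
import torch

# Toy undirected graph with 3 nodes and 2 edges: 0-1 and 1-2.
# Each undirected edge is stored twice, once per direction.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]], dtype=torch.long)

# One-hot node features (N = 3 nodes, F = 3 features), like the identity
# matrix that KarateClub uses for graph.x.
x = torch.eye(3)

# One integer class label per node (arbitrary values for this example).
y = torch.tensor([0, 1, 0])

num_nodes, num_features = x.shape
num_edge_entries = edge_index.size(1)
print(x.shape, edge_index.shape, y.shape)
```
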


**The cells below print `graph.x`, `graph.edge_index`, and `graph.y`.**


In [18]:
print(f"Graph(Dataset index 0) has node feature \n{graph.x}")
print(f"Node features have tensor shape \n{graph.x.shape}")

Graph(Dataset index 0) has node feature 
tensor([[1., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 1., 0.],
        [0., 0., 0.,  ..., 0., 0., 1.]])
Node features have tensor shape 
torch.Size([34, 34])


In [19]:
print(f"Graph(Dataset index 0) has edge_index \n{graph.edge_index}")
print(f"Edge_index has tensor shape \n{graph.edge_index.shape}")

Graph(Dataset index 0) has edge_index 
tensor([[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  1,
          1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,
          3,  3,  3,  3,  3,  4,  4,  4,  5,  5,  5,  5,  6,  6,  6,  6,  7,  7,
          7,  7,  8,  8,  8,  8,  8,  9,  9, 10, 10, 10, 11, 12, 12, 13, 13, 13,
         13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 20, 20, 21,
         21, 22, 22, 23, 23, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 27, 27,
         27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31,
         31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33,
         33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33],
        [ 1,  2,  3,  4,  5,  6,  7,  8, 10, 11, 12, 13, 17, 19, 21, 31,  0,  2,
          3,  7, 13, 17, 19, 21, 30,  0,  1,  3,  7,  8,  9, 13, 27, 28, 32,  0,
          1,  2,  7, 12, 13,  0,  6, 10,  0,  6, 10, 16,  0,  4,  5, 16,  0,  1,
          2,

In [20]:
print(f"Graph(Dataset index 0) has node labels \n{graph.y}")
print(f"Graph label has tensor shape \n{graph.y.shape}")

Graph(Dataset index 0) has node labels 
tensor([1, 1, 1, 1, 3, 3, 3, 1, 0, 1, 3, 1, 1, 1, 0, 0, 3, 1, 0, 1, 0, 1, 0, 0,
        2, 2, 0, 0, 2, 0, 0, 2, 0, 0])
Graph label has tensor shape 
torch.Size([34])


# **Task 1: Node Classification with Matrix Multiplication**

In this section, we do a quick exercise with matrix operations to better understand the GCN computation, and then apply it to node classification.

The goal is only to get the basic idea of what happens inside a GNN, so elaborate training and testing are omitted.

Later, we will build a full model with the simpler, ready-made PyG module `GCNConv`.

This time, we will use the **Cora dataset**.

The Cora dataset is a citation graph where nodes represent papers and edges represent citations between pairs of papers. The task involved is document classification, where the goal is to classify each paper into one of seven categories, i.e., a multi-class classification problem with seven classes.

In [21]:
from torch_geometric.datasets import Planetoid
root_dir = os.getcwd()
dataset = Planetoid(root = root_dir, name = "Cora")
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

cpu


In [22]:
# You can confirm that Cora is composed of a single large graph.
data = dataset[0]
data = data.to(device)
print(data)

Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])


### **Graph Convolutional Networks**

$$ H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right), \quad \text{where} \quad \tilde{A} = A + I, \; \tilde{D}_{ii} = \sum_j \tilde{A}_{ij} $$

In [23]:
import torch.nn.functional as F
from torch import optim
from tqdm import tqdm

class matGCNLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super(matGCNLayer, self).__init__()

        # self.linear plays the role of the weight matrix W
        self.linear = nn.Linear(in_features, out_features)

    # Normalize the adjacency matrix, then apply the graph convolution
    def forward(self, x, adj):
        ############# Your code here ############
        # Add self-loops; torch.eye must be created on the same device as adj
        adj_hat = adj + torch.eye(adj.size(0), device=adj.device)

        # Degree matrix
        degree = torch.diag(torch.sum(adj_hat, dim=1))

        # D_tilde^(-1/2)
        degree_inv_sqrt = torch.inverse(torch.sqrt(degree))

        # D_tilde^(-1/2) * A_tilde * D_tilde^(-1/2)
        # torch.mm(input, mat2) multiplies two matrices:
        # if input is [n, m] and mat2 is [m, p], the result is [n, p]
        adj_norm = torch.mm(degree_inv_sqrt, torch.mm(adj_hat, degree_inv_sqrt))

        # Graph convolution: A_norm * X * W
        out = self.linear(torch.mm(adj_norm, x))
        #########################################
        return out

class matGCN(nn.Module):
    def __init__(self, n_features, hidden_size, n_classes):
        super(matGCN, self).__init__()

        self.gc1 = matGCNLayer(n_features, hidden_size)
        self.gc2 = matGCNLayer(hidden_size, n_classes)

        # dim=1 normalizes over the class dimension and avoids the implicit-dim warning
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, x, adj):
        ############# Your code here ############
        # First layer
        # The activation sigma in the formula is ReLU here
        x = F.relu(self.gc1(x, adj))

        # Second layer
        out = self.gc2(x, adj)

        # Convert the class scores to log-probabilities
        pred = self.softmax(out)
        #########################################
        return pred
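
The normalization step in `matGCNLayer.forward` can be checked by hand on a tiny graph. The sketch below is an illustration only: the 3-node path graph is made up, and it uses row and column scaling instead of `torch.inverse`, which gives the same matrix because the degree matrix is diagonal.

```python
import torch

# Path graph 0 - 1 - 2 as a dense adjacency matrix (toy example).
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])

adj_tilde = adj + torch.eye(3)      # A~ = A + I (add self-loops)
deg = adj_tilde.sum(dim=1)          # D~_ii, here [2, 3, 2]
d_inv_sqrt = deg.pow(-0.5)          # diagonal entries of D~^(-1/2)

# Scaling row i and column j by d_i^(-1/2) and d_j^(-1/2) is the same as
# computing D~^(-1/2) A~ D~^(-1/2) with full matrix products.
adj_norm = d_inv_sqrt.unsqueeze(1) * adj_tilde * d_inv_sqrt.unsqueeze(0)

# Reference computation with explicit diagonal matrices.
D = torch.diag(d_inv_sqrt)
reference = D @ adj_tilde @ D
print(adj_norm)
```

Each entry of the result is 1 / sqrt(d_i * d_j) wherever nodes i and j are connected (or i = j), so the normalized matrix stays symmetric.
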

In [24]:
def train(model, graph, adj, epochs=20, lr=0.01):
    optimizer = optim.Adam(model.parameters(), lr=lr)
    # The model already returns log-probabilities (LogSoftmax), so NLLLoss is the matching loss
    loss_fn = nn.NLLLoss()

    # Put the model in training mode
    model.train()

    labels = graph.y

    for epoch in tqdm(range(epochs)):
        ############# Your code here ############
        # Clear gradients
        optimizer.zero_grad()

        # Perform a single forward pass
        output = model(graph.x, adj)

        # Compute the loss solely based on the training nodes
        # Select the training nodes by masking both the output and the labels
        loss = loss_fn(output[graph.train_mask], labels[graph.train_mask])

        # Derive gradients
        loss.backward()

        # Update parameters based on gradients
        optimizer.step()

        #########################################

        print(f'Epoch {epoch}, Loss: {loss.item()}')

def test(model, graph, adj):
    # Put the model in evaluation mode
    model.eval()
    with torch.no_grad():
        ############# Your code here ############
        # Perform a single forward pass
        out = model(graph.x, adj)

        # The prediction is the class with the highest probability (argmax)
        pred = out.argmax(dim=1)

        # Count the correct predictions against the ground truth
        test_correct = torch.sum(pred[graph.test_mask] == graph.y[graph.test_mask])

        # Ratio of correct predictions
        test_acc = test_correct / int(graph.test_mask.sum())
        #########################################

    print(f'Accuracy: {test_acc:.4f}')
    return test_acc
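
The masking used in `train` and `test` above can be tried on toy tensors. This is a minimal sketch with made-up logits, labels, and masks (not Cora data). It also shows how the loss functions relate: `nll_loss` on log-probabilities equals `cross_entropy` on raw logits, and applying `cross_entropy` to log-probabilities gives the same value because log-softmax leaves already-normalized log-probabilities unchanged.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(5, 3)                      # 5 nodes, 3 classes
labels = torch.tensor([0, 2, 1, 1, 0])
train_mask = torch.tensor([True, True, False, False, False])
test_mask = ~train_mask

# Boolean masks select rows (nodes), so only those nodes enter the loss.
log_probs = F.log_softmax(logits, dim=1)
loss_nll = F.nll_loss(log_probs[train_mask], labels[train_mask])
loss_ce = F.cross_entropy(logits[train_mask], labels[train_mask])
loss_ce_on_log_probs = F.cross_entropy(log_probs[train_mask], labels[train_mask])

# Accuracy on the held-out nodes only.
pred = log_probs.argmax(dim=1)
test_acc = (pred[test_mask] == labels[test_mask]).float().mean().item()
print(loss_nll.item(), loss_ce.item(), test_acc)
```
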

In [25]:
from torch_geometric.utils import to_dense_adj

# Build the model and adjacency matrix
model = matGCN(n_features=data.num_features, hidden_size=16, n_classes=7).to(device)

# By utilizing to_dense_adj, we can construct adjacency matrix easily
adj = to_dense_adj(data.edge_index).squeeze()

In [26]:
# Run a short training loop and watch the loss decrease
train(model, data, adj)

Epoch 0, Loss: 1.9604758024215698
Epoch 1, Loss: 1.906951904296875
Epoch 2, Loss: 1.8220713138580322
Epoch 3, Loss: 1.7341594696044922
Epoch 4, Loss: 1.641500473022461
Epoch 5, Loss: 1.5400410890579224
Epoch 6, Loss: 1.4325940608978271
Epoch 7, Loss: 1.3219056129455566
Epoch 8, Loss: 1.2093298435211182
Epoch 9, Loss: 1.0967273712158203
Epoch 10, Loss: 0.9857112169265747
Epoch 11, Loss: 0.8776938319206238
Epoch 12, Loss: 0.7740538716316223
Epoch 13, Loss: 0.6762350797653198
Epoch 14, Loss: 0.5853884220123291
Epoch 15, Loss: 0.5025535821914673
Epoch 16, Loss: 0.4282490909099579
Epoch 17, Loss: 0.3626924455165863
Epoch 18, Loss: 0.30567166209220886
Epoch 19, Loss: 0.25674518942832947
100%|██████████| 20/20 [01:55<00:00,  5.80s/it]

In [27]:
test(model,data,adj)

Accuracy: 0.7880


tensor(0.7880)

# **Task 2: Node Classification with GCNConv**


## Load and Split the Dataset



In [28]:
import pandas as pd
import torch.nn.functional as F

from torch_geometric.nn import GCNConv

## GCN Model

Now we will construct our GCN model!


## Why do we use torch.nn.ModuleList?

We need to let PyTorch know that the modules exist by putting them in `nn.ModuleList`.
If we keep them only in a plain Python list, PyTorch won't register them as submodules.
In that case `model.parameters()` will not include their weights, and creating the optimizer can fail with an error such as "optimizer got an empty parameter list".
So if you keep your modules in a Python list, make sure to wrap them in nn.ModuleList at the end.
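
The difference is easy to verify by counting parameters. The two classes below are hypothetical examples written only for this comparison; they are not part of the lab model.

```python
import torch.nn as nn

class WithPythonList(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: the layers are NOT registered as submodules
        self.layers = [nn.Linear(4, 4), nn.Linear(4, 2)]

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList: every layer is registered with the parent module
        self.layers = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 2)])

n_plain = len(list(WithPythonList().parameters()))
n_registered = len(list(WithModuleList().parameters()))
print(n_plain, n_registered)  # prints: 0 4
```

The registered version exposes four tensors (a weight and a bias for each of the two linear layers), while the plain-list version exposes none, so an optimizer would have nothing to update.
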


In [29]:
class GCN(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, dropout,
                return_embeds=False):
        super(GCN, self).__init__()

        self.convs = None
        self.softmax = None

        ############# Your code here ############
        # GCNConv() represents one GCN layer
        # Four convolutional layers: input layer, 2 hidden layers, and output layer

        self.conv1 = GCNConv(input_dim, hidden_dim)
        self.hidden1 = GCNConv(hidden_dim, hidden_dim)
        self.hidden2 = GCNConv(hidden_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, output_dim)

        # torch.nn.ModuleList() is recommended for managing a list of modules
        # Usage: torch.nn.ModuleList([layer_1, layer_2, ..., layer_n])
        self.convs = torch.nn.ModuleList([self.conv1, self.hidden1, self.hidden2, self.conv2])

        #########################################

        # The last layer output is passed through log-softmax over the class dimension
        self.softmax = torch.nn.LogSoftmax(dim=1)

        # Probability of an element getting zeroed
        self.dropout = dropout

        # Skip classification layer and return node embeddings
        self.return_embeds = return_embeds

    def reset_parameters(self):
        for conv in self.convs:
            conv.reset_parameters()

    def forward(self, x, edge_index):
        out = None

        ############# Your code here ############
        # Each hidden layer output goes through F.relu,
        # and F.dropout is applied during training to reduce overfitting

        for conv in self.convs[:-1]:
            # Pass the input through a layer and the activation function
            x = F.relu(conv(x, edge_index))
            # Dropout with probability self.dropout; active only in training mode
            x = F.dropout(x, p=self.dropout, training=self.training)
        # The last layer output skips the activation function and dropout
        x = self.convs[-1](x,edge_index)
        out = x if self.return_embeds else self.softmax(x)
        #########################################

        return out
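
One detail of the forward pass is worth a quick check: the functional `F.dropout` has a `training` argument that defaults to `True`, so it keeps dropping values unless the call is tied to the module's training state. A minimal sketch on a made-up tensor of ones:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)

# Default training=True: about half of the entries are zeroed and the
# survivors are scaled by 1 / (1 - p) = 2.
always_on = F.dropout(x, p=0.5)

# training=False: dropout is a no-op, which is what evaluation needs.
eval_mode = F.dropout(x, p=0.5, training=False)

print(always_on.unique(), torch.equal(eval_mode, x))
```

Inside a module, passing `training=self.training` (or guarding the call with `if self.training:`) gives this switch automatically when `model.train()` or `model.eval()` is called.
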

In [30]:
train_mask = data.train_mask
val_mask = data.val_mask
test_mask = data.test_mask

In [31]:
def train(model, data, optimizer, loss_fn):

    model.train()
    loss = 0

    ############# Your code here ############
    # Clear gradients
    optimizer.zero_grad()

    # Perform a single forward pass
    output = model(data.x, data.edge_index)

    # Compute the loss solely based on the training nodes
    # Predictions & labels are specified as output[train_mask] & data.y[train_mask], respectively
    loss = loss_fn(output[train_mask], data.y[train_mask])

    # Derive gradients
    loss.backward()

    # Update parameters based on gradients
    optimizer.step()
    #########################################
    return loss.item()

In [32]:
def test(model, data, mode):
    # Evaluate the model on the validation or test split selected by `mode`
    model.eval()

    out = None

    if mode == "val":
        mask = val_mask
    elif mode == "test":
        mask = test_mask
    else:
        raise ValueError("mode should be 'val' or 'test'")

    with torch.no_grad():
        ############# Your code here ############
        # Perform a single forward pass
        out = model(data.x, data.edge_index)

        # The prediction is the class with the highest probability (argmax)
        pred = out.argmax(dim=1)

        # Count the correct predictions against the ground truth
        test_correct = torch.sum(pred[mask] == data.y[mask])

        # Derive ratio of correct predictions
        test_acc = test_correct / int(mask.sum())
        #########################################

    return test_acc

In [33]:
args = {
    'device': device,
    'hidden_dim': 256,
    'dropout': 0.2,
    'lr': 0.001,
    'epochs': 50,
}
args

{'device': 'cpu', 'hidden_dim': 256, 'dropout': 0.2, 'lr': 0.001, 'epochs': 50}

In [34]:
model = GCN(data.num_features, args['hidden_dim'], dataset.num_classes, args['dropout']).to(device)

In [35]:
import copy

# reset the parameters to initial random value
model.reset_parameters()

optimizer = torch.optim.Adam(model.parameters(), lr=args['lr'])
# The model returns log-probabilities (LogSoftmax), so NLLLoss is the matching loss
loss_fn = torch.nn.NLLLoss()

best_model = None
best_valid_acc = 0

for epoch in range(1, 1 + args["epochs"]):
    loss = train(model, data, optimizer, loss_fn)
    if epoch % 5 == 0:
        val_acc = test(model, data, "val")
        print(f'Epoch: {epoch:02d}, '
              f'Loss: {loss:.4f}, '
              f'Val_acc: {100 * val_acc:.2f}%')

        if best_valid_acc < val_acc:
            best_valid_acc = val_acc
            best_model = copy.deepcopy(model)

Epoch: 05, Loss: 1.7954, Val_acc: 68.40%
Epoch: 10, Loss: 1.3508, Val_acc: 78.00%
Epoch: 15, Loss: 0.6127, Val_acc: 79.80%
Epoch: 20, Loss: 0.1874, Val_acc: 78.80%
Epoch: 25, Loss: 0.0572, Val_acc: 77.80%
Epoch: 30, Loss: 0.0246, Val_acc: 76.80%
Epoch: 35, Loss: 0.0132, Val_acc: 76.40%
Epoch: 40, Loss: 0.0073, Val_acc: 75.80%
Epoch: 45, Loss: 0.0041, Val_acc: 76.00%
Epoch: 50, Loss: 0.0011, Val_acc: 76.60%


In [36]:
best_result = test(best_model, data, "test")
print(f'Best model: '
    f'Test: {100 * best_result:.2f}%')

Best model: Test: 81.20%


# **(Optional) Task 3: Implementing a 2-D to_dense_adj**
## **to_dense_adj**

`to_dense_adj` from `torch_geometric.utils` makes it easy to transform `edge_index` into an adjacency matrix. If we understand the relationship between `edge_index` and the adjacency matrix well, can we write a similar function ourselves? Ours is a simplified version: it does not support batching and needs the number of nodes to be known in advance, as it is for the Cora dataset.

We could create the adjacency matrix by simply putting a 1 at the position given by each pair in `edge_index`, but let's use `torch_geometric.utils.scatter` for a more computationally efficient calculation.

https://pytorch-geometric.readthedocs.io/en/2.3.1/modules/utils.html#torch_geometric.utils.to_dense_adj
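
For reference, the simple approach mentioned above (writing a 1 at every position listed in `edge_index`) could look like the sketch below. The helper name `naive_dense_adj` and the toy graph are assumptions made for illustration; this is not the PyG implementation.

```python
import torch

def naive_dense_adj(edge_index, num_nodes):
    # Hypothetical helper: start from an all-zero matrix and write a 1
    # at every (row, column) pair given by edge_index.
    adj = torch.zeros(num_nodes, num_nodes, device=edge_index.device)
    adj[edge_index[0], edge_index[1]] = 1.0
    return adj

# Toy undirected path graph 0 - 1 - 2, each edge stored in both directions.
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])
adj = naive_dense_adj(edge_index, num_nodes=3)
print(adj)
```

One difference from a sum-based scatter: if an edge appears several times in `edge_index`, plain assignment still stores 1, whereas summing counts each occurrence.
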


## **torch_geometric.utils.scatter**

    def scatter(src: Tensor, index: Tensor, dim: int = 0,
                dim_size: Optional[int] = None, reduce: str = 'sum') -> Tensor:

Just as the `torch` package has a scatter operation, `torch_geometric` provides its own `scatter` function.
The scatter function in PyG aggregates a tensor of data according to given indices. It supports aggregation methods like sum, mean, max, and min, making it useful for summarizing node or edge features in graph data.

https://pytorch-geometric.readthedocs.io/en/latest/modules/utils.html#torch_geometric.utils.scatter

***Be sure to read this page before going to next step:*** https://pytorch-scatter.readthedocs.io/en/latest/functions/scatter.html
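
As a small illustration of sum aggregation, the sketch below uses PyTorch's built-in `Tensor.scatter_add_` as a stand-in for PyG's `scatter(..., reduce='sum')`. This substitution is an assumption made so the example runs with PyTorch alone; the values are made up.

```python
import torch

src = torch.tensor([1., 2., 3., 4., 5.])   # values to aggregate
index = torch.tensor([0, 1, 0, 1, 2])      # target slot for each value

# Every src[i] is added into out[index[i]].
out = torch.zeros(3).scatter_add_(0, index, src)
# out[0] = 1 + 3, out[1] = 2 + 4, out[2] = 5
print(out)  # tensor([4., 6., 5.])
```
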





In [37]:
from torch_geometric.utils import scatter

def to_dense_adj(edge_index, num_nodes):
    # adjacency matrix size
    size = [num_nodes, num_nodes]

    # Row index(idx1) & column index(idx2)
    idx1 = edge_index[0]
    idx2 = edge_index[1]

    # value 1s to be scattered
    src = torch.ones(edge_index.size(1), device=edge_index.device)

    ############# Your code here ############
    # Output dimension of scatter function
    flattened_size = num_nodes * num_nodes

    # Parameter index of scatter function
    idx = idx1 * num_nodes + idx2

    # Scatter
    flatten_adj = scatter(src, idx, dim=0, dim_size=flattened_size, reduce='sum')

    # Resize [num_nodes * num_nodes] to [num_nodes, num_nodes]
    adj = flatten_adj.view(size)
    #########################################
    return adj
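
The key step in the function above is the flattened index `idx1 * num_nodes + idx2`, which maps a (row, column) pair to a position in a row-major vector of length `num_nodes * num_nodes`. The sketch below works through that arithmetic on a made-up 4-node edge list, again using `Tensor.scatter_add_` from PyTorch in place of the PyG `scatter` call (an assumption, so that the example needs only PyTorch).

```python
import torch

num_nodes = 4
edge_index = torch.tensor([[0, 2, 3],    # row indices
                           [1, 3, 2]])   # column indices

# (0, 1) -> 0*4 + 1 = 1, (2, 3) -> 2*4 + 3 = 11, (3, 2) -> 3*4 + 2 = 14
flat_idx = edge_index[0] * num_nodes + edge_index[1]

# Sum ones into the flattened matrix, then reshape to [num_nodes, num_nodes].
flat_adj = torch.zeros(num_nodes * num_nodes)
flat_adj.scatter_add_(0, flat_idx, torch.ones(flat_idx.numel()))
adj = flat_adj.view(num_nodes, num_nodes)

# The mapping is reversible: integer division and remainder recover the pair.
rows = torch.div(flat_idx, num_nodes, rounding_mode="floor")
cols = flat_idx % num_nodes
print(flat_idx, rows, cols)
```
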