<a href="https://colab.research.google.com/github/reiniscimurs/gnn_with_pytorch/blob/main/section_4/03_exercise.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise
Train a GCN model on the "ENZYMES" dataset: write the code that builds the model and the code that trains it. Training runs on a GPU, so select "GPU" under "Edit" -> "Notebook settings" -> "Hardware accelerator".

## Installing PyTorch Geometric
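Install PyTorch Geometric, a library for graph neural networks (GNNs), together with the extension libraries it depends on. The wheel index URL in the first command has to match the installed PyTorch and CUDA versions (here PyTorch 1.13.0 with CUDA 11.6).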

In [None]:
!pip install pyg-lib torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-1.13.0+cu116.html
!pip install torch-geometric
!pip install scipy==1.8.0
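
The wheel index above is pinned to one PyTorch/CUDA combination, so it only works on a runtime that has exactly that combination. The helper below is a minimal sketch that builds the index URL from version strings instead. The function name `pyg_wheel_url` is hypothetical, and it assumes that wheels for a PyTorch minor release are published under that release's `.0` patch version, with the `cpu` tag used when PyTorch was built without CUDA.

```python
from typing import Optional


def pyg_wheel_url(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build a PyG wheel-index URL (hypothetical helper).

    torch_version: e.g. torch.__version__, such as "1.13.1+cu116"
    cuda_version:  e.g. torch.version.cuda, such as "11.6", or None for CPU-only builds
    """
    # Drop the local build tag ("+cu116") and keep only major.minor
    major, minor = torch_version.split("+")[0].split(".")[:2]
    # Assumption: wheels for every patch release live under the ".0" patch version
    torch_tag = f"{major}.{minor}.0"
    # "11.6" -> "cu116"; CPU-only builds use "cpu"
    cuda_tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{torch_tag}+{cuda_tag}.html"


print(pyg_wheel_url("1.13.1+cu116", "11.6"))
```

In the notebook it would be called as `pyg_wheel_url(torch.__version__, torch.version.cuda)`, and the result passed to `pip install ... -f`.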

## Loading the Dataset
Load the ENZYMES dataset from the TUDataset collection. It contains 600 graphs, 100 for each of its 6 enzyme classes.


In [None]:
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader

# Load ENZYMES from TUDataset (downloaded to /tmp/ENZYMES on the first run)
dataset = TUDataset(root="/tmp/ENZYMES", name="ENZYMES")

print("Number of graphs:", len(dataset))
print("Number of classes:", dataset.num_classes)

dataset = dataset.shuffle()  # Shuffle the dataset
dataset_train = dataset[:500]  # Training dataset
dataset_test = dataset[500:]  # Test dataset

batch_size = 64  # Batch size
loader_train = DataLoader(dataset_train, batch_size=batch_size, shuffle=True)
loader_test = DataLoader(dataset_test, batch_size=batch_size, shuffle=False)
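
With 500 training graphs and `batch_size = 64`, `loader_train` yields 8 mini-batches per epoch (seven of 64 graphs and a final one of 52), and `loader_test` yields two (64 and 36). PyG's `DataLoader` merges the graphs of a mini-batch into one large, disconnected graph and records which graph each node came from in a `batch` vector; `global_mean_pool` relies on that vector later. The pure-Python sketch below reproduces both ideas; the helper names and the toy node counts are made up for illustration.

```python
from typing import List


def batch_sizes(num_graphs: int, batch_size: int) -> List[int]:
    """Number of graphs in each mini-batch (the last one may be smaller)."""
    return [min(batch_size, num_graphs - start) for start in range(0, num_graphs, batch_size)]


def batch_vector(nodes_per_graph: List[int]) -> List[int]:
    """Graph index of every node once the graphs are stacked into one big graph."""
    vec: List[int] = []
    for graph_idx, n_nodes in enumerate(nodes_per_graph):
        vec.extend([graph_idx] * n_nodes)
    return vec


print(batch_sizes(500, 64))     # training split
print(batch_sizes(100, 64))     # test split
print(batch_vector([3, 2, 4]))  # three toy graphs with 3, 2 and 4 nodes
```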

## Model Construction
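Complete the `GCN` class in the following cell. `forward()` expects three graph convolution layers, `self.conv1`, `self.conv2` and `self.conv3`, built with `GCNConv()`, and the pooled graph features must then be mapped to one score per class with `nn.Linear()`.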

https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.conv.GCNConv  
https://pytorch.org/docs/stable/generated/torch.nn.Linear.html  
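
With its default settings, `GCNConv` computes $X' = \hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2} X W + b$, where $\hat{A} = A + I$ adds a self-loop to every node and $\hat{D}$ is the degree matrix of $\hat{A}$. The NumPy sketch below applies that rule to a dense adjacency matrix to show what one layer does. It is an illustration only (the function name `gcn_layer` is hypothetical); `GCNConv` itself works on the sparse `edge_index` representation.

```python
import numpy as np


def gcn_layer(adj: np.ndarray, x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """One dense GCN layer: D^-1/2 (A + I) D^-1/2 X W + b."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    deg = a_hat.sum(axis=1)                   # node degrees, self-loop included
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D^-1/2
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
    return a_norm @ x @ weight + bias         # aggregate neighbours, transform, add bias


# Toy graph: a path 0 - 1 - 2 with 2 input features and 3 output features
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.arange(6, dtype=float).reshape(3, 2)
rng = np.random.default_rng(0)
weight = rng.normal(size=(2, 3))
bias = np.zeros(3)
print(gcn_layer(adj, x, weight, bias).shape)  # (3, 3)
```

Stacking three such layers, as the `GCN` class does, lets each node's features depend on nodes up to three hops away.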

In [None]:
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

n_h = 64  # Number of features in the hidden layer

class GCN(nn.Module):
    def __init__(self):
        super().__init__()
        # ------- Add code below -------



        # ------- Add code above -------

        self.relu = nn.ReLU()  # ReLU
        self.dropout = nn.Dropout(p=0.5)  # Dropout: (p=dropout rate)

    def forward(self, data):
        x = data.x
        edge_index = data.edge_index
        batch = data.batch

        x = self.conv1(x, edge_index)
        x = self.relu(x)
        x = self.conv2(x, edge_index)
        x = self.relu(x)
        x = self.conv3(x, edge_index)

        # Take the mean of each feature across all nodes
        x = global_mean_pool(x, batch)  # Convert to (batch_size, number of features)

        x = self.dropout(x)

        # Map the pooled features to one score per class with the
        # fully connected layer(s) defined in __init__, for example:
        # x = self.lin1(x)
        # x = self.lin2(x)

        return x

net = GCN()
net.cuda()  # GPU support
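
`global_mean_pool(x, batch)` averages the node features of each graph, turning a `(num_nodes, n_h)` matrix into a `(num_graphs, n_h)` matrix that a linear layer can classify. The NumPy sketch below (hypothetical helper `mean_pool`) performs the same reduction with a `batch` vector of graph indices; it assumes every graph in the batch has at least one node.

```python
import numpy as np


def mean_pool(x: np.ndarray, batch: np.ndarray) -> np.ndarray:
    """Average node features per graph; batch[i] is the graph index of node i."""
    num_graphs = int(batch.max()) + 1
    sums = np.zeros((num_graphs, x.shape[1]))
    np.add.at(sums, batch, x)                          # sum the features of each graph's nodes
    counts = np.bincount(batch, minlength=num_graphs)  # number of nodes per graph
    return sums / counts[:, None]


x = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [10.0, 20.0]])
batch = np.array([0, 0, 1])  # nodes 0 and 1 belong to graph 0, node 2 to graph 1
print(mean_pool(x, batch))
```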

## Training

Define a function that computes the classification accuracy of `net` over all graphs in a data loader.

In [None]:
def evaluate(loader):
    correct = 0  # Number of correct predictions

    with torch.no_grad():  # No gradients are needed for evaluation
        for data in loader:
            data = data.cuda()  # GPU support
            out = net(data)
            pred = out.argmax(dim=1)  # Predicted class = index of the highest score
            correct += int((pred == data.y).sum())

    return correct / len(loader.dataset)  # Accuracy
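
`evaluate()` takes the index of the highest score in each row as the predicted class and divides the number of correct predictions by the number of graphs in the loader's dataset. The same arithmetic on a toy batch of scores, as a NumPy sketch (the values are made up):

```python
import numpy as np

# Made-up class scores for 4 graphs and 3 classes (rows = graphs)
scores = np.array([[2.0, 0.1, -1.0],
                   [0.3, 0.2, 1.5],
                   [0.0, 3.0, 0.5],
                   [1.0, 1.2, 0.9]])
labels = np.array([0, 2, 1, 0])

pred = scores.argmax(axis=1)  # predicted class per graph: [0, 2, 1, 1]
accuracy = (pred == labels).sum() / len(labels)
print(accuracy)               # 0.75
```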

Train the model on the ENZYMES training split. Add code to the following cell that performs one pass of mini-batch training over `loader_train` in each epoch.
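
The model returns raw class scores (logits) without a softmax because `nn.CrossEntropyLoss` applies log-softmax internally and then, by default, averages the negative log-probability of the correct class over the batch. The NumPy sketch below (hypothetical helper `cross_entropy`) spells out that computation. With all-zero logits over the 6 ENZYMES classes, every class gets probability 1/6 and the loss is ln 6 ≈ 1.79, a rough reference point for an untrained model's loss.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy of raw logits, mirroring nn.CrossEntropyLoss's default reduction."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))  # log-softmax
    return float(-log_probs[np.arange(len(targets)), targets].mean())


logits = np.zeros((4, 6))  # 4 graphs, 6 classes, no preference for any class
targets = np.array([0, 3, 5, 1])
print(cross_entropy(logits, targets), np.log(6))
```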

In [None]:
from torch import optim

# Cross-entropy loss function
loss_fnc = nn.CrossEntropyLoss()

# Optimization algorithm
optimizer = optim.Adam(net.parameters())

for epoch in range(1000):
    # Training
    net.train()  # Set to training mode
    # ------- Add code below -------




    # ------- Add code above -------

    # Evaluation
    net.eval()  # Set to evaluation mode
    acc_train = evaluate(loader_train)
    acc_test = evaluate(loader_test)
    print(f"Epoch: {epoch} acc_train: {acc_train * 100:.1f}% acc_test: {acc_test * 100:.1f}%")

## Model Evaluation
Evaluate the trained model on the test set.

In [None]:
net.eval()  # Set to evaluation mode
acc_test = evaluate(loader_test)
print(f"Accuracy: {acc_test * 100:.1f}%")

### Example Solution
The following is an example solution.

In [None]:
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

n_h = 64  # Number of features in the hidden layer

class GCN(nn.Module):
    def __init__(self):
        super().__init__()
        # ------- Add code below -------
        self.conv1 = GCNConv(dataset.num_node_features, n_h)
        self.conv2 = GCNConv(n_h, n_h)
        self.conv3 = GCNConv(n_h, n_h)
        self.fc = nn.Linear(n_h, dataset.num_classes)  # Fully connected layer
        # ------- Add code above -------

        self.relu = nn.ReLU()  # ReLU
        self.dropout = nn.Dropout(p=0.5)  # Dropout: (p=dropout rate)

    def forward(self, data):
        x = data.x
        edge_index = data.edge_index
        batch = data.batch

        x = self.conv1(x, edge_index)
        x = self.relu(x)
        x = self.conv2(x, edge_index)
        x = self.relu(x)
        x = self.conv3(x, edge_index)

        # Take the mean of each feature across all nodes
        x = global_mean_pool(x, batch)  # Convert to (batch_size, number of features)

        x = self.dropout(x)
        x = self.fc(x)

        return x

net = GCN()
net.cuda()  # GPU support
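
As a quick check on the size of this solution, its parameter count can be worked out by hand. Each `GCNConv` holds an `in × out` weight matrix plus an `out`-sized bias, `nn.Linear` does the same, and ReLU, dropout and mean pooling add no parameters. Assuming ENZYMES' default 3 node features (as loaded above, without `use_node_attr=True`) and its 6 classes, the total comes to 8,966; `sum(p.numel() for p in net.parameters())` should report the same number if those assumptions hold.

```python
# Assumed sizes: 3 input node features, 64 hidden features, 6 classes
num_node_features, n_h, num_classes = 3, 64, 6


def layer_params(in_features: int, out_features: int) -> int:
    """Weight matrix plus bias vector."""
    return in_features * out_features + out_features


counts = {
    "conv1": layer_params(num_node_features, n_h),  # 3*64 + 64
    "conv2": layer_params(n_h, n_h),                # 64*64 + 64
    "conv3": layer_params(n_h, n_h),                # 64*64 + 64
    "fc": layer_params(n_h, num_classes),           # 64*6 + 6
}
print(counts, sum(counts.values()))
```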

In [None]:
from torch import optim

# Cross-entropy loss function
loss_fnc = nn.CrossEntropyLoss()

# Optimization algorithm
optimizer = optim.Adam(net.parameters())

for epoch in range(1000):
    # Training
    net.train()  # Set to training mode
    # ------- Add code below -------
    for data in loader_train:
        data = data.cuda()  # GPU support

        optimizer.zero_grad()  # Step ①: Initialize gradients
        out = net(data)  # Step ②: Obtain predictions through forward pass
        loss = loss_fnc(out, data.y)  # Step ③: Calculate loss from predictions and ground truth

        loss.backward()  # Step ④: Backpropagate the loss to compute the gradients
        optimizer.step()  # Step ⑤: Update parameters using the optimization algorithm
    # ------- Add code above -------

    # Evaluation
    net.eval()  # Set to evaluation mode
    acc_train = evaluate(loader_train)
    acc_test = evaluate(loader_test)
    print(f"Epoch: {epoch} acc_train: {acc_train * 100:.1f}% acc_test: {acc_test * 100:.1f}%")