<a href="https://colab.research.google.com/github/reiniscimurs/gnn_with_pytorch/blob/main/section_4/02_gcn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Implementation of GCN
We will implement a Graph Convolutional Network (GCN) for graph classification. The dataset consists of many small graphs, so the model is trained on mini-batches of graphs.
Training runs on the GPU, so select "GPU" under "Hardware accelerator" in "Runtime" -> "Change runtime type".

## Installation of PyTorch Geometric
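Install PyTorch Geometric, a library for Graph Neural Networks (GNN), together with its companion libraries. The wheel index URL in the first command is specific to PyTorch 1.13.0 with CUDA 11.6 and must match the PyTorch build in your runtime.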

In [None]:
!pip install pyg-lib torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-1.13.0+cu116.html
!pip install torch-geometric
!pip install scipy==1.8.0
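
Because the wheel index URL is tied to one PyTorch/CUDA pair, it can help to check what the runtime actually provides before installing. The following is a minimal sketch: the `wheel_index` string only mirrors the URL layout of the pip command above and is an assumption, not a guarantee that wheels exist for every version.

```python
import torch

# Version string of the installed PyTorch build, e.g. "1.13.0+cu116"
torch_version = torch.__version__

# True only when a CUDA-capable GPU is visible to PyTorch
has_gpu = torch.cuda.is_available()

# Assumed pattern: same URL layout as the pip command above
wheel_index = f"https://data.pyg.org/whl/torch-{torch_version}.html"

print("PyTorch:", torch_version)
print("GPU available:", has_gpu)
print("Wheel index to try:", wheel_index)
```

If `has_gpu` is `False`, the later calls to `.cuda()` will fail, so the runtime type should be changed first.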

## Loading the Dataset
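Load the MUTAG dataset from the TUDataset collection. MUTAG contains 188 graphs; after shuffling, the first 140 are used for training and the remaining 48 for testing.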


In [None]:
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader

dataset = TUDataset(root="/tmp/MUTAG", name="MUTAG")

dataset = dataset.shuffle()  # Shuffle the dataset
dataset_train = dataset[:140]  # Training dataset
dataset_test = dataset[140:]  # Test dataset

batch_size = 64  # Batch size
loader_train = DataLoader(dataset_train, batch_size=batch_size, shuffle=True)
loader_test = DataLoader(dataset_test, batch_size=batch_size, shuffle=False)
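
To make mini-batching of graphs concrete, here is a minimal sketch in plain PyTorch of what happens conceptually when several graphs are merged into one batch: node features are concatenated, edge indices are shifted by the number of preceding nodes, and a `batch` vector records which graph each node came from. The two toy graphs and the helper `collate_graphs` are hypothetical and only for illustration; the `DataLoader` from PyTorch Geometric handles this internally.

```python
import torch

def collate_graphs(graphs):
    """Merge (x, edge_index) pairs into one disconnected graph (illustrative helper)."""
    xs, edge_indices, batch = [], [], []
    offset = 0  # Number of nodes already placed in the merged graph
    for graph_id, (x, edge_index) in enumerate(graphs):
        xs.append(x)
        edge_indices.append(edge_index + offset)  # Shift node indices
        batch.append(torch.full((x.size(0),), graph_id, dtype=torch.long))
        offset += x.size(0)
    return torch.cat(xs, dim=0), torch.cat(edge_indices, dim=1), torch.cat(batch)

# Toy graph 1: 3 nodes, edges 0-1 and 1-2 (stored in both directions)
g1 = (torch.ones(3, 2), torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]]))
# Toy graph 2: 2 nodes, edge 0-1 (stored in both directions)
g2 = (torch.zeros(2, 2), torch.tensor([[0, 1], [1, 0]]))

x, edge_index, batch = collate_graphs([g1, g2])
print(x.shape)     # 5 nodes in total, 2 features each
print(edge_index)  # Edges of graph 2 now refer to nodes 3 and 4
print(batch)       # Graph id of every node: 0, 0, 0, 1, 1
```

Since no edge connects nodes from different graphs, message passing on the merged graph never mixes information between graphs.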

## Model Construction
We now build the GCN model.
Each graph convolution layer is a `GCNConv`, constructed as follows:
```
GCNConv(input_feature_size, output_feature_size)
```  
https://pytorch-geometric.readthedocs.io/en/latest/modules/nn.html#torch_geometric.nn.conv.GCNConv  
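
As a rough illustration of what a single graph convolution computes, the sketch below implements the GCN propagation rule with dense tensors in plain PyTorch: add self-loops, normalize the adjacency matrix symmetrically by node degree, then multiply by the node features and a weight matrix. This is not how `GCNConv` is implemented internally (it uses sparse message passing and also adds a bias); the function name `gcn_layer_dense` and the toy graph are assumptions for illustration.

```python
import torch

def gcn_layer_dense(x, adj, weight):
    """One GCN propagation step with a dense adjacency matrix (illustrative)."""
    a_hat = adj + torch.eye(adj.size(0))      # Add self-loops
    deg = a_hat.sum(dim=1)                    # Node degree including the self-loop
    d_inv_sqrt = torch.diag(deg.pow(-0.5))    # D^(-1/2)
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt  # Symmetric normalization
    return a_norm @ x @ weight                # Aggregate neighbors, then transform

# Toy path graph 0 - 1 - 2 with 2 input features per node
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
x = torch.tensor([[1., 0.], [0., 1.], [1., 1.]])
weight = torch.randn(2, 4)  # Maps 2 input features to 4 output features

out = gcn_layer_dense(x, adj, weight)
print(out.shape)  # One row per node, one column per output feature
```

The two arguments of `GCNConv` correspond to the two dimensions of `weight` here: the input feature size and the output feature size.
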
This time we also introduce dropout, which randomly zeroes activations during training; here it is applied just before the fully connected layer.
Dropout reduces overfitting, so the trained model generalizes better to unseen data.  
https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html  
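
The behavior of dropout in training and evaluation mode can be seen in a small standalone sketch. The input tensor and the seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # Only to make the random mask repeatable
dropout = nn.Dropout(p=0.5)
x = torch.ones(2, 8)

dropout.train()       # Training mode: elements are zeroed at random
y_train = dropout(x)  # Surviving elements are scaled by 1 / (1 - p) = 2
print(y_train)

dropout.eval()        # Evaluation mode: dropout passes the input through unchanged
y_eval = dropout(x)
print(torch.equal(y_eval, x))  # True
```

This is why the notebook switches between `net.train()` and `net.eval()`: the same layer behaves differently in the two modes.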

In [None]:
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv
from torch_geometric.nn import global_mean_pool

n_h = 64  # Number of features in the hidden layer

class GCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_node_features, n_h)
        self.conv2 = GCNConv(n_h, n_h)
        self.conv3 = GCNConv(n_h, n_h)
        self.fc = nn.Linear(n_h, dataset.num_classes)  # Fully connected layer

        self.relu = nn.ReLU()  # Activation function
        self.dropout = nn.Dropout(p=0.5)  # Dropout (p = probability of zeroing an element)

    def forward(self, data):
        x = data.x
        edge_index = data.edge_index
        batch = data.batch

        x = self.conv1(x, edge_index)
        x = self.relu(x)
        x = self.conv2(x, edge_index)
        x = self.relu(x)
        x = self.conv3(x, edge_index)

        # Average each feature over the nodes of each graph
        x = global_mean_pool(x, batch)  # Shape: (number of graphs in the batch, n_h)

        x = self.dropout(x)
        x = self.fc(x)

        return x

net = GCN()
net.cuda()  # Enable GPU support


`global_mean_pool()` averages the node features of each graph separately, using `batch` to determine which graph each node belongs to. The result has the shape (batch size, number of features), that is, one feature vector per graph.
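
The per-graph averaging can be reproduced in plain PyTorch. The sketch below is a hypothetical helper, `mean_pool_by_graph`, that mirrors what `global_mean_pool()` computes for a feature matrix and a `batch` vector; it is meant as an illustration, not as the library's implementation.

```python
import torch

def mean_pool_by_graph(x, batch):
    """Average node features per graph (plain-PyTorch sketch, hypothetical helper)."""
    num_graphs = int(batch.max()) + 1
    # Sum the feature rows of all nodes that share a graph id
    sums = torch.zeros(num_graphs, x.size(1)).index_add_(0, batch, x)
    # Count the nodes of each graph
    counts = torch.bincount(batch, minlength=num_graphs).unsqueeze(1)
    return sums / counts

# 5 nodes with 2 features; nodes 0-2 belong to graph 0, nodes 3-4 to graph 1
x = torch.tensor([[1., 2.], [3., 4.], [5., 6.], [10., 20.], [30., 40.]])
batch = torch.tensor([0, 0, 0, 1, 1])

pooled = mean_pool_by_graph(x, batch)
print(pooled)  # Rows: [3., 4.] for graph 0 and [20., 30.] for graph 1
```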

## Training

Define a function that computes the classification accuracy of the model over a data loader.

In [None]:
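def eval(loader):  # Note: this name shadows Python's built-in eval()
    """Return the classification accuracy of `net` over all graphs in `loader`."""
    correct = 0  # Number of correct predictions

    with torch.no_grad():  # Gradients are not needed for evaluation
        for data in loader:
            data = data.cuda()  # GPU support
            out = net(data)
            pred = out.argmax(dim=1)  # Predicted class for each graph
            correct += int((pred == data.y).sum())

    return correct / len(loader.dataset)  # Accuracy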
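
The accuracy computation inside the evaluation function can be checked on a tiny example. The logits and labels below are made-up values for illustration.

```python
import torch

# Hypothetical model outputs (logits) for 4 graphs and 2 classes
out = torch.tensor([[2.0, -1.0],
                    [0.1, 0.3],
                    [1.5, 0.2],
                    [-0.5, 0.4]])
y = torch.tensor([0, 1, 1, 1])  # Ground-truth labels

pred = out.argmax(dim=1)          # Index of the largest logit in each row
correct = int((pred == y).sum())  # Number of matching predictions
accuracy = correct / len(y)

print(pred.tolist())  # [0, 1, 0, 1]
print(accuracy)       # 0.75
```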

Train the model using the training data.

In [None]:
from torch import optim

# Cross-entropy loss function
loss_fnc = nn.CrossEntropyLoss()

# Optimization algorithm
optimizer = optim.Adam(net.parameters())

for epoch in range(200):
    # Training
    net.train()  # Training mode
    for data in loader_train:
        data = data.cuda()  # GPU support

        optimizer.zero_grad()  # Step 1: Initialize gradients
        out = net(data)  # Step 2: Forward pass to obtain predictions
        loss = loss_fnc(out, data.y)  # Step 3: Compute loss from predictions and ground truth

        loss.backward()  # Step 4: Backpropagation to compute gradients
        optimizer.step()  # Step 5: Update parameters using the optimization algorithm

    # Evaluation
    net.eval()  # Evaluation mode
    acc_train = eval(loader_train)
    acc_test = eval(loader_test)
    print(f"Epoch: {epoch} "
          f"acc_train: {acc_train * 100:.2f}% "
          f"acc_test: {acc_test * 100:.2f}%")
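
The model returns raw scores (logits) without a softmax because `nn.CrossEntropyLoss` applies the log-softmax itself. The sketch below, using made-up logits and labels, compares the loss with a manual computation to show what Step 3 of the loop evaluates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

loss_fnc = nn.CrossEntropyLoss()

# Hypothetical logits for 3 graphs and 2 classes, with integer class labels
out = torch.tensor([[2.0, 0.5], [0.2, 1.2], [1.0, 1.0]])
y = torch.tensor([0, 1, 0])

loss = loss_fnc(out, y)

# Same value by hand: mean negative log-probability of the true class
log_probs = F.log_softmax(out, dim=1)
manual = -log_probs[torch.arange(len(y)), y].mean()

print(torch.allclose(loss, manual))  # True
```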


## Model Evaluation
Evaluate the trained model on the test data.

In [None]:
net.eval()  # Evaluation mode
acc_test = eval(loader_test)
print(f"Accuracy: {acc_test * 100:.2f}%")