# **CS224W - Colab 2**

In this Colab, we will construct our own graph neural network using PyTorch Geometric (PyG) and apply the model to two Open Graph Benchmark (OGB) datasets. The two datasets benchmark model performance on two different graph-related tasks. One is node property prediction: predicting properties of single nodes. The other is graph property prediction: predicting properties of entire graphs or subgraphs.

First, we will learn how PyTorch Geometric stores graphs as PyTorch tensors.

We will then load and take a quick look at one of the Open Graph Benchmark (OGB) datasets using the `ogb` package. OGB is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. The `ogb` package provides not only the data loader for each dataset but also the evaluator.

Finally, we will build our own graph neural networks using PyTorch Geometric, then apply and evaluate the models on node property prediction and graph property prediction tasks.

**Note**: Make sure to **sequentially run all the cells in each section**, so that the intermediate variables / packages will carry over to the next cell.

Have fun on Colab 2 :)

# Device
You might need to use a GPU for this Colab.

Please click `Runtime` and then `Change runtime type`. Then set the `hardware accelerator` to **GPU**.

# Installation

In [2]:
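# NOTE: the wheel index below targets PyTorch 1.7.0 with CUDA 10.1. If the runtime ships a
# different PyTorch build (this notebook later prints 2.4.1+cu121), pip may fall back to
# building torch-scatter / torch-sparse from source, which is slow. Point `-f` at the wheel
# index that matches the installed PyTorch and CUDA versions.
!pip install -q torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cu101.html
!pip install -q torch-sparse -f https://pytorch-geometric.com/whl/torch-1.7.0+cu101.html
!pip install -q torch-geometric
!pip install ogb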

# 1 PyTorch Geometric (Datasets and Data)


PyTorch Geometric has two main modules for storing graphs and transforming them into tensor format. One is `torch_geometric.datasets`, which contains a variety of common graph datasets. The other is `torch_geometric.data`, which provides the data handling of graphs in PyTorch tensors.

In this section, we will learn how to use the `torch_geometric.datasets` and `torch_geometric.data`.

## PyG Datasets

The `torch_geometric.datasets` module contains many common graph datasets. Here we will explore its usage with one example dataset.

In [3]:
# TUDataset is a dataset class provided by PyG that contains various graph datasets typically used for graph classification tasks.
from torch_geometric.datasets import TUDataset

root = './enzymes' # Directory where the dataset is saved (or loaded from if it has already been downloaded).
name = 'ENZYMES' # Name of the dataset to load.
# ENZYMES is a collection of protein tertiary structures, where each graph represents one enzyme
# and is labeled with its enzyme class.
# Each node represents a secondary structure element of the protein (the one-hot node feature encodes its type),
# and edges connect elements that are neighbors along the sequence or in space.

# The ENZYMES dataset
pyg_dataset = TUDataset(root, name)

Downloading https://www.chrsmrrs.com/graphkerneldatasets/ENZYMES.zip
Processing...
Done!


In [4]:
# You can find that there are 600 graphs in this dataset
print(pyg_dataset)

ENZYMES(600)


In [5]:
pyg_dataset[0] # Each graph object in PyTorch Geometric has attributes like y (for labels/classes) and x (for features).

Data(edge_index=[2, 168], x=[37, 3], y=[1])

In [6]:
pyg_dataset[0].edge_index

tensor([[ 0,  0,  0,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,
          3,  4,  4,  4,  4,  5,  5,  5,  5,  5,  6,  6,  6,  6,  7,  7,  7,  7,
          7,  8,  8,  8,  9,  9,  9,  9,  9, 10, 10, 10, 10, 11, 11, 11, 11, 12,
         12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 15, 16, 16,
         16, 16, 17, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20, 20,
         21, 21, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25,
         25, 25, 25, 25, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 28, 28, 28, 28,
         28, 28, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 31, 31, 31, 32,
         32, 32, 32, 33, 33, 33, 33, 33, 33, 34, 34, 34, 34, 34, 34, 35, 35, 35,
         35, 35, 36, 36, 36, 36],
        [ 1,  2,  3,  0,  2,  3, 24, 27,  0,  1,  3, 27, 28,  0,  1,  2,  4,  5,
         28,  3,  5,  6, 29,  3,  4,  6,  7, 29,  4,  5,  7,  8,  5,  6,  8,  9,
         10,  6,  7,  9,  7,  8, 10, 11, 12,  7,  9, 11, 12,  9, 10, 12, 26

In [7]:
pyg_dataset[0].edge_index.shape

torch.Size([2, 168])

In [8]:
pyg_dataset[0].edge_index.shape[1] # number of edges in graph 0

168
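
The `[2, num_edges]` layout of `edge_index` shown above can be illustrated without PyG. The following is a minimal, dependency-free sketch using plain Python lists; the toy triangle graph and the helper `neighbors` are made up for this illustration and are not part of PyG. Row 0 holds source nodes, row 1 holds target nodes, and an undirected edge appears once per direction.

```python
# Toy undirected triangle 0-1-2 stored in COO form, mirroring the
# [2, num_edges] layout of `edge_index` (plain lists, not tensors).
edge_index = [
    [0, 0, 1, 1, 2, 2],  # source nodes
    [1, 2, 0, 2, 0, 1],  # target nodes
]

def neighbors(edge_index, node):
    """Return the sorted targets of every stored edge leaving `node`."""
    sources, targets = edge_index
    return sorted(t for s, t in zip(sources, targets) if s == node)

num_stored_edges = len(edge_index[0])  # counts both directions
print(num_stored_edges)          # 6 stored entries for 3 undirected edges
print(neighbors(edge_index, 1))  # [0, 2]
```

This is why the 168 columns reported for graph 0 correspond to fewer distinct undirected edges.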

In [9]:
pyg_dataset[0].x

tensor([[1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.]])
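
Each row of `x` above is a one-hot vector, so the position of the 1 identifies the node's type. As a small illustration in plain Python (the helper `one_hot_to_index` is a hypothetical name, and the rows are re-typed from the printed output rather than read from the dataset), the rows can be decoded and counted like this:

```python
from collections import Counter

# Node features of graph 0 as printed above: 24 rows of [1, 0, 0]
# followed by 13 rows of [0, 1, 0].
x = [[1.0, 0.0, 0.0]] * 24 + [[0.0, 1.0, 0.0]] * 13

def one_hot_to_index(row):
    """Return the position of the largest entry in a one-hot row."""
    return max(range(len(row)), key=lambda i: row[i])

type_counts = Counter(one_hot_to_index(row) for row in x)
print(len(x))             # 37
print(dict(type_counts))  # {0: 24, 1: 13}
```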

In [10]:
pyg_dataset[0].y

tensor([5])

In [11]:
pyg_dataset[0].y.item()  # Convert tensor to an integer

5

In [12]:
pyg_dataset[599]

Data(edge_index=[2, 156], x=[48, 3], y=[1])

In [13]:
pyg_dataset[599].edge_index

tensor([[ 1,  1,  2,  3,  3,  4,  4,  4,  5,  6,  7,  7,  8,  8,  9, 10, 10, 11,
         12, 13, 13, 14, 14, 15, 15, 15, 15, 16, 16, 17, 17, 18, 19, 19, 20, 21,
         21, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25,
         25, 25, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28,
         28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 31,
         31, 31, 31, 31, 31, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 34, 34, 34,
         34, 35, 35, 35, 35, 36, 36, 36, 37, 37, 37, 38, 38, 38, 39, 39, 39, 39,
         40, 40, 40, 40, 41, 41, 41, 41, 42, 42, 42, 43, 43, 43, 44, 44, 44, 45,
         45, 45, 45, 46, 46, 46, 46, 47, 47, 47, 47, 47],
        [24, 26, 25,  4, 29,  3, 27, 29, 28, 30, 30, 32, 23, 31, 35, 33, 35, 34,
         36, 36, 38, 23, 37, 24, 29, 30, 41, 39, 41, 23, 40, 42, 42, 44, 43, 22,
         47, 21, 45, 47,  8, 14, 17, 46,  1, 15, 29, 35, 41, 47,  2, 26, 28, 34,
         40, 46,  1, 25, 27, 33, 39, 45,  4, 26, 28

In [14]:
pyg_dataset[599].x

tensor([[1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0

In [15]:
pyg_dataset[599].y

tensor([3])

In [16]:
pyg_dataset[599].y.item()  # Convert tensor to an integer

3

## Question 1: What is the number of classes and number of features in the ENZYMES dataset? (5 points)

In [17]:
# we can utilize the built-in attributes of the PyTorch Geometric dataset.
print("{} dataset has {} classes".format(name, pyg_dataset.num_classes))
print("{} dataset has {} features (node attributes) per node".format(name, pyg_dataset.num_node_features))

ENZYMES dataset has 6 classes
ENZYMES dataset has 3 features (node attributes) per node


In [18]:
def get_num_classes(pyg_dataset):
  # TODO: Implement this function that takes a PyG dataset object
  # and returns the number of classes for that dataset.


  # We can use the built-in attribute of the PyG dataset.
  num_classes = pyg_dataset.num_classes

  return num_classes

def get_num_features(pyg_dataset):
  # TODO: Implement this function that takes a PyG dataset object
  # and returns the number of features for that dataset.


  # We can use the built-in attribute of the PyG dataset.
  num_features = pyg_dataset.num_node_features

  return num_features

# Some information needs to be stored at the dataset level,
# especially when the dataset contains multiple graphs.
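
To see why the class count belongs to the dataset rather than to a single graph, consider this plain-Python sketch. The label list and the helper `infer_num_classes` are hypothetical, and the sketch assumes classes are numbered `0..C-1`; it is not how PyG computes `num_classes` internally.

```python
# Hypothetical graph-level labels, one per graph in a small dataset.
graph_labels = [5, 4, 3, 0, 1, 2, 5, 3]

def infer_num_classes(labels):
    """Assume classes are numbered 0..C-1 and return C."""
    return max(labels) + 1

# Looking at all graphs recovers the full label range.
print(infer_num_classes(graph_labels))       # 6
# A single graph (here the one labeled 3) only sees its own label.
print(infer_num_classes([graph_labels[2]]))  # 4
```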

## PyG Data

Each PyG dataset usually stores a list of `torch_geometric.data.Data` objects. Each `torch_geometric.data.Data` object usually represents a graph. You can easily get the `Data` object by indexing on the dataset.

For more information such as what will be stored in `Data` object, please refer to the [documentation](https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.Data).

## Question 2: What is the label of the graph (index 100 in the ENZYMES dataset)? (5 points)

In [19]:
def get_graph_class(pyg_dataset, idx):
  # TODO: Implement this function that takes a PyG dataset object,
  # the index of the graph in dataset, and returns the class/label
  # of the graph (in integer).

  # Access the graph at index `idx` and retrieve its class label (data.y)
  label = pyg_dataset[idx].y.item()  # Convert tensor to an integer

  return label

In [20]:
# Here pyg_dataset is a dataset for graph classification
graph_0 = pyg_dataset[0]
print(graph_0)
idx = 100
label = get_graph_class(pyg_dataset, idx)
print('Graph with index {} has label {}'.format(idx, label))

Data(edge_index=[2, 168], x=[37, 3], y=[1])
Graph with index 100 has label 4


## Question 3: What is the number of edges for the graph (index 200 in the ENZYMES dataset)? (5 points)

In [21]:
def get_graph_num_edges(pyg_dataset, idx):
  # TODO: Implement this function that takes a PyG dataset object,
  # the index of the graph in dataset, and returns the number of
  # edges in the graph (in integer). You should not count an edge
  # twice if the graph is undirected. For example, in an undirected
  # graph G, if two nodes v and u are connected by an edge, this edge
  # should only be counted once.

  # Access the graph at index `idx`
  data = pyg_dataset[idx]

  # Each edge is represented as two nodes (source, target) in `edge_index`
  # We count each edge once in undirected graphs, so we divide by 2
  num_edges = data.edge_index.shape[1] // 2  # Divide by 2 for undirected graphs

  return num_edges

In [22]:
idx = 200
num_edges = get_graph_num_edges(pyg_dataset, idx)
print('Graph with index {} has {} edges'.format(idx, num_edges))

Graph with index 200 has 53 edges
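
Halving the number of columns assumes that every undirected edge is stored in both directions. As an alternative sketch (not the method used in the cell above), each unordered node pair can be counted once, which stays correct even when some edges are stored in one direction only. The helper name and the toy edge list are hypothetical, and plain lists stand in for the `edge_index` tensor.

```python
def count_undirected_edges(edge_index):
    """Count each unordered node pair once, whatever directions are stored."""
    sources, targets = edge_index
    unique_pairs = {(min(s, t), max(s, t)) for s, t in zip(sources, targets)}
    return len(unique_pairs)

# Hypothetical toy graph: edges 0-1 and 1-2 stored in both directions,
# plus edge 2-3 stored in one direction only.
edge_index = [
    [0, 1, 1, 2, 2],
    [1, 0, 2, 1, 3],
]
print(count_undirected_edges(edge_index))  # 3
print(len(edge_index[0]) // 2)             # 2 (halving undercounts here)
```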


# 2 Open Graph Benchmark (OGB)

The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. Its datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can also be evaluated by using the OGB Evaluator in a unified manner.

## Dataset and Data

OGB also supports PyG dataset and data objects. Here we take a look at the `ogbn-arxiv` dataset.

In [23]:
import torch_geometric.transforms as T
from ogb.nodeproppred import PygNodePropPredDataset

dataset_name = 'ogbn-arxiv'
# The ogbn-arxiv dataset is a directed graph, representing the citation network between all Computer Science (CS) arXiv papers
# indexed by MAG. Each node is an arXiv paper and each directed edge indicates that one paper cites another one.
# Each paper comes with a 128-dimensional feature vector obtained by averaging the embeddings of words in its title and abstract.
# The embeddings of individual words are computed by running the skip-gram model over the MAG corpus.
# Formally, the task is to predict the primary categories of the arXiv papers, which is formulated as a 40-class classification problem.


# Load the dataset and transform it to sparse tensor
dataset = PygNodePropPredDataset(name=dataset_name, transform=T.ToSparseTensor())

Downloading http://snap.stanford.edu/ogb/data/nodeproppred/arxiv.zip
Downloaded 0.08 GB: 100%|██████████| 81/81 [00:13<00:00,  5.95it/s]
Extracting dataset/arxiv.zip
Processing...
Loading necessary files...
This might take a while.
Processing graphs...
Converting graphs into PyG objects...
Saving...
Done!


In [24]:
print('The {} dataset has {} graph'.format(dataset_name, len(dataset)))

The ogbn-arxiv dataset has 1 graph


In [25]:
dataset[0]

Data(num_nodes=169343, x=[169343, 128], node_year=[169343, 1], y=[169343, 1], adj_t=[169343, 169343])

In [26]:
dataset.num_node_features

128

In [27]:
dataset.num_classes

40

In [28]:
dataset[0].node_year

tensor([[2013],
        [2015],
        [2014],
        ...,
        [2020],
        [2020],
        [2020]])

In [29]:
dataset[0].y

tensor([[ 4],
        [ 5],
        [28],
        ...,
        [10],
        [ 4],
        [ 1]])

In [30]:
dataset[0].adj_t

tensor(crow_indices=tensor([      0,     289,     290,  ..., 1166240,
                            1166243, 1166243]),
       col_indices=tensor([   411,    640,   1162,  ...,  30351,  35711,
                           103121]),
       values=tensor([1., 1., 1.,  ..., 1., 1., 1.]), size=(169343, 169343),
       nnz=1166243, layout=torch.sparse_csr)
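
The printed `adj_t` uses the compressed sparse row (CSR) layout: `crow_indices` marks where each row starts inside `col_indices`. A minimal plain-Python sketch of that lookup follows; the 4-node graph and the helper `row_columns` are hypothetical and only illustrate the layout.

```python
# Hypothetical 4-node directed graph with edges 0->1, 0->2, 2->3.
# Rows 1 and 3 have no stored entries.
crow_indices = [0, 2, 2, 3, 3]  # length = num_rows + 1
col_indices = [1, 2, 3]         # column of each stored entry, row by row

def row_columns(crow_indices, col_indices, row):
    """Return the column indices stored in `row` of a CSR matrix."""
    start, end = crow_indices[row], crow_indices[row + 1]
    return col_indices[start:end]

print(row_columns(crow_indices, col_indices, 0))  # [1, 2]
print(row_columns(crow_indices, col_indices, 1))  # []
print(crow_indices[-1])                           # 3, the number of stored entries
```

Read the same way, the output above (`crow_indices` starting `0, 289, 290`) says that row 0 holds 289 entries and row 1 holds one, and the last value, 1166243, equals `nnz`.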

# 3 GNN: Node Property Prediction

In this section we will build our first graph neural network using PyTorch Geometric and apply it to node property prediction (node classification).

We will build the graph neural network using the GCN operator ([Kipf et al. (2017)](https://arxiv.org/pdf/1609.02907.pdf)).

You should use the PyG built-in `GCNConv` layer directly.
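
For intuition, the propagation step of the GCN operator from Kipf et al. multiplies node features by the symmetrically normalized adjacency matrix with self-loops, $D^{-1/2}(A + I)D^{-1/2}$, before applying a learned weight matrix. The sketch below is a dense, plain-Python illustration of that normalization only; the helper names and the path graph are hypothetical, the weight matrix is omitted, and `GCNConv` itself works on sparse tensors.

```python
import math

def gcn_normalized_adjacency(num_nodes, undirected_edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a dense list of lists."""
    a_hat = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a_hat[i][i] = 1.0  # self-loop
    for u, v in undirected_edges:
        a_hat[u][v] = 1.0
        a_hat[v][u] = 1.0
    degree = [sum(row) for row in a_hat]  # degrees including self-loops
    return [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]

def propagate(norm_adj, x):
    """Multiply the normalized adjacency by a feature matrix x."""
    num_features = len(x[0])
    return [
        [sum(norm_adj[i][k] * x[k][f] for k in range(len(x))) for f in range(num_features)]
        for i in range(len(norm_adj))
    ]

# Hypothetical path graph 0 - 1 - 2 with one scalar feature per node.
norm_adj = gcn_normalized_adjacency(3, [(0, 1), (1, 2)])
h = propagate(norm_adj, [[1.0], [1.0], [1.0]])
print(norm_adj[0][0])  # 0.5, since node 0 has degree 2 including its self-loop
```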

## Setup

In [31]:
import torch # Imports the PyTorch library for building deep learning models.

import torch.nn.functional as F # Imports additional functionalities from PyTorch's nn module, commonly used for activation functions and other utilities

print(torch.__version__)

2.4.1+cu121


In [32]:
from torch_geometric.nn import GCNConv # Imports the GCNConv class from PyTorch Geometric's nn module, which is the core building block of Graph Convolutional Networks (GCNs).

import torch_geometric.transforms as T # Imports data transformation functions from PyTorch Geometric's transforms module. These functions are used to pre-process graph data for use in GCNs.

from ogb.nodeproppred import PygNodePropPredDataset, Evaluator # Imports the PygNodePropPredDataset and Evaluator classes from the Open Graph Benchmark (OGB) library for node property prediction tasks.

## Load and Preprocess the Dataset

In [33]:
dataset_name='ogbn-arxiv' # Name of the dataset to load: "ogbn-arxiv", a citation network of arXiv Computer Science papers.

dataset=PygNodePropPredDataset(name=dataset_name, # Loads the specified dataset using the PygNodePropPredDataset class.
                               transform=T.Compose([T.ToUndirected(), T.ToSparseTensor()])) # Applies the two transforms in order.

# T.ToUndirected() adds the reverse of every directed citation edge, so the adjacency matrix becomes symmetric.
# T.ToSparseTensor() then converts edge_index into the sparse (transposed) adjacency matrix adj_t.
# GCNConv can run on a non-symmetric adjacency matrix, but treating citations as undirected
# lets information flow in both directions between a paper and the papers it cites.

data=dataset[0]


# (old version of PyG)
#dataset=PygNodePropPredDataset(name=dataset_name, transform=T.ToSparseTensor())  # (old version of PyG)
# Make the adjacency matrix symmetric (no need anymore)
#data.adj_t=data.adj_t.to_symmetric()  # (old version of PyG)


In [34]:
dataset

PygNodePropPredDataset()

In [35]:
dataset[0]

Data(num_nodes=169343, x=[169343, 128], node_year=[169343, 1], y=[169343, 1], adj_t=[169343, 169343])

In [36]:
print('The {} dataset has {} graph'.format(dataset_name, len(dataset)))

The ogbn-arxiv dataset has 1 graph


In [37]:
dataset.num_node_features

128

In [38]:
dataset.num_classes

40

In [39]:
dataset[0].node_year

tensor([[2013],
        [2015],
        [2014],
        ...,
        [2020],
        [2020],
        [2020]])

In [40]:
dataset[0].y

tensor([[ 4],
        [ 5],
        [28],
        ...,
        [10],
        [ 4],
        [ 1]])

In [43]:
dataset[0].adj_t

tensor(crow_indices=tensor([      0,     291,     293,  ..., 2315576,
                            2315596, 2315598]),
       col_indices=tensor([   411,    640,   1162,  ..., 163274,  27824,
                           158981]),
       values=tensor([1., 1., 1.,  ..., 1., 1., 1.]), size=(169343, 169343),
       nnz=2315598, layout=torch.sparse_csr)
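
After the undirected transform, `nnz` is 2315598, slightly less than twice the 1166243 entries of the directed graph (2332486). A plausible reading is that reverse edges which already existed are merged rather than duplicated. The plain-Python sketch below illustrates that effect; `symmetrize` is a hypothetical helper and not the PyG implementation.

```python
def symmetrize(edges):
    """Add the reverse of every directed edge and drop duplicates."""
    symmetric = set(edges) | {(t, s) for s, t in edges}
    return sorted(symmetric)

# Hypothetical directed edges; 1->2 and 2->1 are already reciprocal.
directed = [(0, 1), (1, 2), (2, 1)]
undirected = symmetrize(directed)
print(undirected)       # [(0, 1), (1, 0), (1, 2), (2, 1)]
print(len(undirected))  # 4, fewer than 2 * len(directed) = 6
```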

In [52]:
dataset[0].adj_t[100000]

tensor(indices=tensor([[  9430,  18041,  19358,  29387,  34570,  52950,  76054,
                        108668]]),
       values=tensor([1., 1., 1., 1., 1., 1., 1., 1.]),
       size=(169343,), nnz=8, layout=torch.sparse_coo)

In [57]:
device = 'cuda' if torch.cuda.is_available() else 'cpu' # Checks if a CUDA GPU is available. If so, sets the device to "cuda" for GPU training. Otherwise, sets the device to "cpu" for CPU training.

# Prints the selected device ("cuda" or "cpu").
print('Device: {}'.format(device))

Device: cpu


In [58]:
data = data.to(device) # Transfers the graph data ("data") to the chosen device (GPU or CPU) for efficient training.

In [59]:
split_idx = dataset.get_idx_split() # Retrieves the split indices for the training, validation, and test sets from the dataset object.
# For OGB node property prediction datasets, get_idx_split() returns a dictionary
# with three keys: 'train', 'valid', and 'test'. Each key maps to a tensor of indices of the nodes in that split.


train_idx = split_idx['train'].to(device)


In [71]:
split_idx

{'train': tensor([     0,      1,      2,  ..., 169145, 169148, 169251]),
 'valid': tensor([   349,    357,    366,  ..., 169185, 169261, 169296]),
 'test': tensor([   346,    398,    451,  ..., 169340, 169341, 169342])}

In [74]:
split_idx['train']

tensor([     0,      1,      2,  ..., 169145, 169148, 169251])

In [75]:
split_idx['train'].shape

torch.Size([90941])

In [76]:
split_idx['valid'].shape

torch.Size([29799])

In [77]:
split_idx['test'].shape

torch.Size([48603])

In [70]:
data.y[split_idx['train']].shape

torch.Size([90941, 1])
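
As a quick sanity check on the shapes printed above, the three splits should partition the 169343 nodes without overlap or gaps. The sketch below re-types the sizes from the outputs and computes each split's share; it is plain arithmetic and does not touch the dataset.

```python
# Split sizes and node count copied from the outputs above.
split_sizes = {'train': 90941, 'valid': 29799, 'test': 48603}
num_nodes = 169343

total = sum(split_sizes.values())
fractions = {name: size / num_nodes for name, size in split_sizes.items()}

print(total == num_nodes)  # True
for name, fraction in fractions.items():
    print(f'{name}: {fraction:.1%}')  # train: 53.7%, valid: 17.6%, test: 28.7%
```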

## GCN Model

Now we will implement our GCN model!

Please follow the figure below to implement your `forward` function.


![test](https://drive.google.com/uc?id=128AuYAXNXGg7PIhJJ7e420DoPWKb-RtL)

In [60]:
###### Class Definition ######
class GCN(torch.nn.Module): # Defines a new class named GCN that inherits from the torch.nn.Module class. This means that the GCN class will be a PyTorch module, allowing it to be used within a neural network architecture.

    ##### Initialization #####
    # The __init__ method is the constructor of the GCN class.
    def __init__(self, input_dim, hidden_dim, output_dim, num_layers,
                 dropout, return_embeds=False):
        '''
        It takes the following arguments:


        # input_dim: The number of input features for each node.
        # hidden_dim: The number of hidden units in each GCN layer.
        # output_dim: The number of output features (usually the number of classes for node classification).
        # num_layers: The total number of GCN layers.
        # dropout: The dropout rate for regularization.
        # return_embeds: A boolean flag indicating whether to return the node embeddings instead of the final classification predictions.
        '''

        super(GCN, self).__init__() # calls the constructor of the parent class (torch.nn.Module) to initialize common attributes.


        ##### Creating GCN Layers #####
        self.convs = torch.nn.ModuleList() # Creates a list of GCNConv layers
        # First GCNConv layer
        self.convs.append(GCNConv(input_dim, hidden_dim)) # The first layer takes the input features and maps them to the hidden dimension.
        # Hidden layers
        for _ in range(num_layers - 2):  # The intermediate layers (if num_layers is greater than 2) each have the same input and output dimensions (hidden_dim).
            self.convs.append(GCNConv(hidden_dim, hidden_dim))
        # Last GCNConv layer
        self.convs.append(GCNConv(hidden_dim, output_dim)) # The last layer maps the hidden features to the output dimension, typically the number of classes.


        ##### Creating Batch Normalization Layers ######
        # A list of 1D batch normalization layers
        self.bns = torch.nn.ModuleList([torch.nn.BatchNorm1d(hidden_dim) for _ in range(num_layers - 1)]) # Creates a list of BatchNorm1d layers, one for each hidden GCN layer. Batch normalization helps to stabilize training and improve convergence by normalizing the activations.


        ##### Creating LogSoftmax Layer #####
        # The log softmax layer
        self.softmax = torch.nn.LogSoftmax(dim=1) # Creates a LogSoftmax layer that converts the output logits into log-probabilities (the logarithm of the softmax output). The dim=1 argument applies it along the second dimension, i.e., across the classes of each node.


        ##### Setting Dropout Probability #####
        # Probability of an element to be zeroed
        self.dropout = dropout # Sets the dropout attribute to the specified value, which controls the probability of randomly dropping neurons during training to prevent overfitting.


        ##### Setting Return Embeddings Flag #####
        # Skip classification layer and return node embeddings
        self.return_embeds = return_embeds # Sets the return_embeds attribute to the specified value. If True, the forward method will return the node embeddings instead of the final classification predictions, which can be useful for downstream tasks or visualization.

    def reset_parameters(self):
        for conv in self.convs:
            conv.reset_parameters()
        for bn in self.bns:
            bn.reset_parameters()

    ##### defines the forward method of the GCN class #####
    def forward(self, x, adj_t):
        '''
        It takes the following arguments:

        x: A tensor representing the node features (shape: [num_nodes, input_dim]).
        adj_t: A tensor representing the transposed adjacency matrix (shape: [num_nodes, num_nodes]).
        '''

        ##### Processing through GCN Layers (except the last) #####
        for i, conv in enumerate(self.convs[:-1]): # This loop iterates over all GCNConv layers except the last one.
            x = conv(x, adj_t)  # Applies the current GCNConv layer to the node features x and the adjacency matrix adj_t.
            x = self.bns[i](x)  # Applies batch normalization (using the i-th element of the self.bns list) to the output of the GCNConv layer.
            x = F.relu(x)       # Applies the ReLU (Rectified Linear Unit) activation function to the batch-normalized output.
            x = F.dropout(x, p=self.dropout, training=self.training)  # Applies dropout to the ReLU output with probability self.dropout (only during training, training=True).


        ##### Processing through the Last GCN Layer #####
        x = self.convs[-1](x, adj_t) # Applies the final GCNConv layer to the node features x and the adjacency matrix adj_t.
        # Note: Unlike the previous layers, there's no ReLU or dropout applied here.


        ##### Processing through Conditional Softmax #####
        # If self.return_embeds is True (embeddings mode), the output of the final GCNConv layer is returned as is (no ReLU, dropout, or softmax).
        # If self.return_embeds is False (classification mode), per-class log-probabilities computed by LogSoftmax are returned.
        if not self.return_embeds: # Checks if the return_embeds flag is False (meaning we want class predictions).
            x = self.softmax(x)  # Converts the logits (raw scores) of the final GCNConv layer into log-probabilities.

        return x
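
The layer sizes built in `__init__` follow a simple pattern: one input layer, `num_layers - 2` hidden layers, and one output layer, with a batch-norm layer after every layer except the last. The helper below is a hypothetical, dependency-free sketch that lists those sizes for the values used in this notebook (128 input features, hidden size 256, 40 classes, 3 layers); it assumes `num_layers >= 2`, as the class does.

```python
def gcn_layer_dims(input_dim, hidden_dim, output_dim, num_layers):
    """List the (in, out) sizes of each GCNConv layer, in order."""
    dims = [input_dim] + [hidden_dim] * (num_layers - 1) + [output_dim]
    return list(zip(dims[:-1], dims[1:]))

layers = gcn_layer_dims(128, 256, 40, 3)
print(layers)           # [(128, 256), (256, 256), (256, 40)]
print(len(layers) - 1)  # 2 batch-norm layers, one per non-final layer
```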

In [61]:
def train(model, data, train_idx, optimizer, loss_fn): # This line defines the train function, which encapsulates the training process for a single epoch.
    '''
    It takes the following arguments:

    model: The GCN model instance.
    data: The graph data containing node features and adjacency matrix.
    train_idx: The indices of nodes belonging to the training set.
    optimizer: The optimizer used for updating model parameters.
    loss_fn: The loss function used to calculate the training loss.
    '''
    model.train() # Sets the model to training mode. This enables dropout layers (if present) and other training-specific behaviors.

    optimizer.zero_grad()  # Resets the gradients of the model's parameters to zero before the forward pass. This is necessary to prevent gradients from accumulating across multiple training steps.

    out = model(data.x, data.adj_t)  # Performs a forward pass of the GCN model, passing the node features data.x and adjacency matrix data.adj_t as input.
    # The output out contains per-class log-probabilities (or node embeddings if return_embeds is True in the GCN class).

    out = out[train_idx]  # Keeps only the rows of the training nodes, selected with the train_idx indices.

    label = data.y[train_idx].squeeze()  # Extracts the ground truth labels for the training nodes using the train_idx indices.
    # squeeze() removes the trailing dimension of size 1, turning the [num_train, 1] label tensor into the [num_train] shape the loss function expects.

    loss = loss_fn(out, label)  # Calculates the loss between the predicted log-probabilities out and the ground truth labels label. Because the model already applies LogSoftmax, loss_fn should be the negative log-likelihood loss (F.nll_loss); the two together are equivalent to cross-entropy on the logits.

    loss.backward()  # Computes the gradients of the loss with respect to the model's parameters using backpropagation.

    optimizer.step()  # Update parameters. Updates the model's parameters using the calculated gradients according to the optimization algorithm defined in the optimizer.

    # Returns the loss as a Python float (using item()), which can be used for monitoring training progress.
    return loss.item()
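
The model outputs log-probabilities, and the training loop pairs them with a negative log-likelihood loss. The following plain-Python sketch shows what that combination computes on a tiny example; the function names mirror the PyTorch ones for readability, but these are hand-written stand-ins and the logits are made up.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of one row of logits."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return [z - log_sum_exp for z in logits]

def nll_loss(log_prob_rows, labels):
    """Mean negative log-probability assigned to the true class."""
    picked = [row[label] for row, label in zip(log_prob_rows, labels)]
    return -sum(picked) / len(picked)

# Hypothetical logits for two nodes and three classes.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
labels = [0, 2]
log_probs = [log_softmax(row) for row in logits]
loss = nll_loss(log_probs, labels)
print(round(loss, 4))
```

Exponentiating a row of `log_probs` gives probabilities that sum to 1, which is why the loss equals cross-entropy computed directly on the logits.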

In [62]:
@torch.no_grad() # Disables gradient tracking during evaluation. No backward pass is needed here, so this saves memory and computation.
def test(model, data, split_idx, evaluator): # Defines the test function, which is used to evaluate the model's performance on the training, validation, and test sets.

    '''
    Takes the following arguments:

    model: The trained GCN model.
    data: The graph data.
    split_idx: The split indices for training, validation, and testing.
    evaluator: An evaluator object from the OGB library, used to calculate metrics.
    '''
    model.eval() # Sets the model to evaluation mode. This disables dropout layers and other training-specific behaviors.

    out = model(data.x, data.adj_t)  # Performs a forward pass of the GCN model on the entire graph to obtain the per-class log-probabilities of every node.

    y_pred = out.argmax(dim=-1, keepdim=True)  # Predicted class label of each node: the index of the largest log-probability. keepdim=True keeps the shape [num_nodes, 1], matching data.y.

    train_acc = evaluator.eval({ # Evaluates the model's performance on the training set using the provided evaluator object.
        'y_true': data.y[split_idx['train']], # Ground truth labels of the training nodes.
        'y_pred': y_pred[split_idx['train']], # Predicted labels of the training nodes.
    })['acc'] # Extracts the accuracy metric from the evaluation results.

    valid_acc = evaluator.eval({
        'y_true': data.y[split_idx['valid']],
        'y_pred': y_pred[split_idx['valid']],
    })['acc']

    test_acc = evaluator.eval({
        'y_true': data.y[split_idx['test']],
        'y_pred': y_pred[split_idx['test']],
    })['acc']

    return train_acc, valid_acc, test_acc
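
Conceptually, the accuracy reported by the evaluator is the fraction of nodes whose predicted label matches the true label. The sketch below reproduces that idea in plain Python as a simplified stand-in; it is not the OGB `Evaluator` code, and the scores and labels are made up.

```python
def argmax(row):
    """Return the index of the largest value in a row of scores."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy(y_true, y_pred):
    """Fraction of positions where prediction and label agree."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical log-probabilities for three nodes and two classes.
out = [[-0.1, -2.5], [-1.9, -0.2], [-0.3, -1.4]]
y_true = [0, 1, 1]
y_pred = [argmax(row) for row in out]
acc = accuracy(y_true, y_pred)
print(y_pred)  # [0, 1, 0]
```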

In [63]:
# Please do not change the args
args = {
    'device': device,
    'num_layers': 3,
    'hidden_dim': 256,
    'dropout': 0.5,
    'lr': 0.01,
    'epochs': 100,
}
args

{'device': 'cpu',
 'num_layers': 3,
 'hidden_dim': 256,
 'dropout': 0.5,
 'lr': 0.01,
 'epochs': 100}

In [64]:
model = GCN(data.num_features, args['hidden_dim'],
            dataset.num_classes, args['num_layers'],
            args['dropout']).to(device)
evaluator = Evaluator(name='ogbn-arxiv')

In [65]:
import copy

# reset the parameters to initial random value
model.reset_parameters()

optimizer = torch.optim.Adam(model.parameters(), lr=args['lr'])
loss_fn = F.nll_loss

best_model = None
best_valid_acc = 0

for epoch in range(1, 1 + args["epochs"]):
    loss = train(model, data, train_idx, optimizer, loss_fn)
    result = test(model, data, split_idx, evaluator)
    train_acc, valid_acc, test_acc = result
    if valid_acc > best_valid_acc:
        best_valid_acc = valid_acc
        best_model = copy.deepcopy(model)
    print(f'Epoch: {epoch:02d}, '
          f'Loss: {loss:.4f}, '
          f'Train: {100 * train_acc:.2f}%, '
          f'Valid: {100 * valid_acc:.2f}% '
          f'Test: {100 * test_acc:.2f}%')

Epoch: 01, Loss: 4.3820, Train: 20.98%, Valid: 27.00% Test: 24.41%
Epoch: 02, Loss: 2.3338, Train: 32.60%, Valid: 38.61% Test: 40.18%
Epoch: 03, Loss: 1.9567, Train: 34.47%, Valid: 33.59% Test: 37.91%
Epoch: 04, Loss: 1.7572, Train: 37.67%, Valid: 33.91% Test: 34.11%
Epoch: 05, Loss: 1.6573, Train: 36.97%, Valid: 27.12% Test: 24.85%
Epoch: 06, Loss: 1.5631, Train: 35.67%, Valid: 23.26% Test: 20.75%
Epoch: 07, Loss: 1.4957, Train: 35.78%, Valid: 24.97% Test: 23.00%
Epoch: 08, Loss: 1.4477, Train: 35.94%, Valid: 24.07% Test: 21.67%
Epoch: 09, Loss: 1.4045, Train: 36.89%, Valid: 24.86% Test: 23.33%
Epoch: 10, Loss: 1.3688, Train: 40.43%, Valid: 33.40% Test: 37.16%
Epoch: 11, Loss: 1.3422, Train: 42.11%, Valid: 35.67% Test: 39.93%
Epoch: 12, Loss: 1.3126, Train: 42.77%, Valid: 36.34% Test: 40.61%
Epoch: 13, Loss: 1.2856, Train: 44.21%, Valid: 38.85% Test: 42.92%
Epoch: 14, Loss: 1.2687, Train: 46.98%, Valid: 44.39% Test: 47.88%
Epoch: 15, Loss: 1.2536, Train: 49.82%, Valid: 48.83% Test: 51

In [66]:
best_result = test(best_model, data, split_idx, evaluator)
train_acc, valid_acc, test_acc = best_result
print(f'Best model: '
      f'Train: {100 * train_acc:.2f}%, '
      f'Valid: {100 * valid_acc:.2f}% '
      f'Test: {100 * test_acc:.2f}%')

Best model: Train: 73.38%, Valid: 71.91% Test: 71.18%
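
The training loop above stores `copy.deepcopy(model)` whenever the validation accuracy improves. The sketch below, in plain Python with a hypothetical `TinyModel` class standing in for the GCN and made-up validation scores, illustrates why a deep copy is needed: a plain reference would keep changing as training continues.

```python
import copy

class TinyModel:
    """Hypothetical stand-in for a model: one mutable weight list."""
    def __init__(self):
        self.weights = [0.0, 0.0]

    def step(self, delta):
        # Mimics an in-place optimizer update.
        for k in range(len(self.weights)):
            self.weights[k] += delta

model = TinyModel()
# Made-up validation scores for three "epochs".
valid_scores = [0.60, 0.75, 0.70]

best_valid, best_alias, best_snapshot = 0.0, None, None
for score in valid_scores:
    model.step(1.0)
    if score > best_valid:
        best_valid = score
        best_alias = model                    # same object as `model`
        best_snapshot = copy.deepcopy(model)  # independent copy

print(best_alias.weights)     # weights after the final epoch
print(best_snapshot.weights)  # weights from the best epoch (epoch 2)
```

The alias ends up with the final weights, while the snapshot keeps the weights from the epoch with the best validation score.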


## Question 5: What are your `best_model` validation and test accuracies? Please report them on Gradescope. For example, for an accuracy such as 50.01%, just report 50.01 and please don't include the percent sign. (20 points)

# 4 GNN: Graph Property Prediction

In this section, we will create a graph neural network for graph property prediction (graph classification).


## Load and preprocess the dataset

In [78]:
# Importing Libraries

# OGB dataset loader and evaluator for graph property prediction.
from ogb.graphproppred import PygGraphPropPredDataset, Evaluator
# Mini-batch loader for graphs. PyG 2.x also exposes this class as
# torch_geometric.loader.DataLoader, which is the non-deprecated import path.
from torch_geometric.data import DataLoader
# Progress bar for notebooks.
from tqdm.notebook import tqdm
# Encoder that embeds the integer-coded atom (node) features of molecule graphs.
from ogb.graphproppred.mol_encoder import AtomEncoder
# Global pooling functions that aggregate node embeddings into graph embeddings.
from torch_geometric.nn import global_add_pool, global_mean_pool

In [79]:
# Loading the Dataset

dataset = PygGraphPropPredDataset(name='ogbg-molhiv')
# Loads the ogbg-molhiv dataset from OGB.
# Each graph is a molecule, and the task is to predict whether it inhibits HIV replication.

Downloading http://snap.stanford.edu/ogb/data/graphproppred/csv_mol_download/hiv.zip


Downloaded 0.00 GB: 100%|██████████| 3/3 [00:03<00:00,  1.10s/it]
Processing...


Extracting dataset/hiv.zip
Loading necessary files...
This might take a while.
Processing graphs...


100%|██████████| 41127/41127 [00:00<00:00, 85061.81it/s]


Converting graphs into PyG objects...


100%|██████████| 41127/41127 [00:02<00:00, 17266.18it/s]


Saving...


Done!


In [80]:
dataset

PygGraphPropPredDataset(41127)

In [91]:
dataset[10]

Data(edge_index=[2, 24], edge_attr=[24, 3], x=[13, 9], y=[1, 1], num_nodes=13)

In [86]:
dataset[41126]

Data(edge_index=[2, 80], edge_attr=[80, 3], x=[37, 9], y=[1, 1], num_nodes=37)

In [82]:
dataset.num_node_features

9

In [81]:
dataset.num_classes

2

In [105]:
dataset[0].y

tensor([[0]])

In [106]:
dataset[0].x

tensor([[ 5,  0,  4,  5,  3,  0,  2,  0,  0],
        [ 5,  0,  4,  5,  2,  0,  2,  0,  0],
        [ 5,  0,  3,  5,  0,  0,  1,  0,  1],
        [ 7,  0,  2,  6,  0,  0,  1,  0,  1],
        [28,  0,  4,  2,  0,  0,  5,  0,  1],
        [ 7,  0,  2,  6,  0,  0,  1,  0,  1],
        [ 5,  0,  3,  5,  0,  0,  1,  0,  1],
        [ 5,  0,  4,  5,  2,  0,  2,  0,  0],
        [ 5,  0,  4,  5,  3,  0,  2,  0,  0],
        [ 5,  0,  4,  5,  2,  0,  2,  0,  1],
        [ 7,  0,  2,  6,  0,  0,  1,  0,  1],
        [ 5,  0,  3,  5,  0,  0,  1,  0,  1],
        [ 5,  0,  4,  5,  2,  0,  2,  0,  0],
        [ 5,  0,  4,  5,  3,  0,  2,  0,  0],
        [ 5,  0,  4,  5,  2,  0,  2,  0,  1],
        [ 5,  0,  3,  5,  0,  0,  1,  0,  1],
        [ 5,  0,  4,  5,  2,  0,  2,  0,  0],
        [ 5,  0,  4,  5,  3,  0,  2,  0,  0],
        [ 7,  0,  2,  6,  0,  0,  1,  0,  1]])

In [107]:
# Setting Device

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Device: {}'.format(device))

Device: cpu


In [108]:
# Obtaining Split Indices

split_idx = dataset.get_idx_split() # Retrieves the split indices for training, validation, and testing sets from the loaded dataset object.

In [109]:
split_idx

{'train': tensor([    3,     4,     5,  ..., 41124, 41125, 41126]),
 'valid': tensor([10127, 10129, 10132,  ..., 22785, 22786, 22788]),
 'test': tensor([    0,     1,     2,  ..., 10122, 10124, 10125])}

In [110]:
split_idx['train']

tensor([    3,     4,     5,  ..., 41124, 41125, 41126])

In [111]:
split_idx['train'].shape

torch.Size([32901])

In [112]:
split_idx['valid'].shape

torch.Size([4113])

In [113]:
split_idx['test'].shape

torch.Size([4113])
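
As a quick sanity check on the split sizes printed above, the three splits should together cover all 41,127 graphs. The arithmetic below uses only the numbers shown in the outputs and also reports each split's share of the dataset.

```python
# Split sizes copied from the outputs above.
split_sizes = {"train": 32901, "valid": 4113, "test": 4113}
num_graphs = 41127  # from PygGraphPropPredDataset(41127)

total = sum(split_sizes.values())
fractions = {name: size / num_graphs for name, size in split_sizes.items()}

print(total == num_graphs)
for name, frac in fractions.items():
    print(f"{name}: {100 * frac:.1f}%")
```

The shares work out to roughly 80% / 10% / 10%.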

In [117]:
# Check task type

print('Task type: {}'.format(dataset.task_type))

Task type: binary classification


In [118]:
# Creating Dataloaders
# Load the dataset splits into data loaders

train_loader = DataLoader(dataset[split_idx["train"]], batch_size=32, shuffle=True, num_workers=0)
valid_loader = DataLoader(dataset[split_idx["valid"]], batch_size=32, shuffle=False, num_workers=0)
test_loader = DataLoader(dataset[split_idx["test"]], batch_size=32, shuffle=False, num_workers=0)

# dataset[split_idx["train"]]: selects the training graphs using the split indices.
# batch_size=32: each mini-batch contains 32 graphs.
# shuffle=True: reshuffles the training graphs every epoch; shuffling is
#   disabled for the validation and test loaders so evaluation order is fixed.
# num_workers=0: loads data in the main process, without worker subprocesses.

In [137]:
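# Collect every training mini-batch in a list and count them.
i = 0     # number of batches seen
lis = []  # the batches themselves
for batch in train_loader:
  lis.append(batch)
  i += 1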

In [141]:
print(i)
print(lis[0])
print(lis[100])
print(lis[0].x)
print(lis[0].batch)

1029
DataBatch(edge_index=[2, 1774], edge_attr=[1774, 3], x=[832, 9], y=[32, 1], num_nodes=832, batch=[832], ptr=[33])
DataBatch(edge_index=[2, 1826], edge_attr=[1826, 3], x=[842, 9], y=[32, 1], num_nodes=842, batch=[842], ptr=[33])
tensor([[5, 0, 4,  ..., 2, 0, 0],
        [5, 0, 4,  ..., 2, 0, 0],
        [7, 0, 2,  ..., 1, 0, 0],
        ...,
        [5, 0, 3,  ..., 1, 1, 1],
        [6, 0, 3,  ..., 1, 0, 0],
        [7, 0, 2,  ..., 2, 0, 0]])
tensor([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
         1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,
         2,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,  3,  

In [140]:
print(lis[0].x.shape[0])

832


In [143]:
print(lis[0].batch[-1])

tensor(31)
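
The outputs above show how a mini-batch is laid out: the nodes of all 32 graphs are stacked into one `x`, and `batch` records which graph each node came from, so `batch[-1]` is 31, the index of the last graph. The sketch below reproduces that bookkeeping in plain Python for two tiny made-up graphs. The function name `collate` and the dictionary layout are illustrative assumptions, not PyG's actual implementation.

```python
def collate(graphs):
    """Merge small graphs into one disconnected graph.

    Each graph is a dict with 'num_nodes' and 'edges', where 'edges'
    is a list of (source, target) pairs using local node indices.
    """
    edges, batch, ptr = [], [], [0]
    offset = 0
    for graph_id, g in enumerate(graphs):
        # Shift local node indices so they stay unique in the merged graph.
        edges.extend((u + offset, v + offset) for u, v in g["edges"])
        # Every node of this graph is tagged with the graph's index.
        batch.extend([graph_id] * g["num_nodes"])
        offset += g["num_nodes"]
        ptr.append(offset)  # cumulative node count, marks graph boundaries
    return {"edges": edges, "batch": batch, "ptr": ptr}

# Two made-up graphs: a 3-node path and a single 2-node edge.
g0 = {"num_nodes": 3, "edges": [(0, 1), (1, 2)]}
g1 = {"num_nodes": 2, "edges": [(0, 1)]}

merged = collate([g0, g1])
print(merged["edges"])  # [(0, 1), (1, 2), (3, 4)]
print(merged["batch"])  # [0, 0, 0, 1, 1]
print(merged["ptr"])    # [0, 3, 5]
```

In this sketch `ptr` has one more entry than there are graphs, which matches the `ptr=[33]` shown for batches of 32 graphs.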


In [145]:
for step, batch in enumerate(tqdm(train_loader, desc="Iteration")):
  print(step)
  print(batch)

Iteration:   0%|          | 0/1029 [00:00<?, ?it/s]

0
DataBatch(edge_index=[2, 1888], edge_attr=[1888, 3], x=[887, 9], y=[32, 1], num_nodes=887, batch=[887], ptr=[33])
1
DataBatch(edge_index=[2, 1752], edge_attr=[1752, 3], x=[812, 9], y=[32, 1], num_nodes=812, batch=[812], ptr=[33])
2
DataBatch(edge_index=[2, 1598], edge_attr=[1598, 3], x=[739, 9], y=[32, 1], num_nodes=739, batch=[739], ptr=[33])
3
DataBatch(edge_index=[2, 1890], edge_attr=[1890, 3], x=[880, 9], y=[32, 1], num_nodes=880, batch=[880], ptr=[33])
4
DataBatch(edge_index=[2, 1692], edge_attr=[1692, 3], x=[792, 9], y=[32, 1], num_nodes=792, batch=[792], ptr=[33])
5
DataBatch(edge_index=[2, 1846], edge_attr=[1846, 3], x=[855, 9], y=[32, 1], num_nodes=855, batch=[855], ptr=[33])
6
DataBatch(edge_index=[2, 1626], edge_attr=[1626, 3], x=[761, 9], y=[32, 1], num_nodes=761, batch=[761], ptr=[33])
7
DataBatch(edge_index=[2, 1578], edge_attr=[1578, 3], x=[739, 9], y=[32, 1], num_nodes=739, batch=[739], ptr=[33])
8
DataBatch(edge_index=[2, 1848], edge_attr=[1848, 3], x=[853, 9], y=[32
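
The progress bar total of 1029 follows from the split size and the batch size: with `batch_size=32` and the last partial batch kept, the number of batches is the number of graphs divided by 32, rounded up. A small check using the split sizes printed earlier:

```python
import math

batch_size = 32
# Split sizes copied from the outputs earlier in this section.
split_sizes = {"train": 32901, "valid": 4113, "test": 4113}

batches = {}
for name, size in split_sizes.items():
    num_batches = math.ceil(size / batch_size)
    # Graphs left over for the final, smaller batch.
    last_batch = size - (num_batches - 1) * batch_size
    batches[name] = (num_batches, last_batch)
    print(f"{name}: {num_batches} batches, last batch has {last_batch} graphs")
```

The final training batch is much smaller than 32 graphs, which is one reason a training loop may need to handle unusually small batches.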

In [120]:
# Setting Hyperparameters

args = {
    'device': device,     # 'cuda' or 'cpu', as set above
    'num_layers': 5,      # number of GCN layers
    'hidden_dim': 256,    # width of each hidden layer
    'dropout': 0.5,       # dropout probability
    'lr': 0.001,          # learning rate for the optimizer
    'epochs': 30,         # number of training epochs
}
args
args

{'device': 'cpu',
 'num_layers': 5,
 'hidden_dim': 256,
 'dropout': 0.5,
 'lr': 0.001,
 'epochs': 30}

## Graph Prediction Model

Now we will implement our GCN Graph Prediction model!

We will reuse the existing GCN model to generate `node_embeddings` and apply global pooling over the nodes to predict properties of the whole graph.
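
Global mean pooling turns per-node embeddings into one embedding per graph by averaging over the nodes that share a graph index. The following plain-Python sketch shows the idea on made-up 2-dimensional embeddings. The helper name `mean_pool` is hypothetical, and PyG's `global_mean_pool` operates on tensors rather than lists.

```python
def mean_pool(node_embeds, batch):
    """Average node embeddings per graph.

    node_embeds: list of equal-length float lists, one per node.
    batch: list of graph indices, one per node (like a PyG batch vector).
    Assumes every graph index from 0 to max(batch) has at least one node.
    """
    num_graphs = max(batch) + 1
    dim = len(node_embeds[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for embed, graph_id in zip(node_embeds, batch):
        counts[graph_id] += 1
        for d in range(dim):
            sums[graph_id][d] += embed[d]
    return [[s / counts[g] for s in sums[g]] for g in range(num_graphs)]

# Made-up embeddings: graph 0 has three nodes, graph 1 has two.
node_embeds = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [10.0, 0.0], [20.0, 2.0]]
batch = [0, 0, 0, 1, 1]

graph_embeds = mean_pool(node_embeds, batch)
print(graph_embeds)  # [[3.0, 4.0], [15.0, 1.0]]
```

Because the result is an average, graphs with different numbers of nodes produce embeddings on the same scale.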

In [119]:
### GCN to predict graph property
class GCN_Graph(torch.nn.Module):
    def __init__(self, hidden_dim, output_dim, num_layers, dropout):
        super(GCN_Graph, self).__init__()

        # AtomEncoder (from OGB) maps each atom's integer-coded features
        # to a learned embedding of size hidden_dim.
        self.node_encoder = AtomEncoder(hidden_dim)

        # Node embedding model: the GCN class defined earlier in this Colab.
        # return_embeds=True makes it return node embeddings rather than
        # class predictions.
        self.gnn_node = GCN(hidden_dim, hidden_dim, hidden_dim, num_layers, dropout, return_embeds=True)

        # Global mean pooling: averages the node embeddings of each graph
        # into a single graph embedding.
        self.pool = global_mean_pool

        # Output layer: maps each graph embedding to output_dim logits.
        self.linear = torch.nn.Linear(hidden_dim, output_dim)

    def reset_parameters(self):
        # Re-initialize the GCN and the output layer before training.
        self.gnn_node.reset_parameters()
        self.linear.reset_parameters()

    def forward(self, batched_data):
        # batched_data is a mini-batch of graphs holding node features (x),
        # connectivity (edge_index), and the node-to-graph assignment (batch).
        x, edge_index, batch = batched_data.x, batched_data.edge_index, batched_data.batch

        # Encode the raw atom features.
        embed = self.node_encoder(x)

        # Compute node embeddings with the GCN.
        node_embeds = self.gnn_node(embed, edge_index)

        # Pool node embeddings into one embedding per graph; `batch` tells
        # the pooling function which graph each node belongs to.
        graph_embeds = self.pool(node_embeds, batch)

        # Predict graph-level outputs. These are raw logits, not
        # probabilities; the sigmoid is applied inside BCEWithLogitsLoss.
        out = self.linear(graph_embeds)

        return out
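
Each row of `x` holds nine integer-coded atom features, as in the `dataset[0].x` output earlier. OGB's `AtomEncoder` keeps one embedding table per feature column and sums the looked-up vectors to form the node's input embedding. The sketch below imitates that with tiny hand-written tables for just two feature columns. The table contents, their sizes, and the helper name `encode_atom` are made up for illustration.

```python
# One lookup table per feature column; each row is a 3-dimensional embedding.
# Values are arbitrary (a real encoder learns them during training).
embedding_tables = [
    # column 0: three possible values
    [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]],
    # column 1: two possible values
    [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]],
]

def encode_atom(features):
    """Sum the embedding selected by each integer feature."""
    dim = len(embedding_tables[0][0])
    out = [0.0] * dim
    for column, value in enumerate(features):
        row = embedding_tables[column][value]
        out = [o + r for o, r in zip(out, row)]
    return out

print(encode_atom([2, 0]))  # [1.0, 1.0, 1.5]
print(encode_atom([1, 1]))  # [2.0, 2.5, 2.0]
```

Summing keeps the output size fixed at the embedding dimension no matter how many feature columns there are.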

In [121]:
def train(model, device, data_loader, optimizer, loss_fn):
    model.train()
    total_loss = 0

    for step, batch in enumerate(tqdm(data_loader, desc="Iteration")):
        batch = batch.to(device)

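        # Skip batches that contain a single node or a single graph
        # (the last graph index in `batch` is 0).
        if batch.x.shape[0] == 1 or batch.batch[-1] == 0:
            continue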

        # Ignore nan targets (unlabeled) when computing training loss.
        is_labeled = batch.y == batch.y

        # Zero grad the optimizer
        optimizer.zero_grad()

        # Feed the data into the model
        out = model(batch)

        # Mask the output and labels
        loss = loss_fn(out[is_labeled], batch.y[is_labeled].float())

        # Backpropagation
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    return total_loss / len(data_loader)
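
The training step masks out unlabeled targets before computing the loss. The check `batch.y == batch.y` works because NaN is the only value that is not equal to itself. Below is a plain-Python sketch of that masking together with a numerically stable form of binary cross-entropy on logits. The helper names and the sample numbers are made up, and `torch.nn.BCEWithLogitsLoss` remains what the notebook actually uses.

```python
import math

def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw logit, in a numerically stable form:
    max(z, 0) - z * y + log(1 + exp(-|z|))."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def masked_mean_loss(logits, targets):
    """Average the loss over labeled entries only (NaN marks 'unlabeled')."""
    # y == y is False only for NaN, mirroring `batch.y == batch.y`.
    labeled = [(z, y) for z, y in zip(logits, targets) if y == y]
    return sum(bce_with_logits(z, y) for z, y in labeled) / len(labeled)

logits = [0.0, 2.0, -1.0]
targets = [1.0, float("nan"), 0.0]  # the second target is unlabeled

loss = masked_mean_loss(logits, targets)
print(f"{loss:.4f}")
```

Only the first and third entries contribute to the average; the unlabeled one is dropped before any loss is computed, so it cannot turn the result into NaN.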

In [122]:
# The evaluation function (note: this name shadows Python's built-in eval)
def eval(model, device, loader, evaluator):
    model.eval()
    y_true = []
    y_pred = []

    for step, batch in enumerate(tqdm(loader, desc="Iteration")):
        batch = batch.to(device)

        if batch.x.shape[0] == 1:
            pass
        else:
            with torch.no_grad():
                pred = model(batch)

            y_true.append(batch.y.view(pred.shape).detach().cpu())
            y_pred.append(pred.detach().cpu())

    y_true = torch.cat(y_true, dim = 0).numpy()
    y_pred = torch.cat(y_pred, dim = 0).numpy()

    input_dict = {"y_true": y_true, "y_pred": y_pred}

    return evaluator.eval(input_dict)
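
For `ogbg-molhiv`, the evaluator reports ROC-AUC: the probability that a randomly chosen positive graph receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by brute force over all positive-negative pairs in plain Python. The function name and the sample scores are made up, and the OGB evaluator uses its own implementation.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of correctly ordered (positive, negative) pairs."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))  # 0.75
```

Since only the ordering of scores matters, the raw logits returned by the model can be passed to the metric without applying a sigmoid first.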

In [123]:
model = GCN_Graph(args['hidden_dim'],
            dataset.num_tasks, args['num_layers'],
            args['dropout']).to(device)
evaluator = Evaluator(name='ogbg-molhiv')

In [124]:
import copy

model.reset_parameters()

optimizer = torch.optim.Adam(model.parameters(), lr=args['lr'])
loss_fn = torch.nn.BCEWithLogitsLoss()

best_model = None
best_valid_acc = 0

for epoch in range(1, 1 + args["epochs"]):
  print('Training...')
  loss = train(model, device, train_loader, optimizer, loss_fn)

  print('Evaluating...')
  train_result = eval(model, device, train_loader, evaluator)
  val_result = eval(model, device, valid_loader, evaluator)
  test_result = eval(model, device, test_loader, evaluator)

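  # For ogbg-molhiv, dataset.eval_metric is 'rocauc', so the *_acc variables
  # below hold ROC-AUC scores rather than accuracies.
  train_acc, valid_acc, test_acc = train_result[dataset.eval_metric], val_result[dataset.eval_metric], test_result[dataset.eval_metric]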
  if valid_acc > best_valid_acc:
      best_valid_acc = valid_acc
      best_model = copy.deepcopy(model)
  print(f'Epoch: {epoch:02d}, '
        f'Loss: {loss:.4f}, '
        f'Train: {100 * train_acc:.2f}%, '
        f'Valid: {100 * valid_acc:.2f}% '
        f'Test: {100 * test_acc:.2f}%')

Training...
Evaluating...
Epoch: 01, Loss: 0.1585, Train: 70.10%, Valid: 70.37% Test: 72.78%
Training...
Evaluating...
Epoch: 02, Loss: 0.1488, Train: 74.92%, Valid: 76.88% Test: 73.85%
Training...
Evaluating...
Epoch: 03, Loss: 0.1455, Train: 75.31%, Valid: 71.05% Test: 71.96%
Training...
Evaluating...
Epoch: 04, Loss: 0.1429, Train: 77.76%, Valid: 76.33% Test: 72.96%
Training...
Evaluating...
Epoch: 05, Loss: 0.1404, Train: 77.53%, Valid: 75.79% Test: 72.18%
Training...
Evaluating...
Epoch: 06, Loss: 0.1394, Train: 77.96%, Valid: 75.89% Test: 74.03%
Training...
Evaluating...
Epoch: 07, Loss: 0.1388, Train: 79.54%, Valid: 76.28% Test: 71.36%
Training...
Evaluating...
Epoch: 08, Loss: 0.1365, Train: 78.28%, Valid: 73.63% Test: 70.81%
Training...
Evaluating...
Epoch: 09, Loss: 0.1354, Train: 76.45%, Valid: 70.53% Test: 73.35%
Training...
Evaluating...
Epoch: 10, Loss: 0.1348, Train: 79.47%, Valid: 75.85% Test: 74.85%
Training...
Evaluating...
Epoch: 11, Loss: 0.1340, Train: 78.53%, Valid: 78.55% Test: 70.53%
Training...
Evaluating...
Epoch: 12, Loss: 0.1338, Train: 79.20%, Valid: 76.13% Test: 73.88%
Training...
Evaluating...
Epoch: 13, Loss: 0.1334, Train: 81.22%, Valid: 77.57% Test: 74.11%
Training...
Evaluating...
Epoch: 14, Loss: 0.1312, Train: 81.02%, Valid: 77.40% Test: 74.82%
Training...
Evaluating...
Epoch: 15, Loss: 0.1300, Train: 81.45%, Valid: 78.21% Test: 74.97%
Training...
Evaluating...
Epoch: 16, Loss: 0.1305, Train: 81.62%, Valid: 75.55% Test: 72.71%
Training...
Evaluating...
Epoch: 17, Loss: 0.1296, Train: 81.34%, Valid: 72.96% Test: 74.20%
Training...
Evaluating...
Epoch: 18, Loss: 0.1293, Train: 82.47%, Valid: 77.83% Test: 73.90%
Training...
Evaluating...
Epoch: 19, Loss: 0.1301, Train: 81.95%, Valid: 79.42% Test: 73.83%
Training...
Evaluating...
Epoch: 20, Loss: 0.1290, Train: 82.30%, Valid: 78.87% Test: 75.83%
Training...
Evaluating...
Epoch: 21, Loss: 0.1277, Train: 82.20%, Valid: 76.90% Test: 72.21%
Training...
Evaluating...
Epoch: 22, Loss: 0.1270, Train: 82.63%, Valid: 78.24% Test: 75.06%
Training...
Evaluating...
Epoch: 23, Loss: 0.1265, Train: 81.69%, Valid: 78.04% Test: 73.67%
Training...
Evaluating...
Epoch: 24, Loss: 0.1260, Train: 83.06%, Valid: 79.14% Test: 76.15%
Training...
Evaluating...
Epoch: 25, Loss: 0.1259, Train: 83.00%, Valid: 77.17% Test: 74.67%
Training...
Evaluating...
Epoch: 26, Loss: 0.1264, Train: 83.07%, Valid: 76.44% Test: 74.66%
Training...
Evaluating...
Epoch: 27, Loss: 0.1240, Train: 83.82%, Valid: 79.42% Test: 75.96%
Training...
Evaluating...
Epoch: 28, Loss: 0.1234, Train: 83.53%, Valid: 78.07% Test: 75.29%
Training...
Evaluating...
Epoch: 29, Loss: 0.1246, Train: 83.69%, Valid: 77.51% Test: 74.08%
Training...
Evaluating...
Epoch: 30, Loss: 0.1234, Train: 83.96%, Valid: 79.82% Test: 75.88%


In [125]:
train_acc = eval(best_model, device, train_loader, evaluator)[dataset.eval_metric]
valid_acc = eval(best_model, device, valid_loader, evaluator)[dataset.eval_metric]
test_acc = eval(best_model, device, test_loader, evaluator)[dataset.eval_metric]

print(f'Best model: '
      f'Train: {100 * train_acc:.2f}%, '
      f'Valid: {100 * valid_acc:.2f}% '
      f'Test: {100 * test_acc:.2f}%')

Best model: Train: 83.96%, Valid: 79.82% Test: 75.88%


## Question 6: What are your `best_model` validation and test ROC-AUC scores? Please report them on Gradescope. For example, for an ROC-AUC score such as 50.01%, just report 50.01 and please don't include the percent sign. (20 points)

## Question 7 (Optional): Experiment with two global pooling layers other than mean pooling in PyTorch Geometric.
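
PyG provides `global_add_pool` and `global_max_pool` as alternatives to mean pooling. As a rough illustration of how these readouts differ, the plain-Python sketch below applies sum and max reductions to made-up embeddings. The helper `pool` is hypothetical and is not how PyG implements these functions.

```python
def pool(node_embeds, batch, reduce_fn):
    """Group node embeddings by graph index and reduce each dimension."""
    num_graphs = max(batch) + 1
    groups = [[] for _ in range(num_graphs)]
    for embed, graph_id in zip(node_embeds, batch):
        groups[graph_id].append(embed)
    # zip(*group) iterates over embedding dimensions across a graph's nodes.
    return [[reduce_fn(dim_values) for dim_values in zip(*group)] for group in groups]

# Made-up embeddings: graph 0 has three nodes, graph 1 has two.
node_embeds = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [10.0, 0.0], [20.0, 2.0]]
batch = [0, 0, 0, 1, 1]

sum_pooled = pool(node_embeds, batch, sum)
max_pooled = pool(node_embeds, batch, max)
print(sum_pooled)  # [[9.0, 12.0], [30.0, 2.0]]
print(max_pooled)  # [[5.0, 6.0], [20.0, 2.0]]
```

Sum pooling grows with the number of nodes, so it retains information about graph size, while max pooling keeps only the largest value in each dimension.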

# Submission

In order to get credit, you must submit your answers on Gradescope.

Also, you need to submit the `ipynb` file of Colab 2, which you can download by clicking `File` and then `Download .ipynb`. Please make sure that the output of each cell is available in your `ipynb` file.