# Exercise 4
Due:  Tue December 3, 8:00am

## GPS and Hyperparameters

This exercise consists of two parts: first, you are to combine global transformer attention (from the last exercise) with message-passing (from the second exercise). It is completely up to you how you combine those aspects, alternating between the two seems to be one of the best available options though. You may use (pure) message-passing layers from pytorch-geometric for this exercise (but obviously not layers like GPSConv that already combine things - especially since GPSConv differs significantly from the architecture in the GPS paper...).

The second part of the exercise is to find a good model (with hyperparameters) for peptides-func. For this task, I want you to use the tool weights&biases (wandb.ai) and their "sweep" functionality. You can find example code for this below. Since we do not have access to your wandb accounts, please provide screenshots of your results and verify that these models are indeed good.

For the hyperparameter tuning, you must perform this on your hybrid architecture. It might be interesting to see in how far the results (which parameters are important etc) differ between pure transformers, pure message-passing (possibly with VN), and hybrid approaches, although such an evaluation is not necessary.

## Hybrid GPS-like architecture

In [1]:
# your model code goes here
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch_geometric as pyg
from torch_geometric.utils import to_networkx
from torch_geometric.nn import MessagePassing, global_mean_pool
import networkx as nx
import numpy as np

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class GINELayerWithVN(MessagePassing):
    def __init__(self, in_channels, out_channels, edge_dim):
        super(GINELayerWithVN, self).__init__(aggr='add')  # "Add" aggregation.
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(out_channels, out_channels),
            torch.nn.ReLU(),
            torch.nn.Linear(out_channels, out_channels)
        )
        self.edge_encoder = torch.nn.Linear(edge_dim, out_channels)
        # Remove node_encoder from here
        self.virtual_node_mlp = torch.nn.Sequential(
            torch.nn.Linear(out_channels, out_channels),
            torch.nn.ReLU(),
            torch.nn.Linear(out_channels, out_channels),
            torch.nn.ReLU(),
        )
        self.reset_parameters()

    def reset_parameters(self):
        torch.nn.init.xavier_uniform_(self.edge_encoder.weight)
        for m in self.mlp:
            if isinstance(m, torch.nn.Linear):
                torch.nn.init.xavier_uniform_(m.weight)
        for m in self.virtual_node_mlp:
            if isinstance(m, torch.nn.Linear):
                torch.nn.init.xavier_uniform_(m.weight)

    def forward(self, x, edge_index, edge_attr, vn_embed, batch):
        # x is already encoded via node_encoder in the main model
        x = x.float()  # Ensure x is FloatTensor
        edge_attr = edge_attr.float()  # Ensure edge_attr is FloatTensor
        edge_attr = self.edge_encoder(edge_attr)

        # Add virtual node embedding to node features
        vn_expanded = vn_embed[batch]
        x = x + vn_expanded

        # Message Passing
        out = self.propagate(edge_index, x=x, edge_attr=edge_attr)

        # Update node embeddings
        out = self.mlp(out)
        return out

    def message(self, x_j, edge_attr):
        # Compute messages
        return x_j + edge_attr

    def update(self, aggr_out):
        return aggr_out

# Laplacian Positional Encodings (LapPE)
def compute_laplace_pe(data, num_eigenvec=10):
    G = to_networkx(data, to_undirected=True)
    A = nx.adjacency_matrix(G).astype(float)
    num_nodes = A.shape[0]
    D = np.diag(np.array(A.sum(axis=1)).flatten())
    L = D - A.todense()
    L = torch.tensor(L, dtype=torch.float, device=device)
    try:
        eigenvalues, eigenvectors = torch.linalg.eigh(L)
    except RuntimeError:
        eigenvalues, eigenvectors = torch.symeig(L, eigenvectors=True)
    available_eigenvec = eigenvectors.shape[1] - 1
    actual_num_eigenvec = min(num_eigenvec, available_eigenvec)
    eigenvectors = eigenvectors[:, 1:1 + actual_num_eigenvec]
    if actual_num_eigenvec < num_eigenvec:
        pad_size = num_eigenvec - actual_num_eigenvec
        padding = torch.zeros(eigenvectors.shape[0], pad_size, device=device)
        eigenvectors = torch.cat([eigenvectors, padding], dim=1)
    return eigenvectors  # Shape: (num_nodes, num_eigenvec)

# Random Walk Structural Embeddings (RWSE)
def compute_rwse(data, walk_length=10):
    G = to_networkx(data, to_undirected=True)
    A = nx.adjacency_matrix(G).astype(float)
    A = A.todense()
    num_nodes = A.shape[0]
    A = torch.tensor(A, dtype=torch.float, device=device)
    rw_features = []
    A_power = A.clone()
    for _ in range(walk_length):
        diag = torch.diagonal(A_power)
        rw_features.append(diag)
        A_power = torch.matmul(A_power, A)
    rwse = torch.stack(rw_features, dim=1)  # (num_nodes, walk_length)
    return rwse  # Shape: (num_nodes, walk_length)

# SignNet to ensure sign invariance
class SignNet(nn.Module):
    def __init__(self, in_dim, out_dim):
        super(SignNet, self).__init__()
        self.phi = nn.Sequential(
            nn.Linear(in_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim)
        )

    def forward(self, x):
        return self.phi(x) + self.phi(-x)

# Graph Transformer Layer with Masking
class GraphTransformerLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_heads=4, dropout=0.1):
        super(GraphTransformerLayer, self).__init__()
        self.self_attn = nn.MultiheadAttention(embed_dim=in_dim, num_heads=num_heads, dropout=dropout)
        self.linear1 = nn.Linear(in_dim, out_dim)
        self.dropout = nn.Dropout(dropout)
        self.linear2 = nn.Linear(out_dim, in_dim)
        self.norm1 = nn.LayerNorm(in_dim)
        self.norm2 = nn.LayerNorm(in_dim)
        self.activation = nn.ReLU()

    def forward(self, x, key_padding_mask=None):
        # x: (sequence_length, batch_size, embed_dim)
        attn_output, _ = self.self_attn(x, x, x, key_padding_mask=key_padding_mask)
        x = x + attn_output
        x = self.norm1(x)
        linear_output = self.linear2(self.dropout(self.activation(self.linear1(x))))
        x = x + linear_output
        x = self.norm2(x)
        return x

# Layer that combines Message Passing and Transformer
class HybridLayer(nn.Module):
    def __init__(self, mp_layer, transformer_layer):
        super(HybridLayer, self).__init__()
        self.mp_layer = mp_layer
        self.transformer_layer = transformer_layer

    def forward(self, x, edge_index, edge_attr, vn_embed, batch):
        # Message Passing Layer
        x = self.mp_layer(x, edge_index, edge_attr, vn_embed, batch)
        x = F.relu(x)
        return x  # Return x to update vn_embed before transformer

    def apply_transformer(self, x, batch):
        # Prepare for Transformer Layer
        x_padded, mask = pyg.utils.to_dense_batch(x, batch)  # x_padded: [batch_size, max_num_nodes, hidden_features]
        x_padded = x_padded.transpose(0, 1)  # x_padded: [max_num_nodes, batch_size, hidden_features]
        key_padding_mask = ~mask  # [batch_size, max_num_nodes]
        x_padded = self.transformer_layer(x_padded, key_padding_mask=key_padding_mask)
        x_padded = x_padded.transpose(0, 1)  # x_padded: [batch_size, max_num_nodes, hidden_features]
        x = x_padded[mask]  # x: [num_nodes, hidden_features]
        return x

# Updated GNN Model with Virtual Node, GINE Layers, and Graph Transformer
class GNNWithVirtualNodeAndGINEAndTransformer(torch.nn.Module):
    def __init__(self, in_features, hidden_features, out_features, edge_attr_dim, num_layers=5, lap_pe_dim=10, rwse_dim=10, num_heads=4):
        super(GNNWithVirtualNodeAndGINEAndTransformer, self).__init__()
        self.num_layers = num_layers
        self.hidden_features = hidden_features

        # Node Encoder
        self.node_encoder = nn.Linear(in_features, hidden_features)

        self.layers = nn.ModuleList()
        for _ in range(num_layers):
            mp_layer = GINELayerWithVN(
                in_channels=hidden_features,
                out_channels=hidden_features,
                edge_dim=edge_attr_dim
            )
            transformer_layer = GraphTransformerLayer(
                in_dim=hidden_features,
                out_dim=hidden_features,
                num_heads=num_heads
            )
            self.layers.append(HybridLayer(mp_layer, transformer_layer))

        self.virtual_node_embedding = torch.nn.Embedding(1, hidden_features)
        torch.nn.init.constant_(self.virtual_node_embedding.weight.data, 0)

        self.mlp_virtual_node = torch.nn.Sequential(
            torch.nn.Linear(hidden_features, hidden_features),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_features, hidden_features),
            torch.nn.ReLU(),
        )

        # Positional Encodings
        self.lap_pe_dim = lap_pe_dim
        self.rwse_dim = rwse_dim
        self.lap_pe_linear = nn.Linear(hidden_features, hidden_features)
        self.rwse_linear = nn.Linear(rwse_dim, hidden_features)
        self.signnet = SignNet(lap_pe_dim, hidden_features)

        self.fc = torch.nn.Linear(hidden_features, out_features)

    def forward(self, x, edge_index, edge_attr, batch, data):
        # Apply node_encoder first
        x = self.node_encoder(x)  # [num_nodes, hidden_features]
        # Initialize positional encodings tensor
        pos_enc = torch.zeros_like(x).to(device)  # [num_nodes, hidden_features]

        # Use data.num_graphs to get the correct number of graphs
        batch_size = data.num_graphs

        # Initialize virtual node embedding
        vn_embed = self.virtual_node_embedding.weight.repeat(batch_size, 1)  # [batch_size, hidden_features]

        # Iterate over each graph in the batch
        for graph_id in range(batch_size):
            mask = (batch == graph_id)
            node_idx = mask.nonzero(as_tuple=False).squeeze()

            # Handle case when graph has no nodes
            if node_idx.numel() == 0:
                continue

            # Extract subgraph using pyg.utils.subgraph
            sub_edge_index, sub_edge_attr = pyg.utils.subgraph(
                node_idx,
                edge_index,
                edge_attr,
                relabel_nodes=True,
                num_nodes=x.size(0)
            )

            # Create sub_data
            sub_data = pyg.data.Data(
                x=x[node_idx],
                edge_index=sub_edge_index,
                edge_attr=sub_edge_attr
            )

            # Compute Positional Encodings for the sub-graph
            lap_pe = compute_laplace_pe(sub_data, num_eigenvec=self.lap_pe_dim)
            rwse = compute_rwse(sub_data, walk_length=self.rwse_dim)

            # Apply SignNet to LapPE
            lap_pe = self.signnet(lap_pe)  # [num_nodes_graph, hidden_features]

            # Linear transformation
            lap_pe = self.lap_pe_linear(lap_pe)  # [num_nodes_graph, hidden_features]
            rwse = self.rwse_linear(rwse)        # [num_nodes_graph, hidden_features]

            # Combine positional encodings
            graph_pos_enc = lap_pe + rwse  # [num_nodes_graph, hidden_features]

            # Assign to pos_enc
            pos_enc[node_idx] = graph_pos_enc  # [num_nodes, hidden_features]

        # Add positional encodings to node features
        x = x + pos_enc  # [num_nodes, hidden_features]

        for layer in self.layers:
            # Message Passing Layer
            x = layer(x, edge_index, edge_attr, vn_embed, batch)

            # Update virtual node embedding
            vn_aggr = global_mean_pool(x, batch)  # [batch_size, hidden_features]
            vn_embed = vn_embed + self.mlp_virtual_node(vn_aggr)  # [batch_size, hidden_features]

            # Transformer Layer
            x = layer.apply_transformer(x, batch)

        # Apply global mean pooling
        x = global_mean_pool(x, batch)  # [batch_size, hidden_features]
        x = self.fc(x)  # [batch_size, out_features]
        return x






# WandB hyperparameter tuning example code

In [2]:
import torch
import torch_geometric as pyg
import torch_scatter
import copy



















Before using wandb, you need to create an account. Then you can login by pasting your API key when prompted. (just the key, nothing else)

In [3]:
import wandb
wandb.login()

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: mak84271. Use `wandb login --relogin` to force relogin


True

In [4]:
# find device
if torch.cuda.is_available(): # NVIDIA
    device = torch.device('cuda')
elif torch.backends.mps.is_available(): # apple M1/M2
    device = torch.device('mps') 
else:
    device = torch.device('cpu')
device

device(type='cuda')

In [5]:
cora = pyg.datasets.Planetoid(root = "dataset/cora", name="Cora")
cora_graph = cora[0]
cora_dense_adj = pyg.utils.to_dense_adj(cora_graph.edge_index).to(device)
# cora_graph.x = cora_graph.x.unsqueeze(0) # Add an empty batch dimension. I needed that for compatibility with MolHIV later.
cora_graph = cora_graph.to(device)

In [6]:
cora_graph.to(device)

Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])

In [7]:
class GCNLayer(torch.nn.Module):
    def __init__(self, in_features: int, out_features: int, activation=torch.nn.functional.relu):
        super(GCNLayer, self).__init__()
        self.activation = activation
        self.W: torch.Tensor = torch.nn.Parameter(torch.zeros(in_features, out_features))
        torch.nn.init.kaiming_normal_(self.W) 

    def forward(self, H: torch.Tensor, edge_index: torch.Tensor):
        out = H.clone()
        out += torch_scatter.scatter_add(H[edge_index[0]], edge_index[1], dim=0)
        out = out.matmul(self.W)
        if self.activation:
            out = self.activation(out)
        return out

In [8]:
def get_accuracy(model, cora, mask):
    model.eval()
    with torch.no_grad():
        outputs = model(cora_graph.x, cora_graph.edge_index)
    correct = (outputs[mask].argmax(-1) == cora_graph.y[mask]).sum()
    return int(correct) / int(mask.sum())

In [9]:
class GraphNet(torch.nn.Module):
    def __init__(self, in_features:int, out_features:int, hidden_features:int, activation=torch.nn.functional.relu, dropout=0.1):
        super(GraphNet, self).__init__()
        self.activation = activation
        if dropout>0:
            self.dropout = torch.nn.Dropout(dropout)
        else: 
            self.dropout = torch.nn.Identity()

        self.layer_1 = GCNLayer(in_features=in_features, out_features=hidden_features)
        self.layer_2 = GCNLayer(in_features=hidden_features, out_features=hidden_features, activation=self.activation)
        self.layer_3 = GCNLayer(in_features=hidden_features, out_features=hidden_features, activation=self.activation)
        self.dense1 = torch.nn.Linear(in_features=hidden_features, out_features=hidden_features)
        self.dense2 = torch.nn.Linear(in_features=hidden_features, out_features=out_features)

    def forward(self, H: torch.Tensor, edge_index: torch.Tensor):
        out = self.layer_1(H, edge_index)
        out = self.dropout(out)
        out = self.layer_2(out, edge_index)
        out = self.dropout(out)
        H = self.layer_3(out, edge_index)
        H = self.dropout(out)
        out = self.dense1(out)
        out = self.activation(out)
        out = self.dropout(out)
        out = self.dense2(out)
        # H = torch.softmax(H, dim=-1)
        # out = torch.nn.functional.softmax(out, dim=1)
        return out

        

## WandB train function

We make a few changes to our train function to enable wandb logging of hyperparameters and metrics. The train function is written to allow both manual runs and hyperparameter search.

In [10]:
def train(config=None, project=None, notes=None):

    with wandb.init(config=config, project=project, notes=notes): # Initialize a new wandb run
        # By passing our config through wandb,
        # a) it is automatically logged
        # b) we can use wandb sweeps to optimize hyperparameters
        config = wandb.config 

        model = GraphNet(
            in_features=cora_graph.num_features, 
            out_features=cora.num_classes, 
            hidden_features=config.hidden_features, 
            dropout=config.dropout).to(device)

        optimizer = torch.optim.Adam(model.parameters(), lr=config.lr)
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=config.epochs, eta_min=0)
        criterion = torch.nn.BCEWithLogitsLoss()

        best_model = None
        best_val_acc = 0
        best_epoch = 0

        for epoch in range(config.epochs):
            
            model.train()
            optimizer.zero_grad()
            outputs = model(cora_graph.x, cora_graph.edge_index) # we run on everything

            loss = criterion(outputs[cora_graph.train_mask], cora_graph.y[cora_graph.train_mask]) # but only propagate the loss for the train labels
            loss = criterion(outputs, data.y.float())
            loss.backward()

            optimizer.step() # update parameters
            scheduler.step() # update the learning rate once per epoch

            val_acc = get_accuracy(model, cora_graph, cora_graph.val_mask)
            wandb.log({"val_acc": val_acc, "loss": loss.item()})

            if epoch % 10 == 0 and not wandb.run.sweep_id:
                # Only print information on individual runs, not on sweeps
                print(f"Epoch {epoch}, Loss: {loss.item()}, Val accuracy: {val_acc}")

            if val_acc > best_val_acc:
                best_val_acc = val_acc
                best_epoch = epoch
                best_model = copy.deepcopy(model)

    return best_model, best_epoch, best_val_acc


## Manual training runs

With wandb, you can still manually run your training loop with different hyperparameters as you are used to.

In [11]:
best_model, best_model_epoch, best_val_acc = train(dict(
    hidden_features=128,
    lr=0.01,
    dropout=0.1,
    epochs=100
), project="Cora_GraphNet", notes="first trial")



Traceback (most recent call last):
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\3896885779.py", line 29, in train
    loss = criterion(outputs[cora_graph.train_mask], cora_graph.y[cora_graph.train_mask]) # but only propagate the loss for the train labels
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\loss.py", line 819, in forward
    return F.binary_cross_entropy_with_logits(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\

VBox(children=(Label(value='0.007 MB of 0.007 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

ValueError: Target size (torch.Size([140])) must be the same as input size (torch.Size([140, 7]))

In [12]:
test_acc = get_accuracy(best_model, cora_graph, cora_graph.test_mask)
print(f"Test acc: {test_acc:.2f} (using model from epoch {best_model_epoch} with val acc {best_val_acc:.2})")

NameError: name 'best_model' is not defined

## Hyperparameter Search

But you can also perform a hyperparameter search using wandb sweeps, by specifying a hyperparameter config

In [13]:
sweep_config = {
    # hyperparameter search methods, e.g. grid, random
    'method': 'random',

    # metric to optimize
    'metric': {
        'name': 'val_acc',
        'goal': 'maximize'   
    },

    # parameters to search
    'parameters': {
        'hidden_features': {
            'values': [64, 128, 256]
        },
        'dropout': {
            # a flat distribution between 0 and 0.1
            'distribution': 'uniform',
            'min': 0.0,
            'max': 0.5,
        },
        'lr': {
            'values': [0.001, 0.0001, 0.00001]
        },
        'epochs': {
            'values': [100, 200, 300]
        }
    }
}

In [14]:
sweep_id = wandb.sweep(sweep_config, project="Cora_GraphNet")

Create sweep with ID: iewx3aml
Sweep URL: https://wandb.ai/mak84271/Cora_GraphNet/sweeps/iewx3aml


You can click on the `Sweep URL` to get a nice visualization on how well different sets of hyperparameters perform and to see which are the best (click on the best run and then on Overview).

The following cell performs 5 runs using the sweep configuration given above. You can call `wandb.agent` multiple times to produce more runs for the same sweep configuration.

In [15]:
wandb.agent(sweep_id, function=train, count=5)

wandb: Agent Starting Run: 9k0ima7o with config:
wandb: 	dropout: 0.21382270781919233
wandb: 	epochs: 300
wandb: 	hidden_features: 64
wandb: 	lr: 1e-05


Traceback (most recent call last):
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\3896885779.py", line 29, in train
    loss = criterion(outputs[cora_graph.train_mask], cora_graph.y[cora_graph.train_mask]) # but only propagate the loss for the train labels
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\loss.py", line 819, in forward
    return F.binary_cross_entropy_with_logits(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\

Run 9k0ima7o errored:
Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\Lib\site-packages\wandb\agents\pyagent.py", line 306, in _run_job
    self._function()
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\3896885779.py", line 29, in train
    loss = criterion(outputs[cora_graph.train_mask], cora_graph.y[cora_graph.train_mask]) # but only propagate the loss for the train labels
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\loss.py", line 819, in forward
    retur

Traceback (most recent call last):
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\3896885779.py", line 29, in train
    loss = criterion(outputs[cora_graph.train_mask], cora_graph.y[cora_graph.train_mask]) # but only propagate the loss for the train labels
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\loss.py", line 819, in forward
    return F.binary_cross_entropy_with_logits(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\

Run yucwsa03 errored:
Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\Lib\site-packages\wandb\agents\pyagent.py", line 306, in _run_job
    self._function()
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\3896885779.py", line 29, in train
    loss = criterion(outputs[cora_graph.train_mask], cora_graph.y[cora_graph.train_mask]) # but only propagate the loss for the train labels
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\loss.py", line 819, in forward
    retur

Traceback (most recent call last):
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\3896885779.py", line 29, in train
    loss = criterion(outputs[cora_graph.train_mask], cora_graph.y[cora_graph.train_mask]) # but only propagate the loss for the train labels
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\loss.py", line 819, in forward
    return F.binary_cross_entropy_with_logits(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\

Run nbwpi8ii errored:
Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\Lib\site-packages\wandb\agents\pyagent.py", line 306, in _run_job
    self._function()
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\3896885779.py", line 29, in train
    loss = criterion(outputs[cora_graph.train_mask], cora_graph.y[cora_graph.train_mask]) # but only propagate the loss for the train labels
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\loss.py", line 819, in forward
    retur

In [16]:
# Close the sweep, otherwise individual runs after the sweep will still be logged as part of it
wandb.teardown() 

# Peptide dataset

In [17]:
import torch
import torch.nn.functional as F
import torch_geometric as pyg
from torch_geometric.data import DataLoader
from torch_geometric.datasets import LRGBDataset
import wandb
import copy


# Detect device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

# Load the peptides-func dataset
# Define a transform function
def to_float(data):
    data.x = data.x.float()
    data.edge_attr = data.edge_attr.float()
    return data

# Load the dataset with the transform
dataset = LRGBDataset(root='dataset/peptides-func', name='Peptides-func', transform=to_float)

print(f'Dataset size: {len(dataset)}')

# Determine the number of node features, edge features, and classes
num_node_features = dataset.num_features
num_edge_features = dataset[0].edge_attr.shape[1]
num_classes = dataset.num_classes

# Shuffle the dataset
torch.manual_seed(42)
dataset = dataset.shuffle()

# Split the dataset
train_ratio = 0.8
val_ratio = 0.1
test_ratio = 0.1

num_total = len(dataset)
num_train = int(num_total * train_ratio)
num_val = int(num_total * val_ratio)

train_dataset = dataset[:num_train]
val_dataset = dataset[num_train:num_train + num_val]
test_dataset = dataset[num_train + num_val:]

print(f'Train graphs: {len(train_dataset)}')
print(f'Validation graphs: {len(val_dataset)}')
print(f'Test graphs: {len(test_dataset)}')

# Create data loaders
def create_data_loader(dataset, batch_size, shuffle):
    return DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)


# We will create data loaders inside the train function
# Your model class
GraphNet = GNNWithVirtualNodeAndGINEAndTransformer



Using device: cuda
Dataset size: 10873
Train graphs: 8698
Validation graphs: 1087
Test graphs: 1088


In [18]:
def train(config=None, project=None, notes=None):
    with wandb.init(config=config, project=project, notes=notes):
        config = wandb.config

        model = GraphNet(
            in_features=num_node_features,
            hidden_features=config.hidden_features,
            out_features=10,  # Number of tasks
            edge_attr_dim=num_edge_features,
            num_layers=config.num_layers,
            lap_pe_dim=config.lap_pe_dim,
            rwse_dim=config.rwse_dim,
            num_heads=config.num_heads
        ).to(device)


        optimizer = torch.optim.Adam(model.parameters(), lr=config.lr, weight_decay=config.weight_decay)
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=config.epochs, eta_min=0)
        criterion = torch.nn.CrossEntropyLoss()

        best_model = None
        best_val_acc = 0
        best_epoch = 0

        # Create data loaders
        train_loader = create_data_loader(train_dataset, batch_size=config.batch_size, shuffle=True)
        val_loader = create_data_loader(val_dataset, batch_size=config.batch_size, shuffle=False)
        test_loader = create_data_loader(test_dataset, batch_size=config.batch_size, shuffle=False)

        for epoch in range(config.epochs):
            model.train()
            total_loss = 0
            for data in train_loader:
                data = data.to(device)
                optimizer.zero_grad()
                outputs = model(data.x, data.edge_index, data.edge_attr, data.batch, data)
                loss = criterion(outputs, data.y.squeeze())
                loss.backward()
                optimizer.step()
                total_loss += loss.item()

            scheduler.step()

            val_acc = evaluate(model, val_loader)
            wandb.log({"val_acc": val_acc, "loss": total_loss / len(train_loader)})

            if epoch % 10 == 0 and not wandb.run.sweep_id:
                print(f"Epoch {epoch}, Loss: {total_loss / len(train_loader):.4f}, Val accuracy: {val_acc:.4f}")

            if val_acc > best_val_acc:
                best_val_acc = val_acc
                best_epoch = epoch
                best_model = copy.deepcopy(model)

        # Evaluate on test set
        test_acc = evaluate(best_model, test_loader)
        wandb.log({"test_acc": test_acc})

        print(f"Best Epoch: {best_epoch}, Best Validation Accuracy: {best_val_acc:.4f}, Test Accuracy: {test_acc:.4f}")

        return best_model, best_epoch, best_val_acc, test_acc


In [19]:
from sklearn.metrics import roc_auc_score

def evaluate(model, loader):
    model.eval()
    y_true = []
    y_pred = []
    with torch.no_grad():
        for data in loader:
            data = data.to(device)
            outputs = model(data.x, data.edge_index, data.edge_attr, data.batch, data)
            y_true.append(data.y.cpu())
            y_pred.append(outputs.cpu())
    y_true = torch.cat(y_true, dim=0).numpy()
    y_pred = torch.cat(y_pred, dim=0).numpy()
    # Compute average ROC-AUC over tasks
    roc_list = []
    for i in range(y_true.shape[1]):
        if np.sum(y_true[:, i]) == 0 or np.sum(y_true[:, i]) == y_true.shape[0]:
            # Skip tasks with only one class present
            continue
        roc = roc_auc_score(y_true[:, i], y_pred[:, i])
        roc_list.append(roc)
    if len(roc_list) == 0:
        return 0.0
    else:
        return sum(roc_list) / len(roc_list)



In [20]:
sweep_config = {
    'method': 'bayes',
    'metric': {'name': 'val_roc_auc', 'goal': 'maximize'},
    'parameters': {
        'hidden_features': {'values': [64, 128, 256]},
        'num_layers': {'values': [3, 5, 7]},
        'num_heads': {'values': [4, 8]},
        'lap_pe_dim': {'values': [5, 10, 15]},
        'rwse_dim': {'values': [5, 10, 15]},
        'dropout': {'min': 0.0, 'max': 0.5},
        'lr': {'min': 1e-4, 'max': 1e-2, 'distribution': 'log_uniform'},
        'weight_decay': {'min': 0.0, 'max': 1e-4},
        'batch_size': {'values': [16, 32, 64]},
        'epochs': {'value': 100}
    }
}

sweep_id = wandb.sweep(sweep_config, project='peptides-func-hyperparameter-tuning')




Create sweep with ID: yw5hts1u
Sweep URL: https://wandb.ai/mak84271/peptides-func-hyperparameter-tuning/sweeps/yw5hts1u


In [21]:
wandb.login()  # Ensure you are logged in to WandB

# Start the sweep agent
wandb.agent(sweep_id, function=train)


wandb: Agent Starting Run: bhkacelr with config:
wandb: 	batch_size: 64
wandb: 	dropout: 0.2562022435438195
wandb: 	epochs: 100
wandb: 	hidden_features: 256
wandb: 	lap_pe_dim: 10
wandb: 	lr: 1.0026445272229791
wandb: 	num_heads: 8
wandb: 	num_layers: 3
wandb: 	rwse_dim: 10
wandb: 	weight_decay: 3.152617529864556e-05


Traceback (most recent call last):
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\314379181.py", line 38, in train
    loss.backward()
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\autograd\__init__.py", line 347, in backward
    _engine_run_backward(
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\autograd\graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to e

Run bhkacelr errored:
Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\Lib\site-packages\wandb\agents\pyagent.py", line 306, in _run_job
    self._function()
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\314379181.py", line 38, in train
    loss.backward()
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\autograd\__init__.py", line 347, in backward
    _engine_run_backward(
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\autograd\graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API

Traceback (most recent call last):
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\314379181.py", line 14, in train
    ).to(device)
      ^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TO

Run aub6spcj errored:
Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\Lib\site-packages\wandb\agents\pyagent.py", line 306, in _run_job
    self._function()
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\314379181.py", line 14, in train
    ).to(device)
      ^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously repo

Traceback (most recent call last):
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\314379181.py", line 14, in train
    ).to(device)
      ^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TO

Run fcmq9yuu errored:
Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\Lib\site-packages\wandb\agents\pyagent.py", line 306, in _run_job
    self._function()
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\314379181.py", line 14, in train
    ).to(device)
      ^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously repo

Traceback (most recent call last):
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\314379181.py", line 14, in train
    ).to(device)
      ^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TO

Run yiyk8wtm errored:
Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\Lib\site-packages\wandb\agents\pyagent.py", line 306, in _run_job
    self._function()
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\314379181.py", line 14, in train
    ).to(device)
      ^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously repo

Traceback (most recent call last):
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\314379181.py", line 14, in train
    ).to(device)
      ^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TO

Run a7wo5lji errored:
Traceback (most recent call last):
  File "C:\ProgramData\miniconda3\Lib\site-packages\wandb\agents\pyagent.py", line 306, in _run_job
    self._function()
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\314379181.py", line 14, in train
    ).to(device)
      ^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously repo

Traceback (most recent call last):
  File "C:\Users\dadoi\AppData\Local\Temp\ipykernel_30132\314379181.py", line 14, in train
    ).to(device)
      ^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1340, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\torch\nn\modules\module.py", line 1326, in convert
    return t.to(
           ^^^^^
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TO

wandb: Ctrl + C detected. Stopping sweep.


Downloading https://www.dropbox.com/s/ycsq37q8sxs1ou8/peptidesfunc.zip?dl=1
Extracting path\to\data\peptidesfunc.zip
Processing...
Processing train dataset: 100%|██████████| 10873/10873 [00:00<00:00, 42121.36it/s]
Processing val dataset: 100%|██████████| 2331/2331 [00:00<00:00, 31418.36it/s]
Processing test dataset: 100%|██████████| 2331/2331 [00:00<00:00, 55873.88it/s]
Done!


RuntimeError: Found dtype Long but expected Float