# Exercise 4
Due:  Tue December 3, 8:00am

## GPS and Hyperparameters

This exercise consists of two parts: first, you are to combine global transformer attention (from the last exercise) with message-passing (from the second exercise). It is completely up to you how you combine those aspects, alternating between the two seems to be one of the best available options though. You may use (pure) message-passing layers from pytorch-geometric for this exercise (but obviously not layers like GPSConv that already combine things - especially since GPSConv differs significantly from the architecture in the GPS paper...).

The second part of the exercise is to find a good model (with hyperparameters) for peptides-func. For this task, I want you to use the tool weights&biases (wandb.ai) and their "sweep" functionality. You can find example code for this below. Since we do not have access to your wandb accounts, please provide screenshots of your results and verify that these models are indeed good.

For the hyperparameter tuning, you must perform this on your hybrid architecture. It might be interesting to see in how far the results (which parameters are important etc) differ between pure transformers, pure message-passing (possibly with VN), and hybrid approaches, although such an evaluation is not necessary.

## Hybrid GPS-like architecture

In [1]:
# your model code goes here

# WandB hyperparameter tuning example code

In [2]:
import torch
import torch_geometric as pyg
import torch_scatter
import copy



















Before using wandb, you need to create an account. Then you can login by pasting your API key when prompted. (just the key, nothing else)

In [3]:
import wandb
wandb.login()

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:wandb: Paste an API key from your profile and hit enter

True

In [4]:
# find device
if torch.cuda.is_available(): # NVIDIA
    device = torch.device('cuda')
elif torch.backends.mps.is_available(): # apple M1/M2
    device = torch.device('mps') 
else:
    device = torch.device('cpu')
device

device(type='cuda')

In [5]:
cora = pyg.datasets.Planetoid(root = "dataset/cora", name="Cora")
cora_graph = cora[0]
cora_dense_adj = pyg.utils.to_dense_adj(cora_graph.edge_index).to(device)
# cora_graph.x = cora_graph.x.unsqueeze(0) # Add an empty batch dimension. I needed that for compatibility with MolHIV later.
cora_graph = cora_graph.to(device)

In [6]:
cora_graph.to(device)

Data(x=[2708, 1433], edge_index=[2, 10556], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708])

In [7]:
class GCNLayer(torch.nn.Module):
    def __init__(self, in_features: int, out_features: int, activation=torch.nn.functional.relu):
        super(GCNLayer, self).__init__()
        self.activation = activation
        self.W: torch.Tensor = torch.nn.Parameter(torch.zeros(in_features, out_features))
        torch.nn.init.kaiming_normal_(self.W) 

    def forward(self, H: torch.Tensor, edge_index: torch.Tensor):
        out = H.clone()
        out += torch_scatter.scatter_add(H[edge_index[0]], edge_index[1], dim=0)
        out = out.matmul(self.W)
        if self.activation:
            out = self.activation(out)
        return out

In [8]:
def get_accuracy(model, cora, mask):
    model.eval()
    with torch.no_grad():
        outputs = model(cora_graph.x, cora_graph.edge_index)
    correct = (outputs[mask].argmax(-1) == cora_graph.y[mask]).sum()
    return int(correct) / int(mask.sum())

In [9]:
class GraphNet(torch.nn.Module):
    def __init__(self, in_features:int, out_features:int, hidden_features:int, activation=torch.nn.functional.relu, dropout=0.1):
        super(GraphNet, self).__init__()
        self.activation = activation
        if dropout>0:
            self.dropout = torch.nn.Dropout(dropout)
        else: 
            self.dropout = torch.nn.Identity()

        self.layer_1 = GCNLayer(in_features=in_features, out_features=hidden_features)
        self.layer_2 = GCNLayer(in_features=hidden_features, out_features=hidden_features, activation=self.activation)
        self.layer_3 = GCNLayer(in_features=hidden_features, out_features=hidden_features, activation=self.activation)
        self.dense1 = torch.nn.Linear(in_features=hidden_features, out_features=hidden_features)
        self.dense2 = torch.nn.Linear(in_features=hidden_features, out_features=out_features)

    def forward(self, H: torch.Tensor, edge_index: torch.Tensor):
        out = self.layer_1(H, edge_index)
        out = self.dropout(out)
        out = self.layer_2(out, edge_index)
        out = self.dropout(out)
        H = self.layer_3(out, edge_index)
        H = self.dropout(out)
        out = self.dense1(out)
        out = self.activation(out)
        out = self.dropout(out)
        out = self.dense2(out)
        # H = torch.softmax(H, dim=-1)
        # out = torch.nn.functional.softmax(out, dim=1)
        return out

        

## WandB train function

We make a few changes to our train function to enable wandb logging of hyperparameters and metrics. The train function is written to allow both manual runs and hyperparameter search.

In [10]:
def train(config=None, project=None, notes=None):

    with wandb.init(config=config, project=project, notes=notes): # Initialize a new wandb run
        # By passing our config through wandb,
        # a) it is automatically logged
        # b) we can use wandb sweeps to optimize hyperparameters
        config = wandb.config 

        model = GraphNet(
            in_features=cora_graph.num_features, 
            out_features=cora.num_classes, 
            hidden_features=config.hidden_features, 
            dropout=config.dropout).to(device)

        optimizer = torch.optim.Adam(model.parameters(), lr=config.lr)
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=config.epochs, eta_min=0)
        criterion = torch.nn.CrossEntropyLoss()

        best_model = None
        best_val_acc = 0
        best_epoch = 0

        for epoch in range(config.epochs):
            
            model.train()
            optimizer.zero_grad()
            outputs = model(cora_graph.x, cora_graph.edge_index) # we run on everything

            loss = criterion(outputs[cora_graph.train_mask], cora_graph.y[cora_graph.train_mask]) # but only propagate the loss for the train labels
            loss.backward()

            optimizer.step() # update parameters
            scheduler.step() # update the learning rate once per epoch

            val_acc = get_accuracy(model, cora_graph, cora_graph.val_mask)
            wandb.log({"val_acc": val_acc, "loss": loss.item()})

            if epoch % 10 == 0 and not wandb.run.sweep_id:
                # Only print information on individual runs, not on sweeps
                print(f"Epoch {epoch}, Loss: {loss.item()}, Val accuracy: {val_acc}")

            if val_acc > best_val_acc:
                best_val_acc = val_acc
                best_epoch = epoch
                best_model = copy.deepcopy(model)

    return best_model, best_epoch, best_val_acc


## Manual training runs

With wandb, you can still manually run your training loop with different hyperparameters as you are used to.

In [11]:
best_model, best_model_epoch, best_val_acc = train(dict(
    hidden_features=128,
    lr=0.01,
    dropout=0.1,
    epochs=100
), project="Cora_GraphNet", notes="first trial")

wandb: Currently logged in as: mak84271. Use `wandb login --relogin` to force relogin


Epoch 0, Loss: 2.5374622344970703, Val accuracy: 0.358
Epoch 10, Loss: 0.2711913287639618, Val accuracy: 0.738
Epoch 20, Loss: 0.011204823851585388, Val accuracy: 0.736
Epoch 30, Loss: 0.2288583517074585, Val accuracy: 0.71
Epoch 40, Loss: 0.42508378624916077, Val accuracy: 0.704
Epoch 50, Loss: 0.11761137843132019, Val accuracy: 0.73
Epoch 60, Loss: 0.0054314169101417065, Val accuracy: 0.714
Epoch 70, Loss: 0.004700902383774519, Val accuracy: 0.72
Epoch 80, Loss: 0.00022916511807125062, Val accuracy: 0.72
Epoch 90, Loss: 0.00043300207471475005, Val accuracy: 0.722


VBox(children=(Label(value='0.007 MB of 0.007 MB uploaded\r'), FloatProgress(value=1.0, max=1.0)))

0,1
loss,█▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val_acc,▂▁▃▆█████████▇▇████▇▇▇▇▇▇███▇▇██████████

0,1
loss,0.0018
val_acc,0.722


In [12]:
test_acc = get_accuracy(best_model, cora_graph, cora_graph.test_mask)
print(f"Test acc: {test_acc:.2f} (using model from epoch {best_model_epoch} with val acc {best_val_acc:.2})")

Test acc: 0.76 (using model from epoch 18 with val acc 0.75)


## Hyperparameter Search

But you can also perform a hyperparameter search using wandb sweeps, by specifying a hyperparameter config

In [13]:
sweep_config = {
    # hyperparameter search methods, e.g. grid, random
    'method': 'random',

    # metric to optimize
    'metric': {
        'name': 'val_acc',
        'goal': 'maximize'   
    },

    # parameters to search
    'parameters': {
        'hidden_features': {
            'values': [64, 128, 256]
        },
        'dropout': {
            # a flat distribution between 0 and 0.1
            'distribution': 'uniform',
            'min': 0.0,
            'max': 0.5,
        },
        'lr': {
            'values': [0.001, 0.0001, 0.00001]
        },
        'epochs': {
            'values': [100, 200, 300]
        }
    }
}

In [14]:
sweep_id = wandb.sweep(sweep_config, project="Cora_GraphNet")

Create sweep with ID: rk4l1nnh
Sweep URL: https://wandb.ai/mak84271/Cora_GraphNet/sweeps/rk4l1nnh


You can click on the `Sweep URL` to get a nice visualization on how well different sets of hyperparameters perform and to see which are the best (click on the best run and then on Overview).

The following cell performs 5 runs using the sweep configuration given above. You can call `wandb.agent` multiple times to produce more runs for the same sweep configuration.

In [15]:
wandb.agent(sweep_id, function=train, count=5)

wandb: Agent Starting Run: xqlijcjq with config:
wandb: 	dropout: 0.2736890038459956
wandb: 	epochs: 200
wandb: 	hidden_features: 64
wandb: 	lr: 0.001


0,1
loss,█▃▂▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val_acc,▁▃▃▄▄▆▆▆▇▇██████████████████████████████

0,1
loss,0.31238
val_acc,0.67


wandb: Agent Starting Run: xpbe9d4i with config:
wandb: 	dropout: 0.3217099686844707
wandb: 	epochs: 100
wandb: 	hidden_features: 256
wandb: 	lr: 1e-05


0,1
loss,█▇▆█▇▅▅▃▃▅▂▄▂▂▄▃▂▁▃▄▃▃▄▃▃▃▂▂▃▃▄▂▁▂▂▂▁▂▃▃
val_acc,▁▁▁▂▂▂▂▂▂▂▂▂▂▃▃▄▄▅▅▅▅▅▅▅▆▆▆▇▇▇██████████

0,1
loss,3.21704
val_acc,0.22


wandb: Agent Starting Run: eofj7xng with config:
wandb: 	dropout: 0.012000838386770851
wandb: 	epochs: 100
wandb: 	hidden_features: 64
wandb: 	lr: 0.0001


0,1
loss,█▆▆▅▄▄▃▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁
val_acc,▁▁▁▂▂▂▂▂▂▄▆▆▆▆▆▆▇▇▇▇▇▇▇▇████████████████

0,1
loss,1.54944
val_acc,0.278


wandb: Agent Starting Run: w0f2yido with config:
wandb: 	dropout: 0.37021611580277863
wandb: 	epochs: 100
wandb: 	hidden_features: 64
wandb: 	lr: 0.0001


0,1
loss,▆▇▅▅█▄▄▄▃▄▅▃▂▄▃▄▃▃▂▂▃▃▂▃▂▂▃▂▃▃▂▂▂▁▂▃▃▂▁▂
val_acc,▁▁▁▂▂▂▂▂▂▂▃▃▃▄▄▅▅▅▅▆▆▇▇▇▇▇▇▇▇▇▇█████████

0,1
loss,3.81461
val_acc,0.264


wandb: Agent Starting Run: d2119s8w with config:
wandb: 	dropout: 0.20942262141520307
wandb: 	epochs: 100
wandb: 	hidden_features: 64
wandb: 	lr: 0.001


0,1
loss,█▆▆▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
val_acc,▁▂▂▁▂▃▅▅▅▆▆▆▇▇▇▇████████████████████████

0,1
loss,0.65535
val_acc,0.592


In [16]:
# Close the sweep, otherwise individual runs after the sweep will still be logged as part of it
wandb.teardown() 