# Sweep

This Notebook is used to run a hyperparameter search using a sweep from W&B

The GNN consits of n Layers of type ResGNN or Gatv2conv.
After that the output is transformed to a shape of 2 x N, where N is the number of stations.

The Results of this run can be viewd at [WandB](https://wandb.ai/feik/GNNPP/sweeps/svtsdej6?workspace=user-feik).

Since the Parameter space of this search is very large, first a search of very relevant parameters is done.

Parameters I think are relevant (from experiments) are:
 - Number of Layers
 - Type of Layers
 - Hidden Dimension
 - Heads
 - Max Dist

## TODO
 - Write this as a python only function and use if __name__ == "__main__":
    train()
 - Check assertion error for Conv
 - Run from command line then

In [2]:
# Set Notebook Name for WandB
import os
os.environ['WANDB_NOTEBOOK_NAME'] = 'sweep.ipynb'

In [3]:
from helpers import load_data, load_stations, clean_data, normalize_data, create_data, visualize_graph
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCNConv, GATConv, GATv2Conv, Sequential, summary
from torch_geometric.utils import to_networkx
from torch.nn import Linear, Embedding, Dropout, ModuleList
from tqdm import tqdm, trange

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import geopy.distance
import matplotlib as mpl
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import numpy as np
import pandas as pd
import torch
import torch.nn.functional as F
import traceback
import wandb

%matplotlib inline
plt.style.use('default')

## Import of Data

In [4]:
# Get Data from feather
data = load_data(indexed=False)
# Get List of stations with all stations -> will break further code if cut already
stations = load_stations(data)
# Clean Data
data = clean_data(data, max_missing=121, max_alt=1000.0)
# Normalize Data
normalized_data = normalize_data(data, last_obs=-365) #last_obs is -365 since the last year is used for testing

## Create the torch dataset

The Dataset which is a `pandas.DataFrame` gets converted to a `torch_geometric.data` object, which then can be processed by the GNN

In [5]:
#dist_matrix = compute_dist_matrix(stations)
#np.save(dist_matrix, 'dist_matrix.npy')

In [6]:
def build_dataloaders(max_dist: int, batch_size: int):
    dist_matrix = np.load('dist_matrix.npy')

    # Create a boolean mask indicating which edges to include
    mask = (dist_matrix <= max_dist) & (dist_matrix != 0)

    torch_data = []
    for date in tqdm(data['date'].unique(), desc="Building dataset"):
        torch_data.append(create_data(df=normalized_data, date=date, mask=mask, dist_matrix=dist_matrix))

    # Definition of train_loader and valid_loader
    train_loader = DataLoader(torch_data[:-730], batch_size=batch_size, shuffle=True)
    valid_loader = DataLoader(torch_data[-730:-365], batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(torch_data[-365:], batch_size=batch_size, shuffle=True)
    
    return train_loader, valid_loader, test_loader

## GNN

In the following, some layers for the GNN and the loss function are defined.

### CRPS Loss Function

\begin{align*}

    \operatorname{crps}(F,X)=&\int_{-\inf}^{\inf}\left(F(y)-\boldsymbol{1}_{(y-x)}\right)^2dy\\

\end{align*}

Closed form expression from Gneiting et al. (2005)

\begin{align*}

    \operatorname{crps}\left(\mathcal{N}\left(\mu, \sigma^2\right), y\right)= & \sigma\left\{\frac{y-\mu}{\sigma}\left[2 \Phi\left(\frac{y-\mu}{\sigma}\right)-1\right] +2 \varphi\left(\frac{y-\mu}{\sigma}\right)-\frac{1}{\sqrt{\pi}}\right\}

\end{align*}

$\Phi\left(\frac{y-\mu}{\sigma}\right)$ denotes the CDF of a standard normal distribution and $\varphi\left(\frac{y-\mu}{\sigma}\right)$ denotes the PDF of a standard normal distribution

In [7]:
def crps(mu: torch.tensor, sigma: torch.tensor, y: torch.tensor):
    """Calculates the Continuous Ranked Probability Score (CRPS) assuming normally distributed df

    Args:
        mu (torch.tensor): mean
        sigma (torch.tensor): standard deviation
        y (torch.tensor): observed df

    Returns:
        torch.tensor: CRPS value
    """
    y = y.view((-1,1)) # make sure y has the right shape
    PI = np.pi #3.14159265359
    omega = (y - mu) / sigma
    # PDF of normal distribution at omega
    pdf = 1/(torch.sqrt(torch.tensor(2 * PI))) * torch.exp(-0.5 * omega ** 2)
    
    # Source: https://stats.stackexchange.com/questions/187828/how-are-the-error-function-and-standard-normal-distribution-function-related
    cdf = 0.5 * (1 + torch.erf(omega / torch.sqrt(torch.tensor(2))))
    
    crps = sigma * (omega * (2 * cdf - 1) + 2 * pdf - 1/torch.sqrt(torch.tensor(PI)))
    return  torch.mean(crps)

### GNN
Definition of Model and Training

In [8]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")
device

device(type='cuda', index=0)

In [9]:
class Convolution(torch.nn.Module):
    def __init__(self, out_channels, hidden_channels, heads, num_layers:int=None):
        super(Convolution, self).__init__()
        # Make sure either hidden_channels is a list, heads is a list or num_layer is supplied
        assert isinstance(hidden_channels, list) or isinstance(heads, list) or num_layers is not None,\
                "If hidden_channels is not a list, num_layers must be specified."
        # both are a list
        if isinstance(hidden_channels, list) and isinstance(heads, list):
            assert len(hidden_channels) == len(heads), f"Lengths of lists {len(hidden_channels)} and {len(heads)} do not match."
        # only hidden_channels is list
        if isinstance(hidden_channels, list) and not isinstance(heads, list):
            heads = [heads] * len(hidden_channels)
        # only heads is list
        if isinstance(heads, list) and not isinstance(hidden_channels, list):
            hidden_channels = [hidden_channels] * len(heads)
        # none is list
        if not isinstance(heads, list) and not isinstance(hidden_channels, list):
            heads = [heads] * num_layers
            hidden_channels = [hidden_channels] * num_layers
        
        # definition of Layers
        self.convolutions = ModuleList()
        for c, h in zip(hidden_channels, heads):
            self.convolutions.append(GATv2Conv(in_channels=-1, out_channels=c, heads=h, edge_dim=1))
        # Last Layer to match shape of output
        self.lin = Linear(in_features=hidden_channels[-1] * heads[-1], out_features=out_channels)

    def forward(self, x, edge_index, edge_attr):
        x = x.float()
        edge_attr = edge_attr.float()
        
        for conv in self.convolutions:
            x = F.relu(conv(x, edge_index, edge_attr))
        
        x = F.relu(self.lin(x))
        return x


class EmbedStations(torch.nn.Module):
    def __init__(self, num_stations_max, embedding_dim):
        super(EmbedStations, self).__init__()
        self.embed = Embedding(num_embeddings=num_stations_max, embedding_dim=embedding_dim)

    def forward(self, x):
        station_ids = x[:, 0].long()
        emb_station = self.embed(station_ids)
        x = torch.cat((emb_station, x[:, 1:]), dim=1) # Concatenate embedded station_id to rest of the feature vector
        return x


class MakePositive(torch.nn.Module):
    def __init__(self):
        super(MakePositive, self).__init__()

    def forward(self, x):
        mu, sigma = torch.split(x, 1, dim=-1)
        sigma = F.softplus(sigma) # ensure that sigma is positive
        return mu, sigma


class ResGnn(torch.nn.Module):
    def __init__(self, out_channels, num_layers, hidden_channels, heads):
        super(ResGnn, self).__init__()
        assert num_layers > 0, "num_layers must be > 0."

        # Create Layers
        self.convolutions = ModuleList()
        for i in range(num_layers):
            if i == 0:
                self.convolutions.append(GATv2Conv(-1, hidden_channels, heads=heads, edge_dim=1))
            else:
                self.convolutions.append((GATv2Conv(-1, hidden_channels, heads=heads, edge_dim=1)))
        self.lin = Linear(hidden_channels * heads, out_channels) #hier direkt 2 testen

    def forward(self, x, edge_index, edge_attr):
        x = x.float()
        edge_attr = edge_attr.float()
        for i, conv in enumerate(self.convolutions):
            if i == 0:
                # First Layer
                x = conv(x, edge_index, edge_attr)
                x = F.relu(x)
            else:
                x = x + F.relu(conv(x, edge_index, edge_attr)) # Residual Layers

        x = self.lin(x)
        x = F.relu(x)
        return x

In [10]:
def build_model(embed_dim:int, hidden_channels:int, heads:int, num_layers:int, type:str):
    """Builds  a model with the specified parameters

    Args:
        embed_dim (int): embedding dimension of the station id
        hidden_channels (int): number of hidden channels used by the convolution layers
        heads (int): number of heads used for the attention of the convolution layers
        num_layers (int): depth of the convolution layers
        linear_size (int): size of the linear layer
        type (str): type of the model, either 'ResGNNv2' or 'GATConvv2'


    Returns:
        _type_: returns a model with the specified parameters
    """
    torch.cuda.empty_cache()
    
    if type == 'ResGNNv2':
        conv = (ResGnn(out_channels=2, hidden_channels=hidden_channels, heads=heads, num_layers=num_layers), 'x, edge_index, edge_attr -> x')
    elif type == 'GATConvv2':
        conv = (Convolution(out_channels=2, hidden_channels=hidden_channels, heads=heads, num_layers=num_layers), 'x, edge_index, edge_attr -> x')
    
    model = Sequential('x, edge_index, edge_attr',
                   [
                       (EmbedStations(num_stations_max=535, embedding_dim=embed_dim), 'x -> x'),
                       conv,
                       #(Linear(linear_size, 2),'x -> x'),
                       (MakePositive(), 'x -> mu, sigma')
                   ])
    model.to(device)
    
    return model

def build_optimizer(model, learning_rate: float) -> torch.optim.Optimizer:
    """Defines the optimizer for the model


    Args:
        model (_type_): model for which the optimizer is defined
        learning_rate (float): learning rate

    Returns:
        torch.optim.Optimizer: returns the optimizer for the model
    """
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
    return optimizer

In [11]:
def eval(model, test_loader):
    model.eval()
    mu_list = []
    sigma_list = []
    err_list = []
    y_list = []


    for batch in test_loader:
        batch.to(device)
        mu, sigma = model(batch.x, batch.edge_index, batch.edge_attr)
        y = batch.y
        err = crps(mu, sigma, y)
        mu = mu.detach().cpu().numpy().flatten()
        sigma = sigma.detach().cpu().numpy().flatten()
        y = y.cpu().numpy()
        err = err.detach().cpu().numpy()

        mu_list.append(mu)
        sigma_list.append(sigma)
        y_list.append(y)
        err_list.append(err*len(batch))

    err = sum(err_list) / len(test_loader.dataset)
    return err

In [12]:
def train_model(config=None):
    with wandb.init(config=config):
        config = wandb.config
        train_loader, valid_loader, test_loader = build_dataloaders(max_dist=config.max_dist, batch_size=config.batch_size)
        
        model = build_model(embed_dim=config.embed_dim,
                            hidden_channels=config.hidden_channels,
                            heads=config.heads,
                            num_layers=config.num_layers,
                            #linear_size=config.linear_size,
                            type=config.type)
        
        optimizer = build_optimizer(model=model, learning_rate=config.learning_rate)
        
        best_val_loss = float('inf')
        
        def train(batch):
            batch.to(device)
            optimizer.zero_grad()
            out = model(batch.x, batch.edge_index, batch.edge_attr)
            mu, sigma = out
            loss = crps(mu, sigma, batch.y)
            loss.backward()
            optimizer.step()
            return loss
        
        @torch.no_grad()
        def valid(batch):
            batch.to(device)
            out = model(batch.x, batch.edge_index, batch.edge_attr)
            mu, sigma = out
            loss = crps(mu, sigma, batch.y)
            return loss
        
        epochs_pbar = trange(config.max_epochs, desc="Epochs")
        for epoch in epochs_pbar:
            # Train for one epoch
            model.train()
            train_loss = 0.0
            for batch in train_loader:
                loss = train(batch)
                train_loss += loss.item() * batch.num_graphs
            train_loss /= len(train_loader.dataset)
                
            # Evaluate on the validation set
            model.eval()
            val_loss = 0.0
            for batch in valid_loader:
                loss = valid(batch)
                val_loss += loss.item() * batch.num_graphs
            val_loss /= len(valid_loader.dataset)
            
            # Check if the validation loss has improved
            if val_loss < best_val_loss:
                best_val_loss = val_loss
                no_improvement = 0
                # Save model checkpoint
                wandb.log({"best_val_loss": best_val_loss})
                torch.save({
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict(),
                    }, "checkpoint.pt")
            else:
                no_improvement += 1
            
            # Log to WandB
            wandb.log({"train_loss": train_loss, "val_loss": val_loss})
            epochs_pbar.set_postfix({"Train Loss": train_loss, "Val Loss": val_loss, "Best Loss": best_val_loss, "No Improvement": no_improvement})
            # Early stopping
            if no_improvement == config.patience:
                print('Early stopping.')
                break
        
        # Load weights from model checkpoint
        checkpoint = torch.load("checkpoint.pt")
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        
        test_error = eval(model=model, test_loader=test_loader)
        
        wandb.log({"best_val_loss": best_val_loss,
                   "trained_epochs": epoch-config.patience,
                   "evaluation_error": test_error})
        
        # Free memory
        model.to('cpu')
        torch.cuda.empty_cache()
        
def train_model_catch_errors(config=None):
    try:
        train_model(config=config)
    except Exception as e:
        # exit gracefully, so wandb logs the problem
        print(traceback.print_exc())
        exit(1)

In [13]:
wandb.agent("feik/GNNPP/fo6141n7", train_model_catch_errors)

[34m[1mwandb[0m: Agent Starting Run: gqv1y7v4 with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 25
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 1
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: GATConvv2
[34m[1mwandb[0m: Currently logged in as: [33mfeik[0m. Use [1m`wandb login --relogin`[0m to force relogin


Building dataset: 100%|██████████| 3530/3530 [00:27<00:00, 127.53it/s]
Epochs:  31%|███       | 46/150 [01:55<04:20,  2.50s/it, Train Loss=7.94, Val Loss=7.94, Best Loss=7.94, No Improvement=25]


Early stopping.


0,1
best_val_loss,█▆▄▃▁▁
evaluation_error,▁
train_loss,▄▅▂▅▅▁▆▂▆▄▂▇▂▅▃▆▃▇▅▂▅▄▃▄█▁▁▄▆▇▂▂▇▄▄▅▆▄▃▆
trained_epochs,▁
val_loss,▅▄▅▆▄▃▂▅▇▇▄▅▄▄▇▄▄▂▁▅▅▄▄▃▃▅▅▃▅▅█▄▂▃▅█▃▅▃▅

0,1
best_val_loss,7.94116
evaluation_error,7.9122
train_loss,7.9391
trained_epochs,21.0
val_loss,7.94209


[34m[1mwandb[0m: Agent Starting Run: bdmfjhx1 with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 25
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 1
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: ResGNNv2


Building dataset: 100%|██████████| 3530/3530 [00:27<00:00, 130.48it/s]
Epochs: 100%|██████████| 150/150 [06:18<00:00,  2.52s/it, Train Loss=5.11, Val Loss=5.07, Best Loss=5.07, No Improvement=17]


0,1
best_val_loss,█▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
evaluation_error,▁
train_loss,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
trained_epochs,▁
val_loss,█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_val_loss,5.07052
evaluation_error,5.06037
train_loss,5.11114
trained_epochs,124.0
val_loss,5.07147


[34m[1mwandb[0m: Agent Starting Run: eryf64j5 with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 25
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 2
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: GATConvv2


Building dataset: 100%|██████████| 3530/3530 [00:27<00:00, 130.71it/s]
Epochs:  49%|████▉     | 74/150 [04:07<04:14,  3.35s/it, Train Loss=5.12, Val Loss=5.08, Best Loss=5.08, No Improvement=25]


Early stopping.


VBox(children=(Label(value='0.021 MB of 0.021 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
best_val_loss,█▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
evaluation_error,▁
train_loss,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
trained_epochs,▁
val_loss,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_val_loss,5.0755
evaluation_error,5.06235
train_loss,5.11525
trained_epochs,49.0
val_loss,5.07709


[34m[1mwandb[0m: Agent Starting Run: f2y6li2f with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 25
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 2
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: ResGNNv2


Building dataset: 100%|██████████| 3530/3530 [00:27<00:00, 130.12it/s]
Epochs:  85%|████████▌ | 128/150 [07:18<01:15,  3.43s/it, Train Loss=1.19, Val Loss=1.01, Best Loss=1.01, No Improvement=25]


Early stopping.


0,1
best_val_loss,█▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
evaluation_error,▁
train_loss,█▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
trained_epochs,▁
val_loss,█▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_val_loss,1.00643
evaluation_error,1.03008
train_loss,1.18591
trained_epochs,103.0
val_loss,1.00947


[34m[1mwandb[0m: Agent Starting Run: k25cmsuj with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 25
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 3
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: GATConvv2


Building dataset: 100%|██████████| 3530/3530 [00:27<00:00, 127.73it/s]
Epochs:  47%|████▋     | 71/150 [04:58<05:31,  4.20s/it, Train Loss=7.94, Val Loss=7.94, Best Loss=7.94, No Improvement=25]


Early stopping.


0,1
best_val_loss,█▃▃▁▁▁
evaluation_error,▁
train_loss,▅▂▅▄▄▅▃▃▄▁▃▂▄▄▄▂▄▄▂▇▆▅▂▅▄▄█▅▂▂▅▆▃▄▁▆▄▄▃▅
trained_epochs,▁
val_loss,▅▂▃▆▃▄▅▄▅▅▂▂▇▅▄▄▅▁▄▅▃▁█▃▇▅▄▂▅▇▃▃▄█▄▅▅▆▃▂

0,1
best_val_loss,7.94112
evaluation_error,7.91185
train_loss,7.9392
trained_epochs,46.0
val_loss,7.94137


[34m[1mwandb[0m: Agent Starting Run: cfd6gg8q with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 25
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 3
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: ResGNNv2


Building dataset: 100%|██████████| 3530/3530 [00:27<00:00, 127.29it/s]
Epochs:  99%|█████████▊| 148/150 [10:18<00:08,  4.18s/it, Train Loss=1.2, Val Loss=1.02, Best Loss=1.02, No Improvement=25] 


Early stopping.


0,1
best_val_loss,█▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
evaluation_error,▁
train_loss,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
trained_epochs,▁
val_loss,█▂▂▁▁▁▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_val_loss,1.01813
evaluation_error,1.04498
train_loss,1.20341
trained_epochs,123.0
val_loss,1.02383


[34m[1mwandb[0m: Agent Starting Run: peoxrbht with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 25
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 4
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: GATConvv2


Building dataset: 100%|██████████| 3530/3530 [00:27<00:00, 127.67it/s]
Epochs:  29%|██▉       | 44/150 [03:39<08:48,  4.98s/it, Train Loss=7.94, Val Loss=7.94, Best Loss=7.94, No Improvement=25]


Early stopping.


VBox(children=(Label(value='0.021 MB of 0.021 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
best_val_loss,█▄▁▁▁
evaluation_error,▁
train_loss,▂▄▄▃▄▃█▃▄▄▅▄▂▄▄▇▄▅▄▃▁▄█▄▄▇▁█▁▄▂▄▅▃█▅▂▄▃▇
trained_epochs,▁
val_loss,▃▂▇▄▃▂▃▃▃▃▅▂▃▂▄▅▄▁▄▃▁▃▄▃▃▄▄█▄▆▅▄▃▅▃▅▁▄▂▃

0,1
best_val_loss,7.94121
evaluation_error,7.91187
train_loss,7.93955
trained_epochs,19.0
val_loss,7.94167


[34m[1mwandb[0m: Agent Starting Run: vkthx6fn with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 25
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 4
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: ResGNNv2


Building dataset: 100%|██████████| 3530/3530 [00:27<00:00, 127.88it/s]
Epochs:  41%|████▏     | 62/150 [05:06<07:15,  4.94s/it, Train Loss=1.33, Val Loss=1.12, Best Loss=1.12, No Improvement=25]


Early stopping.


0,1
best_val_loss,█▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
evaluation_error,▁
train_loss,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
trained_epochs,▁
val_loss,█▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_val_loss,1.11572
evaluation_error,1.13958
train_loss,1.33356
trained_epochs,37.0
val_loss,1.12315


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: 5xwae2dc with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 75
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 1
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: GATConvv2


Building dataset: 100%|██████████| 3530/3530 [00:30<00:00, 117.33it/s]
Epochs: 100%|██████████| 150/150 [08:50<00:00,  3.54s/it, Train Loss=1.17, Val Loss=0.982, Best Loss=0.98, No Improvement=3]  


VBox(children=(Label(value='0.021 MB of 0.021 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
best_val_loss,█▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
evaluation_error,▁
train_loss,█▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
trained_epochs,▁
val_loss,█▅▃▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_val_loss,0.98034
evaluation_error,1.01063
train_loss,1.16832
trained_epochs,124.0
val_loss,0.98222


[34m[1mwandb[0m: Agent Starting Run: poxkpcto with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 75
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 1
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: ResGNNv2


Building dataset: 100%|██████████| 3530/3530 [00:29<00:00, 119.08it/s]
Epochs: 100%|██████████| 150/150 [08:56<00:00,  3.58s/it, Train Loss=1.18, Val Loss=0.992, Best Loss=0.987, No Improvement=4] 


VBox(children=(Label(value='0.021 MB of 0.021 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
best_val_loss,█▄▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
evaluation_error,▁
train_loss,█▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
trained_epochs,▁
val_loss,█▅▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_val_loss,0.98683
evaluation_error,1.02
train_loss,1.176
trained_epochs,124.0
val_loss,0.99155


[34m[1mwandb[0m: Agent Starting Run: 3y9fn999 with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 75
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 2
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: GATConvv2


Building dataset: 100%|██████████| 3530/3530 [00:29<00:00, 118.01it/s]
Epochs:  71%|███████▏  | 107/150 [07:57<03:11,  4.46s/it, Train Loss=1.24, Val Loss=1.06, Best Loss=1.06, No Improvement=25]


Early stopping.


0,1
best_val_loss,█▄▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
evaluation_error,▁
train_loss,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
trained_epochs,▁
val_loss,█▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_val_loss,1.06288
evaluation_error,1.0882
train_loss,1.24079
trained_epochs,82.0
val_loss,1.06482


[34m[1mwandb[0m: Agent Starting Run: z6cfcpzu with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 75
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 2
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: ResGNNv2


Building dataset: 100%|██████████| 3530/3530 [00:29<00:00, 118.59it/s]
Epochs: 100%|██████████| 150/150 [11:39<00:00,  4.66s/it, Train Loss=1.17, Val Loss=0.991, Best Loss=0.983, No Improvement=22]


0,1
best_val_loss,█▄▃▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
evaluation_error,▁
train_loss,█▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
trained_epochs,▁
val_loss,█▄▃▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_val_loss,0.98348
evaluation_error,1.01128
train_loss,1.171
trained_epochs,124.0
val_loss,0.99141


[34m[1mwandb[0m: Sweep Agent: Waiting for job.
[34m[1mwandb[0m: Job received.
[34m[1mwandb[0m: Agent Starting Run: nbiclxe3 with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 75
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 3
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: GATConvv2


Building dataset: 100%|██████████| 3530/3530 [00:31<00:00, 112.12it/s]
Epochs:  38%|███▊      | 57/150 [05:09<08:24,  5.43s/it, Train Loss=3.85, Val Loss=3.63, Best Loss=3.63, No Improvement=25]


Early stopping.


VBox(children=(Label(value='0.077 MB of 0.077 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

0,1
best_val_loss,█▇▅▄▄▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
evaluation_error,▁
train_loss,█▇▆▄▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
trained_epochs,▁
val_loss,█▇▅▄▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁

0,1
best_val_loss,3.62904
evaluation_error,3.79233
train_loss,3.85397
trained_epochs,32.0
val_loss,3.62969


[34m[1mwandb[0m: Agent Starting Run: ll2mkioo with config:
[34m[1mwandb[0m: 	batch_size: 8
[34m[1mwandb[0m: 	embed_dim: 2
[34m[1mwandb[0m: 	heads: 1
[34m[1mwandb[0m: 	hidden_channels: 1
[34m[1mwandb[0m: 	learning_rate: 0.002
[34m[1mwandb[0m: 	max_dist: 75
[34m[1mwandb[0m: 	max_epochs: 150
[34m[1mwandb[0m: 	num_layers: 3
[34m[1mwandb[0m: 	patience: 25
[34m[1mwandb[0m: 	type: ResGNNv2


Building dataset: 100%|██████████| 3530/3530 [00:30<00:00, 117.06it/s]
Epochs:  49%|████▊     | 73/150 [06:15<06:32,  5.10s/it, Train Loss=1.2, Val Loss=1.09, Best Loss=1.09, No Improvement=0] 