# GGNN Optuna Optimization
This notebook trains a Gated Graph Neural Network (GGNN) for binary graph classification (predicting the `binds_to_rna` label). It covers GPU setup, data loading and splitting, model definition, hyperparameter optimization with Optuna, and retraining with the best hyperparameters found.


## Check GPU Availability
This section checks the availability of a GPU for PyTorch, ensuring that model training can leverage hardware acceleration if available.


In [1]:
# import torch
# print("PyTorch version:", torch.__version__)
# print("Is CUDA Supported?", torch.cuda.is_available())


In [2]:
# torch.cuda.is_available(), torch.cuda.device_count(), torch.cuda.get_device_name(0)


In [3]:
# import pandas as pd
# import torch
# import torch.nn as nn
# import torch.nn.functional as F
# import torch.optim as optim
# import dgl
# import numpy as np
# import optuna
# from dgl.nn import GatedGraphConv, GlobalAttentionPooling
# from dgl.dataloading import GraphDataLoader
# from sklearn.model_selection import train_test_split
# from optuna.visualization import plot_contour


## Data Loading and Preparation
Here, the data for training the GGNN model is loaded and prepared, including splitting into training, validation, and test sets.


In [4]:
# n_trials = 2
# num_epochs = 2

# # Load data and prepare for training
# reloaded_df = pd.read_csv("data_mvi/combined_df.csv")
# graphs, labels_dict = dgl.load_graphs("data_mvi/graphs.bin")
# labels = reloaded_df['binds_to_rna'].values

# # Split dataset train, test
# train_indices, test_indices, train_labels, test_labels = train_test_split(
#     range(len(reloaded_df)), labels, test_size=0.2, stratify=labels, random_state=42)

# # Split dataset train, validation
# train_indices, val_indices, train_labels, val_labels = train_test_split(
#     train_indices, train_labels, test_size=0.2, stratify=train_labels, random_state=42)

# train_graphs = [graphs[i] for i in train_indices]
# test_graphs = [graphs[i] for i in test_indices]
# val_graphs = [graphs[i] for i in val_indices]

# print(f'Train: {len(train_graphs)}, Validation: {len(val_graphs)}, Test: {len(test_graphs)}')


## Define Model and Utilities
This section defines the GGNN model, an early stopping utility to prevent overfitting, and a custom collate function for data loading.
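
The readout step in `GraphClsGGNN` turns per-node features into a single vector per graph. As a rough illustration of what attention pooling computes, the sketch below follows the formulation given in the DGL documentation for `GlobalAttentionPooling`: a softmax over per-node gate scores, followed by a weighted sum of the node features. It is plain Python, not the DGL implementation; `attention_pool` and its arguments are hypothetical names, and the gate scores are passed in directly instead of being produced by the `nn.Linear(out_feats, 1)` gate.

```python
import math

def attention_pool(node_feats, gate_scores):
    """Softmax the per-node gate scores, then return the weighted sum of node features."""
    # Subtract the maximum score for numerical stability before exponentiating.
    m = max(gate_scores)
    exps = [math.exp(s - m) for s in gate_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(node_feats[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, node_feats)) for d in range(dim)]
    return pooled, weights

# Hypothetical graph with three nodes and two features per node.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
scores = [0.1, 2.0, -1.0]
pooled, weights = attention_pool(feats, scores)
print("weights:", weights)
print("graph vector:", pooled)
```

With equal gate scores the result reduces to the mean of the node features; a node with a larger score contributes more to the graph vector.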


In [5]:
# class GraphClsGGNN(nn.Module):
#     """GGNN for graph classification"""
#     def __init__(self, annotation_size, out_feats, n_steps, n_etypes, num_cls, dropout_rate=0.5):
#         """
#         Args:
#         annotation_size : int
#             The input feature size
#         out_feats : int
#             The output feature size
#         n_steps : int
#             The number of propagation steps
#         n_etypes : int
#             The number of edge types
#         num_cls : int
#             The number of output classes
#         dropout_rate : float
#             The dropout rate
#         """
#         super(GraphClsGGNN, self).__init__()
#         self.dropout = nn.Dropout(dropout_rate)
#         self.ggnn1 = GatedGraphConv(annotation_size, out_feats, n_steps, n_etypes)
#         self.ggnn2 = GatedGraphConv(out_feats, out_feats, n_steps, n_etypes)
#         self.pooling = GlobalAttentionPooling(nn.Linear(out_feats, 1))
#         self.fc = nn.Linear(out_feats, num_cls)

#     def forward(self, graph, feat):
#         """Forward pass"""
#         h = F.relu(self.ggnn1(graph, feat))
#         h = self.dropout(h)
#         h = F.relu(self.ggnn2(graph, h))
#         hg = self.pooling(graph, h)
#         out = self.fc(hg)
#         return out
    

In [6]:
# class EarlyStopping:
#     """Early stops the training if validation loss doesn't improve after a given patience."""
#     def __init__(self, patience=15, verbose=False, delta=0, path='checkpoint.pt', trace_func=print):
#         """ Initialize the EarlyStopping object """
#         """
#         Args:
#             patience (int): How long to wait after last time validation loss improved.
#                             Default: 15
#             verbose (bool): If True, prints a message for each validation loss improvement. 
#                             Default: False
#             delta (float): Minimum change in the monitored quantity to qualify as an improvement.
#                             Default: 0
#             path (str): Path for the checkpoint to be saved to.
#                             Default: 'checkpoint.pt'
#             trace_func (function): trace print function.
#                             Default: print
#         """
#         self.patience = patience
#         self.verbose = verbose
#         self.counter = 0
#         self.best_score = None
#         self.early_stop = False
#         self.val_loss_min = np.inf
#         self.delta = delta
#         self.path = path
#         self.trace_func = trace_func

#     def __call__(self, val_loss, model):
#         """ Check if the validation loss has improved, and if not, increase the counter. """
#         """
#         Args:
#             val_loss (float): The validation loss
#             model (nn.Module): The model to be saved
#         """
#         score = -val_loss
#         if self.best_score is None:
#             self.best_score = val_loss
#             self.save_checkpoint(val_loss, model)
#         elif val_loss > self.best_score - self.delta:
#             self.counter += 1
#             self.trace_func(f'EarlyStopping counter: {self.counter} out of {self.patience}')
#             if self.counter >= self.patience:
#                 self.early_stop = True
#         else:
#             self.best_score = val_loss
#             self.save_checkpoint(val_loss, model)
#             self.counter = 0

#     def save_checkpoint(self, val_loss, model):
#         """ Saves model when validation loss decrease. """
#         """
#         Args:
#             val_loss (float): The validation loss
#             model (nn.Module): The model to be saved
#         """
#         if self.verbose:
#             self.trace_func(f'Saving model ...')
#         torch.save(model.state_dict(), self.path)
#         self.val_loss_min = val_loss
        

In [7]:
# def collate(samples):
#     """ Collate function for the DataLoader """
#     """
#     Args:
#         samples (List): The list of samples
#     """
#     graphs, labels = map(list, zip(*samples))
#     batched_graph = dgl.batch(graphs)
#     labels = torch.tensor(labels, dtype=torch.long)
#     return batched_graph, labels


## Training and Evaluation Pipeline
This section defines the training pipeline: the training loop, per-epoch validation, and early stopping.
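
The stopping rule used by the `EarlyStopping` class can be isolated from the checkpointing: a validation loss counts as an improvement only if it beats the best loss so far by at least `delta`, and training stops once `patience` consecutive epochs fail to improve. The sketch below is a minimal pure-Python restatement of that rule; `early_stop_epoch` is a hypothetical helper and the loss values are made up for illustration.

```python
def early_stop_epoch(val_losses, patience=5, delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best_score = None
    counter = 0
    for epoch, loss in enumerate(val_losses, start=1):
        score = -loss  # higher score means lower loss
        if best_score is None or score >= best_score + delta:
            # Improvement by at least `delta`: remember it and reset the counter.
            best_score = score
            counter = 0
        else:
            counter += 1
            if counter >= patience:
                return epoch
    return None

# Hypothetical validation losses that plateau after the second epoch.
losses = [0.50, 0.45, 0.449, 0.448, 0.447]
print(early_stop_epoch(losses, patience=3, delta=0.01))
print(early_stop_epoch(losses, patience=3, delta=0.0))
```

With `delta=0.01` the small gains after epoch 2 do not count as improvements, so the counter keeps growing; with `delta=0.0` every decrease resets it.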


In [8]:
# class TrainingPipeline:
#     def __init__(self, device):
#         self.device = device

#     def train_and_evaluate(self, model, train_loader, val_loader, optimizer, criterion, early_stopping, num_epochs):
#         """ Train and evaluate the model """
#         """
#         Args:
#             model (nn.Module): The model to be trained
#             train_loader (DataLoader): The training data loader
#             val_loader (DataLoader): The validation data loader
#             optimizer (Optimizer): The optimizer
#             criterion (Loss): The loss function
#             early_stopping (EarlyStopping): The early stopping object
#             num_epochs (int): The number of epochs
#         """
#         train_losses = []
#         val_losses = []
#         for epoch in range(num_epochs):
#             model.train()
#             train_loss = 0.0
#             for batched_graph, labels in train_loader:
#                 batched_graph, labels = batched_graph.to(self.device), labels.to(self.device)
#                 optimizer.zero_grad()
#                 logits = model(batched_graph, batched_graph.ndata['h'].float())
#                 loss = criterion(logits, labels)
#                 loss.backward()
#                 optimizer.step()
#                 train_loss += loss.item()

#             # Validation phase
#             model.eval()
#             val_loss = 0.0
#             with torch.no_grad():
#                 for batched_graph, labels in val_loader:
#                     batched_graph, labels = batched_graph.to(self.device), labels.to(self.device)
#                     logits = model(batched_graph, batched_graph.ndata['h'].float())
#                     loss = criterion(logits, labels)
#                     val_loss += loss.item()

#             train_loss /= len(train_loader)
#             val_loss /= len(val_loader)
            
#             train_losses.append(train_loss)
#             val_losses.append(val_loss)
            
#             early_stopping(val_loss, model)
#             if early_stopping.early_stop:
#                 print("Early stopping triggered at epoch:", epoch+1)
#                 break
        
#         return train_loss, val_loss, train_losses, val_losses

#     def evaluate_on_test(self, model, criterion):
#         """Evaluate the model on the test set."""
#         model.eval()
#         test_loss = 0.0
#         test_accuracy = 0.0
#         with torch.no_grad():
#             for batched_graph, labels in self.test_loader:
#                 batched_graph, labels = batched_graph.to(self.device), labels.to(self.device)
#                 logits = model(batched_graph, batched_graph.ndata['h'].float())
#                 loss = criterion(logits, labels)
#                 test_loss += loss.item()
#                 preds = torch.argmax(logits, dim=1)
#                 test_accuracy += torch.sum(preds == labels).item()

#         test_loss /= len(self.test_loader)
#         test_accuracy /= len(self.test_loader.dataset)
#         print("Test Loss:", test_loss)
#         print("Test Accuracy:", test_accuracy)


## Hyperparameter Optimization with Optuna
This section sets up hyperparameter optimization with Optuna: it defines the search space and the objective that each trial evaluates.
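
The learning rate in the search space is suggested with `log=True`, which asks Optuna to sample in the log domain so that each order of magnitude between `1e-5` and `1e-1` is covered evenly. The sketch below illustrates only that idea with independent log-uniform draws in plain Python; it is not how Optuna's samplers choose points (they may do so adaptively), and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the illustration is repeatable
draws = [sample_log_uniform(1e-5, 1e-1, rng) for _ in range(2000)]

# About half of the draws fall below the geometric midpoint 1e-3,
# whereas plain uniform sampling would put almost all of them above it.
below_midpoint = sum(x < 1e-3 for x in draws) / len(draws)
print(f"fraction below 1e-3: {below_midpoint:.2f}")
```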


In [9]:
# class HyperparameterOptimizer:
#     def __init__(self, device, train_graphs, train_labels, val_graphs, val_labels, test_graphs, test_labels, num_trials, num_epochs):
#         self.device = device
#         self.train_graphs = train_graphs
#         self.train_labels = train_labels
#         self.val_graphs = val_graphs
#         self.val_labels = val_labels
#         self.test_graphs = test_graphs
#         self.test_labels = test_labels
#         self.num_trials = num_trials
#         self.num_epochs = num_epochs

#     def objective(self, trial):
#         """ Objective function for Optuna """
#         """
#         Args:
#             trial (optuna.Trial): The trial object
#         """
#         # Suggest hyperparameters
#         n_steps = trial.suggest_int('n_steps', 1, 30)
#         out_feats = trial.suggest_int('out_feats', 74, 512)
#         lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
#         batch_size = trial.suggest_categorical("batch_size", [8, 16, 32, 64, 128, 256, 512])
#         dropout_rate = trial.suggest_float('dropout_rate', 0.0, 0.5)

#         # DataLoaders
#         train_loader = GraphDataLoader(list(zip(self.train_graphs, self.train_labels)), batch_size=batch_size, shuffle=True, collate_fn=collate, num_workers=4)
#         val_loader = GraphDataLoader(list(zip(self.val_graphs, self.val_labels)), batch_size=batch_size, shuffle=False, collate_fn=collate, num_workers=4)
#         test_loader = GraphDataLoader(list(zip(self.test_graphs, self.test_labels)), batch_size=batch_size, shuffle=False, collate_fn=collate, num_workers=4)

#         # Model initialization
#         model = GraphClsGGNN(annotation_size=74, out_feats=out_feats, n_steps=n_steps, n_etypes=1, num_cls=2, dropout_rate=dropout_rate).to(self.device)
#         optimizer = optim.Adam(model.parameters(), lr=lr)
#         criterion = nn.CrossEntropyLoss()

#         # EarlyStopping instance
#         early_stopping = EarlyStopping(patience=7, verbose=False, path='ggnn_checkpoint.pt')

#         # Train and evaluate the model
#         training_pipeline = TrainingPipeline(self.device)
#         train_loss, val_loss, train_losses, val_losses = training_pipeline.train_and_evaluate(model, train_loader, val_loader, optimizer, criterion, early_stopping, self.num_epochs)

#         # Return the negated validation loss: the study maximizes, so this minimizes the loss
#         return -val_loss

#     def optimize(self):
#         """Conduct the hyperparameter optimization."""
#         study = optuna.create_study(direction='maximize')
#         study.optimize(self.objective, n_trials=self.num_trials)

#         print("Best trial:")
#         trial = study.best_trial
#         print(f"Value: {-trial.value}")
#         print("Params: ")
#         for key, value in trial.params.items():
#             print(f"{key}: {value}")
        
#         # Optionally, visualize the study
#         plot_contour(study, params=['lr', 'n_steps', 'out_feats', 'batch_size', 'dropout_rate'])

#         # Instantiate the best model for further evaluation
#         best_hyperparams = trial.params
#         model = GraphClsGGNN(annotation_size=74, 
#                              out_feats=best_hyperparams['out_feats'], 
#                              n_steps=best_hyperparams['n_steps'], 
#                              n_etypes=1, 
#                              num_cls=2, 
#                              dropout_rate=best_hyperparams.get('dropout_rate', 0.5)).to(self.device)

#         model.load_state_dict(torch.load('ggnn_checkpoint.pt'))
#         criterion = nn.CrossEntropyLoss()

#         # Evaluate on test set
#         training_pipeline = TrainingPipeline(self.device, self.train_loader, self.val_loader, self.test_loader)
#         training_pipeline.evaluate_on_test(model, criterion)


## Hyperparameter Optimization Execution
This section runs the hyperparameter optimization using the model, data splits, and Optuna setup defined above.


In [10]:
# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# num_trials = 10
# num_epochs = 100

# hyperparameter_optimizer = HyperparameterOptimizer(device, train_graphs, train_labels, val_graphs, val_labels, test_graphs, test_labels, num_trials, num_epochs)
# hyperparameter_optimizer.optimize()



In [11]:
# Import necessary libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import pandas as pd
import numpy as np
import dgl
from dgl.nn import GatedGraphConv, GlobalAttentionPooling
from dgl.dataloading import GraphDataLoader
from sklearn.model_selection import train_test_split
import optuna
from optuna.visualization import plot_contour

# Check PyTorch and CUDA availability
print("PyTorch version:", torch.__version__)
print("Is CUDA Supported?", torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.device_count(), "CUDA device(s) available.")
    print("CUDA Device Name:", torch.cuda.get_device_name(0))
    

PyTorch version: 2.1.2
Is CUDA Supported? True
1 CUDA device(s) available.
CUDA Device Name: Tesla T4



In [12]:
# Load data and prepare for training
reloaded_df = pd.read_csv("data_mvi/combined_df.csv")
graphs, labels_dict = dgl.load_graphs("data_mvi/graphs.bin")
labels = reloaded_df['binds_to_rna'].values

# Split dataset train, test
train_indices, test_indices, train_labels, test_labels = train_test_split(
    range(len(reloaded_df)), labels, test_size=0.2, stratify=labels, random_state=42)

# Split dataset train, validation
train_indices, val_indices, train_labels, val_labels = train_test_split(
    train_indices, train_labels, test_size=0.2, stratify=train_labels, random_state=42)

train_graphs = [graphs[i] for i in train_indices]
test_graphs = [graphs[i] for i in test_indices]
val_graphs = [graphs[i] for i in val_indices]

print(f'Train: {len(train_graphs)}, Validation: {len(val_graphs)}, Test: {len(test_graphs)}')


Train: 47275, Validation: 11819, Test: 14774
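
The two successive 80/20 splits leave roughly 64% of the graphs for training, 16% for validation, and 20% for testing. The sketch below reproduces the counts printed above from the total of 73,868 graphs (the sum of the three counts), assuming the held-out part of each split is rounded up and the remainder goes to the other side; that assumption is consistent with the printed numbers. `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math

def split_sizes(n_samples, held_out_fraction):
    """Return (kept, held_out) sizes, rounding the held-out part up."""
    n_held_out = math.ceil(held_out_fraction * n_samples)
    return n_samples - n_held_out, n_held_out

total = 47275 + 11819 + 14774              # 73868 graphs in the dataset
n_rest, n_test = split_sizes(total, 0.2)   # first split: train+val vs. test
n_train, n_val = split_sizes(n_rest, 0.2)  # second split: train vs. validation
print(n_train, n_val, n_test)              # prints 47275 11819 14774
```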


In [13]:
# Define the GGNN model
class GraphClsGGNN(nn.Module):
    """GGNN for graph classification."""
    def __init__(self, annotation_size, out_feats, n_steps, n_etypes, num_cls, dropout_rate=0.5):
        super(GraphClsGGNN, self).__init__()
        self.dropout = nn.Dropout(dropout_rate)
        self.ggnn1 = GatedGraphConv(annotation_size, out_feats, n_steps, n_etypes)
        self.ggnn2 = GatedGraphConv(out_feats, out_feats, n_steps, n_etypes)
        self.pooling = GlobalAttentionPooling(nn.Linear(out_feats, 1))
        self.fc = nn.Linear(out_feats, num_cls)

    def forward(self, graph, feat):
        h = F.relu(self.ggnn1(graph, feat))
        h = self.dropout(h)
        h = F.relu(self.ggnn2(graph, h))
        hg = self.pooling(graph, h)
        out = self.fc(hg)
        return out

# Define the EarlyStopping class
class EarlyStopping:
    """Early stops the training if validation loss doesn't improve after a given patience."""
    def __init__(self, patience=10, verbose=False, delta=0, path='checkpoint.pt', trace_func=print):
        self.patience = patience
        self.verbose = verbose
        self.counter = 0
        self.best_score = None
        self.early_stop = False
        self.val_loss_min = np.inf  # np.Inf was removed in NumPy 2.0; np.inf works in all versions
        self.delta = delta
        self.path = path
        self.trace_func = trace_func

    def __call__(self, val_loss, model):
        score = -val_loss
        if self.best_score is None:
            self.best_score = score
            self.save_checkpoint(val_loss, model)
        elif score < self.best_score + self.delta:
            self.counter += 1
            self.trace_func(f'EarlyStopping counter: {self.counter} out of {self.patience}')
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            self.best_score = score
            self.save_checkpoint(val_loss, model)
            self.counter = 0

    def save_checkpoint(self, val_loss, model):
        if self.verbose:
            self.trace_func('Validation loss decreased ({:.6f} --> {:.6f}). Saving model ...'.format(self.val_loss_min, val_loss))
        torch.save(model.state_dict(), self.path)
        self.val_loss_min = val_loss

# Define the collate function for the DataLoader
def collate(samples):
    """Collate function for the DataLoader."""
    graphs, labels = map(list, zip(*samples))
    batched_graph = dgl.batch(graphs)
    labels = torch.tensor(labels, dtype=torch.long)
    return batched_graph, labels

# Define the TrainingPipeline class
class TrainingPipeline:
    """A class to encapsulate the training and evaluation pipeline."""
    def __init__(self, device):
        self.device = device

    def train_and_evaluate(self, model, train_loader, val_loader, optimizer, criterion, early_stopping, num_epochs):
        """Train and evaluate the model."""
        train_losses = []
        val_losses = []
        for epoch in range(num_epochs):
            model.train()
            train_loss = 0.0
            for batched_graph, labels in train_loader:
                batched_graph, labels = batched_graph.to(self.device), labels.to(self.device)
                optimizer.zero_grad()
                logits = model(batched_graph, batched_graph.ndata['h'].float())
                loss = criterion(logits, labels)
                loss.backward()
                optimizer.step()
                train_loss += loss.item()

            # Validation phase
            model.eval()
            val_loss = 0.0
            with torch.no_grad():
                for batched_graph, labels in val_loader:
                    batched_graph, labels = batched_graph.to(self.device), labels.to(self.device)
                    logits = model(batched_graph, batched_graph.ndata['h'].float())
                    loss = criterion(logits, labels)
                    val_loss += loss.item()

            train_loss /= len(train_loader)
            val_loss /= len(val_loader)
            train_losses.append(train_loss)
            val_losses.append(val_loss)
            
            # Print epoch number and loss every fifth epoch
            if (epoch + 1) % 5 == 0:
                print(f'Epoch {epoch + 1}/{num_epochs} - Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}')

            
            early_stopping(val_loss, model)
            if early_stopping.early_stop:
                print("Early stopping triggered at epoch:", epoch + 1)
                break
        
        return train_losses, val_losses


    def evaluate_on_test(self, model, test_loader, criterion):
        """Evaluate the model on the test set."""
        model.eval()
        test_loss = 0.0
        test_accuracy = 0.0
        with torch.no_grad():
            for batched_graph, labels in test_loader:
                batched_graph, labels = batched_graph.to(self.device), labels.to(self.device)
                logits = model(batched_graph, batched_graph.ndata['h'].float())
                loss = criterion(logits, labels)
                test_loss += loss.item()
                preds = torch.argmax(logits, dim=1)
                test_accuracy += torch.sum(preds == labels).item()

        test_loss /= len(test_loader)
        test_accuracy /= len(test_loader.dataset)
        print("Test Loss:", test_loss)
        print("Test Accuracy:", test_accuracy)



In [14]:
import json

class HyperparameterOptimizer:
    def __init__(self, device, train_graphs, train_labels, val_graphs, val_labels, test_graphs, test_labels, num_trials, num_epochs):
        self.device = device
        self.train_graphs = train_graphs
        self.train_labels = train_labels
        self.val_graphs = val_graphs
        self.val_labels = val_labels
        self.test_graphs = test_graphs
        self.test_labels = test_labels
        self.num_trials = num_trials
        self.num_epochs = num_epochs

    def objective(self, trial):
        """The objective function for the Optuna study."""
        # Suggest hyperparameters
        n_steps = trial.suggest_int('n_steps', 1, 30)
        out_feats = trial.suggest_int('out_feats', 74, 512)
        lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
        batch_size = trial.suggest_categorical("batch_size", [8, 16, 32, 64, 128, 256, 512])
        dropout_rate = trial.suggest_float('dropout_rate', 0.0, 0.5)

        # DataLoaders
        train_loader = GraphDataLoader(list(zip(self.train_graphs, self.train_labels)), batch_size=batch_size, shuffle=True, collate_fn=collate, num_workers=4)
        val_loader = GraphDataLoader(list(zip(self.val_graphs, self.val_labels)), batch_size=batch_size, shuffle=False, collate_fn=collate, num_workers=4)
        test_loader = GraphDataLoader(list(zip(self.test_graphs, self.test_labels)), batch_size=batch_size, shuffle=False, collate_fn=collate, num_workers=4)

        # Model initialization
        model = GraphClsGGNN(annotation_size=74, out_feats=out_feats, n_steps=n_steps, n_etypes=1, num_cls=2, dropout_rate=dropout_rate).to(self.device)
        optimizer = optim.Adam(model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()

        # EarlyStopping instance
        early_stopping = EarlyStopping(patience=5, verbose=False, path='ggnn_checkpoint.pt')

        # TrainingPipeline instance for training and evaluation
        training_pipeline = TrainingPipeline(self.device)
        train_losses, val_losses = training_pipeline.train_and_evaluate(model, train_loader, val_loader, optimizer, criterion, early_stopping, self.num_epochs)

        # Return the negated best validation loss: the study maximizes, so this minimizes the loss
        return -np.min(val_losses)

    def optimize(self):
        study = optuna.create_study(direction='maximize')
        study.optimize(self.objective, n_trials=self.num_trials)
        best_hyperparams = study.best_trial.params
        with open('best_hyperparameters.json', 'w') as f:
            json.dump(best_hyperparams, f)
        print("Best trial saved to best_hyperparameters.json")


In [15]:
if __name__ == "__main__":
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    num_trials = 5  # Consider adjusting based on computational resources and desired thoroughness
    num_epochs = 10  # Adjust based on the dataset size and complexity of the model

    # Assume train_graphs, train_labels, val_graphs, val_labels, test_graphs, test_labels are defined
    hyperparameter_optimizer = HyperparameterOptimizer(device, train_graphs, train_labels, val_graphs, val_labels, test_graphs, test_labels, num_trials, num_epochs)
    hyperparameter_optimizer.optimize()
    

[I 2024-03-10 12:33:22,621] A new study created in memory with name: no-name-7f8ba73f-0948-4946-8d0e-f4acd73300b9


Epoch 5/10 - Train Loss: 0.4069, Val Loss: 0.4195
EarlyStopping counter: 1 out of 5


[I 2024-03-10 12:41:05,311] Trial 0 finished with value: -0.35334468588635726 and parameters: {'n_steps': 9, 'out_feats': 269, 'lr': 1.7924926161830276e-05, 'batch_size': 64, 'dropout_rate': 0.095007365974556}. Best is trial 0 with value: -0.35334468588635726.


Epoch 10/10 - Train Loss: 0.3582, Val Loss: 0.3533


EarlyStopping counter: 1 out of 5


Epoch 5/10 - Train Loss: 0.6673, Val Loss: 0.7435
EarlyStopping counter: 1 out of 5


EarlyStopping counter: 2 out of 5


EarlyStopping counter: 3 out of 5


EarlyStopping counter: 4 out of 5


[I 2024-03-10 13:06:49,848] Trial 1 finished with value: -0.6062370538711548 and parameters: {'n_steps': 13, 'out_feats': 490, 'lr': 0.03171875471418971, 'batch_size': 256, 'dropout_rate': 0.2439599124744522}. Best is trial 0 with value: -0.35334468588635726.


EarlyStopping counter: 5 out of 5
Early stopping triggered at epoch: 9


Epoch 5/10 - Train Loss: 0.5085, Val Loss: 0.5041


EarlyStopping counter: 1 out of 5


[I 2024-03-10 14:30:38,532] Trial 2 finished with value: -0.39996357357248347 and parameters: {'n_steps': 30, 'out_feats': 479, 'lr': 0.00010800252804778746, 'batch_size': 256, 'dropout_rate': 0.3877285340390775}. Best is trial 0 with value: -0.35334468588635726.


Epoch 10/10 - Train Loss: 0.4098, Val Loss: 0.4158
EarlyStopping counter: 1 out of 5


Epoch 5/10 - Train Loss: 0.3849, Val Loss: 0.3869


EarlyStopping counter: 1 out of 5


[I 2024-03-10 14:41:49,862] Trial 3 finished with value: -0.3350396387038692 and parameters: {'n_steps': 5, 'out_feats': 403, 'lr': 7.39153801001291e-05, 'batch_size': 128, 'dropout_rate': 0.12972280112303025}. Best is trial 3 with value: -0.3350396387038692.


Epoch 10/10 - Train Loss: 0.3260, Val Loss: 0.3433
EarlyStopping counter: 1 out of 5


EarlyStopping counter: 1 out of 5


Epoch 5/10 - Train Loss: 0.3802, Val Loss: 0.3707


EarlyStopping counter: 1 out of 5


[I 2024-03-10 14:52:26,484] Trial 4 finished with value: -0.3288701476881633 and parameters: {'n_steps': 4, 'out_feats': 455, 'lr': 4.729846807269786e-05, 'batch_size': 32, 'dropout_rate': 0.23787874753530625}. Best is trial 4 with value: -0.3288701476881633.


Epoch 10/10 - Train Loss: 0.3222, Val Loss: 0.3329
EarlyStopping counter: 1 out of 5
Best trial saved to best_hyperparameters.json
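
Because the study maximizes the negated validation loss, the best trial is simply the one with the lowest validation loss. The short check below uses the five trial values from the log above to show that both views select the same trial; the dictionary names are chosen for illustration only.

```python
# Objective values reported by Optuna in the log above (negated validation loss).
trial_values = {
    0: -0.35334468588635726,
    1: -0.6062370538711548,
    2: -0.39996357357248347,
    3: -0.3350396387038692,
    4: -0.3288701476881633,
}

# Maximizing the negated loss ...
best_by_max = max(trial_values, key=trial_values.get)

# ... is the same as minimizing the loss itself.
val_losses = {number: -value for number, value in trial_values.items()}
best_by_min = min(val_losses, key=val_losses.get)

print(best_by_max, best_by_min, round(val_losses[best_by_min], 4))
```

An equivalent setup would return the loss directly from the objective and create the study with `direction='minimize'`.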


In [16]:
def retrain_and_evaluate(device, num_epochs):
    print("Loading best hyperparameters...")
    with open('best_hyperparameters.json', 'r') as f:
        best_hyperparams = json.load(f)

    print("Initializing data loaders...")
    batch_size = best_hyperparams['batch_size']
    train_loader = GraphDataLoader(list(zip(train_graphs, train_labels)), batch_size=batch_size, shuffle=True, collate_fn=collate, num_workers=4)
    val_loader = GraphDataLoader(list(zip(val_graphs, val_labels)), batch_size=batch_size, shuffle=False, collate_fn=collate, num_workers=4)
    test_loader = GraphDataLoader(list(zip(test_graphs, test_labels)), batch_size=batch_size, shuffle=False, collate_fn=collate, num_workers=4)


    print("Initializing model with best hyperparameters...")
    model = GraphClsGGNN(annotation_size=74, 
                         out_feats=best_hyperparams['out_feats'], 
                         n_steps=best_hyperparams['n_steps'], 
                         n_etypes=1, 
                         num_cls=2, 
                         dropout_rate=best_hyperparams.get('dropout_rate', 0.5)).to(device)
    optimizer = optim.Adam(model.parameters(), lr=best_hyperparams['lr'])
    criterion = nn.CrossEntropyLoss()

    print("Starting retraining...")
    early_stopping = EarlyStopping(patience=5, verbose=True, delta=0.01, path='retrained_model_checkpoint.pt')
    training_pipeline = TrainingPipeline(device)
    training_pipeline.train_and_evaluate(model, train_loader, val_loader, optimizer, criterion, early_stopping, num_epochs)

    if early_stopping.early_stop:
        print("Early stopping triggered during retraining.")

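    # Note: the test evaluation below uses the weights from the final epoch; the best
    # checkpoint saved by EarlyStopping ('retrained_model_checkpoint.pt') is not reloaded.
    print("Evaluating on test set...")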
    training_pipeline.evaluate_on_test(model, test_loader, criterion)


In [17]:
if __name__ == "__main__":
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # Adjust num_epochs as needed
    num_epochs = 50  
    retrain_and_evaluate(device, num_epochs)


Loading best hyperparameters...
Initializing data loaders...
Initializing model with best hyperparameters...


Starting retraining...


Validation loss decreased (inf --> 0.466252). Saving model ...


Validation loss decreased (0.466252 --> 0.425908). Saving model ...


Validation loss decreased (0.425908 --> 0.408522). Saving model ...


Validation loss decreased (0.408522 --> 0.382222). Saving model ...


Epoch 5/50 - Train Loss: 0.3756, Val Loss: 0.3756
EarlyStopping counter: 1 out of 5


Validation loss decreased (0.382222 --> 0.366717). Saving model ...


Validation loss decreased (0.366717 --> 0.343601). Saving model ...


EarlyStopping counter: 1 out of 5


EarlyStopping counter: 2 out of 5


Epoch 10/50 - Train Loss: 0.3192, Val Loss: 0.3280
Validation loss decreased (0.343601 --> 0.327952). Saving model ...


EarlyStopping counter: 1 out of 5


EarlyStopping counter: 2 out of 5


Validation loss decreased (0.327952 --> 0.310104). Saving model ...


EarlyStopping counter: 1 out of 5


Epoch 15/50 - Train Loss: 0.2799, Val Loss: 0.3175
EarlyStopping counter: 2 out of 5


EarlyStopping counter: 3 out of 5


EarlyStopping counter: 4 out of 5


EarlyStopping counter: 5 out of 5
Early stopping triggered at epoch: 18
Early stopping triggered during retraining.
Evaluating on test set...


Test Loss: 0.30774721183947157
Test Accuracy: 0.8688236090429132
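
The two test metrics are normalized differently in `evaluate_on_test`: the loss is the sum of per-batch mean losses divided by the number of batches, while the accuracy is the number of correct predictions divided by the number of graphs. The sketch below shows the difference on made-up batch values (all three lists are hypothetical) and converts the reported accuracy back to a count of correctly classified test graphs; `summarize` is a hypothetical helper.

```python
def summarize(batch_mean_losses, batch_correct, batch_sizes):
    """Return (loss averaged over batches, loss averaged over samples, accuracy)."""
    n_batches = len(batch_mean_losses)
    n_samples = sum(batch_sizes)
    loss_per_batch = sum(batch_mean_losses) / n_batches
    loss_per_sample = sum(l * s for l, s in zip(batch_mean_losses, batch_sizes)) / n_samples
    accuracy = sum(batch_correct) / n_samples
    return loss_per_batch, loss_per_sample, accuracy

# Hypothetical run: two full batches of 4 graphs and a final batch with 1 graph.
loss_per_batch, loss_per_sample, accuracy = summarize(
    batch_mean_losses=[0.5, 0.5, 1.0],
    batch_correct=[3, 3, 0],
    batch_sizes=[4, 4, 1],
)
print(loss_per_batch, loss_per_sample, accuracy)

# Reported accuracy times the test-set size gives the number of correct predictions.
n_correct = round(0.8688236090429132 * 14774)
print(n_correct)  # prints 12836
```

Averaging over batches gives a small final batch the same weight as a full one, so the two loss figures differ slightly whenever the dataset size is not a multiple of the batch size.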
