# Spaceship Titanic: TABM by Elias Ruud Aronsen

Project: [Spaceship Titanic](https://www.kaggle.com/competitions/spaceship-titanic/overview)

In this notebook, I will train a TABM model for the individual part of the project.

The TabM GitHub repository can be found here: [TabM GitHub](https://github.com/yandex-research/tabm)

In [1]:
import pandas as pd
import random as rand
import torch
from torch import nn
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.metrics import accuracy_score
from src.tabm_reference import Model # This is our TABM model
from torch.utils.data import DataLoader, TensorDataset

# Read in the preprocessed data (the preprocessing is shown in 1-EDA-and-preprocessing.ipynb)
train = pd.read_csv('data/processed_train.csv')
test = pd.read_csv('data/processed_test.csv')

df_Y = train['Transported']
df_X = train.drop(columns=['Transported'])

## TABM
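TabM makes `k` predictions for every sample, and the code below averages them with `outputs.mean(dim=1)`. As I understand the TabM paper and README, this comes from a BatchEnsemble-style design: the `k` ensemble members share most of their weights and differ only through small per-member vectors, so the model imitates an ensemble of MLPs at a fraction of the cost. The sketch below is a simplified, hypothetical `BatchEnsembleLinear` layer written for illustration only; it is not the code in `src.tabm_reference`, and initialising the per-member input scaling `r` with random ±1 signs is an assumption.

```python
import torch
from torch import nn


class BatchEnsembleLinear(nn.Module):
    """Hypothetical BatchEnsemble-style linear layer for illustration (not TabM's own code).

    All k members share one weight matrix W; member i adds its own input scaling r_i,
    output scaling s_i and bias b_i:  y_i = ((x_i * r_i) @ W.T) * s_i + b_i
    """

    def __init__(self, d_in: int, d_out: int, k: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(d_out, d_in))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)  # same default init as nn.Linear
        # Assumption: random +-1 signs for r so that the members start out different.
        self.r = nn.Parameter(torch.randint(0, 2, (k, d_in)).float() * 2 - 1)
        self.s = nn.Parameter(torch.ones(k, d_out))
        self.bias = nn.Parameter(torch.zeros(k, d_out))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, k, d_in) -> (batch, k, d_out); r, s and bias broadcast over the batch
        return ((x * self.r) @ self.weight.T) * self.s + self.bias


torch.manual_seed(0)
batch, k, d_in, d_out = 32, 4, 46, 8
x = torch.randn(batch, d_in)
x_k = x.unsqueeze(1).expand(-1, k, -1)  # every member sees the same input row
layer = BatchEnsembleLinear(d_in, d_out, k)
out = layer(x_k)                         # (32, 4, 8): one output per member
ensemble_out = out.mean(dim=1)           # (32, 8): members averaged, like outputs.mean(dim=1)

n_shared = sum(p.numel() for p in layer.parameters())
n_independent = k * sum(p.numel() for p in nn.Linear(d_in, d_out).parameters())
print(out.shape, n_shared, n_independent)  # torch.Size([32, 4, 8]) 616 1504
```

The random signs matter: with identical per-member vectors and identical inputs, every member would receive the same gradients and the "ensemble" would collapse into a single model. The shared layer needs 616 parameters here, compared with 1504 for four independent `nn.Linear(46, 8)` layers.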

In [2]:
df_X.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8693 entries, 0 to 8692
Data columns (total 46 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   RoomService                8693 non-null   float64
 1   FoodCourt                  8693 non-null   float64
 2   ShoppingMall               8693 non-null   float64
 3   Spa                        8693 non-null   float64
 4   VRDeck                     8693 non-null   float64
 5   Group                      8693 non-null   int64  
 6   Id                         8693 non-null   int64  
 7   Num                        8693 non-null   float64
 8   FamilySize                 8693 non-null   float64
 9   GroupSize                  8693 non-null   int64  
 10  HomePlanet_Earth           8693 non-null   float64
 11  HomePlanet_Europa          8693 non-null   float64
 12  HomePlanet_Mars            8693 non-null   float64
 13  HomePlanet_nan             8693 non-null   float

In [None]:
#### MODEL DEFINITION AND SETUP

# split the dataset; stratify=df_Y keeps the same class distribution in the training and validation sets
train_x, val_x, train_y, val_y = train_test_split(df_X, df_Y, test_size=0.2, random_state=42, stratify=df_Y)

# convert the data into tensors, since we will be using PyTorch
train_x_tensor = torch.tensor(train_x.values, dtype=torch.float32)
val_x_tensor = torch.tensor(val_x.values, dtype=torch.float32)
train_y_tensor = torch.tensor(train_y.values, dtype=torch.long)
val_y_tensor = torch.tensor(val_y.values, dtype=torch.long)

# create the tensor datasets
train_dataset = TensorDataset(train_x_tensor, train_y_tensor)
val_dataset = TensorDataset(val_x_tensor, val_y_tensor)

# create DataLoaders for mini-batch training; the training set is reshuffled every epoch, and the noise of small batches can act as mild regularization
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=512)


### Model setup
num_features = train_x_tensor.shape[1] # number of input features (46 here)

# setup function which defines our model, loss function and optimizer with given parameters.
def setup( # default hyperparameter values
    n_blocks=5,
    d_block=256,
    k=3,
    dropout=0.1,
    activation='ReLU',
    lr=1e-3,
    weight_decay=1e-4):

    model = Model(
        n_num_features=num_features,  # number of numeric features
        cat_cardinalities=[],  # list with number of categories for each categorical feature, empty since we have converted everything to numeric
        n_classes=2 ,  # number of output classes, 2 for binary
        
        backbone=dict(  # structure of the underlying MLPs
            type='MLP',  # simple feedforward network
            n_blocks=n_blocks,  # number of layers (depth)
            d_block=d_block,  # width of each layer (hidden size)
            dropout=dropout,  # dropout between layers for regularization
            activation=activation,  # activation function like ReLU or GELU 
        ),
    
        bins=None,  # only used with binning-based embeddings of numerical features; not needed here since age was one-hot encoded instead
        num_embeddings=None,  # only used with embeddings of numerical features, which we don't need
        arch_type='tabm',  # architecture type, "tabm" for TabM model
        k=k,  # number of ensemble members: the model makes k predictions per sample, which are then averaged
    )

    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay) # we tune learning rate for speed and stability and weight decay for regularization
    loss_function = nn.CrossEntropyLoss() # cross entropy is standard for binary classification

    return model, optimizer, loss_function
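The training loop below applies `CrossEntropyLoss` to the logits averaged over the `k` members. As far as I can tell, the example in the TabM repository instead computes the loss for each of the `k` predictions separately, so the members are trained as an ensemble rather than as one averaged model, and at inference it averages the members' class probabilities rather than their logits. A minimal sketch of that variant, using hypothetical helper names `tabm_loss` and `tabm_predict`; it is not what produced the results in this notebook.

```python
import torch
from torch import nn


def tabm_loss(logits: torch.Tensor, y: torch.Tensor, base_loss: nn.Module) -> torch.Tensor:
    """Hypothetical helper: train each of the k members on its own prediction.

    logits: (batch, k, n_classes), y: (batch,)
    """
    k = logits.shape[1]
    # (batch, k, C) -> (batch * k, C); row b * k + i is member i's prediction for sample b,
    # so each label is repeated k times in a row to line up with it.
    return base_loss(logits.flatten(0, 1), y.repeat_interleave(k))


def tabm_predict(logits: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: average the members' class probabilities, then take the argmax."""
    probs = torch.softmax(logits, dim=-1).mean(dim=1)  # (batch, n_classes)
    return probs.argmax(dim=1)


torch.manual_seed(0)
logits = torch.randn(64, 8, 2)  # stand-in for model(xb, None) with k = 8 and 2 classes
y = torch.randint(0, 2, (64,))
loss = tabm_loss(logits, y, nn.CrossEntropyLoss())
preds = tabm_predict(logits)
print(loss.item(), preds.shape)
```

With a mean-reduced loss, `tabm_loss` equals the average of the `k` members' individual cross-entropy losses. In the training loop it would take the place of `loss_function(outputs.mean(dim=1), yb)`, and `tabm_predict` would take the place of `outputs.mean(dim=1).argmax(dim=1)`.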


In [None]:
##### Training the TABM MODEL
# training function
def run_tabm(
    n_blocks=5, 
    d_block=256, 
    k=3, 
    dropout=0.1,
    activation='ReLU',
    lr=1e-3, 
    weight_decay=1e-4, 
    n_epochs=200, 
    patience=20 ):

    # define the model, optimizer and loss function with the given parameters
    model, optimizer, loss_function = setup(n_blocks=n_blocks, d_block=d_block, k=k, dropout=dropout, activation=activation, lr=lr, weight_decay=weight_decay)

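    best_val_loss = float('inf') # best validation loss reached so far
    patience_counter = 0 # number of consecutive epochs without an improvement in validation loss
    best_model_state = {name: t.detach().clone() for name, t in model.state_dict().items()} # fallback: initial weights, in case the val loss never improves (e.g. it is NaN)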
    
    #Training loop with validation monitoring and early stopping.
    for epoch in range(1, n_epochs + 1):
        #Training
        model.train()
        train_losses = [] # per-batch losses, averaged at the end of the epoch
        train_preds_all = [] # per-batch predictions, used to compute training accuracy
        train_targets_all = [] # per-batch true labels, used to compute training accuracy

        for xb, yb in train_loader: # loading in batches
            optimizer.zero_grad()
            outputs = model(xb, None) # forward pass; outputs has shape (batch, k, n_classes)
            loss = loss_function(outputs.mean(dim=1), yb) # loss on the logits averaged over the k members
            loss.backward() # backpropagation
            optimizer.step()

            train_losses.append(loss.item()) # saves current batch loss
            train_preds_all.append(outputs.mean(dim=1).argmax(dim=1)) # we take the average across k preds, pick the predicted class and store it for each batch
            train_targets_all.append(yb) # stores the true labels
            
        # calculating accuracy for epoch
        train_preds_all = torch.cat(train_preds_all) # stitch all batch predictions into one tensor
        train_targets_all = torch.cat(train_targets_all) # same for the true labels
        train_acc = accuracy_score(train_targets_all.cpu(), train_preds_all.cpu()) # computes accuracy

        # Validation: same as above, but without gradient updates
        model.eval()
        val_losses = [] # same roles as train_losses, train_preds_all and train_targets_all
        val_preds_all = []
        val_targets_all = []

        with torch.no_grad():
            for xb, yb in val_loader:
                val_outputs = model(xb, None)
                val_loss = loss_function(val_outputs.mean(dim=1), yb)

                val_losses.append(val_loss.item())
                val_preds_all.append(val_outputs.mean(dim=1).argmax(dim=1))
                val_targets_all.append(yb)

        # calculating validation accuracy
        val_preds_all = torch.cat(val_preds_all)
        val_targets_all = torch.cat(val_targets_all)
        val_acc = accuracy_score(val_targets_all.cpu(), val_preds_all.cpu())
        val_loss_epoch = sum(val_losses) / len(val_losses)

        print(f"Epoch {epoch}: Train Loss={sum(train_losses)/len(train_losses):.4f}, Train Acc={train_acc:.4f} ||| Val Loss={val_loss_epoch:.4f}, Val Acc={val_acc:.4f}")

        # early stopping check
        if val_loss_epoch < best_val_loss:
            best_val_loss = val_loss_epoch
            patience_counter = 0 # we reset counter each time we find better
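            best_model_state = {name: t.detach().clone() for name, t in model.state_dict().items()} # save a copy; state_dict() alone returns references that later optimizer steps would overwrite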
        else: # increase patience if new is not better
            patience_counter += 1
            if patience_counter >= patience: # stop training if model has not produced a better val loss in the defined amount of epochs
                print(f"Early stopping triggered at epoch {epoch}")
                break
            
    # Load back the best found model
    model.load_state_dict(best_model_state)
    return model
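`model.state_dict()` returns tensors that share storage with the model's parameters, so keeping it without copying does not freeze the best epoch's weights: every later optimizer step updates the "saved" state too, and `load_state_dict` at the end then restores the last epoch rather than the best one. The short check below uses a toy `nn.Linear` (unrelated to the TabM model) to show the effect and why the best state needs to be stored as a copy, for example with `copy.deepcopy`.

```python
import copy

import torch
from torch import nn

model = nn.Linear(2, 1)

aliased = model.state_dict()                  # tensors share storage with the live parameters
snapshot = copy.deepcopy(model.state_dict())  # independent copy

with torch.no_grad():
    model.weight.add_(1.0)  # simulate an optimizer step taken after the "best" epoch

print(torch.equal(aliased["weight"], model.weight))   # True: the "saved" state moved too
print(torch.equal(snapshot["weight"], model.weight))  # False: the copy kept the old values
```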
    

### Tuning

In [None]:
import optuna

# how many hyperparameter combinations to try
n_trials = 50

# we define the objective function that Optuna will try to maximize
def objective(trial):
    # suggest a value for each hyperparameter
    # the search space is based on the recommendations in: https://github.com/yandex-research/tabm/tree/main
    params = {
        'n_blocks': trial.suggest_int('n_blocks', 2, 6),
        'd_block': trial.suggest_categorical('d_block', [128, 256, 512]),
        'k': trial.suggest_int('k', 4, 12),
        'dropout': trial.suggest_float('dropout', 0.1, 0.5),
        'activation': trial.suggest_categorical('activation', ['ReLU', 'GELU', 'LeakyReLU']),
        'lr': trial.suggest_float('lr', 1e-4, 5e-2, log=True),  # log-scale sampling; replaces the deprecated suggest_loguniform
        'weight_decay': trial.suggest_float('weight_decay', 1e-6, 1e-2, log=True)
    }
    
    print(f"\n Trial {trial.number+1} with params: {params}")

    # train the model with the chosen parameters
    model = run_tabm(**params)

    # after we train, we evaluate the final val accuracy
    model.eval()  # set model to eval mode. turns off dropout etc.

    val_preds = []    # predictions
    val_targets = []  # targets

    with torch.no_grad():  # disables gradient tracking, faster
        for xb, yb in val_loader:  # go over val batches
            outputs = model(xb, None)  # forward pass
            preds = outputs.mean(dim=1).argmax(dim=1)  # ensemble average, then pick predicted class
            val_preds.append(preds)
            val_targets.append(yb)

    val_preds = torch.cat(val_preds)
    val_targets = torch.cat(val_targets)

    val_acc = accuracy_score(val_targets.cpu(), val_preds.cpu())  # accuracy calculation
    print(f"Validation Accuracy: {val_acc:.4f}")

    return val_acc  # optuna will try to maximize this


# create a study object
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=n_trials)

# save all results
results = [(t.params, t.value) for t in study.trials]

# best trial found
print(f"\nBest Trial Params: {study.best_params}")
print(f"Best Validation Accuracy: {study.best_value:.4f}")

  from .autonotebook import tqdm as notebook_tqdm
[I 2025-04-27 16:52:35,433] A new study created in memory with name: no-name-60be0ac5-c2a0-4e78-b558-45f50a391fd7



 Trial 1 with params: {'n_blocks': 2, 'd_block': 128, 'k': 8, 'dropout': 0.4226779073013317, 'activation': 'GELU', 'lr': 0.0010017908773783256, 'weight_decay': 5.48391092738397e-06}


  'lr': trial.suggest_loguniform('lr', 1e-4, 5e-2),                   # learning rate (log scale)
  'weight_decay': trial.suggest_loguniform('weight_decay', 1e-6, 1e-2) # optimizer regularization


Epoch 1: Train Loss=8.2198, Train Acc=0.6445 ||| Val Loss=1.2242, Val Acc=0.7550
Epoch 2: Train Loss=2.2198, Train Acc=0.6622 ||| Val Loss=0.6161, Val Acc=0.7608
Epoch 3: Train Loss=1.0650, Train Acc=0.6624 ||| Val Loss=0.6124, Val Acc=0.6665
Epoch 4: Train Loss=0.7555, Train Acc=0.6695 ||| Val Loss=0.6185, Val Acc=0.6653
Epoch 5: Train Loss=0.6752, Train Acc=0.6800 ||| Val Loss=0.6124, Val Acc=0.6665
Epoch 6: Train Loss=0.6196, Train Acc=0.6926 ||| Val Loss=0.5936, Val Acc=0.6809
Epoch 7: Train Loss=0.5952, Train Acc=0.7123 ||| Val Loss=0.5979, Val Acc=0.6711
Epoch 8: Train Loss=0.5939, Train Acc=0.7094 ||| Val Loss=0.5940, Val Acc=0.6820
Epoch 9: Train Loss=0.5802, Train Acc=0.7181 ||| Val Loss=0.5797, Val Acc=0.6855
Epoch 10: Train Loss=0.5698, Train Acc=0.7216 ||| Val Loss=0.5812, Val Acc=0.6889
Epoch 11: Train Loss=0.5579, Train Acc=0.7302 ||| Val Loss=0.5676, Val Acc=0.7234
Epoch 12: Train Loss=0.5540, Train Acc=0.7321 ||| Val Loss=0.5655, Val Acc=0.6952
Epoch 13: Train Loss=0.54

[I 2025-04-27 16:53:17,548] Trial 0 finished with value: 0.7975848188614146 and parameters: {'n_blocks': 2, 'd_block': 128, 'k': 8, 'dropout': 0.4226779073013317, 'activation': 'GELU', 'lr': 0.0010017908773783256, 'weight_decay': 5.48391092738397e-06}. Best is trial 0 with value: 0.7975848188614146.


Validation Accuracy: 0.7976

 Trial 2 with params: {'n_blocks': 6, 'd_block': 128, 'k': 7, 'dropout': 0.3788069684528431, 'activation': 'GELU', 'lr': 0.00023978507077590286, 'weight_decay': 1.0413895371304064e-06}


Epoch 1: Train Loss=1.1420, Train Acc=0.5055 ||| Val Loss=0.6769, Val Acc=0.6360
Epoch 2: Train Loss=0.7770, Train Acc=0.5315 ||| Val Loss=0.6726, Val Acc=0.6705
Epoch 3: Train Loss=0.7341, Train Acc=0.5309 ||| Val Loss=0.6708, Val Acc=0.6049
Epoch 4: Train Loss=0.7016, Train Acc=0.5587 ||| Val Loss=0.6639, Val Acc=0.6274
Epoch 5: Train Loss=0.6865, Train Acc=0.5742 ||| Val Loss=0.6475, Val Acc=0.6504
Epoch 6: Train Loss=0.6630, Train Acc=0.6011 ||| Val Loss=0.6131, Val Acc=0.6849
Epoch 7: Train Loss=0.6401, Train Acc=0.6435 ||| Val Loss=0.5779, Val Acc=0.7154
Epoch 8: Train Loss=0.6122, Train Acc=0.6690 ||| Val Loss=0.5624, Val Acc=0.7182
Epoch 9: Train Loss=0.5942, Train Acc=0.6887 ||| Val Loss=0.5672, Val Acc=0.7131
Epoch 10: Train Loss=0.5880, Train Acc=0.7081 ||| Val Loss=0.5571, Val Acc=0.7257
Epoch 11: Train Loss=0.5764, Train Acc=0.7225 ||| Val Loss=0.5459, Val Acc=0.7464
Epoch 12: Train Loss=0.5647, Train Acc=0.7288 ||| Val Loss=0.5347, Val Acc=0.7711
Epoch 13: Train Loss=0.56

[I 2025-04-27 16:54:18,600] Trial 1 finished with value: 0.7952846463484762 and parameters: {'n_blocks': 6, 'd_block': 128, 'k': 7, 'dropout': 0.3788069684528431, 'activation': 'GELU', 'lr': 0.00023978507077590286, 'weight_decay': 1.0413895371304064e-06}. Best is trial 0 with value: 0.7975848188614146.


Epoch 134: Train Loss=0.4156, Train Acc=0.8014 ||| Val Loss=0.4214, Val Acc=0.7953
Early stopping triggered at epoch 134
Validation Accuracy: 0.7953

 Trial 3 with params: {'n_blocks': 4, 'd_block': 512, 'k': 4, 'dropout': 0.16415412641921137, 'activation': 'LeakyReLU', 'lr': 0.00018308527448903479, 'weight_decay': 2.235440297775048e-06}


Epoch 1: Train Loss=1.6864, Train Acc=0.6543 ||| Val Loss=0.8722, Val Acc=0.5405
Epoch 2: Train Loss=0.8285, Train Acc=0.6836 ||| Val Loss=0.5288, Val Acc=0.7907
Epoch 3: Train Loss=0.6766, Train Acc=0.6989 ||| Val Loss=0.5203, Val Acc=0.7769
Epoch 4: Train Loss=0.6184, Train Acc=0.7089 ||| Val Loss=0.5169, Val Acc=0.7792
Epoch 5: Train Loss=0.5888, Train Acc=0.7236 ||| Val Loss=0.5176, Val Acc=0.7821
Epoch 6: Train Loss=0.5642, Train Acc=0.7308 ||| Val Loss=0.5083, Val Acc=0.7844
Epoch 7: Train Loss=0.5617, Train Acc=0.7366 ||| Val Loss=0.5039, Val Acc=0.7913
Epoch 8: Train Loss=0.5611, Train Acc=0.7410 ||| Val Loss=0.5129, Val Acc=0.7855
Epoch 9: Train Loss=0.5406, Train Acc=0.7483 ||| Val Loss=0.5082, Val Acc=0.7930
Epoch 10: Train Loss=0.5267, Train Acc=0.7564 ||| Val Loss=0.5134, Val Acc=0.7924
Epoch 11: Train Loss=0.5280, Train Acc=0.7597 ||| Val Loss=0.5146, Val Acc=0.7941
Epoch 12: Train Loss=0.5271, Train Acc=0.7558 ||| Val Loss=0.5227, Val Acc=0.7493
Epoch 13: Train Loss=0.52

[I 2025-04-27 16:55:25,361] Trial 2 finished with value: 0.7952846463484762 and parameters: {'n_blocks': 4, 'd_block': 512, 'k': 4, 'dropout': 0.16415412641921137, 'activation': 'LeakyReLU', 'lr': 0.00018308527448903479, 'weight_decay': 2.235440297775048e-06}. Best is trial 0 with value: 0.7975848188614146.


Epoch 115: Train Loss=0.4135, Train Acc=0.8021 ||| Val Loss=0.4513, Val Acc=0.7953
Early stopping triggered at epoch 115
Validation Accuracy: 0.7953

 Trial 4 with params: {'n_blocks': 4, 'd_block': 512, 'k': 11, 'dropout': 0.4861056500848959, 'activation': 'GELU', 'lr': 0.0009034158014843336, 'weight_decay': 0.0008700519756372972}


Epoch 1: Train Loss=2.6913, Train Acc=0.5661 ||| Val Loss=0.5603, Val Acc=0.7320
Epoch 2: Train Loss=0.7774, Train Acc=0.6332 ||| Val Loss=0.5788, Val Acc=0.7993
Epoch 3: Train Loss=0.6352, Train Acc=0.6786 ||| Val Loss=0.5604, Val Acc=0.7849
Epoch 4: Train Loss=0.5885, Train Acc=0.6999 ||| Val Loss=0.5664, Val Acc=0.7608
Epoch 5: Train Loss=0.5734, Train Acc=0.7189 ||| Val Loss=0.5410, Val Acc=0.7752
Epoch 6: Train Loss=0.5654, Train Acc=0.7243 ||| Val Loss=0.5466, Val Acc=0.7855
Epoch 7: Train Loss=0.5498, Train Acc=0.7429 ||| Val Loss=0.5231, Val Acc=0.7884
Epoch 8: Train Loss=0.5455, Train Acc=0.7505 ||| Val Loss=0.5346, Val Acc=0.7878
Epoch 9: Train Loss=0.5339, Train Acc=0.7571 ||| Val Loss=0.5263, Val Acc=0.7884
Epoch 10: Train Loss=0.5398, Train Acc=0.7573 ||| Val Loss=0.5147, Val Acc=0.7936
Epoch 11: Train Loss=0.5340, Train Acc=0.7622 ||| Val Loss=0.5289, Val Acc=0.7832
Epoch 12: Train Loss=0.5265, Train Acc=0.7676 ||| Val Loss=0.5155, Val Acc=0.7861
Epoch 13: Train Loss=0.52

[I 2025-04-27 16:59:42,483] Trial 3 finished with value: 0.7918343875790684 and parameters: {'n_blocks': 4, 'd_block': 512, 'k': 11, 'dropout': 0.4861056500848959, 'activation': 'GELU', 'lr': 0.0009034158014843336, 'weight_decay': 0.0008700519756372972}. Best is trial 0 with value: 0.7975848188614146.


Epoch 200: Train Loss=0.4807, Train Acc=0.7857 ||| Val Loss=0.4656, Val Acc=0.7918
Validation Accuracy: 0.7918

 Trial 5 with params: {'n_blocks': 4, 'd_block': 128, 'k': 5, 'dropout': 0.4986705411191349, 'activation': 'GELU', 'lr': 0.012773167071090876, 'weight_decay': 0.00033187674234220094}


Epoch 1: Train Loss=2.1243, Train Acc=0.6110 ||| Val Loss=0.5993, Val Acc=0.6607
Epoch 2: Train Loss=0.6152, Train Acc=0.6810 ||| Val Loss=0.5835, Val Acc=0.6878
Epoch 3: Train Loss=0.5860, Train Acc=0.7065 ||| Val Loss=0.5696, Val Acc=0.7263
Epoch 4: Train Loss=0.5732, Train Acc=0.7236 ||| Val Loss=0.5264, Val Acc=0.7734
Epoch 5: Train Loss=0.5469, Train Acc=0.7476 ||| Val Loss=0.5059, Val Acc=0.7838
Epoch 6: Train Loss=0.5412, Train Acc=0.7555 ||| Val Loss=0.5001, Val Acc=0.7861
Epoch 7: Train Loss=0.5295, Train Acc=0.7679 ||| Val Loss=0.5115, Val Acc=0.7821
Epoch 8: Train Loss=0.5188, Train Acc=0.7787 ||| Val Loss=0.5199, Val Acc=0.7821
Epoch 9: Train Loss=0.5151, Train Acc=0.7791 ||| Val Loss=0.4908, Val Acc=0.7901
Epoch 10: Train Loss=0.5107, Train Acc=0.7808 ||| Val Loss=0.4979, Val Acc=0.7895
Epoch 11: Train Loss=0.5105, Train Acc=0.7829 ||| Val Loss=0.4894, Val Acc=0.7947
Epoch 12: Train Loss=0.5017, Train Acc=0.7847 ||| Val Loss=0.4815, Val Acc=0.7936
Epoch 13: Train Loss=0.49

[I 2025-04-27 17:00:08,371] Trial 4 finished with value: 0.7947096032202415 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 5, 'dropout': 0.4986705411191349, 'activation': 'GELU', 'lr': 0.012773167071090876, 'weight_decay': 0.00033187674234220094}. Best is trial 0 with value: 0.7975848188614146.


Epoch 76: Train Loss=0.5051, Train Acc=0.7840 ||| Val Loss=0.4694, Val Acc=0.7947
Early stopping triggered at epoch 76
Validation Accuracy: 0.7947

 Trial 6 with params: {'n_blocks': 6, 'd_block': 128, 'k': 5, 'dropout': 0.42152741422212603, 'activation': 'ReLU', 'lr': 0.0002689515344605779, 'weight_decay': 0.00048682323175334356}


Epoch 1: Train Loss=1.3758, Train Acc=0.5030 ||| Val Loss=0.6722, Val Acc=0.6578
Epoch 2: Train Loss=0.8480, Train Acc=0.5047 ||| Val Loss=0.6769, Val Acc=0.5716
Epoch 3: Train Loss=0.7517, Train Acc=0.5178 ||| Val Loss=0.6751, Val Acc=0.6722
Epoch 4: Train Loss=0.7266, Train Acc=0.5262 ||| Val Loss=0.6757, Val Acc=0.6584
Epoch 5: Train Loss=0.7035, Train Acc=0.5430 ||| Val Loss=0.6714, Val Acc=0.6682
Epoch 6: Train Loss=0.6968, Train Acc=0.5503 ||| Val Loss=0.6678, Val Acc=0.6297
Epoch 7: Train Loss=0.6876, Train Acc=0.5643 ||| Val Loss=0.6566, Val Acc=0.6642
Epoch 8: Train Loss=0.6709, Train Acc=0.5933 ||| Val Loss=0.6442, Val Acc=0.6532
Epoch 9: Train Loss=0.6576, Train Acc=0.6079 ||| Val Loss=0.6232, Val Acc=0.6975
Epoch 10: Train Loss=0.6391, Train Acc=0.6326 ||| Val Loss=0.6047, Val Acc=0.7136
Epoch 11: Train Loss=0.6229, Train Acc=0.6563 ||| Val Loss=0.5836, Val Acc=0.7493
Epoch 12: Train Loss=0.6109, Train Acc=0.6667 ||| Val Loss=0.5676, Val Acc=0.7706
Epoch 13: Train Loss=0.59

[I 2025-04-27 17:01:26,016] Trial 5 finished with value: 0.79700977573318 and parameters: {'n_blocks': 6, 'd_block': 128, 'k': 5, 'dropout': 0.42152741422212603, 'activation': 'ReLU', 'lr': 0.0002689515344605779, 'weight_decay': 0.00048682323175334356}. Best is trial 0 with value: 0.7975848188614146.


Epoch 200: Train Loss=0.4671, Train Acc=0.7849 ||| Val Loss=0.4544, Val Acc=0.7970
Validation Accuracy: 0.7970

 Trial 7 with params: {'n_blocks': 3, 'd_block': 256, 'k': 5, 'dropout': 0.2420801689089526, 'activation': 'GELU', 'lr': 0.0012204628354878094, 'weight_decay': 4.929218255047628e-05}


Epoch 1: Train Loss=2.1315, Train Acc=0.6345 ||| Val Loss=0.6060, Val Acc=0.7119
Epoch 2: Train Loss=0.6584, Train Acc=0.6826 ||| Val Loss=0.6050, Val Acc=0.7412
Epoch 3: Train Loss=0.6023, Train Acc=0.7003 ||| Val Loss=0.5789, Val Acc=0.7136
Epoch 4: Train Loss=0.5964, Train Acc=0.7167 ||| Val Loss=0.5534, Val Acc=0.7389
Epoch 5: Train Loss=0.5886, Train Acc=0.7242 ||| Val Loss=0.5328, Val Acc=0.7625
Epoch 6: Train Loss=0.5555, Train Acc=0.7446 ||| Val Loss=0.5275, Val Acc=0.7550
Epoch 7: Train Loss=0.5445, Train Acc=0.7472 ||| Val Loss=0.5207, Val Acc=0.7648
Epoch 8: Train Loss=0.5364, Train Acc=0.7558 ||| Val Loss=0.5181, Val Acc=0.7619
Epoch 9: Train Loss=0.5283, Train Acc=0.7653 ||| Val Loss=0.5072, Val Acc=0.7757
Epoch 10: Train Loss=0.5205, Train Acc=0.7689 ||| Val Loss=0.5153, Val Acc=0.7757
Epoch 11: Train Loss=0.5116, Train Acc=0.7689 ||| Val Loss=0.5012, Val Acc=0.7878
Epoch 12: Train Loss=0.5090, Train Acc=0.7688 ||| Val Loss=0.4934, Val Acc=0.7844
Epoch 13: Train Loss=0.50

[I 2025-04-27 17:02:14,269] Trial 6 finished with value: 0.7849338700402531 and parameters: {'n_blocks': 3, 'd_block': 256, 'k': 5, 'dropout': 0.2420801689089526, 'activation': 'GELU', 'lr': 0.0012204628354878094, 'weight_decay': 4.929218255047628e-05}. Best is trial 0 with value: 0.7975848188614146.


Epoch 157: Train Loss=0.4107, Train Acc=0.7998 ||| Val Loss=0.4348, Val Acc=0.7849
Early stopping triggered at epoch 157
Validation Accuracy: 0.7849

 Trial 8 with params: {'n_blocks': 3, 'd_block': 128, 'k': 7, 'dropout': 0.13421047430650393, 'activation': 'GELU', 'lr': 0.023715545769525163, 'weight_decay': 1.4723360765830886e-05}


Epoch 1: Train Loss=9.8055, Train Acc=0.7184 ||| Val Loss=0.5068, Val Acc=0.7901
Epoch 2: Train Loss=0.5376, Train Acc=0.7709 ||| Val Loss=0.6090, Val Acc=0.7522
Epoch 3: Train Loss=0.5460, Train Acc=0.7577 ||| Val Loss=0.4982, Val Acc=0.7844
Epoch 4: Train Loss=0.5126, Train Acc=0.7748 ||| Val Loss=0.4939, Val Acc=0.7792
Epoch 5: Train Loss=0.4936, Train Acc=0.7843 ||| Val Loss=0.4942, Val Acc=0.7872
Epoch 6: Train Loss=0.5003, Train Acc=0.7798 ||| Val Loss=0.4601, Val Acc=0.7970
Epoch 7: Train Loss=0.4792, Train Acc=0.7787 ||| Val Loss=0.4578, Val Acc=0.7964
Epoch 8: Train Loss=0.4991, Train Acc=0.7814 ||| Val Loss=0.4498, Val Acc=0.7970
Epoch 9: Train Loss=0.4842, Train Acc=0.7808 ||| Val Loss=0.4334, Val Acc=0.7884
Epoch 10: Train Loss=0.4735, Train Acc=0.7834 ||| Val Loss=0.4478, Val Acc=0.7878
Epoch 11: Train Loss=0.5475, Train Acc=0.7659 ||| Val Loss=0.4441, Val Acc=0.7993
Epoch 12: Train Loss=0.4580, Train Acc=0.7844 ||| Val Loss=0.4383, Val Acc=0.7976
Epoch 13: Train Loss=0.50

[I 2025-04-27 17:02:29,628] Trial 7 finished with value: 0.6175963197239793 and parameters: {'n_blocks': 3, 'd_block': 128, 'k': 7, 'dropout': 0.13421047430650393, 'activation': 'GELU', 'lr': 0.023715545769525163, 'weight_decay': 1.4723360765830886e-05}. Best is trial 0 with value: 0.7975848188614146.


Epoch 55: Train Loss=1.9006, Train Acc=0.5121 ||| Val Loss=0.7247, Val Acc=0.6176
Early stopping triggered at epoch 55
Validation Accuracy: 0.6176

 Trial 9 with params: {'n_blocks': 6, 'd_block': 512, 'k': 4, 'dropout': 0.4125798645567976, 'activation': 'ReLU', 'lr': 0.0004273952160210739, 'weight_decay': 1.5715345641888542e-05}


Epoch 1: Train Loss=1.1995, Train Acc=0.5421 ||| Val Loss=0.6397, Val Acc=0.6498
Epoch 2: Train Loss=0.6962, Train Acc=0.6087 ||| Val Loss=0.5930, Val Acc=0.6998
Epoch 3: Train Loss=0.6255, Train Acc=0.6613 ||| Val Loss=0.5595, Val Acc=0.7234
Epoch 4: Train Loss=0.6027, Train Acc=0.6913 ||| Val Loss=0.5671, Val Acc=0.7159
Epoch 5: Train Loss=0.5802, Train Acc=0.7084 ||| Val Loss=0.5424, Val Acc=0.7809
Epoch 6: Train Loss=0.5629, Train Acc=0.7255 ||| Val Loss=0.5456, Val Acc=0.7671
Epoch 7: Train Loss=0.5560, Train Acc=0.7334 ||| Val Loss=0.5204, Val Acc=0.7803
Epoch 8: Train Loss=0.5485, Train Acc=0.7413 ||| Val Loss=0.5244, Val Acc=0.7861
Epoch 9: Train Loss=0.5410, Train Acc=0.7486 ||| Val Loss=0.5254, Val Acc=0.7665
Epoch 10: Train Loss=0.5381, Train Acc=0.7504 ||| Val Loss=0.5323, Val Acc=0.7717
Epoch 11: Train Loss=0.5364, Train Acc=0.7551 ||| Val Loss=0.5187, Val Acc=0.7878
Epoch 12: Train Loss=0.5262, Train Acc=0.7685 ||| Val Loss=0.5060, Val Acc=0.7844
Epoch 13: Train Loss=0.52

[I 2025-04-27 17:03:54,729] Trial 8 finished with value: 0.7912593444508338 and parameters: {'n_blocks': 6, 'd_block': 512, 'k': 4, 'dropout': 0.4125798645567976, 'activation': 'ReLU', 'lr': 0.0004273952160210739, 'weight_decay': 1.5715345641888542e-05}. Best is trial 0 with value: 0.7975848188614146.


Epoch 106: Train Loss=0.4286, Train Acc=0.7926 ||| Val Loss=0.4359, Val Acc=0.7913
Early stopping triggered at epoch 106
Validation Accuracy: 0.7913

 Trial 10 with params: {'n_blocks': 5, 'd_block': 128, 'k': 8, 'dropout': 0.43195593796688025, 'activation': 'LeakyReLU', 'lr': 0.0005265073582125093, 'weight_decay': 0.0023648770046333254}


Epoch 1: Train Loss=1.4864, Train Acc=0.5279 ||| Val Loss=0.6573, Val Acc=0.6348
Epoch 2: Train Loss=0.8060, Train Acc=0.5372 ||| Val Loss=0.6597, Val Acc=0.6141
Epoch 3: Train Loss=0.7084, Train Acc=0.5768 ||| Val Loss=0.6494, Val Acc=0.6274
Epoch 4: Train Loss=0.6658, Train Acc=0.6066 ||| Val Loss=0.6252, Val Acc=0.6734
Epoch 5: Train Loss=0.6348, Train Acc=0.6389 ||| Val Loss=0.5967, Val Acc=0.7062
Epoch 6: Train Loss=0.6000, Train Acc=0.6780 ||| Val Loss=0.5686, Val Acc=0.7228
Epoch 7: Train Loss=0.5796, Train Acc=0.7033 ||| Val Loss=0.5542, Val Acc=0.7320
Epoch 8: Train Loss=0.5643, Train Acc=0.7233 ||| Val Loss=0.5407, Val Acc=0.7562
Epoch 9: Train Loss=0.5556, Train Acc=0.7249 ||| Val Loss=0.5309, Val Acc=0.7792
Epoch 10: Train Loss=0.5468, Train Acc=0.7400 ||| Val Loss=0.5265, Val Acc=0.7752
Epoch 11: Train Loss=0.5450, Train Acc=0.7458 ||| Val Loss=0.5224, Val Acc=0.7786
Epoch 12: Train Loss=0.5399, Train Acc=0.7505 ||| Val Loss=0.5187, Val Acc=0.7792
Epoch 13: Train Loss=0.53

[I 2025-04-27 17:04:53,326] Trial 9 finished with value: 0.7935595169637722 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 8, 'dropout': 0.43195593796688025, 'activation': 'LeakyReLU', 'lr': 0.0005265073582125093, 'weight_decay': 0.0023648770046333254}. Best is trial 0 with value: 0.7975848188614146.


Epoch 176: Train Loss=0.5149, Train Acc=0.7859 ||| Val Loss=0.4865, Val Acc=0.7936
Early stopping triggered at epoch 176
Validation Accuracy: 0.7936

 Trial 11 with params: {'n_blocks': 2, 'd_block': 256, 'k': 11, 'dropout': 0.3089038693370183, 'activation': 'ReLU', 'lr': 0.005148146117543533, 'weight_decay': 0.008945115186488151}


Epoch 1: Train Loss=6.2094, Train Acc=0.6644 ||| Val Loss=0.5815, Val Acc=0.7257
Epoch 2: Train Loss=0.5617, Train Acc=0.7249 ||| Val Loss=0.5157, Val Acc=0.7832
Epoch 3: Train Loss=0.5451, Train Acc=0.7407 ||| Val Loss=0.5177, Val Acc=0.7838
Epoch 4: Train Loss=0.5336, Train Acc=0.7552 ||| Val Loss=0.5121, Val Acc=0.7890
Epoch 5: Train Loss=0.5229, Train Acc=0.7632 ||| Val Loss=0.5101, Val Acc=0.7723
Epoch 6: Train Loss=0.5209, Train Acc=0.7659 ||| Val Loss=0.5092, Val Acc=0.7821
Epoch 7: Train Loss=0.5227, Train Acc=0.7699 ||| Val Loss=0.5120, Val Acc=0.7890
Epoch 8: Train Loss=0.5175, Train Acc=0.7777 ||| Val Loss=0.5101, Val Acc=0.7941
Epoch 9: Train Loss=0.5157, Train Acc=0.7735 ||| Val Loss=0.5053, Val Acc=0.7861
Epoch 10: Train Loss=0.5183, Train Acc=0.7780 ||| Val Loss=0.5053, Val Acc=0.7895
Epoch 11: Train Loss=0.5198, Train Acc=0.7774 ||| Val Loss=0.5033, Val Acc=0.7867
Epoch 12: Train Loss=0.5216, Train Acc=0.7762 ||| Val Loss=0.5061, Val Acc=0.7844
Epoch 13: Train Loss=0.52

[I 2025-04-27 17:05:01,931] Trial 10 finished with value: 0.78205865439908 and parameters: {'n_blocks': 2, 'd_block': 256, 'k': 11, 'dropout': 0.3089038693370183, 'activation': 'ReLU', 'lr': 0.005148146117543533, 'weight_decay': 0.008945115186488151}. Best is trial 0 with value: 0.7975848188614146.


Epoch 31: Train Loss=0.5265, Train Acc=0.7765 ||| Val Loss=0.5038, Val Acc=0.7821
Early stopping triggered at epoch 31
Validation Accuracy: 0.7821

 Trial 12 with params: {'n_blocks': 2, 'd_block': 128, 'k': 9, 'dropout': 0.33743869567188, 'activation': 'ReLU', 'lr': 0.0001107506909386638, 'weight_decay': 9.955168400040347e-05}


Epoch 1: Train Loss=12.0710, Train Acc=0.5722 ||| Val Loss=3.0306, Val Acc=0.7136
Epoch 2: Train Loss=8.4917, Train Acc=0.6107 ||| Val Loss=1.5742, Val Acc=0.7867
Epoch 3: Train Loss=6.4579, Train Acc=0.6455 ||| Val Loss=1.7486, Val Acc=0.7723
Epoch 4: Train Loss=5.2123, Train Acc=0.6474 ||| Val Loss=1.4477, Val Acc=0.7734
Epoch 5: Train Loss=4.1804, Train Acc=0.6595 ||| Val Loss=1.2687, Val Acc=0.6970
Epoch 6: Train Loss=3.5824, Train Acc=0.6644 ||| Val Loss=1.2326, Val Acc=0.6475
Epoch 7: Train Loss=2.9638, Train Acc=0.6537 ||| Val Loss=0.9033, Val Acc=0.7637
Epoch 8: Train Loss=2.4821, Train Acc=0.6599 ||| Val Loss=0.7684, Val Acc=0.7642
Epoch 9: Train Loss=2.1771, Train Acc=0.6688 ||| Val Loss=0.6695, Val Acc=0.7671
Epoch 10: Train Loss=1.8491, Train Acc=0.6662 ||| Val Loss=0.6316, Val Acc=0.7516
Epoch 11: Train Loss=1.6308, Train Acc=0.6661 ||| Val Loss=0.5828, Val Acc=0.7654
Epoch 12: Train Loss=1.4574, Train Acc=0.6707 ||| Val Loss=0.5781, Val Acc=0.7355
Epoch 13: Train Loss=1.2

[I 2025-04-27 17:05:36,796] Trial 11 finished with value: 0.7843588269120184 and parameters: {'n_blocks': 2, 'd_block': 128, 'k': 9, 'dropout': 0.33743869567188, 'activation': 'ReLU', 'lr': 0.0001107506909386638, 'weight_decay': 9.955168400040347e-05}. Best is trial 0 with value: 0.7975848188614146.


Epoch 177: Train Loss=0.4518, Train Acc=0.7913 ||| Val Loss=0.4657, Val Acc=0.7844
Early stopping triggered at epoch 177
Validation Accuracy: 0.7844

 Trial 13 with params: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.2475771551049411, 'activation': 'ReLU', 'lr': 0.0027183970293804946, 'weight_decay': 5.486607463330976e-06}


Epoch 1: Train Loss=0.7485, Train Acc=0.6567 ||| Val Loss=0.5513, Val Acc=0.7527
Epoch 2: Train Loss=0.5391, Train Acc=0.7560 ||| Val Loss=0.4994, Val Acc=0.7861
Epoch 3: Train Loss=0.5210, Train Acc=0.7761 ||| Val Loss=0.4917, Val Acc=0.7872
Epoch 4: Train Loss=0.5181, Train Acc=0.7731 ||| Val Loss=0.4835, Val Acc=0.7953
Epoch 5: Train Loss=0.5018, Train Acc=0.7840 ||| Val Loss=0.4802, Val Acc=0.7941
Epoch 6: Train Loss=0.4906, Train Acc=0.7872 ||| Val Loss=0.4817, Val Acc=0.7936
Epoch 7: Train Loss=0.4891, Train Acc=0.7817 ||| Val Loss=0.4698, Val Acc=0.7953
Epoch 8: Train Loss=0.4796, Train Acc=0.7806 ||| Val Loss=0.4741, Val Acc=0.7964
Epoch 9: Train Loss=0.4718, Train Acc=0.7913 ||| Val Loss=0.4755, Val Acc=0.7964
Epoch 10: Train Loss=0.4643, Train Acc=0.7919 ||| Val Loss=0.4390, Val Acc=0.7918
Epoch 11: Train Loss=0.4608, Train Acc=0.7915 ||| Val Loss=0.4429, Val Acc=0.7959
Epoch 12: Train Loss=0.4500, Train Acc=0.7915 ||| Val Loss=0.4362, Val Acc=0.7907
Epoch 13: Train Loss=0.44

[I 2025-04-27 17:06:06,178] Trial 12 finished with value: 0.7993099482461185 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.2475771551049411, 'activation': 'ReLU', 'lr': 0.0027183970293804946, 'weight_decay': 5.486607463330976e-06}. Best is trial 12 with value: 0.7993099482461185.


Epoch 81: Train Loss=0.3967, Train Acc=0.8082 ||| Val Loss=0.4105, Val Acc=0.7993
Early stopping triggered at epoch 81
Validation Accuracy: 0.7993

 Trial 14 with params: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.22242292358131166, 'activation': 'ReLU', 'lr': 0.004070611397840372, 'weight_decay': 4.672432428590436e-06}


Epoch 1: Train Loss=0.7370, Train Acc=0.6739 ||| Val Loss=0.5079, Val Acc=0.7924
Epoch 2: Train Loss=0.5297, Train Acc=0.7675 ||| Val Loss=0.5092, Val Acc=0.7936
Epoch 3: Train Loss=0.5218, Train Acc=0.7761 ||| Val Loss=0.4806, Val Acc=0.7959
Epoch 4: Train Loss=0.5052, Train Acc=0.7854 ||| Val Loss=0.4770, Val Acc=0.7913
Epoch 5: Train Loss=0.4928, Train Acc=0.7837 ||| Val Loss=0.4915, Val Acc=0.7941
Epoch 6: Train Loss=0.4814, Train Acc=0.7817 ||| Val Loss=0.4604, Val Acc=0.8005
Epoch 7: Train Loss=0.4686, Train Acc=0.7889 ||| Val Loss=0.4346, Val Acc=0.7987
Epoch 8: Train Loss=0.4513, Train Acc=0.7880 ||| Val Loss=0.4355, Val Acc=0.7941
Epoch 9: Train Loss=0.4505, Train Acc=0.7890 ||| Val Loss=0.4252, Val Acc=0.7918
Epoch 10: Train Loss=0.4415, Train Acc=0.7908 ||| Val Loss=0.4242, Val Acc=0.7953
Epoch 11: Train Loss=0.4335, Train Acc=0.7942 ||| Val Loss=0.4241, Val Acc=0.7999
Epoch 12: Train Loss=0.4302, Train Acc=0.7915 ||| Val Loss=0.4204, Val Acc=0.7953
Epoch 13: Train Loss=0.42

[I 2025-04-27 17:06:33,358] Trial 13 finished with value: 0.8067855089131685 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.22242292358131166, 'activation': 'ReLU', 'lr': 0.004070611397840372, 'weight_decay': 4.672432428590436e-06}. Best is trial 13 with value: 0.8067855089131685.


Epoch 75: Train Loss=0.3956, Train Acc=0.8053 ||| Val Loss=0.4138, Val Acc=0.8068
Early stopping triggered at epoch 75
Validation Accuracy: 0.8068

 Trial 15 with params: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.22303030403997218, 'activation': 'ReLU', 'lr': 0.004099862119126689, 'weight_decay': 5.27735244168503e-06}


Epoch 1: Train Loss=0.7099, Train Acc=0.6923 ||| Val Loss=0.5067, Val Acc=0.7901
Epoch 2: Train Loss=0.5336, Train Acc=0.7678 ||| Val Loss=0.5072, Val Acc=0.7838
Epoch 3: Train Loss=0.5165, Train Acc=0.7790 ||| Val Loss=0.4908, Val Acc=0.7901
Epoch 4: Train Loss=0.4989, Train Acc=0.7810 ||| Val Loss=0.4957, Val Acc=0.7809
Epoch 5: Train Loss=0.4918, Train Acc=0.7827 ||| Val Loss=0.4654, Val Acc=0.7936
Epoch 6: Train Loss=0.4720, Train Acc=0.7877 ||| Val Loss=0.4495, Val Acc=0.8010
Epoch 7: Train Loss=0.4699, Train Acc=0.7849 ||| Val Loss=0.4535, Val Acc=0.7878
Epoch 8: Train Loss=0.4505, Train Acc=0.7919 ||| Val Loss=0.4523, Val Acc=0.7947
Epoch 9: Train Loss=0.4452, Train Acc=0.7911 ||| Val Loss=0.4262, Val Acc=0.7947
Epoch 10: Train Loss=0.4379, Train Acc=0.7913 ||| Val Loss=0.4168, Val Acc=0.7953
Epoch 11: Train Loss=0.4318, Train Acc=0.7949 ||| Val Loss=0.4251, Val Acc=0.7941
Epoch 12: Train Loss=0.4354, Train Acc=0.7908 ||| Val Loss=0.4258, Val Acc=0.7941
Epoch 13: Train Loss=0.42

[I 2025-04-27 17:06:56,176] Trial 14 finished with value: 0.80448533640023 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.22303030403997218, 'activation': 'ReLU', 'lr': 0.004099862119126689, 'weight_decay': 5.27735244168503e-06}. Best is trial 13 with value: 0.8067855089131685.


Epoch 61: Train Loss=0.4098, Train Acc=0.8047 ||| Val Loss=0.4211, Val Acc=0.8045
Early stopping triggered at epoch 61
Validation Accuracy: 0.8045

 Trial 16 with params: {'n_blocks': 5, 'd_block': 256, 'k': 12, 'dropout': 0.22021430390146038, 'activation': 'ReLU', 'lr': 0.006096229836076245, 'weight_decay': 3.400438765809951e-05}


Epoch 1: Train Loss=1.4804, Train Acc=0.6704 ||| Val Loss=0.5239, Val Acc=0.7838
Epoch 2: Train Loss=0.5281, Train Acc=0.7728 ||| Val Loss=0.5002, Val Acc=0.7913
Epoch 3: Train Loss=0.5164, Train Acc=0.7716 ||| Val Loss=0.5079, Val Acc=0.7941
Epoch 4: Train Loss=0.5193, Train Acc=0.7788 ||| Val Loss=0.4868, Val Acc=0.7947
Epoch 5: Train Loss=0.5045, Train Acc=0.7823 ||| Val Loss=0.4988, Val Acc=0.7826
Epoch 6: Train Loss=0.4970, Train Acc=0.7821 ||| Val Loss=0.5004, Val Acc=0.7890
Epoch 7: Train Loss=0.4956, Train Acc=0.7849 ||| Val Loss=0.4716, Val Acc=0.7953
Epoch 8: Train Loss=0.4897, Train Acc=0.7867 ||| Val Loss=0.4685, Val Acc=0.7930
Epoch 9: Train Loss=0.4845, Train Acc=0.7876 ||| Val Loss=0.4703, Val Acc=0.7815
Epoch 10: Train Loss=0.4766, Train Acc=0.7867 ||| Val Loss=0.4553, Val Acc=0.7970
Epoch 11: Train Loss=0.4742, Train Acc=0.7866 ||| Val Loss=0.4602, Val Acc=0.7895
Epoch 12: Train Loss=0.4828, Train Acc=0.7824 ||| Val Loss=0.4655, Val Acc=0.7982
Epoch 13: Train Loss=0.47

[I 2025-04-27 17:08:38,469] Trial 15 finished with value: 0.8056354226566993 and parameters: {'n_blocks': 5, 'd_block': 256, 'k': 12, 'dropout': 0.22021430390146038, 'activation': 'ReLU', 'lr': 0.006096229836076245, 'weight_decay': 3.400438765809951e-05}. Best is trial 13 with value: 0.8067855089131685.


Epoch 138: Train Loss=0.4082, Train Acc=0.8007 ||| Val Loss=0.4186, Val Acc=0.8056
Early stopping triggered at epoch 138
Validation Accuracy: 0.8056

 Trial 17 with params: {'n_blocks': 5, 'd_block': 256, 'k': 12, 'dropout': 0.1868444115130066, 'activation': 'ReLU', 'lr': 0.012090580233799669, 'weight_decay': 3.050559349806885e-05}


Epoch 1: Train Loss=9.6492, Train Acc=0.6585 ||| Val Loss=0.5177, Val Acc=0.7826
Epoch 2: Train Loss=0.5437, Train Acc=0.7558 ||| Val Loss=0.5087, Val Acc=0.7849
Epoch 3: Train Loss=0.5276, Train Acc=0.7696 ||| Val Loss=0.5030, Val Acc=0.7890
Epoch 4: Train Loss=0.5310, Train Acc=0.7680 ||| Val Loss=0.5104, Val Acc=0.7665
Epoch 5: Train Loss=0.5093, Train Acc=0.7773 ||| Val Loss=0.4768, Val Acc=0.7976
Epoch 6: Train Loss=0.5091, Train Acc=0.7778 ||| Val Loss=0.6848, Val Acc=0.7890
Epoch 7: Train Loss=0.5056, Train Acc=0.7758 ||| Val Loss=0.4809, Val Acc=0.7953
Epoch 8: Train Loss=0.5038, Train Acc=0.7777 ||| Val Loss=0.5173, Val Acc=0.7844
Epoch 9: Train Loss=0.5014, Train Acc=0.7747 ||| Val Loss=0.4677, Val Acc=0.7890
Epoch 10: Train Loss=0.4903, Train Acc=0.7867 ||| Val Loss=0.4868, Val Acc=0.7970
Epoch 11: Train Loss=0.5009, Train Acc=0.7817 ||| Val Loss=0.4742, Val Acc=0.7953
Epoch 12: Train Loss=0.4882, Train Acc=0.7849 ||| Val Loss=0.4685, Val Acc=0.7901
Epoch 13: Train Loss=0.48

[I 2025-04-27 17:09:46,294] Trial 16 finished with value: 0.7929844738355377 and parameters: {'n_blocks': 5, 'd_block': 256, 'k': 12, 'dropout': 0.1868444115130066, 'activation': 'ReLU', 'lr': 0.012090580233799669, 'weight_decay': 3.050559349806885e-05}. Best is trial 13 with value: 0.8067855089131685.


Epoch 90: Train Loss=0.4221, Train Acc=0.8000 ||| Val Loss=0.4320, Val Acc=0.7930
Early stopping triggered at epoch 90
Validation Accuracy: 0.7930

 Trial 18 with params: {'n_blocks': 5, 'd_block': 256, 'k': 12, 'dropout': 0.10894027041076242, 'activation': 'ReLU', 'lr': 0.04453125047086107, 'weight_decay': 0.00014842485660468803}


Epoch 1: Train Loss=2649.6408, Train Acc=0.5469 ||| Val Loss=4.5939, Val Acc=0.5118
Epoch 2: Train Loss=7.3541, Train Acc=0.5286 ||| Val Loss=39.4635, Val Acc=0.4963
Epoch 3: Train Loss=38.6607, Train Acc=0.5559 ||| Val Loss=122.3272, Val Acc=0.4756
Epoch 4: Train Loss=8631.8872, Train Acc=0.5729 ||| Val Loss=26746.2861, Val Acc=0.5589
Epoch 5: Train Loss=462417.0091, Train Acc=0.6053 ||| Val Loss=99051.6953, Val Acc=0.5405
Epoch 6: Train Loss=35625.0086, Train Acc=0.6488 ||| Val Loss=1392.1430, Val Acc=0.7332
Epoch 7: Train Loss=972.1766, Train Acc=0.6789 ||| Val Loss=158.5144, Val Acc=0.7688
Epoch 8: Train Loss=256.8901, Train Acc=0.6963 ||| Val Loss=82.6899, Val Acc=0.7476
Epoch 9: Train Loss=168.4584, Train Acc=0.6923 ||| Val Loss=99.8561, Val Acc=0.7424
Epoch 10: Train Loss=125.9350, Train Acc=0.6892 ||| Val Loss=34.4805, Val Acc=0.7826
Epoch 11: Train Loss=102.7964, Train Acc=0.6918 ||| Val Loss=31.0049, Val Acc=0.7717
Epoch 12: Train Loss=71.3434, Train Acc=0.6941 ||| Val Loss=2

[I 2025-04-27 17:10:02,238] Trial 17 finished with value: 0.7849338700402531 and parameters: {'n_blocks': 5, 'd_block': 256, 'k': 12, 'dropout': 0.10894027041076242, 'activation': 'ReLU', 'lr': 0.04453125047086107, 'weight_decay': 0.00014842485660468803}. Best is trial 13 with value: 0.8067855089131685.


Epoch 21: Train Loss=31.6281, Train Acc=0.6743 ||| Val Loss=5.8952, Val Acc=0.7849
Early stopping triggered at epoch 21
Validation Accuracy: 0.7849

 Trial 19 with params: {'n_blocks': 3, 'd_block': 256, 'k': 10, 'dropout': 0.2796146010934679, 'activation': 'LeakyReLU', 'lr': 0.007503253150104942, 'weight_decay': 2.193829486700668e-05}


Epoch 1: Train Loss=6.0478, Train Acc=0.6629 ||| Val Loss=0.5516, Val Acc=0.7677
Epoch 2: Train Loss=0.5478, Train Acc=0.7515 ||| Val Loss=0.5046, Val Acc=0.7815
Epoch 3: Train Loss=0.5340, Train Acc=0.7591 ||| Val Loss=0.5955, Val Acc=0.6763
Epoch 4: Train Loss=0.5236, Train Acc=0.7721 ||| Val Loss=0.5061, Val Acc=0.7976
Epoch 5: Train Loss=0.5179, Train Acc=0.7725 ||| Val Loss=0.4920, Val Acc=0.7936
Epoch 6: Train Loss=0.5114, Train Acc=0.7714 ||| Val Loss=0.5021, Val Acc=0.7844
Epoch 7: Train Loss=0.5162, Train Acc=0.7706 ||| Val Loss=0.4957, Val Acc=0.7970
Epoch 8: Train Loss=0.5041, Train Acc=0.7811 ||| Val Loss=0.4865, Val Acc=0.7924
Epoch 9: Train Loss=0.5053, Train Acc=0.7752 ||| Val Loss=0.5035, Val Acc=0.7832
Epoch 10: Train Loss=0.4990, Train Acc=0.7768 ||| Val Loss=0.4785, Val Acc=0.7901
Epoch 11: Train Loss=0.4918, Train Acc=0.7820 ||| Val Loss=0.4698, Val Acc=0.7941
Epoch 12: Train Loss=0.4804, Train Acc=0.7827 ||| Val Loss=0.4755, Val Acc=0.7867
Epoch 13: Train Loss=0.47

[I 2025-04-27 17:10:32,705] Trial 18 finished with value: 0.8010350776308223 and parameters: {'n_blocks': 3, 'd_block': 256, 'k': 10, 'dropout': 0.2796146010934679, 'activation': 'LeakyReLU', 'lr': 0.007503253150104942, 'weight_decay': 2.193829486700668e-05}. Best is trial 13 with value: 0.8067855089131685.


Epoch 68: Train Loss=0.4135, Train Acc=0.8043 ||| Val Loss=0.4171, Val Acc=0.8010
Early stopping triggered at epoch 68
Validation Accuracy: 0.8010

 Trial 20 with params: {'n_blocks': 6, 'd_block': 256, 'k': 11, 'dropout': 0.2000376712509801, 'activation': 'ReLU', 'lr': 0.002510177980904529, 'weight_decay': 1.0351607224212033e-06}


Epoch 1: Train Loss=0.6480, Train Acc=0.6970 ||| Val Loss=0.5319, Val Acc=0.7752
Epoch 2: Train Loss=0.5198, Train Acc=0.7767 ||| Val Loss=0.5052, Val Acc=0.7849
Epoch 3: Train Loss=0.5110, Train Acc=0.7780 ||| Val Loss=0.4767, Val Acc=0.7987
Epoch 4: Train Loss=0.5045, Train Acc=0.7827 ||| Val Loss=0.4775, Val Acc=0.7976
Epoch 5: Train Loss=0.4955, Train Acc=0.7844 ||| Val Loss=0.4858, Val Acc=0.7987
Epoch 6: Train Loss=0.4900, Train Acc=0.7846 ||| Val Loss=0.4598, Val Acc=0.7924
Epoch 7: Train Loss=0.4752, Train Acc=0.7885 ||| Val Loss=0.4516, Val Acc=0.7953
Epoch 8: Train Loss=0.4681, Train Acc=0.7892 ||| Val Loss=0.4496, Val Acc=0.8028
Epoch 9: Train Loss=0.4612, Train Acc=0.7922 ||| Val Loss=0.4245, Val Acc=0.8005
Epoch 10: Train Loss=0.4567, Train Acc=0.7827 ||| Val Loss=0.4366, Val Acc=0.7976
Epoch 11: Train Loss=0.4460, Train Acc=0.7896 ||| Val Loss=0.4261, Val Acc=0.7947
Epoch 12: Train Loss=0.4433, Train Acc=0.7912 ||| Val Loss=0.4247, Val Acc=0.7953
Epoch 13: Train Loss=0.43

[I 2025-04-27 17:11:31,853] Trial 19 finished with value: 0.8039102932719954 and parameters: {'n_blocks': 6, 'd_block': 256, 'k': 11, 'dropout': 0.2000376712509801, 'activation': 'ReLU', 'lr': 0.002510177980904529, 'weight_decay': 1.0351607224212033e-06}. Best is trial 13 with value: 0.8067855089131685.


Epoch 85: Train Loss=0.3876, Train Acc=0.8097 ||| Val Loss=0.4114, Val Acc=0.8039
Early stopping triggered at epoch 85
Validation Accuracy: 0.8039

 Trial 21 with params: {'n_blocks': 5, 'd_block': 256, 'k': 7, 'dropout': 0.2825841632112342, 'activation': 'ReLU', 'lr': 0.008173394292636528, 'weight_decay': 6.528443999332398e-05}


Epoch 1: Train Loss=2.2770, Train Acc=0.6119 ||| Val Loss=0.5733, Val Acc=0.7648
Epoch 2: Train Loss=0.5634, Train Acc=0.7394 ||| Val Loss=0.5303, Val Acc=0.7838
Epoch 3: Train Loss=0.5371, Train Acc=0.7575 ||| Val Loss=0.5338, Val Acc=0.7700
Epoch 4: Train Loss=0.5403, Train Acc=0.7643 ||| Val Loss=0.5463, Val Acc=0.7303
Epoch 5: Train Loss=0.5313, Train Acc=0.7657 ||| Val Loss=0.5145, Val Acc=0.7694
Epoch 6: Train Loss=0.5175, Train Acc=0.7773 ||| Val Loss=0.4928, Val Acc=0.7953
Epoch 7: Train Loss=0.5065, Train Acc=0.7821 ||| Val Loss=0.4816, Val Acc=0.7964
Epoch 8: Train Loss=0.4958, Train Acc=0.7816 ||| Val Loss=0.4836, Val Acc=0.7924
Epoch 9: Train Loss=0.5037, Train Acc=0.7764 ||| Val Loss=0.4778, Val Acc=0.7936
Epoch 10: Train Loss=0.4917, Train Acc=0.7817 ||| Val Loss=0.4913, Val Acc=0.7775
Epoch 11: Train Loss=0.5042, Train Acc=0.7739 ||| Val Loss=0.4839, Val Acc=0.7936
Epoch 12: Train Loss=0.4949, Train Acc=0.7801 ||| Val Loss=0.4753, Val Acc=0.7844
Epoch 13: Train Loss=0.49

[I 2025-04-27 17:11:58,381] Trial 20 finished with value: 0.79700977573318 and parameters: {'n_blocks': 5, 'd_block': 256, 'k': 7, 'dropout': 0.2825841632112342, 'activation': 'ReLU', 'lr': 0.008173394292636528, 'weight_decay': 6.528443999332398e-05}. Best is trial 13 with value: 0.8067855089131685.


Epoch 57: Train Loss=0.4689, Train Acc=0.7900 ||| Val Loss=0.4591, Val Acc=0.7970
Early stopping triggered at epoch 57
Validation Accuracy: 0.7970

 Trial 22 with params: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.2256010105258364, 'activation': 'ReLU', 'lr': 0.003487671318826522, 'weight_decay': 6.9982025049786245e-06}


Epoch 1: Train Loss=0.7483, Train Acc=0.6634 ||| Val Loss=0.5201, Val Acc=0.7878
Epoch 2: Train Loss=0.5248, Train Acc=0.7714 ||| Val Loss=0.4977, Val Acc=0.7844
Epoch 3: Train Loss=0.5236, Train Acc=0.7757 ||| Val Loss=0.5050, Val Acc=0.7913
Epoch 4: Train Loss=0.5079, Train Acc=0.7794 ||| Val Loss=0.4845, Val Acc=0.7964
Epoch 5: Train Loss=0.4891, Train Acc=0.7810 ||| Val Loss=0.4719, Val Acc=0.7976
Epoch 6: Train Loss=0.4857, Train Acc=0.7816 ||| Val Loss=0.4804, Val Acc=0.7987
Epoch 7: Train Loss=0.4786, Train Acc=0.7820 ||| Val Loss=0.4590, Val Acc=0.7964
Epoch 8: Train Loss=0.4617, Train Acc=0.7886 ||| Val Loss=0.4520, Val Acc=0.7947
Epoch 9: Train Loss=0.4561, Train Acc=0.7890 ||| Val Loss=0.4405, Val Acc=0.7941
Epoch 10: Train Loss=0.4490, Train Acc=0.7906 ||| Val Loss=0.4398, Val Acc=0.7936
Epoch 11: Train Loss=0.4381, Train Acc=0.7883 ||| Val Loss=0.4339, Val Acc=0.7930
Epoch 12: Train Loss=0.4375, Train Acc=0.7908 ||| Val Loss=0.4245, Val Acc=0.7924
Epoch 13: Train Loss=0.43

[I 2025-04-27 17:12:31,699] Trial 21 finished with value: 0.8062104657849338 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.2256010105258364, 'activation': 'ReLU', 'lr': 0.003487671318826522, 'weight_decay': 6.9982025049786245e-06}. Best is trial 13 with value: 0.8067855089131685.


Epoch 94: Train Loss=0.3897, Train Acc=0.8181 ||| Val Loss=0.4062, Val Acc=0.8062
Early stopping triggered at epoch 94
Validation Accuracy: 0.8062

 Trial 23 with params: {'n_blocks': 4, 'd_block': 128, 'k': 10, 'dropout': 0.15984630203331762, 'activation': 'ReLU', 'lr': 0.0014769364639454208, 'weight_decay': 9.200748737757358e-06}


Epoch 1: Train Loss=0.9108, Train Acc=0.6654 ||| Val Loss=0.5515, Val Acc=0.7752
Epoch 2: Train Loss=0.5569, Train Acc=0.7345 ||| Val Loss=0.5341, Val Acc=0.7476
Epoch 3: Train Loss=0.5385, Train Acc=0.7514 ||| Val Loss=0.5075, Val Acc=0.7936
Epoch 4: Train Loss=0.5313, Train Acc=0.7606 ||| Val Loss=0.4972, Val Acc=0.7924
Epoch 5: Train Loss=0.5150, Train Acc=0.7711 ||| Val Loss=0.5037, Val Acc=0.7890
Epoch 6: Train Loss=0.5085, Train Acc=0.7768 ||| Val Loss=0.4930, Val Acc=0.7872
Epoch 7: Train Loss=0.5035, Train Acc=0.7781 ||| Val Loss=0.4987, Val Acc=0.7849
Epoch 8: Train Loss=0.4992, Train Acc=0.7816 ||| Val Loss=0.4902, Val Acc=0.7878
Epoch 9: Train Loss=0.4908, Train Acc=0.7826 ||| Val Loss=0.4728, Val Acc=0.7987
Epoch 10: Train Loss=0.4884, Train Acc=0.7807 ||| Val Loss=0.4810, Val Acc=0.7861
Epoch 11: Train Loss=0.4798, Train Acc=0.7852 ||| Val Loss=0.4760, Val Acc=0.7878
Epoch 12: Train Loss=0.4726, Train Acc=0.7869 ||| Val Loss=0.4561, Val Acc=0.7982
Epoch 13: Train Loss=0.46

[I 2025-04-27 17:13:04,387] Trial 22 finished with value: 0.8096607245543416 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 10, 'dropout': 0.15984630203331762, 'activation': 'ReLU', 'lr': 0.0014769364639454208, 'weight_decay': 9.200748737757358e-06}. Best is trial 22 with value: 0.8096607245543416.


Epoch 91: Train Loss=0.3919, Train Acc=0.8121 ||| Val Loss=0.4146, Val Acc=0.8097
Early stopping triggered at epoch 91
Validation Accuracy: 0.8097
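With this many trials, ranking them in code is easier than scrolling the log. The sketch below parses Optuna's `Trial N finished with value: V and parameters: {...}` INFO lines using `re` and `ast.literal_eval`. It assumes the log text has been saved as a string, and the helper name `parse_trials` is hypothetical. Optuna numbers trials from 0, while the notebook's own ` Trial N with params` prints appear to count from 1. For example, Optuna's trial 22 has the same parameters as the printed `Trial 23`.

```python
import ast
import re

# Matches Optuna's INFO line: "Trial N finished with value: V and parameters: {...}."
TRIAL_RE = re.compile(
    r"Trial (?P<num>\d+) finished with value: (?P<value>[0-9.eE+-]+) "
    r"and parameters: (?P<params>\{.*?\})\."
)


def parse_trials(log_text: str) -> list[dict]:
    """Return one dict per finished trial, sorted by value (best first)."""
    trials = []
    for m in TRIAL_RE.finditer(log_text):
        trials.append({
            "number": int(m["num"]),
            "value": float(m["value"]),
            # The parameters are printed as a Python dict literal, so literal_eval can read them.
            "params": ast.literal_eval(m["params"]),
        })
    return sorted(trials, key=lambda t: t["value"], reverse=True)


# Two lines copied from the log above.
log_text = """
[I 2025-04-27 17:12:31,699] Trial 21 finished with value: 0.8062104657849338 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.2256010105258364, 'activation': 'ReLU', 'lr': 0.003487671318826522, 'weight_decay': 6.9982025049786245e-06}. Best is trial 13 with value: 0.8067855089131685.
[I 2025-04-27 17:13:04,387] Trial 22 finished with value: 0.8096607245543416 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 10, 'dropout': 0.15984630203331762, 'activation': 'ReLU', 'lr': 0.0014769364639454208, 'weight_decay': 9.200748737757358e-06}. Best is trial 22 with value: 0.8096607245543416.
"""

for t in parse_trials(log_text):
    print(t["number"], round(t["value"], 4), t["params"]["lr"])
```

If the study object is still in memory, `study.trials_dataframe()` in Optuna gives the same information without parsing text.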

 Trial 24 with params: {'n_blocks': 4, 'd_block': 128, 'k': 10, 'dropout': 0.1532506195603504, 'activation': 'ReLU', 'lr': 0.0016343055351461786, 'weight_decay': 2.9323631794337012e-06}


Epoch 1: Train Loss=0.8443, Train Acc=0.6783 ||| Val Loss=0.5169, Val Acc=0.7780
Epoch 2: Train Loss=0.5490, Train Acc=0.7463 ||| Val Loss=0.5121, Val Acc=0.7826
Epoch 3: Train Loss=0.5309, Train Acc=0.7639 ||| Val Loss=0.4996, Val Acc=0.7959
Epoch 4: Train Loss=0.5249, Train Acc=0.7683 ||| Val Loss=0.4901, Val Acc=0.7976
Epoch 5: Train Loss=0.5037, Train Acc=0.7790 ||| Val Loss=0.5025, Val Acc=0.7953
Epoch 6: Train Loss=0.5065, Train Acc=0.7800 ||| Val Loss=0.5112, Val Acc=0.7884
Epoch 7: Train Loss=0.4996, Train Acc=0.7824 ||| Val Loss=0.4930, Val Acc=0.7976
Epoch 8: Train Loss=0.4924, Train Acc=0.7843 ||| Val Loss=0.4747, Val Acc=0.8005
Epoch 9: Train Loss=0.4825, Train Acc=0.7865 ||| Val Loss=0.4778, Val Acc=0.7924
Epoch 10: Train Loss=0.4826, Train Acc=0.7876 ||| Val Loss=0.4712, Val Acc=0.7913
Epoch 11: Train Loss=0.4790, Train Acc=0.7854 ||| Val Loss=0.4572, Val Acc=0.7941
Epoch 12: Train Loss=0.4621, Train Acc=0.7879 ||| Val Loss=0.4597, Val Acc=0.7947
Epoch 13: Train Loss=0.45

[I 2025-04-27 17:13:36,571] Trial 23 finished with value: 0.7993099482461185 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 10, 'dropout': 0.1532506195603504, 'activation': 'ReLU', 'lr': 0.0016343055351461786, 'weight_decay': 2.9323631794337012e-06}. Best is trial 22 with value: 0.8096607245543416.


Epoch 87: Train Loss=0.3907, Train Acc=0.8133 ||| Val Loss=0.4184, Val Acc=0.7993
Early stopping triggered at epoch 87
Validation Accuracy: 0.7993

 Trial 25 with params: {'n_blocks': 4, 'd_block': 128, 'k': 9, 'dropout': 0.17547440704922243, 'activation': 'ReLU', 'lr': 0.0032771537050493306, 'weight_decay': 7.983531635311177e-06}


Epoch 1: Train Loss=0.8396, Train Acc=0.6845 ||| Val Loss=0.5044, Val Acc=0.7832
Epoch 2: Train Loss=0.5370, Train Acc=0.7577 ||| Val Loss=0.5170, Val Acc=0.7700
Epoch 3: Train Loss=0.5204, Train Acc=0.7701 ||| Val Loss=0.4927, Val Acc=0.7884
Epoch 4: Train Loss=0.5121, Train Acc=0.7754 ||| Val Loss=0.5060, Val Acc=0.7999
Epoch 5: Train Loss=0.5099, Train Acc=0.7739 ||| Val Loss=0.4773, Val Acc=0.7890
Epoch 6: Train Loss=0.4912, Train Acc=0.7829 ||| Val Loss=0.4835, Val Acc=0.7970
Epoch 7: Train Loss=0.4823, Train Acc=0.7833 ||| Val Loss=0.4528, Val Acc=0.7941
Epoch 8: Train Loss=0.4794, Train Acc=0.7824 ||| Val Loss=0.4610, Val Acc=0.7924
Epoch 9: Train Loss=0.4658, Train Acc=0.7860 ||| Val Loss=0.4493, Val Acc=0.7826
Epoch 10: Train Loss=0.4690, Train Acc=0.7893 ||| Val Loss=0.4285, Val Acc=0.7918
Epoch 11: Train Loss=0.4477, Train Acc=0.7875 ||| Val Loss=0.4267, Val Acc=0.7941
Epoch 12: Train Loss=0.4390, Train Acc=0.7893 ||| Val Loss=0.4240, Val Acc=0.7976
Epoch 13: Train Loss=0.43

[I 2025-04-27 17:13:57,584] Trial 24 finished with value: 0.8062104657849338 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 9, 'dropout': 0.17547440704922243, 'activation': 'ReLU', 'lr': 0.0032771537050493306, 'weight_decay': 7.983531635311177e-06}. Best is trial 22 with value: 0.8096607245543416.


Epoch 71: Train Loss=0.3919, Train Acc=0.8072 ||| Val Loss=0.4115, Val Acc=0.8062
Early stopping triggered at epoch 71
Validation Accuracy: 0.8062

 Trial 26 with params: {'n_blocks': 4, 'd_block': 128, 'k': 8, 'dropout': 0.11034959771817207, 'activation': 'LeakyReLU', 'lr': 0.002156236699487219, 'weight_decay': 2.605006821979682e-06}


Epoch 1: Train Loss=0.8040, Train Acc=0.6980 ||| Val Loss=0.5074, Val Acc=0.7901
Epoch 2: Train Loss=0.5335, Train Acc=0.7577 ||| Val Loss=0.4992, Val Acc=0.7959
Epoch 3: Train Loss=0.5187, Train Acc=0.7683 ||| Val Loss=0.4884, Val Acc=0.7924
Epoch 4: Train Loss=0.5081, Train Acc=0.7729 ||| Val Loss=0.4940, Val Acc=0.7907
Epoch 5: Train Loss=0.5050, Train Acc=0.7771 ||| Val Loss=0.4910, Val Acc=0.7890
Epoch 6: Train Loss=0.4983, Train Acc=0.7826 ||| Val Loss=0.4708, Val Acc=0.7999
Epoch 7: Train Loss=0.4844, Train Acc=0.7839 ||| Val Loss=0.4758, Val Acc=0.7941
Epoch 8: Train Loss=0.4802, Train Acc=0.7817 ||| Val Loss=0.4569, Val Acc=0.7913
Epoch 9: Train Loss=0.4672, Train Acc=0.7824 ||| Val Loss=0.4620, Val Acc=0.7964
Epoch 10: Train Loss=0.4609, Train Acc=0.7866 ||| Val Loss=0.4464, Val Acc=0.7959
Epoch 11: Train Loss=0.4465, Train Acc=0.7854 ||| Val Loss=0.4396, Val Acc=0.7947
Epoch 12: Train Loss=0.4552, Train Acc=0.7849 ||| Val Loss=0.4525, Val Acc=0.7740
Epoch 13: Train Loss=0.43

[I 2025-04-27 17:14:16,190] Trial 25 finished with value: 0.79700977573318 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 8, 'dropout': 0.11034959771817207, 'activation': 'LeakyReLU', 'lr': 0.002156236699487219, 'weight_decay': 2.605006821979682e-06}. Best is trial 22 with value: 0.8096607245543416.


Epoch 66: Train Loss=0.4031, Train Acc=0.8064 ||| Val Loss=0.4166, Val Acc=0.7970
Early stopping triggered at epoch 66
Validation Accuracy: 0.7970

 Trial 27 with params: {'n_blocks': 3, 'd_block': 128, 'k': 10, 'dropout': 0.2588240720542714, 'activation': 'ReLU', 'lr': 0.0006853961787791875, 'weight_decay': 9.177083287946332e-06}


Epoch 1: Train Loss=2.6575, Train Acc=0.6309 ||| Val Loss=0.5671, Val Acc=0.7671
Epoch 2: Train Loss=1.0008, Train Acc=0.6654 ||| Val Loss=0.5432, Val Acc=0.7711
Epoch 3: Train Loss=0.7500, Train Acc=0.6674 ||| Val Loss=0.5661, Val Acc=0.7476
Epoch 4: Train Loss=0.6539, Train Acc=0.6871 ||| Val Loss=0.5552, Val Acc=0.7355
Epoch 5: Train Loss=0.6054, Train Acc=0.7137 ||| Val Loss=0.5469, Val Acc=0.7671
Epoch 6: Train Loss=0.5820, Train Acc=0.7236 ||| Val Loss=0.5431, Val Acc=0.7717
Epoch 7: Train Loss=0.5634, Train Acc=0.7274 ||| Val Loss=0.5258, Val Acc=0.7809
Epoch 8: Train Loss=0.5422, Train Acc=0.7481 ||| Val Loss=0.5223, Val Acc=0.7849
Epoch 9: Train Loss=0.5462, Train Acc=0.7494 ||| Val Loss=0.5189, Val Acc=0.7826
Epoch 10: Train Loss=0.5364, Train Acc=0.7573 ||| Val Loss=0.5276, Val Acc=0.7913
Epoch 11: Train Loss=0.5323, Train Acc=0.7567 ||| Val Loss=0.5085, Val Acc=0.7901
Epoch 12: Train Loss=0.5203, Train Acc=0.7647 ||| Val Loss=0.5035, Val Acc=0.7890
Epoch 13: Train Loss=0.53

[I 2025-04-27 17:14:42,191] Trial 26 finished with value: 0.7987349051178838 and parameters: {'n_blocks': 3, 'd_block': 128, 'k': 10, 'dropout': 0.2588240720542714, 'activation': 'ReLU', 'lr': 0.0006853961787791875, 'weight_decay': 9.177083287946332e-06}. Best is trial 22 with value: 0.8096607245543416.


Epoch 104: Train Loss=0.4104, Train Acc=0.8013 ||| Val Loss=0.4198, Val Acc=0.7987
Early stopping triggered at epoch 104
Validation Accuracy: 0.7987

 Trial 28 with params: {'n_blocks': 5, 'd_block': 512, 'k': 9, 'dropout': 0.2078537782803282, 'activation': 'ReLU', 'lr': 0.0014908074450939784, 'weight_decay': 2.2289399936271573e-06}


Epoch 1: Train Loss=1.1752, Train Acc=0.6828 ||| Val Loss=0.5301, Val Acc=0.7849
Epoch 2: Train Loss=0.5413, Train Acc=0.7561 ||| Val Loss=0.5421, Val Acc=0.7953
Epoch 3: Train Loss=0.5329, Train Acc=0.7696 ||| Val Loss=0.5197, Val Acc=0.7884
Epoch 4: Train Loss=0.5176, Train Acc=0.7714 ||| Val Loss=0.5029, Val Acc=0.7890
Epoch 5: Train Loss=0.5094, Train Acc=0.7813 ||| Val Loss=0.4931, Val Acc=0.7901
Epoch 6: Train Loss=0.5004, Train Acc=0.7794 ||| Val Loss=0.4951, Val Acc=0.7907
Epoch 7: Train Loss=0.5028, Train Acc=0.7780 ||| Val Loss=0.4878, Val Acc=0.7941
Epoch 8: Train Loss=0.4923, Train Acc=0.7819 ||| Val Loss=0.4642, Val Acc=0.7953
Epoch 9: Train Loss=0.4919, Train Acc=0.7824 ||| Val Loss=0.4905, Val Acc=0.7844
Epoch 10: Train Loss=0.4829, Train Acc=0.7872 ||| Val Loss=0.4612, Val Acc=0.7947
Epoch 11: Train Loss=0.4877, Train Acc=0.7880 ||| Val Loss=0.4526, Val Acc=0.7959
Epoch 12: Train Loss=0.4700, Train Acc=0.7873 ||| Val Loss=0.4677, Val Acc=0.7959
Epoch 13: Train Loss=0.46

[I 2025-04-27 17:16:48,665] Trial 27 finished with value: 0.8027602070155262 and parameters: {'n_blocks': 5, 'd_block': 512, 'k': 9, 'dropout': 0.2078537782803282, 'activation': 'ReLU', 'lr': 0.0014908074450939784, 'weight_decay': 2.2289399936271573e-06}. Best is trial 22 with value: 0.8096607245543416.


Epoch 113: Train Loss=0.3937, Train Acc=0.8076 ||| Val Loss=0.4129, Val Acc=0.8028
Early stopping triggered at epoch 113
Validation Accuracy: 0.8028

 Trial 29 with params: {'n_blocks': 4, 'd_block': 128, 'k': 11, 'dropout': 0.1378665338462055, 'activation': 'ReLU', 'lr': 0.012078381069982127, 'weight_decay': 1.3158998363428355e-05}


Epoch 1: Train Loss=2.4054, Train Acc=0.6681 ||| Val Loss=0.5432, Val Acc=0.7487
Epoch 2: Train Loss=0.5287, Train Acc=0.7702 ||| Val Loss=0.4931, Val Acc=0.7970
Epoch 3: Train Loss=0.5081, Train Acc=0.7729 ||| Val Loss=0.4839, Val Acc=0.7936
Epoch 4: Train Loss=0.5002, Train Acc=0.7804 ||| Val Loss=0.4967, Val Acc=0.7849
Epoch 5: Train Loss=0.4915, Train Acc=0.7821 ||| Val Loss=0.4636, Val Acc=0.7993
Epoch 6: Train Loss=0.4834, Train Acc=0.7844 ||| Val Loss=0.4578, Val Acc=0.7913
Epoch 7: Train Loss=0.4696, Train Acc=0.7820 ||| Val Loss=0.4618, Val Acc=0.7757
Epoch 8: Train Loss=0.4738, Train Acc=0.7801 ||| Val Loss=0.4887, Val Acc=0.7706
Epoch 9: Train Loss=0.4705, Train Acc=0.7791 ||| Val Loss=0.4422, Val Acc=0.7936
Epoch 10: Train Loss=0.4493, Train Acc=0.7869 ||| Val Loss=0.4222, Val Acc=0.7849
Epoch 11: Train Loss=0.4477, Train Acc=0.7886 ||| Val Loss=0.4234, Val Acc=0.7878
Epoch 12: Train Loss=0.4414, Train Acc=0.7833 ||| Val Loss=0.4488, Val Acc=0.7970
Epoch 13: Train Loss=0.44

[I 2025-04-27 17:17:06,835] Trial 28 finished with value: 0.7935595169637722 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 11, 'dropout': 0.1378665338462055, 'activation': 'ReLU', 'lr': 0.012078381069982127, 'weight_decay': 1.3158998363428355e-05}. Best is trial 22 with value: 0.8096607245543416.


Epoch 59: Train Loss=0.4156, Train Acc=0.8057 ||| Val Loss=0.4100, Val Acc=0.7936
Early stopping triggered at epoch 59
Validation Accuracy: 0.7936

 Trial 30 with params: {'n_blocks': 6, 'd_block': 128, 'k': 8, 'dropout': 0.32551897451828177, 'activation': 'LeakyReLU', 'lr': 0.004113579887730336, 'weight_decay': 5.063621078557173e-06}


Epoch 1: Train Loss=0.6942, Train Acc=0.6372 ||| Val Loss=0.5515, Val Acc=0.7798
Epoch 2: Train Loss=0.5458, Train Acc=0.7552 ||| Val Loss=0.5327, Val Acc=0.7861
Epoch 3: Train Loss=0.5140, Train Acc=0.7791 ||| Val Loss=0.5042, Val Acc=0.7878
Epoch 4: Train Loss=0.5162, Train Acc=0.7752 ||| Val Loss=0.4954, Val Acc=0.7803
Epoch 5: Train Loss=0.5075, Train Acc=0.7777 ||| Val Loss=0.4759, Val Acc=0.7970
Epoch 6: Train Loss=0.5029, Train Acc=0.7785 ||| Val Loss=0.5191, Val Acc=0.7884
Epoch 7: Train Loss=0.4878, Train Acc=0.7854 ||| Val Loss=0.4668, Val Acc=0.7964
Epoch 8: Train Loss=0.4820, Train Acc=0.7846 ||| Val Loss=0.4731, Val Acc=0.7700
Epoch 9: Train Loss=0.4625, Train Acc=0.7827 ||| Val Loss=0.4378, Val Acc=0.7913
Epoch 10: Train Loss=0.4562, Train Acc=0.7859 ||| Val Loss=0.4460, Val Acc=0.7867
Epoch 11: Train Loss=0.4473, Train Acc=0.7888 ||| Val Loss=0.4555, Val Acc=0.7959
Epoch 12: Train Loss=0.4514, Train Acc=0.7906 ||| Val Loss=0.4449, Val Acc=0.7976
Epoch 13: Train Loss=0.44

[I 2025-04-27 17:17:44,272] Trial 29 finished with value: 0.7975848188614146 and parameters: {'n_blocks': 6, 'd_block': 128, 'k': 8, 'dropout': 0.32551897451828177, 'activation': 'LeakyReLU', 'lr': 0.004113579887730336, 'weight_decay': 5.063621078557173e-06}. Best is trial 22 with value: 0.8096607245543416.


Epoch 99: Train Loss=0.4018, Train Acc=0.8118 ||| Val Loss=0.4081, Val Acc=0.7976
Early stopping triggered at epoch 99
Validation Accuracy: 0.7976

 Trial 31 with params: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.27284433651928997, 'activation': 'ReLU', 'lr': 0.002029456931526584, 'weight_decay': 3.9033694841500456e-06}


Epoch 1: Train Loss=0.7688, Train Acc=0.6383 ||| Val Loss=0.5483, Val Acc=0.7947
Epoch 2: Train Loss=0.5522, Train Acc=0.7419 ||| Val Loss=0.5257, Val Acc=0.7734
Epoch 3: Train Loss=0.5232, Train Acc=0.7719 ||| Val Loss=0.4997, Val Acc=0.7918
Epoch 4: Train Loss=0.5193, Train Acc=0.7712 ||| Val Loss=0.4831, Val Acc=0.7953
Epoch 5: Train Loss=0.5104, Train Acc=0.7784 ||| Val Loss=0.4877, Val Acc=0.7976
Epoch 6: Train Loss=0.4969, Train Acc=0.7808 ||| Val Loss=0.4945, Val Acc=0.7993
Epoch 7: Train Loss=0.4924, Train Acc=0.7831 ||| Val Loss=0.4699, Val Acc=0.7976
Epoch 8: Train Loss=0.4845, Train Acc=0.7830 ||| Val Loss=0.4706, Val Acc=0.8005
Epoch 9: Train Loss=0.4789, Train Acc=0.7850 ||| Val Loss=0.4514, Val Acc=0.7964
Epoch 10: Train Loss=0.4743, Train Acc=0.7862 ||| Val Loss=0.4442, Val Acc=0.7964
Epoch 11: Train Loss=0.4632, Train Acc=0.7869 ||| Val Loss=0.4498, Val Acc=0.7976
Epoch 12: Train Loss=0.4541, Train Acc=0.7902 ||| Val Loss=0.4387, Val Acc=0.7982
Epoch 13: Train Loss=0.44

[I 2025-04-27 17:18:09,071] Trial 30 finished with value: 0.8125359401955147 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.27284433651928997, 'activation': 'ReLU', 'lr': 0.002029456931526584, 'weight_decay': 3.9033694841500456e-06}. Best is trial 30 with value: 0.8125359401955147.


Epoch 71: Train Loss=0.4030, Train Acc=0.8047 ||| Val Loss=0.4024, Val Acc=0.8125
Early stopping triggered at epoch 71
Validation Accuracy: 0.8125

 Trial 32 with params: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.2636809099799959, 'activation': 'ReLU', 'lr': 0.0016600543074176363, 'weight_decay': 4.116320746842263e-06}


Epoch 1: Train Loss=0.7521, Train Acc=0.6179 ||| Val Loss=0.5466, Val Acc=0.7527
Epoch 2: Train Loss=0.5624, Train Acc=0.7271 ||| Val Loss=0.5349, Val Acc=0.7631
Epoch 3: Train Loss=0.5373, Train Acc=0.7597 ||| Val Loss=0.5057, Val Acc=0.7907
Epoch 4: Train Loss=0.5256, Train Acc=0.7685 ||| Val Loss=0.4918, Val Acc=0.7907
Epoch 5: Train Loss=0.5149, Train Acc=0.7765 ||| Val Loss=0.4907, Val Acc=0.7924
Epoch 6: Train Loss=0.5055, Train Acc=0.7798 ||| Val Loss=0.4856, Val Acc=0.7924
Epoch 7: Train Loss=0.4989, Train Acc=0.7814 ||| Val Loss=0.4832, Val Acc=0.7964
Epoch 8: Train Loss=0.4952, Train Acc=0.7830 ||| Val Loss=0.4716, Val Acc=0.7970
Epoch 9: Train Loss=0.4848, Train Acc=0.7865 ||| Val Loss=0.4656, Val Acc=0.7976
Epoch 10: Train Loss=0.4789, Train Acc=0.7840 ||| Val Loss=0.4551, Val Acc=0.7987
Epoch 11: Train Loss=0.4851, Train Acc=0.7880 ||| Val Loss=0.4693, Val Acc=0.8010
Epoch 12: Train Loss=0.4748, Train Acc=0.7862 ||| Val Loss=0.4671, Val Acc=0.7982
Epoch 13: Train Loss=0.47

[I 2025-04-27 17:18:40,463] Trial 31 finished with value: 0.7947096032202415 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.2636809099799959, 'activation': 'ReLU', 'lr': 0.0016600543074176363, 'weight_decay': 4.116320746842263e-06}. Best is trial 30 with value: 0.8125359401955147.


Epoch 90: Train Loss=0.3993, Train Acc=0.8105 ||| Val Loss=0.4124, Val Acc=0.7947
Early stopping triggered at epoch 90
Validation Accuracy: 0.7947

 Trial 33 with params: {'n_blocks': 6, 'd_block': 128, 'k': 9, 'dropout': 0.3579712838405346, 'activation': 'ReLU', 'lr': 0.0009008772942833015, 'weight_decay': 1.5734642077417547e-06}


Epoch 1: Train Loss=0.7773, Train Acc=0.5449 ||| Val Loss=0.6395, Val Acc=0.6843
Epoch 2: Train Loss=0.6157, Train Acc=0.6746 ||| Val Loss=0.5392, Val Acc=0.7809
Epoch 3: Train Loss=0.5626, Train Acc=0.7440 ||| Val Loss=0.5239, Val Acc=0.7838
Epoch 4: Train Loss=0.5371, Train Acc=0.7624 ||| Val Loss=0.5107, Val Acc=0.7895
Epoch 5: Train Loss=0.5337, Train Acc=0.7672 ||| Val Loss=0.5112, Val Acc=0.7970
Epoch 6: Train Loss=0.5153, Train Acc=0.7768 ||| Val Loss=0.5110, Val Acc=0.7878
Epoch 7: Train Loss=0.5173, Train Acc=0.7793 ||| Val Loss=0.5042, Val Acc=0.7964
Epoch 8: Train Loss=0.5084, Train Acc=0.7784 ||| Val Loss=0.4865, Val Acc=0.7976
Epoch 9: Train Loss=0.5058, Train Acc=0.7794 ||| Val Loss=0.4893, Val Acc=0.7930
Epoch 10: Train Loss=0.4986, Train Acc=0.7790 ||| Val Loss=0.4849, Val Acc=0.7964
Epoch 11: Train Loss=0.4959, Train Acc=0.7847 ||| Val Loss=0.4856, Val Acc=0.7953
Epoch 12: Train Loss=0.4911, Train Acc=0.7827 ||| Val Loss=0.4851, Val Acc=0.7982
Epoch 13: Train Loss=0.49

[I 2025-04-27 17:19:19,095] Trial 32 finished with value: 0.7912593444508338 and parameters: {'n_blocks': 6, 'd_block': 128, 'k': 9, 'dropout': 0.3579712838405346, 'activation': 'ReLU', 'lr': 0.0009008772942833015, 'weight_decay': 1.5734642077417547e-06}. Best is trial 30 with value: 0.8125359401955147.


Epoch 100: Train Loss=0.4143, Train Acc=0.7990 ||| Val Loss=0.4336, Val Acc=0.7913
Early stopping triggered at epoch 100
Validation Accuracy: 0.7913

 Trial 34 with params: {'n_blocks': 4, 'd_block': 128, 'k': 10, 'dropout': 0.23106814668354772, 'activation': 'ReLU', 'lr': 0.0020532734451836642, 'weight_decay': 9.746529460798676e-06}


Epoch 1: Train Loss=0.8408, Train Acc=0.6452 ||| Val Loss=0.5263, Val Acc=0.7533
Epoch 2: Train Loss=0.5647, Train Acc=0.7322 ||| Val Loss=0.5413, Val Acc=0.7746
Epoch 3: Train Loss=0.5315, Train Acc=0.7633 ||| Val Loss=0.5031, Val Acc=0.7867
Epoch 4: Train Loss=0.5211, Train Acc=0.7691 ||| Val Loss=0.4939, Val Acc=0.7890
Epoch 5: Train Loss=0.5157, Train Acc=0.7751 ||| Val Loss=0.5017, Val Acc=0.7959
Epoch 6: Train Loss=0.5089, Train Acc=0.7762 ||| Val Loss=0.4814, Val Acc=0.7936
Epoch 7: Train Loss=0.5030, Train Acc=0.7767 ||| Val Loss=0.4894, Val Acc=0.7907
Epoch 8: Train Loss=0.4979, Train Acc=0.7830 ||| Val Loss=0.4777, Val Acc=0.7930
Epoch 9: Train Loss=0.4953, Train Acc=0.7798 ||| Val Loss=0.4844, Val Acc=0.7895
Epoch 10: Train Loss=0.4875, Train Acc=0.7859 ||| Val Loss=0.4746, Val Acc=0.7970
Epoch 11: Train Loss=0.4812, Train Acc=0.7846 ||| Val Loss=0.4734, Val Acc=0.7878
Epoch 12: Train Loss=0.4822, Train Acc=0.7840 ||| Val Loss=0.4670, Val Acc=0.7947
Epoch 13: Train Loss=0.47

[I 2025-04-27 17:19:49,149] Trial 33 finished with value: 0.8027602070155262 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 10, 'dropout': 0.23106814668354772, 'activation': 'ReLU', 'lr': 0.0020532734451836642, 'weight_decay': 9.746529460798676e-06}. Best is trial 30 with value: 0.8125359401955147.


Epoch 100: Train Loss=0.3882, Train Acc=0.8168 ||| Val Loss=0.4046, Val Acc=0.8028
Early stopping triggered at epoch 100
Validation Accuracy: 0.8028

 Trial 35 with params: {'n_blocks': 5, 'd_block': 128, 'k': 11, 'dropout': 0.18721197844802795, 'activation': 'ReLU', 'lr': 0.003798950203094358, 'weight_decay': 1.4278012196222145e-06}


Epoch 1: Train Loss=0.7500, Train Acc=0.6799 ||| Val Loss=0.5219, Val Acc=0.7918
Epoch 2: Train Loss=0.5298, Train Acc=0.7673 ||| Val Loss=0.4964, Val Acc=0.7913
Epoch 3: Train Loss=0.5153, Train Acc=0.7783 ||| Val Loss=0.5299, Val Acc=0.7711
Epoch 4: Train Loss=0.5046, Train Acc=0.7784 ||| Val Loss=0.4678, Val Acc=0.7941
Epoch 5: Train Loss=0.5014, Train Acc=0.7800 ||| Val Loss=0.4631, Val Acc=0.7953
Epoch 6: Train Loss=0.4762, Train Acc=0.7888 ||| Val Loss=0.4630, Val Acc=0.7976
Epoch 7: Train Loss=0.4866, Train Acc=0.7831 ||| Val Loss=0.4676, Val Acc=0.7930
Epoch 8: Train Loss=0.4648, Train Acc=0.7834 ||| Val Loss=0.4437, Val Acc=0.7941
Epoch 9: Train Loss=0.4569, Train Acc=0.7886 ||| Val Loss=0.4509, Val Acc=0.7901
Epoch 10: Train Loss=0.4432, Train Acc=0.7926 ||| Val Loss=0.4316, Val Acc=0.7936
Epoch 11: Train Loss=0.4471, Train Acc=0.7895 ||| Val Loss=0.4312, Val Acc=0.7970
Epoch 12: Train Loss=0.4429, Train Acc=0.7900 ||| Val Loss=0.4262, Val Acc=0.7872
Epoch 13: Train Loss=0.44

[I 2025-04-27 17:20:23,173] Trial 34 finished with value: 0.8021851638872916 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 11, 'dropout': 0.18721197844802795, 'activation': 'ReLU', 'lr': 0.003798950203094358, 'weight_decay': 1.4278012196222145e-06}. Best is trial 30 with value: 0.8125359401955147.


Epoch 95: Train Loss=0.3920, Train Acc=0.8106 ||| Val Loss=0.4140, Val Acc=0.8022
Early stopping triggered at epoch 95
Validation Accuracy: 0.8022

 Trial 36 with params: {'n_blocks': 4, 'd_block': 512, 'k': 9, 'dropout': 0.2931050500937418, 'activation': 'GELU', 'lr': 0.0009119021657581938, 'weight_decay': 3.5193322723725726e-06}


Epoch 1: Train Loss=1.4273, Train Acc=0.6411 ||| Val Loss=0.5365, Val Acc=0.7930
Epoch 2: Train Loss=0.6125, Train Acc=0.7028 ||| Val Loss=0.5537, Val Acc=0.7838
Epoch 3: Train Loss=0.5651, Train Acc=0.7317 ||| Val Loss=0.5418, Val Acc=0.7499
Epoch 4: Train Loss=0.5574, Train Acc=0.7386 ||| Val Loss=0.5429, Val Acc=0.7108
Epoch 5: Train Loss=0.5407, Train Acc=0.7537 ||| Val Loss=0.5230, Val Acc=0.7890
Epoch 6: Train Loss=0.5300, Train Acc=0.7614 ||| Val Loss=0.5059, Val Acc=0.7959
Epoch 7: Train Loss=0.5215, Train Acc=0.7706 ||| Val Loss=0.4990, Val Acc=0.7872
Epoch 8: Train Loss=0.5245, Train Acc=0.7676 ||| Val Loss=0.4953, Val Acc=0.7872
Epoch 9: Train Loss=0.5192, Train Acc=0.7718 ||| Val Loss=0.4987, Val Acc=0.7936
Epoch 10: Train Loss=0.5109, Train Acc=0.7758 ||| Val Loss=0.5116, Val Acc=0.7803
Epoch 11: Train Loss=0.5102, Train Acc=0.7749 ||| Val Loss=0.4952, Val Acc=0.7815
Epoch 12: Train Loss=0.5047, Train Acc=0.7748 ||| Val Loss=0.4881, Val Acc=0.7901
Epoch 13: Train Loss=0.50

[I 2025-04-27 17:21:47,646] Trial 35 finished with value: 0.7906843013225991 and parameters: {'n_blocks': 4, 'd_block': 512, 'k': 9, 'dropout': 0.2931050500937418, 'activation': 'GELU', 'lr': 0.0009119021657581938, 'weight_decay': 3.5193322723725726e-06}. Best is trial 30 with value: 0.8125359401955147.


Epoch 94: Train Loss=0.4067, Train Acc=0.7998 ||| Val Loss=0.4280, Val Acc=0.7907
Early stopping triggered at epoch 94
Validation Accuracy: 0.7907

 Trial 37 with params: {'n_blocks': 6, 'd_block': 128, 'k': 11, 'dropout': 0.15881191148404622, 'activation': 'ReLU', 'lr': 0.0012993999845610782, 'weight_decay': 8.80397444867152e-06}


Epoch 1: Train Loss=0.6233, Train Acc=0.6746 ||| Val Loss=0.5401, Val Acc=0.7637
Epoch 2: Train Loss=0.5281, Train Acc=0.7650 ||| Val Loss=0.5048, Val Acc=0.7982
Epoch 3: Train Loss=0.5136, Train Acc=0.7757 ||| Val Loss=0.4887, Val Acc=0.7907
Epoch 4: Train Loss=0.5090, Train Acc=0.7781 ||| Val Loss=0.4835, Val Acc=0.7936
Epoch 5: Train Loss=0.5028, Train Acc=0.7796 ||| Val Loss=0.4851, Val Acc=0.7924
Epoch 6: Train Loss=0.4975, Train Acc=0.7847 ||| Val Loss=0.4836, Val Acc=0.7895
Epoch 7: Train Loss=0.4905, Train Acc=0.7869 ||| Val Loss=0.4759, Val Acc=0.7947
Epoch 8: Train Loss=0.4874, Train Acc=0.7873 ||| Val Loss=0.4683, Val Acc=0.7970
Epoch 9: Train Loss=0.4828, Train Acc=0.7847 ||| Val Loss=0.4688, Val Acc=0.7947
Epoch 10: Train Loss=0.4794, Train Acc=0.7882 ||| Val Loss=0.4675, Val Acc=0.7953
Epoch 11: Train Loss=0.4771, Train Acc=0.7903 ||| Val Loss=0.4589, Val Acc=0.8005
Epoch 12: Train Loss=0.4746, Train Acc=0.7898 ||| Val Loss=0.4587, Val Acc=0.7964
Epoch 13: Train Loss=0.46

[I 2025-04-27 17:22:20,095] Trial 36 finished with value: 0.8010350776308223 and parameters: {'n_blocks': 6, 'd_block': 128, 'k': 11, 'dropout': 0.15881191148404622, 'activation': 'ReLU', 'lr': 0.0012993999845610782, 'weight_decay': 8.80397444867152e-06}. Best is trial 30 with value: 0.8125359401955147.


Epoch 77: Train Loss=0.4013, Train Acc=0.8043 ||| Val Loss=0.4125, Val Acc=0.8010
Early stopping triggered at epoch 77
Validation Accuracy: 0.8010

 Trial 38 with params: {'n_blocks': 5, 'd_block': 128, 'k': 8, 'dropout': 0.2593541836743758, 'activation': 'GELU', 'lr': 0.00037684303782844916, 'weight_decay': 2.1963319714100293e-05}


Epoch 1: Train Loss=0.9892, Train Acc=0.5677 ||| Val Loss=0.5782, Val Acc=0.7453
Epoch 2: Train Loss=0.6676, Train Acc=0.6349 ||| Val Loss=0.5522, Val Acc=0.7424
Epoch 3: Train Loss=0.6148, Train Acc=0.6776 ||| Val Loss=0.5354, Val Acc=0.7464
Epoch 4: Train Loss=0.5921, Train Acc=0.6967 ||| Val Loss=0.5298, Val Acc=0.7769
Epoch 5: Train Loss=0.5613, Train Acc=0.7265 ||| Val Loss=0.5244, Val Acc=0.7654
Epoch 6: Train Loss=0.5577, Train Acc=0.7383 ||| Val Loss=0.5136, Val Acc=0.7832
Epoch 7: Train Loss=0.5501, Train Acc=0.7419 ||| Val Loss=0.5100, Val Acc=0.7895
Epoch 8: Train Loss=0.5429, Train Acc=0.7524 ||| Val Loss=0.5143, Val Acc=0.7895
Epoch 9: Train Loss=0.5341, Train Acc=0.7548 ||| Val Loss=0.5017, Val Acc=0.7936
Epoch 10: Train Loss=0.5261, Train Acc=0.7611 ||| Val Loss=0.5016, Val Acc=0.7895
Epoch 11: Train Loss=0.5226, Train Acc=0.7657 ||| Val Loss=0.5047, Val Acc=0.7924
Epoch 12: Train Loss=0.5235, Train Acc=0.7663 ||| Val Loss=0.4990, Val Acc=0.7872
Epoch 13: Train Loss=0.51

[I 2025-04-27 17:23:05,443] Trial 37 finished with value: 0.7924094307073031 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 8, 'dropout': 0.2593541836743758, 'activation': 'GELU', 'lr': 0.00037684303782844916, 'weight_decay': 2.1963319714100293e-05}. Best is trial 30 with value: 0.8125359401955147.


Epoch 122: Train Loss=0.4054, Train Acc=0.8005 ||| Val Loss=0.4283, Val Acc=0.7924
Early stopping triggered at epoch 122
Validation Accuracy: 0.7924

 Trial 39 with params: {'n_blocks': 4, 'd_block': 128, 'k': 6, 'dropout': 0.2050635318633716, 'activation': 'ReLU', 'lr': 0.008554855242641284, 'weight_decay': 1.8983572014132025e-06}


Epoch 1: Train Loss=1.7192, Train Acc=0.6734 ||| Val Loss=0.5116, Val Acc=0.7780
Epoch 2: Train Loss=0.5370, Train Acc=0.7610 ||| Val Loss=0.5140, Val Acc=0.7901
Epoch 3: Train Loss=0.5296, Train Acc=0.7660 ||| Val Loss=0.5206, Val Acc=0.7752
Epoch 4: Train Loss=0.5118, Train Acc=0.7744 ||| Val Loss=0.4988, Val Acc=0.7867
Epoch 5: Train Loss=0.4954, Train Acc=0.7807 ||| Val Loss=0.4656, Val Acc=0.8016
Epoch 6: Train Loss=0.4848, Train Acc=0.7803 ||| Val Loss=0.4648, Val Acc=0.7959
Epoch 7: Train Loss=0.4699, Train Acc=0.7816 ||| Val Loss=0.4343, Val Acc=0.7918
Epoch 8: Train Loss=0.4554, Train Acc=0.7873 ||| Val Loss=0.4616, Val Acc=0.7947
Epoch 9: Train Loss=0.4586, Train Acc=0.7834 ||| Val Loss=0.4420, Val Acc=0.7947
Epoch 10: Train Loss=0.4577, Train Acc=0.7867 ||| Val Loss=0.4373, Val Acc=0.7953
Epoch 11: Train Loss=0.4483, Train Acc=0.7931 ||| Val Loss=0.4419, Val Acc=0.7890
Epoch 12: Train Loss=0.4466, Train Acc=0.7883 ||| Val Loss=0.4186, Val Acc=0.7959
Epoch 13: Train Loss=0.44

[I 2025-04-27 17:23:23,766] Trial 38 finished with value: 0.8016101207590569 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 6, 'dropout': 0.2050635318633716, 'activation': 'ReLU', 'lr': 0.008554855242641284, 'weight_decay': 1.8983572014132025e-06}. Best is trial 30 with value: 0.8125359401955147.


Epoch 69: Train Loss=0.4137, Train Acc=0.8037 ||| Val Loss=0.4171, Val Acc=0.8016
Early stopping triggered at epoch 69
Validation Accuracy: 0.8016

 Trial 40 with params: {'n_blocks': 3, 'd_block': 512, 'k': 10, 'dropout': 0.31642719264252867, 'activation': 'LeakyReLU', 'lr': 0.0031445137761433057, 'weight_decay': 6.361729222149192e-06}


Epoch 1: Train Loss=3.6438, Train Acc=0.6677 ||| Val Loss=0.5723, Val Acc=0.7085
Epoch 2: Train Loss=0.5745, Train Acc=0.7315 ||| Val Loss=0.5160, Val Acc=0.7786
Epoch 3: Train Loss=0.5542, Train Acc=0.7496 ||| Val Loss=0.5096, Val Acc=0.7723
Epoch 4: Train Loss=0.5532, Train Acc=0.7541 ||| Val Loss=0.4999, Val Acc=0.7849
Epoch 5: Train Loss=0.5455, Train Acc=0.7545 ||| Val Loss=0.4987, Val Acc=0.7809
Epoch 6: Train Loss=0.5301, Train Acc=0.7624 ||| Val Loss=0.4965, Val Acc=0.7838
Epoch 7: Train Loss=0.5231, Train Acc=0.7673 ||| Val Loss=0.4909, Val Acc=0.7872
Epoch 8: Train Loss=0.5191, Train Acc=0.7722 ||| Val Loss=0.4822, Val Acc=0.7913
Epoch 9: Train Loss=0.5171, Train Acc=0.7686 ||| Val Loss=0.4850, Val Acc=0.8010
Epoch 10: Train Loss=0.5147, Train Acc=0.7696 ||| Val Loss=0.4870, Val Acc=0.7803
Epoch 11: Train Loss=0.5020, Train Acc=0.7739 ||| Val Loss=0.4649, Val Acc=0.7947
Epoch 12: Train Loss=0.4953, Train Acc=0.7758 ||| Val Loss=0.4546, Val Acc=0.7993
Epoch 13: Train Loss=0.48

[I 2025-04-27 17:23:58,908] Trial 39 finished with value: 0.7987349051178838 and parameters: {'n_blocks': 3, 'd_block': 512, 'k': 10, 'dropout': 0.31642719264252867, 'activation': 'LeakyReLU', 'lr': 0.0031445137761433057, 'weight_decay': 6.361729222149192e-06}. Best is trial 30 with value: 0.8125359401955147.


Epoch 50: Train Loss=0.4191, Train Acc=0.7970 ||| Val Loss=0.4146, Val Acc=0.7987
Early stopping triggered at epoch 50
Validation Accuracy: 0.7987

 Trial 41 with params: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.1345066994362784, 'activation': 'GELU', 'lr': 0.02433477950478476, 'weight_decay': 0.00018530246895455223}


Epoch 1: Train Loss=3.7900, Train Acc=0.5453 ||| Val Loss=0.6344, Val Acc=0.6147
Epoch 2: Train Loss=0.6608, Train Acc=0.6478 ||| Val Loss=0.5443, Val Acc=0.7389
Epoch 3: Train Loss=0.5700, Train Acc=0.7309 ||| Val Loss=0.5240, Val Acc=0.7763
Epoch 4: Train Loss=0.5467, Train Acc=0.7440 ||| Val Loss=0.5059, Val Acc=0.7941
Epoch 5: Train Loss=0.5259, Train Acc=0.7696 ||| Val Loss=0.4952, Val Acc=0.7849
Epoch 6: Train Loss=0.5092, Train Acc=0.7777 ||| Val Loss=0.4889, Val Acc=0.7832
Epoch 7: Train Loss=0.4985, Train Acc=0.7827 ||| Val Loss=0.4793, Val Acc=0.7907
Epoch 8: Train Loss=0.4986, Train Acc=0.7846 ||| Val Loss=0.4638, Val Acc=0.7907
Epoch 9: Train Loss=0.4891, Train Acc=0.7854 ||| Val Loss=0.4758, Val Acc=0.7901
Epoch 10: Train Loss=0.4912, Train Acc=0.7831 ||| Val Loss=0.4891, Val Acc=0.7878
Epoch 11: Train Loss=0.4965, Train Acc=0.7856 ||| Val Loss=0.4744, Val Acc=0.7953
Epoch 12: Train Loss=0.4944, Train Acc=0.7859 ||| Val Loss=0.4751, Val Acc=0.7987
Epoch 13: Train Loss=0.49

[I 2025-04-27 17:24:12,568] Trial 40 finished with value: 0.7929844738355377 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.1345066994362784, 'activation': 'GELU', 'lr': 0.02433477950478476, 'weight_decay': 0.00018530246895455223}. Best is trial 30 with value: 0.8125359401955147.


Epoch 36: Train Loss=0.4815, Train Acc=0.7895 ||| Val Loss=0.4741, Val Acc=0.7930
Early stopping triggered at epoch 36
Validation Accuracy: 0.7930

 Trial 42 with params: {'n_blocks': 4, 'd_block': 128, 'k': 9, 'dropout': 0.17206160024841372, 'activation': 'ReLU', 'lr': 0.00508239905977638, 'weight_decay': 7.436342540113737e-06}


Epoch 1: Train Loss=1.0352, Train Acc=0.6726 ||| Val Loss=0.5231, Val Acc=0.7884
Epoch 2: Train Loss=0.5394, Train Acc=0.7636 ||| Val Loss=0.5198, Val Acc=0.7884
Epoch 3: Train Loss=0.5148, Train Acc=0.7754 ||| Val Loss=0.4968, Val Acc=0.7855
Epoch 4: Train Loss=0.5150, Train Acc=0.7783 ||| Val Loss=0.4955, Val Acc=0.7815
Epoch 5: Train Loss=0.4910, Train Acc=0.7819 ||| Val Loss=0.4741, Val Acc=0.7993
Epoch 6: Train Loss=0.4909, Train Acc=0.7842 ||| Val Loss=0.4650, Val Acc=0.7947
Epoch 7: Train Loss=0.4662, Train Acc=0.7913 ||| Val Loss=0.4518, Val Acc=0.7918
Epoch 8: Train Loss=0.4670, Train Acc=0.7902 ||| Val Loss=0.4396, Val Acc=0.7953
Epoch 9: Train Loss=0.4537, Train Acc=0.7865 ||| Val Loss=0.4276, Val Acc=0.7953
Epoch 10: Train Loss=0.4514, Train Acc=0.7890 ||| Val Loss=0.4284, Val Acc=0.7936
Epoch 11: Train Loss=0.4411, Train Acc=0.7900 ||| Val Loss=0.4442, Val Acc=0.7976
Epoch 12: Train Loss=0.4462, Train Acc=0.7831 ||| Val Loss=0.4264, Val Acc=0.7964
Epoch 13: Train Loss=0.43

[I 2025-04-27 17:24:34,840] Trial 41 finished with value: 0.8021851638872916 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 9, 'dropout': 0.17206160024841372, 'activation': 'ReLU', 'lr': 0.00508239905977638, 'weight_decay': 7.436342540113737e-06}. Best is trial 30 with value: 0.8125359401955147.


Epoch 77: Train Loss=0.3963, Train Acc=0.8135 ||| Val Loss=0.4115, Val Acc=0.8022
Early stopping triggered at epoch 77
Validation Accuracy: 0.8022

 Trial 43 with params: {'n_blocks': 4, 'd_block': 128, 'k': 10, 'dropout': 0.1762118093370013, 'activation': 'ReLU', 'lr': 0.002904932626054875, 'weight_decay': 1.650298134436485e-05}


Epoch 1: Train Loss=0.8907, Train Acc=0.6816 ||| Val Loss=0.5306, Val Acc=0.7918
Epoch 2: Train Loss=0.5367, Train Acc=0.7577 ||| Val Loss=0.5130, Val Acc=0.7815
Epoch 3: Train Loss=0.5265, Train Acc=0.7686 ||| Val Loss=0.5052, Val Acc=0.7913
Epoch 4: Train Loss=0.5159, Train Acc=0.7695 ||| Val Loss=0.4948, Val Acc=0.7924
Epoch 5: Train Loss=0.5109, Train Acc=0.7767 ||| Val Loss=0.5036, Val Acc=0.7947
Epoch 6: Train Loss=0.5013, Train Acc=0.7783 ||| Val Loss=0.4817, Val Acc=0.7941
Epoch 7: Train Loss=0.4958, Train Acc=0.7840 ||| Val Loss=0.4765, Val Acc=0.7936
Epoch 8: Train Loss=0.4878, Train Acc=0.7839 ||| Val Loss=0.4809, Val Acc=0.7924
Epoch 9: Train Loss=0.4859, Train Acc=0.7827 ||| Val Loss=0.4797, Val Acc=0.7884
Epoch 10: Train Loss=0.4819, Train Acc=0.7824 ||| Val Loss=0.4710, Val Acc=0.7901
Epoch 11: Train Loss=0.4782, Train Acc=0.7857 ||| Val Loss=0.4551, Val Acc=0.7976
Epoch 12: Train Loss=0.4775, Train Acc=0.7839 ||| Val Loss=0.4714, Val Acc=0.7987
Epoch 13: Train Loss=0.46

[I 2025-04-27 17:24:52,194] Trial 42 finished with value: 0.7998849913743531 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 10, 'dropout': 0.1762118093370013, 'activation': 'ReLU', 'lr': 0.002904932626054875, 'weight_decay': 1.650298134436485e-05}. Best is trial 30 with value: 0.8125359401955147.


Epoch 58: Train Loss=0.4094, Train Acc=0.8018 ||| Val Loss=0.4175, Val Acc=0.7999
Early stopping triggered at epoch 58
Validation Accuracy: 0.7999

 Trial 44 with params: {'n_blocks': 4, 'd_block': 128, 'k': 8, 'dropout': 0.23337714009542354, 'activation': 'ReLU', 'lr': 0.0010609206170323917, 'weight_decay': 3.2282782714111487e-06}


Epoch 1: Train Loss=0.9321, Train Acc=0.6401 ||| Val Loss=0.5506, Val Acc=0.7188
Epoch 2: Train Loss=0.5858, Train Acc=0.7131 ||| Val Loss=0.5199, Val Acc=0.7832
Epoch 3: Train Loss=0.5566, Train Acc=0.7412 ||| Val Loss=0.5200, Val Acc=0.7930
Epoch 4: Train Loss=0.5380, Train Acc=0.7535 ||| Val Loss=0.5097, Val Acc=0.7913
Epoch 5: Train Loss=0.5209, Train Acc=0.7688 ||| Val Loss=0.4952, Val Acc=0.7970
Epoch 6: Train Loss=0.5186, Train Acc=0.7702 ||| Val Loss=0.4944, Val Acc=0.7941
Epoch 7: Train Loss=0.5065, Train Acc=0.7774 ||| Val Loss=0.4920, Val Acc=0.7941
Epoch 8: Train Loss=0.5097, Train Acc=0.7771 ||| Val Loss=0.4864, Val Acc=0.7941
Epoch 9: Train Loss=0.5039, Train Acc=0.7794 ||| Val Loss=0.4826, Val Acc=0.7947
Epoch 10: Train Loss=0.4949, Train Acc=0.7806 ||| Val Loss=0.4906, Val Acc=0.7890
Epoch 11: Train Loss=0.4904, Train Acc=0.7831 ||| Val Loss=0.4719, Val Acc=0.7964
Epoch 12: Train Loss=0.4876, Train Acc=0.7847 ||| Val Loss=0.4645, Val Acc=0.8005
Epoch 13: Train Loss=0.47

[I 2025-04-27 17:25:19,887] Trial 43 finished with value: 0.7993099482461185 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 8, 'dropout': 0.23337714009542354, 'activation': 'ReLU', 'lr': 0.0010609206170323917, 'weight_decay': 3.2282782714111487e-06}. Best is trial 30 with value: 0.8125359401955147.


Epoch 97: Train Loss=0.4052, Train Acc=0.8026 ||| Val Loss=0.4123, Val Acc=0.7993
Early stopping triggered at epoch 97
Validation Accuracy: 0.7993

 Trial 45 with params: {'n_blocks': 3, 'd_block': 128, 'k': 7, 'dropout': 0.14698338733891037, 'activation': 'ReLU', 'lr': 0.0020416279301639826, 'weight_decay': 5.4653696621886695e-05}


Epoch 1: Train Loss=2.2274, Train Acc=0.6639 ||| Val Loss=0.5437, Val Acc=0.7487
Epoch 2: Train Loss=0.5985, Train Acc=0.7212 ||| Val Loss=0.5296, Val Acc=0.7861
Epoch 3: Train Loss=0.5532, Train Acc=0.7459 ||| Val Loss=0.5162, Val Acc=0.7838
Epoch 4: Train Loss=0.5278, Train Acc=0.7639 ||| Val Loss=0.5099, Val Acc=0.7930
Epoch 5: Train Loss=0.5237, Train Acc=0.7683 ||| Val Loss=0.4954, Val Acc=0.7901
Epoch 6: Train Loss=0.5090, Train Acc=0.7764 ||| Val Loss=0.4965, Val Acc=0.7907
Epoch 7: Train Loss=0.5022, Train Acc=0.7778 ||| Val Loss=0.5013, Val Acc=0.7913
Epoch 8: Train Loss=0.5053, Train Acc=0.7797 ||| Val Loss=0.4901, Val Acc=0.7907
Epoch 9: Train Loss=0.4983, Train Acc=0.7816 ||| Val Loss=0.4899, Val Acc=0.7941
Epoch 10: Train Loss=0.5010, Train Acc=0.7843 ||| Val Loss=0.4951, Val Acc=0.7838
Epoch 11: Train Loss=0.4949, Train Acc=0.7826 ||| Val Loss=0.4771, Val Acc=0.7907
Epoch 12: Train Loss=0.4920, Train Acc=0.7837 ||| Val Loss=0.4764, Val Acc=0.7867
Epoch 13: Train Loss=0.49

[I 2025-04-27 17:25:45,540] Trial 44 finished with value: 0.7952846463484762 and parameters: {'n_blocks': 3, 'd_block': 128, 'k': 7, 'dropout': 0.14698338733891037, 'activation': 'ReLU', 'lr': 0.0020416279301639826, 'weight_decay': 5.4653696621886695e-05}. Best is trial 30 with value: 0.8125359401955147.


Epoch 112: Train Loss=0.4033, Train Acc=0.8017 ||| Val Loss=0.4087, Val Acc=0.7953
Early stopping triggered at epoch 112
Validation Accuracy: 0.7953

 Trial 46 with params: {'n_blocks': 4, 'd_block': 128, 'k': 9, 'dropout': 0.19903942029752023, 'activation': 'ReLU', 'lr': 0.003389970772327613, 'weight_decay': 1.2052973692351311e-05}


Epoch 1: Train Loss=0.9801, Train Acc=0.6763 ||| Val Loss=0.5230, Val Acc=0.7936
Epoch 2: Train Loss=0.5403, Train Acc=0.7580 ||| Val Loss=0.4905, Val Acc=0.7930
Epoch 3: Train Loss=0.5186, Train Acc=0.7729 ||| Val Loss=0.4998, Val Acc=0.7982
Epoch 4: Train Loss=0.5105, Train Acc=0.7752 ||| Val Loss=0.5062, Val Acc=0.7895
Epoch 5: Train Loss=0.4990, Train Acc=0.7826 ||| Val Loss=0.5023, Val Acc=0.7803
Epoch 6: Train Loss=0.4908, Train Acc=0.7837 ||| Val Loss=0.4642, Val Acc=0.7936
Epoch 7: Train Loss=0.4797, Train Acc=0.7846 ||| Val Loss=0.4523, Val Acc=0.7999
Epoch 8: Train Loss=0.4680, Train Acc=0.7896 ||| Val Loss=0.4434, Val Acc=0.8016
Epoch 9: Train Loss=0.4623, Train Acc=0.7888 ||| Val Loss=0.4425, Val Acc=0.7953
Epoch 10: Train Loss=0.4567, Train Acc=0.7886 ||| Val Loss=0.4577, Val Acc=0.7878
Epoch 11: Train Loss=0.4517, Train Acc=0.7872 ||| Val Loss=0.4524, Val Acc=0.7878
Epoch 12: Train Loss=0.4448, Train Acc=0.7873 ||| Val Loss=0.4397, Val Acc=0.7947
Epoch 13: Train Loss=0.44

[I 2025-04-27 17:26:09,025] Trial 45 finished with value: 0.8050603795284647 and parameters: {'n_blocks': 4, 'd_block': 128, 'k': 9, 'dropout': 0.19903942029752023, 'activation': 'ReLU', 'lr': 0.003389970772327613, 'weight_decay': 1.2052973692351311e-05}. Best is trial 30 with value: 0.8125359401955147.


Epoch 81: Train Loss=0.4007, Train Acc=0.8085 ||| Val Loss=0.4102, Val Acc=0.8051
Early stopping triggered at epoch 81
Validation Accuracy: 0.8051

 Trial 47 with params: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.24359651545595912, 'activation': 'ReLU', 'lr': 0.005323721854063274, 'weight_decay': 3.296597682039308e-05}


Epoch 1: Train Loss=0.7175, Train Acc=0.6944 ||| Val Loss=0.5583, Val Acc=0.6929
Epoch 2: Train Loss=0.5412, Train Acc=0.7573 ||| Val Loss=0.5082, Val Acc=0.7872
Epoch 3: Train Loss=0.5206, Train Acc=0.7725 ||| Val Loss=0.5023, Val Acc=0.7844
Epoch 4: Train Loss=0.5221, Train Acc=0.7738 ||| Val Loss=0.4895, Val Acc=0.7861
Epoch 5: Train Loss=0.5046, Train Acc=0.7807 ||| Val Loss=0.5049, Val Acc=0.7895
Epoch 6: Train Loss=0.5063, Train Acc=0.7790 ||| Val Loss=0.4690, Val Acc=0.7970
Epoch 7: Train Loss=0.4952, Train Acc=0.7842 ||| Val Loss=0.4681, Val Acc=0.7947
Epoch 8: Train Loss=0.4938, Train Acc=0.7839 ||| Val Loss=0.4791, Val Acc=0.7901
Epoch 9: Train Loss=0.4828, Train Acc=0.7859 ||| Val Loss=0.4595, Val Acc=0.7953
Epoch 10: Train Loss=0.4855, Train Acc=0.7879 ||| Val Loss=0.4689, Val Acc=0.7987
Epoch 11: Train Loss=0.4773, Train Acc=0.7899 ||| Val Loss=0.4536, Val Acc=0.7976
Epoch 12: Train Loss=0.4734, Train Acc=0.7875 ||| Val Loss=0.4709, Val Acc=0.7918
Epoch 13: Train Loss=0.47

[I 2025-04-27 17:26:42,161] Trial 46 finished with value: 0.8067855089131685 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.24359651545595912, 'activation': 'ReLU', 'lr': 0.005323721854063274, 'weight_decay': 3.296597682039308e-05}. Best is trial 30 with value: 0.8125359401955147.


Epoch 94: Train Loss=0.4091, Train Acc=0.8082 ||| Val Loss=0.4022, Val Acc=0.8068
Early stopping triggered at epoch 94
Validation Accuracy: 0.8068

 Trial 48 with params: {'n_blocks': 5, 'd_block': 512, 'k': 10, 'dropout': 0.27763529693502553, 'activation': 'ReLU', 'lr': 0.005590800833258202, 'weight_decay': 2.9196429704611143e-05}


Epoch 1: Train Loss=2.4735, Train Acc=0.6799 ||| Val Loss=0.5728, Val Acc=0.7780
Epoch 2: Train Loss=0.5665, Train Acc=0.7449 ||| Val Loss=0.5273, Val Acc=0.7798
Epoch 3: Train Loss=0.5511, Train Acc=0.7463 ||| Val Loss=0.5349, Val Acc=0.7970
Epoch 4: Train Loss=0.5372, Train Acc=0.7632 ||| Val Loss=0.5066, Val Acc=0.7872
Epoch 5: Train Loss=0.5364, Train Acc=0.7611 ||| Val Loss=0.5078, Val Acc=0.7775
Epoch 6: Train Loss=0.5224, Train Acc=0.7734 ||| Val Loss=0.4972, Val Acc=0.7895
Epoch 7: Train Loss=0.5323, Train Acc=0.7649 ||| Val Loss=0.5504, Val Acc=0.7826
Epoch 8: Train Loss=0.5138, Train Acc=0.7773 ||| Val Loss=0.5704, Val Acc=0.6722
Epoch 9: Train Loss=0.5049, Train Acc=0.7817 ||| Val Loss=0.4776, Val Acc=0.7982
Epoch 10: Train Loss=0.4983, Train Acc=0.7821 ||| Val Loss=0.4783, Val Acc=0.7913
Epoch 11: Train Loss=0.4919, Train Acc=0.7865 ||| Val Loss=0.4675, Val Acc=0.7987
Epoch 12: Train Loss=0.4932, Train Acc=0.7817 ||| Val Loss=0.4755, Val Acc=0.7861
Epoch 13: Train Loss=0.50

[I 2025-04-27 17:29:17,956] Trial 47 finished with value: 0.7906843013225991 and parameters: {'n_blocks': 5, 'd_block': 512, 'k': 10, 'dropout': 0.27763529693502553, 'activation': 'ReLU', 'lr': 0.005590800833258202, 'weight_decay': 2.9196429704611143e-05}. Best is trial 30 with value: 0.8125359401955147.


Epoch 133: Train Loss=0.4303, Train Acc=0.7908 ||| Val Loss=0.4174, Val Acc=0.7907
Early stopping triggered at epoch 133
Validation Accuracy: 0.7907

 Trial 49 with params: {'n_blocks': 6, 'd_block': 128, 'k': 11, 'dropout': 0.24214223333962978, 'activation': 'GELU', 'lr': 0.0007105600562841395, 'weight_decay': 9.134999940592513e-05}


Epoch 1: Train Loss=0.6737, Train Acc=0.6166 ||| Val Loss=0.5424, Val Acc=0.7677
Epoch 2: Train Loss=0.5479, Train Acc=0.7446 ||| Val Loss=0.5256, Val Acc=0.7987
Epoch 3: Train Loss=0.5301, Train Acc=0.7699 ||| Val Loss=0.5119, Val Acc=0.7987
Epoch 4: Train Loss=0.5195, Train Acc=0.7757 ||| Val Loss=0.4976, Val Acc=0.7867
Epoch 5: Train Loss=0.5082, Train Acc=0.7813 ||| Val Loss=0.4850, Val Acc=0.7982
Epoch 6: Train Loss=0.5070, Train Acc=0.7830 ||| Val Loss=0.4876, Val Acc=0.7959
Epoch 7: Train Loss=0.4988, Train Acc=0.7840 ||| Val Loss=0.4840, Val Acc=0.7930
Epoch 8: Train Loss=0.4984, Train Acc=0.7813 ||| Val Loss=0.4932, Val Acc=0.7982
Epoch 9: Train Loss=0.4926, Train Acc=0.7852 ||| Val Loss=0.4862, Val Acc=0.7930
Epoch 10: Train Loss=0.4936, Train Acc=0.7856 ||| Val Loss=0.4775, Val Acc=0.7947
Epoch 11: Train Loss=0.4918, Train Acc=0.7833 ||| Val Loss=0.4893, Val Acc=0.7907
Epoch 12: Train Loss=0.4849, Train Acc=0.7831 ||| Val Loss=0.4758, Val Acc=0.7953
Epoch 13: Train Loss=0.48

[I 2025-04-27 17:29:57,685] Trial 48 finished with value: 0.7912593444508338 and parameters: {'n_blocks': 6, 'd_block': 128, 'k': 11, 'dropout': 0.24214223333962978, 'activation': 'GELU', 'lr': 0.0007105600562841395, 'weight_decay': 9.134999940592513e-05}. Best is trial 30 with value: 0.8125359401955147.


Epoch 87: Train Loss=0.4329, Train Acc=0.7980 ||| Val Loss=0.4451, Val Acc=0.7913
Early stopping triggered at epoch 87
Validation Accuracy: 0.7913

 Trial 50 with params: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.34994928353452004, 'activation': 'ReLU', 'lr': 0.006338294096152983, 'weight_decay': 0.001276334155072742}


Epoch 1: Train Loss=0.7626, Train Acc=0.6923 ||| Val Loss=0.5103, Val Acc=0.7815
Epoch 2: Train Loss=0.5301, Train Acc=0.7752 ||| Val Loss=0.4980, Val Acc=0.7913
Epoch 3: Train Loss=0.5173, Train Acc=0.7791 ||| Val Loss=0.5021, Val Acc=0.7907
Epoch 4: Train Loss=0.5181, Train Acc=0.7787 ||| Val Loss=0.4901, Val Acc=0.7907
Epoch 5: Train Loss=0.5172, Train Acc=0.7791 ||| Val Loss=0.4970, Val Acc=0.7855
Epoch 6: Train Loss=0.5246, Train Acc=0.7751 ||| Val Loss=0.4902, Val Acc=0.7913
Epoch 7: Train Loss=0.5176, Train Acc=0.7787 ||| Val Loss=0.4977, Val Acc=0.7832
Epoch 8: Train Loss=0.5210, Train Acc=0.7811 ||| Val Loss=0.5178, Val Acc=0.7872
Epoch 9: Train Loss=0.5189, Train Acc=0.7771 ||| Val Loss=0.4955, Val Acc=0.7884
Epoch 10: Train Loss=0.5169, Train Acc=0.7785 ||| Val Loss=0.4996, Val Acc=0.7867
Epoch 11: Train Loss=0.5159, Train Acc=0.7775 ||| Val Loss=0.4863, Val Acc=0.7878
Epoch 12: Train Loss=0.5185, Train Acc=0.7797 ||| Val Loss=0.4898, Val Acc=0.7878
Epoch 13: Train Loss=0.51

[I 2025-04-27 17:30:17,617] Trial 49 finished with value: 0.7924094307073031 and parameters: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.34994928353452004, 'activation': 'ReLU', 'lr': 0.006338294096152983, 'weight_decay': 0.001276334155072742}. Best is trial 30 with value: 0.8125359401955147.


Epoch 59: Train Loss=0.5121, Train Acc=0.7856 ||| Val Loss=0.4849, Val Acc=0.7924
Early stopping triggered at epoch 59
Validation Accuracy: 0.7924

Best Trial Params: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.27284433651928997, 'activation': 'ReLU', 'lr': 0.002029456931526584, 'weight_decay': 3.9033694841500456e-06}
Best Validation Accuracy: 0.8125


In [None]:
# sort the results based on performance and show best parameter combinations
best_params = sorted(results, key=lambda x: x[1], reverse=True)

print("\nTop 5 parameter combinations:") # print the top 5 parameter combinations
for i, (params, acc) in enumerate(best_params[:5], start=1):
    print(f"{i}. Val Accuracy: {acc:.4f} | Params: {params}")


Top 5 parameter combinations:
1. Val Accuracy: 0.8125 | Params: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.27284433651928997, 'activation': 'ReLU', 'lr': 0.002029456931526584, 'weight_decay': 3.9033694841500456e-06}
2. Val Accuracy: 0.8097 | Params: {'n_blocks': 4, 'd_block': 128, 'k': 10, 'dropout': 0.15984630203331762, 'activation': 'ReLU', 'lr': 0.0014769364639454208, 'weight_decay': 9.200748737757358e-06}
3. Val Accuracy: 0.8068 | Params: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.22242292358131166, 'activation': 'ReLU', 'lr': 0.004070611397840372, 'weight_decay': 4.672432428590436e-06}
4. Val Accuracy: 0.8068 | Params: {'n_blocks': 5, 'd_block': 128, 'k': 9, 'dropout': 0.24359651545595912, 'activation': 'ReLU', 'lr': 0.005323721854063274, 'weight_decay': 3.296597682039308e-05}
5. Val Accuracy: 0.8062 | Params: {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.2256010105258364, 'activation': 'ReLU', 'lr': 0.003487671318826522, 'weight_decay': 6.99820250497

## Results

- After 50 trials, the best validation accuracy was 0.8125 (Optuna trial 30). That is on par with, or slightly below, a tuned CatModel and better than most of the ensemble models, but not quite as good as I expected. The model has not yet been trained on the full dataset, which will likely increase its accuracy.

From here, we first check the best parameters with five-fold cross-validation, then retrain the model on the full training set in preparation for the submission, generate predictions on the provided test data, and submit.

In [None]:
from sklearn.model_selection import StratifiedKFold

best = best_params[0][0] # get our best found parameters
print("Our parameters: ", best)

# prepare the full training dataset
X_tensor = torch.tensor(df_X.values, dtype=torch.float32)
Y_tensor = torch.tensor(df_Y.values, dtype=torch.long)

# The cross-validation setup
n_splits = 5 # number of folds
strat_k_fold = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)

cv_scores = [] # validation accuracy of each fold

# the k fold loop
for fold_idx, (train_idx, val_idx) in enumerate(strat_k_fold.split(X_tensor, Y_tensor)):
    print(f"\n ||| Fold number: {fold_idx + 1} / {n_splits} |||")

    # split into this fold's training and validation subsets
    X_train_fold = X_tensor[train_idx]
    y_train_fold = Y_tensor[train_idx]
    X_val_fold = X_tensor[val_idx]
    y_val_fold = Y_tensor[val_idx]

    # create the DataLoaders
    train_dataset = TensorDataset(X_train_fold, y_train_fold)
    val_dataset = TensorDataset(X_val_fold, y_val_fold)

    train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True) # larger batch sizes to combat overfitting
    val_loader = DataLoader(val_dataset, batch_size=256, shuffle=False)

    # train the model for this fold (run_tabm reads the global train_loader set above)
    fold_model = run_tabm(**best, n_epochs=400)

    # evaluate on this fold's validation set
    fold_model.eval()  # turn off dropout, etc.

    val_preds = [] # predictions
    val_targets = [] # true labels

    with torch.no_grad():  # disables gradient tracking
        for xb, yb in val_loader:  # this fold's validation loader
            outputs = fold_model(xb, None)
            preds = outputs.mean(dim=1).argmax(dim=1) # generate final prediction from the k experts
            val_preds.append(preds)
            val_targets.append(yb)

    val_preds = torch.cat(val_preds)
    val_targets = torch.cat(val_targets)

    # calculate this fold's validation accuracy
    val_acc = accuracy_score(val_targets.cpu(), val_preds.cpu())
    print(f"Fold {fold_idx + 1} Validation Accuracy: {val_acc:.4f}")
    cv_scores.append(val_acc)


print("\n--- Cross-Validation Results ---")
print(f"Mean Validation Accuracy: {sum(cv_scores)/len(cv_scores):.4f}")
print(f"Validation Accuracies for each fold: {cv_scores}")

Our parameters:  {'n_blocks': 5, 'd_block': 128, 'k': 10, 'dropout': 0.27284433651928997, 'activation': 'ReLU', 'lr': 0.002029456931526584, 'weight_decay': 3.9033694841500456e-06}

 ||| Fold number: 1 / 5 |||
Epoch 1: Train Loss=0.9427, Train Acc=0.5552 ||| Val Loss=0.6188, Val Acc=0.7694
Epoch 2: Train Loss=0.6125, Train Acc=0.6762 ||| Val Loss=0.5430, Val Acc=0.7499
Epoch 3: Train Loss=0.5556, Train Acc=0.7327 ||| Val Loss=0.5254, Val Acc=0.7924
Epoch 4: Train Loss=0.5386, Train Acc=0.7498 ||| Val Loss=0.5030, Val Acc=0.7964
Epoch 5: Train Loss=0.5270, Train Acc=0.7529 ||| Val Loss=0.5024, Val Acc=0.7918
Epoch 6: Train Loss=0.5208, Train Acc=0.7714 ||| Val Loss=0.4957, Val Acc=0.7999
Epoch 7: Train Loss=0.5182, Train Acc=0.7741 ||| Val Loss=0.4906, Val Acc=0.7959
Epoch 8: Train Loss=0.5059, Train Acc=0.7819 ||| Val Loss=0.4880, Val Acc=0.7970
Epoch 9: Train Loss=0.5072, Train Acc=0.7787 ||| Val Loss=0.5041, Val Acc=0.7941
Epoch 10: Train Loss=0.5073, Train Acc=0.7788 ||| Val Loss=0.4
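Both the cross-validation loop and the final evaluation turn the model output, which the reduction implies has shape `(batch, k, n_classes)`, into one label per row by averaging the k members' logits and taking the argmax. Averaging softmax probabilities instead is a common alternative, and the two can disagree when one member is very confident. A minimal sketch with a hand-made tensor standing in for real model output; `ensemble_predict` is a hypothetical helper, not part of the TabM reference code.

```python
import torch


def ensemble_predict(outputs: torch.Tensor, average: str = "logits") -> torch.Tensor:
    """Collapse (batch, k, n_classes) ensemble outputs into one class label per row."""
    if outputs.dim() != 3:
        raise ValueError("expected a tensor of shape (batch, k, n_classes)")
    if average == "logits":
        scores = outputs.mean(dim=1)  # same reduction as the notebook: mean of raw logits
    elif average == "probs":
        scores = torch.softmax(outputs, dim=-1).mean(dim=1)  # mean of per-member probabilities
    else:
        raise ValueError("average must be 'logits' or 'probs'")
    return scores.argmax(dim=1)


# one row, k = 3 members, 2 classes: two members mildly prefer class 0, one strongly prefers class 1
outputs = torch.tensor([[[2.0, 0.0], [2.0, 0.0], [0.0, 10.0]]])
print(ensemble_predict(outputs, "logits"))  # tensor([1])
print(ensemble_predict(outputs, "probs"))   # tensor([0])
```

Which reduction works better on this dataset is an empirical question; the sketch only shows that the choice can change individual predictions.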

In [18]:
print("\n ||| Training final model on full dataset |||")

# build a loader over the full training dataset
full_dataset = TensorDataset(X_tensor, Y_tensor)
train_loader = DataLoader(full_dataset, batch_size=256, shuffle=True) # run_tabm reads this global train_loader

# training final model
final_model = run_tabm(**best, n_epochs=400)

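# evaluate the final model. NOTE: val_loader is left over from the last CV fold, and every
# row in it is also in full_dataset, so this accuracy is measured on training data (optimistic)
final_model.eval()  # turn off dropout, etc.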

val_preds = [] # predictions
val_targets = [] # true labels

with torch.no_grad():  # disables gradient tracking
    for xb, yb in val_loader:  # last CV fold's loader; these rows are also in full_dataset
        outputs = final_model(xb, None)
        preds = outputs.mean(dim=1).argmax(dim=1)
        val_preds.append(preds)
        val_targets.append(yb)

val_preds = torch.cat(val_preds)
val_targets = torch.cat(val_targets)

# calculate final accuracy on val_loader (its rows were also used for training, so this is in-sample)
final_val_acc = accuracy_score(val_targets.cpu(), val_preds.cpu())
print(f"Final Validation Accuracy: {final_val_acc:.4f}")


 ||| Training final model on full dataset |||
Epoch 1: Train Loss=0.9117, Train Acc=0.5651 ||| Val Loss=0.6034, Val Acc=0.7468
Epoch 2: Train Loss=0.5901, Train Acc=0.6963 ||| Val Loss=0.5344, Val Acc=0.7589
Epoch 3: Train Loss=0.5538, Train Acc=0.7438 ||| Val Loss=0.5098, Val Acc=0.7779
Epoch 4: Train Loss=0.5356, Train Acc=0.7611 ||| Val Loss=0.4949, Val Acc=0.7722
Epoch 5: Train Loss=0.5207, Train Acc=0.7700 ||| Val Loss=0.4986, Val Acc=0.7768
Epoch 6: Train Loss=0.5130, Train Acc=0.7741 ||| Val Loss=0.4950, Val Acc=0.7791
Epoch 7: Train Loss=0.5068, Train Acc=0.7790 ||| Val Loss=0.4861, Val Acc=0.7756
Epoch 8: Train Loss=0.5034, Train Acc=0.7798 ||| Val Loss=0.4840, Val Acc=0.7762
Epoch 9: Train Loss=0.4945, Train Acc=0.7858 ||| Val Loss=0.4785, Val Acc=0.7768
Epoch 10: Train Loss=0.4998, Train Acc=0.7822 ||| Val Loss=0.4887, Val Acc=0.7831
Epoch 11: Train Loss=0.4923, Train Acc=0.7852 ||| Val Loss=0.4795, Val Acc=0.7779
Epoch 12: Train Loss=0.4833, Train Acc=0.7873 ||| Val Loss=0
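The final-training cell evaluates on `val_loader`, whose rows all come from `df_X` and are therefore part of `full_dataset`. A minimal sketch that makes this kind of leakage measurable: `overlap_fraction` (a hypothetical helper) reports what share of evaluation rows were also training rows. The labels are synthetic stand-ins for `df_Y`; only the row count (8693) and the `StratifiedKFold` settings come from this notebook.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def overlap_fraction(train_idx, eval_idx) -> float:
    """Share of evaluation rows that were also used for training (0.0 means no leakage)."""
    eval_idx = np.asarray(eval_idx)
    if eval_idx.size == 0:
        raise ValueError("eval_idx is empty")
    return float(np.isin(eval_idx, train_idx).mean())


n_rows = 8693                    # rows in df_X
y = np.arange(n_rows) % 2        # synthetic labels standing in for df_Y
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
*_, (last_train_idx, last_val_idx) = skf.split(np.zeros((n_rows, 1)), y)

print(overlap_fraction(last_train_idx, last_val_idx))     # 0.0: a CV fold never trains on its own validation rows
print(overlap_fraction(np.arange(n_rows), last_val_idx))  # 1.0: the full-data model has seen every one of them
```

Because the full-data model has no untouched rows left, the cross-validation mean is the more honest estimate of its accuracy.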

#### Making the submission

In [19]:

# convert the test set to a tensor so it can be fed into the model
test_x_tensor = torch.tensor(test.values, dtype=torch.float32)

# Predicting
final_model.eval()
with torch.no_grad():
    test_outputs = final_model(test_x_tensor, None)  # logits from each of the k ensemble members: (n_rows, k, n_classes)
    test_preds = test_outputs.mean(dim=1).argmax(dim=1)  # average over the k members, then pick the most likely class

# recover the PassengerIds (dropped during preprocessing); assumes processed_test.csv keeps the row order of test.csv
test_ids = pd.read_csv('data/test.csv')['PassengerId']

# set up the submission df
final_submission_tabm = pd.DataFrame({
    'PassengerId': test_ids,
    'Transported': test_preds.cpu().numpy().astype(bool)  # convert predictions to bool True/False
})

# save the submission df to CSV, ready for Kaggle
final_submission_tabm.to_csv('submissions/final_submission_tabm.csv', index=False)
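Before uploading, a few cheap checks catch the most common submission mistakes: wrong columns, missing or duplicated passengers, and non-boolean labels. A minimal sketch; `check_submission` is a hypothetical helper, and the expected format (a `PassengerId` column plus a boolean `Transported` column, one row per test passenger) is taken from how the cell above builds the file.

```python
import pandas as pd


def check_submission(sub: pd.DataFrame, test_ids: pd.Series) -> None:
    """Raise ValueError if `sub` does not look like a valid Spaceship Titanic submission."""
    if list(sub.columns) != ["PassengerId", "Transported"]:
        raise ValueError(f"unexpected columns: {list(sub.columns)}")
    if len(sub) != len(test_ids):
        raise ValueError(f"expected {len(test_ids)} rows, got {len(sub)}")
    if sub["PassengerId"].duplicated().any():
        raise ValueError("duplicate PassengerId values")
    if set(sub["PassengerId"]) != set(test_ids):
        raise ValueError("PassengerId values do not match the test set")
    if not pd.api.types.is_bool_dtype(sub["Transported"]):
        raise ValueError("Transported must be boolean True/False")
```

In the notebook this would be called as `check_submission(final_submission_tabm, test_ids)` right before `to_csv`.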

## Final Result

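- After training on the full dataset, the final model reported a "validation" accuracy of **0.8470**. This is not a held-out score: `val_loader` still holds rows of `df_X` (the last cross-validation fold), and all of them were part of the full training set, so the figure is optimistic.
- The final Kaggle leaderboard score for the TABM model was **0.79541**, well below that in-sample figure, indicating signs of overfitting to the training data.
- The five-fold cross-validation gave a mean validation accuracy of **0.7927**, close to the leaderboard score, suggesting that the model's true generalization performance is lower than the single validation split (0.8125) suggested.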
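Reporting the spread of the fold scores alongside their mean shows how much of the gap between a single split and the cross-validation mean could be split-to-split noise. A minimal sketch using only the standard library; `summarize_cv` and `within_spread` are hypothetical helpers meant to be called on the notebook's `cv_scores` list.

```python
import statistics


def summarize_cv(scores: list[float]) -> tuple[float, float]:
    """Return the mean and sample standard deviation of per-fold scores."""
    if len(scores) < 2:
        raise ValueError("need at least two folds to estimate a spread")
    return statistics.mean(scores), statistics.stdev(scores)


def within_spread(value: float, scores: list[float], n_std: float = 2.0) -> bool:
    """True if `value` lies within n_std sample standard deviations of the fold mean."""
    mean, std = summarize_cv(scores)
    return abs(value - mean) <= n_std * std
```

In the notebook, `summarize_cv(cv_scores)` gives the mean and spread of the five folds, and `within_spread(0.8125, cv_scores)` checks whether the tuned single-split score is plausibly just a lucky split.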

I hypothesize that the model's complexity, and the high capacity of MLP architectures in general, made it prone to overfitting given the relatively small size of the Spaceship Titanic dataset. Despite the regularization strategies used (dropout, weight decay, batch normalization), the model likely still overfit subtle patterns in the training data that did not generalize to unseen test data.
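One way to test the overfitting hypothesis rather than assume it is to track the gap between training and validation accuracy across epochs. A minimal sketch that parses the per-epoch lines printed during training; the regular expression assumes exactly the `Epoch N: Train Loss=x, Train Acc=x ||| Val Loss=x, Val Acc=x` format shown in the logs, and truncated lines are skipped. `parse_epoch_line` and `accuracy_gaps` are hypothetical helpers.

```python
import re

# matches one complete epoch line in the log format used by this notebook
EPOCH_RE = re.compile(
    r"Epoch (\d+): Train Loss=([\d.]+), Train Acc=([\d.]+) "
    r"\|\|\| Val Loss=([\d.]+), Val Acc=([\d.]+)"
)


def parse_epoch_line(line: str):
    """Return a dict of metrics for a complete epoch log line, or None if it does not match."""
    m = EPOCH_RE.fullmatch(line.strip())
    if m is None:
        return None
    epoch, tr_loss, tr_acc, va_loss, va_acc = m.groups()
    return {"epoch": int(epoch), "train_loss": float(tr_loss), "train_acc": float(tr_acc),
            "val_loss": float(va_loss), "val_acc": float(va_acc)}


def accuracy_gaps(log_text: str) -> list[tuple[int, float]]:
    """(epoch, train_acc - val_acc) for every complete epoch line in log_text."""
    rows = (parse_epoch_line(line) for line in log_text.splitlines())
    return [(r["epoch"], r["train_acc"] - r["val_acc"]) for r in rows if r is not None]


log = """Epoch 81: Train Loss=0.4007, Train Acc=0.8085 ||| Val Loss=0.4102, Val Acc=0.8051
Epoch 13: Train Loss=0.44
Early stopping triggered at epoch 81"""
for epoch, gap in accuracy_gaps(log):
    print(epoch, round(gap, 4))
```

A gap that grows steadily over epochs supports the overfitting explanation; a flat, near-zero gap points instead toward the tuning and evaluation setup as the source of the optimistic scores.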
