# Optimizing hyperparameters for ElasticNet, ExtraTrees and XGBoost with Optuna

**Datasets:**
1. Credit: https://www.openml.org/search?type=data&sort=runs&id=31&status=active&fbclid=IwZXh0bgNhZW0CMTEAAR0LEBUzDmkBWU5CJnI93mAsYUDFilVrhmiXRyS_JDMNZmcdzgBLqPhzlh4_aem_6sDeQnXFdTyQdDh1LyZAxQ
2. Diabetes: https://www.openml.org/search?type=data&sort=runs&id=37&status=active&fbclid=IwZXh0bgNhZW0CMTEAAR0I4wZuCTTESyFUjUcDwJs35bTnJs7bqnIEmlVZbL-6fxHbKLMW7L_Ly4A_aem_WFz0zwQx3GFdSdlD3UQszg
3. Blood transfusion: https://www.openml.org/search?type=data&sort=runs&id=1464&status=active&fbclid=IwZXh0bgNhZW0CMTEAAR2sNl5JdGtoyl5fXCGpQXSjATOnHyswHo99zDt9THChDRmqDi2RgKr2Fa8_aem_2wGmFakw-Bj8BUXn4HDt8g
4. Wine: https://www.openml.org/search?type=data&sort=qualities.NumberOfFeatures&status=active&qualities.NumberOfClasses=%3D_2&qualities.NumberOfFeatures=between_10_100&order=asc&qualities.NumberOfInstances=between_1000_10000&id=43980

**ML algorithms:**

1. Elastic Net: C, l1_ratio, penalty
2. Extra Trees: n_estimators, max_depth, min_samples_split, min_samples_leaf, max_features, criterion
3. XGBoost: n_estimators, max_depth, learning_rate, booster, gamma, subsample, reg_alpha, reg_lambda


**Optimization methods:**

A) For all datasets jointly, to compute $\theta^*$:
- Random Search
- Optionally: Bayesian search with aggregation inside the Optuna `objective(trial)` function

B) For each dataset separately, to compute the tunability of the ML algorithms:
- Random Search
- Bayesian search based on Gaussian processes
- Tree-structured Parzen Estimator


**Objective functions:**
- AUC
- Accuracy
- F1 Score
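
To make the three objective functions concrete, here is a minimal pure-Python sketch for binary labels. It is illustrative only: the notebook itself imports `roc_auc_score`, `accuracy_score` and `f1_score` from scikit-learn, and the helper names below are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # hard labels at a 0.5 threshold

print(accuracy(y_true, y_pred), f1(y_true, y_pred), auc(y_true, scores))
```

Accuracy and F1 depend on the chosen threshold, while AUC only depends on how the scores rank the samples. A model that assigns the same score to every sample gets an AUC of exactly 0.5.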

## Data loading and preprocessing

In [27]:
import math

from utils import config
from utils.save_results import (
    save_study_to_pickle_joint,
    save_study_to_pickle_marginal,
    save_best_params_to_json_joint,
    save_best_params_to_json_marginal,
    read_study_from_pickle_joint,
    read_study_from_pickle_marginal,
    read_best_params_from_json_joint,
    read_best_params_from_json_marginal,
)

import utils
import os
import json
import logging
import pickle
import pandas as pd
import numpy as np
import optuna
from sklearn.metrics import roc_auc_score, accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import ExtraTreesClassifier
from xgboost import XGBClassifier
import warnings

warnings.filterwarnings("ignore")
logger = logging.getLogger(__name__)
logger.setLevel(logging.WARN)
optuna.logging.set_verbosity(optuna.logging.INFO)

SEED = config.RANDOM_SEED
SEEDS = config.RANDOM_SEEDS
DATASET_IDS = config.DATASET_IDS
MODELS = config.MODELS
SAMPLERS = config.SAMPLERS
SKF = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
preprocessing = utils.preprocessing

results_dir = os.path.join(os.getcwd(), 'results')
results_studies_dir = os.path.join(results_dir, 'studies')
os.makedirs(results_studies_dir, exist_ok=True)
results_bestparams_dir = os.path.join(results_dir, 'best_params')
os.makedirs(results_bestparams_dir, exist_ok=True)

In [28]:
Xs, ys = utils.get_data(DATASET_IDS)

2024-11-15 01:33:54,179 - openml.datasets.dataset - INFO - pickle write credit-g


2024-11-15 01:33:54,183 - openml.datasets.dataset - DEBUG - Saved dataset 31: credit-g to file /Users/hubert/.openml/org/openml/www/datasets/31/dataset_31.pkl.py3
2024-11-15 01:33:54,204 - openml.datasets.dataset - INFO - pickle write diabetes
2024-11-15 01:33:54,205 - openml.datasets.dataset - DEBUG - Saved dataset 37: diabetes to file /Users/hubert/.openml/org/openml/www/datasets/37/dataset_37.pkl.py3
2024-11-15 01:33:54,213 - openml.datasets.dataset - INFO - pickle write shrutime
2024-11-15 01:33:54,216 - openml.datasets.dataset - DEBUG - Saved dataset 45062: shrutime to file /Users/hubert/.openml/org/openml/www/datasets/45062/dataset_45062.pkl.py3
2024-11-15 01:33:54,224 - openml.datasets.dataset - INFO - pickle write wine
2024-11-15 01:33:54,225 - openml.datasets.dataset - DEBUG - Saved dataset 43980: wine to file /Users/hubert/.openml/org/openml/www/datasets/43980/dataset_43980.pkl.py3


Dataset ID=31, shape: (1000, 21), 0 missing values
Dataset ID=37, shape: (768, 9), 0 missing values
Dataset ID=45062, shape: (10000, 11), 0 missing values
Dataset ID=43980, shape: (2554, 12), 0 missing values


In [29]:
Xs = [pd.DataFrame(preprocessing.fit_transform(X)) for X in Xs]
logger.info('Preprocessing done')

## Key function definitions

In [30]:
def run_experiment_joint(model_class, sampler_class, Xs, ys, param_space, results_dir, n_trials=10):
    """
    Run the optimization process for all datasets simultaneously, given the model_class and a sampler.
    The result of this function (after analyzing the results) is a single set of hyperparameters 
    that are optimal for all datasets.
    
    Args:
        model_class: class of the model to be optimized
        sampler_class: class of the sampler to be used
        Xs: list of Xs of the datasets
        ys: list of ys of the datasets
        param_space: dictionary of hyperparameters and their ranges
        results_dir: path to the directory where the results will be saved
        n_trials: number of trials for the optimization process
        
    Dependencies:
        run_study: function that runs optuna optimization
    """
    
    sampler_name = sampler_class.__name__
    model_name = model_class.__name__
    
    logger.info(f"Searching hyperparameter space with {sampler_name} for {model_name}")
    path_joint = os.path.join(results_dir, f"{model_name}_{sampler_name}_joint.csv")

    logger.info("Optimizing hyperparameters for all datasets simultaneously to calculate optimal defaults")
    trials, study = run_study(objective, model_class, sampler_class, Xs, ys, param_space, n_trials=n_trials)
    trials.to_csv(path_joint)
    logger.info(f"Results saved to {path_joint}")
    
    return trials, study

In [31]:
def run_experiment_marginal(model_class, sampler_class, X, y, ID, param_space, results_dir, n_trials=10):
    """
    Run the optimization process for a single dataset, given the model_class and a sampler.
    
    Args:
        model_class: class of the model to be optimized
        sampler_class: class of the sampler to be used
        X: X of the dataset
        y: y of the dataset
        ID: id of the dataset being used for this particular experiment (id of (X,y))
        param_space: dictionary of hyperparameters and their ranges
        results_dir: path to the directory where the results will be saved
        n_trials: number of trials for the optimization process
    """
    
    sampler_name = sampler_class.__name__
    model_name = model_class.__name__
    
    logger.info(f"Searching hyperparameter space with {sampler_name} for {model_name}")
    path_marginal = os.path.join(results_dir, f"{model_name}_{sampler_name}_marginal_{ID}.csv")

    logger.info(f"Optimizing hyperparameters for dataset separately, dataset {ID}")
    trials, study = run_study(objective, model_class, sampler_class, [X], [y], param_space, n_trials=n_trials)
    trials['dataset'] = ID

    trials.to_csv(path_marginal)
    logger.info(f"Results saved to {path_marginal}")
    
    return trials, study

In [32]:
def objective(trial, model_class, Xs, ys, param_space):
    """
    The most important function for optimization with optuna. It is invoked in every iteration
    of the optimization process when optuna.study.optimize() is called. In mathematical terms,
    this is the objective function that the algorithm is trying to optimize (maximize in this case).
    
    Args:
        trial: optuna.trial.Trial
        model_class: class of the model to be optimized
        Xs: list of Xs of the datasets
        ys: list of ys of the datasets
        param_space: dictionary of hyperparameters and their ranges
        
    Dependencies:
        utils.sample_parameter: function that samples a value from the range of a hyperparameter,
                                decoding the range configured in utils.config file
    
    Returns:
        float: mean ROC AUC over all datasets for the sampled hyperparameters,
               where each dataset's score is the mean over the stratified CV folds
    
    """
    param = {param_name: utils.study_utils.sample_parameter(trial, param_name, value) for param_name, value in param_space.items()}
    auc_scores = []

    for X, y in zip(Xs, ys):
        auc_split = []

        for train_idx, val_idx in SKF.split(X, y):
            X_train = X.iloc[train_idx]
            X_val = X.iloc[val_idx]
            y_train, y_val = y[train_idx], y[val_idx]
            model = model_class(**param)
            model.fit(X_train, y_train)

            # Prefer class-1 probabilities for AUC; fall back to hard labels otherwise.
            if hasattr(model, 'predict_proba'):
                y_pred_prob = model.predict_proba(X_val)[:, 1]
            else:
                y_pred_prob = model.predict(X_val)

            auc_split.append(roc_auc_score(y_val, y_pred_prob))

        auc_scores.append(np.mean(np.array(auc_split)))

    return np.mean(auc_scores)
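
The objective delegates sampling to `utils.study_utils.sample_parameter`, whose implementation and the encoding of `param_space` are not shown in this notebook. The following is a hypothetical sketch of such a decoder, assuming each entry is either a fixed value, a list of categorical choices, or a `(low, high, log)` tuple. It relies only on the `suggest_categorical`, `suggest_int` and `suggest_float` methods of an Optuna trial; `StubTrial` is a made-up stand-in so the snippet runs without Optuna.

```python
def sample_parameter(trial, name, spec):
    """Hypothetical decoder: map one param_space entry to an Optuna suggest_* call."""
    if isinstance(spec, list):                      # categorical choices
        return trial.suggest_categorical(name, spec)
    if isinstance(spec, tuple):                     # numeric range: (low, high, log)
        low, high, log = spec
        if isinstance(low, int) and isinstance(high, int):
            return trial.suggest_int(name, low, high, log=log)
        return trial.suggest_float(name, low, high, log=log)
    return spec                                     # fixed value, not tuned


class StubTrial:
    """Stand-in for optuna.trial.Trial: returns the lower bound or the first choice."""

    def suggest_categorical(self, name, choices):
        return choices[0]

    def suggest_int(self, name, low, high, log=False):
        return low

    def suggest_float(self, name, low, high, log=False):
        return low


# Assumed encoding; the ranges mirror the distributions printed in the logs further down.
param_space = {
    "C": (1e-4, 1e4, True),
    "l1_ratio": (1e-4, 1.0, True),
    "penalty": ["elasticnet"],
    "max_iter": 1500,
}
params = {k: sample_parameter(StubTrial(), k, v) for k, v in param_space.items()}
print(params)
```

With a real study, the same function would receive the `trial` object that Optuna passes to the objective, and the sampler would decide which value each `suggest_*` call returns.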

In [36]:
from utils import study_utils

def run_study(objective, model_class, sampler_class, Xs, ys, param_space, n_trials):
    """
    Function that runs optuna optimization. It instantiates the sampler and starts a study
    for given model_class with param_space and for given datasets Xs and ys.
    
    Args:
        objective: the objective function to be optimized
        model_class: class of the model to be optimized
        sampler_class: class of the sampler to be used
        Xs: list of Xs of the datasets
        ys: list of ys of the datasets
        param_space: dictionary of hyperparameters and their ranges
        n_trials: number of trials for the optimization process
        
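    Dependencies:
        objective: function that is invoked in every iteration of the optimization process
        get_trials: function that returns the trials (iterations) of the study

    Returns:
        trials: the trials of the merged study, as returned by utils.study_utils.get_trials
        main_study: optuna study holding the trials collected over all seeds
    """
    # seeds = [SEED] if sampler_class.__name__ == 'RandomSampler' else SEEDS
    seeds = [SEED]
    # Split the trial budget evenly across seeds, rounding up.
    n_trials = math.ceil(n_trials/len(seeds))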
    main_study = optuna.create_study(direction="maximize")
    
    for seed in seeds:
        sampler = sampler_class(seed=seed)
        study = optuna.create_study(direction="maximize", sampler=sampler)
        study.optimize(lambda trial: objective(trial, model_class, Xs, ys, param_space), n_trials=n_trials)
        main_study.add_trials(study.get_trials())
    
    print(main_study.best_trials)
    trials = utils.study_utils.get_trials(main_study)
    
    return trials, main_study
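
`run_study` divides the trial budget across seeds with `math.ceil`, so when the budget is not divisible by the number of seeds, the total number of trials slightly exceeds the requested one. A small sketch of that arithmetic (the helper name is hypothetical):

```python
import math


def trials_per_seed(n_trials, n_seeds):
    """Per-seed budget when a total budget is split across several sampler seeds."""
    return math.ceil(n_trials / n_seeds)


for total, n_seeds in [(300, 1), (300, 3), (10, 3)]:
    per_seed = trials_per_seed(total, n_seeds)
    print(f"requested={total} seeds={n_seeds} per_seed={per_seed} actual={per_seed * n_seeds}")
```

With the current setting `seeds = [SEED]` there is a single seed, so the split leaves the budget unchanged.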

## Calculating $\theta^*$ - optimal defaults
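
The optimal defaults can be written compactly as follows, assuming $m$ datasets and writing $\mathrm{AUC}^{(j)}(\theta)$ for the cross-validated AUC of configuration $\theta$ on dataset $j$. These symbols are introduced here for illustration; the averaging over datasets matches what `objective` returns.

```latex
\theta^* = \arg\max_{\theta \in \Theta} \; \frac{1}{m} \sum_{j=1}^{m} \mathrm{AUC}^{(j)}(\theta)
```

In practice the maximum is taken only over the configurations that the sampler actually tried, so the reported $\theta^*$ is the best trial of the joint study.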

### Logistic Regression - $\theta^*$
#### RS

In [None]:
model = MODELS[0][0]
param_grid = MODELS[0][1]
sampler = SAMPLERS[0]

lr_joint_trials, lr_joint_study = run_experiment_joint(model, sampler, Xs, ys, param_grid, results_dir, n_trials=300)

lr_best_params_joint = lr_joint_study.best_trials[0].params
print("Best params for Logistic Regression ")
for param_name, value in lr_best_params_joint.items():
    print(param_name, ": ", value)

save_study_to_pickle_joint(lr_joint_study, 'lr_rs')
save_best_params_to_json_joint(lr_best_params_joint, 'lr_rs')

[I 2024-11-15 01:54:41,901] A new study created in memory with name: no-name-a9bb5b65-b725-4173-af30-6c7813461e2d
[I 2024-11-15 01:54:41,920] A new study created in memory with name: no-name-96bf1db9-4c41-4fc3-98bd-ffb692c909d2
[I 2024-11-15 01:54:42,347] Trial 0 finished with value: 0.8127931691379152 and parameters: {'C': 0.09915644566638401, 'penalty': 'elasticnet', 'l1_ratio': 0.6351221010640696, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 0 with value: 0.8127931691379152.
[I 2024-11-15 01:54:46,431] Trial 1 finished with value: 0.8137264202663816 and parameters: {'C': 71.77141927992021, 'penalty': 'elasticnet', 'l1_ratio': 0.024810409748678097, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 1 with value: 0.8137264202663816.
[I 2024-11-15 01:54:46,918] Trial 2 finished with value: 0.7933168868858801 and parameters: {'C': 0.0017707168643537846, 'penalty': 'elasticnet', 'l1_ratio': 0.0004207053950287938, 'class_weight

[FrozenTrial(number=9, state=TrialState.COMPLETE, values=[0.8159293896091279], datetime_start=datetime.datetime(2024, 11, 15, 1, 54, 57, 144921), datetime_complete=datetime.datetime(2024, 11, 15, 1, 54, 57, 620265), params={'C': 0.2854697857797185, 'penalty': 'elasticnet', 'l1_ratio': 0.0014618962793704966, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'C': FloatDistribution(high=10000.0, log=True, low=0.0001, step=None), 'penalty': CategoricalDistribution(choices=('elasticnet',)), 'l1_ratio': FloatDistribution(high=1.0, log=True, low=0.0001, step=None), 'class_weight': CategoricalDistribution(choices=('balanced',)), 'max_iter': IntDistribution(high=1500, log=False, low=1500, step=1), 'solver': CategoricalDistribution(choices=('saga',))}, trial_id=9, value=None)]
Best params for Logistic Regression 
C :  0.2854697857797185
penalty :  elasticnet
l1_ratio :  0.0014618962793704966
class_weight :  bal

#### BS

In [None]:
model = MODELS[0][0]
param_grid = MODELS[0][1]
sampler = SAMPLERS[1]

lr_joint_trials_bs, lr_joint_study_bs = run_experiment_joint(model, sampler, Xs, ys, param_grid, results_dir, n_trials=150)

lr_best_params_joint_bs = lr_joint_study_bs.best_trials[0].params
print("Best params for Logistic Regression ")
for param_name, value in lr_best_params_joint_bs.items():
    print(param_name, ": ", value)

save_study_to_pickle_joint(lr_joint_study_bs, 'lr_tpe')
save_best_params_to_json_joint(lr_best_params_joint_bs, 'lr_tpe')

[I 2024-11-15 01:56:55,734] A new study created in memory with name: no-name-f854c53b-2679-433c-9d3e-fe81c9f49d3a
[I 2024-11-15 01:56:55,741] A new study created in memory with name: no-name-c46d9fc3-c260-4aed-9725-786ece895b4f
[I 2024-11-15 01:56:56,118] Trial 0 finished with value: 0.8127854164971388 and parameters: {'C': 0.09915644566638401, 'penalty': 'elasticnet', 'l1_ratio': 0.6351221010640696, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 0 with value: 0.8127854164971388.
[I 2024-11-15 01:57:00,180] Trial 1 finished with value: 0.8137162203727448 and parameters: {'C': 71.77141927992021, 'penalty': 'elasticnet', 'l1_ratio': 0.024810409748678097, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 1 with value: 0.8137162203727448.
[I 2024-11-15 01:57:00,455] Trial 2 finished with value: 0.7933232970047078 and parameters: {'C': 0.0017707168643537846, 'penalty': 'elasticnet', 'l1_ratio': 0.0004207053950287938, 'class_weight

[FrozenTrial(number=9, state=TrialState.COMPLETE, values=[0.8159482164618372], datetime_start=datetime.datetime(2024, 11, 15, 1, 57, 10, 550240), datetime_complete=datetime.datetime(2024, 11, 15, 1, 57, 11, 49140), params={'C': 0.2854697857797185, 'penalty': 'elasticnet', 'l1_ratio': 0.0014618962793704966, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'C': FloatDistribution(high=10000.0, log=True, low=0.0001, step=None), 'penalty': CategoricalDistribution(choices=('elasticnet',)), 'l1_ratio': FloatDistribution(high=1.0, log=True, low=0.0001, step=None), 'class_weight': CategoricalDistribution(choices=('balanced',)), 'max_iter': IntDistribution(high=1500, log=False, low=1500, step=1), 'solver': CategoricalDistribution(choices=('saga',))}, trial_id=9, value=None)]


AttributeError: 'list' object has no attribute 'params'

### ExtraTrees - $\theta^*$

In [None]:
model = MODELS[1][0]
param_grid = MODELS[1][1]
sampler = SAMPLERS[0]

# Run the optimization process for all datasets simultaneously
et_joint_trials, et_joint_study = run_experiment_joint(model, sampler, Xs, ys, param_grid, results_dir, n_trials=300)

# Get best default hyperparameters
et_best_params_joint = et_joint_study.best_trials[0].params
print("Best params for Extra Trees ")
for param_name, value in et_best_params_joint.items():
    print(param_name, ": ", value)

# Save the study and best parameters to pickle files
save_study_to_pickle_joint(et_joint_study, 'et_rs')
save_best_params_to_json_joint(et_best_params_joint, 'et_rs')

2024-11-14 21:00:31,003 - __main__ - INFO - Searching hyperparameter space with RandomSampler for ExtraTreesClassifier
2024-11-14 21:00:31,005 - __main__ - INFO - Optimizing hyperparameters for all datasets simultaneously to calculate optimal defaults
[I 2024-11-14 21:00:31,005] A new study created in memory with name: no-name-d17ab06d-dc96-4b67-8c16-f2a446720461
[I 2024-11-14 21:00:31,007] A new study created in memory with name: no-name-b3df6857-1553-4bd6-8df2-5ccfedb281c3
[I 2024-11-14 21:00:39,370] Trial 0 finished with value: 0.7785188562284511 and parameters: {'n_estimators': 381, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.5780093202212182, 'max_features': 0.22479561626896213, 'min_samples_leaf': 0.061616722433639894}. Best is trial 0 with value: 0.7785188562284511.
[I 2024-11-14 21:01:11,389] Trial 1 finished with value: 0.7841143724007601 and parameters: {'n_estimators': 868, 'criterion': 'entropy', 'bootstrap': True, 'max_samples': 0.9849549260809971, 'max_featur

[FrozenTrial(number=1, state=1, values=[0.7841143724007601], datetime_start=datetime.datetime(2024, 11, 14, 21, 0, 39, 371261), datetime_complete=datetime.datetime(2024, 11, 14, 21, 1, 11, 389653), params={'n_estimators': 868, 'criterion': 'entropy', 'bootstrap': True, 'max_samples': 0.9849549260809971, 'max_features': 0.7659541126403374, 'min_samples_leaf': 0.09246782213565524}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=1000, log=False, low=10, step=1), 'criterion': CategoricalDistribution(choices=('gini', 'entropy', 'log_loss')), 'bootstrap': CategoricalDistribution(choices=(True,)), 'max_samples': FloatDistribution(high=1.0, log=False, low=0.5, step=None), 'max_features': FloatDistribution(high=0.9, log=False, low=0.1, step=None), 'min_samples_leaf': FloatDistribution(high=0.25, log=False, low=0.05, step=None)}, trial_id=1, value=None)]
Best params for Extra Trees 
n_estimators :  868
criterion :  entropy
bootstrap : 

In [None]:
model = MODELS[1][0]
param_grid = MODELS[1][1]
sampler = SAMPLERS[1]

# Run the optimization process for all datasets simultaneously
et_joint_trials, et_joint_study = run_experiment_joint(model, sampler, Xs, ys, param_grid, results_dir, n_trials=150)

# Get best default hyperparameters
et_best_params_joint = et_joint_study.best_trials[0].params

print("Best params for Extra Trees ")
for param_name, value in et_best_params_joint.items():
    print(param_name, ": ", value)
    
# Save the study and best parameters to pickle files
save_study_to_pickle_joint(et_joint_study, 'et_tpe')
save_best_params_to_json_joint(et_best_params_joint, 'et_tpe')

2024-11-14 00:40:48,176 - __main__ - INFO - Searching hyperparameter space with TPESampler for ExtraTreesClassifier
2024-11-14 00:40:48,179 - __main__ - INFO - Optimizing hyperparameters for all datasets simultaneously to calculate optimal defaults
[I 2024-11-14 00:40:48,180] A new study created in memory with name: no-name-1fcd5c28-a83d-4567-8a6a-ed98b3da1269
[I 2024-11-14 00:40:48,182] A new study created in memory with name: no-name-7ee14757-28f3-4ce1-a500-24414ae89e8d
[I 2024-11-14 00:41:03,828] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 574, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.24041677639819287, 'max_features': 0.2403950683025824, 'min_samples_leaf': 0.15227525095137953}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:41:50,265] Trial 1 finished with value: 0.708489178383457 and parameters: {'n_estimators': 1302, 'criterion': 'entropy', 'bootstrap': True, 'max_samples': 0.9729188669457949, 'max_features': 0.8491983767203796, 'min_samp

[FrozenTrial(number=2, state=1, values=[0.7335389282170084], datetime_start=datetime.datetime(2024, 11, 14, 0, 41, 50, 267254), datetime_complete=datetime.datetime(2024, 11, 14, 0, 41, 59, 383341), params={'n_estimators': 458, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.6052140781935152, 'max_features': 0.32823005879811984, 'min_samples_leaf': 0.19447937531659815}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=1500, log=False, low=20, step=1), 'criterion': CategoricalDistribution(choices=('gini', 'entropy', 'log_loss')), 'bootstrap': CategoricalDistribution(choices=(True,)), 'max_samples': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'max_features': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'min_samples_leaf': FloatDistribution(high=1.0, log=False, low=0.1, step=None)}, trial_id=2, value=None)]
Best params for Extra Trees 
n_estimators :  458
criterion :  gini
bootstrap :  True
m

### XGBoost - $\theta^*$

In [None]:
model = MODELS[2][0]
param_grid = MODELS[2][1]
sampler = SAMPLERS[0]

# Run the optimization process for all datasets simultaneously
xgb_joint_trials, xgb_joint_study = run_experiment_joint(model, sampler, Xs, ys, param_grid, results_dir, n_trials=300)

# Get best default hyperparameters
xgb_best_params_joint = xgb_joint_study.best_trials[0].params
print("Best params for XGBoost ")
for param_name, value in xgb_best_params_joint.items():
    print(param_name, ": ", value)

# Save the study and best parameters to pickle files
save_study_to_pickle_joint(xgb_joint_study, 'xgb_rs')
save_best_params_to_json_joint(xgb_best_params_joint, 'xgb_rs')

2024-11-14 21:17:18,392 - __main__ - INFO - Searching hyperparameter space with RandomSampler for XGBClassifier
2024-11-14 21:17:18,394 - __main__ - INFO - Optimizing hyperparameters for all datasets simultaneously to calculate optimal defaults
[I 2024-11-14 21:17:18,395] A new study created in memory with name: no-name-74554863-229e-4f21-845f-e6d8e1eb65fb
[I 2024-11-14 21:17:18,396] A new study created in memory with name: no-name-affc9caa-3890-4e5e-9b3b-696e61d46a77
[I 2024-11-14 21:17:23,740] Trial 0 finished with value: 0.658821131497315 and parameters: {'n_estimators': 755, 'learning_rate': 0.2657846934562037, 'subsample': 0.7989954563585538, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.2403950683025824, 'colsample_bylevel': 0.15227525095137953, 'reg_alpha': 64.77532426827341, 'reg_lambda': 4.0428727350273315}. Best is trial 0 with value: 0.658821131497315.
[I 2024-11-14 21:17:36,015] Trial 1 finished with value: 0.808925997734

[FrozenTrial(number=1, state=1, values=[0.8089259977343646], datetime_start=datetime.datetime(2024, 11, 14, 21, 17, 23, 741520), datetime_complete=datetime.datetime(2024, 11, 14, 21, 17, 36, 15654), params={'n_estimators': 1419, 'learning_rate': 0.00011861690371243162, 'subsample': 0.9774323891214958, 'booster': 'gbtree', 'max_depth': 13, 'min_child_weight': 27.967067056141072, 'colsample_bytree': 0.26364247048639056, 'colsample_bylevel': 0.2650640588680905, 'reg_alpha': 0.010996218010237245, 'reg_lambda': 1.40779231399724}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=2000, log=False, low=10, step=1), 'learning_rate': FloatDistribution(high=0.4, log=True, low=0.0001, step=None), 'subsample': FloatDistribution(high=1.0, log=False, low=0.25, step=None), 'booster': CategoricalDistribution(choices=('gbtree',)), 'max_depth': IntDistribution(high=15, log=False, low=1, step=1), 'min_child_weight': FloatDistribution(high=128.0, lo

In [None]:
model = MODELS[2][0]
param_grid = MODELS[2][1]
sampler = SAMPLERS[1]

# Run the optimization process for all datasets simultaneously
xgb_joint_trials, xgb_joint_study = run_experiment_joint(model, sampler, Xs, ys, param_grid, results_dir, n_trials=150)

# Get best default hyperparameters
xgb_best_params_joint = xgb_joint_study.best_trials[0].params

print("Best params for XGBoost ")
for param_name, value in xgb_best_params_joint.items():
    print(param_name, ": ", value)
    
# Save the study and best parameters to pickle files
save_study_to_pickle_joint(xgb_joint_study, 'xgb_tpe')
save_best_params_to_json_joint(xgb_best_params_joint, 'xgb_tpe')

2024-11-14 00:43:22,638 - __main__ - INFO - Searching hyperparameter space with TPESampler for XGBClassifier
2024-11-14 00:43:22,640 - __main__ - INFO - Optimizing hyperparameters for all datasets simultaneously to calculate optimal defaults
[I 2024-11-14 00:43:22,641] A new study created in memory with name: no-name-068a0c3f-6ebb-4b4b-a4cf-dba363fa193c
[I 2024-11-14 00:43:22,642] A new study created in memory with name: no-name-ee6ac0a8-95fa-4a2f-b74c-80d7c9692978
[I 2024-11-14 00:43:32,742] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 1873, 'learning_rate': 0.1804941906677887, 'subsample': 0.7587945476302645, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.16443457513284063, 'colsample_bylevel': 0.06750277604651747, 'reg_alpha': 849.9808989183019, 'reg_lambda': 6.4405075539937195}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:44:01,741] Trial 1 finished with value: 0.8083940908272287 and parameters: {'n_esti

[FrozenTrial(number=3, state=1, values=[0.8096246235110756], datetime_start=datetime.datetime(2024, 11, 14, 0, 44, 8, 460569), datetime_complete=datetime.datetime(2024, 11, 14, 0, 44, 27, 373370), params={'n_estimators': 3490, 'learning_rate': 0.019087949022263857, 'subsample': 0.8090866526479393, 'booster': 'gbtree', 'max_depth': 2, 'min_child_weight': 41.34543577073546, 'colsample_bytree': 0.7436130444117905, 'colsample_bylevel': 0.33520928688329754, 'reg_alpha': 0.5769841450775586, 'reg_lambda': 0.002809866857880845}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=5000, log=False, low=1, step=1), 'learning_rate': FloatDistribution(high=0.3, log=True, low=1e-05, step=None), 'subsample': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'booster': CategoricalDistribution(choices=('gbtree',)), 'max_depth': IntDistribution(high=15, log=False, low=1, step=1), 'min_child_weight': FloatDistribution(high=128.0, log=False

## Optimizing each dataset separately to calculate $\theta^{(j)*}$ - the best hyperparameters for a given dataset $(j)$

Notes:
- From now on we use AUC as the primary metric. All performance measures and tunability values are calculated using AUC.
- Optimization is performed in two stages: first with Random Search, then with TPE (a form of Bayesian optimization).
- In each stage we run the marginal (per-dataset) optimization over all models and datasets.
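
Tunability is not defined explicitly in this notebook. A common definition, assumed here, is the per-dataset gain of the tuned configuration over the shared defaults, $d^{(j)} = \mathrm{AUC}^{(j)}(\theta^{(j)*}) - \mathrm{AUC}^{(j)}(\theta^*)$, summarized by its mean over datasets. A minimal sketch with placeholder AUC values (the numbers and the helper name are purely illustrative, not results of these experiments):

```python
from statistics import mean


def tunability(auc_tuned, auc_default):
    """Per-dataset AUC gain of dataset-specific tuning over shared defaults, plus the mean gain."""
    gains = {ds: auc_tuned[ds] - auc_default[ds] for ds in auc_tuned}
    return gains, mean(gains.values())


# Illustrative placeholder values keyed by dataset ID (not measured results).
auc_default = {31: 0.75, 37: 0.80}
auc_tuned = {31: 0.78, 37: 0.81}

gains, overall = tunability(auc_tuned, auc_default)
print(gains, overall)
```

A gain close to zero on every dataset would suggest that the shared defaults are already nearly as good as per-dataset tuning for that algorithm.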

### Optimization with RandomSearch - LogisticRegression, ExtraTrees and XGBoost at once

In [None]:
sampler = SAMPLERS[0]
best_params_dict = {}

for model, param_grid in MODELS:
    for ID, X, y in zip(DATASET_IDS, Xs, ys):
        trials, study = run_experiment_marginal(model, sampler, X, y, ID, param_grid, results_dir, n_trials=300)
        model_name = {'LogisticRegression': 'lr', 'ExtraTreesClassifier': 'et', 'XGBClassifier': 'xgb'}[model.__name__]
        save_study_to_pickle_marginal(study, model_name, 'RS', ID)
        best_params = study.best_trials[0].params
    
        if model_name not in best_params_dict:
            best_params_dict[model_name] = {}
        best_params_dict[model_name][ID] = best_params
        
with open(os.path.join(results_bestparams_dir, 'best_params_dict_RS.json'), 'w') as f:
    json.dump(best_params_dict, f)

for model_name, datasets in best_params_dict.items():
    print(f"Best params for {model_name}:")
    for ID, params in datasets.items():
        print(f"  Dataset {ID}:")
        for param_name, value in params.items():
            print(f"    {param_name}: {value}")
    print("\n")
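
One detail worth keeping in mind when reading `best_params_dict_RS.json` back: `json.dump` writes integer dictionary keys as strings, so, assuming the entries of `DATASET_IDS` are integers, the dataset IDs come back as `'31'` rather than `31`. A small round-trip sketch (the parameter values are placeholders):

```python
import json

best_params_dict = {"lr": {31: {"C": 0.5}, 37: {"C": 2.0}}}  # placeholder values

# Round-trip through JSON, as happens when the file is written and read again.
restored = json.loads(json.dumps(best_params_dict))
print(list(restored["lr"]))  # dataset IDs are now strings

# Convert the dataset IDs back to integers after loading.
restored = {
    model: {int(ds): params for ds, params in per_dataset.items()}
    for model, per_dataset in restored.items()
}
print(restored == best_params_dict)
```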

2024-11-14 00:44:54,225 - __main__ - INFO - Searching hyperparameter space with RandomSampler for LogisticRegression
2024-11-14 00:44:54,226 - __main__ - INFO - Optimizing hyperparameters for dataset separately, dataset 31
[I 2024-11-14 00:44:54,227] A new study created in memory with name: no-name-45f2ad2b-8e48-4865-a5e2-f2f1a3c578aa
[I 2024-11-14 00:44:54,228] A new study created in memory with name: no-name-dc65ea21-59a9-41bd-952a-12dd056e12cf
[I 2024-11-14 00:44:54,415] Trial 0 finished with value: 0.7788095238095238 and parameters: {'C': 0.09915644566638401, 'penalty': 'elasticnet', 'l1_ratio': 0.6351221010640696, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 0 with value: 0.7788095238095238.
[I 2024-11-14 00:44:55,649] Trial 1 finished with value: 0.7803809523809524 and parameters: {'C': 71.77141927992021, 'penalty': 'elasticnet', 'l1_ratio': 0.024810409748678097, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 1 wit

[FrozenTrial(number=1, state=1, values=[0.7803809523809524], datetime_start=datetime.datetime(2024, 11, 14, 0, 44, 54, 416069), datetime_complete=datetime.datetime(2024, 11, 14, 0, 44, 55, 649262), params={'C': 71.77141927992021, 'penalty': 'elasticnet', 'l1_ratio': 0.024810409748678097, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'C': FloatDistribution(high=10000.0, log=True, low=0.0001, step=None), 'penalty': CategoricalDistribution(choices=('elasticnet',)), 'l1_ratio': FloatDistribution(high=1.0, log=True, low=0.0001, step=None), 'class_weight': CategoricalDistribution(choices=('balanced',)), 'max_iter': IntDistribution(high=1500, log=False, low=1500, step=1), 'solver': CategoricalDistribution(choices=('saga',))}, trial_id=1, value=None)]
[FrozenTrial(number=0, state=1, values=[0.8299594689028652], datetime_start=datetime.datetime(2024, 11, 14, 0, 44, 55, 793626), datetime_complete=datetime.d

[I 2024-11-14 00:44:56,486] Trial 0 finished with value: 0.8324682639918153 and parameters: {'C': 0.09915644566638401, 'penalty': 'elasticnet', 'l1_ratio': 0.6351221010640696, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 0 with value: 0.8324682639918153.
[I 2024-11-14 00:45:02,714] Trial 1 finished with value: 0.8328541605189675 and parameters: {'C': 71.77141927992021, 'penalty': 'elasticnet', 'l1_ratio': 0.024810409748678097, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 1 with value: 0.8328541605189675.
[I 2024-11-14 00:45:03,249] Trial 2 finished with value: 0.8149494612197428 and parameters: {'C': 0.0017707168643537846, 'penalty': 'elasticnet', 'l1_ratio': 0.0004207053950287938, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 1 with value: 0.8328541605189675.
2024-11-14 00:45:03,265 - __main__ - INFO - Results saved to /home/igor/Repos/AutoML/results/LogisticRegression_RandomSampler_ma

[FrozenTrial(number=1, state=1, values=[0.8328541605189675], datetime_start=datetime.datetime(2024, 11, 14, 0, 44, 56, 487180), datetime_complete=datetime.datetime(2024, 11, 14, 0, 45, 2, 714344), params={'C': 71.77141927992021, 'penalty': 'elasticnet', 'l1_ratio': 0.024810409748678097, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'C': FloatDistribution(high=10000.0, log=True, low=0.0001, step=None), 'penalty': CategoricalDistribution(choices=('elasticnet',)), 'l1_ratio': FloatDistribution(high=1.0, log=True, low=0.0001, step=None), 'class_weight': CategoricalDistribution(choices=('balanced',)), 'max_iter': IntDistribution(high=1500, log=False, low=1500, step=1), 'solver': CategoricalDistribution(choices=('saga',))}, trial_id=1, value=None)]


[I 2024-11-14 00:45:03,476] Trial 0 finished with value: 0.8098475706459055 and parameters: {'C': 0.09915644566638401, 'penalty': 'elasticnet', 'l1_ratio': 0.6351221010640696, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 0 with value: 0.8098475706459055.
[I 2024-11-14 00:45:03,708] Trial 1 finished with value: 0.8119067666282198 and parameters: {'C': 71.77141927992021, 'penalty': 'elasticnet', 'l1_ratio': 0.024810409748678097, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 1 with value: 0.8119067666282198.
[I 2024-11-14 00:45:03,759] Trial 2 finished with value: 0.79272336361015 and parameters: {'C': 0.0017707168643537846, 'penalty': 'elasticnet', 'l1_ratio': 0.0004207053950287938, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 1 with value: 0.8119067666282198.
2024-11-14 00:45:03,767 - __main__ - INFO - Results saved to /home/igor/Repos/AutoML/results/LogisticRegression_RandomSampler_marg

[FrozenTrial(number=1, state=1, values=[0.8119067666282198], datetime_start=datetime.datetime(2024, 11, 14, 0, 45, 3, 476723), datetime_complete=datetime.datetime(2024, 11, 14, 0, 45, 3, 708100), params={'C': 71.77141927992021, 'penalty': 'elasticnet', 'l1_ratio': 0.024810409748678097, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'C': FloatDistribution(high=10000.0, log=True, low=0.0001, step=None), 'penalty': CategoricalDistribution(choices=('elasticnet',)), 'l1_ratio': FloatDistribution(high=1.0, log=True, low=0.0001, step=None), 'class_weight': CategoricalDistribution(choices=('balanced',)), 'max_iter': IntDistribution(high=1500, log=False, low=1500, step=1), 'solver': CategoricalDistribution(choices=('saga',))}, trial_id=1, value=None)]


[I 2024-11-14 00:45:06,548] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 574, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.24041677639819287, 'max_features': 0.2403950683025824, 'min_samples_leaf': 0.15227525095137953}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:45:13,457] Trial 1 finished with value: 0.5674047619047619 and parameters: {'n_estimators': 1302, 'criterion': 'entropy', 'bootstrap': True, 'max_samples': 0.9729188669457949, 'max_features': 0.8491983767203796, 'min_samples_leaf': 0.29110519961044856}. Best is trial 1 with value: 0.5674047619047619.
[I 2024-11-14 00:45:14,651] Trial 2 finished with value: 0.5 and parameters: {'n_estimators': 289, 'criterion': 'log_loss', 'bootstrap': True, 'max_samples': 0.48875051677790415, 'max_features': 0.36210622617823773, 'min_samples_leaf': 0.6506676052501416}. Best is trial 1 with value: 0.5674047619047619.
2024-11-14 00:45:14,658 - __main__ - INFO - Results saved to /home/igor/Repos/AutoML/resul

[FrozenTrial(number=1, state=1, values=[0.5674047619047619], datetime_start=datetime.datetime(2024, 11, 14, 0, 45, 6, 549871), datetime_complete=datetime.datetime(2024, 11, 14, 0, 45, 13, 457167), params={'n_estimators': 1302, 'criterion': 'entropy', 'bootstrap': True, 'max_samples': 0.9729188669457949, 'max_features': 0.8491983767203796, 'min_samples_leaf': 0.29110519961044856}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=1500, log=False, low=20, step=1), 'criterion': CategoricalDistribution(choices=('gini', 'entropy', 'log_loss')), 'bootstrap': CategoricalDistribution(choices=(True,)), 'max_samples': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'max_features': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'min_samples_leaf': FloatDistribution(high=1.0, log=False, low=0.1, step=None)}, trial_id=1, value=None)]


[I 2024-11-14 00:45:19,171] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 574, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.24041677639819287, 'max_features': 0.2403950683025824, 'min_samples_leaf': 0.15227525095137953}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:45:29,816] Trial 1 finished with value: 0.7775559049615653 and parameters: {'n_estimators': 1302, 'criterion': 'entropy', 'bootstrap': True, 'max_samples': 0.9729188669457949, 'max_features': 0.8491983767203796, 'min_samples_leaf': 0.29110519961044856}. Best is trial 1 with value: 0.7775559049615653.
[I 2024-11-14 00:45:30,997] Trial 2 finished with value: 0.5 and parameters: {'n_estimators': 289, 'criterion': 'log_loss', 'bootstrap': True, 'max_samples': 0.48875051677790415, 'max_features': 0.36210622617823773, 'min_samples_leaf': 0.6506676052501416}. Best is trial 1 with value: 0.7775559049615653.
2024-11-14 00:45:31,008 - __main__ - INFO - Results saved to /home/igor/Repos/AutoML/resul

[FrozenTrial(number=1, state=1, values=[0.7775559049615653], datetime_start=datetime.datetime(2024, 11, 14, 0, 45, 19, 172624), datetime_complete=datetime.datetime(2024, 11, 14, 0, 45, 29, 816549), params={'n_estimators': 1302, 'criterion': 'entropy', 'bootstrap': True, 'max_samples': 0.9729188669457949, 'max_features': 0.8491983767203796, 'min_samples_leaf': 0.29110519961044856}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=1500, log=False, low=20, step=1), 'criterion': CategoricalDistribution(choices=('gini', 'entropy', 'log_loss')), 'bootstrap': CategoricalDistribution(choices=(True,)), 'max_samples': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'max_features': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'min_samples_leaf': FloatDistribution(high=1.0, log=False, low=0.1, step=None)}, trial_id=1, value=None)]


[I 2024-11-14 00:45:34,350] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 574, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.24041677639819287, 'max_features': 0.2403950683025824, 'min_samples_leaf': 0.15227525095137953}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:45:45,293] Trial 1 finished with value: 0.72794513790491 and parameters: {'n_estimators': 1302, 'criterion': 'entropy', 'bootstrap': True, 'max_samples': 0.9729188669457949, 'max_features': 0.8491983767203796, 'min_samples_leaf': 0.29110519961044856}. Best is trial 1 with value: 0.72794513790491.
[I 2024-11-14 00:45:49,378] Trial 2 finished with value: 0.5 and parameters: {'n_estimators': 289, 'criterion': 'log_loss', 'bootstrap': True, 'max_samples': 0.48875051677790415, 'max_features': 0.36210622617823773, 'min_samples_leaf': 0.6506676052501416}. Best is trial 1 with value: 0.72794513790491.
2024-11-14 00:45:49,390 - __main__ - INFO - Results saved to /home/igor/Repos/AutoML/results/Ext

[FrozenTrial(number=1, state=1, values=[0.72794513790491], datetime_start=datetime.datetime(2024, 11, 14, 0, 45, 34, 350784), datetime_complete=datetime.datetime(2024, 11, 14, 0, 45, 45, 293272), params={'n_estimators': 1302, 'criterion': 'entropy', 'bootstrap': True, 'max_samples': 0.9729188669457949, 'max_features': 0.8491983767203796, 'min_samples_leaf': 0.29110519961044856}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=1500, log=False, low=20, step=1), 'criterion': CategoricalDistribution(choices=('gini', 'entropy', 'log_loss')), 'bootstrap': CategoricalDistribution(choices=(True,)), 'max_samples': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'max_features': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'min_samples_leaf': FloatDistribution(high=1.0, log=False, low=0.1, step=None)}, trial_id=1, value=None)]


[I 2024-11-14 00:45:55,218] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 574, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.24041677639819287, 'max_features': 0.2403950683025824, 'min_samples_leaf': 0.15227525095137953}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:46:05,194] Trial 1 finished with value: 0.7558317353902345 and parameters: {'n_estimators': 1302, 'criterion': 'entropy', 'bootstrap': True, 'max_samples': 0.9729188669457949, 'max_features': 0.8491983767203796, 'min_samples_leaf': 0.29110519961044856}. Best is trial 1 with value: 0.7558317353902345.
[I 2024-11-14 00:46:06,414] Trial 2 finished with value: 0.5 and parameters: {'n_estimators': 289, 'criterion': 'log_loss', 'bootstrap': True, 'max_samples': 0.48875051677790415, 'max_features': 0.36210622617823773, 'min_samples_leaf': 0.6506676052501416}. Best is trial 1 with value: 0.7558317353902345.
2024-11-14 00:46:06,420 - __main__ - INFO - Results saved to /home/igor/Repos/AutoML/resul

[FrozenTrial(number=1, state=1, values=[0.7558317353902345], datetime_start=datetime.datetime(2024, 11, 14, 0, 45, 55, 218922), datetime_complete=datetime.datetime(2024, 11, 14, 0, 46, 5, 194364), params={'n_estimators': 1302, 'criterion': 'entropy', 'bootstrap': True, 'max_samples': 0.9729188669457949, 'max_features': 0.8491983767203796, 'min_samples_leaf': 0.29110519961044856}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=1500, log=False, low=20, step=1), 'criterion': CategoricalDistribution(choices=('gini', 'entropy', 'log_loss')), 'bootstrap': CategoricalDistribution(choices=(True,)), 'max_samples': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'max_features': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'min_samples_leaf': FloatDistribution(high=1.0, log=False, low=0.1, step=None)}, trial_id=1, value=None)]


[I 2024-11-14 00:46:08,157] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 1873, 'learning_rate': 0.1804941906677887, 'subsample': 0.7587945476302645, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.16443457513284063, 'colsample_bylevel': 0.06750277604651747, 'reg_alpha': 849.9808989183019, 'reg_lambda': 6.4405075539937195}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:46:12,048] Trial 1 finished with value: 0.7754166666666666 and parameters: {'n_estimators': 3541, 'learning_rate': 1.236400798668794e-05, 'subsample': 0.9729188669457949, 'booster': 'gbtree', 'max_depth': 13, 'min_child_weight': 27.967067056141072, 'colsample_bytree': 0.19000671753502962, 'colsample_bylevel': 0.1915704647548995, 'reg_alpha': 0.027160511446548512, 'reg_lambda': 1.5777981883365035}. Best is trial 1 with value: 0.7754166666666666.
[I 2024-11-14 00:46:14,088] Trial 2 finished with value: 0.5 and parameters: {'n_estimators': 2160, 'lea

[FrozenTrial(number=1, state=1, values=[0.7754166666666666], datetime_start=datetime.datetime(2024, 11, 14, 0, 46, 8, 158124), datetime_complete=datetime.datetime(2024, 11, 14, 0, 46, 12, 48047), params={'n_estimators': 3541, 'learning_rate': 1.236400798668794e-05, 'subsample': 0.9729188669457949, 'booster': 'gbtree', 'max_depth': 13, 'min_child_weight': 27.967067056141072, 'colsample_bytree': 0.19000671753502962, 'colsample_bylevel': 0.1915704647548995, 'reg_alpha': 0.027160511446548512, 'reg_lambda': 1.5777981883365035}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=5000, log=False, low=1, step=1), 'learning_rate': FloatDistribution(high=0.3, log=True, low=1e-05, step=None), 'subsample': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'booster': CategoricalDistribution(choices=('gbtree',)), 'max_depth': IntDistribution(high=15, log=False, low=1, step=1), 'min_child_weight': FloatDistribution(high=128.0, log=Fal

[I 2024-11-14 00:46:15,226] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 1873, 'learning_rate': 0.1804941906677887, 'subsample': 0.7587945476302645, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.16443457513284063, 'colsample_bylevel': 0.06750277604651747, 'reg_alpha': 849.9808989183019, 'reg_lambda': 6.4405075539937195}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:46:18,715] Trial 1 finished with value: 0.8073812019566737 and parameters: {'n_estimators': 3541, 'learning_rate': 1.236400798668794e-05, 'subsample': 0.9729188669457949, 'booster': 'gbtree', 'max_depth': 13, 'min_child_weight': 27.967067056141072, 'colsample_bytree': 0.19000671753502962, 'colsample_bylevel': 0.1915704647548995, 'reg_alpha': 0.027160511446548512, 'reg_lambda': 1.5777981883365035}. Best is trial 1 with value: 0.8073812019566737.
[I 2024-11-14 00:46:20,064] Trial 2 finished with value: 0.5 and parameters: {'n_estimators': 2160, 'lea

[FrozenTrial(number=1, state=1, values=[0.8073812019566737], datetime_start=datetime.datetime(2024, 11, 14, 0, 46, 15, 226882), datetime_complete=datetime.datetime(2024, 11, 14, 0, 46, 18, 715133), params={'n_estimators': 3541, 'learning_rate': 1.236400798668794e-05, 'subsample': 0.9729188669457949, 'booster': 'gbtree', 'max_depth': 13, 'min_child_weight': 27.967067056141072, 'colsample_bytree': 0.19000671753502962, 'colsample_bylevel': 0.1915704647548995, 'reg_alpha': 0.027160511446548512, 'reg_lambda': 1.5777981883365035}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=5000, log=False, low=1, step=1), 'learning_rate': FloatDistribution(high=0.3, log=True, low=1e-05, step=None), 'subsample': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'booster': CategoricalDistribution(choices=('gbtree',)), 'max_depth': IntDistribution(high=15, log=False, low=1, step=1), 'min_child_weight': FloatDistribution(high=128.0, log=F

[I 2024-11-14 00:46:22,289] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 1873, 'learning_rate': 0.1804941906677887, 'subsample': 0.7587945476302645, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.16443457513284063, 'colsample_bylevel': 0.06750277604651747, 'reg_alpha': 849.9808989183019, 'reg_lambda': 6.4405075539937195}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:46:32,690] Trial 1 finished with value: 0.8326656294414374 and parameters: {'n_estimators': 3541, 'learning_rate': 1.236400798668794e-05, 'subsample': 0.9729188669457949, 'booster': 'gbtree', 'max_depth': 13, 'min_child_weight': 27.967067056141072, 'colsample_bytree': 0.19000671753502962, 'colsample_bylevel': 0.1915704647548995, 'reg_alpha': 0.027160511446548512, 'reg_lambda': 1.5777981883365035}. Best is trial 1 with value: 0.8326656294414374.
[I 2024-11-14 00:46:41,556] Trial 2 finished with value: 0.8132203549627315 and parameters: {'n_estimato

[FrozenTrial(number=1, state=1, values=[0.8326656294414374], datetime_start=datetime.datetime(2024, 11, 14, 0, 46, 22, 290406), datetime_complete=datetime.datetime(2024, 11, 14, 0, 46, 32, 689690), params={'n_estimators': 3541, 'learning_rate': 1.236400798668794e-05, 'subsample': 0.9729188669457949, 'booster': 'gbtree', 'max_depth': 13, 'min_child_weight': 27.967067056141072, 'colsample_bytree': 0.19000671753502962, 'colsample_bylevel': 0.1915704647548995, 'reg_alpha': 0.027160511446548512, 'reg_lambda': 1.5777981883365035}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=5000, log=False, low=1, step=1), 'learning_rate': FloatDistribution(high=0.3, log=True, low=1e-05, step=None), 'subsample': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'booster': CategoricalDistribution(choices=('gbtree',)), 'max_depth': IntDistribution(high=15, log=False, low=1, step=1), 'min_child_weight': FloatDistribution(high=128.0, log=F

[I 2024-11-14 00:46:43,039] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 1873, 'learning_rate': 0.1804941906677887, 'subsample': 0.7587945476302645, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.16443457513284063, 'colsample_bylevel': 0.06750277604651747, 'reg_alpha': 849.9808989183019, 'reg_lambda': 6.4405075539937195}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:46:57,952] Trial 1 finished with value: 0.8181128652441367 and parameters: {'n_estimators': 3541, 'learning_rate': 1.236400798668794e-05, 'subsample': 0.9729188669457949, 'booster': 'gbtree', 'max_depth': 13, 'min_child_weight': 27.967067056141072, 'colsample_bytree': 0.19000671753502962, 'colsample_bylevel': 0.1915704647548995, 'reg_alpha': 0.027160511446548512, 'reg_lambda': 1.5777981883365035}. Best is trial 1 with value: 0.8181128652441367.
[I 2024-11-14 00:47:02,019] Trial 2 finished with value: 0.5 and parameters: {'n_estimators': 2160, 'lea

[FrozenTrial(number=1, state=1, values=[0.8181128652441367], datetime_start=datetime.datetime(2024, 11, 14, 0, 46, 43, 40177), datetime_complete=datetime.datetime(2024, 11, 14, 0, 46, 57, 952112), params={'n_estimators': 3541, 'learning_rate': 1.236400798668794e-05, 'subsample': 0.9729188669457949, 'booster': 'gbtree', 'max_depth': 13, 'min_child_weight': 27.967067056141072, 'colsample_bytree': 0.19000671753502962, 'colsample_bylevel': 0.1915704647548995, 'reg_alpha': 0.027160511446548512, 'reg_lambda': 1.5777981883365035}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=5000, log=False, low=1, step=1), 'learning_rate': FloatDistribution(high=0.3, log=True, low=1e-05, step=None), 'subsample': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'booster': CategoricalDistribution(choices=('gbtree',)), 'max_depth': IntDistribution(high=15, log=False, low=1, step=1), 'min_child_weight': FloatDistribution(high=128.0, log=Fa

### Optimization with TPE (Bayes Search) for Logistic Regression, Extra Trees and XGBoost at once

In [None]:
# SAMPLERS[1] is the TPE sampler (the log output of this cell reports TPESampler).
sampler = SAMPLERS[1]
# Short model names used in the saved study files and as keys of best_params_dict.
MODEL_SHORT_NAMES = {'LogisticRegression': 'lr', 'ExtraTreesClassifier': 'et', 'XGBClassifier': 'xgb'}
best_params_dict = {}

for model, param_grid in MODELS:
    model_name = MODEL_SHORT_NAMES[model.__name__]
    for ID, X, y in zip(DATASET_IDS, Xs, ys):
        # Tune every (model, dataset) pair separately.
        trials, study = run_experiment_marginal(model, sampler, X, y, ID, param_grid, results_dir, n_trials=150)
        save_study_to_pickle_marginal(study, model_name, 'BS', ID)
        # Keep the parameters of the first best trial for this dataset.
        best_params_dict.setdefault(model_name, {})[ID] = study.best_trials[0].params

# Persist the best parameters of every model on every dataset.
with open(os.path.join(results_bestparams_dir, 'best_params_dict_TPE.json'), 'w') as f:
    json.dump(best_params_dict, f)

# Print a readable summary: model -> dataset -> parameter values.
for model_name, datasets in best_params_dict.items():
    print(f"Best params for {model_name}:")
    for ID, params in datasets.items():
        print(f"  Dataset {ID}:")
        for param_name, value in params.items():
            print(f"    {param_name}: {value}")
    print("\n")
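
A note on reading `best_params_dict_TPE.json` back: `json.dump` writes every dictionary key as a string, so if the entries of `DATASET_IDS` are integers (an assumption here, based on the OpenML IDs in the logs), a lookup such as `loaded['lr'][31]` fails after reloading. The sketch below illustrates the effect on a made-up dictionary; `restore_int_ids` is a hypothetical helper, not part of the project's `utils` package.

```python
import json

# Hypothetical stand-in for best_params_dict; the ID and the values are made up.
best_params_dict = {"lr": {31: {"C": 0.1, "l1_ratio": 0.5}}}

# A JSON round trip turns the integer dataset ID into the string "31".
restored = json.loads(json.dumps(best_params_dict))
print(list(restored["lr"].keys()))


def restore_int_ids(params_by_model):
    """Convert dataset-ID keys back to int after loading the JSON file."""
    return {
        model_name: {int(ds_id): params for ds_id, params in per_dataset.items()}
        for model_name, per_dataset in params_by_model.items()
    }


fixed = restore_int_ids(restored)
print(fixed == best_params_dict)
```

If the IDs are already strings, the round trip leaves the keys unchanged and no conversion is needed.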

2024-11-14 00:47:02,101 - __main__ - INFO - Searching hyperparameter space with TPESampler for LogisticRegression
2024-11-14 00:47:02,103 - __main__ - INFO - Optimizing hyperparameters for dataset separately, dataset 31
[I 2024-11-14 00:47:02,104] A new study created in memory with name: no-name-59c9fc45-b61a-452e-8230-6c695a0a82b4
[I 2024-11-14 00:47:02,105] A new study created in memory with name: no-name-1143f6a7-6944-4743-81d4-54a08aa516e6
[I 2024-11-14 00:47:02,321] Trial 0 finished with value: 0.7788095238095238 and parameters: {'C': 0.09915644566638401, 'penalty': 'elasticnet', 'l1_ratio': 0.6351221010640696, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 0 with value: 0.7788095238095238.
[I 2024-11-14 00:47:02,324] A new study created in memory with name: no-name-ec68c371-85a2-4a4a-9ab0-4b2ae6a39ed0
[I 2024-11-14 00:47:02,493] Trial 0 finished with value: 0.7777619047619047 and parameters: {'C': 0.023441926146948493, 'penalty': 'elasticnet', 'l1_

[FrozenTrial(number=2, state=1, values=[0.7863333333333332], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 2, 496471), datetime_complete=datetime.datetime(2024, 11, 14, 0, 47, 2, 687447), params={'C': 0.03350945131274967, 'penalty': 'elasticnet', 'l1_ratio': 0.00648817727333381, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'C': FloatDistribution(high=10000.0, log=True, low=0.0001, step=None), 'penalty': CategoricalDistribution(choices=('elasticnet',)), 'l1_ratio': FloatDistribution(high=1.0, log=True, low=0.0001, step=None), 'class_weight': CategoricalDistribution(choices=('balanced',)), 'max_iter': IntDistribution(high=1500, log=False, low=1500, step=1), 'solver': CategoricalDistribution(choices=('saga',))}, trial_id=2, value=None)]


[I 2024-11-14 00:47:02,954] Trial 0 finished with value: 0.828962962962963 and parameters: {'C': 0.03350945131274967, 'penalty': 'elasticnet', 'l1_ratio': 0.00648817727333381, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 0 with value: 0.828962962962963.
2024-11-14 00:47:02,964 - __main__ - INFO - Results saved to /home/igor/Repos/AutoML/results/LogisticRegression_TPESampler_marginal_37.csv
2024-11-14 00:47:02,966 - __main__ - INFO - Searching hyperparameter space with TPESampler for LogisticRegression
2024-11-14 00:47:02,967 - __main__ - INFO - Optimizing hyperparameters for dataset separately, dataset 45062
[I 2024-11-14 00:47:02,967] A new study created in memory with name: no-name-fd2f3ebb-7c89-47ac-bce1-b7123fd8f9d6
[I 2024-11-14 00:47:02,969] A new study created in memory with name: no-name-77aa0bd0-2b13-416b-b190-e2f9a51fca2b


[FrozenTrial(number=0, state=1, values=[0.8299965059399023], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 2, 704345), datetime_complete=datetime.datetime(2024, 11, 14, 0, 47, 2, 755340), params={'C': 0.09915644566638401, 'penalty': 'elasticnet', 'l1_ratio': 0.6351221010640696, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'C': FloatDistribution(high=10000.0, log=True, low=0.0001, step=None), 'penalty': CategoricalDistribution(choices=('elasticnet',)), 'l1_ratio': FloatDistribution(high=1.0, log=True, low=0.0001, step=None), 'class_weight': CategoricalDistribution(choices=('balanced',)), 'max_iter': IntDistribution(high=1500, log=False, low=1500, step=1), 'solver': CategoricalDistribution(choices=('saga',))}, trial_id=0, value=None)]


[I 2024-11-14 00:47:03,496] Trial 0 finished with value: 0.8324682617407145 and parameters: {'C': 0.09915644566638401, 'penalty': 'elasticnet', 'l1_ratio': 0.6351221010640696, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 0 with value: 0.8324682617407145.
[I 2024-11-14 00:47:03,498] A new study created in memory with name: no-name-79483db6-d218-4bd7-af03-d97d7a9c0314
[I 2024-11-14 00:47:04,059] Trial 0 finished with value: 0.8298975908720951 and parameters: {'C': 0.023441926146948493, 'penalty': 'elasticnet', 'l1_ratio': 0.17229428125126015, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 0 with value: 0.8298975908720951.
[I 2024-11-14 00:47:04,061] A new study created in memory with name: no-name-1c818204-96ea-4723-b3f5-e1a791a3d099
[I 2024-11-14 00:47:04,679] Trial 0 finished with value: 0.8304099249588317 and parameters: {'C': 0.03350945131274967, 'penalty': 'elasticnet', 'l1_ratio': 0.00648817727333381, 'class_weight':

[FrozenTrial(number=0, state=1, values=[0.8324682617407145], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 2, 969713), datetime_complete=datetime.datetime(2024, 11, 14, 0, 47, 3, 496142), params={'C': 0.09915644566638401, 'penalty': 'elasticnet', 'l1_ratio': 0.6351221010640696, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'C': FloatDistribution(high=10000.0, log=True, low=0.0001, step=None), 'penalty': CategoricalDistribution(choices=('elasticnet',)), 'l1_ratio': FloatDistribution(high=1.0, log=True, low=0.0001, step=None), 'class_weight': CategoricalDistribution(choices=('balanced',)), 'max_iter': IntDistribution(high=1500, log=False, low=1500, step=1), 'solver': CategoricalDistribution(choices=('saga',))}, trial_id=0, value=None)]


[I 2024-11-14 00:47:04,918] Trial 0 finished with value: 0.8098475706459055 and parameters: {'C': 0.09915644566638401, 'penalty': 'elasticnet', 'l1_ratio': 0.6351221010640696, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 0 with value: 0.8098475706459055.
[I 2024-11-14 00:47:04,921] A new study created in memory with name: no-name-3f4d93be-a79d-4cff-8374-e3b0485f58a9
[I 2024-11-14 00:47:05,020] Trial 0 finished with value: 0.8089955545943868 and parameters: {'C': 0.023441926146948493, 'penalty': 'elasticnet', 'l1_ratio': 0.17229428125126015, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}. Best is trial 0 with value: 0.8089955545943868.
[I 2024-11-14 00:47:05,022] A new study created in memory with name: no-name-bfc8fa54-ee8e-4089-a1d9-340819d0ab2b
[I 2024-11-14 00:47:05,126] Trial 0 finished with value: 0.8109811731064974 and parameters: {'C': 0.03350945131274967, 'penalty': 'elasticnet', 'l1_ratio': 0.00648817727333381, 'class_weight':

[FrozenTrial(number=2, state=1, values=[0.8109811731064974], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 5, 23256), datetime_complete=datetime.datetime(2024, 11, 14, 0, 47, 5, 126211), params={'C': 0.03350945131274967, 'penalty': 'elasticnet', 'l1_ratio': 0.00648817727333381, 'class_weight': 'balanced', 'max_iter': 1500, 'solver': 'saga'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'C': FloatDistribution(high=10000.0, log=True, low=0.0001, step=None), 'penalty': CategoricalDistribution(choices=('elasticnet',)), 'l1_ratio': FloatDistribution(high=1.0, log=True, low=0.0001, step=None), 'class_weight': CategoricalDistribution(choices=('balanced',)), 'max_iter': IntDistribution(high=1500, log=False, low=1500, step=1), 'solver': CategoricalDistribution(choices=('saga',))}, trial_id=2, value=None)]


[I 2024-11-14 00:47:08,359] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 574, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.24041677639819287, 'max_features': 0.2403950683025824, 'min_samples_leaf': 0.15227525095137953}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:47:08,360] A new study created in memory with name: no-name-ad8708a9-e54b-4592-9946-3b2e29f16e02
[I 2024-11-14 00:47:10,657] Trial 0 finished with value: 0.6250595238095238 and parameters: {'n_estimators': 458, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.6052140781935152, 'max_features': 0.32823005879811984, 'min_samples_leaf': 0.19447937531659815}. Best is trial 0 with value: 0.6250595238095238.
[I 2024-11-14 00:47:10,658] A new study created in memory with name: no-name-befb184e-6f69-4fcc-b810-23ffc7121e57
[I 2024-11-14 00:47:12,883] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 487, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.88134983135684

[FrozenTrial(number=1, state=1, values=[0.6250595238095238], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 8, 361086), datetime_complete=datetime.datetime(2024, 11, 14, 0, 47, 10, 657472), params={'n_estimators': 458, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.6052140781935152, 'max_features': 0.32823005879811984, 'min_samples_leaf': 0.19447937531659815}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=1500, log=False, low=20, step=1), 'criterion': CategoricalDistribution(choices=('gini', 'entropy', 'log_loss')), 'bootstrap': CategoricalDistribution(choices=(True,)), 'max_samples': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'max_features': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'min_samples_leaf': FloatDistribution(high=1.0, log=False, low=0.1, step=None)}, trial_id=1, value=None)]


[I 2024-11-14 00:47:15,534] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 574, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.24041677639819287, 'max_features': 0.2403950683025824, 'min_samples_leaf': 0.15227525095137953}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:47:15,535] A new study created in memory with name: no-name-6f6e72b7-56f9-4351-b386-5c7c519be8c2
[I 2024-11-14 00:47:17,598] Trial 0 finished with value: 0.7757739343116701 and parameters: {'n_estimators': 458, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.6052140781935152, 'max_features': 0.32823005879811984, 'min_samples_leaf': 0.19447937531659815}. Best is trial 0 with value: 0.7757739343116701.
[I 2024-11-14 00:47:17,599] A new study created in memory with name: no-name-1f62eaa5-f39e-40f9-b35f-87267be4bb4e
[I 2024-11-14 00:47:20,066] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 487, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.88134983135684

[FrozenTrial(number=1, state=1, values=[0.7757739343116701], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 15, 535802), datetime_complete=datetime.datetime(2024, 11, 14, 0, 47, 17, 597811), params={'n_estimators': 458, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.6052140781935152, 'max_features': 0.32823005879811984, 'min_samples_leaf': 0.19447937531659815}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=1500, log=False, low=20, step=1), 'criterion': CategoricalDistribution(choices=('gini', 'entropy', 'log_loss')), 'bootstrap': CategoricalDistribution(choices=(True,)), 'max_samples': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'max_features': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'min_samples_leaf': FloatDistribution(high=1.0, log=False, low=0.1, step=None)}, trial_id=1, value=None)]


[I 2024-11-14 00:47:24,149] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 574, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.24041677639819287, 'max_features': 0.2403950683025824, 'min_samples_leaf': 0.15227525095137953}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:47:24,150] A new study created in memory with name: no-name-0e15d777-5405-4366-a4b7-48eae7054e92
[I 2024-11-14 00:47:27,845] Trial 0 finished with value: 0.7654046251238892 and parameters: {'n_estimators': 458, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.6052140781935152, 'max_features': 0.32823005879811984, 'min_samples_leaf': 0.19447937531659815}. Best is trial 0 with value: 0.7654046251238892.
[I 2024-11-14 00:47:27,846] A new study created in memory with name: no-name-d8ebc775-10dc-48d7-82ab-8bff3030f9f9
[I 2024-11-14 00:47:31,537] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 487, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.88134983135684

[FrozenTrial(number=1, state=1, values=[0.7654046251238892], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 24, 151375), datetime_complete=datetime.datetime(2024, 11, 14, 0, 47, 27, 845012), params={'n_estimators': 458, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.6052140781935152, 'max_features': 0.32823005879811984, 'min_samples_leaf': 0.19447937531659815}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=1500, log=False, low=20, step=1), 'criterion': CategoricalDistribution(choices=('gini', 'entropy', 'log_loss')), 'bootstrap': CategoricalDistribution(choices=(True,)), 'max_samples': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'max_features': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'min_samples_leaf': FloatDistribution(high=1.0, log=False, low=0.1, step=None)}, trial_id=1, value=None)]


[I 2024-11-14 00:47:34,332] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 574, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.24041677639819287, 'max_features': 0.2403950683025824, 'min_samples_leaf': 0.15227525095137953}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:47:34,333] A new study created in memory with name: no-name-c0af464c-f6ca-49cf-a33e-5253faf2cdfb
[I 2024-11-14 00:47:36,987] Trial 0 finished with value: 0.7524456519127259 and parameters: {'n_estimators': 458, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.6052140781935152, 'max_features': 0.32823005879811984, 'min_samples_leaf': 0.19447937531659815}. Best is trial 0 with value: 0.7524456519127259.
[I 2024-11-14 00:47:36,989] A new study created in memory with name: no-name-4a8f56be-abce-4100-977a-bf7b62fe93e2
[I 2024-11-14 00:47:39,420] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 487, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.88134983135684

[FrozenTrial(number=1, state=1, values=[0.7524456519127259], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 34, 334182), datetime_complete=datetime.datetime(2024, 11, 14, 0, 47, 36, 987678), params={'n_estimators': 458, 'criterion': 'gini', 'bootstrap': True, 'max_samples': 0.6052140781935152, 'max_features': 0.32823005879811984, 'min_samples_leaf': 0.19447937531659815}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=1500, log=False, low=20, step=1), 'criterion': CategoricalDistribution(choices=('gini', 'entropy', 'log_loss')), 'bootstrap': CategoricalDistribution(choices=(True,)), 'max_samples': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'max_features': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'min_samples_leaf': FloatDistribution(high=1.0, log=False, low=0.1, step=None)}, trial_id=1, value=None)]


[I 2024-11-14 00:47:42,049] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 1873, 'learning_rate': 0.1804941906677887, 'subsample': 0.7587945476302645, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.16443457513284063, 'colsample_bylevel': 0.06750277604651747, 'reg_alpha': 849.9808989183019, 'reg_lambda': 6.4405075539937195}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:47:42,050] A new study created in memory with name: no-name-aa09a822-9622-47f7-9e8b-37bfcc831457
[I 2024-11-14 00:47:43,728] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 1482, 'learning_rate': 0.041907742461594774, 'subsample': 0.4152272727010703, 'booster': 'gbtree', 'max_depth': 12, 'min_child_weight': 72.29131992286271, 'colsample_bytree': 0.2610530646779318, 'colsample_bylevel': 0.11392731284825795, 'reg_alpha': 0.0002935525339557525, 'reg_lambda': 24.341035298636815}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:47:4

[FrozenTrial(number=0, state=1, values=[0.5], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 39, 429899), datetime_complete=datetime.datetime(2024, 11, 14, 0, 47, 42, 48743), params={'n_estimators': 1873, 'learning_rate': 0.1804941906677887, 'subsample': 0.7587945476302645, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.16443457513284063, 'colsample_bylevel': 0.06750277604651747, 'reg_alpha': 849.9808989183019, 'reg_lambda': 6.4405075539937195}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=5000, log=False, low=1, step=1), 'learning_rate': FloatDistribution(high=0.3, log=True, low=1e-05, step=None), 'subsample': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'booster': CategoricalDistribution(choices=('gbtree',)), 'max_depth': IntDistribution(high=15, log=False, low=1, step=1), 'min_child_weight': FloatDistribution(high=128.0, log=False, low=1.0, step=No

[I 2024-11-14 00:47:46,682] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 1873, 'learning_rate': 0.1804941906677887, 'subsample': 0.7587945476302645, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.16443457513284063, 'colsample_bylevel': 0.06750277604651747, 'reg_alpha': 849.9808989183019, 'reg_lambda': 6.4405075539937195}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:47:46,684] A new study created in memory with name: no-name-b8ef0a34-b77a-4074-927a-44444640369b
[I 2024-11-14 00:47:47,751] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 1482, 'learning_rate': 0.041907742461594774, 'subsample': 0.4152272727010703, 'booster': 'gbtree', 'max_depth': 12, 'min_child_weight': 72.29131992286271, 'colsample_bytree': 0.2610530646779318, 'colsample_bylevel': 0.11392731284825795, 'reg_alpha': 0.0002935525339557525, 'reg_lambda': 24.341035298636815}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:47:4

[FrozenTrial(number=0, state=1, values=[0.5], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 45, 469481), datetime_complete=datetime.datetime(2024, 11, 14, 0, 47, 46, 682323), params={'n_estimators': 1873, 'learning_rate': 0.1804941906677887, 'subsample': 0.7587945476302645, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.16443457513284063, 'colsample_bylevel': 0.06750277604651747, 'reg_alpha': 849.9808989183019, 'reg_lambda': 6.4405075539937195}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=5000, log=False, low=1, step=1), 'learning_rate': FloatDistribution(high=0.3, log=True, low=1e-05, step=None), 'subsample': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'booster': CategoricalDistribution(choices=('gbtree',)), 'max_depth': IntDistribution(high=15, log=False, low=1, step=1), 'min_child_weight': FloatDistribution(high=128.0, log=False, low=1.0, step=N

[I 2024-11-14 00:47:51,128] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 1873, 'learning_rate': 0.1804941906677887, 'subsample': 0.7587945476302645, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.16443457513284063, 'colsample_bylevel': 0.06750277604651747, 'reg_alpha': 849.9808989183019, 'reg_lambda': 6.4405075539937195}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:47:51,130] A new study created in memory with name: no-name-47da54a2-0d9d-42b5-aa9d-634ac2c0eb3a
[I 2024-11-14 00:47:53,571] Trial 0 finished with value: 0.8346409048028358 and parameters: {'n_estimators': 1482, 'learning_rate': 0.041907742461594774, 'subsample': 0.4152272727010703, 'booster': 'gbtree', 'max_depth': 12, 'min_child_weight': 72.29131992286271, 'colsample_bytree': 0.2610530646779318, 'colsample_bylevel': 0.11392731284825795, 'reg_alpha': 0.0002935525339557525, 'reg_lambda': 24.341035298636815}. Best is trial 0 with value: 0.834640904

[FrozenTrial(number=1, state=1, values=[0.8346409048028358], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 51, 130896), datetime_complete=datetime.datetime(2024, 11, 14, 0, 47, 53, 571451), params={'n_estimators': 1482, 'learning_rate': 0.041907742461594774, 'subsample': 0.4152272727010703, 'booster': 'gbtree', 'max_depth': 12, 'min_child_weight': 72.29131992286271, 'colsample_bytree': 0.2610530646779318, 'colsample_bylevel': 0.11392731284825795, 'reg_alpha': 0.0002935525339557525, 'reg_lambda': 24.341035298636815}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=5000, log=False, low=1, step=1), 'learning_rate': FloatDistribution(high=0.3, log=True, low=1e-05, step=None), 'subsample': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'booster': CategoricalDistribution(choices=('gbtree',)), 'max_depth': IntDistribution(high=15, log=False, low=1, step=1), 'min_child_weight': FloatDistribution(high=128.0, log=Fa

[I 2024-11-14 00:47:58,643] Trial 0 finished with value: 0.5 and parameters: {'n_estimators': 1873, 'learning_rate': 0.1804941906677887, 'subsample': 0.7587945476302645, 'booster': 'gbtree', 'max_depth': 9, 'min_child_weight': 20.814367336189438, 'colsample_bytree': 0.16443457513284063, 'colsample_bylevel': 0.06750277604651747, 'reg_alpha': 849.9808989183019, 'reg_lambda': 6.4405075539937195}. Best is trial 0 with value: 0.5.
[I 2024-11-14 00:47:58,645] A new study created in memory with name: no-name-48b30d8a-9f12-4ac0-801c-7484e3b9c05a
[I 2024-11-14 00:48:00,681] Trial 0 finished with value: 0.8006660599288736 and parameters: {'n_estimators': 1482, 'learning_rate': 0.041907742461594774, 'subsample': 0.4152272727010703, 'booster': 'gbtree', 'max_depth': 12, 'min_child_weight': 72.29131992286271, 'colsample_bytree': 0.2610530646779318, 'colsample_bylevel': 0.11392731284825795, 'reg_alpha': 0.0002935525339557525, 'reg_lambda': 24.341035298636815}. Best is trial 0 with value: 0.800666059

[FrozenTrial(number=1, state=1, values=[0.8006660599288736], datetime_start=datetime.datetime(2024, 11, 14, 0, 47, 58, 645715), datetime_complete=datetime.datetime(2024, 11, 14, 0, 48, 0, 680998), params={'n_estimators': 1482, 'learning_rate': 0.041907742461594774, 'subsample': 0.4152272727010703, 'booster': 'gbtree', 'max_depth': 12, 'min_child_weight': 72.29131992286271, 'colsample_bytree': 0.2610530646779318, 'colsample_bylevel': 0.11392731284825795, 'reg_alpha': 0.0002935525339557525, 'reg_lambda': 24.341035298636815}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=5000, log=False, low=1, step=1), 'learning_rate': FloatDistribution(high=0.3, log=True, low=1e-05, step=None), 'subsample': FloatDistribution(high=1.0, log=False, low=0.1, step=None), 'booster': CategoricalDistribution(choices=('gbtree',)), 'max_depth': IntDistribution(high=15, log=False, low=1, step=1), 'min_child_weight': FloatDistribution(high=128.0, log=Fal