# Notebook Summary: 14Days_LOS_Classification
This notebook, originally generated by Google Colab, implements a machine learning workflow for classifying hospital length of stay (LOS) in diabetic patients using data from a 14-day observation window. The key steps include:

## Setup & Imports:
The notebook begins by installing and importing necessary libraries such as scikit-learn, XGBoost, pandas, NumPy, matplotlib, seaborn, and Optuna for hyperparameter tuning.

## Data Loading & Preprocessing:
It loads a CSV dataset ("MC_merged.csv"), drops the raw `TotalHospitalDays` column and the leftover `Unnamed: 0` index column, and takes the `value` column as the target (binary classification: 0 for below-median LOS, 1 for above-median LOS).
The code separates pattern-based features (columns whose names start with `@@Pair`) from static features (all remaining columns except `value`, `Total_Dosage`, and `PatientNum`), then constructs three datasets: one with only pattern-based features, one with only static features, and one with the combined set.
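
As a rough illustration of the split described above, the following sketch builds the three feature sets on a toy frame. The columns `age`, `hba1c`, and the two `@@Pair_...` names are hypothetical stand-ins, and all values are made up; only `value`, `Total_Dosage`, `PatientNum`, `TotalHospitalDays`, and the `@@Pair` prefix come from the notebook. The median-threshold step is an assumption about how `value` might have been derived upstream, since the notebook loads it precomputed.

```python
import pandas as pd

# Toy stand-in for MC_merged.csv (hypothetical columns and values).
toy = pd.DataFrame({
    "PatientNum": [1, 2, 3, 4],
    "TotalHospitalDays": [3, 9, 5, 12],
    "Total_Dosage": [10.0, 20.0, 15.0, 30.0],
    "age": [54, 61, 47, 70],
    "hba1c": [7.1, 8.4, 6.9, 9.0],
    "@@Pair_A_before_B": [1, 0, 1, 1],
    "@@Pair_C_overlaps_D": [0, 0, 1, 0],
})

# Assumption: label is 1 when LOS is strictly above the cohort median, else 0.
median_los = toy["TotalHospitalDays"].median()
toy["value"] = (toy["TotalHospitalDays"] > median_los).astype(int)
toy = toy.drop(columns=["TotalHospitalDays"])  # avoid leaking the raw LOS

# Same column logic as the notebook: prefix match for patterns, the rest is static.
excluded = ["value", "Total_Dosage", "PatientNum"]
pattern_columns = [c for c in toy.columns if c.startswith("@@Pair")]
static_columns = [c for c in toy.columns if c not in pattern_columns + excluded]

X_patterns_only = toy[pattern_columns]
X_static_only = toy[static_columns]
X_combined = toy.drop(columns=excluded)
y = toy["value"]

print(pattern_columns)
print(static_columns)
print(list(X_combined.columns))
```

Dropping the raw LOS column before building the feature sets matters because the label is a direct function of it.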

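## Model Training & Evaluation:
The notebook tunes and trains three classifiers (Random Forest, SVM, and XGBoost), running 30 Optuna trials per classifier and feature set with accuracy as the objective.
For each classifier, it makes a stratified 70/30 train/test split. SVM inputs are standardized with `StandardScaler`; for XGBoost, SMOTE is applied to the training split to handle class imbalance, followed by `MinMaxScaler`; Random Forest uses the raw features. Each Optuna objective is scored on the 30% test split, the same split used for the reported metrics.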

## Performance Metrics:
For each model and feature set, evaluation metrics including Accuracy, Precision, Recall, F1 Score, and AUC-ROC are computed and printed, allowing for a comprehensive comparison of model performance.
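
To make the relationship between these metrics concrete, here is a minimal sketch that derives accuracy, precision, recall, and F1 from confusion-matrix counts. The helper `binary_metrics` is hypothetical, and the counts are illustrative: the notebook does not print them, and they were chosen so that the results round to the figures reported for Random Forest on pattern-based features.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative counts (assumption): few false positives, many false negatives.
m = binary_metrics(tp=31, fp=6, fn=56, tn=79)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

The example shows how a model can pair high precision with low recall: it rarely labels a short stay as long, but it misses most of the long stays, which pulls F1 down to about 0.5.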

This notebook is part of a broader research effort to assess how temporal abstraction and feature integration can improve LOS classification accuracy. All code, including the hyperparameter optimization and evaluation routines, is provided in this notebook for reproducibility and further research.

In [None]:
# Install required packages (imbalanced-learn provides the SMOTE import used below)
!pip install --upgrade xgboost
!pip install scikit-learn pandas numpy matplotlib seaborn
!pip install optuna imbalanced-learn


Collecting optuna
  Downloading optuna-4.2.1-py3-none-any.whl.metadata (17 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.14.1-py3-none-any.whl.metadata (7.4 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Collecting Mako (from alembic>=1.5.0->optuna)
  Downloading Mako-1.3.9-py3-none-any.whl.metadata (2.9 kB)
Downloading optuna-4.2.1-py3-none-any.whl (383 kB)
Downloading alembic-1.14.1-py3-none-any.whl (233 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Downloading Mako-1.3.9-py3-none-any.whl (78 kB)
Installing collected packages: Ma

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
import optuna
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.feature_selection import RFE
from imblearn.over_sampling import SMOTE
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, classification_report, confusion_matrix
)

# Load the dataset
df = pd.read_csv("MC_merged.csv")  # Ensure the dataset is uploaded in Colab
df = df.drop(columns=["TotalHospitalDays", "Unnamed: 0"])
# Define target variable (y)
y = df["value"]  # Target: 0 (below median), 1 (above median)

# Identify columns for different cases
pattern_columns = [col for col in df.columns if col.startswith("@@Pair")]
static_columns = [col for col in df.columns if col not in pattern_columns + ["value", "Total_Dosage", "PatientNum"]]

# Create different feature sets
X_patterns_only = df[pattern_columns]  # Only pattern-based features
X_static_only = df[static_columns]  # Only static features
X_combined = df.drop(columns=["value", "Total_Dosage", "PatientNum"])  # Use all features


# RANDOM FOREST

In [None]:
# Define the Optuna optimization function
def objective(trial, X_train, X_test, y_train, y_test):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 5, 30),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 10),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 1, 5),
        "bootstrap": trial.suggest_categorical("bootstrap", [True, False])
    }

    model = RandomForestClassifier(**params, random_state=42)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    return accuracy  # Optimize for accuracy

# Function to train and evaluate a Random Forest model with Optuna tuning
def train_and_evaluate(X, y, dataset_name):
    # Split into train/test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

    # Run Optuna optimization
    study = optuna.create_study(direction="maximize")
    study.optimize(lambda trial: objective(trial, X_train, X_test, y_train, y_test), n_trials=30)  # 30 trials

    # Get best hyperparameters
    best_params = study.best_params
    print(f"\n🎯 Best Hyperparameters for {dataset_name}:")
    print(best_params)

    # Train the best model
    best_model = RandomForestClassifier(**best_params, random_state=42)
    best_model.fit(X_train, y_train)

    # Predict on test data
    y_pred = best_model.predict(X_test)
    y_pred_proba = best_model.predict_proba(X_test)[:, 1]

    # Evaluate Performance
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    auc_roc = roc_auc_score(y_test, y_pred_proba)
    conf_matrix = confusion_matrix(y_test, y_pred)

    # Print Evaluation Metrics
    print(f"\n📊 Results for {dataset_name}:")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1 Score: {f1:.4f}")
    print(f"AUC-ROC Score: {auc_roc:.4f}")



# Train and evaluate for each dataset using Optuna-optimized Random Forest
train_and_evaluate(X_patterns_only, y, "Pattern-Based Features")
train_and_evaluate(X_static_only, y, "Static-Based Features")
train_and_evaluate(X_combined, y, "Combined Features")

[I 2025-02-28 12:01:31,562] A new study created in memory with name: no-name-150ef327-95f7-43c5-b5c5-d7f20c535b0c
[I 2025-02-28 12:01:32,081] Trial 0 finished with value: 0.622093023255814 and parameters: {'n_estimators': 434, 'max_depth': 26, 'min_samples_split': 10, 'min_samples_leaf': 5, 'bootstrap': False}. Best is trial 0 with value: 0.622093023255814.
[I 2025-02-28 12:01:32,508] Trial 1 finished with value: 0.622093023255814 and parameters: {'n_estimators': 239, 'max_depth': 8, 'min_samples_split': 9, 'min_samples_leaf': 5, 'bootstrap': False}. Best is trial 0 with value: 0.622093023255814.
[I 2025-02-28 12:01:33,135] Trial 2 finished with value: 0.6395348837209303 and parameters: {'n_estimators': 148, 'max_depth': 26, 'min_samples_split': 7, 'min_samples_leaf': 2, 'bootstrap': False}. Best is trial 2 with value: 0.6395348837209303.
[I 2025-02-28 12:01:33,951] Trial 3 finished with value: 0.627906976744186 and parameters: {'n_estimators': 176, 'max_depth': 30, 'min_samples_split'


🎯 Best Hyperparameters for Pattern-Based Features:
{'n_estimators': 148, 'max_depth': 26, 'min_samples_split': 7, 'min_samples_leaf': 2, 'bootstrap': False}

📊 Results for Pattern-Based Features:
Accuracy: 0.6395
Precision: 0.8378
Recall: 0.3563
F1 Score: 0.5000
AUC-ROC Score: 0.6472


[I 2025-02-28 12:01:49,106] Trial 0 finished with value: 0.6918604651162791 and parameters: {'n_estimators': 115, 'max_depth': 8, 'min_samples_split': 2, 'min_samples_leaf': 1, 'bootstrap': False}. Best is trial 0 with value: 0.6918604651162791.
[I 2025-02-28 12:01:49,915] Trial 1 finished with value: 0.7034883720930233 and parameters: {'n_estimators': 424, 'max_depth': 28, 'min_samples_split': 10, 'min_samples_leaf': 2, 'bootstrap': True}. Best is trial 1 with value: 0.7034883720930233.
[I 2025-02-28 12:01:50,542] Trial 2 finished with value: 0.6918604651162791 and parameters: {'n_estimators': 330, 'max_depth': 18, 'min_samples_split': 9, 'min_samples_leaf': 1, 'bootstrap': False}. Best is trial 1 with value: 0.7034883720930233.
[I 2025-02-28 12:01:50,779] Trial 3 finished with value: 0.7093023255813954 and parameters: {'n_estimators': 120, 'max_depth': 29, 'min_samples_split': 2, 'min_samples_leaf': 2, 'bootstrap': True}. Best is trial 3 with value: 0.7093023255813954.
[I 2025-02-28 


🎯 Best Hyperparameters for Static-Based Features:
{'n_estimators': 480, 'max_depth': 23, 'min_samples_split': 5, 'min_samples_leaf': 5, 'bootstrap': True}


[I 2025-02-28 12:02:11,099] A new study created in memory with name: no-name-8c9dbd7a-3a0a-4852-8407-5a17c158451d



📊 Results for Static-Based Features:
Accuracy: 0.7326
Precision: 0.7253
Recall: 0.7586
F1 Score: 0.7416
AUC-ROC Score: 0.7328


[I 2025-02-28 12:02:11,701] Trial 0 finished with value: 0.7209302325581395 and parameters: {'n_estimators': 315, 'max_depth': 12, 'min_samples_split': 4, 'min_samples_leaf': 4, 'bootstrap': True}. Best is trial 0 with value: 0.7209302325581395.
[I 2025-02-28 12:02:12,364] Trial 1 finished with value: 0.7209302325581395 and parameters: {'n_estimators': 339, 'max_depth': 28, 'min_samples_split': 6, 'min_samples_leaf': 2, 'bootstrap': True}. Best is trial 0 with value: 0.7209302325581395.
[I 2025-02-28 12:02:12,658] Trial 2 finished with value: 0.7267441860465116 and parameters: {'n_estimators': 153, 'max_depth': 16, 'min_samples_split': 7, 'min_samples_leaf': 3, 'bootstrap': True}. Best is trial 2 with value: 0.7267441860465116.
[I 2025-02-28 12:02:12,825] Trial 3 finished with value: 0.7325581395348837 and parameters: {'n_estimators': 87, 'max_depth': 25, 'min_samples_split': 10, 'min_samples_leaf': 1, 'bootstrap': False}. Best is trial 3 with value: 0.7325581395348837.
[I 2025-02-28 1


🎯 Best Hyperparameters for Combined Features:
{'n_estimators': 245, 'max_depth': 5, 'min_samples_split': 2, 'min_samples_leaf': 2, 'bootstrap': True}

📊 Results for Combined Features:
Accuracy: 0.7442
Precision: 0.7529
Recall: 0.7356
F1 Score: 0.7442
AUC-ROC Score: 0.7719


# SVM

In [None]:


# Define the Optuna optimization function for SVM
def objective_svm(trial, X_train, X_test, y_train, y_test):
    params = {
        # suggest_loguniform is deprecated in Optuna; suggest_float(log=True) is the replacement
        "C": trial.suggest_float("C", 0.01, 100, log=True),
        "gamma": trial.suggest_float("gamma", 0.0001, 1, log=True),
        "kernel": trial.suggest_categorical("kernel", ["rbf", "poly", "sigmoid"])
    }

    model = SVC(**params, probability=True, random_state=42)

    # Scale data for SVM
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    model.fit(X_train_scaled, y_train)

    y_pred = model.predict(X_test_scaled)
    accuracy = accuracy_score(y_test, y_pred)

    return accuracy  # Optimize for accuracy

# Function to train and evaluate an SVM model with Optuna tuning
def train_and_evaluate_svm(X, y, dataset_name):
    # Split into train/test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

    # Run Optuna optimization
    study = optuna.create_study(direction="maximize")
    study.optimize(lambda trial: objective_svm(trial, X_train, X_test, y_train, y_test), n_trials=30)  # 30 trials

    # Get best hyperparameters
    best_params = study.best_params
    print(f"\n🎯 Best Hyperparameters for {dataset_name} (SVM):")
    print(best_params)

    # Train the best model
    best_model = SVC(**best_params, probability=True, random_state=42)

    # Scale data for SVM
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    best_model.fit(X_train, y_train)

    # Predict on test data
    y_pred = best_model.predict(X_test)
    y_pred_proba = best_model.predict_proba(X_test)[:, 1]

    # Evaluate Performance
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    auc_roc = roc_auc_score(y_test, y_pred_proba)
    conf_matrix = confusion_matrix(y_test, y_pred)

    # Print Evaluation Metrics
    print(f"\n📊 Results for {dataset_name} (SVM):")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1 Score: {f1:.4f}")
    print(f"AUC-ROC Score: {auc_roc:.4f}")


# Train and evaluate for each dataset using Optuna-optimized SVM
train_and_evaluate_svm(X_patterns_only, y, "Pattern-Based Features")
train_and_evaluate_svm(X_static_only, y, "Static-Based Features")
train_and_evaluate_svm(X_combined, y, "Combined Features")


[I 2025-02-28 12:02:59,424] A new study created in memory with name: no-name-76c8556f-2f20-4c95-9ecb-f4f455d738db
  "C": trial.suggest_loguniform("C", 0.01, 100),
  "gamma": trial.suggest_loguniform("gamma", 0.0001, 1),
[I 2025-02-28 12:02:59,464] Trial 0 finished with value: 0.6046511627906976 and parameters: {'C': 0.37924969564945643, 'gamma': 0.04557303930487806, 'kernel': 'rbf'}. Best is trial 0 with value: 0.6046511627906976.
  "C": trial.suggest_loguniform("C", 0.01, 100),
  "gamma": trial.suggest_loguniform("gamma", 0.0001, 1),
[I 2025-02-28 12:02:59,505] Trial 1 finished with value: 0.622093023255814 and parameters: {'C': 3.065108913458006, 'gamma': 0.09547334951603054, 'kernel': 'sigmoid'}. Best is trial 1 with value: 0.622093023255814.
  "C": trial.suggest_loguniform("C", 0.01, 100),
  "gamma": trial.suggest_loguniform("gamma", 0.0001, 1),
[I 2025-02-28 12:02:59,690] Trial 2 finished with value: 0.563953488372093 and parameters: {'C': 7.865692199229727, 'gamma': 0.20754453524


🎯 Best Hyperparameters for Pattern-Based Features (SVM):
{'C': 3.065108913458006, 'gamma': 0.09547334951603054, 'kernel': 'sigmoid'}

📊 Results for Pattern-Based Features (SVM):
Accuracy: 0.6221
Precision: 0.8667
Recall: 0.2989
F1 Score: 0.4444
AUC-ROC Score: 0.6298


[I 2025-02-28 12:03:02,585] Trial 2 finished with value: 0.5581395348837209 and parameters: {'C': 2.475144866467341, 'gamma': 0.0054993423160588, 'kernel': 'poly'}. Best is trial 1 with value: 0.6162790697674418.
  "C": trial.suggest_loguniform("C", 0.01, 100),
  "gamma": trial.suggest_loguniform("gamma", 0.0001, 1),
[I 2025-02-28 12:03:02,638] Trial 3 finished with value: 0.5988372093023255 and parameters: {'C': 6.380105167422281, 'gamma': 0.06465683095190972, 'kernel': 'rbf'}. Best is trial 1 with value: 0.6162790697674418.
  "C": trial.suggest_loguniform("C", 0.01, 100),
  "gamma": trial.suggest_loguniform("gamma", 0.0001, 1),
[I 2025-02-28 12:03:02,680] Trial 4 finished with value: 0.5465116279069767 and parameters: {'C': 3.0235156731825015, 'gamma': 0.03212965286423916, 'kernel': 'poly'}. Best is trial 1 with value: 0.6162790697674418.
  "C": trial.suggest_loguniform("C", 0.01, 100),
  "gamma": trial.suggest_loguniform("gamma", 0.0001, 1),
[I 2025-02-28 12:03:02,720] Trial 5 finis


🎯 Best Hyperparameters for Static-Based Features (SVM):
{'C': 80.1973297325371, 'gamma': 0.00035370822216969507, 'kernel': 'rbf'}

📊 Results for Static-Based Features (SVM):
Accuracy: 0.6628
Precision: 0.6986
Recall: 0.5862
F1 Score: 0.6375
AUC-ROC Score: 0.7091


[I 2025-02-28 12:03:04,492] Trial 2 finished with value: 0.5930232558139535 and parameters: {'C': 0.8515174642298406, 'gamma': 0.5654498244752544, 'kernel': 'sigmoid'}. Best is trial 2 with value: 0.5930232558139535.
  "C": trial.suggest_loguniform("C", 0.01, 100),
  "gamma": trial.suggest_loguniform("gamma", 0.0001, 1),
[I 2025-02-28 12:03:04,540] Trial 3 finished with value: 0.6511627906976745 and parameters: {'C': 27.924827352100397, 'gamma': 0.1191769396078128, 'kernel': 'sigmoid'}. Best is trial 3 with value: 0.6511627906976745.
  "C": trial.suggest_loguniform("C", 0.01, 100),
  "gamma": trial.suggest_loguniform("gamma", 0.0001, 1),
[I 2025-02-28 12:03:04,602] Trial 4 finished with value: 0.5872093023255814 and parameters: {'C': 3.615653317359488, 'gamma': 0.02700751372671847, 'kernel': 'poly'}. Best is trial 3 with value: 0.6511627906976745.
  "C": trial.suggest_loguniform("C", 0.01, 100),
  "gamma": trial.suggest_loguniform("gamma", 0.0001, 1),
[I 2025-02-28 12:03:04,646] Trial 


🎯 Best Hyperparameters for Combined Features (SVM):
{'C': 0.7041024605402495, 'gamma': 0.022309404078590736, 'kernel': 'rbf'}

📊 Results for Combined Features (SVM):
Accuracy: 0.6977
Precision: 0.7465
Recall: 0.6092
F1 Score: 0.6709
AUC-ROC Score: 0.7588


# XGBoost

In [None]:
def objective_xgb(trial, X_train, X_test, y_train, y_test):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "subsample": trial.suggest_float("subsample", 0.6, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.6, 1.0),
        "gamma": trial.suggest_float("gamma", 0.0, 5.0),
        "reg_alpha": trial.suggest_float("reg_alpha", 0.0, 5.0),
        "reg_lambda": trial.suggest_float("reg_lambda", 0.0, 5.0)
    }

    # use_label_encoder is ignored by current XGBoost releases, so it is omitted
    model = XGBClassifier(**params, eval_metric="logloss")

    model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    return accuracy  # Optimize for accuracy

# Function to train and evaluate an XGBoost model with Optuna tuning
def train_and_evaluate_xgb_optuna(X, y, dataset_name):
    # Split into train/test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

    # Handle class imbalance with SMOTE
    smote = SMOTE(random_state=42)
    X_train, y_train = smote.fit_resample(X_train, y_train)

    # Scale features to [0, 1]; tree-based boosting is largely insensitive to this, kept for consistency
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # Run Optuna optimization
    study = optuna.create_study(direction="maximize")
    study.optimize(lambda trial: objective_xgb(trial, X_train, X_test, y_train, y_test), n_trials=30)  # 30 trials

    # Get best hyperparameters
    best_params = study.best_params
    print(f"\n🎯 Best Hyperparameters for {dataset_name} (XGBoost):")
    print(best_params)

    # Train the best model
    best_model = XGBClassifier(**best_params, eval_metric="logloss")

    best_model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)

    # Predict on test data
    y_pred = best_model.predict(X_test)
    y_pred_proba = best_model.predict_proba(X_test)[:, 1]

    # Evaluate Performance
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    auc_roc = roc_auc_score(y_test, y_pred_proba)
    conf_matrix = confusion_matrix(y_test, y_pred)

    # Print Evaluation Metrics
    print(f"\n📊 Results for {dataset_name} (XGBoost):")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1 Score: {f1:.4f}")
    print(f"AUC-ROC Score: {auc_roc:.4f}")


# Train and evaluate for each dataset using Optuna-optimized XGBoost
train_and_evaluate_xgb_optuna(X_patterns_only, y, "Pattern-Based Features")
train_and_evaluate_xgb_optuna(X_static_only, y, "Static-Based Features")
train_and_evaluate_xgb_optuna(X_combined, y, "Combined Features")

[I 2025-02-28 12:03:10,020] A new study created in memory with name: no-name-90074808-1c94-46a8-89f8-ef922bb5b479
  "learning_rate": trial.suggest_loguniform("learning_rate", 0.01, 0.3),
Parameters: { "use_label_encoder" } are not used.

[I 2025-02-28 12:03:10,196] Trial 0 finished with value: 0.622093023255814 and parameters: {'n_estimators': 498, 'learning_rate': 0.014260182390667875, 'max_depth': 3, 'subsample': 0.9486479449477812, 'colsample_bytree': 0.9772317081645949, 'gamma': 0.8004700890194338, 'reg_alpha': 0.41492145645769307, 'reg_lambda': 4.329739881069169}. Best is trial 0 with value: 0.622093023255814.
  "learning_rate": trial.suggest_loguniform("learning_rate", 0.01, 0.3),
Parameters: { "use_label_encoder" } are not used.

[I 2025-02-28 12:03:10,270] Trial 1 finished with value: 0.6162790697674418 and parameters: {'n_estimators': 262, 'learning_rate': 0.05089899849170449, 'max_depth': 8, 'subsample': 0.9218504261405696, 'colsample_bytree': 0.662012833531932, 'gamma': 2.79


🎯 Best Hyperparameters for Pattern-Based Features (XGBoost):
{'n_estimators': 498, 'learning_rate': 0.014260182390667875, 'max_depth': 3, 'subsample': 0.9486479449477812, 'colsample_bytree': 0.9772317081645949, 'gamma': 0.8004700890194338, 'reg_alpha': 0.41492145645769307, 'reg_lambda': 4.329739881069169}

📊 Results for Pattern-Based Features (XGBoost):
Accuracy: 0.6221
Precision: 0.8235
Recall: 0.3218
F1 Score: 0.4628
AUC-ROC Score: 0.6479


[I 2025-02-28 12:03:16,250] Trial 0 finished with value: 0.686046511627907 and parameters: {'n_estimators': 381, 'learning_rate': 0.05572051553982514, 'max_depth': 9, 'subsample': 0.639832582209734, 'colsample_bytree': 0.6860485285965353, 'gamma': 1.7380293006151892, 'reg_alpha': 0.3735247301279465, 'reg_lambda': 4.770304833986841}. Best is trial 0 with value: 0.686046511627907.
  "learning_rate": trial.suggest_loguniform("learning_rate", 0.01, 0.3),
Parameters: { "use_label_encoder" } are not used.

[I 2025-02-28 12:03:16,381] Trial 1 finished with value: 0.6686046511627907 and parameters: {'n_estimators': 409, 'learning_rate': 0.06469697858595302, 'max_depth': 9, 'subsample': 0.6937001509614078, 'colsample_bytree': 0.6936900552740822, 'gamma': 2.174181443031552, 'reg_alpha': 3.70253493054356, 'reg_lambda': 3.4034046925131767}. Best is trial 0 with value: 0.686046511627907.
  "learning_rate": trial.suggest_loguniform("learning_rate", 0.01, 0.3),
Parameters: { "use_label_encoder" } are


🎯 Best Hyperparameters for Static-Based Features (XGBoost):
{'n_estimators': 344, 'learning_rate': 0.04458405249641259, 'max_depth': 4, 'subsample': 0.9949593168262669, 'colsample_bytree': 0.6259660623810614, 'gamma': 1.150557536144719, 'reg_alpha': 3.122113786834188, 'reg_lambda': 2.711701757443695}

📊 Results for Static-Based Features (XGBoost):
Accuracy: 0.7151
Precision: 0.7065
Recall: 0.7471
F1 Score: 0.7263
AUC-ROC Score: 0.7153


[I 2025-02-28 12:03:22,743] Trial 0 finished with value: 0.7209302325581395 and parameters: {'n_estimators': 457, 'learning_rate': 0.1432211825338271, 'max_depth': 10, 'subsample': 0.9349227334035096, 'colsample_bytree': 0.7764375794595618, 'gamma': 4.161663343279118, 'reg_alpha': 0.02644737958150012, 'reg_lambda': 1.8007035895722279}. Best is trial 0 with value: 0.7209302325581395.
  "learning_rate": trial.suggest_loguniform("learning_rate", 0.01, 0.3),
Parameters: { "use_label_encoder" } are not used.

[I 2025-02-28 12:03:22,930] Trial 1 finished with value: 0.6976744186046512 and parameters: {'n_estimators': 458, 'learning_rate': 0.18082319043422618, 'max_depth': 7, 'subsample': 0.80763041356616, 'colsample_bytree': 0.8365478439635642, 'gamma': 0.4185365956474252, 'reg_alpha': 1.5335121719839035, 'reg_lambda': 3.4324994984125006}. Best is trial 0 with value: 0.7209302325581395.
  "learning_rate": trial.suggest_loguniform("learning_rate", 0.01, 0.3),
Parameters: { "use_label_encoder"


🎯 Best Hyperparameters for Combined Features (XGBoost):
{'n_estimators': 172, 'learning_rate': 0.11439923081293461, 'max_depth': 5, 'subsample': 0.7319657988257261, 'colsample_bytree': 0.7085818986362691, 'gamma': 3.7086730071108662, 'reg_alpha': 3.563113576784189, 'reg_lambda': 1.1069946779850757}

📊 Results for Combined Features (XGBoost):
Accuracy: 0.7616
Precision: 0.7614
Recall: 0.7701
F1 Score: 0.7657
AUC-ROC Score: 0.7823
