<a href="https://colab.research.google.com/github/fabriziobasso/Colab_backup/blob/main/File_02_Calories_NN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PREDICTING CALOORIES BURNED

## RMSLE Metric and Competition Context

This notebook is developed for a data science competition focused on predicting **Calories burned** during exercise. The evaluation metric for this competition is the **Root Mean Squared Logarithmic Error (RMSLE)**, which measures the square root of the mean squared difference between the logarithms of predicted and actual values. RMSLE is ideal for datasets with a wide range of target values, as it emphasizes **relative errors**, ensuring balanced performance across small and large calorie values.

The RMSLE formula is:

![](https://miro.medium.com/v2/resize:fit:720/format:webp/0*AUzyQ1rc6mpQVYfn)

### Why RMSLE?
- **Handles Wide Ranges**: RMSLE penalizes relative errors proportionally, making it robust for calorie values ranging from small (e.g., 10 calories) to large (e.g., 1000 calories).
- **Balanced Evaluation**: Ensures models perform well across the entire spectrum of calorie burn.
- **Competition Goal**: A lower RMSLE score indicates a precise and generalizable model, critical for ranking high on the leaderboard.

---

## Potential Effects of Features on Calorie Burn

The dataset includes the following features to predict calorie burn: **Sex**, **Age**, **Height**, **Weight**, **Duration**, **Heart_Rate**, and **Body_Temp**. Below, we explore how each feature might influence calorie burn:

### 1. Sex
- **Impact**: Differences in metabolic rates and muscle mass between males and females affect calorie burn. Males often have higher muscle mass, leading to greater calorie expenditure for the same exercise.
- **Example**: A male running at the same pace and duration as a female may burn more calories due to higher energy demands.

### 2. Age
- **Impact**: Basal metabolic rate (BMR) decreases with age, reducing calorie burn in older individuals due to lower metabolic rates and muscle mass (sarcopenia).
- **Example**: A 20-year-old may burn more calories than a 50-year-old during identical workouts.

### 3. Height
- **Impact**: Taller individuals have more body mass or muscle, requiring more energy for movement, thus burning more calories. Height’s effect is often linked to weight and exercise intensity.
- **Example**: A taller person may expend more energy covering the same distance.

### 4. Weight
- **Impact**: Heavier individuals burn more calories due to the energy required to move greater body mass. Body composition (fat vs. muscle) also influences calorie burn.
- **Example**: A 90 kg individual burns more calories walking the same distance as a 60 kg individual.

### 5. Duration
- **Impact**: Longer exercise sessions directly increase total calorie expenditure, though intensity and exercise type also matter.
- **Example**: Running for 30 minutes burns more calories than running for 15 minutes.

### 6. Heart_Rate
- **Impact**: Higher heart rates indicate greater exercise intensity and metabolic effort, leading to increased calorie burn. Fitness levels can modulate heart rate responses.
- **Example**: High heart rate during a HIIT workout correlates with higher calorie burn.

### 7. Body_Temp
- **Impact**: Rising body temperature during exercise reflects increased metabolic activity and thermoregulation, potentially increasing calorie burn. Environmental factors (e.g., heat) also play a role.
- **Example**: Exercising in a hot environment may increase calorie expenditure due to thermoregulation.

---

## Transition to Analysis

Understanding the relationships between these features and calorie burn is key to building a predictive model. In this notebook, we will:

1. **Explore Data**: Analyze the distribution of the target variable (**Calories**) and features using visualizations (e.g., histograms, boxplots).
2. **Correlation Analysis**: Identify relationships between features and the target using correlation matrices and polar plots.
3. **Outlier Detection**: Address anomalies that could skew model performance.
4. **Feature Engineering**: Apply techniques like quantile and equal-width binning to enhance model input.
5. **Model Development**: Build and evaluate models to minimize RMSLE, aligning with competition objectives.

### Visualization Strategy
We will use:
- **Histograms** and **boxplots** to examine feature distributions.
- **Correlation matrices** to uncover feature relationships.
- **Polar plots** for creative visualization of feature impacts.
- **Pair plots** to explore pairwise relationships.

By systematically analyzing the data, we aim to develop a robust model that accurately predicts calorie burn and excels in the competition.

---

# 0.0 Setting

## 0.1 Import Libraries:

In [None]:
# !pip uninstall scikit-learn
# !pip install scikit-learn==1.4

In [None]:
%%capture
#!pip install -qq pytorch_tabnet
!pip install optuna
!pip install --upgrade catboost
#!pip install optuna-integration-pytorch-tabnet

#from pytorch_tabnet.tab_model import TabNetRegressor

!pip install --upgrade category-encoders
!pip install optuna-integration
!pip install colorama
#!pip install pyfiglet
#!pip install keras-tuner --upgrade
#!pip install keras-nlp
#!pip install BorutaShap
#!pip install scikit-learn==1.2.2
#!pip install scikit-lego
!pip install skops

In [None]:
import sklearn
import lightgbm, xgboost, catboost
sklearn.__version__, lightgbm.__version__, xgboost.__version__, catboost.__version__

In [None]:
# Setup notebook
from pathlib import Path
import ipywidgets as widgets
import pandas as pd
import numpy as np
from pickle import load, dump
import json
import joblib
#from joblib import dump, load
#import calplot as cal

# Graphic Libraries:
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.image as mpimg
from termcolor import colored
# Set Style
sns.set_style("whitegrid",{"grid.linestyle":"--", 'grid.linewidth':0.2, 'grid.alpha':0.5});
sns.despine(left=True, bottom=True, top=False, right=False);
mpl.rcParams['figure.dpi'] = 120;
mpl.rc('axes', labelsize=12);
plt.rc('xtick',labelsize=10);
plt.rc('ytick',labelsize=10);

mpl.rcParams['axes.spines.top'] = False;
mpl.rcParams['axes.spines.right'] = False;
mpl.rcParams['axes.spines.left'] = True;

# Palette Setup
colors = ['#FB5B68','#FFEB48','#2676A1','#FFBDB0',]
colormap_0 = mpl.colors.LinearSegmentedColormap.from_list("",colors)
palette_1 = sns.color_palette("coolwarm", as_cmap=True)
palette_2 = sns.color_palette("YlOrBr", as_cmap=True)
palette_3 = sns.light_palette("red", as_cmap=True)
palette_4 = sns.color_palette("viridis", as_cmap=True)
palette_5 = sns.color_palette("rocket", as_cmap=True)
palette_6 = sns.color_palette("GnBu", as_cmap=True)
palette_7 = sns.color_palette("tab20c", as_cmap=False)
palette_8 = sns.color_palette("Set2", as_cmap=False)

palette_custom = ['#fbb4ae','#b3cde3','#ccebc5','#decbe4','#fed9a6','#ffffcc','#e5d8bd','#fddaec','#f2f2f2']
palette_9 = sns.color_palette(palette_custom, as_cmap=False)

# tool for Excel:
from openpyxl import load_workbook, Workbook
from openpyxl.drawing.image import Image
from openpyxl.styles import Border, Side, PatternFill, Font, GradientFill, Alignment
from openpyxl.worksheet.cell_range import CellRange

from openpyxl.formatting import Rule
from openpyxl.styles import Font, PatternFill, Border
from openpyxl.styles.differential import DifferentialStyle

# Bloomberg
#from xbbg import blp
from catboost import CatBoostRegressor, Pool, CatBoostClassifier
import xgboost as xgb
from xgboost import XGBRegressor, XGBClassifier
from xgboost.callback import EarlyStopping

import lightgbm as lgb
from lightgbm import (LGBMRegressor,
                      LGBMClassifier,
                      early_stopping,
                      record_evaluation,
                      log_evaluation)

# Time Management
from tqdm import tqdm
from datetime import date
from datetime import datetime
from pandas.tseries.offsets import BMonthEnd, QuarterEnd
import datetime
from pandas.tseries.offsets import BDay # BDay is business day, not birthday...
import datetime as dt
import click
import glob
import os
import gc
import re
import string

from ipywidgets import AppLayout
from ipywidgets import Dropdown, Layout, HTML, AppLayout, VBox, Label, HBox, BoundedFloatText, interact, Output

#from my_func import *

import optuna
from optuna.integration import TFKerasPruningCallback
from optuna.trial import TrialState
from optuna.visualization import plot_intermediate_values
from optuna.visualization import plot_optimization_history
from optuna.visualization import plot_param_importances
from optuna.visualization import plot_contour

os.environ["KERAS_BACKEND"] = "tensorflow"

import numpy as np
import tensorflow as tf
import keras
from tensorflow.keras import backend as K

from keras import ops
from keras import layers
from keras import activations

from keras.layers import Input, LSTM, Dense, Lambda, RepeatVector, Reshape
from keras.models import Model
from keras.losses import MeanSquaredError
from keras.metrics import RootMeanSquaredError

from keras.utils import FeatureSpace, plot_model

# Import libraries for Hypertuning
#import keras_tuner as kt
#from keras_tuner.tuners import RandomSearch, GridSearch, BayesianOptimization

#from my_func import *

# preprocessing modules
from sklearn.model_selection import train_test_split, KFold, StratifiedKFold, RepeatedKFold, cross_val_score, cross_validate, GroupKFold, GridSearchCV, RepeatedStratifiedKFold, cross_val_predict
from sklearn.experimental import enable_iterative_imputer  # noqa
from sklearn.impute import IterativeImputer

from sklearn.preprocessing import (LabelEncoder,
                                   StandardScaler,
                                   MinMaxScaler,
                                   OrdinalEncoder,
                                   RobustScaler,
                                   PowerTransformer,
                                   OneHotEncoder,
                                   QuantileTransformer,
                                   PolynomialFeatures,
                                   FunctionTransformer)

# metrics
import sklearn
#import skops.io as sio
from sklearn.metrics import (mean_squared_error,
                             root_mean_squared_error,
                             root_mean_squared_log_error,
                             r2_score,
                             mean_absolute_error,
                             mean_absolute_percentage_error,
                             classification_report,
                             confusion_matrix,
                             ConfusionMatrixDisplay,
                             multilabel_confusion_matrix,
                             accuracy_score,
                             roc_auc_score,
                             auc,
                             roc_curve,
                             log_loss,
                             make_scorer)
# modeling algos
from sklearn.linear_model import (LogisticRegression,
                                  Lasso,
                                  ridge_regression,
                                  LinearRegression,
                                  Ridge,
                                  RidgeCV,
                                  ElasticNet,
                                  BayesianRidge,
                                  HuberRegressor,
                                  TweedieRegressor,
                                  QuantileRegressor,
                                  ARDRegression,
                                  TheilSenRegressor,
                                  PoissonRegressor,
                                  GammaRegressor)

from sklearn.ensemble import (AdaBoostRegressor,
                              AdaBoostClassifier,
                              RandomForestRegressor,
                              RandomForestClassifier,
                              VotingRegressor,
                              GradientBoostingRegressor,
                              GradientBoostingClassifier,
                              StackingRegressor,
                              StackingClassifier,
                              HistGradientBoostingClassifier,
                              HistGradientBoostingRegressor,
                              ExtraTreesClassifier)

from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.base import clone
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.feature_selection import SelectFromModel
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from category_encoders import TargetEncoder, CatBoostEncoder, LeaveOneOutEncoder, OrdinalEncoder, CountEncoder

from yellowbrick.cluster import KElbowVisualizer

import warnings
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
%matplotlib inline

from sklearn.linear_model import LinearRegression
import numpy as np
import seaborn as sns
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess

from sklearn.multioutput import RegressorChain

import itertools
import warnings
from openpyxl import load_workbook

from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

import statsmodels.api as sm
from pylab import rcParams
import scipy.stats as ss

#from category_encoders.cat_boost import CatBoostEncoder
#from category_encoders.wrapper import PolynomialWrapper
#from category_encoders.count import CountEncoder
#from category_encoders import TargetEncoder

import skops.io as sio

warnings.filterwarnings('ignore')
#import pyfiglet
#plt.style.use('fivethirtyeight')

**Formatting and Settings:**

In [None]:
sns.set({"axes.facecolor"       : "#ffffff",
         "figure.facecolor"     : "#ffffff",
         "axes.edgecolor"       : "#000000",
         "grid.color"           : "#ffffff",
         "font.family"          : ['Cambria'],
         "axes.labelcolor"      : "#000000",
         "xtick.color"          : "#000000",
         "ytick.color"          : "#000000",
         "grid.linewidth"       : 0.5,
         'grid.alpha'           :0.5,
         "grid.linestyle"       : "--",
         "axes.titlecolor"      : 'black',
         'axes.titlesize'       : 12,
#         'axes.labelweight'     : "bold",
         'legend.fontsize'      : 7.0,
         'legend.title_fontsize': 7.0,
         'font.size'            : 7.5,
         'xtick.labelsize'      : 7.5,
         'ytick.labelsize'      : 7.5,
        });

sns.set_style("whitegrid",{"grid.linestyle":"--", 'grid.linewidth':0.2, 'grid.alpha':0.5})
# Set Style
mpl.rcParams['figure.dpi'] = 120;

# import font colors
from colorama import Fore, Style, init

# Making sklearn pipeline outputs as dataframe:-
pd.set_option('display.max_columns', 100);
pd.set_option('display.max_rows', 50);

sns.despine(left=True, bottom=True, top=False, right=False)

mpl.rcParams['axes.spines.left'] = True
mpl.rcParams['axes.spines.right'] = False
mpl.rcParams['axes.spines.top'] = False
mpl.rcParams['axes.spines.bottom'] = True

init(autoreset=True)

In [None]:
from tqdm import tqdm
from itertools import product

import numpy as np
import pandas as pd
import gc
import matplotlib.pyplot as plt
import seaborn as sns

from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

from sklearn.model_selection import GroupKFold
from sklearn.impute import SimpleImputer
import torch

import warnings
warnings.filterwarnings("ignore")

# Connect to Colab:#
from google.colab import drive
import os
drive.mount('/content/drive')

## 0.2 Functions:

* **Plotting Functiss**

In [None]:
def plot_scatter(df, x="feat1", y="feat2", color_feature=None, cmap='viridis'):
    """
    Generates a scatter plot with points colored based on a third feature.

    Args:
        df: Pandas DataFrame containing the data.
        x: Name of the column to use for the x-axis.
        y: Name of the column to use for the y-axis.
        color_feature: Name of the column to use for coloring the points.
                       If None, points will be a single color.
        cmap: Colormap to use for coloring the points (e.g., 'viridis', 'plasma', 'magma', 'inferno', 'cividis').
              See matplotlib documentation for available colormaps.
    """

    plt.figure(figsize=(8, 5))

    if color_feature is not None:
        # Ensure the color feature exists
        if color_feature not in df.columns:
            raise ValueError(f"Color feature '{color_feature}' not found in DataFrame.")

        # Scatter plot with colors
        scatter = plt.scatter(df[x], df[y], c=df[color_feature], cmap=cmap)

        # Add a colorbar
        cbar = plt.colorbar(scatter)
        cbar.set_label(color_feature)  # Label the colorbar

    else:
        # Simple scatter plot (single color)
        plt.scatter(df[x], df[y],color="royalblue",alpha=0.6)

    plt.xlabel(x)
    plt.ylabel(y)
    plt.title("Scatter Plot")  # Add a title for better visualization
    plt.show()

* **Dataset Management Functions**:

In [None]:
class Config:

    state = 42
    n_splits = 10
    early_stop = 200

    target = 'Calories'
    train = pd.read_csv('/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/train.csv')
    test = pd.read_csv('/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/test.csv')
    submission = pd.read_csv( "/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/sample_submission.csv")
    #train_org = pd.read_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/original.csv")

    original_data = 'N'
    outliers = 'N'
    log_trf = 'Y'
    scaler_trf = 'Y'
    feature_eng = 'N'
    missing = 'Y'
    sqrt_normalization="Y"
    impose_normalization="N"
    trg_enc = "N"
    problem = "Regression"
    metric_goal="LRMSE"
    direction_="minimize"
    log_trans_cols = ["Body_Temp"]
    sqrt_norm_cols = ["Age"]
    impose_norm_cols = []
    trg_enc_feat = []

class Preprocessing():

    def __init__(self):
        self.train = Config.train
        self.test = Config.test
        self.targets = Config.target

        self.prp_data()

    def prp_data(self):

        if Config.original_data == 'Y':
            self.train = pd.concat([self.train, Config.train_org], ignore_index=True).drop_duplicates(ignore_index=True)

        self.train = self.train.drop(['id'], axis=1)
        self.test = self.test.drop(['id'], axis=1)

        self.cat_features = self.train.drop(self.targets, axis=1).select_dtypes(include=['object', 'bool']).columns.tolist()
        self.num_features = self.train.drop(self.targets, axis=1).select_dtypes(exclude=['object', 'bool']).columns.tolist()

        self.train = self.reduce_mem(self.train)
        self.test = self.reduce_mem(self.test)
        return self

    def reduce_mem(self, df):

        numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64', "uint16", "uint32", "uint64"]

        for col in df.columns:
            col_type = df[col].dtypes

            if col_type in numerics:
                c_min = df[col].min()
                c_max = df[col].max()

                if "int" in str(col_type):
                    if c_min >= np.iinfo(np.int8).min and c_max < np.iinfo(np.int8).max:
                        df[col] = df[col].astype(np.int32)
                    elif c_min >= np.iinfo(np.int16).min and c_max < np.iinfo(np.int16).max:
                        df[col] = df[col].astype(np.int32)
                    elif c_min >= np.iinfo(np.int32).min and c_max < np.iinfo(np.int32).max:
                        df[col] = df[col].astype(np.int32)
                    elif c_min >= np.iinfo(np.int64).min and c_max < np.iinfo(np.int64).max:
                        df[col] = df[col].astype(np.int64)
                else:
                    if c_min >= np.finfo(np.float16).min and c_max < np.finfo(np.float16).max:
                        df[col] = df[col].astype(np.float32)
                    if c_min >= np.finfo(np.float32).min and c_max < np.finfo(np.float32).max:
                        df[col] = df[col].astype(np.float32)
                    else:
                        df[col] = df[col].astype(np.float64)

        return df

class EDA(Config, Preprocessing):

    def __init__(self):
        super().__init__()

        self.data_info()
        self.heatmap()
        self.dist_plots()
        self.cat_feature_plots()
        if Config.problem == 'Classification':
          self.target_pie()
        else:
          self.target_dist()

    def data_info(self):

        for data, label in zip([self.train, self.test], ['Train', 'Test']):
            table_style = [{'selector': 'th:not(.index_name)',
                            'props': [('background-color', 'slategrey'),
                                      ('color', '#FFFFFF'),
                                      ('font-weight', 'bold'),
                                      ('border', '1px solid #DCDCDC'),
                                      ('text-align', 'center')]
                            },
                            {'selector': 'tbody td',
                             'props': [('border', '1px solid #DCDCDC'),
                                       ('font-weight', 'normal')]
                            }]
            print(Style.BRIGHT+Fore.RED+f'\n{label} head\n')
            display(data.head().style.set_table_styles(table_style))

            print(Style.BRIGHT+Fore.RED+f'\n{label} info\n'+Style.RESET_ALL)
            display(data.info())

            print(Style.BRIGHT+Fore.RED+f'\n{label} describe\n')
            display(data.describe().drop(index='count', columns=self.targets, errors = 'ignore').T
                    .style.set_table_styles(table_style).format('{:.3f}'))

            print(Style.BRIGHT+Fore.RED+f'\n{label} missing values\n'+Style.RESET_ALL)
            display(data.isnull().sum())
        return self

    def heatmap(self):
        print(Style.BRIGHT+Fore.RED+f'\nCorrelation Heatmap\n')
        plt.figure(figsize=(7,7))
        corr = self.train.select_dtypes(exclude='object').corr(method='pearson')
        sns.heatmap(corr, fmt = '0.2f', cmap = 'Blues', annot=True, cbar=False)
        plt.show()

    def dist_plots(self):

        print(Style.BRIGHT+Fore.RED+f"\nDistribution analysis - Numerical\n")
        df = pd.concat([self.train[self.num_features].assign(Source = 'Train'),
                        self.test[self.num_features].assign(Source = 'Test'),],
                        axis=0, ignore_index = True)

        fig, axes = plt.subplots(len(self.num_features), 2 ,figsize = (13, len(self.num_features) * 4),
                                 gridspec_kw = {'hspace': 0.3,
                                                'wspace': 0.2,
                                                'width_ratios': [0.70, 0.30]
                                               }
                                )
        for i,col in enumerate(self.num_features):
            try:
                ax = axes[i,0]
            except:
                ax = axes[i]
            sns.kdeplot(data = df[[col, 'Source']], x = col, hue = 'Source',
                        palette = ['royalblue', 'tomato'], ax = ax, alpha=0.7, linewidth = 2
                       )
            ax.set(xlabel = '', ylabel = '')
            ax.set_title(f"\n{col}")
            ax.grid('--',alpha=0.7)

            try:
                ax = axes[i,1]
            except:
                ax = axes[1]
            sns.boxplot(data = df, y = col, x=df.Source, width = 0.5,
                        linewidth = 1, fliersize= 1,
                        ax = ax, palette=['royalblue', 'tomato']
                       )
            ax.set_title(f"\n{col}")
            ax.set(xlabel = '', ylabel = '')
            ax.tick_params(axis='both', which='major')
            ax.set_xticklabels(['Train', 'Test'])

        plt.tight_layout()
        plt.show()

    def cat_feature_plots(self):
        print(Style.BRIGHT+Fore.RED+f"\nDistribution analysis - Categorical\n")
        fig, axes = plt.subplots(len(self.cat_features), 2 ,figsize = (18, len(self.cat_features) * 6),
                                 gridspec_kw = {'hspace': 0.5,
                                                'wspace': 0.2,
                                               }
                                )

        for i, col in enumerate(self.cat_features):
            try:
                ax = axes[i,0]
            except:
                ax = axes[i]
            sns.barplot(data=self.train[col].value_counts().nlargest(10).reset_index(), x=col, y='count', ax=ax, color='royalblue', alpha=0.7)
            ax.set(xlabel = '', ylabel = '')
            ax.set_title(f"\n{col} Train")

            try:
                ax = axes[i,1]
            except:
                ax = axes[i+1]
            sns.barplot(data=self.test[col].value_counts().nlargest(10).reset_index(), x=col, y='count', ax=ax, color='tomato', alpha=0.7)
            ax.set(xlabel = '', ylabel = '')
            ax.set_title(f"\n{col} Test")

        plt.tight_layout()
        plt.show()

    def target_pie(self):
        print(Style.BRIGHT+Fore.RED+f"\nTarget feature distribution\n")
        targets = self.train[self.targets]
        plt.figure(figsize=(6, 6))
        plt.pie(targets.value_counts(), labels=targets.value_counts().index, autopct='%1.2f%%', colors=palette_9)
        plt.show()

    def target_dist(self):
        print(Style.BRIGHT+Fore.RED+f"\nTarget feature distribution\n")
        fig, axes = plt.subplots(1, 1, figsize=(7, 5))
        sns.histplot(self.train[self.targets], kde=True, ax=axes)
        axes.set_title('Distribution of Price')
        axes.set_xlabel(self.targets)
        axes.set_ylabel('Frequency')

# 1.0 EDA

## 1.1 Experiment Area:

In [None]:
class CFG:
    SEED    = 333
    CV      = KFold(n_splits=15, shuffle=True, random_state=SEED)
    VERSION = '1'

class Data:
    path       = False
    or_path    = ''
    to_drop    = False
    target     = 'Calories'
    drop_duplicates = False

    def __init__(self):
        self.train      = pd.read_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/df_train_01.csv",index_col=0).drop(columns=self.to_drop) if self.to_drop else pd.read_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/df_train_01.csv",index_col=0)
        self.test       = pd.read_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/df_test_01.csv",index_col=0).drop(columns=self.to_drop) if self.to_drop else pd.read_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/df_test_01.csv",index_col=0)
        self.submission = pd.read_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/sample_submission.csv",index_col=0)
        self.original   = pd.read_csv(self.or_path) if self.or_path else pd.DataFrame()

        self.train.loc[:,"BMI"] = np.clip(self.train.BMI, a_min=-5.0, a_max=5.0)
        self.test.loc[:,"BMI"] = np.clip(self.test.BMI, a_min=-5.0, a_max=5.0)

    @property
    def X(self):
        return self.train.drop(columns=self.target)
    @property
    def y(self):
        return self.train[[self.target]]
    @property
    def X_test(self):
        return self.test
    @property
    def X_original(self):
        if len(self.original) != 0:
            return self.original.drop(columns=self.target)
        return pd.DataFrame()
    @property
    def y_original(self):
        if len(self.original) != 0:
            return self.original[[self.target]]
        return pd.DataFrame()
    @property
    def cat_features(self):
        return self.X.select_dtypes(include=['category', 'bool', 'category','int']).columns.to_list()
    @property
    def num_features(self):
        return self.X.select_dtypes(exclude=['category', 'bool', 'category','int']).columns.to_list()

    def submit(self, sub: np.ndarray, desc: str):
        '''Submit the predictions in the adequate format'''
        self.submission[self.target] = sub
        self.submission.to_csv(f'SUB_{CFG.VERSION}_{desc}.csv', index=False)
        print(colored('Submission has been made.', color='green', attrs=['bold', 'dark']))

    @staticmethod
    def sep_line():
        print(colored(f'{"_____"*14}', color='black'))
        print('')

    @staticmethod
    def head(head_text):
        print(colored(f'{"    "} ➩ {head_text} ', color='green', attrs=['dark']))

    def display_data(self):
        self.head(f'𝐃𝐚𝐭𝐚𝐬𝐞𝐭 𝐬𝐡𝐚𝐩𝐞𝐬 — 𝐓𝐫𝐚𝐢𝐧 | 𝐓𝐞𝐬𝐭: {self.train.shape} | {self.test.shape}')
        self.sep_line()

        self.head('𝐓𝐫𝐚𝐢𝐧 𝐡𝐞𝐚𝐝')
        display(self.train.head(5))
        self.head('𝐓𝐞𝐬𝐭 𝐡𝐞𝐚𝐝')
        display(self.test.head(5))
        self.sep_line()

        self.head('𝐓𝐫𝐚𝐢𝐧 𝐢𝐧𝐟𝐨')
        display(self.train.info())
        self.head('𝐓𝐞𝐬𝐭 𝐢𝐧𝐟𝐨')
        display(self.test.info())
        self.sep_line()

        self.head('𝐓𝐫𝐚𝐢𝐧 𝐬𝐮𝐦𝐦𝐚𝐫𝐲 𝐬𝐭𝐚𝐭𝐬')
        display(self.train.describe().T)
        self.head('𝐓𝐞𝐬𝐭 𝐬𝐮𝐦𝐦𝐚𝐫𝐲 𝐬𝐭𝐚𝐭𝐬')
        display(self.test.describe().T)
        self.sep_line()

        def nunique_null(train, test):
            nunique_train, nunique_test = {}, {}
            nulls_train, nulls_test = {}, {}

            for col in test.columns:
                nunique_train[col], nunique_test[col] = train[col].nunique(), test[col].nunique()
                nulls_train[col], nulls_test[col] = train[col].isna().sum(), test[col].isna().sum()

            df = pd.DataFrame([nunique_train, nunique_test,
                               nulls_train, nulls_test],
                              index=['Train nunique', 'Test nunique',
                                     'Train null', 'Test null'])
            return df

        self.head('𝐍𝐮𝐧𝐢𝐪𝐮𝐞 𝐚𝐧𝐝 𝐧𝐮𝐥𝐥𝐬')
        display(nunique_null(self.train, self.test))
        self.sep_line()

        self.head('𝐃𝐮𝐩𝐥𝐢𝐜𝐚𝐭𝐞𝐬')
        display(f'Train duplicated: {self.train.duplicated().sum()}')
        display(f'Test duplicated: {self.test.duplicated().sum()}')

        if self.drop_duplicates==True:
          if self.train.duplicated().sum() > 0:
              self.train = self.train.drop_duplicates()
              print('Train duplicates dropped.')
          if self.test.duplicated().sum() > 0:
              #self.test = self.test.drop_duplicates()
              print('Test duplicates dropped.')
        self.sep_line()

        self.head('𝐍𝐮𝐧𝐢𝐪𝐮𝐞 𝐢𝐧 𝐭𝐫𝐚𝐢𝐧 𝐧𝐨𝐭 𝐢𝐧 𝐭𝐞𝐬𝐭/𝐢𝐧 𝐭𝐞𝐬𝐭 𝐧𝐨𝐭 𝐢𝐧 𝐭𝐫𝐚𝐢𝐧')
        cat_cols = [c for c in self.test.columns if self.train[c].nunique() <= 40 or
                    c in self.test.select_dtypes(include=['object', 'category']).columns]

        def compare_unique_categories(train, test, cat_cols):
            unique_train_dic, unique_test_dic = {}, {}

            for c in cat_cols:
                unique_train_c = train[c].unique()
                unique_test_c = test[c].unique()

                count_tr = sum(1 for cat in unique_train_c if cat not in unique_test_c and not pd.isna(cat))
                count_te = sum(1 for cat in unique_test_c if (cat not in unique_train_c and not pd.isna(cat)))

                unique_train_dic[c] = count_tr
                unique_test_dic[c] = count_te

            result_df = pd.DataFrame([unique_train_dic, unique_test_dic],
                                     index=['in train not in test', 'in test not in train'])

            return result_df

        display(compare_unique_categories(self.train, self.test, cat_cols))

data = Data()
data.display_data()

In [None]:
data.X.shape, data.y.shape, data.X_test.shape

In [None]:
print(data.cat_features)

In [None]:
#plot_scatter(pd.concat([data.X,data.y],axis=1), x="BMI", y="Intensity", color_feature="Calories")

In [None]:
data.X.info(),data.X_test.info()

In [None]:
data.X_test.shape, data.y.shape

y_test_fic = data.y[:len(data.X_test)].copy()
y_test_fic["Calories"]=np.nan

# 2.0 Neural Networks:


In [None]:
def dataframe_to_dataset(dataframe, target, categorical_features, numerical_features, shuffle=False, batch_size=32):
    dataframe = dataframe.copy()
    ds = tf.data.Dataset.from_tensor_slices(((dataframe[categorical_features].values,  # First input
                                              dataframe[numerical_features].values),
                                              target))

    if shuffle:
      ds = ds.shuffle(buffer_size=len(dataframe))

    ds = ds.batch(batch_size)
    ds = ds.prefetch(batch_size)

    return ds

def dataframe_to_dataset_test(dataframe, target_finc, categorical_features, numerical_features, batch_size=32):
    dataframe = dataframe.copy()
    ds = tf.data.Dataset.from_tensor_slices(((dataframe[categorical_features].values,  # First input
                                              dataframe[numerical_features].values),
                                              target_finc))

    ds = ds.batch(batch_size)
    ds = ds.prefetch(batch_size)

    return ds

In [None]:
class CFG:
    SEED    = 333
    CV      = KFold(n_splits=11, shuffle=True, random_state=SEED)
    VERSION = '1'

class TrainModels:
    def __init__(self, X, y, X_test, test_finc_target, X_original, y_original, model_, parameters):
        self.model     = model_
        self.parameters = parameters
        self.X          = X
        self.y          = y
        self.test_finc_target = test_finc_target
        self.X_test     = X_test
        self._OOF_train = pd.DataFrame()
        self._OOF_test  = pd.DataFrame()
        self.categorical_features = X.select_dtypes(include=['category', 'bool', 'category','int']).columns.to_list()
        self.numerical_features = X.select_dtypes(exclude=['category', 'bool', 'category','int']).columns.to_list()

    def fit_model(self, name="Base_model"):
        oof_train = np.zeros(self.X.shape[0])
        oof_test  = np.zeros(self.X_test.shape[0])
        scores_train = []
        scores_val   = []

        os.chdir('/content/drive/MyDrive/Exercises/Studies_Structured_Data/Models/S5E5/layers_3_staked_models')

        for fold, (train_idx, val_idx) in enumerate(CFG.CV.split(self.X, self.y)):
            x_train, y_train = self.X.iloc[train_idx], self.y.iloc[train_idx]
            x_val,   y_val   = self.X.iloc[val_idx],   self.y.iloc[val_idx]


            train_ds = dataframe_to_dataset(x_train, y_train, self.categorical_features, self.numerical_features, shuffle=True, batch_size=1024)
            val_ds = dataframe_to_dataset(x_val, y_val, self.categorical_features, self.numerical_features, shuffle=False, batch_size=1024)
            test_ds = dataframe_to_dataset_test(self.X_test, self.test_finc_target, self.categorical_features, self.numerical_features, batch_size=1024)

            model = self.model(**self.parameters)

            optimizer = keras.optimizers.Adam(learning_rate=5e-4)
            model.compile(optimizer=optimizer,
                          loss=[rmsle, keras.losses.MeanSquaredLogarithmicError(name="msle")],
                          metrics=[rmsle, keras.metrics.RootMeanSquaredError(name="msle")])

            checkpoint_filepath = '/tmp/ckpt/checkpoint.weights.h5'
            model_checkpoint_callback = keras.callbacks.ModelCheckpoint(
                                                                        filepath=checkpoint_filepath,
                                                                        save_weights_only=True,
                                                                        monitor='val_rmsle',
                                                                        mode='min',
                                                                        save_best_only=True
                                                                        )

            # Fit the model
            history = model.fit(train_ds,
                                validation_data=val_ds,
                                epochs=151,
                                batch_size=1024,
                                callbacks=[keras.callbacks.ReduceLROnPlateau(patience=3, factor = 0.5, min_lr=1e-6),
                                          keras.callbacks.EarlyStopping(patience=21, restore_best_weights=True, monitor="val_rmsle",
                                                                          start_from_epoch=3, mode="min"),
                                            model_checkpoint_callback])

            model.load_weights(checkpoint_filepath)

            model.save(f"/content/drive/MyDrive/Exercises/Studies_Structured_Data/Models/S5E5/keras_models/{name}_{fold}.keras")

            model.evaluate(val_ds, verbose=0)

            plot_training_session(history)


            # Make predictions
            y_pred_train = model.predict(train_ds)
            y_pred_val   = model.predict(val_ds)
            y_pred_test  = model.predict(test_ds)

            # Correct Ranges:

            y_pred_train = np.maximum(y_pred_train, 1.0)
            y_pred_train = np.minimum(y_pred_train, 315.0)

            y_pred_val = np.maximum(y_pred_val, 1.0)
            y_pred_val = np.minimum(y_pred_val, 315.0)

            y_pred_test = np.maximum(y_pred_test, 1.0)
            y_pred_test = np.minimum(y_pred_test, 315.0)

            # Store Results
            oof_train[val_idx] = y_pred_val.reshape(-1)
            oof_test   += (y_pred_test/CFG.CV.get_n_splits()).reshape(-1)

            train_score = root_mean_squared_log_error(y_train, y_pred_train)
            val_score   = root_mean_squared_log_error(y_val, y_pred_val)

            print(f'Fold {fold+1} → Training set Score: {train_score:.5f} | Validation set Score: {val_score:.5f}')

            scores_train.append(train_score)
            scores_val.append(val_score)

        self._OOF_train[name] = oof_train
        self._OOF_test[name]  = oof_test

        os.chdir('/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5')

        print(colored(f'Overall → Training set Score: {np.mean(scores_train):.5f}±{np.std(scores_train):.7f} | Validation set Score: {np.mean(scores_val):.5f}±{np.std(scores_val):.7f}',
              color='green', attrs=['bold', 'dark']))

    @property
    def OOF_train(self):
        return self._OOF_train
    @property
    def OOF_test(self):
        return self._OOF_test

    def save_predictions(self):
        self._OOF_train.to_csv('OOF_train_many_models.csv', index=False)
        self._OOF_test.to_csv('OOF_test_many_models.csv', index=False)

In [None]:
def plot_training_session(history):
  # Plot training and validation loss scores
  # against the number of epochs.
  plt.figure(figsize=(8, 6))
  plt.plot(history.history['loss'], label='Train')
  plt.plot(history.history['val_loss'], label='Validation')
  plt.grid(linestyle='--')
  plt.ylabel('val_loss')
  plt.xlabel('Epoch')
  plt.title('Train-Validation Scores', pad=13)
  plt.legend(loc='upper right');
  plt.show()

def rmsle(y_true, y_pred):
    """
    Root Mean Squared Logarithmic Error (RMSLE)
    """
    # Ensure y_pred is non-negative and add a small constant to avoid log(0) errors
    y_pred = K.maximum(K.cast(y_pred, tf.float32), K.epsilon()) # Corrected: K.maximum

    first_log = K.log(K.maximum(K.cast(y_pred, tf.float32), K.epsilon()) + 1.) # Corrected: K.maximum
    second_log = K.log(K.maximum(K.cast(y_true, tf.float32), K.epsilon()) + 1.) # Corrected: K.maximum

    return K.sqrt(K.mean(K.square(first_log - second_log)))

# def rmsle(y_true, y_pred):
#     """
#     Root Mean Squared Logarithm Error
#     Args:
#         y_true ([np.array]): test samples
#         y_pred ([np.array]): predicted samples
#     Returns:
#         [float]: root mean squared logarithm error
#     """
#     first_log = K.log(K.clip(y_pred, K.epsilon(), None) + 1.)
#     second_log = K.log(K.clip(y_true, K.epsilon(), None) + 1.)
#     return K.sqrt(K.mean(K.square(first_log - second_log), axis=-1))

### **2.1.0 NeuralNetwork: Dense**

In [None]:
data.X.sample(3)

In [None]:
data.X.max(axis=0)

In [None]:
data.X.min(axis=0)

In [None]:
cat_features = data.cat_features
num_features = data.num_features

cat_features_card = [2,2,8]
cat_features_out = [2, 2, 4]

print(cat_features,cat_features_card)
print(num_features)

In [None]:
np.ceil(np.sqrt(cat_features_card[1]))

In [None]:
def build_model(units=512,last_layer = 1, activation="relu", do_rate=0.25, reg=0.001):

    x_input_cats = layers.Input(shape=(len(cat_features),))
    embs = []
    for j in range(len(cat_features)):
        e = layers.Embedding(cat_features_card[j], cat_features_out[j]) #np.ceil(np.sqrt(cat_features_card[1]))
        x = e(x_input_cats[:,j])
        x = layers.Flatten()(x)
        embs.append(x)

    x_input_nums = layers.Input(shape=(len(num_features),))

    x = layers.Concatenate(axis=-1)(embs+[x_input_nums])
    x = layers.Dense(units, activation=activation, kernel_regularizer=keras.regularizers.l2(reg))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(do_rate)(x)
    x = layers.Dense(units, activation=activation, kernel_regularizer=keras.regularizers.l2(reg))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(do_rate)(x)
    x = layers.Dense(int(units/last_layer), activation=activation, kernel_regularizer=keras.regularizers.l2(reg))(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(do_rate)(x)
    x_final = layers.Dense(1, activation='linear')(x)

    model = keras.Model(inputs=[x_input_cats,x_input_nums], outputs=x_final)
    return model

In [None]:
mod_test = build_model(units=512)
mod_test.summary()

#### 2.1.1 Optuna Optimization:

In [None]:
X_fin = data.X
X_test_fin = data.X_test

X_train_cat = data.X[cat_features]
X_train_num = data.X[num_features]

X_test_cat = data.X_test[cat_features]
X_test_num = data.X_test[num_features]

X_train_cat.info()
X_train_num.info()

y_fin = data.y

In [None]:
y_fin.isna().sum()

**OPTIMIZATION SECTION**

In [None]:
def objective_nn(trial, X, y, n_splits, n_repeats, model=build_model, use_gpu=True, rs=42, fit_scaling=False, cv_strategy="KFold"):

    model_class = model

    categorical_features = cat_features.copy()

    num_cols = [col for col in X.columns if col not in categorical_features]

    params = {'units': trial.suggest_categorical('units', [128,256,512,1024]),
              'last_layer': trial.suggest_int('last_layer', 1,2),
              'activation': trial.suggest_categorical('activation', ["relu","selu","gelu","silu"]), #, reg=0.001, dropout_rate=0.33)
              'reg': trial.suggest_float('reg', 1e-4, 0.1, log=True),
              'do_rate': trial.suggest_float('do_rate', 0.30, 0.50)
              }

    if cv_strategy == 'RepKFold':
        kf = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=rs)
    elif cv_strategy == 'KFold':
        kf = KFold(n_splits=n_splits, random_state=rs, shuffle=True)
    elif cv_strategy == "StratKFold":
        kf = StratifiedKFold(n_splits=n_splits, random_state=rs, shuffle=True)
    elif cv_strategy == "RepStratKFold":
        kf = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=rs)

    rmsle_scores = []

    keras.backend.clear_session()

    iteration_n=0

    for idx_train, idx_valid in kf.split(X, y):
        print(f"Running Fold: {iteration_n}")
        # Split the data into training and validation sets for the current fold
        X_train, y_train = X.iloc[idx_train], y.iloc[idx_train].to_numpy()#.reshape(-1, 1)
        X_valid, y_valid = X.iloc[idx_valid], y.iloc[idx_valid].to_numpy()#.reshape(-1, 1)

        X_train_cat = X_train[cat_features]
        X_train_num = X_train[num_features]

        X_valid_cat = X_valid[cat_features]
        X_valid_num = X_valid[num_features]

        # Create the model
        keras.utils.set_random_seed(rs)
        model = model_class(**params)

        optimizer = keras.optimizers.Adam(learning_rate=5e-4)
        model.compile(optimizer=optimizer,
                      loss=[rmsle, keras.losses.MeanSquaredLogarithmicError(name="msle")],
                      metrics=[rmsle, keras.metrics.RootMeanSquaredError(name="msle")])

        checkpoint_filepath = '/tmp/ckpt/checkpoint.weights.h5'
        model_checkpoint_callback = keras.callbacks.ModelCheckpoint(
            filepath=checkpoint_filepath,
            save_weights_only=True,
            monitor='val_rmsle',
            mode='min',
            save_best_only=True)

        # Fit the model
        model.fit([X_train_cat,X_train_num], y_train,
                  validation_data=([X_valid_cat, X_valid_num], y_valid),
                  epochs=31,
                  batch_size=1024,
                  callbacks=[keras.callbacks.ReduceLROnPlateau(patience=3, factor = 0.5, min_lr=1e-6),
                            keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True, monitor="val_rmsle",
                                                            start_from_epoch=3, mode="min"),
                             model_checkpoint_callback])

        model.load_weights(checkpoint_filepath)

        # Make predictions on the validation set
        y_pred = model.predict([X_valid_cat, X_valid_num], batch_size=1024)
        y_pred = np.maximum(y_pred, 1.0)
        y_pred = np.minimum(y_pred, 315.0)

        print("Pred Min: {}".format(y_pred.min()))
        print("Pred Max: {}".format(y_pred.max()))

        # Calculate the RMSE for the current fold
        rmsle_score = root_mean_squared_log_error(y_valid, y_pred)
        print(f"Fold {iteration_n} RMSLE: {rmsle_score}")

        rmsle_scores.append(rmsle_score)
        iteration_n+=1

    # Calculate the mean RMSLE score across all folds
    key_metric = np.mean(rmsle_scores)

    return key_metric

In [None]:
# Step 2: Tuning Hyperparameters with Optuna
def tune_hyperparameters(X, y, model_class, n_trials, n_splits_ ,n_repeats_, use_gpu=True):  #use_gpu
    study = optuna.create_study(direction="minimize", sampler=optuna.samplers.TPESampler(), pruner=optuna.pruners.MedianPruner(n_warmup_steps=5))
    study.optimize(lambda trial: objective_nn(trial, X, y, n_splits=n_splits_, n_repeats=n_repeats_, model=build_model, use_gpu=use_gpu, cv_strategy="KFold"), n_trials=n_trials)
    return study  # Return the study object

# Step 3: Saving Best Results and Models
def save_results(study, model_class, model_name):
    best_params_file = f"{model_name}_best_params.joblib"
    joblib.dump(study.best_params, best_params_file)
    print(f"Best parameters for {model_name} saved to {best_params_file}")

    verbose_file = f"{model_name}_optuna_verbose.log"
    with open(verbose_file, "w") as f:
        f.write(str(study.trials))
    print(f"Optuna verbose for {model_name} saved to {verbose_file}")# usage with XGBRegressor

In [None]:
X_fin.isna().sum(), y_fin.min()

  1. Trial 4 finished with value: 0.06326587167991321 and parameters: {'units': 512, 'last_layer': 1, 'activation': 'relu', 'reg': 0.00012466698516071345, 'do_rate': 0.32329936440008156}.

  2. Trial 12 finished with value: 0.0644081979735794 and parameters: {'units': 256, 'last_layer': 1, 'activation': 'relu', 'reg': 0.0006106006869707281, 'do_rate': 0.3494656732997632}.

  3.  Trial 14 finished with value: 0.06490268403547308 and parameters: {'units': 256, 'last_layer': 1, 'activation': 'relu', 'reg': 0.00012000706329704339, 'do_rate': 0.3032266090954228}.

In [None]:
nn0_study = tune_hyperparameters(X_fin, y_fin, model_class=build_model, n_trials=31, n_splits_ = 5 ,n_repeats_=3, use_gpu=True)

cat_params = nn0_study.best_params

#### 2.1.2 Train Model:

In [None]:
param = {'units': 512, 'last_layer': 1, 'activation': 'relu', 'reg': 0.00012466698516071345, 'do_rate': 0.32329936440008156}
TM = TrainModels(X=data.X, y=data.y, X_test=data.X_test, test_finc_target=y_test_fic, X_original=None, y_original=None, model_=build_model, parameters=param)

In [None]:
TM.fit_model(name="NN_exp_00")

#### 2.1.3 Store Results:

In [None]:
train_pred = TM.OOF_train
test_pred = TM.OOF_test
train_pred = pd.DataFrame(data = train_pred, columns = ["NN_exp_00"])


sub = pd.read_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/sample_submission.csv",index_col=0)

sub["Calories"] =  test_pred.values

sub.to_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/submission_NN_exp_00.csv")
train_pred.to_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/train_pred_NN_exp_00.csv")

In [None]:
test_pred.min(), test_pred.max()

In [None]:
train_pred

### **2.2.0 NeuralNetwork: Wide and Deep Model v0**

In [None]:
data.X.sample(3)

In [None]:
data.X.max(axis=0)

In [None]:
fig, ax = plt.subplots(1,2,figsize=(12,3))

ax[0].hist(data.X.BMI, bins=31)
ax[1].hist(data.X_test.BMI, bins=31, color="salmon")

plt.show()

In [None]:
cat_features = data.cat_features
num_features = data.num_features

cat_features_card = [2,2,8]
cat_features_out = [2, 2, 4]


print(cat_features,cat_features_card)
print(num_features)

In [None]:
def wide_deep(units=512, activation="relu", do_rate=0.25, reg=0.001, hidden_layers=3):
    '''
    In this model embedding is performed for the data feeding into both the Deep and and wide layers:
    '''

    x_input_cats = layers.Input(shape=(len(cat_features),))
    embs = []
    for j in range(len(cat_features)):
        e = layers.Embedding(cat_features_card[j], cat_features_out[j]) #np.ceil(np.sqrt(cat_features_card[1]))
        x = e(x_input_cats[:,j])
        x = layers.Flatten()(x)
        embs.append(x)

    x_input_nums = layers.Input(shape=(len(num_features),))

    x = layers.Concatenate(axis=-1)(embs+[x_input_nums])

    wide = layers.BatchNormalization()(x)
    deep = x

    for lay in range(hidden_layers):
        deep = layers.Dense(units,kernel_regularizer=keras.regularizers.l2(reg), name=f"dense_deep_{lay}")(deep)
        deep = layers.BatchNormalization(name=f"bn_deep_{lay}")(deep)
        if activation == "relu":
            deep = layers.ReLU(name=f"relu_deep_{lay}")(deep)
        elif activation == "prelu":
            deep = layers.PReLU(name=f"prelu_deep_{lay}")(deep)
        elif activation == "gelu ":
            deep = activations.gelu(deep)
        elif activation == "silu":
            deep = activations.silu(deep)
        elif activation == "mish":
            deep = layers.Lambda(lambda x: keras.activations.mish(x), name=f"mish_deep_{lay}")(deep)
        elif activation == "celu":
            deep = activations.celu(deep)

        deep = layers.Dropout(do_rate, name=f"do_deep_{lay}")(deep)

    merged = layers.concatenate([wide, deep])

    x_final = layers.Dense(1, activation='linear')(merged)

    model = keras.Model(inputs=[x_input_cats,x_input_nums], outputs=x_final)
    return model

In [None]:
mod_test = wide_deep(units=512, activation="celu")
mod_test.summary()

In [None]:
#keras.utils.plot_model(mod_test, show_shapes=True, rankdir="LR")

#### 2.1.1 Optuna Optimization:

In [None]:
X_fin = data.X
X_test_fin = data.X_test

X_train_cat = data.X[cat_features]
X_train_num = data.X[num_features]

X_test_cat = data.X_test[cat_features]
X_test_num = data.X_test[num_features]

X_train_cat.info()
X_train_num.info()

y_fin = data.y

In [None]:
y_fin.isna().sum()

**OPTIMIZATION SECTION**

In [None]:
def objective_nn(trial, X, y, n_splits, n_repeats, model=wide_deep, use_gpu=True, rs=42, fit_scaling=False, cv_strategy="KFold"):

    model_class = model

    categorical_features = cat_features.copy()

    num_cols = [col for col in X.columns if col not in categorical_features]

    params = {
              'units': trial.suggest_categorical('units', [128,256,512,1024]),
              'last_layer': trial.suggest_int('last_layer', 1,2),
              'activation': trial.suggest_categorical('activation', ["relu","prelu","gelu","silu","mish","celu"]), #, reg=0.001, dropout_rate=0.33)
              'reg': trial.suggest_float('reg', 1e-4, 0.1, log=True),
              'do_rate': trial.suggest_float('do_rate', 0.30, 0.50),
              'hidden_layers': trial.suggest_int('hidden_layers', 1,4)
              }

    if cv_strategy == 'RepKFold':
        kf = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=rs)
    elif cv_strategy == 'KFold':
        kf = KFold(n_splits=n_splits, random_state=rs, shuffle=True)
    elif cv_strategy == "StratKFold":
        kf = StratifiedKFold(n_splits=n_splits, random_state=rs, shuffle=True)
    elif cv_strategy == "RepStratKFold":
        kf = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=rs)

    rmsle_scores = []

    keras.backend.clear_session()

    iteration_n=0

    for idx_train, idx_valid in kf.split(X, y):
        print(f"Running Fold: {iteration_n}")
        # Split the data into training and validation sets for the current fold
        X_train, y_train = X.iloc[idx_train], y.iloc[idx_train].to_numpy()#.reshape(-1, 1)
        X_valid, y_valid = X.iloc[idx_valid], y.iloc[idx_valid].to_numpy()#.reshape(-1, 1)

        X_train_cat = X_train[cat_features]
        X_train_num = X_train[num_features]

        X_valid_cat = X_valid[cat_features]
        X_valid_num = X_valid[num_features]

        # Create the model
        keras.utils.set_random_seed(rs)
        model = model_class(**params)

        optimizer = keras.optimizers.Adam(learning_rate=5e-4)
        model.compile(optimizer=optimizer,
                      loss=[rmsle, keras.losses.MeanSquaredLogarithmicError(name="msle")],
                      metrics=[rmsle, keras.metrics.RootMeanSquaredError(name="msle")])

        checkpoint_filepath = '/tmp/ckpt/checkpoint_dw.weights.h5'
        model_checkpoint_callback = keras.callbacks.ModelCheckpoint(
            filepath=checkpoint_filepath,
            save_weights_only=True,
            monitor='val_rmsle',
            mode='min',
            save_best_only=True)

        # Fit the model
        model.fit([X_train_cat,X_train_num], y_train,
                  validation_data=([X_valid_cat, X_valid_num], y_valid),
                  epochs=31,
                  batch_size=1024,
                  callbacks=[keras.callbacks.ReduceLROnPlateau(patience=3, factor = 0.5, min_lr=1e-6),
                            keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True, monitor="val_rmsle",
                                                            start_from_epoch=3, mode="min"),
                             model_checkpoint_callback])

        model.load_weights(checkpoint_filepath)

        # Make predictions on the validation set
        y_pred = model.predict([X_valid_cat, X_valid_num], batch_size=1024)
        y_pred = np.maximum(y_pred, 1.0)
        y_pred = np.minimum(y_pred, 315.0)

        print("Pred Min: {}".format(y_pred.min()))
        print("Pred Max: {}".format(y_pred.max()))

        # Calculate the RMSE for the current fold
        rmsle_score = root_mean_squared_log_error(y_valid, y_pred)
        print(f"Fold {iteration_n} RMSLE: {rmsle_score}")

        rmsle_scores.append(rmsle_score)
        iteration_n+=1

    # Calculate the mean RMSLE score across all folds
    key_metric = np.mean(rmsle_scores)

    return key_metric

In [None]:
# Step 2: Tuning Hyperparameters with Optuna
def tune_hyperparameters(X, y, model_class, n_trials, n_splits_ ,n_repeats_, use_gpu=True):  #use_gpu
    study = optuna.create_study(direction="minimize", sampler=optuna.samplers.TPESampler(), pruner=optuna.pruners.MedianPruner(n_warmup_steps=5))
    study.optimize(lambda trial: objective_nn(trial, X, y, n_splits=n_splits_, n_repeats=n_repeats_, model=model_class, use_gpu=use_gpu, cv_strategy="KFold"), n_trials=n_trials)
    return study  # Return the study object

# Step 3: Saving Best Results and Models
def save_results(study, model_class, model_name):
    best_params_file = f"{model_name}_best_params.joblib"
    joblib.dump(study.best_params, best_params_file)
    print(f"Best parameters for {model_name} saved to {best_params_file}")

    verbose_file = f"{model_name}_optuna_verbose.log"
    with open(verbose_file, "w") as f:
        f.write(str(study.trials))
    print(f"Optuna verbose for {model_name} saved to {verbose_file}")# usage with XGBRegressor

In [None]:
X_fin.isna().sum(), y_fin.min()

  1. Trial 32 finished with value: 0.06224817614285113 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.000103427172893175, 'do_rate': 0.40056000512858025, 'hidden_layers': 2}

  2. Trial 18 finished with value: 0.06211038027842043 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.0001004170129215336, 'do_rate': 0.41356627172269655, 'hidden_layers': 3} Best

  3.  Trial 23 finished with value: 0.06239891244956206 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.00010001356287977584, 'do_rate': 0.4642367345915417, 'hidden_layers': 2}.

In [60]:
nn0_study = tune_hyperparameters(X_fin, y_fin, model_class=wide_deep, n_trials=51, n_splits_ = 5 ,n_repeats_=3, use_gpu=True)

cat_params = nn0_study.best_params

598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.0967 - msle: 7.6188 - rmsle: 0.0918 - val_dense_loss: 0.0000e+00 - val_loss: 0.0688 - val_msle: 3.9854 - val_rmsle: 0.0638 - learning_rate: 2.5000e-04
Epoch 28/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.0962 - msle: 7.5606 - rmsle: 0.0913 - val_dense_loss: 0.0000e+00 - val_loss: 0.0686 - val_msle: 3.8090 - val_rmsle: 0.0636 - learning_rate: 2.5000e-04
Epoch 29/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - dense_loss: 0.0000e+00 - loss: 0.0958 - msle: 7.5293 - rmsle: 0.0910 - val_dense_loss: 0.0000e+00 - val_loss: 0.0692 - val_msle: 3.9257 - val_rmsle: 0.0643 - learning_rate: 2.5000e-04
Epoch 30/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.0943 - msle: 7.4720 - rmsle: 0.0897 - val_dense_loss: 0.0000e+00 - val_loss: 0.0671 - val_msle: 3.7703 - val_rmsle: 0.0626 - learning_rate: 1.2500e-04
Epoch 31/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - d

[I 2025-05-07 22:16:32,297] Trial 22 finished with value: 0.06353837487766549 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.00019973277177462976, 'do_rate': 0.4031569153546122, 'hidden_layers': 3}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 13s 11ms/step - dense_loss: 0.0000e+00 - loss: 2.1601 - msle: 97.5285 - rmsle: 2.1122 - val_dense_loss: 0.0000e+00 - val_loss: 0.7870 - val_msle: 65.9286 - val_rmsle: 0.7567 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.6356 - msle: 58.3225 - rmsle: 0.6090 - val_dense_loss: 0.0000e+00 - val_loss: 0.2292 - val_msle: 13.0947 - val_rmsle: 0.2109 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2381 - msle: 13.1779 - rmsle: 0.2217 - val_dense_loss: 0.0000e+00 - val_loss: 0.0993 - val_msle: 4.9205 - val_rmsle: 0.0859 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1489 - msle: 9.4803 - rmsle: 0.1362 - val_dense_loss: 0.0000e+00 - val_loss: 0.0837 - val_msle: 5.9429 - val_rmsle: 0.0727 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━━

[I 2025-05-07 22:23:33,945] Trial 23 finished with value: 0.06239891244956206 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.00010001356287977584, 'do_rate': 0.4642367345915417, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - dense_loss: 0.0000e+00 - loss: 2.2131 - msle: 99.3155 - rmsle: 2.1194 - val_dense_loss: 0.0000e+00 - val_loss: 0.7720 - val_msle: 72.2120 - val_rmsle: 0.7184 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.5981 - msle: 74.5655 - rmsle: 0.5493 - val_dense_loss: 0.0000e+00 - val_loss: 0.2486 - val_msle: 48.5413 - val_rmsle: 0.2098 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2811 - msle: 53.3143 - rmsle: 0.2454 - val_dense_loss: 0.0000e+00 - val_loss: 0.2265 - val_msle: 34.6889 - val_rmsle: 0.1986 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2539 - msle: 38.6441 - rmsle: 0.2286 - val_dense_loss: 0.0000e+00 - val_loss: 0.2166 - val_msle: 22.0711 - val_rmsle: 0.1975 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━

[I 2025-05-07 22:30:47,027] Trial 24 finished with value: 0.06290427003099212 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.00010106883106377464, 'do_rate': 0.4622905929814005, 'hidden_layers': 3}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 13s 11ms/step - dense_loss: 0.0000e+00 - loss: 1.9020 - msle: 93.0021 - rmsle: 1.7698 - val_dense_loss: 0.0000e+00 - val_loss: 0.6012 - val_msle: 49.8447 - val_rmsle: 0.5579 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.3805 - msle: 31.9375 - rmsle: 0.3446 - val_dense_loss: 0.0000e+00 - val_loss: 0.2022 - val_msle: 14.7160 - val_rmsle: 0.1778 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1696 - msle: 9.1854 - rmsle: 0.1458 - val_dense_loss: 0.0000e+00 - val_loss: 0.0975 - val_msle: 4.4909 - val_rmsle: 0.0770 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1484 - msle: 8.7315 - rmsle: 0.1289 - val_dense_loss: 0.0000e+00 - val_loss: 0.0923 - val_msle: 4.2211 - val_rmsle: 0.0743 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━━━

[I 2025-05-07 22:37:19,943] Trial 25 finished with value: 0.06368631985321752 and parameters: {'units': 1024, 'last_layer': 2, 'activation': 'silu', 'reg': 0.0001745018608654567, 'do_rate': 0.49815599080886164, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 9s 9ms/step - dense_loss: 0.0000e+00 - loss: 2.4428 - msle: 98.3678 - rmsle: 2.4251 - val_dense_loss: 0.0000e+00 - val_loss: 0.7433 - val_msle: 64.9325 - val_rmsle: 0.7289 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.5220 - msle: 50.9920 - rmsle: 0.5075 - val_dense_loss: 0.0000e+00 - val_loss: 0.1412 - val_msle: 10.9291 - val_rmsle: 0.1275 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1744 - msle: 10.3754 - rmsle: 0.1613 - val_dense_loss: 0.0000e+00 - val_loss: 0.1047 - val_msle: 6.8779 - val_rmsle: 0.0931 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1553 - msle: 8.9253 - rmsle: 0.1442 - val_dense_loss: 0.0000e+00 - val_loss: 0.0948 - val_msle: 6.2998 - val_rmsle: 0.0845 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━━━━

[I 2025-05-07 22:43:38,954] Trial 26 finished with value: 0.0651089561112833 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.0007571978267496025, 'do_rate': 0.42779646924463055, 'hidden_layers': 1}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - dense_loss: 0.0000e+00 - loss: 2.2746 - msle: 97.1335 - rmsle: 1.9677 - val_dense_loss: 0.0000e+00 - val_loss: 0.7969 - val_msle: 68.6374 - val_rmsle: 0.6703 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.7021 - msle: 73.9315 - rmsle: 0.5874 - val_dense_loss: 0.0000e+00 - val_loss: 0.3409 - val_msle: 10.8991 - val_rmsle: 0.2538 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.3691 - msle: 15.4758 - rmsle: 0.2917 - val_dense_loss: 0.0000e+00 - val_loss: 0.1712 - val_msle: 6.7242 - val_rmsle: 0.1187 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.3064 - msle: 11.1995 - rmsle: 0.2595 - val_dense_loss: 0.0000e+00 - val_loss: 0.1680 - val_msle: 6.6097 - val_rmsle: 0.1336 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━

[I 2025-05-07 22:49:48,161] Trial 27 finished with value: 0.08159233936094082 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'celu', 'reg': 0.0003960032885407173, 'do_rate': 0.4724442076661731, 'hidden_layers': 3}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 17s 13ms/step - dense_loss: 0.0000e+00 - loss: 2.2714 - msle: 103.1270 - rmsle: 2.0596 - val_dense_loss: 0.0000e+00 - val_loss: 1.5409 - val_msle: 105.9597 - val_rmsle: 1.4257 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - dense_loss: 0.0000e+00 - loss: 1.3047 - msle: 101.5253 - rmsle: 1.2032 - val_dense_loss: 0.0000e+00 - val_loss: 0.4596 - val_msle: 20.2373 - val_rmsle: 0.3706 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.4817 - msle: 25.2364 - rmsle: 0.3962 - val_dense_loss: 0.0000e+00 - val_loss: 0.2193 - val_msle: 13.6789 - val_rmsle: 0.1457 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - dense_loss: 0.0000e+00 - loss: 0.3875 - msle: 18.7833 - rmsle: 0.3172 - val_dense_loss: 0.0000e+00 - val_loss: 0.2037 - val_msle: 13.7122 - val_rmsle: 0.1428 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 

[I 2025-05-07 22:55:43,271] Trial 28 finished with value: 0.13208377548703126 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'gelu', 'reg': 0.00015664139047782735, 'do_rate': 0.45314272767537633, 'hidden_layers': 4}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 13s 11ms/step - dense_loss: 0.0000e+00 - loss: 2.2305 - msle: 97.4417 - rmsle: 2.1128 - val_dense_loss: 0.0000e+00 - val_loss: 0.8118 - val_msle: 66.9404 - val_rmsle: 0.7731 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.6274 - msle: 57.5592 - rmsle: 0.5963 - val_dense_loss: 0.0000e+00 - val_loss: 0.1458 - val_msle: 6.4667 - val_rmsle: 0.1249 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1760 - msle: 11.1531 - rmsle: 0.1565 - val_dense_loss: 0.0000e+00 - val_loss: 0.0927 - val_msle: 5.0809 - val_rmsle: 0.0761 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1534 - msle: 9.7249 - rmsle: 0.1378 - val_dense_loss: 0.0000e+00 - val_loss: 0.0877 - val_msle: 5.9233 - val_rmsle: 0.0739 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━━━

[I 2025-05-07 23:02:09,214] Trial 29 finished with value: 0.06351284287516273 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.00029640255130309164, 'do_rate': 0.48600211990580794, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 13s 11ms/step - dense_loss: 0.0000e+00 - loss: 2.1785 - msle: 95.6029 - rmsle: 2.0007 - val_dense_loss: 0.0000e+00 - val_loss: 0.7433 - val_msle: 62.1153 - val_rmsle: 0.7035 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.5613 - msle: 50.7888 - rmsle: 0.5298 - val_dense_loss: 0.0000e+00 - val_loss: 0.1440 - val_msle: 5.4146 - val_rmsle: 0.1212 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1661 - msle: 9.8915 - rmsle: 0.1450 - val_dense_loss: 0.0000e+00 - val_loss: 0.1123 - val_msle: 4.8651 - val_rmsle: 0.0945 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1471 - msle: 8.9809 - rmsle: 0.1304 - val_dense_loss: 0.0000e+00 - val_loss: 0.0982 - val_msle: 5.0038 - val_rmsle: 0.0833 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━━━━

[I 2025-05-07 23:08:32,596] Trial 30 finished with value: 0.06390955888006591 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'mish', 'reg': 0.0005021834412306193, 'do_rate': 0.4291059018687493, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 13s 11ms/step - dense_loss: 0.0000e+00 - loss: 2.1417 - msle: 97.5148 - rmsle: 2.0942 - val_dense_loss: 0.0000e+00 - val_loss: 0.7766 - val_msle: 65.8933 - val_rmsle: 0.7474 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.6493 - msle: 59.9679 - rmsle: 0.6239 - val_dense_loss: 0.0000e+00 - val_loss: 0.4492 - val_msle: 37.5267 - val_rmsle: 0.4339 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.4420 - msle: 38.7850 - rmsle: 0.4288 - val_dense_loss: 0.0000e+00 - val_loss: 0.2516 - val_msle: 20.9471 - val_rmsle: 0.2399 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1467 - msle: 9.3456 - rmsle: 0.1353 - val_dense_loss: 0.0000e+00 - val_loss: 0.0877 - val_msle: 5.4238 - val_rmsle: 0.0776 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━

[I 2025-05-07 23:15:06,804] Trial 31 finished with value: 0.06231783750406522 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.00010016840215891687, 'do_rate': 0.39538206506891804, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 12s 11ms/step - dense_loss: 0.0000e+00 - loss: 2.1472 - msle: 97.5264 - rmsle: 2.0982 - val_dense_loss: 0.0000e+00 - val_loss: 0.7794 - val_msle: 65.7907 - val_rmsle: 0.7502 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.6518 - msle: 61.2903 - rmsle: 0.6264 - val_dense_loss: 0.0000e+00 - val_loss: 0.2326 - val_msle: 16.9270 - val_rmsle: 0.2162 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2528 - msle: 15.3837 - rmsle: 0.2382 - val_dense_loss: 0.0000e+00 - val_loss: 0.1354 - val_msle: 11.4990 - val_rmsle: 0.1242 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1394 - msle: 8.7283 - rmsle: 0.1286 - val_dense_loss: 0.0000e+00 - val_loss: 0.0814 - val_msle: 4.9138 - val_rmsle: 0.0717 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━

[I 2025-05-07 23:21:56,784] Trial 32 finished with value: 0.06224817614285113 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.000103427172893175, 'do_rate': 0.40056000512858025, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 12s 11ms/step - dense_loss: 0.0000e+00 - loss: 6.4886 - msle: 97.1379 - rmsle: 2.1374 - val_dense_loss: 0.0000e+00 - val_loss: 0.8140 - val_msle: 70.4712 - val_rmsle: 0.7732 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.5060 - msle: 47.7374 - rmsle: 0.4602 - val_dense_loss: 0.0000e+00 - val_loss: 0.2776 - val_msle: 14.0756 - val_rmsle: 0.2126 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2491 - msle: 12.0158 - rmsle: 0.1904 - val_dense_loss: 0.0000e+00 - val_loss: 0.1875 - val_msle: 13.7843 - val_rmsle: 0.1427 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2073 - msle: 10.3563 - rmsle: 0.1649 - val_dense_loss: 0.0000e+00 - val_loss: 0.1349 - val_msle: 9.7238 - val_rmsle: 0.0989 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━

[I 2025-05-07 23:28:46,194] Trial 33 finished with value: 0.06920287254414896 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.03143641529289578, 'do_rate': 0.3916743313302587, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 13s 11ms/step - dense_loss: 0.0000e+00 - loss: 2.1648 - msle: 97.5569 - rmsle: 2.0903 - val_dense_loss: 0.0000e+00 - val_loss: 0.7996 - val_msle: 67.1238 - val_rmsle: 0.7662 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.6286 - msle: 60.1498 - rmsle: 0.6010 - val_dense_loss: 0.0000e+00 - val_loss: 0.2291 - val_msle: 15.3686 - val_rmsle: 0.2122 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2456 - msle: 13.0036 - rmsle: 0.2308 - val_dense_loss: 0.0000e+00 - val_loss: 0.1364 - val_msle: 10.1684 - val_rmsle: 0.1245 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1355 - msle: 8.3931 - rmsle: 0.1240 - val_dense_loss: 0.0000e+00 - val_loss: 0.0997 - val_msle: 4.3352 - val_rmsle: 0.0895 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━

[I 2025-05-07 23:35:36,862] Trial 34 finished with value: 0.062408490986112675 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.00017059785798002538, 'do_rate': 0.3733127784757884, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 9s 8ms/step - dense_loss: 0.0000e+00 - loss: 2.0426 - msle: 93.2358 - rmsle: 2.0337 - val_dense_loss: 0.0000e+00 - val_loss: 0.3332 - val_msle: 36.5351 - val_rmsle: 0.3208 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - dense_loss: 0.0000e+00 - loss: 0.2523 - msle: 23.8392 - rmsle: 0.2397 - val_dense_loss: 0.0000e+00 - val_loss: 0.1546 - val_msle: 7.3324 - val_rmsle: 0.1424 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - dense_loss: 0.0000e+00 - loss: 0.1680 - msle: 8.0199 - rmsle: 0.1562 - val_dense_loss: 0.0000e+00 - val_loss: 0.1239 - val_msle: 6.8835 - val_rmsle: 0.1128 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1575 - msle: 7.7987 - rmsle: 0.1466 - val_dense_loss: 0.0000e+00 - val_loss: 0.1313 - val_msle: 6.4784 - val_rmsle: 0.1209 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━━━━━━

[I 2025-05-07 23:41:49,630] Trial 35 finished with value: 0.06502809910016363 and parameters: {'units': 1024, 'last_layer': 2, 'activation': 'silu', 'reg': 0.0003051197674310691, 'do_rate': 0.43873993490106994, 'hidden_layers': 1}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - dense_loss: 0.0000e+00 - loss: 2.4187 - msle: 98.7008 - rmsle: 2.3350 - val_dense_loss: 0.0000e+00 - val_loss: 0.7873 - val_msle: 68.8559 - val_rmsle: 0.7391 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.5506 - msle: 53.8235 - rmsle: 0.5112 - val_dense_loss: 0.0000e+00 - val_loss: 0.1314 - val_msle: 13.8980 - val_rmsle: 0.1081 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1676 - msle: 13.7420 - rmsle: 0.1464 - val_dense_loss: 0.0000e+00 - val_loss: 0.0999 - val_msle: 6.9712 - val_rmsle: 0.0836 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1373 - msle: 10.5650 - rmsle: 0.1220 - val_dense_loss: 0.0000e+00 - val_loss: 0.0867 - val_msle: 4.8182 - val_rmsle: 0.0737 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━

[I 2025-05-07 23:49:19,617] Trial 36 finished with value: 0.062341729381920674 and parameters: {'units': 256, 'last_layer': 2, 'activation': 'prelu', 'reg': 0.0001714849436554508, 'do_rate': 0.4035739446101512, 'hidden_layers': 3}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - dense_loss: 0.0000e+00 - loss: 17.5377 - msle: 100.0418 - rmsle: 2.4535 - val_dense_loss: 0.0000e+00 - val_loss: 1.2299 - val_msle: 85.6455 - val_rmsle: 1.1998 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 12s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.7090 - msle: 64.5009 - rmsle: 0.6781 - val_dense_loss: 0.0000e+00 - val_loss: 0.3735 - val_msle: 37.1268 - val_rmsle: 0.3352 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2852 - msle: 26.0692 - rmsle: 0.2368 - val_dense_loss: 0.0000e+00 - val_loss: 0.2881 - val_msle: 21.3531 - val_rmsle: 0.2347 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2244 - msle: 13.9430 - rmsle: 0.1706 - val_dense_loss: 0.0000e+00 - val_loss: 0.2539 - val_msle: 22.5498 - val_rmsle: 0.2037 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 

[I 2025-05-07 23:57:03,238] Trial 37 finished with value: 0.06743879930344623 and parameters: {'units': 256, 'last_layer': 2, 'activation': 'prelu', 'reg': 0.08923112872646002, 'do_rate': 0.4029385050664301, 'hidden_layers': 3}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - dense_loss: 0.0000e+00 - loss: 2.3918 - msle: 98.5412 - rmsle: 2.3140 - val_dense_loss: 0.0000e+00 - val_loss: 0.7545 - val_msle: 66.7352 - val_rmsle: 0.7084 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.5287 - msle: 52.4036 - rmsle: 0.4907 - val_dense_loss: 0.0000e+00 - val_loss: 0.1286 - val_msle: 12.8001 - val_rmsle: 0.1057 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1548 - msle: 12.2948 - rmsle: 0.1342 - val_dense_loss: 0.0000e+00 - val_loss: 0.1016 - val_msle: 6.9775 - val_rmsle: 0.0859 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1297 - msle: 9.7972 - rmsle: 0.1151 - val_dense_loss: 0.0000e+00 - val_loss: 0.0841 - val_msle: 4.8356 - val_rmsle: 0.0717 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━━

[I 2025-05-08 00:04:36,917] Trial 38 finished with value: 0.06265351224406834 and parameters: {'units': 256, 'last_layer': 2, 'activation': 'prelu', 'reg': 0.00015848958576634598, 'do_rate': 0.3599242085771337, 'hidden_layers': 3}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - dense_loss: 0.0000e+00 - loss: 2.7234 - msle: 98.4359 - rmsle: 2.3095 - val_dense_loss: 0.0000e+00 - val_loss: 0.8749 - val_msle: 73.8438 - val_rmsle: 0.8404 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.5167 - msle: 52.5486 - rmsle: 0.4917 - val_dense_loss: 0.0000e+00 - val_loss: 0.1508 - val_msle: 7.1267 - val_rmsle: 0.1300 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1578 - msle: 12.7149 - rmsle: 0.1381 - val_dense_loss: 0.0000e+00 - val_loss: 0.1321 - val_msle: 4.8267 - val_rmsle: 0.1144 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1383 - msle: 10.2352 - rmsle: 0.1212 - val_dense_loss: 0.0000e+00 - val_loss: 0.0989 - val_msle: 4.7134 - val_rmsle: 0.0830 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━━

[I 2025-05-08 00:12:12,390] Trial 39 finished with value: 0.06435060997812309 and parameters: {'units': 256, 'last_layer': 2, 'activation': 'prelu', 'reg': 0.001243250807070294, 'do_rate': 0.38337843404644, 'hidden_layers': 3}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 18s 14ms/step - dense_loss: 0.0000e+00 - loss: 4.3555 - msle: 98.7020 - rmsle: 2.2867 - val_dense_loss: 0.0000e+00 - val_loss: 0.9963 - val_msle: 79.1582 - val_rmsle: 0.9590 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.6244 - msle: 59.4606 - rmsle: 0.5926 - val_dense_loss: 0.0000e+00 - val_loss: 0.2435 - val_msle: 25.0046 - val_rmsle: 0.2085 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2297 - msle: 20.5457 - rmsle: 0.1931 - val_dense_loss: 0.0000e+00 - val_loss: 0.1348 - val_msle: 7.2048 - val_rmsle: 0.0994 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - dense_loss: 0.0000e+00 - loss: 0.1812 - msle: 12.5926 - rmsle: 0.1477 - val_dense_loss: 0.0000e+00 - val_loss: 0.1540 - val_msle: 12.7392 - val_rmsle: 0.1231 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━

[I 2025-05-08 00:20:14,802] Trial 40 finished with value: 0.06598654518723414 and parameters: {'units': 256, 'last_layer': 2, 'activation': 'prelu', 'reg': 0.006765897049474696, 'do_rate': 0.40516580732698926, 'hidden_layers': 4}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 12s 11ms/step - dense_loss: 0.0000e+00 - loss: 2.3860 - msle: 98.2153 - rmsle: 2.3510 - val_dense_loss: 0.0000e+00 - val_loss: 1.1324 - val_msle: 76.8501 - val_rmsle: 1.1054 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.9839 - msle: 72.2996 - rmsle: 0.9581 - val_dense_loss: 0.0000e+00 - val_loss: 0.3306 - val_msle: 33.4616 - val_rmsle: 0.3078 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.3518 - msle: 26.8534 - rmsle: 0.3299 - val_dense_loss: 0.0000e+00 - val_loss: 0.1592 - val_msle: 9.0414 - val_rmsle: 0.1402 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2834 - msle: 13.3176 - rmsle: 0.2650 - val_dense_loss: 0.0000e+00 - val_loss: 0.1636 - val_msle: 6.8865 - val_rmsle: 0.1470 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━

[I 2025-05-08 00:26:09,150] Trial 41 finished with value: 0.08522413666736477 and parameters: {'units': 256, 'last_layer': 2, 'activation': 'celu', 'reg': 0.00013158704922745375, 'do_rate': 0.4435388071281556, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - dense_loss: 0.0000e+00 - loss: 2.2623 - msle: 99.3405 - rmsle: 2.0821 - val_dense_loss: 0.0000e+00 - val_loss: 0.8162 - val_msle: 74.3426 - val_rmsle: 0.7599 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.6025 - msle: 80.2754 - rmsle: 0.5570 - val_dense_loss: 0.0000e+00 - val_loss: 0.2581 - val_msle: 61.5432 - val_rmsle: 0.2240 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2664 - msle: 65.7016 - rmsle: 0.2361 - val_dense_loss: 0.0000e+00 - val_loss: 0.2254 - val_msle: 51.9760 - val_rmsle: 0.2045 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2408 - msle: 56.5899 - rmsle: 0.2220 - val_dense_loss: 0.0000e+00 - val_loss: 0.2232 - val_msle: 41.2223 - val_rmsle: 0.2087 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━

[I 2025-05-08 00:32:48,201] Trial 42 finished with value: 0.0632409493899596 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.00022289152879535065, 'do_rate': 0.34997092282734454, 'hidden_layers': 3}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 12s 11ms/step - dense_loss: 0.0000e+00 - loss: 2.1531 - msle: 97.4879 - rmsle: 2.1194 - val_dense_loss: 0.0000e+00 - val_loss: 0.7587 - val_msle: 68.1378 - val_rmsle: 0.7376 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.5436 - msle: 55.1592 - rmsle: 0.5256 - val_dense_loss: 0.0000e+00 - val_loss: 0.2135 - val_msle: 14.0045 - val_rmsle: 0.2012 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1531 - msle: 14.1678 - rmsle: 0.1415 - val_dense_loss: 0.0000e+00 - val_loss: 0.1354 - val_msle: 4.4880 - val_rmsle: 0.1253 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1221 - msle: 9.4651 - rmsle: 0.1125 - val_dense_loss: 0.0000e+00 - val_loss: 0.1162 - val_msle: 4.3035 - val_rmsle: 0.1075 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━━

[I 2025-05-08 00:39:45,944] Trial 43 finished with value: 0.06281580493653846 and parameters: {'units': 256, 'last_layer': 2, 'activation': 'prelu', 'reg': 0.00013195929346934942, 'do_rate': 0.36741364991145603, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 15s 12ms/step - dense_loss: 0.0000e+00 - loss: 2.3304 - msle: 99.2918 - rmsle: 2.1041 - val_dense_loss: 0.0000e+00 - val_loss: 0.7857 - val_msle: 76.6873 - val_rmsle: 0.7234 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.5809 - msle: 74.3140 - rmsle: 0.5251 - val_dense_loss: 0.0000e+00 - val_loss: 0.2521 - val_msle: 45.3721 - val_rmsle: 0.2091 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2776 - msle: 56.9853 - rmsle: 0.2404 - val_dense_loss: 0.0000e+00 - val_loss: 0.2361 - val_msle: 42.7323 - val_rmsle: 0.2100 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2520 - msle: 41.9506 - rmsle: 0.2285 - val_dense_loss: 0.0000e+00 - val_loss: 0.2317 - val_msle: 22.6979 - val_rmsle: 0.2144 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━

[I 2025-05-08 00:46:14,090] Trial 44 finished with value: 0.06425777577162443 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.00029721226511733134, 'do_rate': 0.4278807862363875, 'hidden_layers': 3}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 12s 11ms/step - dense_loss: 0.0000e+00 - loss: 2.2721 - msle: 110.2751 - rmsle: 1.8958 - val_dense_loss: 0.0000e+00 - val_loss: 0.9397 - val_msle: 51.6275 - val_rmsle: 0.8748 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.5615 - msle: 38.2526 - rmsle: 0.4962 - val_dense_loss: 0.0000e+00 - val_loss: 0.2248 - val_msle: 13.5735 - val_rmsle: 0.1628 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.4044 - msle: 16.4202 - rmsle: 0.3453 - val_dense_loss: 0.0000e+00 - val_loss: 0.2726 - val_msle: 13.4860 - val_rmsle: 0.2199 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.3359 - msle: 14.6718 - rmsle: 0.2841 - val_dense_loss: 0.0000e+00 - val_loss: 0.2456 - val_msle: 12.6824 - val_rmsle: 0.1993 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━

[I 2025-05-08 00:52:04,755] Trial 45 finished with value: 0.11364067090756089 and parameters: {'units': 1024, 'last_layer': 2, 'activation': 'gelu', 'reg': 0.0006769632957612944, 'do_rate': 0.39398629748681807, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 12s 11ms/step - dense_loss: 0.0000e+00 - loss: 2.6416 - msle: 102.3243 - rmsle: 2.6104 - val_dense_loss: 0.0000e+00 - val_loss: 1.3287 - val_msle: 87.7669 - val_rmsle: 1.3072 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 1.0764 - msle: 80.3760 - rmsle: 1.0573 - val_dense_loss: 0.0000e+00 - val_loss: 0.5484 - val_msle: 57.7252 - val_rmsle: 0.5362 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.4458 - msle: 48.5659 - rmsle: 0.4351 - val_dense_loss: 0.0000e+00 - val_loss: 0.1509 - val_msle: 20.9616 - val_rmsle: 0.1437 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1598 - msle: 16.6840 - rmsle: 0.1533 - val_dense_loss: 0.0000e+00 - val_loss: 0.0774 - val_msle: 5.9085 - val_rmsle: 0.0721 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━

[I 2025-05-08 00:58:25,972] Trial 46 finished with value: 0.06396097060204704 and parameters: {'units': 128, 'last_layer': 2, 'activation': 'relu', 'reg': 0.00021932289775685297, 'do_rate': 0.38277107119618853, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 15s 13ms/step - dense_loss: 0.0000e+00 - loss: 2.2868 - msle: 97.3919 - rmsle: 2.0132 - val_dense_loss: 0.0000e+00 - val_loss: 0.6704 - val_msle: 65.6741 - val_rmsle: 0.5980 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.5399 - msle: 65.1966 - rmsle: 0.4755 - val_dense_loss: 0.0000e+00 - val_loss: 0.2593 - val_msle: 41.1106 - val_rmsle: 0.2151 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.2826 - msle: 46.1357 - rmsle: 0.2437 - val_dense_loss: 0.0000e+00 - val_loss: 0.2433 - val_msle: 27.3876 - val_rmsle: 0.2159 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - dense_loss: 0.0000e+00 - loss: 0.2550 - msle: 28.6212 - rmsle: 0.2308 - val_dense_loss: 0.0000e+00 - val_loss: 0.2277 - val_msle: 14.1492 - val_rmsle: 0.2106 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━

[I 2025-05-08 01:05:02,732] Trial 47 finished with value: 0.06432578472720817 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'mish', 'reg': 0.0003701582373943472, 'do_rate': 0.4521522960382661, 'hidden_layers': 3}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 10s 8ms/step - dense_loss: 0.0000e+00 - loss: 2.3643 - msle: 99.7575 - rmsle: 2.3614 - val_dense_loss: 0.0000e+00 - val_loss: 0.9378 - val_msle: 75.9134 - val_rmsle: 0.9341 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.7885 - msle: 68.8807 - rmsle: 0.7847 - val_dense_loss: 0.0000e+00 - val_loss: 0.3398 - val_msle: 39.9114 - val_rmsle: 0.3362 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - dense_loss: 0.0000e+00 - loss: 0.2799 - msle: 31.5461 - rmsle: 0.2764 - val_dense_loss: 0.0000e+00 - val_loss: 0.1033 - val_msle: 9.6966 - val_rmsle: 0.0997 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1419 - msle: 10.3541 - rmsle: 0.1384 - val_dense_loss: 0.0000e+00 - val_loss: 0.0797 - val_msle: 5.9080 - val_rmsle: 0.0762 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━━

[I 2025-05-08 01:11:24,826] Trial 48 finished with value: 0.06476461739840372 and parameters: {'units': 256, 'last_layer': 2, 'activation': 'prelu', 'reg': 0.00010472192744117036, 'do_rate': 0.40886738534711037, 'hidden_layers': 1}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 13s 11ms/step - dense_loss: 0.0000e+00 - loss: 2.1872 - msle: 97.5364 - rmsle: 2.0948 - val_dense_loss: 0.0000e+00 - val_loss: 0.8110 - val_msle: 67.8676 - val_rmsle: 0.7754 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.6526 - msle: 61.1773 - rmsle: 0.6240 - val_dense_loss: 0.0000e+00 - val_loss: 0.4502 - val_msle: 39.9591 - val_rmsle: 0.4362 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.3046 - msle: 29.3040 - rmsle: 0.2898 - val_dense_loss: 0.0000e+00 - val_loss: 0.1155 - val_msle: 4.7644 - val_rmsle: 0.1012 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.1388 - msle: 8.6435 - rmsle: 0.1254 - val_dense_loss: 0.0000e+00 - val_loss: 0.0838 - val_msle: 4.3122 - val_rmsle: 0.0720 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━━━━━

[I 2025-05-08 01:17:54,915] Trial 49 finished with value: 0.06244188947053544 and parameters: {'units': 512, 'last_layer': 2, 'activation': 'silu', 'reg': 0.0002212934882765353, 'do_rate': 0.3985409976038868, 'hidden_layers': 2}. Best is trial 18 with value: 0.06211038027842043.


Running Fold: 0
Epoch 1/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 14s 12ms/step - dense_loss: 0.0000e+00 - loss: 2.9455 - msle: 104.0886 - rmsle: 2.9071 - val_dense_loss: 0.0000e+00 - val_loss: 1.5482 - val_msle: 92.4224 - val_rmsle: 1.5182 - learning_rate: 5.0000e-04
Epoch 2/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 1.3738 - msle: 90.5754 - rmsle: 1.3463 - val_dense_loss: 0.0000e+00 - val_loss: 0.7820 - val_msle: 72.6726 - val_rmsle: 0.7611 - learning_rate: 5.0000e-04
Epoch 3/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.7045 - msle: 70.6547 - rmsle: 0.6852 - val_dense_loss: 0.0000e+00 - val_loss: 0.3780 - val_msle: 51.5044 - val_rmsle: 0.3624 - learning_rate: 5.0000e-04
Epoch 4/31
598/598 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - dense_loss: 0.0000e+00 - loss: 0.3687 - msle: 48.4701 - rmsle: 0.3540 - val_dense_loss: 0.0000e+00 - val_loss: 0.2348 - val_msle: 24.8825 - val_rmsle: 0.2223 - learning_rate: 5.0000e-04
Epoch 5/31
598/598 ━━

[I 2025-05-08 01:25:10,709] Trial 50 finished with value: 0.06524286290862095 and parameters: {'units': 128, 'last_layer': 2, 'activation': 'silu', 'reg': 0.0001433447813483603, 'do_rate': 0.4227286017634942, 'hidden_layers': 3}. Best is trial 18 with value: 0.06211038027842043.


In [61]:
cat_params

{'units': 512,
 'last_layer': 2,
 'activation': 'silu',
 'reg': 0.0001004170129215336,
 'do_rate': 0.41356627172269655,
 'hidden_layers': 3}

#### 2.1.2 Train Model:

In [None]:
param = {'units': 512, 'last_layer': 1, 'activation': 'relu', 'reg': 0.00012466698516071345, 'do_rate': 0.32329936440008156}
TM = TrainModels(X=data.X, y=data.y, X_test=data.X_test, test_finc_target=y_test_fic, X_original=None, y_original=None, model_=build_model, parameters=param)

In [None]:
TM.fit_model(name="NN_exp_00")

#### 2.1.3 Store Results:

In [None]:
train_pred = TM.OOF_train
test_pred = TM.OOF_test
train_pred = pd.DataFrame(data = train_pred, columns = ["NN_exp_00"])


sub = pd.read_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/sample_submission.csv",index_col=0)

sub["Calories"] =  test_pred.values

sub.to_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/submission_NN_exp_00.csv")
train_pred.to_csv("/content/drive/MyDrive/Exercises/Studies_Structured_Data/Data/S5E5/train_pred_NN_exp_00.csv")

In [None]:
test_pred.min(), test_pred.max()

In [None]:
train_pred