# Hypertuning with Cross-Validation Folds
This code estimates random forest and XGBoost (LightGBM package) and prints out our evaluation metric (f-beta, where beta=2). The code is working with the cleaned cross-validation folds created from the 2015-2018 dataset. We do both a baseline assessment accross 3-months of data and a full hypertuning with the 2015-2018 cross-validation folds. In the future, we plan to also try ensemble models. We adopted a bayesian hypertuning strategy appropriate for big data called Tree-structured Parazen Estimator (TPE) within the hyperopt package. TPE starts learning good values for your hyperparameters (within a range we set) as it goes through multiple trials. The bayesian approach is helpful for big data tuning because we do not have the compute resources to do a comprehensive grid search.

![Pipeline Image](https://i.imgur.com/wq62T0E.png)

### Project Description
This is a group project conducted for course w261: Machine Learning at Scale at the University of California Berkeley in Summer 2023. This project develops a machine learning model that predicts flight delays based on historical flight, airport station, and weather data spanning five years from 2015-2019 in the United States.

###Group members
Jessica Stockham, Chase Madison, Kisha Kim, Eric Danforth

Citation: Code written by Jessica Stockham

In [0]:
import numpy as np
import re
import pandas as pd
from datetime import datetime, timedelta, date

from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql import Window
from pyspark.sql.functions import udf, col,isnan,when,count
from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler,StandardScaler, Imputer, Bucketizer
from pyspark.ml.classification import LogisticRegression, RandomForestClassifier, GBTClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator, MulticlassClassificationEvaluator
from pyspark.mllib.evaluation import MulticlassMetrics
from lightgbm import LGBMClassifier

from hyperopt import fmin, tpe, Trials, SparkTrials, hp, space_eval
import mlflow



In [0]:
## Place this cell in any team notebook that needs access to the team cloud storage
mids261_mount_path = '/mnt/mids-w261'  # 261 course blob storage is mounted here
secret_scope = 'sec5-team1-scope'  # Name of the secret scope Chase created in Databricks CLI
secret_key = 'sec5-team1-key'  # Name of the secret key Chase created in Databricks CLI
storage_account = 'sec5team1storage'  # Name of the Azure Storage Account Chase created
blob_container = 'sec5-team1-container'  # Name of the container Chase created in Azure Storage Account
team_blob_url = f'wasbs://{blob_container}@{storage_account}.blob.core.windows.net'  # Points to the root of your team storage bucket
spark.conf.set(  # SAS Token: Grant the team limited access to Azure Storage resources
  f'fs.azure.sas.{blob_container}.{storage_account}.blob.core.windows.net',
  dbutils.secrets.get(scope=secret_scope, key=secret_key)
)

###Baseline Cross-Validation on 3-Month Folds

In [0]:
##### LOAD 3mo Baseline Folds DATASET ##########
timeInterval = '3mo'
fold_name = "folds" + timeInterval
folds = load_folds_from_blob_and_cache(team_blob_url, fold_name)
folds[0][1].schema["features"].metadata["ml_attr"]["attrs"]

Loading 3 folds...


{'numeric': [{'idx': 673, 'name': 'scaled_numeric_0'},
  {'idx': 674, 'name': 'scaled_numeric_1'}],
 'binary': [{'idx': 0, 'name': 'OP_UNIQUE_CARRIER_hot_WN'},
  {'idx': 1, 'name': 'OP_UNIQUE_CARRIER_hot_DL'},
  {'idx': 2, 'name': 'OP_UNIQUE_CARRIER_hot_EV'},
  {'idx': 3, 'name': 'OP_UNIQUE_CARRIER_hot_OO'},
  {'idx': 4, 'name': 'OP_UNIQUE_CARRIER_hot_AA'},
  {'idx': 5, 'name': 'OP_UNIQUE_CARRIER_hot_UA'},
  {'idx': 6, 'name': 'OP_UNIQUE_CARRIER_hot_US'},
  {'idx': 7, 'name': 'OP_UNIQUE_CARRIER_hot_MQ'},
  {'idx': 8, 'name': 'OP_UNIQUE_CARRIER_hot_B6'},
  {'idx': 9, 'name': 'OP_UNIQUE_CARRIER_hot_AS'},
  {'idx': 10, 'name': 'OP_UNIQUE_CARRIER_hot_NK'},
  {'idx': 11, 'name': 'OP_UNIQUE_CARRIER_hot_F9'},
  {'idx': 12, 'name': 'OP_UNIQUE_CARRIER_hot_HA'},
  {'idx': 13, 'name': 'OP_UNIQUE_CARRIER_hot_VX'},
  {'idx': 14, 'name': 'OP_UNIQUE_CARRIER_hot___unknown'},
  {'idx': 15, 'name': 'CRS_DEP_BUCKET_hot_2.0'},
  {'idx': 16, 'name': 'CRS_DEP_BUCKET_hot_3.0'},
  {'idx': 17, 'name': 'CRS_DEP

In [0]:
# BASELINE LOGISTIC WITH FOLDS
estimator = LogisticRegression(featuresCol = 'features', labelCol = 'label')
fscore = trainPredictEval(estimator)

CV FOLD START: 0: 2023-08-11 17:32:04.580162
Model built: 0: 2023-08-11 17:34:05.160354
Prediction Validation Set: 0: 2023-08-11 17:34:05.251975
0.051851741323226605
fold fscore: 0.051851741323226605
CV FOLD START: 1: 2023-08-11 17:34:16.781711
Model built: 1: 2023-08-11 17:36:27.640509
Prediction Validation Set: 1: 2023-08-11 17:36:27.784065
0.03454285614695528
fold fscore: 0.03454285614695528
CV FOLD START: 2: 2023-08-11 17:36:47.939786
Model built: 2: 2023-08-11 17:38:00.923822
Prediction Validation Set: 2: 2023-08-11 17:38:01.059795
0.012566119564132575
fold fscore: 0.012566119564132575
average fscore accross fold: 0.03298690567810482


In [0]:
# BASELINE RANDOM FOREST WITH FOLDS
estimator = RandomForestClassifier(featuresCol = 'features', labelCol = 'label')    
fscore_rf = trainPredictEval(estimator)

CV FOLD START: 0: 2023-08-11 17:38:33.238097
Model built: 0: 2023-08-11 17:38:54.193041
Prediction Validation Set: 0: 2023-08-11 17:38:54.283478
0.0
fold fscore: 0.0
CV FOLD START: 1: 2023-08-11 17:38:56.695042
Model built: 1: 2023-08-11 17:39:18.651062
Prediction Validation Set: 1: 2023-08-11 17:39:18.728618
0.0
fold fscore: 0.0
CV FOLD START: 2: 2023-08-11 17:39:20.015509
Model built: 2: 2023-08-11 17:39:36.613758
Prediction Validation Set: 2: 2023-08-11 17:39:36.693287
0.0
fold fscore: 0.0
average fscore accross fold: 0.0


###Hypertuning Cross-Validation with 60 Month Dataset

In [0]:
def load_folds_from_blob_and_cache(blob_url, fold_name):
    '''Load folds from storage blob'''
    
    folds = list()
    DEFAULT_PARTITION_COUNT = 50

    # Compute the fold count
    files = dbutils.fs.ls(f"{blob_url}/{fold_name}")
    fold_names = sorted([f.name for f in files if f.name.startswith("train")])
    match = re.match(r"train_(\d+)_df", fold_names[-1])
    fold_count = int(match.group(1)) + 1
    print(f"Loading {fold_count} folds...")

    # Load folds
    for i in range(fold_count):
        train_df = (
            spark.read.parquet(f"{blob_url}/{fold_name}/train_{i}_df")
            .repartition(DEFAULT_PARTITION_COUNT)
            .cache()
        )
        val_df = (
            spark.read.parquet(f"{blob_url}/{fold_name}/val_{i}_df")
            .repartition(DEFAULT_PARTITION_COUNT)
            .cache()
        )
        folds.append((train_df, val_df))
    return folds


def trainPredictEval(estimator):  

    """
    Get validation fscore across all folds. Function is called by objective_function_rf()

    Parameters:
        estimator: machine learning model defined in objective_function_rf()
    
    returns:
        average validation fscore accross all folds
    """
    from statistics import mean 

    metricsList = []

    # Load folds data
    for i, (train_df, val_df) in enumerate(folds):

        print(f'CV FOLD START: {i}: {datetime.now()}')
        
        # Train
        model = estimator.fit(train_df)

        print(f'Model built: {i}: {datetime.now()}')
        
        pred = model.transform(val_df).cache()
        
        print(f'Prediction Validation Set: {i}: {datetime.now()}')
            
        # Compute Metrics

        evaluator = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="fMeasureByLabel", beta=2.0, metricLabel=1.0)
        fmeasure = evaluator.evaluate(pred, {evaluator.metricLabel: 1.0})
        print(fmeasure)

        metricsList.append(fmeasure)
        print(f'fold fscore: {fmeasure}')

        pred.unpersist()

    avgFscore = mean(metricsList)
    print(f'average fscore accross fold: {avgFscore}')

    # mlflow logging
    mlflow.log_metric("f2_score", (-1)*avgFscore)

    # negate fscore becuase hyperopt minimizes
    return (-1)*avgFscore

In [0]:
# RUN THIS CODE BELOW TO PULL IN CV FOLDS IN VARIABLE "folds" for 60 month (only brings in 2015-2018 as training set)
timeInterval = '60mo'
fold_name = "folds" + timeInterval
folds = load_folds_from_blob_and_cache(team_blob_url, fold_name)

# Filter to the most recent 2 folds (2017 and 2018)
fold_small = folds[3:5]

# Create folds_slim that excludes ORIGIN_hot and DEST_hot (representing about 600 columns)
folds_slim = []

for i, (train_df, val_df) in enumerate(fold_small):

    train_df_new = train_df.drop("features")
    val_df_new = val_df.drop("features")
    
    features_all = ['IS_FIRST_FLIGHT_OF_DAY_double_hot',
    'is_holiday_adjacent_double_hot',
    'OP_UNIQUE_CARRIER_hot',
    'is_holiday_double_hot',
    'CRS_DEP_BUCKET_hot',
    'DAY_OF_WEEK_hot',
    'origin_type_hot',
    'dest_type_hot',
    'MONTH_hot',
    'YEAR_hot'] + ['scaled_numeric']

    #print(f'features_all: {features_all}')
    assembler = VectorAssembler(inputCols=features_all, outputCol="features")

    train_df_slim = assembler.transform(train_df_new)
    val_df_slim = assembler.transform(val_df_new)

    train_df_slim = train_df_slim.select(['features', 'label'])
    val_df_slim = val_df_slim.select(['features', 'label'])

    folds_slim.append((train_df_slim, val_df_slim))

folds_slim


In [0]:
# SCHEMA has "features, label" + individual features + intermediate features used for processing. 
# Jess changed this. Before just kept "features, label" but realized this gives us more flexibility.
# Could change "features" input on the fly if you wanted.
folds_slim[0][0].printSchema()

root
 |-- label: double (nullable = true)
 |-- DISTANCE: double (nullable = true)
 |-- ELEVATION: double (nullable = true)
 |-- FE_PRIOR_DAILY_AVG_DEP_DELAY: double (nullable = true)
 |-- FE_PRIOR_AVG_DURATION: double (nullable = true)
 |-- FE_NUM_FLIGHT_SCHEDULED: long (nullable = true)
 |-- DEP_DELAY_LAG: double (nullable = true)
 |-- DAY_OF_WEEK: integer (nullable = true)
 |-- MONTH: integer (nullable = true)
 |-- YEAR: integer (nullable = true)
 |-- OP_UNIQUE_CARRIER: string (nullable = true)
 |-- origin_type: string (nullable = true)
 |-- dest_type: string (nullable = true)
 |-- ORIGIN: string (nullable = true)
 |-- DEST: string (nullable = true)
 |-- is_holiday_double: double (nullable = true)
 |-- is_holiday_adjacent_double: double (nullable = true)
 |-- IS_FIRST_FLIGHT_OF_DAY_double: double (nullable = true)
 |-- DATE: timestamp (nullable = true)
 |-- FL_DATE: date (nullable = true)
 |-- OP_CARRIER_FL_NUM: integer (nullable = true)
 |-- DEP_DELAY: double (nullable = true)
 |-- 

In [0]:
# Inspect features
folds_slim[0][0].schema["features"].metadata["ml_attr"]["attrs"]

{'numeric': [{'idx': 810, 'name': 'scaled_numeric_0'},
  {'idx': 811, 'name': 'scaled_numeric_1'},
  {'idx': 812, 'name': 'scaled_numeric_2'},
  {'idx': 813, 'name': 'scaled_numeric_3'},
  {'idx': 814, 'name': 'scaled_numeric_4'},
  {'idx': 815, 'name': 'scaled_numeric_5'},
  {'idx': 816, 'name': 'scaled_numeric_6'},
  {'idx': 817, 'name': 'scaled_numeric_7'},
  {'idx': 818, 'name': 'scaled_numeric_8'},
  {'idx': 819, 'name': 'scaled_numeric_9'}],
 'binary': [{'idx': 0, 'name': 'IS_FIRST_FLIGHT_OF_DAY_double_hot_0.0'},
  {'idx': 1, 'name': 'IS_FIRST_FLIGHT_OF_DAY_double_hot_1.0'},
  {'idx': 2, 'name': 'IS_FIRST_FLIGHT_OF_DAY_double_hot___unknown'},
  {'idx': 3, 'name': 'is_holiday_adjacent_double_hot_0.0'},
  {'idx': 4, 'name': 'is_holiday_adjacent_double_hot_1.0'},
  {'idx': 5, 'name': 'is_holiday_adjacent_double_hot___unknown'},
  {'idx': 6, 'name': 'OP_UNIQUE_CARRIER_hot_WN'},
  {'idx': 7, 'name': 'OP_UNIQUE_CARRIER_hot_DL'},
  {'idx': 8, 'name': 'OP_UNIQUE_CARRIER_hot_AA'},
  {'idx

# HYPERTUNE WITH HYPERFLOW

In [0]:
def objective_function_rf(params):

    """
    Define estimator

    Parameters:
        params: default in hyperopt. Do not change.
    
    returns:
        trainPredictEval(estimator) function
    """

    # set hyperparameters we want to tune
    maxDepth = params["maxDepth"]
    numTrees = params["numTrees"]

    with mlflow.start_run():

        # Train
        estimator = RandomForestClassifier(featuresCol = 'features'
                                    , labelCol = 'label'
                                    , maxDepth = maxDepth
                                    , numTrees = numTrees
                                    )
        
        return trainPredictEval(estimator)    

folds = folds_slim

display(folds[0][0].schema["features"].metadata["ml_attr"]["attrs"])

print(f'Job START: {i}: {datetime.now()}')

mlflow.end_run()

# Keep logging off during hypertuning
mlflow.pyspark.ml.autolog(log_models=False)

# hp.quniform(label, low, high, q) Returns a value like round(uniform(low, high) / q) * q
search_space_rf = {
    "maxDepth": hp.quniform("maxDepth", 4, 12, 2),   
    "numTrees": hp.quniform("numTrees", 20, 200, 20)

}


num_evals = 8
trials = Trials()

best_hyperparam_rf = fmin(fn=objective_function_rf,
                       space=search_space_rf,
                       algo=tpe.suggest,
                       max_evals=num_evals,
                       trials=trials,
                       rstate=np.random.default_rng(42))

# BEST PARAMETERS
best_params = space_eval(search_space_rf, best_hyperparam_rf)
print(f'best parameters: {best_params}')

# LOG IT

with mlflow.start_run():
    mlflow.log_params(best_params)
    mlflow.log_metric("CV_2folds_Drop_Origin_Dest_fbeta_rf", trials.best_trial['result']['loss'])
# End prior mlfow run
mlflow.end_run()

Loading 5 folds...


{'numeric': [{'idx': 71, 'name': 'scaled_numeric_0'},
  {'idx': 72, 'name': 'scaled_numeric_1'},
  {'idx': 73, 'name': 'scaled_numeric_2'},
  {'idx': 74, 'name': 'scaled_numeric_3'},
  {'idx': 75, 'name': 'scaled_numeric_4'},
  {'idx': 76, 'name': 'scaled_numeric_5'},
  {'idx': 77, 'name': 'scaled_numeric_6'},
  {'idx': 78, 'name': 'scaled_numeric_7'},
  {'idx': 79, 'name': 'scaled_numeric_8'},
  {'idx': 80, 'name': 'scaled_numeric_9'}],
 'binary': [{'idx': 0, 'name': 'IS_FIRST_FLIGHT_OF_DAY_double_hot_0.0'},
  {'idx': 1, 'name': 'IS_FIRST_FLIGHT_OF_DAY_double_hot_1.0'},
  {'idx': 2, 'name': 'IS_FIRST_FLIGHT_OF_DAY_double_hot___unknown'},
  {'idx': 3, 'name': 'is_holiday_adjacent_double_hot_0.0'},
  {'idx': 4, 'name': 'is_holiday_adjacent_double_hot_1.0'},
  {'idx': 5, 'name': 'is_holiday_adjacent_double_hot___unknown'},
  {'idx': 6, 'name': 'OP_UNIQUE_CARRIER_hot_WN'},
  {'idx': 7, 'name': 'OP_UNIQUE_CARRIER_hot_DL'},
  {'idx': 8, 'name': 'OP_UNIQUE_CARRIER_hot_AA'},
  {'idx': 9, 'nam

Job START: 1: 2023-08-08 09:41:19.415446
  0%|          | 0/8 [00:00<?, ?trial/s, best loss=?]                                                     CV FOLD START: 0: 2023-08-08 09:41:20.178859
  0%|          | 0/8 [00:00<?, ?trial/s, best loss=?]




                                                     Model built: 0: 2023-08-08 10:14:40.530339
  0%|          | 0/8 [33:21<?, ?trial/s, best loss=?]                                                     Prediction Validation Set: 0: 2023-08-08 10:14:40.622530
  0%|          | 0/8 [33:21<?, ?trial/s, best loss=?]




                                                     0.26049919049283454
  0%|          | 0/8 [35:15<?, ?trial/s, best loss=?]                                                     fold fscore: 0.26049919049283454
  0%|          | 0/8 [35:15<?, ?trial/s, best loss=?]                                                     CV FOLD START: 1: 2023-08-08 10:16:35.149048
  0%|          | 0/8 [35:15<?, ?trial/s, best loss=?]




                                                     Model built: 1: 2023-08-08 11:00:46.451835
  0%|          | 0/8 [1:19:26<?, ?trial/s, best loss=?]                                                       Prediction Validation Set: 1: 2023-08-08 11:00:46.564373
  0%|          | 0/8 [1:19:27<?, ?trial/s, best loss=?]




                                                       0.3750751168066343
  0%|          | 0/8 [1:21:40<?, ?trial/s, best loss=?]                                                       fold fscore: 0.3750751168066343
  0%|          | 0/8 [1:21:40<?, ?trial/s, best loss=?]                                                       average fscore accross fold: 0.3177871536497344
  0%|          | 0/8 [1:21:40<?, ?trial/s, best loss=?] 12%|█▎        | 1/8 [1:21:40<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344]                                                                                      CV FOLD START: 0: 2023-08-08 11:03:00.745883
 12%|█▎        | 1/8 [1:21:41<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344]




                                                                                      Model built: 0: 2023-08-08 11:21:43.400762
 12%|█▎        | 1/8 [1:40:23<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344]                                                                                      Prediction Validation Set: 0: 2023-08-08 11:21:43.496273
 12%|█▎        | 1/8 [1:40:24<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344]




                                                                                      0.37022813830220574
 12%|█▎        | 1/8 [1:41:28<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344]                                                                                      fold fscore: 0.37022813830220574
 12%|█▎        | 1/8 [1:41:28<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344]                                                                                      CV FOLD START: 1: 2023-08-08 11:22:48.330284
 12%|█▎        | 1/8 [1:41:28<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344]




                                                                                      Model built: 1: 2023-08-08 11:48:27.750829
 12%|█▎        | 1/8 [2:07:08<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344]                                                                                      Prediction Validation Set: 1: 2023-08-08 11:48:27.866561
 12%|█▎        | 1/8 [2:07:08<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344]




                                                                                      0.3791464174945349
 12%|█▎        | 1/8 [2:08:52<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344]                                                                                      fold fscore: 0.3791464174945349
 12%|█▎        | 1/8 [2:08:52<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344]                                                                                      average fscore accross fold: 0.3746872778983703
 12%|█▎        | 1/8 [2:08:52<9:31:43, 4900.47s/trial, best loss: -0.3177871536497344] 25%|██▌       | 2/8 [2:08:52<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703]                                                                                      CV FOLD START: 0: 2023-08-08 11:50:12.808793
 25%|██▌       | 2/8 [2:08:53<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703]




                                                                                      Model built: 0: 2023-08-08 11:56:55.483665
 25%|██▌       | 2/8 [2:15:36<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703]                                                                                      Prediction Validation Set: 0: 2023-08-08 11:56:55.609645
 25%|██▌       | 2/8 [2:15:36<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703]




                                                                                      0.0
 25%|██▌       | 2/8 [2:16:14<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703]                                                                                      fold fscore: 0.0
 25%|██▌       | 2/8 [2:16:14<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703]                                                                                      CV FOLD START: 1: 2023-08-08 11:57:33.861913
 25%|██▌       | 2/8 [2:16:14<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703]




                                                                                      Model built: 1: 2023-08-08 12:05:03.963597
 25%|██▌       | 2/8 [2:23:44<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703]                                                                                      Prediction Validation Set: 1: 2023-08-08 12:05:04.070502
 25%|██▌       | 2/8 [2:23:44<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703]




                                                                                      0.0
 25%|██▌       | 2/8 [2:24:55<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703]                                                                                      fold fscore: 0.0
 25%|██▌       | 2/8 [2:24:55<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703]                                                                                      average fscore accross fold: 0.0
 25%|██▌       | 2/8 [2:24:55<6:08:22, 3683.80s/trial, best loss: -0.3746872778983703] 38%|███▊      | 3/8 [2:24:55<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703]                                                                                      CV FOLD START: 0: 2023-08-08 12:06:16.105846
 38%|███▊      | 3/8 [2:24:56<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703]




                                                                                      Model built: 0: 2023-08-08 12:20:56.464300
 38%|███▊      | 3/8 [2:39:37<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703]                                                                                      Prediction Validation Set: 0: 2023-08-08 12:20:56.562120
 38%|███▊      | 3/8 [2:39:37<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703]




                                                                                      0.06073111688469884
 38%|███▊      | 3/8 [2:41:01<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703]                                                                                      fold fscore: 0.06073111688469884
 38%|███▊      | 3/8 [2:41:01<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703]                                                                                      CV FOLD START: 1: 2023-08-08 12:22:20.971171
 38%|███▊      | 3/8 [2:41:01<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703]




                                                                                      Model built: 1: 2023-08-08 12:40:57.683493
 38%|███▊      | 3/8 [2:59:38<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703]                                                                                      Prediction Validation Set: 1: 2023-08-08 12:40:57.777038
 38%|███▊      | 3/8 [2:59:38<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703]




                                                                                      0.16556247112276679
 38%|███▊      | 3/8 [3:00:52<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703]                                                                                      fold fscore: 0.16556247112276679
 38%|███▊      | 3/8 [3:00:52<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703]                                                                                      average fscore accross fold: 0.11314679400373281
 38%|███▊      | 3/8 [3:00:52<3:23:28, 2441.60s/trial, best loss: -0.3746872778983703] 50%|█████     | 4/8 [3:00:52<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703]                                                                                      CV FOLD START: 0: 2023-08-08 12:42:13.119743
 50%|█████     | 4/8 [3:00:53<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703]




                                                                                      Model built: 0: 2023-08-08 12:53:55.105127
 50%|█████     | 4/8 [3:12:35<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703]                                                                                      Prediction Validation Set: 0: 2023-08-08 12:53:55.225410
 50%|█████     | 4/8 [3:12:35<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703]




                                                                                      0.0
 50%|█████     | 4/8 [3:13:30<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703]                                                                                      fold fscore: 0.0
 50%|█████     | 4/8 [3:13:30<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703]                                                                                      CV FOLD START: 1: 2023-08-08 12:54:49.913694
 50%|█████     | 4/8 [3:13:30<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703]




                                                                                      Model built: 1: 2023-08-08 13:08:38.491508
 50%|█████     | 4/8 [3:27:19<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703]                                                                                      Prediction Validation Set: 1: 2023-08-08 13:08:38.590881
 50%|█████     | 4/8 [3:27:19<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703]




                                                                                      0.0
 50%|█████     | 4/8 [3:28:27<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703]                                                                                      fold fscore: 0.0
 50%|█████     | 4/8 [3:28:27<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703]                                                                                      average fscore accross fold: 0.0
 50%|█████     | 4/8 [3:28:27<2:35:16, 2329.22s/trial, best loss: -0.3746872778983703] 62%|██████▎   | 5/8 [3:28:27<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703]                                                                                      CV FOLD START: 0: 2023-08-08 13:09:47.861237
 62%|██████▎   | 5/8 [3:28:28<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703]




                                                                                      Model built: 0: 2023-08-08 13:31:23.699178
 62%|██████▎   | 5/8 [3:50:04<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703]                                                                                      Prediction Validation Set: 0: 2023-08-08 13:31:23.807799
 62%|██████▎   | 5/8 [3:50:04<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703]




                                                                                      0.001657131044898001
 62%|██████▎   | 5/8 [3:51:47<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703]                                                                                      fold fscore: 0.001657131044898001
 62%|██████▎   | 5/8 [3:51:47<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703]                                                                                      CV FOLD START: 1: 2023-08-08 13:33:06.793069
 62%|██████▎   | 5/8 [3:51:47<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703]




                                                                                      Model built: 1: 2023-08-08 14:01:01.153146
 62%|██████▎   | 5/8 [4:19:41<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703]                                                                                      Prediction Validation Set: 1: 2023-08-08 14:01:01.257577
 62%|██████▎   | 5/8 [4:19:41<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703]




                                                                                      0.10799852052928027
 62%|██████▎   | 5/8 [4:21:25<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703]                                                                                      fold fscore: 0.10799852052928027
 62%|██████▎   | 5/8 [4:21:25<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703]                                                                                      average fscore accross fold: 0.054827825787089134
 62%|██████▎   | 5/8 [4:21:25<1:44:17, 2085.97s/trial, best loss: -0.3746872778983703] 75%|███████▌  | 6/8 [4:21:25<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703]                                                                                      CV FOLD START: 0: 2023-08-08 14:02:45.517384
 75%|███████▌  | 6/8 [4:21:26<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703]




                                                                                      Model built: 0: 2023-08-08 15:23:58.496719
 75%|███████▌  | 6/8 [5:42:39<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703]                                                                                      Prediction Validation Set: 0: 2023-08-08 15:23:58.599859
 75%|███████▌  | 6/8 [5:42:39<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703]




                                                                                      0.39581951266878845
 75%|███████▌  | 6/8 [5:46:34<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703]                                                                                      fold fscore: 0.39581951266878845
 75%|███████▌  | 6/8 [5:46:34<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703]                                                                                      CV FOLD START: 1: 2023-08-08 15:27:54.033494
 75%|███████▌  | 6/8 [5:46:34<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703]




                                                                                      Model built: 1: 2023-08-08 17:02:27.151429
 75%|███████▌  | 6/8 [7:21:07<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703]                                                                                      Prediction Validation Set: 1: 2023-08-08 17:02:27.460655
 75%|███████▌  | 6/8 [7:21:08<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703]




                                                                                      0.4053420979951356
 75%|███████▌  | 6/8 [7:25:59<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703]                                                                                      fold fscore: 0.4053420979951356
 75%|███████▌  | 6/8 [7:25:59<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703]                                                                                      average fscore accross fold: 0.40058080533196205
 75%|███████▌  | 6/8 [7:25:59<1:21:54, 2457.18s/trial, best loss: -0.3746872778983703] 88%|████████▊ | 7/8 [7:26:00<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]                                                                                       CV FOLD START: 0: 2023-08-08 17:07:20.210388
 88%|████████▊ | 7/8 [7:26:00<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]




                                                                                       Model built: 0: 2023-08-08 17:21:49.459008
 88%|████████▊ | 7/8 [7:40:30<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]                                                                                       Prediction Validation Set: 0: 2023-08-08 17:21:49.582713
 88%|████████▊ | 7/8 [7:40:30<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]




                                                                                       0.3613048411562015
 88%|████████▊ | 7/8 [7:41:39<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]                                                                                       fold fscore: 0.3613048411562015
 88%|████████▊ | 7/8 [7:41:39<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]                                                                                       CV FOLD START: 1: 2023-08-08 17:22:59.261069
 88%|████████▊ | 7/8 [7:41:39<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]




                                                                                       Model built: 1: 2023-08-08 17:40:32.161381
 88%|████████▊ | 7/8 [7:59:12<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]                                                                                       Prediction Validation Set: 1: 2023-08-08 17:40:32.266446
 88%|████████▊ | 7/8 [7:59:12<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]




                                                                                       0.40170511320329555
 88%|████████▊ | 7/8 [8:00:50<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]                                                                                       fold fscore: 0.40170511320329555
 88%|████████▊ | 7/8 [8:00:50<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]                                                                                       average fscore accross fold: 0.3815049771797485
 88%|████████▊ | 7/8 [8:00:50<1:27:54, 5274.50s/trial, best loss: -0.40058080533196205]100%|██████████| 8/8 [8:00:50<00:00, 4260.75s/trial, best loss: -0.40058080533196205]  100%|██████████| 8/8 [8:00:50<00:00, 3606.29s/trial, best loss: -0.40058080533196205]
best parameters: {'maxDepth': 12.0, 'numTrees': 140.0}


In [0]:
def objective_function_lightgb(params):

    # set hyperparameters we want to tune
    maxDepth = params["max_depth"]
    minchildweight = params["min_child_weight"]
    mindatainleaf = params["min_data_in_leaf"]
    subSample = params["subsample"]
    alp = params["reg_alpha"]



    #subsample
    with mlflow.start_run():

        # Build Model
        estimator_lgbm = LGBMClassifier(featuresCol = 'features'
                            , labelCol = 'label'
                            , maxDepth = maxDepth
                            ,min_child_weight=minchildweight
                            ,min_data_in_leaf=mindatainleaf
                             ,subsample=subSample
                            ,reg_alpha=alp
                            , numRound=100, numWorkers=2
                            )

        return trainPredictEval(estimator_lgbm)
    
    folds = folds_slim

mlflow.end_run()

# Keep logging off during hypertuning
mlflow.pyspark.ml.autolog(log_models=False)


search_space_lightgb = {
    #step 1 - tree specific drivers
        #"maxDepth": hp.quniform("maxDepth", 2, 10, 1),   

#  #step 1b- data in each tree

     'max_depth':hp.quniform("max_depth", 3,10,2),
     'min_child_weight':hp.quniform("min_child_weight", 1,6,2),
     'min_data_in_leaf':hp.quniform("min_data_in_leaf", 100, 1000, 200),
    'subsample':hp.choice("subsample", [i/10.0 for i in range(6,10)]),
    

#  #step 2 - regularization

     'reg_alpha':hp.choice("reg_alpha", [1e-5, 1e-2, 0.1, 1, 100])
}

num_evals = 10
trials = Trials()


best_hyperparam_lightgb = fmin(fn=objective_function_lightgb,
                       space=search_space_lightgb,
                       algo=tpe.suggest,
                       max_evals=num_evals,
                       trials=trials,
                       rstate=np.random.default_rng(42))

# BEST PARAMETERS
best_params_lightgb = space_eval(search_space_lightgb, best_hyperparam_lightgb)
print(f'best parameters: {best_params_lightgb}')

# LOG IT

with mlflow.start_run():
    mlflow.log_params(best_params_lightgb)
    mlflow.log_metric("CV_2folds_Drop_Origin_Dest_fbeta_XG", trials.best_trial['result']['loss'])
# End prior mlfow run
mlflow.end_run()





Loading 5 folds...


{'numeric': [{'idx': 71, 'name': 'scaled_numeric_0'},
  {'idx': 72, 'name': 'scaled_numeric_1'},
  {'idx': 73, 'name': 'scaled_numeric_2'},
  {'idx': 74, 'name': 'scaled_numeric_3'},
  {'idx': 75, 'name': 'scaled_numeric_4'},
  {'idx': 76, 'name': 'scaled_numeric_5'},
  {'idx': 77, 'name': 'scaled_numeric_6'},
  {'idx': 78, 'name': 'scaled_numeric_7'},
  {'idx': 79, 'name': 'scaled_numeric_8'},
  {'idx': 80, 'name': 'scaled_numeric_9'}],
 'binary': [{'idx': 0, 'name': 'IS_FIRST_FLIGHT_OF_DAY_double_hot_0.0'},
  {'idx': 1, 'name': 'IS_FIRST_FLIGHT_OF_DAY_double_hot_1.0'},
  {'idx': 2, 'name': 'IS_FIRST_FLIGHT_OF_DAY_double_hot___unknown'},
  {'idx': 3, 'name': 'is_holiday_adjacent_double_hot_0.0'},
  {'idx': 4, 'name': 'is_holiday_adjacent_double_hot_1.0'},
  {'idx': 5, 'name': 'is_holiday_adjacent_double_hot___unknown'},
  {'idx': 6, 'name': 'OP_UNIQUE_CARRIER_hot_WN'},
  {'idx': 7, 'name': 'OP_UNIQUE_CARRIER_hot_DL'},
  {'idx': 8, 'name': 'OP_UNIQUE_CARRIER_hot_AA'},
  {'idx': 9, 'nam

  0%|          | 0/10 [00:00<?, ?trial/s, best loss=?]                                                      CV FOLD START: 0: 2023-08-08 06:26:48.574583
  0%|          | 0/10 [00:00<?, ?trial/s, best loss=?]




                                                      Model built: 0: 2023-08-08 06:33:19.114570
  0%|          | 0/10 [06:31<?, ?trial/s, best loss=?]                                                      Prediction Validation Set: 0: 2023-08-08 06:33:19.190269
  0%|          | 0/10 [06:31<?, ?trial/s, best loss=?]




                                                      0.1507283981018264
  0%|          | 0/10 [07:18<?, ?trial/s, best loss=?]                                                      fold fscore: 0.1507283981018264
  0%|          | 0/10 [07:18<?, ?trial/s, best loss=?]                                                      CV FOLD START: 1: 2023-08-08 06:34:06.507445
  0%|          | 0/10 [07:18<?, ?trial/s, best loss=?]




                                                      Model built: 1: 2023-08-08 06:44:32.303248
  0%|          | 0/10 [17:44<?, ?trial/s, best loss=?]                                                      Prediction Validation Set: 1: 2023-08-08 06:44:32.387729
  0%|          | 0/10 [17:44<?, ?trial/s, best loss=?]




                                                      0.3259635789839808
  0%|          | 0/10 [19:26<?, ?trial/s, best loss=?]                                                      fold fscore: 0.3259635789839808
  0%|          | 0/10 [19:26<?, ?trial/s, best loss=?]                                                      average fscore accross fold: 0.23834598854290362
  0%|          | 0/10 [19:26<?, ?trial/s, best loss=?] 10%|█         | 1/10 [19:26<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362]                                                                                      CV FOLD START: 0: 2023-08-08 06:46:15.325637
 10%|█         | 1/10 [19:27<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362]




                                                                                      Model built: 0: 2023-08-08 06:54:25.536426
 10%|█         | 1/10 [27:37<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362]                                                                                      Prediction Validation Set: 0: 2023-08-08 06:54:25.779432
 10%|█         | 1/10 [27:38<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362]




                                                                                      0.14803960290317122
 10%|█         | 1/10 [28:31<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362]                                                                                      fold fscore: 0.14803960290317122
 10%|█         | 1/10 [28:31<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362]                                                                                      CV FOLD START: 1: 2023-08-08 06:55:19.215470
 10%|█         | 1/10 [28:31<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362]




                                                                                      Model built: 1: 2023-08-08 07:05:41.097655
 10%|█         | 1/10 [38:53<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362]                                                                                      Prediction Validation Set: 1: 2023-08-08 07:05:41.192201
 10%|█         | 1/10 [38:53<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362]




                                                                                      0.2680171191661673
 10%|█         | 1/10 [40:02<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362]                                                                                      fold fscore: 0.2680171191661673
 10%|█         | 1/10 [40:02<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362]                                                                                      average fscore accross fold: 0.20802836103466926
 10%|█         | 1/10 [40:03<2:55:02, 1166.93s/trial, best loss: -0.23834598854290362] 20%|██        | 2/10 [40:03<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362]                                                                                      CV FOLD START: 0: 2023-08-08 07:06:51.477976
 20%|██        | 2/10 [40:03<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362]




                                                                                      Model built: 0: 2023-08-08 07:13:25.689015
 20%|██        | 2/10 [46:38<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362]                                                                                      Prediction Validation Set: 0: 2023-08-08 07:13:25.760136
 20%|██        | 2/10 [46:38<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362]




                                                                                      0.14803960290317122
 20%|██        | 2/10 [47:22<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362]                                                                                      fold fscore: 0.14803960290317122
 20%|██        | 2/10 [47:22<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362]                                                                                      CV FOLD START: 1: 2023-08-08 07:14:10.084925
 20%|██        | 2/10 [47:22<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362]




                                                                                      Model built: 1: 2023-08-08 07:22:22.262236
 20%|██        | 2/10 [55:34<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362]                                                                                      Prediction Validation Set: 1: 2023-08-08 07:22:22.335977
 20%|██        | 2/10 [55:34<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362]




                                                                                      0.2791327551881639
 20%|██        | 2/10 [56:26<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362]                                                                                      fold fscore: 0.2791327551881639
 20%|██        | 2/10 [56:26<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362]                                                                                      average fscore accross fold: 0.21358617904566757
 20%|██        | 2/10 [56:26<2:41:01, 1207.73s/trial, best loss: -0.23834598854290362] 30%|███       | 3/10 [56:26<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362]                                                                                      CV FOLD START: 0: 2023-08-08 07:23:15.260064
 30%|███       | 3/10 [56:27<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362]




                                                                                      Model built: 0: 2023-08-08 07:30:14.265132
 30%|███       | 3/10 [1:03:26<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362]                                                                                        Prediction Validation Set: 0: 2023-08-08 07:30:14.345377
 30%|███       | 3/10 [1:03:26<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362]




                                                                                        0.14803960290317122
 30%|███       | 3/10 [1:04:21<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362]                                                                                        fold fscore: 0.14803960290317122
 30%|███       | 3/10 [1:04:21<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362]                                                                                        CV FOLD START: 1: 2023-08-08 07:31:09.142546
 30%|███       | 3/10 [1:04:21<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362]




                                                                                        Model built: 1: 2023-08-08 07:39:11.055517
 30%|███       | 3/10 [1:12:23<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362]                                                                                        Prediction Validation Set: 1: 2023-08-08 07:39:11.128819
 30%|███       | 3/10 [1:12:23<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362]




                                                                                        0.31067492617420894
 30%|███       | 3/10 [1:13:43<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362]                                                                                        fold fscore: 0.31067492617420894
 30%|███       | 3/10 [1:13:43<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362]                                                                                        average fscore accross fold: 0.2293572645386901
 30%|███       | 3/10 [1:13:43<2:08:58, 1105.43s/trial, best loss: -0.23834598854290362] 40%|████      | 4/10 [1:13:43<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362]                                                                                        CV FOLD START: 0: 2023-08-08 07:40:31.743326
 40%|████      | 4/10 [1:13:44<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362]




                                                                                        Model built: 0: 2023-08-08 07:47:05.024656
 40%|████      | 4/10 [1:20:17<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362]                                                                                        Prediction Validation Set: 0: 2023-08-08 07:47:05.105530
 40%|████      | 4/10 [1:20:17<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362]




                                                                                        0.14803960290317122
 40%|████      | 4/10 [1:21:10<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362]                                                                                        fold fscore: 0.14803960290317122
 40%|████      | 4/10 [1:21:10<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362]                                                                                        CV FOLD START: 1: 2023-08-08 07:47:58.259081
 40%|████      | 4/10 [1:21:10<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362]




                                                                                        Model built: 1: 2023-08-08 07:56:34.328840
 40%|████      | 4/10 [1:29:46<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362]                                                                                        Prediction Validation Set: 1: 2023-08-08 07:56:34.404933
 40%|████      | 4/10 [1:29:46<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362]




                                                                                        0.2791327551881639
 40%|████      | 4/10 [1:30:53<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362]                                                                                        fold fscore: 0.2791327551881639
 40%|████      | 4/10 [1:30:53<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362]                                                                                        average fscore accross fold: 0.21358617904566757
 40%|████      | 4/10 [1:30:53<1:47:49, 1078.26s/trial, best loss: -0.23834598854290362] 50%|█████     | 5/10 [1:30:53<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362]                                                                                        CV FOLD START: 0: 2023-08-08 07:57:42.207547
 50%|█████     | 5/10 [1:30:54<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362]




                                                                                        Model built: 0: 2023-08-08 08:04:33.418241
 50%|█████     | 5/10 [1:37:45<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362]                                                                                        Prediction Validation Set: 0: 2023-08-08 08:04:33.499989
 50%|█████     | 5/10 [1:37:45<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362]




                                                                                        0.17475168642847178
 50%|█████     | 5/10 [1:38:36<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362]                                                                                        fold fscore: 0.17475168642847178
 50%|█████     | 5/10 [1:38:36<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362]                                                                                        CV FOLD START: 1: 2023-08-08 08:05:24.546158
 50%|█████     | 5/10 [1:38:36<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362]




                                                                                        Model built: 1: 2023-08-08 08:15:23.517339
 50%|█████     | 5/10 [1:48:35<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362]                                                                                        Prediction Validation Set: 1: 2023-08-08 08:15:23.596095
 50%|█████     | 5/10 [1:48:35<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362]




                                                                                        0.3098193630930614
 50%|█████     | 5/10 [1:49:51<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362]                                                                                        fold fscore: 0.3098193630930614
 50%|█████     | 5/10 [1:49:51<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362]                                                                                        average fscore accross fold: 0.2422855247607666
 50%|█████     | 5/10 [1:49:51<1:28:25, 1061.02s/trial, best loss: -0.23834598854290362] 60%|██████    | 6/10 [1:49:51<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666]                                                                                        CV FOLD START: 0: 2023-08-08 08:16:39.643054
 60%|██████    | 6/10 [1:49:51<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666]




                                                                                       Model built: 0: 2023-08-08 08:23:42.712098
 60%|██████    | 6/10 [1:56:55<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666]                                                                                       Prediction Validation Set: 0: 2023-08-08 08:23:42.792400
 60%|██████    | 6/10 [1:56:55<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666]




                                                                                       0.2905477275143471
 60%|██████    | 6/10 [1:57:50<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666]                                                                                       fold fscore: 0.2905477275143471
 60%|██████    | 6/10 [1:57:50<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666]                                                                                       CV FOLD START: 1: 2023-08-08 08:24:38.223135
 60%|██████    | 6/10 [1:57:50<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666]




                                                                                       Model built: 1: 2023-08-08 08:36:02.729151
 60%|██████    | 6/10 [2:09:15<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666]                                                                                       Prediction Validation Set: 1: 2023-08-08 08:36:02.823415
 60%|██████    | 6/10 [2:09:15<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666]




                                                                                       0.28381140677775346
 60%|██████    | 6/10 [2:10:46<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666]                                                                                       fold fscore: 0.28381140677775346
 60%|██████    | 6/10 [2:10:46<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666]                                                                                       average fscore accross fold: 0.2871795671460503
 60%|██████    | 6/10 [2:10:46<1:12:27, 1086.96s/trial, best loss: -0.2422855247607666] 70%|███████   | 7/10 [2:10:46<57:05, 1141.96s/trial, best loss: -0.2871795671460503]                                                                                       CV FOLD START: 0: 2023-08-08 08:37:34.871991
 70%|███████   | 7/10 [2:10:47<57:05, 1141.96s/trial, best loss: -0.2871795671460503]




                                                                                     Model built: 0: 2023-08-08 08:45:29.493263
 70%|███████   | 7/10 [2:18:41<57:05, 1141.96s/trial, best loss: -0.2871795671460503]                                                                                     Prediction Validation Set: 0: 2023-08-08 08:45:29.571469
 70%|███████   | 7/10 [2:18:41<57:05, 1141.96s/trial, best loss: -0.2871795671460503]




                                                                                     0.15125945199491264
 70%|███████   | 7/10 [2:19:24<57:05, 1141.96s/trial, best loss: -0.2871795671460503]                                                                                     fold fscore: 0.15125945199491264
 70%|███████   | 7/10 [2:19:24<57:05, 1141.96s/trial, best loss: -0.2871795671460503]                                                                                     CV FOLD START: 1: 2023-08-08 08:46:12.239955
 70%|███████   | 7/10 [2:19:24<57:05, 1141.96s/trial, best loss: -0.2871795671460503]




                                                                                     Model built: 1: 2023-08-08 08:57:01.631416
 70%|███████   | 7/10 [2:30:13<57:05, 1141.96s/trial, best loss: -0.2871795671460503]                                                                                     Prediction Validation Set: 1: 2023-08-08 08:57:01.708236
 70%|███████   | 7/10 [2:30:14<57:05, 1141.96s/trial, best loss: -0.2871795671460503]




                                                                                     0.2630376854989977
 70%|███████   | 7/10 [2:31:36<57:05, 1141.96s/trial, best loss: -0.2871795671460503]                                                                                     fold fscore: 0.2630376854989977
 70%|███████   | 7/10 [2:31:36<57:05, 1141.96s/trial, best loss: -0.2871795671460503]                                                                                     average fscore accross fold: 0.20714856874695517
 70%|███████   | 7/10 [2:31:36<57:05, 1141.96s/trial, best loss: -0.2871795671460503] 80%|████████  | 8/10 [2:31:36<39:12, 1176.39s/trial, best loss: -0.2871795671460503]                                                                                     CV FOLD START: 0: 2023-08-08 08:58:24.977380
 80%|████████  | 8/10 [2:31:37<39:12, 1176.39s/trial, best loss: -0.2871795671460503]




                                                                                     Model built: 0: 2023-08-08 09:07:06.295402
 80%|████████  | 8/10 [2:40:18<39:12, 1176.39s/trial, best loss: -0.2871795671460503]                                                                                     Prediction Validation Set: 0: 2023-08-08 09:07:06.389511
 80%|████████  | 8/10 [2:40:18<39:12, 1176.39s/trial, best loss: -0.2871795671460503]




                                                                                     0.1678479219500937
 80%|████████  | 8/10 [2:41:00<39:12, 1176.39s/trial, best loss: -0.2871795671460503]                                                                                     fold fscore: 0.1678479219500937
 80%|████████  | 8/10 [2:41:00<39:12, 1176.39s/trial, best loss: -0.2871795671460503]                                                                                     CV FOLD START: 1: 2023-08-08 09:07:47.957522
 80%|████████  | 8/10 [2:41:00<39:12, 1176.39s/trial, best loss: -0.2871795671460503]




                                                                                     Model built: 1: 2023-08-08 09:22:38.333821
 80%|████████  | 8/10 [2:55:50<39:12, 1176.39s/trial, best loss: -0.2871795671460503]                                                                                     Prediction Validation Set: 1: 2023-08-08 09:22:38.420706
 80%|████████  | 8/10 [2:55:50<39:12, 1176.39s/trial, best loss: -0.2871795671460503]




                                                                                     0.29688935656658855
 80%|████████  | 8/10 [2:56:51<39:12, 1176.39s/trial, best loss: -0.2871795671460503]                                                                                     fold fscore: 0.29688935656658855
 80%|████████  | 8/10 [2:56:51<39:12, 1176.39s/trial, best loss: -0.2871795671460503]                                                                                     average fscore accross fold: 0.23236863925834111
 80%|████████  | 8/10 [2:56:51<39:12, 1176.39s/trial, best loss: -0.2871795671460503] 90%|█████████ | 9/10 [2:56:51<21:22, 1282.23s/trial, best loss: -0.2871795671460503]                                                                                     CV FOLD START: 0: 2023-08-08 09:23:39.740057
 90%|█████████ | 9/10 [2:56:52<21:22, 1282.23s/trial, best loss: -0.2871795671460503]




                                                                                     Model built: 0: 2023-08-08 09:30:52.824875
 90%|█████████ | 9/10 [3:04:05<21:22, 1282.23s/trial, best loss: -0.2871795671460503]                                                                                     Prediction Validation Set: 0: 2023-08-08 09:30:52.904865
 90%|█████████ | 9/10 [3:04:05<21:22, 1282.23s/trial, best loss: -0.2871795671460503]




                                                                                     0.12735659920322326
 90%|█████████ | 9/10 [3:04:53<21:22, 1282.23s/trial, best loss: -0.2871795671460503]                                                                                     fold fscore: 0.12735659920322326
 90%|█████████ | 9/10 [3:04:53<21:22, 1282.23s/trial, best loss: -0.2871795671460503]                                                                                     CV FOLD START: 1: 2023-08-08 09:31:40.804562
 90%|█████████ | 9/10 [3:04:53<21:22, 1282.23s/trial, best loss: -0.2871795671460503]




                                                                                     Model built: 1: 2023-08-08 09:39:21.951473
 90%|█████████ | 9/10 [3:12:34<21:22, 1282.23s/trial, best loss: -0.2871795671460503]                                                                                     Prediction Validation Set: 1: 2023-08-08 09:39:22.036498
 90%|█████████ | 9/10 [3:12:34<21:22, 1282.23s/trial, best loss: -0.2871795671460503]




                                                                                     0.33998018764358134
 90%|█████████ | 9/10 [3:13:26<21:22, 1282.23s/trial, best loss: -0.2871795671460503]                                                                                     fold fscore: 0.33998018764358134
 90%|█████████ | 9/10 [3:13:26<21:22, 1282.23s/trial, best loss: -0.2871795671460503]                                                                                     average fscore accross fold: 0.2336683934234023
 90%|█████████ | 9/10 [3:13:26<21:22, 1282.23s/trial, best loss: -0.2871795671460503]100%|██████████| 10/10 [3:13:26<00:00, 1193.61s/trial, best loss: -0.2871795671460503]100%|██████████| 10/10 [3:13:26<00:00, 1160.67s/trial, best loss: -0.2871795671460503]
best parameters: {'max_depth': 10.0, 'min_child_weight': 2.0, 'min_data_in_leaf': 1000.0, 'reg_alpha': 1, 'subsample': 0.8}
