# Problem 1 - Hyperparameter Optimization using H2O

## 1.1

### (a)

In [3]:
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
h2o.init()

airlines= h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/airlines/allyears2k_headers.zip")

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.21" 2023-10-17; OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu122.04); OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)
  Starting server from /usr/local/lib/python3.10/dist-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmpf_dw13f0
  JVM stdout: /tmp/tmpf_dw13f0/h2o_unknownUser_started_from_python.out
  JVM stderr: /tmp/tmpf_dw13f0/h2o_unknownUser_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


key,value
H2O_cluster_uptime:,05 secs
H2O_cluster_timezone:,Etc/UTC
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.44.0.2
H2O_cluster_version_age:,25 days
H2O_cluster_name:,H2O_from_python_unknownUser_p6pal1
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,3.170 Gb
H2O_cluster_total_cores:,2
H2O_cluster_allowed_cores:,2


Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%


In [5]:
from h2o.grid.grid_search import H2OGridSearch
from h2o.estimators.random_forest import H2ORandomForestEstimator
# Split the dataset
train, test = airlines.split_frame(ratios=[.8], seed=1234)

# set the predictor names and the response column name
predictors = ["Origin", "Dest", "Year", "UniqueCarrier", "DayOfWeek", "Month", "Distance", "FlightNum"]
response = "IsDepDelayed"

hyperparameters = {'ntrees': [10, 30, 50, 100], 'max_depth': [1, 2, 4, 6]}

rf_grid = H2OGridSearch(model=H2ORandomForestEstimator,
                        grid_id='rf_grid',
                        hyper_params=hyperparameters)
rf_grid.train(x= predictors, y= response, training_frame=train)

drf Grid Build progress: |███████████████████████████████████████████████████████| (done) 100%


max_depth,ntrees,model_ids,logloss
6.0,100.0,rf_grid_model_16,0.6174081
6.0,30.0,rf_grid_model_8,0.6180608
6.0,50.0,rf_grid_model_12,0.6192753
6.0,10.0,rf_grid_model_4,0.6250022
4.0,50.0,rf_grid_model_11,0.6334848
4.0,100.0,rf_grid_model_15,0.6341884
4.0,30.0,rf_grid_model_7,0.6353233
4.0,10.0,rf_grid_model_3,0.6431804
2.0,50.0,rf_grid_model_10,0.6578778
2.0,100.0,rf_grid_model_14,0.6584011


### (b)

In [6]:
sorted_grid_results = rf_grid.get_grid(sort_by='accuracy', decreasing=True)
print(sorted_grid_results)

Hyper-Parameter Search Summary: ordered by decreasing accuracy
    max_depth    ntrees    model_ids         accuracy
--  -----------  --------  ----------------  ----------
    6            100       rf_grid_model_16  0.670364
    6            30        rf_grid_model_8   0.669938
    6            50        rf_grid_model_12  0.668889
    6            10        rf_grid_model_4   0.658305
    4            100       rf_grid_model_15  0.65601
    4            50        rf_grid_model_11  0.65479
    4            30        rf_grid_model_7   0.65184
    2            100       rf_grid_model_14  0.636748
    4            10        rf_grid_model_3   0.633978
    2            50        rf_grid_model_10  0.629628
    2            30        rf_grid_model_6   0.627301
    1            100       rf_grid_model_13  0.617855
    1            50        rf_grid_model_9   0.613117
    1            30        rf_grid_model_5   0.612437
    2            10        rf_grid_model_2   0.611254
    1            10 

### (c)

In [7]:
best_rf_model = sorted_grid_results.models[0]
print("Best Model:\n", best_rf_model)

Best Model:
 Model Details
H2ORandomForestEstimator : Distributed Random Forest
Model Key: rf_grid_model_16


Model Summary: 
    number_of_trees    number_of_internal_trees    model_size_in_bytes    min_depth    max_depth    mean_depth    min_leaves    max_leaves    mean_leaves
--  -----------------  --------------------------  ---------------------  -----------  -----------  ------------  ------------  ------------  -------------
    100                100                         126036                 6            6            6             34            64            58.5

ModelMetricsBinomial: drf
** Reported on train data. **

MSE: 0.2140357072340236
RMSE: 0.46263993259772074
LogLoss: 0.6174081496557858
Mean Per-Class Error: 0.39378861143656296
AUC: 0.7229765512342221
AUCPR: 0.7350414191332454
Gini: 0.4459531024684442

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.3812262656592868
       NO    YES    Error    Rate
-----  ----  -----  -------  -----------------
NO     527

In [8]:
performance = best_rf_model.model_performance(test)
print("AUC:\n", performance.auc())

AUC:
 0.7179929105058308


**Answer:**

The model 'rf_grid_model_16' (max_depth = 6, ntrees = 100) ranks first on both log loss (0.6174) and accuracy (0.6704), so it is the best of the 16 combinations tested. Depth is the dominant factor: every max_depth = 6 model beats every shallower model on both metrics. The effect of ntrees is weaker and not strictly monotonic; at max_depth = 6, for example, 30 trees gives a slightly lower log loss than 50 trees (0.6181 vs. 0.6193). On the held-out test set the best model reaches an AUC of about 0.718, close to the 0.723 reported on the training data, which indicates moderate ability to distinguish between the classes and leaves clear room for improvement. Because the best model sits at the upper edge of the grid for both hyperparameters, larger values might help further, but any gain has to be balanced against the risk of overfitting and the extra computation.
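
As a quick check on the role of depth, the log-loss rows shown in the grid output of 1.1 (a) can be grouped by max_depth. This is a minimal sketch in plain Python, not H2O code: the values are copied by hand from the 10 rows displayed there, and `by_depth` is an illustrative name.

```python
from collections import defaultdict

# (max_depth, ntrees, logloss) copied from the 10 rows displayed in 1.1 (a)
rows = [
    (6, 100, 0.6174081), (6, 30, 0.6180608), (6, 50, 0.6192753), (6, 10, 0.6250022),
    (4, 50, 0.6334848), (4, 100, 0.6341884), (4, 30, 0.6353233), (4, 10, 0.6431804),
    (2, 50, 0.6578778), (2, 100, 0.6584011),
]

# Collect the log-loss values observed at each depth
by_depth = defaultdict(list)
for max_depth, _ntrees, logloss in rows:
    by_depth[max_depth].append(logloss)

for depth in sorted(by_depth, reverse=True):
    losses = by_depth[depth]
    print(f"max_depth={depth}: best={min(losses):.4f}, worst={max(losses):.4f}")

# Lower log loss is better, so compare the worst deep model with the best shallower one
depth6_beats_depth4 = max(by_depth[6]) < min(by_depth[4])
print("Worst depth-6 model beats best depth-4 model:", depth6_beats_depth4)
```

The log-loss ranges for different depths do not overlap in the rows shown, whereas the spread across ntrees within a single depth is comparatively small.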

## 1.2

### (a)

In [10]:
# Define hyperparameters
hyperparameters = {'ntrees': [10, 30, 50, 100], 'max_depth': [1, 2, 4, 6]}

# Set up randomized grid search
search_criteria = {'strategy': 'RandomDiscrete', 'max_models': 10}
rf_random_grid = H2OGridSearch(model=H2ORandomForestEstimator,
                               hyper_params=hyperparameters,
                               search_criteria=search_criteria,
                               grid_id='rf_random_grid')

# Perform the search
rf_random_grid.train(x = predictors, y = response, training_frame=train)

drf Grid Build progress: |███████████████████████████████████████████████████████| (done) 100%


max_depth,ntrees,model_ids,logloss
6.0,100.0,rf_random_grid_model_4,0.616739
6.0,50.0,rf_random_grid_model_6,0.617285
6.0,30.0,rf_random_grid_model_2,0.6196315
6.0,10.0,rf_random_grid_model_10,0.6242806
4.0,50.0,rf_random_grid_model_3,0.633782
2.0,100.0,rf_random_grid_model_7,0.6572671
2.0,10.0,rf_random_grid_model_8,0.6622113
1.0,100.0,rf_random_grid_model_5,0.6719186
1.0,50.0,rf_random_grid_model_1,0.6721547
1.0,10.0,rf_random_grid_model_9,0.6721887


### (b)

In [11]:
sorted_random_grid_results= rf_random_grid.get_grid(sort_by='accuracy', decreasing=True)
print(sorted_random_grid_results)

Hyper-Parameter Search Summary: ordered by decreasing accuracy
    max_depth    ntrees    model_ids                accuracy
--  -----------  --------  -----------------------  ----------
    6            50        rf_random_grid_model_6   0.671328
    6            100       rf_random_grid_model_4   0.670165
    6            30        rf_random_grid_model_2   0.668208
    6            10        rf_random_grid_model_10  0.659985
    4            50        rf_random_grid_model_3   0.655584
    2            100       rf_random_grid_model_7   0.633684
    1            100       rf_random_grid_model_5   0.617826
    1            50        rf_random_grid_model_1   0.614054
    2            10        rf_random_grid_model_8   0.613029
    1            10        rf_random_grid_model_9   0.596123


### (c)

In [12]:
best_rf_random_results = sorted_random_grid_results.models[0]
print("Best Model:\n", best_rf_random_results)

Best Model:
 Model Details
H2ORandomForestEstimator : Distributed Random Forest
Model Key: rf_random_grid_model_6


Model Summary: 
    number_of_trees    number_of_internal_trees    model_size_in_bytes    min_depth    max_depth    mean_depth    min_leaves    max_leaves    mean_leaves
--  -----------------  --------------------------  ---------------------  -----------  -----------  ------------  ------------  ------------  -------------
    50                 50                          62357                  6            6            6             46            64            58.9

ModelMetricsBinomial: drf
** Reported on train data. **

MSE: 0.21402113882460969
RMSE: 0.46262418746171247
LogLoss: 0.6172849935900317
Mean Per-Class Error: 0.3822332175860758
AUC: 0.7227321410000939
AUCPR: 0.7350082013212988
Gini: 0.44546428200018773

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.39635160709383566
       NO    YES    Error    Rate
-----  ----  -----  -------  -----------------
NO

In [13]:
random_performance = best_rf_random_results.model_performance(test)
print("AUC Score:", random_performance.auc())

AUC Score: 0.7187140403152024


**Answer:**

The randomized search trained 10 of the 16 possible combinations, and max_depth again dominates: the four max_depth = 6 models take the top four places on both log loss and accuracy, whatever their ntrees value. Allowing the trees to grow deeper evidently lets the forest capture patterns in this data that shallower trees miss. The model with the lowest log loss ('rf_random_grid_model_4', 100 trees, 0.6167) differs from the model with the highest accuracy ('rf_random_grid_model_6', 50 trees, 0.6713), but both use max_depth = 6 and their scores are nearly identical, so the choice of ntrees matters little at this depth. The accuracy-selected model reaches a test AUC of about 0.719, essentially the same as the 0.718 obtained from the full grid search in 1.1. The randomized search therefore found an equally good model while training fewer models, which is its main advantage when the hyperparameter space is large.
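
To make the coverage of the randomized search concrete, the sketch below enumerates the full 4 x 4 grid and subtracts the combinations that appear in the 1.2 (a) output. It is plain Python rather than H2O code; the `sampled` set is copied by hand from that output, and the variable names are illustrative.

```python
from itertools import product

ntrees_values = [10, 30, 50, 100]
max_depth_values = [1, 2, 4, 6]

# Every (max_depth, ntrees) pair a Cartesian grid search would train
full_grid = set(product(max_depth_values, ntrees_values))

# (max_depth, ntrees) pairs copied from the randomized search output in 1.2 (a)
sampled = {
    (6, 100), (6, 50), (6, 30), (6, 10), (4, 50),
    (2, 100), (2, 10), (1, 100), (1, 50), (1, 10),
}

not_tried = sorted(full_grid - sampled)
coverage = len(sampled) / len(full_grid)

print(f"Trained {len(sampled)} of {len(full_grid)} combinations ({coverage:.1%})")
print("Not tried:", not_tried)
```

In this run all four max_depth = 6 combinations happened to be sampled, which is why the randomized search matched the full grid; a different seed could miss the best combination.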

## 1.3

### (a)

In [14]:
from h2o.automl import H2OAutoML

# Initialize H2OAutoML
automl = H2OAutoML(max_models=20, seed = 1234) #include_algos=["DeepLearning"]

# Run AutoML
automl.train(x=predictors, y=response, training_frame=train)

AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%


key,value
Stacking strategy,cross_validation
Number of base models (used / total),10/20
# GBM base models (used / total),2/7
# XGBoost base models (used / total),5/6
# DRF base models (used / total),2/2
# DeepLearning base models (used / total),1/4
# GLM base models (used / total),0/1
Metalearner algorithm,GLM
Metalearner fold assignment scheme,Random
Metalearner nfolds,5

Act/Pred,NO,YES,Error,Rate
NO,2567.0,2229.0,0.4648,(2229.0/4796.0)
YES,623.0,4626.0,0.1187,(623.0/5249.0)
Total,3190.0,6855.0,0.2839,(2852.0/10045.0)

metric,threshold,value,idx
max f1,0.3930339,0.7643754,271.0
max f2,0.2477663,0.860864,339.0
max f0point5,0.5903154,0.7563181,168.0
max accuracy,0.515597,0.7388751,208.0
max precision,0.9769114,1.0,0.0
max recall,0.1341178,1.0,385.0
max specificity,0.9769114,1.0,0.0
max absolute_mcc,0.515597,0.4774078,208.0
max min_per_class_accuracy,0.515597,0.7384264,208.0
max mean_per_class_accuracy,0.515597,0.7388963,208.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100548,0.9265431,1.9136978,1.9136978,1.0,0.9460915,1.0,0.9460915,0.0192418,0.0192418,91.3697847,91.3697847,0.0192418
2,0.02001,0.9103188,1.8945609,1.904177,0.99,0.9175509,0.9950249,0.9318922,0.0188607,0.0381025,89.4560869,90.4176962,0.037894
3,0.0300647,0.8965244,1.8947503,1.9010244,0.990099,0.9041014,0.9933775,0.9225979,0.0190512,0.0571537,89.4750344,90.1024352,0.0567367
4,0.0400199,0.8815754,1.8945609,1.8994165,0.99,0.8887289,0.9925373,0.9141728,0.0188607,0.0760145,89.4560869,89.941652,0.075389
5,0.0501742,0.8728415,1.8574126,1.8909157,0.9705882,0.8768204,0.9880952,0.9066134,0.0188607,0.0948752,85.7412616,89.091573,0.0936242
6,0.1000498,0.825432,1.7800064,1.8356266,0.9301397,0.8493913,0.959204,0.8780877,0.0887788,0.183654,78.0006381,83.5626592,0.1751052
7,0.1500249,0.7832864,1.6620961,1.7778215,0.8685259,0.8029949,0.928998,0.8530734,0.0830634,0.2667175,66.2096138,77.782149,0.2444072
8,0.2001991,0.7434194,1.560575,1.7233748,0.8154762,0.7615466,0.900547,0.8301348,0.0783006,0.3450181,56.057503,72.3374839,0.3033167
9,0.3002489,0.6662689,1.4357494,1.6275315,0.7502488,0.7037664,0.8504642,0.7880259,0.1436464,0.4886645,43.574943,62.7531491,0.3946278
10,0.4,0.5943619,1.2624294,1.5364831,0.6596806,0.630205,0.802887,0.7486689,0.1259287,0.6145933,26.2429418,53.648314,0.4494556

Act/Pred,NO,YES,Error,Rate
NO,7611.0,9155.0,0.546,(9155.0/16766.0)
YES,2637.0,15848.0,0.1427,(2637.0/18485.0)
Total,10248.0,25003.0,0.3345,(11792.0/35251.0)

metric,threshold,value,idx
max f1,0.3777233,0.7288447,272.0
max f2,0.1704567,0.8482762,372.0
max f0point5,0.5638124,0.7073304,178.0
max accuracy,0.491737,0.6874415,215.0
max precision,0.9712642,1.0,0.0
max recall,0.0621366,1.0,399.0
max specificity,0.9712642,1.0,0.0
max absolute_mcc,0.5638124,0.3727561,178.0
max min_per_class_accuracy,0.5151467,0.6849337,204.0
max mean_per_class_accuracy,0.5174012,0.6860338,203.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100423,0.9226132,1.8208133,1.8208133,0.9548023,0.9431526,0.9548023,0.9431526,0.0182851,0.0182851,82.0813333,82.0813333,0.0173308
2,0.0200278,0.9060704,1.7282239,1.7746498,0.90625,0.9137507,0.9305949,0.9284933,0.0172572,0.0355423,72.8223898,77.4649762,0.0326198
3,0.0300133,0.8921982,1.7498944,1.7664136,0.9176136,0.8988396,0.926276,0.9186274,0.0174736,0.053016,74.9894417,76.6413579,0.0483637
4,0.0400272,0.8799415,1.6963167,1.7488769,0.8895184,0.8858118,0.9170801,0.9104177,0.0169867,0.0700027,69.6316667,74.8876931,0.0630243
5,0.0500128,0.8669703,1.711971,1.7415083,0.8977273,0.873126,0.9132161,0.9029721,0.0170949,0.0870976,71.1971008,74.1508307,0.077972
6,0.1000255,0.8188853,1.62685,1.6841792,0.8530913,0.8425273,0.8831537,0.8727497,0.0813633,0.1684609,62.6849996,68.4179152,0.1438874
7,0.1500099,0.7761237,1.5249552,1.6311246,0.7996595,0.796836,0.8553328,0.8474547,0.076224,0.2446849,52.4955167,63.1124564,0.1990568
8,0.2000227,0.7360734,1.4537808,1.5867823,0.7623369,0.7559894,0.8320806,0.8245851,0.0727076,0.3173925,45.3780848,58.6782347,0.2467734
9,0.3000199,0.6630261,1.3551913,1.5095926,0.7106383,0.6998884,0.7916036,0.7830235,0.1355153,0.4529078,35.5191271,50.9592621,0.3214512
10,0.400017,0.5894963,1.1907289,1.4298824,0.6243972,0.6259241,0.749805,0.7437514,0.1190695,0.5719773,19.0728937,42.9882353,0.3615514

metric,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.6615114,0.0093901,0.6515496,0.6661923,0.6754288,0.6577229,0.6566634
auc,0.7523087,0.0079118,0.7463450,0.7635915,0.756794,0.7505156,0.744297
err,0.3384886,0.0093901,0.3484504,0.3338076,0.3245713,0.3422771,0.3433366
err_count,2386.2,57.364624,2451.0,2346.0,2309.0,2411.0,2414.0
f0point5,0.664885,0.0104520,0.6505671,0.6695122,0.6789604,0.6615033,0.6638819
f1,0.7295107,0.0093320,0.7142357,0.7374077,0.7366260,0.7286438,0.7306405
f2,0.8081243,0.0107347,0.7917184,0.8206278,0.804996,0.810953,0.8123263
lift_top_group,1.8111736,0.0738093,1.9046696,1.8666525,1.7854844,1.7202712,1.7787901
logloss,0.5886239,0.0065070,0.5934647,0.5794397,0.5848523,0.5899953,0.5953675
max_per_class_error,0.5693414,0.0255023,0.5587808,0.5814234,0.5296926,0.5836564,0.5931536
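
The error rates and the max F1 value in the output above can be recomputed from the counts in the first confusion matrix. The following is a minimal sketch in plain Python, assuming that YES is treated as the positive class; the counts are copied by hand from that matrix, and the variable names are illustrative.

```python
# Counts copied from the first confusion matrix in the AutoML output
# (rows = actual class, columns = predicted class); YES is assumed positive.
tn, fp = 2567, 2229   # actual NO:  predicted NO, predicted YES
fn, tp = 623, 4626    # actual YES: predicted NO, predicted YES

total = tn + fp + fn + tp

# Per-class and overall error rates, as in the Error column
no_error = fp / (tn + fp)
yes_error = fn / (fn + tp)
overall_error = (fp + fn) / total

# F1 is the harmonic mean of precision and recall
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"NO error:      {no_error:.4f}")
print(f"YES error:     {yes_error:.4f}")
print(f"Overall error: {overall_error:.4f}")
print(f"F1:            {f1:.7f}")
```

The recomputed values agree with the Error column of that matrix and with the `max f1` row of the threshold table that follows it. They also show that this threshold trades a high error on the NO class for a low error on the YES class.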


### (b)

In [15]:
# Display the AutoML Leaderboard
leaderboard = automl.leaderboard
print(leaderboard)

model_id                                                      auc    logloss     aucpr    mean_per_class_error      rmse       mse
StackedEnsemble_AllModels_1_AutoML_1_20231203_193521     0.752344   0.588617  0.766046                0.344351  0.449208  0.201788
StackedEnsemble_BestOfFamily_1_AutoML_1_20231203_193521  0.751309   0.589635  0.764877                0.352111  0.44964   0.202176
GBM_1_AutoML_1_20231203_193521                           0.74697    0.593642  0.760399                0.354801  0.451487  0.203841
XGBoost_grid_1_AutoML_1_20231203_193521_model_3          0.745901   0.594275  0.759362                0.356915  0.451851  0.204169
XGBoost_grid_1_AutoML_1_20231203_193521_model_1          0.745334   0.595267  0.759552                0.358533  0.452317  0.20459
XGBoost_1_AutoML_1_20231203_193521                       0.744989   0.596359  0.758866                0.361082  0.452715  0.204951
XRT_1_AutoML_1_20231203_193521                           0.744118   0.596984  0.7551

In [16]:
# Identify the best model
best_model = automl.leader
print("Best Model:", best_model.model_id)

Best Model: StackedEnsemble_AllModels_1_AutoML_1_20231203_193521


In [17]:
# Print parameters of the best model
print("Model Parameters:")
print(best_model.params)

Model Parameters:
{'model_id': {'default': None, 'actual': {'__meta': {'schema_version': 3, 'schema_name': 'ModelKeyV3', 'schema_type': 'Key<Model>'}, 'name': 'StackedEnsemble_AllModels_1_AutoML_1_20231203_193521', 'type': 'Key<Model>', 'URL': '/3/Models/StackedEnsemble_AllModels_1_AutoML_1_20231203_193521'}, 'input': None}, 'training_frame': {'default': None, 'actual': {'__meta': {'schema_version': 3, 'schema_name': 'FrameKeyV3', 'schema_type': 'Key<Frame>'}, 'name': 'AutoML_1_20231203_193521_training_py_5_sid_90d6', 'type': 'Key<Frame>', 'URL': '/3/Frames/AutoML_1_20231203_193521_training_py_5_sid_90d6'}, 'input': {'__meta': {'schema_version': 3, 'schema_name': 'FrameKeyV3', 'schema_type': 'Key<Frame>'}, 'name': 'AutoML_1_20231203_193521_training_py_5_sid_90d6', 'type': 'Key<Frame>', 'URL': '/3/Frames/AutoML_1_20231203_193521_training_py_5_sid_90d6'}}, 'response_column': {'default': None, 'actual': {'__meta': {'schema_version': 3, 'schema_name': 'ColSpecifierV3', 'schema_type': 'VecS

### (c)

In [18]:
# Evaluate performance
best_model_performance = best_model.model_performance(test)
print("AUC Score:", best_model_performance.auc())

AUC Score: 0.7548225556001725


### (d)

In [21]:
# Best log loss model
best_log_loss_model = automl.get_best_model(algorithm="xgboost", criterion="logloss")
print("Best XGBoost Model: ", best_log_loss_model.model_id)

performance = best_log_loss_model.model_performance(test)
best_log_loss = performance.logloss()
print("Best XGBoost Log Loss Model:", best_log_loss)

# AUC of the same XGBoost model on the test set (not a separately selected best-AUC model)
best_performance_auc = performance.auc()
print("Best AUC Model:", best_performance_auc)

Best XGBoost Model:  XGBoost_grid_1_AutoML_1_20231203_193521_model_3
Best XGBoost Log Loss Model: 0.5906267587640737
Best AUC Model: 0.7481741001656049


**Answer:**

- The best model identified by AutoML is the stacked ensemble 'StackedEnsemble_AllModels_1_AutoML_1_20231203_193521'. The relevant test-set metrics are:
   - The stacked ensemble has an AUC of 0.75482.
   - The best XGBoost model in terms of log loss is 'XGBoost_grid_1_AutoML_1_20231203_193521_model_3', with a log loss of 0.59063.
   - That same XGBoost model has an AUC of 0.74817.

The ensemble combines base models from several algorithm families (GBM, XGBoost, DRF, and DeepLearning; 10 of the 20 trained models are used), which is how it edges out every individual model on the leaderboard. Reporting several metrics (AUC, log loss, RMSE) gives a more complete view of the model's strengths and weaknesses than any single metric. The cost is interpretability: understanding the ensemble's predictions requires considering how each base model contributes through the metalearner.

In summary, the stacked ensemble performs best, with a cross-validated AUC of about 0.752 and a test AUC of about 0.755, and the small spread across folds (a standard deviation of about 0.008 in AUC) suggests the result is stable. It is also the most expensive model to train and score, since it depends on all of its base models. The best XGBoost model has a slightly lower test AUC (0.748 vs. 0.755) but is a single model and should be considerably cheaper to train, although training times were not measured here. Since the difference in performance is small, I believe the XGBoost model would be the more practical choice.
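
To put the three approaches side by side, the sketch below ranks the test-set AUC values reported in 1.1 (c), 1.2 (c), 1.3 (c) and 1.3 (d). It is plain Python with the numbers copied by hand from those outputs; the dictionary keys are descriptive labels chosen for illustration.

```python
# Test-set AUC values copied from the outputs in 1.1 (c), 1.2 (c), 1.3 (c) and 1.3 (d)
test_auc = {
    "DRF, Cartesian grid (rf_grid_model_16)": 0.7179929105058308,
    "DRF, random grid (rf_random_grid_model_6)": 0.7187140403152024,
    "AutoML best XGBoost": 0.7481741001656049,
    "AutoML stacked ensemble": 0.7548225556001725,
}

best_name = max(test_auc, key=test_auc.get)
best_auc = test_auc[best_name]

# Rank the models from best to worst and show how far each is from the best
for name, auc in sorted(test_auc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:45s} AUC={auc:.4f}  gap to best={best_auc - auc:.4f}")
```

The gap between the stacked ensemble and the best XGBoost model is under 0.01 AUC, while both AutoML models are roughly 0.03 or more above the tuned random forests.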