<a href="https://colab.research.google.com/github/mohamedrafat9/ml_algos/blob/main/h2o-auto-ml.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, classification_report

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb
import lightgbm as lgb


# Load the diabetes dataset from the Colab working directory
df = pd.read_csv("/content/diabetes.csv")


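# A zero in these columns is not a plausible measurement, so treat it as missing
cols_with_zeros = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
df[cols_with_zeros] = df[cols_with_zeros].replace(0, np.nan)


# Fill missing values with the column mean; assigning the result back avoids
# the chained inplace fillna that pandas warns about
for col in cols_with_zeros:
    df[col] = df[col].fillna(df[col].mean())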



X = df.drop(columns=['Outcome'])
y = df['Outcome']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.24, random_state=42, stratify=y
)

# use_label_encoder is omitted for XGBoost: the installed version reports it as unused
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "XGBoost": xgb.XGBClassifier(eval_metric="logloss", random_state=42),
    "LightGBM": lgb.LGBMClassifier(random_state=42, verbose=-1)
}


for name, model in models.items():
    print("="*50)
    print(f"Model: {name}")

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    acc = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred, average="weighted")

    print(f"Accuracy: {acc:.4f}")
    print(f"Weighted F1-score: {f1:.4f}")
    print("\nClassification Report:\n", classification_report(y_test, y_pred))


The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df[col].fillna(df[col].mean(), inplace=True)
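
pandas warns against calling `fillna(..., inplace=True)` on a selected column such as `df[col]`, because the selection can behave as a copy. A minimal sketch of the two forms the message recommends, run on a toy frame (the values below are made up for illustration and are not taken from `diabetes.csv`):

```python
import numpy as np
import pandas as pd

# Toy data standing in for two of the diabetes columns (illustrative values only).
toy = pd.DataFrame({"Glucose": [148.0, np.nan, 183.0], "BMI": [33.6, 26.6, np.nan]})

# Form 1: assign the filled column back to the frame.
filled_a = toy.copy()
for col in ["Glucose", "BMI"]:
    filled_a[col] = filled_a[col].fillna(filled_a[col].mean())

# Form 2: call fillna on the frame itself with a {column: value} mapping.
fill_values = {col: toy[col].mean() for col in ["Glucose", "BMI"]}
filled_b = toy.fillna(fill_values)

print(filled_a)
print(filled_b)
```

Both forms operate on the original frame rather than on an intermediate object, so they should keep working once the chained inplace form stops having any effect.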


Model: Logistic Regression
Accuracy: 0.7297
Weighted F1-score: 0.7224

Classification Report:
               precision    recall  f1-score   support

           0       0.77      0.84      0.80       120
           1       0.64      0.52      0.58        65

    accuracy                           0.73       185
   macro avg       0.70      0.68      0.69       185
weighted avg       0.72      0.73      0.72       185

Model: Decision Tree
Accuracy: 0.7081
Weighted F1-score: 0.7032

Classification Report:
               precision    recall  f1-score   support

           0       0.76      0.81      0.78       120
           1       0.60      0.52      0.56        65

    accuracy                           0.71       185
   macro avg       0.68      0.67      0.67       185
weighted avg       0.70      0.71      0.70       185

Model: Random Forest
Accuracy: 0.7622
Weighted F1-score: 0.7570

Classification Report:
               precision    recall  f1-score   support

           0      

Parameters: { "use_label_encoder" } are not used.

  bst.update(dtrain, iteration=i, fobj=obj)


Model: XGBoost
Accuracy: 0.7514
Weighted F1-score: 0.7494

Classification Report:
               precision    recall  f1-score   support

           0       0.80      0.82      0.81       120
           1       0.66      0.62      0.63        65

    accuracy                           0.75       185
   macro avg       0.73      0.72      0.72       185
weighted avg       0.75      0.75      0.75       185

Model: LightGBM
Accuracy: 0.7514
Weighted F1-score: 0.7484

Classification Report:
               precision    recall  f1-score   support

           0       0.79      0.83      0.81       120
           1       0.66      0.60      0.63        65

    accuracy                           0.75       185
   macro avg       0.73      0.72      0.72       185
weighted avg       0.75      0.75      0.75       185



In [2]:
!pip install h2o



In [3]:
import h2o
from h2o.automl import H2OAutoML


h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.28" 2025-07-15; OpenJDK Runtime Environment (build 11.0.28+6-post-Ubuntu-1ubuntu122.04.1); OpenJDK 64-Bit Server VM (build 11.0.28+6-post-Ubuntu-1ubuntu122.04.1, mixed mode, sharing)
  Starting server from /usr/local/lib/python3.12/dist-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmp84q80q_0
  JVM stdout: /tmp/tmp84q80q_0/h2o_unknownUser_started_from_python.out
  JVM stderr: /tmp/tmp84q80q_0/h2o_unknownUser_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html


H2O_cluster_uptime:         03 secs
H2O_cluster_timezone:       Etc/UTC
H2O_data_parsing_timezone:  UTC
H2O_cluster_version:        3.46.0.7
H2O_cluster_version_age:    5 months and 12 days
H2O_cluster_name:           H2O_from_python_unknownUser_n18i99
H2O_cluster_total_nodes:    1
H2O_cluster_free_memory:    3.170 Gb
H2O_cluster_total_cores:    2
H2O_cluster_allowed_cores:  2


In [4]:
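# Reload the raw CSV (the zero values are not replaced in this run)
df = pd.read_csv("/content/diabetes.csv")


# Convert the pandas DataFrame to an H2OFrame
hf = h2o.H2OFrame(df)


# Target column and predictor columns
y = "Outcome"
x = [col for col in hf.columns if col != y]


# Treat the target as categorical so AutoML runs classification
hf[y] = hf[y].asfactor()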

Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
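
The scikit-learn cell marked zeros in five columns as missing, while the H2O cell reloads the CSV and passes the raw zeros through. If the same cleaning is wanted on the H2O side, one option is to apply it in pandas before the conversion. The helper name `mark_zeros_as_missing` below is hypothetical, and the toy rows are illustrative rather than taken from `diabetes.csv`:

```python
import numpy as np
import pandas as pd

COLS_WITH_ZEROS = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

def mark_zeros_as_missing(frame: pd.DataFrame, cols=COLS_WITH_ZEROS) -> pd.DataFrame:
    """Return a copy of `frame` in which zeros in `cols` are replaced by NaN."""
    out = frame.copy()
    present = [c for c in cols if c in out.columns]  # skip columns the frame lacks
    out[present] = out[present].replace(0, np.nan)
    return out

# Toy rows: zeros in Glucose and Insulin are treated as missing,
# zeros in Pregnancies and Outcome are legitimate values and stay untouched.
toy = pd.DataFrame({
    "Glucose": [148, 0],
    "Insulin": [0, 94],
    "Pregnancies": [0, 1],
    "Outcome": [1, 0],
})
cleaned = mark_zeros_as_missing(toy)
print(cleaned)
```

The cleaned frame could then be passed to `h2o.H2OFrame(...)` in place of the raw `df`. No mean imputation is applied in this sketch, on the assumption that the H2O algorithms are left to handle missing values themselves.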


In [5]:
train, test = hf.split_frame(ratios=[0.8], seed=42)
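
`split_frame` assigns rows probabilistically, so the 0.8 ratio is approximate and the split is not stratified; in this run it produces 605 training rows and 163 test rows, whereas the scikit-learn cell held out 185 rows. A like-for-like comparison would need both workflows to share one hold-out set. A minimal sketch of producing such a split in pandas, using a synthetic stand-in for the dataset (the frame below is generated, not real data):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for diabetes.csv: 100 rows, 35% positive class.
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "Glucose": rng.normal(120, 30, size=100),
    "Outcome": [1] * 35 + [0] * 65,
})

# Same settings as the scikit-learn cell: 24% hold-out, stratified on the target.
train_df, test_df = train_test_split(
    toy, test_size=0.24, random_state=42, stratify=toy["Outcome"]
)

print(len(train_df), len(test_df))
print(train_df["Outcome"].mean(), test_df["Outcome"].mean())
```

Each part could then be converted with `h2o.H2OFrame(train_df)` and `h2o.H2OFrame(test_df)`, followed by `asfactor()` on the target, so that AutoML and the scikit-learn models are scored on the same rows.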

In [6]:
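# Stop after 20 models or 600 seconds, whichever limit is reached first
aml = H2OAutoML(
    max_models=20,
    max_runtime_secs=600,
    seed=42,
    balance_classes=True  # rebalance class counts by over/under-sampling
)
aml.train(x=x, y=y, training_frame=train)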

AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%


key,value
Stacking strategy,cross_validation
Number of base models (used / total),6/6
# GBM base models (used / total),1/1
# XGBoost base models (used / total),1/1
# GLM base models (used / total),1/1
# DRF base models (used / total),2/2
# DeepLearning base models (used / total),1/1
Metalearner algorithm,GLM
Metalearner fold assignment scheme,Random
Metalearner nfolds,5

Act/Pred,0,1,Error,Rate
0,395.0,2.0,0.005,(2.0/397.0)
1,24.0,184.0,0.1154,(24.0/208.0)
Total,419.0,186.0,0.043,(26.0/605.0)

metric,threshold,value,idx
max f1,0.526794,0.9340102,141.0
max f2,0.3477086,0.9586466,183.0
max f0point5,0.526794,0.9663866,141.0
max accuracy,0.526794,0.9570248,141.0
max precision,0.9999299,1.0,0.0
max recall,0.243115,1.0,219.0
max specificity,0.9999299,1.0,0.0
max absolute_mcc,0.526794,0.905394,141.0
max min_per_class_accuracy,0.4178776,0.9519231,168.0
max mean_per_class_accuracy,0.3477086,0.9551201,183.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0115702,0.9999151,2.9086538,2.9086538,1.0,0.999943,1.0,0.999943,0.0336538,0.0336538,190.8653846,190.8653846,0.0336538
2,0.0214876,0.9997824,2.9086538,2.9086538,1.0,0.9998472,1.0,0.9998988,0.0288462,0.0625,190.8653846,190.8653846,0.0625
3,0.031405,0.9995563,2.9086538,2.9086538,1.0,0.9996624,1.0,0.9998241,0.0288462,0.0913462,190.8653846,190.8653846,0.0913462
4,0.0413223,0.9993032,2.9086538,2.9086538,1.0,0.9994559,1.0,0.9997358,0.0288462,0.1201923,190.8653846,190.8653846,0.1201923
5,0.0512397,0.9837648,2.9086538,2.9086538,1.0,0.9948207,1.0,0.9987845,0.0288462,0.1490385,190.8653846,190.8653846,0.1490385
6,0.1008264,0.9377082,2.9086538,2.9086538,1.0,0.9604478,1.0,0.9799304,0.1442308,0.2932692,190.8653846,190.8653846,0.2932692
7,0.1504132,0.8624374,2.9086538,2.9086538,1.0,0.9033885,1.0,0.9546968,0.1442308,0.4375,190.8653846,190.8653846,0.4375
8,0.2,0.7739377,2.9086538,2.9086538,1.0,0.8168213,1.0,0.9205128,0.1442308,0.5817308,190.8653846,190.8653846,0.5817308
9,0.3008264,0.5547861,2.8132881,2.8766906,0.9672131,0.6759649,0.989011,0.8385489,0.2836538,0.8653846,181.3288146,187.6690617,0.8603468
10,0.4,0.3057092,1.2119391,2.4639423,0.4166667,0.4278858,0.8471074,0.7367316,0.1201923,0.9855769,21.1939103,146.3942308,0.8923779

Act/Pred,0,1,Error,Rate
0,316.0,81.0,0.204,(81.0/397.0)
1,52.0,156.0,0.25,(52.0/208.0)
Total,368.0,237.0,0.2198,(133.0/605.0)

metric,threshold,value,idx
max f1,0.4004232,0.7011236,185.0
max f2,0.1425539,0.7974482,303.0
max f0point5,0.5033162,0.7079646,142.0
max accuracy,0.4845821,0.7933884,148.0
max precision,0.9982207,1.0,0.0
max recall,0.0029708,1.0,394.0
max specificity,0.9982207,1.0,0.0
max absolute_mcc,0.4279643,0.5379568,174.0
max min_per_class_accuracy,0.3807948,0.7644231,193.0
max mean_per_class_accuracy,0.4004232,0.7729849,185.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0115702,0.9151548,2.9086538,2.9086538,1.0,0.9573717,1.0,0.9573717,0.0336538,0.0336538,190.8653846,190.8653846,0.0336538
2,0.0214876,0.8729932,2.4238782,2.6849112,0.8333333,0.8859943,0.9230769,0.9244283,0.0240385,0.0576923,142.3878205,168.4911243,0.0551734
3,0.031405,0.8593476,1.9391026,2.4493927,0.6666667,0.8660553,0.8421053,0.9059947,0.0192308,0.0769231,93.9102564,144.9392713,0.0693664
4,0.0413223,0.8528603,2.4238782,2.4432692,0.8333333,0.8556812,0.84,0.8939195,0.0240385,0.1009615,142.3878205,144.3269231,0.090886
5,0.0512397,0.8344581,2.4238782,2.4395161,0.8333333,0.8453454,0.8387097,0.884518,0.0240385,0.125,142.3878205,143.9516129,0.1124055
6,0.1008264,0.766814,2.3269231,2.3841425,0.8,0.8039221,0.8196721,0.8448807,0.1153846,0.2403846,132.6923077,138.4142497,0.2126768
7,0.1504132,0.6846981,2.1330128,2.3013525,0.7333333,0.7220429,0.7912088,0.8043847,0.1057692,0.3461538,113.3012821,130.1352494,0.2982949
8,0.2,0.6306598,2.1330128,2.2596154,0.7333333,0.6564241,0.7768595,0.7677003,0.1057692,0.4519231,113.3012821,125.9615385,0.383913
9,0.3008264,0.4846482,1.8119483,2.1095731,0.6229508,0.552685,0.7252747,0.6956347,0.1826923,0.6346154,81.1948298,110.9573119,0.5086708
10,0.4,0.3870571,1.1634615,1.875,0.4,0.4367833,0.6446281,0.6314567,0.1153846,0.75,16.3461538,87.5,0.5333753

metric,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.8034048,0.0220110,0.8333333,0.808,0.8067227,0.7723577,0.7966102
aic,124.88584,7.5315948,112.2385940,129.76523,123.71795,128.39182,130.31564
auc,0.8469948,0.0264434,0.87625,0.8267813,0.8491033,0.8684931,0.814346
err,0.1965952,0.0220110,0.1666667,0.192,0.1932773,0.2276423,0.2033898
err_count,23.8,2.8635643,20.0,24.0,23.0,28.0,24.0
f0point5,0.7078577,0.0304280,0.7552083,0.6779661,0.7142857,0.7068965,0.6849315
f1,0.7234499,0.0345134,0.7435898,0.6666667,0.7472528,0.7454546,0.7142857
f2,0.7412403,0.0534407,0.7323232,0.6557377,0.7834101,0.7884616,0.7462686
lift_top_group,2.6934705,0.8418289,3.0,3.3783784,2.8333333,1.23,3.025641
loglikelihood,0.0,0.0,0.0,0.0,0.0,0.0,0.0
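
The `max f1` values in the threshold tables can be checked directly against the two confusion matrices above. The sketch below recomputes F1 for the positive class from the matrix counts, reading rows as actual classes and columns as predicted classes. The section headings of this output are not preserved here, so treating the first matrix as training metrics and the second as cross-validation metrics is an assumption.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# First matrix: 184 true positives, 2 false positives, 24 false negatives.
f1_first = f1_from_counts(tp=184, fp=2, fn=24)

# Second matrix: 156 true positives, 81 false positives, 52 false negatives.
f1_second = f1_from_counts(tp=156, fp=81, fn=52)

print(round(f1_first, 7))
print(round(f1_second, 7))
```

The two results reproduce the reported `max f1` values of 0.9340102 and 0.7011236. Under the assumption above, the gap between them suggests that the first set of metrics is optimistic and that the second is the more realistic estimate of performance on unseen data.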


In [7]:
lb = aml.leaderboard
print(lb.head(rows=10))

model_id                                                     auc    logloss     aucpr    mean_per_class_error      rmse       mse
StackedEnsemble_BestOfFamily_1_AutoML_1_20250909_60806  0.843199   0.463611  0.728871                0.227015  0.388154  0.150664
GLM_1_AutoML_1_20250909_60806                           0.836      0.473992  0.712184                0.235952  0.391275  0.153096
StackedEnsemble_AllModels_1_AutoML_1_20250909_60806     0.835158   0.474157  0.704016                0.224732  0.393024  0.154468
XRT_1_AutoML_1_20250909_60806                           0.831816   0.487808  0.711413                0.22782   0.40173   0.161387
GBM_1_AutoML_1_20250909_60806                           0.829757   0.474801  0.712332                0.24053   0.394668  0.155763
DRF_1_AutoML_1_20250909_60806                           0.827916   0.534572  0.722752                0.239155  0.400289  0.160232
GBM_grid_1_AutoML_1_20250909_60806_model_2              0.827111   0.486549  0.726732     

In [8]:
preds = aml.leader.predict(test)
perf = aml.leader.model_performance(test)
print(perf)

stackedensemble prediction progress: |███████████████████████████████████████████| (done) 100%
ModelMetricsBinomialGLM: stackedensemble
** Reported on test data. **

MSE: 0.17584091055141526
RMSE: 0.4193338891043929
LogLoss: 0.551624874193808
AUC: 0.8066343042071197
AUCPR: 0.6512487452982143
Gini: 0.6132686084142394
Null degrees of freedom: 162
Residual degrees of freedom: 156
Null deviance: 214.90903393233492
Residual deviance: 179.82970898718142
AIC: 193.82970898718142

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.25380535829262685
       0    1    Error    Rate
-----  ---  ---  -------  ------------
0      66   37   0.3592   (37.0/103.0)
1      8    52   0.1333   (8.0/60.0)
Total  74   89   0.2761   (45.0/163.0)

Maximum Metrics: Maximum metrics at their respective thresholds
metric                       threshold    value     idx
---------------------------  -----------  --------  -----
max f1                       0.253805     0.697987  88
max f2                       0.

In [9]:
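leader = aml.leader


# The "predict" column holds the class labels H2O assigns at its default threshold
preds = leader.predict(test).as_data_frame()["predict"].astype(int)
# Select the target column so y_true is a 1-D Series rather than a DataFrame
y_true = test[y].as_data_frame()[y].astype(int)


# Weighted F1 on the H2O test split, which is not the same hold-out set as the sklearn split
f1 = f1_score(y_true, preds, average="weighted")
print("Weighted F1-score:", f1)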

stackedensemble prediction progress: |███████████████████████████████████████████| (done) 100%
Weighted F1-score: 0.7187212207622665




