# H2OAutoML Plugin

Since H2O-3 `3.28.0.1`, users have the possibility to customize the `H2OAutoML` model selection engine by writing their own training steps as a Java plugin.

This tutorial was updated for H2O-3 `3.46.0.1`.

## How to write a simple plugin

To create such plugin, user simply needs to create a small project containing at least:
- an implementation of the `ai.h2o.automl.ModelingStepsProvider` interface.
- a file `META-INF/services/ai.h2o.automl.ModelingStepsProvider` with a entry for each of those implementations that need to be exposed to the service provider of the main `H2O-3` jar.

This folder contains such a plugin example:
```text
.
├── Makefile
├── java_plugin.ipynb
└── src
    ├── META-INF
    │   └── services
    │       └── ai.h2o.automl.ModelingStepsProvider
    └── my
        └── automl
            ├── MyDRFStepsProvider.java
            └── MyGLMStepsProvider.java
```

with `src/META-INF/services/ai.h2o.automl.ModelingStepsProvider`:
```text
my.automl.MyDRFStepsProvider
my.automl.MyGLMStepsProvider
```

and for example `MyDRFStepsProvider.java`:
```java
package my.automl;

import ai.h2o.automl.*;
import hex.grid.Grid;
import hex.tree.drf.DRFModel;
import hex.tree.drf.DRFModel.DRFParameters;
import water.Job;

import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

import static ai.h2o.automl.ModelingStep.ModelStep.DEFAULT_MODEL_TRAINING_WEIGHT;


public class MyDRFStepsProvider implements ModelingStepsProvider<MyDRFStepsProvider.DRFSteps> {

  public static class DRFSteps extends ModelingSteps {

    static final String NAME = Algo.DRF.name();
    static abstract class DRFGridStep extends ModelingStep.GridStep<DRFModel> {

      DRFGridStep(String id, AutoML autoML) {
        super(NAME, Algo.DRF, id, autoML);
      }

      public DRFParameters prepareModelParameters() {
        DRFParameters drfParameters = new DRFParameters();
        drfParameters._sample_rate = 0.8;
        drfParameters._col_sample_rate_per_tree = 0.8;
        drfParameters._col_sample_rate_change_per_level = 0.9;
        return drfParameters;
      }
    }

    private ModelingStep[] grids = new ModelingStep[]{
            new DRFGridStep("grid_1", aml()) {
              @Override
              public Map<String, Object[]> prepareSearchParameters() {
                Map<String, Object[]> searchParams = new HashMap<>();
                searchParams.put("_ntrees", IntStream.rangeClosed(5, 1000).filter(i -> i % 50 == 0).boxed().toArray());
                searchParams.put("_nbins", IntStream.of(5, 10, 15, 20, 30).boxed().toArray());
                searchParams.put("_max_depth", IntStream.rangeClosed(3, 20).boxed().toArray());
                searchParams.put("_min_rows", IntStream.of(3, 5, 10, 20, 50, 80, 100).boxed().toArray());
                return searchParams;
              }
            },
    };

    public DRFSteps(AutoML autoML) {
      super(autoML);
    }

    @Override
    protected ModelingStep[] getGrids() {
      return grids;
    }

    @Override
    public String getProvider() {
      return NAME;
    }
  }

  @Override
  public String getName() {
    return "MyDRF";
  }

  @Override
  public DRFSteps newInstance(AutoML aml) {
    return new DRFSteps(aml);
  }
}


```

As shown above, writing a `ModelingStepsProvider` simply requires to implement 2 methods:
- `String getName()` returning the name of this provider, which should be unique among all the registered providers: default algo names like "GLM", "XGBoost", "GBM", "DRF" are already used by `H2O-3` and must be avoided.
- `T newInstance(AutoML aml)` returning an instance of `ai.h2o.automl.ModelingSteps`: this is the class defining the logic for the default models and/or the grids that the user wants to add to `H2O AutoML`.


## How to add the plugin to H2O-3

H2O AutoML plugins are simply discovered using [ServiceLoader](https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html), so the only requirement is to make this plugin available on the classpath.

The simplest way is to create a jar, and add it to the classpath.
For example, from this directory, running
```bash
make dist
```
will create a jar for our plugin in the `./dist` subfolder.

This jar can then be added to the classpath when starting `H2O-3`:
```bash
java -cp /path/to/h2o.jar:/path/to/automl/plugin.jar water.H2OApp
```
or directly from the clients:
- Python:
```python
import h2o
h2o.init(extra_classpath=["/path/to/automl/plugin.jar"])
```
- R:
```R
library("h2o")
h2o.init(extra_classpath=c("/path/to/automl/plugin.jar"))
```

In [1]:
# run this cell if you don't have h2o installed in your Python environment
!pip install h2o

Collecting h2o
  Using cached h2o-3.46.0.1-py2.py3-none-any.whl.metadata (2.1 kB)
Collecting tabulate (from h2o)
  Using cached tabulate-0.9.0-py3-none-any.whl.metadata (34 kB)
Using cached h2o-3.46.0.1-py2.py3-none-any.whl (265.6 MB)
Using cached tabulate-0.9.0-py3-none-any.whl (35 kB)
Installing collected packages: tabulate, h2o
Successfully installed h2o-3.46.0.1 tabulate-0.9.0


In [2]:
# let's build our plugin jar
!make dist

rm -Rf ./build ./dist
sources = ./src/my/automl/MyGLMStepsProvider.java ./src/my/automl/MyDRFStepsProvider.java
mkdir -p build
javac ./src/my/automl/MyGLMStepsProvider.java ./src/my/automl/MyDRFStepsProvider.java -cp "/Users/tomasfryda/sources/h2o-tutorials/tutorials/automl/venv/lib/python3.10/site-packages/h2o/backend/bin/h2o.jar" -d ./build
cp -R ./src/META-INF ./build
mkdir -p dist
jar cvf ./dist/h2oautoml_plugin.jar -C ./build .
added manifest
ignoring entry META-INF/
adding: META-INF/services/(in = 0) (out= 0)(stored 0%)
adding: META-INF/services/ai.h2o.automl.ModelingStepsProvider(in = 59) (out= 40)(deflated 32%)
adding: my/(in = 0) (out= 0)(stored 0%)
adding: my/automl/(in = 0) (out= 0)(stored 0%)
adding: my/automl/MyGLMStepsProvider$GLMSteps$1.class(in = 2720) (out= 1267)(deflated 53%)
adding: my/automl/MyDRFStepsProvider$DRFSteps.class(in = 1104) (out= 545)(deflated 50%)
adding: my/automl/MyDRFStepsProvider$DRFSteps$DRFGridStep.class(in = 1308) (out= 644)(deflated 50%)
adding:

In [3]:
# and start the Python client with our plugin
import h2o
h2o.init(extra_classpath=["./dist/h2oautoml_plugin.jar"])

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: java version "1.8.0_241"; Java(TM) SE Runtime Environment (build 1.8.0_241-b07); Java HotSpot(TM) 64-Bit Server VM (build 25.241-b07, mixed mode)
  Starting server from /Users/tomasfryda/sources/h2o-tutorials/tutorials/automl/venv/lib/python3.10/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/49/kh67vvnj633ftt08t8zfwsvh0000gn/T/tmpaxfdhqe1
  JVM stdout: /var/folders/49/kh67vvnj633ftt08t8zfwsvh0000gn/T/tmpaxfdhqe1/h2o_tomasfryda_started_from_python.out
  JVM stderr: /var/folders/49/kh67vvnj633ftt08t8zfwsvh0000gn/T/tmpaxfdhqe1/h2o_tomasfryda_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


0,1
H2O_cluster_uptime:,03 secs
H2O_cluster_timezone:,Europe/Prague
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.46.0.1
H2O_cluster_version_age:,26 days
H2O_cluster_name:,H2O_from_python_tomasfryda_1jugg5
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,3.547 Gb
H2O_cluster_total_cores:,16
H2O_cluster_allowed_cores:,16


## How to use the custom steps

Those new steps won't be trained by default by `H2O AutoML`, however user can use the `modeling_plan` argument in the `Python` or `R` clients to tell `AutoML` to use them.

Let's first run a simple AutoML job and look at the first modeling steps:

In [4]:
from h2o.automl import H2OAutoML

aml = H2OAutoML(project_name="without_plugin", max_models=20, seed=42)

In [5]:
fr = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")

Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%


In [6]:
target = "CAPSULE"
train = fr
train[target] = train[target].asfactor()

In [7]:
aml.train(y=target, training_frame=train)

AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%


key,value
Stacking strategy,cross_validation
Number of base models (used / total),4/6
# GBM base models (used / total),0/1
# XGBoost base models (used / total),1/1
# GLM base models (used / total),1/1
# DRF base models (used / total),2/2
# DeepLearning base models (used / total),0/1
Metalearner algorithm,GLM
Metalearner fold assignment scheme,Random
Metalearner nfolds,5

Unnamed: 0,0,1,Error,Rate
0,206.0,21.0,0.0925,(21.0/227.0)
1,26.0,127.0,0.1699,(26.0/153.0)
Total,232.0,148.0,0.1237,(47.0/380.0)

metric,threshold,value,idx
max f1,0.4783044,0.8438538,147.0
max f2,0.2784564,0.9124088,209.0
max f0point5,0.563058,0.8628659,123.0
max accuracy,0.4783044,0.8763158,147.0
max precision,0.9937499,1.0,0.0
max recall,0.1690189,1.0,262.0
max specificity,0.9937499,1.0,0.0
max absolute_mcc,0.4783044,0.7417846,147.0
max min_per_class_accuracy,0.4118899,0.8546256,163.0
max mean_per_class_accuracy,0.4783044,0.8687772,147.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0105263,0.9740926,2.4836601,2.4836601,1.0,0.983487,1.0,0.983487,0.0261438,0.0261438,148.3660131,148.3660131,0.0261438
2,0.0210526,0.9607435,2.4836601,2.4836601,1.0,0.9694345,1.0,0.9764607,0.0261438,0.0522876,148.3660131,148.3660131,0.0522876
3,0.0315789,0.9521338,2.4836601,2.4836601,1.0,0.9554505,1.0,0.9694573,0.0261438,0.0784314,148.3660131,148.3660131,0.0784314
4,0.0421053,0.9308037,2.4836601,2.4836601,1.0,0.9417508,1.0,0.9625307,0.0261438,0.1045752,148.3660131,148.3660131,0.1045752
5,0.05,0.9135691,2.4836601,2.4836601,1.0,0.9226537,1.0,0.9562343,0.0196078,0.124183,148.3660131,148.3660131,0.124183
6,0.1,0.8344694,2.4836601,2.4836601,1.0,0.8777112,1.0,0.9169728,0.124183,0.248366,148.3660131,148.3660131,0.248366
7,0.15,0.769297,2.3529412,2.4400871,0.9473684,0.8038476,0.9824561,0.8792644,0.1176471,0.3660131,135.2941176,144.0087146,0.3616078
8,0.2,0.7244628,2.2222222,2.3856209,0.8947368,0.749187,0.9605263,0.8467451,0.1111111,0.4771242,122.2222222,138.5620915,0.4639083
9,0.3,0.5986228,2.0261438,2.2657952,0.8157895,0.6684114,0.9122807,0.7873005,0.2026144,0.6797386,102.6143791,126.5795207,0.6356857
10,0.4,0.4566722,1.503268,2.0751634,0.6052632,0.528366,0.8355263,0.7225669,0.1503268,0.8300654,50.3267974,107.5163399,0.7199332

Unnamed: 0,0,1,Error,Rate
0,160.0,67.0,0.2952,(67.0/227.0)
1,27.0,126.0,0.1765,(27.0/153.0)
Total,187.0,193.0,0.2474,(94.0/380.0)

metric,threshold,value,idx
max f1,0.3454618,0.7283237,192.0
max f2,0.1740043,0.8082497,284.0
max f0point5,0.505322,0.7132244,129.0
max accuracy,0.4507347,0.7631579,152.0
max precision,0.9848574,1.0,0.0
max recall,0.0651714,1.0,358.0
max specificity,0.9848574,1.0,0.0
max absolute_mcc,0.3454618,0.5183244,192.0
max min_per_class_accuracy,0.4105255,0.753304,171.0
max mean_per_class_accuracy,0.3454618,0.7641876,192.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0105263,0.9481861,2.4836601,2.4836601,1.0,0.9646942,1.0,0.9646942,0.0261438,0.0261438,148.3660131,148.3660131,0.0261438
2,0.0210526,0.9344932,1.2418301,1.8627451,0.5,0.9418178,0.75,0.953256,0.0130719,0.0392157,24.1830065,86.2745098,0.0304051
3,0.0315789,0.9130265,1.8627451,1.8627451,0.75,0.9259408,0.75,0.9441509,0.0196078,0.0588235,86.2745098,86.2745098,0.0456077
4,0.0421053,0.9049968,2.4836601,2.0179739,1.0,0.9076527,0.8125,0.9350264,0.0261438,0.0849673,148.3660131,101.7973856,0.0717515
5,0.05,0.8879347,2.4836601,2.0915033,1.0,0.896166,0.8421053,0.9288905,0.0196078,0.1045752,148.3660131,109.1503268,0.0913593
6,0.1,0.8105937,1.5686275,1.8300654,0.6315789,0.857307,0.7368421,0.8930988,0.0784314,0.1830065,56.8627451,83.0065359,0.1389537
7,0.15,0.7237605,1.9607843,1.8736383,0.7894737,0.7698654,0.754386,0.852021,0.0980392,0.2810458,96.0784314,87.3638344,0.2193717
8,0.2,0.6728391,1.8300654,1.8627451,0.7368421,0.6977602,0.75,0.8134558,0.0915033,0.372549,83.0065359,86.2745098,0.2888486
9,0.3,0.5407294,1.7647059,1.8300654,0.7105263,0.6032489,0.7368421,0.7433868,0.1764706,0.5490196,76.4705882,83.0065359,0.416861
10,0.4,0.4516046,1.503268,1.748366,0.6052632,0.4957917,0.7039474,0.6814881,0.1503268,0.6993464,50.3267974,74.8366013,0.5011085

Unnamed: 0,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.7762073,0.0393027,0.7323943,0.7380952,0.8055556,0.7857143,0.8192771
aic,90.73125,11.924663,90.25044,108.79737,77.81611,82.71864,94.07366
auc,0.8134815,0.0542103,0.78,0.7480689,0.8646735,0.873366,0.8012987
err,0.2237927,0.0393027,0.2676056,0.2619048,0.1944444,0.2142857,0.1807229
err_count,17.0,3.391165,19.0,22.0,14.0,15.0,15.0
f0point5,0.7093307,0.0634570,0.621118,0.6629834,0.7486631,0.7638889,0.75
f1,0.7368755,0.0653980,0.6779661,0.6857143,0.8,0.8148148,0.7058824
f2,0.7709812,0.0913014,0.7462686,0.7100592,0.8588957,0.8730159,0.6666667
lift_top_group,2.0142622,1.1980791,2.84,0.0,2.3225806,1.9444444,2.9642856
loglikelihood,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [8]:
aml.modeling_steps

[{'name': 'XGBoost', 'steps': [{'id': 'def_2', 'group': 1, 'weight': 10}]},
 {'name': 'GLM', 'steps': [{'id': 'def_1', 'group': 1, 'weight': 10}]},
 {'name': 'GBM', 'steps': [{'id': 'def_5', 'group': 1, 'weight': 10}]},
 {'name': 'XGBoost', 'steps': [{'id': 'def_1', 'group': 2, 'weight': 10}]},
 {'name': 'DRF', 'steps': [{'id': 'def_1', 'group': 2, 'weight': 10}]},
 {'name': 'GBM',
  'steps': [{'id': 'def_2', 'group': 2, 'weight': 10},
   {'id': 'def_3', 'group': 2, 'weight': 10},
   {'id': 'def_4', 'group': 2, 'weight': 10}]},
 {'name': 'XGBoost', 'steps': [{'id': 'def_3', 'group': 3, 'weight': 10}]},
 {'name': 'DRF', 'steps': [{'id': 'XRT', 'group': 3, 'weight': 10}]},
 {'name': 'GBM', 'steps': [{'id': 'def_1', 'group': 3, 'weight': 10}]},
 {'name': 'DeepLearning',
  'steps': [{'id': 'def_1', 'group': 3, 'weight': 10}]},
 {'name': 'XGBoost', 'steps': [{'id': 'grid_1', 'group': 4, 'weight': 90}]},
 {'name': 'GBM', 'steps': [{'id': 'grid_1', 'group': 4, 'weight': 60}]},
 {'name': 'Deep

As we can see, the default run doesn't contain any step defined in our plugin.
To tell AutoML to use our new steps, we will use the `modeling_plan` property.

In [9]:
# we can decide to add our new steps at the beginning: 
# by default, adding just the provider name will add both the default models and the grids.
new_plan = ["MyGLM", "MyDRF"] + aml.modeling_steps

# it is also possible to be more precise when defining the modeling sequence, 
# for example ensuring that default models are all trained before the grids:
another_plan = [
    ('XGBoost', 'defaults'),
    ('GLM', 'defaults'),
    ('DRF', 'defaults'),
    ('GBM', 'defaults'),
    ('DeepLearning', 'defaults'),
    ('MyGLM', 'grids'),
    ('MyDRF', 'grids'),
    ('XGBoost', 'grids'),
    ('GBM', 'grids'),
    ('DeepLearning', 'grids'),
    'StackedEnsemble'
]

# or even go into further details, 
# for example by tweaking the 'weight' property of the `modeling_plan` to produce more models from the `MyGBM` grid, relatively to other grids: 
# this is currently applied only for grids when using `max_runtime_secs` and/or `max_models` constraints. 
yet_another_plan = [
    ('XGBoost', 'defaults'),
    ('GLM', 'defaults'),
    ('DRF', 'defaults'),
    ('GBM', 'defaults'),
    ('DeepLearning', 'defaults'),
    dict(name='MyDRF', steps=[dict(id='grid_1', group=1, weight=100)]),
    dict(name='MyGLM', steps=[dict(id='solvers', group=1, weight=60)]),
    ('GBM', 'grids'),
    ('DeepLearning', 'grids'),
    'StackedEnsemble'
]

In [10]:
new_plan

['MyGLM',
 'MyDRF',
 {'name': 'XGBoost', 'steps': [{'id': 'def_2', 'group': 1, 'weight': 10}]},
 {'name': 'GLM', 'steps': [{'id': 'def_1', 'group': 1, 'weight': 10}]},
 {'name': 'GBM', 'steps': [{'id': 'def_5', 'group': 1, 'weight': 10}]},
 {'name': 'XGBoost', 'steps': [{'id': 'def_1', 'group': 2, 'weight': 10}]},
 {'name': 'DRF', 'steps': [{'id': 'def_1', 'group': 2, 'weight': 10}]},
 {'name': 'GBM',
  'steps': [{'id': 'def_2', 'group': 2, 'weight': 10},
   {'id': 'def_3', 'group': 2, 'weight': 10},
   {'id': 'def_4', 'group': 2, 'weight': 10}]},
 {'name': 'XGBoost', 'steps': [{'id': 'def_3', 'group': 3, 'weight': 10}]},
 {'name': 'DRF', 'steps': [{'id': 'XRT', 'group': 3, 'weight': 10}]},
 {'name': 'GBM', 'steps': [{'id': 'def_1', 'group': 3, 'weight': 10}]},
 {'name': 'DeepLearning',
  'steps': [{'id': 'def_1', 'group': 3, 'weight': 10}]},
 {'name': 'XGBoost', 'steps': [{'id': 'grid_1', 'group': 4, 'weight': 90}]},
 {'name': 'GBM', 'steps': [{'id': 'grid_1', 'group': 4, 'weight': 60

In [11]:
aml_plugin = H2OAutoML(project_name="with_plugin", max_models=20, modeling_plan=new_plan, seed=42)

In [12]:
aml_plugin.train(y=target, training_frame=train)

AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%


key,value
Stacking strategy,cross_validation
Number of base models (used / total),5/6
# GBM base models (used / total),0/1
# XGBoost base models (used / total),1/1
# GLM base models (used / total),1/1
# DeepLearning base models (used / total),1/1
# DRF base models (used / total),2/2
Metalearner algorithm,GLM
Metalearner fold assignment scheme,Random
Metalearner nfolds,5

Unnamed: 0,0,1,Error,Rate
0,215.0,12.0,0.0529,(12.0/227.0)
1,19.0,134.0,0.1242,(19.0/153.0)
Total,234.0,146.0,0.0816,(31.0/380.0)

metric,threshold,value,idx
max f1,0.4952192,0.8963211,145.0
max f2,0.3742025,0.9228824,178.0
max f0point5,0.5114067,0.9186536,139.0
max accuracy,0.5114067,0.9184211,139.0
max precision,0.9965523,1.0,0.0
max recall,0.2042788,1.0,254.0
max specificity,0.9965523,1.0,0.0
max absolute_mcc,0.5114067,0.8301909,139.0
max min_per_class_accuracy,0.4324638,0.907489,159.0
max mean_per_class_accuracy,0.4571376,0.9124701,152.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0105263,0.9899551,2.4836601,2.4836601,1.0,0.9949205,1.0,0.9949205,0.0261438,0.0261438,148.3660131,148.3660131,0.0261438
2,0.0210526,0.9770582,2.4836601,2.4836601,1.0,0.9830283,1.0,0.9889744,0.0261438,0.0522876,148.3660131,148.3660131,0.0522876
3,0.0315789,0.9665472,2.4836601,2.4836601,1.0,0.9725274,1.0,0.9834921,0.0261438,0.0784314,148.3660131,148.3660131,0.0784314
4,0.0421053,0.963405,2.4836601,2.4836601,1.0,0.9650007,1.0,0.9788692,0.0261438,0.1045752,148.3660131,148.3660131,0.1045752
5,0.05,0.9574403,2.4836601,2.4836601,1.0,0.9612627,1.0,0.9760893,0.0196078,0.124183,148.3660131,148.3660131,0.124183
6,0.1,0.8838551,2.4836601,2.4836601,1.0,0.9187791,1.0,0.9474342,0.124183,0.248366,148.3660131,148.3660131,0.248366
7,0.15,0.8386318,2.4836601,2.4836601,1.0,0.8618286,1.0,0.918899,0.124183,0.372549,148.3660131,148.3660131,0.372549
8,0.2,0.7721706,2.4836601,2.4836601,1.0,0.8037238,1.0,0.8901052,0.124183,0.496732,148.3660131,148.3660131,0.496732
9,0.3,0.6084483,2.2875817,2.4183007,0.9210526,0.6982445,0.9736842,0.8261516,0.2287582,0.7254902,128.7581699,141.8300654,0.7122743
10,0.4,0.4594845,1.6339869,2.2222222,0.6578947,0.535029,0.8947368,0.753371,0.1633987,0.8888889,63.3986928,122.2222222,0.8184043

Unnamed: 0,0,1,Error,Rate
0,170.0,57.0,0.2511,(57.0/227.0)
1,35.0,118.0,0.2288,(35.0/153.0)
Total,205.0,175.0,0.2421,(92.0/380.0)

metric,threshold,value,idx
max f1,0.3921057,0.7195122,174.0
max f2,0.1964375,0.8182844,273.0
max f0point5,0.4852148,0.7263752,138.0
max accuracy,0.4852148,0.7736842,138.0
max precision,0.99331,1.0,0.0
max recall,0.0499055,1.0,366.0
max specificity,0.99331,1.0,0.0
max absolute_mcc,0.4725031,0.5252645,142.0
max min_per_class_accuracy,0.3999185,0.7577093,170.0
max mean_per_class_accuracy,0.4554788,0.7604302,149.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0105263,0.9547203,2.4836601,2.4836601,1.0,0.9769318,1.0,0.9769318,0.0261438,0.0261438,148.3660131,148.3660131,0.0261438
2,0.0210526,0.9229596,1.8627451,2.1732026,0.75,0.9398656,0.875,0.9583987,0.0196078,0.0457516,86.2745098,117.3202614,0.0413463
3,0.0315789,0.9095594,2.4836601,2.2766885,1.0,0.9183813,0.9166667,0.9450596,0.0261438,0.0718954,148.3660131,127.6688453,0.0674901
4,0.0421053,0.8733367,1.2418301,2.0179739,0.5,0.8857297,0.8125,0.9302271,0.0130719,0.0849673,24.1830065,101.7973856,0.0717515
5,0.05,0.8688157,1.6557734,1.9607843,0.6666667,0.8704907,0.7894737,0.920795,0.0130719,0.0980392,65.577342,96.0784314,0.0804181
6,0.1,0.7804473,1.6993464,1.8300654,0.6842105,0.8251942,0.7368421,0.8729946,0.0849673,0.1830065,69.9346405,83.0065359,0.1389537
7,0.15,0.7369102,1.9607843,1.8736383,0.7894737,0.7562802,0.754386,0.8340898,0.0980392,0.2810458,96.0784314,87.3638344,0.2193717
8,0.2,0.684835,1.9607843,1.8954248,0.7894737,0.7102792,0.7631579,0.8031372,0.0980392,0.379085,96.0784314,89.5424837,0.2997898
9,0.3,0.548671,1.8300654,1.8736383,0.7368421,0.619589,0.754386,0.7419544,0.1830065,0.5620915,83.0065359,87.3638344,0.4387435
10,0.4,0.4486723,1.4379085,1.7647059,0.5789474,0.4988581,0.7105263,0.6811803,0.1437908,0.7058824,43.7908497,76.4705882,0.5120498

Unnamed: 0,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.7618002,0.0319244,0.7323943,0.75,0.7916667,0.8,0.7349398
aic,88.55574,10.712253,86.2648,105.27355,77.548805,81.98392,91.70763
auc,0.8213459,0.0508080,0.7852174,0.7623292,0.8693942,0.877451,0.8123376
err,0.2381998,0.0319244,0.2676056,0.25,0.2083333,0.2,0.2650602
err_count,18.2,3.563706,19.0,21.0,15.0,14.0,22.0
f0point5,0.6828942,0.0689326,0.6213018,0.6756757,0.7377049,0.7675439,0.6122449
f1,0.7388812,0.0659485,0.6885246,0.7042254,0.7826087,0.8333333,0.6857143
f2,0.8062731,0.0684409,0.7720588,0.7352941,0.8333333,0.9114583,0.7792208
lift_top_group,2.523353,0.409457,2.84,2.5454545,2.3225806,1.9444444,2.9642856
loglikelihood,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [13]:
aml_plugin_with_weight_and_group = H2OAutoML(project_name="with_plugin_weight_and_group",
                                             max_models=20, modeling_plan=yet_another_plan, seed=42)

In [14]:
aml_plugin_with_weight_and_group.train(y=target, training_frame=train)

AutoML progress: |███████████████████████████████████████████████████████████████| (done) 100%


key,value
Stacking strategy,cross_validation
Number of base models (used / total),6/6
# GBM base models (used / total),1/1
# XGBoost base models (used / total),1/1
# GLM base models (used / total),1/1
# DRF base models (used / total),2/2
# DeepLearning base models (used / total),1/1
Metalearner algorithm,GLM
Metalearner fold assignment scheme,Random
Metalearner nfolds,5

Unnamed: 0,0,1,Error,Rate
0,195.0,32.0,0.141,(32.0/227.0)
1,14.0,139.0,0.0915,(14.0/153.0)
Total,209.0,171.0,0.1211,(46.0/380.0)

metric,threshold,value,idx
max f1,0.3797844,0.8580247,170.0
max f2,0.2723852,0.9118727,204.0
max f0point5,0.5902507,0.896,117.0
max accuracy,0.5473598,0.8842105,130.0
max precision,0.9886006,1.0,0.0
max recall,0.1833453,1.0,247.0
max specificity,0.9886006,1.0,0.0
max absolute_mcc,0.5473598,0.7593044,130.0
max min_per_class_accuracy,0.4135986,0.875817,159.0
max mean_per_class_accuracy,0.3797844,0.8837638,170.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0105263,0.9695219,2.4836601,2.4836601,1.0,0.9777626,1.0,0.9777626,0.0261438,0.0261438,148.3660131,148.3660131,0.0261438
2,0.0210526,0.9578325,2.4836601,2.4836601,1.0,0.9647135,1.0,0.9712381,0.0261438,0.0522876,148.3660131,148.3660131,0.0522876
3,0.0315789,0.948553,2.4836601,2.4836601,1.0,0.9529217,1.0,0.9651326,0.0261438,0.0784314,148.3660131,148.3660131,0.0784314
4,0.0421053,0.9404593,2.4836601,2.4836601,1.0,0.9451367,1.0,0.9601336,0.0261438,0.1045752,148.3660131,148.3660131,0.1045752
5,0.05,0.9350448,2.4836601,2.4836601,1.0,0.9381147,1.0,0.9566569,0.0196078,0.124183,148.3660131,148.3660131,0.124183
6,0.1,0.8626808,2.4836601,2.4836601,1.0,0.8985186,1.0,0.9275878,0.124183,0.248366,148.3660131,148.3660131,0.248366
7,0.15,0.7904834,2.4836601,2.4836601,1.0,0.8251389,1.0,0.8934382,0.124183,0.372549,148.3660131,148.3660131,0.372549
8,0.2,0.7408565,2.4836601,2.4836601,1.0,0.7704538,1.0,0.8626921,0.124183,0.496732,148.3660131,148.3660131,0.496732
9,0.3,0.6043353,2.0915033,2.3529412,0.8421053,0.6823281,0.9473684,0.8025708,0.2091503,0.7058824,109.1503268,135.2941176,0.6794506
10,0.4,0.4400383,1.372549,2.1078431,0.5526316,0.5287286,0.8486842,0.7341102,0.1372549,0.8431373,37.254902,110.7843137,0.7418157

Unnamed: 0,0,1,Error,Rate
0,175.0,52.0,0.2291,(52.0/227.0)
1,38.0,115.0,0.2484,(38.0/153.0)
Total,213.0,167.0,0.2368,(90.0/380.0)

metric,threshold,value,idx
max f1,0.4190515,0.71875,166.0
max f2,0.1845864,0.8033708,277.0
max f0point5,0.4462597,0.7106274,156.0
max accuracy,0.4462597,0.7684211,156.0
max precision,0.9525649,1.0,0.0
max recall,0.0739242,1.0,358.0
max specificity,0.9525649,1.0,0.0
max absolute_mcc,0.4462597,0.5207521,156.0
max min_per_class_accuracy,0.4067987,0.7577093,170.0
max mean_per_class_accuracy,0.4462597,0.7614235,156.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0105263,0.9212347,2.4836601,2.4836601,1.0,0.9438015,1.0,0.9438015,0.0261438,0.0261438,148.3660131,148.3660131,0.0261438
2,0.0210526,0.9041638,1.8627451,2.1732026,0.75,0.9128321,0.875,0.9283168,0.0196078,0.0457516,86.2745098,117.3202614,0.0413463
3,0.0315789,0.881939,1.8627451,2.0697168,0.75,0.8925068,0.8333333,0.9163801,0.0196078,0.0653595,86.2745098,106.9716776,0.0565489
4,0.0421053,0.865613,1.8627451,2.0179739,0.75,0.8731413,0.8125,0.9055704,0.0196078,0.0849673,86.2745098,101.7973856,0.0717515
5,0.05,0.8580983,2.4836601,2.0915033,1.0,0.8614413,0.8421053,0.8986027,0.0196078,0.1045752,148.3660131,109.1503268,0.0913593
6,0.1,0.8118837,1.6993464,1.8954248,0.6842105,0.8385658,0.7631579,0.8685842,0.0849673,0.1895425,69.9346405,89.5424837,0.1498949
7,0.15,0.7360918,2.2222222,2.0043573,0.8947368,0.7757828,0.8070175,0.8376504,0.1111111,0.3006536,122.2222222,100.4357298,0.2521954
8,0.2,0.684378,1.5686275,1.8954248,0.6315789,0.7084408,0.7631579,0.805348,0.0784314,0.379085,56.8627451,89.5424837,0.2997898
9,0.3,0.5669741,1.7647059,1.8518519,0.7105263,0.6079371,0.745614,0.7395444,0.1764706,0.5555556,76.4705882,85.1851852,0.4278023
10,0.4,0.4621551,1.372549,1.7320261,0.5526316,0.5173456,0.6973684,0.6839947,0.1372549,0.6928105,37.254902,73.2026144,0.4901673

Unnamed: 0,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.7665744,0.0864845,0.8285714,0.6190476,0.7702703,0.7857143,0.8292683
aic,91.67853,10.632716,86.341,109.32132,90.186035,81.18413,91.36013
auc,0.8204654,0.0493890,0.8624079,0.7397836,0.8100961,0.8356643,0.854375
err,0.2334256,0.0864845,0.1714286,0.3809524,0.2297297,0.2142857,0.1707317
err_count,18.0,8.031189,12.0,32.0,17.0,15.0,14.0
f0point5,0.7023248,0.1065778,0.8290156,0.5514706,0.664557,0.6927711,0.7738095
f1,0.7496241,0.0723853,0.8421053,0.6521739,0.7118644,0.7540984,0.7878788
f2,0.8099436,0.0334815,0.8556150,0.7978724,0.7664233,0.8273382,0.8024691
lift_top_group,1.9543399,1.1388135,1.8918918,2.625,0.0,2.6923077,2.5625
loglikelihood,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Verification

Let's now compare the 2 leaderboards.

The first one contains only models defined by `H2O AutoML`, whereas the second one contains a mix of models defined by both `H2O AutoML` and our plugin.

In [15]:
aml.leaderboard.head(30)

model_id,auc,logloss,aucpr,mean_per_class_error,rmse,mse
StackedEnsemble_BestOfFamily_1_AutoML_1_20240409_111346,0.810745,0.528704,0.707779,0.235812,0.417919,0.174657
GLM_1_AutoML_1_20240409_111346,0.808816,0.523744,0.736683,0.273545,0.418759,0.175359
StackedEnsemble_AllModels_1_AutoML_1_20240409_111346,0.803115,0.531748,0.712266,0.244551,0.419681,0.176132
XGBoost_2_AutoML_1_20240409_111346,0.796925,0.546114,0.674287,0.236662,0.425042,0.180661
DRF_1_AutoML_1_20240409_111346,0.793139,0.546391,0.694818,0.270767,0.425796,0.181302
XGBoost_1_AutoML_1_20240409_111346,0.790447,0.546164,0.687742,0.259048,0.426998,0.182327
GBM_4_AutoML_1_20240409_111346,0.78751,0.547597,0.683774,0.258616,0.430242,0.185108
GBM_3_AutoML_1_20240409_111346,0.786934,0.546971,0.691114,0.277879,0.428129,0.183295
XGBoost_grid_1_AutoML_1_20240409_111346_model_3,0.786041,0.56919,0.698224,0.264087,0.435522,0.189679
GBM_1_AutoML_1_20240409_111346,0.784199,0.554973,0.68929,0.255276,0.43086,0.185641


In [16]:
aml_plugin.leaderboard.head(30)

model_id,auc,logloss,aucpr,mean_per_class_error,rmse,mse
StackedEnsemble_BestOfFamily_1_AutoML_2_20240409_111554,0.816216,0.517716,0.716022,0.23993,0.413727,0.17117
StackedEnsemble_AllModels_1_AutoML_2_20240409_111554,0.815784,0.516445,0.733739,0.240923,0.413595,0.17106
GLM_1_AutoML_2_20240409_111554,0.808816,0.523744,0.736683,0.273545,0.418759,0.175359
DeepLearning_grid_1_AutoML_2_20240409_111554_model_1,0.805764,0.576212,0.726422,0.257695,0.43227,0.186857
GLM_grid_1_AutoML_2_20240409_111554_model_2,0.805448,0.528366,0.72757,0.271271,0.421004,0.177245
GLM_grid_1_AutoML_2_20240409_111554_model_1,0.805188,0.528585,0.72699,0.264303,0.42119,0.177401
XGBoost_2_AutoML_2_20240409_111554,0.796925,0.546114,0.674287,0.236662,0.425042,0.180661
DRF_1_AutoML_2_20240409_111554,0.793139,0.546391,0.694818,0.270767,0.425796,0.181302
XGBoost_1_AutoML_2_20240409_111554,0.790447,0.546164,0.687742,0.259048,0.426998,0.182327
GBM_4_AutoML_2_20240409_111554,0.78751,0.547597,0.683774,0.258616,0.430242,0.185108


In [17]:
aml_plugin_with_weight_and_group.leaderboard.head(30)

model_id,auc,logloss,aucpr,mean_per_class_error,rmse,mse
StackedEnsemble_BestOfFamily_5_AutoML_3_20240409_111636,0.812934,0.519122,0.723725,0.23872,0.414639,0.171926
StackedEnsemble_BestOfFamily_2_AutoML_3_20240409_111636,0.811062,0.525953,0.710498,0.240074,0.416397,0.173386
GLM_1_AutoML_3_20240409_111636,0.808816,0.523744,0.736683,0.273545,0.418759,0.175359
StackedEnsemble_AllModels_1_AutoML_3_20240409_111636,0.807665,0.526937,0.715294,0.234675,0.417351,0.174181
StackedEnsemble_BestOfFamily_1_AutoML_3_20240409_111636,0.807031,0.528696,0.714458,0.23695,0.41878,0.175377
GLM_grid_1_AutoML_3_20240409_111636_model_2,0.805448,0.528366,0.72757,0.271271,0.421004,0.177245
GLM_grid_1_AutoML_3_20240409_111636_model_1,0.805188,0.528585,0.72699,0.264303,0.42119,0.177401
StackedEnsemble_AllModels_5_AutoML_3_20240409_111636,0.804382,0.529274,0.719191,0.247531,0.419218,0.175744
StackedEnsemble_AllModels_2_AutoML_3_20240409_111636,0.80349,0.53249,0.720249,0.251375,0.419972,0.176376
XRT_1_AutoML_3_20240409_111636,0.79717,0.542008,0.700051,0.263597,0.426069,0.181535


In [18]:
h2o.cluster().shutdown()

H2O session _sid_88ad closed.
