# Automated ML
---

AutoML algorithms search for the best-performing pipeline by trying out combinations of:
* Multiple machine learning algorithms (random forests, linear models, SVMs, etc.)
* Multiple preprocessing steps (missing value imputation, scaling, PCA, feature selection, etc.)
* Hyperparameters for all of the models and preprocessing steps
* Multiple ways to ensemble or stack the algorithms
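
To give a feel for why this search is worth automating, the sketch below counts the candidate pipelines in a toy search space. The component names and hyperparameter grids are illustrative assumptions, not the search space of TPOT, H2O, or auto-sklearn.

```python
from itertools import product

# Hypothetical, deliberately tiny search space (not taken from any AutoML library).
preprocessors = ["none", "standard_scaler", "pca"]
models = {
    "random_forest": {"n_estimators": [100, 500], "max_depth": [None, 5, 10]},
    "linear_svm": {"C": [0.1, 1.0, 10.0]},
    "logistic_regression": {"C": [0.1, 1.0, 10.0], "penalty": ["l1", "l2"]},
}

def enumerate_pipelines(preprocessors, models):
    """Yield every (preprocessor, model, hyperparameters) combination."""
    for prep, (name, grid) in product(preprocessors, models.items()):
        keys = list(grid)
        for values in product(*(grid[k] for k in keys)):
            yield prep, name, dict(zip(keys, values))

candidates = list(enumerate_pipelines(preprocessors, models))
# 3 preprocessors x (6 + 3 + 6) model settings = 45 candidates
print(len(candidates))
```

Even this small space has 45 candidates before any stacking or multi-step preprocessing is considered, which is why AutoML tools rely on guided search rather than exhaustive enumeration.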


Libraries:
* H2O
* TPOT
* auto-sklearn (not supported on Windows)

Author: Enrico Lauckner <github.com/elauckne>

In [1]:
import pandas as pd
import numpy as np
import timeit
import os

from sklearn.datasets import load_breast_cancer

from tpot import TPOTClassifier

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Setup Result Table

In [2]:
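def add_results(run, elapsed, accuracy, file='results/results_automl.csv'):
    """Append one run to the results CSV (created on first use) and return the table."""
    if os.path.exists(file):
        # Read from `file` rather than a hard-coded path
        results = pd.read_csv(file)
    else:
        # Make sure the target directory exists before the first write
        os.makedirs(os.path.dirname(file) or '.', exist_ok=True)
        results = pd.DataFrame(columns=['Run', 'Elapsed', 'Accuracy'])

    results.loc[len(results)] = [run, elapsed, accuracy]
    results.to_csv(file, index=False)

    return results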

In [3]:
seed = 666
cores = 3

## Prepare Data

In [4]:
bc = load_breast_cancer()

In [5]:
x = bc.data
y = bc.target
feature_names = bc.feature_names

Train Test Split

In [6]:
x_trn, x_val, y_trn, y_val = train_test_split(x, y, test_size = .2, random_state = seed)

## TPOT

[TPOTClassifier Parameters](https://epistasislab.github.io/tpot/api/)  
[Tutorial](https://towardsdatascience.com/tpot-automated-machine-learning-in-python-4c063b3e5de9)

#### TPOTClassifier Default Configuration

* generations = 100
* population_size = 100
* offspring_size = population_size
* cv = 5

In [7]:
TPOTClassifier()

TPOTClassifier(config_dict=None, crossover_rate=0.1, cv=5,
        disable_update_check=False, early_stop=None, generations=100,
        max_eval_time_mins=5, max_time_mins=None, memory=None,
        mutation_rate=0.9, n_jobs=1, offspring_size=None,
        periodic_checkpoint_folder=None, population_size=100,
        random_state=None, scoring=None, subsample=1.0, use_dask=False,
        verbosity=0, warm_start=False)
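
The TPOT documentation states that a run evaluates `population_size + generations × offspring_size` pipelines in total. The sketch below applies that formula to the settings used in this notebook; the helper name `tpot_pipeline_budget` is made up for illustration, and the fit count assumes one model fit per cross-validation fold.

```python
def tpot_pipeline_budget(generations, population_size, offspring_size=None, cv=5):
    """Return (pipelines evaluated, model fits) as upper bounds for one TPOT run."""
    if offspring_size is None:
        # offspring_size falls back to population_size when it is not set
        offspring_size = population_size
    pipelines = population_size + generations * offspring_size
    # Assumption: each pipeline is fitted once per cross-validation fold
    return pipelines, pipelines * cv

for gens, pop in [(5, 5), (10, 50), (100, 100)]:
    pipelines, fits = tpot_pipeline_budget(gens, pop)
    print(f"{gens}/{pop}: {pipelines} pipelines, up to {fits} fits")
```

Under these assumptions the defaults printed above (100/100) allow up to 10100 pipelines, compared with 30 for the 5/5 test and 550 for the 10/50 run, which goes some way towards explaining the large differences in elapsed time below.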

#### Small Test (5/5)

* generations = 5
* population_size = 5

In [8]:
tpot = TPOTClassifier(random_state=seed, 
                     n_jobs=-1, 
                     generations=5, 
                     population_size=5,
                     cv=5)

In [9]:
run = 'tpot (5/5)'

start_time = timeit.default_timer()
tpot.fit(x_trn, y_trn)
elapsed = round(timeit.default_timer() - start_time)
accuracy = tpot.score(x_val, y_val).round(3)

add_results(run, elapsed, accuracy)

print('Elapsed (s): ', elapsed)
print('Accuracy:    ', accuracy)

Elapsed (s):  25
Accuracy:     0.904
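
The fit, time, and score pattern in this cell is repeated for every run in the notebook. One possible way to factor it out is sketched below; `timed_run` is a hypothetical helper, and the stand-in callables only exist so the snippet runs on its own.

```python
import timeit

def timed_run(fit, score):
    """Call fit(), then score(); return (elapsed whole seconds, score rounded to 3 places)."""
    start = timeit.default_timer()
    fit()
    elapsed = round(timeit.default_timer() - start)
    return elapsed, round(float(score()), 3)

# Stand-in callables; in the notebook these would be
# lambda: tpot.fit(x_trn, y_trn) and lambda: tpot.score(x_val, y_val).
elapsed, accuracy = timed_run(lambda: sum(range(1000)), lambda: 0.98246)

print('Elapsed (s): ', elapsed)
print('Accuracy:    ', accuracy)
```

Using the built-in `round()` on a plain float avoids depending on the score object having a `.round()` method.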


In [10]:
tpot.fitted_pipeline_

Pipeline(memory=None,
     steps=[('stackingestimator', StackingEstimator(estimator=ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='entropy',
           max_depth=None, max_features=0.05, max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=9, min_...tors=100, n_jobs=None,
           oob_score=False, random_state=None, verbose=0, warm_start=False))])

#### 10/50

In [11]:
tpot = TPOTClassifier(random_state=seed, 
                     n_jobs=-1, 
                     generations=10, 
                     population_size=50,
                     cv=5)

In [12]:
run = 'tpot (10/50)'

start_time = timeit.default_timer()
tpot.fit(x_trn, y_trn)
elapsed = round(timeit.default_timer() - start_time)
accuracy = tpot.score(x_val, y_val).round(3)

add_results(run, elapsed, accuracy)

print('Elapsed (s): ', elapsed)
print('Accuracy:    ', accuracy)

Elapsed (s):  368
Accuracy:     0.982


In [13]:
tpot.fitted_pipeline_

Pipeline(memory=None,
     steps=[('minmaxscaler', MinMaxScaler(copy=True, feature_range=(0, 1))), ('pca', PCA(copy=True, iterated_power=8, n_components=None, random_state=None,
  svd_solver='randomized', tol=0.0, whiten=False)), ('stackingestimator', StackingEstimator(estimator=LinearSVC(C=1.0, class_weight=None, dual=False,...penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False))])

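#### Default (100/100)

This run uses the default settings printed above (generations=100, population_size=100). The run label `tpot (50/50)` is kept as it was written to the results file.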

In [14]:
tpot = TPOTClassifier(random_state=seed, 
                     n_jobs=-1, 
                     cv=5)

In [15]:
run = 'tpot (50/50)'

start_time = timeit.default_timer()
tpot.fit(x_trn, y_trn)
elapsed = round(timeit.default_timer() - start_time)
accuracy = tpot.score(x_val, y_val).round(3)

add_results(run, elapsed, accuracy)

print('Elapsed (s): ', elapsed)
print('Accuracy:    ', accuracy)



Elapsed (s):  4997
Accuracy:     0.982


In [16]:
tpot.fitted_pipeline_

Pipeline(memory=None,
     steps=[('featureunion', FeatureUnion(n_jobs=None,
       transformer_list=[('functiontransformer-1', FunctionTransformer(accept_sparse=False, check_inverse=True,
          func=<function copy at 0x000001EBB84341E0>, inv_kw_args=None,
          inverse_func=None, kw_args=None, pass_y='deprecated',
  ...penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False))])

### TPOT Results

In [18]:
pd.read_csv('results/results_automl.csv')

| | Run | Elapsed | Accuracy |
|---|---|---|---|
| 0 | tpot (5/5) | 25 | 0.904 |
| 1 | tpot (10/50) | 368 | 0.982 |
| 2 | tpot (50/50) | 4997 | 0.982 |


Export Pipeline as Script

In [19]:
tpot.export('results/tpot_pipe.py')

True

In [28]:
results_tpot = pd.DataFrame.from_dict(tpot.evaluated_individuals_).T
results_tpot.shape

(9504, 6)

In [30]:
results_tpot.sort_values('internal_cv_score', ascending=False).head()

Unnamed: 0,crossover_count,generation,internal_cv_score,mutation_count,operator_count,predecessor
"LogisticRegression(StandardScaler(CombineDFs(input_matrix, ZeroCount(MinMaxScaler(ZeroCount(input_matrix))))), LogisticRegression__C=0.1, LogisticRegression__dual=False, LogisticRegression__penalty=l2)",0,INVALID,0.984615,10,5,(LogisticRegression(StandardScaler(CombineDFs(...
"LogisticRegression(StandardScaler(CombineDFs(input_matrix, MaxAbsScaler(input_matrix))), LogisticRegression__C=0.1, LogisticRegression__dual=True, LogisticRegression__penalty=l2)",0,INVALID,0.984615,9,3,(LogisticRegression(StandardScaler(CombineDFs(...
"LogisticRegression(StandardScaler(CombineDFs(Binarizer(input_matrix, Binarizer__threshold=0.9), OneHotEncoder(CombineDFs(input_matrix, input_matrix), OneHotEncoder__minimum_fraction=0.15, OneHotEncoder__sparse=False, OneHotEncoder__threshold=10))), LogisticRegression__C=0.1, LogisticRegression__dual=False, LogisticRegression__penalty=l2)",2,INVALID,0.984615,12,4,(LogisticRegression(StandardScaler(CombineDFs(...
"LogisticRegression(StandardScaler(CombineDFs(ZeroCount(OneHotEncoder(input_matrix, OneHotEncoder__minimum_fraction=0.05, OneHotEncoder__sparse=False, OneHotEncoder__threshold=10)), input_matrix)), LogisticRegression__C=0.1, LogisticRegression__dual=False, LogisticRegression__penalty=l2)",2,INVALID,0.984615,15,4,(LogisticRegression(StandardScaler(CombineDFs(...
"LogisticRegression(StandardScaler(CombineDFs(BernoulliNB(BernoulliNB(input_matrix, BernoulliNB__alpha=0.01, BernoulliNB__fit_prior=False), BernoulliNB__alpha=100.0, BernoulliNB__fit_prior=True), input_matrix)), LogisticRegression__C=0.1, LogisticRegression__dual=False, LogisticRegression__penalty=l2)",2,INVALID,0.984615,13,4,(LogisticRegression(StandardScaler(CombineDFs(...


In [32]:
results_tpot.sort_values('internal_cv_score').head()

Unnamed: 0,crossover_count,generation,internal_cv_score,mutation_count,operator_count,predecessor
"GradientBoostingClassifier(SelectFromModel(input_matrix, SelectFromModel__ExtraTreesClassifier__criterion=gini, SelectFromModel__ExtraTreesClassifier__max_features=0.9500000000000001, SelectFromModel__ExtraTreesClassifier__n_estimators=100, SelectFromModel__threshold=0.2), GradientBoostingClassifier__learning_rate=0.1, GradientBoostingClassifier__max_depth=4, GradientBoostingClassifier__max_features=0.35000000000000003, GradientBoostingClassifier__min_samples_leaf=16, GradientBoostingClassifier__min_samples_split=12, GradientBoostingClassifier__n_estimators=100, GradientBoostingClassifier__subsample=0.15000000000000002)",0,INVALID,-inf,10,2,"(GradientBoostingClassifier(input_matrix, Grad..."
"LinearSVC(SelectFromModel(input_matrix, SelectFromModel__ExtraTreesClassifier__criterion=gini, SelectFromModel__ExtraTreesClassifier__max_features=0.9000000000000001, SelectFromModel__ExtraTreesClassifier__n_estimators=100, SelectFromModel__threshold=0.25), LinearSVC__C=15.0, LinearSVC__dual=False, LinearSVC__loss=squared_hinge, LinearSVC__penalty=l1, LinearSVC__tol=0.001)",0,INVALID,-inf,3,2,"(LinearSVC(input_matrix, LinearSVC__C=15.0, Li..."
"LogisticRegression(OneHotEncoder(StandardScaler(CombineDFs(ZeroCount(OneHotEncoder(input_matrix, OneHotEncoder__minimum_fraction=0.15, OneHotEncoder__sparse=False, OneHotEncoder__threshold=10)), input_matrix)), OneHotEncoder__minimum_fraction=0.1, OneHotEncoder__sparse=False, OneHotEncoder__threshold=10), LogisticRegression__C=0.1, LogisticRegression__dual=False, LogisticRegression__penalty=l2)",2,INVALID,-inf,15,5,(LogisticRegression(StandardScaler(CombineDFs(...
"LogisticRegression(SelectFromModel(input_matrix, SelectFromModel__ExtraTreesClassifier__criterion=entropy, SelectFromModel__ExtraTreesClassifier__max_features=0.2, SelectFromModel__ExtraTreesClassifier__n_estimators=100, SelectFromModel__threshold=0.2), LogisticRegression__C=0.1, LogisticRegression__dual=False, LogisticRegression__penalty=l2)",0,0,-inf,0,2,"(ROOT,)"
"LogisticRegression(SelectFromModel(CombineDFs(input_matrix, input_matrix), SelectFromModel__ExtraTreesClassifier__criterion=gini, SelectFromModel__ExtraTreesClassifier__max_features=0.4, SelectFromModel__ExtraTreesClassifier__n_estimators=100, SelectFromModel__threshold=0.1), LogisticRegression__C=0.1, LogisticRegression__dual=False, LogisticRegression__penalty=l2)",2,INVALID,-inf,12,2,(LogisticRegression(StandardScaler(CombineDFs(...


In [31]:
results_tpot.generation.value_counts()

INVALID    9410
0            94
Name: generation, dtype: int64
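
As the sorted views above show, some evaluated pipelines carry an `internal_cv_score` of `-inf`. A minimal sketch of ranking only the pipelines with a finite score follows. It works on a plain dictionary shaped like `tpot.evaluated_individuals_` (pipeline string mapped to a dictionary of statistics); the entries below are made up for illustration and are not results from this notebook.

```python
import math

# Made-up stand-in with the same shape as tpot.evaluated_individuals_.
evaluated = {
    "LogisticRegression(StandardScaler(input_matrix))": {"internal_cv_score": 0.98},
    "LinearSVC(SelectFromModel(input_matrix))": {"internal_cv_score": float("-inf")},
    "GaussianNB(input_matrix)": {"internal_cv_score": 0.94},
}

def top_valid(evaluated, k=5):
    """Return the k best (pipeline, score) pairs, ignoring non-finite scores."""
    valid = [
        (pipeline, stats["internal_cv_score"])
        for pipeline, stats in evaluated.items()
        if math.isfinite(stats["internal_cv_score"])
    ]
    return sorted(valid, key=lambda item: item[1], reverse=True)[:k]

for pipeline, score in top_valid(evaluated, k=2):
    print(f"{score:.3f}  {pipeline}")
```

With the real attribute, the call would be `top_valid(tpot.evaluated_individuals_)`.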

#### Time Stopper for TPOT

Stop after 1 min

In [7]:
tpot = TPOTClassifier(random_state=seed, 
                     n_jobs=-1, 
                     max_time_mins = 1)

In [8]:
run = 'tpot (60)'

start_time = timeit.default_timer()
tpot.fit(x_trn, y_trn)
elapsed = round(timeit.default_timer() - start_time)
accuracy = tpot.score(x_val, y_val).round(3)

add_results(run, elapsed, accuracy)

print('Elapsed (s): ', elapsed)
print('Accuracy:    ', accuracy)

Elapsed (s):  62
Accuracy:     0.982


Stop after 6 mins

In [9]:
tpot = TPOTClassifier(random_state=seed, 
                     n_jobs=-1, 
                     max_time_mins = 6)

In [10]:
run = 'tpot (360)'

start_time = timeit.default_timer()
tpot.fit(x_trn, y_trn)
elapsed = round(timeit.default_timer() - start_time)
accuracy = tpot.score(x_val, y_val).round(3)

add_results(run, elapsed, accuracy)

print('Elapsed (s): ', elapsed)
print('Accuracy:    ', accuracy)

Elapsed (s):  361
Accuracy:     0.982


## H2O

[H2O AutoML](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html)

In [None]:
import h2o
from h2o.automl import H2OAutoML

In [86]:
h2o.init(nthreads = cores)

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
; OpenJDK 64-Bit Server VM (Zulu 8.20.0.5-win64) (build 25.121-b15, mixed mode)
  Starting server from C:\Users\elauckner\Anaconda3\lib\site-packages\h2o\backend\bin\h2o.jar
  Ice root: C:\Users\ELAUCK~1\AppData\Local\Temp\tmp1ke1ptnq
  JVM stdout: C:\Users\ELAUCK~1\AppData\Local\Temp\tmp1ke1ptnq\h2o_elauckner_started_from_python.out
  JVM stderr: C:\Users\ELAUCK~1\AppData\Local\Temp\tmp1ke1ptnq\h2o_elauckner_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321... successful.


H2O cluster uptime: 03 secs
H2O cluster timezone: Europe/Berlin
H2O data parsing timezone: UTC
H2O cluster version: 3.22.1.4
H2O cluster version age: 8 days
H2O cluster name: H2O_from_python_elauckner_lnz3si
H2O cluster total nodes: 1
H2O cluster free memory: 1.762 Gb
H2O cluster total cores: 4
H2O cluster allowed cores: 3


H2O Input format

In [89]:
x_cols = list(feature_names)
y_col = 'y'

train = pd.DataFrame(x_trn, columns = x_cols)
train[y_col] = y_trn
train = h2o.H2OFrame(train)
train[y_col] = train[y_col].asfactor()

test = pd.DataFrame(x_val, columns = x_cols)
test[y_col] = y_val
test = h2o.H2OFrame(test)
test[y_col] = test[y_col].asfactor()

Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%


In [117]:
aml = H2OAutoML(max_runtime_secs=60, seed=seed)

run = 'h2o (60)'

start_time = timeit.default_timer()
aml.train(x=x_cols, y=y_col, training_frame=train)
elapsed = round(timeit.default_timer() - start_time)

y_pred = h2o.as_list(aml.predict(test)['predict']).predict.values
accuracy = round(accuracy_score(y_val, y_pred), 3)

add_results(run, elapsed, accuracy)

print('Elapsed (s): ', elapsed)
print('Accuracy:    ', accuracy)

AutoML progress: |████████████████████████████████████████████████████████| 100%
glm prediction progress: |████████████████████████████████████████████████| 100%
Elapsed (s):  65
Accuracy:     0.982


In [118]:
aml = H2OAutoML(max_runtime_secs=350, seed=seed)

run = 'h2o (350)'

start_time = timeit.default_timer()
aml.train(x=x_cols, y=y_col, training_frame=train)
elapsed = round(timeit.default_timer() - start_time)

y_pred = h2o.as_list(aml.predict(test)['predict']).predict.values
accuracy = round(accuracy_score(y_val, y_pred), 3)

add_results(run, elapsed, accuracy)

print('Elapsed (s): ', elapsed)
print('Accuracy:    ', accuracy)

AutoML progress: |████████████████████████████████████████████████████████| 100%
deeplearning prediction progress: |███████████████████████████████████████| 100%
Elapsed (s):  313
Accuracy:     0.991


In [124]:
aml.leader

Model Details
H2ODeepLearningEstimator :  Deep Learning
Model Key:  DeepLearning_grid_1_AutoML_20190224_175857_model_2


ModelMetricsBinomial: deeplearning
** Reported on train data. **

MSE: 0.013241155023315935
RMSE: 0.11507021779468367
LogLoss: 0.06074781300842693
Mean Per-Class Error: 0.00692041522491349
AUC: 0.9997290198857715
pr_auc: 0.31472416370612905
Gini: 0.9994580397715429
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.7918756239478332: 


| | 0.0 | 1.0 | Error | Rate |
|---|---|---|---|---|
| 0 | 162.0 | 4.0 | 0.0241 | (4.0/166.0) |
| 1 | 0.0 | 289.0 | 0.0 | (0.0/289.0) |
| Total | 162.0 | 293.0 | 0.0088 | (4.0/455.0) |


Maximum Metrics: Maximum metrics at their respective thresholds



| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.7918756 | 0.9931271 | 72.0 |
| max f2 | 0.7918756 | 0.9972395 | 72.0 |
| max f0point5 | 0.9975061 | 0.9972008 | 64.0 |
| max accuracy | 0.9975061 | 0.9912088 | 64.0 |
| max precision | 1.0000000 | 1.0 | 0.0 |
| max recall | 0.7918756 | 1.0 | 72.0 |
| max specificity | 1.0000000 | 1.0 | 0.0 |
| max absolute_mcc | 0.9975061 | 0.9813029 | 64.0 |
| max min_per_class_accuracy | 0.9960478 | 0.9879518 | 67.0 |


Gains/Lift Table: Avg response rate: 63,52 %, avg score: 65,20 %



| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0307692 | 1.0 | 1.5743945 | 1.5743945 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0484429 | 0.0484429 | 57.4394464 | 57.4394464 |
| 2 | 0.0307692 | 1.0000000 | 0.0 | 1.5743945 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0484429 | -100.0 | 57.4394464 |
| 3 | 0.0417582 | 1.0000000 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0 | 0.0173010 | 0.0657439 | 57.4394464 | 57.4394464 |
| 4 | 0.0505495 | 1.0000000 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0000000 | 0.0138408 | 0.0795848 | 57.4394464 | 57.4394464 |
| 5 | 0.1010989 | 1.0000000 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0000000 | 0.0795848 | 0.1591696 | 57.4394464 | 57.4394464 |
| 6 | 0.1516484 | 1.0000000 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0000000 | 0.0795848 | 0.2387543 | 57.4394464 | 57.4394464 |
| 7 | 0.2 | 1.0000000 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0000000 | 0.0761246 | 0.3148789 | 57.4394464 | 57.4394464 |
| 8 | 0.3010989 | 1.0000000 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0000000 | 0.1591696 | 0.4740484 | 57.4394464 | 57.4394464 |
| 9 | 0.4 | 1.0000000 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0000000 | 0.1557093 | 0.6297578 | 57.4394464 | 57.4394464 |




ModelMetricsBinomial: deeplearning
** Reported on cross-validation data. **

MSE: 0.014336557656496418
RMSE: 0.1197353651036168
LogLoss: 0.1297225298026461
Mean Per-Class Error: 0.015508400383541066
AUC: 0.9964981031392004
pr_auc: 0.2539076744473231
Gini: 0.9929962062784008
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.86598063971928: 


| | 0.0 | 1.0 | Error | Rate |
|---|---|---|---|---|
| 0 | 162.0 | 4.0 | 0.0241 | (4.0/166.0) |
| 1 | 2.0 | 287.0 | 0.0069 | (2.0/289.0) |
| Total | 164.0 | 291.0 | 0.0132 | (6.0/455.0) |


Maximum Metrics: Maximum metrics at their respective thresholds



| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.8659806 | 0.9896552 | 52.0 |
| max f2 | 0.3074667 | 0.9944942 | 58.0 |
| max f0point5 | 0.8659806 | 0.9876118 | 52.0 |
| max accuracy | 0.8659806 | 0.9868132 | 52.0 |
| max precision | 1.0000000 | 1.0 | 0.0 |
| max recall | 0.3074667 | 1.0 | 58.0 |
| max specificity | 1.0000000 | 1.0 | 0.0 |
| max absolute_mcc | 0.8659806 | 0.9715179 | 52.0 |
| max min_per_class_accuracy | 0.9847136 | 0.9759036 | 48.0 |


Gains/Lift Table: Avg response rate: 63,52 %, avg score: 64,63 %



| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.1340659 | 1.0 | 1.5743945 | 1.5743945 | 1.0 | 1.0 | 1.0 | 1.0 | 0.2110727 | 0.2110727 | 57.4394464 | 57.4394464 |
| 2 | 0.1538462 | 1.0000000 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0 | 0.0311419 | 0.2422145 | 57.4394464 | 57.4394464 |
| 3 | 0.2065934 | 1.0000000 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0000000 | 0.0830450 | 0.3252595 | 57.4394464 | 57.4394464 |
| 4 | 0.3010989 | 1.0000000 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0000000 | 0.1487889 | 0.4740484 | 57.4394464 | 57.4394464 |
| 5 | 0.4 | 1.0000000 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0000000 | 0.1557093 | 0.6297578 | 57.4394464 | 57.4394464 |
| 6 | 0.5010989 | 0.9999999 | 1.5743945 | 1.5743945 | 1.0 | 1.0000000 | 1.0 | 1.0000000 | 0.1591696 | 0.7889273 | 57.4394464 | 57.4394464 |
| 7 | 0.6 | 0.9999272 | 1.4344483 | 1.5513264 | 0.9111111 | 0.9999932 | 0.9853480 | 0.9999989 | 0.1418685 | 0.9307958 | 43.4448289 | 55.1326413 |
| 8 | 0.6989011 | 0.0000004 | 0.6997309 | 1.4308176 | 0.4444444 | 0.4683944 | 0.9088050 | 0.9247718 | 0.0692042 | 1.0 | -30.0269127 | 43.0817610 |
| 9 | 0.8 | 0.0000000 | 0.0 | 1.25 | 0.0 | 0.0000000 | 0.7939560 | 0.8079051 | 0.0 | 1.0 | -100.0 | 25.0 |



Cross-Validation Metrics Summary: 


| | mean | sd | cv_1_valid | cv_2_valid | cv_3_valid | cv_4_valid | cv_5_valid |
|---|---|---|---|---|---|---|---|
| accuracy | 0.9758242 | 0.0090618 | 1.0 | 0.9670330 | 0.9670330 | 0.9670330 | 0.978022 |
| auc | 0.9909738 | 0.0049237 | 1.0 | 0.9822361 | 0.9905956 | 0.9845201 | 0.9975174 |
| err | 0.0241758 | 0.0090618 | 0.0 | 0.0329670 | 0.0329670 | 0.0329670 | 0.0219780 |
| err_count | 2.2 | 0.8246211 | 0.0 | 3.0 | 3.0 | 3.0 | 2.0 |
| f0point5 | 0.9701644 | 0.0109459 | 1.0 | 0.9602649 | 0.9602649 | 0.959596 | 0.970696 |
| f1 | 0.9810840 | 0.0069457 | 1.0 | 0.9747899 | 0.9747899 | 0.974359 | 0.9814815 |
| f2 | 0.992323 | 0.0028214 | 1.0 | 0.9897611 | 0.9897611 | 0.9895833 | 0.9925094 |
| lift_top_group | 1.5791695 | 0.0613605 | 1.4444444 | 1.5689656 | 1.5689656 | 1.5964912 | 1.7169812 |
| logloss | 0.2476795 | 0.1030896 | 0.0001849 | 0.3750733 | 0.3342585 | 0.366259 | 0.1626217 |


Scoring History: 


| timestamp | duration | training_speed | epochs | iterations | samples | training_rmse | training_logloss | training_r2 | training_auc | training_pr_auc | training_lift | training_classification_error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-02-24 18:02:11 | 0.000 sec | | 0.0 | 0 | 0.0 | | | | | | | |
| 2019-02-24 18:02:11 | 43.655 sec | 15913 obs/sec | 0.8043956 | 1 | 366.0 | 0.1340939 | 0.0848777 | 0.9224047 | 0.9971860 | 0.4827648 | 1.5743945 | 0.0175824 |
| 2019-02-24 18:02:12 | 43.861 sec | 17314 obs/sec | 8.1054945 | 10 | 3688.0 | 0.1150702 | 0.0607478 | 0.9428597 | 0.9997290 | 0.3147242 | 1.5743945 | 0.0087912 |


Variable Importances: 


| variable | relative_importance | scaled_importance | percentage |
|---|---|---|---|
| area error | 1.0 | 1.0 | 0.0412919 |
| worst area | 0.9233761 | 0.9233761 | 0.0381279 |
| mean concave points | 0.9225333 | 0.9225333 | 0.0380931 |
| worst texture | 0.8924891 | 0.8924891 | 0.0368525 |
| worst smoothness | 0.8621823 | 0.8621823 | 0.0356011 |
| --- | --- | --- | --- |
| concavity error | 0.7459589 | 0.7459589 | 0.0308020 |
| mean area | 0.7454403 | 0.7454403 | 0.0307806 |
| smoothness error | 0.7412180 | 0.7412180 | 0.0306063 |



See the whole table with table.as_data_frame()
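
The confusion matrices in the output above can be checked by hand. The sketch below recomputes the error rates and F1 from the four cell counts, reading rows as actual classes and columns as predicted classes (as the "Act/Pred" label indicates) and treating class 1 as the positive class. `binary_metrics` is a hypothetical helper, not part of the H2O API.

```python
def binary_metrics(tn, fp, fn, tp):
    """Error rates and F1 from the four cells of a binary confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "error": (fp + fn) / total,          # "Total" row, Error column
        "class_0_error": fp / (tn + fp),     # row 0, Error column
        "class_1_error": fn / (fn + tp),     # row 1, Error column
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Cell counts copied from the two confusion matrices above.
train = binary_metrics(tn=162, fp=4, fn=0, tp=289)
cv = binary_metrics(tn=162, fp=4, fn=2, tp=287)

for name, metrics in [("train", train), ("cv", cv)]:
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

The recomputed F1 values (about 0.9931 on training data and 0.9897 under cross-validation) agree with the `max f1` rows of the corresponding metrics tables.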




In [127]:
h2o.save_model(aml.leader, path = 'results/h2o_automl')

'C:\\i\\kazi\\code\\_Notebooks\\results\\h2o_automl\\DeepLearning_grid_1_AutoML_20190224_175857_model_2'

In [85]:
h2o.cluster().shutdown()

## Results

In [11]:
result_df = pd.read_csv('results/results_automl.csv')
result_df.sort_values('Accuracy', ascending = False)

| | Run | Elapsed | Accuracy |
|---|---|---|---|
| 4 | h2o (350) | 313 | 0.991 |
| 1 | tpot (10/50) | 368 | 0.982 |
| 2 | tpot (50/50) | 4997 | 0.982 |
| 3 | h2o (60) | 65 | 0.982 |
| 5 | tpot (60) | 62 | 0.982 |
| 6 | tpot (360) | 361 | 0.982 |
| 0 | tpot (5/5) | 25 | 0.904 |
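
Because most runs tie on accuracy, it may help to break ties by elapsed time and to translate the accuracies back into sample counts. The sketch below does both with the values from the table above. The validation set size is derived from the 569 samples in the breast cancer dataset minus the 455 training rows reported in the H2O confusion matrices.

```python
# (run, elapsed seconds, accuracy) copied from the results table above.
runs = [
    ("tpot (5/5)", 25, 0.904),
    ("tpot (10/50)", 368, 0.982),
    ("tpot (50/50)", 4997, 0.982),
    ("h2o (60)", 65, 0.982),
    ("h2o (350)", 313, 0.991),
    ("tpot (60)", 62, 0.982),
    ("tpot (360)", 361, 0.982),
]

# Sort by accuracy (descending), then by elapsed time (ascending) to break ties.
ranked = sorted(runs, key=lambda r: (-r[2], r[1]))

# 569 samples in total, 455 used for training, so 114 remain for validation.
n_val = 569 - 455

for run, elapsed, acc in ranked:
    correct = round(acc * n_val)
    print(f"{run:<14}{elapsed:>6}s  {acc:.3f}  ~{correct} of {n_val} correct")
```

On a validation set of 114 samples, the gap between 0.982 and 0.991 corresponds to a single additional correct prediction, so the ranking among the top runs should be read with caution.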
