### Installing the H2O library

#### Installation can be done by following the tutorial in the official documentation:

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/downloading.html#install-in-python
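Before running the cells that follow, it can help to confirm that the pieces are in place. The sketch below is only an illustration: the helper name `is_installed` is hypothetical, and the check for a `java` executable assumes that H2O starts its server on a local Java runtime, as the startup log later in this notebook suggests.

```python
import importlib.util
import shutil


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Packages used in this notebook
for name in ("h2o", "seaborn"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")

# H2O launches a local server from a .jar file, so a Java runtime must be on PATH
java_path = shutil.which("java")
print("java:", java_path if java_path else "not found on PATH")
```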

### Installing Seaborn to load the dataset

In [19]:
!pip3 install seaborn --user



### Loading the dataset

In [20]:
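import numpy as np
import seaborn as sns

# Load the iris dataset as a pandas DataFrame
iris = sns.load_dataset('iris')

# Feature matrix with the four measurements (not used by the H2O cells below)
features = np.array(iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']].values)

# Class labels (not used by the H2O cells below)
target = np.array(iris['species'].values)

# Preview the first five rows
iris.head()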

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


### H2O - AutoML

In [21]:
import h2o
from h2o.automl import H2OAutoML

# Connect to a local H2O server, starting one if none is running
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.6" 2020-01-14; OpenJDK Runtime Environment (build 11.0.6+10-post-Ubuntu-1ubuntu118.04.1); OpenJDK 64-Bit Server VM (build 11.0.6+10-post-Ubuntu-1ubuntu118.04.1, mixed mode, sharing)
  Starting server from /home/jorge/.local/lib/python3.6/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmpxlv4s_kx
  JVM stdout: /tmp/tmpxlv4s_kx/h2o_jorge_started_from_python.out
  JVM stderr: /tmp/tmpxlv4s_kx/h2o_jorge_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


| | |
|---|---|
| H2O cluster uptime: | 02 secs |
| H2O cluster timezone: | America/Campo_Grande |
| H2O data parsing timezone: | UTC |
| H2O cluster version: | 3.28.0.1 |
| H2O cluster version age: | 2 months and 1 day |
| H2O cluster name: | H2O_from_python_jorge_kk7lgf |
| H2O cluster total nodes: | 1 |
| H2O cluster free memory: | 1.924 Gb |
| H2O cluster total cores: | 4 |
| H2O cluster allowed cores: | 4 |


#### Converting the dataset from pandas to an H2OFrame

In [33]:
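# Upload the pandas DataFrame to the H2O cluster
df = h2o.H2OFrame(iris)

# Column names, in the same order as in the DataFrame
columns = df.col_names

columns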

Parse progress: |█████████████████████████████████████████████████████████| 100%


['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

#### Running AutoML

In [29]:
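# Cap the search at 10 models or 1800 seconds (30 minutes), whichever is reached first
aml = H2OAutoML(max_models=10, max_runtime_secs=1800, seed=1)

# Every column except the last is a predictor; the last column ('species') is the response
aml.train(x=columns[0:-1], y=columns[-1], training_frame=df)

# Leaderboard of the trained models
lb = aml.leaderboard

# Show all rows of the leaderboard
lb.head(rows=lb.nrows)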

AutoML progress: |████████████████████████████████████████████████████████| 100%
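The `x` and `y` arguments passed to `aml.train` come from slicing the column list. The short sketch below reproduces that slicing on a plain Python list copied from the output above; the names `predictors` and `response` are illustrative and do not appear in the original cell.

```python
# Column names as returned by df.col_names in the previous cell
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

# columns[0:-1] keeps everything except the last element
predictors = columns[0:-1]

# columns[-1] is the last element, used as the response column
response = columns[-1]

print("predictors:", predictors)
print("response:", response)
```

This slicing only works because the target happens to be the last column. For a frame with a different layout, building the predictor list by excluding the response name explicitly would be a safer choice.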


#### Details of the best-performing model

In [31]:
aml.leader

Model Details
H2OXGBoostEstimator :  XGBoost
Model Key:  XGBoost_2_AutoML_20200218_170645


Model Summary: 


| | number_of_trees |
|---|---|
| 0 | 53.0 |




ModelMetricsMultinomial: xgboost
** Reported on train data. **

MSE: 0.06473704418531609
RMSE: 0.25443475427959145
LogLoss: 0.2759313115223833
Mean Per-Class Error: 0.03333333333333333

Confusion Matrix: Row labels: Actual class; Column labels: Predicted class


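| | setosa | versicolor | virginica | Error | Rate |
|---|---|---|---|---|---|
| setosa | 50.0 | 0.0 | 0.0 | 0.0 | 0 / 50 |
| versicolor | 0.0 | 47.0 | 3.0 | 0.06 | 3 / 50 |
| virginica | 0.0 | 2.0 | 48.0 | 0.04 | 2 / 50 |
| Total | 50.0 | 49.0 | 51.0 | 0.033333 | 5 / 150 |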



Top-3 Hit Ratios: 


| k | hit_ratio |
|---|---|
| 1 | 0.966667 |
| 2 | 1.0 |
| 3 | 1.0 |
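The training metrics above can be recomputed by hand from the confusion matrix. The following is a minimal sketch in plain Python, with the counts copied from the training table and all variable names chosen for illustration. It assumes the rows follow the same class order as the columns.

```python
# Training confusion matrix: rows are actual classes, columns are predicted classes
classes = ["setosa", "versicolor", "virginica"]
confusion = [
    [50, 0, 0],
    [0, 47, 3],
    [0, 2, 48],
]

# Per-class error: share of each actual class that was predicted as another class
per_class_error = [
    (sum(row) - row[i]) / sum(row) for i, row in enumerate(confusion)
]

# Mean per-class error: unweighted average of the per-class errors
mean_per_class_error = sum(per_class_error) / len(per_class_error)

# Top-1 hit ratio: share of all rows on the diagonal (plain accuracy)
total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(classes)))
top1_hit_ratio = correct / total

for name, err in zip(classes, per_class_error):
    print(f"{name}: {err:.2f}")
print(f"mean per-class error: {mean_per_class_error:.6f}")
print(f"top-1 hit ratio: {top1_hit_ratio:.6f}")
```

Because each class has exactly 50 rows, the mean per-class error here coincides with the overall error rate of 5 / 150. On an imbalanced dataset the two numbers would generally differ.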



ModelMetricsMultinomial: xgboost
** Reported on cross-validation data. **

MSE: 0.1066490132015355
RMSE: 0.3265716050141768
LogLoss: 0.38451643619842774
Mean Per-Class Error: 0.05333333333333334

Confusion Matrix: Row labels: Actual class; Column labels: Predicted class


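| | setosa | versicolor | virginica | Error | Rate |
|---|---|---|---|---|---|
| setosa | 50.0 | 0.0 | 0.0 | 0.0 | 0 / 50 |
| versicolor | 0.0 | 45.0 | 5.0 | 0.1 | 5 / 50 |
| virginica | 0.0 | 3.0 | 47.0 | 0.06 | 3 / 50 |
| Total | 50.0 | 48.0 | 52.0 | 0.053333 | 8 / 150 |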



Top-3 Hit Ratios: 


| k | hit_ratio |
|---|---|
| 1 | 0.946667 |
| 2 | 1.0 |
| 3 | 1.0 |



Cross-Validation Metrics Summary: 


| | mean | sd | cv_1_valid | cv_2_valid | cv_3_valid | cv_4_valid | cv_5_valid |
|---|---|---|---|---|---|---|---|
| accuracy | 0.94666666 | 0.02981424 | 0.96666664 | 0.96666664 | 0.9 | 0.96666664 | 0.93333334 |
| err | 0.053333335 | 0.02981424 | 0.033333335 | 0.033333335 | 0.1 | 0.033333335 | 0.06666667 |
| err_count | 1.6 | 0.8944272 | 1.0 | 1.0 | 3.0 | 1.0 | 2.0 |
| logloss | 0.38451645 | 0.02594509 | 0.35737795 | 0.41450667 | 0.4090087 | 0.37728488 | 0.36440396 |
| max_per_class_error | 0.16 | 0.08944272 | 0.1 | 0.1 | 0.3 | 0.1 | 0.2 |
| mean_per_class_accuracy | 0.94666666 | 0.02981424 | 0.96666664 | 0.96666664 | 0.9 | 0.96666664 | 0.93333334 |
| mean_per_class_error | 0.053333335 | 0.02981424 | 0.033333335 | 0.033333335 | 0.1 | 0.033333335 | 0.06666667 |
| mse | 0.10664901 | 0.011163419 | 0.09485739 | 0.12025658 | 0.116454236 | 0.10314183 | 0.09853504 |
| r2 | 0.8400265 | 0.01674513 | 0.85771394 | 0.8196151 | 0.82531863 | 0.84528726 | 0.85219747 |
| rmse | 0.32621667 | 0.017017998 | 0.30798927 | 0.3467803 | 0.3412539 | 0.321157 | 0.3139029 |
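The `mean` and `sd` columns of the summary above can be checked against the five fold values. The sketch below uses the per-fold numbers copied from the table and variable names of my own. The `sd` column is assumed to be the sample standard deviation (dividing by n - 1), which is the convention that reproduces the reported figure here.

```python
import math
import statistics

# Per-fold values copied from the cross-validation summary
fold_accuracy = [0.96666664, 0.96666664, 0.9, 0.96666664, 0.93333334]
fold_rmse = [0.30798927, 0.3467803, 0.3412539, 0.321157, 0.3139029]

# Mean and sample standard deviation across the five folds
mean_accuracy = statistics.mean(fold_accuracy)
sd_accuracy = statistics.stdev(fold_accuracy)

# Average of the per-fold RMSE values (the "rmse" row of the summary)
mean_fold_rmse = statistics.mean(fold_rmse)

# RMSE from the cross-validation metrics block: square root of the reported MSE
pooled_rmse = math.sqrt(0.1066490132015355)

print(f"accuracy mean: {mean_accuracy:.6f}, sd: {sd_accuracy:.6f}")
print(f"mean of fold RMSEs: {mean_fold_rmse:.6f}")
print(f"sqrt of overall MSE: {pooled_rmse:.6f}")
```

The two RMSE figures differ slightly (about 0.3262 versus 0.3266). The summary row averages the square roots of the fold MSEs, while the metrics block takes the square root of the overall MSE, and the average of square roots is never larger than the square root of the average.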



Scoring History: 


| | timestamp | duration | number_of_trees | training_rmse | training_logloss | training_classification_error |
|---|---|---|---|---|---|---|
| 0 | 2020-02-18 17:06:49 | 1.100 sec | 0.0 | 0.666667 | 1.098612 | 0.666667 |
| 1 | 2020-02-18 17:06:49 | 1.120 sec | 5.0 | 0.580123 | 0.868397 | 0.046667 |
| 2 | 2020-02-18 17:06:49 | 1.133 sec | 10.0 | 0.492713 | 0.679412 | 0.046667 |
| 3 | 2020-02-18 17:06:49 | 1.145 sec | 15.0 | 0.423255 | 0.549819 | 0.046667 |
| 4 | 2020-02-18 17:06:49 | 1.156 sec | 20.0 | 0.377317 | 0.471157 | 0.04 |
| 5 | 2020-02-18 17:06:49 | 1.169 sec | 25.0 | 0.331954 | 0.39779 | 0.04 |
| 6 | 2020-02-18 17:06:49 | 1.181 sec | 30.0 | 0.303363 | 0.352869 | 0.04 |
| 7 | 2020-02-18 17:06:49 | 1.193 sec | 35.0 | 0.282497 | 0.320252 | 0.033333 |
| 8 | 2020-02-18 17:06:49 | 1.210 sec | 40.0 | 0.268683 | 0.298934 | 0.033333 |
| 9 | 2020-02-18 17:06:49 | 1.242 sec | 45.0 | 0.262959 | 0.289629 | 0.033333 |
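The first row of the scoring history (zero trees) is consistent with a model that gives every class the same probability. The sketch below works this out under two assumptions that the notebook itself does not state: that the zero-tree model predicts 1/3 for each of the three classes, and that the multinomial RMSE is based on one minus the probability assigned to the actual class.

```python
import math

n_classes = 3

# Assumption: with no trees, every class receives the same probability
p = 1 / n_classes

# Log loss of a uniform prediction: -ln(1/3) = ln(3)
baseline_logloss = -math.log(p)

# Assumption: per-row error is 1 - p(actual class), so RMSE = sqrt((1 - 1/3)^2)
baseline_rmse = math.sqrt((1 - p) ** 2)

# With three equally sized classes, always predicting one class is wrong 2/3 of the time
baseline_error = 1 - p

print(f"logloss: {baseline_logloss:.6f}")
print(f"rmse: {baseline_rmse:.6f}")
print(f"classification error: {baseline_error:.6f}")
```

These values line up with the 1.098612, 0.666667 and 0.666667 reported for zero trees, which gives a useful reference point: any later row is an improvement over an uninformed guess.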



Variable Importances: 


| variable | relative_importance | scaled_importance | percentage |
|---|---|---|---|
| petal_length | 741.49939 | 1.0 | 0.468916 |
| petal_width | 704.247009 | 0.949761 | 0.445358 |
| sepal_length | 103.144821 | 0.139103 | 0.065228 |
| sepal_width | 32.413437 | 0.043713 | 0.020498 |
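The three importance columns above are related by simple arithmetic: the scaled importance divides each relative importance by the largest one, and the percentage divides it by the sum. A small sketch with the values copied from the table (the dictionary names are illustrative):

```python
# Relative importances copied from the variable importance table
relative = {
    "petal_length": 741.49939,
    "petal_width": 704.247009,
    "sepal_length": 103.144821,
    "sepal_width": 32.413437,
}

largest = max(relative.values())
total = sum(relative.values())

# Scaled importance: relative to the most important variable
scaled = {name: value / largest for name, value in relative.items()}

# Percentage: share of the total relative importance
percentage = {name: value / total for name, value in relative.items()}

for name in relative:
    print(f"{name}: scaled={scaled[name]:.6f}, percentage={percentage[name]:.6f}")
```

Read this way, the two petal measurements account for roughly 91% of the total importance in this model, and the sepal measurements for the remainder.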


