# H2O practical approach
## The following code can be used in a Jupyter Notebook (Python 3.8.x, H2O cluster version 3.36.0.1).

Import the required modules.

In [None]:
import h2o
from h2o.automl import H2OAutoML
import numpy as np
from pandas import read_csv
from numpy import set_printoptions
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score

Start a local H2O server (or connect to one that is already running).

In [None]:
h2o.init()

Reading the data.

In [None]:
filename = 'MOVIESTREAM_CHURN_RED_TRAIN.csv'
dataframe = read_csv(filename)

Data preprocessing: fill missing values, recode the target values as 0/1, and select the training features and the target feature.

In [None]:
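# Treat a missing employment tenure as zero years.
dataframe['YRS_CURRENT_EMPLOYER'] = dataframe['YRS_CURRENT_EMPLOYER'].fillna(0)
# Recode the target: 'no' -> 0, 'yes' -> 1.
dataframe['IS_CHURNER'] = dataframe['IS_CHURNER'].replace(['no'], 0)
dataframe['IS_CHURNER'] = dataframe['IS_CHURNER'].replace(['yes'], 1)
# Keep the first column (IDs) and the last column (labels) as NumPy arrays.
array = dataframe.values
ID_train = array[:,0]
y_train = array[:,-1]
# Convert to an H2OFrame and make the target categorical so AutoML runs classification.
htrain = h2o.H2OFrame(dataframe)
htrain['IS_CHURNER'] = htrain['IS_CHURNER'].asfactor()
# Use every column except the target and the customer ID as a training feature.
x = htrain.columns
y = 'IS_CHURNER'
x.remove(y)
x.remove('CUST_ID')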
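
The preprocessing steps above can be tried without the real file or an H2O server. The following is a minimal sketch on a toy pandas frame; the three rows are invented for illustration, and only the column names come from the notebook. It encodes the target with a single `map()` call instead of two `replace()` calls, which is one possible alternative, not a required change.

```python
import pandas as pd

# Toy stand-in for MOVIESTREAM_CHURN_RED_TRAIN.csv; the rows are invented.
toy = pd.DataFrame({
    "CUST_ID": [101, 102, 103],
    "YRS_CURRENT_EMPLOYER": [4.0, None, 12.0],
    "IS_CHURNER": ["no", "yes", "no"],
})

# Treat a missing tenure as zero years, as in the notebook.
toy["YRS_CURRENT_EMPLOYER"] = toy["YRS_CURRENT_EMPLOYER"].fillna(0)

# Encode the target in one step: 'no' -> 0, 'yes' -> 1.
toy["IS_CHURNER"] = toy["IS_CHURNER"].map({"no": 0, "yes": 1})

# Training features: every column except the identifier and the target.
features = [c for c in toy.columns if c not in ("CUST_ID", "IS_CHURNER")]

print(toy)
print(features)
```

Any label other than `'no'` or `'yes'` becomes `NaN` under `map()`, which makes unexpected values in the target column easy to spot before the frame is handed to H2O.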

__Model selection and tuning__. AutoML is limited to three models and a run time of five minutes; the run stops as soon as either limit is reached. In this scenario we excluded the Stacked Ensemble and Deep Learning algorithms.

In [None]:
aml = H2OAutoML(max_models=3, max_runtime_secs=300, exclude_algos=['StackedEnsemble', 'DeepLearning'], seed=1)

__Training H2O AutoML__. The AutoML leaderboard uses cross-validation metrics to rank the models. The leader model is stored at _aml.leader_ and the leaderboard is stored at _aml.leaderboard_.

In [None]:
aml.train(x=x, y=y, training_frame=htrain)

Check the leaderboard.

In [None]:
lb = aml.leaderboard
lb.head()

__Save the best model to filesystem__.

In [None]:
model_path = h2o.save_model(aml.leader, path = "h2o_model")
print(model_path) 

__H2O Explainability interface__ is a convenient wrapper around a number of explainability methods and visualizations in H2O. The _explain()_ function generates a list of explanations: individual units such as a Partial Dependence plot, a Feature Importance plot, or a SHapley Additive exPlanations (SHAP) summary of the top tree-based model.

In [None]:
xplain_model = aml.leader.explain(htrain)

__Predicting on the training data using the leader model__. The predict function outputs the predicted class for each row, as well as a probability estimate for each of the classes (confidence).

In [None]:
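pred_h2o = aml.leader.predict(htrain)
# Pull the predictions into pandas for inspection.
pred_pandas = pred_h2o.as_data_frame(use_pandas=True)
print(pred_pandas)
# NumPy array of the whole frame: predicted label plus the per-class probabilities.
probs = pred_pandas.values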
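
To show how such a prediction frame can be read, here is a minimal sketch that works on an invented stand-in for `pred_pandas`. It assumes the usual binomial layout of a `predict` column followed by one probability column per class (`p0`, `p1`). The values, the `actual` labels, and the 0.4 cut-off are made up for illustration and are not results from the churn data.

```python
import pandas as pd

# Invented stand-in for pred_pandas; real values come from aml.leader.predict().
pred = pd.DataFrame({
    "predict": [0, 1, 0, 1],
    "p0": [0.90, 0.20, 0.60, 0.45],
    "p1": [0.10, 0.80, 0.40, 0.55],
})

# Invented true labels, standing in for the IS_CHURNER column.
actual = pd.Series([0, 1, 1, 1], name="IS_CHURNER")

# Share of rows where the predicted class matches the true class.
accuracy = (pred["predict"] == actual).mean()

# Flag likely churners with a custom cut-off on the class-1 probability
# (0.4 is an arbitrary illustrative choice).
flagged = pred["p1"] >= 0.4

print(f"accuracy: {accuracy:.2f}")
print(f"customers flagged as likely churners: {int(flagged.sum())}")
```

Accuracy measured on the same rows the model was trained on tends to be optimistic, so a figure computed this way is best treated as a sanity check rather than an estimate of performance on new customers.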

__Restore the model from the filesystem__.

In [None]:
saved_model = h2o.load_model(model_path)