# Steam to H2O3
This notebook is a getting-started tutorial that shows how to connect securely to an instance of the H2O AI Cloud from a local workstation and then carry out common tasks with H2O-3.

##### For more H2O-3 tutorials with more data science detail, see: https://github.com/h2oai/h2o-tutorials/blob/master/h2o-open-tour-2016/chicago/intro-to-h2o.ipynb

## Notebook Setup
This tutorial relies on the latest Steam SDK (1.8.11), which can be installed into a Python environment as follows:

1. Go to Enterprise Steam and click on `Python client`.
2. Navigate to the location where the Python client was downloaded, and install the client using `pip install h2osteam-1.8.11-py2.py3-none-any.whl`.

In [None]:
import h2osteam
import h2o
import os
import getpass
import h2o_mlops_client as mlops

from h2osteam.clients import H2oKubernetesClient

# Import H2O GLM:
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

## Table of Contents
<div class="toc"><ul class="toc-item"><li><span><a href="#Notebook-Setup" data-toc-modified-id="Notebook-Setup-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Notebook Setup</a></span></li><li><span><a href="#Securely-Connect" data-toc-modified-id="Securely-Connect-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Securely Connect</a></span></li><li><span><a href="#H2O3-Clusters" data-toc-modified-id="H2O3-Clusters-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>H2O3 Clusters</a></span><ul class="toc-item"><li><span><a href="#Create-new-cluster" data-toc-modified-id="Create-new-cluster-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Create new cluster</a></span></li><li><span><a href="#List-all-existing-clusters" data-toc-modified-id="List-all-existing-clusters-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>List all existing clusters</a></span></li><li><span><a href="#Add-a-dataset" data-toc-modified-id="Add-a-dataset-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Add a dataset</a></span></li><li><span><a href="#Run-an-experiment" data-toc-modified-id="Run-an-experiment-3.4"><span class="toc-item-num">3.4&nbsp;&nbsp;</span>Run an experiment</a></span></li><li><span><a href="#Pause-our-instance" data-toc-modified-id="Pause-our-instance-3.5"><span class="toc-item-num">3.5&nbsp;&nbsp;</span>Pause our instance</a></span></li><li><span><a href="#Delete-the-instance" data-toc-modified-id="Delete-the-instance-3.6"><span class="toc-item-num">3.6&nbsp;&nbsp;</span>Delete the instance</a></span></li></ul></li></ul></div>

## Securely Connect

Let's now connect to Steam securely!

Remember that this token expires periodically. You can obtain a fresh token by running this code again!

In [None]:
# URL of the page where you can obtain your personal refresh token
token_page_url = 'https://cloud.h2o.ai/auth/get-platform-token'

print('Open this link to get your personal refresh token:', token_page_url)

# getpass prompts for the refresh token without echoing it to the screen
tp = mlops.TokenProvider(
    token_endpoint_url='https://auth.cloud.h2o.ai/auth/realms/hac/protocol/openid-connect/token',
    client_id='hac-platform-public',
    refresh_token=getpass.getpass(),
)
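Because the token expires periodically, it can help to picture the general "reuse until close to expiry, then fetch again" pattern. The sketch below is only an illustration of that idea: `CachedToken`, `fake_fetch`, the 300-second lifetime, and the 30-second margin are all hypothetical and are not part of `h2o_mlops_client` or `h2osteam`.

```python
import time


class CachedToken:
    """Hypothetical helper: cache a token and refetch it shortly before it expires."""

    def __init__(self, fetch, lifetime_s, margin_s=30.0, clock=time.monotonic):
        self._fetch = fetch            # callable that returns a new token string
        self._lifetime_s = lifetime_s  # assumed validity period of a token
        self._margin_s = margin_s      # refetch this many seconds before expiry
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin_s:
            self._token = self._fetch()
            self._expires_at = now + self._lifetime_s
        return self._token


# Demonstration with a fake clock and a fake fetch function (no network access).
calls = []


def fake_fetch():
    calls.append(1)
    return f"token-{len(calls)}"


now = [0.0]
cache = CachedToken(fake_fetch, lifetime_s=300, margin_s=30, clock=lambda: now[0])

first = cache.get()    # nothing cached yet, so a token is fetched
now[0] = 100.0
second = cache.get()   # still well before expiry, so the cached token is reused
now[0] = 280.0
third = cache.get()    # inside the 30-second margin, so a new token is fetched
print(first, second, third)
```

In this notebook the token provider object `tp` plays that role, so no such helper needs to be written by hand.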

Now let's log in to Steam!

In [None]:
steam = h2osteam.login(
    url="https://steam.cloud.h2o.ai/",
    access_token=tp.ensure_fresh_token(),
)

## H2O3 Clusters

This example shows how to create a cluster and connect to it. First, let's check the version of the H2O Python client and the available H2O server versions.

H2O Python Client Version:

In [None]:
h2o.__version__

Available H2O server versions:

In [None]:
h2o_engines = h2osteam.api().get_h2o_kubernetes_engines()

In [None]:
# Print the version of every available engine
for engine in h2o_engines:
    print(engine['version'])

### Create new cluster

Now let's launch the cluster!

**Note**: Create a cluster whose version is the same as or older than the version of the H2O Python client.

In [None]:
cluster = H2oKubernetesClient().launch_cluster(
    name="test_cluster",
    version="3.36.0.3",
)
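The version rule in the note above can also be checked in code before launching. The following is a minimal sketch, assuming version strings are purely numeric and dot-separated (as in `3.36.0.3`); `parse_version` and `newest_compatible` are illustrative helpers, not part of `h2osteam`, and the version lists are stand-in values.

```python
def parse_version(version):
    """Turn '3.36.0.3' into (3, 36, 0, 3) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_compatible(engine_versions, client_version):
    """Return the newest engine version that is not newer than the client, or None."""
    limit = parse_version(client_version)
    candidates = [v for v in engine_versions if parse_version(v) <= limit]
    return max(candidates, key=parse_version) if candidates else None


# Stand-in values; in the notebook they would come from
# [engine['version'] for engine in h2o_engines] and h2o.__version__.
available = ["3.34.0.7", "3.36.0.3", "3.36.1.1"]
chosen = newest_compatible(available, "3.36.0.4")
print(chosen)
```

Comparing tuples of integers rather than raw strings matters because a plain string comparison would rank `"3.9.0.1"` above `"3.36.0.3"`. A version string with non-numeric parts would raise a `ValueError` in this sketch.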

Let's confirm that the cluster is running:

In [None]:
cluster.is_running()

#### Connecting to new cluster

Finally, let's connect to the cluster.

In [None]:
cluster.connect()

### List all existing clusters

In [None]:
clusters = H2oKubernetesClient().get_clusters()
clusters

### Add a dataset

To add a dataset, you can either upload the data from your local machine or import the data from a URL.

In [None]:
#loan_csv = "/Volumes/H2OTOUR/loan.csv"  # modify this for your machine

# Alternatively, you can import the data directly from a URL
loan_csv = "https://raw.githubusercontent.com/h2oai/app-consumer-loan/master/data/loan.csv"

data = h2o.import_file(loan_csv)  # 163,987 rows x 15 columns

Let's compare the shape of the data with the expected 163,987 rows x 15 columns to check that the file was imported properly!

In [None]:
data.shape
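The shape comparison can be made explicit with a small check. This is a sketch only: `check_shape` is a hypothetical helper, and `imported_shape` is a stand-in tuple used in place of `data.shape` so the snippet runs without a cluster.

```python
def check_shape(actual, expected):
    """Return a list of readable mismatches between two (rows, columns) tuples."""
    problems = []
    for label, got, want in zip(("rows", "columns"), actual, expected):
        if got != want:
            problems.append(f"{label}: expected {want}, got {got}")
    return problems


expected_shape = (163987, 15)  # from the comment on the import line
imported_shape = (163987, 15)  # stand-in for data.shape

problems = check_shape(imported_shape, expected_shape)
print(problems or "shape matches")
```

An empty list means both dimensions agree; otherwise each entry names the dimension that differs.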

### Run an experiment
#### 1. First split the dataset for training and testing

In [None]:
data['bad_loan'] = data['bad_loan'].asfactor()  # encode the binary response as a factor
data['bad_loan'].levels()  # optional: after encoding, this shows the two factor levels, '0' and '1'

In [None]:
splits = data.split_frame(ratios=[0.7, 0.15], seed=1)  

train = splits[0]
valid = splits[1]
test = splits[2]
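Two ratios produce three frames because the remainder (here 1 - 0.7 - 0.15 = 0.15) becomes the last split. The sketch below works out the approximate sizes to expect for the 163,987-row dataset; `expected_split_sizes` is an illustrative helper, not an H2O function. Note that H2O's `split_frame` splits probabilistically, so the actual row counts will be close to these numbers rather than exactly equal.

```python
def expected_split_sizes(total_rows, ratios):
    """Approximate row counts for each split, including the implicit remainder split."""
    if any(r <= 0 for r in ratios) or sum(ratios) >= 1:
        raise ValueError("ratios must be positive and sum to less than 1")
    fractions = list(ratios) + [1.0 - sum(ratios)]
    return [round(total_rows * f) for f in fractions]


sizes = expected_split_sizes(163987, [0.7, 0.15])
print(dict(zip(("train", "valid", "test"), sizes)))
```

Comparing `train.nrow`, `valid.nrow`, and `test.nrow` against these approximate values is a quick sanity check after splitting.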

In [None]:
y = 'bad_loan'
x = list(data.columns)

In [None]:
x.remove(y)  #remove the response
x.remove('int_rate')  #remove the interest rate column because it's correlated with the outcome

#### 2. Set up the experiment's settings (e.g., accuracy, time, target column)

We first create an object of the class `H2OGeneralizedLinearEstimator`. This does not do any training; it only sets the model up for training by specifying model parameters.

In [None]:
# Initialize the GLM estimator:
# Similar to R's glm() and H2O's R GLM, H2O's GLM has the "family" argument

glm_fit1 = H2OGeneralizedLinearEstimator(family='binomial', model_id='glm_fit1')
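For intuition about what `family='binomial'` means: with the standard logit link, a binomial GLM models the log-odds of the positive class as a linear function of the predictors, and the logistic function turns that linear score into a probability. The sketch below shows only that arithmetic. The intercept and the coefficient are made-up numbers, not values from this model, and `predict_probability` is an illustrative helper rather than part of H2O.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Logistic regression prediction: sigmoid of the linear combination of features."""
    linear = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients, for illustration only
intercept = -2.0
coefficients = {"dti": 0.05}

p = predict_probability(intercept, coefficients, {"dti": 20.0})
print(round(p, 4))
```

Here the linear score is -2.0 + 0.05 * 20 = -1.0, which the logistic function maps to a probability below 0.5.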

#### 3. Launch Experiment

Now that the `glm_fit1` object is initialized, we can train the model:

In [None]:
glm_fit1.train(x=x, y=y, training_frame=train, validation_frame=valid)

#### 4. View information, summary, model artifacts, and model performance of experiment

Let's see the performance of the GLM that was just trained.

In [None]:
glm_perf1 = glm_fit1.model_performance(test)

Instead of printing the entire model performance metrics object, it is probably easier to print just the metric that you are interested in comparing. Here, we are going to compare the test AUC to the training and validation AUC.

In [None]:
print(glm_perf1.auc())

In [None]:
print(glm_fit1.auc(train=True))
print(glm_fit1.auc(valid=True))
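To interpret these numbers, it helps to recall what AUC measures: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes that quantity directly on a tiny made-up example. `pairwise_auc` is an illustrative helper for intuition only; it is not how H2O computes its metric, and its quadratic cost makes it unsuitable for large datasets.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores: three of the four positive/negative pairs are ordered correctly
toy_auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. A training AUC that is much higher than the validation and test AUC is a common sign of overfitting.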

### Pause our instance

In [None]:
cluster.stop()

### Delete the instance

In [None]:
cluster.terminate()