# Steam to Driverless AI
This notebook is a getting-started tutorial that shows how to securely connect to an instance of the H2O AI Cloud from a local workstation and then accomplish common tasks using Driverless AI.

##### For more Driverless AI tutorials with further data science details, go to: https://github.com/h2oai/driverlessai-tutorials/tree/master/dai_python_client

## Notebook Setup
This tutorial relies on the latest Steam SDK (1.8.11), which can be installed into a Python environment as follows:

1. Go to Enterprise Steam and click on `Python client`.
2. Navigate to the location where the Python client was downloaded, and install the client using `pip install h2osteam-1.8.11-py2.py3-none-any.whl`.

In [None]:
import os
import getpass
import h2osteam
import h2o_mlops_client as mlops
from h2osteam.clients import DriverlessClient
import pandas as pd

## Table of Contents
<div class="toc">
    <ul class="toc-item">
        <li>
            <span>
                <a href="#Notebook-Setup" data-toc-modified-id="Notebook-Setup-1">
                    <span class="toc-item-num">1&nbsp;&nbsp;</span>
                    Notebook Setup
                </a>
            </span>
        </li>
        <li>
            <span>
                <a href="#Securely-Connect" data-toc-modified-id="Securely-Connect-2">
                    <span class="toc-item-num">2&nbsp;&nbsp;</span>
                    Securely Connect
                </a>
            </span>
        </li>
        <li>
            <span>
                <a href="#AI-Engines" data-toc-modified-id="AI-Engines-3">
                    <span class="toc-item-num">3&nbsp;&nbsp;</span>
                    AI Engines
                </a>
            </span>
        </li>
        <li>
            <span>
                <a href="#Driverless-AI-Instances" data-toc-modified-id="Driverless-AI-Instances-4">
                    <span class="toc-item-num">4&nbsp;&nbsp;</span>
                    Driverless AI Instances
                </a>
            </span>
            <ul class="toc-item">
                <li>
                    <span>
                        <a href="#Create-new-instance" data-toc-modified-id="Create-new-instance-4.1">
                            <span class="toc-item-num">4.1&nbsp;&nbsp;</span>
                            Create new instance
                        </a>
                    </span>
                </li>
                <li>
                    <span>
                        <a href="#List-all-Driverless-AI-instances-I-own" data-toc-modified-id="List-all-Driverless-AI-instances-I-own-4.2">
                            <span class="toc-item-num">4.2&nbsp;&nbsp;</span>
                            List all Driverless AI instances I own
                        </a>
                    </span>
                </li>
                <li>
                    <span>
                        <a href="#Add-a-dataset" data-toc-modified-id="Add-a-dataset-4.3">
                            <span class="toc-item-num">4.3&nbsp;&nbsp;</span>
                            Add a dataset
                        </a>
                    </span>
                </li>
                <li>
                    <span>
                        <a href="#Run-an-experiment" data-toc-modified-id="Run-an-experiment-4.4">
                            <span class="toc-item-num">4.4&nbsp;&nbsp;</span>
                            Run an experiment
                        </a>
                    </span>
                </li>
                <li>
                    <span>
                        <a href="#List-all-exisiting-experiments" data-toc-modified-id="List-all-exisiting-experiments-4.5">
                            <span class="toc-item-num">4.5&nbsp;&nbsp;</span>
                            List all existing experiments
                        </a>
                    </span>
                </li>
                <li>
                    <span>
                        <a href="#Pause-the-instance" data-toc-modified-id="Pause-the-instance-4.6">
                            <span class="toc-item-num">4.6&nbsp;&nbsp;</span>
                            Pause the instance
                        </a>
                    </span>
                </li>
                <li>
                    <span>
                        <a href="#Resume-the-instance" data-toc-modified-id="Resume-the-instance-4.7">
                            <span class="toc-item-num">4.7&nbsp;&nbsp;</span>
                            Resume the instance
                        </a>
                    </span>
                </li>
                <li>
                    <span>
                        <a href="#Delete-the-instance" data-toc-modified-id="Delete-the-instance-4.8">
                            <span class="toc-item-num">4.8&nbsp;&nbsp;</span>
                            Delete the instance
                        </a>
                    </span>
                </li>
            </ul>
        </li>
    </ul>
 </div>
 


## Securely Connect

Let's now connect to Steam securely!

Remember that this token expires periodically. You can obtain a fresh token by running this code again!

In [None]:
# URL of the page that issues your personal refresh token
token_url = 'https://cloud.h2o.ai/auth/get-platform-token'

print('Open this link to get your personal token:', token_url)

# Paste the token from the page above at the hidden prompt
tp = mlops.TokenProvider(
    token_endpoint_url='https://auth.cloud.h2o.ai/auth/realms/hac/protocol/openid-connect/token',
    client_id='hac-platform-public',
    refresh_token=getpass.getpass()
)

Now let's log in to Steam!

In [None]:
steam = h2osteam.login(
    url="https://steam.cloud.h2o.ai/",
    access_token=tp.ensure_fresh_token(),
)

## AI Engines

An AI Engine is a tool that helps you build an AI system or machine learning model. These tools automate tasks that are repetitive and often difficult for a human to carry out. In this notebook, we look specifically at Driverless AI.

First, let's check whether your Steam client version matches the server version.

Server version of Steam:

In [None]:
H2OSteam = h2osteam.api().get_config_meta()

In [None]:
H2OSteam['version']

User (client) version of Steam:

In [None]:
print(h2osteam.__version__)
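
The two values printed above can also be compared in code rather than by eye. The helper below is a minimal sketch and not part of the Steam SDK: `parse_version` and `versions_match` are hypothetical names, and the sketch assumes both versions are purely numeric dotted strings such as `1.8.11` (a suffix like `-rc1` would need extra handling).

```python
def parse_version(version):
    """Split a dotted version string such as '1.8.11' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def versions_match(client_version, server_version):
    """Return True when both dotted version strings denote the same release."""
    return parse_version(client_version) == parse_version(server_version)


# Illustrative values only; in this notebook they would come from
# h2osteam.__version__ and H2OSteam['version'].
print(versions_match("1.8.11", "1.8.11"))
print(versions_match("1.8.11", "1.8.9"))
```

Comparing parsed tuples instead of raw strings avoids false mismatches caused by stray whitespace.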

## Driverless AI Instances

### Create new instance

If you want to use a different version of Driverless AI, run the following cell to see which versions are available:

In [None]:
DAI_engines = h2osteam.api().get_driverless_engines()

# Print the version of every available Driverless AI engine
for engine in DAI_engines:
    print(engine['version'])

In [None]:
# Assumes the engines are returned in ascending version order
print(f"The newest DAI version is {DAI_engines[-1]['version']}")
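
Taking the last list element as the newest version only works if the server returns engines in ascending order. If that ordering cannot be relied on, the highest version can be selected explicitly. This is a minimal sketch under stated assumptions: `version_key` and `newest_version` are hypothetical helpers, the sample list is made up, and version strings are assumed to be purely numeric and dot-separated.

```python
def version_key(version):
    """Turn '1.10.1.3' into (1, 10, 1, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_version(engines):
    """Return the highest 'version' value from a list of engine dicts."""
    return max((engine["version"] for engine in engines), key=version_key)


# Made-up sample shaped like the dicts printed above (only 'version' is used).
sample_engines = [
    {"version": "1.9.3"},
    {"version": "1.10.1.3"},
    {"version": "1.10.1"},
]
print(newest_version(sample_engines))
```

A numeric key matters here because plain string comparison would rank `1.9.3` above `1.10.1.3`.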

If you would like to use a different profile, run the following line to see what profiles are available:

In [None]:
h2osteam.print_profiles()

This example shows how to create an instance of Driverless AI v1.10.1.3 and connect to it.

In [None]:
instance = DriverlessClient().launch_instance(name="test-instance",
                                              version="1.10.1.3",
                                              profile_name="default-driverless-kubernetes")
client = instance.connect()

If you want to interact with the UI, you can use this link!

In [None]:
client.server.gui()

### List all Driverless AI instances I own

The following code lists all the Driverless AI instances available on Steam along with their attributes. Note that if you have not created an instance, the list will be empty. Run this code again after you have created an instance to see it!

In [None]:
DAI_instance = steam.get_driverless_instances()

In [None]:
DAI_list = []

# Collect the attributes of interest for each instance
for inst in DAI_instance:
    DAI_list.append([inst['name'], inst['profile_name'], inst['version'],
                     inst['status'], inst['created_by']])

df_DAI = pd.DataFrame(
    DAI_list,
    columns = ['DAI Name', 'Profile Name', 'Version', 'Status', 'Created By']
)

df_DAI

### Add a dataset

#### Add dataset from URL

To create a dataset on the Driverless AI server using data from a URL, include the URL and the name of the connector used for data transfer, and set the dataset name.

In [None]:
telco_churn = client.datasets.create(data="https://h2o-internal-release.s3-us-west-2.amazonaws.com/data/Splunk/churn.csv",
                                     data_source="s3",
                                     name="Telco_Churn")

#### Upload dataset from your local machine

To add a dataset on the Driverless AI server using data on your local machine, first download the dataset onto a path on your local machine like this:

In [None]:
download_location = '/Users/admin/Downloads/'

In [None]:
local_file_path = telco_churn.download(download_location, overwrite=True)

Then set the dataset name and create the dataset on the Driverless AI server:

In [None]:
telco_churn2 = client.datasets.create(local_file_path, name="Telco_Churn_Duplicate")

#### Change name of dataset

To change the dataset name, run this command:

In [None]:
print("Old Name:", telco_churn2.name)

telco_churn2.rename("Telco Churn New Name")

print("New Name:", telco_churn2.name)

### Run an experiment
#### 1. First split the dataset for training and testing

In [None]:
telco_churn_split = telco_churn.split_to_train_test(
    train_size=0.8, 
    train_name='telco_churn_train', 
    test_name='telco_churn_test', 
    target_column="Churn?",
    seed=42
)

In [None]:
telco_churn_split

#### 2. Set up the experiment's settings (e.g., accuracy, time, target column)

You might want to run several experiments with different dial and expert settings. All of these will likely have some things in common, namely details about this specific dataset. We will create a dictionary to use in many experiments.

In [None]:
telco_settings = {
    **telco_churn_split,
    'task': 'classification',
    'target_column': "Churn?", 
    'scorer': 'F1'
}

In [None]:
client.experiments.preview( # Get experiment preview with our settings
    **telco_settings
)

There may be several common types of experiments you want to run, so H2O.ai will be creating common experiment settings in dictionaries for easy use. The one below turns off all extra settings such as building pipelines or checking for leakage. It also uses the fastest experiment settings.

In [None]:
fast_settings = {
    'accuracy': 1,
    'time': 1,
    'interpretability': 6,
    'make_python_scoring_pipeline': 'off',
    'make_mojo_scoring_pipeline': 'off',
    'benchmark_mojo_latency': 'off',
    'make_autoreport': False,
    'check_leakage': 'off',
    'check_distribution_shift': 'off'
}
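
Because both groups of settings are plain dictionaries, they can be combined with `**` unpacking before launching an experiment. The sketch below illustrates the merge rule with placeholder values: strings stand in for the real dataset objects, and the names `dataset_settings`, `speed_settings`, and `combined` are illustrative rather than part of the Driverless AI client.

```python
# Placeholder values stand in for the real dataset objects and settings.
dataset_settings = {
    "train_dataset": "telco_churn_train",
    "test_dataset": "telco_churn_test",
    "task": "classification",
    "target_column": "Churn?",
    "scorer": "F1",
}
speed_settings = {"accuracy": 1, "time": 1, "interpretability": 6}

# In a dict literal, later entries win when a key appears more than once.
combined = {**dataset_settings, **speed_settings, "scorer": "AUC"}

print(combined["scorer"])
print(len(combined))
```

Note that this override rule applies to dict literals only. Unpacking two dictionaries that share a key directly into a function call raises a `TypeError`, so overlapping settings should be merged into one dictionary first.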

#### 3. Launch experiment

In [None]:
default_baseline = client.experiments.create_async(
    **telco_settings,
    # Keep only one of the two lines below; comment out the one you don't want to run
    # name='Fastest Settings', **fast_settings,
    name='Default Baseline', accuracy=7, time=2, interpretability=8
)
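
`create_async` returns right away while the experiment keeps running on the server, so later cells may execute before results exist. One way to wait is to poll a completion check. The helper below is a generic sketch, not part of the Driverless AI client: `wait_until` and `FakeJob` are hypothetical names, and `FakeJob` merely simulates a job that finishes on the third check.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0):
    """Call predicate() every `interval` seconds until it returns True.

    Returns True on success, or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeJob:
    """Stand-in for a long-running job: reports completion on the third check."""

    def __init__(self):
        self.checks = 0

    def is_complete(self):
        self.checks += 1
        return self.checks >= 3


job = FakeJob()
finished = wait_until(job.is_complete, timeout=5.0, interval=0.01)
print(finished, job.checks)
```

In this notebook the predicate could be something like `default_baseline.is_complete`, assuming the experiment object exposes such a method; check the client documentation for your version before relying on it.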

#### 4. View information, summary, model artifacts, and model performance of experiment

In [None]:
# Print information on the experiment

print("Name:", default_baseline.name)
print("Datasets:", default_baseline.datasets)
print("Target:", default_baseline.settings['target_column'])
print("Scorer:", default_baseline.metrics()['scorer'])
print("Task:", default_baseline.settings['task'])
print("Status:", default_baseline.status(verbose=2))
print("Web Page: ", end='')
default_baseline.gui()

In [None]:
# View the experiment summary

default_baseline.summary()

In [None]:
# See which model artifacts are available

print("Available artifacts:", default_baseline.artifacts.list())

In [None]:
# Generate the autodoc

default_baseline.artifacts.create('autoreport')

In [None]:
# Download the autodoc

artifacts = default_baseline.artifacts.download(['autoreport'], download_location, overwrite=True)

In [None]:
# macOS only: open the autodoc in Microsoft Word

!open -a "Microsoft Word" {artifacts["autoreport"]}

In [None]:
# View the final model performance

default_baseline.metrics()

In [None]:
# Fetch the metrics once and reuse them for both lines
metrics = default_baseline.metrics()
print("Validation", metrics["scorer"], ":\t", round(metrics['val_score'], 3))
print("Test", metrics["scorer"], ":\t", round(metrics['test_score'], 3))

#### 5. Download and view test set predictions

In [None]:
# Download predictions from test dataset
artifacts = default_baseline.artifacts.download(['test_predictions'], download_location, overwrite=True)
local_predictions = pd.read_csv(artifacts['test_predictions'])

local_predictions.head()

### List all exisiting experiments

You can list all existing experiments using:

In [None]:
exp = client.experiments.list()

In [None]:
experiment_list = []

# Collect the key settings and dataset names of each experiment
for e in exp:
    experiment_list.append([
        e.name, e.settings['target_column'], e.settings['task'],
        e.settings['scorer'], e.settings['accuracy'], e.settings['time'],
        e.settings['interpretability'],
        str(e.datasets['train_dataset']).split(' ', 1)[0],
        str(e.datasets['validation_dataset']).split(' ', 1)[0],
        str(e.datasets['test_dataset']).split(' ', 1)[0]
    ])
                            
df_experiments = pd.DataFrame(
    experiment_list,
    columns = ['Experiment Name', 'Target Column', 'Task', 'Scorer', 'Accuracy', 'Time', 'Interpretability',
               'Train Dataset', 'Validation Dataset', 'Test Dataset']
)

df_experiments
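
The cell above derives each dataset name by converting the dataset to a string and cutting it at the first space. That truncates names that contain spaces, and a value of `None` becomes the text `'None'`. A possible alternative is to read the `name` attribute used earlier in this notebook. This is a sketch under stated assumptions: `dataset_label` is a hypothetical helper, `SimpleNamespace` objects stand in for Driverless AI dataset objects, and an unset dataset is assumed to be `None`.

```python
from types import SimpleNamespace


def dataset_label(dataset):
    """Return the dataset's name, or an empty string when no dataset is set."""
    return "" if dataset is None else dataset.name


# SimpleNamespace objects stand in for Driverless AI dataset objects here.
datasets = {
    "train_dataset": SimpleNamespace(name="telco_churn_train"),
    "validation_dataset": None,
    "test_dataset": SimpleNamespace(name="Telco Churn New Name"),
}
labels = {role: dataset_label(ds) for role, ds in datasets.items()}
print(labels)
```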

### Pause the instance

You can pause an instance that is currently running. Pausing an instance shuts it down, much like powering off a server. You will not lose any data, and you can start the instance again at any time.

In [None]:
instance.stop()

### Resume the instance

You can resume a paused instance by simply running:

In [None]:
instance.start()

### Delete the instance

When you no longer need an instance, you can terminate it. Once deleted, there is no way to restart the instance or access any data.

In [None]:
instance.terminate()