## Prerequisites
This tutorial uses the Steam SDK (version 1.8.11, the latest at the time of writing), which you can install into a Python environment as follows:

1. Click on My AI Engines from the H2O AI Cloud and then `Python client` to download the wheel file
2. Navigate to the location where the python client was downloaded and install the client using `pip install h2osteam-1.8.11-py2.py3-none-any.whl`

We require the `h2o_authn` library for securely connecting to the H2O AI Cloud platform: `pip install h2o_authn`.

We also set the following variables to connect to a specific H2O AI Cloud environment. They can be found by logging into the platform, clicking on your name, and choosing the `CLI & API Access` page. Then, copy values from the `Accessing H2O AI Cloud APIs` section.

In [1]:
CLIENT_ID = "q8s-internal-platform"
TOKEN_ENDPOINT = "https://auth.demo.h2o.ai/auth/realms/q8s-internal/protocol/openid-connect/token"
REFRESH_TOKEN = "https://cloud-internal.h2o.ai/auth/get-platform-token"  # URL of the page where you get your personal access token

H2O_STEAM_URL = "https://steam.cloud-internal.h2o.ai/"

H2O_MLOPS_GATEWAY = "https://mlops-api.cloud-internal.h2o.ai"

In [2]:
from getpass import getpass

import h2o_authn
import h2osteam
from h2osteam.clients import DriverlessClient

import requests
import json
import h2o_mlops_client

import pandas as pd
import numpy as np


## Securely connect to the platform
We first connect to the H2O AI Cloud using our personal access token to create a token provider object. We can then use this object to log into Steam and other APIs.

In [3]:
print(f"Visit {REFRESH_TOKEN} to get your personal access token")
tp = h2o_authn.TokenProvider(
    refresh_token=getpass("Enter your access token: "),
    client_id=CLIENT_ID,
    token_endpoint_url=TOKEN_ENDPOINT
)

Visit https://cloud-internal.h2o.ai/auth/get-platform-token to get your personal access token
Enter your access token: ········
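
For non-interactive runs (for example a scheduled job), prompting with `getpass` is not practical. The following is a minimal sketch, not part of the H2O clients, that reads the token from an environment variable and falls back to the prompt. The function name `read_refresh_token` and the variable name `H2O_REFRESH_TOKEN` are hypothetical choices made for this illustration.

```python
import os
from getpass import getpass


def read_refresh_token(env_var="H2O_REFRESH_TOKEN", prompt="Enter your access token: "):
    """Return the token from the environment if set, otherwise prompt for it.

    H2O_REFRESH_TOKEN is a hypothetical variable name used only in this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    # Fall back to an interactive prompt that does not echo the input.
    return getpass(prompt)
```

The returned string could then be passed as `refresh_token=read_refresh_token()` when creating the token provider.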


Next, we will connect to our AI Engine manager to view all instances of Driverless AI that we have access to. If you don't have an instance of Driverless AI, or you need to learn how to start your instance, please view the Enterprise Steam tutorial. 

In [4]:
steam = h2osteam.login(
    url=H2O_STEAM_URL,
    access_token=tp()
)

## Connect to Driverless AI
We'll create a connection object called `client` that we will use to interact with the platform. Throughout the client you can use the `.gui()` function to get a link that takes you to the corresponding page of the user interface.

In [5]:
instance = DriverlessClient().launch_instance(name="test-instance",
                                              version="1.10.2",
                                              profile_name="default-driverless-kubernetes")
client = instance.connect()

Driverless AI instance is submitted, please wait...
Driverless AI instance is running


### Upload and Download Data
You can upload data using any method that is enabled on your system. Typical operations are listed below; this section demonstrates the first one, adding data from a public S3 link:
* Add data from a public S3 link
* Download a dataset
* Upload data from your local machine
* Rename a dataset
* Upload using the JDBC connector

In [6]:
telco_churn = client.datasets.create(data="https://h2o-internal-release.s3-us-west-2.amazonaws.com/data/Splunk/churn.csv", 
                                  data_source="s3", 
                                  name="Telco_Churn",
                                  force=True
                                 )

Complete 100.00% - [4/4] Computed stats for column Churn?


### Split a Dataset
The split function returns a dictionary of two datasets, keyed `train_dataset` and `test_dataset`, so you can pass them straight to an experiment.

In [7]:
telco_churn_split = telco_churn.split_to_train_test(
    train_size=0.8, 
    train_name='telco_churn_train', 
    test_name='telco_churn_test', 
    target_column= "Churn?", # Beta users with client from before March 15th use target_col
    seed=42
)

Complete


In [8]:
telco_churn_split

{'train_dataset': <class 'Dataset'> 97bf065c-cd58-11ec-83d6-7205daa658dd telco_churn_train,
 'test_dataset': <class 'Dataset'> 97bf2812-cd58-11ec-83d6-7205daa658dd telco_churn_test}

## Modeling
**Note:** Dictionaries allow you to easily reuse common settings across your experiments. <br/>
**Note:** Experiments are `sync` by default, meaning they lock the notebook until they are complete. You can also use the `async` versions of the functions. With the `async` functions you can monitor an experiment as it runs, see logs in real time, and stop it when it is "good enough".

### Dictionary for a Use Case
We might want to run several experiments with different dial and expert settings. All of these will likely have some things in common, namely details about this specific dataset. We will create a dictionary to use in many experiments.

In [9]:
telco_settings = {
    **telco_churn_split,
    'task': 'classification',
    'target_column': "Churn?", # Beta users with client from before March 15th use target_col
    'scorer': 'F1'
}
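
The shared-settings idea can be taken one step further when some experiments need to override a shared value. The sketch below uses plain strings as stand-ins for the dataset objects, and the variant names and the `AUC` override are hypothetical examples. It shows that later keys win when dictionaries are merged with `**`.

```python
# Stand-in values; in the notebook these come from telco_churn_split.
base_settings = {
    "train_dataset": "telco_churn_train",
    "test_dataset": "telco_churn_test",
    "task": "classification",
    "target_column": "Churn?",
    "scorer": "F1",
}

# Hypothetical per-experiment overrides.
variants = {
    "Default Baseline": {"accuracy": 7, "time": 2, "interpretability": 8},
    "AUC Variant": {"accuracy": 7, "time": 2, "interpretability": 8, "scorer": "AUC"},
}

# Later keys win, so an override replaces the shared value without
# modifying base_settings itself.
experiment_kwargs = {
    name: {**base_settings, "name": name, **overrides}
    for name, overrides in variants.items()
}
```

Merging into one dictionary first matters because a call such as `f(**base_settings, scorer="AUC")` raises a `TypeError` when `scorer` is already a key in the unpacked dictionary.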

### Launch an Experiment
We will start by running an async experiment, which immediately frees our notebook to run additional commands.

In [10]:
default_baseline = client.experiments.create_async(
    **telco_settings, 
    name='Default Baseline', accuracy=7, time=2, interpretability=8,
    force=True,
)

Experiment launched at: https://steam.cloud-internal.h2o.ai:443/proxy/driverless/535/#/experiment?key=c3900880-cd58-11ec-83d6-7205daa658dd
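
Because the async call returns right away, later cells that need the finished experiment have to wait for it. The following is a generic polling sketch, not taken from the Driverless AI client. `wait_until` is a hypothetical helper that accepts any zero-argument check and gives up after a timeout.

```python
import time


def wait_until(is_done, interval=5.0, timeout=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll is_done() until it returns True or `timeout` seconds have passed.

    Returns True if the condition was met and False on timeout.
    """
    deadline = clock() + timeout
    while not is_done():
        if clock() >= deadline:
            return False
        # Pause between checks so the server is not queried in a tight loop.
        sleep(interval)
    return True
```

Assuming the experiment object exposes a zero-argument running check (for example an `is_running()` method, which is an assumption to verify against your client version), usage might look like `wait_until(lambda: not default_baseline.is_running())`.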


### View the Experiment Summary

In [12]:
default_baseline.summary()

Status: Complete
Experiment: Default Baseline (c3900880-cd58-11ec-83d6-7205daa658dd)
  Version: 1.10.2, 2022-05-06 16:30
  Settings: 7/2/8, seed=793681543, GPUs disabled
  Train data: telco_churn_train (2666, 21)
  Validation data: N/A
  Test data: [Test] (667, 20)
  Target column: Churn? (binary, 14.479% target class)
System specs: Docker/Linux, 28 GB, 32 CPU cores, 0/0 GPU
  Max memory usage: 1.03 GB, 0 GB GPU
Recipe: AutoDL (19 iterations, 8 individuals)
  Validation scheme: stratified, 6 internal holdouts (3-fold CV)
  Feature engineering: 94 features scored (18 selected)
Timing: MOJO latency 0.0920 millis (1.8MB), Python latency 82.2066 millis (1.3MB)
  Data preparation: 7.96 secs
  Shift/Leakage detection: 3.70 secs
  Model and feature tuning: 111.75 secs (67 of 72 models trained)
  Feature evolution: 236.41 secs (198 of 288 models trained)
  Final pipeline training: 29.70 secs (6 models trained)
  Python / MOJO scorer building: 36.26 secs / 22.83 secs
Validation score: F1 = 0.25

## Projects

### Create a project

In [13]:
project = client.projects.create(
    name="Example_Project",
    description="Steam-DAI-MLOps Tutorial"
)

### Link experiment to project

In [14]:
experiment = client.experiments.list()[0]
project.link_experiment(experiment=experiment)

<driverlessai._projects.Project at 0x10bb638e0>

## Connect to MLOps

In [15]:
mlops = h2o_mlops_client.Client(
    gateway_url=H2O_MLOPS_GATEWAY,
    token_provider=tp,
)

### List all projects you have access to

In [16]:
my_projects = mlops.storage.project.list_projects(body={
    'filter': None, 
    'paging': None, 
    'sorting': None
}).project

for p in my_projects:
    print(p.id, p.display_name)

96798617-1ab5-49f7-8a60-c9fd2438cd27 Example_Project
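
Copying an ID out of the printed list works, but looking the project up by name avoids hard-coding it. This is a minimal sketch with a hypothetical helper, `find_by_display_name`, that relies only on the `id` and `display_name` attributes printed above. The `SimpleNamespace` objects are stand-ins for the items returned by `list_projects`.

```python
from types import SimpleNamespace


def find_by_display_name(items, display_name):
    """Return the single item whose display_name matches, or raise ValueError."""
    matches = [item for item in items if item.display_name == display_name]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one match for {display_name!r}, found {len(matches)}"
        )
    return matches[0]


# Stand-in for the objects returned by list_projects.
example_projects = [
    SimpleNamespace(id="96798617-1ab5-49f7-8a60-c9fd2438cd27", display_name="Example_Project"),
]
selected = find_by_display_name(example_projects, "Example_Project")
print(selected.id)
```

Raising on zero or multiple matches is a deliberate choice here, since display names are not guaranteed to be unique.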


### Select a specific project to work with
We have previously created this project using Driverless AI and added models to it.

In [18]:
USE_CASE_PROJECT = mlops.storage.project.get_project(body={
    'project_id': '96798617-1ab5-49f7-8a60-c9fd2438cd27'
})
USE_CASE_PROJECT

{'project': {'created_time': datetime.datetime(2022, 5, 6, 16, 31, 53, 841712, tzinfo=tzutc()),
             'description': 'Steam-DAI-MLOps Tutorial',
             'display_name': 'Example_Project',
             'id': '96798617-1ab5-49f7-8a60-c9fd2438cd27',
             'last_modified_time': datetime.datetime(2022, 5, 6, 16, 31, 53, 841712, tzinfo=tzutc()),
             'owner_id': '0d118392-fb76-4c79-b5bd-f188ba6e22f2'}}

## Experiments

### List experiments
Get a list of all experiments that are in our project.

In [19]:
my_project_experiments = mlops.storage.experiment.list_experiments({
    'filter': None,
    'paging': None,
    'project_id': USE_CASE_PROJECT.project.id,
    'response_metadata': None,
    'sorting': None
}).experiment

for e in my_project_experiments:
    print(e.id, e.display_name)

c3900880-cd58-11ec-83d6-7205daa658dd Default Baseline


### Select a specific experiment to work with
We previously ran this experiment in Driverless AI and linked it to the project.

In [20]:
USE_CASE_EXPERIMENT = mlops.storage.experiment.get_experiment({
    'id': 'c3900880-cd58-11ec-83d6-7205daa658dd', 
    'response_metadata': None
})
USE_CASE_EXPERIMENT

{'experiment': {'created_time': datetime.datetime(2022, 5, 6, 16, 31, 56, 649868, tzinfo=tzutc()),
                'display_name': 'Default Baseline',
                'id': 'c3900880-cd58-11ec-83d6-7205daa658dd',
                'last_modified_time': datetime.datetime(2022, 5, 6, 16, 31, 56, 649868, tzinfo=tzutc()),
                'metadata': None,
                'owner_id': '0d118392-fb76-4c79-b5bd-f188ba6e22f2',
                'parameters': {'fold_column': '',
                               'target_column': 'Churn?',
                               'test_dataset_id': '',
                               'training_dataset_id': '',
                               'validation_dataset_id': '',
                               'weight_column': ''},
                'statistics': {'training_duration': '452s'},
                'status': 'EXPERIMENT_STATUS_UNSPECIFIED',
                'tag': []}}

## Deployments

### Deployment environment 
`DEV` and `PROD` are deployment environment tags that you can use for your model deployments.

In [21]:
my_project_deployment_environments = mlops.storage.deployment_environment.list_deployment_environments(body={
    'filter': None, 
    'paging': None, 
    'project_id': USE_CASE_PROJECT.project.id, 
    'sorting': None
}).deployment_environment

for de in my_project_deployment_environments:
    print(de.id, de.display_name)

82d04998-4a0e-418c-8289-5fe75a4ac133 DEV
fc5088e9-323f-4040-a097-6dc396614a91 PROD


In [22]:
USE_CASE_DEPLOYMENT_EVN = mlops.storage.deployment_environment.get_deployment_environment({
    'deployment_environment_id': '82d04998-4a0e-418c-8289-5fe75a4ac133'
})
USE_CASE_DEPLOYMENT_EVN

{'deployment_environment': {'created_time': datetime.datetime(2022, 5, 6, 16, 31, 53, 850811, tzinfo=tzutc()),
                            'deployment_target_name': 'kubernetes',
                            'display_name': 'DEV',
                            'id': '82d04998-4a0e-418c-8289-5fe75a4ac133',
                            'last_modified_time': datetime.datetime(2022, 5, 6, 16, 31, 53, 850811, tzinfo=tzutc()),
                            'project_id': '96798617-1ab5-49f7-8a60-c9fd2438cd27'}}

### Deployment Types
Deployment types help MLOps understand how traffic should be routed when new data is sent for predictions.

In [23]:
h2o_mlops_client.StorageDeploymentType().allowable_values

['DEPLOYMENT_TYPE_UNSPECIFIED',
 'SINGLE_MODEL',
 'SHADOW_TRAFFIC',
 'SPLIT_TRAFFIC']

In [24]:
USE_CASE_DEPLOYMENT = mlops.storage.deployment_environment.deploy({
    'deployment_environment_id': USE_CASE_DEPLOYMENT_EVN.deployment_environment.id,
    'experiment_id': USE_CASE_EXPERIMENT.experiment.id,
    'metadata': None,
    'response_metadata': None,
    'secondary_scorer': None,
    'type': h2o_mlops_client.StorageDeploymentType().SINGLE_MODEL
})


In [25]:
USE_CASE_DEPLOYMENT

{'deployment': {'created_time': datetime.datetime(2022, 5, 6, 16, 34, 48, 90732, tzinfo=tzutc()),
                'deployer_data': '',
                'deployer_data_version': '',
                'deployment_environment_id': '82d04998-4a0e-418c-8289-5fe75a4ac133',
                'experiment_id': 'c3900880-cd58-11ec-83d6-7205daa658dd',
                'id': '0c36ead1-ae68-492b-9493-c5fcceba5e3d',
                'last_modified_time': datetime.datetime(2022, 5, 6, 16, 34, 48, 90732, tzinfo=tzutc()),
                'metadata': None,
                'project_id': '96798617-1ab5-49f7-8a60-c9fd2438cd27',
                'secondary_scorer': [],
                'type': 'SINGLE_MODEL'}}

### Deployment health
Wait until our deployment has gone from launching to healthy.

In [27]:
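import time

# Compare the status object's state (not the object itself) and pause between checks
while mlops.deployer.deployment_status.get_deployment_status({
    'deployment_id': USE_CASE_DEPLOYMENT.deployment.id
}).deployment_status.state == "LAUNCHING":
    time.sleep(5)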

In [28]:
DEPLOYMENT_STATUS = mlops.deployer.deployment_status.get_deployment_status({
    'deployment_id': USE_CASE_DEPLOYMENT.deployment.id
}).deployment_status

print(DEPLOYMENT_STATUS.state)
print(DEPLOYMENT_STATUS.scorer.sample_request.url)
print(DEPLOYMENT_STATUS.scorer.score.url)

HEALTHY
https://model.cloud-internal.h2o.ai/0c36ead1-ae68-492b-9493-c5fcceba5e3d/model/sample_request
https://model.cloud-internal.h2o.ai/0c36ead1-ae68-492b-9493-c5fcceba5e3d/model/score


## Make predictions
Using the `requests` library, we make predictions on new data.

In [29]:
sample_request_as_text = requests.get(DEPLOYMENT_STATUS.scorer.sample_request.url).text
sample_request = json.loads(sample_request_as_text)
sample_request

{'fields': ['Account Length',
  'Area Code',
  "Int'l Plan",
  'VMail Plan',
  'VMail Message',
  'Day Mins',
  'Day Calls',
  'Day Charge',
  'Eve Mins',
  'Eve Calls',
  'Eve Charge',
  'Night Mins',
  'Night Calls',
  'Night Charge',
  'Intl Mins',
  'Intl Calls',
  'Intl Charge',
  'CustServ Calls'],
 'rows': [['0',
   '0',
   'text',
   'text',
   '0',
   '0',
   '0',
   '0',
   '0',
   '0',
   '0',
   '0',
   '0',
   '0',
   '0',
   '0',
   '0',
   '0']]}

In [30]:
fields = sample_request["fields"]
fields

['Account Length',
 'Area Code',
 "Int'l Plan",
 'VMail Plan',
 'VMail Message',
 'Day Mins',
 'Day Calls',
 'Day Charge',
 'Eve Mins',
 'Eve Calls',
 'Eve Charge',
 'Night Mins',
 'Night Calls',
 'Night Charge',
 'Intl Mins',
 'Intl Calls',
 'Intl Charge',
 'CustServ Calls']

In [31]:
sample_row = sample_request["rows"][0]
sample_row

['0',
 '0',
 'text',
 'text',
 '0',
 '0',
 '0',
 '0',
 '0',
 '0',
 '0',
 '0',
 '0',
 '0',
 '0',
 '0',
 '0',
 '0']

In [32]:
new_row = ['100', '100', 'text', 'text', '100', '100', '100', '100', '100', '100', '100', '100', '100', '100', '100', '100', '100', '100']

In [33]:
new_predictions = requests.post(
    url=DEPLOYMENT_STATUS.scorer.score.url,
    json={
        'fields': fields,
        'rows': [sample_row, new_row]
    }
)
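
New data usually arrives as records rather than as hand-typed lists. The sketch below uses a hypothetical helper, `build_score_payload`, to convert a list of dictionaries into the `fields`/`rows` body used above. It converts every value to a string, matching the sample request. The field list is shortened for illustration.

```python
def build_score_payload(fields, records):
    """Turn a list of dicts into the {'fields': ..., 'rows': ...} request body.

    Values are converted to strings, matching the sample request format.
    Raises KeyError if a record lacks one of the fields.
    """
    rows = [[str(record[field]) for field in fields] for record in records]
    return {"fields": list(fields), "rows": rows}


# Shortened field list and a made-up record, for illustration only.
example_fields = ["Account Length", "Int'l Plan", "Day Mins"]
example_records = [{"Account Length": 128, "Int'l Plan": "no", "Day Mins": 265.1}]
payload = build_score_payload(example_fields, example_records)
```

The resulting dictionary could be passed as the `json=` argument of `requests.post`. Indexing by `fields` keeps the column order aligned with what the scorer expects.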

In [34]:
new_predictions

<Response [200]>

In [35]:
predictions_dict = json.loads(new_predictions.text)
predictions_dict

{'fields': ['Churn?.False.', 'Churn?.True.'],
 'id': 'c3900880-cd58-11ec-83d6-7205daa658dd',
 'score': [['0.9789733205468227', '0.021026679453177242'],
  ['0.15210581391684352', '0.8478941860831565']]}
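
The scores come back as strings, so they need converting before any numeric use. The following is a small sketch with a hypothetical helper, `predicted_labels`, that picks the highest-probability field for each row. The response is copied from the output above.

```python
def predicted_labels(response):
    """Return (label, probability) for the highest-scoring field in each row."""
    results = []
    for row in response["score"]:
        probabilities = [float(value) for value in row]
        # Index of the largest probability in this row.
        best = max(range(len(probabilities)), key=probabilities.__getitem__)
        results.append((response["fields"][best], probabilities[best]))
    return results


# Response copied from the output above.
example_response = {
    "fields": ["Churn?.False.", "Churn?.True."],
    "id": "c3900880-cd58-11ec-83d6-7205daa658dd",
    "score": [
        ["0.9789733205468227", "0.021026679453177242"],
        ["0.15210581391684352", "0.8478941860831565"],
    ],
}
labels = predicted_labels(example_response)
```

Taking the larger of two probabilities amounts to a 0.5 cutoff, which may differ from the threshold that maximizes F1 for this model, so treat the labels as illustrative.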

In [36]:
pd.DataFrame(predictions_dict["score"], columns=predictions_dict["fields"])

| | Churn?.False. | Churn?.True. |
|---|---|---|
| 0 | 0.9789733205468228 | 0.0210266794531772 |
| 1 | 0.1521058139168435 | 0.8478941860831565 |
