# Demo for NitroML on Cloud using Kubeflow

## Step 1: Get `kfp` and `skaffold`

In [1]:
import sys

# Install kfp (https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.html).
!{sys.executable} -m pip install --user --upgrade -q kfp==0.4.0

# Download skaffold, make it executable, and move it into the user's local bin directory.
!curl -Lo skaffold https://storage.googleapis.com/skaffold/releases/latest/skaffold-linux-amd64 && chmod +x skaffold && mv skaffold /home/jupyter/.local/bin/
    
# Add the user's Python binary directory, which now also contains `skaffold`, to `PATH`.
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 44.6M  100 44.6M    0     0  72.6M      0 --:--:-- --:--:-- --:--:-- 72.5M
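
Rerunning the cell above appends the same directory to `PATH` again each time. The following is a minimal sketch of an idempotent variant that also reports tools that still cannot be found. It is not part of the original notebook, and the helper names `add_to_path` and `missing_tools` are hypothetical.

```python
import os
import shutil


def add_to_path(directory, path_value):
    """Return path_value with directory appended, unless it is already listed."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    if directory not in entries:
        entries.append(directory)
    return os.pathsep.join(entries)


def missing_tools(tools, path_value):
    """Return the tools that cannot be found as executables on path_value."""
    return [tool for tool in tools if shutil.which(tool, path=path_value) is None]
```

In the notebook this could be used as `os.environ["PATH"] = add_to_path("/home/jupyter/.local/bin", os.environ["PATH"])`, followed by `missing_tools(["skaffold", "tfx"], os.environ["PATH"])` to confirm that both command-line tools used later are reachable.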


## Step 2: Check and install TFX (if necessary)
#### If TFX is not installed, uncomment the pip install command below. This example was tested with `tfx==0.22.0`.

In [2]:
# !{sys.executable} -m pip install --user --upgrade -q tfx==0.22.0
!python3 -c "import tfx; print('TFX version: {}'.format(tfx.__version__))"

TFX version: 0.22.0
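
Because the example was only tested with `tfx==0.22.0`, it can help to compare the installed version against that value explicitly rather than reading it off the output. This is a minimal sketch that assumes plain dotted numeric versions; `parse_version`, `matches_tested_version`, and `TESTED_TFX_VERSION` are hypothetical names introduced for illustration.

```python
import re

# The version named in the note above.
TESTED_TFX_VERSION = "0.22.0"


def parse_version(version):
    """Turn the leading dotted numeric part, e.g. '0.22.0', into (0, 22, 0).

    Any suffix such as 'rc1' is ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("Not a dotted numeric version: {!r}".format(version))
    return tuple(int(part) for part in match.group(0).split("."))


def matches_tested_version(installed, tested=TESTED_TFX_VERSION):
    """Return True when the installed version equals the tested one."""
    return parse_version(installed) == parse_version(tested)
```

In the notebook, `matches_tested_version(tfx.__version__)` would return `True` for the environment shown above.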


## Step 3: Get the GCP project ID and create Docker image name

In [13]:
# Read GCP project id from env.
shell_output=!gcloud config list --format 'value(core.project)' 2>/dev/null
GCP_PROJECT_ID=shell_output[0]
print("GCP project ID:" + GCP_PROJECT_ID)

GCP project ID:nitroml-brain-xgcp
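
If no project is configured, `shell_output` is empty or blank, and the cell above either fails with an `IndexError` or silently yields an empty project ID. A minimal sketch of a more explicit check follows; `first_value` is a hypothetical helper, not part of the notebook.

```python
def first_value(lines):
    """Return the first non-blank line, stripped, or raise a descriptive error."""
    for line in lines:
        stripped = line.strip()
        if stripped:
            return stripped
    raise ValueError(
        "gcloud returned no project ID; "
        "run `gcloud config set project <PROJECT_ID>` first."
    )
```

With this helper the assignment becomes `GCP_PROJECT_ID = first_value(shell_output)`.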


In [6]:
# Name of the pipeline's Docker image and its full path in the project's registry.
IMAGE_NAME = 'nitroml_benchmark4'
CUSTOM_TFX_IMAGE='gcr.io/' + GCP_PROJECT_ID + '/' + IMAGE_NAME
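
The string concatenation above can be wrapped in a small function that rejects obviously invalid input before any image is built. This is a sketch under stated assumptions: `build_image_uri` is a hypothetical helper, and the only validation it performs is the Docker rule that repository names must be lowercase.

```python
def build_image_uri(project_id, image_name, registry="gcr.io"):
    """Join registry, project ID, and image name into a full image path."""
    if not project_id:
        raise ValueError("project_id must not be empty.")
    if not image_name:
        raise ValueError("image_name must not be empty.")
    # Docker repository names may not contain uppercase characters.
    if image_name != image_name.lower():
        raise ValueError("image_name must be lowercase: {!r}".format(image_name))
    return "/".join([registry, project_id, image_name])
```

For the values in this notebook, `build_image_uri(GCP_PROJECT_ID, IMAGE_NAME)` produces the same string as `CUSTOM_TFX_IMAGE`.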

## Step 4: Set the KFP cluster endpoint

In [5]:
# KFP cluster endpoint.
# To find it, open the Google Cloud Console and go to AI Platform -> Pipelines.
# For the cluster that should run your pipeline, click "Open Pipeline Dashboard"
# and copy the host name of the form "*.googleusercontent.com" from the URL.
ENDPOINT='ee1a2cabbbc2f13-dot-us-east1.pipelines.googleusercontent.com' # Enter your ENDPOINT here.
if not ENDPOINT:
    from absl import logging
    logging.error('Set your ENDPOINT in this cell.')
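
When the endpoint is copied from the browser's address bar, it usually arrives with a scheme and a trailing path, whereas the notebook uses the bare host name. A minimal sketch of a normalizer follows. `normalize_endpoint` is a hypothetical helper, and the check for the `googleusercontent.com` suffix is an assumption based on the comment in the cell above.

```python
from urllib.parse import urlparse


def normalize_endpoint(value):
    """Reduce a dashboard URL or a bare host name to the host name only."""
    value = value.strip()
    if not value:
        raise ValueError("ENDPOINT is empty; set it to your KFP cluster host name.")
    # urlparse only fills in netloc when a scheme or a leading '//' is present.
    if "://" not in value:
        value = "//" + value
    host = urlparse(value).netloc
    if not host.endswith(".googleusercontent.com"):
        raise ValueError("Unexpected KFP endpoint host: {!r}".format(host))
    return host
```

Calling `ENDPOINT = normalize_endpoint(ENDPOINT)` right after the assignment leaves a correctly entered host name unchanged.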

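In [7]:
import os
# Change into the NitroML checkout so that the `examples` package can be imported.
PROJECT_DIR=os.path.join(os.path.expanduser("~"), "AIHub", "nitroml")
%cd {PROJECT_DIR}

/home/jupyter/AIHub/nitroml


In [8]:
# Read the pipeline name from the example configuration.
from examples import config
PIPELINE_NAME=config.PIPELINE_NAME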


## Step 5: Create the TFX pipeline

In [10]:
!tfx pipeline create \
--pipeline-path=examples/titanic_benchmark.py \
--endpoint={ENDPOINT} \
--build-target-image={CUSTOM_TFX_IMAGE}

CLI
Creating pipeline
Detected Kubeflow.
Use --engine flag if you intend to use a different orchestrator.
Reading build spec from build.yaml
Target image gcr.io/nitroml-brain-xgcp/nitroml_benchmark4 is not used. If the build spec is provided, update the target image in the build spec file build.yaml.
Use skaffold to build the container image.
/home/jupyter/.local/bin/skaffold
New container image is built. Target image is available in the build spec file.
I0630 14:21:33.165660 139722481284480 dataset_info.py:361] Load dataset info from gs://artifacts.nitroml-brain-xgcp.appspot.com/tensorflow-datasets/titanic/2.0.0
I0630 14:21:33.816079 139722481284480 tfds_dataset.py:46] Preparing dataset...
I0630 14:21:33.855611 139722481284480 dataset_builder.py:282] Reusing dataset titanic (gs://artifacts.nitroml-brain-xgcp.appspot.com/tensorflow-datasets/titanic/2.0.0)
I0630 14:21:33.855832 139722481284480 tfds_dataset.py:48] tfds.core.DatasetInfo(
    name='titanic',
    version=2.0.0,
    descript
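
Outside a notebook, the same `tfx pipeline create` call can be assembled in plain Python. This minimal sketch uses only the flags shown in the cell above; `pipeline_create_command` and `format_command` are hypothetical helpers, and building the command as an argument list is a design choice that avoids shell quoting problems.

```python
import shlex


def pipeline_create_command(pipeline_path, endpoint, target_image):
    """Build the argument list for `tfx pipeline create`."""
    return [
        "tfx", "pipeline", "create",
        "--pipeline-path=" + pipeline_path,
        "--endpoint=" + endpoint,
        "--build-target-image=" + target_image,
    ]


def format_command(command):
    """Render an argument list as a shell-safe string for logging."""
    return " ".join(shlex.quote(argument) for argument in command)
```

The resulting list can be passed to `subprocess.run(command, check=True)`, which raises an exception if the CLI exits with a non-zero status.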

## Step 6: Run the created TFX pipeline

In [11]:
!tfx run create --pipeline-name={config.PIPELINE_NAME} --endpoint={ENDPOINT}

CLI
Creating a run for pipeline: nitroml_examples
Detected Kubeflow.
Use --engine flag if you intend to use a different orchestrator.
Run created for pipeline: nitroml_examples
+------------------+--------------------------------------+----------+---------------------------+-------------------------------------------------------------------------------------------------------------------------+
| pipeline_name    | run_id                               | status   | created_at                | link                                                                                                                    |
| nitroml_examples | 5de9597e-982e-403c-b290-4c4e11891eeb |          | 2020-06-30T14:23:39+00:00 | http://ee1a2cabbbc2f13-dot-us-east1.pipelines.googleusercontent.com/#/runs/details/5de9597e-982e-403c-b290-4c4e11891eeb |
+------------------+--------------------------------------+----------+---------------------------+--------------------------------------------------------------
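
Judging from the single row shown above, the `link` column is the endpoint followed by `/#/runs/details/` and the run ID. A minimal sketch of helpers that build and take apart such links follows; `run_details_url` and `run_id_from_link` are hypothetical names, and the URL pattern is an assumption inferred from this one output.

```python
def run_details_url(endpoint, run_id, scheme="http"):
    """Build the dashboard link for a run from the endpoint and the run ID."""
    return "{}://{}/#/runs/details/{}".format(scheme, endpoint, run_id)


def run_id_from_link(link):
    """Extract the run ID, which is the last path segment of the link."""
    return link.rstrip("/").rsplit("/", 1)[-1]
```

For the run above, `run_details_url(ENDPOINT, "5de9597e-982e-403c-b290-4c4e11891eeb")` reproduces the link printed in the table.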

## Step 7 (Optional): Update the pipeline and rerun it
#### If the pipeline source changes, the pipeline at the endpoint must be updated as well. The following block updates the pipeline and then runs it.

In [15]:
# Update the pipeline at the endpoint after its source has changed.
!tfx pipeline update \
--pipeline-path=examples/titanic_benchmark.py \
--endpoint={ENDPOINT}

# Start a new run of the updated pipeline.
!tfx run create --pipeline-name={PIPELINE_NAME} --endpoint={ENDPOINT}

CLI
Updating pipeline
Detected Kubeflow.
Use --engine flag if you intend to use a different orchestrator.
Reading build spec from build.yaml
Use skaffold to build the container image.
/home/jupyter/.local/bin/skaffold
New container image is built. Target image is available in the build spec file.
I0630 18:16:19.162647 140205734782336 dataset_info.py:361] Load dataset info from gs://artifacts.nitroml-brain-xgcp.appspot.com/tensorflow-datasets/titanic/2.0.0
I0630 18:16:19.895675 140205734782336 tfds_dataset.py:46] Preparing dataset...
I0630 18:16:19.937338 140205734782336 dataset_builder.py:282] Reusing dataset titanic (gs://artifacts.nitroml-brain-xgcp.appspot.com/tensorflow-datasets/titanic/2.0.0)
I0630 18:16:19.937591 140205734782336 tfds_dataset.py:48] tfds.core.DatasetInfo(
    name='titanic',
    version=2.0.0,
    description='Dataset describing the survival status of individual passengers on the Titanic. Missing values in the original dataset are represented using ?. Float and in
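
In the cell above, the two `!` commands run independently, so a run is created even if the update fails. A minimal sketch of a fail-fast variant in plain Python follows. `update_and_run` is a hypothetical helper that uses only the CLI flags shown in this notebook, and the `runner` parameter exists so the sequence can be exercised without a cluster.

```python
import subprocess


def update_and_run(pipeline_path, pipeline_name, endpoint, runner=subprocess.run):
    """Update the pipeline, then create a run; stop at the first failure."""
    commands = [
        ["tfx", "pipeline", "update",
         "--pipeline-path=" + pipeline_path,
         "--endpoint=" + endpoint],
        ["tfx", "run", "create",
         "--pipeline-name=" + pipeline_name,
         "--endpoint=" + endpoint],
    ]
    for command in commands:
        # With check=True, subprocess.run raises CalledProcessError on a
        # non-zero exit status, so the run is skipped if the update fails.
        runner(command, check=True)
    return commands
```

In the notebook this would be called as `update_and_run("examples/titanic_benchmark.py", PIPELINE_NAME, ENDPOINT)`.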