# Compile and Deploy the TFX Pipeline to KFP

This notebook compiles the **TFX pipeline** into a **KFP package**: an **Argo YAML** file inside a **.tar.gz** archive. It performs the following steps:
1. Build a custom container image that includes our modules
2. Compile the TFX pipeline using the CLI
3. Deploy the compiled pipeline to KFP


## 0. Set Compile-Time Variables

In [2]:
import os

os.environ["PROJECT_ID"]="ksalama-research" # Set your project

os.environ["IMAGE_NAME"]="tfx-image"
os.environ["TAG"]="latest"
# Build the full image URI; indexing os.environ fails fast if a value is missing
os.environ["KFP_TFX_IMAGE"]="gcr.io/{}/{}:{}".format(
    os.environ["PROJECT_ID"],
    os.environ["IMAGE_NAME"],
    os.environ["TAG"])

os.environ["NAMESPACE"]="kubeflow-pipelines"
os.environ["GCP_REGION"]="europe-west1" # Set your region
os.environ["ARTIFACT_STORE_URI"]="gs://ks-kfp-artifact-store" # Set your GCS Bucket
os.environ["GCS_STAGING_PATH"]=os.environ.get("ARTIFACT_STORE_URI")+"/staging"
os.environ["GKE_CLUSTER_NAME"]="ks-ml-cluster-01" # Set your GKE cluster name
os.environ["GKE_CLUSTER_ZONE"]="europe-west1-b" # Set your GKE cluster zone
os.environ["RUNTIME_VERSION"]="1.15"
os.environ["PYTHON_VERSION"]="3.7"
os.environ["BEAM_RUNNER"]="DirectRunner"

os.environ["PIPELINE_NAME"]="tfx_census_classification"
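
Because every later step reads these values from the environment, it can help to check them before building or compiling. The following is a minimal sketch, not part of the original notebook: the helper names `missing_variables` and `build_image_uri` and the `REQUIRED_VARS` list are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical list of variables that the later cells appear to rely on.
REQUIRED_VARS = [
    "PROJECT_ID", "IMAGE_NAME", "TAG", "NAMESPACE",
    "ARTIFACT_STORE_URI", "GKE_CLUSTER_NAME", "GKE_CLUSTER_ZONE",
    "PIPELINE_NAME",
]


def missing_variables(env, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


def build_image_uri(env):
    """Assemble the gcr.io image URI the same way the cell above does."""
    return "gcr.io/{}/{}:{}".format(
        env["PROJECT_ID"], env["IMAGE_NAME"], env["TAG"])


# Report any gaps instead of raising, so the check is safe to run at any time.
missing = missing_variables(os.environ)
if missing:
    print("Unset variables:", ", ".join(missing))
else:
    print("Image URI:", build_image_uri(os.environ))
```

Both helpers accept any mapping, so they can be tried on a plain dictionary before touching `os.environ`.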

## 1. Build Container Image

The pipeline uses a custom Docker image, derived from the [tensorflow/tfx:0.21.4](https://hub.docker.com/r/tensorflow/tfx) image, as the runtime execution environment for the pipeline's components. The same image is also used as the training image by **AI Platform Training**.

The custom image modifies the base image by adding the `modules` and `raw_schema` folders.


In [5]:
!gcloud builds submit --tag $KFP_TFX_IMAGE ./ml_pipeline

Creating temporary tarball archive of 24 file(s) totalling 68.7 KiB before compression.
Uploading tarball of [./ml_pipeline] to [gs://ksalama-research_cloudbuild/source/1595281626.93-4dcf1e8f65fd4b7dbebee66136277849.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/ksalama-research/builds/6782e309-ca5c-4e97-b1e9-8e2e050f21f8].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/6782e309-ca5c-4e97-b1e9-8e2e050f21f8?project=944117458110].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "6782e309-ca5c-4e97-b1e9-8e2e050f21f8"

FETCHSOURCE
Fetching storage object: gs://ksalama-research_cloudbuild/source/1595281626.93-4dcf1e8f65fd4b7dbebee66136277849.tgz#1595281627596680
Copying gs://ksalama-research_cloudbuild/source/1595281626.93-4dcf1e8f65fd4b7dbebee66136277849.tgz#1595281627596680...
/ [1 files][ 16.1 KiB/ 16.1 KiB]                                                
Operation completed over 1 objects/16.1 KiB. 

## 2. Compile TFX Pipeline using CLI

In [None]:
#!tfx pipeline --help

In [9]:
!tfx pipeline compile \
    --engine=kubeflow \
    --pipeline_path=ml_pipeline/runner.py 

2020-07-20 21:52:59.561435: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-07-20 21:52:59.561592: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-07-20 21:52:59.561624: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
CLI
Compiling pipeline
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
2020-07-20 21:53:03.331527: W tensorflow/stream_executor/platform/default
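
To confirm that the compile step produced the expected package, the archive can be inspected with the standard library. This is a rough sketch under two assumptions: the package is written to the working directory as `<PIPELINE_NAME>.tar.gz` (the path the deploy step uses), and the Argo workflow inside it has a `.yaml` or `.yml` extension. The helper name `list_yaml_members` is hypothetical.

```python
import os
import tarfile


def list_yaml_members(package_path):
    """Return the names of the YAML files inside a .tar.gz package."""
    with tarfile.open(package_path, "r:gz") as archive:
        return [
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith((".yaml", ".yml"))
        ]


# Assumed location of the compiled package; skipped if the file is absent.
package = os.environ.get("PIPELINE_NAME", "tfx_census_classification") + ".tar.gz"
if os.path.exists(package):
    print(list_yaml_members(package))
```

An empty list would suggest that the compile step did not write the workflow definition where expected.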

## 3. Deploy the Compiled Pipeline to KFP

In [None]:
#!kfp pipeline --help

In [10]:
%%bash

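# Fetch cluster credentials so that kubectl can reach the GKE cluster
gcloud container clusters get-credentials ${GKE_CLUSTER_NAME} --zone ${GKE_CLUSTER_ZONE}
# Read the KFP inverse-proxy hostname from its ConfigMap
export INVERSE_PROXY_HOSTNAME=$(kubectl describe configmap inverse-proxy-config -n ${NAMESPACE} | grep "googleusercontent.com")

# Upload the compiled package to KFP as a new pipeline
kfp --namespace=${NAMESPACE} --endpoint=${INVERSE_PROXY_HOSTNAME} \
    pipeline upload \
    --pipeline-name=${PIPELINE_NAME} \
    ${PIPELINE_NAME}.tar.gz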

Pipeline Details
------------------
ID           a6d1db77-d91a-440f-ae71-0557cc4624da
Name         tfx_census_classification
Description
Uploaded at  2020-07-20T21:54:30+00:00
+--------------------+-----------------------------------------------------------------------+
| Parameter Name     | Default Value                                                         |
| pipeline-root      | gs://ks-kfp-artifact-store/tfx_census_classification/{{workflow.uid}} |
+--------------------+-----------------------------------------------------------------------+
| eval-steps         | 500                                                                   |
+--------------------+-----------------------------------------------------------------------+
| train-steps        | 5000                                                                  |
+--------------------+-----------------------------------------------------------------------+
| accuracy-threshold | 0.75                                     

Fetching cluster endpoint and auth data.
kubeconfig entry generated for ks-ml-cluster-01.
Pipeline a6d1db77-d91a-440f-ae71-0557cc4624da has been submitted
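
The endpoint lookup in the cell above relies on `grep` matching the hostname line in the `kubectl describe` output. The same extraction can be sketched in Python, which makes it easier to handle the case where no hostname is found. The function name `extract_proxy_hostname` and the sample text are hypothetical; the real layout of the ConfigMap description may differ.

```python
def extract_proxy_hostname(describe_output, marker="googleusercontent.com"):
    """Return the first whitespace-separated token containing `marker`, or None."""
    for line in describe_output.splitlines():
        for token in line.split():
            if marker in token:
                return token
    return None


# Hypothetical sample of `kubectl describe configmap` output, for illustration only.
sample_output = """Name:         inverse-proxy-config
Data
====
Hostname:
----
example-dot-europe-west1.pipelines.googleusercontent.com
"""

hostname = extract_proxy_hostname(sample_output)
if hostname is None:
    print("No inverse-proxy hostname found; check the namespace and ConfigMap name.")
else:
    print(hostname)
```

Returning `None` instead of an empty string lets the caller stop before invoking `kfp` with a blank `--endpoint`.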



## Use the KFP UI to run the deployed pipeline...