# Compile and Deploy the TFX Pipeline to KFP

This notebook compiles the **TFX pipeline** into a **KFP package**: an **Argo YAML** file inside a **.tar.gz** archive. We perform the following steps:
1. Build a custom container image that includes our modules
2. Compile the TFX pipeline using the CLI
3. Deploy the compiled pipeline to KFP


## 0. Set Compile-Time Variables

In [20]:
import os

os.environ["PROJECT_ID"]="ksalama-research" # Set your project

os.environ["IMAGE_NAME"]="tfx-image"
os.environ["TAG"]="latest"
os.environ["KFP_TFX_IMAGE"]="gcr.io/{}/{}:{}".format(
    os.environ.get("PROJECT_ID"), 
    os.environ.get("IMAGE_NAME"),
    os.environ.get("TAG"))

os.environ["NAMESPACE"]="kubeflow-pipelines"
os.environ["GCP_REGION"]="europe-west1" # Set your region
os.environ["ARTIFACT_STORE_URI"]="gs://ks-kfp-artifact-store" # Set your GCS Bucket
os.environ["GCS_STAGING_PATH"]=os.environ.get("ARTIFACT_STORE_URI")+"/staging"
os.environ["GKE_CLUSTER_NAME"]="ks-ml-cluster-01" # Set your GKE cluster name
os.environ["GKE_CLUSTER_ZONE"]="europe-west1-b" # Set your GKE cluster zone
os.environ["RUNTIME_VERSION"]="1.15"
os.environ["PYTHON_VERSION"]="3.7"
os.environ["BEAM_RUNNER"]="DirectRunner"

os.environ["PIPELINE_NAME"]="tfx_census_classification"
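
Every later cell reads these variables, so it can help to check them before building the image. The following is a minimal sketch rather than part of the original notebook: `build_image_uri`, `missing_vars` and `REQUIRED_VARS` are hypothetical names introduced here for illustration and are not part of TFX or KFP.

```python
import os

# Hypothetical helper names for illustration; not part of TFX or KFP.
REQUIRED_VARS = [
    "PROJECT_ID", "IMAGE_NAME", "TAG", "ARTIFACT_STORE_URI",
    "GKE_CLUSTER_NAME", "GKE_CLUSTER_ZONE", "PIPELINE_NAME",
]


def build_image_uri(project_id, image_name, tag):
    """Compose an image URI in the gcr.io/<project>/<image>:<tag> form used above."""
    return "gcr.io/{}/{}:{}".format(project_id, image_name, tag)


def missing_vars(env, required=REQUIRED_VARS):
    """Return the names in `required` that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]
```

After running the cell above, `missing_vars(os.environ)` should return an empty list. A non-empty list names the variables that still need a value.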

## 1. Build Container Image

The pipeline uses a custom Docker image, derived from the [tensorflow/tfx:0.21.4](https://hub.docker.com/r/tensorflow/tfx) image, as the runtime execution environment for the pipeline's components. The same image also serves as the training image for **AI Platform Training**.

The custom image modifies the base image by adding the `modules` and `raw_schema` folders.


In [21]:
!gcloud builds submit --tag $KFP_TFX_IMAGE .

Creating temporary tarball archive of 25 file(s) totalling 337.2 KiB before compression.
Some files were not included in the source upload.

Check the gcloud log [/root/.config/gcloud/logs/2020.07.20/23.35.19.809871.log] to see which files and the contents of the
default gcloudignore file used (see `$ gcloud topic gcloudignore` to learn
more).

Uploading tarball of [.] to [gs://ksalama-research_cloudbuild/source/1595288119.93-9292d51ed26e482094c3c14159b00b63.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/ksalama-research/builds/09eafa18-7649-4531-a00d-c2714802a623].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/09eafa18-7649-4531-a00d-c2714802a623?project=944117458110].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "09eafa18-7649-4531-a00d-c2714802a623"

FETCHSOURCE
Fetching storage object: gs://ksalama-research_cloudbuild/source/1595288119.93-9292d51ed26e482094c3c14159b00b63.tgz#159528812068805

## 2. Compile TFX Pipeline using CLI

In [None]:
#!tfx pipeline --help

In [22]:
!tfx pipeline compile \
    --engine=kubeflow \
    --pipeline_path=ml_pipeline/runner.py 

2020-07-20 23:38:19.685517: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-07-20 23:38:19.685668: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-07-20 23:38:19.685692: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
CLI
Compiling pipeline
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
2020-07-20 23:38:23.458438: W tensorflow/stream_executor/platform/default
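
To confirm that the compile step produced a package, one option is to list the archive contents with Python's standard `tarfile` module. This is a sketch under assumptions: `list_package_members` is a hypothetical helper, and the package is assumed to be in the working directory under the name `${PIPELINE_NAME}.tar.gz`, as the upload command in the next section suggests.

```python
import tarfile


def list_package_members(package_path):
    """Return the names of the files stored in a .tar.gz pipeline package."""
    with tarfile.open(package_path, "r:gz") as archive:
        return archive.getnames()
```

For example, `list_package_members("tfx_census_classification.tar.gz")` should list the Argo YAML file described at the top of this notebook.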

## 3. Deploy the Compiled Pipeline to KFP

In [None]:
#!kfp pipeline --help

In [23]:
%%bash

# Fetch kubeconfig credentials for the GKE cluster.
gcloud container clusters get-credentials ${GKE_CLUSTER_NAME} --zone ${GKE_CLUSTER_ZONE}
# Read the KFP endpoint from the inverse proxy configmap.
export KFP_ENDPOINT=$(kubectl describe configmap inverse-proxy-config -n ${NAMESPACE} | grep "googleusercontent.com")

# Upload the compiled package as a new pipeline.
kfp --namespace=${NAMESPACE} --endpoint=${KFP_ENDPOINT} \
    pipeline upload \
    --pipeline-name=${PIPELINE_NAME} \
    ${PIPELINE_NAME}.tar.gz

Pipeline Details
------------------
ID           dbf0ecbe-d390-424a-a867-521cc39fbcf8
Name         tfx_census_classification
Description
Uploaded at  2020-07-20T23:38:35+00:00
+--------------------+-----------------------------------------------------------------------+
| Parameter Name     | Default Value                                                         |
| pipeline-root      | gs://ks-kfp-artifact-store/tfx_census_classification/{{workflow.uid}} |
+--------------------+-----------------------------------------------------------------------+
| eval-steps         | 500                                                                   |
+--------------------+-----------------------------------------------------------------------+
| train-steps        | 5000                                                                  |
+--------------------+-----------------------------------------------------------------------+
| accuracy-threshold | 0.75                                     

Fetching cluster endpoint and auth data.
kubeconfig entry generated for ks-ml-cluster-01.
Pipeline dbf0ecbe-d390-424a-a867-521cc39fbcf8 has been submitted
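
The `grep` in the cell above keeps every whole line that mentions `googleusercontent.com`. If only the hostname is wanted, the lookup could be sketched in Python with a regular expression. `extract_kfp_host` is a hypothetical helper, and the sketch assumes the configmap text contains a hostname ending in `googleusercontent.com`, as the `grep` pattern implies.

```python
import re


def extract_kfp_host(describe_output):
    """Return the first hostname ending in googleusercontent.com, or None."""
    match = re.search(r"[\w.-]+\.googleusercontent\.com", describe_output)
    return match.group(0) if match else None
```

The function takes the text printed by `kubectl describe configmap inverse-proxy-config` and returns `None` when no matching hostname is found, which makes a missing endpoint easy to detect before calling `kfp`.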



## Use the KFP UI to run the deployed pipeline...