# Kubeflow pipelines
Learning Objectives:

1. Learn how to deploy a Kubeflow cluster on GCP
2. Learn how to create an experiment in Kubeflow
3. Learn how to package your code into a Kubeflow pipeline
4. Learn how to run a Kubeflow pipeline in a repeatable and traceable way

## Introduction
In this notebook, we will first set up a Kubeflow cluster on GCP. Then, we will create a Kubeflow experiment and a Kubeflow pipeline from our taxifare machine learning code. Finally, we will run the pipeline on the Kubeflow cluster, which gives us a reproducible and traceable way to execute machine learning code.

In [36]:
pip freeze | grep kfp || pip install kfp

kfp==1.4.0
kfp-pipeline-spec==0.1.5
kfp-server-api==1.3.0
Note: you may need to restart the kernel to use updated packages.


In [37]:
from os import path

import kfp
import kfp.compiler as compiler
import kfp.components as comp
import kfp.dsl as dsl
import kfp.gcp as gcp
import kfp.notebook

## Set up a Kubeflow cluster on GCP
TODO 1

To deploy a Kubeflow cluster in your GCP project, use AI Platform Pipelines:

1. Go to AI Platform Pipelines in the GCP Console.
2. Create a new instance
3. Hit "Configure"
4. Check the box "Allow access to the following Cloud APIs"
5. Hit "Create Cluster"
6. Hit "Deploy"

When the cluster is ready, go back to the AI Platform Pipelines page and click the "SETTINGS" entry for your cluster. This brings up a pop-up with code snippets showing how to access the cluster programmatically.

Copy the "host" entry and assign it to the `HOST` variable below.

In [38]:
HOST = "2a9c23cb3f73571c-dot-europe-west1.pipelines.googleusercontent.com"
BUCKET = "buddie_rec_data"

## Create an experiment
TODO 2

We will start by creating a Kubeflow client to interact with the Kubeflow cluster:

In [39]:
client = kfp.Client(host=HOST)

In [40]:
client.list_experiments()

{'experiments': [{'created_at': datetime.datetime(2021, 2, 7, 12, 58, 25, tzinfo=tzutc()),
                  'description': 'All runs created without specifying an '
                                 'experiment will be grouped here.',
                  'id': '97dd67b0-d056-468d-9f3e-78d236279ac9',
                  'name': 'Default',
                  'resource_references': None,
                  'storage_state': 'STORAGESTATE_AVAILABLE'},
                 {'created_at': datetime.datetime(2021, 2, 7, 13, 1, 17, tzinfo=tzutc()),
                  'description': None,
                  'id': '9dd39e50-32a3-4430-9593-4bba04df8fc5',
                  'name': 'buddieRec',
                  'resource_references': None,
                  'storage_state': 'STORAGESTATE_AVAILABLE'}],
 'next_page_token': None,
 'total_size': 2}

In [41]:
exp = client.create_experiment(name='buddieRec')

In [42]:
client.list_experiments()

{'experiments': [{'created_at': datetime.datetime(2021, 2, 7, 12, 58, 25, tzinfo=tzutc()),
                  'description': 'All runs created without specifying an '
                                 'experiment will be grouped here.',
                  'id': '97dd67b0-d056-468d-9f3e-78d236279ac9',
                  'name': 'Default',
                  'resource_references': None,
                  'storage_state': 'STORAGESTATE_AVAILABLE'},
                 {'created_at': datetime.datetime(2021, 2, 7, 13, 1, 17, tzinfo=tzutc()),
                  'description': None,
                  'id': '9dd39e50-32a3-4430-9593-4bba04df8fc5',
                  'name': 'buddieRec',
                  'resource_references': None,
                  'storage_state': 'STORAGESTATE_AVAILABLE'}],
 'next_page_token': None,
 'total_size': 2}
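
The listing is identical before and after the `create_experiment` call, which suggests that the call reused the existing `buddieRec` experiment rather than creating a second one. As a minimal sketch, the helper below looks an experiment up by name in a `list_experiments()`-style response. It assumes only that the response exposes an `experiments` attribute whose items have `name` and `id` fields, as the printed output suggests; `find_experiment_id` is a hypothetical name and not part of kfp.

```python
from types import SimpleNamespace
from typing import Optional


def find_experiment_id(response, name: str) -> Optional[str]:
    """Return the id of the first experiment called `name`, or None."""
    # `experiments` may be None when the cluster has no experiments yet.
    experiments = getattr(response, "experiments", None) or []
    for experiment in experiments:
        if experiment.name == name:
            return experiment.id
    return None


# Stand-in for the object returned by client.list_experiments(); with a live
# cluster you would pass the real response instead.
fake_response = SimpleNamespace(
    experiments=[
        SimpleNamespace(name="Default", id="97dd67b0-d056-468d-9f3e-78d236279ac9"),
        SimpleNamespace(name="buddieRec", id="9dd39e50-32a3-4430-9593-4bba04df8fc5"),
    ]
)
print(find_experiment_id(fake_response, "buddieRec"))
```

With a live cluster, `find_experiment_id(client.list_experiments(), 'buddieRec')` should return the same id as `exp.id`.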

## Packaging your code into Kubeflow components
We have packaged our taxifare ML pipeline into three components:

- ./components/bq2gcs, which creates the training and evaluation data from BigQuery and exports it to GCS
- ./components/trainjob, which launches the training container on AI Platform and exports the model
- ./components/deploymodel, which deploys the trained model to AI Platform as a REST API

Each of these components has been wrapped into a Docker container, in the same way we did with the taxifare training code in the previous lab.

If you inspect the code in these folders, you'll notice that the main.py or main.sh files contain the code we previously executed in the notebooks (loading the data from BQ to GCS, launching a training job on AI Platform, etc.). The last line in each Dockerfile tells you that these files are executed when the container is run. In other words, we simply packaged our ML code into lightweight container images for reproducibility.
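
To make the idea concrete, here is a minimal sketch of what such a container entry point might look like. It is not the actual `main.py` from the component folders: the function names are hypothetical, and the `--bucket` flag is chosen to mirror the argument that the bq2gcs component description later in this notebook passes to its container.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments Kubeflow would pass on the container command line."""
    parser = argparse.ArgumentParser(description="Hypothetical component entry point")
    parser.add_argument("--bucket", required=True, help="GCS bucket name, without gs://")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    data_dir = f"gs://{args.bucket}/taxifare/data"
    # A real component would export the BigQuery tables here; this sketch only
    # reports where the data would be written.
    print(f"Would export training and evaluation data to {data_dir}")
    return data_dir


# Simulate the container being started with `--bucket my-bucket`.
main(["--bucket", "my-bucket"])
```

Because the script takes everything it needs from its command-line arguments, the same image can be reused with a different bucket without being rebuilt.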

We have made it simple for you to build the container images and push them to the Google Cloud image registry gcr.io in your project:

In [43]:
#sudo chmod 755 buddieRec/scripts/*.sh

In [44]:
# Builds the taxifare trainer container in case you skipped the optional part of lab 1
!buddieRec/scripts/build.sh

Sending build context to Docker daemon  111.6kB
Step 1/8 : FROM gcr.io/deeplearning-platform-release/tf2-cpu
 ---> d52504fdb37e
Step 2/8 : COPY . /code
 ---> Using cache
 ---> 3fabde8207f3
Step 3/8 : WORKDIR /code
 ---> Using cache
 ---> 81f17f2f8958
Step 4/8 : RUN pip3 install cloudml-hypertune
 ---> Using cache
 ---> ccfa65eaf487
Step 5/8 : RUN pip3 install tensorflow_recommenders
 ---> Using cache
 ---> 5b7124dd6459
Step 6/8 : RUN pip3 install gcsfs
 ---> Using cache
 ---> e7ca4ad07b03
Step 7/8 : RUN pip3 install tensorflow==2.3.0
 ---> Using cache
 ---> 10cf46dc8289
Step 8/8 : ENTRYPOINT ["python3", "-m", "trainer.task"]
 ---> Using cache
 ---> 739abd0c9ba4
Successfully built 739abd0c9ba4
Successfully tagged gcr.io/buddie-270710/buddierec_training_container:latest


In [45]:
# Pushes the taxifare trainer container to gcr.io
!buddieRec/scripts/push.sh

Using default tag: latest
The push refers to repository [gcr.io/buddie-270710/buddierec_training_container]

9e6c8de0: Preparing
86ec1204: Preparing
056ea09e: Preparing
63d4fcbe: Preparing
9007822d: Preparing
d16ae66b: Preparing
3ee85360: Preparing
127a9d4c: Preparing
9c3bf55d: Preparing
e8fd4ff0: Preparing
0370cab4: Preparing
b6c33408: Preparing
4b160541: Preparing
8b7fb87e: Preparing
d372a1da: Preparing
0d62afb9: Preparing
fb2f7eb0: Preparing
deadeefa: Preparing
62bfc51d: Preparing
3b0fe7f1: Preparing
9f02e96e: Preparing
efae17e5: Preparing
818f1f96: Preparing
2392e386: Preparing
2392e386: Layer already exists
latest: digest: sha256:4f4ec83008562c942c502908b31a68fc663f2b1739be9aba90ef92a1552f89a3 size: 5557


In [46]:
# Builds the KF component containers and pushes them to gcr.io
!cd pipelines && make components

/bin/sh: line 0: cd: pipelines: No such file or directory


Now that the container images are pushed to the [registry in your project](https://console.cloud.google.com/gcr), we need to create YAML files that describe to Kubeflow how to use these containers. This essentially boils down to:

- describing what arguments Kubeflow needs to pass to the containers when it runs them
- telling Kubeflow where to fetch the corresponding Docker images

In the cells below, we have three of these "Kubeflow component description files", one for each of our components.


**IMPORTANT: Modify the image URI in the cell below to reflect that you pushed the images into the gcr.io associated with your project.**

In [47]:
%%writefile bq2gcs.yaml

name: bq2gcs
    
description: |
    This component creates the training and
    validation datasets as BigQuery tables and exports
    them into a Google Cloud Storage bucket at
    gs://<BUCKET>/taxifare/data.
        
inputs:
    - {name: Input Bucket , type: String, description: 'GCS directory path.'}

implementation:
    container:
        image: gcr.io/buddie-270710/buddieRec-bq2gcs
        args: ["--bucket", {inputValue: Input Bucket}]

Overwriting bq2gcs.yaml


In [48]:
%%writefile trainjob.yaml

name: trainjob
    
description: |
    This component trains a model to predict the taxi fare in NY.
    It takes as argument a GCS bucket and expects its training and
    eval data to be at gs://<BUCKET>/taxifare/data/ and will export
    the trained model to gs://<BUCKET>/taxifare/model/.
        
inputs:
    - {name: Input Bucket , type: String, description: 'GCS directory path.'}

implementation:
    container:
        image: gcr.io/buddie-270710/buddieRec-trainjob
        args: [{inputValue: Input Bucket}]

Overwriting trainjob.yaml


In [49]:
%%writefile deploymodel.yaml

name: deploymodel
    
description: |
    This component deploys a trained taxifare model on GCP as taxifare:dnn.
    It takes as argument a GCS bucket and expects the model to deploy 
    to be found at gs://<BUCKET>/taxifare/model/export/savedmodel/
        
inputs:
    - {name: Input Bucket , type: String, description: 'GCS directory path.'}

implementation:
    container:
        image: gcr.io/buddie-270710/buddieRec-deploymodel
        args: [{inputValue: Input Bucket}]

Overwriting deploymodel.yaml
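
The three description files above differ only in their name, description, image and arguments, so the image URI can also be derived from the project id instead of being edited by hand. The following is a minimal sketch under stated assumptions: `image_uri` and `render_component` are hypothetical helpers, `my-project` is a placeholder, and the `buddierec` prefix is a guess at the naming scheme. Docker repository names must be lowercase, so the helper lower-cases the URI; check that the result matches the images you actually pushed.

```python
COMPONENT_TEMPLATE = """\
name: {name}

description: |
    {description}

inputs:
    - {{name: Input Bucket, type: String, description: 'GCS directory path.'}}

implementation:
    container:
        image: {image}
        args: {args}
"""


def image_uri(project_id: str, component: str, prefix: str = "buddierec") -> str:
    """Build the gcr.io URI for a component image (naming scheme is assumed)."""
    # Docker repository names must be lowercase, so normalise the whole path.
    return f"gcr.io/{project_id}/{prefix}-{component}".lower()


def render_component(project_id: str, name: str, description: str, args: str) -> str:
    """Fill the template with one component's details and return the YAML text."""
    return COMPONENT_TEMPLATE.format(
        name=name,
        description=description,
        image=image_uri(project_id, name),
        args=args,
    )


spec = render_component(
    project_id="my-project",  # assumption: replace with your own project id
    name="bq2gcs",
    description="Exports the training and validation data to GCS.",
    args='["--bucket", {inputValue: Input Bucket}]',
)
print(spec)
```

Writing `spec` to `bq2gcs.yaml` with a regular `open(...).write(...)` call would then replace the hand-edited `%%writefile` cell.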


## Create a Kubeflow pipeline
The code below creates a Kubeflow pipeline by decorating a regular function with the @dsl.pipeline decorator. The arguments of the decorated function become the input parameters of the Kubeflow pipeline.

Inside the function, we describe the pipeline by

- loading the YAML component files we created above into Kubeflow ops
- specifying the order in which the Kubeflow ops should be run

In [50]:
PIPELINE_TAR = 'buddieRec.tar.gz'
BQ2GCS_YAML = './bq2gcs.yaml'
TRAINJOB_YAML = './trainjob.yaml'
DEPLOYMODEL_YAML = './deploymodel.yaml'


@dsl.pipeline(
    name='BuddieRec',
    description='Train a ml model to predict contentId according to clientId')
def pipeline(gcs_bucket_name='buddie_rec_data'):

    bq2gcs_op = comp.load_component_from_file(BQ2GCS_YAML)
    bq2gcs = bq2gcs_op(
        input_bucket=gcs_bucket_name,
    )

    trainjob_op = comp.load_component_from_file(TRAINJOB_YAML)
    trainjob = trainjob_op(
        input_bucket=gcs_bucket_name,
    )

    deploymodel_op = comp.load_component_from_file(DEPLOYMODEL_YAML)
    deploymodel = deploymodel_op(
        input_bucket=gcs_bucket_name,
    )

    trainjob.after(bq2gcs)
    deploymodel.after(trainjob)
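
The three ops above do not pass outputs to one another, so the two `.after()` calls are the only thing that tells Kubeflow about their order; without them, the steps could be scheduled in parallel. As an illustration of the dependency graph these calls describe (this is not how kfp implements it), the sketch below performs a plain topological sort over a hypothetical `dependencies` mapping.

```python
from collections import deque


def execution_order(dependencies):
    """Topologically sort steps given {step: [steps it must run after]}."""
    remaining = {step: set(parents) for step, parents in dependencies.items()}
    # Steps with no prerequisites can start immediately.
    ready = deque(sorted(step for step, parents in remaining.items() if not parents))
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        # Release every step that was waiting only on the one just finished.
        for other, parents in sorted(remaining.items()):
            if step in parents:
                parents.remove(step)
                if not parents:
                    ready.append(other)
    if len(order) != len(remaining):
        raise ValueError("dependency cycle detected")
    return order


# Mirrors trainjob.after(bq2gcs) and deploymodel.after(trainjob).
dependencies = {
    "bq2gcs": [],
    "trainjob": ["bq2gcs"],
    "deploymodel": ["trainjob"],
}
order = execution_order(dependencies)
print(" -> ".join(order))  # prints: bq2gcs -> trainjob -> deploymodel
```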

The pipeline function above is then used by the Kubeflow compiler to create a Kubeflow pipeline artifact, which can be uploaded to the Kubeflow cluster either from the UI or programmatically, as we will do below:

In [51]:
compiler.Compiler().compile(pipeline, PIPELINE_TAR)

In [52]:
ls $PIPELINE_TAR

buddieRec.tar.gz


If you untar and unzip this pipeline artifact, you'll see that the compiler has transformed the Python description of the pipeline into a YAML description!
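
One way to look inside the archive without leaving the notebook is Python's standard `tarfile` module. The sketch below is a hedged example: `read_pipeline_yaml` is a hypothetical helper that returns the text of the first `.yaml` member it finds, and the small demo archive (including its `pipeline.yaml` member name and contents) is a stand-in created only so that the code runs without a compiled pipeline.

```python
import io
import os
import tarfile
import tempfile


def read_pipeline_yaml(package_path: str) -> str:
    """Return the text of the first .yaml member in a .tar.gz package."""
    with tarfile.open(package_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".yaml"):
                return archive.extractfile(member).read().decode("utf-8")
    raise FileNotFoundError(f"no .yaml file found in {package_path}")


# Build a stand-in archive so the sketch is runnable on its own.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.tar.gz")
payload = b"kind: Workflow\n"
with tarfile.open(demo_path, "w:gz") as archive:
    info = tarfile.TarInfo("pipeline.yaml")
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

print(read_pipeline_yaml(demo_path))
```

In this notebook you would call `read_pipeline_yaml(PIPELINE_TAR)` and print the first few hundred characters of the result.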

Now let's feed Kubeflow with our pipeline and run it using our client:

In [53]:
run = client.run_pipeline(
    experiment_id=exp.id, 
    job_name='buddieRec', 
    pipeline_package_path='buddieRec.tar.gz', 
    params={
        'gcs_bucket_name': BUCKET,
    },
)

Have a look at the link to monitor the run.
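
If you would rather wait for the result from the notebook than watch the UI, the kfp v1 client offers `wait_for_run_completion(run_id, timeout)`. The sketch below wraps that call in a hypothetical helper, `wait_for_status`, and assumes the returned object exposes the final state as `.run.status`. A stand-in client is used here so that the example runs offline; verify the behaviour against your installed kfp version.

```python
from types import SimpleNamespace


def wait_for_status(client, run_id: str, timeout_s: int = 3600) -> str:
    """Block until the run finishes or the timeout expires; return its status."""
    detail = client.wait_for_run_completion(run_id, timeout_s)
    return detail.run.status


# Stand-in client that immediately reports success; with a live cluster you
# would pass the real `client` and `run.id` from the cells above.
fake_client = SimpleNamespace(
    wait_for_run_completion=lambda run_id, timeout: SimpleNamespace(
        run=SimpleNamespace(id=run_id, status="Succeeded")
    )
)
status = wait_for_status(fake_client, "example-run-id")
print(status)
```

Against the real cluster, the equivalent call would be `wait_for_status(client, run.id)`.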

Now all the runs are nicely organized under the experiment in the UI, and new runs can be either manually launched or scheduled through the UI in a completely repeatable and traceable way!