# Reusable components and pre-built components

This tutorial describes the manual way of writing a full component program (in any language) and a component definition for it. Below is a summary of the steps involved in creating and using a component:

- Write the program that contains your component’s logic. The program must use files and command-line arguments to pass data to and from the component.
- Containerize the program.
- Write a component specification in YAML format that describes the component for the Kubeflow Pipelines system.
- Use the Kubeflow Pipelines SDK to load your component, use it in a pipeline and run that pipeline.

Moreover, we combine the component we build with a pre-built component to compose a pipeline with three steps:
- Train an MNIST model and export it to GCS
- Deploy the exported TensorFlow model on AI Platform
- Test the deployment by calling the endpoint

**Note: Make sure that you have Docker installed in your local environment.**

In [3]:
import kfp
import kfp.gcp as gcp
import kfp.dsl as dsl
import kfp.compiler as compiler
import kfp.components as comp
import datetime

import kubernetes as k8s

In [4]:
PROJECT_ID='kubeflow-pipeline-fantasy'

# Build reusable components

## Writing the program code

The following cell creates a file `app.py` that contains a Python script. The script takes a GCS bucket name and a model path as input arguments, trains a Keras model on MNIST, exports the trained model to `<bucket>/<model_path>`, and writes that path to the output file `/output.txt`.

In [63]:
%%bash

# Create folders if they don't exist.
mkdir -p tmp/reuse_components_pipeline/minist_training

# Create the Python file that trains the MNIST model and exports it to GCS.
cat > ./tmp/reuse_components_pipeline/minist_training/app.py <<HERE
import argparse
from datetime import datetime
import tensorflow as tf

parser = argparse.ArgumentParser()
parser.add_argument(
    '--model_path', type=str, required=True, help='Name of the model file.')
parser.add_argument(
    '--bucket', type=str, required=True, help='GCS bucket name.')
args = parser.parse_args()

bucket=args.bucket
model_path=args.model_path

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

print(model.summary())    

mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

callbacks = [
  tf.keras.callbacks.TensorBoard(log_dir=bucket + '/logs/' + datetime.now().date().__str__()),
  # Interrupt training if val_loss stops improving for over 2 epochs
  tf.keras.callbacks.EarlyStopping(patience=2, monitor='val_loss'),
]

model.fit(x_train, y_train, batch_size=32, epochs=5, callbacks=callbacks,
          validation_data=(x_test, y_test))

from tensorflow import gfile

gcs_path = bucket + "/" + model_path
# The export requires that the target folder does not already exist
if gfile.Exists(gcs_path):
    gfile.DeleteRecursively(gcs_path)
tf.keras.experimental.export_saved_model(model, gcs_path)

with open('/output.txt', 'w') as f:
  f.write(gcs_path)
HERE
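
The argument handling and path construction in `app.py` can be checked without TensorFlow or GCS access. The following is a minimal sketch under that assumption; the helper names `parse_args` and `export_path` and the bucket `gs://example-bucket` are hypothetical and do not appear in the script itself.

```python
import argparse


def parse_args(argv):
    """Parse the two command-line arguments that app.py expects."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--model_path', type=str, required=True, help='Name of the model file.')
    parser.add_argument(
        '--bucket', type=str, required=True, help='GCS bucket name.')
    return parser.parse_args(argv)


def export_path(bucket, model_path):
    """Join bucket and model path the same way app.py builds gcs_path."""
    return bucket + "/" + model_path


# Hypothetical values, used only for illustration.
args = parse_args(['--model_path', 'mnist_model', '--bucket', 'gs://example-bucket'])
print(export_path(args.bucket, args.model_path))  # gs://example-bucket/mnist_model
```

Passing the argument list explicitly to `parse_args` keeps the sketch independent of `sys.argv`, which is convenient inside a notebook.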

## Create a Docker container
Create your own container image that includes your program. 
- If your component creates outputs to be fed as inputs to downstream components, each separate output must be written as a string to a separate local text file inside the container.
- For example, if a trainer component needs to output the trained model path, it can write the path to a local file `/output.txt`.
- The string written to an output file must be small. If it is much larger than 100 kB, save the output to external persistent storage and pass the storage path to the next component.
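
The output convention described above could be wrapped in a small helper. This is only a sketch: the function `write_output` and the constant `MAX_OUTPUT_BYTES` are hypothetical names, and the 100 kB threshold is taken from the guidance above rather than from any limit enforced by Kubeflow Pipelines.

```python
import os
import tempfile

# Assumed soft limit based on the "100 kB" guidance; not an official constant.
MAX_OUTPUT_BYTES = 100 * 1024


def write_output(path, value):
    """Write one output value as a string to its own local text file."""
    data = str(value)
    if len(data.encode('utf-8')) > MAX_OUTPUT_BYTES:
        raise ValueError(
            'Output is too large; save it to external storage and pass the path instead.')
    with open(path, 'w') as f:
        f.write(data)
    return path


# A temporary folder stands in for the container filesystem in this sketch.
out_dir = tempfile.mkdtemp()
out_file = write_output(os.path.join(out_dir, 'output.txt'),
                        'gs://example-bucket/mnist_model')
with open(out_file) as f:
    written = f.read()
print(written)
```

In the training script of this tutorial, the equivalent step is the final `open('/output.txt', 'w')` block, which writes the GCS path rather than the model itself.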

Now create a container that runs the script. Start by creating a `Dockerfile`, which contains the instructions to assemble a Docker image. The `FROM` statement specifies the base image from which you are building. `WORKDIR` sets the working directory. When you assemble the Docker image, `COPY` copies the required files and directories (for example, `app.py`) to the filesystem of the container. `RUN` executes a command (for example, to install dependencies) and commits the result.

In [64]:
%%bash

# Create Dockerfile.
# Use TensorFlow 1.14 to match the AI Platform runtime version (1.14) used for deployment.
cat > ./tmp/reuse_components_pipeline/minist_training/Dockerfile <<EOF
FROM tensorflow/tensorflow:1.14.0-py3
WORKDIR /app
COPY . /app
EOF
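
The Dockerfile above uses only `FROM`, `WORKDIR` and `COPY`, presumably because the base image already ships TensorFlow. If your program needed extra packages, a `RUN` instruction would install them. The sketch below renders such a Dockerfile as text; the helper `render_dockerfile` and the package name `requests` are hypothetical and serve only as an illustration.

```python
def render_dockerfile(base_image, workdir, pip_packages=()):
    """Return Dockerfile text with an optional RUN step that installs packages."""
    lines = [
        'FROM ' + base_image,
        'WORKDIR ' + workdir,
    ]
    if pip_packages:
        # RUN executes the command at build time and commits the result as a layer.
        lines.append('RUN pip install --no-cache-dir ' + ' '.join(pip_packages))
    lines.append('COPY . ' + workdir)
    return '\n'.join(lines) + '\n'


# 'requests' is a hypothetical extra dependency.
dockerfile_text = render_dockerfile(
    'tensorflow/tensorflow:1.14.0-py3', '/app', pip_packages=['requests'])
print(dockerfile_text)
```

Placing `RUN` before `COPY` is a common choice because the dependency layer can then be reused when only the program files change.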

Now that we have created the Dockerfile, we can build the Docker image and push it to a registry that hosts it. The following cell creates a shell script that builds the container image and stores it in the Google Container Registry.

In [65]:
%%bash -s "{PROJECT_ID}"

IMAGE_NAME="minist_training_kf_pipeline"
TAG="latest" # "v_$(date +%Y%m%d_%H%M%S)"

# Create script to build docker image and push it.
cat > ./tmp/reuse_components_pipeline/minist_training/build_image.sh <<HERE
PROJECT_ID="${1}"
IMAGE_NAME="${IMAGE_NAME}"
TAG="${TAG}"
GCR_IMAGE="gcr.io/\${PROJECT_ID}/\${IMAGE_NAME}:\${TAG}"
docker build -t \${IMAGE_NAME} .
docker tag \${IMAGE_NAME} \${GCR_IMAGE}
docker push \${GCR_IMAGE}
docker inspect --format="{{index .RepoDigests 0}}" "\${GCR_IMAGE}"
docker image rm \${IMAGE_NAME}
docker image rm \${GCR_IMAGE}
HERE
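
The script composes the registry image name from the project ID, image name and tag. The sketch below shows the same composition in Python, together with the timestamp tag format that appears as a commented-out alternative for `TAG`. The helper names and the project `example-project` are hypothetical.

```python
from datetime import datetime


def timestamp_tag(now):
    """Build a tag such as v_20200102_030405 from a datetime."""
    return now.strftime('v_%Y%m%d_%H%M%S')


def gcr_image_uri(project_id, image_name, tag):
    """Compose the Google Container Registry name the build script pushes to."""
    return 'gcr.io/{}/{}:{}'.format(project_id, image_name, tag)


# A fixed datetime keeps the example reproducible; use datetime.now() in practice.
tag = timestamp_tag(datetime(2020, 1, 2, 3, 4, 5))
uri = gcr_image_uri('example-project', 'minist_training_kf_pipeline', tag)
print(uri)  # gcr.io/example-project/minist_training_kf_pipeline:v_20200102_030405
```

A unique tag per build avoids overwriting an earlier image, at the cost of updating the component definition whenever the image changes.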

Run the script.

In [66]:
%%bash

# Build and push the image.
cd tmp/reuse_components_pipeline/minist_training
bash build_image.sh

Sending build context to Docker daemon  5.632kB
Step 1/3 : FROM tensorflow/tensorflow:1.14.0-py3
 ---> 4cc892a3babd
Step 2/3 : WORKDIR /app
 ---> Running in 9a683d63434c
Removing intermediate container 9a683d63434c
 ---> 6a94fdb7cc0b
Step 3/3 : COPY . /app
 ---> a6a06c59c76e
Successfully built a6a06c59c76e
Successfully tagged minist_training_kf_pipeline:latest
The push refers to repository [gcr.io/kubeflow-pipeline-fantasy/minist_training_kf_pipeline]
ad4f7a6923bb: Preparing
6fd4fb980e53: Preparing
a144de6e67e6: Preparing
4b8ec9124f1c: Preparing
652cdcb17d30: Preparing
dd7f77c80a16: Preparing
f4cb77175ac9: Preparing
31835e84bcc0: Preparing
75e70aa52609: Preparing
dda151859818: Preparing
fbd2732ad777: Preparing
ba9de9d8475e: Preparing
75e70aa52609: Waiting
dda151859818: Waiting
dd7f77c80a16: Waiting
f4cb77175ac9: Waiting
31835e84bcc0: Waiting
fbd2732ad777: Waiting
ba9de9d8475e: Waiting
652cdcb17d30: Layer already exists
a144de6e67e6: Layer already exists
4b8ec9124f1c: Layer already exists

## Writing your component definition file
To create a component from your containerized program, you need to write a component specification in YAML format that describes the component for the Kubeflow Pipelines system.

For the complete definition of a Kubeflow Pipelines component, see the [component specification](https://www.kubeflow.org/docs/pipelines/reference/component-spec/). However, for this tutorial you don’t need to know the full schema of the component specification. The tutorial provides enough information for the relevant components.

Write the component definition (`minist_component.yaml` in this tutorial): declare the inputs and outputs, then specify your container image and command in the component’s `implementation` section:

In [67]:
%%bash

# Create the component definition YAML file.

cat > minist_component.yaml <<HERE
name: Minist training
description: Train a minist model and save to GCS
inputs:
  - name: model_path
    description: 'Path of the tf model.'
    type: String
  - name: bucket
    description: 'GCS bucket name.'
    type: String
outputs:
  - name: gcs_model_path
    description: 'Trained model path.'
    type: GCSPath
implementation:
  container:
    image: gcr.io/kubeflow-pipeline-fantasy/minist_training_kf_pipeline@sha256:314be0b9ddfa933eaca15a694dca29d0b93eb0890ef5ee9a221bcfa5a79db60d
    command: [
      python, /app/app.py,
      --model_path, {inputValue: model_path},
      --bucket,     {inputValue: bucket},
    ]
    fileOutputs:
      gcs_model_path: /output.txt
HERE
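
To build intuition for the `command` section, the sketch below substitutes each `{inputValue: name}` placeholder with the value of the corresponding input, which yields the argument list the container is started with. This is a conceptual illustration only, not how the Kubeflow Pipelines SDK is implemented; `resolve_command` is a hypothetical helper and the argument values are examples.

```python
def resolve_command(command, arguments):
    """Replace {inputValue: name} placeholders with the supplied argument values."""
    resolved = []
    for item in command:
        if isinstance(item, dict) and 'inputValue' in item:
            resolved.append(str(arguments[item['inputValue']]))
        else:
            resolved.append(item)
    return resolved


# The command from the component definition, written as a Python list.
command = [
    'python', '/app/app.py',
    '--model_path', {'inputValue': 'model_path'},
    '--bucket', {'inputValue': 'bucket'},
]

# Example argument values; the bucket name is hypothetical.
argv = resolve_command(
    command, {'model_path': 'mnist_model', 'bucket': 'gs://example-bucket'})
print(argv)
# ['python', '/app/app.py', '--model_path', 'mnist_model', '--bucket', 'gs://example-bucket']
```

The `fileOutputs` section works in the opposite direction: after the container finishes, the content of `/output.txt` becomes the value of the `gcs_model_path` output.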

In [None]:
import os
minist_train_op = kfp.components.load_component_from_file(os.path.join('./', 'minist_component.yaml')) 

In [None]:
minist_train_op.component_spec

# Define deployment operation on AI Platform

In [70]:
mlengine_deploy_op = comp.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/master/components/gcp/ml_engine/deploy/component.yaml')

def deploy(
    project_id,
    model_uri,
    model_id,
    runtime_version,
    python_version):
    
    return mlengine_deploy_op(
        model_uri=model_uri,
        project_id=project_id, 
        model_id=model_id, 
        runtime_version=runtime_version, 
        python_version=python_version,
        replace_existing_version=True, 
        set_default=True)

# Create a lightweight component for testing the deployment

In [84]:
def deployment_test(project_id: str, model_name: str, version: str) -> str:

    model_name = model_name.split("/")[-1]
    version = version.split("/")[-1]
    
    import googleapiclient.discovery
    
    def predict(project, model, data, version=None):
      """Run predictions on a list of instances.

      Args:
        project: (str), project where the Cloud ML Engine Model is deployed.
        model: (str), model name.
        data: ([[any]]), list of input instances, where each input instance is a
          list of attributes.
        version: str, version of the model to target.

      Returns:
        Mapping[str: any]: dictionary of prediction results defined by the model.
      """

      service = googleapiclient.discovery.build('ml', 'v1')
      name = 'projects/{}/models/{}'.format(project, model)

      if version is not None:
        name += '/versions/{}'.format(version)

      response = service.projects().predict(
          name=name, body={
              'instances': data
          }).execute()

      if 'error' in response:
        raise RuntimeError(response['error'])

      return response['predictions']

    import tensorflow as tf
    import json
    
    mnist = tf.keras.datasets.mnist
    (x_train, y_train),(x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    result = predict(
        project=project_id,
        model=model_name,
        data=x_test[0:2].tolist(),
        version=version)
    print(result)
    
    return json.dumps(result)

In [85]:
deployment_test(
    project_id=PROJECT_ID,
    model_name="minist",
    version='ver_bb1ebd2a06ab7f321ad3db6b3b3d83e6' # previously deployed version, used for testing
)



[{'dense_1': [8.298250420146758e-10, 5.753670873076544e-09, 8.097066483969684e-08, 8.076931408140808e-06, 1.990844988863927e-12, 1.6837516103596073e-10, 2.0005800130656198e-14, 0.9999895095825195, 1.339585775639307e-09, 2.21594291360816e-06]}, {'dense_1': [2.2494565932174027e-10, 2.0927380319335498e-05, 0.9999790191650391, 4.458803459783667e-08, 6.04291304021347e-17, 2.6902041705412216e-10, 3.358231401295875e-11, 1.3083588540645143e-16, 1.6866483903976714e-11, 1.6048520169901435e-16]}]


'[{"dense_1": [8.298250420146758e-10, 5.753670873076544e-09, 8.097066483969684e-08, 8.076931408140808e-06, 1.990844988863927e-12, 1.6837516103596073e-10, 2.0005800130656198e-14, 0.9999895095825195, 1.339585775639307e-09, 2.21594291360816e-06]}, {"dense_1": [2.2494565932174027e-10, 2.0927380319335498e-05, 0.9999790191650391, 4.458803459783667e-08, 6.04291304021347e-17, 2.6902041705412216e-10, 3.358231401295875e-11, 1.3083588540645143e-16, 1.6866483903976714e-11, 1.6048520169901435e-16]}]'
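
Each prediction holds ten softmax scores under the key `dense_1`, one per digit class. A minimal sketch for turning the returned JSON string into predicted digits follows; the helper `predicted_digits` is hypothetical, and the input is the output shown above.

```python
import json

# JSON string returned by deployment_test, copied from the output above.
result_json = (
    '[{"dense_1": [8.298250420146758e-10, 5.753670873076544e-09, '
    '8.097066483969684e-08, 8.076931408140808e-06, 1.990844988863927e-12, '
    '1.6837516103596073e-10, 2.0005800130656198e-14, 0.9999895095825195, '
    '1.339585775639307e-09, 2.21594291360816e-06]}, '
    '{"dense_1": [2.2494565932174027e-10, 2.0927380319335498e-05, '
    '0.9999790191650391, 4.458803459783667e-08, 6.04291304021347e-17, '
    '2.6902041705412216e-10, 3.358231401295875e-11, 1.3083588540645143e-16, '
    '1.6866483903976714e-11, 1.6048520169901435e-16]}]'
)


def predicted_digits(result_json, output_key='dense_1'):
    """Return the index of the highest score for each prediction."""
    digits = []
    for prediction in json.loads(result_json):
        scores = prediction[output_key]
        digits.append(max(range(len(scores)), key=scores.__getitem__))
    return digits


print(predicted_digits(result_json))  # [7, 2]
```

The output key depends on the name of the model's last layer, so `dense_1` may differ for another model.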

In [86]:
deployment_test_op = comp.func_to_container_op(
    func=deployment_test, 
    base_image="tensorflow/tensorflow:1.15.0-py3",
    packages_to_install=["google-api-python-client==1.7.8"])

# Create your workflow as a Python function

Define your pipeline as a Python function. `@kfp.dsl.pipeline` is a required decorator that takes the `name` and `description` properties. Then compile the pipeline function. Compilation produces a pipeline file.

In [87]:
# Define the pipeline
@dsl.pipeline(
   name='Minist pipeline',
   description='A toy pipeline that performs minist model training.'
)
def minist_reuse_component_pipeline(
    project_id: str = PROJECT_ID,
    model_path: str = 'mnist_model', 
    bucket: str = "gs://kubeflow-pipeline-fantasy-kubeflow1-bucket"
):
    train_task = minist_train_op(
        model_path=model_path, 
        bucket=bucket
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))
    
    deploy_task = deploy(
        project_id=project_id,
        model_uri=train_task.outputs['gcs_model_path'],
        model_id="minist", 
        runtime_version="1.14",
        python_version="3.5"
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))  
    
    deploy_test_task = deployment_test_op(
        project_id=project_id,
        model_name=deploy_task.outputs["model_name"], 
        version=deploy_task.outputs["version_name"],
    ).apply(gcp.use_gcp_secret('user-gcp-sa'))
    
    return True

In [88]:
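# Get or create an experiment and submit a pipeline run.
# Use the in-cluster configuration when the notebook runs inside the cluster.
in_cluster = True
try:
    k8s.config.load_incluster_config()
except k8s.config.ConfigException:
    # Not running inside a Kubernetes cluster.
    in_cluster = False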

if in_cluster:
    client = kfp.Client()
else:
    host = "https://kubeflow1.endpoints.kubeflow-pipeline-fantasy.cloud.goog/pipeline"
    client_id = "493831447550-os23o55235htd9v45a9lsejv8d1plhd0.apps.googleusercontent.com"
    other_client_id = "493831447550-iu24vv6id3ng5smhf2lboovv5qukuhbh.apps.googleusercontent.com"
    other_client_secret = "cB8Xj-rb9JWCYcCRDlpTMfhc"
    client = kfp.Client(host=host, 
                        client_id=client_id,
                        other_client_id=other_client_id, 
                        other_client_secret=other_client_secret)

In [89]:
pipeline_func = minist_reuse_component_pipeline
pipeline_filename = pipeline_func.__name__ + '.pipeline.zip'

compiler.Compiler().compile(pipeline_func, pipeline_filename)
# Submit a pipeline run
arguments = {"model_path":"mnist_model",
             "bucket":"gs://kubeflow-pipeline-fantasy-kubeflow1-bucket"}
run_name = pipeline_func.__name__ + ' run'
experiment = client.create_experiment('python-functions-minist')

run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)

Follow the [instructions](https://www.kubeflow.org/docs/other-guides/accessing-uis/) on kubeflow.org to access Kubeflow UIs. Upload the created pipeline and run it.

**Warning:** When the pipeline runs, Kubernetes pulls the image from the repository to create a container, and it caches pulled images. If you push a new image under a tag that the cluster has already pulled, the cluster may keep running the cached, stale image. One solution is to reference the image by digest instead of by tag in your component definition, for example, `s/v1/sha256:9509182e27dcba6d6903fccf444dc6188709cc094a018d5dd4211573597485c9/g`. Alternatively, if you don't want to update the digest every time, you can use the `:latest` tag, for which Kubernetes by default always pulls the image.
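
A quick way to check whether a component's image reference is pinned by digest is sketched below. The helper `is_pinned_by_digest` is hypothetical and assumes the common `@sha256:<64 hex characters>` suffix. The first reference is the one used in the component definition of this tutorial; the second is an example of a tag-based reference.

```python
import re

# Matches a trailing sha256 digest such as "@sha256:<64 hex characters>".
_DIGEST_RE = re.compile(r'@sha256:[0-9a-f]{64}$')


def is_pinned_by_digest(image_ref):
    """Return True if the image reference ends with a sha256 digest."""
    return bool(_DIGEST_RE.search(image_ref))


pinned = ('gcr.io/kubeflow-pipeline-fantasy/minist_training_kf_pipeline'
          '@sha256:314be0b9ddfa933eaca15a694dca29d0b93eb0890ef5ee9a221bcfa5a79db60d')
tagged = 'gcr.io/kubeflow-pipeline-fantasy/minist_training_kf_pipeline:latest'

print(is_pinned_by_digest(pinned))  # True
print(is_pinned_by_digest(tagged))  # False
```

A digest identifies one exact image, so a pinned reference cannot silently pick up a stale or changed image.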

___