# Kubeflow Pipelines

The [Kubeflow Pipelines SDK](https://github.com/kubeflow/pipelines/tree/master/sdk) provides a set of Python packages that you can use to specify and run your machine learning (ML) workflows. A pipeline is a description of an ML workflow, including all of the components that make up the steps in the workflow and how the components interact with each other.


The Kubeflow website explains pipeline components in detail; see [Introduction to the Pipelines SDK](https://www.kubeflow.org/docs/pipelines/sdk/sdk-overview/).

# Install the Kubeflow Pipelines SDK

This guide tells you how to install the [Kubeflow Pipelines SDK](https://github.com/kubeflow/pipelines/tree/master/sdk) which you can use to build machine learning pipelines. You can use the SDK to execute your pipeline, or alternatively you can upload the pipeline to the Kubeflow Pipelines UI for execution.

All of the SDK’s classes and methods are described in the auto-generated [SDK reference docs](https://kubeflow-pipelines.readthedocs.io/en/latest/).


Run the following command to install the Kubeflow Pipelines SDK:


In [1]:
!pip install https://storage.googleapis.com/ml-pipeline/release/0.1.29/kfp.tar.gz --upgrade --user

Collecting https://storage.googleapis.com/ml-pipeline/release/0.1.29/kfp.tar.gz
  Downloading https://storage.googleapis.com/ml-pipeline/release/0.1.29/kfp.tar.gz (88kB)
Collecting kubernetes<=9.0.0,>=8.0.0 (from kfp==0.1.29)
  Downloading https://files.pythonhosted.org/packages/00/f7/4f196c55f1c2713d3edc8252c4b45326306eef4dc10048f13916fe446e2b/kubernetes-9.0.0-py2.py3-none-any.whl (1.4MB)
Collecting kfp-server-api<=0.1.25,>=0.1.18 (from kfp==0.1.29)
  Downloading https://files.pythonhosted.org/packages/3e/24/a82ae81487bf61fb262e67167cee1843f2f70d940713c092b124c9aaa0dc/kfp-server-api-0.1.18.3.tar.gz
Collecting argo-models==2.2.1a (from kfp==0.1.29)
  Downloading https://files.pythonhosted.org/packages/62/53/a92df7c1c793edf2db99b14e428246e4b49b93499a5c9ed013e0aa2416f6/argo-models-2.2.1a0.tar.gz
Collecting tabulate==0.8.3 (from kfp==

  Building wheel for kfp-server-api (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/07/13/f3/31e9e1a25e10b8c3d04df74a01f4dbcf16a4119272cd41ba7a
  Building wheel for argo-models (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/bd/5b/6b/20cdc06ddb10caa3a86f5804eb9a90122ae8de0bcf19a468d8
  Building wheel for tabulate (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/2b/67/89/414471314a2d15de625d184d8be6d38a03ae1e983dbda91e84
  Building wheel for wrapt (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/b1/c2/ed/d62208260edbd3fa7156545c00ef966f45f2063d0a84f8208a
Successfully built kfp kfp-server-api argo-models tabulate wrapt
kubeflow-tfjob 0.1.3 has requirement kubernetes>=10.0.1, but you'll have kubernetes 9.0.0 which is incompatible.
kubeflow-pytorchjob 0.1.3 has requirement kubernetes>=10.0.1, but you'll have kubernetes 9.0.0 which is incompatible.

After a successful installation, the `dsl-compile` command should be available. Run the following command to verify the installation:

In [2]:
!which dsl-compile

/opt/conda/bin/dsl-compile
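
The same check can be done from Python, which is convenient inside a notebook. The following is a minimal sketch that uses only the standard library; the helper name `check_kfp_install` is hypothetical and not part of the SDK. It reports whether the `kfp` package is importable and where `dsl-compile` is found on `PATH`.

```python
import importlib.util
import shutil


def check_kfp_install():
    """Return (sdk_importable, dsl_compile_path) for the current environment."""
    # find_spec returns None when the top-level package is not installed.
    sdk_importable = importlib.util.find_spec("kfp") is not None
    # shutil.which returns None when the executable is not on PATH.
    dsl_compile_path = shutil.which("dsl-compile")
    return sdk_importable, dsl_compile_path


sdk_ok, cli_path = check_kfp_install()
print("kfp importable:", sdk_ok)
print("dsl-compile found at:", cli_path)
```

If the package is importable but `dsl-compile` is not found, one possible cause is that the `--user` install placed the script in a directory that is not on `PATH`.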


> Note: Read the official documentation to understand pipeline concepts before you move forward: [Introduction to Pipelines SDK](https://www.kubeflow.org/docs/pipelines/sdk/sdk-overview/)

# Build a Simple Pipeline

In this example, we calculate the sum of four numbers: a, b, c, and d.

1. Assume we have a Python image to use. Each addition step accepts two arguments and returns their sum.

2. The sum of a and b is added to the sum of c and d to produce the final result. In total, there are three addition steps, followed by an echo step that prints the result.
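
Before involving any containers, the arithmetic can be sketched in plain Python. This is only an illustration of the data flow, not pipeline code; the function names `add_locally` and `calculate_sum_locally` are hypothetical, and the default values are the ones the pipeline in this guide uses.

```python
def add_locally(a, b):
    """Plain-Python stand-in for one addition step."""
    return a + b


def calculate_sum_locally(a=7, b=10, c=4, d=7):
    """Mirror the data flow: two independent sums, then a final sum."""
    sum1 = add_locally(a, b)        # first addition step
    sum2 = add_locally(c, d)        # second addition step, independent of the first
    return add_locally(sum1, sum2)  # third addition step consumes both results


result = calculate_sum_locally()
print("Result: {}".format(result))  # prints "Result: 28"
```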

# 1. Create a container image for each component

This step assumes that you have already created a program that performs the task required in a particular step of your ML workflow. For example, if the task is to train an ML model, you must have a program that does the training.

Your component can create `outputs` that downstream components consume as `inputs`. These dependencies define the pipeline's Directed Acyclic Graph (DAG).


> In this case, we use a Python base image to do the calculation and skip building our own image.

# 2. Create a Python function to wrap your component

Define a Python function to describe the interactions with the Docker container image that contains your pipeline component.

To keep this example simple, the calculation is passed to the container as an inline Python command. Ideally, you would build a new container image for each code change.

In [3]:
import kfp
from kfp import dsl

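def add_two_numbers(a, b):
    """Return a step that writes a + b to a file exposed as the 'data' output."""
    return dsl.ContainerOp(
        name='calculate_sum',
        image='python:3.6.8',
        command=['python', '-c'],
        # The inline program writes the sum to /tmp/results.txt inside the container.
        arguments=['with open("/tmp/results.txt", "a") as file: file.write(str({} + {}))'.format(a, b)],
        # file_outputs maps an output name to the file that holds its value.
        file_outputs={
            'data': '/tmp/results.txt',
        }
    )

def echo_op(text):
    """Return a step that prints the given text."""
    return dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "Result: {}"'.format(text)]
    )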
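
To see what the addition step does inside its container, the same one-line program can be run locally. The sketch below is an approximation under stated assumptions: it uses the local Python interpreter instead of the `python:3.6.8` image, and it writes to a temporary directory rather than `/tmp/results.txt`. The helper name `run_add_snippet` is hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def run_add_snippet(a, b):
    """Run the component's one-line program locally and return the file contents."""
    with tempfile.TemporaryDirectory() as workdir:
        results_path = os.path.join(workdir, "results.txt")
        # Same program as the component, with the output path swapped for a temp file.
        snippet = 'with open({!r}, "a") as file: file.write(str({} + {}))'.format(
            results_path, a, b)
        subprocess.run([sys.executable, "-c", snippet], check=True)
        with open(results_path) as f:
            return f.read()


output = run_add_snippet(7, 10)
print(output)  # prints "17"
```

The file contents are what a downstream step would receive through the `data` output.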

# 3. Define your pipeline as a Python function

Describe each pipeline as a Python function.

In [4]:
@dsl.pipeline(
  name='Calculate sum pipeline',
  description='Calculate sum of numbers and prints the result.'
)
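def calculate_sum(
    a=7,
    b=10,
    c=4,
    d=7
):
    """A four-step pipeline whose first two steps run in parallel."""

    # These two steps have no dependency on each other.
    sum1 = add_two_numbers(a, b)
    sum2 = add_two_numbers(c, d)
    # Consuming both outputs makes this step wait for sum1 and sum2.
    # Named `total` to avoid shadowing the built-in `sum`.
    total = add_two_numbers(sum1.output, sum2.output)

    echo_task = echo_op(total.output)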
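
The dependencies implied by the pipeline function above form a small DAG. The following sketch groups the steps into levels that could run in parallel. It is a plain-Python illustration only, not how Kubeflow Pipelines schedules work, and the step labels and the helper `execution_levels` are hypothetical.

```python
# Each step maps to the steps whose outputs it consumes (hypothetical labels).
dependencies = {
    "sum1": [],
    "sum2": [],
    "final_sum": ["sum1", "sum2"],
    "echo": ["final_sum"],
}


def execution_levels(deps):
    """Group steps into levels; steps in the same level have no pending inputs."""
    remaining = {step: set(upstream) for step, upstream in deps.items()}
    levels = []
    while remaining:
        ready = sorted(step for step, upstream in remaining.items() if not upstream)
        if not ready:
            raise ValueError("cycle detected: the graph is not a DAG")
        levels.append(ready)
        for step in ready:
            del remaining[step]
        # Completed steps no longer block their downstream steps.
        for upstream in remaining.values():
            upstream.difference_update(ready)
    return levels


levels = execution_levels(dependencies)
print(levels)  # prints [['sum1', 'sum2'], ['final_sum'], ['echo']]
```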

# 4. Compile the pipeline

Compile the pipeline to generate a compressed YAML definition of the pipeline. The Kubeflow Pipelines service converts the static configuration into a set of Kubernetes resources for execution.

There are two ways to compile the pipeline: the Kubeflow Pipelines Python SDK or the command-line interface (CLI).

Option 1 - Python SDK
```
kfp.compiler.Compiler.compile()
```

In [5]:
kfp.compiler.Compiler().compile(calculate_sum, 'calculate-sum-pipeline.zip')

Option 2 - CLI
```
dsl-compile --py [path/to/python/file] --output my-pipeline.zip
```

We are using Option 1 above to highlight the Kubeflow Pipelines Python SDK.
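
One way to sanity-check the compiled package is to list what the archive contains. This is a minimal sketch using the standard `zipfile` module; the helper name `list_package_entries` is hypothetical, and the sketch assumes the package was written to `calculate-sum-pipeline.zip` in the current directory, as in the cell above.

```python
import os
import zipfile


def list_package_entries(package_path):
    """Return the names of the files stored in a compiled pipeline archive."""
    with zipfile.ZipFile(package_path) as archive:
        return archive.namelist()


package_path = "calculate-sum-pipeline.zip"  # assumed output path from the compile step
if os.path.exists(package_path):
    print(list_package_entries(package_path))
else:
    print("{} not found; compile the pipeline first".format(package_path))
```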

# 5. Deploy pipeline

There are two ways to deploy the pipeline: upload the generated `.zip` file through the Kubeflow Pipelines UI, or use the Kubeflow Pipelines SDK to deploy it.

We only show SDK usage here.

In [6]:
client = kfp.Client()
aws_experiment = client.create_experiment(name='aws')
my_run = client.run_pipeline(aws_experiment.id, 'calculate-sum-pipeline', 
  'calculate-sum-pipeline.zip')

