# An Introduction to Kubeflow Pipelines SDK

## Imports

I like to put all my imports at the top of the notebook.

In [1]:
import time

import kfp
from kfp import dsl
from kfp import compiler

## Metadata

Fill out the metadata for the run, pipeline and experiment!

1. `namespace`: Your Kubeflow namespace.
1. `experiment_name`: Your pipelines are run in an experiment. Give your experiment a unique and descriptive name.
1. `experiment_description`: Provide a short description; it will be a gift to your future self.
1. `pipeline_name`: Name your pipeline. The name must be unique, and it should be descriptive.
1. `pipeline_description`: The more metadata the better!
1. `pipeline_package_path`: The location of the zipped YAML file that contains the pipeline definition. It is set later in the notebook, just before the pipeline is compiled.
1. `run_name`: The run's name is generated automatically by concatenating the `experiment_name`, the `pipeline_name` and the current date and time.

In [13]:
namespace = "bryanpaget"

experiment_name = "happy-little-experiment"
experiment_description = "My happiest experiment to date."

pipeline_name = "pipeline-312342"
pipeline_description = "This is what I'm doing!"

run_name = f"{experiment_name}-{pipeline_name}-{time.strftime('%Y%m%d-%H%M%S')}"

## Pipeline Parameters

This is where you populate a dictionary with your pipeline's parameters. The keys must match the argument names of the pipeline function. For this simple example we just need a dictionary of five integers.

In [9]:
pipeline_parameters = {
    'a': 5,
    'b': 5,
    'c': 8,
    'd': 10,
    'e': 18
}
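
Because the dictionary keys have to line up with the pipeline function's arguments, it can help to check them before submitting a run. The following is a minimal sketch using only the standard library; `check_parameters` and `example_pipeline` are hypothetical names introduced for illustration and are not part of the KFP SDK.

```python
import inspect


def check_parameters(pipeline_func, parameters):
    """Raise ValueError if the parameter keys do not match the function's arguments.

    Hypothetical helper: every argument is treated as required, so arguments
    with default values would also be reported as missing.
    """
    expected = set(inspect.signature(pipeline_func).parameters)
    provided = set(parameters)
    missing = sorted(expected - provided)
    unexpected = sorted(provided - expected)
    if missing or unexpected:
        raise ValueError(f"missing: {missing}, unexpected: {unexpected}")


# Stand-in with the same argument names as the pipeline function in this notebook.
def example_pipeline(a, b, c, d, e):
    pass


example_parameters = {'a': 5, 'b': 5, 'c': 8, 'd': 10, 'e': 18}

# Passes silently when the keys and the argument names agree.
check_parameters(example_pipeline, example_parameters)
```

Catching a mismatch locally is cheaper than finding out after the run has been submitted.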

## Function Operator

This is a simple operator for Kubeflow. For the next demo I'll do something more interesting. In the meantime, here is the documentation on writing your own components: https://www.kubeflow.org/docs/components/pipelines/sdk/component-development/#writing-your-component-definition-file


In [10]:
def average_op(*numbers):
    """
    Factory for average ContainerOps: accepts an arbitrary number of input numbers
    and returns a ContainerOp that passes those numbers to the underlying Docker
    image for averaging.

    Arguments passed to dsl.ContainerOp:

        name (str): The step name shown in the pipeline viewer.
        image (str): The container image that KFP runs to do the work.
        command (list): The command to run inside the container.
        arguments (sequence): Each number is passed as a separate command-line
                              argument. Note that these arguments are serialized
                              to strings.
        file_outputs (dict): Maps the output name 'data' to ./out.txt, the file
                             the container is expected to write. KFP reads this
                             file and makes its contents available as the step's output.

    Returns: a dsl.ContainerOp whose output is collected from ./out.txt inside
             the container.

    """

    if len(numbers) < 1:
        raise ValueError("You must specify at least one number to average.")

    return dsl.ContainerOp(
        name="average",
        image="k8scc01covidacr.azurecr.io/kfp-components/average:v1",
        command=["python", "average.py"],
        arguments=numbers,
        file_outputs={'data': './out.txt'}
    )
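
The `average.py` script inside the image is not shown in this notebook. As a rough sketch of the contract that the operator relies on (string arguments in, a result written to `out.txt`), something like the following would fit. The function name `write_average` and its body are assumptions made for illustration, not the actual contents of the image.

```python
import tempfile
from pathlib import Path


def write_average(args, out_path):
    """Average command-line style string arguments and write the result to out_path.

    Hypothetical stand-in for the script in the container image.
    """
    # KFP serializes arguments to strings, so convert them back to numbers.
    numbers = [float(arg) for arg in args]
    if not numbers:
        raise ValueError("At least one number is required.")
    result = sum(numbers) / len(numbers)
    # KFP collects the step output from this file (see file_outputs).
    Path(out_path).write_text(str(result))
    return result


# Inside a container the arguments would come from sys.argv[1:];
# a temporary directory is used here so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "out.txt"
    write_average(["5", "5", "8"], out_file)
    contents = out_file.read_text()
```

Here `contents` holds the text that KFP would hand to any downstream step as `.output`.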

## Pipeline

This is where the pipeline is created using the `@dsl.pipeline` decorator.

In [11]:
@dsl.pipeline(name=pipeline_name, description=pipeline_description)
def pipeline_func(a, b, c, d, e):

    avg_1 = average_op(a, b, c)
    avg_2 = average_op(d, e)

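    # Outputs of earlier steps can be passed as inputs to later steps.
    average_result_overall = average_op(avg_1.output, avg_2.output)

    # This runs at compile time, so it prints the ContainerOp definition,
    # not the computed average.
    print(average_result_overall)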
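
To see what the three steps should produce, the same arithmetic can be reproduced in plain Python. This is only a local sketch: it assumes the container computes an ordinary arithmetic mean, and the `average` function below is a stand-in, not the component itself.

```python
def average(*numbers):
    """Arithmetic mean of the given numbers (stand-in for the container step)."""
    return sum(numbers) / len(numbers)


params = {'a': 5, 'b': 5, 'c': 8, 'd': 10, 'e': 18}

avg_1 = average(params['a'], params['b'], params['c'])  # (5 + 5 + 8) / 3
avg_2 = average(params['d'], params['e'])               # (10 + 18) / 2
overall = average(avg_1, avg_2)                         # mean of the two means

# The mean of all five numbers taken together, for comparison.
flat_mean = average(*params.values())
```

With these parameters `avg_1` is 6.0, `avg_2` is 14.0 and `overall` is 10.0. Note that this is a mean of means: averaging all five numbers directly gives 9.2, because the two groups have different sizes.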

## Publish Pipeline and Run Pipeline in an Experiment

First a KFP client is created, and the client creates the experiment. The pipeline is then compiled to a package file, uploaded, and run inside that experiment.

In [12]:
pipeline_package_path = f"{run_name}.yaml.zip"

client = kfp.Client()

experiment = client.create_experiment(
    name=experiment_name,
    description=experiment_description,
    namespace=namespace)

compiler.Compiler().compile(
    pipeline_func=pipeline_func,
    package_path=pipeline_package_path,
    type_check=True)

response = client.upload_pipeline(pipeline_package_path, pipeline_name=pipeline_name)

run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name=run_name,
    pipeline_package_path=pipeline_package_path,
    params=pipeline_parameters
)

{'ContainerOp': {'is_exit_handler': False, 'human_name': 'average', 'display_name': None, 'name': 'average-3', 'node_selector': {}, 'volumes': [], 'tolerations': [], 'affinity': {}, 'pod_annotations': {}, 'pod_labels': {}, 'num_retries': 0, 'retry_policy': None, 'backoff_factor': None, 'backoff_duration': None, 'backoff_max_duration': None, 'timeout': 0, 'init_containers': [], 'sidecars': [], 'loop_args': None, '_component_spec_inputs_with_pipeline_params': [], '_inputs': [], 'dependent_names': [], 'enable_caching': True, 'attrs_with_pipelineparams': ['node_selector', 'volumes', 'pod_annotations', 'pod_labels', 'num_retries', 'init_containers', 'sidecars', 'tolerations', '_container', 'artifact_arguments', '_parameter_arguments'], '_is_v2': False, '_container': {'args': ['{{pipelineparam:op=average;name=data}}',
          '{{pipelineparam:op=average-2;name=data}}'],
 'command': ['python', 'average.py'],
 'env': None,
 'env_from': None,
 'image': 'k8scc01covidacr.azurecr.io/kfp-componen