## Kubeflow Pipeline Example

Kubeflow Pipelines allow engineers to create a repeatable process for running different parts of a machine learning model in parallel and on appropriate hardware.

Getting a pipeline up and running is not easy. After a machine learning model is working, each of these steps must be completed to convert it into a pipeline: 
* Create Docker images for independent parts of the model
* Write Kubeflow Pipeline orchestration code using the KFP DSL Python API
* Compile the Kubeflow Pipeline code, which references the Docker images, into a .tar.gz file
* Upload compiled pipeline .tar.gz to Kubeflow
* Create an Experiment in Kubeflow to organize Pipeline runs
* Create a Run and then start it

All the steps listed above can be done with a single click by using Kale. This notebook presents a minimal example to show how Kale can convert a Notebook to a pipeline when annotations have been added to the code blocks in the Jupyter Notebook file.

Variables are tagged for transformation into pipeline run parameters so that a pipeline can be run with different inputs. Code cells can be divided into independent pipeline steps. Steps that depend on prior steps are tagged so the pipeline is created with the proper ordering. Imports and common functions are marked for inclusion with each independent pipeline step.

You can look at the annotations by inspecting the Notebook source, or by installing the Kale JupyterLab extension (see [github.com/kubeflow-kale/jupyterlab-kubeflow-kale](https://github.com/kubeflow-kale/jupyterlab-kubeflow-kale)).




Click the Kale Deployment Panel icon ![kubeflow-favicon](https://www.kubeflow.org/favicons/favicon-32x32.png) in the left navigation bar.

The **Kale Deployment Panel** will appear. Slide the **Enable** slider to the right.

![enable-slider](https://aegirio.endpoints.kubeflow-pipeline-trials.cloud.goog/notebook/gadgeteer/myfirstnbserver/files/enable-slider.png?_xsrf=2%7Ced26d3de%7Cfad8d169787d218707e48e87fc3ace82%7C1589987705)

The Pipeline name and Experiment name are now shown in the Kale Deployment Panel

Annotations in the Jupyter Notebook file that were previously hidden are now displayed. Each annotation is stored as an entry in the `tags` list inside the `metadata` element of a code cell:
```
"metadata": {
    "tags": [
        "imports"
    ]
}
```
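
To inspect these annotations without the Kale extension, the notebook file can be read as plain JSON. The following is a minimal sketch using only the standard library. The inline notebook content is hypothetical example data standing in for a real `.ipynb` file, and `list_cell_tags` is an illustrative helper, not part of Kale.

```python
import json

# Hypothetical stand-in for the contents of an .ipynb file.
NOTEBOOK_JSON = """
{
  "cells": [
    {"cell_type": "code", "metadata": {"tags": ["imports"]},
     "source": ["import random"]},
    {"cell_type": "code", "metadata": {"tags": ["pipeline-parameters"]},
     "source": ["CANDIES = 37"]},
    {"cell_type": "markdown", "metadata": {}, "source": ["Some text"]}
  ]
}
"""


def list_cell_tags(notebook):
    """Return (first source line, tags) for every code cell in a notebook dict."""
    result = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # only code cells carry pipeline annotations
        tags = cell.get("metadata", {}).get("tags", [])
        source = cell.get("source", [])
        if isinstance(source, str):
            # The notebook format allows source as one string or a list of lines.
            source = source.splitlines()
        first_line = source[0].strip() if source else ""
        result.append((first_line, tags))
    return result


for first_line, tags in list_cell_tags(json.loads(NOTEBOOK_JSON)):
    print(f"{tags} -> {first_line}")
```

For a real notebook, the same helper could be applied to the result of `json.load` on the opened `.ipynb` file.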

In [None]:
import random

Click the pencil icon on the right side of the cell above to open a window for editing the cell metadata. Click the **X** in the upper left to close the metadata edit window.

![close-metadata-edit](https://aegirio.endpoints.kubeflow-pipeline-trials.cloud.goog/notebook/gadgeteer/myfirstnbserver/files/images/metadata-edit-close.png?_xsrf=2%7Ced26d3de%7Cfad8d169787d218707e48e87fc3ace82%7C1589987705)

In [None]:
CANDIES = 37

Variables annotated as pipeline-parameters will be available in the **Create Run** UI.

![run-parameter](https://aegirio.endpoints.kubeflow-pipeline-trials.cloud.goog/notebook/gadgeteer/myfirstnbserver/files/images/run-parameter.png?_xsrf=2%7Ced26d3de%7Cfad8d169787d218707e48e87fc3ace82%7C1589987705)
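
One way to picture how a tagged cell becomes a set of run parameters is to parse its assignments into name and default-value pairs. The sketch below does this with the standard `ast` module. It is an illustration under the assumption that parameter cells contain only simple `NAME = literal` assignments; `extract_parameters` is a hypothetical helper and does not claim to reproduce Kale's own parsing.

```python
import ast


def extract_parameters(cell_source):
    """Return {name: default} for simple `NAME = literal` assignments."""
    params = {}
    for node in ast.parse(cell_source).body:
        is_simple_assignment = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if is_simple_assignment:
            # literal_eval accepts an AST node and only evaluates literals.
            params[node.targets[0].id] = ast.literal_eval(node.value)
    return params


print(extract_parameters("CANDIES = 37"))
```

Each extracted name can then be offered as an input field with the notebook value as its default.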

Any variable that is used, not just pipeline parameters, is persisted on the local file system so it is available to any dependent pipeline steps.
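
The idea of persisting variables between steps can be sketched with `pickle` and a shared directory. This is only an illustration: the file layout, the `.pkl` naming, and the helpers `save_variables` and `load_variables` are assumptions made for the example, and the serialization Kale actually performs may differ.

```python
import pickle
import tempfile
from pathlib import Path


def save_variables(directory, variables):
    """Write each variable to its own file in the shared directory."""
    for name, value in variables.items():
        with open(Path(directory) / f"{name}.pkl", "wb") as f:
            pickle.dump(value, f)


def load_variables(directory, names):
    """Read the named variables back from the shared directory."""
    loaded = {}
    for name in names:
        with open(Path(directory) / f"{name}.pkl", "rb") as f:
            loaded[name] = pickle.load(f)
    return loaded


# A temporary directory stands in for the volume shared between steps.
with tempfile.TemporaryDirectory() as shared_dir:
    # One step finishes and saves the values that later steps need.
    save_variables(shared_dir, {"CANDIES": 37})
    # A later step, running in a separate process, reloads them.
    restored = load_variables(shared_dir, ["CANDIES"])

print(restored)
```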

In [None]:
def get_handful(kidname, left):
    if left == 0:
        print("{0} Got no candy!".format(kidname))
        return 0
    c = random.randint(1, left)
    print("{0} got {1} candies!".format(kidname, c))
    return c

Code cells annotated as imports and functions will be prepended to each pipeline step.
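
Because each step runs in its own container, it cannot rely on cells that were executed earlier in the notebook. The sketch below illustrates the prepending idea by assembling a step's source from the imports cell, the functions cell, and the step body, then executing it in a fresh namespace. `build_step_source` and `run_step` are hypothetical helpers written for this example; the code Kale generates is not shown here.

```python
# Source of the cell tagged as imports.
IMPORTS = "import random\n"

# Source of the cell tagged as functions.
FUNCTIONS = '''
def get_handful(kidname, left):
    if left == 0:
        print("{0} Got no candy!".format(kidname))
        return 0
    c = random.randint(1, left)
    print("{0} got {1} candies!".format(kidname, c))
    return c
'''


def build_step_source(step_body):
    """Prepend the shared imports and functions to one step's code."""
    return "\n".join([IMPORTS, FUNCTIONS, step_body])


def run_step(step_body, inputs):
    """Execute a generated step in a fresh namespace and return that namespace."""
    namespace = dict(inputs)  # values handed over from earlier steps
    exec(build_step_source(step_body), namespace)
    return namespace


result = run_step('kid1 = get_handful("Jack", CANDIES)', {"CANDIES": 37})
print(result["kid1"])
```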

In [None]:
print("Let's put %s candies in a bag and have three kids get a handful" % CANDIES)

This is the first actual pipeline step. When the pipeline is run, this step will be executed by creating a container from the image specified in **Advanced Setting** in the Kale Deployment Panel.

This step does not have any dependencies in the notebook code, but it does reference variables that are used in other steps. The values of these shared variables are persisted during pipeline runs by the automatically created `kale-marshal-volume` step.


In [None]:
# kid1 gets a handful, without looking in the bag!
kid1 = get_handful("Jack", CANDIES)

This pipeline step is marked as dependent on the **sack** step. A container will be created only after the sack step completes successfully.

In [None]:
kid2 = get_handful("Jill", CANDIES - kid1)

This step is marked as dependent on the **kid1** step which can be seen by hovering over the colored dot next to its name.

Notice that we did not do anything to the variable `kid1` to make it available in this step.

In [None]:
kid3 = get_handful("Paul", CANDIES - kid1 - kid2)

print("There are {0} candies still in the sack".format(CANDIES - kid1 - kid2 - kid3))
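
The dependency tags described above determine the order in which steps can run. As a rough illustration, the sketch below orders the steps with `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later). The step name `kid3` and its dependency on `kid2` are assumptions made for this example; only the **sack** and **kid1** step names appear in the text.

```python
from graphlib import TopologicalSorter

# Map each step to the set of steps it depends on.
# "kid3" and its dependency are assumed for illustration.
DEPENDENCIES = {
    "sack": set(),
    "kid1": {"sack"},
    "kid2": {"kid1"},
    "kid3": {"kid2"},
}

# static_order yields every step after all of its dependencies.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(order)  # ['sack', 'kid1', 'kid2', 'kid3']
```

Steps with no dependency between them would have no fixed relative order, which is what allows a pipeline to run them in parallel.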

### Create Pipeline and Run
A new Pipeline name and an optional description can be entered in the Kale Deployment Panel.

Experiments are used by Kubeflow to organize different Pipeline runs. You can select an existing Experiment, or click **+New Experiment** to create a new one.

A unique name for this Pipeline run will be automatically created.

Click the **COMPILE AND RUN** button in the Kale Deployment Panel.