# Hello world

This notebook shows, in a very simple way, how to write Python code that executes on the notebook server, and then how to turn that code into Kubeflow components and a Kubeflow pipeline.

To start off, let's just print "hello world" in this notebook:

In [1]:
print("Hello World")

Hello World


We can also print this differently using a shell command:

In [2]:
!echo "Hello World"

Hello World


### Connecting to Kubeflow

To automate things on Kubeflow, we need to know how to connect to the Kubeflow server and execute actions there. The SDK for Kubeflow Pipelines is called `kfp`, and it contains a client object that can execute a number of actions. When you run a Python notebook on the Kubeflow server itself, you only have to import `kfp` and create the client, and it will connect to the cluster it runs on. When you are not running on a notebook pod, which is the case with AI Platform Notebooks or a local deployment, you have to specify the host to connect to explicitly, as I've done here.

In [3]:
import kfp

host='http://localhost:8080'
client = kfp.Client(host=host)
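If the same notebook has to run both on a notebook pod and outside the cluster, the host can be made configurable instead of hard-coded. The following is a minimal sketch under one assumption: `KFP_HOST` is a hypothetical environment variable name chosen for this example, and `resolve_kfp_host` is a helper invented here, not part of the SDK.

```python
import os
from typing import Optional


def resolve_kfp_host(env_var: str = "KFP_HOST") -> Optional[str]:
    """Return the host to pass to kfp.Client, or None when nothing is configured.

    KFP_HOST is a hypothetical variable name used only in this sketch.
    """
    host = os.environ.get(env_var, "").strip()
    # An empty or missing value becomes None, which leaves the choice to the SDK.
    return host or None


# Outside the cluster: set KFP_HOST=http://localhost:8080 before starting Jupyter.
# On a notebook pod: leave it unset, so that host=None is passed.
# client = kfp.Client(host=resolve_kfp_host())
```

Keeping the host out of the notebook means the cells below can stay identical in both environments.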

In [4]:
client.list_experiments()

{'experiments': [{'created_at': datetime.datetime(2020, 9, 24, 11, 30, 17, tzinfo=tzutc()),
                  'description': 'All runs created without specifying an '
                                 'experiment will be grouped here.',
                  'id': '4f2c1759-d018-4e9f-b893-b4043a821413',
                  'name': 'Default',
                  'resource_references': None,
                  'storage_state': 'STORAGESTATE_AVAILABLE'}],
 'next_page_token': None,
 'total_size': 1}
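The response is fairly verbose, so it can help to reduce it to the fields you need. The sketch below assumes that the response exposes `experiments` as an attribute holding objects with `name` and `id`, as the printed output suggests. `experiment_ids_by_name` is a hypothetical helper, and the stand-in object only mimics the shape of the output above.

```python
from types import SimpleNamespace


def experiment_ids_by_name(response) -> dict:
    """Map experiment names to ids; tolerate a response without experiments."""
    experiments = getattr(response, "experiments", None) or []
    return {exp.name: exp.id for exp in experiments}


# Stand-in for the real response, using the values printed above.
fake_response = SimpleNamespace(
    experiments=[
        SimpleNamespace(name="Default", id="4f2c1759-d018-4e9f-b893-b4043a821413")
    ]
)
print(experiment_ids_by_name(fake_response))

# Against a live server, the call would presumably look like:
# experiment_ids_by_name(client.list_experiments())
```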

### The first pipeline

In the first pipeline we'll turn the hello world statement into a function and add the minimal amount of code needed to run it on Kubeflow. This demonstrates how to build a pipeline, and it also shows that a Python function can run directly on Kubeflow with one simple decorator.

In [5]:
# This import gives us access to the domain specific language to create pipelines
from kfp import dsl
# This import lets us turn a Python function into a ContainerOp (a Kubeflow task).
from kfp.components import func_to_container_op


@func_to_container_op
def hello_world():
    print("Hello world")


# This decorator specifies that this function defines the pipeline. The function can have a number of arguments,
# all of which will be added to the UI of kubeflow as well, so that you can specify the value of those arguments in the UI.
# In this case no arguments are given for simplicity.
@dsl.pipeline(
    name='Hello World pipeline',  # The name for the pipeline
    description='Just prints "Hello World".'  # The description of the pipeline
)
def hello_world_pipeline():
    # Here it defines one task, composed of our python function.
    echo_task = hello_world()

After running the cell, nothing visible happens. This is because we've only defined in Python code what the pipeline should look like; we haven't yet specified what to do with it. One of the things we can do is compile the pipeline into YAML, which we can then upload to Kubeflow.

In [6]:
kfp.compiler.Compiler().compile(hello_world_pipeline, 'hello_world_pipeline.yaml')

In the file browser on the left, the YAML file should now appear. This YAML file is what Kubeflow uses to define the workflow. Open it and see whether you recognize the elements of the pipeline.
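Reading the whole file can be overwhelming, so a small summary helps to spot the pipeline's pieces. The sketch below assumes the compiled file is a workflow manifest with a list under `spec.templates`, where a task template carries a `container.image` entry. `summarize_templates` is a hypothetical helper, and the sample dictionary is hand-written for illustration (its names and image are made up, not copied from a real compile).

```python
def summarize_templates(workflow: dict) -> list:
    """Return (template name, container image or None) for each template."""
    templates = workflow.get("spec", {}).get("templates", [])
    return [
        (template.get("name"), (template.get("container") or {}).get("image"))
        for template in templates
    ]


# Hand-written stand-in for a parsed YAML file; all values are illustrative.
sample_workflow = {
    "kind": "Workflow",
    "spec": {
        "entrypoint": "hello-world-pipeline",
        "templates": [
            {"name": "hello-world", "container": {"image": "python:3.7"}},
            {"name": "hello-world-pipeline", "dag": {"tasks": []}},
        ],
    },
}

for name, image in summarize_templates(sample_workflow):
    print(name, "->", image or "(no container)")

# With PyYAML installed, the real file could be loaded like this:
# import yaml
# with open("hello_world_pipeline.yaml") as f:
#     print(summarize_templates(yaml.safe_load(f)))
```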

Another thing we can do with the SDK is trigger this pipeline on Kubeflow directly, which saves the steps of manually uploading it and triggering it from the UI. When we do that, the experiment link and the run link appear for inspection.

In [7]:
client.create_run_from_pipeline_func(hello_world_pipeline, arguments={})

RunPipelineResult(run_id=79444b0d-3b71-4238-b239-7bbd4124c007)
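The call returns as soon as the run is submitted, not when it finishes. If a later cell depends on the outcome, you can block until the run completes. This is a sketch assuming the v1 SDK, where the returned result object offers `wait_for_run_completion` and the detail it returns carries the status under `run.status`. `wait_and_report` is a hypothetical helper name.

```python
def wait_and_report(run_result, timeout_seconds: int = 600) -> str:
    """Block until the run finishes (or the timeout expires) and return its status."""
    # Assumption: the detail object exposes the final state as detail.run.status.
    detail = run_result.wait_for_run_completion(timeout=timeout_seconds)
    return detail.run.status


# Usage against a live server (not executed here):
# result = client.create_run_from_pipeline_func(hello_world_pipeline, arguments={})
# print(wait_and_report(result, timeout_seconds=300))
```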

Next, familiarize yourself with the UI: where to find the task logs and the task output. Most of it is self-explanatory. You'll notice that even though we triggered a pipeline run on Kubeflow, the pipeline definition is not available in the "Pipelines" section.

### Including a container more explicitly

In Kubeflow, everything that runs is a container, even if you just run a Python function as above (check the YAML to find out which Docker image it used). We can pass parameters to containers. Even for something as simple as printing a "Hello World" message, a container image that can run the command has to be downloaded. In this case we will use the `library/bash:4.4.23` image, which runs one shell command to echo hello world. Here is what the pipeline then looks like:

In [8]:
# Instead of the function decorator, we make an explicit reference to a container image using the "kfp.dsl" module.
def echo_op():
    return dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "hello world"']
    )


@dsl.pipeline(
    name='Hello World pipeline 2',
    description='Just prints "Hello World" through the shell.'
)
def hello_world_pipeline_2():
    echo_task = echo_op()

In [9]:
# Compiling is not necessary for running this on Kubeflow; it only generates the YAML so you can see what's in there.
kfp.compiler.Compiler().compile(hello_world_pipeline_2, 'hello_world_pipeline_2.yaml')
client.create_run_from_pipeline_func(hello_world_pipeline_2, arguments={})

RunPipelineResult(run_id=cf73cb18-7d04-4349-aacb-fa227d5a736f)

### Pipeline parameters

We'll revisit the example and add some pipeline parameters. I advise always giving parameters default values, so that the pipeline can be triggered without overriding anything and doesn't crash. In the UI, the defaults also prepopulate the input boxes.

In [10]:
# Adding one parameter to this step. This is typically how you pass parameters into containers
def echo_op_with_params(hello_to_who):
    return dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=[f'echo "hello {hello_to_who}"']
    )


@dsl.pipeline(
    name='Parameters pipeline',
    description='Adds parameters to the basic pipeline.'
)
def hello_world_pipeline_3(hello_to_who="Sherlock Holmes"):
    # Passing the parameter to the step
    echo_task = echo_op_with_params(hello_to_who)

In [11]:
client.create_run_from_pipeline_func(hello_world_pipeline_3, arguments={"hello_to_who": "Watson"})

RunPipelineResult(run_id=51277e5d-1e8e-4c41-adc2-43bfe5437d94)
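In this run, "Watson" replaces the default "Sherlock Holmes". The relationship between defaults and overrides can be illustrated with plain Python. This sketch is not how the SDK resolves parameters internally; `pipeline_signature_demo` is a stand-in with the same signature as the pipeline function, and `effective_arguments` is a hypothetical helper.

```python
import inspect


def pipeline_signature_demo(hello_to_who="Sherlock Holmes"):
    """Stand-in with the same signature as the pipeline function above."""


def effective_arguments(pipeline_func, overrides=None) -> dict:
    """Merge the function's default values with explicitly supplied overrides."""
    overrides = overrides or {}
    parameters = inspect.signature(pipeline_func).parameters
    unknown = set(overrides) - set(parameters)
    if unknown:
        raise KeyError(f"Unknown pipeline parameters: {sorted(unknown)}")
    defaults = {
        name: parameter.default
        for name, parameter in parameters.items()
        if parameter.default is not inspect.Parameter.empty
    }
    # Overrides win over defaults when both are present.
    return {**defaults, **overrides}


print(effective_arguments(pipeline_signature_demo))
# {'hello_to_who': 'Sherlock Holmes'}
print(effective_arguments(pipeline_signature_demo, {"hello_to_who": "Watson"}))
# {'hello_to_who': 'Watson'}
```

A check like the `unknown` test can catch a misspelled parameter name before a run is submitted.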

Next, it is also important to show how to upload pipelines to Kubeflow. As you may have noticed, the earlier pipelines executed fine on Kubeflow, but their definitions are not available in the "Pipelines" section, for example for a manual trigger. To make that possible, we first compile the pipeline and then upload the YAML to Kubeflow:

In [12]:
kfp.compiler.Compiler().compile(hello_world_pipeline_3, 'hello_world_pipeline_3.yaml')

Upload the `hello_world_pipeline_3.yaml` file to Kubeflow (in the "Pipelines" section, use the "Upload" button at the top right). After it is uploaded, you can do a "Create Run" for it, which triggers the pipeline with the parameters you supply. Have a look around the UI to familiarize yourself further.
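The upload can also be done from the notebook instead of the UI. The sketch below assumes the v1 SDK client, which offers an `upload_pipeline` method taking `pipeline_package_path` and `pipeline_name`. `upload_compiled_pipeline` is a hypothetical wrapper that only adds a check that the compiled file exists, and the pipeline name in the usage comment is made up.

```python
import os


def upload_compiled_pipeline(client, package_path: str, pipeline_name: str):
    """Upload a compiled pipeline file, failing early if it was never compiled."""
    if not os.path.isfile(package_path):
        raise FileNotFoundError(
            f"{package_path} not found; compile the pipeline first."
        )
    return client.upload_pipeline(
        pipeline_package_path=package_path,
        pipeline_name=pipeline_name,
    )


# Usage against a live server (not executed here):
# upload_compiled_pipeline(client, "hello_world_pipeline_3.yaml", "Parameters pipeline")
```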

Play around a bit more and get more familiar with the UI; you now know the basics of Kubeflow pipelines.

End of first tutorial.