# Kubeflow Lightweight Components

We will use this notebook to explore lightweight components, which are used to build Kubeflow pipelines. They let us explore Kubeflow constructs and deployment mechanisms without changing anything in our existing codebases. Once we finish with this exploratory example, we can port some of our other efforts over to Kubeflow pipelines.

First, we need to make sure the Kubeflow Pipelines SDK (`kfp`) is installed on the Jupyter instance deployed in our production Kubeflow cluster.

We will borrow and extend the [Lightweight Components](https://github.com/kubeflow/pipelines/blob/master/samples/core/lightweight_component/Lightweight%20Python%20components%20-%20basics.ipynb) example from the [Kubeflow Pipelines repo](https://github.com/kubeflow/pipelines).

In [1]:
!pip3 install kfp --upgrade

Collecting kfp
  Downloading https://files.pythonhosted.org/packages/f7/7f/00cbe74e1f2e89ca606dc1662828036613d1396e703f95d1fec538301938/kfp-0.1.25.tar.gz (76kB)
Collecting PyJWT>=1.6.4 (from kfp)
  Downloading https://files.pythonhosted.org/packages/87/8b/6a9f14b5f781697e51259d81657e6048fd31a113229cf346880bb7545565/PyJWT-1.7.1-py2.py3-none-any.whl
Collecting requests_toolbelt>=0.8.0 (from kfp)
  Downloading https://files.pythonhosted.org/packages/60/ef/7681134338fc097acef8d9b2f8abe0458e4d87559c689a8c306d0957ece5/requests_toolbelt-0.9.1-py2.py3-none-any.whl (54kB)
Collecting kfp-server-api<0.1.19,>=0.1.18 (from kfp)
  Downloading https://files.pythonhosted.org/packages/3e/24/a82ae81487bf61fb262e67167cee1843f2f70d940713c092b124c9aaa0dc/kfp-server-api-0.1.18.3.tar.gz
Collecting argo-models==2.2.1a (from kfp)
  Downloading https://files.

In [1]:
import kfp.components as comp

As the linked example points out, there are some scoping limitations we need to pay attention to, because we are using this Jupyter notebook to create a [Kubeflow container op](https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.dsl.html#kfp.dsl.ContainerOp). The function we convert must be self-contained: it cannot use code declared outside its own definition, so any imports and helper functions have to live inside the function body.
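To see why the rule exists, here is a rough, kfp-free illustration. It assumes, as a simplification, that the converter ships the function's source text to a fresh container process, and mimics that by executing source strings in an empty namespace. `run_in_fresh_namespace`, `hypot_broken`, and `hypot_fixed` are hypothetical names used only for this sketch.

```python
import math  # visible to code running in this notebook, but not to shipped source
import textwrap

# Relies on a module-level import that will not exist in the new process.
BROKEN_SRC = textwrap.dedent('''
    def hypot_broken(a: float, b: float) -> float:
        return math.sqrt(a * a + b * b)
''')

# Self-contained: the import lives inside the function body.
FIXED_SRC = textwrap.dedent('''
    def hypot_fixed(a: float, b: float) -> float:
        import math
        return math.sqrt(a * a + b * b)
''')

def run_in_fresh_namespace(src: str, func_name: str, *args):
    """Define func_name from src in an empty namespace, then call it."""
    namespace: dict = {}
    exec(src, namespace)
    return namespace[func_name](*args)

# In a namespace that already has `math`, the broken version happens to work.
notebook_ns = {'math': math}
exec(BROKEN_SRC, notebook_ns)
print('notebook:', notebook_ns['hypot_broken'](3.0, 4.0))  # 5.0

# In an empty namespace it fails, because `math` was never imported there.
broken_error = None
try:
    run_in_fresh_namespace(BROKEN_SRC, 'hypot_broken', 3.0, 4.0)
except NameError as err:
    broken_error = err
    print('fresh namespace:', err)

print('fixed:', run_in_fresh_namespace(FIXED_SRC, 'hypot_fixed', 3.0, 4.0))  # 5.0
```

The same reasoning is why `my_divmod` further down imports NumPy and defines its helper inside its own body.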

Below is a trivial example: a function that adds two `float`s. ***It is important to note that if a function operates on numbers, like the one below, its parameters _need_ type hints.*** The supported numeric types are `float`, `int`, and `bool`; everything else is passed as a string.

This is a nice example of a practical use case for type hints in Python.
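A rough illustration of why the hints matter: a converter can only see what the function signature records. The sketch below reads the annotations with `inspect.signature` and applies the rule stated above (numeric types keep their type, everything else becomes `str`). `describe_params` and `add_untyped` are hypothetical names, and this is not kfp's actual code. The last line shows what goes wrong when numbers arrive as strings.

```python
import inspect

# Types that, per the notes above, get numeric handling; others arrive as strings.
NUMERIC_TYPES = (float, int, bool)

def add(a: float, b: float) -> float:
    '''Add two floats and return a float'''
    return a + b

def add_untyped(a, b):
    return a + b

def describe_params(func) -> dict:
    """Map each parameter to the type it would be passed as (illustrative only)."""
    described = {}
    for name, param in inspect.signature(func).parameters.items():
        ann = param.annotation
        described[name] = ann.__name__ if ann in NUMERIC_TYPES else 'str'
    return described

print(describe_params(add))          # {'a': 'float', 'b': 'float'}
print(describe_params(add_untyped))  # {'a': 'str', 'b': 'str'}

# Without hints the values stay strings, so `+` concatenates instead of adding.
print(add_untyped('1.5', '2'))       # 1.52
```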

In [2]:
def add(a: float, b: float) -> float:
    '''
    Add two floats and return a float
    '''
    return a + b

In [3]:
add_op = comp.func_to_container_op(add)

In [6]:
from typing import NamedTuple

def my_divmod(dividend: float, divisor: float, output_dir: str = './') -> NamedTuple('MyDivModOutput', [('quotient', float), ('remainder', float)]):
    """
    Divide two numbers and return the quotient and the remainder.

    Also writes UI metadata and metrics files into output_dir for the Pipelines UI.
    """
    
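    # Imports and helper functions live inside the function so the component is self-contained.
    import numpy as np
    
    def divmod_helper(dividend, divisor):
        return np.divmod(dividend, divisor)
    
    (quotient, remainder) = divmod_helper(dividend, divisor)
    
    from tensorflow.python.lib.io import file_io
    import json
    
    # Output viewer metadata: asks the Pipelines UI to show TensorBoard for a sample log directory.
    metadata = {
        'outputs' : [{
            'type': 'tensorboard',
            'source': 'gs://ml-pipeline-dataset/tensorboard-train',
            }]
        }
    
    # Paths are built by concatenation, so output_dir must end with '/'.
    with open(output_dir + "mlpipeline-ui-metadata.json", 'w') as f:
        json.dump(metadata, f)
    
    # Scalar metrics that the Pipelines UI displays for the run.
    metrics = {
        'metrics': [{
            'name': 'quotient',
            'numberValue': float(quotient),
        },{
            'name': 'remainder',
            'numberValue': float(remainder),
        }]}
    
    # file_io.FileIO behaves like open() but also accepts gs:// paths.
    with file_io.FileIO(output_dir + 'mlpipeline-metrics.json', 'w') as f:
        json.dump(metrics, f)
    
    # Field names must match the NamedTuple in the return annotation.
    from collections import namedtuple
    divmod_output = namedtuple('MyDivModOutput', ['quotient', 'remainder'])
    return divmod_output(quotient, remainder)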
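Running `my_divmod` locally, as the next cell does, needs NumPy and TensorFlow installed and writes both JSON files into the current directory. If you only want to check the shape of those files, a stdlib-only sketch can write them into a temporary directory instead. `write_pipeline_ui_files` is a hypothetical helper: it uses `os.path.join` rather than string concatenation, so `output_dir` needs no trailing slash, and plain `open()`, which (unlike `file_io.FileIO`) cannot write to `gs://` paths.

```python
import json
import os
import tempfile

def write_pipeline_ui_files(quotient: float, remainder: float, output_dir: str) -> None:
    """Stdlib-only version of the file-writing part of my_divmod (hypothetical helper)."""
    metadata = {'outputs': [{'type': 'tensorboard',
                             'source': 'gs://ml-pipeline-dataset/tensorboard-train'}]}
    metrics = {'metrics': [{'name': 'quotient', 'numberValue': float(quotient)},
                           {'name': 'remainder', 'numberValue': float(remainder)}]}
    with open(os.path.join(output_dir, 'mlpipeline-ui-metadata.json'), 'w') as f:
        json.dump(metadata, f)
    with open(os.path.join(output_dir, 'mlpipeline-metrics.json'), 'w') as f:
        json.dump(metrics, f)

# Write into a throwaway directory and read the metrics back to inspect them.
with tempfile.TemporaryDirectory() as tmp:
    write_pipeline_ui_files(*divmod(100, 7), output_dir=tmp)
    with open(os.path.join(tmp, 'mlpipeline-metrics.json')) as f:
        saved_metrics = json.load(f)

print(saved_metrics['metrics'])
# [{'name': 'quotient', 'numberValue': 14.0}, {'name': 'remainder', 'numberValue': 2.0}]
```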

In [7]:
my_divmod(100, 7)

MyDivModOutput(quotient=14, remainder=2)

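Now let's convert `my_divmod` into a pipeline operation. Because the function imports NumPy and TensorFlow, we pass a `base_image` that already provides both.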

In [8]:
divmod_op = comp.func_to_container_op(my_divmod, base_image='tensorflow/tensorflow:1.13.1-py3')
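The component's output names come from the `NamedTuple` return annotation: `func_to_container_op` uses its field names, which is why the pipeline can refer to `divmod_task.outputs['quotient']`. The stdlib-only sketch below reads those names with `typing.get_type_hints`. `my_divmod_stub` is a hypothetical stand-in that swaps NumPy for the built-in `divmod` and skips the file output.

```python
from typing import NamedTuple, get_type_hints

MyDivModOutput = NamedTuple('MyDivModOutput', [('quotient', float), ('remainder', float)])

def my_divmod_stub(dividend: float, divisor: float) -> MyDivModOutput:
    """Hypothetical stand-in for my_divmod without NumPy or file output."""
    quotient, remainder = divmod(dividend, divisor)
    return MyDivModOutput(quotient, remainder)

# The field names of the return annotation become the component's output names.
output_names = list(get_type_hints(my_divmod_stub)['return']._fields)
print(output_names)             # ['quotient', 'remainder']
print(my_divmod_stub(100, 7))   # MyDivModOutput(quotient=14, remainder=2)
```

A function with a single, non-tuple return value, such as `add`, exposes its result as `task.output` instead.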

In [9]:
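import kfp.dsl as dsl

@dsl.pipeline(
    name='Calculation pipeline',
    description='Toy example for showing the use of Kubeflow Pipelines'
)
def calc_pipeline(a='7', b='7', c='17'):
    # Pass a pipeline parameter and a constant as arguments; returns a ContainerOp.
    add_task = add_op(a, 4)

    # Single-output components expose their result as `task.output`.
    # output_dir='/' writes the metrics and UI metadata files to the container root.
    divmod_task = divmod_op(add_task.output, b, '/')

    # Multi-output components expose each NamedTuple field via `task.outputs['name']`.
    result_task = add_op(divmod_task.outputs['quotient'], c)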
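To sanity-check the wiring before compiling, the same data flow can be traced in plain Python. This is a hypothetical local stand-in (`simulate_calc_pipeline`), not something kfp provides: pipeline parameters arrive as strings, so it converts them with `float()` the way the typed components would, and it uses the built-in `divmod` in place of NumPy's.

```python
def add(a: float, b: float) -> float:
    '''Add two floats and return a float'''
    return a + b

def simulate_calc_pipeline(a: str, b: str, c: str = '17') -> float:
    """Hypothetical local trace of calc_pipeline's three steps."""
    added = add(float(a), 4)                 # add_task
    quotient, _ = divmod(added, float(b))    # divmod_task (remainder unused)
    return add(quotient, float(c))           # result_task

# Same overrides as the run submitted below; c keeps its default of '17'.
arguments = {'a': '7', 'b': '8'}
print(simulate_calc_pipeline(**arguments))  # 18.0  (7 + 4 = 11; 11 // 8 = 1; 1 + 17 = 18)
```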

In [10]:
import kfp.compiler as compiler

pipeline_func = calc_pipeline
# Package name derived from the function name, e.g. calc_pipeline.pipeline.zip
pipeline_filename = pipeline_func.__name__ + '.pipeline.zip'
compiler.Compiler().compile(pipeline_func, pipeline_filename)
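The compiler writes the pipeline definition into the archive named by `pipeline_filename`. A small hypothetical helper, `list_package_contents`, uses the standard `zipfile` module to check what was written before uploading it; it does not assume any particular member name.

```python
import zipfile

def list_package_contents(package_path: str) -> list:
    """List the files inside a compiled pipeline package (hypothetical helper)."""
    if not zipfile.is_zipfile(package_path):
        raise ValueError(f'{package_path} is not a zip archive')
    with zipfile.ZipFile(package_path) as package:
        return package.namelist()
```

In the notebook, call `list_package_contents(pipeline_filename)` right after the compile step.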

In [11]:
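import kfp

# Override a and b; c keeps its default. Pipeline parameter values are strings.
arguments = {'a': '7', 'b': '8'}

EXPERIMENT_NAME = 'TEST_PIPELINE'
# With no host given, the client assumes it is running inside the Kubeflow cluster.
client = kfp.Client()
experiment = client.create_experiment(EXPERIMENT_NAME)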

run_name = pipeline_func.__name__ + ' run'
run_result = client.run_pipeline(experiment.id, run_name, pipeline_filename, arguments)
