# Kubeflow Pipeline

If you have Kubeflow installed, you can create a Kubeflow pipeline using MLRun. First run the [**mlrun-code notebook**](./mlrun-code.ipynb), which creates the MLRun functions used here. The code below loads those functions, creates a workflow, and executes it.

### Create Project Object

In [1]:
project_name_base = 'nyc-taxi'

In [4]:
from os import path
import mlrun
project_path = path.abspath('conf')
project = mlrun.new_project(project_name_base,
                            context=project_path,
                            init_git=True,
                            user_project=True)

project.set_function(f'db://{project.name}/taxi')
project.set_function(f'db://{project.name}/model-serving')

<mlrun.runtimes.serving.ServingRuntime at 0x7fad19d97850>
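Because the project is created with `user_project=True`, `project.name` is not simply `nyc-taxi`: MLRun appends the current user name, which is why the `db://` URIs above are built from `project.name` rather than from `project_name_base`. The sketch below only approximates that naming with the standard library. The helper names, the lookup order of the user name, and the `admin` user are assumptions for illustration, not MLRun's actual implementation.

```python
import getpass
import os


def user_project_name(base: str) -> str:
    """Approximate the project name produced by user_project=True (assumption)."""
    # Assumption: the V3IO_USERNAME variable wins, then the OS login name.
    user = os.environ.get("V3IO_USERNAME") or getpass.getuser()
    return f"{base}-{user.lower()}"


def function_uri(project: str, function: str) -> str:
    """Build a URI in the db://<project>/<function> form used above."""
    return f"db://{project}/{function}"


os.environ["V3IO_USERNAME"] = "admin"  # hypothetical user, for illustration only
name = user_project_name("nyc-taxi")
print(name)
print(function_uri(name, "taxi"))
print(function_uri(name, "model-serving"))
```

If `set_function` cannot find a function, comparing `project.name` with the project name used in the mlrun-code notebook is a reasonable first check.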

### Create the Workflow

In [5]:
%%writefile {path.join(project_path, 'workflow.py')}
from kfp import dsl
from mlrun.platforms import auto_mount

funcs = {}
taxi_records_csv_path = 'https://s3.wasabisys.com/iguazio/data/Taxi/yellow_tripdata_2019-01_subset.csv'
zones_csv_path = 'https://s3.wasabisys.com/iguazio/data/Taxi/taxi_zones.csv'

# init_functions configures function resources and local settings
def init_functions(functions: dict, project=None, secrets=None):
    for f in functions.values():
        f.apply(auto_mount())

@dsl.pipeline(
    name="NYC Taxi Demo",
    description="Convert ML script to MLRun"
)
def kfpipeline():
    
    # build our ingestion function (container image)
    builder = funcs['taxi'].deploy_step(skip_deployed=True)
    
    # run the ingestion function with the new image and params
    ingest = funcs['taxi'].as_step(
        name="fetch_data",
        handler='fetch_data',
        image=builder.outputs['image'],
        inputs={'taxi_records_csv_path': taxi_records_csv_path,
                'zones_csv_path': zones_csv_path},
        outputs=['nyc-taxi-dataset', 'zones-dataset'])

    # Join and transform the data sets 
    transform = funcs["taxi"].as_step(
        name="transform_dataset",
        handler='transform_dataset',
        inputs={"taxi_records_csv_path": ingest.outputs['nyc-taxi-dataset'],
                "zones_csv_path": ingest.outputs['zones-dataset']},
        outputs=['nyc-taxi-dataset-transformed'])

    # Train the model
    train = funcs["taxi"].as_step(
        name="train",
        handler="train_model",
        inputs={"input_ds": transform.outputs['nyc-taxi-dataset-transformed']},
        outputs=['FareModel'])
    
    # Deploy the model
    deploy = funcs["model-serving"].deploy_step(
        models={"taxi-serving_v1": train.outputs['FareModel']},
        tag='v2')

Writing /User/hackathon/howto/converting-to-mlrun/conf/workflow.py
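The workflow never states an execution order explicitly. Each step consumes an output of an earlier step (`builder.outputs['image']`, `ingest.outputs[...]`, and so on), and those references are what chain the steps together. The toy model below, which uses only the standard library, derives an order from the same inputs and outputs. It is a sketch of the idea, not how Kubeflow Pipelines is implemented, and the step labels `build` and `deploy` are hypothetical names for the two `deploy_step` calls.

```python
from graphlib import TopologicalSorter

# step label -> (artifacts consumed, artifacts produced), as wired in workflow.py
steps = {
    "build": ([], ["image"]),
    "fetch_data": (["image"], ["nyc-taxi-dataset", "zones-dataset"]),
    "transform_dataset": (["nyc-taxi-dataset", "zones-dataset"],
                          ["nyc-taxi-dataset-transformed"]),
    "train": (["nyc-taxi-dataset-transformed"], ["FareModel"]),
    "deploy": (["FareModel"], []),
}

# Map every artifact to the step that produces it.
producers = {artifact: name
             for name, (_, produced) in steps.items()
             for artifact in produced}

# A step depends on the producers of everything it consumes.
graph = {name: {producers[artifact] for artifact in consumed}
         for name, (consumed, _) in steps.items()}

# Predecessors come first in the resulting order.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Every step here depends on the one before it, so only one order is valid. Steps that do not share data could run in parallel.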


In [6]:
project.set_workflow('main', 'workflow.py', embed=True)

In [7]:
project.save()

### Run the Workflow

In [8]:
artifact_path = path.abspath('./pipe/{{workflow.uid}}')
run_id = project.run(
    'main',
    arguments={}, 
    artifact_path=artifact_path, 
    dirty=True,
    watch=True)

> 2021-05-10 08:28:00,861 [info] using in-cluster config.


> 2021-05-10 08:28:01,344 [info] Pipeline run id=23c8bfa3-c82e-4b33-8f85-8746dd23a144, check UI or DB for progress
> 2021-05-10 08:28:01,344 [info] waiting for pipeline run completion


| uid | start | state | name | results | artifacts |
| --- | --- | --- | --- | --- | --- |
| ...bf76fcec | May 10 08:30:17 | completed | train | | FareModel |
| ...dba48239 | May 10 08:28:55 | completed | transform_dataset | | nyc-taxi-dataset-transformed |
| ...c2d80e3c | May 10 08:28:18 | completed | fetch_data | | nyc-taxi-dataset, zones-dataset |
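A note on the `artifact_path` passed to `project.run`: the string is not an f-string, so `{{workflow.uid}}` reaches the pipeline as a literal placeholder, and it is substituted with the workflow's ID when the pipeline runs. That gives each run its own artifact directory. The sketch below imitates the substitution with `str.replace`. The `run_uid` value is a made-up example, and the real substitution happens inside the pipeline engine, not in notebook code.

```python
from os import path

# Same expression as in the notebook: the double braces are kept verbatim.
template = path.abspath('./pipe/{{workflow.uid}}')
print(template)

# Hypothetical ID, standing in for the value filled in at run time.
run_uid = "0000-example-uid"
resolved = template.replace("{{workflow.uid}}", run_uid)
print(resolved)
```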
