# Python function-based components example

A Kubeflow Pipelines component is a self-contained set of code that performs one step in your ML workflow. 

Python function-based components make it easier to iterate quickly by letting you build your component code as a Python function and generating the component specification for you.

## Setup

Before you run this notebook, make sure that your Google Cloud user account and project have been granted access to Managed Pipelines Experimental. To request access, fill out this [form](http://go/cloud-mlpipelines-signup) and let your account representative know that you have requested it.

This notebook is intended to be run on one of the following:
* [AI Platform Notebooks](https://cloud.google.com/ai-platform-notebooks). See the "AI Platform Notebooks" section in the Experimental [User Guide](https://docs.google.com/document/d/1JXtowHwppgyghnj1N1CT73hwD1caKtWkLcm2_0qGBoI/edit?usp=sharing) for more detail on creating a notebook server instance.
* [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb)

We'll first install some libraries and set up some variables.


Set `gcloud` to use your project.  **Edit the following cell before running it**.

In [None]:
PROJECT_ID = 'your-project-id'  # <---CHANGE THIS

In [None]:
!gcloud config set project {PROJECT_ID}

If you're running this notebook on Colab, authenticate with your user account:

In [None]:
import sys
if 'google.colab' in sys.modules:
  from google.colab import auth
  auth.authenticate_user()

### Install the KFP SDK and AI Platform Pipelines client library

For Managed Pipelines Experimental, you'll need to download a special version of the AI Platform client library.

In [None]:
!gsutil cp gs://cloud-aiplatform-pipelines/aiplatform_pipelines_client-0.1.0.caip20201109-py3-none-any.whl .

Then, install the libraries and restart the kernel.

In [None]:
if 'google.colab' in sys.modules:
  USER_FLAG = ''
else:
  USER_FLAG = '--user'

In [None]:
!python3 -m pip install {USER_FLAG} kfp==1.1.1 --upgrade
!python3 -m pip install {USER_FLAG} aiplatform_pipelines_client-0.1.0.caip20201109-py3-none-any.whl --upgrade

In [None]:
if 'google.colab' not in sys.modules:
  # Automatically restart kernel after installs
  import IPython
  app = IPython.Application.instance()
  app.kernel.do_shutdown(True)

The KFP version should be 1.1.1.



In [None]:
# Check the KFP version
!python3 -c "import kfp; print('KFP version: {}'.format(kfp.__version__))"
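The same check can also be done from Python without shelling out. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`). The helper names `installed_version` and `version_matches` are illustrative and are not part of the KFP SDK.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_matches(package, expected):
    """Return True only if `package` is installed at exactly `expected`."""
    return installed_version(package) == expected


# Hypothetical usage: report whether the pinned KFP version is present.
print('KFP installed version:', installed_version('kfp'))
print('KFP is 1.1.1:', version_matches('kfp', '1.1.1'))
```

If the second line prints `False`, re-run the install cells and restart the kernel before continuing.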

### Set some variables

**Before you run the next cell**, **edit it** to set variables for your project.  See the "Before you begin" section of the User Guide for information on creating your API key.  For `BUCKET_NAME`, enter the name of a Cloud Storage (GCS) bucket in your project.  Don't include the `gs://` prefix.

In [None]:
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

# Required Parameters
USER = 'your-user-name' # <---CHANGE THIS
BUCKET_NAME = 'your-bucket-name'  # <---CHANGE THIS
PIPELINE_ROOT = 'gs://{}/pipeline_root/{}'.format(BUCKET_NAME, USER)

PROJECT_ID = 'your-project-id'  # <---CHANGE THIS
REGION = 'us-central1'
API_KEY = 'your-api-key'  # <---CHANGE THIS

print('PIPELINE_ROOT: {}'.format(PIPELINE_ROOT))
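Because `BUCKET_NAME` must not include the `gs://` prefix, a small guard can catch that mistake before the pipeline root is used. This is only a sketch: `build_pipeline_root` is a hypothetical helper, not something the notebook or the SDK provides, and it produces the same string as the `PIPELINE_ROOT` expression above.

```python
def build_pipeline_root(bucket_name, user):
    """Return 'gs://<bucket>/pipeline_root/<user>' for a bare bucket name."""
    if not bucket_name or not user:
        raise ValueError('bucket_name and user must both be non-empty')
    if bucket_name.startswith('gs://'):
        # The prefix is added below, so a prefixed name would be duplicated.
        raise ValueError("bucket_name must not include the 'gs://' prefix")
    return 'gs://{}/pipeline_root/{}'.format(bucket_name, user)


# Hypothetical usage with the placeholder values from the cell above.
print(build_pipeline_root('your-bucket-name', 'your-user-name'))
```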


## Build components from Python functions


First, define your component's code as a standalone Python function. Here, we're defining two functions: one to add two numbers, and one to subtract one number from another.

The `add` function writes the sum as an output.
Note the use of the `OutputPath` type in the `add` definition. A GCS path is generated automatically and passed in as `sum`, and `add` writes the result to that path.
The `sub` function does the same with its `diff` output.

In [None]:
from kfp.v2.components import OutputPath

def add(a: float, 
        b: float,
        sum: OutputPath('Float')
):
  '''Calculates the sum of two arguments'''
  import logging
  import subprocess

  logging.getLogger().setLevel(logging.INFO)
  res = a + b
  logging.info('got sum result: %s', res)
  logging.info('sum path: %s', sum)

  with open('temp.txt', "w") as outfile:
    subprocess.run(['echo', str(res)], stdout=outfile)
  subprocess.run(['gsutil', 'cp', 'temp.txt', sum])


def sub(arg1: float, 
        arg2: float,
        diff: OutputPath('Float')
        ):
  '''Calculates the difference of two arguments'''
  import logging
  import subprocess
  
  logging.getLogger().setLevel(logging.INFO)
  logging.info("got arg1 %s and arg2 %s", arg1, arg2)
  res = arg1 - arg2
  logging.info('got diff result: %s', res)
  logging.info('diff path: %s', diff)

  with open('temp.txt', "w") as outfile:
    subprocess.run(['echo', str(res)], stdout=outfile)
  subprocess.run(['gsutil', 'cp', 'temp.txt', diff])  
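To see the output-path pattern in isolation, it can help to try it locally first. The sketch below assumes a local file path instead of a generated GCS path, so it uses plain file I/O rather than `gsutil`. `add_local` is a hypothetical stand-in for `add`, not the component itself.

```python
import os
import tempfile


def add_local(a, b, sum_path):
    """Write a + b to sum_path, mirroring how `add` treats its output path."""
    with open(sum_path, 'w') as outfile:
        outfile.write('{}\n'.format(a + b))


# Write the result to a temporary local file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'sum.txt')
    add_local(10.0, 4.0, out_path)
    with open(out_path) as infile:
        local_sum = float(infile.read())

print('sum read back from file:', local_sum)
```

The function returns nothing: the only way the result leaves the function is through the file at the path it was given, which is the same contract the `sum` and `diff` parameters follow above.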


Next, use `kfp.v2.components.create_component_from_func` to generate the component specification YAML and return a factory function that you can use to create `kfp.v2.dsl.ContainerOp` class instances for your pipeline. The component specification YAML is a reusable and shareable definition of your component.

We're using `google/cloud-sdk:latest` as the base image for the components so that they can access the `gsutil` command-line utility. 


In [None]:
import kfp.v2.components as comp

add_op = comp.create_component_from_func(
    add, output_component_file='add_component.yaml', base_image='google/cloud-sdk:latest')
sub_op = comp.create_component_from_func(
    sub, output_component_file='sub_component.yaml', base_image='google/cloud-sdk:latest')

## Create and run your pipeline

Now we'll define a pipeline that uses these two components.



In [None]:
from kfp.v2 import dsl
from kfp.v2 import compiler
from kfp.v2 import components

@dsl.pipeline(
  name='arithmetic-pipeline',
  description='An example pipeline that performs arithmetic calculations.'
)
def calc_pipeline(
  a: float=10,
  b: float=7,
):
  add_task = add_op(a, 4)
  sub_task = sub_op(add_task.outputs['sum'], b)    
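The data flow in this pipeline is `(a + 4) - b`: the `sum` output of the add step becomes the first argument of the subtract step. As a rough sanity check, the same arithmetic can be sketched in plain Python with no pipeline machinery. The names `add_value`, `sub_value`, and `calc_locally` are illustrative only.

```python
def add_value(a, b):
    """Plain-Python counterpart of the `add` component's arithmetic."""
    return a + b


def sub_value(arg1, arg2):
    """Plain-Python counterpart of the `sub` component's arithmetic."""
    return arg1 - arg2


def calc_locally(a=10.0, b=7.0):
    """Chain the two steps the way calc_pipeline wires them together."""
    sum_value = add_value(a, 4)      # corresponds to add_op(a, 4)
    return sub_value(sum_value, b)   # corresponds to sub_op(<sum output>, b)


result = calc_locally()
print(result)  # (10.0 + 4) - 7.0 -> 7.0
```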


Compile the pipeline to create a job specification.

In [None]:
compiler.Compiler().compile(pipeline_func=calc_pipeline, 
                            pipeline_root=PIPELINE_ROOT,
                            output_path='fn_pipeline_job.json')
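After compiling, you may want to confirm that the job specification file was written and is valid JSON. The sketch below makes no assumptions about the schema beyond the top level being a JSON object. `top_level_keys` is a hypothetical helper.

```python
import json
import os


def top_level_keys(job_spec_path):
    """Load a JSON file and return its top-level keys in sorted order."""
    with open(job_spec_path) as spec_file:
        spec = json.load(spec_file)
    return sorted(spec.keys())


# Only inspect the file if the compile step has already produced it.
if os.path.exists('fn_pipeline_job.json'):
    print(top_level_keys('fn_pipeline_job.json'))
```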

Create an instance of the AI Platform client and run the pipeline.

In [None]:
from aiplatform.pipelines import client

api_client = client.Client(project_id=PROJECT_ID, region=REGION, api_key=API_KEY)

response = api_client.create_run_from_job_spec(
          job_spec_path='fn_pipeline_job.json',
          # pipeline_root=PIPELINE_ROOT,  # optional: use this to override the compile-time value
          )

### Monitor the pipeline run in the Cloud Console

Once you've deployed the pipeline run, you can monitor it in the [Cloud Console](https://console.cloud.google.com/ai/platform/pipelines) under **AI Platform (Unified)** > **Pipelines**. 

Click into the pipeline run to see the run graph (for our simple pipeline, this consists of two steps, one for each component), and click on a step to view the job detail and the logs for that step.

-----------------------------
Copyright 2020 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.