# Managed Pipelines Experimental: KFP SDK introduction

This notebook builds and runs a simple pipeline on [Managed Pipelines Experimental](https://docs.google.com/document/d/1JXtowHwppgyghnj1N1CT73hwD1caKtWkLcm2_0qGBoI/edit?ts=5f90dcea#heading=h.p4rp2vtz67w2), using the [Kubeflow Pipelines (KFP) SDK](https://www.kubeflow.org/docs/pipelines/).

It shows how to construct *function-based components* (pipeline components defined from Python function definitions), how to specify a pipeline using those components, and how to launch a pipeline run from the notebook.


## Setup

Before you run this notebook, ensure that your Google Cloud user account and project have been granted access to Managed Pipelines Experimental. To request access, fill out this [form](http://go/cloud-mlpipelines-signup) and let your account representative know you have done so.

This notebook is intended to be run on one of the following:
* [AI Platform Notebooks](https://cloud.google.com/ai-platform-notebooks). See the "AI Platform Notebooks" section in the Experimental [User Guide](https://docs.google.com/document/d/1JXtowHwppgyghnj1N1CT73hwD1caKtWkLcm2_0qGBoI/edit?usp=sharing) for more detail on creating a notebook server instance.
* [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb)


**To run this notebook on AI Platform Notebooks**, click the **File** menu, then select "Download .ipynb". Then upload the notebook from your local machine to AI Platform Notebooks. (In the AI Platform Notebooks left panel, the upload button is an icon of an arrow pointing up.)

We'll first install some libraries and set up some variables.


Set `gcloud` to use your project.  **Edit the following cell before running it**.

In [None]:
PROJECT_ID = 'your-project-id'  # <---CHANGE THIS

In [None]:
!gcloud config set project {PROJECT_ID}

If you're running this notebook on Colab, authenticate with your user account:

In [None]:
import sys
if 'google.colab' in sys.modules:
  from google.colab import auth
  auth.authenticate_user()

-----------------

**If you're on AI Platform Notebooks**, authenticate with Google Cloud before running the next section, by running
```sh
gcloud auth login
```
**in the Terminal window** (which you can open via **File** > **New** in the menu). You only need to do this once per notebook instance.

### Install the KFP SDK and AI Platform Pipelines client library

For Managed Pipelines Experimental, you'll need to download a special version of the AI Platform client library.

In [None]:
!gsutil cp gs://cloud-aiplatform-pipelines/aiplatform_pipelines_client-0.1.0.caip20201109-py3-none-any.whl .

Then, install the libraries and restart the kernel.

In [None]:
if 'google.colab' in sys.modules:
  USER_FLAG = ''
else:
  USER_FLAG = '--user'

In [None]:
!python3 -m pip install {USER_FLAG} kfp==1.1.1 --upgrade
!python3 -m pip install {USER_FLAG} aiplatform_pipelines_client-0.1.0.caip20201109-py3-none-any.whl --upgrade

In [None]:
if 'google.colab' not in sys.modules:
  # Automatically restart kernel after installs
  import IPython
  app = IPython.Application.instance()
  app.kernel.do_shutdown(True)

The KFP version should be 1.1.1.



In [None]:
# Check the KFP version
!python3 -c "import kfp; print('KFP version: {}'.format(kfp.__version__))"
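
If you would rather do this check from Python than from the shell, one option is sketched below. It uses only the standard library (`importlib.metadata`, which assumes Python 3.8 or later); the helper name `installed_version_matches` is made up for this example and is not part of KFP.

```python
from importlib import metadata


def installed_version_matches(package: str, expected: str) -> bool:
    """Return True if `package` is installed at exactly version `expected`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        return False
    return installed == expected


# Prints True only when kfp 1.1.1 is the installed version.
print("kfp is 1.1.1:", installed_version_matches("kfp", "1.1.1"))
```

A `False` result here usually means the kernel was not restarted after the install, or a different KFP version is on the path.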

### Set some variables

**Before you run the next cell**, **edit it** to set variables for your project.  See the "Before you begin" section of the User Guide for information on creating your API key.  For `BUCKET_NAME`, enter the name of a Cloud Storage (GCS) bucket in your project.  Don't include the `gs://` prefix.

In [None]:
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

# Required Parameters
USER = 'your-user-name' # <---CHANGE THIS
BUCKET_NAME = 'your-bucket-name'  # <---CHANGE THIS
PIPELINE_ROOT = 'gs://{}/pipeline_root/{}'.format(BUCKET_NAME, USER)

PROJECT_ID = 'your-project-id'  # <---CHANGE THIS
REGION = 'us-central1'
API_KEY = 'your-api-key'  # <---CHANGE THIS

print('PIPELINE_ROOT: {}'.format(PIPELINE_ROOT))
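
A common slip in this cell is leaving a placeholder unedited or pasting the bucket name with its `gs://` prefix. The sketch below shows one way to guard against both. The helper `build_pipeline_root` is hypothetical (it is not part of any library), and treating values that start with `your-` as placeholders is an assumption based on the defaults in the cell above.

```python
def build_pipeline_root(bucket_name: str, user: str) -> str:
    """Build a gs:// pipeline root path from a bucket name and a user name."""
    # Tolerate a bucket name pasted with the gs:// prefix or stray slashes.
    if bucket_name.startswith("gs://"):
        bucket_name = bucket_name[len("gs://"):]
    bucket_name = bucket_name.strip("/")
    # Catch placeholder values that were never edited.
    for label, value in (("BUCKET_NAME", bucket_name), ("USER", user)):
        if not value or value.startswith("your-"):
            raise ValueError(f"{label} still looks like a placeholder: {value!r}")
    return f"gs://{bucket_name}/pipeline_root/{user}"


print(build_pipeline_root("gs://my-bucket/", "alice"))
# prints: gs://my-bucket/pipeline_root/alice
```

In the notebook you could call `build_pipeline_root(BUCKET_NAME, USER)` in place of the `format()` expression; the resulting path has the same layout.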

## Define a simple pipeline

Now we'll define a very simple pipeline. It takes one input parameter and has one step.

### Create a function-based pipeline component

We'll first create a component based on a very simple Python function. It takes a string input parameter, prints it, and returns the same value.

In [None]:
from kfp.v2 import components

def hello_world(text: str):
    print(text)
    return text

components.func_to_container_op(hello_world,
      output_component_file='hw.yaml')
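
Because a function-based component starts life as ordinary Python, you can sanity-check the function locally before wrapping it. The sketch below repeats `hello_world` so that it runs on its own and captures what the function prints. It exercises only the plain Python function, not the containerized component.

```python
import contextlib
import io


def hello_world(text: str):
    print(text)
    return text


# Call the function directly and capture anything it writes to stdout.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = hello_world("hi there")

# The function should both print and return its input unchanged.
assert result == "hi there"
assert buffer.getvalue() == "hi there\n"
print("hello_world behaves as expected")
```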


Next, we'll define a one-step pipeline that uses that component.
The pipeline takes an input parameter, and passes that parameter as an argument to the pipeline step.

In [None]:
from kfp.v2 import dsl
from kfp.v2 import compiler
from kfp.v2 import components


# Create a pipeline op from the component we defined above.
hw_op = components.load_component_from_file('./hw.yaml') # you can also use load_component_from_url

@dsl.pipeline(
  name='hello-world',
  description='A simple intro pipeline'
)
def pipeline_parameter_to_consumer(text: str='hi there'):
    '''Pipeline that passes small pipeline parameter string to consumer op'''
    consume_task = hw_op(text) # Passing pipeline parameter as argument to consumer op
    
pipeline_func = pipeline_parameter_to_consumer

Compile the pipeline:

In [None]:
compiler.Compiler().compile(pipeline_func=pipeline_func, 
                            pipeline_root=PIPELINE_ROOT,
                            output_path='hw_pipeline_job.json')
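
The compiler writes the job spec as a JSON file, so you can open it with the standard `json` module to see what was generated. The sketch below lists a file's top-level keys without assuming anything about the job spec schema. The helper `top_level_keys` is made up for this example, and the demo file contents are invented so the block runs on its own.

```python
import json
import os
import tempfile


def top_level_keys(path: str) -> list:
    """Return the sorted top-level keys of a JSON file holding an object."""
    with open(path) as f:
        spec = json.load(f)
    if not isinstance(spec, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(spec)


# Stand-in file; its contents are invented and do NOT reflect the real
# job spec schema.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_job.json")
with open(demo_path, "w") as f:
    json.dump({"b": 1, "a": {"nested": True}}, f)

print(top_level_keys(demo_path))
# prints: ['a', 'b']
```

In the notebook, `top_level_keys('hw_pipeline_job.json')` would show the sections of the spec you just compiled.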

## Submit the pipeline job

Here, we'll create an API client using the API key you generated.

Then, we'll submit the pipeline job by passing the compiled spec to the `create_run_from_job_spec()` method. Note that we're passing a `parameter_values` dict that specifies the pipeline input parameters we want to use.

In [None]:
from aiplatform.pipelines import client

api_client = client.Client(project_id=PROJECT_ID, region=REGION, api_key=API_KEY)

response = api_client.create_run_from_job_spec(
          job_spec_path='hw_pipeline_job.json',
          # pipeline_root=PIPELINE_ROOT,  # optional: use this to override the compile-time value
          parameter_values={'text': 'Hello world!'})
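
The keys of `parameter_values` need to match the pipeline's input parameter names, so a misspelled key is an easy mistake to make. The sketch below checks a dict against a function signature using the standard `inspect` module. The helper `unknown_parameters` is hypothetical, and `demo_pipeline` is a plain stand-in with the same signature as the pipeline function above. Whether `inspect.signature` behaves the same way on the function decorated with `@dsl.pipeline` is an assumption that this sketch does not verify.

```python
import inspect


def unknown_parameters(pipeline_func, parameter_values: dict) -> list:
    """Return the names in `parameter_values` that the function does not accept."""
    accepted = set(inspect.signature(pipeline_func).parameters)
    return sorted(name for name in parameter_values if name not in accepted)


# Plain stand-in with the same signature as the pipeline function above.
def demo_pipeline(text: str = "hi there"):
    pass


print(unknown_parameters(demo_pipeline, {"text": "Hello world!"}))
# prints: []
print(unknown_parameters(demo_pipeline, {"txt": "Hello world!"}))
# prints: ['txt']
```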

### Monitor the pipeline run in the Cloud Console

Once you've submitted the pipeline run, you can monitor it in the [Cloud Console](https://console.cloud.google.com/ai/platform/pipelines) under **AI Platform (Unified)** > **Pipelines**.

Click into the pipeline run to see the run graph (for our simple pipeline, this consists of one step), and click on a step to view the job detail and the logs for that step.

### Submit a pipeline job via the Cloud Console

You can also submit pipeline jobs directly through the Cloud Console. To try this, download the `hw_pipeline_job.json` file generated by the compilation.

In the Console, click **CREATE RUN**.  

<a href="https://storage.googleapis.com/amy-jo/images/kf-pls/create_run.png" target="_blank"><img src="https://storage.googleapis.com/amy-jo/images/kf-pls/create_run.png" width="40%"/></a>

Upload `hw_pipeline_job.json` and give your run a name (this must be unique).  

<a href="https://storage.googleapis.com/amy-jo/images/kf-pls/run_details.png" target="_blank"><img src="https://storage.googleapis.com/amy-jo/images/kf-pls/run_details.png" width="50%"/></a>

You can enter an input parameter value if you like.

<a href="https://storage.googleapis.com/amy-jo/images/kf-pls/run_parameters.png" target="_blank"><img src="https://storage.googleapis.com/amy-jo/images/kf-pls/run_parameters.png" width="50%"/></a>

Then click **SUBMIT**.  You'll see your new pipeline run start up in the Console.

## What next?

Next, try out notebooks with more complex pipelines.  These pipelines show how to pass data between components and build more complex components.

- A simple KFP example that [shows how data can be passed between pipeline steps](https://colab.research.google.com/drive/1NztsGV-FAp71MU7zfMHU0SlfQ8dpw-9u).
- A KFP example that [shows building custom components for data processing and training](https://colab.research.google.com/drive/1CV5SgrhRp0bgJcFKGc0G5oWwGTHc7bqt?usp=sharing). It also shows how to pass typed artifact data between components, how to specify required resources when defining a pipeline, and how to query a pipeline's metadata.
- A TFX notebook that [shows the canonical 'Chicago taxi' example](https://colab.research.google.com/drive/1dNLlm21F6f5_4aeIg-Zs_F1iGGRPEvhW), and how to use custom Python functions and custom containers.

-----------------------------
Copyright 2020 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.