![tracker](https://us-central1-statmike-mlops-349915.cloudfunctions.net/tracking-pixel?path=statmike%2Fvertex-ai-mlops%2FMLOps&file=Vertex+AI+Pipelines+-+Control.ipynb)
<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/MLOps/Vertex%20AI%20Pipelines%20-%20Control.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A//raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Vertex%20AI%20Pipelines%20-%20Control.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Vertex%20AI%20Pipelines%20-%20Control.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https%3A//raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Vertex%20AI%20Pipelines%20-%20Control.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Vertex AI Pipelines - Control 

[Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) is a serverless runner for Kubeflow Pipelines [(KFP)](https://www.kubeflow.org/docs/components/pipelines/v2/introduction/) and the [TensorFlow Extended (TFX)](https://www.tensorflow.org/tfx/guide/understanding_tfx_pipelines) framework.

Components run the steps of a pipeline.  A pipeline task runs a component with inputs and produces the component's outputs.  Each component executes its code on compute using a container image.

This notebook will focus on controlling the flow of task execution within a pipeline:
- Order
- Conditional Execution: if, elif (else if), and else
- Looping
- Exit Handling

---
## Colab Setup

To run this notebook in Colab, run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [2]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [3]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names and install names, with an optional minimum version as a third element.  If the import name is not found, the install name is used to install the package quietly for the current user.

In [4]:
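# tuples of (import name, install name[, min_version])
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform'),
    ('kfp', 'kfp')
]

# import the submodules explicitly; `import importlib` alone does not guarantee they are loaded
import importlib.util
import importlib.metadata

install = False
for package in packages:
    # find_spec raises ModuleNotFoundError when a parent package (e.g. google.cloud) is missing
    try:
        found = importlib.util.find_spec(package[0]) is not None
    except ModuleNotFoundError:
        found = False
    if not found:
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # metadata.version expects the distribution (install) name, not the import name
        # note: this is a plain string comparison, so '1.10' sorts before '1.9'
        if importlib.metadata.version(package[1]) < package[2]:
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user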
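
The minimum version check in the cell above compares version strings, which orders `'1.10.0'` before `'1.9.3'`.  A minimal sketch of a numeric comparison follows.  The helpers `version_tuple` and `needs_update` are hypothetical names that are not part of this notebook, and the sketch assumes simple dotted versions, ignoring suffixes such as `rc1`.

```python
import re

def version_tuple(version: str) -> tuple:
    """Hypothetical helper: keep the leading integer of each dotted part."""
    parts = []
    for part in version.split('.'):
        match = re.match(r'\d+', part)
        if match is None:
            break  # stop at the first part with no leading digits, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)

def needs_update(installed: str, minimum: str) -> bool:
    """True when the installed version is numerically below the minimum."""
    return version_tuple(installed) < version_tuple(minimum)

print('1.10.0' < '1.9.3')               # True: string comparison is misleading here
print(needs_update('1.10.0', '1.9.3'))  # False: 1.10.0 is newer than 1.9.3
print(needs_update('2.4.0', '2.7.0'))   # True
```

For versions with pre-release or local suffixes, a dedicated version parser is a safer choice than this simplification.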

## API Enablement

In [5]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, resume running the notebook from the cell that follows this one.

In [6]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

Inputs

In [7]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [8]:
REGION = 'us-central1'
EXPERIMENT = 'pipeline-control'
SERIES = 'mlops'

# gcs bucket
GCS_BUCKET = PROJECT_ID

Packages

In [9]:
import os
import time
import importlib
from google.cloud import aiplatform
import kfp
from typing import NamedTuple

Clients

In [10]:
# vertex ai clients
aiplatform.init(project = PROJECT_ID, location = REGION)

Parameters

In [11]:
DIR = f"temp/{SERIES}-{EXPERIMENT}"

In [12]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

Environment

In [13]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## Example Components

Components that:
- turn any integer into a letter (`number_to_letter`)
- turn any letter into a number (`letter_to_number`)
- generate a random coin flip of 'H' or 'T' (`flip_coin`)

In [15]:
@kfp.dsl.component(base_image = 'python:3.10')
def number_to_letter(number: int) -> str:
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    return alphabet[(number-1) % 26]

@kfp.dsl.component(base_image = 'python:3.10')
def letter_to_number(letter: str) -> int:
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    return alphabet.index(letter.upper()) + 1

@kfp.dsl.component(base_image = 'python:3.10')
def flip_coin() -> str:
    import random
    flip = random.randint(0, 1)
    flipmap = ['T', 'H']
    return flipmap[flip]
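
The component bodies are plain Python, so their logic can be sanity-checked locally before any pipeline is compiled.  The sketch below copies the two conversion bodies into undecorated functions; the `_local` names are hypothetical and exist only for this check.  It shows that `number_to_letter` wraps around the alphabet and that the two functions are inverses for 1 through 26.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def number_to_letter_local(number: int) -> str:
    # same body as the number_to_letter component, without the KFP decorator
    return ALPHABET[(number - 1) % 26]

def letter_to_number_local(letter: str) -> int:
    # same body as the letter_to_number component, without the KFP decorator
    return ALPHABET.index(letter.upper()) + 1

print(number_to_letter_local(1))    # A
print(number_to_letter_local(27))   # A (wraps around after Z)
print(number_to_letter_local(0))    # Z
print(letter_to_number_local('c'))  # 3

# round trip: every number from 1 to 26 maps to a letter and back
assert all(letter_to_number_local(number_to_letter_local(n)) == n for n in range(1, 27))
```

A character outside the alphabet, such as `'1'`, makes `str.index` raise a `ValueError`, which in a pipeline would fail the task.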

## Function To Run Pipeline

In [16]:
def pipeline_runner(pipeline_func, pipeline_name):
    
    # compile the pipeline
    kfp.compiler.Compiler().compile(
        pipeline_func = pipeline_func,
        package_path = f'{DIR}/{pipeline_name}.yaml'
    )
    
    # create pipeline job
    pipeline_job = aiplatform.PipelineJob(
        display_name = f"{pipeline_name}",
        template_path = f"{DIR}/{pipeline_name}.yaml",
        pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    )
    
    # submit pipeline job
    response = pipeline_job.submit(
        service_account = SERVICE_ACCOUNT
    )
    
    # wait on pipeline job
    pipeline_job.wait()
    
    # return pipeline job
    return pipeline_job

---
## Ordering Tasks: DAG

Using the output of one component as the input of another forces an order of operations: each `task_2*` task waits for its matching `task_1*` task.  The `task_1*` tasks have no dependencies, so they all run at the same time.
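
The ordering implied by data dependencies can be sketched outside of KFP with the standard library `graphlib` module (Python 3.9+).  The dictionary below is a hand-written mirror of the `order_pipeline_dag` pipeline in this section, and `execution_waves` is a hypothetical helper.  Grouping tasks into waves is a simplification: a real run starts each task as soon as its own dependencies finish.

```python
from graphlib import TopologicalSorter

# task -> set of tasks whose outputs it consumes
dependencies = {
    'task_1a': set(), 'task_1b': set(), 'task_1c': set(),
    'task_2a': {'task_1a'}, 'task_2b': {'task_1b'}, 'task_2c': {'task_1c'},
}

def execution_waves(graph):
    """Group tasks into waves; every task in a wave has all of its dependencies finished."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks that can start now
        waves.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock downstream tasks
    return waves

for number, wave in enumerate(execution_waves(dependencies), start=1):
    print(number, wave)
# 1 ['task_1a', 'task_1b', 'task_1c']
# 2 ['task_2a', 'task_2b', 'task_2c']
```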

In [18]:
pipeline_name = f"{SERIES}-{EXPERIMENT}-order-dag"
pipeline_name

'mlops-pipeline-control-order-dag'

In [43]:
@kfp.dsl.pipeline(name = pipeline_name)
def order_pipeline_dag():
    
    task_1a = number_to_letter(number = 1)
    task_1b = number_to_letter(number = 2)
    task_1c = number_to_letter(number = 3)
    
    task_2a = letter_to_number(letter = task_1a.output)
    task_2b = letter_to_number(letter = task_1b.output)
    task_2c = letter_to_number(letter = task_1c.output)

In [44]:
pipeline_job = pipeline_runner(order_pipeline_dag, pipeline_name)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-20240326231325
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-20240326231325')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-control-order-dag-20240326231325?project=1026793852137
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-20240326231325 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-20240326231325 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-20240326231325 current state:


In [46]:
print(f'The Dashboard can be viewed here:\n{pipeline_job._dashboard_uri()}')

The Dashboard can be viewed here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-control-order-dag-20240326231325?project=1026793852137


In [19]:
aiplatform.get_pipeline_df(pipeline = pipeline_name)

| | pipeline_name | run_name | param.vmlmd_lineage_integration |
|---|---|---|---|
| 0 | mlops-pipeline-control-order-dag | mlops-pipeline-control-order-dag-20240326231325 | {'pipeline_run_component': {'location_id': 'us... |


<p><center>
    <img alt="Order DAG" src="../architectures/notebooks/mlops/order-dag.png" width="85%">
</center></p>

---
## Ordering Tasks: DAG + Explicit Dependency

The outputs of components are used as inputs to other components, forcing an order of operations.

The `task_1*` tasks do not have any input dependencies and by default run at the same time, as seen above.  The `.after()` method ([reference](https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.PipelineTask.after)) allows for an explicit dependency on another task.  The pipeline below uses `.after()` to force the order `task_1a`, then `task_1b`, then `task_1c`.
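
To see how the explicit dependencies reshape the graph, the sketch below computes the earliest step at which each task could start.  It is a plain-Python model and not KFP code: the `dependencies` dictionary is a hand-written mirror of the pipeline in this section, and `earliest_step` is a hypothetical helper.

```python
from functools import lru_cache

# data dependencies (from .output) plus ordering dependencies (from .after())
dependencies = {
    'task_1a': (),
    'task_1b': ('task_1a',),  # .after(task_1a)
    'task_1c': ('task_1b',),  # .after(task_1b)
    'task_2a': ('task_1a',),  # consumes task_1a.output
    'task_2b': ('task_1b',),  # consumes task_1b.output
    'task_2c': ('task_1c',),  # consumes task_1c.output
}

@lru_cache(maxsize=None)
def earliest_step(task: str) -> int:
    """A task can start one step after its latest upstream task."""
    return 1 + max((earliest_step(dep) for dep in dependencies[task]), default=0)

steps = {task: earliest_step(task) for task in dependencies}
print(steps)
# {'task_1a': 1, 'task_1b': 2, 'task_1c': 3, 'task_2a': 2, 'task_2b': 3, 'task_2c': 4}
```

With the `.after()` calls the graph is four steps deep, compared with two steps when only the data dependencies are present.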

In [20]:
pipeline_name = f"{SERIES}-{EXPERIMENT}-order-dag-explicit"
pipeline_name

'mlops-pipeline-control-order-dag-explicit'

In [21]:
@kfp.dsl.pipeline(name = pipeline_name)
def order_pipeline_dag_explicit():
    
    task_1a = number_to_letter(number = 1)
    task_1b = number_to_letter(number = 2).after(task_1a)
    task_1c = number_to_letter(number = 3).after(task_1b)
    
    task_2a = letter_to_number(letter = task_1a.output)
    task_2b = letter_to_number(letter = task_1b.output)
    task_2c = letter_to_number(letter = task_1c.output)

In [22]:
pipeline_job = pipeline_runner(order_pipeline_dag_explicit, pipeline_name)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-explicit-20240326235331
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-explicit-20240326235331')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-control-order-dag-explicit-20240326235331?project=1026793852137
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-explicit-20240326235331 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-explicit-20240326235331 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-con

In [23]:
print(f'The Dashboard can be viewed here:\n{pipeline_job._dashboard_uri()}')

The Dashboard can be viewed here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-control-order-dag-explicit-20240326235331?project=1026793852137


In [24]:
aiplatform.get_pipeline_df(pipeline = pipeline_name)

| | pipeline_name | run_name | param.vmlmd_lineage_integration |
|---|---|---|---|
| 0 | mlops-pipeline-control-order-dag-explicit | mlops-pipeline-control-order-dag-explicit-2024... | {'pipeline_run_component': {'location_id': 'us... |


<p><center>
    <img alt="Order DAG - Explicit" src="../architectures/notebooks/mlops/order-dag-explicit.png" width="85%">
</center></p>

## Conditional Execution

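Three context managers control conditional execution in pipelines: `kfp.dsl.If`, `kfp.dsl.Elif`, and `kfp.dsl.Else`.  [Reference](https://www.kubeflow.org/docs/components/pipelines/v2/pipelines/control-flow/#conditions-dslif-dslelif-dslelse)

The pipeline below flips a coin and uses the result to decide which `letter_to_number` task runs.  The inner `If` repeats the outer test, so `task_2a` runs on heads, `task_2c` runs on tails, and the inner `Else` branch (`task_2b`) is never reached.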


In [25]:
pipeline_name = f"{SERIES}-{EXPERIMENT}-order-dag-explicit-condition"
pipeline_name

'mlops-pipeline-control-order-dag-explicit-condition'

In [41]:
@kfp.dsl.pipeline(name = pipeline_name)
def order_pipeline_dag_explicit_condition():
    
    task_1a = number_to_letter(number = 1)
    task_1b = number_to_letter(number = 2).after(task_1a)
    task_1c = number_to_letter(number = 3).after(task_1b)
    
    flip = flip_coin()
    
    with kfp.dsl.If(flip.output == 'H', name = 'Heads?'):
        with kfp.dsl.If(flip.output == 'H', name = 'Heads?'):
            task_2a = letter_to_number(letter = task_1a.output)
        with kfp.dsl.Else(name = 'Not Heads'):
            task_2b = letter_to_number(letter = task_1b.output)       
    with kfp.dsl.Elif(flip.output == 'T', name = 'Tails'):
        task_2c = letter_to_number(letter = task_1c.output)
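
The nesting in the pipeline above can be read as ordinary Python branching.  The sketch below is a plain-Python mirror of the `If` / `Else` / `Elif` structure, with the hypothetical function `tasks_that_run` standing in for the pipeline.  In KFP the conditions are evaluated at run time against the `flip_coin` task output, while here the flip is passed in directly.

```python
def tasks_that_run(flip: str) -> list:
    """Plain-Python mirror of the If / Else / Elif nesting in the pipeline."""
    ran = []
    if flip == 'H':        # outer dsl.If named 'Heads?'
        if flip == 'H':    # inner dsl.If repeats the same test
            ran.append('task_2a')
        else:              # inner dsl.Else named 'Not Heads'
            ran.append('task_2b')
    elif flip == 'T':      # dsl.Elif named 'Tails'
        ran.append('task_2c')
    return ran

for flip in ('H', 'T'):
    print(flip, tasks_that_run(flip))
# H ['task_2a']
# T ['task_2c']
```

Because the inner test is identical to the outer one, no coin flip ever selects `task_2b`.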

In [42]:
pipeline_job = pipeline_runner(order_pipeline_dag_explicit_condition, pipeline_name)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-explicit-condition-20240327003324
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-explicit-condition-20240327003324')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-control-order-dag-explicit-condition-20240327003324?project=1026793852137
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-explicit-condition-20240327003324 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-control-order-dag-explicit-condition-20240327003324


In [43]:
print(f'The Dashboard can be viewed here:\n{pipeline_job._dashboard_uri()}')

The Dashboard can be viewed here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-control-order-dag-explicit-condition-20240327003324?project=1026793852137


In [44]:
aiplatform.get_pipeline_df(pipeline = pipeline_name)

| | pipeline_name | run_name | param.vmlmd_lineage_integration |
|---|---|---|---|
| 0 | mlops-pipeline-control-order-dag-explicit-cond... | mlops-pipeline-control-order-dag-explicit-cond... | {'pipeline_run_component': {'location_id': 'us... |
| 1 | mlops-pipeline-control-order-dag-explicit-cond... | mlops-pipeline-control-order-dag-explicit-cond... | {'pipeline_run_component': {'project_id': 'sta... |


<p><center>
    <img alt="Order DAG - Explicit - Conditions" src="../architectures/notebooks/mlops/order-dag-explicit-condition.png" width="85%">
</center></p>