<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/MLOps/Vertex%20AI%20Pipelines%20-%20Components.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A//raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Vertex%20AI%20Pipelines%20-%20Components.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Vertex%20AI%20Pipelines%20-%20Components.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https%3A//raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Vertex%20AI%20Pipelines%20-%20Components.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Vertex AI Pipelines - Components 

[Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) is a serverless runner for Kubeflow Pipelines [(KFP)](https://www.kubeflow.org/docs/components/pipelines/v2/introduction/) and the [TensorFlow Extended (TFX)](https://www.tensorflow.org/tfx/guide/understanding_tfx_pipelines) framework.

Components run the steps of a pipeline.  A pipeline task runs a component with inputs and produces the component's outputs.  Each component runs its code on compute using a container image.

This notebook will focus on the different types of component construction:
- [Pre-built Google Cloud Pipeline Components](https://cloud.google.com/vertex-ai/docs/pipelines/components-introduction)
- [Custom KFP Components](https://www.kubeflow.org/docs/components/pipelines/v2/components/)
    - Python Components:
        - Lightweight Python Components
        - Containerized Python Components
    - Arbitrary Containers:
        - Container Components
    - Importer Components
        - A provided importer for artifacts created prior to the pipeline

---
## Colab Setup

To run this notebook in Colab run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names and install names, with an optional minimum version as a third item.  If the import name is not found, the install name is used to install the package quietly for the current user.

In [5]:
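# tuples of (import name, install name) with an optional third item: min_version
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform'),
    ('google.cloud.storage', 'google-cloud-storage'),
    ('google.cloud.artifactregistry_v1', 'google-cloud-artifact-registry'),
    ('kfp', 'kfp'),
    ('google_cloud_pipeline_components', 'google-cloud-pipeline-components'),
    ('docker', 'docker')
]

# the submodules must be imported explicitly; `import importlib` alone does not guarantee them
import importlib.util
import importlib.metadata

def version_tuple(version):
    # '2.10.0' -> (2, 10, 0); non-numeric parts such as 'rc1' are skipped
    return tuple(int(part) for part in version.split('.') if part.isdigit())

install = False
for package in packages:
    try:
        found = importlib.util.find_spec(package[0]) is not None
    except ModuleNotFoundError:
        # raised when a parent package (for example `google`) is missing
        found = False
    if not found:
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # look up the installed version by install (distribution) name and compare numerically
        if version_tuple(importlib.metadata.version(package[1])) < version_tuple(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user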
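
Two details of this kind of check are easy to miss. `importlib.util.find_spec` raises `ModuleNotFoundError` rather than returning `None` when a parent package such as `google` is absent, and version strings compared as text do not order numerically. The following is a minimal sketch of both, using only the standard library. The helper names `is_importable` and `version_tuple` and the module name `no_such_package_xyz` are made up for illustration:

```python
import importlib.util

def is_importable(import_name):
    """Return True when a module named `import_name` can be located."""
    try:
        return importlib.util.find_spec(import_name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first, so a missing parent raises
        return False

def version_tuple(version):
    """'2.10.0' -> (2, 10, 0); non-numeric parts such as 'rc1' are skipped."""
    return tuple(int(part) for part in version.split('.') if part.isdigit())

print(is_importable('json'))                     # standard library module
print(is_importable('no_such_package_xyz'))      # missing top-level module
print(is_importable('no_such_package_xyz.sub'))  # missing parent package

# text comparison orders '2.10.0' before '2.9.0'; numeric comparison does not
print('2.10.0' < '2.9.0')
print(version_tuple('2.10.0') < version_tuple('2.9.0'))
```

This tuple-based comparison is a simplification that ignores pre-release tags. A dedicated version parser would be a more thorough choice when one is available in the environment.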

## API Enablement

In [6]:
!gcloud services enable aiplatform.googleapis.com
!gcloud services enable artifactregistry.googleapis.com

### Restart Kernel (If Installs Occurred)

After the kernel restarts, continue running from the cell after this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

Inputs

In [7]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [8]:
REGION = 'us-central1'
EXPERIMENT = 'pipeline-components'
SERIES = 'mlops'

# gcs bucket
GCS_BUCKET = PROJECT_ID

Packages

In [39]:
import os
import time
from google.cloud import aiplatform
from google.cloud import storage
from google.cloud import artifactregistry_v1
import kfp
from typing import NamedTuple
import docker

Clients

In [10]:
# vertex ai clients
aiplatform.init(project = PROJECT_ID, location = REGION)

# gcs storage client
gcs = storage.Client(project = PROJECT_ID)

# artifact registry client
ar_client = artifactregistry_v1.ArtifactRegistryClient()

Docker Check:

In [18]:
try:
    # from_env() and ping() raise when the Docker daemon cannot be reached
    docker_client = docker.from_env()
    docker_client.ping()
    print(f"Docker is installed and running. Version: {docker_client.version()['Version']}")
except Exception:
    print('Docker is either not installed or not running - please fix before proceeding.')

Docker is installed and running. Version: 20.10.17


parameters:

In [19]:
DIR = f"temp/{SERIES}-{EXPERIMENT}"

In [20]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

environment:

In [13]:
os.makedirs(DIR, exist_ok = True)

---
## Components

In [74]:
from google_cloud_pipeline_components.v1.model import ModelGetOp
from google_cloud_pipeline_components.types import artifact_types

### Prebuilt Google Cloud Pipeline Components

Google Cloud provides a growing list of components covering AutoML, Batch Prediction, BigQuery ML, and many more services.

The benefits of prebuilt components include:
- simple debugging
- standardized artifact types that are tracked with Vertex AI ML Metadata, which makes lineage easy
- no need to launch a container that then launches a service, which is more cost effective

These can be reviewed in several ways:
- Directly in the documentation: [Google Cloud Pipeline Components List](https://cloud.google.com/vertex-ai/docs/pipelines/gcpc-list)
- At their accompanying [API Reference](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.10.0/api/index.html)
- At their source in the GitHub repository for Kubeflow Pipelines: [GitHub/kubeflow/pipelines/components/google-cloud](https://github.com/kubeflow/pipelines/tree/master/components/google-cloud)

These prebuilt components also include prebuilt artifact types for Google Cloud resources:
- [Google Cloud Artifact Types](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.10.0/api/artifact_types.html)

Here, the `ModelGetOp` component is used to retrieve an artifact for a model in the Vertex AI Model Registry.

```python
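from google_cloud_pipeline_components.v1.model import ModelGetOp

# excerpt from the pipeline defined later in this notebook;
# `model_name` is the task created by the lightweight Python component
vertex_model_1 = ModelGetOp(
    model_name = model_name.outputs['model_name'],
    project = project,
    location = region
).set_display_name('Prebuilt Component')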
```

### Lightweight Python Components

A simple way to create a component from a Python function.  The container is created at task runtime from the `base_image`, and the `packages_to_install` are installed when the task starts.

References:
- [Lightweight Python Components](https://www.kubeflow.org/docs/components/pipelines/v2/components/lightweight-python-components/)

Here, a simple function uses the Vertex AI SDK to retrieve a list of all models and returns the name, versioned resource name, and URI of the first one as outputs.

In [64]:
@kfp.dsl.component(
    base_image = 'python:3.10',
    packages_to_install = ['google-cloud-aiplatform']
)
def example_lightweight(
    project: str,
    region: str
) -> NamedTuple('lightweight_outputs', model_name = str, model_resource_name = str, uri = str):
    
    # vertex ai client
    from google.cloud import aiplatform
    aiplatform.init(project = project, location = region)
    
    # list models in region
    models = aiplatform.Model.list()
    
    outputs = NamedTuple('lightweight_outputs', model_name = str, model_resource_name = str, uri = str)
    
    return outputs(
        models[0].versioned_resource_name.split('/')[-1],
        models[0].versioned_resource_name,
        f"https://{region}-aiplatform.googleapis.com/v1/{models[0].versioned_resource_name}"
    )
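
To make the three outputs concrete, here is a minimal sketch of the same string handling outside of a pipeline. The resource name is a made-up value that assumes the `projects/.../locations/.../models/ID@VERSION` form, and `LightweightOutputs` and `build_outputs` are illustrative names that are not part of the notebook or the SDK:

```python
from typing import NamedTuple

class LightweightOutputs(NamedTuple):
    model_name: str
    model_resource_name: str
    uri: str

def build_outputs(versioned_resource_name: str, region: str) -> LightweightOutputs:
    """Derive the three component outputs from a versioned resource name."""
    return LightweightOutputs(
        # last path segment, which keeps the @version suffix
        model_name = versioned_resource_name.split('/')[-1],
        model_resource_name = versioned_resource_name,
        uri = f"https://{region}-aiplatform.googleapis.com/v1/{versioned_resource_name}",
    )

# hypothetical resource name, for illustration only
example = build_outputs('projects/123/locations/us-central1/models/456@1', 'us-central1')
print(example.model_name)
print(example.uri)
```

Downstream tasks read these values by field name, for example `outputs['model_name']`, which is why the names in the `NamedTuple` matter.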

### Containerized Python Components

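This extends the idea of lightweight Python components.  The container image for the component is built ahead of time with the `packages_to_install` already installed, so nothing needs to be installed when the task runs.  In KFP this is done by setting `target_image` in `@kfp.dsl.component` and building the image with the `kfp component build` command.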

### Container Components

Any container can be a component with [Container Components](https://www.kubeflow.org/docs/components/pipelines/v2/components/container-components/).

This looks a lot like a lightweight Python component, but it orchestrates the running of the specified container image with inputs and outputs.

The example below takes the [alpine docker image](https://hub.docker.com/_/alpine) and runs a simple command that takes the `model_resource_name` of an artifact created by the Importer Component (below) and prints it as part of a string, which is also returned as an output.

In [76]:
@kfp.dsl.container_component
def example_container(
    vertex_model_a: kfp.dsl.Input[artifact_types.VertexModel],
    note: kfp.dsl.OutputPath(str)
):
    return kfp.dsl.ContainerSpec(
        image = 'alpine',
        command = [
            'sh', '-c', '''RESPONSE="The Model is: $0!"\
                            && echo $RESPONSE\
                            && mkdir -p $(dirname $1)\
                            && echo $RESPONSE > $1
                            '''
        ],
        args = [vertex_model_a.metadata['model_resource_name'], note]
    )
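
The command relies on how `sh -c` assigns positional parameters: the first argument after the script becomes `$0` and the second becomes `$1`. The following is a minimal sketch that runs the same script locally with `subprocess`, assuming a POSIX `sh` is available. The resource name and the temporary output path are stand-ins for the values KFP supplies at runtime:

```python
import os
import subprocess
import tempfile

# same steps as the component's command, written on one line
script = (
    'RESPONSE="The Model is: $0!" '
    '&& echo $RESPONSE '
    '&& mkdir -p $(dirname $1) '
    '&& echo $RESPONSE > $1'
)

with tempfile.TemporaryDirectory() as tmp:
    # stand-in for the output path that KFP passes for `note`
    note_path = os.path.join(tmp, 'outputs', 'note')
    result = subprocess.run(
        # hypothetical resource name -> $0, output path -> $1
        ['sh', '-c', script, 'projects/123/locations/us-central1/models/456@1', note_path],
        capture_output = True, text = True, check = True,
    )
    with open(note_path) as f:
        note = f.read().strip()

print(result.stdout.strip())
print(note)
```

Both the printed line and the file contents carry the same string, which mirrors how the component logs the message and also writes it to the `note` output.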

### Importer Components

Sometimes the artifact that is needed is created before the pipeline runs.  The `dsl.importer` component is a quick way to import the artifact. [More on the Importer Component](https://www.kubeflow.org/docs/components/pipelines/v2/components/importer-component/).

While the `dsl.importer` component can be used to import [generic artifacts](https://www.kubeflow.org/docs/components/pipelines/v2/data-types/artifacts/) it can also be used to import predefined [Google Cloud Artifact Types](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.10.0/api/artifact_types.html) as shown in the Vertex AI documentation page for [Create an ML artifact](https://cloud.google.com/vertex-ai/docs/pipelines/use-components#use_an_importer_node).

Here, the `dsl.importer` component is used to load a model in the Vertex AI Model Registry.

```python
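# excerpt from the pipeline defined later in this notebook;
# `model_name` is the task created by the lightweight Python component
vertex_model_2 = kfp.dsl.importer(
    artifact_uri = model_name.outputs['uri'],
    artifact_class = artifact_types.VertexModel,
    metadata = {'model_resource_name': model_name.outputs['model_resource_name']}
)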
```

---
## Vertex AI Pipelines

- [Vertex AI Python SDK for Pipeline Jobs](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob)
- [Specify machine configurations for a component](https://cloud.google.com/vertex-ai/docs/pipelines/machine-types)

### Create Pipeline

In [77]:
@kfp.dsl.pipeline(
    name = f'{SERIES}-{EXPERIMENT}',
    description = 'A simple pipeline for testing',
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root'
)
def example_pipeline(
    project: str,
    region: str,
    
):
    from google_cloud_pipeline_components.types import artifact_types
    from google_cloud_pipeline_components.v1.model import ModelGetOp
    
    # Lightweight Python Components
    model_name = example_lightweight(
        project = project,
        region = region
    ).set_display_name('Lightweight Python Component').set_cpu_limit('1')
    
    # prebuilt Google Cloud Pipeline Component
    vertex_model_1 = ModelGetOp(
        model_name = model_name.outputs['model_name'],
        project = project,
        location = region
    ).set_display_name('Prebuilt Component')
    
    # importer component
    vertex_model_2 = kfp.dsl.importer(
        artifact_uri = model_name.outputs['uri'],
        artifact_class = artifact_types.VertexModel,
        metadata = {'model_resource_name': model_name.outputs['model_resource_name']}
    ).set_display_name('Importer Component')
    
    # container component
    container = example_container(
        vertex_model_a = vertex_model_2.outputs['artifact']
    )
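
KFP infers the order of tasks from how outputs are passed between them, so no explicit ordering is declared in the pipeline. The sketch below models that dependency graph with the standard library `graphlib` module (Python 3.9+). The task labels are illustrative and may not match the names KFP assigns:

```python
from graphlib import TopologicalSorter

# task label -> set of upstream tasks whose outputs it consumes
dependencies = {
    'example-lightweight': set(),
    'model-get': {'example-lightweight'},
    'importer': {'example-lightweight'},
    'example-container': {'importer'},
}

# one valid execution order; tasks with no dependency between them may swap places
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

The prebuilt and importer tasks both depend only on the lightweight task, so they are free to run in parallel, while the container task waits for the importer.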
    

### Compile Pipeline

In [78]:
kfp.compiler.Compiler().compile(
    pipeline_func = example_pipeline,
    package_path = f'{DIR}/{SERIES}-{EXPERIMENT}.yaml'
)

### Create Pipeline Job

In [79]:
parameters = dict(
    project = PROJECT_ID,
    region = REGION,
)

In [80]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}",
    template_path = f"{DIR}/{SERIES}-{EXPERIMENT}.yaml",
    parameter_values = parameters,
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disabled), None (defer to component-level caching)
)

### Submit Pipeline Job

In [81]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-components-20240316211748
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-components-20240316211748')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-components-20240316211748?project=1026793852137


In [82]:
print(f'The Dashboard can be viewed here:\n{pipeline_job._dashboard_uri()}')

The Dashboard can be viewed here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-components-20240316211748?project=1026793852137


In [83]:
pipeline_job.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-components-20240316211748 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-components-20240316211748 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-components-20240316211748


### Retrieve Pipeline Information

In [None]:
aiplatform.get_pipeline_df(pipeline = f'{SERIES}-{EXPERIMENT}')

In [None]:
tasks = {task.task_name: task for task in pipeline_job.task_details}

In [None]:
for task in tasks:
    print(task, tasks[task].state)

In [None]:
for task in tasks:
    print(task)

In [45]:
tasks['example-number']

task_id: 6698494285479673856
task_name: "example-number"
create_time {
  seconds: 1709846982
  nanos: 866863000
}
start_time {
  seconds: 1709846983
  nanos: 483675000
}
end_time {
  seconds: 1709847085
  nanos: 815256000
}
executor_detail {
  container_detail {
    main_job: "projects/1026793852137/locations/us-central1/customJobs/2970989201182425088"
  }
}
state: SUCCEEDED
execution {
  name: "projects/1026793852137/locations/us-central1/metadataStores/default/executions/17843432237847752265"
  display_name: "example-number"
  state: COMPLETE
  etag: "1709847085771"
  create_time {
    seconds: 1709846983
    nanos: 143000000
  }
  update_time {
    seconds: 1709847085
    nanos: 771000000
  }
  schema_title: "system.ContainerExecution"
  schema_version: "0.0.1"
  metadata {
    fields {
      key: "input:number"
      value {
        number_value: 34.2
      }
    }
    fields {
      key: "output:Output"
      value {
        number_value: 44.2
      }
    }
    fields {
      key:
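
The printed task details show `create_time`, `start_time`, and `end_time` as pairs of `seconds` and `nanos`. A minimal sketch of turning those pairs into UTC datetimes and a run duration, using only the standard library and the values printed above; `to_datetime` is an illustrative helper, not part of the SDK:

```python
from datetime import datetime, timedelta, timezone

def to_datetime(seconds, nanos):
    """Convert a seconds/nanos pair into a timezone-aware UTC datetime."""
    # datetime resolution is microseconds, so nanoseconds are truncated
    return datetime.fromtimestamp(seconds, tz = timezone.utc) + timedelta(microseconds = nanos // 1000)

# values copied from the task output above
start_time = to_datetime(1709846983, 483675000)
end_time = to_datetime(1709847085, 815256000)

duration = end_time - start_time
print(start_time.isoformat())
print(f'task ran for {duration.total_seconds():.1f} seconds')
```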

---
## More!

Want to schedule a pipeline like this? Check out this workflow:
- [Vertex AI Pipelines - Scheduling](./Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb)