<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/MLOps/Vertex%20AI%20Pipelines%20-%20IO.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FMLOps%2FVertex%2520AI%2520Pipelines%2520-%2520IO.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Vertex%20AI%20Pipelines%20-%20IO.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Vertex%20AI%20Pipelines%20-%20IO.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Vertex AI Pipelines - IO

[Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) is a serverless runner for Kubeflow Pipelines [(KFP)](https://www.kubeflow.org/docs/components/pipelines/v2/introduction/) and the [TensorFlow Extended (TFX)](https://www.tensorflow.org/tfx/guide/understanding_tfx_pipelines) framework.

Components are used to run the steps of a pipeline.  A pipeline task runs a component with inputs and produces the component's outputs.  Components execute code on compute using a container image.  All the inputs and outputs are logged as pipeline metadata, automatically!

This notebook will focus on the different data types for inputs and outputs to components.

**Parameters** are Python objects such as `str`, `int`, `float`, `bool`, `list`, and `dict` that are defined as inputs to pipelines and components. Components can also return parameters for input into subsequent components. Parameters are excellent for changing the behavior of a pipeline or component through inputs rather than rewriting code.
- [KFP Parameters](https://www.kubeflow.org/docs/components/pipelines/v2/data-types/parameters/)

**Artifacts** are multi-parameter objects that represent machine learning artifacts, have defined schemas, and are stored as metadata with lineage.  The artifact schemas follow the [ML Metadata (MLMD)](https://github.com/google/ml-metadata) client library.  This helps with understanding and analyzing a pipeline.
- [KFP Artifacts](https://www.kubeflow.org/docs/components/pipelines/v2/data-types/artifacts/)
    - provided [artifact types](https://www.kubeflow.org/docs/components/pipelines/v2/data-types/artifacts/#artifact-types)
    - [Google Cloud Artifact Types](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.0.0/api/artifact_types.html)

---
## Colab Setup

To run this notebook in Colab run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names and install names.  If the import name is not found, the install name is used to install the package quietly for the current user.

In [3]:
# tuples of (import name, install name, min_version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform'),
    ('kfp', 'kfp'),
    ('google_cloud_pipeline_components', 'google-cloud-pipeline-components'),
]

import importlib.util
import importlib.metadata
install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # installed versions are looked up by distribution (install) name
        # NOTE: this compares version strings, not parsed version numbers
        if importlib.metadata.version(package[1]) < package[2]:
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user
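
Comparing version strings directly orders them character by character, so `'2.10.0'` sorts before `'2.7.0'`. The sketch below shows the pitfall and one standard-library workaround. It assumes versions are plain dotted integers such as `2.7.0`; the helper name `version_tuple` is hypothetical, and pre-release tags like `2.0.0b1` are not handled.

```python
def version_tuple(version: str) -> tuple:
    """Convert a dotted numeric version like '2.7.0' into (2, 7, 0)."""
    return tuple(int(part) for part in version.split('.'))

# String comparison is lexicographic: '1' sorts before '7', so 2.10.0 looks older.
string_says_older = '2.10.0' < '2.7.0'

# Tuple comparison is numeric, element by element.
tuple_says_older = version_tuple('2.10.0') < version_tuple('2.7.0')

print(string_says_older, tuple_says_older)
```

If the `packaging` library is available in the environment, its `packaging.version.Version` class is a more complete option for real version strings.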

## API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart the code submission can start with the next cell after this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
REGION = 'us-central1'
EXPERIMENT = 'pipeline-io'
SERIES = 'mlops'

# gcs bucket
GCS_BUCKET = PROJECT_ID

Packages

In [8]:
import os
import time
import importlib
from google.cloud import aiplatform
import kfp
from typing import NamedTuple

In [9]:
kfp.__version__

'2.7.0'

In [10]:
aiplatform.__version__

'1.50.0'

Clients

In [11]:
# vertex ai clients
aiplatform.init(project = PROJECT_ID, location = REGION)

parameters:

In [12]:
DIR = f"temp/{SERIES}-{EXPERIMENT}"

In [13]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

environment:

In [14]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## Parameters

**Parameters** are Python objects such as `str`, `int`, `float`, `bool`, `list`, and `dict` that are defined as inputs to pipelines and components. Components can also return parameters for input into subsequent components. Parameters are excellent for changing the behavior of a pipeline or component through inputs rather than rewriting code.
- [KFP Parameters](https://www.kubeflow.org/docs/components/pipelines/v2/data-types/parameters/)

---
### Inputs and Output

An example pipeline that has all the types of input parameters and outputs a single parameter:

#### Create Pipeline Components

These are simple Python components, specifically lightweight Python components and container components.  For more details on the types of components check out this workflow in the same repository:
- [Vertex AI Pipelines - Components](./Vertex%20AI%20Pipelines%20-%20Components.ipynb)

A simple lightweight python component with single input and single output:

In [130]:
@kfp.dsl.component(
    base_image = "python:3.11",
    packages_to_install = ["pandas"]
)
def single_input(string: str) -> str:
    text = string
    return text

A lightweight python component with multiple inputs and a single output:

In [131]:
@kfp.dsl.component(
    base_image = "python:3.11",
    packages_to_install = ["pandas"]
)
def multi_input(
    a_str: str,
    a_int: int,
    a_float: float,
    a_bool: bool,
    a_dict: dict,
    a_list: list
) -> list:
    text = [a_str, a_int, a_float, a_bool, a_dict, a_list]
    return [str(t) for t in text]

A container component with a single input and a single output.  Note that this type of component uses [`kfp.dsl.OutputPath`](https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.OutputPath) to return the output.  The output is declared as a function parameter, so it looks like an input:
- The shell command `mkdir` is used to create an output location from the `kfp.dsl.OutputPath` variable
- The `echo` command along with the `>` redirect is used to write the value to the `kfp.dsl.OutputPath` location inside the directory just created

In [132]:
@kfp.dsl.container_component
def io_container(
    in_str: str,
    out_str: kfp.dsl.OutputPath(str)
):
    return kfp.dsl.ContainerSpec(
        image = 'alpine',
        command = [
            'sh', '-c',  f'''mkdir -p $(dirname {out_str})\
                            && echo "echoing {in_str}" > {out_str}'''
        ]
    )
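
For readers less familiar with shell, the command in the container component above does roughly what this Python sketch does: create the parent directory of the output path, then write the text to the file at that path. The function name `write_output` and the temporary directory are illustrative stand-ins; in a real run the path is supplied by the pipeline backend.

```python
import os
import tempfile

def write_output(in_str: str, out_path: str) -> None:
    # Equivalent of: mkdir -p $(dirname out_path)
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    # Equivalent of: echo "echoing in_str" > out_path  (echo appends a newline)
    with open(out_path, 'w') as f:
        f.write(f'echoing {in_str}\n')

# Stand-in for the path that replaces the OutputPath placeholder at runtime
demo_path = os.path.join(tempfile.mkdtemp(), 'outputs', 'out_str')
write_output('test string', demo_path)

with open(demo_path) as f:
    written = f.read()
print(written)
```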

#### Create Pipeline

In [133]:
pipeline_name = f'{SERIES}-{EXPERIMENT}-parameter-io'

In [134]:
@kfp.dsl.pipeline(
    name = pipeline_name,
    description = 'A simple pipeline for testing',
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root'
)
def example_pipeline(
    a_str: str,
    a_int: int,
    a_float: float,
    a_bool: bool,
    a_dict: dict,
    a_list: list
) -> list:
    
    single_io = single_input(string = a_str)
    multi_i = multi_input(
        a_str = single_io.output,
        a_int= a_int,
        a_float= a_float,
        a_bool = a_bool,
        a_dict = a_dict,
        a_list = a_list   
    )
    container_io = io_container(in_str = single_io.output)
    
    return multi_i.output

#### Compile Pipeline

In [135]:
kfp.compiler.Compiler().compile(
    pipeline_func = example_pipeline,
    package_path = f'{DIR}/{pipeline_name}.yaml'
)

#### Create Pipeline Job

In [136]:
parameters = dict(
    a_str = 'test string',
    a_int = 1,
    a_float = 1.2,
    a_bool = True,
    a_dict = dict(key = 45),
    a_list = [1, 2, 3]
)
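
Parameter values are passed to the job as plain data, so a quick local check that they serialize to JSON can catch mistakes, such as passing a custom object, before submitting. This is a minimal sketch that assumes JSON-serializability is a reasonable proxy for what the pipeline accepts; `check_parameters` is a hypothetical helper and not part of any SDK.

```python
import json

def check_parameters(parameters: dict) -> list:
    """Return the names of parameters that cannot be serialized to JSON."""
    bad = []
    for name, value in parameters.items():
        try:
            json.dumps(value)
        except TypeError:
            bad.append(name)
    return bad

parameters = dict(
    a_str = 'test string',
    a_int = 1,
    a_float = 1.2,
    a_bool = True,
    a_dict = dict(key = 45),
    a_list = [1, 2, 3]
)
problems = check_parameters(parameters)

# A set is not JSON-serializable, so it is reported by name.
problems_with_set = check_parameters(dict(a_set = {1, 2, 3}))
print(problems, problems_with_set)
```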

In [137]:
pipeline_job = aiplatform.PipelineJob(
    display_name = pipeline_name,
    template_path = f"{DIR}/{pipeline_name}.yaml",
    parameter_values = parameters,
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disable), None (defer to component level caching) 
)

#### Submit Pipeline Job

In [138]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-io-20240507115117
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-io-20240507115117')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-io-parameter-io-20240507115117?project=1026793852137


In [139]:
print(f'The Dashboard can be viewed here:\n{pipeline_job._dashboard_uri()}')

The Dashboard can be viewed here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-io-parameter-io-20240507115117?project=1026793852137


In [140]:
pipeline_job.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-io-20240507115117 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-io-20240507115117 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-io-20240507115117 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-io-20240507115117 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-io-20240507115117 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-io-20240507115117


#### Retrieve Pipeline Information

In [141]:
aiplatform.get_pipeline_df(pipeline = f'{pipeline_name}')

| | pipeline_name | run_name | param.input:a_dict | param.input:a_list | param.input:a_str | param.input:a_float | param.input:a_bool | param.input:a_int | param.vmlmd_lineage_integration | param.output:Output |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mlops-pipeline-io-parameter-io | mlops-pipeline-io-parameter-io-20240507115117 | {'key': 45.0} | [1.0, 2.0, 3.0] | test string | 1.2 | True | 1.0 | {'pipeline_run_component': {'location_id': 'us... | [test string, 1, 1.2, True, {'key': 45}, [1, 2... |


In [142]:
tasks = {task.task_name: task for task in pipeline_job.task_details}

In [143]:
for task in tasks:
  print(task, tasks[task].state)

mlops-pipeline-io-parameter-io-20240507115117 State.SUCCEEDED
multi-input State.SUCCEEDED
io-container State.SUCCEEDED
single-input State.SUCCEEDED


In [144]:
#tasks['io-container']
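
The `tasks` dictionary above can be summarized in different ways, for example by grouping task names by state. This is a minimal sketch that assumes each task object exposes `task_name` and `state` as used above. The `SimpleNamespace` objects are stand-ins with made-up states so the snippet runs without a pipeline job, and `group_by_state` is a hypothetical helper.

```python
from collections import defaultdict
from types import SimpleNamespace

def group_by_state(task_details):
    """Map each state (as text) to the sorted task names in that state."""
    groups = defaultdict(list)
    for task in task_details:
        groups[str(task.state)].append(task.task_name)
    return {state: sorted(names) for state, names in groups.items()}

# Stand-ins for pipeline_job.task_details (states here are illustrative only)
fake_tasks = [
    SimpleNamespace(task_name='single-input', state='State.SUCCEEDED'),
    SimpleNamespace(task_name='multi-input', state='State.SUCCEEDED'),
    SimpleNamespace(task_name='io-container', state='State.FAILED'),
]
summary = group_by_state(fake_tasks)
print(summary)
```

With a real job, the call would be `group_by_state(pipeline_job.task_details)`.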

---
### Multiple Outputs

How can a component return multiple output parameters when a function has a single return object?  This is possible in pipeline components by using the [`typing` module's](https://docs.python.org/3/library/typing.html#module-typing) [`NamedTuple`](https://docs.python.org/3/library/typing.html#typing.NamedTuple) implementation.  These are tuples with named fields, meaning that fields can be accessed by name instead of by position index.

#### Understanding `NamedTuple`

First, a regular tuple.  The following creates a tuple and then recalls the 3rd element with an index.  At first this might seem like a list in Python, but tuples are immutable objects: elements cannot be modified, added, or removed.

In [166]:
example_tuple = (1, 'string', 4.5, True)

In [167]:
example_tuple[2]

4.5
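
To see the immutability in practice, the short sketch below tries the same item assignment on a list and on a tuple. The variable names are only for illustration.

```python
example_tuple = (1, 'string', 4.5, True)
example_list = [1, 'string', 4.5, True]

# Lists are mutable, so item assignment works.
example_list[2] = 9.0

# Tuples are immutable, so the same assignment raises a TypeError.
try:
    example_tuple[2] = 9.0
    tuple_changed = True
except TypeError as error:
    tuple_changed = False
    print(f'TypeError: {error}')
```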

Now, a named tuple.  

In [168]:
example_named_tuple = NamedTuple('example', x=int, y=int)

In [169]:
test_tuple = example_named_tuple(4, 8)

In [170]:
test_tuple

example(x=4, y=8)

In [171]:
test_tuple.x

4

Also, define and populate all at once like this:

In [172]:
NamedTuple('example', x=int, y=int)(2, 9)

example(x=2, y=9)
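
The same named tuple can also be declared with the class-based syntax from the `typing` documentation, which can be easier to read when there are many fields. Note that the keyword-argument form used above (`NamedTuple('example', x=int, y=int)`) is marked as deprecated starting with Python 3.13 in the `typing` documentation, so the class-based form is the more future-proof choice. A small sketch, with the class name `Example` chosen only for illustration:

```python
from typing import NamedTuple

class Example(NamedTuple):
    x: int
    y: int

point = Example(2, 9)

# Fields are available by name, by position, and as a dictionary.
by_name = point.x
by_position = point[1]
as_dict = point._asdict()
print(point, by_name, by_position, as_dict)
```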

#### Create Pipeline Components

These are simple Python components, specifically lightweight Python components and container components.  For more details on the types of components check out this workflow in the same repository:
- [Vertex AI Pipelines - Components](./Vertex%20AI%20Pipelines%20-%20Components.ipynb)

A lightweight python component with multiple inputs and multiple outputs:

In [182]:
@kfp.dsl.component(
    base_image = "python:3.11",
    packages_to_install = ["pandas"]
)
def multi_input_output(
    a_str: str,
    a_int: int,
    a_float: float,
    a_bool: bool,
    a_dict: dict,
    a_list: list
) -> NamedTuple('multi_output', b_str=str, b_int=int, b_float=float, b_bool=bool, b_dict=dict, b_list=list):
    from typing import NamedTuple
    output = NamedTuple('multi_output', b_str=str, b_int=int, b_float=float, b_bool=bool, b_dict=dict, b_list=list)
    return output(a_str, a_int, a_float, a_bool, a_dict, a_list)
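
When a component returns a `NamedTuple`, the field names are what a downstream task uses to select an output, as in `multi_io.outputs['b_str']` in the pipeline that follows. The sketch below lists those names locally, outside of any pipeline. It uses the list-of-pairs form of the same `multi_output` definition so that it also runs on newer Python versions.

```python
from typing import NamedTuple

multi_output = NamedTuple('multi_output', [
    ('b_str', str), ('b_int', int), ('b_float', float),
    ('b_bool', bool), ('b_dict', dict), ('b_list', list),
])

result = multi_output('test string', 1, 1.2, True, {'key': 45}, [1, 2, 3])

# Each field name is a key that a downstream task can reference.
output_keys = list(result._fields)
print(output_keys)
print(result.b_dict)
```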

#### Create Pipeline

In [183]:
pipeline_name = f'{SERIES}-{EXPERIMENT}-parameter-multi-io'

In [184]:
@kfp.dsl.pipeline(
    name = pipeline_name,
    description = 'A simple pipeline for testing',
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root'
)
def example_pipeline(
    a_str: str,
    a_int: int,
    a_float: float,
    a_bool: bool,
    a_dict: dict,
    a_list: list
) -> list:
    
    multi_io = multi_input_output(
        a_str = a_str,
        a_int= a_int,
        a_float= a_float,
        a_bool = a_bool,
        a_dict = a_dict,
        a_list = a_list 
    )
    multi_i = multi_input(
        a_str = multi_io.outputs['b_str'],
        a_int= multi_io.outputs['b_int'],
        a_float= multi_io.outputs['b_float'],
        a_bool = multi_io.outputs['b_bool'],
        a_dict = multi_io.outputs['b_dict'],
        a_list = multi_io.outputs['b_list']   
    )
    
    return multi_i.output

#### Compile Pipeline

In [185]:
kfp.compiler.Compiler().compile(
    pipeline_func = example_pipeline,
    package_path = f'{DIR}/{pipeline_name}.yaml'
)

#### Create Pipeline Job

In [186]:
parameters = dict(
    a_str = 'test string',
    a_int = 1,
    a_float = 1.2,
    a_bool = True,
    a_dict = dict(key = 45),
    a_list = [1, 2, 3]
)

In [187]:
pipeline_job = aiplatform.PipelineJob(
    display_name = pipeline_name,
    template_path = f"{DIR}/{pipeline_name}.yaml",
    parameter_values = parameters,
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disable), None (defer to component level caching) 
)

#### Submit Pipeline Job

In [188]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-multi-io-20240507123009
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-multi-io-20240507123009')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-io-parameter-multi-io-20240507123009?project=1026793852137


In [189]:
print(f'The Dashboard can be viewed here:\n{pipeline_job._dashboard_uri()}')

The Dashboard can be viewed here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-io-parameter-multi-io-20240507123009?project=1026793852137


In [190]:
pipeline_job.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-multi-io-20240507123009 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-multi-io-20240507123009 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-multi-io-20240507123009 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-multi-io-20240507123009 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-multi-io-20240507123009 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-parameter-mu

#### Retrieve Pipeline Information

In [191]:
aiplatform.get_pipeline_df(pipeline = f'{pipeline_name}')

| | pipeline_name | run_name | param.input:a_list | param.input:a_str | param.vmlmd_lineage_integration | param.output:Output | param.input:a_bool | param.input:a_dict | param.input:a_int | param.input:a_float |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mlops-pipeline-io-parameter-multi-io | mlops-pipeline-io-parameter-multi-io-202405071... | [1.0, 2.0, 3.0] | test string | {'pipeline_run_component': {'parent_task_names... | [test string, 1, 1.2, True, {'key': 45}, [1, 2... | True | {'key': 45.0} | 1.0 | 1.2 |
| 1 | mlops-pipeline-io-parameter-multi-io | mlops-pipeline-io-parameter-multi-io-202405071... | [1.0, 2.0, 3.0] | test string | {'pipeline_run_component': {'project_id': 'sta... | | True | {'key': 45.0} | 1.0 | 1.2 |
| 2 | mlops-pipeline-io-parameter-multi-io | mlops-pipeline-io-parameter-multi-io-202405071... | [1.0, 2.0, 3.0] | test string | {'pipeline_run_component': {'task_name': 'mlop... | | True | {'key': 45.0} | 1.0 | 1.2 |


In [192]:
tasks = {task.task_name: task for task in pipeline_job.task_details}

In [193]:
for task in tasks:
  print(task, tasks[task].state)

multi-input State.SUCCEEDED
multi-input-output State.SUCCEEDED
mlops-pipeline-io-parameter-multi-io-20240507123009 State.SUCCEEDED


In [196]:
#tasks['multi-input-output']

---
## Artifacts: KFP Artifacts

**Artifacts** are multi-parameter objects that represent machine learning artifacts, have defined schemas, and are stored as metadata with lineage.  The artifact schemas follow the [ML Metadata (MLMD)](https://github.com/google/ml-metadata) client library.  This helps with understanding and analyzing a pipeline.
- [KFP Artifacts](https://www.kubeflow.org/docs/components/pipelines/v2/data-types/artifacts/)
    - provided [artifact types](https://www.kubeflow.org/docs/components/pipelines/v2/data-types/artifacts/#artifact-types)
    - [Google Cloud Artifact Types](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.0.0/api/artifact_types.html)
    
Artifacts are passed to and from components with input parameters that are defined with wrappers:
- `kfp.dsl.Input`
- `kfp.dsl.Output`

> **NOTE:** There is a more Pythonic approach in development that removes the need for input/output wrappers and allows the usage of artifacts like other parameters.  Check out [this reference](https://www.kubeflow.org/docs/components/pipelines/v2/data-types/artifacts/#artifact-types) for the emerging details.

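For example:
```Python
import kfp

@kfp.dsl.component()
def new_component(input_artifact: kfp.dsl.Artifact) -> kfp.dsl.Artifact:
    ...
    # return an artifact instance, not the Artifact class itself
    return kfp.dsl.Artifact(uri = kfp.dsl.get_uri('my_artifact'))
```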
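
With the wrapper style (`kfp.dsl.Input` and `kfp.dsl.Output`), an input artifact is typically read from its `path` and an output artifact is written to its `path`, with any extra information stored in `metadata`. The sketch below shows only a function body, run locally against stand-in objects so that it works without KFP installed. The name `copy_upper` and the stand-ins are hypothetical; in a real component the parameters would be annotated as `kfp.dsl.Input[kfp.dsl.Artifact]` and `kfp.dsl.Output[kfp.dsl.Artifact]` under the `@kfp.dsl.component` decorator.

```python
import os
import tempfile
from types import SimpleNamespace

def copy_upper(source, target):
    """Read text from source.path and write it upper-cased to target.path."""
    with open(source.path) as f:
        text = f.read()
    with open(target.path, 'w') as f:
        f.write(text.upper())
    # Extra information is stored on the output artifact's metadata dict.
    target.metadata['characters'] = len(text)

# Local stand-ins for the artifact objects KFP would pass in at runtime
workdir = tempfile.mkdtemp()
source = SimpleNamespace(path=os.path.join(workdir, 'in.txt'), metadata={})
target = SimpleNamespace(path=os.path.join(workdir, 'out.txt'), metadata={})
with open(source.path, 'w') as f:
    f.write('hello artifact')

copy_upper(source, target)
with open(target.path) as f:
    result_text = f.read()
print(result_text, target.metadata)
```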
    

### Artifact Type: `kfp.dsl.Artifact`

[`kfp.dsl.Artifact`](https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Artifact) is a generic artifact with parameters:
- `name`: str
- `uri`: str
- `metadata`: dict

Example:
```Python

```
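
To make the three attributes concrete, here is a stand-in class with the same field names. It is not the KFP class, only a sketch for local experimentation. The `path` property assumes that KFP exposes `gs://` URIs under a local `/gcs/` mount inside the component container; treat that mapping as an assumption to verify against the KFP documentation. The bucket name `my-bucket` is made up.

```python
from dataclasses import dataclass, field

@dataclass
class SketchArtifact:
    """Hypothetical stand-in mirroring the name/uri/metadata fields."""
    name: str = ''
    uri: str = ''
    metadata: dict = field(default_factory=dict)

    @property
    def path(self) -> str:
        # Assumed mapping: gs://bucket/object -> /gcs/bucket/object
        if self.uri.startswith('gs://'):
            return '/gcs/' + self.uri[len('gs://'):]
        return self.uri

artifact = SketchArtifact(
    name='example',
    uri='gs://my-bucket/pipeline_root/example.txt',
    metadata={'key': 45},
)
print(artifact.path)
```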

### Artifact Type: `kfp.dsl.Dataset`

### Artifact Type: `kfp.dsl.Model`

### Artifact Type: `kfp.dsl.Metrics`

### Artifact Type: `kfp.dsl.ClassificationMetrics`

### Artifact Type: `kfp.dsl.SlicedClassificationMetrics`

### Artifact Type: `kfp.dsl.HTML`

### Artifact Type: `kfp.dsl.Markdown`

#### Create Pipeline Components

These are simple Python components, specifically lightweight Python components and container components.  For more details on the types of components check out this workflow in the same repository:
- [Vertex AI Pipelines - Components](./Vertex%20AI%20Pipelines%20-%20Components.ipynb)

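Three lightweight Python components that each take parameters and return an `Artifact`, built in a different way each time: with a `uri` and `metadata`, with a `uri` only, and with `metadata` plus a `uri` generated by `kfp.dsl.get_uri`: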

In [15]:
from kfp.dsl import Artifact

In [39]:
@kfp.dsl.component(
    base_image = "python:3.11",
    packages_to_install = ["pandas"]
)
def example_component1(
    name: str,
    uri: str,
    a_dict: dict
) -> Artifact:
    
    my_artifact = Artifact(uri = uri, metadata = a_dict)
    
    return my_artifact

@kfp.dsl.component(
    base_image = "python:3.11",
    packages_to_install = ["pandas"]
)
def example_component2(
    name: str,
    uri: str,
    a_dict: dict
) -> Artifact:
    
    my_artifact = Artifact(uri = uri)
    
    return my_artifact

@kfp.dsl.component(
    base_image = "python:3.11",
    packages_to_install = ["pandas"]
)
def example_component3(
    name: str,
    uri: str,
    a_dict: dict
) -> Artifact:
    
    my_artifact = Artifact(metadata = a_dict)
    my_artifact.uri = kfp.dsl.get_uri('my_artifact')
    
    return my_artifact

#### Create Pipeline

In [40]:
pipeline_name = f'{SERIES}-{EXPERIMENT}-artifacts'

In [41]:
@kfp.dsl.pipeline(
    name = pipeline_name,
    description = 'A simple pipeline for testing',
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root'
)
def example_pipeline(
    name: str,
    uri: str,
    a_dict: dict
):# -> Artifact:
    
    create_artifact1 = example_component1(name = name, uri = uri, a_dict = a_dict)
    create_artifact2 = example_component2(name = name, uri = uri, a_dict = a_dict)
    create_artifact3 = example_component3(name = name, uri = uri, a_dict = a_dict)
    #return create_artifact

#### Compile Pipeline

In [42]:
kfp.compiler.Compiler().compile(
    pipeline_func = example_pipeline,
    package_path = f'{DIR}/{pipeline_name}.yaml'
)

#### Create Pipeline Job

In [43]:
parameters = dict(
    name = 'test string',
    uri = 'https://www.google.com',
    a_dict = dict(key = 45),
)

In [44]:
pipeline_job = aiplatform.PipelineJob(
    display_name = pipeline_name,
    template_path = f"{DIR}/{pipeline_name}.yaml",
    parameter_values = parameters,
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disable), None (defer to component level caching) 
)

#### Submit Pipeline Job

In [45]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-artifacts-20240508173548
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-artifacts-20240508173548')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-io-artifacts-20240508173548?project=1026793852137


In [46]:
print(f'The Dashboard can be viewed here:\n{pipeline_job._dashboard_uri()}')

The Dashboard can be viewed here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-io-artifacts-20240508173548?project=1026793852137


In [47]:
pipeline_job.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-artifacts-20240508173548 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-artifacts-20240508173548 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-artifacts-20240508173548 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-artifacts-20240508173548 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-io-artifacts-20240508173548


#### Retrieve Pipeline Information

In [191]:
aiplatform.get_pipeline_df(pipeline = f'{pipeline_name}')


In [192]:
tasks = {task.task_name: task for task in pipeline_job.task_details}

In [193]:
for task in tasks:
  print(task, tasks[task].state)


In [196]:
#tasks['example-component1']

---
## Artifacts: Vertex AI ML Metadata

- https://cloud.google.com/vertex-ai/docs/pipelines/use-components#consume_or_produce_artifacts_in_your_component
- https://cloud.google.com/vertex-ai/docs/pipelines/artifact-types