![tracker](https://us-central1-vertex-ai-mlops-369716.cloudfunctions.net/pixel-tracking?path=statmike%2Fvertex-ai-mlops%2FMLOps%2FPipelines&file=Vertex+AI+Pipelines+-+Start+Here.ipynb)
<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Start%20Here.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FMLOps%2FPipelines%2FVertex%2520AI%2520Pipelines%2520-%2520Start%2520Here.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Start%20Here.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Start%20Here.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

---
This is part of a [series of notebook based workflows](./readme.md) that teach all the ways to use pipelines within Vertex AI. The suggested order and description/reason is:

||Notebook Workflow|Description|
|---|---|---|
|_**This Notebook**_|[Vertex AI Pipelines - Start Here](./Vertex%20AI%20Pipelines%20-%20Start%20Here.ipynb)|What are pipelines? Start here to go from code to pipeline and see it in action.|
||[Vertex AI Pipelines - Introduction](./Vertex%20AI%20Pipelines%20-%20Introduction.ipynb)|Introduction to pipelines with the console and Vertex AI SDK|
||[Vertex AI Pipelines - Components](./Vertex%20AI%20Pipelines%20-%20Components.ipynb)|An introduction to all the ways to create pipeline components from your code|
||[Vertex AI Pipelines - IO](./Vertex%20AI%20Pipelines%20-%20IO.ipynb)|An overview of all the types of inputs and outputs for pipeline components|
||[Vertex AI Pipelines - Control](./Vertex%20AI%20Pipelines%20-%20Control.ipynb)|An overview of controlling the flow of execution for pipelines|
||[Vertex AI Pipelines - Secret Manager](./Vertex%20AI%20Pipelines%20-%20Secret%20Manager.ipynb)|How to pass sensitive information to pipelines and components|
||[Vertex AI Pipelines - GCS Read and Write](./Vertex%20AI%20Pipelines%20-%20GCS%20Read%20and%20Write.ipynb)|How to read/write to GCS from components, including container components.|
||[Vertex AI Pipelines - Scheduling](./Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb)|How to schedule pipeline execution|
||[Vertex AI Pipelines - Notifications](./Vertex%20AI%20Pipelines%20-%20Notifications.ipynb)|How to send email notification of pipeline status.|
||[Vertex AI Pipelines - Management](./Vertex%20AI%20Pipelines%20-%20Management.ipynb)|Managing, Reusing, and Storing pipelines and components|
||[Vertex AI Pipelines - Testing](./Vertex%20AI%20Pipelines%20-%20Testing.ipynb)|Strategies for testing components and pipelines locally and remotely to aid development.|
||[Vertex AI Pipelines - Managing Pipeline Jobs](./Vertex%20AI%20Pipelines%20-%20Managing%20Pipeline%20Jobs.ipynb)|Manage runs of pipelines in an environment: list, check status, filtered list, cancel and delete jobs.|

To discover these notebooks as part of an introduction to MLOps orchestration, [start here](./readme.md). To read more about MLOps, also check out [the parent folder](../readme.md).

---

# Vertex AI Pipelines - Start Here

What are pipelines?
- They help you automate, manage, and scale your ML workflows
- They offer reproducibility, collaboration, and efficiency


In this quick start, we'll take a simple code example and run it both in a notebook and as a pipeline on Vertex AI Pipelines. This will likely spark many questions, and that's great! The rest of this series will dive deeper into each aspect of pipelines, providing comprehensive answers by example. 

Ready to see the pipelines in action? Let's dive into our first code example! 

---
## Colab Setup

To run this notebook in Colab, run the cells in this section. Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [67]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [68]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

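The list `packages` contains tuples of package import names, install names, and an optional minimum version. If the import name is not found, the install name is used to install the package quietly for the current user. If a minimum version is given and the installed version is older, the package is updated.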

In [69]:
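# tuples of (import name, install name, optional min_version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform', '1.51.0'),
    ('kfp', 'kfp'),
    ('sklearn', 'scikit-learn'),
    ('numpy', 'numpy')
]

# import the submodules explicitly: `import importlib` alone does not guarantee them
import importlib.util
import importlib.metadata
install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # compare parsed versions, not strings (as strings, '1.100.0' < '1.51.0' is True)
        # packaging is installed as a dependency of google-cloud-aiplatform
        from packaging.version import Version
        # importlib.metadata looks up the distribution (install) name, not the import name
        if Version(importlib.metadata.version(package[1])) < Version(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user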
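
Comparing version strings with `<` compares them character by character, so a version like `'1.100.0'` sorts *before* `'1.51.0'`. A minimal sketch, assuming plain `X.Y.Z` release numbers with no pre-release or dev suffixes, compares tuples of integers instead. The helper `version_tuple` is hypothetical; `packaging.version.Version` is the more robust choice when the `packaging` library is available.

```python
def version_tuple(version: str) -> tuple:
    """Convert a plain 'X.Y.Z' release string into a tuple of ints.

    Assumes numeric release segments only (no 'rc1' or '.dev0' suffixes).
    """
    return tuple(int(part) for part in version.split("."))

# String comparison goes character by character: '1' == '1', '.' == '.', then '1' < '5'.
string_says_older = "1.100.0" < "1.51.0"

# Tuple comparison goes number by number: 1 == 1, then 100 > 51.
tuple_says_older = version_tuple("1.100.0") < version_tuple("1.51.0")

print(string_says_older, tuple_says_older)  # True False
```

Tuples compare element by element, which is exactly the ordering semantic version numbers need.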

### API Enablement

In [70]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After the kernel restarts, continue with the cell that follows this one.

In [71]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [72]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [73]:
REGION = 'us-central1'
SERIES = 'mlops'
EXPERIMENT = 'pipeline-start'

# gcs bucket
GCS_BUCKET = PROJECT_ID

Packages

In [74]:
import os

from IPython.display import Markdown as show_md

from google.cloud import aiplatform
import kfp

import sklearn
import sklearn.datasets
import sklearn.linear_model
import sklearn.metrics
import sklearn.preprocessing
import numpy as np

In [75]:
kfp.__version__

'2.11.0'

In [76]:
aiplatform.__version__

'1.78.0'

Clients

In [77]:
# vertex ai clients
aiplatform.init(project = PROJECT_ID, location = REGION)

parameters:

In [78]:
DIR = f"temp/{SERIES}-{EXPERIMENT}"

In [79]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

environment:

In [80]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## ML Code - A Simple Example

A simple example of using [scikit-learn](https://scikit-learn.org/stable/index.html) to fit a linear regression model on the [diabetes data](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset):
- 442 samples
- target: a quantitative measure of disease progression one year after baseline (`disease_progression`)
- 10 features: `age, sex, bmi, bp, total serum cholesterol, low-density lipoproteins, high-density lipoproteins, total cholesterol / HDL, (possibly) log of serum triglycerides, blood sugar`

In [81]:
# set a percentage of data to use for the test split
test_pct = 0.05

In [82]:
# load data
diabetes_X, diabetes_y = sklearn.datasets.load_diabetes(return_X_y = True)

In [83]:
# determine split_index from test_pct
split_index = int(len(diabetes_y) * test_pct)

In [84]:
# split data into train/test
train_X, test_X = diabetes_X[0:-1*split_index], diabetes_X[-1*split_index:]
train_y, test_y = diabetes_y[0:-1*split_index], diabetes_y[-1*split_index:]
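
The split above keeps the last `split_index` rows for testing, with no shuffling. A small stdlib sketch (the helpers `split_sizes` and `safe_tail_split` are hypothetical) shows the resulting sizes for the 442 diabetes samples, plus one edge case: when `int(n * test_pct)` rounds down to 0, `X[0:-0]` is `X[0:0]` (empty) and `X[-0:]` is the whole array, so the training set would be empty. Slicing at `n - split_index` avoids that.

```python
def split_sizes(n: int, test_pct: float) -> tuple:
    """Return (train_size, test_size) for the notebook's tail split of n rows."""
    split_index = int(n * test_pct)      # same rounding-down as the notebook
    rows = list(range(n))                # stand-in for the rows of diabetes_X
    train, test = rows[0:-1*split_index], rows[-1*split_index:]
    return len(train), len(test)

def safe_tail_split(rows: list, test_pct: float) -> tuple:
    """Alternative: slice at n_train so a zero-size test split behaves sensibly."""
    n_train = len(rows) - int(len(rows) * test_pct)
    return rows[:n_train], rows[n_train:]

for pct in (0.05, 0.15, 0.25):
    print(pct, split_sizes(442, pct))
# 0.05 (420, 22)
# 0.15 (376, 66)
# 0.25 (332, 110)

print(split_sizes(442, 0.001))           # (0, 442): -0 slicing puts every row in test
train, test = safe_tail_split(list(range(442)), 0.001)
print(len(train), len(test))             # 442 0
```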

In [85]:
# standardize the columns of X based on the train_X data
scaler = sklearn.preprocessing.StandardScaler()
scaler.fit(train_X)
train_X_normalized, test_X_normalized = scaler.transform(train_X), scaler.transform(test_X)
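
`StandardScaler` learns each column's mean and standard deviation from `train_X` only, then applies those same values to both splits, so no information from the test rows leaks into preprocessing. A minimal stdlib sketch of that fit-then-transform pattern follows; the `fit` and `transform` helpers are hypothetical, and scikit-learn's implementation is vectorized. Like `StandardScaler`, the sketch uses the population standard deviation and treats a zero standard deviation as 1.

```python
import statistics

def fit(columns):
    """Learn (mean, std) per column using the population std (ddof=0)."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in columns]

def transform(columns, params):
    """Apply (x - mean) / std with previously fitted parameters; std of 0 becomes 1."""
    return [[(x - mean) / (std or 1.0) for x in col]
            for col, (mean, std) in zip(columns, params)]

# Toy data stored column-wise: one feature, four training rows and two test rows.
train_cols = [[1.0, 2.0, 3.0, 4.0]]
test_cols = [[5.0, 2.5]]

params = fit(train_cols)                   # mean 2.5, std sqrt(1.25), from train only
train_scaled = transform(train_cols, params)
test_scaled = transform(test_cols, params) # the test rows reuse the train statistics
```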

In [86]:
# create model
regression = sklearn.linear_model.LinearRegression()

In [87]:
# train model on training data
regression.fit(train_X_normalized, train_y)

In [88]:
# make predictions
test_predictions = regression.predict(test_X_normalized)

In [89]:
# metrics
mse = sklearn.metrics.mean_squared_error(test_y, test_predictions)
rmse = sklearn.metrics.root_mean_squared_error(test_y, test_predictions)
mae = sklearn.metrics.mean_absolute_error(test_y, test_predictions)
r2 = sklearn.metrics.r2_score(test_y, test_predictions)
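
For reference, the four metrics reduce to a few lines of arithmetic. This stdlib sketch (the `regression_metrics` helper is hypothetical) mirrors the definitions behind the scikit-learn calls and is checked on a tiny hand-made example rather than the diabetes data.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R^2 from two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation around the mean
    ss_res = sum(e * e for e in errors)                 # variation the model leaves unexplained
    return dict(mse=mse, rmse=math.sqrt(mse), mae=mae, r2=1 - ss_res / ss_tot)

metrics_example = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(metrics_example)
# mse 0.375, rmse ~0.612, mae 0.5, r2 ~0.9486
```

RMSE is in the same units as the target, which is why it is easier to read than MSE; R² compares the model's squared error with that of always predicting the mean.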

In [90]:
# prepare result
result = dict(
    mse = mse,
    rmse = rmse,
    mae = mae,
    r2 = r2
)

In [91]:
result

{'mse': 1836.660017696029,
 'rmse': 42.856271626169594,
 'mae': 34.28990709280616,
 'r2': 0.6109895023653351}

---
## ML Code - As A Function

Turning the ML code into a function lets input parameters change what it does. Here, the size of the test split is exposed as the parameter `test_pct` (test percentage), with a default of `0.20`.

In [92]:
def trainer(test_pct = 0.20):

    # load data
    diabetes_X, diabetes_y = sklearn.datasets.load_diabetes(return_X_y = True)
    
    # determine split_index from test_pct
    split_index = int(len(diabetes_y) * test_pct)
    
    # split data into train/test
    train_X, test_X = diabetes_X[0:-1*split_index], diabetes_X[-1*split_index:]
    train_y, test_y = diabetes_y[0:-1*split_index], diabetes_y[-1*split_index:]
    
    # standardize the columns of X based on the train_X data
    scaler = sklearn.preprocessing.StandardScaler()
    scaler.fit(train_X)
    train_X_normalized, test_X_normalized = scaler.transform(train_X), scaler.transform(test_X)
    
    # create model
    regression = sklearn.linear_model.LinearRegression()
    
    # train model on training data
    regression.fit(train_X_normalized, train_y)
    
    # make predictions
    test_predictions = regression.predict(test_X_normalized)
    
    # metrics
    mse = sklearn.metrics.mean_squared_error(test_y, test_predictions)
    rmse = sklearn.metrics.root_mean_squared_error(test_y, test_predictions)
    mae = sklearn.metrics.mean_absolute_error(test_y, test_predictions)
    r2 = sklearn.metrics.r2_score(test_y, test_predictions)
    
    # prepare result
    result = dict(
        mse = mse,
        rmse = rmse,
        mae = mae,
        r2 = r2
    )
    
    return result

In [93]:
trainer(test_pct = .25)

{'mse': 2732.3884212594717,
 'rmse': 52.272252881040735,
 'mae': 40.963182420561566,
 'r2': 0.5575557705003222}

---
## ML Code - As A Pipeline Component

This is the same function as above, decorated with `@kfp.dsl.component` to make it a KFP pipeline component. The decorator also lets you customize the execution environment, such as the container image and the Python packages to install.

Notice these changes from the function above:
- The `@kfp.dsl.component` decorator, with optional specification of a container image and Python packages to install
- The type hints `test_pct: float` and `-> dict`, which define the input type and the return type (a dictionary)
- Package imports inside the function body, since the component executes in isolation from the notebook
In [94]:
@kfp.dsl.component(
    base_image = "python:3.11",
    packages_to_install = ["scikit-learn", "numpy"]
)
def trainer(test_pct: float) -> dict:

    import sklearn
    import sklearn.datasets
    import sklearn.linear_model
    import sklearn.metrics
    import sklearn.preprocessing # used for StandardScaler below
    import numpy as np
    
    # load data
    diabetes_X, diabetes_y = sklearn.datasets.load_diabetes(return_X_y = True)
    
    # determine split_index from test_pct
    split_index = int(len(diabetes_y) * test_pct)
    
    # split data into train/test
    train_X, test_X = diabetes_X[0:-1*split_index], diabetes_X[-1*split_index:]
    train_y, test_y = diabetes_y[0:-1*split_index], diabetes_y[-1*split_index:]
    
    # standardize the columns of X based on the train_X data
    scaler = sklearn.preprocessing.StandardScaler()
    scaler.fit(train_X)
    train_X_normalized, test_X_normalized = scaler.transform(train_X), scaler.transform(test_X)
    
    # create model
    regression = sklearn.linear_model.LinearRegression()
    
    # train model on training data
    regression.fit(train_X_normalized, train_y)
    
    # make predictions
    test_predictions = regression.predict(test_X_normalized)
    
    # metrics
    mse = sklearn.metrics.mean_squared_error(test_y, test_predictions)
    rmse = sklearn.metrics.root_mean_squared_error(test_y, test_predictions)
    mae = sklearn.metrics.mean_absolute_error(test_y, test_predictions)
    r2 = sklearn.metrics.r2_score(test_y, test_predictions)
    
    # prepare result
    result = dict(
        mse = mse,
        rmse = rmse,
        mae = mae,
        r2 = r2
    )
    
    return result

### Run The Component On Vertex AI Pipelines

In [95]:
kfp.compiler.Compiler().compile(
    pipeline_func = trainer,
    package_path = f'{DIR}/{SERIES}-{EXPERIMENT}.yaml',
    pipeline_name = f"{SERIES}-{EXPERIMENT}-component"
)

In [96]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}-component",
    template_path = f"{DIR}/{SERIES}-{EXPERIMENT}.yaml",
    parameter_values = dict(
        test_pct = 0.15
    ),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disabled), None (defer to component level caching) 
)

In [97]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-component-20250312162653
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-component-20250312162653')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-start-component-20250312162653?project=1026793852137
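
The resource name in the output above follows the pattern `projects/{project}/locations/{location}/pipelineJobs/{job_id}`. If you need its parts later, for example to rebuild the console link, a small hypothetical helper can split it with a regular expression. The pattern below is inferred from the printed output, not taken from an official specification.

```python
import re

# Pattern inferred from the pipeline job resource name printed by submit().
RESOURCE_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/pipelineJobs/(?P<job_id>[^/]+)$"
)

def parse_pipeline_job_name(resource_name: str) -> dict:
    """Split a pipeline job resource name into project, location and job_id."""
    match = RESOURCE_PATTERN.match(resource_name)
    if match is None:
        raise ValueError(f"not a pipeline job resource name: {resource_name!r}")
    return match.groupdict()

parts = parse_pipeline_job_name(
    "projects/1026793852137/locations/us-central1/pipelineJobs/"
    "mlops-pipeline-start-component-20250312162653"
)

# Rebuild the console link in the same shape as the "View Pipeline Job" URL above.
console_url = (
    f"https://console.cloud.google.com/vertex-ai/locations/{parts['location']}"
    f"/pipelines/runs/{parts['job_id']}?project={parts['project']}"
)
```

Note that the `project` segment here is the project number, not the project ID.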


In [98]:
show_md(f'The Dashboard can be [viewed here]({pipeline_job._dashboard_uri()})')

The Dashboard can be [viewed here](https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-start-component-20250312162653?project=1026793852137)

In [99]:
pipeline_job.wait()

PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-component-20250312162653


**Select The Pipeline Run In The Console:**
<p align="center"><center>
    <img align="center" alt="Pipeline Runs" src="../resources/images/screenshots/pipelines/start-run.png" width="70%">
</center></p>

**Review The Pipeline: Running State**
<p align="center"><center>
    <img align="center" alt="Pipeline Running" src="../resources/images/screenshots/pipelines/start-running.png" width="70%">
</center></p>

**Review The Pipeline: Completed With Output Parameters**
<p align="center"><center>
    <img align="center" alt="Pipeline Parameters" src="../resources/images/screenshots/pipelines/start-complete.png" width="70%">
</center></p>

---
## ML Code - As A Pipeline Component With Artifacts

A small extension of the pipeline component that returns its metrics as an artifact instead of a dictionary.

Notice the following changes from the previous component:
- The return type hint is the artifact type `kfp.dsl.Metrics`
- The result is prepared as a `kfp.dsl.Metrics` object, with one `log_metric` call per metric
- The function returns the new metrics object
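
`Metrics.log_metric` records each value in the artifact's `metadata` dictionary, which is what the console displays and what downstream code can read back. The class below is a hypothetical stand-in that mirrors only that behavior; it is not KFP's `Metrics` class. The values are the rounded results from the notebook run earlier.

```python
class MetricsSketch:
    """Hypothetical stand-in for kfp.dsl.Metrics: log_metric stores values in metadata."""

    def __init__(self):
        self.metadata = {}

    def log_metric(self, metric: str, value: float) -> None:
        # One metadata key per metric name; logging the same name again overwrites it.
        self.metadata[metric] = value

metrics = MetricsSketch()
for name, value in dict(mse=1836.66, rmse=42.86, mae=34.29, r2=0.61).items():
    metrics.log_metric(name, value)

print(metrics.metadata["mae"])  # 34.29
```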

In [35]:
@kfp.dsl.component(
    base_image = "python:3.11",
    packages_to_install = ["scikit-learn", "numpy"]
)
def trainer(test_pct: float) -> kfp.dsl.Metrics:

    import sklearn
    import sklearn.datasets
    import sklearn.linear_model
    import sklearn.metrics
    import sklearn.preprocessing # used for StandardScaler below
    import numpy as np
    
    # load data
    diabetes_X, diabetes_y = sklearn.datasets.load_diabetes(return_X_y = True)
    
    # determine split_index from test_pct
    split_index = int(len(diabetes_y) * test_pct)
    
    # split data into train/test
    train_X, test_X = diabetes_X[0:-1*split_index], diabetes_X[-1*split_index:]
    train_y, test_y = diabetes_y[0:-1*split_index], diabetes_y[-1*split_index:]
    
    # standardize the columns of X based on the train_X data
    scaler = sklearn.preprocessing.StandardScaler()
    scaler.fit(train_X)
    train_X_normalized, test_X_normalized = scaler.transform(train_X), scaler.transform(test_X)
    
    # create model
    regression = sklearn.linear_model.LinearRegression()
    
    # train model on training data
    regression.fit(train_X_normalized, train_y)
    
    # make predictions
    test_predictions = regression.predict(test_X_normalized)
    
    # metrics
    mse = sklearn.metrics.mean_squared_error(test_y, test_predictions)
    rmse = sklearn.metrics.root_mean_squared_error(test_y, test_predictions)
    mae = sklearn.metrics.mean_absolute_error(test_y, test_predictions)
    r2 = sklearn.metrics.r2_score(test_y, test_predictions)
    
    # prepare result
    metrics = kfp.dsl.Metrics()
    metrics.log_metric('mse', mse)
    metrics.log_metric('rmse', rmse)
    metrics.log_metric('mae', mae)
    metrics.log_metric('r2', r2)
    
    return metrics

### Run The Component On Vertex AI Pipelines

In [36]:
kfp.compiler.Compiler().compile(
    pipeline_func = trainer,
    package_path = f'{DIR}/{SERIES}-{EXPERIMENT}.yaml',
    pipeline_name = f"{SERIES}-{EXPERIMENT}-component"
)

In [41]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}-component",
    template_path = f"{DIR}/{SERIES}-{EXPERIMENT}.yaml",
    parameter_values = dict(
        test_pct = 0.15
    ),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disabled), None (defer to component level caching) 
)

In [42]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-component-20250309145048
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-component-20250309145048')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-start-component-20250309145048?project=1026793852137


In [43]:
show_md(f'The Dashboard can be [viewed here]({pipeline_job._dashboard_uri()})')

The Dashboard can be [viewed here](https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-start-component-20250309145048?project=1026793852137)

In [44]:
pipeline_job.wait()

PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-component-20250309145048


**Review The Pipeline: Completed With Output Artifact For Metrics**
<p align="center"><center>
    <img align="center" alt="Pipeline With Artifacts" src="../resources/images/screenshots/pipelines/start-complete2.png" width="70%">
</center></p>

---
## ML Code - As A Multi-Component Pipeline With Artifacts

Split the single component into two components: one for data preparation and one for training.

Then create a pipeline. A pipeline is a special component that calls other components like functions and connects them through their inputs and outputs.

In [45]:
@kfp.dsl.component(
    base_image = "python:3.11",
    packages_to_install = ["scikit-learn", "numpy"]
)
def dataprep(test_pct: float) -> kfp.dsl.Dataset:

    import sklearn.datasets
    import sklearn.preprocessing # used for StandardScaler below
    import numpy as np
    import os

    # load data
    diabetes_X, diabetes_y = sklearn.datasets.load_diabetes(return_X_y = True)
    
    # determine split_index from test_pct
    split_index = int(len(diabetes_y) * test_pct)
    
    # split data into train/test
    train_X, test_X = diabetes_X[0:-1*split_index], diabetes_X[-1*split_index:]
    train_y, test_y = diabetes_y[0:-1*split_index], diabetes_y[-1*split_index:]
    
    # standardize the columns of X based on the train_X data
    scaler = sklearn.preprocessing.StandardScaler()
    scaler.fit(train_X)
    train_X_normalized, test_X_normalized = scaler.transform(train_X), scaler.transform(test_X)
    
    # create dataset artifact with information about data
    dataset = kfp.dsl.Dataset(
        uri = kfp.dsl.get_uri(),
        metadata = dict(
            test_n = test_X.shape[0],
            train_n = train_X.shape[0],
            n_features = test_X.shape[1],
            train_X = 'train_X.npy',
            test_X = 'test_X.npy',
            train_y = 'train_y.npy',
            test_y = 'test_y.npy',
            train_X_normalized = 'train_X_normalized.npy',
            test_X_normalized = 'test_X_normalized.npy' 
        ) 
    )
    
    # save the data splits and partitions with the dataset - this is small data
    os.makedirs(dataset.path, exist_ok = True)
    np.save(dataset.path+'/train_X.npy', train_X)
    np.save(dataset.path+'/test_X.npy', test_X)
    np.save(dataset.path+'/train_y.npy', train_y)
    np.save(dataset.path+'/test_y.npy', test_y)
    np.save(dataset.path+'/train_X_normalized.npy', train_X_normalized)
    np.save(dataset.path+'/test_X_normalized.npy', test_X_normalized)
    
    return dataset
    
@kfp.dsl.component(
    base_image = "python:3.11",
    packages_to_install = ["scikit-learn", "numpy"]
)
def trainer(
    dataset: kfp.dsl.Dataset
) -> kfp.dsl.Metrics:

    import sklearn.linear_model
    import sklearn.metrics
    import numpy as np
    
    # load data using information on dataset artifact
    train_X_normalized = np.load(dataset.path + '/' + dataset.metadata['train_X_normalized'])
    test_X_normalized = np.load(dataset.path + '/' + dataset.metadata['test_X_normalized'])
    train_y = np.load(dataset.path + '/' + dataset.metadata['train_y'])
    test_y = np.load(dataset.path + '/' + dataset.metadata['test_y'])
    
    # create model
    regression = sklearn.linear_model.LinearRegression()
    
    # train model on training data
    regression.fit(train_X_normalized, train_y)
    
    # make predictions
    test_predictions = regression.predict(test_X_normalized)
    
    # metrics
    mse = sklearn.metrics.mean_squared_error(test_y, test_predictions)
    rmse = sklearn.metrics.root_mean_squared_error(test_y, test_predictions)
    mae = sklearn.metrics.mean_absolute_error(test_y, test_predictions)
    r2 = sklearn.metrics.r2_score(test_y, test_predictions)
    
    # prepare result
    metrics = kfp.dsl.Metrics()
    metrics.log_metric('mse', mse)
    metrics.log_metric('rmse', rmse)
    metrics.log_metric('mae', mae)
    metrics.log_metric('r2', r2)
    
    return metrics
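
The two components communicate through the `Dataset` artifact: `dataprep` writes files under `dataset.path` and records their file names in `dataset.metadata`, and `trainer` rebuilds each path from that metadata. Here is a stdlib sketch of the same handoff, assuming a local temporary directory in place of the artifact's storage location and JSON files in place of `.npy` files. The `write_split` and `read_split` helpers and the toy values are hypothetical.

```python
import json
import os
import tempfile

def write_split(artifact_dir: str, splits: dict) -> dict:
    """Producer side: save each split as a file and return metadata naming the files."""
    os.makedirs(artifact_dir, exist_ok=True)
    metadata = {}
    for name, values in splits.items():
        filename = f"{name}.json"
        with open(os.path.join(artifact_dir, filename), "w") as f:
            json.dump(values, f)
        metadata[name] = filename          # like metadata['train_y'] = 'train_y.npy'
    return metadata

def read_split(artifact_dir: str, metadata: dict, name: str):
    """Consumer side: locate a file using only the artifact path and its metadata."""
    with open(os.path.join(artifact_dir, metadata[name])) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as tmp:
    artifact_dir = os.path.join(tmp, "dataset")  # stand-in for dataset.path
    meta = write_split(artifact_dir, {"train_y": [1.0, 2.0], "test_y": [3.0]})
    train_y = read_split(artifact_dir, meta, "train_y")
```

Keeping file names in metadata means the consumer never hard-codes the producer's file layout.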

### Create The Pipeline

In [46]:
@kfp.dsl.pipeline(
    name = f"{SERIES}-{EXPERIMENT}-multi-component"
)
def train_pipeline(
    test_pct: float
):
    
    dataset = dataprep(test_pct = test_pct)
    train = trainer(dataset = dataset.output)
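
Passing `dataset.output` into `trainer` is what creates the dependency between the tasks: the pipeline is a directed acyclic graph, and `trainer` cannot start until `dataprep` has produced its artifact. Python's standard `graphlib` module can illustrate that ordering. This only models the dependency graph; Vertex AI Pipelines does its own scheduling.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks whose outputs it consumes.
pipeline_graph = {
    "dataprep": set(),          # only needs the pipeline parameter test_pct
    "trainer": {"dataprep"},    # consumes dataprep's Dataset artifact
}

execution_order = list(TopologicalSorter(pipeline_graph).static_order())
print(execution_order)  # ['dataprep', 'trainer']
```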

### Run The Pipeline On Vertex AI Pipelines

In [47]:
kfp.compiler.Compiler().compile(
    pipeline_func = train_pipeline,
    package_path = f'{DIR}/{SERIES}-{EXPERIMENT}.yaml'
)

In [48]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}-multi-component",
    template_path = f"{DIR}/{SERIES}-{EXPERIMENT}.yaml",
    parameter_values = dict(
        test_pct = 0.15
    ),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disabled), None (defer to component level caching) 
)

In [49]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-multi-component-20250309151556
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-multi-component-20250309151556')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-start-multi-component-20250309151556?project=1026793852137


In [50]:
show_md(f'The Dashboard can be [viewed here]({pipeline_job._dashboard_uri()})')

The Dashboard can be [viewed here](https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-start-multi-component-20250309151556?project=1026793852137)

In [51]:
pipeline_job.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-multi-component-20250309151556 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-multi-component-20250309151556 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-multi-component-20250309151556 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-multi-component-20250309151556 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-multi-component-20250309151556


**Review The Pipeline: Completed With Multiple Components And Artifacts**
<p align="center"><center>
    <img align="center" alt="Pipeline Complete" src="../resources/images/screenshots/pipelines/start-complete3.png" width="70%">
</center></p>

---
## Integrate With Vertex AI Experiments

Both the code in the trainer and the full pipeline run can be connected directly to a Vertex AI Experiment run. Read more about [Vertex AI Experiments](../Experiment%20Tracking/readme.md).

In [53]:
pipeline_job.name

'mlops-pipeline-start-multi-component-20250309151556'

### Associate The Previous Pipeline Run With An Experiment Run

Link the completed pipeline run to a new experiment run within the experiment named by `SERIES`.

In [54]:
aiplatform.init(experiment = SERIES)
aiplatform.start_run(run = pipeline_job.name) # add resume = True to reuse an existing run with this name
aiplatform.log(pipeline_job = pipeline_job)
aiplatform.end_run()

Associating projects/1026793852137/locations/us-central1/metadataStores/default/contexts/mlops-mlops-pipeline-start-multi-component-20250309151556 to Experiment: mlops


### Add A Pipeline Run To An Experiment Automatically

Rerun the previous pipeline, this time passing an experiment name to `.submit()`. The pipeline run is automatically added to the experiment as a new experiment run, and its artifacts, parameters, and metrics are connected to it.

In [55]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}-multi-component",
    template_path = f"{DIR}/{SERIES}-{EXPERIMENT}.yaml",
    parameter_values = dict(
        test_pct = 0.15
    ),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disabled), None (defer to component level caching) 
)

In [56]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT,
    experiment = SERIES
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-multi-component-20250309154221
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-multi-component-20250309154221')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-start-multi-component-20250309154221?project=1026793852137
Associating projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-multi-component-20250309154221 to Experiment: mlops


In [57]:
show_md(f'The Dashboard can be [viewed here]({pipeline_job._dashboard_uri()})')

The Dashboard can be [viewed here](https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-start-multi-component-20250309154221?project=1026793852137)

In [58]:
pipeline_job.wait()

PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-multi-component-20250309154221


**Review The Experiment:**
<p align="center"><center>
    <img align="center" alt="Pipeline Complete" src="../resources/images/screenshots/pipelines/start-experiment.png" width="70%">
</center></p>

---
## ML Code - As A Multi-Component With Looping And Control

Use looping and control to try many values of the input parameter `test_pct`. `kfp.dsl.ParallelFor` runs the trials in parallel, up to `parallelism` at a time. Then `kfp.dsl.Collected` gathers the results of all trials so they can be evaluated by a new custom component, `min_mae`, which returns the metrics of the trial with the lowest MAE.


In [60]:
from typing import List

In [61]:
# use the same components created above in this pipeline:
# def dataprep(test_pct: float) -> kfp.dsl.Dataset:
# def trainer(dataset: kfp.dsl.Dataset) -> kfp.dsl.Metrics:

@kfp.dsl.component()
def min_mae(trials: List[kfp.dsl.Metrics]) -> kfp.dsl.Metrics:
    mae = [trial.metadata['mae'] for trial in trials]
    return trials[mae.index(min(mae))]
    

@kfp.dsl.pipeline(
    name = f"{SERIES}-{EXPERIMENT}-looping-control"
)
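def train_pipeline(
    test_pct: float # not used below: the loop supplies its own test_pct values
):
    
    # range(5, 31) yields 26 values (0.05 to 0.30); with parallelism = 25,
    # the last trial starts once a slot frees up
    with kfp.dsl.ParallelFor(
        items = [round(x * 0.01, 2) for x in range(5, 31)],
        parallelism = 25,
        name = 'Loop of 25, all at once'
    ) as try_pct:
        dataset = dataprep(test_pct = try_pct)
        train = trainer(dataset = dataset.output)
        
    best_trial = min_mae(trials = kfp.dsl.Collected(train.output))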
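
`ParallelFor` fans out one `dataprep`/`trainer` pair per item, `parallelism` caps how many run at once, and `Collected` fans the results back in. A local stdlib analogy using `concurrent.futures` shows the same fan-out/fan-in shape and the `min_mae` selection logic. The `fake_trial` scoring function is hypothetical and stands in for the real components; it does not reproduce real model results.

```python
from concurrent.futures import ThreadPoolExecutor

# The same grid of test_pct values the ParallelFor iterates over: 26 values, 0.05 to 0.30.
items = [round(x * 0.01, 2) for x in range(5, 31)]

def fake_trial(test_pct: float) -> dict:
    """Hypothetical stand-in for dataprep + trainer: returns a metrics dict."""
    return {"test_pct": test_pct, "mae": abs(test_pct - 0.12) * 100}

def min_mae(trials: list) -> dict:
    """Same selection logic as the min_mae component: the trial with the lowest MAE."""
    mae = [trial["mae"] for trial in trials]
    return trials[mae.index(min(mae))]

# Fan out with at most 25 concurrent workers, then fan in (map keeps input order).
with ThreadPoolExecutor(max_workers=25) as pool:
    trials = list(pool.map(fake_trial, items))

best = min_mae(trials)
print(len(trials), best["test_pct"])  # 26 0.12
```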



### Run The Pipeline On Vertex AI Pipelines

In [62]:
kfp.compiler.Compiler().compile(
    pipeline_func = train_pipeline,
    package_path = f'{DIR}/{SERIES}-{EXPERIMENT}.yaml'
)

In [63]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}-looping-control",
    template_path = f"{DIR}/{SERIES}-{EXPERIMENT}.yaml",
    parameter_values = dict(
        test_pct = 0.15
    ),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disabled), None (defer to component level caching) 
)

In [64]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT,
    experiment = SERIES
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-looping-control-20250309170021
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-looping-control-20250309170021')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-start-looping-control-20250309170021?project=1026793852137
Associating projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-looping-control-20250309170021 to Experiment: mlops


In [65]:
show_md(f'The Dashboard can be [viewed here]({pipeline_job._dashboard_uri()})')

The Dashboard can be [viewed here](https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-start-looping-control-20250309170021?project=1026793852137)

In [66]:
pipeline_job.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-looping-control-20250309170021 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-looping-control-20250309170021 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-looping-control-20250309170021 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-looping-control-20250309170021 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-looping-control-20250309170021 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-start-looping-control-20250309170021 current 

**Review The Pipeline: Completed With Multiple Components And Artifacts**
<p align="center"><center>
    <img align="center" alt="Pipeline Complete" src="../resources/images/screenshots/pipelines/start-complete4.png" width="70%">
</center></p>

---
## Do More With Pipelines:

This is just the beginning. To explore and learn more about pipelines, continue with the series below:

This is part of a [series of notebook based workflows](./readme.md) that teach all the ways to use pipelines within Vertex AI. The suggested order and description/reason is:

||Notebook Workflow|Description|
|---|---|---|
|_**This Notebook**_|[Vertex AI Pipelines - Start Here](./Vertex%20AI%20Pipelines%20-%20Start%20Here.ipynb)|What are pipelines? Start here to go from code to pipeline and see it in action.|
||[Vertex AI Pipelines - Introduction](./Vertex%20AI%20Pipelines%20-%20Introduction.ipynb)|Introduction to pipelines with the console and Vertex AI SDK|
||[Vertex AI Pipelines - Components](./Vertex%20AI%20Pipelines%20-%20Components.ipynb)|An introduction to all the ways to create pipeline components from your code|
||[Vertex AI Pipelines - IO](./Vertex%20AI%20Pipelines%20-%20IO.ipynb)|An overview of all the types of inputs and outputs for pipeline components|
||[Vertex AI Pipelines - Control](./Vertex%20AI%20Pipelines%20-%20Control.ipynb)|An overview of controlling the flow of execution for pipelines|
||[Vertex AI Pipelines - Secret Manager](./Vertex%20AI%20Pipelines%20-%20Secret%20Manager.ipynb)|How to pass sensitive information to pipelines and components|
||[Vertex AI Pipelines - GCS Read and Write](./Vertex%20AI%20Pipelines%20-%20GCS%20Read%20and%20Write.ipynb)|How to read/write to GCS from components, including container components.|
||[Vertex AI Pipelines - Scheduling](./Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb)|How to schedule pipeline execution|
||[Vertex AI Pipelines - Notifications](./Vertex%20AI%20Pipelines%20-%20Notifications.ipynb)|How to send email notification of pipeline status.|
||[Vertex AI Pipelines - Management](./Vertex%20AI%20Pipelines%20-%20Management.ipynb)|Managing, Reusing, and Storing pipelines and components|
||[Vertex AI Pipelines - Testing](./Vertex%20AI%20Pipelines%20-%20Testing.ipynb)|Strategies for testing components and pipelines locally and remotely to aid development.|
||[Vertex AI Pipelines - Managing Pipeline Jobs](./Vertex%20AI%20Pipelines%20-%20Managing%20Pipeline%20Jobs.ipynb)|Manage runs of pipelines in an environment: list, check status, filtered list, cancel and delete jobs.|


To discover these notebooks as part of an introduction to MLOps orchestration, [start here](./readme.md). To read more about MLOps, also check out [the parent folder](../readme.md).

---