<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FMLOps%2FPipelines%2FVertex%2520AI%2520Pipelines%2520-%2520Scheduling.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

---
This is part of a [series of notebook-based workflows](./readme.md) that teach the ways to use pipelines within Vertex AI. The suggested order, with a description of each notebook, is:

|Link To Section|Notebook Workflow|Description|
|---|---|---|
||[Vertex AI Pipelines - Start Here](./Vertex%20AI%20Pipelines%20-%20Start%20Here.ipynb)|What are pipelines? Start here to go from code to pipeline and see it in action.|
||[Vertex AI Pipelines - Introduction](./Vertex%20AI%20Pipelines%20-%20Introduction.ipynb)|Introduction to pipelines with the console and Vertex AI SDK|
||[Vertex AI Pipelines - Components](./Vertex%20AI%20Pipelines%20-%20Components.ipynb)|An introduction to all the ways to create pipeline components from your code|
||[Vertex AI Pipelines - IO](./Vertex%20AI%20Pipelines%20-%20IO.ipynb)|An overview of all the types of inputs and outputs for pipeline components|
||[Vertex AI Pipelines - Control](./Vertex%20AI%20Pipelines%20-%20Control.ipynb)|An overview of controlling the flow of execution for pipelines|
||[Vertex AI Pipelines - Secret Manager](./Vertex%20AI%20Pipelines%20-%20Secret%20Manager.ipynb)|How to pass sensitive information to pipelines and components|
||[Vertex AI Pipelines - GCS Read and Write](./Vertex%20AI%20Pipelines%20-%20GCS%20Read%20and%20Write.ipynb)|How to read/write to GCS from components, including container components.|
|_**This Notebook**_|[Vertex AI Pipelines - Scheduling](./Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb)|How to schedule pipeline execution|
||[Vertex AI Pipelines - Notifications](./Vertex%20AI%20Pipelines%20-%20Notifications.ipynb)|How to send email notification of pipeline status.|
||[Vertex AI Pipelines - Management](./Vertex%20AI%20Pipelines%20-%20Management.ipynb)|Managing, Reusing, and Storing pipelines and components|
||[Vertex AI Pipelines - Testing](./Vertex%20AI%20Pipelines%20-%20Testing.ipynb)|Strategies for testing components and pipelines locally and remotely to aid development.|
||[Vertex AI Pipelines - Managing Pipeline Jobs](./Vertex%20AI%20Pipelines%20-%20Managing%20Pipeline%20Jobs.ipynb)|Manage runs of pipelines in an environment: list, check status, filtered list, cancel and delete jobs.|


To discover these notebooks as part of an introduction to MLOps orchestration [start here](./readme.md).  To read more about MLOps also check out [the parent folder](../readme.md).

---

# Vertex AI Pipelines - Scheduling 

[Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) is a serverless runner for Kubeflow Pipelines [(KFP)](https://www.kubeflow.org/docs/components/pipelines/v2/introduction/) and the [TensorFlow Extended (TFX)](https://www.tensorflow.org/tfx/guide/understanding_tfx_pipelines) framework.

This notebook-based workflow creates a simple KFP pipeline, compiles it, and then runs it on Vertex AI Pipelines.

Then, the pipeline is **scheduled** to run repeatedly using the scheduler API:
- [Schedule a pipeline run with the scheduler API](https://cloud.google.com/vertex-ai/docs/pipelines/schedule-pipeline-run).

---
## Colab Setup

To run this notebook in Colab run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names and install names, with an optional minimum version. If the import name is not found, the install name is used to install quietly for the current user.

In [3]:
# tuples of (import name, install name, min_version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform'),
    ('google.cloud.storage', 'google-cloud-storage'),
    ('kfp', 'kfp')
]

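import importlib.util
import importlib.metadata
from packaging.version import Version  # assumes `packaging` is available, as it is in most notebook environments

install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # compare parsed versions (not strings) and look up the version by install (distribution) name
        if Version(importlib.metadata.version(package[1])) < Version(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user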

### API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After the kernel restarts, continue from the cell that follows this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
REGION = 'us-central1'
SERIES = 'mlops'
EXPERIMENT = 'pipeline-scheduling'

# gcs bucket
GCS_BUCKET = PROJECT_ID

Packages

In [9]:
import os, time

from google.cloud import aiplatform
from google.cloud import storage
import kfp

Clients

In [10]:
# vertex ai clients
aiplatform.init(project = PROJECT_ID, location = REGION)

# gcs storage client
gcs = storage.Client(project = PROJECT_ID)

Parameters

In [11]:
DIR = f"temp/{SERIES}-{EXPERIMENT}"

In [12]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

Environment

In [13]:
# create the local working directory if it does not already exist
os.makedirs(DIR, exist_ok = True)

---
## Vertex AI Pipelines

- https://cloud.google.com/vertex-ai/docs/pipelines/introduction
- https://www.kubeflow.org/docs/components/pipelines/v2/introduction/
- https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob

### Create Pipeline Components

These are simple Python components, specifically lightweight Python components.  For more details on the types of components check out this workflow in the same repository:
- [Vertex AI Pipelines - Components](./Vertex%20AI%20Pipelines%20-%20Components.ipynb)


A simple component that takes an input string, appends another string, and outputs the longer string.

In [14]:
@kfp.dsl.component(
    base_image = "python:3.10",
    packages_to_install = ["pandas"]
)
def example_string(text: str) -> str:
    text += ',... and more text'
    return text

A simple component that takes an input number (float), adds 10, and outputs the incremented number.

In [15]:
@kfp.dsl.component(
    base_image = "python:3.10",
    packages_to_install = ["pandas"]
)
def example_number(number: float) -> float:
    number += 10
    return number

A simple component that takes an input string and a number, creates a string from both, and outputs the new string.

In [16]:
@kfp.dsl.component(
    base_image = "python:3.10",
    packages_to_install = ["pandas"]
)
def example_combo(text: str, number: float) -> str:
    result = f'{text}, ... and a number {number}'
    return result

### Create Pipeline

In [17]:
@kfp.dsl.pipeline(
    name = f'{SERIES}-{EXPERIMENT}',
    description = 'A simple pipeline for testing'
)
def example_pipeline(
    text: str,
    number: float
):
    text_task = example_string(text = text)
    number_task = example_number(number = number)
    combo_task = example_combo(text = text_task.output, number = number_task.output)
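
To make the data flow concrete, the sketch below reproduces the logic of the three components as plain Python functions and chains them the same way the pipeline does. It runs locally with no KFP or Vertex AI involved, and the `local_` function names are hypothetical stand-ins, so it only illustrates what the final `example-combo` task should return for the parameter values used later in this notebook.

```python
# Plain-Python stand-ins for the three components (hypothetical names, no KFP involved).
def local_example_string(text: str) -> str:
    return text + ',... and more text'

def local_example_number(number: float) -> float:
    return number + 10

def local_example_combo(text: str, number: float) -> str:
    return f'{text}, ... and a number {number}'

def local_example_pipeline(text: str, number: float) -> str:
    # the string and number steps are independent; the combo step needs both outputs
    text_out = local_example_string(text)
    number_out = local_example_number(number)
    return local_example_combo(text_out, number_out)

result = local_example_pipeline(text='Example text', number=34.2)
print(result)
```

Because the string and number steps do not depend on each other, the pipeline runner is free to execute them in parallel; only the combo step has to wait for both.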

### Compile Pipeline

In [18]:
kfp.compiler.Compiler().compile(
    pipeline_func = example_pipeline,
    package_path = f'{DIR}/{SERIES}-{EXPERIMENT}.yaml'
)

### Create Pipeline Job

In [19]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}",
    template_path = f"{DIR}/{SERIES}-{EXPERIMENT}.yaml",
    parameter_values = dict(text ='Example text', number = 34.2),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disable), None (defer to component level caching) 
)

### Submit Pipeline Job

In [20]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-scheduling-20250310190042
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-scheduling-20250310190042')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-scheduling-20250310190042?project=1026793852137


In [21]:
print(f'The Dashboard can be viewed here:\n{pipeline_job._dashboard_uri()}')

The Dashboard can be viewed here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-scheduling-20250310190042?project=1026793852137


In [22]:
pipeline_job.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-scheduling-20250310190042 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-scheduling-20250310190042 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-scheduling-20250310190042 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-scheduling-20250310190042 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-scheduling-20250310190042


<p><center>
    <img alt="Overview of Pipeline" src="../resources/images/screenshots/pipelines/schedule/pipeline-overview.png" width="90%">
</center></p>

### Retrieve Pipeline Information

In [23]:
aiplatform.get_pipeline_df(pipeline = f'{SERIES}-{EXPERIMENT}')

||pipeline_name|run_name|param.input:text|param.vmlmd_lineage_integration|param.input:number|
|---|---|---|---|---|---|
|0|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20250310190042|Example text|{'pipeline_run_component': {'parent_task_names...|34.2|
|1|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20240316114903085|Example text|{'pipeline_run_component': {'parent_task_names...|34.2|
|2|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20240316114801619|Example text|{'pipeline_run_component': {'parent_task_names...|34.2|
|3|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20240316114701725|Example text|{'pipeline_run_component': {'parent_task_names...|34.2|
|4|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20240316114601422|Example text|{'pipeline_run_component': {'location_id': 'us...|34.2|
|5|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20240316114502103|Example text|{'pipeline_run_component': {'location_id': 'us...|34.2|
|6|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20240316183931|Example text|{'pipeline_run_component': {'location_id': 'us...|34.2|


In [24]:
tasks = {task.task_name: task for task in pipeline_job.task_details}

In [25]:
for task in tasks:
  print(task, tasks[task].state)

mlops-pipeline-scheduling-20250310190042 State.SUCCEEDED
example-number State.SUCCEEDED
example-string State.SUCCEEDED
example-combo State.SUCCEEDED


In [26]:
for task in tasks:
    print(task)

mlops-pipeline-scheduling-20250310190042
example-number
example-string
example-combo


In [27]:
tasks['example-number']

task_id: -1000748236329189376
parent_task_id: -1784374571491655680
task_name: "example-number"
create_time {
  seconds: 1741633245
  nanos: 409695000
}
start_time {
  seconds: 1741633246
  nanos: 47828000
}
end_time {
  seconds: 1741633338
  nanos: 435457000
}
executor_detail {
  container_detail {
    main_job: "projects/1026793852137/locations/us-central1/customJobs/8627487329314930688"
  }
}
state: SUCCEEDED
execution {
  name: "projects/1026793852137/locations/us-central1/metadataStores/default/executions/15094865489501908552"
  display_name: "example-number"
  state: COMPLETE
  etag: "1741633338379"
  create_time {
    seconds: 1741633245
    nanos: 634000000
  }
  update_time {
    seconds: 1741633338
    nanos: 379000000
  }
  schema_title: "system.ContainerExecution"
  schema_version: "0.0.1"
  metadata {
    fields {
      key: "vmlmd_lineage_integration"
      value {
        struct_value {
          fields {
            key: "pipeline_run_component"
            value {
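
The `start_time` and `end_time` fields in the task detail above are protobuf-style timestamps made of whole `seconds` since the Unix epoch plus `nanos`. As a minimal sketch, the values printed above for `example-number` are copied into plain integers below (rather than read from the live task object) to show how they convert to a UTC datetime and a task duration. The helper name `to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def to_datetime(seconds: int, nanos: int) -> datetime:
    """Convert a seconds/nanos pair to a timezone-aware UTC datetime (microsecond precision)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=nanos // 1000)

# values copied from the example-number task output above
start = to_datetime(seconds=1741633246, nanos=47828000)
end = to_datetime(seconds=1741633338, nanos=435457000)
duration = end - start

print(start.isoformat())
print(f'{duration.total_seconds():.3f} seconds')
```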
     

---
## Schedule Pipeline

- https://cloud.google.com/vertex-ai/docs/pipelines/schedule-pipeline-run

Example here: run every minute, 5 times.

### Copy Compiled Pipeline To GCS

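A schedule needs the compiled pipeline in a location the service can read, so copy it to GCS. Alternatively, store it in a Kubeflow Pipelines repository in Artifact Registry, as demonstrated in [Vertex AI Pipelines - Management](./Vertex%20AI%20Pipelines%20-%20Management.ipynb).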

In [28]:
bucket = gcs.lookup_bucket(GCS_BUCKET)
blob = bucket.blob(f'{SERIES}/{EXPERIMENT}/{SERIES}-{EXPERIMENT}.yaml')
blob.upload_from_filename(f'{DIR}/{SERIES}-{EXPERIMENT}.yaml')
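
The object name used for the upload and the `gs://` URI given as `template_path` in the next step need to point to the same object. One way to keep them in sync, sketched here with hypothetical helper names and a placeholder bucket name, is to build both from the same parts.

```python
def blob_name(series: str, experiment: str) -> str:
    """Object name of the compiled pipeline inside the bucket."""
    return f'{series}/{experiment}/{series}-{experiment}.yaml'

def gcs_uri(bucket_name: str, name: str) -> str:
    """Full gs:// URI for an object in a bucket."""
    return f'gs://{bucket_name}/{name}'

# 'my-bucket' is a placeholder; the series and experiment values match this notebook
name = blob_name('mlops', 'pipeline-scheduling')
uri = gcs_uri('my-bucket', name)
print(name)
print(uri)
```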

### Create Pipeline Job

Reference the GCS location of the compiled pipeline in the `template_path` parameter.

In [38]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}",
    template_path = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/{SERIES}-{EXPERIMENT}.yaml',
    parameter_values = dict(text ='Example text', number = 34.2),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disable), None (defer to component level caching) 
)

### Create A Schedule From Pipeline Job

Example:
- run every minute
- allow 3 runs to run concurrently
    - a run may need to start before a prior run is finished
- stop after 5 runs
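
The interaction between the one-minute interval, `max_run_count`, and `max_concurrent_run_count` can be reasoned about with a small simulation. This is a sketch with hypothetical helper names and assumed run durations, not a description of how the Vertex AI scheduler is implemented: it lists the trigger times for a fixed interval and counts how many runs would overlap if each run took a given amount of time.

```python
from datetime import datetime, timedelta, timezone

def fire_times(first_tick: datetime, every_minutes: int, max_run_count: int) -> list:
    """Trigger times for a fixed-interval schedule that stops after max_run_count runs."""
    return [first_tick + timedelta(minutes=every_minutes * i) for i in range(max_run_count)]

def peak_concurrency(starts: list, run_duration: timedelta) -> int:
    """Largest number of runs in flight at once if every run lasts run_duration."""
    events = []
    for start in starts:
        events.append((start, 1))                   # a run begins
        events.append((start + run_duration, -1))   # the same run ends
    events.sort()  # at equal timestamps, ends (-1) sort before starts (+1)
    peak = current = 0
    for _, change in events:
        current += change
        peak = max(peak, current)
    return peak

# hypothetical first trigger time; one run per minute, five runs in total
starts = fire_times(datetime(2025, 3, 10, 19, 20, tzinfo=timezone.utc), every_minutes=1, max_run_count=5)

# assumed run durations, chosen only for illustration
peak_90s = peak_concurrency(starts, timedelta(seconds=90))
peak_150s = peak_concurrency(starts, timedelta(seconds=150))
print(len(starts), peak_90s, peak_150s)
```

Under these assumptions, runs of about 90 seconds would overlap two at a time and runs of 150 seconds three at a time, which is the situation a `max_concurrent_run_count` of 3 accommodates.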

In [39]:
pipeline_job_schedule = pipeline_job.create_schedule(
    display_name = f"{SERIES}-{EXPERIMENT}",
    cron = "*/1 * * * *",
    max_concurrent_run_count = 3,
    max_run_count = 5
)

Creating PipelineJobSchedule
PipelineJobSchedule created. Resource name: projects/1026793852137/locations/us-central1/schedules/6259831958231056384
To use this PipelineJobSchedule in another session:
schedule = aiplatform.PipelineJobSchedule.get('projects/1026793852137/locations/us-central1/schedules/6259831958231056384')
View Schedule:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/schedules/6259831958231056384?project=1026793852137


### Wait on Pipeline Schedule 

With one run per minute and a limit of 5 runs, this demonstration completes within about 6 minutes.

In [40]:
time.sleep(6*1*60)
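
A fixed `time.sleep` is the simplest way to wait, but a polling loop can return as soon as the schedule is finished. The helper below is a generic sketch with a hypothetical name, `wait_until`; it accepts any zero-argument callable that returns `True` when the work is done. With the objects in this notebook, the schedule's `done()` method (used later before deleting the schedule) could be passed as `pipeline_job_schedule.done`. The demo here uses a stand-in function so it runs without any cloud resources.

```python
import time

def wait_until(is_done, timeout_seconds: float, poll_seconds: float) -> bool:
    """Poll is_done() until it returns True or the timeout elapses; return the final status."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if is_done():
            return True
        time.sleep(poll_seconds)
    return is_done()

# stand-in for a real status check: reports done on the third call
calls = {'count': 0}

def fake_done() -> bool:
    calls['count'] += 1
    return calls['count'] >= 3

finished = wait_until(fake_done, timeout_seconds=5, poll_seconds=0.01)
print(finished, calls['count'])
```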

### Retrieve Pipeline Information

Note: This now includes the original run, and the 5 runs conducted by the schedule:

In [41]:
aiplatform.get_pipeline_df(pipeline = f'{SERIES}-{EXPERIMENT}')

||pipeline_name|run_name|param.input:text|param.vmlmd_lineage_integration|param.input:number|
|---|---|---|---|---|---|
|0|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20250310123200480|Example text|{'pipeline_run_component': {'location_id': 'us...|34.2|
|1|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20250310123100540|Example text|{'pipeline_run_component': {'parent_task_names...|34.2|
|2|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20250310123000724|Example text|{'pipeline_run_component': {'parent_task_names...|34.2|
|3|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20250310122900486|Example text|{'pipeline_run_component': {'parent_task_names...|34.2|
|4|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20250310122800372|Example text|{'pipeline_run_component': {'parent_task_names...|34.2|
|5|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20250310122400401|Example text|{'pipeline_run_component': {'location_id': 'us...|34.2|
|6|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20250310122300372|Example text|{'pipeline_run_component': {'location_id': 'us...|34.2|
|7|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20250310122200422|Example text|{'pipeline_run_component': {'location_id': 'us...|34.2|
|8|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20250310122100393|Example text|{'pipeline_run_component': {'location_id': 'us...|34.2|
|9|mlops-pipeline-scheduling|mlops-pipeline-scheduling-20250310122000439|Example text|{'pipeline_run_component': {'location_id': 'us...|34.2|


### List The Schedules

In [42]:
schedules = aiplatform.PipelineJobSchedule.list(
    filter = f'display_name="{SERIES}-{EXPERIMENT}"',
)
schedules

[<google.cloud.aiplatform.pipeline_job_schedules.PipelineJobSchedule object at 0x7f5a7a869f30> 
 resource name: projects/1026793852137/locations/us-central1/schedules/6259831958231056384]

### List The Runs From The Schedule

In [43]:
jobs = schedules[0].list_jobs()
for job in jobs:
    print(job.to_dict(), '\n\n')

By enabling simple view, the PipelineJob resources returned from this method will not contain all fields.
{'name': 'projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-scheduling-20250310123200480', 'createTime': '2025-03-10T19:32:00.608961Z', 'startTime': '2025-03-10T19:32:01.170639Z', 'endTime': '2025-03-10T19:32:02.913039Z', 'updateTime': '2025-03-10T19:32:02.913039Z', 'pipelineSpec': {'pipelineInfo': {'name': 'mlops-pipeline-scheduling'}}, 'state': 'PIPELINE_STATE_SUCCEEDED', 'jobDetail': {'pipelineContext': {'name': 'projects/1026793852137/locations/us-central1/metadataStores/default/contexts/mlops-pipeline-scheduling'}, 'pipelineRunContext': {'name': 'projects/1026793852137/locations/us-central1/metadataStores/default/contexts/mlops-pipeline-scheduling-20250310123200480'}}, 'labels': {'vertex-ai-pipelines-run-billing-id': '8572606426185203712'}} 


{'name': 'projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-scheduling-20250310123100540'
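
Each dictionary returned by `to_dict()` carries RFC 3339 timestamps such as `startTime` and `endTime`. As a small sketch, the two values from the first run printed above are copied into a literal dictionary below to show how to turn them into a run duration with the standard library. The helper names `parse_rfc3339` and `run_seconds` are hypothetical.

```python
from datetime import datetime

def parse_rfc3339(value: str) -> datetime:
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11, so normalize it
    return datetime.fromisoformat(value.replace('Z', '+00:00'))

def run_seconds(job: dict) -> float:
    """Elapsed seconds between a job's startTime and endTime."""
    return (parse_rfc3339(job['endTime']) - parse_rfc3339(job['startTime'])).total_seconds()

# timestamps copied from the first job printed above
job = {
    'startTime': '2025-03-10T19:32:01.170639Z',
    'endTime': '2025-03-10T19:32:02.913039Z',
}
print(run_seconds(job))
```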

### Console View Of Schedule

**Review Schedule While Running:**
<p><center>
    <img alt="Overview of Pipeline" src="../resources/images/screenshots/pipelines/schedule/schedule-running.png" width="90%">
</center></p>
    
**Review Completed Schedule:**
<p><center>
    <img alt="Overview of Pipeline" src="../resources/images/screenshots/pipelines/schedule/schedule-complete.png" width="90%">
</center></p>

### Delete A Schedule

In [45]:
schedules = aiplatform.PipelineJobSchedule.list(
    filter = f'display_name="{SERIES}-{EXPERIMENT}"',
)
schedules

[<google.cloud.aiplatform.pipeline_job_schedules.PipelineJobSchedule object at 0x7f5a7a868220> 
 resource name: projects/1026793852137/locations/us-central1/schedules/6259831958231056384]

In [46]:
schedules[0].done()

True

In [47]:
schedules[0].delete()

Deleting PipelineJobSchedule : projects/1026793852137/locations/us-central1/schedules/6259831958231056384
PipelineJobSchedule deleted. . Resource name: projects/1026793852137/locations/us-central1/schedules/6259831958231056384
Deleting PipelineJobSchedule resource: projects/1026793852137/locations/us-central1/schedules/6259831958231056384
Delete PipelineJobSchedule backing LRO: projects/1026793852137/locations/us-central1/operations/6542659437323091968
PipelineJobSchedule resource projects/1026793852137/locations/us-central1/schedules/6259831958231056384 deleted.
