<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/MLOps/Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A//raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https%3A//raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

---
## Colab Setup

To run this notebook in Colab, run the cells in this section. Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names and install names. If the import name is not found, the install name is used to install the package quietly for the current user.

In [3]:
# tuples of (import name, install name) with an optional third element: min_version
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform'),
    ('google.cloud.storage', 'google-cloud-storage'),
    ('kfp', 'kfp')
]

import importlib.util
import importlib.metadata
install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # look up the installed version by install (distribution) name, not import name
        # note: this is a plain string comparison of version strings
        if importlib.metadata.version(package[1]) < package[2]:
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user
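
The version check above compares version strings directly, which orders them character by character. A minimal sketch of a numeric comparison follows. `version_tuple` is a hypothetical helper (not part of this notebook), and it assumes plain dotted numeric versions such as `1.10.0`. For pre-release tags or other non-numeric versions, the `Version` class in the `packaging` library would likely be a better fit.

```python
def version_tuple(version: str) -> tuple:
    """Convert a simple dotted version like '1.10.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))

# string comparison goes character by character, so '1.10.0' sorts before '1.9.0'
print('1.10.0' < '1.9.0')                                # True (misleading)

# comparing tuples of ints gives the expected ordering
print(version_tuple('1.10.0') < version_tuple('1.9.0'))  # False
```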

## API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After the kernel restarts, continue running the notebook from the cell that follows this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

---
## Setup

Inputs

In [13]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [14]:
REGION = 'us-central1'
EXPERIMENT = 'vertex-pipelines'
SERIES = 'dev'

# gcs bucket
GCS_BUCKET = PROJECT_ID

Packages

In [15]:
import os
import time
from google.cloud import aiplatform
from google.cloud import storage
import kfp

Clients

In [16]:
# vertex ai clients
aiplatform.init(project = PROJECT_ID, location = REGION)

# gcs storage client
gcs = storage.Client(project = PROJECT_ID)

parameters:

In [20]:
DIR = f"temp/{SERIES}-{EXPERIMENT}"

In [21]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

environment:

In [22]:
os.makedirs(DIR, exist_ok = True)

## Vertex AI Pipelines

- https://cloud.google.com/vertex-ai/docs/pipelines/introduction
- https://www.kubeflow.org/docs/components/pipelines/v2/introduction/
- https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob

### Create Pipeline Components

A simple component that takes an input string, appends another string, and outputs the longer string.

In [26]:
@kfp.dsl.component(
    base_image = "python:3.10",
    packages_to_install = ["pandas"]
)
def example_string(text: str) -> str:
    text += ',... and more text'
    return text

A simple component that takes an input number (float), adds 10, and outputs the incremented number.

In [27]:
@kfp.dsl.component(
    base_image = "python:3.10",
    packages_to_install = ["pandas"]
)
def example_number(number: float) -> float:
    number += 10
    return number

A simple component that takes an input string and an input number, creates a string from both, and outputs the new string.

In [28]:
@kfp.dsl.component(
    base_image = "python:3.10",
    packages_to_install = ["pandas"]
)
def example_combo(text: str, number: float) -> str:
    result = f'{text}, ... and a number {number}'
    return result

### Create Pipeline

In [29]:
@kfp.dsl.pipeline(
    name = f'{SERIES}-{EXPERIMENT}',
    description = 'A simple pipeline for testing',
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root'
)
def example_pipeline(
    text: str,
    number: float
):
    text_task = example_string(text = text)
    number_task = example_number(number = number)
    combo_task = example_combo(text = text_task.output, number = number_task.output)
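
To preview what this pipeline computes before submitting a job, the component logic can be run as ordinary Python. This is only a local sketch: the `local_*` functions are hypothetical stand-ins that repeat the component bodies without the `kfp` decorators, so nothing here runs on Vertex AI.

```python
def local_string(text: str) -> str:
    # same logic as example_string
    return text + ',... and more text'

def local_number(number: float) -> float:
    # same logic as example_number
    return number + 10

def local_combo(text: str, number: float) -> str:
    # same logic as example_combo
    return f'{text}, ... and a number {number}'

def local_pipeline(text: str, number: float) -> str:
    # the first two steps are independent; the third consumes both outputs
    text_out = local_string(text)
    number_out = local_number(number)
    return local_combo(text_out, number_out)

result = local_pipeline('Example text', 34.2)
print(result)  # Example text,... and more text, ... and a number 44.2
```

The data flow mirrors the pipeline definition: `example_combo` depends on the outputs of the other two tasks, which is why it is the last task to run.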

### Compile Pipeline

In [30]:
kfp.compiler.Compiler().compile(
    pipeline_func = example_pipeline,
    package_path = f'{DIR}/{SERIES}-{EXPERIMENT}.yaml'
)

### Create Pipeline Job

In [31]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}",
    template_path = f"{DIR}/{SERIES}-{EXPERIMENT}.yaml",
    parameter_values = dict(text ='Example text', number = 34.2),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disabled), None (defer to component-level caching)
)

### Submit Pipeline Job

In [32]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/dev-vertex-pipelines-20240307212937
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/dev-vertex-pipelines-20240307212937')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/dev-vertex-pipelines-20240307212937?project=1026793852137


In [33]:
print(f'The Dashboard can be viewed here:\n{pipeline_job._dashboard_uri()}')

The Dashboard can be viewed here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/dev-vertex-pipelines-20240307212937?project=1026793852137


In [34]:
pipeline_job.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/dev-vertex-pipelines-20240307212937 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/dev-vertex-pipelines-20240307212937 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/dev-vertex-pipelines-20240307212937 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/dev-vertex-pipelines-20240307212937 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/dev-vertex-pipelines-20240307212937 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/dev-vertex-pipelines-20240307212937


### Retrieve Pipeline Information

In [35]:
aiplatform.get_pipeline_df(pipeline = f'{SERIES}-{EXPERIMENT}')

| | pipeline_name | run_name | param.input:text | param.vmlmd_lineage_integration | param.input:number |
|---|---|---|---|---|---|
| 0 | dev-vertex-pipelines | dev-vertex-pipelines-20240307212937 | Example text | {'pipeline_run_component': {'pipeline_run_id':... | 34.2 |


In [36]:
tasks = {task.task_name: task for task in pipeline_job.task_details}

In [37]:
for name, task in tasks.items():
    print(name, task.state)

example-combo State.SUCCEEDED
example-string State.SUCCEEDED
dev-vertex-pipelines-20240307212937 State.SUCCEEDED
example-number State.SUCCEEDED


In [38]:
for task in tasks:
    print(task)

example-combo
example-string
dev-vertex-pipelines-20240307212937
example-number


In [45]:
tasks['example-number']

task_id: 6698494285479673856
task_name: "example-number"
create_time {
  seconds: 1709846982
  nanos: 866863000
}
start_time {
  seconds: 1709846983
  nanos: 483675000
}
end_time {
  seconds: 1709847085
  nanos: 815256000
}
executor_detail {
  container_detail {
    main_job: "projects/1026793852137/locations/us-central1/customJobs/2970989201182425088"
  }
}
state: SUCCEEDED
execution {
  name: "projects/1026793852137/locations/us-central1/metadataStores/default/executions/17843432237847752265"
  display_name: "example-number"
  state: COMPLETE
  etag: "1709847085771"
  create_time {
    seconds: 1709846983
    nanos: 143000000
  }
  update_time {
    seconds: 1709847085
    nanos: 771000000
  }
  schema_title: "system.ContainerExecution"
  schema_version: "0.0.1"
  metadata {
    fields {
      key: "input:number"
      value {
        number_value: 34.2
      }
    }
    fields {
      key: "output:Output"
      value {
        number_value: 44.2
      }
    }
    fields {
      key:
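
The `create_time`, `start_time`, and `end_time` fields above are printed as whole `seconds` since the Unix epoch plus `nanos`. As a small worked example, the values shown for `example-number` can be converted to UTC datetimes and a task duration. The `to_datetime` helper is a hypothetical name, and the numbers are copied from the printed output above rather than read from the API.

```python
from datetime import datetime, timedelta, timezone

def to_datetime(seconds: int, nanos: int) -> datetime:
    """Convert seconds/nanos since the Unix epoch to a UTC datetime (microsecond precision)."""
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(microseconds=nanos // 1000)

# values copied from start_time and end_time in the output above
start = to_datetime(1709846983, 483675000)
end = to_datetime(1709847085, 815256000)
duration = (end - start).total_seconds()

print(start.isoformat())           # 2024-03-07T21:29:43.483675+00:00
print(f'{duration:.3f} seconds')   # 102.332 seconds
```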

---
## Schedule Pipeline

- https://cloud.google.com/vertex-ai/docs/pipelines/schedule-pipeline-run

The example here schedules the pipeline to run every 2 minutes, for a total of 5 runs.

### Copy Compiled Pipeline To GCS

In [46]:
bucket = gcs.lookup_bucket(GCS_BUCKET)
blob = bucket.blob(f'{SERIES}/{EXPERIMENT}/{SERIES}-{EXPERIMENT}.yaml')
blob.upload_from_filename(f'{DIR}/{SERIES}-{EXPERIMENT}.yaml')

### Create Pipeline Job

Reference the GCS location of the compiled pipeline in the `template_path` parameter.

In [47]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}",
    template_path = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/{SERIES}-{EXPERIMENT}.yaml',
    parameter_values = dict(text ='Example text', number = 34.2),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disabled), None (defer to component-level caching)
)
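
The blob name used for the upload and the `template_path` used here have to point at the same object. One way to keep them in sync is to build both from a single helper. `compiled_pipeline_paths` below is a hypothetical function for illustration, not part of this notebook or of any library.

```python
def compiled_pipeline_paths(bucket: str, series: str, experiment: str) -> tuple:
    """Return (blob name, gs:// URI) for the compiled pipeline YAML."""
    blob_name = f'{series}/{experiment}/{series}-{experiment}.yaml'
    return blob_name, f'gs://{bucket}/{blob_name}'

blob_name, template_path = compiled_pipeline_paths(
    'statmike-mlops-349915', 'dev', 'vertex-pipelines'
)
print(blob_name)      # dev/vertex-pipelines/dev-vertex-pipelines.yaml
print(template_path)  # gs://statmike-mlops-349915/dev/vertex-pipelines/dev-vertex-pipelines.yaml
```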

### Create A Schedule From Pipeline Job

In [49]:
pipeline_job_schedule = pipeline_job.create_schedule(
    display_name = f"{SERIES}-{EXPERIMENT}",
    cron = "*/2 * * * *",
    max_concurrent_run_count = 3,
    max_run_count = 5
)

Creating PipelineJobSchedule
PipelineJobSchedule created. Resource name: projects/1026793852137/locations/us-central1/schedules/1195010809718112256
To use this PipelineJobSchedule in another session:
schedule = aiplatform.PipelineJobSchedule.get('projects/1026793852137/locations/us-central1/schedules/1195010809718112256')
View Schedule:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/schedules/1195010809718112256?project=1026793852137
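
In the cron expression `*/2 * * * *`, the first field is the minute, and `*/2` matches every minute divisible by 2. The sketch below lists the start times such a schedule would produce. `next_fire_times` is a hypothetical helper and the creation time is an assumed value chosen for illustration, so this shows only the arithmetic, not how Vertex AI evaluates the schedule internally.

```python
from datetime import datetime, timedelta, timezone

def next_fire_times(start: datetime, every_minutes: int, count: int) -> list:
    """Return the next `count` whole minutes at or after `start` that are divisible by `every_minutes`."""
    # round up to the next whole minute
    t = start.replace(second=0, microsecond=0)
    if t < start:
        t += timedelta(minutes=1)
    times = []
    while len(times) < count:
        if t.minute % every_minutes == 0:
            times.append(t)
        t += timedelta(minutes=1)
    return times

# assumed creation time, for illustration only
created = datetime(2024, 3, 7, 21, 37, 30, tzinfo=timezone.utc)
runs = next_fire_times(created, every_minutes=2, count=5)
print([t.strftime('%H:%M') for t in runs])  # ['21:38', '21:40', '21:42', '21:44', '21:46']
```

With `max_run_count = 5`, the last run starts 8 minutes after the first.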


### Wait on Pipeline Schedule 

The schedule launches 5 runs at 2-minute intervals, so the 12-minute wait below leaves time for all of them to start and finish.

In [50]:
time.sleep(6*2*60)

### Retrieve Pipeline Information

Note: This now includes the original run and the 5 runs launched by the schedule:

In [51]:
aiplatform.get_pipeline_df(pipeline = f'{SERIES}-{EXPERIMENT}')

| | pipeline_name | run_name | param.input:number | param.vmlmd_lineage_integration | param.input:text |
|---|---|---|---|---|---|
| 0 | dev-vertex-pipelines | dev-vertex-pipelines-20240307134600855 | 34.2 | {'pipeline_run_component': {'task_name': 'dev-... | Example text |
| 1 | dev-vertex-pipelines | dev-vertex-pipelines-20240307134402304 | 34.2 | {'pipeline_run_component': {'pipeline_run_id':... | Example text |
| 2 | dev-vertex-pipelines | dev-vertex-pipelines-20240307134201783 | 34.2 | {'pipeline_run_component': {'pipeline_run_id':... | Example text |
| 3 | dev-vertex-pipelines | dev-vertex-pipelines-20240307134001539 | 34.2 | {'pipeline_run_component': {'task_name': 'dev-... | Example text |
| 4 | dev-vertex-pipelines | dev-vertex-pipelines-20240307133801577 | 34.2 | {'pipeline_run_component': {'location_id': 'us... | Example text |
| 5 | dev-vertex-pipelines | dev-vertex-pipelines-20240307212937 | 34.2 | {'pipeline_run_component': {'pipeline_run_id':... | Example text |


### List The Schedules

In [52]:
schedules = aiplatform.PipelineJobSchedule.list(
    filter = f'display_name="{SERIES}-{EXPERIMENT}"',
)
schedules

[<google.cloud.aiplatform.pipeline_job_schedules.PipelineJobSchedule object at 0x7f1017b21420> 
 resource name: projects/1026793852137/locations/us-central1/schedules/1195010809718112256]

### List The Runs From The Schedule

In [53]:
jobs = schedules[0].list_jobs()
for job in jobs:
    print(job.to_dict(), '\n\n')

By enabling simple view, the PipelineJob resources returned from this method will not contain all fields.
{'name': 'projects/1026793852137/locations/us-central1/pipelineJobs/dev-vertex-pipelines-20240307134600855', 'createTime': '2024-03-07T21:46:01.038966Z', 'startTime': '2024-03-07T21:46:01.781983Z', 'endTime': '2024-03-07T21:46:03.891055Z', 'updateTime': '2024-03-07T21:46:03.891055Z', 'pipelineSpec': {'pipelineInfo': {'name': 'dev-vertex-pipelines'}}, 'state': 'PIPELINE_STATE_SUCCEEDED', 'jobDetail': {'pipelineContext': {'name': 'projects/1026793852137/locations/us-central1/metadataStores/default/contexts/dev-vertex-pipelines'}, 'pipelineRunContext': {'name': 'projects/1026793852137/locations/us-central1/metadataStores/default/contexts/dev-vertex-pipelines-20240307134600855'}}, 'labels': {'vertex-ai-pipelines-run-billing-id': '5145949988357931008'}} 


{'name': 'projects/1026793852137/locations/us-central1/pipelineJobs/dev-vertex-pipelines-20240307134402304', 'createTime': '2024-03-