<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Managing%20Pipeline%20Jobs.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FMLOps%2FPipelines%2FVertex%2520AI%2520Pipelines%2520-%2520Managing%2520Pipeline%2520Jobs.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Managing%20Pipeline%20Jobs.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Managing%20Pipeline%20Jobs.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Vertex AI Pipelines - Managing Jobs

Vertex AI Pipeline Jobs are runs of a pipeline.  These can be run directly by a user, started through the API, or scheduled.  Within Vertex AI, a project can have many jobs running at any time along with a history of all past jobs.  This workflow shows how to review and manage the jobs in an environment.

---
## Colab Setup

To run this notebook in Colab run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except ImportError:
    # google.colab is only importable inside Colab
    print('Not a Colab Environment')

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of a package import name, its install name, and an optional minimum version.  If the import name is not found, or the installed version is older than the minimum, the install name is used to install or update quietly for the current user.

In [3]:
# tuples of (import name, install name, min_version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform', '1.51.0'),
    ('kfp', 'kfp'),
]

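import importlib.util
import importlib.metadata

def version_tuple(version):
    # numeric comparison: as strings, '1.9.0' would sort after '1.51.0'
    return tuple(int(part) for part in version.split('.') if part.isdigit())

install = False
for package in packages:
    # find_spec looks up the import name
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # installed-version lookups use the install (distribution) name
        if version_tuple(importlib.metadata.version(package[1])) < version_tuple(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user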
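
When comparing version strings, keep in mind that Python compares strings character by character, so a plain string comparison can rank versions incorrectly once a component reaches two digits.  The sketch below illustrates the pitfall.  The helper `as_tuple` is hypothetical (not part of any library) and assumes purely numeric `major.minor.patch` versions.

```python
def as_tuple(version: str) -> tuple:
    """Turn '1.51.0' into (1, 51, 0); hypothetical helper, numeric parts only."""
    return tuple(int(part) for part in version.split('.'))

# String comparison is lexicographic: '9' > '5', so 1.9.0 looks newer than 1.51.0
string_says_older = '1.9.0' < '1.51.0'

# Tuple comparison is numeric: 9 < 51, so 1.9.0 is correctly older than 1.51.0
tuple_says_older = as_tuple('1.9.0') < as_tuple('1.51.0')

print(string_says_older, tuple_says_older)
```

For versions with pre-release or other non-numeric suffixes, the `Version` class from the `packaging` library is likely the more robust choice.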

### API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After the kernel restarts, continue from the cell that follows this one; the earlier cells do not need to be run again.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
REGION = 'us-central1'
EXPERIMENT = 'pipeline-jobstate'
SERIES = 'mlops'

# gcs bucket
GCS_BUCKET = PROJECT_ID

Packages

In [8]:
import os
from IPython.display import Markdown as show_md

from google.cloud import aiplatform
import kfp

In [9]:
kfp.__version__

'2.11.0'

In [10]:
aiplatform.__version__

'1.78.0'

Clients

In [11]:
# vertex ai clients
aiplatform.init(project = PROJECT_ID, location = REGION)

parameters:

In [12]:
DIR = f"temp/{SERIES}-{EXPERIMENT}"

In [13]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

environment:

In [14]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## Make A Simple Pipeline For Testing/Illustrating

In [17]:
@kfp.dsl.component(
    base_image = "python:3.11"
)
def waiter(wait_time: int) -> dict:
    import time
    
    time.sleep(wait_time)
    
    # return a dict to match the declared return type
    return {'wait_time': wait_time}

In [19]:
kfp.compiler.Compiler().compile(
    pipeline_func = waiter,
    package_path = f'{DIR}/{SERIES}-{EXPERIMENT}.yaml',
    pipeline_name = f"{SERIES}-{EXPERIMENT}-component"
)

---
## Executing Pipeline Jobs And Monitoring Directly

### Execute A PipelineJob with `.run()`

This method starts the job and blocks the local session until the job ends.  While it waits, it repeatedly prints the current state of the job in the cell output.

In [20]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}-component",
    template_path = f"{DIR}/{SERIES}-{EXPERIMENT}.yaml",
    parameter_values = dict(
        wait_time = 200
    ),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = False # True (enabled), False (disabled), None (defer to component-level caching)
)

In [21]:
job_run = pipeline_job.run(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212175926
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212175926')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-jobstate-component-20250212175926?project=1026793852137
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212175926 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212175926 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212175926 current s

Notice that the state of the job was reported continually for the duration of the run.

In [28]:
pipeline_job.state

<PipelineState.PIPELINE_STATE_SUCCEEDED: 4>

In [29]:
pipeline_job.state.value

4

In [32]:
pipeline_job.state.name

'PIPELINE_STATE_SUCCEEDED'

In [33]:
type(pipeline_job.state)

<enum 'PipelineState'>

In [None]:
pipeline_job.done()
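
Because `state` is an enum, its `.name` gives a stable string that is convenient for simple status logic.  The following is a minimal sketch, not part of the SDK: `describe_state` and `FINISHED_STATES` are hypothetical names, and the `FAILED` and `CANCELLED` state names are assumed from the `PipelineState` enum rather than shown in the outputs above.

```python
# State names as reported by `pipeline_job.state.name`.
# Treating these three as "finished" is an assumption of this sketch.
FINISHED_STATES = {
    'PIPELINE_STATE_SUCCEEDED',
    'PIPELINE_STATE_FAILED',
    'PIPELINE_STATE_CANCELLED',
}

def describe_state(state_name: str) -> str:
    """Map a PipelineState name to a short status label (hypothetical helper)."""
    if state_name == 'PIPELINE_STATE_SUCCEEDED':
        return 'finished: ok'
    if state_name in FINISHED_STATES:
        return 'finished: needs attention'
    return 'in progress'

print(describe_state('PIPELINE_STATE_SUCCEEDED'))
print(describe_state('PIPELINE_STATE_RUNNING'))
```

With a live job this could be called as `describe_state(pipeline_job.state.name)`.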

### Execute A PipelineJob with `.submit()` and `.wait()`

The `.submit()` method starts the job and immediately returns control to the local session without waiting for the job to complete.  The `.wait()` method can then be used to block the session until the job ends.

In [36]:
pipeline_job2 = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}-component",
    template_path = f"{DIR}/{SERIES}-{EXPERIMENT}.yaml",
    parameter_values = dict(
        wait_time = 200
    ),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = False # True (enabled), False (disabled), None (defer to component-level caching)
)

In [37]:
job_run2 = pipeline_job2.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212193454
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212193454')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-jobstate-component-20250212193454?project=1026793852137


Notice that control returns to the local session as soon as the job is submitted.

In [38]:
type(job_run2)

NoneType

In [39]:
type(pipeline_job2)

google.cloud.aiplatform.pipeline_jobs.PipelineJob

In [41]:
pipeline_job2.state

<PipelineState.PIPELINE_STATE_RUNNING: 3>

In [None]:
pipeline_job2.done()

In [43]:
pipeline_job2.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212193454 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212193454 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212193454 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212193454 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212193454


In [44]:
pipeline_job2.state

<PipelineState.PIPELINE_STATE_SUCCEEDED: 4>

In [48]:
pipeline_job2.done()

True
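
`.wait()` blocks until the job ends.  Where a time limit is needed, a small polling loop around `.done()` is one possible alternative.  This is a minimal sketch under stated assumptions: `poll_until_done` and its parameters are hypothetical, and the only thing it relies on from the job object is the `done()` method used in the cells above.

```python
import time

def poll_until_done(job, interval_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Check `job.done()` at a fixed interval; return True if it finished in time.

    `job` is any object with a `done()` method, such as a submitted PipelineJob.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    waited = 0
    while waited <= timeout_seconds:
        if job.done():
            return True
        sleep(interval_seconds)
        waited += interval_seconds
    return False
```

For example, `poll_until_done(pipeline_job2, interval_seconds=30, timeout_seconds=600)` would give up after roughly ten minutes and return `False`, leaving the job itself untouched.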

In [49]:
pipeline_job2.resource_name

'projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212193454'
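
The resource name follows the pattern `projects/{project}/locations/{location}/pipelineJobs/{job_id}`, and the console link printed at job creation is built from the same parts.  The sketch below splits a resource name and rebuilds that link.  Both helper names are hypothetical, and the URL pattern is inferred from the "View Pipeline Job" output above rather than from a documented contract.

```python
def parse_resource_name(resource_name: str) -> dict:
    """Split 'projects/P/locations/L/pipelineJobs/J' into its named parts."""
    parts = resource_name.split('/')
    if len(parts) != 6 or parts[0::2] != ['projects', 'locations', 'pipelineJobs']:
        raise ValueError(f'unexpected resource name: {resource_name}')
    return {'project': parts[1], 'location': parts[3], 'job_id': parts[5]}

def console_url(resource_name: str) -> str:
    """Rebuild the 'View Pipeline Job' link printed when a job is created."""
    p = parse_resource_name(resource_name)
    return (
        f"https://console.cloud.google.com/vertex-ai/locations/{p['location']}"
        f"/pipelines/runs/{p['job_id']}?project={p['project']}"
    )

name = 'projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212193454'
print(console_url(name))
```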

---
## Listing Pipeline Jobs

### All Runs

In [53]:
jobs_list = aiplatform.PipelineJob.list(filter = None, enable_simple_view = True)

By enabling simple view, the PipelineJob resources returned from this method will not contain all fields.


In [54]:
len(jobs_list)

381
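
With hundreds of jobs in the list, a tally by state is often a more useful first look than individual entries.  A minimal sketch follows.  `count_by_state` is a hypothetical helper, and the stand-in objects exist only so the example runs without a project.  Reading `state` may refresh each job from the service, so this could be slow on long lists.

```python
from collections import Counter
from types import SimpleNamespace

def count_by_state(jobs) -> Counter:
    """Tally jobs by `state.name`; `jobs` is any iterable of PipelineJob-like objects."""
    return Counter(job.state.name for job in jobs)

# Stand-in objects so the sketch runs without a project; not real PipelineJobs
fake_jobs = [
    SimpleNamespace(state=SimpleNamespace(name='PIPELINE_STATE_SUCCEEDED')),
    SimpleNamespace(state=SimpleNamespace(name='PIPELINE_STATE_SUCCEEDED')),
    SimpleNamespace(state=SimpleNamespace(name='PIPELINE_STATE_FAILED')),
]
for state_name, count in count_by_state(fake_jobs).most_common():
    print(f'{state_name}: {count}')
```

In the notebook this would be called as `count_by_state(jobs_list)`.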

In [56]:
jobs_list[0:2]

[<google.cloud.aiplatform.pipeline_jobs.PipelineJob object at 0x7fcd7b35d090> 
 resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212193454,
 <google.cloud.aiplatform.pipeline_jobs.PipelineJob object at 0x7fcd7b35c310> 
 resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-jobstate-component-20250212192434]

In [59]:
jobs_list[0].state

<PipelineState.PIPELINE_STATE_SUCCEEDED: 4>

### Named Runs: Filter By `display_name`

In [93]:
jobs_list = aiplatform.PipelineJob.list(
    filter = f'display_name = "{SERIES}-{EXPERIMENT}-component"', 
    enable_simple_view = True
)

By enabling simple view, the PipelineJob resources returned from this method will not contain all fields.


In [94]:
len(jobs_list)

3
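
The filter string can combine several conditions.  The sketch below assembles one from optional parts.  `build_filter` is a hypothetical helper, and the `state` and `create_time` field names and the `AND` operator are assumptions based on the filter syntax of the Vertex AI list API; only the `display_name` form is confirmed by the cell above.

```python
from datetime import datetime, timezone

def build_filter(display_name=None, state=None, created_after=None) -> str:
    """Combine optional conditions into one filter string joined with AND."""
    clauses = []
    if display_name is not None:
        clauses.append(f'display_name = "{display_name}"')
    if state is not None:
        clauses.append(f'state = "{state}"')
    if created_after is not None:
        # RFC 3339 timestamp in UTC
        stamp = created_after.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
        clauses.append(f'create_time > "{stamp}"')
    return ' AND '.join(clauses)

example = build_filter(
    display_name='mlops-pipeline-jobstate-component',
    state='PIPELINE_STATE_RUNNING',
    created_after=datetime(2025, 2, 12, tzinfo=timezone.utc),
)
print(example)
```

The result could then be passed along as `aiplatform.PipelineJob.list(filter = example, enable_simple_view = True)`.  When no condition is given the helper returns an empty string, so `build_filter(...) or None` keeps the unfiltered behavior.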

In [95]:
jobs_list[0].state

<PipelineState.PIPELINE_STATE_SUCCEEDED: 4>

list:
- monitor a specific job
- list jobs
- list runs of a specific pipeline
- list by state - get the running jobs for example
- list by timeframe
- force a job to stop
- clone?
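
For the "force a job to stop" item, one possible approach is to select running jobs and call `cancel()` on each.  This is only a sketch: `cancel_running` and `FakeJob` are hypothetical names, the stand-in class exists so the example runs without a project, and the sketch assumes `PipelineJob.cancel()` requests cancellation asynchronously.

```python
from types import SimpleNamespace

def cancel_running(jobs, dry_run=True):
    """Request cancellation for jobs in the RUNNING state; return their resource names.

    With dry_run=True (the default) nothing is cancelled and the names are only reported.
    """
    selected = []
    for job in jobs:
        if job.state.name == 'PIPELINE_STATE_RUNNING':
            selected.append(job.resource_name)
            if not dry_run:
                job.cancel()  # a request only; the job may take time to stop
    return selected

class FakeJob:
    """Stand-in for a PipelineJob so the sketch runs without a project."""
    def __init__(self, resource_name, state_name):
        self.resource_name = resource_name
        self.state = SimpleNamespace(name=state_name)
        self.cancel_requested = False

    def cancel(self):
        self.cancel_requested = True

demo_jobs = [
    FakeJob('job-a', 'PIPELINE_STATE_RUNNING'),
    FakeJob('job-b', 'PIPELINE_STATE_SUCCEEDED'),
]
print(cancel_running(demo_jobs))  # dry run: reports without cancelling
```

Defaulting to a dry run is a deliberate choice here, since cancelling the wrong job in a shared project is hard to undo.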
    