In [1]:
# Copyright 2022 Google LLC
# Authors: 
# Fabian Hirschmann <fhirschmann@google.com>, 
# Elia Secchi <eliasecchi@google.com>,
# Megha Agarwal <meghaag@google.com>,
# Mandie Quartly <mandieq@google.com>
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Automated MLOps pipeline build, testing and deployment

In the previous notebook, you created a machine learning pipeline to train a model. This session is about automating the training and deployment of that model. The objectives of this notebook are to:

1. Refactor your Kubeflow pipeline into a Python file that can be compiled into YAML in an automated fashion.
1. Write a script to deploy a compiled Kubeflow pipeline to Vertex AI.
1. Use Cloud Build (CI/CD) to compile, test, and run your Kubeflow pipeline.
1. Create a Cloud Source Repository (Git) to automatically trigger Cloud Build on every change to the master branch.
1. Create a [pipeline template](https://cloud.google.com/vertex-ai/docs/pipelines/create-pipeline-template) so that the pipeline can be reused and retriggered.
1. Set up a schedule for the pipeline, which can be done in two ways:
      - using a Cloud Scheduler job, which sends a message to a Pub/Sub topic, which in turn invokes a Cloud Function that triggers the Vertex AI pipeline.
      - using the upcoming Vertex AI Pipelines Schedules API (in private preview).
      
      
As part of this notebook, you'll create the following files using the `%%writefile` directive.
- `src/requirements.txt`: Python requirements file listing all dependencies.
- `src/pipeline.py`: File containing your Kubeflow pipeline in Python and logic to compile the pipeline into YAML.
- `src/create-pipeline-template.py`: Python script to create a Kubeflow Pipeline Template in Vertex AI.
- `src/submit-pipeline.py`: Python script to submit the pipeline to Vertex AI.
- `src/cloudbuild.yaml`: Cloud Build configuration to run your CI/CD process.
- `src/tests/test_pipeline.py`: Unit tests for the pipeline.
- `cf-trigger/main.py`: Cloud Function to trigger your Kubeflow Pipeline.

## Prerequisites

The cell below ensures that the following Google Cloud service APIs are enabled for this lab:
1. `Vertex AI API`
1. `Cloud Build API`
1. `Artifact Registry API`
1. `Cloud Source Repositories API`
1. `Cloud Functions API`
1. `Cloud Scheduler API`

It also ensures that your Compute Engine default service account `{PROJECT_NUMBER}-compute@developer.gserviceaccount.com` has the following roles:
1. `Storage Admin`
1. `Vertex AI User`
1. `Cloud Build Editor`
1. `Artifact Registry Writer`
1. `Source Repository Administrator`

And that your Cloud Build default service account `{PROJECT_NUMBER}@cloudbuild.gserviceaccount.com` has the following roles:
1. `Service Account User`
1. `Vertex AI User`

If the credentials you are using in this notebook don't have the permissions needed to run the following cell, run the commands in [Cloud Shell](https://cloud.google.com/shell/docs/using-cloud-shell) instead to set up your environment.

In [None]:
%%bash

gcloud services enable aiplatform.googleapis.com
gcloud services enable cloudbuild.googleapis.com
gcloud services enable artifactregistry.googleapis.com
gcloud services enable sourcerepo.googleapis.com
gcloud services enable cloudfunctions.googleapis.com
gcloud services enable cloudscheduler.googleapis.com

PROJECT_ID=$(gcloud config get-value project)
# Default Compute Engine SA roles
PROJECT_NUM=$(gcloud projects list --filter="$PROJECT_ID" --format="value(PROJECT_NUMBER)")
gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member="serviceAccount:${PROJECT_NUM}-compute@developer.gserviceaccount.com"\
      --role='roles/storage.admin'
gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member="serviceAccount:${PROJECT_NUM}-compute@developer.gserviceaccount.com"\
      --role='roles/aiplatform.user'
gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member="serviceAccount:${PROJECT_NUM}-compute@developer.gserviceaccount.com"\
      --role='roles/cloudbuild.builds.editor'
gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member="serviceAccount:${PROJECT_NUM}-compute@developer.gserviceaccount.com"\
      --role='roles/artifactregistry.writer'
gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member="serviceAccount:${PROJECT_NUM}-compute@developer.gserviceaccount.com"\
      --role='roles/source.admin'

# Default Cloud Build SA roles
gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member="serviceAccount:${PROJECT_NUM}@cloudbuild.gserviceaccount.com"\
      --role='roles/aiplatform.user'
gcloud projects add-iam-policy-binding $PROJECT_ID \
      --member="serviceAccount:${PROJECT_NUM}@cloudbuild.gserviceaccount.com"\
      --role='roles/iam.serviceAccountUser'
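
The role bindings above follow a regular pattern: one `gcloud projects add-iam-policy-binding` call per service account and role. As a minimal sketch of that pattern (not part of the lab itself), the Python below derives the two default service account emails from a project number and prints the corresponding commands. The helper names `default_service_accounts` and `binding_commands` are hypothetical, and the project ID and number in the usage section are placeholders.

```python
from typing import Dict, List

# Role lists copied from the prerequisites above.
COMPUTE_ROLES = [
    "roles/storage.admin",
    "roles/aiplatform.user",
    "roles/cloudbuild.builds.editor",
    "roles/artifactregistry.writer",
    "roles/source.admin",
]
CLOUD_BUILD_ROLES = ["roles/aiplatform.user", "roles/iam.serviceAccountUser"]


def default_service_accounts(project_number: str) -> Dict[str, str]:
    """Return the default Compute Engine and Cloud Build service account emails."""
    return {
        "compute": f"{project_number}-compute@developer.gserviceaccount.com",
        "cloudbuild": f"{project_number}@cloudbuild.gserviceaccount.com",
    }


def binding_commands(project_id: str, project_number: str) -> List[str]:
    """Build one `gcloud projects add-iam-policy-binding` command per role."""
    accounts = default_service_accounts(project_number)
    plan = [
        (accounts["compute"], COMPUTE_ROLES),
        (accounts["cloudbuild"], CLOUD_BUILD_ROLES),
    ]
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f'--member="serviceAccount:{email}" --role="{role}"'
        for email, roles in plan
        for role in roles
    ]


if __name__ == "__main__":
    # "my-project" and "123456789012" are placeholder values, not real identifiers.
    for command in binding_commands("my-project", "123456789012"):
        print(command)
```

Printing the commands rather than running them keeps the sketch free of side effects; you can review the output before pasting it into Cloud Shell.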

### Install required Python packages

In [2]:
!mkdir -p src

In [17]:
%%writefile src/requirements.txt
kfp==2.0.0b12
pytest==7.2.0
pytz==2022.7
google-cloud-aiplatform==1.20.0
google-api-core==2.10.2
google-auth==1.35.0
google-cloud-bigquery==1.20.0
google-cloud-core==1.7.3
google-cloud-resource-manager==1.6.3
google-cloud-storage==2.2.1


Overwriting src/requirements.txt


In [18]:
!pip install -r src/requirements.txt --user

Collecting kfp==2.0.0b12
  Downloading kfp-2.0.0-beta.12.tar.gz (492 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 492.9/492.9 kB 8.1 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting kfp-pipeline-spec==0.2.0
  Using cached kfp_pipeline_spec-0.2.0-py3-none-any.whl (12 kB)
Building wheels for collected packages: kfp
  Building wheel for kfp (setup.py) ... done
  Created wheel for kfp: filename=kfp-2.0.0b12-py3-none-any.whl size=556211 sha256=c317b0a5e39db3b4f728bf16d3bc4982d94324cfff00498765996a91428279ed
  Stored in directory: /home/jupyter/.cache/pip/wheels/6a/32/7b/59029a83a3d4addae5589319683a03d963fe61ada3ec2afb96
Successfully built kfp
Installing collected packages: kfp-pipeline-spec, kfp
  Attempting uninstall: kfp-pipeline-spec
    Found existing installation: kfp-pipeline-spec 0.1.16
    Uninstalling kfp-pipeline-spec-0.1.16:
      Successfully uninstalled kfp-pipeline-spec-0.

### Set up environment variables

In [5]:
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

In [106]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID:", PROJECT_ID)

# Look up the project number in every case; it is needed later for the service account email
shell_output = ! gcloud projects list --filter="$PROJECT_ID" --format="value(PROJECT_NUMBER)" 2>/dev/null
PROJECT_NUM = shell_output[0]
print("Project Num", PROJECT_NUM)



Project ID: vertex-ai-test-365213
Project Num 370018035372


In [4]:
REGION = "[your-region]"  # @param {type: "string"}

if REGION == "[your-region]":
    REGION = "us-central1"
    
BUCKET_NAME = f"mlops-2-{PROJECT_ID}"
EXPERIMENT_NAME = "mlops-2-experiment"
PIPELINE_NAME = "mlops-2-pipeline"
ENDPOINT_NAME = "mlops-2-endpoint"
REPOSITORY_NAME = f"mlops-2-{PROJECT_ID}"

In [8]:
!gsutil mb -c regional -l $REGION gs://$BUCKET_NAME

Creating gs://mlops-coaching-2-vertex-ai-test-365213/...
ServiceException: 409 A Cloud Storage bucket named 'mlops-coaching-2-vertex-ai-test-365213' already exists. Try another name. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.


## Automated MLOps pipeline creation

### Create a script containing your Vertex AI/Kubeflow Pipeline to compile the pipeline into `pipeline.yaml`

> <font color='green'>**Task 1**</font>
>
> Create a Python script `src/pipeline.py` that creates a file named `pipeline.yaml` from the Kubeflow pipeline you developed last week. The output file should be in YAML and not JSON format.
>
> If you were unable to produce a Kubeflow pipeline last week, please use the one provided below. Otherwise, replace it with your own.

In [22]:
%%writefile src/pipeline.py

#### Optionally, you can customise the pipeline you are deploying by inserting the pipeline from part 1 ####


from typing import NamedTuple

from kfp.dsl import pipeline
from kfp.dsl import component
from kfp import compiler

@component() 
def concat(a: str, b: str) -> str:
    return a + b

@component
def reverse(a: str) -> NamedTuple("outputs", [("before", str), ("after", str)]):
    return a, a[::-1]

@pipeline(name="mlops-workshop-pipeline")
def basic_pipeline(a: str='stres', b: str='sed'):
    concat_task = concat(a=a, b=b)
    reverse_task = reverse(a=concat_task.output)

if __name__ == '__main__':
    compiler.Compiler().compile(pipeline_func=basic_pipeline, package_path="pipeline.yaml")

Overwriting src/pipeline.py


Using the next command, you can test the materialized pipeline generated by your script. You can view the output in a file named `pipeline.yaml`.

In [23]:
!python src/pipeline.py

In [24]:
!head -n20 pipeline.yaml

# PIPELINE DEFINITION
# Name: mlops-coaching-pipeline
# Inputs:
#    a: str [Default: 'stres']
#    b: str [Default: 'sed']
components:
  comp-concat:
    executorLabel: exec-concat
    inputDefinitions:
      parameters:
        a:
          parameterType: STRING
        b:
          parameterType: STRING
    outputDefinitions:
      parameters:
        Output:
          parameterType: STRING
  comp-reverse:
    executorLabel: exec-reverse


### Test the Pipeline

> <font color='green'>**Task 2**</font>
>
> Write unit/integration tests for the pipeline you created to ensure that the component logic you added works as expected.

In [8]:
!mkdir -p src/tests

In [9]:
%%writefile src/tests/test_pipeline.py

import unittest
from pipeline import concat, reverse, basic_pipeline

class TestBasicPipeline(unittest.TestCase):
    # def setUp(self):
        # Get relevant component
    
    def test_concat_component(self):
        # concat is declared with str inputs, so test it with strings
        self.assertEqual(concat.python_func("stres", "sed"), "stressed")

    def test_reverse(self):
        self.assertEqual(reverse.python_func("stressed")[1], "desserts")

    def test_pipeline(self):
        pass

if __name__ == '__main__':
    unittest.main()

Writing src/tests/test_pipeline.py


Using the next command, you can run the tests with Python's `unittest` test runner. It discovers all test files whose names match `test_*`.

You can also use another testing framework of your choice (e.g. `pytest`).
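
If you prefer `pytest`, the same checks can be written as plain functions with bare `assert` statements. The following is a sketch under stated assumptions, not the lab's reference solution: `concat` and `reverse` are re-declared here as plain Python stand-ins for `concat.python_func` and `reverse.python_func` so the example runs without KFP installed, and `ReverseOutputs` is a hypothetical name for the named tuple.

```python
from typing import NamedTuple


class ReverseOutputs(NamedTuple):
    """Stand-in for the NamedTuple declared on the `reverse` component."""
    before: str
    after: str


def concat(a: str, b: str) -> str:
    # Same body as the `concat` component in src/pipeline.py
    return a + b


def reverse(a: str) -> ReverseOutputs:
    # Same logic as the `reverse` component, with named fields for readability
    return ReverseOutputs(before=a, after=a[::-1])


def test_concat_joins_strings():
    assert concat("stres", "sed") == "stressed"


def test_reverse_returns_both_values():
    outputs = reverse("stressed")
    assert outputs.before == "stressed"
    assert outputs.after == "desserts"


def test_components_chained_like_the_pipeline():
    # Mirrors basic_pipeline: the output of concat feeds reverse
    assert reverse(concat("stres", "sed")).after == "desserts"
```

In the real test file you would import the components from `pipeline` and call their `python_func` attribute instead of re-declaring the functions.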

In [10]:
!PYTHONPATH=src python -m unittest discover -s src/tests/

...
----------------------------------------------------------------------
Ran 3 tests in 0.000s

OK


### Create a script to submit your compiled Kubeflow pipeline (`pipeline.yaml`) to Vertex AI

In [14]:
%%writefile src/submit-pipeline.py
import os

from google.cloud import aiplatform
import google.auth

PROJECT_ID = os.getenv("PROJECT_ID")
if not PROJECT_ID:
    creds, PROJECT_ID = google.auth.default()

REGION = os.environ["REGION"]
BUCKET_NAME = os.environ["BUCKET_NAME"]
EXPERIMENT_NAME = os.environ["EXPERIMENT_NAME"]
ENDPOINT_NAME = os.environ["ENDPOINT_NAME"]
PIPELINE_NAME = os.environ["PIPELINE_NAME"]
# Read the same variable name that cloudbuild.yaml sets (ENABLE_CACHING)
ENABLE_CACHING = os.getenv("ENABLE_CACHING", 'true').lower() in ('true', '1', 't')

aiplatform.init(project=PROJECT_ID, location=REGION)
sync_pipeline = os.getenv("SUBMIT_PIPELINE_SYNC", 'False').lower() in ('true', '1', 't')

job = aiplatform.PipelineJob(
    display_name=PIPELINE_NAME,
    template_path='pipeline.yaml',
    location=REGION,
    project=PROJECT_ID,
    enable_caching=ENABLE_CACHING,
    pipeline_root=f'gs://{BUCKET_NAME}'
)
print(f"Submitting pipeline {PIPELINE_NAME} in experiment {EXPERIMENT_NAME}.")
job.submit(experiment=EXPERIMENT_NAME)

if sync_pipeline:
    job.wait()

Overwriting src/submit-pipeline.py
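
The script turns environment variables into booleans by lower-casing the value and checking it against `('true', '1', 't')`. The sketch below isolates that parsing so its behaviour is easier to see. The helper `env_flag` and the variable name `EXAMPLE_FLAG` are hypothetical and only used for illustration.

```python
import os

# The accepted "true" spellings, as used in submit-pipeline.py
TRUTHY = ("true", "1", "t")


def env_flag(name: str, default: str = "false") -> bool:
    """Interpret an environment variable as a boolean, as the script does inline."""
    return os.getenv(name, default).lower() in TRUTHY


if __name__ == "__main__":
    os.environ["EXAMPLE_FLAG"] = "True"
    print(env_flag("EXAMPLE_FLAG"))  # matched case-insensitively

    os.environ["EXAMPLE_FLAG"] = "yes"
    print(env_flag("EXAMPLE_FLAG"))  # "yes" is not in TRUTHY

    del os.environ["EXAMPLE_FLAG"]
    print(env_flag("EXAMPLE_FLAG", default="1"))  # unset, so the default is parsed
```

Note that any value outside the accepted spellings, including `yes` or `on`, is treated as false rather than raising an error.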


Let's test this script in the Notebook. You can check the pipeline's status by clicking on the link printed by the script.

In [25]:
%set_env REGION=$REGION
%set_env BUCKET_NAME=$BUCKET_NAME
%set_env EXPERIMENT_NAME=$EXPERIMENT_NAME
%set_env PIPELINE_NAME=$PIPELINE_NAME
%set_env ENDPOINT_NAME=$ENDPOINT_NAME
%set_env SUBMIT_PIPELINE_SYNC=1

!python src/submit-pipeline.py

env: REGION=us-central1
env: BUCKET_NAME=mlops-coaching-2-vertex-ai-test-365213
env: EXPERIMENT_NAME=mlops-coaching-2-experiment
env: PIPELINE_NAME=mlops-coaching-2-pipeline
env: ENDPOINT_NAME=mlops-coaching-2-endpoint
env: SUBMIT_PIPELINE_SYNC=1
Submitting pipeline mlops-coaching-2-pipeline in experiment mlops-coaching-2-experiment.
Creating PipelineJob
PipelineJob created. Resource name: projects/370018035372/locations/us-central1/pipelineJobs/mlops-coaching-pipeline-20230210104322
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/370018035372/locations/us-central1/pipelineJobs/mlops-coaching-pipeline-20230210104322')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-coaching-pipeline-20230210104322?project=370018035372
Associating projects/370018035372/locations/us-central1/pipelineJobs/mlops-coaching-pipeline-20230210104322 to Experiment: mlops-coaching-2-experiment
PipelineJob proj

### Create a pipeline template in Artifact Registry from `pipeline.yaml`

A pipeline template is a resource that you can use to publish a workflow definition so that it can be reused multiple times, by a single user or by multiple users. This feature is [documented here](https://cloud.google.com/vertex-ai/docs/pipelines/create-pipeline-template).

The Kubeflow Pipelines SDK registry client is a new client interface that you can use with a compatible registry server, such as Artifact Registry, for version control of your Kubeflow Pipelines (KFP) templates.

> <font color='green'>**Task 3**</font>
>
> Create a Python script `src/create-pipeline-template.py` that uploads `pipeline.yaml` to the Vertex AI Pipeline registry.
>

In [None]:
!gcloud artifacts repositories create mlops2-repo --location=$REGION --repository-format=KFP

In [26]:
%%writefile src/create-pipeline-template.py
import os
import google.auth

from kfp.registry import RegistryClient

PROJECT_ID = os.getenv("PROJECT_ID")
if not PROJECT_ID:
    creds, PROJECT_ID = google.auth.default()
REGION = os.environ["REGION"]

## Your code goes below this line

client = RegistryClient(host=f"https://{REGION}-kfp.pkg.dev/{PROJECT_ID}/mlops2-repo")

template_name, template_version = client.upload_pipeline(
  file_name="pipeline.yaml",
  tags=["v1", "latest"]
)

Writing src/create-pipeline-template.py


In [27]:
!python src/create-pipeline-template.py
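
Uploaded templates are addressed by URL. Judging from the registry host used in the script above and the `template_path` built in the Cloud Function later in this notebook, the layout appears to be `host/repository/pipeline-name/tag`. The sketch below only assembles those strings; `registry_host` and `template_url` are hypothetical helper names, and `my-project` is a placeholder.

```python
def registry_host(region: str, project_id: str, repository: str) -> str:
    """URL of a KFP-format Artifact Registry repository, as passed to RegistryClient."""
    return f"https://{region}-kfp.pkg.dev/{project_id}/{repository}"


def template_url(
    region: str, project_id: str, repository: str, pipeline_name: str, tag: str
) -> str:
    """URL of one tagged template version, usable as a PipelineJob template_path."""
    return f"{registry_host(region, project_id, repository)}/{pipeline_name}/{tag}"


if __name__ == "__main__":
    # "my-project" is a placeholder; the other values are taken from this notebook.
    print(registry_host("us-central1", "my-project", "mlops2-repo"))
    print(
        template_url(
            "us-central1", "my-project", "mlops2-repo",
            "mlops-workshop-pipeline", "latest",
        )
    )
```

Because the script uploads with the tags `v1` and `latest`, pointing a job at the `latest` tag picks up whichever version was uploaded most recently with that tag.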

### Automate Kubeflow pipeline compilation, template generation, and execution through Cloud Build

Cloud Build is a service that executes your builds on Google Cloud. In this exercise, we want to use it to both compile and run your machine learning pipeline. For more information, please refer to the [Cloud Build documentation](https://cloud.google.com/build/docs/overview).

In [31]:
%%writefile src/cloudbuild.yaml
steps:
  # Install dependencies
  - name: 'python'
    entrypoint: 'pip'
    args: ["install", "-r", "requirements.txt", "--user"]

  # Compile pipeline
  - name: 'python'
    entrypoint: 'python'
    args: ['pipeline.py']
    id: 'compile'

  # Test the Pipeline Components 
  - name: 'python'
    entrypoint: 'python'
    args: ['-m', 'unittest', 'discover', 'tests/']
    id: 'test_pipeline'
    waitFor: ['compile']

  # Upload compiled pipeline to GCS
  - name: 'gcr.io/cloud-builders/gsutil'
    args: ['cp', 'pipeline.yaml', 'gs://${_BUCKET_NAME}']
    id: 'upload'
    waitFor: ['test_pipeline']
        
  # Run the Vertex AI Pipeline (synchronously for test/qa environment) with caching enabled.
  - name: 'python'
    id: 'test'
    entrypoint: 'python'
    env: ['BUCKET_NAME=${_BUCKET_NAME}', 'EXPERIMENT_NAME=qa-${_EXPERIMENT_NAME}', 'PIPELINE_NAME=${_PIPELINE_NAME}',
          'REGION=${_REGION}', 'ENDPOINT_NAME=qa-${_ENDPOINT_NAME}', 'SUBMIT_PIPELINE_SYNC=true', 'ENABLE_CACHING=true']
    args: ['submit-pipeline.py']

  # Create pipeline template and upload it to the artifact registry
  - name: 'python'
    id: 'template'
    entrypoint: 'python'
    env: ['REGION=${_REGION}']
    args: ['create-pipeline-template.py']
    
  # Run the Vertex AI Pipeline (asynchronously for prod environment) with caching disabled.
  # In a real production scenario, this would run in a different GCP project.
  - name: 'python'
    id: 'prod'
    entrypoint: 'python'
    env: ['BUCKET_NAME=${_BUCKET_NAME}', 'EXPERIMENT_NAME=prod-${_EXPERIMENT_NAME}', 'PIPELINE_NAME=${_PIPELINE_NAME}',
          'REGION=${_REGION}', 'ENDPOINT_NAME=prod-${_ENDPOINT_NAME}', 'SUBMIT_PIPELINE_SYNC=false', 'ENABLE_CACHING=false']
    args: ['submit-pipeline.py']

Overwriting src/cloudbuild.yaml
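
The `${_BUCKET_NAME}`-style placeholders in the build config above are user-defined substitutions, which Cloud Build requires to start with an underscore. Their values are passed on the command line as a single comma-separated `KEY=VALUE` string. As a small sketch, the hypothetical helper `format_substitutions` below renders a dictionary into that string; the comma check is a conservative assumption of this sketch, and the values shown are placeholders.

```python
from typing import Dict


def format_substitutions(values: Dict[str, str]) -> str:
    """Render a mapping as the KEY=VALUE,KEY=VALUE string taken by --substitutions."""
    for key, value in values.items():
        if not key.startswith("_"):
            raise ValueError(f"user-defined substitution {key!r} must start with '_'")
        if "," in value:
            # Assumption of this sketch: reject commas, since they separate the pairs
            raise ValueError(f"value for {key!r} contains a comma")
    return ",".join(f"{key}={value}" for key, value in values.items())


if __name__ == "__main__":
    substitutions = format_substitutions(
        {
            "_BUCKET_NAME": "mlops-2-my-project",  # placeholder bucket name
            "_EXPERIMENT_NAME": "mlops-2-experiment",
            "_PIPELINE_NAME": "mlops-2-pipeline",
            "_REGION": "us-central1",
            "_ENDPOINT_NAME": "mlops-2-endpoint",
        }
    )
    print(
        "gcloud builds submit ./src --config=src/cloudbuild.yaml "
        f"--substitutions={substitutions}"
    )
```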


Cloud Build uses a special service account to execute builds on your behalf. When you enable the Cloud Build API on a Google Cloud project, the Cloud Build service account is automatically created and granted the Cloud Build Service Account role for the project. This role gives the service account permissions to perform several tasks; however, you can grant more permissions to the service account to perform additional tasks. [This page](https://cloud.google.com/build/docs/securing-builds/configure-access-for-cloud-build-service-account) explains how to grant and revoke permissions to the Cloud Build service account.

For Cloud Build to be able to deploy your pipeline, you need to give its service account `{PROJECT_NUMBER}@cloudbuild.gserviceaccount.com` the **Vertex AI User** and **Service Account User** roles. Note that it may take up to 5 minutes until the new permissions propagate.

In [32]:
!gcloud builds submit ./src --config=src/cloudbuild.yaml --substitutions=_BUCKET_NAME=$BUCKET_NAME,_EXPERIMENT_NAME=$EXPERIMENT_NAME,_PIPELINE_NAME=$PIPELINE_NAME,_REGION=$REGION,_ENDPOINT_NAME=$ENDPOINT_NAME

Creating temporary tarball archive of 8 file(s) totalling 6.5 KiB before compression.
Uploading tarball of [./src] to [gs://vertex-ai-test-365213_cloudbuild/source/1676026036.796529-80b198a42908461cac0e8895c5563b37.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/vertex-ai-test-365213/locations/global/builds/599caf7a-a504-481d-bfb9-050191d41ee2].
Logs are available at [ https://console.cloud.google.com/cloud-build/builds/599caf7a-a504-481d-bfb9-050191d41ee2?project=370018035372 ].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "599caf7a-a504-481d-bfb9-050191d41ee2"

FETCHSOURCE
Fetching storage object: gs://vertex-ai-test-365213_cloudbuild/source/1676026036.796529-80b198a42908461cac0e8895c5563b37.tgz#1676026037036540
Copying gs://vertex-ai-test-365213_cloudbuild/source/1676026036.796529-80b198a42908461cac0e8895c5563b37.tgz#1676026037036540...
/ [1 files][  3.3 KiB/  3.3 KiB]                                                
Oper

### Create a git repository and trigger Cloud Build execution

Before you create the repository, make sure the **Cloud Source Repositories API** is enabled, either through the prerequisites cell above or in the Google Cloud console.

> <font color='green'>**Task 4**</font>
>
> Create a build trigger on the source repository that executes `cloudbuild.yaml`. Make sure to pass all **--substitutions**.
>

In [33]:
!gcloud source repos create $REPOSITORY_NAME

Created [mlops-coaching-2-vertex-ai-test-365213].


In [34]:
!gcloud beta builds triggers create cloud-source-repositories \
    --name=mlops2-source-trigger \
    --repo=$REPOSITORY_NAME \
    --branch-pattern=master \
    --build-config=cloudbuild.yaml \
    --substitutions=_BUCKET_NAME=$BUCKET_NAME,_EXPERIMENT_NAME=$EXPERIMENT_NAME,_PIPELINE_NAME=$PIPELINE_NAME,_REGION=$REGION,_ENDPOINT_NAME=$ENDPOINT_NAME

Created [https://cloudbuild.googleapis.com/v1/projects/vertex-ai-test-365213/locations/global/triggers/96b91f37-6821-4238-b0cb-d29db32dc931].
NAME                   CREATE_TIME                STATUS
mlops2-source-trigger  2023-02-10T10:49:24+00:00


In [35]:
!gcloud source repos clone $REPOSITORY_NAME --project=$PROJECT_ID

Cloning into '/home/jupyter/dev-ml-coaching-2023-mlops/ml_coaching_part2/mlops-coaching-2-vertex-ai-test-365213'...
Project [vertex-ai-test-365213] repository [mlops-coaching-2-vertex-ai-test-365213] was cloned to [/home/jupyter/dev-ml-coaching-2023-mlops/ml_coaching_part2/mlops-coaching-2-vertex-ai-test-365213].


In [36]:
!cp -av src/* $REPOSITORY_NAME/

'src/__pycache__' -> 'mlops-coaching-2-vertex-ai-test-365213/__pycache__'
'src/__pycache__/pipeline.cpython-37.pyc' -> 'mlops-coaching-2-vertex-ai-test-365213/__pycache__/pipeline.cpython-37.pyc'
'src/cloudbuild.yaml' -> 'mlops-coaching-2-vertex-ai-test-365213/cloudbuild.yaml'
'src/create-pipeline-template.py' -> 'mlops-coaching-2-vertex-ai-test-365213/create-pipeline-template.py'
'src/pipeline.py' -> 'mlops-coaching-2-vertex-ai-test-365213/pipeline.py'
'src/requirements.txt' -> 'mlops-coaching-2-vertex-ai-test-365213/requirements.txt'
'src/submit-pipeline.py' -> 'mlops-coaching-2-vertex-ai-test-365213/submit-pipeline.py'
'src/tests' -> 'mlops-coaching-2-vertex-ai-test-365213/tests'
'src/tests/test_pipeline.py' -> 'mlops-coaching-2-vertex-ai-test-365213/tests/test_pipeline.py'
'src/tests/__pycache__' -> 'mlops-coaching-2-vertex-ai-test-365213/tests/__pycache__'
'src/tests/__pycache__/test_pipeline.cpython-37.pyc' -> 'mlops-coaching-2-vertex-ai-test-365213/tests/__pycache__/test_pipelin

In [37]:
!cd $REPOSITORY_NAME && git add .

In [38]:
!git config --global user.email "{YOUR_EMAIL}"
!git config --global user.name "{YOUR_NAME}"

In [39]:
!cd $REPOSITORY_NAME && git commit -a -m "initial commit"

[master (root-commit) eab3406] initial commit
 8 files changed, 153 insertions(+)
 create mode 100644 __pycache__/pipeline.cpython-37.pyc
 create mode 100644 cloudbuild.yaml
 create mode 100644 create-pipeline-template.py
 create mode 100644 pipeline.py
 create mode 100644 requirements.txt
 create mode 100644 submit-pipeline.py
 create mode 100644 tests/__pycache__/test_pipeline.cpython-37.pyc
 create mode 100644 tests/test_pipeline.py


In [40]:
!cd $REPOSITORY_NAME && git push -u origin master

Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Delta compression using up to 4 threads
Compressing objects: 100% (13/13), done.
Writing objects: 100% (13/13), 3.89 KiB | 1.94 MiB/s, done.
Total 13 (delta 0), reused 0 (delta 0), pack-reused 0
To https://source.developers.google.com/p/vertex-ai-test-365213/r/mlops-coaching-2-vertex-ai-test-365213
 * [new branch]      master -> master
branch 'master' set up to track 'origin/master'.


### Git - Cloud Build integration
The commit you pushed triggered a new Cloud Build run. From now on, every new commit to the `master` branch triggers a build that tests and deploys the pipeline.
Visit this link to see the [Cloud Build](https://console.cloud.google.com/cloud-build/builds) runs.

## Schedule pipeline execution

> <font color='green'>**Task 5**</font>
>
> Schedule the execution of the pipeline such that it runs every day.
>

### Schedule Method 1: Cloud Scheduler → Pub/Sub → Cloud Functions

With this method, a Cloud Scheduler job publishes a message to a Pub/Sub topic, which triggers a Cloud Function, which in turn submits a Vertex AI pipeline run. Note that the service account the Cloud Function runs as needs access to Cloud Storage and Vertex AI.

### 1. Create a Pub/Sub topic
`trigger-mlops-2-pipeline` is the name of the new topic you are creating:

In [41]:
!gcloud pubsub topics create trigger-mlops-2-pipeline

Created topic [projects/vertex-ai-test-365213/topics/trigger-mlops-coaching-2-pipeline].


### 2. Deploy the cloud function
This Cloud Function will be invoked by Pub/Sub and will trigger the Vertex AI pipeline.

In [42]:
!mkdir -p cf-trigger

In [43]:
%%writefile cf-trigger/requirements.txt
kfp==2.0.0b9
pytest==7.2.0
google-cloud-aiplatform==1.20.0
pytz==2022.7

Writing cf-trigger/requirements.txt


In [62]:
%%writefile cf-trigger/main.py

import os
import base64
import json
from google.cloud import aiplatform

REGION = os.environ["REGION"]
PIPELINE_NAME = os.environ["PIPELINE_NAME"]
PROJECT_ID = os.environ["PROJECT_ID"]
BUCKET_NAME = os.environ["BUCKET_NAME"]


def subscribe(event, context):
    payload_message = base64.b64decode(event['data']).decode('utf-8')
    print(payload_message)
    payload_json = json.loads(payload_message)
    pipeline_name = payload_json['pipeline_name']

    aiplatform.init(project=PROJECT_ID, location=REGION)
    
    job = aiplatform.PipelineJob(
        display_name=PIPELINE_NAME,
        template_path=f'https://{REGION}-kfp.pkg.dev/{PROJECT_ID}/mlops2-repo/{pipeline_name}',
        location=REGION,
        project=PROJECT_ID,
        enable_caching=False,
        pipeline_root=f'gs://{BUCKET_NAME}'
    )

    job.submit()

Overwriting cf-trigger/main.py
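
The function above expects the Pub/Sub message body to be JSON with a `pipeline_name` key, delivered base64-encoded in `event['data']`. To check that decoding logic locally before deploying, you can build a fake event by hand. This is a minimal sketch: `make_event` and `pipeline_name_from_event` are hypothetical helpers that mirror the first lines of `subscribe`, and no Google Cloud call is made.

```python
import base64
import json
from typing import Any, Dict


def make_event(payload: Dict[str, Any]) -> Dict[str, str]:
    """Build a dict shaped like the `event` a Pub/Sub-triggered function receives."""
    encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {"data": encoded}


def pipeline_name_from_event(event: Dict[str, str]) -> str:
    """Mirror the decoding steps at the top of `subscribe`."""
    message = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(message)["pipeline_name"]


if __name__ == "__main__":
    event = make_event({"pipeline_name": "mlops-workshop-pipeline/latest"})
    print(event["data"])  # the base64 text Pub/Sub would deliver
    print(pipeline_name_from_event(event))
```

A message that is not valid JSON, or that lacks the `pipeline_name` key, makes the function raise, so keep the scheduler's message body in sync with this shape.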


In [111]:
!gcloud functions deploy mlops-2-function \
--source=./cf-trigger \
--entry-point=subscribe \
--trigger-topic trigger-mlops-2-pipeline \
--runtime python37 \
--ingress-settings internal-and-gclb \
--set-env-vars REGION=$REGION,PIPELINE_NAME=$PIPELINE_NAME,PROJECT_ID=$PROJECT_ID,BUCKET_NAME=$BUCKET_NAME \
--service-account $PROJECT_NUM-compute@developer.gserviceaccount.com

Deploying function (may take a while - up to 2 minutes)...⠹                    
For Cloud Build Logs, visit: https://console.cloud.google.com/cloud-build/builds;region=us-central1/794c626f-5279-4266-8ec0-58d81dee7628?project=370018035372
Deploying function (may take a while - up to 2 minutes)...done.                
availableMemoryMb: 256
buildId: 794c626f-5279-4266-8ec0-58d81dee7628
buildName: projects/370018035372/locations/us-central1/builds/794c626f-5279-4266-8ec0-58d81dee7628
dockerRegistry: CONTAINER_REGISTRY
entryPoint: subscribe
environmentVariables:
  BUCKET_NAME: mlops-coaching-2-vertex-ai-test-365213
  PIPELINE_NAME: mlops-coaching-2-pipeline
  PROJECT_ID: vertex-ai-test-365213
  REGION: us-central1
eventTrigger:
  eventType: google.pubsub.topic.publish
  failurePolicy: {}
  resource: projects/vertex-ai-test-365213/topics/trigger-mlops-coaching-2-pipeline
  service: pubsub.googleapis.com
ingressSettings: ALLOW_INTERNAL_AND_GCLB
labels:
  deployment-tool: cli-gcloud
maxInstan

### 3. Create Cloud Scheduler
Create a Cloud Scheduler job that publishes a message to the Pub/Sub topic:

In [112]:
import json
parameters = {"pipeline_name": "mlops-workshop-pipeline/latest"}
message_body = json.dumps(parameters)

In [113]:
message_body

'{"pipeline_name": "mlops-coaching-pipeline/latest"}'

In [115]:
!gcloud pubsub topics publish trigger-mlops-2-pipeline \
  --message='{message_body}'

messageIds:
- '6851316306787686'


In [120]:
!gcloud scheduler jobs create pubsub mlops-workshop-training-pipleline \
--schedule "35 11 * * *" \
--topic=trigger-mlops-2-pipeline \
--location=us-central1 \
--message-body='{message_body}'

name: projects/vertex-ai-test-365213/locations/us-central1/jobs/mlops-coaching-training-pipleline
pubsubTarget:
  data: eyJwaXBlbGluZV9uYW1lIjogIm1sb3BzLWNvYWNoaW5nLXBpcGVsaW5lL2xhdGVzdCJ9
  topicName: projects/vertex-ai-test-365213/topics/trigger-mlops-coaching-2-pipeline
retryConfig:
  maxBackoffDuration: 3600s
  maxDoublings: 16
  maxRetryDuration: 0s
  minBackoffDuration: 5s
schedule: 35 11 * * *
state: ENABLED
timeZone: Etc/UTC
userUpdateTime: '2023-02-10T12:12:34Z'
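
The `--schedule` value uses the five-field unix-cron format. In `35 11 * * *` the minute is 35, the hour is 11, and the remaining three fields are wildcards, so the job fires once a day at 11:35 in the job's time zone (`Etc/UTC` in the output above). As a small illustration, the hypothetical helper `describe_cron` below labels each field; it does not validate ranges or special syntax.

```python
from typing import Dict

# Field order of a unix-cron expression
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression: str) -> Dict[str, str]:
    """Split a five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    for field, value in describe_cron("35 11 * * *").items():
        print(f"{field:>12}: {value}")
```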


### Trigger pipeline from scheduler
Now that you have deployed your Cloud Scheduler job, Pub/Sub topic, and Cloud Function, you can visit the [Cloud Scheduler page](https://console.cloud.google.com/cloudscheduler) to see the scheduled pipeline trigger. You can also trigger the job from the console by clicking the three dots under `Actions` on the right and then clicking `Force job run`.

You can see the triggered pipeline directly from the [Vertex Pipelines page](https://console.cloud.google.com/vertex-ai/pipelines/runs).

Additionally, the logs produced when triggering the pipeline can be inspected on the [Cloud Function page](https://console.cloud.google.com/functions/details/us-central1/mlops-2-function?env=gen1&tab=logs).

### Coming soon: Vertex AI Scheduler (in private preview)


The Vertex AI Schedule Service API is a new resource that lets you schedule ad hoc or recurring Vertex AI Pipeline runs. The goal of this service is to make scheduling a one-off or recurring pipeline run simple, so that any Vertex AI user can quickly understand and use schedules to implement naive continuous training in their business.

This new API will replace our previous guidance of using Cloud Scheduler with Cloud Functions to schedule a pipeline run. It may support additional Vertex AI resources in the future.

To use the Schedules API during the Private Preview release, your project ID must be allowlisted in the `SCHEDULED_RUNS_TRUSTED_TESTER` group. You can use a project that you previously signed up with. If you're not in the trusted testers group, sign up by using this [sign-up form](https://docs.google.com/forms/d/e/1FAIpQLScDxABxIvqjeM_279dwTMmVfFBJD7qmW2leyU_ZBTYutJ62uA/viewform?usp=sf_link). After signing up, wait for a confirmation from the Vertex AI Pipelines team before continuing.


Note that the **Scheduler is currently only available via its REST interface and not the Vertex AI SDK**. You are not required to use the scheduler in this workshop session.

## Congratulations, you completed the last notebook!