In [51]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Automated MLOps pipeline build, testing and deployment

In the previous notebook, you created a machine learning pipeline to train a model. This session is about automating the training and deployment of that model. The objectives of this notebook are to:

1. Write a pipeline into a Python file that can be compiled into a pipeline definition file (`pipeline.json`) in an automated fashion.
2. Define some dummy unit tests.
3. Write a script to deploy a compiled Kubeflow pipeline to Vertex AI.
4. Use Cloud Build (CI/CD) to compile, test, and run your Kubeflow pipeline.


In [2]:
GCP_PROJECTS = !gcloud config get-value project
PROJECT_ID = GCP_PROJECTS[0]
BUCKET_NAME = f"{PROJECT_ID}-mlops" 
REGION = "us-central1"

# Experiments
EXPERIMENT_NAME = "test-experiment"
PIPELINE_NAME = "mlops-pipeline-prod"

### Create a script that defines your Vertex AI/Kubeflow pipeline and compiles it into `pipeline.json`

> <font color='green'>**Task 1**</font>
>
> Create a Python script `src/pipeline.py` that creates a file named `pipeline.json` from a dummy Vertex pipeline.
>

In [3]:
!mkdir -p src

In [4]:
%%writefile src/requirements.txt
kfp==1.8.18
pytest==7.2.0
pytz==2022.7
google-cloud-aiplatform==1.20.0
google-api-core==2.10.2
google-auth==1.35.0
google-cloud-bigquery==1.20.0
google-cloud-core==1.7.3
google-cloud-resource-manager==1.6.3
google-cloud-storage==2.2.1

Writing src/requirements.txt


In [27]:
%%writefile src/pipeline.py

from typing import NamedTuple

from kfp.v2.dsl import component, pipeline
import kfp.v2.compiler as compiler


@component() 
def concat(a: str, b: str) -> str:
    return a + b

@component()
def reverse(a: str) -> NamedTuple("outputs", [("before", str), ("after", str)]):
    return a, a[::-1]

@pipeline(name="mlops-dummy-pipeline")
def basic_pipeline(a: str='stres', b: str='sed'):
    concat_task = concat(a=a, b=b)
    reverse_task = reverse(a=concat_task.output)

if __name__ == '__main__':
    compiler.Compiler().compile(pipeline_func=basic_pipeline, package_path="pipeline.json")

Overwriting src/pipeline.py
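
To see what the dummy pipeline computes before compiling it, the two component bodies can be traced as plain Python. The sketch below mirrors `concat` and `reverse` without any KFP decorators; `ReverseOutputs` and `run_locally` are hypothetical names introduced only for this illustration and are not part of `src/pipeline.py`.

```python
from typing import NamedTuple


class ReverseOutputs(NamedTuple):
    """Plain-Python stand-in for the component's named outputs."""
    before: str
    after: str


def concat(a: str, b: str) -> str:
    # Same body as the `concat` component.
    return a + b


def reverse(a: str) -> ReverseOutputs:
    # Same logic as the `reverse` component: original string and its reversal.
    return ReverseOutputs(before=a, after=a[::-1])


def run_locally(a: str = "stres", b: str = "sed") -> ReverseOutputs:
    # Chains the two steps in the same order as `basic_pipeline`.
    return reverse(concat(a, b))


if __name__ == "__main__":
    print(run_locally())
```

With the default arguments, the concatenated string is `stressed` and its reversal is `desserts`, which is the value the unit tests later check.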


Run the script with the next command to compile the pipeline. The compiled definition is written to a file named `pipeline.json` in the current working directory, where you can inspect it.

In [28]:
!python src/pipeline.py



In [29]:
!head -n20 pipeline.json

{
  "pipelineSpec": {
    "components": {
      "comp-concat": {
        "executorLabel": "exec-concat",
        "inputDefinitions": {
          "parameters": {
            "a": {
              "type": "STRING"
            },
            "b": {
              "type": "STRING"
            }
          }
        },
        "outputDefinitions": {
          "parameters": {
            "Output": {
              "type": "STRING"
            }
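
Rather than reading the raw JSON, you could summarize the compiled components programmatically. The following is a minimal sketch that assumes the layout visible in the output above (`pipelineSpec` → `components` → `inputDefinitions`/`outputDefinitions` → `parameters`). `summarize_spec` and `SAMPLE_SPEC` are hypothetical names, and the sample only reproduces the fragment shown above, not the full file.

```python
import json
from pathlib import Path


def summarize_spec(spec: dict) -> dict:
    """Map each component name to its input and output parameter types."""
    summary = {}
    components = spec.get("pipelineSpec", {}).get("components", {})
    for name, comp in components.items():
        inputs = comp.get("inputDefinitions", {}).get("parameters", {})
        outputs = comp.get("outputDefinitions", {}).get("parameters", {})
        summary[name] = {
            "inputs": {key: value.get("type") for key, value in inputs.items()},
            "outputs": {key: value.get("type") for key, value in outputs.items()},
        }
    return summary


# Assumption: mirrors only the first lines of pipeline.json printed above.
SAMPLE_SPEC = {
    "pipelineSpec": {
        "components": {
            "comp-concat": {
                "executorLabel": "exec-concat",
                "inputDefinitions": {
                    "parameters": {"a": {"type": "STRING"}, "b": {"type": "STRING"}}
                },
                "outputDefinitions": {
                    "parameters": {"Output": {"type": "STRING"}}
                },
            }
        }
    }
}

if __name__ == "__main__":
    path = Path("pipeline.json")
    # Fall back to the sample when the compiled file is not present.
    spec = json.loads(path.read_text()) if path.exists() else SAMPLE_SPEC
    print(json.dumps(summarize_spec(spec), indent=2))
```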


### Test the Pipeline

> <font color='green'>**Task 2**</font>
> Write unit/integration tests for the pipeline you created to ensure that the component logic you added works as expected.

In [30]:
!mkdir -p src/tests

In [31]:
%%writefile src/tests/test_pipeline.py

import unittest
from pipeline import concat, reverse, basic_pipeline

class TestBasicPipeline(unittest.TestCase):
    # def setUp(self):
        # Get relevant component
    
    def test_concat_component(self):
        # concat is typed for strings, so test it with string inputs.
        self.assertEqual(concat.python_func("stres", "sed"), "stressed")

    def test_reverse(self):
        self.assertEqual(reverse.python_func("stressed")[1], "desserts")

    def test_pipeline(self):
        pass

if __name__ == '__main__':
    unittest.main()

Overwriting src/tests/test_pipeline.py


The next command runs the tests with Python's `unittest` test runner, which discovers all test files whose names match `test_*`.

You can also use another testing framework of your choice (e.g. `pytest`).

In [32]:
!PYTHONPATH=src python -m unittest discover -s src/tests/

...
----------------------------------------------------------------------
Ran 3 tests in 0.000s

OK
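
If you want to cover more inputs per component without writing one test method per case, `unittest`'s `subTest` is one option. The sketch below is self-contained: `concat` and `reverse` here are plain-function stand-ins for `concat.python_func` and `reverse.python_func`, and `TestComponentLogic` and `run_suite` are hypothetical names used only for this example.

```python
import unittest


def concat(a: str, b: str) -> str:
    # Stand-in for concat.python_func.
    return a + b


def reverse(a: str):
    # Stand-in for reverse.python_func: (before, after).
    return a, a[::-1]


class TestComponentLogic(unittest.TestCase):
    def test_concat_cases(self):
        cases = [("stres", "sed", "stressed"), ("", "x", "x"), ("a", "", "a")]
        for a, b, expected in cases:
            # Each case is reported separately if it fails.
            with self.subTest(a=a, b=b):
                self.assertEqual(concat(a, b), expected)

    def test_reverse_cases(self):
        cases = [("stressed", "desserts"), ("", ""), ("ab", "ba")]
        for text, expected in cases:
            with self.subTest(text=text):
                before, after = reverse(text)
                self.assertEqual(before, text)
                self.assertEqual(after, expected)


def run_suite() -> unittest.TestResult:
    # Load and run the test case programmatically, e.g. from a notebook cell.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestComponentLogic)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```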


### Create a script to submit your compiled Kubeflow pipeline (`pipeline.json`) to Vertex AI

In [35]:
%%writefile src/submit-pipeline.py
import os

from google.cloud import aiplatform
import google.auth

PROJECT_ID = os.getenv("PROJECT_ID")
if not PROJECT_ID:
    creds, PROJECT_ID = google.auth.default()

REGION = os.environ["REGION"]
BUCKET_NAME = os.environ["BUCKET_NAME"]
PIPELINE_NAME = os.environ["PIPELINE_NAME"]
EXPERIMENT_NAME = os.environ.get("EXPERIMENT_NAME", "dummy-experiment")
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME","dummy-endpoint")

aiplatform.init(project=PROJECT_ID, location=REGION)
sync_pipeline = os.getenv("SUBMIT_PIPELINE_SYNC", 'False').lower() in ('true', '1', 't')

job = aiplatform.PipelineJob(
    display_name=PIPELINE_NAME,
    template_path='pipeline.json',
    location=REGION,
    project=PROJECT_ID,
    enable_caching=True,
    pipeline_root=f'gs://{BUCKET_NAME}',
)
print(f"Submitting pipeline {PIPELINE_NAME} in experiment {EXPERIMENT_NAME}.")
job.submit(experiment=EXPERIMENT_NAME)

if sync_pipeline:
    job.wait()

Overwriting src/submit-pipeline.py


Let's test this script in the Notebook. You can check the pipeline's status by clicking on the link printed by the script.

In [None]:
%set_env REGION=$REGION
%set_env BUCKET_NAME=$BUCKET_NAME
%set_env EXPERIMENT_NAME=$EXPERIMENT_NAME
%set_env PIPELINE_NAME=$PIPELINE_NAME
%set_env ENDPOINT_NAME=$ENDPOINT_NAME
%set_env SUBMIT_PIPELINE_SYNC=1

!python src/submit-pipeline.py
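
The submit script reads all of its configuration from environment variables: `REGION`, `BUCKET_NAME`, and `PIPELINE_NAME` are required, and `SUBMIT_PIPELINE_SYNC` is treated as true for `true`, `1`, or `t` (case-insensitive). A small sketch of that parsing, plus a check you could run before submitting, might look like this. `env_flag` and `missing_required` are hypothetical helpers, not part of `src/submit-pipeline.py`.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Same truthy spellings the submit script accepts.
TRUTHY = ("true", "1", "t")


def env_flag(name: str, default: bool = False,
             env: Optional[Mapping[str, str]] = None) -> bool:
    """Interpret an environment variable as a boolean flag."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.lower() in TRUTHY


def missing_required(names: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_required(["REGION", "BUCKET_NAME", "PIPELINE_NAME"])
    if missing:
        print(f"Missing environment variables: {', '.join(missing)}")
    print(f"Synchronous submit: {env_flag('SUBMIT_PIPELINE_SYNC')}")
```

Checking for missing variables up front gives a clearer message than the `KeyError` that `os.environ[...]` raises in the script.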

### Automate Kubeflow pipeline compilation, template generation, and execution through Cloud Build

Cloud Build is a service that executes your builds on Google Cloud. In this exercise, we want to use it to both compile and run your machine learning pipeline. For more information, please refer to the [Cloud Build documentation](https://cloud.google.com/build/docs/overview).

In [18]:
%%writefile src/cloudbuild.yaml
steps:
  # Install dependencies
  - name: 'python'
    entrypoint: 'pip'
    args: ["install", "-r", "requirements.txt", "--user"]

  # Compile pipeline
  - name: 'python'
    entrypoint: 'python'
    args: ['pipeline.py']
    id: 'compile'

  # Test the Pipeline Components 
  - name: 'python'
    entrypoint: 'python'
    args: ['-m', 'unittest', 'discover', 'tests/']
    id: 'test_pipeline'
    waitFor: ['compile']

  # Upload compiled pipeline to GCS
  - name: 'gcr.io/cloud-builders/gsutil'
    args: ['cp', 'pipeline.json', 'gs://${_BUCKET_NAME}']
    id: 'upload'
    waitFor: ['test_pipeline']
        
  # Run the Vertex AI Pipeline (synchronously for test/qa environment).
  - name: 'python'
    id: 'test'
    entrypoint: 'python'
    env: ['BUCKET_NAME=${_BUCKET_NAME}', 'EXPERIMENT_NAME=qa-${_EXPERIMENT_NAME}', 'PIPELINE_NAME=${_PIPELINE_NAME}',
          'REGION=${_REGION}', 'ENDPOINT_NAME=qa-${_ENDPOINT_NAME}', 'SUBMIT_PIPELINE_SYNC=true']
    args: ['submit-pipeline.py']
    
  # Run the Vertex AI Pipeline (asynchronously for prod environment). In a real production scenario, this would run in a different GCP project.
  - name: 'python'
    id: 'prod'
    entrypoint: 'python'
    env: ['BUCKET_NAME=${_BUCKET_NAME}', 'EXPERIMENT_NAME=prod-${_EXPERIMENT_NAME}', 'PIPELINE_NAME=${_PIPELINE_NAME}',
          'REGION=${_REGION}', 'ENDPOINT_NAME=prod-${_ENDPOINT_NAME}', 'SUBMIT_PIPELINE_SYNC=false']
    args: ['submit-pipeline.py']
    

Writing src/cloudbuild.yaml
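
The ordering of the steps above depends on `waitFor`. As documented for Cloud Build, a step without `waitFor` waits for all previous steps, a step with `waitFor: ['-']` starts immediately, and otherwise the step waits for the listed ids. The sketch below models that rule in plain Python so you can reason about the order; it does not call Cloud Build. `effective_dependencies` is a hypothetical helper, and the id `install` is an assumption, since the first step in the YAML has no `id`.

```python
from typing import Dict, List


def effective_dependencies(steps: List[dict]) -> Dict[str, List[str]]:
    """Return, for each step id, the ids it waits for before starting."""
    deps: Dict[str, List[str]] = {}
    seen: List[str] = []
    for index, step in enumerate(steps):
        step_id = step.get("id", f"step-{index}")
        wait_for = step.get("waitFor")
        if wait_for is None:
            # No waitFor: wait for every previous step.
            deps[step_id] = list(seen)
        elif wait_for == ["-"]:
            # '-' means: start as soon as the build starts.
            deps[step_id] = []
        else:
            deps[step_id] = list(wait_for)
        seen.append(step_id)
    return deps


# Mirrors the ids and waitFor values in src/cloudbuild.yaml.
STEPS = [
    {"id": "install"},  # assumed id; the YAML step has none
    {"id": "compile"},
    {"id": "test_pipeline", "waitFor": ["compile"]},
    {"id": "upload", "waitFor": ["test_pipeline"]},
    {"id": "test"},
    {"id": "prod"},
]

if __name__ == "__main__":
    for step_id, waits in effective_dependencies(STEPS).items():
        print(f"{step_id}: waits for {waits or 'nothing'}")
```

Under this model, the `prod` step waits for `test`, so the asynchronous production run only starts after the synchronous QA run has finished.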


Cloud Build uses a special service account to execute builds on your behalf. When you enable the Cloud Build API on a Google Cloud project, the Cloud Build service account is automatically created and granted the Cloud Build Service Account role for the project. This role lets the service account perform several tasks, and you can grant it more permissions to perform additional tasks. [This page](https://cloud.google.com/build/docs/securing-builds/configure-access-for-cloud-build-service-account) explains how to grant and revoke permissions to the Cloud Build service account.

For Cloud Build to be able to deploy your pipeline, its service account `{PROJECT_NUMBER}@cloudbuild.gserviceaccount.com` needs the **Vertex AI User** and **Service Account User** roles. (This step was already done as part of the workshop setup.)

Now you are ready to trigger this pipeline. This CI/CD pipeline can be embedded in your repository and triggered with any Git provider when a file changes. See the [Cloud Build triggers documentation](https://cloud.google.com/build/docs/automating-builds/create-manage-triggers) for more about triggering.

When you submit your pipeline, you can also follow its status by clicking the link printed in the `Logs are available at [...]` line of the command output.
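
The `--substitutions` flag in the command that follows takes comma-separated `KEY=VALUE` pairs, and the keys match the `${_...}` placeholders in `cloudbuild.yaml`. If you prefer to assemble that string from a dictionary instead of typing it by hand, a sketch could look like the one below. `build_substitutions` is a hypothetical helper. The key check assumes user-defined substitution names start with an underscore and use only uppercase letters, digits, and underscores, and the bucket value is made up.

```python
import re
from typing import Mapping

# Assumed naming rule for user-defined substitutions.
_KEY_PATTERN = re.compile(r"^_[A-Z0-9_]+$")


def build_substitutions(values: Mapping[str, str]) -> str:
    """Build the value for `gcloud builds submit --substitutions=...`."""
    pairs = []
    for key, value in values.items():
        # Add the leading underscore if the caller left it out.
        name = key if key.startswith("_") else f"_{key}"
        if not _KEY_PATTERN.match(name):
            raise ValueError(f"Invalid substitution name: {name!r}")
        if "," in value:
            # A comma would be read as a separator between pairs.
            raise ValueError(f"Value for {name} must not contain a comma")
        pairs.append(f"{name}={value}")
    return ",".join(pairs)


if __name__ == "__main__":
    print(build_substitutions({
        "BUCKET_NAME": "my-project-mlops",  # hypothetical value
        "REGION": "us-central1",
    }))
```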

In [37]:
!gcloud builds submit ./src --config=src/cloudbuild.yaml --substitutions=_BUCKET_NAME=$BUCKET_NAME,_EXPERIMENT_NAME=$EXPERIMENT_NAME,_PIPELINE_NAME=$PIPELINE_NAME,_REGION=$REGION,_ENDPOINT_NAME=$ENDPOINT_NAME

Creating temporary tarball archive of 7 file(s) totalling 5.6 KiB before compression.
Uploading tarball of [./src] to [gs://vertex-ai-test-365213_cloudbuild/source/1674498988.849607-f2a74d912ba04da58ac0a1bccaea3928.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/vertex-ai-test-365213/locations/global/builds/367b1e8e-14d6-4228-8ad5-48e3dbb7683a].
Logs are available at [ https://console.cloud.google.com/cloud-build/builds/367b1e8e-14d6-4228-8ad5-48e3dbb7683a?project=370018035372 ].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "367b1e8e-14d6-4228-8ad5-48e3dbb7683a"

FETCHSOURCE
Fetching storage object: gs://vertex-ai-test-365213_cloudbuild/source/1674498988.849607-f2a74d912ba04da58ac0a1bccaea3928.tgz#1674498989124292
Copying gs://vertex-ai-test-365213_cloudbuild/source/1674498988.849607-f2a74d912ba04da58ac0a1bccaea3928.tgz#1674498989124292...
/ [1 files][  3.0 KiB/  3.0 KiB]                                                
Oper