# 04 - Test and Deploy Training Pipeline to Vertex Pipelines

This notebook tests, deploys, and runs the `TFX` pipeline on `Vertex Pipelines`. It covers the following tasks:
1. Run the tests locally.
2. Run the pipeline using `Vertex Pipelines`.
3. Execute the pipeline deployment `CI/CD` steps using `Cloud Build`.

## Setup

### Import libraries

In [55]:
%load_ext autoreload
%autoreload 2

import os
import kfp
import tfx.v1 as tfx

print("TFX Version:", tfx.__version__)
print("KFP Version:", kfp.__version__)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
TFX Version: 1.8.0
KFP Version: 1.8.12


### Setup Google Cloud project

In [56]:
PROJECT = 'pbalm-cxb-aa'
REGION = 'europe-west4'
CF_REGION = 'europe-west1' # No Cloud Functions in europe-west4
BUCKET =  PROJECT + '-eu'
SERVICE_ACCOUNT = "188940921537-compute@developer.gserviceaccount.com"

if PROJECT == "" or PROJECT is None or PROJECT == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT = shell_output[0]
    
if SERVICE_ACCOUNT == "" or SERVICE_ACCOUNT is None or SERVICE_ACCOUNT == "[your-service-account]":
    # Fall back to the active gcloud account
    shell_output = !gcloud config list --format 'value(core.account)' 2>/dev/null
    SERVICE_ACCOUNT = shell_output[0]
    
if BUCKET == "" or BUCKET is None or BUCKET == "[your-bucket-name]":
    # Default the bucket name to the GCP project id
    BUCKET = PROJECT
    # Try to create the bucket if it doesn't exist
    ! gsutil mb -l $REGION gs://$BUCKET
    print("")
    
print("Project ID:", PROJECT)
print("Region:", REGION)
print("Bucket name:", BUCKET)
print("Service Account:", SERVICE_ACCOUNT)

Project ID: pbalm-cxb-aa
Region: europe-west4
Bucket name: pbalm-cxb-aa-eu
Service Account: 188940921537-compute@developer.gserviceaccount.com
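The setup cell repeats the same "empty or placeholder" check for the project, service account, and bucket. A minimal sketch of factoring that check into a helper follows. The name `resolve_setting` is hypothetical, and the stand-in lambdas replace the `gcloud` lookups, which would stay in the notebook cell.

```python
from typing import Callable, Optional


def resolve_setting(value: Optional[str], placeholder: str,
                    fallback: Callable[[], str]) -> str:
    """Return `value` unless it is unset or still the template placeholder.

    `fallback` is only called when a default is needed, so an expensive
    lookup (for example a gcloud call) is skipped when the value is set.
    """
    if value is None or value == "" or value == placeholder:
        return fallback()
    return value


# Hypothetical usage with stand-in values instead of a real gcloud call.
project = resolve_setting("[your-project-id]", "[your-project-id]",
                          lambda: "my-project")
bucket = resolve_setting("my-bucket", "[your-bucket-name]", lambda: project)
```

Passing the fallback as a callable keeps the helper free of shell access, which makes it easy to exercise outside a notebook.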


### Set configurations

In [57]:
BQ_LOCATION = 'EU'
BQ_DATASET_NAME = 'vertex_eu' # Change to your BQ dataset name.
BQ_TABLE_NAME = 'creditcards_ml'

VERSION = 'v02'
DATASET_DISPLAY_NAME = 'creditcards'
MODEL_DISPLAY_NAME = f'{DATASET_DISPLAY_NAME}-classifier-{VERSION}'
PIPELINE_NAME = f'{MODEL_DISPLAY_NAME}-train-pipeline'

CICD_IMAGE_NAME = 'cicd:latest'
CICD_IMAGE_URI = f"{REGION}-docker.pkg.dev/{PROJECT}/creditcards/{CICD_IMAGE_NAME}"

DATAFLOW_REGION = 'europe-west4'
DATAFLOW_SERVICE_ACCOUNT = SERVICE_ACCOUNT
DATAFLOW_SUBNETWORK = f'https://www.googleapis.com/compute/v1/projects/{PROJECT}/regions/{REGION}/subnetworks/default'


In [58]:
!rm -r src/raw_schema/.ipynb_checkpoints/

rm: cannot remove 'src/raw_schema/.ipynb_checkpoints/': No such file or directory


## 1. Run the CI/CD steps locally

### Set pipeline configurations for the local run

In [59]:
os.environ["DATASET_DISPLAY_NAME"] = DATASET_DISPLAY_NAME
os.environ["MODEL_DISPLAY_NAME"] =  MODEL_DISPLAY_NAME
os.environ["PIPELINE_NAME"] = PIPELINE_NAME
os.environ["PROJECT"] = PROJECT
os.environ["REGION"] = REGION
os.environ["BQ_LOCATION"] = BQ_LOCATION
os.environ["BQ_DATASET_NAME"] = BQ_DATASET_NAME
os.environ["BQ_TABLE_NAME"] = BQ_TABLE_NAME
os.environ["GCS_LOCATION"] = f"gs://{BUCKET}/{DATASET_DISPLAY_NAME}/e2e_tests"
os.environ["TRAIN_LIMIT"] = "1000"
os.environ["TEST_LIMIT"] = "100"
os.environ["UPLOAD_MODEL"] = "1"
os.environ["ACCURACY_THRESHOLD"] = "-0.1"    # NB: a negative accuracy threshold is not a meaningful quality gate; it lets every model pass
os.environ["BEAM_RUNNER"] = "DirectRunner"
os.environ["TRAINING_RUNNER"] = "local"
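
The note on `ACCURACY_THRESHOLD` can be illustrated with a minimal sketch. It assumes the evaluation step accepts a model when its accuracy is at least the threshold; `is_blessed` is a hypothetical name, not a function from this repository or from TFX.

```python
def is_blessed(accuracy: float, threshold: float) -> bool:
    """Hypothetical gate: accept the model if accuracy reaches the threshold."""
    return accuracy >= threshold


# Accuracy is never below 0, so a negative threshold accepts every model.
worst_case = is_blessed(accuracy=0.0, threshold=-0.1)
# With a realistic threshold the same model is rejected.
realistic = is_blessed(accuracy=0.0, threshold=0.8)
```

Under that assumption the negative value is useful for a smoke test of the pipeline mechanics, but it should not be carried over to a run whose model is meant to be served.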

In [60]:
from src.tfx_pipelines import config
import importlib
importlib.reload(config)

for key, value in config.__dict__.items():
    if key.isupper():
        print(f'{key}: {value}')

PROJECT: pbalm-cxb-aa
REGION: europe-west4
GCS_LOCATION: gs://pbalm-cxb-aa-eu/creditcards/e2e_tests
ARTIFACT_STORE_URI: gs://pbalm-cxb-aa-eu/creditcards/e2e_tests/tfx_artifacts
MODEL_REGISTRY_URI: gs://pbalm-cxb-aa-eu/creditcards/e2e_tests/model_registry
DATASET_DISPLAY_NAME: creditcards
MODEL_DISPLAY_NAME: creditcards-classifier-v02
PIPELINE_NAME: creditcards-classifier-v02-train-pipeline
ML_USE_COLUMN: ml_use
EXCLUDE_COLUMNS: trip_start_timestamp
TRAIN_LIMIT: 1000
TEST_LIMIT: 100
SERVE_LIMIT: 0
NUM_TRAIN_SPLITS: 4
NUM_EVAL_SPLITS: 1
ACCURACY_THRESHOLD: -0.1
USE_KFP_SA: False
TFX_IMAGE_URI: europe-west4-docker.pkg.dev/pbalm-cxb-aa/creditcards/vertex:tfx-1.8
BEAM_RUNNER: DirectRunner
SERVICE_ACCOUNT: 188940921537-compute@developer.gserviceaccount.com
SUBNETWORK: https://www.googleapis.com/compute/v1/projects/pbalm-cxb-aa/regions/europe-west4/subnetworks/default
BEAM_DIRECT_PIPELINE_ARGS: ['--project=pbalm-cxb-aa', '--temp_location=gs://pbalm-cxb-aa-eu/creditcards/e2e_tests/temp']
BEAM_

### Run unit tests

In [61]:
!py.test src/tests/datasource_utils_tests.py -s

platform linux -- Python 3.7.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/jupyter/mlops-with-vertex-ai
plugins: anyio-3.6.1
collected 2 items

src/tests/datasource_utils_tests.py BigQuery Source: pbalm-cxb-aa.vertex_eu.creditcards_ml
.BigQuery Source: pbalm-cxb-aa.vertex_eu.creditcards_ml
.



In [62]:
!py.test src/tests/model_tests.py -s

platform linux -- Python 3.7.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/jupyter/mlops-with-vertex-ai
plugins: anyio-3.6.1
collected 3 items

src/tests/model_tests.py .s2022-07-04 09:49:18.080678: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-04 09:49:18.082534: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to      

### Run e2e pipeline test

In [64]:
!py.test src/tests/pipeline_deployment_tests.py::test_e2e_pipeline -s

platform linux -- Python 3.7.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /home/jupyter/mlops-with-vertex-ai
plugins: anyio-3.6.1
collected 1 item

src/tests/pipeline_deployment_tests.py upload_model: 1
Pipeline e2e test artifacts stored in: gs://pbalm-cxb-aa-eu/creditcards/e2e_tests
ML metadata store is ready.
Excluding no splits because exclude_splits is not set.
Excluding no splits because exclude_splits is not set.
Labels for model: {"dataset_name": "creditcards", "pipeline_name": "creditcards-classifier-v02-train-pipeline", "pipeline_root": "gs://pbalm-cxb-aa-eu/creditcards/e2e_tests/tfx_artifacts/credi"}
Pipeline components: ['HyperparamsGen', 'TrainDataGen', 'TestDataGen', 'StatisticsGen', 'SchemaImporter', 'ExampleValidator', 'DataTransformer', 'WarmstartModelResolver', 'ModelTrainer', 'BaselineModelResolver', 'ModelEvaluator', 'GcsModelPusher', 'VertexUploader']
Beam pipeline args: ['--project=pbalm-cxb-aa', '--

## 2. Run the training pipeline using Vertex Pipelines



In [17]:
IMG_VERSION = 'tfx-1.8'

### Set the pipeline configurations for the Vertex AI run

In [18]:
os.environ["DATASET_DISPLAY_NAME"] = DATASET_DISPLAY_NAME
os.environ["MODEL_DISPLAY_NAME"] = MODEL_DISPLAY_NAME
os.environ["PIPELINE_NAME"] = PIPELINE_NAME
os.environ["PROJECT"] = PROJECT
os.environ["REGION"] = DATAFLOW_REGION
os.environ["GCS_LOCATION"] = f"gs://{BUCKET}/{DATASET_DISPLAY_NAME}"
os.environ["TRAIN_LIMIT"] = "85000"
os.environ["TEST_LIMIT"] = "15000"
os.environ["BEAM_RUNNER"] = "DataflowRunner"
os.environ["TRAINING_RUNNER"] = "vertex"
os.environ["TFX_IMAGE_URI"] = f"{REGION}-docker.pkg.dev/{PROJECT}/{DATASET_DISPLAY_NAME}/vertex:{IMG_VERSION}"
os.environ["ENABLE_CACHE"] = "1"
os.environ["SUBNETWORK"] = DATAFLOW_SUBNETWORK
os.environ["SERVICE_ACCOUNT"] = DATAFLOW_SERVICE_ACCOUNT

In [19]:
from src.tfx_pipelines import config
import importlib
importlib.reload(config)

for key, value in config.__dict__.items():
    if key.isupper():
        print(f'{key}: {value}')

PROJECT: pbalm-cxb-aa
REGION: europe-west4
GCS_LOCATION: gs://pbalm-cxb-aa-eu/creditcards
ARTIFACT_STORE_URI: gs://pbalm-cxb-aa-eu/creditcards/tfx_artifacts
MODEL_REGISTRY_URI: gs://pbalm-cxb-aa-eu/creditcards/e2e_tests/model_registry
DATASET_DISPLAY_NAME: creditcards
MODEL_DISPLAY_NAME: creditcards-classifier-v02
PIPELINE_NAME: creditcards-classifier-v02-train-pipeline
ML_USE_COLUMN: ml_use
EXCLUDE_COLUMNS: trip_start_timestamp
TRAIN_LIMIT: 85000
TEST_LIMIT: 15000
SERVE_LIMIT: 0
NUM_TRAIN_SPLITS: 4
NUM_EVAL_SPLITS: 1
ACCURACY_THRESHOLD: -0.1
USE_KFP_SA: False
TFX_IMAGE_URI: europe-west4-docker.pkg.dev/pbalm-cxb-aa/creditcards/vertex:tfx-1.8
BEAM_RUNNER: DataflowRunner
SERVICE_ACCOUNT: 188940921537-compute@developer.gserviceaccount.com
SUBNETWORK: https://www.googleapis.com/compute/v1/projects/pbalm-cxb-aa/regions/europe-west4/subnetworks/default
BEAM_DIRECT_PIPELINE_ARGS: ['--project=pbalm-cxb-aa', '--temp_location=gs://pbalm-cxb-aa-eu/creditcards/temp']
BEAM_DATAFLOW_PIPELINE_ARGS: [
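
Switching `BEAM_RUNNER` to `DataflowRunner` means the Beam steps need Dataflow-specific flags in addition to the project and temp location used by the direct runner. The sketch below shows one plausible way to assemble such a list from the settings above. It is illustrative only: the actual `BEAM_DATAFLOW_PIPELINE_ARGS` value is truncated in the output and may contain different flags, and `dataflow_pipeline_args` is a hypothetical helper.

```python
from typing import List


def dataflow_pipeline_args(project: str, region: str, temp_location: str,
                           service_account: str, subnetwork: str) -> List[str]:
    """Assemble Beam flags for a Dataflow run (illustrative selection only)."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        f"--service_account_email={service_account}",
        f"--subnetwork={subnetwork}",
    ]


# Stand-in values; substitute the notebook's own configuration.
args = dataflow_pipeline_args(
    project="my-project",
    region="europe-west4",
    temp_location="gs://my-bucket/creditcards/temp",
    service_account="sa@my-project.iam.gserviceaccount.com",
    subnetwork=("https://www.googleapis.com/compute/v1/projects/my-project"
                "/regions/europe-west4/subnetworks/default"),
)
```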

### Build the ML container image

This is the `TFX` runtime environment for the training pipeline steps.

In [13]:
!echo $TFX_IMAGE_URI

europe-west4-docker.pkg.dev/pbalm-cxb-aa/creditcards/vertex:tfx-1.8
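
The image URI follows the Artifact Registry pattern `REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. A small sketch that makes the parts explicit, using a hypothetical helper name:

```python
def artifact_registry_uri(region: str, project: str, repository: str,
                          image: str, tag: str) -> str:
    """Compose a Docker image URI for Artifact Registry."""
    return f"{region}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


# Reproduces the URI echoed above from its individual parts.
uri = artifact_registry_uri(
    region="europe-west4",
    project="pbalm-cxb-aa",
    repository="creditcards",
    image="vertex",
    tag="tfx-1.8",
)
```

In this notebook the repository name is the dataset display name, so the training and CI/CD images end up in the same `creditcards` repository.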


In [28]:
!cp build/Dockerfile.vertex Dockerfile
!gcloud builds submit --tag $TFX_IMAGE_URI . --timeout=15m --machine-type=e2-highcpu-8

Creating temporary tarball archive of 73 file(s) totalling 2.0 MiB before compression.
Uploading tarball of [.] to [gs://pbalm-cxb-aa_cloudbuild/source/1656506421.629663-322d1986978746c396d705a6e3809069.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/pbalm-cxb-aa/locations/global/builds/1155e839-2aa0-4f21-924d-b6c9e9e0738a].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/1155e839-2aa0-4f21-924d-b6c9e9e0738a?project=188940921537].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "1155e839-2aa0-4f21-924d-b6c9e9e0738a"

FETCHSOURCE
Fetching storage object: gs://pbalm-cxb-aa_cloudbuild/source/1656506421.629663-322d1986978746c396d705a6e3809069.tgz#1656506422752768
Copying gs://pbalm-cxb-aa_cloudbuild/source/1656506421.629663-322d1986978746c396d705a6e3809069.tgz#1656506422752768...
/ [1 files][459.4 KiB/459.4 KiB]                                                
Operation completed over 1 objects/459.4 KiB.

### Compile pipeline

In [42]:
from src.tfx_pipelines import runner

pipeline_definition_file = f'{config.PIPELINE_NAME}.json'
pipeline_definition = runner.compile_training_pipeline(pipeline_definition_file)

Labels for model: {"dataset_name": "creditcards", "pipeline_name": "creditcards-classifier-v02-train-pipeline", "pipeline_root": "gs://pbalm-cxb-aa-eu/creditcards/tfx_artifacts/creditcards-cla"}
running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying etl.py -> build/lib
copying transformations.py -> build/lib
installing to /tmp/tmpbwxcsesb
running install
running install_lib
copying build/lib/transformations.py -> /tmp/tmpbwxcsesb
copying build/lib/etl.py -> /tmp/tmpbwxcsesb
running install_egg_info
running egg_info
creating tfx_user_code_DataTransformer.egg-info
writing tfx_user_code_DataTransformer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_DataTransformer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_DataTransformer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_DataTransformer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_DataTransformer.egg-info/SOURCES.txt'
writing manifest fi



running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying trainer.py -> build/lib
copying runner.py -> build/lib
copying model.py -> build/lib
copying defaults.py -> build/lib
copying exporter.py -> build/lib
copying data.py -> build/lib
copying task.py -> build/lib
installing to /tmp/tmp0no4trhs
running install
running install_lib
copying build/lib/trainer.py -> /tmp/tmp0no4trhs
copying build/lib/model.py -> /tmp/tmp0no4trhs
copying build/lib/runner.py -> /tmp/tmp0no4trhs
copying build/lib/task.py -> /tmp/tmp0no4trhs
copying build/lib/data.py -> /tmp/tmp0no4trhs
copying build/lib/defaults.py -> /tmp/tmp0no4trhs
copying build/lib/exporter.py -> /tmp/tmp0no4trhs
running install_egg_info
running egg_info
creating tfx_user_code_ModelTrainer.egg-info
writing tfx_user_code_ModelTrainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_ModelTrainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_ModelTrainer.egg-info/top_lev



In [43]:
from google.cloud import aiplatform as vertex_ai
vertex_ai.init(project=PROJECT, location=REGION)

In [44]:
PIPELINES_STORE = f"gs://{BUCKET}/{DATASET_DISPLAY_NAME}/compiled_pipelines/"
!gsutil cp {pipeline_definition_file} {PIPELINES_STORE}

Copying file://creditcards-classifier-v02-train-pipeline.json [Content-Type=application/json]...
/ [1 files][ 31.6 KiB/ 31.6 KiB]                                                
Operation completed over 1 objects/31.6 KiB.                                     
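
Before submitting a run it can be useful to check which tasks the compiled definition contains. The sketch below assumes the JSON file wraps a pipeline spec under a `pipelineSpec` key with tasks listed at `root.dag.tasks`; that layout is an assumption about the compiler output, and `list_pipeline_tasks` is a hypothetical helper. It uses an inline stand-in document instead of the real file.

```python
import json
from typing import List


def list_pipeline_tasks(definition: dict) -> List[str]:
    """Return the sorted task names found in a compiled pipeline definition."""
    # Accept either a wrapped job definition or a bare pipeline spec.
    spec = definition.get("pipelineSpec", definition)
    tasks = spec.get("root", {}).get("dag", {}).get("tasks", {})
    return sorted(tasks)


# Stand-in document; a real definition would be read from the JSON file
# named by `pipeline_definition_file`.
sample = json.loads(
    '{"pipelineSpec": {"root": {"dag": {"tasks": '
    '{"ModelTrainer": {}, "ModelEvaluator": {}}}}}}'
)
task_names = list_pipeline_tasks(sample)
```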


### Submit run to Vertex Pipelines

In [45]:
from google.cloud.aiplatform import pipeline_jobs

job = pipeline_jobs.PipelineJob(
    template_path=pipeline_definition_file,
    display_name=DATASET_DISPLAY_NAME,
    # enable_caching=False,
    parameter_values={
        'learning_rate': 0.003,
        'batch_size': 512,
        'hidden_units': '128,128',
        'num_epochs': 30,
    },
)

# sync=False submits the run without blocking the notebook cell.
job.run(sync=False, service_account=DATAFLOW_SERVICE_ACCOUNT)

Creating PipelineJob


PipelineJob created. Resource name: projects/188940921537/locations/europe-west4/pipelineJobs/creditcards-classifier-v02-train-pipeline-20220630091659


To use this PipelineJob in another session:


pipeline_job = aiplatform.PipelineJob.get('projects/188940921537/locations/europe-west4/pipelineJobs/creditcards-classifier-v02-train-pipeline-20220630091659')


View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/europe-west4/pipelines/runs/creditcards-classifier-v02-train-pipeline-20220630091659?project=188940921537


PipelineJob projects/188940921537/locations/europe-west4/pipelineJobs/creditcards-classifier-v02-train-pipeline-20220630091659 current state:
PipelineState.PIPELINE_STATE_RUNNING
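
The resource name printed in the log encodes the project number, the location, and the job ID. A minimal sketch for splitting it into those parts, for example to build a console link or to look the run up later; `parse_pipeline_job_name` is a hypothetical helper, not part of the Vertex AI SDK.

```python
import re
from typing import Dict

_RESOURCE_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/pipelineJobs/(?P<job_id>[^/]+)$"
)


def parse_pipeline_job_name(resource_name: str) -> Dict[str, str]:
    """Split a PipelineJob resource name into project, location, and job ID."""
    match = _RESOURCE_PATTERN.match(resource_name)
    if match is None:
        raise ValueError(f"Not a PipelineJob resource name: {resource_name!r}")
    return match.groupdict()


parts = parse_pipeline_job_name(
    "projects/188940921537/locations/europe-west4/pipelineJobs/"
    "creditcards-classifier-v02-train-pipeline-20220630091659"
)
```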


### Extracting pipeline runs metadata

In [47]:
from google.cloud import aiplatform as vertex_ai

pipeline_df = vertex_ai.get_pipeline_df(PIPELINE_NAME)
pipeline_df = pipeline_df[pipeline_df.pipeline_name == PIPELINE_NAME]
pipeline_df

| | pipeline_name | run_name | param.input:hidden_units | param.input:batch_size | param.input:num_epochs | param.input:learning_rate |
|---|---|---|---|---|---|---|
| 0 | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline-2022... | 128128.0 | 512.0 | 30.0 | 0.003 |
| 1 | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline-2022... | 128128.0 | 512.0 | 30.0 | 0.003 |
| 2 | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline-2022... | 128128.0 | 512.0 | 30.0 | 0.003 |
| 3 | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline-2022... | 128128.0 | 512.0 | 30.0 | 0.003 |
| 4 | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline-2022... | 256126.0 | 512.0 | 7.0 | 0.0015 |
| 5 | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline-2022... | 128128.0 | 512.0 | 30.0 | 0.003 |
| 6 | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline-2022... | 128128.0 | 512.0 | 30.0 | 0.003 |
| 7 | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline-2022... | 128128.0 | 512.0 | 30.0 | 0.003 |
| 8 | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline-2022... | 128128.0 | 512.0 | 30.0 | 0.003 |
| 9 | creditcards-classifier-v02-train-pipeline | creditcards-classifier-v02-train-pipeline-2022... | 256126.0 | 512.0 | 7.0 | 0.0015 |
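
The `hidden_units` parameter is submitted as a comma-separated string (`'128,128'` in the run above), which is possibly why it shows up as a single number in this table. Any code that consumes the parameter has to turn it into a list of layer sizes. A minimal sketch, with `parse_hidden_units` as a hypothetical helper rather than the repository's trainer code:

```python
from typing import List


def parse_hidden_units(value: str) -> List[int]:
    """Convert a comma-separated string such as '128,128' into layer sizes."""
    # Empty fragments (for example from a trailing comma) are skipped.
    return [int(unit) for unit in value.split(",") if unit.strip()]


layers = parse_hidden_units("128,128")
```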


## 3. Execute the pipeline deployment CI/CD steps in Cloud Build

The CI/CD routine is defined in the [pipeline-deployment.yaml](build/pipeline-deployment.yaml) file and consists of the following steps:
1. Clone the repository to the build environment.
2. Run unit tests.
3. Run a local e2e test of the pipeline.
4. Build the ML container image for pipeline steps.
5. Compile the pipeline.
6. Upload the pipeline to Cloud Storage.
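
The ordering matters because a failed test should prevent the image build and the upload. The sketch below models that fail-fast behavior with plain Python callables. It is a simplification under stated assumptions: `run_steps` is hypothetical, the step bodies are stand-ins, and it runs everything sequentially, whereas Cloud Build may start independent steps in parallel.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_steps(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed


# Stand-in steps; the e2e test is made to fail to show the short-circuit.
steps = [
    ("clone repository", lambda: True),
    ("unit tests", lambda: True),
    ("e2e test", lambda: False),
    ("build image", lambda: True),
    ("compile pipeline", lambda: True),
    ("upload pipeline", lambda: True),
]
completed = run_steps(steps)
```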

### Build CI/CD container image for Cloud Build

This image is the runtime environment in which Cloud Build runs the test and deployment steps.

In [48]:
!echo $CICD_IMAGE_URI

europe-west4-docker.pkg.dev/pbalm-cxb-aa/creditcards/cicd:latest


In [9]:
!cp build/Dockerfile.cicd build/Dockerfile
!gcloud builds submit --tag $CICD_IMAGE_URI build/. --timeout=15m --machine-type=e2-highcpu-8

Creating temporary tarball archive of 16 file(s) totalling 27.6 KiB before compression.
Uploading tarball of [build/.] to [gs://pbalm-cxb-aa_cloudbuild/source/1656498940.433089-c6bc070eb85341b9a995fbaa7f18a58b.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/pbalm-cxb-aa/locations/global/builds/fa254c17-768b-459e-b679-98c9ae4a9b3e].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/fa254c17-768b-459e-b679-98c9ae4a9b3e?project=188940921537].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "fa254c17-768b-459e-b679-98c9ae4a9b3e"

FETCHSOURCE
Fetching storage object: gs://pbalm-cxb-aa_cloudbuild/source/1656498940.433089-c6bc070eb85341b9a995fbaa7f18a58b.tgz#1656498940963890
Copying gs://pbalm-cxb-aa_cloudbuild/source/1656498940.433089-c6bc070eb85341b9a995fbaa7f18a58b.tgz#1656498940963890...
/ [1 files][  4.2 KiB/  4.2 KiB]                                                
Operation completed over 1 objects/4.2

### Run CI/CD for pipeline deployment using Cloud Build

In [51]:
REPO_URL = "https://github.com/pbalm/mlops-with-vertex-ai.git"

BRANCH = "main"

GCS_LOCATION = f"gs://{BUCKET}/{DATASET_DISPLAY_NAME}/"
TEST_GCS_LOCATION = f"gs://{BUCKET}/{DATASET_DISPLAY_NAME}/e2e_tests"
CI_TRAIN_LIMIT = 1000
CI_TEST_LIMIT = 100
CI_UPLOAD_MODEL = 0
CI_ACCURACY_THRESHOLD = -0.1  # Negative threshold, as in the local run: every model passes
BEAM_RUNNER = "DataflowRunner"
TRAINING_RUNNER = "vertex"
VERSION = 'latest'
PIPELINE_NAME = f'{MODEL_DISPLAY_NAME}-train-pipeline'
PIPELINES_STORE = os.path.join(GCS_LOCATION, "compiled_pipelines")

TFX_IMAGE_URI = f"{REGION}-docker.pkg.dev/{PROJECT}/{DATASET_DISPLAY_NAME}/vertex:{VERSION}"

SUBSTITUTIONS=f"""\
_REPO_URL='{REPO_URL}',\
_BRANCH={BRANCH},\
_CICD_IMAGE_URI={CICD_IMAGE_URI},\
_PROJECT={PROJECT},\
_REGION={DATAFLOW_REGION},\
_GCS_LOCATION={GCS_LOCATION},\
_TEST_GCS_LOCATION={TEST_GCS_LOCATION},\
_BQ_LOCATION={BQ_LOCATION},\
_BQ_DATASET_NAME={BQ_DATASET_NAME},\
_BQ_TABLE_NAME={BQ_TABLE_NAME},\
_DATASET_DISPLAY_NAME={DATASET_DISPLAY_NAME},\
_MODEL_DISPLAY_NAME={MODEL_DISPLAY_NAME},\
_CI_TRAIN_LIMIT={CI_TRAIN_LIMIT},\
_CI_TEST_LIMIT={CI_TEST_LIMIT},\
_CI_UPLOAD_MODEL={CI_UPLOAD_MODEL},\
_CI_ACCURACY_THRESHOLD={CI_ACCURACY_THRESHOLD},\
_BEAM_RUNNER={BEAM_RUNNER},\
_TRAINING_RUNNER={TRAINING_RUNNER},\
_TFX_IMAGE_URI={TFX_IMAGE_URI},\
_PIPELINE_NAME={PIPELINE_NAME},\
_PIPELINES_STORE={PIPELINES_STORE},\
_SUBNETWORK={DATAFLOW_SUBNETWORK},\
_GCS_BUCKET={BUCKET}/cloudbuild,\
_SERVICE_ACCOUNT={DATAFLOW_SERVICE_ACCOUNT}\
"""
!echo $SUBSTITUTIONS

_REPO_URL=https://github.com/pbalm/mlops-with-vertex-ai.git,_BRANCH=main,_CICD_IMAGE_URI=europe-west4-docker.pkg.dev/pbalm-cxb-aa/creditcards/cicd:latest,_PROJECT=pbalm-cxb-aa,_REGION=europe-west4,_GCS_LOCATION=gs://pbalm-cxb-aa-eu/creditcards/,_TEST_GCS_LOCATION=gs://pbalm-cxb-aa-eu/creditcards/e2e_tests,_BQ_LOCATION=EU,_BQ_DATASET_NAME=vertex_eu,_BQ_TABLE_NAME=creditcards_ml,_DATASET_DISPLAY_NAME=creditcards,_MODEL_DISPLAY_NAME=creditcards-classifier-v02,_CI_TRAIN_LIMIT=1000,_CI_TEST_LIMIT=100,_CI_UPLOAD_MODEL=0,_CI_ACCURACY_THRESHOLD=-0.1,_BEAM_RUNNER=DataflowRunner,_TRAINING_RUNNER=vertex,_TFX_IMAGE_URI=europe-west4-docker.pkg.dev/pbalm-cxb-aa/creditcards/vertex:latest,_PIPELINE_NAME=creditcards-classifier-v02-train-pipeline,_PIPELINES_STORE=gs://pbalm-cxb-aa-eu/creditcards/compiled_pipelines,_SUBNETWORK=https://www.googleapis.com/compute/v1/projects/pbalm-cxb-aa/regions/europe-west4/subnetworks/default,_GCS_BUCKET=pbalm-cxb-aa-eu/cloudbuild,_SERVICE_ACCOUNT=188940921537-compute@de
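
The substitutions string is a comma-separated list of `KEY=VALUE` pairs, and user-defined Cloud Build substitution keys start with an underscore. Building the string from a dictionary avoids the line-continuation backslashes and makes a missing comma impossible. A minimal sketch with a hypothetical helper name and a shortened set of keys:

```python
from typing import Mapping


def format_substitutions(values: Mapping[str, object]) -> str:
    """Join key/value pairs into the KEY=VALUE,KEY=VALUE form."""
    for key, value in values.items():
        if not key.startswith("_"):
            raise ValueError(f"User-defined key must start with '_': {key}")
        # A comma inside a value would be read as a pair separator.
        if "," in str(value):
            raise ValueError(f"Value for {key} must not contain a comma")
    return ",".join(f"{key}={value}" for key, value in values.items())


substitutions = format_substitutions({
    "_BRANCH": "main",
    "_CI_TRAIN_LIMIT": 1000,
    "_CI_UPLOAD_MODEL": 0,
})
```

The check on commas is deliberately strict: it rejects values that the default separator cannot represent instead of trying to escape them.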

In [54]:
!gcloud builds submit --no-source --timeout=60m --config build/pipeline-deployment.yaml --substitutions {SUBSTITUTIONS} --machine-type=e2-highcpu-8

Created [https://cloudbuild.googleapis.com/v1/projects/pbalm-cxb-aa/locations/global/builds/22e94189-b4c4-489e-b1c5-d4b7d7f8ff69].
Logs are available at [https://console.cloud.google.com/cloud-build/builds/22e94189-b4c4-489e-b1c5-d4b7d7f8ff69?project=188940921537].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "22e94189-b4c4-489e-b1c5-d4b7d7f8ff69"

FETCHSOURCE
BUILD
Starting Step #0 - "Clone Repository"
Step #0 - "Clone Repository": Already have image (with digest): gcr.io/cloud-builders/git
Step #0 - "Clone Repository": Cloning into 'mlops-with-vertex-ai'...
Step #0 - "Clone Repository": POST git-upload-pack (352 bytes)
Step #0 - "Clone Repository": POST git-upload-pack (194 bytes)
Finished Step #0 - "Clone Repository"
Starting Step #4 - "Copy Dockerfile"
Starting Step #1 - "Unit Test Datasource Utils"
Starting Step #2 - "Unit Test Model"
Step #4 - "Copy Dockerfile": Pulling image: ubuntu
Step #2 - "Unit Test Model": Pulling image: eu