# CI/CD for TFX pipelines

## Objectives

1.  Develop a CI/CD workflow with Cloud Build to build and deploy TFX pipeline code.
2.  Integrate with GitHub to automatically trigger pipeline deployment when the source code repository changes.

In this lab, you will walk through authoring a Cloud Build CI/CD workflow that automatically builds and deploys the same TFX pipeline from `lab-02.ipynb`. You will also integrate your workflow with GitHub by setting up a trigger that starts the workflow when a new tag is applied to the GitHub repo hosting the pipeline's code.



## Setup

In [None]:
import yaml

# Set `PATH` to include the directory containing TFX CLI.
PATH=%env PATH
%env PATH=/home/jupyter/.local/bin:{PATH}

In [None]:
!python -c "import tfx; print('TFX version: {}'.format(tfx.__version__))"

**Note**: this lab was built and tested with the following package versions:

`TFX version: 0.25.0`

In [None]:
%pip install --upgrade --user tfx==0.25.0

**Note**: you may need to restart the kernel to pick up the correct package versions.
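
After restarting the kernel, you can confirm that the expected version is the one actually installed. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_version` is hypothetical and not part of the lab code.

```python
from importlib import metadata

# Version this lab was built and tested with (from the note above).
EXPECTED_TFX_VERSION = "0.25.0"


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("tfx")
if found is None:
    print("tfx is not installed in this environment")
elif found != EXPECTED_TFX_VERSION:
    print("Found tfx {}, but the lab expects {}".format(found, EXPECTED_TFX_VERSION))
else:
    print("tfx {} matches the tested version".format(found))
```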

## Understanding the Cloud Build workflow
Review the `cloudbuild.yaml` file to understand how the CI/CD workflow is implemented and how environment-specific settings are abstracted using **Cloud Build** variables.

The **Cloud Build** CI/CD workflow automates the steps you previously walked through manually:
1. Building the custom TFX image to be used as a runtime execution environment for TFX components and as the AI Platform Training training container.
1. Compiling the pipeline and uploading it to the KFP environment.
1. Pushing the custom TFX image to your project's **Container Registry**.

The **Cloud Build** workflow configuration uses both standard and custom [Cloud Build builders](https://cloud.google.com/cloud-build/docs/cloud-builders). The custom builder encapsulates **TFX CLI**. 
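
To build intuition for how **Cloud Build** variables abstract environment-specific settings, the sketch below imitates placeholder substitution locally with Python's `string.Template`. It is only an illustration: Cloud Build resolves substitutions itself when a build runs, and the step arguments and values shown here are assumptions, not the contents of the lab's `cloudbuild.yaml`.

```python
from string import Template

# Hypothetical build-step arguments; the real ones live in cloudbuild.yaml.
step_args = [
    "build",
    "-t",
    "gcr.io/$PROJECT_ID/$_TFX_IMAGE_NAME:$TAG_NAME",
    ".",
]

# Example values only; in a real run Cloud Build supplies these.
substitutions = {
    "PROJECT_ID": "my-project",
    "_TFX_IMAGE_NAME": "lab-03-tfx-image",
    "TAG_NAME": "test",
}


def resolve(args, values):
    """Replace $NAME and ${NAME} placeholders in each argument."""
    return [Template(arg).substitute(values) for arg in args]


resolved_args = resolve(step_args, substitutions)
print(resolved_args)
```

Because the configuration refers only to variable names, the same file can be reused across projects and environments by changing the values passed in.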


## Configuring environment settings

Navigate to the [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines/clusters) page in the Google Cloud Console.

### Create or select an existing Kubernetes cluster (GKE) and deploy AI Platform Pipelines

Make sure to select `"Allow access to the following Cloud APIs https://www.googleapis.com/auth/cloud-platform"` so that the Kubeflow SDK can access your pipeline programmatically for the rest of the lab. Also, provide an `App instance name` such as "tfx" or "mlops". Note that you may have already deployed an AI Platform Pipelines instance during the setup for the lab series. If so, you can use that instance in the next step.

Validate the deployment of your AI Platform Pipelines instance in the console before proceeding.

### Configure environment settings

Update the constants below with the settings that reflect your lab environment.

- `GCP_REGION` - the compute region for AI Platform Training and Prediction
- `ARTIFACT_STORE_URI` - the `gs://` URI of the GCS bucket created during installation of AI Platform Pipelines. The bucket name contains `kubeflowpipelines`.

In [None]:
# Use the following command to identify the GCS bucket for metadata and pipeline storage.
!gsutil ls

- `CUSTOM_SERVICE_ACCOUNT` - in the Google Cloud Console, click the Navigation Menu, navigate to `IAM & Admin`, then to `Service Accounts`, and use the service account whose name starts with `tfx-tuner-caip-service-account`. This service account lets CloudTuner and the Google Cloud AI Platform extensions Tuner component work together, which allows for distributed and parallel tuning backed by AI Platform Vizier's hyperparameter search algorithm. See the lab setup `README` for setup instructions.

- `ENDPOINT` - the endpoint of your AI Platform Pipelines instance. You can find it on the [AI Platform Pipelines](https://console.cloud.google.com/ai-platform/pipelines/clusters) page in the Google Cloud Console. Open the *SETTINGS* for your instance and use the value of the `host` variable in the *Connect to this Kubeflow Pipelines instance from a Python client via Kubeflow Pipelines SDK* section of the *SETTINGS* window. The format is `'....[region].pipelines.googleusercontent.com'`.

In [None]:
GCP_REGION = 'us-central1'
ARTIFACT_STORE_URI = 'gs://dougkelly-sandbox-kubeflowpipelines-default' #Change
CUSTOM_SERVICE_ACCOUNT = 'tfx-tuner-caip-service-account@dougkelly-sandbox.iam.gserviceaccount.com' #Change
ENDPOINT = '60ff837483ecde05-dot-us-central2.pipelines.googleusercontent.com' #Change

PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID = PROJECT_ID[0]
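
Typos in these constants tend to surface only after a build has been running for several minutes, so a quick format check can save time. The following is a minimal sketch; `validate_settings` is a hypothetical helper, and the checks only reflect the formats described above (a `gs://` URI, a service account email ending in `.iam.gserviceaccount.com`, and an endpoint ending in `.pipelines.googleusercontent.com`). It does not verify that the resources exist.

```python
import re


def validate_settings(artifact_store_uri, custom_service_account, endpoint):
    """Return a list of human-readable problems; an empty list means the formats look OK."""
    problems = []
    if not artifact_store_uri.startswith("gs://"):
        problems.append("ARTIFACT_STORE_URI should start with 'gs://'")
    if not re.fullmatch(
        r"[^@\s]+@[^@\s]+\.iam\.gserviceaccount\.com", custom_service_account
    ):
        problems.append(
            "CUSTOM_SERVICE_ACCOUNT should look like "
            "'name@project.iam.gserviceaccount.com'"
        )
    if not endpoint.endswith(".pipelines.googleusercontent.com"):
        problems.append(
            "ENDPOINT should end with '.pipelines.googleusercontent.com'"
        )
    return problems


# Placeholder values for illustration; in the notebook, pass your own constants.
issues = validate_settings(
    "gs://my-project-kubeflowpipelines-default",
    "tfx-tuner-caip-service-account@my-project.iam.gserviceaccount.com",
    "0123456789abcdef-dot-us-central1.pipelines.googleusercontent.com",
)
print(issues or "Settings look well-formed")
```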

## Creating the TFX CLI builder

### Review the Dockerfile for the TFX CLI builder

In [None]:
!cat tfx-cli/Dockerfile

In [None]:
!cat tfx-cli/requirements.txt

### Build the image and push it to your project's Container Registry

**Hint**: Review the `gcloud builds submit` command-line reference in the [Cloud Build](https://cloud.google.com/cloud-build/docs/running-builds/start-build-manually#gcloud) documentation. Your image URI should follow the format `gcr.io/[PROJECT_ID]/[IMAGE_NAME]:latest`. Note that the source code for the TFX CLI builder is in the `./tfx-cli` directory.

In [None]:
IMAGE_NAME='tfx-cli'
TAG='latest'
IMAGE_URI='gcr.io/{}/{}:{}'.format(PROJECT_ID, IMAGE_NAME, TAG)

In [None]:
!gcloud builds submit --timeout=15m --tag {IMAGE_URI} tfx-cli
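
The `!` prefix only works inside a notebook. If you want to run the same build from a plain Python script, one option is to assemble the command as an argument list and hand it to `subprocess`. This is a sketch under that assumption; `build_submit_command` is a hypothetical helper, and the image URI is a placeholder.

```python
import shlex


def build_submit_command(image_uri, source_dir, timeout="15m"):
    """Assemble the same `gcloud builds submit` call used in the cell above."""
    return [
        "gcloud", "builds", "submit",
        "--timeout={}".format(timeout),
        "--tag", image_uri,
        source_dir,
    ]


# Placeholder image URI; substitute your own project ID.
command = build_submit_command("gcr.io/my-project/tfx-cli:latest", "tfx-cli")

# Print the shell-quoted command for inspection before running it.
print(" ".join(shlex.quote(part) for part in command))

# To actually start the build (requires an authenticated gcloud installation):
# import subprocess
# subprocess.run(command, check=True)
```

Passing a list rather than a single string avoids shell quoting problems if a path or tag ever contains spaces.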

## Manually trigger CI/CD pipeline run with Cloud Build

You can manually trigger **Cloud Build** runs using the `gcloud builds submit` command.

In [None]:
PIPELINE_NAME='tfx_covertype_continuous_training'
MODEL_NAME='tfx_covertype_classifier'
DATA_ROOT_URI='gs://cloud-training/OCBL203/workshop-datasets'
TAG_NAME='test'
TFX_IMAGE_NAME='lab-03-tfx-image'
PIPELINE_FOLDER='pipeline'
PIPELINE_DSL='runner.py'
RUNTIME_VERSION='2.3'
PYTHON_VERSION='3.7'
USE_KFP_SA='False'
ENABLE_TUNING='True'

SUBSTITUTIONS="""
_GCP_REGION={},\
_ARTIFACT_STORE_URI={},\
_CUSTOM_SERVICE_ACCOUNT={},\
_ENDPOINT={},\
_PIPELINE_NAME={},\
_MODEL_NAME={},\
_DATA_ROOT_URI={},\
_TFX_IMAGE_NAME={},\
TAG_NAME={},\
_PIPELINE_FOLDER={},\
_PIPELINE_DSL={},\
_RUNTIME_VERSION={},\
_PYTHON_VERSION={},\
_USE_KFP_SA={},\
_ENABLE_TUNING={}
""".format(GCP_REGION, 
           ARTIFACT_STORE_URI,
           CUSTOM_SERVICE_ACCOUNT,
           ENDPOINT,
           PIPELINE_NAME,
           MODEL_NAME,
           DATA_ROOT_URI,
           TFX_IMAGE_NAME,
           TAG_NAME, 
           PIPELINE_FOLDER,
           PIPELINE_DSL,
           RUNTIME_VERSION,
           PYTHON_VERSION,
           USE_KFP_SA,
           ENABLE_TUNING
           ).strip()

Hint: see the [documentation](https://cloud.google.com/sdk/gcloud/reference/builds/submit) for how to pass the `cloudbuild.yaml` file and `SUBSTITUTIONS` as arguments.

In [None]:
!gcloud builds submit . --timeout=15m --config cloudbuild.yaml --substitutions {SUBSTITUTIONS}
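
The positional `format` call used to build `SUBSTITUTIONS` works only as long as the argument order matches the placeholder order. As an alternative sketch, the same comma-separated `KEY=VALUE` string can be generated from a dictionary, which keeps each name next to its value. `format_substitutions` is a hypothetical helper, and only a few of the lab's variables are shown.

```python
def format_substitutions(values):
    """Join KEY=VALUE pairs with commas for the --substitutions flag."""
    for key, value in values.items():
        # A comma inside a value would be read as a pair separator.
        if "," in str(value):
            raise ValueError("Value for {} contains a comma: {!r}".format(key, value))
    return ",".join("{}={}".format(key, value) for key, value in values.items())


# A subset of the lab's substitution variables, with the values used above.
example = format_substitutions({
    "_GCP_REGION": "us-central1",
    "_PIPELINE_NAME": "tfx_covertype_continuous_training",
    "TAG_NAME": "test",
    "_ENABLE_TUNING": "True",
})
print(example)
```

The resulting string can be passed to `--substitutions` in the same way as `SUBSTITUTIONS`.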

## Setting up GitHub integration