# Continuous training with TFX and Cloud AI Platform

This lab demonstrates how to develop a TFX pipeline that runs on Managed Pipelines and uses **AI Platform** and **Cloud Dataflow** as executors to run the TFX components at scale. You will also learn how to structure your pipeline code and how to use the **TFX CLI** to compile and deploy the pipeline.

## Set up the environment

### Verify TFX SDK Version

**Note**: this lab was developed and tested with the following TF ecosystem package versions:

`Tensorflow Version: 2.3.0`  
`TFX Version: 0.23.0.caip20200818`  
`TFDV Version: 0.23.0`  
`TFMA Version: 0.23.0`



In [1]:
%load_ext autoreload
%autoreload 2

In [12]:
import os
import tensorflow as tf
import tensorflow_data_validation as tfdv
import tensorflow_model_analysis as tfma
import tfx

from tfx.tools.cli.ai_platform_pipelines import labels

print("Tensorflow Version:", tf.__version__)
print("TFX Version:", tfx.__version__)
print("TFDV Version:", tfdv.__version__)
print("TFMA Version:", tfma.VERSION_STRING)

Tensorflow Version: 2.3.0
TFX Version: 0.23.0.caip20200818
TFDV Version: 0.23.0
TFMA Version: 0.23.0


In [13]:
dir(labels)

['CAIPP_API_KEY_ENV',
 'CAIPP_ENGINE',
 'CAIPP_GCP_PROJECT_ID_ENV',
 'CAIPP_RUN_FLAG_ENV',
 'CAIPP_TFX_IMAGE_ENV',
 'JOB_NAME',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

If the versions above do not match, update your packages in the current Jupyter kernel. 
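
The comparison against the tested versions can also be done programmatically. The following is a minimal sketch, not part of the original lab: `find_mismatches` is a hypothetical helper, and the "installed" values in the example are hand-written so that the snippet runs without the TF packages.

```python
# Expected versions are copied from the note at the top of this lab.
EXPECTED_VERSIONS = {
    "tensorflow": "2.3.0",
    "tfx": "0.23.0.caip20200818",
    "tensorflow_data_validation": "0.23.0",
    "tensorflow_model_analysis": "0.23.0",
}


def find_mismatches(installed, expected=EXPECTED_VERSIONS):
    """Return {package: (installed, expected)} for every package that differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }


# Hand-written example values; in the notebook you would pass
# tf.__version__, tfx.__version__, tfdv.__version__ and tfma.VERSION_STRING.
installed_example = {
    "tensorflow": "2.3.0",
    "tfx": "0.23.0.caip20200818",
    "tensorflow_data_validation": "0.23.0",
    "tensorflow_model_analysis": "0.22.1",  # deliberately wrong for the demo
}
mismatches = find_mismatches(installed_example)
print(mismatches)
```

An empty dictionary means every package matches the tested configuration.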

### Update `PATH` with the location of the TFX SDK

In [3]:
os.environ['PATH'] += os.pathsep + '/home/jupyter/.local/bin'
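
Re-running the cell above appends the same directory to `PATH` each time. A possible variant that only appends the directory when it is missing is sketched below; `append_to_path` is a hypothetical helper introduced for illustration.

```python
import os


def append_to_path(path_value, new_dir):
    """Return path_value with new_dir appended, unless it is already listed.

    Empty entries in path_value are dropped.
    """
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    if new_dir not in entries:
        entries.append(new_dir)
    return os.pathsep.join(entries)


# The directory below is the one used in this lab.
os.environ["PATH"] = append_to_path(
    os.environ.get("PATH", ""), "/home/jupyter/.local/bin"
)
```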

## Understanding the pipeline design
The pipeline source code can be found in the `pipeline` and `modules` folders.

In [4]:
!ls -la pipeline

total 28
drwxr-xr-x 4 jupyter jupyter 4096 Aug 31 00:25 .
drwxr-xr-x 6 jupyter jupyter 4096 Aug 31 01:36 ..
-rw-r--r-- 1 jupyter jupyter 1106 Aug 31 01:32 configs.py
-rw-r--r-- 1 jupyter jupyter    0 Aug 30 01:57 __init__.py
drwxr-xr-x 2 jupyter jupyter 4096 Aug 29 22:00 .ipynb_checkpoints
-rw-r--r-- 1 jupyter jupyter 7981 Aug 30 02:32 pipeline.py
drwxr-xr-x 2 jupyter jupyter 4096 Aug 31 01:33 __pycache__


The `pipeline` folder contains the pipeline DSL and configurations.

The `configs.py` module configures the default values for the pipeline's settings.
The default values can be overridden at compile time by using environment variables.

The `pipeline.py` module contains the TFX DSL defining the workflow implemented by the pipeline.

The `beam_runner.py` configures the Beam runner used for local testing.

The `ml_runner.py` configures the Managed Pipelines runner.
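
The override mechanism for the default settings follows the usual "environment variable or default" pattern. The sketch below illustrates it with a plain dictionary in place of the real environment; `load_settings` and the name `my_experiment` are hypothetical, while the default values are the ones used later in this lab.

```python
def load_settings(environ):
    """Use the variable if it is set, otherwise fall back to the default."""
    return {
        "PIPELINE_NAME": environ.get(
            "PIPELINE_NAME", "tfx_covertype_continuous_training"),
        "DATA_ROOT": environ.get(
            "DATA_ROOT", "gs://workshop-datasets/covertype/small"),
    }


# No variables set: the defaults apply.
defaults = load_settings({})
# One variable set: only that setting changes.
overridden = load_settings({"PIPELINE_NAME": "my_experiment"})

print(defaults["PIPELINE_NAME"])
print(overridden["PIPELINE_NAME"])
print(overridden["DATA_ROOT"])
```

In the real module the dictionary would be `os.environ`, read through `os.getenv`.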


In [5]:
!ls -la modules

total 24
drwxr-xr-x 2 jupyter jupyter 4096 Aug 30 01:57 .
drwxr-xr-x 6 jupyter jupyter 4096 Aug 31 01:36 ..
-rw-r--r-- 1 jupyter jupyter 1222 Aug 29 21:59 features.py
-rw-r--r-- 1 jupyter jupyter    0 Aug 30 01:57 __init__.py
-rw-r--r-- 1 jupyter jupyter 7302 Aug 29 21:59 model.py
-rw-r--r-- 1 jupyter jupyter 2032 Aug 29 21:59 preprocessing.py


The `modules` folder contains user code for `Transform` and `Trainer` components.


The `preprocessing.py` module implements the data preprocessing logic for the `Transform` component.

The `model.py` module implements the training logic for the `Trainer` component.

The `features.py` module contains common definitions for the `model.py` and `preprocessing.py` modules.
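
To illustrate why a shared definitions module is useful, here is a hypothetical stand-in for such a module. The feature names, the `_xf` suffix, and the `transformed_name` helper are assumptions made for this sketch; the actual `features.py` in this lab may define different names and values.

```python
# Hypothetical shared definitions; not the contents of the lab's features.py.
NUMERIC_FEATURE_KEYS = ["Elevation", "Slope"]
CATEGORICAL_FEATURE_KEYS = ["Wilderness_Area", "Soil_Type"]
LABEL_KEY = "Cover_Type"


def transformed_name(key):
    """Name given to a feature after preprocessing (illustrative convention)."""
    return key + "_xf"


# The preprocessing code and the model code can both derive the same names,
# so the two modules cannot drift apart.
model_inputs = [
    transformed_name(key)
    for key in NUMERIC_FEATURE_KEYS + CATEGORICAL_FEATURE_KEYS
]
print(model_inputs)
```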


## Building and deploying the pipeline

You will use the TFX CLI to compile and deploy the pipeline. As noted in the previous section, the environment-specific settings can be updated by modifying the `configs.py` file or by setting the corresponding environment variables.

### Set the environment variables

In [6]:
PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID = PROJECT_ID[0]
API_KEY = 'AIzaSyC3Mxax2j15dD8vWxAhe6riGAqAasOEi-U'

PIPELINE_NAME = 'tfx_covertype_continuous_training'
ARTIFACT_STORE = 'gs://mlops-dev-env-artifact-store'
DATA_ROOT = 'gs://workshop-datasets/covertype/small'

TARGET_IMAGE = f'gcr.io/{PROJECT_ID}/caip-tfx-custom'
BASE_IMAGE = 'gcr.io/caip-pipelines-assets/tfx:latest'

#MODEL_NAME = 'tfx_covertype_classifier'

#CUSTOM_TFX_IMAGE = 'gcr.io/{}/{}'.format(PROJECT_ID, PIPELINE_NAME)
#RUNTIME_VERSION = '2.1'
#PYTHON_VERSION = '3.7'
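
The cell above stores the API key directly in the notebook. One possible alternative is to read it from an environment variable. The sketch below assumes the key has been exported under the name `CAIPP_API_KEY`, which is a name chosen here for illustration and not one defined by TFX; `read_api_key` is likewise a hypothetical helper.

```python
def read_api_key(environ, var_name="CAIPP_API_KEY"):
    """Return the key stored in var_name, or raise if it is missing or empty."""
    key = environ.get(var_name, "")
    if not key:
        raise RuntimeError(f"Set {var_name} before running this notebook.")
    return key


# Demonstration with a plain dict; in the notebook you would pass os.environ.
demo_key = read_api_key({"CAIPP_API_KEY": "example-key"})
print(demo_key)
```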

In [7]:
%env PIPELINE_NAME={PIPELINE_NAME}
%env ARTIFACT_STORE={ARTIFACT_STORE}
%env DATA_ROOT={DATA_ROOT}


#%env ARTIFACT_STORE_URI={ARTIFACT_STORE_URI}

#%env GCP_REGION={GCP_REGION}
#%env MODEL_NAME={MODEL_NAME}

#%env RUNTIME_VERSION={RUNTIME_VERSION}
#%env PYTHON_VERSION={PYTHON_VERSION}
#%env USE_KFP_SA={USE_KFP_SA}

env: PIPELINE_NAME=tfx_covertype_continuous_training
env: ARTIFACT_STORE=gs://mlops-dev-env-artifact-store
env: DATA_ROOT=gs://workshop-datasets/covertype/small
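
The `%env` magics above write to `os.environ`, so the same settings can be applied from plain Python. The sketch below does that and adds a simple check that the storage locations are Cloud Storage URIs; the check is an addition for illustration, not something the lab requires.

```python
import os

# Values copied from the cells above.
settings = {
    "PIPELINE_NAME": "tfx_covertype_continuous_training",
    "ARTIFACT_STORE": "gs://mlops-dev-env-artifact-store",
    "DATA_ROOT": "gs://workshop-datasets/covertype/small",
}

# Fail early if a storage location is not a Cloud Storage URI.
for name in ("ARTIFACT_STORE", "DATA_ROOT"):
    if not settings[name].startswith("gs://"):
        raise ValueError(f"{name} must start with gs://, got {settings[name]!r}")

# Commands started from the notebook with `!` inherit these variables.
os.environ.update(settings)
```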


***Currently there is an issue with the TFX CLI and environment variables. As a temporary mitigation, update `pipeline/configs.py` with equivalent values.***

In [14]:
%%writefile pipeline/configs.py
# Copyright 2020 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""The pipeline configurations.
"""

import os


PIPELINE_NAME=os.getenv("PIPELINE_NAME", "tfx_covertype_continuous_training")
ARTIFACT_STORE=os.getenv("ARTIFACT_STORE", "gs://mlops-dev-env-artifact-store")
DATA_ROOT=os.getenv("DATA_ROOT", "gs://workshop-datasets/covertype/small")
#MODEL_NAME=os.getenv("MODEL_NAME", "covertype_classifier")
#GCP_REGION=os.getenv("GCP_REGION", "us-central1")
#RUNTIME_VERSION=os.getenv("RUNTIME_VERSION", "2.1")
#PYTHON_VERSION=os.getenv("PYTHON_VERSION", "3.7")
 

Overwriting pipeline/configs.py


### Compile the pipeline

In [15]:
!tfx caipp pipeline create  \
--pipeline_path=runner.py \
--build-base-image={BASE_IMAGE} \
--build-target-image={TARGET_IMAGE} 

CLI
Cloud AI Platform Pipelines
Creating pipeline
Reading build spec from build.yaml
Target image gcr.io/mlops-dev-env/caip-tfx-custom is not used. If the build spec is provided, update the target image in the build spec file build.yaml.
[Skaffold] Generating tags...
[Skaffold]  - gcr.io/mlops-dev-env/caip-tfx-custom -> gcr.io/mlops-dev-env/caip-tfx-custom:latest
[Skaffold] Checking cache...
[Skaffold]  - gcr.io/mlops-dev-env/caip-tfx-custom: Not found. Building
[Skaffold] Building [gcr.io/mlops-dev-env/caip-tfx-custom]...
[Skaffold] Sending build context to Docker daemon  102.4kB
[Skaffold] Step 1/4 : FROM gcr.io/caip-pipelines-assets/tfx:latest
[Skaffold]  ---> 19c14dda2bb5
[Skaffold] Step 2/4 : WORKDIR /pipeline
[Skaffold]  ---> Using cache
[Skaffold]  ---> 73280faa1be8
[Skaffold] Step 3/4 : COPY ./ ./
[Skaffold]  ---> 4a6c8f2b9235
[Skaffold] Step 4/4 : ENV PYTHONPATH="/pipeline:${PYTHONPATH}"
[Skaffold]  ---> Running in eee0cc84a5b7
[Skaffold]  ---> 7a6906fafa9f
[Skaffold] Successf

If you need to redeploy the pipeline, you can either delete the previous version first using `tfx caipp pipeline delete` or update the pipeline in place using `tfx caipp pipeline update`.

To delete the pipeline:

`tfx caipp pipeline delete --pipeline_name {PIPELINE_NAME}`

To update the pipeline:

`tfx caipp pipeline update --pipeline_path runner.py`
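
If you prefer to issue these commands from Python rather than typing them, the argument lists can be assembled as shown in this sketch. The helper names are hypothetical, and only the subcommands and flags quoted above are used; the snippet prints the command lines without executing them.

```python
import shlex


def delete_command(pipeline_name):
    """Argument list for the delete command quoted above."""
    return ["tfx", "caipp", "pipeline", "delete",
            "--pipeline_name", pipeline_name]


def update_command(pipeline_path="runner.py"):
    """Argument list for the update command quoted above."""
    return ["tfx", "caipp", "pipeline", "update",
            "--pipeline_path", pipeline_path]


def as_shell(args):
    """Render an argument list as a single, safely quoted command line."""
    return " ".join(shlex.quote(arg) for arg in args)


delete_line = as_shell(delete_command("tfx_covertype_continuous_training"))
update_line = as_shell(update_command())
print(delete_line)
print(update_line)
# To execute one of them: subprocess.run(delete_command(...), check=True)
```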

### Submit the pipeline run

In [11]:
%env

{'CONDA_SHLVL': '1',
 'LD_LIBRARY_PATH': '/usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64',
 'CONDA_EXE': '/opt/conda/bin/conda',
 'RESTRICTION_TYPE_FILE_PATH': '/opt/deeplearning/restriction',
 'LANG': 'en_US.UTF-8',
 'DL_PATH': '/opt/deeplearning',
 'OS_UBUNTU1804': 'ubuntu-1804-lts',
 'INVOCATION_ID': '47f3f3ed8b66414e94558cde8b8ceda8',
 'GSETTINGS_SCHEMA_DIR_CONDA_BACKUP': '',
 'CONDA_PREFIX': '/opt/conda',
 'OS_DEBIAN9': 'debian-9',
 'JUPYTER_DEPS_PATH': '/opt/deeplearning/jupyter',
 'WORKSPACE_PATH': '/opt/deeplearning/workspace',
 'TUTORIALS_PATH': '/opt/deeplearning/workspace/tutorials',
 '_CE_M': '',
 'USER': 'jupyter',
 'FRAMEWORK_FILE_PATH': '/opt/deeplearning/metadata/framework',
 'ENV_URI_FILE_PATH': '/opt/deeplearning/metadata/env_uri',
 'PWD': '/home/jupyter',
 'VERSION_FILE_PATH': '/opt/deeplearning/metadata/version',
 'HOME': '/home/jupyter',
 'CONDA_PYTHON_EXE': '/opt/conda/bin/python',
 'JOURNAL_STREAM': '8:18043',
 'DL_ANACONDA_HOME': '

In [10]:
!tfx caipp run create \
--pipeline-name={PIPELINE_NAME} \
--project-id={PROJECT_ID} \
--api-key={API_KEY} \
--target-image={TARGET_IMAGE}

CLI
Cloud AI Platform Pipelines
Creating a run for pipeline: tfx_covertype_continuous_training
****************
gcr.io/mlops-dev-env/caip-tfx-custom
AIzaSyC3Mxax2j15dD8vWxAhe6riGAqAasOEi-U
mlops-dev-env
gs://mlops-workshop-artifact-store
None
*****************
INFO:absl:Excluding no splits because exclude_splits is not set.
Runner run
INFO:absl:Compiled JSON request: {"name": "projects/mlops-dev-env/pipelineJobs/covertype_continuous_training_20200831014013", "displayName": "covertype_continuous_training", "spec": {"pipelineContext": "covertype_continuous_training", "steps": {"StatisticsGen": {"task": {"inputs": {"examples": {"stepOutput": {"step": "CsvExampleGen", "output": "examples"}}}, "executionProperties": {"exclude_splits": {"stringValue": "[]"}}, "outputs": {"statistics": {"artifact": {"customProperties": {"custom:name": {"stringValue": "statistics"}, "custom:pipeline_name": {"stringValue": "covertype_continuous_training"}, "tfx_type": {"stringValue": "tfx.types.standard_artifac

To list all active runs of the pipeline:

In [None]:
!tfx caipp run list --pipeline_name {PIPELINE_NAME} --api-key {API_KEY}

To retrieve the status of a given run:

In [None]:
RUN_ID='[YOUR RUN ID]'

!tfx caipp run status --pipeline_name {PIPELINE_NAME} \
--run_id {RUN_ID} \
--api-key {API_KEY}

## Next Steps

In this lab, you learned how to manually build and deploy a TFX pipeline to AI Platform Pipelines and trigger pipeline runs from a notebook. In the next lab, you will construct a Cloud Build CI/CD workflow that automatically builds and deploys this same TFX pipeline.

## License

<font size=-1>Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.</font>