# Continuous training with TFX and Cloud AI Platform

This lab demonstrates how to develop a Managed Pipelines pipeline that uses **AI Platform** and **Cloud Dataflow** as executors to run the TFX components at scale. You will also learn how to structure your pipeline code and how to use the **TFX CLI** to submit pipeline runs.

## Set up the environment

### Verify TFX SDK Version

**Note**: This lab was developed and tested with the following TF ecosystem package versions:

`Tensorflow Version: 2.3.0`  
`TFX Version: 0.23.0.caip20200818`  
`TFDV Version: 0.23.0`  
`TFMA Version: 0.23.0`



In [1]:
import os
import tensorflow as tf
import tensorflow_data_validation as tfdv
import tensorflow_model_analysis as tfma
import tfx

from tfx.tools.cli.ai_platform_pipelines import labels

print("Tensorflow Version:", tf.__version__)
print("TFX Version:", tfx.__version__)
print("TFDV Version:", tfdv.__version__)
print("TFMA Version:", tfma.VERSION_STRING)

Tensorflow Version: 2.3.0
TFX Version: 0.23.0.caip20200818
TFDV Version: 0.23.0
TFMA Version: 0.23.0


If the versions above do not match, update your packages in the current Jupyter kernel. 
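A version check like the one above can also be done programmatically. The following is a minimal sketch, not part of the lab code: the helper name `find_version_mismatches` is hypothetical, and the expected versions are copied from the note at the top of this lab.

```python
# Expected versions, copied from the note at the top of this lab.
EXPECTED_VERSIONS = {
    "tensorflow": "2.3.0",
    "tfx": "0.23.0.caip20200818",
    "tensorflow_data_validation": "0.23.0",
    "tensorflow_model_analysis": "0.23.0",
}


def find_version_mismatches(installed, expected=EXPECTED_VERSIONS):
    """Return {package: (installed, expected)} for every package that differs."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }


# In the notebook, `installed` would be built from the imported modules,
# for example {"tensorflow": tf.__version__, "tfx": tfx.__version__, ...}.
# Here a sample dictionary with one deliberately wrong entry is used instead.
sample_installed = dict(EXPECTED_VERSIONS, tensorflow="2.4.1")
print(find_version_mismatches(sample_installed))
```

An empty result means every package matches the versions the lab was tested with.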

### Update `PATH` with the location of the TFX SDK

In [2]:
os.environ['PATH'] += os.pathsep + '/home/jupyter/.local/bin'
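Re-running the cell above appends the same directory to `PATH` each time. That is harmless, but if you prefer to keep `PATH` free of duplicates, a small helper can guard against it. This is a sketch only; `append_to_path` is a hypothetical name and not part of the lab code.

```python
import os


def append_to_path(path_value, new_dir):
    """Return path_value with new_dir appended, unless it is already listed."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    if new_dir not in entries:
        entries.append(new_dir)
    return os.pathsep.join(entries)


# Same directory as in the cell above; safe to run more than once.
os.environ["PATH"] = append_to_path(
    os.environ.get("PATH", ""), "/home/jupyter/.local/bin"
)
```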

## Understanding the pipeline design
The pipeline source code can be found in the `pipeline` and `modules` folders.

In [3]:
!ls -la pipeline

total 32
drwxr-xr-x 4 jupyter jupyter 4096 Sep  1 02:12 .
drwxr-xr-x 7 jupyter jupyter 4096 Sep  1 03:38 ..
-rw-r--r-- 1 jupyter jupyter 1787 Sep  1 01:13 configs.py
-rw-r--r-- 1 jupyter jupyter    0 Aug 30 01:57 __init__.py
drwxr-xr-x 2 jupyter jupyter 4096 Aug 31 01:37 .ipynb_checkpoints
-rw-r--r-- 1 jupyter jupyter 8636 Sep  1 02:12 pipeline.py
drwxr-xr-x 2 jupyter jupyter 4096 Sep  1 02:13 __pycache__


The `pipeline` folder contains the pipeline DSL and configurations.

The `configs.py` module configures the default values for the pipeline's settings.
The default values can be overwritten at compile time by using environment variables.
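The override mechanism can be illustrated with a short sketch. It assumes the settings are read with an environment lookup that falls back to a default, which is how the `configs.py` listing later in this lab does it. The helper `load_setting` and the alternative region value are hypothetical, used only for illustration.

```python
import os


def load_setting(name, default, environ=os.environ):
    """Return the environment value for `name`, or `default` if it is unset."""
    return environ.get(name, default)


# No environment variable set: the default is used.
print(load_setting("GCP_REGION", "us-central1", environ={}))

# Environment variable set before the module is evaluated: it wins.
print(load_setting("GCP_REGION", "us-central1",
                   environ={"GCP_REGION": "europe-west4"}))
```

Because module-level lookups run when the module is imported, the variables need to be set before the pipeline is compiled.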

The `pipeline.py` module contains the TFX DSL defining the workflow implemented by the pipeline.



In [4]:
!ls -la modules

total 28
drwxr-xr-x 3 jupyter jupyter 4096 Aug 31 20:56 .
drwxr-xr-x 7 jupyter jupyter 4096 Sep  1 03:38 ..
-rw-r--r-- 1 jupyter jupyter 1220 Aug 31 20:42 features.py
-rw-r--r-- 1 jupyter jupyter    0 Aug 30 01:57 __init__.py
drwxr-xr-x 2 jupyter jupyter 4096 Aug 31 20:55 .ipynb_checkpoints
-rw-r--r-- 1 jupyter jupyter 7374 Aug 31 20:56 model.py
-rw-r--r-- 1 jupyter jupyter 2044 Aug 31 20:55 preprocessing.py


The `modules` folder contains user code for `Transform` and `Trainer` components.


The `preprocessing.py` module implements the data preprocessing logic for the `Transform` component.

The `model.py` module implements the training logic for the `Trainer` component.

The `features.py` module contains common definitions for the `model.py` and `preprocessing.py` modules.


In [5]:
!ls -la runner.py

-rw-r--r-- 1 jupyter jupyter 3887 Sep  1 02:10 runner.py


The `runner.py` module in the root folder of the lab contains configurations for the Managed Pipelines runner.

## Building and deploying the pipeline

You will use the TFX CLI to compile and deploy the pipeline. As noted in the previous section, the environment-specific settings can be updated by modifying the `configs.py` file or by setting the corresponding environment variables.

### Set the environment variables

In [8]:
API_KEY = ''

PROJECT_ID = !(gcloud config get-value core/project)
PROJECT_ID = PROJECT_ID[0]
GCP_REGION = 'us-central1'
PIPELINE_NAME = 'tfx_covertype_continuous_training'
ARTIFACT_STORE = 'gs://mlops-dev-env-artifact-store'
DATA_ROOT = 'gs://workshop-datasets/covertype/small'

TARGET_IMAGE = f'gcr.io/{PROJECT_ID}/caip-tfx-custom'
BASE_IMAGE = 'gcr.io/caip-pipelines-assets/tfx:latest'

In [9]:
%env PIPELINE_NAME={PIPELINE_NAME}
%env ARTIFACT_STORE={ARTIFACT_STORE}
%env DATA_ROOT={DATA_ROOT}
%env GCP_REGION={GCP_REGION}

env: PIPELINE_NAME=tfx_covertype_continuous_training
env: ARTIFACT_STORE=gs://mlops-dev-env-artifact-store
env: DATA_ROOT=gs://workshop-datasets/covertype/small
env: GCP_REGION=us-central1


***Currently there is an issue with the TFX CLI and environment variables. As a temporary mitigation, update `pipeline/configs.py` with equivalent values.***

In [10]:
%%writefile pipeline/configs.py
# Copyright 2020 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""The pipeline configurations.
"""

import os


PIPELINE_NAME = os.getenv("PIPELINE_NAME", "tfx_covertype_continuous_training")
ARTIFACT_STORE = os.getenv("ARTIFACT_STORE", "gs://mlops-dev-env-artifact-store")
DATA_ROOT = os.getenv("DATA_ROOT", "gs://workshop-datasets/covertype/small")
SCHEMA_URI = os.getenv("SCHEMA_URI", "schema")
GCP_REGION = os.getenv("GCP_REGION", "us-central1")
DATAFLOW_MACHINE_TYPE = os.getenv("DATAFLOW_MACHINE_TYPE", "n1-standard-8")
DATAFLOW_DISK_SIZE = os.getenv("DATAFLOW_DISK_SIZE", "100")
PREPROCESSING_FN = os.getenv("PREPROCESSING_FN", "modules.preprocessing.preprocessing_fn")
RUN_FN = os.getenv("RUN_FN", "modules.model.run_fn")
TRAIN_NUM_STEPS = os.getenv("TRAIN_NUM_STEPS", 5000)
EVAL_NUM_STEPS = os.getenv("EVAL_NUM_STEPS", 500)
CAIP_TRAINING_MACHINE_TYPE = os.getenv("CAIP_TRAINING_MACHINE_TYPE", "n1-standard-8")
SERVING_MODEL_DIR = os.getenv("SERVING_MODEL_DIR", "gs://mlops-dev-env-artifact-store/models/covertype")
EVAL_ACCURACY_THRESHOLD = os.getenv("EVAL_ACCURACY_THRESHOLD", 0.5)
MODEL_NAME = os.getenv("MODEL_NAME", "covertype_classifier")
RUNTIME_VERSION = os.getenv("RUNTIME_VERSION", "2.1")
PYTHON_VERSION = os.getenv("PYTHON_VERSION", "3.7")

Overwriting pipeline/configs.py
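One detail of the file above is worth noting: `os.getenv` always returns a string when the variable is set, so a setting such as `TRAIN_NUM_STEPS` is an `int` when the default is used but a `str` when it is overridden. Whether `pipeline.py` converts these values is not shown in this lab. If a consistent type is needed, a helper along these lines could be used. This is a sketch; `getenv_int` and `getenv_float` are hypothetical names, not part of the lab code.

```python
import os


def getenv_int(name, default, environ=os.environ):
    """Read an integer setting; fall back to `default` when the variable is unset."""
    raw = environ.get(name)
    return default if raw is None else int(raw)


def getenv_float(name, default, environ=os.environ):
    """Read a float setting; fall back to `default` when the variable is unset."""
    raw = environ.get(name)
    return default if raw is None else float(raw)


# Both calls return an int, whether or not the variable is set.
print(getenv_int("TRAIN_NUM_STEPS", 5000, environ={}))
print(getenv_int("TRAIN_NUM_STEPS", 5000, environ={"TRAIN_NUM_STEPS": "100"}))
```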


### Build the pipeline

You can build a custom TFX container image and compile the pipeline into the JSON IR in one step, using the `tfx caipp pipeline create` command.

As you debug the pipeline DSL, you may prefer to first use the `tfx caipp pipeline compile` command, which is faster because it only executes the compilation step. After the DSL compiles successfully, you can use `tfx caipp pipeline create` to go through both steps.
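The compile-first workflow can also be scripted. The sketch below is an illustration only: the function name `compile_then_create` is hypothetical, and the command-line flags are copied from the compile and create cells in this lab rather than from separate CLI documentation. The `run` parameter defaults to `subprocess.run` and exists so the logic can be exercised without invoking the CLI.

```python
import subprocess


def compile_then_create(pipeline_path, project_id, target_image, base_image,
                        run=subprocess.run):
    """Compile first; build the image and create the pipeline only on success."""
    # Flags mirror the `tfx caipp pipeline compile` cell in this lab.
    compile_cmd = [
        "tfx", "caipp", "pipeline", "compile",
        f"--pipeline_path={pipeline_path}",
        f"--project_id={project_id}",
        f"--target-image={target_image}",
    ]
    # Flags mirror the `tfx caipp pipeline create` cell in this lab.
    create_cmd = [
        "tfx", "caipp", "pipeline", "create",
        f"--pipeline_path={pipeline_path}",
        f"--build-base-image={base_image}",
        f"--build-target-image={target_image}",
    ]
    # Skip the slower image build when the DSL does not compile.
    if run(compile_cmd).returncode != 0:
        return False
    return run(create_cmd).returncode == 0


# Example call (requires the TFX CLI used in this lab to be on PATH):
# compile_then_create("runner.py", PROJECT_ID, TARGET_IMAGE, BASE_IMAGE)
```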


#### Compile the pipeline

In [11]:
!tfx caipp pipeline compile \
--pipeline_path=runner.py \
--project_id={PROJECT_ID} \
--target-image={TARGET_IMAGE} 

CLI
Cloud AI Platform Pipelines
Compiling pipeline
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Excluding no splits because exclude_splits is not set.
Pipeline compiled successfully.

#### Create the pipeline 

In [12]:
!tfx caipp pipeline create  \
--pipeline_path=runner.py \
--build-base-image={BASE_IMAGE} \
--build-target-image={TARGET_IMAGE} 

CLI
Cloud AI Platform Pipelines
Creating pipeline
Reading build spec from build.yaml
No local setup.py, copying the directory and configuring the PYTHONPATH.
[Skaffold] Generating tags...
[Skaffold]  - gcr.io/mlops-dev-env/caip-tfx-custom -> gcr.io/mlops-dev-env/caip-tfx-custom:latest
[Skaffold] Checking cache...
[Skaffold]  - gcr.io/mlops-dev-env/caip-tfx-custom: Not found. Building
[Skaffold] Building [gcr.io/mlops-dev-env/caip-tfx-custom]...
[Skaffold] Sending build context to Docker daemon    151kB
[Skaffold] Step 1/4 : FROM gcr.io/caip-pipelines-assets/tfx:latest
[Skaffold]  ---> 19c14dda2bb5
[Skaffold] Step 2/4 : WORKDIR /pipeline
[Skaffold]  ---> Using cache
[Skaffold]  ---> 73280faa1be8
[Skaffold] Step 3/4 : COPY ./ ./
[Skaffold]  ---> 2a2b6460fd63
[Skaffold] Step 4/4 : ENV PYTHONPATH="/pipeline:${PYTHONPATH}"
[Skaffold]  ---> Running in 43b4c94baa3d
[Skaffold]  ---> 9d73df9c65fb
[Skaffold] Successfully built 9d73df9c65fb
[Skaffold] Successfully tagged gcr.io/mlops-dev-env/caip

If you need to rebuild the pipeline, you can first delete the previous version using `tfx caipp pipeline delete`, or you can update the pipeline in place using `tfx caipp pipeline update`.

To delete the pipeline:

`tfx caipp pipeline delete --pipeline_name {PIPELINE_NAME}`

To update the pipeline:

`tfx caipp pipeline update --pipeline_path runner.py`

### Submit the pipeline run

In [13]:
!tfx caipp run create \
--pipeline-name={PIPELINE_NAME} \
--project-id={PROJECT_ID} \
--api-key={API_KEY} \
--target-image={TARGET_IMAGE}

CLI
Cloud AI Platform Pipelines
Creating a run for pipeline: tfx_covertype_continuous_training
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Excluding no splits because exclude_splits is not set.
INFO:absl:Compiled JSON request: {"name": "projects/mlops-dev-env/pipelineJobs/tfx_covertype_continuous_training_20200901034258", "displayName": "tfx_covertype_continuous_training", "spec": {"pipelineContext": "tfx_covertype_continuous_training", "steps": {"StatisticsGen.generate_statistics": {"task": {"inputs": {"examples": {"stepOutput": {"step": "CsvExampleGen.import_csv_data", "output": "examples"}}}, "executionProperties": {"exclude_splits": {"stringValue": "[]"}}, "outputs": {"statistics": {"artifact": {"customProperties": {"type_name": {"stringValue": "ExampleStatistics"}, "custom:producer_component": {"stringValue": "StatisticsGen.generate_statistics"}, "custom:name": {"stringValue": "statisti

To list all active runs of the pipeline:

In [None]:
!tfx caipp run list \
--project_id {PROJECT_ID} \
--pipeline_name {PIPELINE_NAME} \
--api-key {API_KEY}

To retrieve the status of a given run:

In [None]:
JOB_NAME='tfx_covertype_continuous_training_20200831022237'

!tfx caipp run status \
--project_id {PROJECT_ID} \
--job_name {JOB_NAME} \
--api-key {API_KEY}
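The `JOB_NAME` value used above is the last segment of the job's resource name, which appears in the `"name"` field of the compiled JSON request logged when the run is submitted. The following sketch extracts it. The function name is hypothetical, and the assumption that the suffix is a `YYYYMMDDHHMMSS` timestamp is inferred from the log output in this lab, not from documentation.

```python
from datetime import datetime


def parse_job_resource_name(resource_name):
    """Split 'projects/<project>/pipelineJobs/<job>' into its parts."""
    parts = resource_name.split("/")
    if len(parts) != 4 or parts[0] != "projects" or parts[2] != "pipelineJobs":
        raise ValueError(f"Unexpected resource name: {resource_name!r}")
    project_id, job_name = parts[1], parts[3]
    # Assumed layout of the job name: <pipeline name>_<YYYYMMDDHHMMSS>.
    pipeline_name, _, stamp = job_name.rpartition("_")
    submitted_at = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return project_id, job_name, pipeline_name, submitted_at


# Resource name copied from the run submission log in this lab.
resource = ("projects/mlops-dev-env/pipelineJobs/"
            "tfx_covertype_continuous_training_20200901034258")
project_id, job_name, pipeline_name, submitted_at = parse_job_resource_name(resource)
print(job_name)
```

The resulting `job_name` can be passed to the `--job_name` flag shown above.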

## License

<font size=-1>Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the License for the specific language governing permissions and limitations under the License.</font>