## Table of Contents
1. Setup Google Cloud project
2. Setup notebook
3. Tensorflow Extended Module Files
4. Migrate Baseline Data
5. Write a Vertex AI/Kubeflow Pipelines Definition File
6. Start Vertex AI Pipeline
7. Continuous Training

The goal of this notebook is to walk through a TFX pipeline orchestrated by Vertex Pipelines, a managed version of Kubeflow Pipelines on Google Cloud Platform. As in the Airflow example, the core TFX components can be reused with minor modifications, and the pipeline is started from a Jupyter notebook.

## 1. Setup Google Cloud project

There are some things we need to do before starting our pipeline on Google Cloud Platform. 

It is necessary to [create a Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects), which allows us to keep everything related to the pipeline together.

Once a project is created, we need to [enable the appropriate APIs](https://support.google.com/googleapi/answer/6158841?hl=en&ref_topic=7013279), which gives us access to the services this project needs. Because the pipeline is orchestrated by a managed Kubeflow Pipelines instance, we need to enable the Vertex AI, Compute Engine, and Notebooks APIs. In addition, we need to enable the Cloud Functions, Cloud Build, and Cloud Resource Manager APIs to create the Cloud Functions function used for continuous training.

We can enter commands into Google Cloud Platform's `Cloud Shell`, an online bash shell based on Debian, to help us manage our project. We can [launch the `Cloud Shell` from the Google Cloud console](https://cloud.google.com/shell/docs/launching-cloud-shell).

It is necessary to create a service account to run our pipeline. With the appropriate permissions, a service account can make authorized API calls and interact with the enabled APIs. We can enter the following command into the `Cloud Shell` to create a service account.
```
gcloud iam service-accounts create SERVICE_ACCOUNT_ID \
    --display-name="DISPLAY_NAME" \
    --project=PROJECT_ID
```
With the following definitions:
- `SERVICE_ACCOUNT_ID`: ID of the created service account
- `DISPLAY_NAME`: display name of the service account
- `PROJECT_ID`: project the service account is associated with

In the case of this project, the following was entered into the `Cloud Shell`:
```
gcloud iam service-accounts create hot-dog-pipeline-service \
    --display-name="hot-dog-pipeline-service" \
    --project="hot-dog-pipeline"
```

Once a service account is created, it is necessary to give the service account a role. On Google Cloud Platform, a role is a collection of permissions that determines which API operations the account may perform. The role should include every permission needed to run the pipeline successfully.

```
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SERVICE_ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com" \
    --role="ROLE"
```
With the following definitions:
- `SERVICE_ACCOUNT_ID`: ID of the created service account
- `PROJECT_ID`: project the service account is associated with
- `ROLE`: role to associate with the service account

Since we will be interacting with Vertex AI, we need to give the service account an appropriate role and associated permissions. In the case of this project, the following was entered into the `Cloud Shell`:
```
gcloud projects add-iam-policy-binding hot-dog-pipeline \
    --member="serviceAccount:hot-dog-pipeline-service@hot-dog-pipeline.iam.gserviceaccount.com" \
    --role="roles/aiplatform.serviceAgent"
```
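The `--member` value follows a fixed pattern, so it can be assembled from the service account ID and the project ID. The sketch below only illustrates that pattern; the helper names `service_account_email` and `iam_member` are hypothetical and are not part of any Google Cloud library.

```python
def service_account_email(service_account_id: str, project_id: str) -> str:
    """Build the email address of a service account created in a project."""
    return f"{service_account_id}@{project_id}.iam.gserviceaccount.com"


def iam_member(service_account_id: str, project_id: str) -> str:
    """Build the string passed to --member for a service account."""
    return "serviceAccount:" + service_account_email(service_account_id, project_id)


# Values used in this notebook.
member = iam_member("hot-dog-pipeline-service", "hot-dog-pipeline")
print(member)
```

The same email address is reused later for the storage bindings, the notebook instance, and the Cloud Functions deployment.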

We also want to grant the service account the Cloud Functions Developer role. This will be needed for continuous training.

```
gcloud projects add-iam-policy-binding hot-dog-pipeline \
    --member="serviceAccount:hot-dog-pipeline-service@hot-dog-pipeline.iam.gserviceaccount.com" \
    --role="roles/cloudfunctions.developer"
```

Now that we have a service account set up with the appropriate roles and permissions, it is necessary to associate the service account with the user account so the user can run the pipeline.

```
gcloud iam service-accounts add-iam-policy-binding \
    SERVICE_ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com \
    --member="user:USER_EMAIL" \
    --role="roles/iam.serviceAccountUser"
```
With the following definitions:
- `SERVICE_ACCOUNT_ID`: ID of the created service account
- `PROJECT_ID`: project the service account is associated with
- `USER_EMAIL`: email address of user who will run pipeline

In the case of this project, the following was entered into the `Cloud Shell` (the user email has been removed for privacy):

```
gcloud iam service-accounts add-iam-policy-binding \
    hot-dog-pipeline-service@hot-dog-pipeline.iam.gserviceaccount.com \
    --member="user:@gmail.com" \
    --role="roles/iam.serviceAccountUser"
```

We need to create a storage bucket to hold all the files associated with this project.

```
gsutil mb -p PROJECT_ID -l LOCATION gs://BUCKET_NAME
```
With the following definitions:
- `PROJECT_ID`: project the service account is associated with
- `LOCATION`: Google Cloud Platform location (region or multi-region) we want the bucket located in
- `BUCKET_NAME`: what we want to call the bucket

Note: Bucket names must be [globally unique](https://cloud.google.com/storage/docs/naming-buckets); therefore, a new bucket name is necessary if you are following along with this notebook.
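Besides being unique, a bucket name has to follow the naming guidelines linked above. As a rough pre-check before running `gsutil mb`, the sketch below tests a subset of those rules (length, allowed characters, and the reserved `goog`/`google` strings). It assumes a name without dots, cannot check global uniqueness, and the function name `looks_like_valid_bucket_name` is hypothetical.

```python
import re

# 3 to 63 characters: lowercase letters, digits, dashes, and underscores,
# starting and ending with a letter or digit. Names containing dots follow
# additional rules and are deliberately rejected by this sketch.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Check a subset of the Cloud Storage bucket naming rules."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    # Names may not begin with "goog" or contain "google".
    if name.startswith("goog") or "google" in name:
        return False
    return True


for candidate in ["hot-dog-pipeline-gcs", "Hot-Dog-Pipeline", "ab"]:
    print(candidate, looks_like_valid_bucket_name(candidate))
```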

The following was entered into the `Cloud Shell`:
```
gsutil mb -p hot-dog-pipeline -l US-CENTRAL1 gs://hot-dog-pipeline-gcs
```

We also need a separate bucket used exclusively for data, because Cloud Functions does not accept a subdirectory as a `trigger-resource` (see '7. Continuous Training').
```
gsutil mb -p hot-dog-pipeline -l US-CENTRAL1 gs://hot-dog-pipeline-gcs-data
```
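To make the bucket-versus-subdirectory distinction concrete, the sketch below splits a `gs://` URI into its bucket and object prefix. Only a URI with an empty prefix names a whole bucket, which is the form used for the trigger in this notebook. The helpers `split_gcs_uri` and `is_bucket_only` are hypothetical names introduced for illustration.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, prefix = uri[len(scheme):].partition("/")
    return bucket, prefix.strip("/")


def is_bucket_only(uri: str) -> bool:
    """Return True when the URI names a bucket with no subdirectory."""
    bucket, prefix = split_gcs_uri(uri)
    return bool(bucket) and not prefix


print(is_bucket_only("gs://hot-dog-pipeline-gcs-data"))   # whole bucket
print(is_bucket_only("gs://hot-dog-pipeline-gcs/data"))   # subdirectory
```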

We grant the service account storage privileges so it has read and write access to the storage buckets that hold the pipeline artifacts.

```
gsutil iam ch \
serviceAccount:hot-dog-pipeline-service@hot-dog-pipeline.iam.gserviceaccount.com:roles/storage.admin \
gs://hot-dog-pipeline-gcs

gsutil iam ch \
serviceAccount:hot-dog-pipeline-service@hot-dog-pipeline.iam.gserviceaccount.com:roles/storage.admin \
gs://hot-dog-pipeline-gcs-data
```

We are ready to create a Jupyter notebook instance in which to run the rest of this notebook. We are using a [Deep Learning VM Image](https://cloud.google.com/deep-learning-vm), specifically tf2-ent-2-7-cu113-notebooks-v20211202-debian-10, as the image contains key machine learning frameworks and tools preinstalled. The `gcloud notebooks instances create` command has too many options to list here, but they are documented [here](https://cloud.google.com/sdk/gcloud/reference/notebooks/instances/create). The following was entered into `Cloud Shell`:
```
gcloud notebooks instances create hot-dog-pipeline-notebook \
    --vm-image-project="deeplearning-platform-release" \
    --vm-image-name=tf2-ent-2-7-cu113-notebooks-v20211202-debian-10 \
    --machine-type=n1-standard-2 \
    --location=us-central1-a \
    --service-account=hot-dog-pipeline-service@hot-dog-pipeline.iam.gserviceaccount.com
```

Once the notebook is created, it can be [opened](https://cloud.google.com/vertex-ai/docs/workbench/user-managed/create-user-managed-notebooks-instance-console-quickstart#open-jupyterlab). A copy of this notebook can be uploaded and opened in the JupyterLab instance. The rest of this notebook can be run in the JupyterLab instance.

## 2. Setup notebook

Even though the Deep Learning VM contains machine learning frameworks and tools, we should still use the latest versions. Therefore, it is necessary to upgrade pip and the libraries of interest.

In [1]:
!pip install --upgrade pip
!pip install --upgrade "tfx[kfp]<2"

Collecting pip
  Downloading pip-22.2.2-py3-none-any.whl (2.0 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.3.1
    Uninstalling pip-21.3.1:
      Successfully uninstalled pip-21.3.1
Successfully installed pip-22.2.2
Collecting tfx[kfp]<2
  Downloading tfx-1.9.1-py3-none-any.whl (2.5 MB)
Collecting docker<5,>=4.1
  Downloading docker-4.4.4-py2.py3-none-any.whl (147 kB)
Collecting tensorflow-serving-api!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,!=2.8.*,<3,>=1.15
  Downloading tensorflow_serving_api-2.10.0-py2.py3-none-any.whl (37 kB)
Collecting tfx-bsl<1.10.0,>=1.9.0

It is necessary to restart the runtime after installing the libraries. Restarting the runtime can be accomplished by running the following:

In [2]:
import sys
import IPython
  
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

Since we've downloaded our libraries and restarted the runtime, we can now import the relevant updated libraries for this project.

In [1]:
import tensorflow as tf
from tfx import v1 as tfx
import kfp

We can set up some variables which will be used throughout the rest of this notebook. The variable definitions are as follows:
- `GOOGLE_CLOUD_PROJECT`: Google Cloud Platform project name
- `GOOGLE_CLOUD_REGION`: Google Cloud Platform region we want
- `GCS_BUCKET_NAME`: name of bucket that holds everything except the model data
- `GCS_DATA_BUCKET_NAME`: name of bucket that holds the model data
- `PIPELINE_NAME`: name of pipeline

In [2]:
GOOGLE_CLOUD_PROJECT = 'hot-dog-pipeline'
GOOGLE_CLOUD_REGION = 'us-central1'     
GCS_BUCKET_NAME = 'hot-dog-pipeline-gcs'
GCS_DATA_BUCKET_NAME = 'hot-dog-pipeline-gcs-data'
PIPELINE_NAME = 'hot-dog-vertex-pipelines'

# Path to the pipeline artifacts.
PIPELINE_ROOT = 'gs://{}/pipeline_root/{}'.format(
    GCS_BUCKET_NAME, PIPELINE_NAME
)

# Path for user-defined TFX modules.
MODULE_ROOT = 'gs://{}/pipeline_module/{}'.format(
    GCS_BUCKET_NAME, PIPELINE_NAME
)

# Path for data.
DATA_ROOT = 'gs://{}'.format(GCS_DATA_BUCKET_NAME)

# Path where model will be pushed
SERVING_MODEL_DIR = 'gs://{}/serving_model/{}'.format(
    GCS_BUCKET_NAME, PIPELINE_NAME
)
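For reference, the sketch below shows what the path variables above resolve to. It uses a hypothetical helper, `gcs_path`, instead of `str.format`; the bucket and pipeline names are the ones defined in the cell above.

```python
def gcs_path(bucket: str, *parts: str) -> str:
    """Join a bucket name and path segments into a gs:// URI."""
    return "/".join(["gs://" + bucket, *parts])


bucket = "hot-dog-pipeline-gcs"
data_bucket = "hot-dog-pipeline-gcs-data"
pipeline = "hot-dog-vertex-pipelines"

resolved = {
    "PIPELINE_ROOT": gcs_path(bucket, "pipeline_root", pipeline),
    "MODULE_ROOT": gcs_path(bucket, "pipeline_module", pipeline),
    "DATA_ROOT": gcs_path(data_bucket),
    "SERVING_MODEL_DIR": gcs_path(bucket, "serving_model", pipeline),
}

for name, value in resolved.items():
    print(f"{name} = {value}")
```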

## 3. Tensorflow Extended Module Files
We have previously defined the module files in the interactive and Airflow pipelines. Here we create local copies of the TFX files.

In [3]:
HOT_DOG_TRANSFORM = 'hot_dog_transform.py'

In [4]:
%%writefile {HOT_DOG_TRANSFORM}

import os
import tensorflow as tf


def _process_image(raw_image):
    """Process a single image

    Parameters
    ----------
    raw_image : bytestring
        Encoded image string

    Returns
    -------
    tf.Tensor
        Decoded and resized image
    """
    raw_image = tf.reshape(raw_image, [])
    img_rgb = tf.image.decode_jpeg(raw_image, channels=3)
    img = tf.cast(img_rgb, dtype=tf.float32)
    resized_img = tf.image.resize_with_crop_or_pad(
        img, target_height=224, target_width=224,
    )
    
    return tf.reshape(resized_img, [224, 224, 3])
 
 
def preprocessing_fn(inputs):
    """Callback function for preprocessing inputs

    Serves as the entry point for TFX Transform component

    Parameters
    ----------
    inputs : nested tf.Tensor
        A batch of tensors to be processed

    Returns
    -------
    dict
        Maps each output feature name ('image_xf', 'label') to its tensor;
        'image_xf' stacks the results of applying _process_image to every
        image in the batch, from first to last
    """
    image_raw = inputs['image']
    label = inputs['label']
    # the pipeline processes images in batches
    # use the tf.map_fn to apply our user defined function to batch
    img_preprocessed=tf.map_fn(_process_image, image_raw, dtype=tf.float32)

    return {
      'image_xf': img_preprocessed,
      'label': label,
    }

Writing hot_dog_transform.py
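Despite its name, `tf.image.resize_with_crop_or_pad` does not rescale the image: each dimension is either cropped centrally or padded with zeros until it reaches the target size. The pure-Python sketch below works out what happens along one dimension. The name `crop_or_pad_plan` is hypothetical, and the way an odd remainder is split between the two sides is an assumption about the TensorFlow implementation.

```python
from typing import NamedTuple


class CropPadPlan(NamedTuple):
    crop_offset: int  # pixels dropped before the kept region
    kept: int         # pixels of the original image that survive
    pad_before: int   # zero pixels added before the image
    pad_after: int    # zero pixels added after the image


def crop_or_pad_plan(size: int, target: int = 224) -> CropPadPlan:
    """Plan the central crop or zero padding for one image dimension."""
    crop_offset = max((size - target) // 2, 0)
    kept = min(size, target)
    pad_before = max((target - size) // 2, 0)
    pad_after = target - kept - pad_before
    return CropPadPlan(crop_offset, kept, pad_before, pad_after)


print(crop_or_pad_plan(300))  # larger than the target: cropped
print(crop_or_pad_plan(200))  # smaller than the target: padded
```

A consequence is that very large images keep only their central 224 x 224 region, while small images are surrounded by a black border.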


Since the Python file is written locally, we want to move it to our bucket so it can be accessed when the notebook instance is closed.

In [5]:
!gsutil cp {HOT_DOG_TRANSFORM} {MODULE_ROOT}/

Copying file://hot_dog_transform.py [Content-Type=text/x-python]...
/ [1 files][  1.3 KiB/  1.3 KiB]                                                
Operation completed over 1 objects/1.3 KiB.                                      


We do a similar process for the `hot_dog_train.py` file.

In [6]:
HOT_DOG_TRAIN = 'hot_dog_train.py'

In [7]:
%%writefile {HOT_DOG_TRAIN}

import os
import tensorflow as tf
import tensorflow_transform as tft
from tfx import v1 as tfx
from tfx_bsl.public import tfxio
from tensorflow_transform import TFTransformOutput


_LABEL_KEY = 'label'
_BATCH_SIZE = 32


def _input_fn(
    file_pattern, data_accessor, tf_transform_output, batch_size
):
    """Generates features and label for tuning/training

    Parameters
    ----------
    file_pattern : List[str]
        List of paths or patterns of input tfrecord files.
    data_accessor : tfx.components.DataAccessor
        DataAccessor for converting input to RecordBatch.
    tf_transform_output : tft.TFTransformOutput
        Output from Transform component
    batch_size : int
        representing the number of consecutive elements of returned
        dataset to combine in a single batch

    Returns
    -------
    tf.data.Dataset
        A dataset that contains (features, indices) tuple where features is a
        dictionary of Tensors, and indices is a single Tensor of label indices.
    """
    dataset = data_accessor.tf_dataset_factory(
        file_pattern,
        tfxio.TensorFlowDatasetOptions(
            batch_size=batch_size, label_key=_LABEL_KEY,
            shuffle_buffer_size=1200, shuffle_seed=123
        ),
        tf_transform_output.transformed_metadata.schema
    )
    
    return dataset


def _build_keras_model():
    """Create a Keras model

    Returns
    -------
    tf.keras.Model
        Model to be used during training
    """
    inputs = tf.keras.layers.Input(shape=(224, 224, 3), name='image_xf')
    base_model= tf.keras.applications.EfficientNetB0(
      include_top=False, weights='imagenet', input_tensor=inputs
    )

    # Rebuild top
    x = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
    x = tf.keras.layers.Dropout(0.2)(x)
    output = tf.keras.layers.Dense(3, activation='softmax', name='label')(x)

    # Compile
    model = tf.keras.Model(inputs=inputs, outputs=output)

    model.compile(
          loss=tf.keras.losses.CategoricalCrossentropy(),
          optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
          metrics=['accuracy']
    )

    return model


def _get_serve_tf_examples_fn(model, tf_transform_output):
    """Returns a function that parses a serialized tf.Example

    Parameters
    ----------
    model : tf.keras.Model
        Model to be used during training
    tf_transform_output : TFTransformOutput
        Output from Transform component

    Returns
    -------
    tf.function
        serve_tf_examples_fn
    """

    model.tft_layer = tf_transform_output.transform_features_layer()

    @tf.function
    def serve_tf_examples_fn(serialized_tf_examples):
        """Returns the output to be used in the serving signature

        Parameters
        ----------
        serialized_tf_examples : tf.Example
            Serialized tf.Example to be processed

        Returns
        -------
        dict
            Serving signature
        """
        feature_spec = tf_transform_output.raw_feature_spec()
        feature_spec.pop(_LABEL_KEY)
        parsed_features = tf.io.parse_example(
            serialized_tf_examples, feature_spec
        )

        transformed_features = model.tft_layer(parsed_features)

        outputs = model(transformed_features)
        return {"outputs": outputs}

    return serve_tf_examples_fn


# TFX Trainer will call this function.
def run_fn(fn_args):
    """Train the model based on given args

    Parameters
    ----------
    fn_args : tfx.components.FnArgs
        Arguments used to train the model as name/value pairs.
    """
    tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)

    train_dataset = _input_fn(
      fn_args.train_files, fn_args.data_accessor, 
      tf_transform_output, _BATCH_SIZE
    )
    eval_dataset = _input_fn(
      fn_args.eval_files, fn_args.data_accessor, 
      tf_transform_output, _BATCH_SIZE
    )

    model = _build_keras_model()

    early_stop = tf.keras.callbacks.EarlyStopping(
      monitor='val_accuracy', patience=3
    )

    model.fit(
        train_dataset,
        epochs=20,
        steps_per_epoch=fn_args.train_steps,
        validation_data=eval_dataset,
        validation_steps=fn_args.eval_steps,
        callbacks=[early_stop]
    )
    
    signatures = {
          "serving_default": _get_serve_tf_examples_fn(
              model, tf_transform_output
          ).get_concrete_function(
              tf.TensorSpec(shape=[None], dtype=tf.string, name="examples")
          ),
      }
    model.save(
        fn_args.serving_model_dir, save_format="tf", signatures=signatures
    )

Writing hot_dog_train.py
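The `EarlyStopping` callback in `run_fn` means that training can end before the 20 requested epochs. The sketch below is a simplified pure-Python simulation of that rule with `patience=3`: training stops once the monitored validation accuracy has failed to improve for three consecutive epochs. It ignores options such as `min_delta`, and the function name `early_stop_epoch` and the accuracy values are made up for illustration.

```python
from typing import Optional, Sequence


def early_stop_epoch(
    val_accuracy: Sequence[float], patience: int = 3
) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, value in enumerate(val_accuracy):
        if value > best:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation accuracies, one per epoch.
history = [0.50, 0.60, 0.59, 0.58, 0.60, 0.70]
print(early_stop_epoch(history))
```

In this made-up history the best value (0.60) is reached at epoch 1 and is not exceeded during the next three epochs, so training stops at epoch 4 and the later value of 0.70 is never reached.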


In [8]:
!gsutil cp {HOT_DOG_TRAIN} {MODULE_ROOT}/

Copying file://hot_dog_train.py [Content-Type=text/x-python]...
/ [1 files][  4.5 KiB/  4.5 KiB]                                                
Operation completed over 1 objects/4.5 KiB.                                      


## 4. Migrate Baseline Data
Similar to the interactive pipeline, we can use the baseline data set to get an initial model. The `baseline.tfrecord` file was [uploaded to the data bucket using the console](https://cloud.google.com/storage/docs/uploading-objects#uploading-an-object). If the data set had been pushed to the appropriate GitHub project beforehand, the command line could have been used to transfer the file.


## 5. Write a Vertex AI/Kubeflow Pipelines Definition File
This file is very similar to the Airflow pipeline definition, with a few exceptions. Because the pipeline is orchestrated using Kubeflow Pipelines, the DAG runner code changes and the `beam_pipeline_args` are no longer needed. Because this is a managed Kubeflow Pipelines service, we also no longer need the `metadata_connection_config` parameter; the metadata is handled by Vertex Pipelines.
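The order of the `components` list in the pipeline definition does not determine the execution order; the runner derives it from which component consumes which output. As a sketch of that idea, the dependency map below is transcribed by hand from the component arguments used in this section, and a small topological sort produces one valid execution order. The names `UPSTREAM` and `execution_order` are hypothetical, and this is not how TFX implements scheduling internally.

```python
from collections import deque
from typing import Dict, List

# component -> components whose outputs it consumes
UPSTREAM: Dict[str, List[str]] = {
    "example_gen": [],
    "statistics_gen": ["example_gen"],
    "schema_gen": ["statistics_gen"],
    "example_validator": ["statistics_gen", "schema_gen"],
    "transform": ["example_gen", "schema_gen"],
    "trainer": ["transform", "schema_gen"],
    "model_resolver": [],
    "evaluator": ["example_gen", "trainer", "model_resolver"],
    "pusher": ["trainer", "evaluator"],
}


def execution_order(upstream: Dict[str, List[str]]) -> List[str]:
    """Return one valid execution order (Kahn's algorithm)."""
    remaining = {name: set(deps) for name, deps in upstream.items()}
    ready = deque(name for name, deps in remaining.items() if not deps)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for name, deps in remaining.items():
            if node in deps:
                deps.remove(node)
                if not deps:
                    ready.append(name)
    if len(order) != len(upstream):
        raise ValueError("dependency cycle detected")
    return order


order = execution_order(UPSTREAM)
print(" -> ".join(order))
```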

Running the following code will create the pipeline definition in a JSON file.

In [9]:
"""TFX pipeline orchestrated with Kubeflow Pipelines"""

import os
from typing import List
from tfx import v1 as tfx
import tensorflow_model_analysis as tfma
from tfx.orchestration import metadata
from tfx.orchestration import pipeline

PIPELINE_DEFINITION_FILE = PIPELINE_NAME + '.json'

def _create_pipeline(
    pipeline_name, pipeline_root, data_root, transform_module_file, 
    trainer_module_file, serving_model_dir
):
    """Create a TFX Pipeline"""
    example_gen = tfx.components.ImportExampleGen(input_base=data_root)

    statistics_gen = tfx.components.StatisticsGen(
        examples=example_gen.outputs['examples']
    )
    
    schema_gen = tfx.components.SchemaGen(
        statistics=statistics_gen.outputs['statistics'],
        infer_feature_shape=True
    )
    
    example_validator = tfx.components.ExampleValidator(
        statistics=statistics_gen.outputs['statistics'],
        schema=schema_gen.outputs['schema']
    )
    
    transform = tfx.components.Transform(
        examples=example_gen.outputs['examples'],
        schema=schema_gen.outputs['schema'],
        module_file=transform_module_file
    )
  
    # Uses user-provided Python function that implements a model.
    trainer = tfx.components.Trainer(
        #custom_executor_spec=executor_spec.ExecutorClassSpec(GenericExecutor),
        module_file=trainer_module_file,
        examples=transform.outputs['transformed_examples'],
        transform_graph=transform.outputs['transform_graph'],
        schema=schema_gen.outputs['schema'],
        train_args=tfx.proto.TrainArgs(num_steps=25),
        eval_args=tfx.proto.EvalArgs(num_steps=12)
    )

    model_resolver = tfx.dsl.Resolver(
        strategy_class=tfx.dsl.experimental.LatestBlessedModelStrategy,
        model=tfx.dsl.Channel(type=tfx.types.standard_artifacts.Model),
        model_blessing=tfx.dsl.Channel(
            type=tfx.types.standard_artifacts.ModelBlessing
        )
    ).with_id('latest_blessed_model_resolver')

    eval_config = tfma.EvalConfig(
        model_specs=[tfma.ModelSpec(label_key='label')],
        slicing_specs=[tfma.SlicingSpec()],
        metrics_specs=[
            tfma.MetricsSpec(metrics=[
                tfma.MetricConfig(
                    class_name='CategoricalAccuracy',
                    threshold=tfma.MetricThreshold(
                        value_threshold=tfma.GenericValueThreshold(
                            lower_bound={'value': 0.55}),
                        change_threshold=tfma.GenericChangeThreshold(
                            direction=tfma.MetricDirection.HIGHER_IS_BETTER,
                            absolute={'value': -1e-3}
                        )
                    )
                )
            ])
        ]
    )
    evaluator = tfx.components.Evaluator(
        examples=example_gen.outputs['examples'],
        model=trainer.outputs['model'],
        baseline_model=model_resolver.outputs['model'],
        eval_config=eval_config
    )  
    pusher = tfx.components.Pusher(
        model=trainer.outputs['model'],
        model_blessing=evaluator.outputs['blessing'],
        push_destination=tfx.proto.PushDestination(
            filesystem=tfx.proto.PushDestination.Filesystem(
                base_directory=serving_model_dir
            )
        )
    )
    return tfx.dsl.Pipeline(
        pipeline_name=pipeline_name,
        pipeline_root=pipeline_root,
        components=[
            example_gen,
            statistics_gen,
            schema_gen,
            example_validator,
            transform,
            trainer,
            model_resolver,
            evaluator,
            pusher,
        ],
    )


runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
    config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
    output_filename=PIPELINE_DEFINITION_FILE
)
# Following function will write the pipeline definition to PIPELINE_DEFINITION_FILE.
_ = runner.run(
    _create_pipeline(
        pipeline_name=PIPELINE_NAME,
        pipeline_root=PIPELINE_ROOT,
        data_root=DATA_ROOT,
        transform_module_file=os.path.join(MODULE_ROOT, HOT_DOG_TRANSFORM),
        trainer_module_file=os.path.join(MODULE_ROOT, HOT_DOG_TRAIN),
        serving_model_dir=SERVING_MODEL_DIR,
    )
)

running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying hot_dog_transform.py -> build/lib
installing to /tmp/tmp2qiq8mfu
running install
running install_lib
copying build/lib/hot_dog_transform.py -> /tmp/tmp2qiq8mfu
running install_egg_info
running egg_info
creating tfx_user_code_Transform.egg-info
writing tfx_user_code_Transform.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Transform.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Transform.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Transform.egg-info/SOURCES.txt'
Copying tfx_user_code_Transform.egg-info to /tmp/tmp2qiq8mfu/tfx_user_code_Transform-0.0+10dab3ab3582f483fef29f7a55750a255d7a9015b4b91ab64fb0453901d7a263-py3.7.egg-info
running install_scripts
creating /tmp/tmp2qiq8mfu/tfx_user_code_Transform-0.0+10d



running bdist_wheel
running build
running build_py
creating build
creating build/lib
copying hot_dog_train.py -> build/lib
installing to /tmp/tmppd9suc9i
running install
running install_lib
copying build/lib/hot_dog_train.py -> /tmp/tmppd9suc9i
running install_egg_info
running egg_info
creating tfx_user_code_Trainer.egg-info
writing tfx_user_code_Trainer.egg-info/PKG-INFO
writing dependency_links to tfx_user_code_Trainer.egg-info/dependency_links.txt
writing top-level names to tfx_user_code_Trainer.egg-info/top_level.txt
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
reading manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
writing manifest file 'tfx_user_code_Trainer.egg-info/SOURCES.txt'
Copying tfx_user_code_Trainer.egg-info to /tmp/tmppd9suc9i/tfx_user_code_Trainer-0.0+ee43407f36cb715c678d56e4a3c2cceb6a9f5cc3725870c85bcd6b339c4857e4-py3.7.egg-info
running install_scripts
creating /tmp/tmppd9suc9i/tfx_user_code_Trainer-0.0+ee43407f36cb715c678d56e4a3c2cce
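The `eval_config` above combines two thresholds on `CategoricalAccuracy`: an absolute lower bound of 0.55, and a change threshold that lets the candidate be at most 0.001 worse than the latest blessed model. The sketch below mimics that decision in plain Python. It is a simplified illustration rather than the TFMA implementation: the function name `passes_thresholds` is hypothetical, the behavior exactly at a boundary value is an assumption, and so is skipping the change threshold when no baseline model exists yet.

```python
from typing import Optional

LOWER_BOUND = 0.55   # value_threshold lower_bound
MIN_CHANGE = -1e-3   # change_threshold absolute, HIGHER_IS_BETTER


def passes_thresholds(candidate: float, baseline: Optional[float] = None) -> bool:
    """Decide whether a candidate accuracy would satisfy both thresholds."""
    if candidate < LOWER_BOUND:
        return False
    if baseline is None:
        # First run: there is no blessed model to compare against.
        return True
    return (candidate - baseline) > MIN_CHANGE


print(passes_thresholds(0.60))        # first run, above the lower bound
print(passes_thresholds(0.60, 0.70))  # clearly worse than the baseline
print(passes_thresholds(0.70, 0.60))  # better than the baseline
```

Only a model that passes is blessed by the Evaluator, and the Pusher copies only blessed models to `SERVING_MODEL_DIR`.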



## 6. Start Vertex AI Pipeline
The pipeline definition file generated in the previous step can be submitted with the Vertex AI SDK (`google-cloud-aiplatform`) to start a run of the pipeline.

In [10]:
from google.cloud import aiplatform
from google.cloud.aiplatform import pipeline_jobs

aiplatform.init(project=GOOGLE_CLOUD_PROJECT, location=GOOGLE_CLOUD_REGION)

job = pipeline_jobs.PipelineJob(
    template_path=PIPELINE_DEFINITION_FILE, display_name=PIPELINE_NAME
)

job.run(
    sync=False, 
    service_account='hot-dog-pipeline-service@hot-dog-pipeline.iam.gserviceaccount.com'
)

Once the pipeline has been successfully submitted, we can go to __Vertex AI > Pipelines__ to view the progress.

## 7. Continuous Training
As in the Airflow example, we could set up Cloud Scheduler so the pipeline [executes on a schedule](https://cloud.google.com/vertex-ai/docs/pipelines/schedule-cloud-scheduler). However, in this continuous training example we will set up the pipeline to rerun whenever a new file is uploaded to a storage bucket of interest. This is an example of a [Cloud Functions event-driven function](https://cloud.google.com/functions/docs/writing/write-event-driven-functions#background-functions).

We need to transfer our pipeline definition file (created in step 5) to a storage bucket so we can continue to access it once this notebook instance is no longer available.

In [11]:
# Paths for pipeline definition file.
DEFINITION_ROOT = 'gs://{}/pipeline_definition/{}'.format(GCS_BUCKET_NAME, PIPELINE_NAME)

!gsutil cp {PIPELINE_DEFINITION_FILE} {DEFINITION_ROOT}/

Copying file://hot-dog-vertex-pipelines.json [Content-Type=application/json]...
/ [1 files][ 18.7 KiB/ 18.7 KiB]                                                
Operation completed over 1 objects/18.7 KiB.                                     


We need two files to deploy a Cloud Functions function: [`main.py` and `requirements.txt`](https://cloud.google.com/functions/docs/writing#directory-structure). The two files are put into a separate folder.

The `main.py` file contains the code to run when the function is triggered (i.e. a new file is put into the bucket). In our case we want to rerun the pipeline, so we wrap the code that starts the pipeline in a function (`pipeline_run`) and call that function from the entry point (`trigger_pipeline`).


In [12]:
# exist_ok avoids an error if this cell is run more than once
os.makedirs('cloud_function', exist_ok=True)

In [13]:
CLOUD_FUNCTION_MAIN = 'cloud_function/main.py'

In [14]:
%%writefile {CLOUD_FUNCTION_MAIN}
from google.cloud import aiplatform
from google.cloud.aiplatform import pipeline_jobs

PIPELINE_DEFINITION_FILE = 'gs://hot-dog-pipeline-gcs/pipeline_definition/hot-dog-vertex-pipelines/hot-dog-vertex-pipelines.json'
GOOGLE_CLOUD_PROJECT = 'hot-dog-pipeline'
GOOGLE_CLOUD_REGION = 'us-central1'
PIPELINE_NAME = 'hot-dog-vertex-pipelines'

def pipeline_run():
    aiplatform.init(project=GOOGLE_CLOUD_PROJECT, location=GOOGLE_CLOUD_REGION)

    job = pipeline_jobs.PipelineJob(
        template_path=PIPELINE_DEFINITION_FILE, display_name=PIPELINE_NAME
    )

    job.run(
        sync=False, 
        service_account='hot-dog-pipeline-service@hot-dog-pipeline.iam.gserviceaccount.com'
    )


def trigger_pipeline(event, context):
    """Entry point for Cloud Function

    Parameters
    ----------
    event : dict
        Event payload
    context : google.cloud.functions.Context
         Metadata for the event.
    """
    pipeline_run()

Writing cloud_function/main.py
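As written, `trigger_pipeline` ignores its `event` argument, so every object finalized in the data bucket starts a pipeline run. For a Cloud Storage trigger, the event payload is a dictionary that includes the `bucket` and `name` of the object. The sketch below shows one optional refinement that is not part of the original function: starting a run only for files with a given suffix. The helper name `should_trigger` and the `.tfrecord` suffix are assumptions made for illustration.

```python
def should_trigger(event: dict, suffix: str = ".tfrecord") -> bool:
    """Return True when the finalized object looks like a training data file."""
    name = event.get("name", "")
    return name.endswith(suffix)


# Minimal stand-in for the payload delivered by a storage trigger.
sample_event = {"bucket": "hot-dog-pipeline-gcs-data", "name": "baseline.tfrecord"}

if should_trigger(sample_event):
    print("would start the pipeline for", sample_event["name"])
```

Inside `trigger_pipeline`, such a check would wrap the call to `pipeline_run()`.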


The `requirements.txt` file lists the libraries needed for `main.py` to run. In our case, we only need the `google-cloud-aiplatform` library.

In [15]:
CLOUD_FUNCTION_REQUIREMENTS = 'cloud_function/requirements.txt'

In [16]:
%%writefile {CLOUD_FUNCTION_REQUIREMENTS}
google-cloud-aiplatform>=1.17.0

Writing cloud_function/requirements.txt


We can now create a Cloud Functions function for our continuous training. The function will watch the storage bucket of interest (i.e. gs://hot-dog-pipeline-gcs-data). If any new files are observed (i.e. --trigger-event=google.storage.object.finalize), the pipeline will start (i.e. `trigger_pipeline` in `main.py`) and the model will be retrained.

In [18]:
  !gcloud functions deploy pipeline_run \
  --region=us-central1 \
  --entry-point=trigger_pipeline \
  --runtime=python37 \
  --service-account=hot-dog-pipeline-service@hot-dog-pipeline.iam.gserviceaccount.com \
  --source=cloud_function \
  --timeout=400 \
  --trigger-resource=gs://hot-dog-pipeline-gcs-data \
  --trigger-event=google.storage.object.finalize

Deploying function (may take a while - up to 2 minutes)...⠹                    
For Cloud Build Logs, visit: https://console.cloud.google.com/cloud-build/builds;region=us-central1/6d584940-ec55-4133-8a7f-2fad2af7c97f?project=147591496135
Deploying function (may take a while - up to 2 minutes)...done.                
availableMemoryMb: 256
buildId: 6d584940-ec55-4133-8a7f-2fad2af7c97f
buildName: projects/147591496135/locations/us-central1/builds/6d584940-ec55-4133-8a7f-2fad2af7c97f
dockerRegistry: CONTAINER_REGISTRY
entryPoint: trigger_pipeline
eventTrigger:
  eventType: google.storage.object.finalize
  failurePolicy: {}
  resource: projects/_/buckets/hot-dog-pipeline-gcs-data
  service: storage.googleapis.com
ingressSettings: ALLOW_ALL
labels:
  deployment-tool: cli-gcloud
name: projects/hot-dog-pipeline/locations/us-central1/functions/pipeline_run
runtime: python37
serviceAccountEmail: hot-dog-pipeline-service@hot-dog-pipeline.iam.gserviceaccount.com
sourceUploadUrl: https://storage.g