# 02 - ML Experimentation with Custom Model

The purpose of this notebook is to use [custom training](https://cloud.google.com/ai-platform-unified/docs/training/custom-training) to train a Keras classifier that predicts whether a given trip will result in a tip > 20%. The notebook covers the following tasks:
1. Preprocess the data locally using Apache Beam.
2. Submit a Dataflow job to preprocess the data at scale.
3. Submit a custom training job to Vertex AI using a [pre-built container](https://cloud.google.com/ai-platform-unified/docs/training/pre-built-containers).
4. Upload the trained model to Vertex AI.
5. Track experiment parameters from [Vertex AI Metadata](https://cloud.google.com/vertex-ai/docs/ml-metadata/introduction).
6. Submit a [hyperparameter tuning job](https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview) to Vertex AI.

We use [Vertex TensorBoard](https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview)
and [Vertex ML Metadata](https://cloud.google.com/vertex-ai/docs/ml-metadata/introduction) to track, visualize, and compare ML experiments.
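
To make the prediction target concrete, the sketch below derives a binary label from a trip's tip and fare. It assumes that "tip > 20%" means the tip exceeds 20% of the fare; the function name `tip_label` and the `threshold` parameter are illustrative and are not taken from `src`.

```python
def tip_label(tip: float, fare: float, threshold: float = 0.2) -> int:
    """Return 1 if the tip exceeds `threshold` of the fare, else 0.

    Assumption: the label compares the tip to the fare. Trips with a
    non-positive fare are labeled 0 to avoid dividing by zero.
    """
    if fare <= 0:
        return 0
    return int(tip / fare > threshold)


print(tip_label(tip=3.0, fare=10.0))  # a 30% tip
print(tip_label(tip=1.0, fare=10.0))  # a 10% tip
```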

## Setup

### Import libraries

The **preprocessing** step has been implemented in 'src/preprocessing'.

In [1]:
import os
import logging
from datetime import datetime
import numpy as np

import tensorflow as tf
import tensorflow_transform as tft
import tensorflow.keras as keras

from google.cloud import aiplatform as vertex_ai
from google.cloud.aiplatform import hyperparameter_tuning as hp_tuning

from src.common import features, datasource_utils
from src.model_training import data, model, defaults, trainer, exporter
from src.preprocessing import etl

logging.getLogger().setLevel(logging.INFO)
tf.get_logger().setLevel('INFO')

print(f"TensorFlow: {tf.__version__}")
print(f"TensorFlow Transform: {tft.__version__}")

2022-03-30 12:54:57.508230: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0


INFO:apache_beam.typehints.native_type_compatibility:Using Any for unsupported type: typing.Sequence[~T]
TensorFlow: 2.5.3
TensorFlow Transform: 1.2.0


### Setup Google Cloud project

When a service account is not explicitly provided, Vertex AI relies on the **Compute Engine default service account (SA)**.

In [2]:
PROJECT = 'grandelli-demo-295810' # Change to your project id.
REGION = 'us-central1' # Change to your region.
BUCKET = 'grandelli-demo-295810-partner-training-2022' # Change to your bucket name.
SERVICE_ACCOUNT = "155283586619-compute@developer.gserviceaccount.com"

if PROJECT == "" or PROJECT is None or PROJECT == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT = shell_output[0]
    
if SERVICE_ACCOUNT == "" or SERVICE_ACCOUNT is None or SERVICE_ACCOUNT == "[your-service-account]":
    # Get your GCP account from gcloud
    shell_output = !gcloud config list --format 'value(core.account)' 2>/dev/null
    SERVICE_ACCOUNT = shell_output[0]
    
if BUCKET == "" or BUCKET is None or BUCKET == "[your-bucket-name]":
    # Default the bucket name to the GCP project ID
    BUCKET = PROJECT
    # Try to create the bucket if it doesn't exist
    ! gsutil mb -l $REGION gs://$BUCKET
    print("")
    
PARENT = f"projects/{PROJECT}/locations/{REGION}"
    
print("Project ID:", PROJECT)
print("Region:", REGION)
print("Bucket name:", BUCKET)
print("Service Account:", SERVICE_ACCOUNT)
print("Vertex API Parent URI:", PARENT)

Project ID: grandelli-demo-295810
Region: us-central1
Bucket name: grandelli-demo-295810-partner-training-2022
Service Account: 155283586619-compute@developer.gserviceaccount.com
Vertex API Parent URI: projects/grandelli-demo-295810/locations/us-central1
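
The Compute Engine default service account follows a fixed naming pattern based on the numeric project number (not the project ID), which is why the value above starts with `155283586619`. The helper below is a small sketch of that pattern; the function name is hypothetical. Note also that the `core.account` fallback in the cell above returns the active gcloud account, which may be a user account rather than a service account.

```python
def default_compute_service_account(project_number: str) -> str:
    """Build the Compute Engine default service account email.

    The email is derived from the numeric project number, not the project ID.
    """
    return f"{project_number}-compute@developer.gserviceaccount.com"


print(default_compute_service_account("155283586619"))
```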


### Set configurations

In [3]:
VERSION = 'v01'
DATASET_DISPLAY_NAME = 'chicago-taxi-tips'
MODEL_DISPLAY_NAME = f'{DATASET_DISPLAY_NAME}-classifier-{VERSION}'

WORKSPACE = f'gs://{BUCKET}/{DATASET_DISPLAY_NAME}'
EXPERIMENT_ARTIFACTS_DIR = os.path.join(WORKSPACE, 'experiments')
RAW_SCHEMA_LOCATION = 'src/raw_schema/schema.pbtxt'

TENSORBOARD_DISPLAY_NAME = f'tb-{DATASET_DISPLAY_NAME}'
EXPERIMENT_NAME = f'{MODEL_DISPLAY_NAME}'
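
The configuration above builds `gs://` URIs with `os.path.join`, which uses the separator of the local operating system. That works on Linux notebooks, but on Windows it would insert backslashes. As a hedged alternative, the sketch below joins segments with `posixpath`, which always uses forward slashes; `gcs_join` and the bucket name are hypothetical.

```python
import posixpath


def gcs_join(base_uri: str, *parts: str) -> str:
    """Join path segments onto a gs:// URI using forward slashes only."""
    return posixpath.join(base_uri, *parts)


workspace = "gs://my-bucket/chicago-taxi-tips"  # hypothetical bucket name
print(gcs_join(workspace, "experiments"))
```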

## Create Vertex TensorBoard instance 

TensorBoard (TB) is Google's open source tool for visualizing machine learning experiments. **Vertex AI TensorBoard** is an enterprise-ready, managed version of it.

If you use custom training to train models, you can set up your training job to automatically upload its TensorBoard logs to Vertex AI TensorBoard.

In [4]:
tensorboard_resource = vertex_ai.Tensorboard.create(display_name=TENSORBOARD_DISPLAY_NAME)
tensorboard_resource_name = tensorboard_resource.gca_resource.name
print("TensorBoard resource name:", tensorboard_resource_name)

INFO:google.cloud.aiplatform.tensorboard.tensorboard:Creating Tensorboard
INFO:google.cloud.aiplatform.tensorboard.tensorboard:Create Tensorboard backing LRO: projects/155283586619/locations/us-central1/tensorboards/1155824215304175616/operations/6348477403160903680
INFO:google.cloud.aiplatform.tensorboard.tensorboard:Tensorboard created. Resource name: projects/155283586619/locations/us-central1/tensorboards/1155824215304175616
INFO:google.cloud.aiplatform.tensorboard.tensorboard:To use this Tensorboard in another session:
INFO:google.cloud.aiplatform.tensorboard.tensorboard:tb = aiplatform.Tensorboard('projects/155283586619/locations/us-central1/tensorboards/1155824215304175616')


TensorBoard resource name: projects/155283586619/locations/us-central1/tensorboards/1155824215304175616


## Initialize workspace

In [5]:
REMOVE_EXPERIMENT_ARTIFACTS = False

if tf.io.gfile.exists(EXPERIMENT_ARTIFACTS_DIR) and REMOVE_EXPERIMENT_ARTIFACTS:
    print("Removing previous experiment artifacts...")
    tf.io.gfile.rmtree(EXPERIMENT_ARTIFACTS_DIR)

if not tf.io.gfile.exists(EXPERIMENT_ARTIFACTS_DIR):
    print("Creating new experiment artifacts directory...")
    tf.io.gfile.mkdir(EXPERIMENT_ARTIFACTS_DIR)

print("Workspace is ready.")
print("Experiment directory:", EXPERIMENT_ARTIFACTS_DIR)

Workspace is ready.
Experiment directory: gs://grandelli-demo-295810-partner-training-2022/chicago-taxi-tips/experiments


## Start a new Vertex AI experiment run

We set the experiment when initializing the Vertex AI SDK, then start a new run under it.

In [10]:
vertex_ai.init(
    project=PROJECT,
    location=REGION,
    staging_bucket=BUCKET,
    experiment=EXPERIMENT_NAME)

run_id = f"run-gcp-{datetime.now().strftime('%Y%m%d%H%M%S')}"
vertex_ai.start_run(run_id)  # Start a new run under the experiment.

EXPERIMENT_RUN_DIR = os.path.join(EXPERIMENT_ARTIFACTS_DIR, EXPERIMENT_NAME, run_id)
print("Experiment run directory:", EXPERIMENT_RUN_DIR)

INFO:root:Resource chicago-taxi-tips-classifier-v01-run-gcp-20220330125601 not found.
INFO:root:Creating Resource chicago-taxi-tips-classifier-v01-run-gcp-20220330125601


INFO:root:Resource chicago-taxi-tips-classifier-v01-run-gcp-20220330125601-metrics not found.
INFO:root:Creating Resource chicago-taxi-tips-classifier-v01-run-gcp-20220330125601-metrics


Experiment run directory: gs://grandelli-demo-295810-partner-training-2022/chicago-taxi-tips/experiments/chicago-taxi-tips-classifier-v01/run-gcp-20220330125601
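
The run ID above is derived from a timestamp with one-second resolution, so two runs started within the same second would get the same ID. The sketch below isolates that logic in a function with an injectable clock, which makes it easy to test; `make_run_id` is a hypothetical helper, not part of the notebook or the SDK.

```python
from datetime import datetime
from typing import Optional


def make_run_id(now: Optional[datetime] = None, prefix: str = "run-gcp") -> str:
    """Build a run ID such as 'run-gcp-20220330125601' from a timestamp."""
    moment = now or datetime.now()
    return f"{prefix}-{moment.strftime('%Y%m%d%H%M%S')}"


print(make_run_id(datetime(2022, 3, 30, 12, 56, 1)))  # prints run-gcp-20220330125601
```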


## 3. Submit a Data Processing Job to Dataflow

In [11]:
EXPORTED_DATA_PREFIX = os.path.join(EXPERIMENT_RUN_DIR, 'exported_data')
TRANSFORMED_DATA_PREFIX = os.path.join(EXPERIMENT_RUN_DIR, 'transformed_data')
TRANSFORM_ARTIFACTS_DIR = os.path.join(EXPERIMENT_RUN_DIR, 'transform_artifacts')

We use BigQuery utility functions defined in 'src/common' to build the query that reads the raw data.

In [12]:
ML_USE = 'UNASSIGNED'
LIMIT = 1000000
raw_data_query = datasource_utils.get_training_source_query(
    project=PROJECT, 
    region=REGION, 
    dataset_display_name=DATASET_DISPLAY_NAME, 
    ml_use=ML_USE, 
    limit=LIMIT
)

etl_job_name = f"etl-{MODEL_DISPLAY_NAME}-{run_id}"

args = {
    'job_name': etl_job_name,
    'runner': 'DataflowRunner',
    'raw_data_query': raw_data_query,
    'exported_data_prefix': EXPORTED_DATA_PREFIX,
    'transformed_data_prefix': TRANSFORMED_DATA_PREFIX,
    'transform_artifact_dir': TRANSFORM_ARTIFACTS_DIR,
    'write_raw_data': False,
    'temporary_dir': os.path.join(WORKSPACE, 'tmp'),
    'gcs_location': os.path.join(WORKSPACE, 'bq_tmp'),
    'project': PROJECT,
    'region': REGION,
    'setup_file': './setup.py'
}

This is how you can log parameters related to an experiment run. For details, see the [SDK reference](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_get_experiment_df).

In [13]:
vertex_ai.log_params(args)
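
The `args` dictionary mixes strings, a boolean, and a long SQL query. As a hedged sketch, the helper below normalizes such a dictionary to short scalar values before logging. It assumes that scalar values (int, float, or str) are what you want in the experiment table and that truncating long strings is acceptable; `to_loggable_params` and `max_len` are hypothetical names.

```python
from typing import Any, Dict, Union

Scalar = Union[int, float, str]


def to_loggable_params(params: Dict[str, Any], max_len: int = 200) -> Dict[str, Scalar]:
    """Convert arbitrary parameter values to short int, float, or str values."""
    loggable: Dict[str, Scalar] = {}
    for key, value in params.items():
        # bool is a subclass of int, so it is converted to text explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float, str)):
            value = str(value)
        # Long strings (for example a SQL query) are truncated.
        if isinstance(value, str) and len(value) > max_len:
            value = value[:max_len] + "..."
        loggable[key] = value
    return loggable


print(to_loggable_params({"write_raw_data": False, "limit": 1000000}))
```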

The preprocessing pipeline uses three components: Apache Beam (running on Dataflow), [TF Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started), and [TF Transform](https://www.tensorflow.org/tfx/transform/get_started).

In [14]:
logging.getLogger().setLevel(logging.ERROR)

print("Data preprocessing started...")
etl.run_transform_pipeline(args)
print("Data preprocessing completed.")

Data preprocessing started...

2022-03-30 12:56:38.324586: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-03-30 12:56:38.324638: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2022-03-30 12:56:38.324673: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (vm-508f776f-5512-42a9-b529-8989959ded1b): /proc/driver/nvidia/version does not exist
2022-03-30 12:56:38.324974: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  



2022-03-30 12:56:53.387389: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-03-30 12:56:53.388307: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2299995000 Hz

Data preprocessing completed.
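
TF Transform separates preprocessing into an analyze phase, which makes a full pass over the data to compute statistics, and a transform phase, which applies those statistics row by row. The pure-Python sketch below illustrates that split for z-score scaling. It is not the code in 'src/preprocessing'; the function names and sample values are made up for illustration.

```python
import statistics
from typing import Dict, List


def analyze(values: List[float]) -> Dict[str, float]:
    """Full pass over the data: compute the statistics needed for scaling."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}


def transform(value: float, stats: Dict[str, float]) -> float:
    """Per-row step: apply the precomputed statistics to a single value."""
    if stats["std"] == 0:
        return 0.0
    return (value - stats["mean"]) / stats["std"]


trip_miles = [1.0, 2.0, 3.0, 4.0]  # made-up values
stats = analyze(trip_miles)
print([round(transform(v, stats), 3) for v in trip_miles])
```

Because the statistics are computed once and stored, the same transformation can be replayed at serving time, which is the role of the transform artifacts written by the pipeline.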


In [15]:
!gsutil ls {EXPERIMENT_RUN_DIR}

gs://grandelli-demo-295810-partner-training-2022/chicago-taxi-tips/experiments/chicago-taxi-tips-classifier-v01/run-gcp-20220330125601/
gs://grandelli-demo-295810-partner-training-2022/chicago-taxi-tips/experiments/chicago-taxi-tips-classifier-v01/run-gcp-20220330125601/transform_artifacts/
gs://grandelli-demo-295810-partner-training-2022/chicago-taxi-tips/experiments/chicago-taxi-tips-classifier-v01/run-gcp-20220330125601/transformed_data/


## 4. Submit a Custom Training Job to Vertex AI

In [16]:
LOG_DIR = os.path.join(EXPERIMENT_RUN_DIR, 'logs')
EXPORT_DIR = os.path.join(EXPERIMENT_RUN_DIR, 'model')

### Prepare training package

You can train custom models using a custom Python script, custom Python package, or container. We go for **Python package**.

Note that the model was developed independently (for example, by a data scientist) and does not depend on Vertex AI or on pipelines. It is based on TensorFlow and stored in 'src/model_training'.

In [17]:
TRAINER_PACKAGE_DIR = os.path.join(WORKSPACE, 'trainer_packages')
TRAINER_PACKAGE_NAME = f'{MODEL_DISPLAY_NAME}_trainer'
print("Trainer package upload location:", TRAINER_PACKAGE_DIR)

Trainer package upload location: gs://grandelli-demo-295810-partner-training-2022/chicago-taxi-tips/trainer_packages


In [18]:
!rm -r src/__pycache__/
!rm -r src/.ipynb_checkpoints/
!rm -r src/raw_schema/.ipynb_checkpoints/
!rm -f {TRAINER_PACKAGE_NAME}.tar {TRAINER_PACKAGE_NAME}.tar.gz

!mkdir {TRAINER_PACKAGE_NAME}

!cp setup.py {TRAINER_PACKAGE_NAME}/
!cp -r src {TRAINER_PACKAGE_NAME}/
!tar cvf {TRAINER_PACKAGE_NAME}.tar {TRAINER_PACKAGE_NAME}
!gzip {TRAINER_PACKAGE_NAME}.tar
!gsutil cp {TRAINER_PACKAGE_NAME}.tar.gz {TRAINER_PACKAGE_DIR}/
!rm -r {TRAINER_PACKAGE_NAME}
!rm -r {TRAINER_PACKAGE_NAME}.tar.gz

rm: cannot remove 'src/.ipynb_checkpoints/': No such file or directory


rm: cannot remove 'src/raw_schema/.ipynb_checkpoints/': No such file or directory


chicago-taxi-tips-classifier-v01_trainer/
chicago-taxi-tips-classifier-v01_trainer/src/
chicago-taxi-tips-classifier-v01_trainer/src/raw_schema/
chicago-taxi-tips-classifier-v01_trainer/src/raw_schema/schema.pbtxt
chicago-taxi-tips-classifier-v01_trainer/src/pipeline_triggering/
chicago-taxi-tips-classifier-v01_trainer/src/pipeline_triggering/main.py
chicago-taxi-tips-classifier-v01_trainer/src/pipeline_triggering/requirements.txt
chicago-taxi-tips-classifier-v01_trainer/src/pipeline_triggering/__init__.py
chicago-taxi-tips-classifier-v01_trainer/src/tfx_pipelines/
chicago-taxi-tips-classifier-v01_trainer/src/tfx_pipelines/runner.py
chicago-taxi-tips-classifier-v01_trainer/src/tfx_pipelines/config.py
chicago-taxi-tips-classifier-v01_trainer/src/tfx_pipelines/__pycache__/
chicago-taxi-tips-classifier-v01_trainer/src/tfx_pipelines/__pycache__/config.cpython-37.pyc
chicago-taxi-tips-classifier-v01_trainer/src/tfx_pipelines/__pycache__/training_pipeline.cpython-37.pyc
chicago-taxi-tips-cla

Copying file://chicago-taxi-tips-classifier-v01_trainer.tar.gz [Content-Type=application/x-tar]...
/ [1 files][ 44.3 KiB/ 44.3 KiB]                                                
Operation completed over 1 objects/44.3 KiB.                                     
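
The listing above shows that nested `__pycache__` directories still end up in the archive, because the `rm` commands only clean the top-level folders. As a hedged alternative to the shell commands, the sketch below builds the same kind of `.tar.gz` with Python's `tarfile` module and filters out cache directories at any depth. The function names and the `demo_trainer` package are hypothetical, and uploading to GCS is left out.

```python
import os
import tarfile
import tempfile

EXCLUDED_DIRS = {"__pycache__", ".ipynb_checkpoints"}


def _skip_caches(member: tarfile.TarInfo):
    """Return None (exclude) for members inside cache or checkpoint folders."""
    parts = member.name.split("/")
    return None if EXCLUDED_DIRS.intersection(parts) else member


def build_package(source_dir: str, package_name: str, output_dir: str) -> str:
    """Archive setup.py and src/ under a top-level package folder."""
    archive_path = os.path.join(output_dir, f"{package_name}.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for entry in ("setup.py", "src"):
            archive.add(
                os.path.join(source_dir, entry),
                arcname=f"{package_name}/{entry}",
                filter=_skip_caches,
            )
    return archive_path


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as workdir:
    os.makedirs(os.path.join(workdir, "src", "__pycache__"))
    for rel in ("setup.py", "src/__init__.py", "src/__pycache__/x.pyc"):
        with open(os.path.join(workdir, rel), "w") as handle:
            handle.write("")
    path = build_package(workdir, "demo_trainer", workdir)
    with tarfile.open(path) as archive:
        print(sorted(archive.getnames()))
```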


### Prepare the training job

We use a pre-built image running on CPU only.

In [19]:
TRAIN_RUNTIME = 'tf-cpu.2-5'
TRAIN_IMAGE = f"us-docker.pkg.dev/vertex-ai/training/{TRAIN_RUNTIME}:latest"
print("Training image:", TRAIN_IMAGE)

Training image: us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-5:latest


In [20]:
num_epochs = 10
learning_rate = 0.001
hidden_units = "64,64"

trainer_args = [
    f'--train-data-dir={TRANSFORMED_DATA_PREFIX + "/train/*"}',
    f'--eval-data-dir={TRANSFORMED_DATA_PREFIX + "/eval/*"}',
    f'--tft-output-dir={TRANSFORM_ARTIFACTS_DIR}',
    f'--num-epochs={num_epochs}',
    f'--learning-rate={learning_rate}',
    f'--project={PROJECT}',
    f'--region={REGION}',
    f'--staging-bucket={BUCKET}',
    f'--experiment-name={EXPERIMENT_NAME}'
]
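
These flags are consumed by the training module on the worker. Note that `hidden_units` is defined in the cell above but is not included in `trainer_args`, so the trainer falls back to its own default. The sketch below shows how a trainer entry point might parse such flags with `argparse`, including turning `"64,64"` into a list of integers. It is a hypothetical parser, not the actual code of `src.model_training.task`, and the defaults are illustrative.

```python
import argparse
from typing import List, Optional


def parse_hidden_units(text: str) -> List[int]:
    """Turn a comma-separated string such as '64,64' into [64, 64]."""
    return [int(unit) for unit in text.split(",")]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("--train-data-dir", required=True)
    parser.add_argument("--eval-data-dir", required=True)
    parser.add_argument("--num-epochs", type=int, default=10)
    parser.add_argument("--learning-rate", type=float, default=0.001)
    # Illustrative default; the real trainer may use a different one.
    parser.add_argument("--hidden-units", type=parse_hidden_units, default=[64, 32])
    # Ignore flags this sketch does not know about (for example --project).
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args([
    "--train-data-dir=gs://my-bucket/train/*",  # hypothetical paths
    "--eval-data-dir=gs://my-bucket/eval/*",
    "--hidden-units=64,64",
    "--project=my-project",
])
print(args.hidden_units, args.learning_rate)
```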

Here we specify the Python training module to execute and the hardware configuration to run it on.

In [21]:
package_uri = os.path.join(TRAINER_PACKAGE_DIR, f'{TRAINER_PACKAGE_NAME}.tar.gz')

worker_pool_specs = [
    {
        "replica_count": 1,
        "machine_spec": {
            "machine_type": 'n1-standard-4',
            "accelerator_count": 0
        },
        "python_package_spec": {
            "executor_image_uri": TRAIN_IMAGE,
            "package_uris": [package_uri],
            "python_module": "src.model_training.task",
            "args": trainer_args,
        }
    }
]
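
If you later want to switch between CPU and GPU workers, it can help to build the spec with a small function instead of editing the dictionary by hand. The sketch below is one way to do that, assuming the same dictionary layout as above; `make_worker_pool_spec` is a hypothetical helper, and the image and package URIs in the example are placeholders.

```python
from typing import Any, Dict, List, Optional


def make_worker_pool_spec(
    image_uri: str,
    package_uri: str,
    python_module: str,
    args: List[str],
    machine_type: str = "n1-standard-4",
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
    replica_count: int = 1,
) -> Dict[str, Any]:
    """Build one worker pool spec; accelerator fields are added only if used."""
    machine_spec: Dict[str, Any] = {"machine_type": machine_type}
    if accelerator_type and accelerator_count > 0:
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count
    return {
        "replica_count": replica_count,
        "machine_spec": machine_spec,
        "python_package_spec": {
            "executor_image_uri": image_uri,
            "package_uris": [package_uri],
            "python_module": python_module,
            "args": args,
        },
    }


spec = make_worker_pool_spec(
    image_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-5:latest",
    package_uri="gs://my-bucket/trainer.tar.gz",  # placeholder URI
    python_module="src.model_training.task",
    args=["--num-epochs=10"],
)
print(spec["machine_spec"])
```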

### Submit the training job

The trainer code defines how the trained model is saved: it is exported to a GCS folder.

In [22]:
print("Submitting a custom training job...")

training_job_display_name = f"{TRAINER_PACKAGE_NAME}_{run_id}"

training_job = vertex_ai.CustomJob(
    display_name=training_job_display_name,
    worker_pool_specs=worker_pool_specs,
    base_output_dir=EXPERIMENT_RUN_DIR,
)

training_job.run(
    service_account=SERVICE_ACCOUNT,
    tensorboard=tensorboard_resource_name,
    sync=True
)

Submitting a custom training job...


## 5. Upload exported model to Vertex AI Models

In [23]:
!gsutil ls {EXPORT_DIR}

gs://grandelli-demo-295810-partner-training-2022/chicago-taxi-tips/experiments/chicago-taxi-tips-classifier-v01/run-gcp-20220330125601/model/
gs://grandelli-demo-295810-partner-training-2022/chicago-taxi-tips/experiments/chicago-taxi-tips-classifier-v01/run-gcp-20220330125601/model/keras_metadata.pb
gs://grandelli-demo-295810-partner-training-2022/chicago-taxi-tips/experiments/chicago-taxi-tips-classifier-v01/run-gcp-20220330125601/model/saved_model.pb
gs://grandelli-demo-295810-partner-training-2022/chicago-taxi-tips/experiments/chicago-taxi-tips-classifier-v01/run-gcp-20220330125601/model/assets/
gs://grandelli-demo-295810-partner-training-2022/chicago-taxi-tips/experiments/chicago-taxi-tips-classifier-v01/run-gcp-20220330125601/model/variables/


### Generate the Explanation metadata

In [24]:
explanation_config = features.generate_explanation_config()
explanation_config

{'inputs': {'trip_month': {'input_tensor_name': 'trip_month',
   'encoding': 'IDENTITY',
   'modality': 'categorical'},
  'trip_day': {'input_tensor_name': 'trip_day',
   'encoding': 'IDENTITY',
   'modality': 'categorical'},
  'trip_day_of_week': {'input_tensor_name': 'trip_day_of_week',
   'encoding': 'IDENTITY',
   'modality': 'categorical'},
  'trip_hour': {'input_tensor_name': 'trip_hour',
   'encoding': 'IDENTITY',
   'modality': 'categorical'},
  'trip_seconds': {'input_tensor_name': 'trip_seconds', 'modality': 'numeric'},
  'trip_miles': {'input_tensor_name': 'trip_miles', 'modality': 'numeric'},
  'payment_type': {'input_tensor_name': 'payment_type',
   'encoding': 'IDENTITY',
   'modality': 'categorical'},
  'pickup_grid': {'input_tensor_name': 'pickup_grid',
   'encoding': 'IDENTITY',
   'modality': 'categorical'},
  'dropoff_grid': {'input_tensor_name': 'dropoff_grid',
   'encoding': 'IDENTITY',
   'modality': 'categorical'},
  'euclidean': {'input_tensor_name': 'euclidean'

### Upload model

We upload the model saved in GCS to Vertex AI and specify the serving environment, which is typically the same as the training environment.

In [25]:
SERVING_RUNTIME='tf2-cpu.2-5'
SERVING_IMAGE = f"us-docker.pkg.dev/vertex-ai/prediction/{SERVING_RUNTIME}:latest"
print("Serving image:", SERVING_IMAGE)

Serving image: us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-5:latest


In [26]:
explanation_metadata = vertex_ai.explain.ExplanationMetadata(
    inputs=explanation_config["inputs"],
    outputs=explanation_config["outputs"],
)
explanation_parameters = vertex_ai.explain.ExplanationParameters(
    explanation_config["params"]
)

vertex_model = vertex_ai.Model.upload(
    display_name=MODEL_DISPLAY_NAME,
    artifact_uri=EXPORT_DIR,
    serving_container_image_uri=SERVING_IMAGE,
    parameters_schema_uri=None,
    instance_schema_uri=None,
    explanation_metadata=explanation_metadata,
    explanation_parameters=explanation_parameters,
    labels={
        'dataset_name': DATASET_DISPLAY_NAME,
        'experiment': run_id
    }
)

In [27]:
vertex_model.gca_resource

name: "projects/155283586619/locations/us-central1/models/2580162364250783744"
display_name: "chicago-taxi-tips-classifier-v01"
predict_schemata {
}
metadata {
}
container_spec {
  image_uri: "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-5:latest"
}
supported_deployment_resources_types: DEDICATED_RESOURCES
supported_input_storage_formats: "jsonl"
supported_input_storage_formats: "csv"
supported_input_storage_formats: "tf-record"
supported_input_storage_formats: "tf-record-gzip"
supported_input_storage_formats: "file-list"
supported_output_storage_formats: "jsonl"
create_time {
  seconds: 1648649585
  nanos: 538720000
}
update_time {
  seconds: 1648649815
  nanos: 15556000
}
etag: "AMEw9yM3N5vAkZNva6ZMHBIJn_lJj0Hfy13u6VhuPMB_2weaFi2wPj9_p7iFGlqgPwc="
labels {
  key: "dataset_name"
  value: "chicago-taxi-tips"
}
labels {
  key: "experiment"
  value: "run-gcp-20220330125601"
}
supported_export_formats {
  id: "custom-trained"
  exportable_contents: ARTIFACT
}
explanation_spec {
  para

## 6. Extract experiment run parameters

This is metadata again, but this time it describes the experiment runs rather than the model.

In [28]:
experiment_df = vertex_ai.get_experiment_df()
experiment_df = experiment_df[experiment_df.experiment_name == EXPERIMENT_NAME]
experiment_df.T

| | 0 | 1 | 2 |
|---|---|---|---|
| experiment_name | chicago-taxi-tips-classifier-v01 | chicago-taxi-tips-classifier-v01 | chicago-taxi-tips-classifier-v01 |
| run_name | run-gcp-20220330140544 | run-gcp-20220330125601 | run-gcp-20220330125531 |
| param.model_dir | gs://grandelli-demo-295810-partner-training-20... | | |
| param.region | us-central1 | us-central1 | |
| param.staging_bucket | grandelli-demo-295810-partner-training-2022 | | |
| param.train_data_dir | gs://grandelli-demo-295810-partner-training-20... | | |
| param.experiment_name | chicago-taxi-tips-classifier-v01 | | |
| param.learning_rate | 0.001 | | |
| param.hidden_units | [64.0, 32.0] | | |
| param.run_name | | | |


In [30]:
print("Vertex AI Experiments:")
print(
    f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/experiments/{EXPERIMENT_NAME}/metrics?project={PROJECT}"
)

Vertex AI Experiments:
https://console.cloud.google.com/vertex-ai/locations/us-central1/experiments/chicago-taxi-tips-classifier-v01/metrics?project=grandelli-demo-295810


## 7. Submit a Hyperparameter Tuning Job to Vertex AI

For more information about configuring a hyperparameter study, refer to [Vertex AI Hyperparameter job configuration](https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning).

### Configure a hyperparameter job

In [31]:
metric_spec = {
    'ACCURACY': 'maximize'
}

parameter_spec = {
    'learning-rate': hp_tuning.DoubleParameterSpec(min=0.0001, max=0.01, scale='log'),
    'hidden-units': hp_tuning.CategoricalParameterSpec(values=["32,32", "64,64", "128,128"])
}
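
The learning rate range spans two orders of magnitude, which is why it uses `scale='log'`: on a log scale, values between 0.0001 and 0.001 get as much attention as values between 0.001 and 0.01. The sketch below illustrates the idea with plain random sampling in log space. It only demonstrates log scaling and is not how Vertex AI's search algorithm picks trials; `sample_log_uniform` is a hypothetical helper.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Sample uniformly in log space, then map back to the original scale."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(0.0001, 0.01, rng) for _ in range(1000)]

# 0.001 is the geometric midpoint of the range, so roughly half of the
# samples should fall below it.
below_midpoint = sum(s < 0.001 for s in samples) / len(samples)
print(f"Share of samples below 0.001: {below_midpoint:.2f}")
```

With linear sampling over the same range, only about 9% of the samples would fall below 0.001.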

In [32]:
tuning_job_display_name = f"hpt_{TRAINER_PACKAGE_NAME}_{run_id}"

hp_tuning_job = vertex_ai.HyperparameterTuningJob(
    display_name=tuning_job_display_name,
    custom_job=training_job,
    metric_spec=metric_spec,
    parameter_spec=parameter_spec,
    max_trial_count=4,
    parallel_trial_count=2,
    search_algorithm=None # Bayesian optimization.
)

### Submit the hyperparameter tuning job

In [33]:
print("Submitting a hyperparameter tuning job...")

hp_tuning_job.run(
    service_account=SERVICE_ACCOUNT,
    tensorboard=tensorboard_resource_name,
    restart_job_on_worker_restart=False,
    sync=True,
)

Submitting a hyperparameter tuning job...


### Retrieve trial results

In [34]:
hp_tuning_job.trials

[id: "1"
 state: SUCCEEDED
 parameters {
   parameter_id: "hidden-units"
   value {
     string_value: "64,64"
   }
 }
 parameters {
   parameter_id: "learning-rate"
   value {
     number_value: 0.0010000000000000002
   }
 }
 final_measurement {
   step_count: 5120
   metrics {
     metric_id: "ACCURACY"
     value: 0.8840206861495972
   }
 }
 start_time {
   seconds: 1648649922
   nanos: 562028496
 }
 end_time {
   seconds: 1648650359
 },
 id: "2"
 state: SUCCEEDED
 parameters {
   parameter_id: "hidden-units"
   value {
     string_value: "128,128"
   }
 }
 parameters {
   parameter_id: "learning-rate"
   value {
     number_value: 0.0027540527548299987
   }
 }
 final_measurement {
   step_count: 5120
   metrics {
     metric_id: "ACCURACY"
     value: 0.8848657608032227
   }
 }
 start_time {
   seconds: 1648649922
   nanos: 562176393
 }
 end_time {
   seconds: 1648650352
 },
 id: "3"
 state: SUCCEEDED
 parameters {
   parameter_id: "hidden-units"
   value {
     string_value: "128,

In [35]:
best_trial = sorted(
    hp_tuning_job.trials, 
    key=lambda trial: trial.final_measurement.metrics[0].value, 
    reverse=True
)[0]

print("Best trial ID:", best_trial.id)
print("Validation Accuracy:", best_trial.final_measurement.metrics[0].value)
print("Hyperparameter Values:")
for parameter in best_trial.parameters:
    print(f" - {parameter.parameter_id}:{parameter.value}")

Best trial ID: 3
Validation Accuracy: 0.8852288722991943
Hyperparameter Values:
 - hidden-units:128,128
 - learning-rate:0.000331131598752814
