# Deploy Query Model to online endpoint with Monitoring

### Steps in this notebook:

* Deploy the query model to an online prediction endpoint
* Set up model monitoring for the online prediction endpoint

## Load env config

In [1]:
# naming convention for all cloud resources
VERSION        = "v1"                  # TODO
PREFIX         = f'ndr-{VERSION}'      # TODO

print(f"PREFIX = {PREFIX}")

PREFIX = ndr-v1


In [2]:
# staging GCS
GCP_PROJECTS             = !gcloud config get-value project
PROJECT_ID               = GCP_PROJECTS[0]

# GCS bucket and paths
BUCKET_NAME              = f'{PREFIX}-{PROJECT_ID}-bucket'
BUCKET_URI               = f'gs://{BUCKET_NAME}'

config = !gsutil cat {BUCKET_URI}/config/notebook_env.py
print(config.n)
exec(config.n)


PROJECT_ID               = "hybrid-vertex"
PROJECT_NUM              = "934903580331"
LOCATION                 = "us-central1"

REGION                   = "us-central1"
BQ_LOCATION              = "US"
VPC_NETWORK_NAME         = "ucaip-haystack-vpc-network"

VERTEX_SA                = "934903580331-compute@developer.gserviceaccount.com"

PREFIX                   = "ndr-v1"
VERSION                  = "v1"

APP                      = "sp"
MODEL_TYPE               = "2tower"
FRAMEWORK                = "tfrs"
DATA_VERSION             = "v1"
TRACK_HISTORY            = "5"

BUCKET_NAME              = "ndr-v1-hybrid-vertex-bucket"
BUCKET_URI               = "gs://ndr-v1-hybrid-vertex-bucket"
SOURCE_BUCKET            = "spotify-million-playlist-dataset"

DATA_GCS_PREFIX          = "data"
DATA_PATH                = "gs://ndr-v1-hybrid-vertex-bucket/data"
VOCAB_SUBDIR             = "vocabs"
VOCAB_FILENAME           = "vocab_dict.pkl"

CANDIDATE_PREFIX         = "candidates"
TRAIN_DIR_PREFIX      

#### Edit these:

In [3]:
CREATE_NEW_ASSETS     = False  # True | False
ENABLE_XAI_MONITORING = False  # True | False

In [4]:
# local-train-v1/run-20230919-150451/candidates/candidate_embeddings.json

EXPERIMENT_NAME       = "tfrs-pipe-v1"         # "local-train-v1"
RUN_NAME              = "run-20230919-173845"  # "run-20230919-150451"

RUN_DIR_PATH = f'{EXPERIMENT_NAME}/{RUN_NAME}'

print(f"EXPERIMENT_NAME : {EXPERIMENT_NAME}")
print(f"RUN_NAME        : {RUN_NAME}")
print(f"RUN_DIR_PATH    : {RUN_DIR_PATH}")

EXPERIMENT_NAME : tfrs-pipe-v1
RUN_NAME        : run-20230919-173845
RUN_DIR_PATH    : tfrs-pipe-v1/run-20230919-173845


## Imports

In [5]:
import os
import sys
import time
import numpy as np

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 

# google cloud SDKs
from google.cloud import storage
from google.cloud import aiplatform as vertex_ai
from google.cloud.aiplatform import model_monitoring

import tensorflow as tf

# this repo
from src.two_tower_jt import test_instances as test_instances
from util import feature_set_utils as feature_utils

In [6]:
vertex_ai.init(project=PROJECT_ID, location=LOCATION)

storage_client = storage.Client(project=PROJECT_ID)

# Deploy Query Model

## Register Query model to Vertex Model Registry

**TODO:** parametrize new vs existing assets

```
model = vertex_ai.Model.list(filter=f"display_name=bqml_fraud_classifier")[-1]
```

In [42]:
QUERY_MODEL_DIR = f"{BUCKET_URI}/{RUN_DIR_PATH}/model-dir/query_model"

print(f"QUERY_MODEL_DIR: {QUERY_MODEL_DIR}")

QUERY_MODEL_DIR: gs://ndr-v1-hybrid-vertex-bucket/tfrs-pipe-v1/run-20230919-173845/model-dir/query_model


In [43]:
! gsutil ls $QUERY_MODEL_DIR

gs://ndr-v1-hybrid-vertex-bucket/tfrs-pipe-v1/run-20230919-173845/model-dir/query_model/
gs://ndr-v1-hybrid-vertex-bucket/tfrs-pipe-v1/run-20230919-173845/model-dir/query_model/fingerprint.pb
gs://ndr-v1-hybrid-vertex-bucket/tfrs-pipe-v1/run-20230919-173845/model-dir/query_model/saved_model.pb
gs://ndr-v1-hybrid-vertex-bucket/tfrs-pipe-v1/run-20230919-173845/model-dir/query_model/assets/
gs://ndr-v1-hybrid-vertex-bucket/tfrs-pipe-v1/run-20230919-173845/model-dir/query_model/variables/


In [52]:
if CREATE_NEW_ASSETS:
    
    uploaded_query_model = vertex_ai.Model.upload(
        display_name=f'query_model_{DISPLAY_NAME}',
        artifact_uri=QUERY_MODEL_DIR,
        serving_container_image_uri=SERVING_IMAGE_URI_CPU,
        description="Top of the query tower, meant to return an embedding for each playlist instance",
        sync=True,
    )
else:
    # use existing
    uploaded_query_model = vertex_ai.Model('projects/934903580331/locations/us-central1/models/2404541769992634368@1')

print(f"display_name         : {uploaded_query_model.display_name}\n")
print(f"uploaded_query_model : {uploaded_query_model}")

display_name         : query_model_tfrs_128dim_v1

uploaded_query_model : <google.cloud.aiplatform.models.Model object at 0x7fe6bb5a2690> 
resource name: projects/934903580331/locations/us-central1/models/2404541769992634368



## Deploy registered model to online endpoint

**Create model endpoint**

In [53]:
if CREATE_NEW_ASSETS:
    
    endpoint = vertex_ai.Endpoint.create(
        display_name=f'endpoint_{DISPLAY_NAME}',
        project=PROJECT_ID,
        location=LOCATION,
        sync=True,
    )

else:
    endpoint = vertex_ai.Endpoint('projects/934903580331/locations/us-central1/endpoints/7270536031831588864')

print(f"display_name : {endpoint.display_name}\n")
print(f"endpoint     : {endpoint}")

display_name : endpoint_tfrs_128dim_v1

endpoint     : <google.cloud.aiplatform.models.Endpoint object at 0x7fe6bb38bed0> 
resource name: projects/934903580331/locations/us-central1/endpoints/7270536031831588864


**Deploy to endpoint**

In [54]:
if CREATE_NEW_ASSETS:
    
    deployed_query_model = uploaded_query_model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=f'deployed_qmodel_{DISPLAY_NAME}',
        machine_type="n1-standard-4",
        min_replica_count=1,
        max_replica_count=2,
        accelerator_type=None,
        accelerator_count=0,
        sync=True,
    )

else:
    deployed_query_model = vertex_ai.Endpoint('projects/934903580331/locations/us-central1/endpoints/7270536031831588864')

print(f"display_name         : {deployed_query_model.display_name}\n")
print(f"deployed_query_model : {deployed_query_model}")

display_name         : endpoint_tfrs_128dim_v1

deployed_query_model : <google.cloud.aiplatform.models.Endpoint object at 0x7fe6bb48eb10> 
resource name: projects/934903580331/locations/us-central1/endpoints/7270536031831588864


#### list all model endpoints

In [56]:
list_of_model_endpoints = vertex_ai.Endpoint.list()
list_of_model_endpoints[:5]

[<google.cloud.aiplatform.models.Endpoint object at 0x7fe6bb1eecd0> 
 resource name: projects/934903580331/locations/us-central1/endpoints/7270536031831588864,
 <google.cloud.aiplatform.models.Endpoint object at 0x7fe6bb205d10> 
 resource name: projects/934903580331/locations/us-central1/endpoints/1164499362047328256,
 <google.cloud.aiplatform.models.Endpoint object at 0x7fe6bb1fc3d0> 
 resource name: projects/934903580331/locations/us-central1/endpoints/2398485659946844160,
 <google.cloud.aiplatform.models.Endpoint object at 0x7fe6bb1d5cd0> 
 resource name: projects/934903580331/locations/us-central1/endpoints/9099841905474142208,
 <google.cloud.aiplatform.models.Endpoint object at 0x7fe6bb199850> 
 resource name: projects/934903580331/locations/us-central1/endpoints/1785996110624456704]

#### list all models on a single endpoint

In [57]:
deployed_models = deployed_query_model.list_models()
deployed_models

[id: "3605318418686803968"
 model: "projects/934903580331/locations/us-central1/models/2404541769992634368"
 display_name: "deployed_qmodel_tfrs_128dim_v1"
 create_time {
   seconds: 1695143566
   nanos: 585641000
 }
 dedicated_resources {
   machine_spec {
     machine_type: "n1-standard-4"
   }
   min_replica_count: 1
   max_replica_count: 2
 }
 model_version_id: "1"]

# Set Model Monitoring for Query Model Endpoint

### Define and create a Model Monitoring job
To set up skew detection or drift detection, create a model deployment monitoring job.

The job requires the following specifications:

* `alert_config`: How alerts are sent to the user. Currently, only email alerts are supported.
* `schedule_config`: The monitoring interval in hours, which defines how often the monitoring job is triggered.
* `logging_sampling_strategy`: How prediction requests are sampled for logging.
* `drift_config`: The drift threshold for each monitored feature.
* `skew_config`: The skew threshold for each monitored feature.

#### Define the alerting configuration

The alerting configuration lists the email addresses that receive alerts. You can also use it to stream anomalies to Cloud Logging.

In [41]:
import spotipy_secret_creds as creds

USER_EMAILS = [creds.USER_EMAIL] #'recipient1@domain.com', 'recipient2@domain.com'
alert_config = model_monitoring.EmailAlertConfig(USER_EMAILS, enable_logging=True)
alert_config

<google.cloud.aiplatform.model_monitoring.alert.EmailAlertConfig at 0x7f4ae67c6190>

#### Define the schedule configuration

The schedule configuration sets the monitoring interval in hours, which defines how often the monitoring job is triggered.

In [42]:
MONITOR_INTERVAL = 1
schedule_config = model_monitoring.ScheduleConfig(monitor_interval=MONITOR_INTERVAL)
schedule_config

<google.cloud.aiplatform.model_monitoring.schedule.ScheduleConfig at 0x7f4b71ad9fd0>

#### Define the logging sample strategy

With the logging sample strategy, you configure how the model monitoring service randomly samples prediction requests to calculate monitoring metrics. The selected samples are logged to a BigQuery table.

In [43]:
SAMPLE_RATE = 0.8

logging_sampling_strategy = model_monitoring.RandomSampleConfig(sample_rate=SAMPLE_RATE)
logging_sampling_strategy

<google.cloud.aiplatform.model_monitoring.sampling.RandomSampleConfig at 0x7f4ae65a3fd0>
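To make the effect of `SAMPLE_RATE` concrete, here is a minimal local simulation. It assumes each request is kept independently with probability `sample_rate`; the function names are hypothetical and this is only an illustration of the idea, not the service's actual sampling code.

```python
import random


def should_log(sample_rate: float, rng: random.Random) -> bool:
    """Keep a request with probability `sample_rate` (independent per request)."""
    return rng.random() < sample_rate


def simulate_logged_requests(n_requests: int, sample_rate: float, seed: int = 0) -> int:
    """Count how many of `n_requests` simulated requests would be logged."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    return sum(should_log(sample_rate, rng) for _ in range(n_requests))


logged = simulate_logged_requests(n_requests=1000, sample_rate=0.8)
print(f"logged {logged} of 1000 simulated requests")
```

Under this assumption, a rate of 0.8 means roughly four out of five requests end up in the logging table, so a small test batch still yields a smaller sample than the number of requests sent.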

#### Define the drift detection configuration

With the drift detection configuration, you define the input features and the associated thresholds for monitoring feature distribution drift and (TODO) feature attribution drift.

In [46]:
feature_dict = feature_utils.get_all_features(TRACK_HISTORY, ranker=False)
# feature_dict

In [48]:
feature_names = list(feature_dict.keys())
# feature_names

In [50]:
DRIFT_THRESHOLD_VALUE = 0.05
ATTRIBUTION_DRIFT_THRESHOLD_VALUE = 0.05

# =========================== #
##   Feature value drift     ##
# =========================== #
# feature_names comes from dict keys, so every name is unique
drift_thresholds = {feature: DRIFT_THRESHOLD_VALUE for feature in feature_names}

print(f"drift_thresholds      : {drift_thresholds}\n")

# =========================== #
## Feature attribution drift ##
# =========================== #
# attr_drift_thresholds = dict()

# for feature in feature_names:
#     if feature in attr_drift_thresholds:
#         print("feature name already in dict")
#     else:
#         attr_drift_thresholds[feature] = ATTRIBUTION_DRIFT_THRESHOLD_VALUE

# print(f"attr_drift_thresholds : {attr_drift_thresholds}")

drift_thresholds      : {'track_uri_can': 0.05, 'track_name_can': 0.05, 'artist_uri_can': 0.05, 'artist_name_can': 0.05, 'album_uri_can': 0.05, 'album_name_can': 0.05, 'duration_ms_can': 0.05, 'track_pop_can': 0.05, 'artist_pop_can': 0.05, 'artist_genres_can': 0.05, 'artist_followers_can': 0.05, 'track_danceability_can': 0.05, 'track_energy_can': 0.05, 'track_key_can': 0.05, 'track_loudness_can': 0.05, 'track_mode_can': 0.05, 'track_speechiness_can': 0.05, 'track_acousticness_can': 0.05, 'track_instrumentalness_can': 0.05, 'track_liveness_can': 0.05, 'track_valence_can': 0.05, 'track_tempo_can': 0.05, 'track_time_signature_can': 0.05, 'pl_name_src': 0.05, 'pl_collaborative_src': 0.05, 'pl_duration_ms_new': 0.05, 'num_pl_songs_new': 0.05, 'num_pl_artists_new': 0.05, 'num_pl_albums_new': 0.05, 'track_uri_pl': 0.05, 'track_name_pl': 0.05, 'artist_uri_pl': 0.05, 'artist_name_pl': 0.05, 'album_uri_pl': 0.05, 'album_name_pl': 0.05, 'artist_genres_pl': 0.05, 'duration_ms_songs_pl': 0.05, 'tra

In [51]:
drift_config = model_monitoring.DriftDetectionConfig(
    drift_thresholds=drift_thresholds,
    # attribute_drift_thresholds=attr_drift_thresholds,
)

drift_config

<google.cloud.aiplatform.model_monitoring.objective.DriftDetectionConfig at 0x7f4ae4164390>
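For intuition about what a threshold such as 0.05 is compared against: the Vertex AI documentation describes the distance score as L-infinity distance for categorical features and Jensen-Shannon divergence for numerical features. The sketch below computes both on toy data. It is not the service's implementation; the helper names, the toy genre values, the base-2 logarithm, and the assumption that numerical values have already been bucketed into histogram bins are all illustrative choices.

```python
import math
from collections import Counter


def to_distribution(values):
    """Turn raw categorical values (or histogram bin labels) into probabilities."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def l_infinity_distance(p, q):
    """Largest absolute difference in probability over all categories."""
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    keys = set(p) | set(q)
    mid = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # KL divergence from `a` to the midpoint distribution
        return sum(a[k] * math.log2(a[k] / mid[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# toy data (hypothetical): an earlier window vs. the latest serving window
baseline = to_distribution(["pop", "rock", "pop", "jazz"])
serving = to_distribution(["pop", "rock", "rock", "rock"])

score = l_infinity_distance(baseline, serving)
print(f"L-infinity distance: {score}")            # 0.5
print(f"exceeds 0.05 threshold: {score > 0.05}")  # True
```

Both scores are 0 for identical distributions, and with base-2 logarithms the Jensen-Shannon divergence reaches 1 for distributions with no overlap, which gives a rough sense of how strict a 0.05 threshold is.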

#### Define the skew detection configuration

With the skew detection configuration, you define the input features and the associated thresholds for monitoring feature distribution skew and (TODO) feature attribution skew.

In [53]:
SKEW_THRESHOLD_VALUE = 0.05
ATTRIBUTION_SKEW_THRESHOLD_VALUE = 0.05

# =========================== #
##   Feature value skew      ##
# =========================== #
# feature_names comes from dict keys, so every name is unique
skew_thresholds = {feature: SKEW_THRESHOLD_VALUE for feature in feature_names}

print(f"skew_thresholds      : {skew_thresholds}\n")

# =========================== #
## Feature attribution skew  ##
# =========================== #
# attr_skew_thresholds = dict()

# for feature in feature_names:
#     if feature in attr_skew_thresholds:
#         print("feature name already in dict")
#     else:
#         attr_skew_thresholds[feature] = ATTRIBUTION_SKEW_THRESHOLD_VALUE
# print(f"attr_skew_thresholds : {attr_skew_thresholds}")

skew_thresholds      : {'track_uri_can': 0.05, 'track_name_can': 0.05, 'artist_uri_can': 0.05, 'artist_name_can': 0.05, 'album_uri_can': 0.05, 'album_name_can': 0.05, 'duration_ms_can': 0.05, 'track_pop_can': 0.05, 'artist_pop_can': 0.05, 'artist_genres_can': 0.05, 'artist_followers_can': 0.05, 'track_danceability_can': 0.05, 'track_energy_can': 0.05, 'track_key_can': 0.05, 'track_loudness_can': 0.05, 'track_mode_can': 0.05, 'track_speechiness_can': 0.05, 'track_acousticness_can': 0.05, 'track_instrumentalness_can': 0.05, 'track_liveness_can': 0.05, 'track_valence_can': 0.05, 'track_tempo_can': 0.05, 'track_time_signature_can': 0.05, 'pl_name_src': 0.05, 'pl_collaborative_src': 0.05, 'pl_duration_ms_new': 0.05, 'num_pl_songs_new': 0.05, 'num_pl_artists_new': 0.05, 'num_pl_albums_new': 0.05, 'track_uri_pl': 0.05, 'track_name_pl': 0.05, 'artist_uri_pl': 0.05, 'artist_name_pl': 0.05, 'album_uri_pl': 0.05, 'album_name_pl': 0.05, 'artist_genres_pl': 0.05, 'duration_ms_songs_pl': 0.05, 'trac
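Because thresholds are set per feature, a single uniform value is only one option. The sketch below shows one way to combine a default with per-feature overrides. `build_thresholds` is a hypothetical helper, and the override value of 0.1 is made up for illustration; only the feature names are taken from the output above.

```python
def build_thresholds(feature_names, default, overrides=None):
    """Map every feature to `default`, except those listed in `overrides`."""
    overrides = overrides or {}
    unknown = set(overrides) - set(feature_names)
    if unknown:
        # fail early on typos rather than silently ignoring an override
        raise KeyError(f"overrides reference unknown features: {sorted(unknown)}")
    return {name: overrides.get(name, default) for name in feature_names}


example_features = ["track_uri_can", "duration_ms_can", "pl_name_src"]
example_thresholds = build_thresholds(
    example_features,
    default=0.05,
    overrides={"duration_ms_can": 0.1},  # hypothetical looser threshold
)
print(example_thresholds)
```

The resulting dict has the same shape as `skew_thresholds` and `drift_thresholds`, so it could be passed to the detection configs in the same way.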

In [54]:
# TRAIN_DATA_SOURCE_URI = f"gs://{BUCKET_NAME}/data/{DATA_VERSION}/{TRAIN_DIR_PREFIX}/"
# TRAIN_DATA_FORMAT = "tf-record"

TRAIN_DATA_SOURCE_URI = f"bq://{PROJECT_ID}.{BQ_DATASET}.{BQ_TABLE_TRAIN}"
TRAIN_DATA_FORMAT = None

if TRAIN_DATA_FORMAT:
    skew_config = model_monitoring.SkewDetectionConfig(
        data_source=TRAIN_DATA_SOURCE_URI,
        data_format=TRAIN_DATA_FORMAT,
        skew_thresholds=skew_thresholds,
        # attribute_skew_thresholds=attr_skew_thresholds,
        # target_field=TARGET, # no target; embedding model
    )
else:
    skew_config = model_monitoring.SkewDetectionConfig(
        data_source=TRAIN_DATA_SOURCE_URI,
        # data_format=TRAIN_DATA_FORMAT, # only used if source in GCS
        skew_thresholds=skew_thresholds,
        # attribute_skew_thresholds=attr_skew_thresholds,
        # target_field=TARGET, # no target; embedding model
    )
    
skew_config

<google.cloud.aiplatform.model_monitoring.objective.SkewDetectionConfig at 0x7f4ae416b310>

#### Define Explanation Config

* If you are enabling skew detection, upload your training data or output of a [batch explanation job](https://cloud.google.com/vertex-ai/docs/explainable-ai/getting-explanations#batch) for your training dataset to `Cloud Storage` or `BigQuery`. Obtain the URI link to the data. For drift detection, training data or explanation baseline isn't required.

* An imported custom-trained model must be [configured for Vertex Explainable AI](https://cloud.google.com/vertex-ai/docs/model-monitoring/monitor-explainable-ai#enable-feature-attribution-skew-or-drift-detection) when you create, import, or deploy the model.

* [Configure your model](https://cloud.google.com/vertex-ai/docs/explainable-ai/configuring-explanations) to use Vertex Explainable AI when you create, import, or deploy the model. The `ExplanationSpec.ExplanationParameters` field must be populated for your model.

In [55]:
if ENABLE_XAI_MONITORING:
    explanation_config = model_monitoring.ExplanationConfig()
else:
    explanation_config = None
    
explanation_config

### Create job config

In [56]:
objective_config = model_monitoring.ObjectiveConfig(
    skew_detection_config=skew_config,
    drift_detection_config=drift_config,
    explanation_config=explanation_config,
)

objective_config

<google.cloud.aiplatform.model_monitoring.objective.ObjectiveConfig at 0x7f4ae415bed0>

## Create Model Monitoring Job

In [57]:
JOB_DISPLAY_NAME = f"{MODEL_TYPE}_{PREFIX}_monitoring"
print(f"JOB_DISPLAY_NAME: {JOB_DISPLAY_NAME}")

monitoring_job = vertex_ai.ModelDeploymentMonitoringJob.create(
    display_name=JOB_DISPLAY_NAME,
    project=PROJECT_ID,
    location=REGION,
    endpoint=deployed_query_model,
    logging_sampling_strategy=logging_sampling_strategy,
    schedule_config=schedule_config,
    alert_config=alert_config,
    objective_configs=objective_config,
)

monitoring_job

JOB_DISPLAY_NAME: 2tower_ndr-v1_monitoring
Creating ModelDeploymentMonitoringJob
ModelDeploymentMonitoringJob created. Resource name: projects/934903580331/locations/us-central1/modelDeploymentMonitoringJobs/5182827677073014784
To use this ModelDeploymentMonitoringJob in another session:
mdm_job = aiplatform.ModelDeploymentMonitoringJob('projects/934903580331/locations/us-central1/modelDeploymentMonitoringJobs/5182827677073014784')
View Model Deployment Monitoring Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/model-deployment-monitoring/5182827677073014784?project=934903580331

<google.cloud.aiplatform.jobs.ModelDeploymentMonitoringJob object at 0x7f4ae415b5d0> 
resource name: projects/934903580331/locations/us-central1/modelDeploymentMonitoringJobs/5182827677073014784

#### Check the monitoring job state

You can check the status of the monitoring job through the `state` attribute of the job instance.

In [62]:
JOB_DISPLAY_NAME

'2tower_ndr-v1_monitoring'

In [63]:
jobs = monitoring_job.list(filter=f"display_name={JOB_DISPLAY_NAME}")
job = jobs[0]
print(job.state)

JobState.JOB_STATE_PENDING
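A job that has just been created reports a pending state, so a single check is rarely enough. Below is a minimal polling sketch. `wait_for_state` is a hypothetical helper, not part of the SDK, and it takes any zero-argument callable that returns the current state. The demo uses a fake sequence of states so it runs without cloud access; `JOB_STATE_RUNNING` as the target is an assumption, since only `JOB_STATE_PENDING` appears in the output above.

```python
import time


def wait_for_state(get_state, target_states, timeout_s=600.0, poll_s=15.0, sleep=time.sleep):
    """Call `get_state()` until it returns a value in `target_states` or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in target_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in state {state!r} after {timeout_s} seconds")
        sleep(poll_s)


# demo with a fake state source (no cloud calls are made here)
fake_states = iter(["JOB_STATE_PENDING", "JOB_STATE_PENDING", "JOB_STATE_RUNNING"])
final_state = wait_for_state(
    lambda: next(fake_states),
    target_states={"JOB_STATE_RUNNING"},
    poll_s=0.0,
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
print(final_state)  # JOB_STATE_RUNNING
```

In the notebook, `get_state` could be something like `lambda: job.state.name`, assuming the `state` attribute is refreshed from the service on each access.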


**Receiving email alert**

> After a minute or two, you should receive an email at the address you configured above in `USER_EMAILS`. This email confirms successful deployment of your monitoring job.

**Monitoring results in the Cloud Console**

> After one hour, you can examine your model monitoring data from the Cloud Console.

**See `Notes` at end of notebook for details on interpreting Model Monitoring results**

# Test endpoint deployment

In [48]:
if TRACK_HISTORY == '5':
    TEST_INSTANCE = test_instances.TEST_INSTANCE_5
elif TRACK_HISTORY == '15':
    TEST_INSTANCE = test_instances.TEST_INSTANCE_15
else:
    TEST_INSTANCE = None
    print("Track History length not supported")
    
# TEST_INSTANCE

### Make prediction request

Test a single prediction request and response.

In [50]:
response = deployed_query_model.predict(instances=[TEST_INSTANCE])

prediction = response[0]

# print the prediction for the first instance
print(prediction[0])

[-0.319549441, -0.736324608, -0.999139726, 0.936678231, 0.140084431, -0.413407475, 0.878894806, -0.970329106, -0.659263194, -1.60079765, 1.11298513, 0.534306586, 1.12248015, -0.79353, -0.221303761, 0.214388043, 0.851346672, -1.70991278, -0.428875, -0.796241462, -0.130663425, 0.679144442, 1.57031, -1.71820736, -0.0138283893, -0.696535349, 0.518329501, -1.51568925, -0.54820931, 0.00688769668, -1.16123128, 1.08391941, -0.113285184, 0.457706213, 0.0641227961, 0.416062444, -1.27625763, 0.214524657, -1.79184937, 0.368900865, -0.097617425, -2.05919147, -0.195343286, 0.136424914, -1.31718016, -0.237893417, 1.59560561, -0.966435671, 1.97090781, 0.787532568, -0.221562356, -0.302150905, 1.45196605, 0.0823364705, -1.3538276, 1.40799367, -1.17275703, 2.04082108, -0.43333602, 0.913677335, 0.126593262, -0.656877041, 0.239591926, 0.283293277, 0.875116467, -0.861238241, 0.537754834, 0.748203337, 0.236702815, -0.605949759, -0.857457638, -1.20023417, -1.0099895, -0.0130776241, -0.00597327948, -0.29366761

### Write (many) test instances to file

> test endpoint monitoring with >= 1000 prediction requests

In [91]:
PRED_REQUEST_N = 50
INTERVAL       = PRED_REQUEST_N // 2
SKIP_N         = INTERVAL

print(f"PRED_REQUEST_N : {PRED_REQUEST_N}")
print(f"INTERVAL       : {INTERVAL}")

PRED_REQUEST_N : 50
INTERVAL       : 25


In [92]:
valid_files = []
for blob in storage_client.list_blobs(f"{BUCKET_NAME}", prefix=f'data/{DATA_VERSION}/{VALID_DIR_PREFIX}/'):
    if '.tfrecords' in blob.name:
        valid_files.append(blob.public_url.replace("https://storage.googleapis.com/", "gs://"))
    
valid = tf.data.TFRecordDataset(valid_files)

valid_parsed = valid.map(feature_utils.parse_towers_tfrecord)
# valid_parsed

In [93]:
subset_val = valid_parsed.skip(SKIP_N).take(PRED_REQUEST_N)

list_of_dicts = []

for tensor_dict in subset_val:
    list_dict = {}
    for k in tensor_dict.keys():

        value = tensor_dict[k].numpy()

        if isinstance(value, bytes):
            # scalar string feature
            list_dict.update({k: value.decode()})

        elif isinstance(value, np.ndarray):
            if not isinstance(value[0], bytes):
                # numeric sequence feature
                list_dict.update({k: value.tolist()})
            else:
                # string sequence feature: decode each element
                list_dict.update({k: [ele.decode() for ele in value]})

        elif isinstance(value, np.float32):
            # numpy scalar -> Python float
            list_dict.update({k: value.item()})

        else:
            list_dict.update({k: value})

        # NOTE: this append sits inside the per-key loop, so every instance is
        # appended once per feature key (hence the 2600 entries reported below
        # instead of PRED_REQUEST_N). Dedent it one level for one entry per instance.
        list_of_dicts.append(list_dict)
    
# list_dict
len(list_of_dicts)

2600
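The type-by-type branches in the cell above could also be folded into one recursive helper. The sketch below is a hypothetical alternative, not the notebook's code. It is duck-typed on `.tolist()` (which NumPy arrays and NumPy scalars both provide) so it runs here without NumPy, and the sample feature values are made up.

```python
import json


def to_jsonable(value):
    """Recursively convert bytes and array-like values into JSON-friendly Python types."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if isinstance(value, dict):
        return {k: to_jsonable(v) for k, v in value.items()}
    if hasattr(value, "tolist"):
        # NumPy arrays and scalars expose tolist(); plain Python types do not
        return to_jsonable(value.tolist())
    return value


# made-up values standing in for one parsed instance
raw_instance = {
    "pl_name_src": b"road trip",
    "track_uri_pl": [b"uri-a", b"uri-b"],
    "pl_duration_ms_new": 1234.0,
}
instance = to_jsonable(raw_instance)
print(json.dumps(instance))
```

Applied once per `tensor_dict` (after calling `.numpy()` on each value), this would yield one JSON-serializable dict per instance.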

In [94]:
count = 0

for test in list_of_dicts:
    response = deployed_query_model.predict(instances=[test])
    
    if count > 0 and count % INTERVAL == 0:
        print(f"{count} prediction requests..")
        
    count += 1
    
prediction = response[0]
# print the embedding returned by the last request
print(prediction[0])

25 prediction requests..
50 prediction requests..
75 prediction requests..
100 prediction requests..
125 prediction requests..
150 prediction requests..
175 prediction requests..
200 prediction requests..
225 prediction requests..
250 prediction requests..
275 prediction requests..
300 prediction requests..
325 prediction requests..
350 prediction requests..
375 prediction requests..
400 prediction requests..
425 prediction requests..
450 prediction requests..
475 prediction requests..
500 prediction requests..
525 prediction requests..
550 prediction requests..
575 prediction requests..
600 prediction requests..
625 prediction requests..
650 prediction requests..
675 prediction requests..
700 prediction requests..
725 prediction requests..
750 prediction requests..
775 prediction requests..
800 prediction requests..
825 prediction requests..
850 prediction requests..
875 prediction requests..
900 prediction requests..
925 prediction requests..
950 prediction requests..
975 prediction 

### Save test instances to pickle file

In [95]:
import pickle as pkl

LOCAL_INSTANCE_FILE = 'test_instance_list.pkl'

# the context manager closes the file even if pickling fails
with open(LOCAL_INSTANCE_FILE, 'wb') as filehandler:
    pkl.dump(list_of_dicts, filehandler)

In [96]:
# reload the pickled instances to confirm the round trip
with open(LOCAL_INSTANCE_FILE, 'rb') as filehandler:
    LIST_OF_INSTANCES = pkl.load(filehandler)

In [99]:
# LIST_OF_INSTANCES[200]

In [100]:
ENDPOINT_TEST_SUBDIR = "endpoint-tests"

!gsutil -q cp $LOCAL_INSTANCE_FILE $BUCKET_URI/$ENDPOINT_TEST_SUBDIR/$LOCAL_INSTANCE_FILE

!gsutil ls $BUCKET_URI/$ENDPOINT_TEST_SUBDIR

gs://ndr-v1-hybrid-vertex-bucket/endpoint-tests/test_instance_list.pkl


# (Optional): Clean-up

In [None]:
# monitoring_job.pause()
# monitoring_job.delete()

In [None]:
# deployed_query_model.undeploy_all()
# deployed_query_model.delete()
# uploaded_query_model.delete()

# Notes

## Model Monitoring

### Cloud storage layout

> Notice the following components in these Cloud Storage paths:

* **cloud-ai-platform-** .. - This is a bucket created for you and assigned to capture your service's prediction data. Each monitoring job you create will trigger creation of a new folder in this bucket.
* **`model_monitoring|instance_schemas`/job-** .. - This is your unique monitoring job number, which you can see above in both the response to your job creation request and the email notification.
* **instance_schemas/job-** ../analysis - This is the monitoring job's understanding and encoding of your training data's schema (field names, types, etc.).
* **instance_schemas/job-** ../predict - This is the first prediction made to your model after the current monitoring job was enabled.
* **model_monitoring/job-** ../serving - This folder is used to record data relevant to drift calculations. It contains measurement summaries for every hour your model serves traffic.
* **model_monitoring/job-** ../training - This folder is used to record data relevant to training-serving skew calculations. It contains an ongoing summary of prediction data relative to training data.
* **model_monitoring/job-** ../feature_attribution_score - This folder is used to record data relevant to feature attribution calculations. It contains an ongoing summary of feature attribution scores relative to training data.

### Interpret your results

Vertex AI Model Monitoring flags an anomaly when a feature's skew or drift score exceeds the threshold set for that feature. Once anomalies are detected, they surface through the email alerts and Cloud Console reports described above.

Vertex AI Model Monitoring automatically notifies you of detected anomalies through email, but you can also [set up alerts through Cloud Logging](https://cloud.google.com/vertex-ai/docs/model-monitoring/using-model-monitoring#monitor-job).

### Learn more about model monitoring

**Congratulations!** You've now learned what model monitoring is, how to configure and enable it, and how to find and interpret the results. Check out the following resources to learn more about model monitoring and ML Ops.

- [TensorFlow Data Validation](https://www.tensorflow.org/tfx/guide/tfdv)
- [Data Understanding, Validation, and Monitoring At Scale](https://blog.tensorflow.org/2018/09/introducing-tensorflow-data-validation.html)
- [Vertex Product Documentation](https://cloud.google.com/vertex-ai)
- [Vertex AI Model Monitoring Reference Docs](https://cloud.google.com/vertex-ai/docs/reference)
- [Vertex AI Model Monitoring blog article](https://cloud.google.com/blog/topics/developers-practitioners/monitor-models-training-serving-skew-vertex-ai)
- [Explainable AI Whitepaper](https://storage.googleapis.com/cloud-ai-whitepapers/AI%20Explainability%20Whitepaper.pdf)