# Vertex AI Blog Post - Vertex AI Model monitoring capabilities 

## Overview
This notbook accompanies the blog post <"Title"> posted at <a href="https://cloud.google.com/blog/products/ai-machine-learning" target="_blank">this link</a>

***

In this notebook we will show how you can use Vertex AI client library to train, deploy and monitor ML models


The steps performed include:
* Create a dataset in Vertex AI and import Bank Marketing data from Google Cloud Storage bucket
* Train a binary classification model to predict propensity of a bank customer to open a 'deposit'
* For prediction serving through Vetex AI, create an Endpoint and deploy the model trained above in it
* Enable Vertex AI Monitoring on the above Endpoint and configure what features to be monitored and the tolerance levels for notifications
* Generate an artificially skewed dataset and submit that for predictions to trigger alerts in Vertex AI Monitoring

#### Dataset Details
Dataset that will be use: Bank Marketing dataset    
**Dataset Source** : [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

*Reference:*
Training code source : [Link](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/master/notebooks/community/gapic/automl/showcase_automl_tabular_binary_classification_batch.ipynb)

### Before you begin - Install Python dependencies

In [5]:
#! pip3 install -U google-cloud-aiplatform $USER_FLAG
#! pip3 install -U google-cloud-storage $USER_FLAG

#####  Restart the Kernel

In [6]:
if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

In [6]:
import os
import sys
import pandas as pd
import json

import time

from google.cloud.aiplatform import gapic as aip
from google.protobuf import json_format
from google.protobuf.json_format import MessageToJson, ParseDict
from google.protobuf.struct_pb2 import Struct, Value

### GCP Configurations

Enter the details for your GCP project in below section

#### Region

Below are some of the regions supported for Vertex. You can see the full list <a href="https://cloud.google.com/vertex-ai/docs/general/locations" target="_blank">here</a>.

We recommend that you choose the region closest to you.

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

In [7]:
REGION = "us-central1" 
PROJECT_ID = "vertex-ai-blog"  #Replace with your GCP Project
print("Project ID:", PROJECT_ID)

Project ID: vertex-ai-blog


In [8]:
! gcloud config set project $PROJECT_ID

Updated property [core/project].


#### Setup variables

In [9]:
# API service endpoint
API_ENDPOINT = "{}-aiplatform.googleapis.com".format(REGION)

# Vertex location root path for your dataset, model and endpoint resources
PARENT = "projects/" + PROJECT_ID + "/locations/" + REGION

#### Hardware Accelerators

Set the hardware accelerators (e.g., GPU), if any, for prediction.

Set the variable `DEPLOY_GPU/DEPLOY_NGPU` to use a container image supporting a GPU and the number of GPUs allocated to the virtual machine (VM) instance. 
For example, to use a GPU container image with 4 Nvidia Telsa K80 GPUs allocated to each VM, you would specify:
    (aip.AcceleratorType.NVIDIA_TESLA_K80, 4)

For GPU, available accelerators include:
   - aip.AcceleratorType.NVIDIA_TESLA_K80
   - aip.AcceleratorType.NVIDIA_TESLA_P100
   - aip.AcceleratorType.NVIDIA_TESLA_P4
   - aip.AcceleratorType.NVIDIA_TESLA_T4
   - aip.AcceleratorType.NVIDIA_TESLA_V100

You can also specify `(None, None)` to use a container image to run on a CPU.

In [10]:
DEPLOY_GPU, DEPLOY_NGPU = (aip.AcceleratorType.NVIDIA_TESLA_K80, 1)
MACHINE_TYPE = "n1-standard"
VCPU = "4"
DEPLOY_COMPUTE = MACHINE_TYPE + "-" + VCPU

#### Timestamp

We will use Timestamp by attaching it to model and endpoint names to make them unique and avoid name collisions with any existing assets.

In [11]:
from datetime import datetime
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

### Create GCS Bucket

In [20]:
BUCKET_NAME = "gs://vertex-ai-blog" 

In [None]:
!gsutil mb -l $REGION $BUCKET_NAME

In [None]:
#Test access to the bucket
!gsutil ls -al $BUCKET_NAME

### Create client instances for key tasks to be performed

In this section we will create the 

In [12]:
# client options - same for all services
client_options = {"api_endpoint": API_ENDPOINT}

# Create client instances for key tasks to be performed
def create_dataset_client():
    client = aip.DatasetServiceClient(client_options=client_options)
    return client


def create_model_client():
    client = aip.ModelServiceClient(client_options=client_options)
    return client


def create_pipeline_client():
    client = aip.PipelineServiceClient(client_options=client_options)
    return client

# Needed for batch prediction
def create_job_client():
    client = aip.JobServiceClient(client_options=client_options)
    return client

# Endpoint creation
def create_endpoint_client():
    client = aip.EndpointServiceClient(client_options=client_options)
    return client

# Needed for Prediction call
def create_prediction_client():
    client = aip.PredictionServiceClient(client_options=client_options)
    return client


clients = {}
clients["dataset"] = create_dataset_client()
clients["model"] = create_model_client()
clients["pipeline"] = create_pipeline_client()
clients["job"] = create_job_client()
clients["endpoint"] = create_endpoint_client()
clients["prediction"] = create_prediction_client()

### Create Dataset
This section sows how you can create a dataset in Vertex AI, to be used for training

In [13]:
# Bank marketing Dataset
# Dataset Source: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict 
# the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

IMPORT_FILE = "gs://cloud-ml-tables-data/bank-marketing.csv"

In [None]:
#Alternate way of creating dataset in Vertex
from typing import List, Union

from google.cloud import aiplatform

def create_and_import_dataset_tabular_gcs_sample(
    display_name: str, project: str, location: str, gcs_source: Union[str, List[str]],):

    aiplatform.init(project=project, location=location)

    dataset = aiplatform.TabularDataset.create(
        display_name=display_name, gcs_source=gcs_source,)

    dataset.wait()

    print(f'\tDataset: "{dataset.display_name}"')
    print(f'\tname: "{dataset.resource_name}"')
    
    return(dataset.resource_name)

In [None]:
dataset_id = create_and_import_dataset_tabular_gcs_sample("bank-" + TIMESTAMP, PROJECT_ID, REGION, IMPORT_FILE)

### Create training pipeline


In [None]:
def create_pipeline(pipeline_name, model_name, dataset, schema, task):

    dataset_id = dataset.split("/")[-1]

    input_config = {
        "dataset_id": dataset_id,
        "fraction_split": {
            "training_fraction": 0.8,
            "validation_fraction": 0.1,
            "test_fraction": 0.1,
        },
    }

    training_pipeline = {
        "display_name": pipeline_name,
        "training_task_definition": schema,
        "training_task_inputs": task,
        "input_data_config": input_config,
        "model_to_upload": {"display_name": model_name},
    }

    try:
        pipeline = clients["pipeline"].create_training_pipeline(
            parent=PARENT, training_pipeline=training_pipeline
        )
        print(pipeline)
    except Exception as e:
        print("exception:", e)
        return None
    return pipeline

In [None]:
label_column = 'Deposit'

In [None]:
TRANSFORMATIONS = [
    {"auto": {"column_name": "Age"}},
    {"auto": {"column_name": "Job"}},
    {"auto": {"column_name": "MaritalStatus"}},
    {"auto": {"column_name": "Education"}},
    {"auto": {"column_name": "Default"}},
    {"auto": {"column_name": "Balance"}},
    {"auto": {"column_name": "Housing"}},
    {"auto": {"column_name": "Loan"}},
    {"auto": {"column_name": "Contact"}},
    {"auto": {"column_name": "Day"}},
    {"auto": {"column_name": "Month"}},
    {"auto": {"column_name": "Duration"}},
    {"auto": {"column_name": "Campaign"}},
    {"auto": {"column_name": "PDays"}},
    {"auto": {"column_name": "POutcome"}},
]

#### AutoML constant

Set constants unique to AutoML datasets and training:
- Dataset Training Schema: Tells the `Pipeline` resource service the task (e.g., classification) to train the model for.

In [14]:
TRAINING_SCHEMA = "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_tables_1.0.0.yaml"

In [None]:
PIPE_NAME = "bank_pipe-" + TIMESTAMP
MODEL_NAME = "bank_model-" + TIMESTAMP

task = Value(
    struct_value=Struct(
        fields={
            "target_column": Value(string_value=label_column),
            "prediction_type": Value(string_value="classification"),
            "train_budget_milli_node_hours": Value(number_value=1000),
            "disable_early_stopping": Value(bool_value=False),
            "transformations": json_format.ParseDict(TRANSFORMATIONS, Value()),
        }
    )
)

response = create_pipeline(PIPE_NAME, MODEL_NAME, dataset_id, TRAINING_SCHEMA, task)

Now save the unique identifier of the training pipeline you created.

In [None]:
# The full unique ID for the pipeline
pipeline_id = response.name
# The short numeric ID for the pipeline
pipeline_short_id = pipeline_id.split("/")[-1]

print(pipeline_id)

In [None]:
def get_training_pipeline(name, silent=False):
    response = clients["pipeline"].get_training_pipeline(name=name)
    if silent:
        return response

    print("pipeline")
    print(" name:", response.name)
    print(" display_name:", response.display_name)
    print(" state:", response.state)
    print(" training_task_definition:", response.training_task_definition)
    print(" training_task_inputs:", dict(response.training_task_inputs))
    print(" create_time:", response.create_time)
    print(" start_time:", response.start_time)
    print(" end_time:", response.end_time)
    print(" update_time:", response.update_time)
    print(" labels:", dict(response.labels))
    return response


response = get_training_pipeline(pipeline_id)

In [None]:
#Monitor when the training job would be completed
while True:
    response = get_training_pipeline(pipeline_id, True)
    if response.state != aip.PipelineState.PIPELINE_STATE_SUCCEEDED:
        print("Training job has not completed:", response.state)
        model_to_deploy_id = None
        if response.state == aip.PipelineState.PIPELINE_STATE_FAILED:
            raise Exception("Training Job Failed")
    else:
        model_to_deploy = response.model_to_upload
        model_to_deploy_id = model_to_deploy.name
        print("Training Time:", response.end_time - response.start_time)
        break
    time.sleep(60)

print("model to deploy:", model_to_deploy_id)

### Evaluation

In [None]:
def list_model_evaluations(name):
    response = clients["model"].list_model_evaluations(parent=name)
    for evaluation in response:
        print("model_evaluation")
        print(" name:", evaluation.name)
        print(" metrics_schema_uri:", evaluation.metrics_schema_uri)
        metrics = json_format.MessageToDict(evaluation._pb.metrics)
        for metric in metrics.keys():
            print(metric)
        print("logloss", metrics["logLoss"])
        print("auPrc", metrics["auPrc"])

    return evaluation.name

last_evaluation = list_model_evaluations(model_to_deploy_id)

In [15]:
#For Testing - Delete
#model_to_deploy_id = "projects/92852031310/locations/us-central1/models/4681676530605096960"
model_to_deploy_id = "projects/92852031310/locations/us-central1/models/2880518154633609216"




### Create Endpoint and Deploy Model for online predictions and monitoring demo


Before you can set up monitoring for your inference solution, you need to create an API endpoint in Vertex AI and deploy the trained model in this endpoint.

In [16]:
MIN_NODES = 1
MAX_NODES = 1

In [17]:
ENDPOINT_NAME = "bank_endpoint-" + TIMESTAMP


def create_endpoint(display_name):
    endpoint = {"display_name": display_name}
    response = clients["endpoint"].create_endpoint(parent=PARENT, endpoint=endpoint)
    print("Long running operation:", response.operation.name)

    result = response.result(timeout=300)
    print("result")
    print(" name:", result.name)
    print(" display_name:", result.display_name)
    print(" description:", result.description)
    print(" labels:", result.labels)
    print(" create_time:", result.create_time)
    print(" update_time:", result.update_time)
    return result


result = create_endpoint(ENDPOINT_NAME)

Long running operation: projects/92852031310/locations/us-central1/endpoints/7861297032365867008/operations/4242402668733005824
result
 name: projects/92852031310/locations/us-central1/endpoints/7861297032365867008
 display_name: 
 description: 
 labels: {}
 create_time: None
 update_time: None


In [18]:
# The full unique ID for the endpoint
endpoint_id = result.name
# The short numeric ID for the endpoint
endpoint_short_id = endpoint_id.split("/")[-1]

print(endpoint_id)

projects/92852031310/locations/us-central1/endpoints/7861297032365867008




#### Next step is to deploy the model we trained in previous steps on to the Vertex AI Endpoint


In [None]:
DEPLOYED_NAME = "bank_deployed-" + TIMESTAMP


def deploy_model(
    model, deployed_model_display_name, endpoint, traffic_split={"0": 100}):

    if DEPLOY_GPU:
        machine_spec = {
            "machine_type": DEPLOY_COMPUTE,
            "accelerator_type": DEPLOY_GPU,
            "accelerator_count": DEPLOY_NGPU,
        }
    else:
        machine_spec = {
            "machine_type": DEPLOY_COMPUTE,
            "accelerator_count": 0,
        }

    deployed_model = {
        "model": model,
        "display_name": deployed_model_display_name,
        "dedicated_resources": {
            "min_replica_count": MIN_NODES,
            "max_replica_count": MAX_NODES,
            "machine_spec": machine_spec,
        },
        "disable_container_logging": False,
    }

    response = clients["endpoint"].deploy_model(
        endpoint=endpoint, deployed_model=deployed_model, traffic_split=traffic_split
    )

    print("Long running operation:", response.operation.name)
    result = response.result()
    print("result")
    deployed_model = result.deployed_model
    print(" deployed_model")
    print("  id:", deployed_model.id)
    print("  model:", deployed_model.model)
    print("  display_name:", deployed_model.display_name)
    print("  create_time:", deployed_model.create_time)

    return deployed_model.id


deployed_model_id = deploy_model(model_to_deploy_id, DEPLOYED_NAME, endpoint_id)

## Online Prediction

To test that the  Vertex AI Endpoint is active and serving predictions correctly, we can submit a prediction request

Let's create a sample profile for which we want to predict the probability of customer opening a deposit

In [47]:
INSTANCE = {
    "Age": "58",
    "Job": "managment",
    "MaritalStatus": "married",
    "Education": "teritary",
    "Default": "no",
    "Balance": "2143",
    "Housing": "yes",
    "Loan": "no",
    "Contact": "unknown",
    "Day": "5",
    "Month": "may",
    "Duration": "261",
    "Campaign": "1",
    "PDays": "-1",
    "Previous": 0,
    "POutcome": "unknown",
}

In [51]:
# Define prediction function

def predict_item(data, endpoint, parameters_dict, verbose = 0):
    parameters = json_format.ParseDict(parameters_dict, Value())

    # The format of each instance should conform to the deployed model's prediction input schema.
    instances_list = [data]
    instances = [json_format.ParseDict(s, Value()) for s in instances_list]

    response = clients["prediction"].predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )
    
    predictions = response.predictions
    for prediction in predictions:
        score = dict(prediction)
    
    if (verbose == 1):
        print("deployed_model_id:", response.deployed_model_id)
        print("predictions: ", score)
        #print(score)


    return(score)


In [53]:
# Run a test prediction
predict_item(INSTANCE, endpoint_id, None, verbose=1)

deployed_model_id: 2918974673326702592
predictions:  {'classes': ['1', '2'], 'scores': [0.9964883923530579, 0.003511558985337615]}


{'classes': ['1', '2'], 'scores': [0.9964883923530579, 0.003511558985337615]}

## Add monitoring to endpoint

Now let's configure and enable the Vertex Monitoring feature for certain Features in our model

In [None]:
# Temporary.  Delete
#ENDPOINT = endpoint_id
#DEFAULT_INPUT = INSTANCE

In [None]:
import pprint as pp
print(ENDPOINT)
print("request:")
pp.pprint(DEFAULT_INPUT)
try:
    #resp = send_predict_request(ENDPOINT, DEFAULT_INPUT)
    resp = predict_item(INSTANCE, endpoint_id, None)
    print("response")
    pp.pprint(resp)
except Exception:
    print("prediction request failed")

### Monitoring Config

First, let's set is to setup Monitoring configurations

* Sample rate: [Optional, Default = 0.8] - This defines the percentage of all incoming requests that will be logged and analyzed to detect anomalies(skew and drift) in the incoming requests. This ensures an efficient usage of resources, especially for solutions which have a high volume of incoming requests.
* Monitor Interval: [Optional, Default = 24 Hrs] - Time intervals at which the monitoring job should run
* User email: [Optional] - Provide this if you want email notifications to be sent out for monitoring alerts
* Dataset - Training dataset to calculate baseline distributions for features being monitored 
* Feature Thresholds - Names of the features to be monitored and threshold values that need to be used to trigger monitoring alerts

In [20]:

USER_EMAIL = "jasmeetbhatia@google.com"  # @param {type:"string"}
JOB_NAME = "bank_marketing_monitor"

# Sampling rate (optional, default=.8)
LOG_SAMPLE_RATE = 0.8  # @param {type:"number"}

# Monitoring Interval in seconds (optional, default=3600).
MONITOR_INTERVAL = 3600  # @param {type:"number"}


# URI to training dataset.
DATASET_GCS_URI = ['gs://cloud-ml-tables-data/bank-marketing.csv'] # @param {type:"string"}

# Prediction target column name in training dataset.
TARGET = "Deposit"

# Skew and drift thresholds.
SKEW_DEFAULT_THRESHOLDS = "Age,Job,Balance,Education"  # @param {type:"string"}
SKEW_CUSTOM_THRESHOLDS = "Balance:.5"  # @param {type:"string"}
DRIFT_DEFAULT_THRESHOLDS = "Age,Job,Balance,Education"  # @param {type:"string"}
DRIFT_CUSTOM_THRESHOLDS = "Balance:.5"  # @param {type:"string"}

### Create Monitoring Job

Once Monitoring is enabled on a Vertex AI endpoint, system will start logging a subset of the prediction requests coming to the Vertex endpoint and analyze them periodically(depending on the interval value).

In [21]:
def create_monitoring_job(objective_configs):
    # Create sampling configuration.
    random_sampling = SamplingStrategy.RandomSampleConfig(sample_rate=LOG_SAMPLE_RATE)
    sampling_config = SamplingStrategy(random_sample_config=random_sampling)

    # Create schedule configuration.
    duration = Duration(seconds=MONITOR_INTERVAL)
    schedule_config = ModelDeploymentMonitoringScheduleConfig(monitor_interval=duration)

    # Create alerting configuration.
    emails = [USER_EMAIL]
    email_config = ModelMonitoringAlertConfig.EmailAlertConfig(user_emails=emails)
    alerting_config = ModelMonitoringAlertConfig(email_alert_config=email_config)

    # Create the monitoring job.
    #endpoint = f"projects/{PROJECT_ID}/locations/{REGION}/endpoints/{ENDPOINT_ID}"
    endpoint = f"{endpoint_id}"
    predict_schema = ""
    analysis_schema = ""
    job = ModelDeploymentMonitoringJob(
        display_name=JOB_NAME,
        endpoint=endpoint,
        model_deployment_monitoring_objective_configs=objective_configs,
        logging_sampling_strategy=sampling_config,
        model_deployment_monitoring_schedule_config=schedule_config,
        model_monitoring_alert_config=alerting_config,
        predict_instance_schema_uri=predict_schema,
        analysis_instance_schema_uri=analysis_schema,
    )
    options = dict(api_endpoint=API_ENDPOINT)
    client = JobServiceClient(client_options=options)
    parent = f"projects/{PROJECT_ID}/locations/{REGION}"
    response = client.create_model_deployment_monitoring_job(
        parent=parent, model_deployment_monitoring_job=job
    )
    print("Created monitoring job:")
    print(response)
    return response


def get_thresholds(default_thresholds, custom_thresholds):
    thresholds = {}
    default_threshold = ThresholdConfig(value=DEFAULT_THRESHOLD_VALUE)
    for feature in default_thresholds.split(","):
        feature = feature.strip()
        thresholds[feature] = default_threshold
    for custom_threshold in custom_thresholds.split(","):
        pair = custom_threshold.split(":")
        if len(pair) != 2:
            print(f"Invalid custom skew threshold: {custom_threshold}")
            return
        feature, value = pair
        thresholds[feature] = ThresholdConfig(value=float(value))
    return thresholds


def get_deployed_model_ids(endpoint_id):
    client_options = dict(api_endpoint=API_ENDPOINT)
    client = EndpointServiceClient(client_options=client_options)
    #parent = f"projects/{PROJECT_ID}/locations/{REGION}"
    #response = client.get_endpoint(name=f"{parent}/endpoints/{endpoint_id}")
    response = client.get_endpoint(name=f"{endpoint_id}")
    model_ids = []
    for model in response.deployed_models:
        model_ids.append(model.id)
    return model_ids


def set_objectives(model_ids, objective_template):
    # Use the same objective config for all models.
    objective_configs = []
    for model_id in model_ids:
        objective_config = copy.deepcopy(objective_template)
        objective_config.deployed_model_id = model_id
        objective_configs.append(objective_config)
    return objective_configs


def send_predict_request(endpoint, input):
    client_options = {"api_endpoint": PREDICT_API_ENDPOINT}
    client = PredictionServiceClient(client_options=client_options)
    params = {}
    params = json_format.ParseDict(params, Value())
    request = PredictRequest(endpoint=endpoint, parameters=params)
    inputs = [json_format.ParseDict(input, Value())]
    request.instances.extend(inputs)
    response = client.predict(request)
    return response


def list_monitoring_jobs():
    client_options = dict(api_endpoint=API_ENDPOINT)
    parent = f"projects/{PROJECT_ID}/locations/us-central1"
    client = JobServiceClient(client_options=client_options)
    response = client.list_model_deployment_monitoring_jobs(parent=parent)
    print(response)


def pause_monitoring_job(job):
    client_options = dict(api_endpoint=API_ENDPOINT)
    client = JobServiceClient(client_options=client_options)
    response = client.pause_model_deployment_monitoring_job(name=job)
    print(response)


def delete_monitoring_job(job):
    client_options = dict(api_endpoint=API_ENDPOINT)
    client = JobServiceClient(client_options=client_options)
    response = client.delete_model_deployment_monitoring_job(name=job)
    print(response)

In [28]:
#Test. Delete
endpoint_id = 'projects/92852031310/locations/us-central1/endpoints/9018722136600084480'


In [30]:

import copy

from google.cloud.aiplatform_v1beta1.services.endpoint_service import EndpointServiceClient
from google.cloud.aiplatform_v1beta1.services.job_service import JobServiceClient
from google.cloud.aiplatform_v1beta1.services.prediction_service import PredictionServiceClient
from google.cloud.aiplatform_v1beta1.types.io import BigQuerySource
from google.cloud.aiplatform_v1beta1.types.io import GcsSource
from google.cloud.aiplatform_v1beta1.types.model_deployment_monitoring_job import (
    ModelDeploymentMonitoringJob, ModelDeploymentMonitoringObjectiveConfig,
    ModelDeploymentMonitoringScheduleConfig)
from google.cloud.aiplatform_v1beta1.types.model_monitoring import (
    ModelMonitoringAlertConfig, ModelMonitoringObjectiveConfig, SamplingStrategy, ThresholdConfig)
from google.cloud.aiplatform_v1beta1.types.prediction_service import PredictRequest
from google.protobuf import json_format
from google.protobuf.duration_pb2 import Duration
from google.protobuf.struct_pb2 import Value

# This is the default value at which you would like the monitoring function to trigger an alert.
# In other words, this value fine tunes the alerting sensitivity. This threshold can be customized
# on a per feature basis but this is the global default setting.
DEFAULT_THRESHOLD_VALUE = 0.001

In [27]:
# Set thresholds specifying alerting criteria for training/serving skew and create config object.
skew_thresholds = get_thresholds(SKEW_DEFAULT_THRESHOLDS, SKEW_CUSTOM_THRESHOLDS)
skew_config = ModelMonitoringObjectiveConfig.TrainingPredictionSkewDetectionConfig(
    skew_thresholds=skew_thresholds
)

# Set thresholds specifying alerting criteria for serving drift and create config object.
drift_thresholds = get_thresholds(DRIFT_DEFAULT_THRESHOLDS, DRIFT_CUSTOM_THRESHOLDS)
drift_config = ModelMonitoringObjectiveConfig.PredictionDriftDetectionConfig(
    drift_thresholds=drift_thresholds
)

# Specify training dataset source location (used for schema generation). 
# BQ or Vertex Managed datasets can also be used as source
training_dataset = ModelMonitoringObjectiveConfig.TrainingDataset(target_field=TARGET)
training_dataset.data_format = 'csv'
training_dataset.gcs_source = GcsSource(uris=DATASET_GCS_URI)


# Aggregate the above settings into a ModelMonitoringObjectiveConfig object and use
# that object to adjust the ModelDeploymentMonitoringObjectiveConfig object.
objective_config = ModelMonitoringObjectiveConfig(
    training_dataset=training_dataset,
    training_prediction_skew_detection_config=skew_config,
    prediction_drift_detection_config=drift_config,
)
objective_template = ModelDeploymentMonitoringObjectiveConfig(
    objective_config=objective_config
)

# Find all deployed model ids on the created endpoint and set objectives for each.
#model_ids = get_deployed_model_ids(ENDPOINT_ID)
model_ids = get_deployed_model_ids(endpoint_id)
objective_configs = set_objectives(model_ids, objective_template)

# Create the monitoring job for all deployed models on this endpoint.
monitoring_job = create_monitoring_job(objective_configs)

Created monitoring job:
name: "projects/92852031310/locations/us-central1/modelDeploymentMonitoringJobs/3598571815338770432"
display_name: "bank_marketing_monitor"
endpoint: "projects/92852031310/locations/us-central1/endpoints/9018722136600084480"
state: JOB_STATE_PENDING
schedule_state: OFFLINE
model_deployment_monitoring_objective_configs {
  deployed_model_id: "2918974673326702592"
  objective_config {
    training_dataset {
      data_format: "csv"
      gcs_source {
        uris: "gs://cloud-ml-tables-data/bank-marketing.csv"
      }
      target_field: "Deposit"
    }
    training_prediction_skew_detection_config {
      skew_thresholds {
        key: "Age"
        value {
          value: 0.001
        }
      }
      skew_thresholds {
        key: "Balance"
        value {
          value: 0.5
        }
      }
      skew_thresholds {
        key: "Education"
        value {
          value: 0.001
        }
      }
      skew_thresholds {
        key: "Job"
        value {
   

### Generate Skewed Data

To introduce an artificial skew in the data, we will filter the dataset to exclude all records for users above the age of 25.

In [98]:
data = pd.read_csv(IMPORT_FILE)

In [99]:
data.describe()

Unnamed: 0,Age,Balance,Day,Duration,Campaign,PDays,Previous,Deposit
count,45211.0,45211.0,45211.0,45211.0,45211.0,45211.0,45211.0,45211.0
mean,40.93621,1362.272058,15.806419,258.16308,2.763841,40.197828,0.580323,1.116985
std,10.618762,3044.765829,8.322476,257.527812,3.098021,100.128746,2.303441,0.321406
min,18.0,-8019.0,1.0,0.0,1.0,-1.0,0.0,1.0
25%,33.0,72.0,8.0,103.0,1.0,-1.0,0.0,1.0
50%,39.0,448.0,16.0,180.0,2.0,-1.0,0.0,1.0
75%,48.0,1428.0,21.0,319.0,3.0,-1.0,0.0,1.0
max,95.0,102127.0,31.0,4918.0,63.0,871.0,275.0,2.0


In [35]:
#[Data_Stats](img/image_file_name_here.png)

In [122]:
#To showcase drift, let's select only the records of people younger than 25
skewed_data = data[data['Age']<=25]
#skewed_data.reset_index(drop=True,inplace=True)
skewed_data

Unnamed: 0,Age,Job,MaritalStatus,Education,Default,Balance,Housing,Loan,Contact,Day,Month,Duration,Campaign,PDays,Previous,POutcome,Deposit
23,25,services,married,secondary,no,50,yes,no,unknown,5,may,342,1,-1,0,unknown,1
36,25,blue-collar,married,secondary,no,-7,yes,no,unknown,5,may,365,1,-1,0,unknown,1
54,24,technician,single,secondary,no,-103,yes,yes,unknown,5,may,145,1,-1,0,unknown,1
135,23,blue-collar,married,secondary,no,94,yes,no,unknown,5,may,193,1,-1,0,unknown,1
246,22,blue-collar,single,secondary,no,0,yes,no,unknown,5,may,179,2,-1,0,unknown,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45170,19,student,single,primary,no,245,no,no,telephone,10,nov,98,2,110,2,other,1
45189,25,services,single,secondary,no,199,no,no,cellular,16,nov,173,1,92,5,failure,1
45196,25,student,single,secondary,no,358,no,no,cellular,16,nov,330,1,-1,0,unknown,2
45203,23,student,single,tertiary,no,113,no,no,cellular,17,nov,266,1,-1,0,unknown,2


In [123]:
skewed_data.describe()

Unnamed: 0,Age,Balance,Day,Duration,Campaign,PDays,Previous,Deposit
count,1336.0,1336.0,1336.0,1336.0,1336.0,1336.0,1336.0,1336.0
mean,23.538174,897.97006,15.350299,271.773952,2.391467,36.50524,0.519461,1.239521
std,1.682261,1830.746005,8.350113,235.693767,3.357619,92.841309,1.498437,0.426951
min,18.0,-1414.0,1.0,3.0,1.0,-1.0,0.0,1.0
25%,23.0,83.75,8.0,115.0,1.0,-1.0,0.0,1.0
50%,24.0,361.0,15.0,204.0,2.0,-1.0,0.0,1.0
75%,25.0,988.5,22.0,346.25,3.0,-1.0,0.0,1.0
max,25.0,23878.0,31.0,1519.0,58.0,479.0,14.0,2.0


You can see how the feature distribution has changed due to the age based filter we applied above

### Trigger training serving skew by sending prediction requests with skewed data

Now we can submit this skewed dataset to the API endpoint created above. 
Since the prediction inputs we are going to send to Vertex API endpoints are skewed, at the next monitoring interval Vertex AI Monitoring system will detect the skew and generate alerts for the Features it is monitoring. Since we also configured an email address, it will also send the alerts to those email addresses.

In [124]:
input_records = pd.DataFrame()

In [125]:
#Convert to string types
input_records[["Age", "Job", "MaritalStatus", "Education", "Default", "Balance", "Housing", "Loan", "Contact", "Day", \
               "Month", "Duration","Campaign","PDays", "Previous","POutcome"]] = \
    skewed_data[["Age", "Job", "MaritalStatus", "Education", "Default", "Balance", "Housing", "Loan", "Contact", "Day", \
                 "Month", "Duration","Campaign","PDays", "Previous","POutcome"]].astype(str)

In [126]:
record_count = input_records.Age.count()
record_count

1336

In [127]:
#Convert dataframe to json format
result = input_records.to_json(orient="records")
parsed_input = json.loads(result)

In [None]:
#Send the records to prediction end-point
for i in range(0,record_count):
    resp = predict_item(parsed_input[i], endpoint_id, None, verbose=0)
    #print(resp['classes'][0])
    #print(resp['scores'][0])
print("Prediction requests submitted")

Submission of the skewed dataset as prediction requests would trigger the alerts for Training - Serving data skew at next Monitoring interval.

### Trigger Data Drift alert by sending prediction requests with data that has different distribution than previous requests

In [138]:
#To showcase drift, let's select only the records of people younger than 25
drift_data = data[data['Age']>=60]
#skewed_data.reset_index(drop=True,inplace=True)
drift_data

Unnamed: 0,Age,Job,MaritalStatus,Education,Default,Balance,Housing,Loan,Contact,Day,Month,Duration,Campaign,PDays,Previous,POutcome,Deposit
18,60,retired,married,primary,no,60,yes,no,unknown,5,may,219,1,-1,0,unknown,1
32,60,admin.,married,secondary,no,39,yes,yes,unknown,5,may,208,1,-1,0,unknown,1
42,60,blue-collar,married,unknown,no,104,yes,no,unknown,5,may,22,1,-1,0,unknown,1
66,60,retired,married,tertiary,no,100,no,no,unknown,5,may,528,1,-1,0,unknown,1
92,60,admin.,married,secondary,no,290,yes,no,unknown,5,may,583,1,-1,0,unknown,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45191,75,retired,divorced,tertiary,no,3810,yes,no,cellular,16,nov,262,1,183,1,failure,2
45195,68,retired,married,secondary,no,1146,no,no,cellular,16,nov,212,1,187,6,success,2
45204,73,retired,married,secondary,no,2850,no,no,cellular,17,nov,300,1,40,8,failure,2
45207,71,retired,divorced,primary,no,1729,no,no,cellular,17,nov,456,2,-1,0,unknown,2


In [139]:
input_records = pd.DataFrame()

In [140]:
#Convert to string types
input_records[["Age", "Job", "MaritalStatus", "Education", "Default", "Balance", "Housing", "Loan", "Contact", "Day", \
               "Month", "Duration","Campaign","PDays", "Previous","POutcome"]] = \
    drift_data[["Age", "Job", "MaritalStatus", "Education", "Default", "Balance", "Housing", "Loan", "Contact", "Day", \
                 "Month", "Duration","Campaign","PDays", "Previous","POutcome"]].astype(str)

In [141]:
record_count = input_records.Age.count()
record_count

1784

In [142]:
#Convert dataframe to json format
result = input_records.to_json(orient="records")
parsed_input = json.loads(result)

In [None]:
#Send the records to prediction end-point
for i in range(0,record_count):
    resp = predict_item(parsed_input[i], endpoint_id, None, verbose=0)
    #print(resp['classes'][0])
    #print(resp['scores'][0])
print("Prediction requests submitted")

Submission of this dataset as prediction requests would trigger the alerts for Data Drift at next Monitoring interval.

### Disable Monitoring

In [None]:
#If a monitoring job needs to be deleted, use below calls
pause_monitoring_job(monitoring_job.name)
delete_monitoring_job(monitoring_job.name)