# Deploying a model to Vertex AI and getting predictions from the model

In this notebook, we will train and deploy an ML model in Google Cloud, and get predictions from our model.


## Prerequisites
**Note:** This notebook and repository are supporting artifacts for the "Google Machine Learning and Generative AI for Solutions Architects" book. The book describes the concepts associated with this notebook, and for some of the activities, the book contains instructions that should be performed before running the steps in the notebooks. Each top-level folder in this repo is associated with a chapter in the book. Please ensure that you have read the relevant chapter sections before performing the activities in this notebook.

**There are also important generic prerequisite steps outlined [here](https://github.com/PacktPublishing/Google-Machine-Learning-for-Solutions-Architects/blob/main/Prerequisite-steps/Prerequisites.ipynb).**


**Attention:** The code in this notebook creates Google Cloud resources that can incur costs.

Refer to the Google Cloud pricing documentation for details.

For example:

* [Vertex AI Pricing](https://cloud.google.com/vertex-ai/pricing)
* [Google Cloud Storage Pricing](https://cloud.google.com/storage/pricing)
* [BigQuery Pricing](https://cloud.google.com/bigquery/pricing)


First, install the latest version of the Vertex AI library.

## Install required packages

In [None]:
!pip install --upgrade google-cloud-aiplatform --user --quiet

In [None]:
!pip uninstall protobuf -y
!pip install protobuf==3.19.* --quiet

*The pip installation commands sometimes report various errors. Those errors usually do not affect the activities in this notebook, and you can ignore them.*


## Restart the kernel

The code in the next cell will restart the kernel, which is sometimes required after installing or upgrading packages.

**When prompted, click OK to restart the kernel.**

The sleep command simply prevents further cells from executing before the kernel restarts.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)


In [None]:
import time
time.sleep(10)

# (Wait for kernel to restart before proceeding...)

## Import required libraries

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import fetch_california_housing

## Set Google Cloud resource variables

The following code will set variables specific to your Google Cloud resources that will be used in this notebook, such as the Project ID, Region, and GCS Bucket.

**Note: This notebook is intended to execute in a Vertex AI Workbench Notebook, in which case the API calls issued in this notebook are authenticated according to the permissions (e.g., service account) assigned to the Vertex AI Workbench Notebook.**

We will use the `gcloud` command to get the Project ID details from the local Google Cloud project, and assign the results to the PROJECT_ID variable. If, for any reason, PROJECT_ID is not set, you can set it manually or change it, if preferred.

We also use a default bucket name for most of the examples and activities in this book, which has the format: `{PROJECT_ID}-aiml-sa-bucket`. You can change the bucket name if preferred.

Also, we're defaulting to the **us-central1** region, but you can optionally replace this with your [preferred region](https://cloud.google.com/about/locations).

In [None]:
PROJECT_ID_DETAILS = !gcloud config get-value project
PROJECT_ID = PROJECT_ID_DETAILS[0]  # The project ID is item 0 in the list returned by the gcloud command
BUCKET=f"{PROJECT_ID}-aiml-sa-bucket" # Optional: replace with your preferred bucket name, which must be a unique name.
REGION="us-central1" # Optional: replace with your preferred region (See: https://cloud.google.com/about/locations) 
print(f"Project ID: {PROJECT_ID}")
print(f"Bucket Name: {BUCKET}")
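
If `gcloud` does not return a project, `PROJECT_ID` ends up empty or unusable, and the bucket name derived from it is invalid. The following is a minimal sketch of a fallback and a sanity check. It assumes the `GOOGLE_CLOUD_PROJECT` environment variable as a fallback source, and both helper names are hypothetical. The name check covers only part of the Cloud Storage naming rules (length and allowed characters).

```python
import os
import re

def resolve_project_id(detected, env=os.environ):
    """Return the detected project ID, or fall back to an environment variable."""
    detected = (detected or "").strip()
    # Assumption: gcloud may report "(unset)" when no project is configured
    if detected and detected != "(unset)":
        return detected
    fallback = env.get("GOOGLE_CLOUD_PROJECT", "").strip()
    if not fallback:
        raise ValueError("Project ID not found; set PROJECT_ID manually.")
    return fallback

def looks_like_valid_bucket_name(name):
    """Partial check: 3-63 characters, lowercase letters, digits, '.', '_' or '-',
    starting and ending with a letter or digit."""
    return re.fullmatch(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]", name) is not None

project_id = resolve_project_id("my-project")  # hypothetical project ID
bucket = f"{project_id}-aiml-sa-bucket"
print(bucket, looks_like_valid_bucket_name(bucket))
```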

## Create bucket

The following code will create the bucket if it doesn't already exist.

If you get an error saying that it already exists, that's fine, you can ignore it and continue with the rest of the steps, unless you want to use a different bucket.

In [None]:
!gsutil mb -l {REGION} gs://{BUCKET}

## Begin implementation

Now that we have performed the prerequisite steps for this activity, it's time to implement the activity.

In [None]:
# Other re-usable variables (no need to change these)
TEST_DATA_FILENAME="housing_test_data.jsonl"
MODEL_NAME="housing_model"
TEST_DATA_LOCATION=f"gs://{BUCKET}/data/deployment-chapter"
MODEL_LOCATION=f"gs://{BUCKET}/models/deployment-chapter/tensorflow"
OUTPUT_LOCATION=f"gs://{BUCKET}/outputs/deployment-chapter"

# Load the dataset
housing = fetch_california_housing()

# Standardize the features
scaler = StandardScaler()
data_scaled = scaler.fit_transform(housing.data)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(data_scaled, housing.target)
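
To make the standardization step concrete, here is a minimal sketch of the same idea using only the Python standard library: each value has the column mean subtracted and is divided by the column's population standard deviation. The `standardize` helper and the sample values are illustrative assumptions, not part of the notebook's pipeline.

```python
import statistics

def standardize(values):
    """Scale values to zero mean and unit variance (population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no spread; map it to zeros
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Illustrative values only, not taken from the housing dataset
median_income = [1.5, 3.0, 4.5, 6.0, 10.0]
scaled = standardize(median_income)
print([round(v, 3) for v in scaled])
```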

## Create and train the TensorFlow model

Next, we train a model using TensorFlow.
Again, you can ignore the warnings if any are shown when you execute the code. You will also see results from each training epoch.

In [None]:
tf_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:]),
    tf.keras.layers.Dense(1)
])

tf_model.compile(loss="mean_squared_error", optimizer=tf.keras.optimizers.SGD(0.01))
history = tf_model.fit(X_train, y_train, epochs=20, validation_data=(X_test, y_test))
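
As a quick sanity check on the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the 8 feature columns of the California housing dataset; `dense_param_count` is a hypothetical helper for the arithmetic.

```python
def dense_param_count(n_inputs, n_units):
    """A dense layer has one weight per input-unit pair, plus one bias per unit."""
    return n_inputs * n_units + n_units

n_features = 8  # number of feature columns in the California housing dataset
hidden = dense_param_count(n_features, 30)  # 8 * 30 + 30 = 270
output = dense_param_count(30, 1)           # 30 * 1 + 1 = 31
total = hidden + output
print(hidden, output, total)  # prints: 270 31 301
```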

## Save the trained model
Next, we save the trained model locally.

In [None]:
tf_model.save(MODEL_NAME)

Note the model input names to be used when creating the batch prediction test data later in this notebook:

In [None]:
model_input_names = tf_model.input_names

## Copy the model to GCS

We copy our model to GCS so that we can use it with the Vertex AI prediction service.

In [None]:
!gsutil cp -r $MODEL_NAME $MODEL_LOCATION/$MODEL_NAME

## Create a Google Cloud Vertex AI Model object

The process of creating a Google Cloud Vertex AI Model object from our trained model involves several steps:

1. Serialize the model into a format that Vertex AI can understand.
2. Upload the serialized model to our Google Cloud Storage bucket.
3. Create a Vertex AI Model resource, pointing it to the serialized model in the Cloud Storage bucket. This step will add the model to the Vertex AI Model Registry.

The code to do that is as follows:

In [None]:
from google.cloud import aiplatform

# Initialize the Vertex AI SDK
aiplatform.init(project=PROJECT_ID, location=REGION)

# Create a Vertex AI Model resource
model = aiplatform.Model.upload(
    display_name=MODEL_NAME,
    artifact_uri=f"{MODEL_LOCATION}/{MODEL_NAME}",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-10:latest",
)

## Convert the test dataset to JSON Lines to be used with our model

Our model expects the test data to be in JSON Lines format. The following code converts it accordingly, and writes it to a file that we can then upload to GCS.

In [None]:
import json

# Convert X_test to a DataFrame
df = pd.DataFrame(X_test, columns=housing.feature_names)

# Convert the DataFrame to a list of dict records
records = df.to_dict('records')

# Write out the records in JSON Lines format
with open(f'{TEST_DATA_FILENAME}', 'w') as f:
    for record in records:
        json.dump({model_input_names[0]: list(record.values())}, f)
        f.write('\n')
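
To show what the resulting file looks like, the following sketch writes two records in the same JSON Lines layout to an in-memory buffer and reads them back. The input name `dense_input` and the feature values are assumptions for illustration; in the notebook, the name comes from `model_input_names[0]`.

```python
import io
import json

# Hypothetical input name; in the notebook this comes from model_input_names[0]
input_name = "dense_input"

# Illustrative, already-scaled feature rows (8 values each)
rows = [
    [0.1, -0.5, 0.3, 0.0, 1.2, -0.1, 0.7, -0.9],
    [-1.0, 0.4, 0.0, 0.2, -0.3, 0.5, -0.6, 0.8],
]

buffer = io.StringIO()  # stands in for the output file
for row in rows:
    # One JSON object per line, keyed by the model's input name
    buffer.write(json.dumps({input_name: row}) + "\n")

lines = buffer.getvalue().splitlines()
parsed = [json.loads(line) for line in lines]
print(lines[0])
```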

## Upload the file to GCS

In [None]:
!gsutil cp $TEST_DATA_FILENAME $TEST_DATA_LOCATION/$TEST_DATA_FILENAME

## Create a batch prediction job to get predictions from our model

Next, we create the actual batch prediction job on Vertex AI. The job may take 20 minutes or more to complete. This is because Vertex AI spins up the infrastructure to run our batch job, such as the servers and containers, as well as networking infrastructure, then loads our model and input data, and executes our batch prediction job. The execution time also depends on the size of your model and input data. For example, a production job that processes large amounts of data may run for a much longer time.

The status of the job will be displayed periodically below this code cell. If all goes well, you will eventually see a status message saying "JobState.JOB_STATE_SUCCEEDED".

In [None]:
job = aiplatform.BatchPredictionJob.create(
    job_display_name="housing_prediction",
    model_name=model.resource_name,
    gcs_source=f"{TEST_DATA_LOCATION}/{TEST_DATA_FILENAME}",
    gcs_destination_prefix=OUTPUT_LOCATION,
    instances_format="jsonl",
    predictions_format="jsonl",
    machine_type="n1-standard-4",
)

## Get the location of the prediction results in GCS

In [None]:
JOB_OUTPUT_DIRECTORY_PATH=job.output_info.gcs_output_directory

In [None]:
output_directory_name=JOB_OUTPUT_DIRECTORY_PATH.split("/")[-1]
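
Note that `split("/")[-1]` returns an empty string if the path ends with a slash. A slightly more defensive variant is sketched below; the helper name and the example URI are hypothetical.

```python
import posixpath

def last_path_component(gcs_uri):
    """Return the final path component of a gs:// URI, ignoring a trailing slash."""
    return posixpath.basename(gcs_uri.rstrip("/"))

# Hypothetical output path, for illustration only
example_uri = "gs://my-bucket/outputs/deployment-chapter/prediction-housing_model-2023_08_01/"
print(last_path_component(example_uri))
```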

## Copy the predictions from GCS into our notebook for inspection

In [None]:
!gsutil -m cp -r $JOB_OUTPUT_DIRECTORY_PATH .

## Load and print out the predictions

We'll just print the first 10 lines to view a subset of the prediction outputs.

In [None]:
line_count = 0 

with open(f"{output_directory_name}/prediction.results-00000-of-00001") as f:
    for line in f:
        if line_count < 10:
            print(json.loads(line))
            line_count += 1

In each line, the 'prediction' is our model's predicted price for that house, given the input features in that line.
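
The sketch below shows one way to pull just the predictions out of such a file, stopping after the first few lines instead of iterating over the whole file. It assumes each output line is a JSON object with an `instance` key and a `prediction` key; the sample values and the input name `dense_input` are made up for illustration.

```python
import io
import json
from itertools import islice

# Illustrative output lines: the values are made up and "dense_input" is a hypothetical input name
sample_output = io.StringIO(
    '{"instance": {"dense_input": [0.1, -0.5]}, "prediction": [2.31]}\n'
    '{"instance": {"dense_input": [-1.0, 0.4]}, "prediction": [1.07]}\n'
    '{"instance": {"dense_input": [0.6, 0.2]}, "prediction": [3.52]}\n'
)

def first_predictions(lines, limit):
    """Parse at most `limit` JSON Lines records and return their predictions."""
    return [json.loads(line)["prediction"] for line in islice(lines, limit)]

top_two = first_predictions(sample_output, 2)
print(top_two)
```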

Now that we have our model defined, we can run a batch inference job whenever we wish, such as immediately when new data becomes available, or every night (with new data from the prior day).

# Online Inference

Next, let's deploy our model for online inference. I recommend referencing chapter 1 of the book at this point, in which I discuss what would be required to host your models on your own infrastructure. It would require a lot of work and financial investment!

**In Vertex AI, however, it's only going to take one simple line of code!**

This is astonishingly simple, considering all of the work it's going to do on our behalf in order to put our model into production.

Also notice how easy it is to enable autoscaling for our model. All we have to do is specify the minimum and maximum number of nodes we want to configure, and Vertex AI will take care of the rest. We do this by configuring min_replica_count and max_replica_count.

The following piece of code will reference our model in the Vertex AI Model Registry, and deploy it to a Vertex AI Prediction endpoint for online inference.

In [None]:
endpoint = model.deploy(machine_type="n1-standard-4", min_replica_count=1, max_replica_count=3)

## Get our endpoint ID

In [None]:
ENDPOINT_ID = endpoint.name
print(ENDPOINT_ID)

In [None]:
dir(endpoint)

In [None]:
ENDPOINT_NAME = endpoint.display_name
print(ENDPOINT_NAME)

## Get our deployed model ID

We can get a list of models deployed to our endpoint. At this point, we only have one model deployed.

In [None]:
deployed_models = endpoint.list_models()
deployed_model_id = deployed_models[0].id
print(deployed_model_id)

## Create and execute an online inference request

### Specify the data to be used in our inference request

Earlier in this notebook, we already split our housing dataset into multiple subsets as part of the training process.
One subset is X_test, which represents the portion of our dataset that was not used to train our model (i.e., it was reserved for testing purposes), and which only contains the housing features (i.e., it does not contain the target label column, "price").

We will use elements (or observations) from that dataset to test our model.

First, let's test with just a single observation from the dataset, which represents a single house in our dataset. To make things simple, we will use the first house in the dataset.

Note that X_test is a NumPy array, but our model expects to receive a list of float numbers as input, so we will convert the input to a list of float numbers:

In [None]:
test_instance = X_test[0].tolist()

### Execute the prediction request

The next piece of code sends a prediction request to our endpoint, using the input we defined in the previous piece of code above.

In [None]:
response = endpoint.predict([test_instance])

# Print out the prediction
print("Prediction result:", response.predictions)
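
Note that `endpoint.predict` takes a list of instances, even when we send a single observation. As a rough illustration of what travels over the wire, the sketch below builds the JSON body for an equivalent REST request, assuming the `instances` envelope used by the Vertex AI prediction REST API. The `build_predict_body` helper and the feature values are hypothetical.

```python
import json

def build_predict_body(instances):
    """Wrap a list of feature vectors in the {"instances": [...]} envelope."""
    if not all(isinstance(row, list) for row in instances):
        raise TypeError("each instance must be a list of numbers")
    return json.dumps({"instances": instances})

# A single observation is still sent as a list containing one instance
single = build_predict_body([[0.1, -0.5, 0.3, 0.0, 1.2, -0.1, 0.7, -0.9]])
print(single)
```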

There you have it! We've successfully sent an inference request to a model that is hosted on an endpoint in Vertex AI. That inference request contained details of a specific house in our dataset, and our model returned its predicted price for that house, based on the inputs provided in our request!

We could also repeat this process to get multiple predictions in a single request, if we'd like.

In this case, we take the first three instances from our X_test dataset, convert each one to a list of float numbers, and send those details in our request.
Our model then returns a list containing the predictions for each of our inputs:

In [None]:
test_instances = [x.tolist() for x in X_test[:3]]

# Make the prediction request
response = endpoint.predict(test_instances)

# Print out the prediction
print("Prediction result:", response.predictions)

# A/B Testing

Next, let's test a new version of our model and compare it against our current model using A/B testing.

We'll repeat previous steps in this notebook, and create a new model that was trained using 100 epochs instead of 20.

Then we'll see how that model performs in comparison to our prior model.

In [None]:
MODEL_100_NAME="housing_model_100"

model_100 = tf.keras.models.Sequential([
    tf.keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:]),
    tf.keras.layers.Dense(1)
])

model_100.compile(loss="mean_squared_error", optimizer=tf.keras.optimizers.SGD(0.01))
history = model_100.fit(X_train, y_train, epochs=100, validation_data=(X_test, y_test))

model_100.save(MODEL_100_NAME)

In [None]:
!gsutil cp -r $MODEL_100_NAME $MODEL_LOCATION/$MODEL_100_NAME

In [None]:
# Create a Vertex AI Model resource
model_100 = aiplatform.Model.upload(
    display_name=MODEL_100_NAME,
    artifact_uri=f"{MODEL_LOCATION}/{MODEL_100_NAME}",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-10:latest",
)

## Note current traffic split settings

In order to understand how the traffic_split settings work, let's check the current configuration before we deploy another model to our endpoint.
We currently only have one model deployed, so that model should be getting 100% of the traffic.

In [None]:
endpoint.traffic_split

## Deploy our new model to our endpoint

We will deploy our new model to the same endpoint that hosts our first model. We do this by passing the `endpoint=endpoint` argument.
If we omitted that argument, Vertex AI would create a new dedicated endpoint for our new model.

In [None]:
model_100.deploy(
        endpoint=endpoint,
        deployed_model_display_name=MODEL_100_NAME,
        machine_type="n1-standard-4",
        min_replica_count=1, 
        max_replica_count=3
    )

## Check the traffic split settings

If we don’t set any value for the traffic_split variable, the default behavior is to keep all traffic directed to the original model that was already deployed to our endpoint. This is a safety mechanism that prevents unexpected behavior in terms of how our models serve traffic from our clients.

In [None]:
endpoint.traffic_split

We can see that we now have two deployed models, but the newly deployed model is not yet receiving any traffic.
Let's take a note of both deployed model IDs, so we can use them to update the traffic_split configuration.

In [None]:
deployed_models = endpoint.list_models()
model_ids = []

for deployed_model in deployed_models:
    model_ids.append(deployed_model.id)

model_ids

## Update the traffic split settings
A common approach would be to test our new model with a small portion of our traffic; perhaps 10%.

Let's update our traffic split settings to allocate 10% of traffic to our new model.

In [None]:
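# The new model's ID is the one that differs from the original ID we noted earlier,
# so we do not rely on the order in which list_models() returns the models
new_model_id = next(m for m in model_ids if m != deployed_model_id)
traffic_split = {deployed_model_id: 90, new_model_id: 10}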

endpoint.update(traffic_split=traffic_split)
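
A traffic split is a mapping from deployed model ID to a percentage, and the percentages need to add up to 100. The sketch below builds such a mapping from a single "candidate percentage" so that the two values cannot drift out of sync. The helper name and the model IDs are hypothetical.

```python
def build_traffic_split(current_id, candidate_id, candidate_percent):
    """Return a traffic split dict whose percentages sum to 100."""
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    if current_id == candidate_id:
        raise ValueError("the two deployed model IDs must differ")
    return {current_id: 100 - candidate_percent, candidate_id: candidate_percent}

# Hypothetical deployed model IDs
split = build_traffic_split("1111111111", "2222222222", 10)
print(split)
```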

## Enable prediction request-response logging

The prediction request-response logging feature will log our models’ responses for the prediction requests received. We can save those responses in a Google Cloud BigQuery table, which enables us to perform analysis on the prediction responses from each of our models, and see how they are performing.

### Create BigQuery table for the logs

The next two cells are technically optional but it can take some time for the BigQuery table to be created, so it's better for our purposes here to specify the BigQuery table details and create it explicitly in this manner.
Without this part, a dataset and table would be automatically created after some time.

The BigQuery table needs to be created in this specific way, due to how the prediction request-response logging feature currently works (it has strict and rigid requirements for how the table and schema need to be created).

In [None]:
from google.api_core import exceptions
from google.cloud import bigquery

# Initialize a BigQuery client for our project
client = bigquery.Client(project=PROJECT_ID)

# Specify the dataset_id within the project
dataset_id = f'cpt10_{ENDPOINT_ID}'

# Create a DatasetReference using the chosen dataset ID
dataset_ref = bigquery.DatasetReference(PROJECT_ID, dataset_id)

# Construct a full Dataset object to send to the API
dataset = bigquery.Dataset(dataset_ref)

# Specify the geographic location where the dataset should reside
dataset.location = REGION

# Specify the table_id within the dataset
table_id = 'request_response_logging'

log_table_ref_id = f'{PROJECT_ID}.{dataset_id}.{table_id}'

# Create the new dataset
try:
    dataset = client.create_dataset(dataset)  # Make an API request
    print(f"Created dataset {client.project}.{dataset.dataset_id}")
except exceptions.Conflict:
    print(f"Dataset {client.project}.{dataset.dataset_id} already exists")

# Set your query
create_table_query = f"""
CREATE TABLE `{log_table_ref_id}` (
  endpoint STRING,
  deployed_model_id STRING,
  logging_time TIMESTAMP,
  request_id NUMERIC
)
"""

# Execute the query
query_job = client.query(create_table_query)  

# Wait for the job to complete
query_job.result() 

print("Table created successfully.")


In [None]:
table = client.get_table(log_table_ref_id) 

original_schema = table.schema
new_schema = original_schema[:]  # Creates a copy of the schema.
new_schema.append(bigquery.SchemaField("request_payload", "STRING", mode="REPEATED"))
new_schema.append(bigquery.SchemaField("response_payload", "STRING", mode="REPEATED"))

table.schema = new_schema
table = client.update_table(table, ["schema"])

if len(table.schema) == len(new_schema):
    print("A new schema has been added.")
else:
    print("The schema has not been modified.")


### Update the request-response logging configuration

At the time of writing this in August 2023, the only supported way to enable prediction request-response logging is by using the Vertex AI REST API. Check the documentation [here](https://cloud.google.com/vertex-ai/docs/predictions/online-prediction-logging#enable_disable_logs-drest) for details.

The following code will generate an access token that we can use to make a request to the Vertex AI REST API.
It will then use the Python *requests* library to build and send an HTTPS request to the Vertex AI REST API.
The body of the request sets the predict_request_response_logging_config to enable the feature.

In [None]:
import requests
import subprocess

OUTPUT_URI = f"bq://{log_table_ref_id}"

# Execute gcloud command to get access token
get_token_command = "gcloud auth print-access-token"
TOKEN = subprocess.getoutput(get_token_command)

# URL
url = f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}/endpoints/{ENDPOINT_ID}"

# Headers
headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json"
}

# Body
body = {
    "predict_request_response_logging_config": {
        "enabled": True,
        "sampling_rate": 1,
        "bigquery_destination": {
            "output_uri": OUTPUT_URI
        }
    }
}

# Make the PATCH request
response = requests.patch(url, headers=headers, data=json.dumps(body))

# Print the response
print(response.json())


## Generate request traffic to our endpoint

In this case, we're going to create a set of 5000 test requests from our original X_test dataset that we created at the beginning of this notebook, and we will then send those requests to our endpoint.

In [None]:
larger_test_instances = [x.tolist() for x in X_test[:5000]]

In [None]:
# Create an empty list to store responses
responses = []

# Generate some predictions
for instance in larger_test_instances:
    response = endpoint.predict([instance])
    
    # Append the response to the list
    responses.append(response.predictions)

## Print out the first 10 predictions

Let's take a look to see a sample of the responses provided by our model.

In [None]:
# Print out the first 10 predictions
print("Prediction results:", responses[:10])

## View a prediction response in more detail

Let's take a look at some of the other fields that are sent back in the responses to our prediction requests.
Note that we can see which model served the request by viewing the *deployed_model_id* field in the response.

In [None]:
# Check the last response
print(response)

## Perform analysis on the responses from our model

In this case, we'll perform a simple analysis that shows what percentage of the requests were served by each model, but BigQuery also supports much more complex analytics.
Considering the traffic_split configuration we specified, what would you expect the results to be?

In [None]:
# Set our query with variables
sql = f"""
SELECT
  deployed_model_id,
  COUNT(*) AS response_count,
  ROUND(COUNT(*) * 100 / SUM(COUNT(*)) OVER (), 2) AS percentage
FROM
  `{log_table_ref_id}`
GROUP BY
  deployed_model_id
ORDER BY
  percentage DESC
"""

# Execute the query
query_job = client.query(sql)

# Wait for the job to complete and get the result
result = query_job.result()

# Print the result
for row in result:
    print(row)


You should see that approximately 90% of the requests were served by our first model, and approximately 10% of the requests were served by our second model.
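
The observed split will rarely be exactly 90/10, because each request is routed individually. The sketch below simulates this with the standard library, assuming each request is assigned to a model independently at random with the configured weights. This is an illustration of the expected variation, not a description of how Vertex AI routes traffic internally, and the model names are hypothetical.

```python
import random
from collections import Counter

def simulate_split(traffic_split, n_requests, seed=0):
    """Randomly assign requests to models by weight and return the observed percentages."""
    rng = random.Random(seed)
    ids = list(traffic_split)
    weights = [traffic_split[i] for i in ids]
    served_by = rng.choices(ids, weights=weights, k=n_requests)
    counts = Counter(served_by)
    return {i: round(counts[i] * 100 / n_requests, 2) for i in ids}

percentages = simulate_split({"model_a": 90, "model_b": 10}, 5000)
print(percentages)
```

With 5000 requests, the simulated percentages typically land within about one percentage point of the configured values.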

# Model monitoring

We can use Vertex AI Model Monitoring to inspect the performance of our models in much more detail, and to periodically monitor for any degradation in our model performance.

## Create BigQuery dataset and table to store training data for reference

To detect training-serving skew, we generally need access to the original training data, because Vertex AI Model Monitoring compares the distribution of the training data against the distribution of the inference requests that are sent to our model in production.

We'll put a copy of our training data from earlier in this notebook into BigQuery for future reference.

In [None]:
# Define your dataset ID
train_dataset_id = "cpt10_california_housing_dataset"

# Initialize a BigQuery client
client = bigquery.Client(project=PROJECT_ID)

# Create a BigQuery dataset
train_dataset = bigquery.Dataset(f"{PROJECT_ID}.{train_dataset_id}")
train_dataset = client.create_dataset(train_dataset)  # API request
print(f"Created dataset {PROJECT_ID}.{train_dataset_id}")

# Define your BigQuery table ID
train_table_id = f"{PROJECT_ID}.{train_dataset_id}.california_housing"

# Define the schema of your BigQuery table
schema = [
    bigquery.SchemaField("MedInc", "FLOAT64"),
    bigquery.SchemaField("HouseAge", "FLOAT64"),
    bigquery.SchemaField("AveRooms", "FLOAT64"),
    bigquery.SchemaField("AveBedrms", "FLOAT64"),
    bigquery.SchemaField("Population", "FLOAT64"),
    bigquery.SchemaField("AveOccup", "FLOAT64"),
    bigquery.SchemaField("Latitude", "FLOAT64"),
    bigquery.SchemaField("Longitude", "FLOAT64"),
    bigquery.SchemaField("target", "FLOAT64")  # Matches the 'target' column of the DataFrame we upload later
]

# Create a BigQuery table
train_table = bigquery.Table(train_table_id, schema=schema)
train_table = client.create_table(train_table)  # API request

print(f"Created table {train_table_id}")

## Prepare training dataset to be put into BigQuery

We can use our scaled housing dataset (data_scaled) that we created earlier in this notebook, which was used to train our models.

In [None]:
# Use our scaled dataset that we created earlier in this notebook
df = pd.DataFrame(data=data_scaled, columns=housing.feature_names)
df['target'] = housing.target

## Install pandas-gbq

We can use the pandas-gbq library to make it easy for us to put our pandas DataFrame data into BigQuery.

In [None]:
!pip install --upgrade pandas-gbq --user

## Put training dataset in BigQuery

Now we upload the DataFrame that we prepared above to our BigQuery table. Because we pass `if_exists='replace'`, an existing table with the same name is replaced by the contents of the DataFrame.

In [None]:
from pandas_gbq import to_gbq

# Upload the DataFrame to BigQuery
to_gbq(df, train_table_id, project_id=PROJECT_ID, if_exists='replace')

## Create Vertex AI Model Monitoring Job

Now it's time to create our Vertex AI Model Monitoring job.

Some of the code in the following cell is repurposed from [this example notebook](https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/model_monitoring/model_monitoring.ipynb#scrollTo=-62TYm2iYv3K). 

The output from the cell will contain a link at which you can view the details of the Vertex AI Model Monitoring Job. However, note that the job does not get created immediately, and therefore it may take some time for it to show up in the console. 

**You will receive an email at the email address you specify in the code below, informing you about the status of the Vertex AI Model Monitoring Job.**

In [None]:
from google.cloud.aiplatform import model_monitoring

USER_EMAIL="example@gmail.com" # REPLACE WITH YOUR EMAIL

JOB_NAME = "cpt10-housing_monitoring_job"

# Sampling rate (optional, default=.8)
LOG_SAMPLE_RATE = 0.9 

# Monitoring Interval in hours (optional, default=1).
MONITOR_INTERVAL = 1  

# URI to training dataset.
DATASET_BQ_URI = f"bq://{train_table_id}"  
# Prediction target column name in training dataset.
TARGET = "target"

# Skew and drift thresholds.

DEFAULT_THRESHOLD_VALUE = 0.001

SKEW_THRESHOLDS = {
    "MedInc": DEFAULT_THRESHOLD_VALUE,
    "HouseAge": DEFAULT_THRESHOLD_VALUE,
    "AveRooms": DEFAULT_THRESHOLD_VALUE,
    "AveBedrms": DEFAULT_THRESHOLD_VALUE,
    "Population": DEFAULT_THRESHOLD_VALUE,
    "AveOccup": DEFAULT_THRESHOLD_VALUE,
    "Latitude": DEFAULT_THRESHOLD_VALUE,
    "Longitude": DEFAULT_THRESHOLD_VALUE,
}

DRIFT_THRESHOLDS = {
    "MedInc": DEFAULT_THRESHOLD_VALUE,
    "HouseAge": DEFAULT_THRESHOLD_VALUE,
    "AveRooms": DEFAULT_THRESHOLD_VALUE,
    "AveBedrms": DEFAULT_THRESHOLD_VALUE,
    "Population": DEFAULT_THRESHOLD_VALUE,
    "AveOccup": DEFAULT_THRESHOLD_VALUE,
    "Latitude": DEFAULT_THRESHOLD_VALUE,
    "Longitude": DEFAULT_THRESHOLD_VALUE,
}

ATTRIB_SKEW_THRESHOLDS = {
    "MedInc": DEFAULT_THRESHOLD_VALUE,
    "HouseAge": DEFAULT_THRESHOLD_VALUE,
    "AveRooms": DEFAULT_THRESHOLD_VALUE,
    "AveBedrms": DEFAULT_THRESHOLD_VALUE,
    "Population": DEFAULT_THRESHOLD_VALUE,
    "AveOccup": DEFAULT_THRESHOLD_VALUE,
    "Latitude": DEFAULT_THRESHOLD_VALUE,
    "Longitude": DEFAULT_THRESHOLD_VALUE,
}

ATTRIB_DRIFT_THRESHOLDS = {
    "MedInc": DEFAULT_THRESHOLD_VALUE,
    "HouseAge": DEFAULT_THRESHOLD_VALUE,
    "AveRooms": DEFAULT_THRESHOLD_VALUE,
    "AveBedrms": DEFAULT_THRESHOLD_VALUE,
    "Population": DEFAULT_THRESHOLD_VALUE,
    "AveOccup": DEFAULT_THRESHOLD_VALUE,
    "Latitude": DEFAULT_THRESHOLD_VALUE,
    "Longitude": DEFAULT_THRESHOLD_VALUE,
}

skew_config = model_monitoring.SkewDetectionConfig(
    data_source=DATASET_BQ_URI,
    skew_thresholds=SKEW_THRESHOLDS,
    attribute_skew_thresholds=ATTRIB_SKEW_THRESHOLDS,
    target_field=TARGET,
)

drift_config = model_monitoring.DriftDetectionConfig(
    drift_thresholds=DRIFT_THRESHOLDS,
    attribute_drift_thresholds=ATTRIB_DRIFT_THRESHOLDS,
)

objective_config = model_monitoring.ObjectiveConfig(
    skew_config, drift_config
)

# Create sampling configuration
random_sampling = model_monitoring.RandomSampleConfig(sample_rate=LOG_SAMPLE_RATE)

# Create schedule configuration
schedule_config = model_monitoring.ScheduleConfig(monitor_interval=MONITOR_INTERVAL)

# Create alerting configuration.
emails = [USER_EMAIL]
alerting_config = model_monitoring.EmailAlertConfig(
    user_emails=emails, enable_logging=True
)

# Create the monitoring job.
mon_job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name=JOB_NAME,
    logging_sampling_strategy=random_sampling,
    schedule_config=schedule_config,
    alert_config=alerting_config,
    objective_configs=objective_config,
    project=PROJECT_ID,
    location=REGION,
    endpoint=endpoint,
)
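
The skew and drift thresholds configured above are compared against a per-feature distance score. According to the Vertex AI Model Monitoring documentation, numerical features are scored with Jensen-Shannon divergence. The following is a minimal standard-library sketch of that measure over two histograms; it is not the service's implementation, and the bin counts are made-up assumptions for illustration.

```python
import math

def _kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon_divergence(p, q):
    """JS divergence between two discrete distributions (0 = identical, 1 = disjoint)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

def normalize(counts):
    """Turn histogram counts into probabilities."""
    total = sum(counts)
    return [c / total for c in counts]

# Made-up histograms of one feature: training data vs. serving requests
training_hist = normalize([10, 40, 40, 10])
serving_hist = normalize([5, 25, 45, 25])

score = jensen_shannon_divergence(training_hist, serving_hist)
threshold = 0.001  # same value as DEFAULT_THRESHOLD_VALUE in the cell above
print(f"score={score:.4f}, alert={score > threshold}")
```

A threshold as low as 0.001 means that even small differences between the two distributions would trigger an alert, which is convenient for a demonstration but probably too sensitive for production use.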

# Getting monitoring outputs

**After you receive an email** telling you that the monitoring configuration has been set up, we will generate additional traffic to our endpoint and view the related outputs.

## Generate request traffic to our endpoint

Just as we did previously in this notebook, we're going to send 5000 test requests from our original `X_test` dataset (that we created at the beginning of this notebook) to our endpoint. This activity will generate monitoring outputs.

In [None]:
# Create an empty list to store responses
responses = []

# Generate some predictions
for instance in larger_test_instances:
    response = endpoint.predict([instance])
    
    # Append the response to the list
    responses.append(response.predictions)

## View monitoring outputs

**The email you received will contain details and links regarding the monitoring job, including the BigQuery location at which the monitoring outputs are stored.**

Next, let's move on to optimizing our model for edge deployment.

# Optimizing for Edge deployment

Finally, let's optimize our model to be deployed at the edge. We're going to use TensorFlow Lite for that purpose.

The following code will convert our model to TensorFlow Lite format, which is a light-weight format that is optimized for devices with limited computing resources. It will then save the converted model locally, after which we can upload it to GCS.

In [None]:
TF_MODEL_NAME = 'housing_model.tflite'

# Convert the model to TensorFlow Lite
converter = tf.lite.TFLiteConverter.from_keras_model(tf_model)
tflite_model = converter.convert()

# Save the TF Lite model
with tf.io.gfile.GFile(TF_MODEL_NAME, 'wb') as f:
    f.write(tflite_model)

## Copy converted model to GCS

In [None]:
!gsutil cp -r $TF_MODEL_NAME $MODEL_LOCATION

We have now used TensorFlow Lite to optimize our model, and we have stored the optimized model in Google Cloud Storage. From there, we can deploy our model to any device that supports the TensorFlow Lite interpreter. A list of supported platforms is provided in the TensorFlow Lite [documentation](https://www.tensorflow.org/lite/guide/inference#supported_platforms), which also explains in detail how TensorFlow Lite works.

# That's it! Well Done!

# Clean up

When you no longer need the resources created by this notebook, you can delete them as follows.

**Note: if you do not delete the resources, you will continue to pay for them.**

In [None]:
clean_up = False  # Set to True if you want to delete the resources

## Delete Vertex AI resources

In [None]:
from google.api_core import exceptions  # Needed for the NotFound handlers below

if clean_up:
    # Delete batch prediction job
    try:
        job.delete()
        print(f"Deleted Batch Prediction Job: {job.resource_name}")
    except Exception as e:
        print(f"Error deleting Batch Prediction Job: {e}")

    # Delete the monitoring job
    try:
        mon_job.delete()
        print(f"Deleted Monitoring Job: {mon_job.resource_name}")
    except Exception as e:
        print(f"Error deleting Monitoring Job: {e}")
        
    # Delete endpoint
    try:
        endpoint_list = aiplatform.Endpoint.list(filter=f'display_name="{ENDPOINT_NAME}"')
        if endpoint_list:
            endpoint = endpoint_list[0]  # Assuming only one endpoint with that name

            # Undeploy all models (if any)
            try:
                endpoint.undeploy_all()
                print(f"Undeployed all models from endpoint: {ENDPOINT_NAME}")
            except exceptions.NotFound:
                print(f"No models found to undeploy from endpoint: {ENDPOINT_NAME}")
            except Exception as e:  # Catching general errors for better debugging
                print(f"Unexpected error while undeploying models: {e}")

            # Delete endpoint
            try:
                endpoint.delete()
                print(f"Deleted endpoint: {ENDPOINT_NAME}")
            except Exception as e:
                print(f"Error deleting endpoint: {e}")
        else:
            print(f"Endpoint not found: {ENDPOINT_NAME}")

    except exceptions.NotFound:
        print(f"Endpoint not found: {ENDPOINT_NAME}")

    # Delete models
    try:
        model_list = aiplatform.Model.list(filter=f'display_name="{MODEL_NAME}"')
        if model_list:
            for model in model_list:
                print(f"Deleting model: {model.display_name}")
                model.delete()
        else:
            print(f"No models found matching: {MODEL_NAME}")
    except exceptions.NotFound:
        print(f"Model not found: {MODEL_NAME}")
    
    try:
        model_100_list = aiplatform.Model.list(filter=f'display_name="{MODEL_100_NAME}"')
        if model_100_list:
            for model in model_100_list:
                print(f"Deleting model: {model.display_name}")
                model.delete()
        else:
            print(f"No models found matching: {MODEL_100_NAME}")
    except exceptions.NotFound:
        print(f"Model not found: {MODEL_100_NAME}")
else:
    print("clean_up parameter is set to False.")


## Delete BigQuery tables and datasets

In [None]:
if clean_up:  
    try:
        client.delete_table(log_table_ref_id, not_found_ok=True)
        print(f"Deleted table {log_table_ref_id}")
    except Exception as e:
        print(f"Error deleting table: {e}")
    try:
        client.delete_table(train_table_id, not_found_ok=True)
        print(f"Deleted table {train_table_id}")
    except Exception as e:
        print(f"Error deleting table: {e}")
    try:
        client.delete_dataset(dataset_id, delete_contents=True, not_found_ok=True)
        print(f"Deleted dataset: {dataset_id}")
    except Exception as e:
        print(f"Error deleting dataset: {e}")
    try:
        client.delete_dataset(train_dataset_id, delete_contents=True, not_found_ok=True)
        print(f"Deleted dataset: {train_dataset_id}")
    except Exception as e:
        print(f"Error deleting dataset: {e}")
else:
    print("clean_up parameter is set to False.")

## Delete GCS Bucket
The bucket can be reused throughout multiple activities in the book. Sometimes, activities in certain chapters make use of artifacts from previous chapters that are stored in the GCS bucket.

I highly recommend **not deleting the bucket** unless you will be performing no further activities in the book. For this reason, there's a separate `delete_bucket` variable to specify if you want to delete the bucket.

If you want to delete the bucket, set the `delete_bucket` parameter to `True`.

In [None]:
delete_bucket = False

In [None]:
if delete_bucket:
    # Delete the bucket
    ! gcloud storage rm --recursive gs://$BUCKET
else:
    print("delete_bucket parameter is set to False")