<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Framework%20Workflows/CatBoost/CatBoost%20Prediction%20With%20Vertex%20AI%20Feature%20Store.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FFramework%2520Workflows%2FCatBoost%2FCatBoost%2520Prediction%2520With%2520Vertex%2520AI%2520Feature%2520Store.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Framework%20Workflows/CatBoost/CatBoost%20Prediction%20With%20Vertex%20AI%20Feature%20Store.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Framework%20Workflows/CatBoost/CatBoost%20Prediction%20With%20Vertex%20AI%20Feature%20Store.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# CatBoost Prediction With Vertex AI Feature Store

Serving a model, like a CatBoost model, can be complex. It involves several steps:

- Retrieving the features
- Ensuring all features are present and in the correct order
- Verifying the schema to guarantee each feature has the correct data type
- Preparing the serving instance for the model's prediction function
- Sending the serving instance to the model
- Processing and formatting the response
- Returning the response to the requestor

With so many steps, there's potential for errors, leading to **training-serving skew**. A great way to minimize this is by using a **feature store** for serving.
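
As a minimal sketch of the checks in the list above (presence, order, and data type), the following assumes a hypothetical schema expressed as a list of `(name, type)` pairs. The helper `prepare_instance`, the three-feature `SCHEMA`, and the sample values are illustrative and not part of this workflow.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical schema: feature names in the order the model expects, with types.
SCHEMA: List[Tuple[str, type]] = [("Time", float), ("V1", float), ("Amount", float)]

def prepare_instance(features: Dict[str, Any], schema: Sequence[Tuple[str, type]]) -> List[Any]:
    """Return feature values in schema order, raising on missing or mistyped features."""
    missing = [name for name, _ in schema if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    instance = []
    for name, expected in schema:
        value = features[name]
        if not isinstance(value, expected):
            raise TypeError(
                f"feature {name!r} expected {expected.__name__}, got {type(value).__name__}"
            )
        instance.append(value)
    return instance

# Input order does not matter; output follows the schema order.
instance = prepare_instance({"Amount": 0.0, "V1": -1.3, "Time": 122959.0}, SCHEMA)
print(instance)
```

Failing fast on a missing or mistyped feature is one way to surface skew at request time instead of letting it silently change predictions.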

A feature store's offline component resembles training data. In this case, the model was trained on data from a BigQuery table (see the [CatBoost Overview](./CatBoost%20Overview.ipynb) workflow).

Hosting an online version of the data in a feature store allows for **fast and simple feature retrieval** during serving requests. Whether it's a single BigQuery table synced directly or features processed through the feature registry, it ensures features are readily available at inference time.

This workflow provides a comprehensive guide to get you started:

- What is Vertex AI Feature Store?
- Setup Vertex AI Feature Store
- Sync BigQuery data to an online feature view in Vertex AI Feature Store
- Incorporate retrieval from Vertex AI Feature Store in inference locally (in this notebook)
- Build A Custom Prediction Container That Incorporates feature retrieval from Vertex AI Feature Store
    - A single container with multiple serving profiles used to:
        - Test the container locally with Docker
        - Deploy and use the container on Google Cloud Run
        - Deploy and use the container on Vertex AI Endpoints

---
## Colab Setup

To run this notebook in Colab, run the cells in this section. Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    import google.colab
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs

The list `packages` contains tuples of the package import name and install name, with an optional minimum version. If the import name is not found, the install name is used to install the package quietly for the current user.

In [3]:
# tuples of (import name, install name, min_version)
packages = [
    ('numpy', 'numpy'),
    ('catboost', 'catboost'),
    ('docker', 'docker'),
    ('google.cloud.aiplatform', 'google-cloud-aiplatform'),
    ('google.cloud.bigquery', 'google-cloud-bigquery'),
    ('google.cloud.storage', 'google-cloud-storage'),
    ('google.cloud.artifactregistry_v1', 'google-cloud-artifact-registry'),
    ('google.cloud.devtools', 'google-cloud-build'),
    ('google.cloud.run_v2', 'google-cloud-run'),   
]

import importlib.util
import importlib.metadata
install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # installed versions are looked up by distribution (install) name
        if importlib.metadata.version(package[1]) < package[2]:
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user
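
The version check above compares version strings with `<`, which is lexicographic, so `'1.9.0'` does not sort before `'1.10.0'`. A minimal sketch of a numeric comparison, assuming plain dotted numeric versions with no pre-release suffixes; the helpers `version_tuple` and `needs_update` are hypothetical:

```python
def version_tuple(version: str) -> tuple:
    """Convert a dotted numeric version such as '1.78.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def needs_update(installed: str, minimum: str) -> bool:
    """True when the installed version is older than the required minimum."""
    return version_tuple(installed) < version_tuple(minimum)

print(needs_update("1.9.0", "1.10.0"))  # True: numeric comparison
print("1.9.0" < "1.10.0")               # False: string comparison, for contrast
```

For versions with suffixes such as `rc1`, `packaging.version.parse` is the more robust choice.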

### API Enablement

In [4]:
!gcloud services enable artifactregistry.googleapis.com
!gcloud services enable cloudbuild.googleapis.com
!gcloud services enable run.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, execution can resume with the cell that follows this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

inputs:

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

In [8]:
REGION = 'us-central1'
SERIES = 'frameworks'
EXPERIMENT = 'catboost-prediction-feature-store'

# GCS Names
GCS_BUCKET = PROJECT_ID

# make this the BigQuery Project / Dataset / Table prefix to store results
BQ_PROJECT = PROJECT_ID
BQ_DATASET = SERIES.replace('-', '_')
BQ_TABLE = SERIES
BQ_REGION = REGION[0:2] # use a multi region

# Vertex AI Feature Store names:
FS_NAME = PROJECT_ID.replace('-', '_')
FV_NAME = f"{SERIES}".replace('-', '_')

packages:

In [14]:
import json, os
import time, datetime
import requests

import catboost 
import numpy as np
import docker

import google.auth
from google.cloud import storage
from google.cloud import artifactregistry_v1
from google.cloud.devtools import cloudbuild_v1
from google.cloud import run_v2
from google.cloud import bigquery

from google.cloud import aiplatform
from vertexai.resources.preview import feature_store

In [10]:
aiplatform.__version__

'1.78.0'

clients:

In [11]:
# gcs storage client
gcs = storage.Client(project = PROJECT_ID)
bucket = gcs.bucket(GCS_BUCKET)

# cloud build client
cb = cloudbuild_v1.CloudBuildClient()

# artifact registry client
ar = artifactregistry_v1.ArtifactRegistryClient()

# cloud run client
cr = run_v2.ServicesClient()

# BigQuery client
bq = bigquery.Client(project = PROJECT_ID)

# vertex ai client
aiplatform.init(project = PROJECT_ID, location = REGION)

Parameters:

In [12]:
DIR = f"files/{EXPERIMENT}"

Environment:

In [13]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## CatBoost Model

Retrieve the model trained in the prior workflow along with test records, then test the model directly in this environment.

### Check For Files

In [15]:
files = list(bucket.list_blobs(prefix = f'{SERIES}/catboost-overview'))
if len(files) > 0:
    print('Found the files created by the prerequisite workflow:')
    for file in files:
        print(f'- gs://{bucket.name}/{file.name}')
else:
    print('Files not found - Please run the prerequisite notebook (listed at top of this workflow)')

Found the files created by the prerequisite workflow:
- gs://statmike-mlops-349915/frameworks/catboost-overview/examples.json
- gs://statmike-mlops-349915/frameworks/catboost-overview/model.cbm


### Load Model

In [16]:
model_blob = bucket.blob(f'{SERIES}/catboost-overview/model.cbm')
model_bytes = model_blob.download_as_bytes()
model = catboost.CatBoostClassifier().load_model(blob = model_bytes)

### Load Inference Examples

In [17]:
examples_blob = bucket.blob(f'{SERIES}/catboost-overview/examples.json')
examples_np = np.array(
    json.loads(examples_blob.download_as_text())
)

### Test Model With Examples

In [18]:
model.predict(examples_np)

array([0, 1, 0, 1, 1, 0, 1, 1, 0, 1])

In [19]:
model.feature_names_

['Time',
 'V1',
 'V2',
 'V3',
 'V4',
 'V5',
 'V6',
 'V7',
 'V8',
 'V9',
 'V10',
 'V11',
 'V12',
 'V13',
 'V14',
 'V15',
 'V16',
 'V17',
 'V18',
 'V19',
 'V20',
 'V21',
 'V22',
 'V23',
 'V24',
 'V25',
 'V26',
 'V27',
 'V28',
 'Amount']

---
## BigQuery - The Offline Store For Vertex AI Feature Store

The offline store for [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview) is BigQuery, which streamlines ML feature management ahead of online serving with the feature store. The data used to train this model in [CatBoost Overview](./CatBoost%20Overview.ipynb) already came from BigQuery.

This section copies the BigQuery public table used in the training notebook into a new table that adds an id column, `transaction_id`, to identify individual rows.

### Create/Recall Dataset

In [20]:
dataset = bigquery.Dataset(f"{BQ_PROJECT}.{BQ_DATASET}")
dataset.location = BQ_REGION
bq_dataset = bq.create_dataset(dataset, exists_ok = True)

### Create/Recall Table

In [21]:
query = f"""
    CREATE TABLE IF NOT EXISTS `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` AS
    SELECT GENERATE_UUID() AS transaction_id, *
    FROM `bigquery-public-data.ml_datasets.ulb_fraud_detection`;
"""
job = bq.query(query = query)
job.result()
(job.ended - job.started).total_seconds()

0.577
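
`GENERATE_UUID()` returns a random version 4 UUID as a string for each row. As an illustration only, here is the same idea in Python with made-up rows; the real table is created in BigQuery by the query above.

```python
import uuid

# Made-up rows standing in for the source table
rows = [
    {"Time": 122959.0, "Amount": 0.0, "Class": 0},
    {"Time": 122312.0, "Amount": 0.0, "Class": 0},
]

# Prepend a random UUID string to each row, similar in spirit to
# SELECT GENERATE_UUID() AS transaction_id, *
rows_with_id = [{"transaction_id": str(uuid.uuid4()), **row} for row in rows]

for row in rows_with_id:
    print(row["transaction_id"], row["Time"])
```

Because the query uses `CREATE TABLE IF NOT EXISTS`, the ids are generated once and stay stable when the cell is re-run.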

### Review BigQuery Table

In [22]:
bq.query(f"SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` LIMIT 3").to_dataframe()

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class | transaction_id | splits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 122959.0 | -1.327297 | 0.422904 | 1.617505 | 2.291196 | 2.375055 | 0.411735 | 0.213517 | 0.424743 | -1.809624 | ... | 0.192604 | 0.068281 | -0.245725 | -0.697654 | 0.038216 | 0.150059 | 0.0 | 0 | c97b6e2f-603a-4dbe-9bca-0add881f2084 | TEST |
| 1 | 122312.0 | -1.988557 | -0.720301 | 0.863204 | 3.114494 | 1.847474 | 0.255881 | 0.580362 | -0.083756 | -0.939044 | ... | 1.564951 | 0.546312 | -0.548531 | -0.74662 | -0.748016 | 0.41064 | 0.0 | 0 | 791e403e-d59f-491d-b0b7-d8f8710c07fb | TEST |
| 2 | 119592.0 | 2.139741 | 0.245651 | -2.654856 | 0.178287 | 1.336991 | -0.724664 | 0.906032 | -0.436125 | -0.528015 | ... | -0.216033 | 0.345316 | 0.747103 | 0.700184 | -0.123739 | -0.099989 | 0.0 | 0 | d9e720c5-311d-4cf7-95cb-2256823803ba | TEST |


### Get A List of `transaction_id` Values For Testing

Get a list of `transaction_id` values to use later in this workflow:

In [23]:
transaction_ids = bq.query(f"SELECT transaction_id FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` WHERE Class = 1 LIMIT 10").to_dataframe()['transaction_id'].tolist()

In [24]:
transaction_ids

['28f7cfa6-5862-428a-8d35-c9e08d297008',
 '1dbfe576-2c52-4f94-bd88-db1e3e294561',
 'd13fbb1a-d4cb-40c6-bfd7-05cd714cee30',
 '18327e33-3ff6-43d1-8bbc-8647adc2044d',
 'ec505a29-387f-4b3d-a03b-67375d9140e1',
 '1b7b6e26-7248-420c-9eb5-dd9692437843',
 'bc996e23-c364-4e1a-875d-5151e47fdb81',
 '165ac053-1fec-4863-aea2-63d380a3adad',
 '15b93da7-e600-449b-a4c1-bacfd7e68df7',
 'd081be14-dcb5-4228-813c-75b1804dc8a2']

---
## Understanding Vertex AI Feature Store

The next sections set up online serving for the BigQuery table with [Vertex AI Feature Store](https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview). This workflow takes the shortest path: synchronizing a BigQuery table directly to the feature store. More flexible paths use the Feature Registry, where features from multiple tables and views come together in a single serving structure called a feature view. You can read more about this in the [MLOps](../../MLOps/readme.md) section of this repository, which includes a deep dive into [feature stores](../../MLOps/Feature%20Store/readme.md).

<p align="center" ><center>
    <img src="../../MLOps/resources/images/created/featurestore/overview.png" width="75%">
</center></p>

---
## Setup Vertex AI Feature Store

### Create/Retrieve Online Store

The first step is to create a Vertex AI Feature Store online store. There are two serving types to choose from: Bigtable and Optimized. This workflow uses Optimized online serving, which can also [provide vector similarity search](https://cloud.google.com/vertex-ai/docs/featurestore/latest/embeddings-search) functionality that Bigtable serving does not.
>**NOTE:** This can take around 10 minutes when creating a new feature store instance.

**Reference:**
- [Create an Online Store Instance](https://cloud.google.com/vertex-ai/docs/featurestore/latest/create-onlinestore)
- [Online Serving Types](https://cloud.google.com/vertex-ai/docs/featurestore/latest/online-serving-types)

In [25]:
try:
    online_store = feature_store.FeatureOnlineStore(name = FS_NAME)
    print(f"Found the feature store:\n{online_store.resource_name}")
except Exception:
    print("Create the feature store...")
    online_store = feature_store.FeatureOnlineStore.create_optimized_store(
        name = FS_NAME
    )
    print(f"Created the feature store:\n{online_store.resource_name}")

Found the feature store:
projects/1026793852137/locations/us-central1/featureOnlineStores/statmike_mlops_349915


In [26]:
online_store.name

'statmike_mlops_349915'

### Create/Retrieve Feature View From BigQuery Source

There are two paths to [creating feature views](https://cloud.google.com/vertex-ai/docs/featurestore/latest/create-featureview) in feature store. The one used here syncs a BigQuery table or view directly to the online store. The alternative uses the feature registry, which gives greater control over selecting features (columns) from multiple BigQuery source tables and views. Learn more about Vertex AI Feature Store in this repository's [MLOps](../../MLOps/readme.md) section, which includes a deep dive into [feature stores](../../MLOps/Feature%20Store/readme.md).

**Reference:**
- [Create a feature view instance](https://cloud.google.com/vertex-ai/docs/featurestore/latest/create-featureview)

In [27]:
try:
    feature_view = feature_store.FeatureView(
        name = FV_NAME,
        feature_online_store_id = online_store.resource_name
    )
    print(f"Found the feature view:\n{feature_view.resource_name}")
except Exception:
    print(f"Create the feature view...")
    feature_view = online_store.create_feature_view(
        name = FV_NAME,
        source = feature_store.utils.FeatureViewBigQuerySource(
            uri = f'bq://{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}',
            entity_id_columns = ['transaction_id'] # can be multiple columns 
        ),
        sync_config = 'TZ=America/New_York 0 22 * * *' # Ex: every day at 10PM, just once per day
    )   
    print(f"Created the feature view:\n{feature_view.resource_name}")

Found the feature view:
projects/1026793852137/locations/us-central1/featureOnlineStores/statmike_mlops_349915/featureViews/frameworks


In [28]:
feature_view.name

'frameworks'
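
The `sync_config` value used above is a cron expression with a `TZ=` time zone prefix. A minimal sketch that splits such a string into its parts, assuming the standard five-field cron layout (minute, hour, day of month, month, day of week); the helper `parse_sync_config` is hypothetical and not part of the Vertex AI SDK:

```python
from typing import Dict, Optional

CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")

def parse_sync_config(cron: str) -> Dict[str, Optional[str]]:
    """Split 'TZ=<zone> m h dom mon dow' into a time zone and named cron fields."""
    parts = cron.split()
    timezone = None
    if parts and parts[0].startswith("TZ="):
        timezone = parts.pop(0)[len("TZ="):]
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} cron fields, got {len(parts)}")
    parsed: Dict[str, Optional[str]] = {"timezone": timezone}
    parsed.update(zip(CRON_FIELDS, parts))
    return parsed

schedule = parse_sync_config("TZ=America/New_York 0 22 * * *")
print(schedule)
```

Reading the fields by name makes it easier to confirm the schedule means "minute 0 of hour 22, every day" before creating the feature view.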

### Managing Synchronization

Force a synchronization rather than wait for the next scheduled sync:

In [29]:
force_sync = feature_view.sync()

In [30]:
type(force_sync)

vertexai.resources.preview.feature_store.feature_view.FeatureView.FeatureViewSync

In [31]:
force_sync.to_dict()

{'name': 'projects/1026793852137/locations/us-central1/featureOnlineStores/statmike_mlops_349915/featureViews/frameworks/featureViewSyncs/218960636795682816',
 'createTime': '2025-02-10T15:00:34.814032Z',
 'runTime': {'startTime': '2025-02-10T15:00:34.814032Z'}}

Get updated information about the sync job:

In [32]:
force_sync = feature_view.get_sync(name = force_sync.name)
force_sync.to_dict()

{'name': 'projects/1026793852137/locations/us-central1/featureOnlineStores/statmike_mlops_349915/featureViews/frameworks/featureViewSyncs/218960636795682816',
 'createTime': '2025-02-10T15:00:34.814032Z',
 'runTime': {'startTime': '2025-02-10T15:00:34.814032Z'}}

Wait on the sync job to complete and report timing and rows synced:

In [33]:
waited = 0
while True:
    sync_status = feature_view.get_sync(name = force_sync.name).to_dict()
    if 'endTime' in sync_status['runTime']:
        seconds = (
            datetime.datetime.fromisoformat(sync_status['runTime']['endTime'].replace('Z', '+00:00'))
            -
            datetime.datetime.fromisoformat(sync_status['runTime']['startTime'].replace('Z', '+00:00'))
        ).total_seconds()
        rows = sync_status['syncSummary']['rowSynced']
        print(f"Sync completed in {seconds} seconds and synced {rows} rows.")
        break
    else:
        print(f"Waited {waited} seconds, Update again in 30 seconds...")
        time.sleep(30)
        waited += 30

Waited 0 seconds, Update again in 30 seconds...
Waited 30 seconds, Update again in 30 seconds...
Waited 60 seconds, Update again in 30 seconds...
Waited 90 seconds, Update again in 30 seconds...
Waited 120 seconds, Update again in 30 seconds...
Waited 150 seconds, Update again in 30 seconds...
Sync completed in 170.857691 seconds and synced 284807 rows.
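
The loop above polls until the sync reports an `endTime`, with no upper bound on the wait. A hedged sketch of the same pattern with a timeout, written against a stand-in status function so it runs without a feature view; `wait_for_sync`, `get_status`, and the fake statuses are assumptions for illustration:

```python
import time
from typing import Callable, Dict

def wait_for_sync(get_status: Callable[[], Dict], interval: float = 30.0,
                  timeout: float = 1800.0) -> Dict:
    """Poll get_status() until its 'runTime' has an 'endTime', or raise after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if "endTime" in status.get("runTime", {}):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"sync did not finish within {timeout} seconds")
        time.sleep(interval)

# Stand-in for feature_view.get_sync(...).to_dict(): finishes on the third poll.
_responses = iter([
    {"runTime": {"startTime": "2025-02-10T15:00:34Z"}},
    {"runTime": {"startTime": "2025-02-10T15:00:34Z"}},
    {"runTime": {"startTime": "2025-02-10T15:00:34Z", "endTime": "2025-02-10T15:03:25Z"},
     "syncSummary": {"rowSynced": 284807}},
])
final = wait_for_sync(lambda: next(_responses), interval=0.01, timeout=5)
print(final["syncSummary"]["rowSynced"])
```

In the notebook, `get_status` could be a lambda wrapping `feature_view.get_sync(name = force_sync.name).to_dict()`.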


Get a list of sync jobs:

In [34]:
list_syncs = feature_view.list_syncs()

Print out the end time and rows synced for each job:

In [35]:
for sync in list_syncs:
    s = feature_view.get_sync(name = sync.name).to_dict()
    ended = datetime.datetime.fromisoformat(s['runTime']['endTime'].replace('Z', '+00:00')).strftime("%m/%d/%Y %H:%M:%S")
    rows = s['syncSummary']['rowSynced']
    print(f"Sync completed at {ended} and synced {rows} rows.")

Sync completed at 02/10/2025 15:03:25 and synced 284807 rows.
Sync completed at 02/10/2025 03:02:04 and synced 284807 rows.
Sync completed at 02/10/2025 00:36:35 and synced 284807 rows.
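
A sync that is still running has no `endTime` yet, so indexing it directly would raise a `KeyError`. A minimal sketch of a tolerant summary, using dictionaries shaped like the `to_dict()` output shown earlier; the helper `describe_sync` and the sample values are illustrative:

```python
import datetime
from typing import Dict

def parse_utc(timestamp: str) -> datetime.datetime:
    """Parse a timestamp ending in 'Z' into a timezone-aware datetime."""
    return datetime.datetime.fromisoformat(timestamp.replace("Z", "+00:00"))

def describe_sync(sync: Dict) -> str:
    """Summarize a sync dictionary, handling jobs that have not finished."""
    run_time = sync.get("runTime", {})
    if "endTime" not in run_time:
        return "Sync still running."
    started = parse_utc(run_time["startTime"])
    ended = parse_utc(run_time["endTime"])
    rows = sync.get("syncSummary", {}).get("rowSynced", "unknown")
    seconds = (ended - started).total_seconds()
    return (f"Sync completed at {ended:%m/%d/%Y %H:%M:%S} "
            f"in {seconds:.0f} seconds and synced {rows} rows.")

running = {"runTime": {"startTime": "2025-02-10T15:00:34.814032Z"}}
finished = {
    "runTime": {"startTime": "2025-02-10T15:00:34.814032Z",
                "endTime": "2025-02-10T15:03:25.671723Z"},
    "syncSummary": {"rowSynced": 284807},
}
print(describe_sync(running))
print(describe_sync(finished))
```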


### Retrieve: Features For Entity


In [36]:
results = feature_view.read(key = [transaction_ids[0]]).to_dict()['features']

Public endpoint for the optimized online store statmike_mlops_349915 is 6457115130579648512.us-central1-1026793852137.featurestore.vertexai.goog


In [37]:
results

[{'name': 'Time', 'value': {'double_value': 64585.0}},
 {'name': 'V1', 'value': {'double_value': 1.08043336974687}},
 {'name': 'V2', 'value': {'double_value': 0.962830680816555}},
 {'name': 'V3', 'value': {'double_value': -0.27806547765439}},
 {'name': 'V4', 'value': {'double_value': 2.74331806279958}},
 {'name': 'V5', 'value': {'double_value': 0.4123640756810779}},
 {'name': 'V6', 'value': {'double_value': -0.320778352962878}},
 {'name': 'V7', 'value': {'double_value': 0.0412895489552235}},
 {'name': 'V8', 'value': {'double_value': 0.176170299597767}},
 {'name': 'V9', 'value': {'double_value': -0.9669515510536291}},
 {'name': 'V10', 'value': {'double_value': -0.19412046650558104}},
 {'name': 'V11', 'value': {'double_value': 2.1400568155366297}},
 {'name': 'V12', 'value': {'double_value': -0.276308752749667}},
 {'name': 'V13', 'value': {'double_value': -1.1913057375248899}},
 {'name': 'V14', 'value': {'double_value': -1.88027540328199}},
 {'name': 'V15', 'value': {'double_value': 0.398

---
## CatBoost Prediction With Feature Store

CatBoost predictions work the same as before: the input is just an array of feature values. The value of the feature store is that it provides the current value of these features through a simple API call that needs only the entity id. This section performs these steps separately and then builds a simple Python function to bring them together:
- request features for an entity id, `transaction_id`
- prepare the features for the model: ensure the order is correct and convert to a NumPy array for input to the model
- make the prediction request

### Step 1: Request Features Based On Entity ID

In [38]:
entity_id = transaction_ids[1]

In [39]:
features = feature_view.read(key = [entity_id]).to_dict()['features']

### Step 2: Prepare Features For Inference

Create a dictionary from the 'features' list:

In [40]:
feature_dict = {item['name']: list(item['value'].values())[0] for item in features}

In [41]:
feature_dict['V1']

-5.603690279856019
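
Each entry in the response wraps its value in a single-key dictionary whose key names the type (`double_value` in the output above), and the comprehension takes the first value regardless of the key. A sketch that does the same on sample data and also guards against an entry with no value set; the helper `features_to_dict` and the `V_missing` entry are hypothetical:

```python
from typing import Any, Dict, List

def features_to_dict(features: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten [{'name': ..., 'value': {'<type>_value': v}}, ...] into {name: v}."""
    flat = {}
    for item in features:
        values = list(item.get("value", {}).values())
        # An entry without a typed value maps to None instead of raising IndexError
        flat[item["name"]] = values[0] if values else None
    return flat

sample = [
    {"name": "Time", "value": {"double_value": 64585.0}},
    {"name": "V1", "value": {"double_value": 1.08043336974687}},
    {"name": "V_missing", "value": {}},
]
flat = features_to_dict(sample)
print(flat)
```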

Extract values in the order of 'model.feature_names_':

In [42]:
ordered_features = [feature_dict[name] for name in model.feature_names_]

In [43]:
ordered_features

[102669.0,
 -5.603690279856019,
 5.22219270751471,
 -7.516829977246081,
 8.11772427554261,
 -2.75685795265464,
 -1.57456462233583,
 -6.3303432504036605,
 2.99841888080239,
 -4.50816689310713,
 -7.33437708370839,
 7.18872396524447,
 -10.6551809118898,
 2.5946798607819903,
 -10.242859083204,
 -0.191158119172494,
 -5.504334070139331,
 -8.697777412570309,
 -1.9342253618543401,
 1.9587499315347499,
 0.227525994742654,
 1.2428962055240902,
 0.42840839246679707,
 -0.101183595116722,
 -0.520199121186447,
 -0.176937756589733,
 0.4614499825660971,
 -0.106625180749266,
 -0.479661947808134,
 0.0]

### Step 3: Make Prediction

In [44]:
model.predict(ordered_features)

1

In [45]:
model.predict_proba(ordered_features)

array([0.00479321, 0.99520679])
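
`predict_proba` returns one probability per class, and here the predicted class is the one with the larger probability. A small sketch of that mapping in plain Python, using the probabilities printed above and the class labels `'0'` and `'1'`; `predicted_class` is an illustrative helper:

```python
from typing import Sequence

def predicted_class(classes: Sequence[str], scores: Sequence[float]) -> str:
    """Return the class label whose score is largest."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best]

classes = ["0", "1"]
scores = [0.00479321, 0.99520679]
print(predicted_class(classes, scores))
```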

### Combine Steps Into Function

In [46]:
def fs_retriever(entities):
    instances = []
    for entity_id in entities:
        features = feature_view.read(key = [entity_id]).to_dict()['features']
        feature_dict = {item['name']: list(item['value'].values())[0] for item in features}
        ordered_features = [feature_dict[name] for name in model.feature_names_]
        instances.append(ordered_features)
    return instances
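
To exercise this logic without a live feature view, the read call can be passed in as a parameter and replaced with a stub. This is a sketch under that assumption: `build_instances`, `fake_read`, and the two-feature `FAKE_STORE` are made up for illustration, and `fake_read` returns data shaped like `feature_view.read(...).to_dict()['features']`.

```python
from typing import Any, Callable, Dict, List, Sequence

def build_instances(
    entities: Sequence[str],
    feature_names: Sequence[str],
    read_features: Callable[[str], List[Dict[str, Any]]],
) -> List[List[Any]]:
    """Read features per entity and order them to match feature_names."""
    instances = []
    for entity_id in entities:
        features = read_features(entity_id)
        feature_dict = {item["name"]: list(item["value"].values())[0] for item in features}
        instances.append([feature_dict[name] for name in feature_names])
    return instances

# Stub store: features come back in a different order than the model expects.
FAKE_STORE = {
    "id-1": {"Amount": 0.0, "Time": 64585.0},
    "id-2": {"Amount": 12.5, "Time": 102669.0},
}

def fake_read(entity_id: str) -> List[Dict[str, Any]]:
    """Return stub features in the list-of-dicts shape used by the feature view."""
    return [{"name": k, "value": {"double_value": v}} for k, v in FAKE_STORE[entity_id].items()]

instances = build_instances(["id-2", "id-1"], ["Time", "Amount"], fake_read)
print(instances)
```

The result keeps the order of the requested entities and the order of `feature_names`, which is the property the model input depends on.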

Use the function to predict for all the entity ids in `transaction_ids`:

In [47]:
model.predict(fs_retriever(transaction_ids)).tolist()

[0, 1, 0, 1, 1, 0, 1, 1, 0, 1]

Use the function to predict for the same entity ids in `transaction_ids` with detailed output:

In [48]:
_classes = [str(c) for c in list(model.classes_)]
[
    dict(
        classes = _classes,
        entity_id = transaction_ids[p],
        scores = probs,
        predicted_class = _classes[np.argmax(probs)]
    ) for p, probs in enumerate(model.predict_proba(fs_retriever(transaction_ids)).tolist())
]

[{'classes': ['0', '1'],
  'entity_id': '28f7cfa6-5862-428a-8d35-c9e08d297008',
  'scores': [0.9684653609075246, 0.031534639092475496],
  'predicted_class': '0'},
 {'classes': ['0', '1'],
  'entity_id': '1dbfe576-2c52-4f94-bd88-db1e3e294561',
  'scores': [0.004793214059859108, 0.9952067859401409],
  'predicted_class': '1'},
 {'classes': ['0', '1'],
  'entity_id': 'd13fbb1a-d4cb-40c6-bfd7-05cd714cee30',
  'scores': [0.9961659959204033, 0.0038340040795967153],
  'predicted_class': '0'},
 {'classes': ['0', '1'],
  'entity_id': '18327e33-3ff6-43d1-8bbc-8647adc2044d',
  'scores': [0.009687501567520629, 0.9903124984324794],
  'predicted_class': '1'},
 {'classes': ['0', '1'],
  'entity_id': 'ec505a29-387f-4b3d-a03b-67375d9140e1',
  'scores': [0.05002634033491404, 0.949973659665086],
  'predicted_class': '1'},
 {'classes': ['0', '1'],
  'entity_id': '1b7b6e26-7248-420c-9eb5-dd9692437843',
  'scores': [0.9988754743912287, 0.0011245256087712965],
  'predicted_class': '0'},
 {'classes': ['0', '1'

---
## Build A Custom Prediction Container

It is really not all that hard with Python!

For this example [FastAPI](https://fastapi.tiangolo.com/) is used.

This process uses Docker to build a custom container and then runs the container locally, on Cloud Run, and on Vertex AI Endpoints.

This could be done locally with Docker and pushed to Artifact Registry before deployment to Cloud Run and Vertex AI. The process below assumes that Docker is not available locally and uses Cloud Build to both build the container and push it to Artifact Registry.

**Environment Variables**

The scripts `main.py` and `main2.py` assume the existence of several environment variables. When running the container directly in a local instance or on Cloud Run, the environment variables can be set for use in the service. Vertex AI Endpoints creates a number of helpful environment variables during deployment. To make the single container easily deployable across all three targets (local, Cloud Run, and Vertex AI Endpoints), the scripts assume the Vertex AI variable names, which are then set manually at launch for Cloud Run and local runs. Read more about [Environment variables available in the container](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#variables) for Vertex AI Endpoints.
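
One alternative to setting every variable at launch is to fall back to defaults in code. The sketch below is not what `main.py` does (it reads `os.environ[...]` directly); the variable names are the ones the scripts use, while the default values and the helper `load_serving_config` are assumptions for illustration only:

```python
import os
from typing import Dict, Mapping

# Illustrative defaults; Vertex AI Endpoints supplies its own values at deployment.
DEFAULTS = {
    "AIP_HEALTH_ROUTE": "/health",
    "AIP_PREDICT_ROUTE": "/predict",
    "AIP_STORAGE_URI": "gs://example-bucket/path/to/model",
}

def load_serving_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the serving settings, preferring values found in the environment."""
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}

# Pass a plain dict to simulate an environment where only one variable is set.
config = load_serving_config({"AIP_PREDICT_ROUTE": "/v1/predict"})
print(config)
```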

### Setup Artifact Registry

[Artifact Registry](https://cloud.google.com/artifact-registry/docs) organizes artifacts with repositories. Each repository contains packages and is designated to hold a particular format of package: Docker images, Python packages, and [others](https://cloud.google.com/artifact-registry/docs/supported-formats#package).

#### List Repositories

This may be empty if no repositories have been created for this project.

In [49]:
for repo in ar.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    print(repo.name)

projects/statmike-mlops-349915/locations/us-central1/repositories/frameworks
projects/statmike-mlops-349915/locations/us-central1/repositories/frameworks-catboost
projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts
projects/statmike-mlops-349915/locations/us-central1/repositories/mlops
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-docker
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python


#### Create/Retrieve Docker Image Repository

Create an Artifact Registry repository to hold Docker images created by this notebook. First, check whether it was already created by a previous run and retrieve it if so. Otherwise, create one named for this series (`SERIES`).

In [50]:
docker_repo = None
for repo in ar.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    if f'{SERIES}' == repo.name.split('/')[-1]:
        docker_repo = repo
        print(f'Retrieved existing repo: {docker_repo.name}')

if not docker_repo:
    operation = ar.create_repository(
        request = artifactregistry_v1.CreateRepositoryRequest(
            parent = f'projects/{PROJECT_ID}/locations/{REGION}',
            repository_id = f'{SERIES}',
            repository = artifactregistry_v1.Repository(
                description = f'A repository for the {SERIES} series that holds docker images.',
                name = f'{SERIES}',
                format_ = artifactregistry_v1.Repository.Format.DOCKER,
                labels = {'series': SERIES}
            )
        )
    )
    print('Creating Repository ...')
    docker_repo = operation.result()
    print(f'Completed creating repo: {docker_repo.name}')

Retrieved existing repo: projects/statmike-mlops-349915/locations/us-central1/repositories/frameworks


In [51]:
docker_repo.name, docker_repo.format_.name

('projects/statmike-mlops-349915/locations/us-central1/repositories/frameworks',
 'DOCKER')

In [52]:
REPOSITORY = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{docker_repo.name.split('/')[-1]}"

In [53]:
REPOSITORY

'us-central1-docker.pkg.dev/statmike-mlops-349915/frameworks'

---
### Create Application Files

```
|__ Dockerfile
|__ requirements.txt
|__ app
    |__ __init__.py
    |__ main.py
    |__ main2.py
    |__ prestart.sh
```

In [54]:
if not os.path.exists(DIR + '/source/app'):
    os.makedirs(DIR + '/source/app')

In [55]:
%%writefile {DIR}/source/Dockerfile
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.9

COPY ./app /app
COPY ./requirements.txt requirements.txt

RUN pip install --no-cache-dir --upgrade pip \
  && pip install --no-cache-dir -r requirements.txt

Overwriting files/catboost-prediction-feature-store/source/Dockerfile


In [56]:
%%writefile {DIR}/source/requirements.txt
google-cloud-storage
google-cloud-aiplatform==1.71.0
catboost
numpy

Overwriting files/catboost-prediction-feature-store/source/requirements.txt


In [57]:
%%writefile {DIR}/source/app/__init__.py
# init file

Overwriting files/catboost-prediction-feature-store/source/app/__init__.py


In [58]:
FS_NAME,FV_NAME

('statmike_mlops_349915', 'frameworks')

In [59]:
%%writefile {DIR}/source/app/main.py
# this version:
# - inputs to be json like {'instances': [entity_id, entity_id, ...]}
# - outputs in json like {'predictions': [[list],[list], ...]}
# trying to adhere to Vertex AI Endpoints requirements:
# - https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions

# packages
import os
from fastapi import FastAPI, Request
import catboost
import numpy as np
from google.cloud import storage
from google.cloud import aiplatform
from vertexai.resources.preview import feature_store

import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())
logging.info(f'aiplatform version: {aiplatform.__version__}')

# NAMES
PROJECT_ID = 'statmike-mlops-349915'
REGION = 'us-central1'
FS_NAME = 'statmike_mlops_349915'
FV_NAME = 'frameworks'

# clients
app = FastAPI()
gcs = storage.Client()
aiplatform.init(project = PROJECT_ID, location = REGION)

# feature store
online_store = feature_store.FeatureOnlineStore(
    name = f"projects/{PROJECT_ID}/locations/{REGION}/featureOnlineStores/{FS_NAME}"
)
feature_view = feature_store.FeatureView(
    name = FV_NAME,
    feature_online_store_id = online_store.resource_name
)

# download the model file from GCS
paths = os.environ['AIP_STORAGE_URI'].split('/') + ['model.cbm']
bucket = gcs.bucket(paths[2])
blob = bucket.blob('/'.join(paths[3:]))
blob.download_to_filename('model.cbm')

# Load the catboost model
_model = catboost.CatBoostClassifier().load_model('model.cbm')

# get model classification levels and feature names
_classes = [str(c) for c in list(_model.classes_)]
_features = _model.feature_names_

# function to retrieve from feature store
def fs_retriever(entities):
    instances = []
    for entity_id in entities:
        features = feature_view.read(key = [entity_id]).to_dict()['features']
        feature_dict = {item['name']: list(item['value'].values())[0] for item in features}
        ordered_features = [feature_dict[name] for name in _features]
        instances.append(ordered_features)
    return instances

# Define function for health route
@app.get(os.environ['AIP_HEALTH_ROUTE'], status_code=200)
def health():
    return {}

# Define function for prediction route
@app.post(os.environ['AIP_PREDICT_ROUTE'])
async def predict(request: Request):
    # await the request
    body = await request.json()
    
    # parse the request
    entities = body["instances"]
    
    # get predicted probabilities
    predictions = _model.predict_proba(fs_retriever(entities)).tolist()

    # this returns just the predicted probabilities:
    return {"predictions": predictions}

Overwriting files/catboost-prediction-feature-store/source/app/main.py
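
The model download in `main.py` splits `AIP_STORAGE_URI` on `/` to get the bucket and object path. A sketch of the same parsing as a standalone function that also tolerates a trailing slash; `split_gcs_uri` and the example URI are hypothetical:

```python
from typing import Tuple

def split_gcs_uri(uri: str, filename: str = "model.cbm") -> Tuple[str, str]:
    """Split 'gs://bucket/prefix' into (bucket, 'prefix/filename')."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, prefix = uri[len("gs://"):].partition("/")
    prefix = prefix.strip("/")
    blob_name = f"{prefix}/{filename}" if prefix else filename
    return bucket, blob_name

print(split_gcs_uri("gs://example-bucket/frameworks/catboost-overview/"))
```

Stripping the trailing slash avoids an object name with a doubled `/`, which would not match the uploaded model file.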


In [60]:
%%writefile {DIR}/source/app/main2.py
# this version:
# - inputs to be json like {'instances': [entity_id, entity_id, ...]}
# - outputs in json like {'predictions': [{'classes': list, 'scores': list, 'predicted_class': str}, ...]}
# trying to adhere to Vertex AI Endpoints requirements:
# - https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions

# packages
import os
from fastapi import FastAPI, Request
import catboost
import numpy as np
from google.cloud import storage
from google.cloud import aiplatform
from vertexai.resources.preview import feature_store

# NAMES
PROJECT_ID = 'statmike-mlops-349915'
REGION = 'us-central1'
FS_NAME = 'statmike_mlops_349915'
FV_NAME = 'frameworks'

# clients
app = FastAPI()
gcs = storage.Client()
aiplatform.init(project = PROJECT_ID, location = REGION)

# feature store
online_store = feature_store.FeatureOnlineStore(
    name = f"projects/{PROJECT_ID}/locations/{REGION}/featureOnlineStores/{FS_NAME}"
)
feature_view = feature_store.FeatureView(
    name = FV_NAME,
    feature_online_store_id = online_store.resource_name
)

# download the model file from GCS
paths = os.environ['AIP_STORAGE_URI'].split('/') + ['model.cbm']
bucket = gcs.bucket(paths[2])
blob = bucket.blob('/'.join(paths[3:]))
blob.download_to_filename('model.cbm')

# Load the catboost model
_model = catboost.CatBoostClassifier().load_model('model.cbm')

# get model classification levels and feature names
_classes = [str(c) for c in list(_model.classes_)]
_features = _model.feature_names_

# function to retrieve from feature store
def fs_retriever(entities):
    instances = []
    for entity_id in entities:
        features = feature_view.read(key = [entity_id]).to_dict()['features']
        feature_dict = {item['name']: list(item['value'].values())[0] for item in features}
        ordered_features = [feature_dict[name] for name in _features]
        instances.append(ordered_features)
    return instances

# Define function for health route
@app.get(os.environ['AIP_HEALTH_ROUTE'], status_code=200)
def health():
    return {}

# Define function for prediction route
@app.post(os.environ['AIP_PREDICT_ROUTE'])
async def predict(request: Request):
    # await the request
    body = await request.json()
    
    # parse the request
    entities = body["instances"]
    
    # get predicted probabilities
    predictions = _model.predict_proba(fs_retriever(entities)).tolist()
    
    # format predictions:
    preds = [
        dict(
            classes = _classes,
            entity_id = entities[p],
            scores = probs,
            predicted_class = _classes[np.argmax(probs)]
        ) for p, probs in enumerate(predictions)
    ]
    
    # following outputs detail prediction info for classification:
    return {"predictions": preds}

Overwriting files/catboost-prediction-feature-store/source/app/main2.py
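
The formatted response built in `main2.py` can be reproduced with plain Python to check its shape before building the container. This is a sketch only: `format_predictions` is a hypothetical helper, the entity ids and scores are made up, and the standard-library `max` stands in for `np.argmax`.

```python
def format_predictions(entities, probabilities, classes):
    """Pair each entity id with its class scores and the highest-scoring class."""
    formatted = []
    for entity_id, probs in zip(entities, probabilities):
        # index of the largest score; the first index wins on ties, like np.argmax
        best = max(range(len(probs)), key=probs.__getitem__)
        formatted.append({
            'classes': classes,
            'entity_id': entity_id,
            'scores': probs,
            'predicted_class': classes[best],
        })
    return {'predictions': formatted}

response = format_predictions(
    entities=['id-a', 'id-b'],                    # hypothetical entity ids
    probabilities=[[0.97, 0.03], [0.01, 0.99]],   # hypothetical scores
    classes=['0', '1'],
)
print(response)
```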


In [61]:
%%writefile {DIR}/source/app/prestart.sh
#!/bin/bash
export PORT=$AIP_HTTP_PORT

Overwriting files/catboost-prediction-feature-store/source/app/prestart.sh


In [62]:
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/Dockerfile').upload_from_filename(f'{DIR}/source/Dockerfile')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/requirements.txt').upload_from_filename(f'{DIR}/source/requirements.txt')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/app/__init__.py').upload_from_filename(f'{DIR}/source/app/__init__.py')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/app/main.py').upload_from_filename(f'{DIR}/source/app/main.py')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/app/main2.py').upload_from_filename(f'{DIR}/source/app/main2.py')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/app/prestart.sh').upload_from_filename(f'{DIR}/source/app/prestart.sh')

In [63]:
list(bucket.list_blobs(prefix = f'{SERIES}/{EXPERIMENT}/source'))

[<Blob: statmike-mlops-349915, frameworks/catboost-prediction-feature-store/source/Dockerfile, 1739199881309397>,
 <Blob: statmike-mlops-349915, frameworks/catboost-prediction-feature-store/source/app/__init__.py, 1739199881498239>,
 <Blob: statmike-mlops-349915, frameworks/catboost-prediction-feature-store/source/app/main.py, 1739199881581336>,
 <Blob: statmike-mlops-349915, frameworks/catboost-prediction-feature-store/source/app/main2.py, 1739199881650277>,
 <Blob: statmike-mlops-349915, frameworks/catboost-prediction-feature-store/source/app/prestart.sh, 1739199881731120>,
 <Blob: statmike-mlops-349915, frameworks/catboost-prediction-feature-store/source/requirements.txt, 1739199881402787>]

---
### Build Application Container

Use the Cloud Build client to define and run the build steps. The files collected in GCS are copied to the build instance, the Docker build runs in the folder that contains the `Dockerfile`, and the resulting image is pushed to Artifact Registry (set up above).

In [64]:
# setup the build config with empty list of steps - these will be added sequentially
build = cloudbuild_v1.Build(
    steps = []
)
# retrieve the source
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/gsutil',
        'args': ['cp', '-r', f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/source/*', '/workspace']
    }
)
# docker build
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/docker',
        'args': ['build', '-t', f'{REPOSITORY}/{EXPERIMENT}', '/workspace']
    }    
)
# docker push
build.images = [f"{REPOSITORY}/{EXPERIMENT}"]

In [65]:
operation = cb.create_build(
    project_id = PROJECT_ID,
    build = build
)

In [66]:
build_response = operation.result()
build_response.status, build_response.artifacts

(<Status.SUCCESS: 3>,
 images: "us-central1-docker.pkg.dev/statmike-mlops-349915/frameworks/catboost-prediction-feature-store")

In [67]:
build_response.artifacts.images[0]

'us-central1-docker.pkg.dev/statmike-mlops-349915/frameworks/catboost-prediction-feature-store'

---
## Test Locally

If Docker is installed and running locally then use it to test the image.

In [72]:
try:
    local_test = True
    docker_client = docker.from_env()
    if docker_client.ping():
        print(f"Docker is installed and running. Version: {docker_client.version()['Version']}")
    # Run the following command to configure gcloud as the credential helper for the Artifact Registry domain associated with this repository's location:
    config_docker = !gcloud auth configure-docker us-central1-docker.pkg.dev --quiet
except Exception as e:
    local_test = False
    print('Docker is either not installed or not running - please fix before proceeding.\nhttps://docs.docker.com/engine/install/')

Docker is installed and running. Version: 20.10.17


### Pull and Run Container

Run the container image with:
- ports: container port 8080 mapped to host port 80
- set environment variables for:
    - `AIP_HTTP_PORT` is `8080`
    - `AIP_HEALTH_ROUTE` is `/health`
    - `AIP_PREDICT_ROUTE` is `/predict`
    - `AIP_STORAGE_URI` is the `gs://bucket/path/to/folder`
    - `MODULE_NAME` is `main`
        - this is the default, so setting it is optional
        - an alternative script with a different prediction output is created in `main2.py` above
        - set this environment variable to `main2` to start the container with the alternative script
        - see the [FastAPI Docker Image Advanced Usage](https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker?tab=readme-ov-file#advanced-usage) details
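
The `AIP_STORAGE_URI` value is what the application splits into a bucket name and a blob path for `model.cbm`. The sketch below shows one way to do that split; `split_model_uri` is a hypothetical helper and the bucket and folder names are placeholders. Unlike the plain `split('/')` used in the scripts above, it also tolerates a trailing slash on the URI.

```python
def split_model_uri(storage_uri, filename='model.cbm'):
    """Split gs://bucket/path/to/folder into (bucket, 'path/to/folder/<filename>')."""
    if not storage_uri.startswith('gs://'):
        raise ValueError(f'expected a gs:// URI, got: {storage_uri}')
    # everything up to the first '/' is the bucket, the rest is the folder prefix
    bucket_name, _, prefix = storage_uri[len('gs://'):].partition('/')
    prefix = prefix.strip('/')  # avoid an empty path segment from a trailing slash
    blob_name = f'{prefix}/{filename}' if prefix else filename
    return bucket_name, blob_name

# placeholder bucket and folder names
print(split_model_uri('gs://my-bucket/frameworks/catboost-overview'))
print(split_model_uri('gs://my-bucket/frameworks/catboost-overview/'))
```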

In [73]:
if local_test:
    # make sure any prior runs are stopped:
    try:
        container = docker_client.containers.get('local-run')
        container.stop()
        container.remove()
    except docker.errors.NotFound:
        pass
    
    # get image:
    image_uri = build_response.artifacts.images[0]
    try:
        local_image = docker_client.images.get(image_uri)
        remote_image = docker_client.images.pull(image_uri)
        if local_image.id != remote_image.id:
            print('New image found, updating ...')
            local_image = remote_image
        else:
            print('Using existing image ...')
    except docker.errors.ImageNotFound:
        print('Pulling image ...')
        local_image = docker_client.images.pull(image_uri)
        
    # run container:
    print('Starting container ...')
    container = docker_client.containers.run(
        image = image_uri,
        detach = True,
        ports = {'8080/tcp':80}, # Map inside:outside (where docker run -p is outside:inside)
        name = 'local-run',
        environment = {
            'AIP_HTTP_PORT': '8080',
            'AIP_HEALTH_ROUTE': '/health',
            'AIP_PREDICT_ROUTE': '/predict',
            'AIP_STORAGE_URI': f'gs://{bucket.name}/{SERIES}/catboost-overview',
            'MODULE_NAME': 'main2' # try main or main2 for alternative output
        }
    )
    print('Container ready.\n\tUse `container.logs()` to view startup logs.')

Pulling image ...
Starting container ...
Container ready.
	Use `container.logs()` to view startup logs.


In [74]:
#container.logs()

### Health Check

Want to see `200`:

In [75]:
# wait a few seconds
time.sleep(20)

In [76]:
if local_test:
    response = requests.get(f"http://localhost:80/health")
    print(response.status_code)

200
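
A fixed `time.sleep(20)` can be too short or needlessly long depending on how fast the container starts. As an alternative, the sketch below polls until the expected status appears or a timeout passes. `wait_for_status` and its parameters are hypothetical, and the probe is any callable returning a status code, so a stand-in is used here instead of a live container.

```python
import time

def wait_for_status(probe, expected=200, timeout_seconds=60, retry_seconds=2):
    """Call probe() until it returns `expected` or the timeout elapses; return the last status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            status = probe()
        except Exception:
            # e.g. connection refused while the server is still starting
            status = None
        if status == expected or time.monotonic() >= deadline:
            return status
        time.sleep(retry_seconds)

# stand-in probe: fails to connect once, returns 503 once, then 200
_responses = iter([None, 503, 200])
result = wait_for_status(lambda: next(_responses), retry_seconds=0)
print(result)
```

Against the local container the probe could be `lambda: requests.get('http://localhost:80/health').status_code`.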


### Inference Test

In [77]:
def predict(instances):
    url = "http://localhost:80/predict"
    headers = {'Content-Type': 'application/json'}
    data = json.dumps({'instances': instances})
    response = requests.post(url, headers = headers, data = data)
    return json.loads(response.text)

In [78]:
predict(transaction_ids[0:1])

{'predictions': [{'classes': ['0', '1'],
   'entity_id': '28f7cfa6-5862-428a-8d35-c9e08d297008',
   'scores': [0.9684653609075246, 0.031534639092475496],
   'predicted_class': '0'}]}

In [79]:
predict(transaction_ids[1:2])

{'predictions': [{'classes': ['0', '1'],
   'entity_id': '1dbfe576-2c52-4f94-bd88-db1e3e294561',
   'scores': [0.004793214059859108, 0.9952067859401409],
   'predicted_class': '1'}]}

In [80]:
predict(transaction_ids[0:2])

{'predictions': [{'classes': ['0', '1'],
   'entity_id': '28f7cfa6-5862-428a-8d35-c9e08d297008',
   'scores': [0.9684653609075246, 0.031534639092475496],
   'predicted_class': '0'},
  {'classes': ['0', '1'],
   'entity_id': '1dbfe576-2c52-4f94-bd88-db1e3e294561',
   'scores': [0.004793214059859108, 0.9952067859401409],
   'predicted_class': '1'}]}

In [81]:
predict(transaction_ids)

{'predictions': [{'classes': ['0', '1'],
   'entity_id': '28f7cfa6-5862-428a-8d35-c9e08d297008',
   'scores': [0.9684653609075246, 0.031534639092475496],
   'predicted_class': '0'},
  {'classes': ['0', '1'],
   'entity_id': '1dbfe576-2c52-4f94-bd88-db1e3e294561',
   'scores': [0.004793214059859108, 0.9952067859401409],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'entity_id': 'd13fbb1a-d4cb-40c6-bfd7-05cd714cee30',
   'scores': [0.9961659959204033, 0.0038340040795967153],
   'predicted_class': '0'},
  {'classes': ['0', '1'],
   'entity_id': '18327e33-3ff6-43d1-8bbc-8647adc2044d',
   'scores': [0.009687501567520629, 0.9903124984324794],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'entity_id': 'ec505a29-387f-4b3d-a03b-67375d9140e1',
   'scores': [0.05002634033491404, 0.949973659665086],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'entity_id': '1b7b6e26-7248-420c-9eb5-dd9692437843',
   'scores': [0.9988754743912287, 0.0011245256087712965],
   'predict

### Stop Container

In [82]:
container.name

'local-run'

In [83]:
container = docker_client.containers.get(container.name)

In [84]:
container.status

'running'

In [85]:
container.stop()
container.remove()

---
## Cloud Run

Deploy the model to [Cloud Run](https://cloud.google.com/run/docs/overview/what-is-cloud-run) using the same container image that was built and tested above, pulled from Artifact Registry.

Some highlights for Cloud Run:
- Rapid scaling to handle requests
- Scale to zero (default) or other minimum if set
- Can handle larger input (request) and output (response) sizes
    - See [requests limits](https://cloud.google.com/run/quotas#request_limits)
- Configure [memory limits](https://cloud.google.com/run/docs/configuring/services/memory-limits), [CPU limits](https://cloud.google.com/run/docs/configuring/services/cpu), [concurrency](https://cloud.google.com/run/docs/configuring/concurrency), [autoscaling](https://cloud.google.com/run/docs/about-instance-autoscaling), and [request timeout](https://cloud.google.com/run/docs/configuring/request-timeout)

### Deploy The Endpoint(s)

In [111]:
def start_cloud_run_service(MODULE, name_suffix):
    # define the service:
    parent = f"projects/{PROJECT_ID}/locations/{REGION}"
    service = run_v2.Service()
    service.template.containers = [
        run_v2.Container(
            image = build_response.artifacts.images[0],
            ports = [run_v2.ContainerPort(container_port = 8080)],
            env = [
                run_v2.EnvVar(name = 'AIP_HTTP_PORT', value = '8080'),
                run_v2.EnvVar(name = 'AIP_HEALTH_ROUTE', value = '/health'),
                run_v2.EnvVar(name = 'AIP_PREDICT_ROUTE', value = '/predict'),
                run_v2.EnvVar(name = 'AIP_STORAGE_URI', value = f'gs://{bucket.name}/{SERIES}/catboost-overview'),
                run_v2.EnvVar(name = 'MODULE_NAME', value = MODULE)
            ],
            resources = run_v2.ResourceRequirements(
                limits = {"cpu": '8', "memory": '32Gi'}
            )
        )
    ]
    service.ingress = run_v2.IngressTraffic.INGRESS_TRAFFIC_INTERNAL_ONLY
    
    # start the service:
    try:
        # create the service:
        run_response = cr.create_service(request = {
            "parent": parent,
            "service": service,
            "service_id": SERIES+'-'+EXPERIMENT+'-'+name_suffix
        })
        # wait on the operation to complete:
        run_response.result()
        # print the name of the service
        print(f"Started Service: {run_response.metadata.name}")
        return run_response
    
    except Exception as e:
        print(f"Error creating service: {e}")
        return

In [112]:
service_start_main = start_cloud_run_service('main', '1')

Started Service: projects/statmike-mlops-349915/locations/us-central1/services/frameworks-catboost-prediction-feature-store-1


In [113]:
service_start_main.metadata.uri

'https://frameworks-catboost-prediction-feature-store-1-urlxi72dpa-uc.a.run.app'

In [114]:
service_start_main.metadata.name

'projects/statmike-mlops-349915/locations/us-central1/services/frameworks-catboost-prediction-feature-store-1'

In [115]:
service_start_main2 = start_cloud_run_service('main2', '2')

Started Service: projects/statmike-mlops-349915/locations/us-central1/services/frameworks-catboost-prediction-feature-store-2


In [116]:
service_start_main2.metadata.uri

'https://frameworks-catboost-prediction-feature-store-2-urlxi72dpa-uc.a.run.app'

### Permissions

The endpoint requires authentication.  Check out the [Authentication Overview](https://cloud.google.com/run/docs/authenticating/overview).  The cells below follow the [Authenticating service-to-service](https://cloud.google.com/run/docs/authenticating/service-to-service) method: the service account that runs this notebook and creates the endpoint is also granted the role to invoke it.  Note that the binding below also lists `allUsers`, which additionally allows unauthenticated invocation; remove that member to require authentication.
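
Cloud Run's authentication documentation describes authenticated callers sending an OpenID Connect identity token whose audience is the service URL. The sketch below shows how such a token could be fetched with the `google-auth` library and placed in request headers. The helper names are hypothetical, and whether `fetch_id_token` succeeds depends on the type of credentials available in the environment (for example a service account or the metadata server), so treat this as an illustration rather than the method this notebook uses.

```python
def fetch_identity_token(audience):
    """Return an identity token for `audience` (the Cloud Run service URL)."""
    # imported lazily so the header helper below works without google-auth installed
    import google.auth.transport.requests
    import google.oauth2.id_token
    auth_req = google.auth.transport.requests.Request()
    return google.oauth2.id_token.fetch_id_token(auth_req, audience)

def auth_headers(token):
    """Build request headers carrying a bearer token and a JSON content type."""
    return {
        'Authorization': f'Bearer {token}',
        'Content-Type': 'application/json',
    }

# placeholder token; in practice: auth_headers(fetch_identity_token(service_uri))
headers = auth_headers('example-token')
print(headers)
```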

In [117]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

In [118]:
def set_policy(service_name):
    policy = cr.get_iam_policy(request = {'resource': service_name})
    policy.bindings.add(
        role = 'roles/run.invoker',
        members = [f"serviceAccount:{SERVICE_ACCOUNT}", 'allUsers'] #'allUsers'
    )
    policy_response = cr.set_iam_policy(request = {"resource": service_name, "policy": policy})
    print(f"IAM policy updated: {policy_response.bindings}")
    return policy

In [119]:
policy_main = set_policy(service_start_main.metadata.name)

IAM policy updated: [role: "roles/run.invoker"
members: "allUsers"
members: "serviceAccount:1026793852137-compute@developer.gserviceaccount.com"
]


In [120]:
policy_main2 = set_policy(service_start_main2.metadata.name)

IAM policy updated: [role: "roles/run.invoker"
members: "allUsers"
members: "serviceAccount:1026793852137-compute@developer.gserviceaccount.com"
]


**WAIT: The update of the IAM Policy might take a few moments to take effect.  Rerun the following health check section until you get a `200` response code.**

### Health Check

Want to see `200`:

In [121]:
def health(uri):
    url = f"{uri}/health"
    credentials, _ = google.auth.default()
    auth_req = google.auth.transport.requests.Request()
    credentials.refresh(auth_req)
    headers = {'Authorization': f'Bearer {credentials.token}'}
    response = requests.get(url, headers = headers)    
    return response.status_code

def check_health(uri, timeout_seconds = 200, retry_seconds = 10):
    start_time = time.time()
    while True:
        status_code = health(uri)
        if status_code == 200:
            break
        elapsed_time = time.time() - start_time
        if elapsed_time > timeout_seconds:
            break
        time.sleep(retry_seconds)
    return status_code

In [122]:
check_health(service_start_main.metadata.uri)

200

In [123]:
check_health(service_start_main2.metadata.uri)

200

In [124]:
health(service_start_main.metadata.uri), health(service_start_main2.metadata.uri)

(200, 200)

### Inference Test

In [125]:
def predict(instances, endpoint):
    credentials, _ = google.auth.default()
    auth_req = google.auth.transport.requests.Request()
    credentials.refresh(auth_req)
    url = f"{endpoint}/predict"
    headers = {
        'Authorization': f'Bearer {credentials.token}',
        'Content-Type': 'application/json'
    }
    data = json.dumps({'instances': instances})
    response = requests.post(url, headers = headers, data = data)    
    return json.loads(response.text)

In [126]:
predict(transaction_ids[0:1], service_start_main.metadata.uri)

{'predictions': [[0.9684653609075246, 0.031534639092475496]]}

In [127]:
predict(transaction_ids[0:1], service_start_main2.metadata.uri)

{'predictions': [{'classes': ['0', '1'],
   'entity_id': '28f7cfa6-5862-428a-8d35-c9e08d297008',
   'scores': [0.9684653609075246, 0.031534639092475496],
   'predicted_class': '0'}]}

In [128]:
predict(transaction_ids[1:2], service_start_main.metadata.uri)

{'predictions': [[0.004793214059859108, 0.9952067859401409]]}

In [129]:
predict(transaction_ids[0:2], service_start_main2.metadata.uri)

{'predictions': [{'classes': ['0', '1'],
   'entity_id': '28f7cfa6-5862-428a-8d35-c9e08d297008',
   'scores': [0.9684653609075246, 0.031534639092475496],
   'predicted_class': '0'},
  {'classes': ['0', '1'],
   'entity_id': '1dbfe576-2c52-4f94-bd88-db1e3e294561',
   'scores': [0.004793214059859108, 0.9952067859401409],
   'predicted_class': '1'}]}

In [130]:
predict(transaction_ids, service_start_main.metadata.uri)

{'predictions': [[0.9684653609075246, 0.031534639092475496],
  [0.004793214059859108, 0.9952067859401409],
  [0.9961659959204033, 0.0038340040795967153],
  [0.009687501567520629, 0.9903124984324794],
  [0.05002634033491404, 0.949973659665086],
  [0.9988754743912287, 0.0011245256087712965],
  [0.0058287633360615265, 0.9941712366639385],
  [0.0031326212134841214, 0.9968673787865159],
  [0.8122003166930434, 0.18779968330695668],
  [0.007672836590005838, 0.9923271634099942]]}

### Remove The Service(s)

Cloud Run scales these services to zero when idle because no minimum instance count is set.  The following cells delete the services anyway.

In [131]:
remove_response_main = cr.delete_service(request = {"name": service_start_main.metadata.name})

In [132]:
remove_response_main2 = cr.delete_service(request = {"name": service_start_main2.metadata.name})

In [133]:
remove_response_main.result()

name: "projects/statmike-mlops-349915/locations/us-central1/services/frameworks-catboost-prediction-feature-store-1"
uid: "10c638ba-ce33-4d79-b97b-8fc06245822d"
generation: 2
create_time {
  seconds: 1739191764
  nanos: 471204000
}
update_time {
  seconds: 1739192018
  nanos: 868980000
}
delete_time {
  seconds: 1739192018
  nanos: 68081000
}
expire_time {
  seconds: 1741784018
  nanos: 68081000
}
creator: "1026793852137-compute@developer.gserviceaccount.com"
last_modifier: "1026793852137-compute@developer.gserviceaccount.com"
ingress: INGRESS_TRAFFIC_INTERNAL_ONLY
launch_stage: GA
template {
  scaling {
    max_instance_count: 100
  }
  timeout {
    seconds: 300
  }
  service_account: "1026793852137-compute@developer.gserviceaccount.com"
  containers {
    image: "us-central1-docker.pkg.dev/statmike-mlops-349915/frameworks/catboost-prediction-feature-store"
    env {
      name: "AIP_HTTP_PORT"
      value: "8080"
    }
    env {
      name: "AIP_HEALTH_ROUTE"
      value: "/health"


---
## Vertex AI Prediction Endpoint

Register the model in the [Vertex AI Model Registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction) and [Deploy it to an endpoint](https://cloud.google.com/vertex-ai/docs/general/deployment).

**Note 1:** Vertex AI automatically sets the environment variables that start with `AIP_`, so they should not be supplied in the model setup.  Read more in [Environment variables available in the container](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#variables).

**Note 2:** Vertex AI endpoints launch with the [permissions](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#permissions) of the service agent.  If your container needs to run as a specific IAM identity, such as a service account with permission to other resources, supply the `SERVICE_ACCOUNT` when deploying the model, as is done below.  Read more about this parameter in the Python API guide for [`Model.deploy()`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model#google_cloud_aiplatform_Model_deploy).
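
Because the application scripts read the `AIP_` variables with `os.environ[...]`, they raise a `KeyError` when a variable is missing, for example when the app is started outside Vertex AI without setting them. One way to make the same code usable in both places is to fall back to local values. In the sketch below, `serving_config` and `LOCAL_DEFAULTS` are hypothetical names, the defaults mirror the values used in the local Docker test above, and the custom route is a made-up example.

```python
import os

# fallback values for local runs (an assumption mirroring the local test settings)
LOCAL_DEFAULTS = {
    'AIP_HTTP_PORT': '8080',
    'AIP_HEALTH_ROUTE': '/health',
    'AIP_PREDICT_ROUTE': '/predict',
}

def serving_config(env=os.environ):
    """Read the serving settings from `env`, using local defaults where unset."""
    config = {name: env.get(name, default) for name, default in LOCAL_DEFAULTS.items()}
    config['AIP_STORAGE_URI'] = env.get('AIP_STORAGE_URI')  # None when not provided
    config['MODULE_NAME'] = env.get('MODULE_NAME', 'main')
    return config

local = serving_config({})                                     # nothing set: local defaults
custom = serving_config({'AIP_PREDICT_ROUTE': '/custom/predict',
                         'MODULE_NAME': 'main2'})              # hypothetical overrides
print(local)
print(custom)
```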

### Model Registry

Check for existing version of the model:

In [69]:
parent_model = ''
for model in aiplatform.Model.list(filter=f'display_name="{SERIES}"'):
    parent_model = model.resource_name
    break
parent_model

'projects/1026793852137/locations/us-central1/models/frameworks'

Upload the model to the registry with different versions for:
- plain responses: using `main.py`, which is the default
- formatted responses: using `main2.py`, which is selected with an environment variable

In [70]:
vertex_model = aiplatform.Model.upload(
    display_name = SERIES,
    model_id = SERIES,
    parent_model = parent_model,
    serving_container_image_uri = build_response.artifacts.images[0],
    artifact_uri = f'gs://{bucket.name}/{SERIES}/catboost-overview',
    is_default_version = True,
    version_aliases = [f'{EXPERIMENT}-plain'],
    version_description = EXPERIMENT,
    labels = {'series': SERIES, 'experiment': EXPERIMENT}
)

Creating Model
Create Model backing LRO: projects/1026793852137/locations/us-central1/models/frameworks/operations/6278860003382132736
Model created. Resource name: projects/1026793852137/locations/us-central1/models/frameworks@3
To use this Model in another session:
model = aiplatform.Model('projects/1026793852137/locations/us-central1/models/frameworks@3')


In [71]:
vertex_model = aiplatform.Model.upload(
    parent_model = vertex_model.resource_name,
    serving_container_image_uri = build_response.artifacts.images[0],
    serving_container_environment_variables = {'MODULE_NAME': 'main2'},
    artifact_uri = f'gs://{bucket.name}/{SERIES}/catboost-overview',
    is_default_version = False,
    version_aliases = [f'{EXPERIMENT}-formatted'],
    version_description = EXPERIMENT,
    labels = {'series': SERIES, 'experiment': EXPERIMENT}
)

Creating Model
Create Model backing LRO: projects/1026793852137/locations/us-central1/models/frameworks/operations/7596162894388002816
Model created. Resource name: projects/1026793852137/locations/us-central1/models/frameworks@4
To use this Model in another session:
model = aiplatform.Model('projects/1026793852137/locations/us-central1/models/frameworks@4')


### Create Endpoint

Check for existing endpoint:

In [72]:
vertex_endpoint = None
for endpoint in aiplatform.Endpoint.list(filter=f'display_name="{SERIES}"'):
    vertex_endpoint = endpoint
    break
vertex_endpoint

Create endpoint if missing:

In [73]:
if not vertex_endpoint:
    vertex_endpoint = aiplatform.Endpoint.create(
        display_name = f"{SERIES}",
        labels = {'series': SERIES}   
    )
vertex_endpoint

Creating Endpoint
Create Endpoint backing LRO: projects/1026793852137/locations/us-central1/endpoints/757089622026092544/operations/521007879788953600
Endpoint created. Resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/1026793852137/locations/us-central1/endpoints/757089622026092544')


<google.cloud.aiplatform.models.Endpoint object at 0x7f9dc81f6530> 
resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544

### Deploy Model: Default version with `main.py` - Plain

Get the model version with the alias `{EXPERIMENT}-plain`:

In [74]:
vertex_model = aiplatform.Model(model_name = SERIES, version = f'{EXPERIMENT}-plain')
vertex_model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/frameworks@3'

In [75]:
vertex_model.deploy(
    endpoint = vertex_endpoint,
    traffic_percentage = 100,
    machine_type = 'n1-standard-4',
    min_replica_count = 1,
    max_replica_count = 2,
    service_account = SERVICE_ACCOUNT
)

Deploying model to Endpoint : projects/1026793852137/locations/us-central1/endpoints/757089622026092544
Deploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/757089622026092544/operations/8029071408568991744
Endpoint model deployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544


<google.cloud.aiplatform.models.Endpoint object at 0x7f9dc81f6530> 
resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544

### Test Predictions: Plain

In [76]:
vertex_endpoint.predict(instances = transaction_ids[0:1])

Prediction(predictions=[[0.9684653609075246, 0.0315346390924755]], deployed_model_id='8642070134854254592', metadata=None, model_version_id='3', model_resource_name='projects/1026793852137/locations/us-central1/models/frameworks', explanations=None)

In [77]:
vertex_endpoint.predict(instances = transaction_ids[1:2]).predictions

[[0.004793214059859108, 0.9952067859401409]]

In [78]:
vertex_endpoint.predict(instances = transaction_ids[0:2]).predictions

[[0.9684653609075246, 0.0315346390924755],
 [0.004793214059859108, 0.9952067859401409]]

In [79]:
vertex_endpoint.predict(instances = transaction_ids).predictions

[[0.9684653609075246, 0.0315346390924755],
 [0.004793214059859108, 0.9952067859401409],
 [0.9961659959204033, 0.003834004079596715],
 [0.009687501567520629, 0.9903124984324794],
 [0.05002634033491404, 0.949973659665086],
 [0.9988754743912287, 0.001124525608771297],
 [0.005828763336061527, 0.9941712366639385],
 [0.003132621213484121, 0.9968673787865159],
 [0.8122003166930434, 0.1877996833069567],
 [0.007672836590005838, 0.9923271634099942]]

### Deploy Model: Version with `main2.py` - Formatted

In [80]:
vertex_model = aiplatform.Model(model_name = SERIES, version = f'{EXPERIMENT}-formatted')
vertex_model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/frameworks@4'

In [81]:
vertex_model.deploy(
    endpoint = vertex_endpoint,
    traffic_percentage = 0,
    machine_type = 'n1-standard-4',
    min_replica_count = 1,
    max_replica_count = 2,
    service_account = SERVICE_ACCOUNT
)

Deploying model to Endpoint : projects/1026793852137/locations/us-central1/endpoints/757089622026092544
Deploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/757089622026092544/operations/4034378539091361792
Endpoint model deployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544


<google.cloud.aiplatform.models.Endpoint object at 0x7f9dc81f6530> 
resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544

### Shift Traffic To Model Version: Formatted

In [82]:
vertex_endpoint.traffic_split

{'8642070134854254592': 100, '2101717576005451776': 0}

In [83]:
new_traffic_split = {}
for deployed_model in vertex_endpoint.list_models():
    if deployed_model.model_version_id == vertex_model.version_id:
        new_traffic_split[deployed_model.id] = 100
    else:
        new_traffic_split[deployed_model.id] = 0
new_traffic_split

{'2101717576005451776': 100, '8642070134854254592': 0}

In [84]:
vertex_endpoint.update(traffic_split = new_traffic_split)

Updating Endpoint endpoint: projects/1026793852137/locations/us-central1/endpoints/757089622026092544
Endpoint endpoint updated. Resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544


<google.cloud.aiplatform.models.Endpoint object at 0x7f9dc81f6530> 
resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544

In [85]:
vertex_endpoint.traffic_split

{'8642070134854254592': 0, '2101717576005451776': 100}
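
The traffic-shifting loop can be expressed as a small pure function, which makes it easy to check the resulting split before calling `update`. This is a sketch: `route_all_traffic` is a hypothetical helper, and the mapping of deployed model ids to model version ids is taken from the prediction outputs in this notebook.

```python
def route_all_traffic(deployed_versions, target_version_id):
    """Return a traffic split sending 100% to the deployed model of the target version.

    deployed_versions maps deployed_model_id -> model_version_id.
    """
    split = {
        deployed_id: (100 if version_id == target_version_id else 0)
        for deployed_id, version_id in deployed_versions.items()
    }
    # a valid split must total 100: exactly one deployed model may match
    if sum(split.values()) != 100:
        raise ValueError(f'expected exactly one deployed model for version {target_version_id}')
    return split

deployed = {'8642070134854254592': '3', '2101717576005451776': '4'}
split = route_all_traffic(deployed, '4')
print(split)
```

The explicit check guards against sending an invalid split when the target version is not deployed or is deployed more than once.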

### Test Predictions: Formatted

In [86]:
vertex_endpoint.predict(instances = transaction_ids[0:1])

Prediction(predictions=[{'predicted_class': '0', 'classes': ['0', '1'], 'scores': [0.9684653609075246, 0.0315346390924755], 'entity_id': '28f7cfa6-5862-428a-8d35-c9e08d297008'}], deployed_model_id='2101717576005451776', metadata=None, model_version_id='4', model_resource_name='projects/1026793852137/locations/us-central1/models/frameworks', explanations=None)

In [87]:
vertex_endpoint.predict(instances = transaction_ids[1:2]).predictions

[{'predicted_class': '1',
  'classes': ['0', '1'],
  'scores': [0.004793214059859108, 0.9952067859401409],
  'entity_id': '1dbfe576-2c52-4f94-bd88-db1e3e294561'}]

In [88]:
vertex_endpoint.predict(instances = transaction_ids[0:2]).predictions

[{'classes': ['0', '1'],
  'predicted_class': '0',
  'scores': [0.9684653609075246, 0.0315346390924755],
  'entity_id': '28f7cfa6-5862-428a-8d35-c9e08d297008'},
 {'classes': ['0', '1'],
  'predicted_class': '1',
  'scores': [0.004793214059859108, 0.9952067859401409],
  'entity_id': '1dbfe576-2c52-4f94-bd88-db1e3e294561'}]

In [89]:
vertex_endpoint.predict(instances = transaction_ids).predictions

[{'classes': ['0', '1'],
  'predicted_class': '0',
  'scores': [0.9684653609075246, 0.0315346390924755],
  'entity_id': '28f7cfa6-5862-428a-8d35-c9e08d297008'},
 {'predicted_class': '1',
  'classes': ['0', '1'],
  'scores': [0.004793214059859108, 0.9952067859401409],
  'entity_id': '1dbfe576-2c52-4f94-bd88-db1e3e294561'},
 {'classes': ['0', '1'],
  'predicted_class': '0',
  'scores': [0.9961659959204033, 0.003834004079596715],
  'entity_id': 'd13fbb1a-d4cb-40c6-bfd7-05cd714cee30'},
 {'predicted_class': '1',
  'classes': ['0', '1'],
  'scores': [0.009687501567520629, 0.9903124984324794],
  'entity_id': '18327e33-3ff6-43d1-8bbc-8647adc2044d'},
 {'predicted_class': '1',
  'classes': ['0', '1'],
  'scores': [0.05002634033491404, 0.949973659665086],
  'entity_id': 'ec505a29-387f-4b3d-a03b-67375d9140e1'},
 {'predicted_class': '0',
  'classes': ['0', '1'],
  'scores': [0.9988754743912287, 0.001124525608771297],
  'entity_id': '1b7b6e26-7248-420c-9eb5-dd9692437843'},
 {'predicted_class': '1',


### Shift Traffic To Model: Plain

In [90]:
vertex_endpoint.traffic_split

{'8642070134854254592': 0, '2101717576005451776': 100}

In [91]:
new_traffic_split = {}
for deployed_model in vertex_endpoint.list_models():
    if deployed_model.model_version_id != vertex_model.version_id:
        new_traffic_split[deployed_model.id] = 100
    else:
        new_traffic_split[deployed_model.id] = 0
vertex_endpoint.update(traffic_split = new_traffic_split)

Updating Endpoint endpoint: projects/1026793852137/locations/us-central1/endpoints/757089622026092544
Endpoint endpoint updated. Resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544


<google.cloud.aiplatform.models.Endpoint object at 0x7f9dc81f6530> 
resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544

In [92]:
vertex_endpoint.traffic_split

{'8642070134854254592': 100, '2101717576005451776': 0}
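The traffic shift above relies on there being exactly two deployed models. As a hedged sketch (not part of the original workflow), the split can be built by a small pure-Python helper that names the target version explicitly and refuses to proceed unless exactly one deployed model matches. The helper name `build_traffic_split` and the version ids `"1"` and `"2"` are hypothetical; the deployed model ids are the ones shown in the output above.

```python
from typing import Dict, Iterable, Tuple


def build_traffic_split(
    deployed: Iterable[Tuple[str, str]], target_version_id: str
) -> Dict[str, int]:
    """Send 100% of traffic to the one deployed model with target_version_id.

    `deployed` holds (deployed_model_id, model_version_id) pairs, for example
    taken from `deployed_model.id` and `deployed_model.model_version_id`.
    """
    pairs = list(deployed)
    matches = [dm_id for dm_id, version_id in pairs if version_id == target_version_id]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one deployed model with version "
            f"{target_version_id!r}, found {len(matches)}"
        )
    # Every other deployed model is listed explicitly with 0% traffic.
    return {dm_id: 100 if dm_id == matches[0] else 0 for dm_id, _ in pairs}


# Deployed model ids come from the output above; the version ids "1" and "2"
# are placeholders for illustration only.
example_pairs = [("8642070134854254592", "1"), ("2101717576005451776", "2")]
example_split = build_traffic_split(example_pairs, target_version_id="1")
print(example_split)
```

The resulting dictionary could then be passed as `traffic_split` to `vertex_endpoint.update(...)` in the same way as `new_traffic_split`.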

In [93]:
vertex_endpoint.predict(instances = transaction_ids[0:2]).predictions

[[0.9684653609075246, 0.0315346390924755],
 [0.004793214059859108, 0.9952067859401409]]
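After the shift, the same endpoint returns bare score lists instead of the dictionaries seen earlier, so client code that runs during a traffic change may see either shape. The sketch below normalizes both into one dictionary form. It rests on two assumptions that the notebook output suggests but does not state: the plain model's scores follow the class order `['0', '1']`, and the predicted class is the one with the highest score. The function name `normalize_prediction` is hypothetical.

```python
from typing import Any, Dict, Sequence, Union


def normalize_prediction(
    prediction: Union[Dict[str, Any], Sequence[float]],
    classes: Sequence[str] = ("0", "1"),  # assumed class order for bare score lists
) -> Dict[str, Any]:
    """Return a dict with 'classes', 'predicted_class' and 'scores'."""
    if isinstance(prediction, dict):
        # Already in dictionary form (may also carry 'entity_id'); return a copy.
        return dict(prediction)
    scores = [float(score) for score in prediction]
    if len(scores) != len(classes):
        raise ValueError("number of scores does not match number of classes")
    # Assumption: the predicted class is the highest-scoring one.
    best = max(range(len(scores)), key=scores.__getitem__)
    return {
        "classes": list(classes),
        "predicted_class": classes[best],
        "scores": scores,
    }


# Bare score lists copied from the output above.
plain_predictions = [
    [0.9684653609075246, 0.0315346390924755],
    [0.004793214059859108, 0.9952067859401409],
]
normalized = [normalize_prediction(p) for p in plain_predictions]
print([n["predicted_class"] for n in normalized])
```

A bare score list carries no `entity_id`, so the caller would need to pair each result with the id it sent in `instances`.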

### Undeploy Models Without Traffic

In [94]:
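# Undeploy every deployed model that currently receives 0% of traffic,
# then display the remaining traffic split.
for deployed_model in vertex_endpoint.list_models():
    if vertex_endpoint.traffic_split[deployed_model.id] == 0:
        vertex_endpoint.undeploy(deployed_model_id = deployed_model.id)
vertex_endpoint.traffic_split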

Undeploying Endpoint model: projects/1026793852137/locations/us-central1/endpoints/757089622026092544
Undeploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/757089622026092544/operations/8120832250976665600
Endpoint model undeployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544


{'8642070134854254592': 100}
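The selection step in the loop above can also be separated from the undeploy call, which makes it easy to check before anything is removed. This is a minimal sketch with a hypothetical helper name, `ids_without_traffic`. Treating an id that is missing from the split as having zero traffic is an assumption of this sketch, not documented behavior.

```python
from typing import Dict, Iterable, List


def ids_without_traffic(
    deployed_model_ids: Iterable[str], traffic_split: Dict[str, int]
) -> List[str]:
    """Return the deployed model ids that receive 0% of traffic."""
    # Assumption: an id absent from the split is treated as 0% traffic.
    return [dm_id for dm_id in deployed_model_ids if traffic_split.get(dm_id, 0) == 0]


# Values copied from the traffic split shown before the undeploy step.
split_before_undeploy = {"8642070134854254592": 100, "2101717576005451776": 0}
to_undeploy = ids_without_traffic(split_before_undeploy.keys(), split_before_undeploy)
print(to_undeploy)  # ['2101717576005451776']
```

Each returned id could then be passed to `vertex_endpoint.undeploy(deployed_model_id = ...)`.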

### Undeploy All Models

In [95]:
# Undeploy every remaining deployed model, then confirm the endpoint is empty.
for deployed_model in vertex_endpoint.list_models():
    vertex_endpoint.undeploy(deployed_model_id = deployed_model.id)
vertex_endpoint.list_models()

Undeploying Endpoint model: projects/1026793852137/locations/us-central1/endpoints/757089622026092544
Undeploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/757089622026092544/operations/3211908657142824960
Endpoint model undeployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544


[]

### Delete Endpoint

In [96]:
vertex_endpoint.delete(force = True)

Deleting Endpoint : projects/1026793852137/locations/us-central1/endpoints/757089622026092544
Endpoint deleted. . Resource name: projects/1026793852137/locations/us-central1/endpoints/757089622026092544
Deleting Endpoint resource: projects/1026793852137/locations/us-central1/endpoints/757089622026092544
Delete Endpoint backing LRO: projects/1026793852137/locations/us-central1/operations/6218061408412631040
Endpoint resource projects/1026793852137/locations/us-central1/endpoints/757089622026092544 deleted.
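The cleanup order used in the last two sections (undeploy every model, then delete the endpoint) can be collected into one function. The sketch below uses only the method names that appear in the cells above (`list_models`, `undeploy`, `delete`). The function `teardown_endpoint` and the `FakeEndpoint` stand-in are hypothetical and exist only so the example runs without a Google Cloud project. The stand-in's refusal to delete while models are deployed is a simplification for illustration.

```python
from typing import List


def teardown_endpoint(endpoint) -> List[str]:
    """Undeploy every deployed model, then delete the endpoint.

    Returns the deployed model ids that were undeployed, in order.
    """
    # Snapshot the ids first so the loop does not depend on a list that
    # changes while models are being undeployed.
    deployed_ids = [deployed_model.id for deployed_model in endpoint.list_models()]
    for deployed_model_id in deployed_ids:
        endpoint.undeploy(deployed_model_id=deployed_model_id)
    endpoint.delete()
    return deployed_ids


class _FakeDeployedModel:
    def __init__(self, model_id: str) -> None:
        self.id = model_id


class FakeEndpoint:
    """Hypothetical in-memory stand-in for an endpoint object."""

    def __init__(self, model_ids: List[str]) -> None:
        self._ids = list(model_ids)
        self.calls = []  # records (method_name, argument) in call order

    def list_models(self):
        return [_FakeDeployedModel(model_id) for model_id in self._ids]

    def undeploy(self, deployed_model_id: str) -> None:
        self._ids.remove(deployed_model_id)
        self.calls.append(("undeploy", deployed_model_id))

    def delete(self, force: bool = False) -> None:
        # Simplified rule for this stand-in: deletion needs an empty endpoint
        # unless force is set.
        if self._ids and not force:
            raise RuntimeError("endpoint still has deployed models")
        self._ids.clear()
        self.calls.append(("delete", force))


fake = FakeEndpoint(["8642070134854254592", "2101717576005451776"])
undeployed = teardown_endpoint(fake)
print(undeployed)
print(fake.calls[-1])
```

Because the function takes the endpoint as an argument, it could be tried against the stand-in first and then called with `vertex_endpoint`.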
