# 03Tools - Predictions

>**Note:** Formerly named `03b - Vertex AI + BQML - Online Predictions with BQML Models.ipynb`.  This [link](https://github.com/statmike/vertex-ai-mlops/blob/fd442b458c710a0a7afdc41bae690d2a3282e93c/03b%20-%20Vertex%20AI%20%2B%20BQML%20-%20Online%20Predictions%20with%20BQML%20Models.ipynb) goes to the previous version featured in the video.

Models built with BigQuery ML (BQML), like the one in `03a`, can also be exported for use and deployment outside of BigQuery.  Registering the model in Vertex AI Model Registry makes it easy to use with Vertex AI Endpoints for online predictions and with Vertex AI Batch Prediction Jobs.  The model also remains in BigQuery for further use, such as batch prediction directly with `ML.PREDICT` as shown previously in `03a` through `03f`.

**Video Walkthrough of this notebook:**

Includes a conversational walkthrough and more explanatory information than the notebook:

<p align="center" width="100%"><center><a href="https://youtu.be/7y_t_bW0LHQ" target="_blank" rel="noopener noreferrer"><img src="../architectures/thumbnails/playbutton/03tools_pred.png" width="40%"></a></center></p>

Notes Since Video:
- updated notebook on 9/8/2022:
    - exporting and registering the model in Vertex AI Model Registry is now part of the model training notebooks `03a` through `03f`
    - reworked this example to work with any BQML export: TensorFlow or XGBoost
    - switched to raw prediction clients to allow specification of the signature for TensorFlow models


**Prerequisites:**
-  03a - BQML Logistic Regression

**Resources:**
-  [Export formats for BigQuery ML models](https://cloud.google.com/bigquery-ml/docs/exporting-models)
-  [Python Client for Vertex AI](https://googleapis.dev/python/aiplatform/latest/aiplatform.html)

**Conceptual Flow & Workflow**
<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/03tools_pred_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/03tools_pred_console.png" width="45%">
</p>

---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'google.com:cbo-mnl'

In [2]:
REGION = 'us-central1'
EXPERIMENT = '03b' # pick the 03 series model you want to use
SERIES = '03'

# Replace model on endpoint with the one for this experiment?
REPLACE_MODEL = False

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'
BQ_MODEL = f'{EXPERIMENT}_{BQ_DATASET}'

# Resources for serving BigQuery Model Exports
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

>**Notes For Resources**
This series uses BigQuery ML (BQML) models.  Depending on the model type, the exported files match the underlying framework (TensorFlow, XGBoost, ...).  These export formats are [specified here](https://cloud.google.com/bigquery-ml/docs/exporting-models).<p>When registering the model in the Vertex AI Model Registry, a URI for a serving container is specified.  Pre-built serving containers are available for specific frameworks and versions, as [specified here](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers).</p>

packages:

In [3]:
from google.cloud import aiplatform
from datetime import datetime

from google.cloud import bigquery
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np

clients:

In [4]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()

parameters:

In [5]:
DIR = f"temp/{EXPERIMENT}"

environment:

In [6]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Serving With Vertex AI Endpoints

### Retrieve The Model From Vertex AI Model Registry
In each of the model training technique notebooks `03a` through `03f`, the final model trained in BigQuery ML (BQML) was exported and registered in the Vertex AI Model Registry.  The first step here is retrieving the model resource representation:

In [7]:
model = aiplatform.Model(model_name = f'model_{EXPERIMENT}_{BQ_DATASET}@default')

In [8]:
model.display_name

'03b_fraud'

In [9]:
model.resource_name

'projects/169853841455/locations/us-central1/models/model_03b_fraud'

In [10]:
model.versioned_resource_name

'projects/169853841455/locations/us-central1/models/model_03b_fraud@1'

In [11]:
model.version_aliases

['run-20220929061623', 'default']

In [12]:
model.labels

{'series': '03',
 'timestamp': '20220929061623',
 'run_name': 'run-20220929061623',
 'framework': 'xgboost',
 'experiment': '03b'}

In [13]:
model.uri

'gs://jonas-ai-experiment/fraud/models/03/03b/20220929061623/model'

### Create An Endpoint
References:
- Python SDK for [`aiplatform.Endpoint`](https://googleapis.dev/python/aiplatform/latest/aiplatform/services.html#google.cloud.aiplatform.Endpoint)
- Python Client for [`aiplatform.Endpoint`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Endpoint)

In [14]:
endpoints = aiplatform.Endpoint.list(filter = f"labels.series={SERIES}")
if endpoints:
    endpoint = endpoints[0]
    print(f"Endpoint Exists: {endpoints[0].resource_name}")
else:
    endpoint = aiplatform.Endpoint.create(
        display_name = f"{SERIES}_{BQ_DATASET}",
        labels = {'series' : f"{SERIES}"}    
    )
    print(f"Endpoint Created: {endpoint.resource_name}")

Creating Endpoint
Create Endpoint backing LRO: projects/169853841455/locations/us-central1/endpoints/2680209126285377536/operations/933224031811796992
Endpoint created. Resource name: projects/169853841455/locations/us-central1/endpoints/2680209126285377536
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/169853841455/locations/us-central1/endpoints/2680209126285377536')
Endpoint Created: projects/169853841455/locations/us-central1/endpoints/2680209126285377536


In [15]:
endpoint.display_name

'03_fraud'

In [16]:
endpoint.traffic_split

{}

In [17]:
deployed_models = endpoint.list_models()
deployed_models

[]

In [18]:
print(f"View the endpoint in the Vertex AI Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint.resource_name.split('/')[-1]}?project={PROJECT_ID}")

View the endpoint in the Vertex AI Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/2680209126285377536?project=google.com:cbo-mnl


### Deploy Model To Endpoint

In [None]:
if REPLACE_MODEL or len(deployed_models) == 0:
    endpoint.deploy(
        model = model,
        deployed_model_display_name = model.display_name,
        traffic_percentage = 100,
        machine_type = DEPLOY_COMPUTE,
        min_replica_count = 1,
        max_replica_count = 1
    )

Deploying Model projects/169853841455/locations/us-central1/models/model_03b_fraud to Endpoint : projects/169853841455/locations/us-central1/endpoints/2680209126285377536
Deploy Endpoint model backing LRO: projects/169853841455/locations/us-central1/endpoints/2680209126285377536/operations/1582305328106569728


### Remove Deployed Models without Traffic

In [28]:
for deployed_model in endpoint.list_models():
    if deployed_model.id in endpoint.traffic_split:
        print(f"Model {deployed_model.display_name} with version {deployed_model.model_version_id} has traffic = {endpoint.traffic_split[deployed_model.id]}")
    else:
        endpoint.undeploy(deployed_model_id = deployed_model.id)
        print(f"Undeploying {deployed_model.display_name} with version {deployed_model.model_version_id} because it has no traffic.")

Model 03b_fraud with version 1 has traffic = 100
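
The keep-or-undeploy decision in the loop above can be separated from the API calls. The following is a minimal sketch, assuming plain strings stand in for the deployed model ids; the function name `split_by_traffic` and the second model id are hypothetical and only serve the illustration. It mirrors the notebook's rule: a model is kept when its id appears as a key in the traffic split.

```python
def split_by_traffic(deployed_ids, traffic_split):
    """Partition deployed model ids into (kept, to_undeploy).

    A model is kept when its id is a key of traffic_split,
    which is the same membership test used in the loop above.
    """
    kept = [i for i in deployed_ids if i in traffic_split]
    to_undeploy = [i for i in deployed_ids if i not in traffic_split]
    return kept, to_undeploy


# Traffic split as printed by endpoint.traffic_split in this notebook.
traffic_split = {'7641152814252556288': 100}
# The second id is made up to show a model without traffic.
deployed_ids = ['7641152814252556288', '1111111111111111111']

kept, to_undeploy = split_by_traffic(deployed_ids, traffic_split)
print('keep:', kept)
print('undeploy:', to_undeploy)
```

Note that under this rule a model listed in the traffic split with 0 percent would still be kept; whether that case needs separate handling depends on how the endpoint reports replaced models.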


In [29]:
endpoint.traffic_split

{'7641152814252556288': 100}

In [30]:
endpoint.list_models()

[id: "7641152814252556288"
 model: "projects/169853841455/locations/us-central1/models/model_03b_fraud"
 display_name: "03b_fraud"
 create_time {
   seconds: 1664446486
   nanos: 124743000
 }
 dedicated_resources {
   machine_spec {
     machine_type: "n1-standard-4"
   }
   min_replica_count: 1
   max_replica_count: 1
 }
 model_version_id: "1"]

### Retrieve The Deployed Model
The deployed model's `framework` label is used later to shape the instances for prediction correctly.

In [31]:
# look up the first deployed model once, then load that exact model version
deployed_model = endpoint.list_models()[0]
model = aiplatform.Model(model_name = f'{deployed_model.model}@{deployed_model.model_version_id}')

In [32]:
model.uri

'gs://jonas-ai-experiment/fraud/models/03/03b/20220929061623/model'

---
## Prediction

### Retrieve Records For Prediction

In [52]:
n = 10
pred = bq.query(query = f"SELECT * FROM {BQ_DATASET}.{BQ_TABLE} WHERE splits='TEST' and CLASS=1 LIMIT {n}").to_dataframe()

In [53]:
pred.head()

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class | transaction_id | splits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 85285 | -7.030308 | 3.421991 | -9.525072 | 5.270891 | -4.02463 | -2.865682 | -6.989195 | 3.791551 | -4.62273 | ... | 0.036943 | -0.355519 | 0.353634 | 1.042458 | 1.359516 | -0.272188 | 0.0 | 1 | b506c850-fd39-4d94-82ef-dde2d9bf443d | TEST |
| 1 | 165981 | -5.766879 | -8.402154 | 0.056543 | 6.950983 | 9.880564 | -5.773192 | -5.748879 | 0.721743 | -1.076274 | ... | 2.241471 | 0.665346 | -1.890041 | -0.120803 | 0.073269 | 0.583799 | 0.0 | 1 | 99fc6a30-d542-4ae5-a4dd-47f09c097502 | TEST |
| 2 | 126219 | -1.141559 | 1.92765 | -3.905356 | -0.073943 | -0.044858 | -1.756999 | -1.217416 | 0.364563 | -2.770148 | ... | -0.328741 | 0.3931 | 0.568435 | 0.786605 | -0.146102 | 0.076211 | 25.0 | 1 | 09fee55d-f791-4134-baf6-c260d575a2de | TEST |
| 3 | 40662 | -4.446847 | -0.014793 | -5.126307 | 6.94513 | 5.269255 | -4.297177 | -2.591242 | 0.342671 | -3.880663 | ... | -0.226017 | -0.401236 | 0.856124 | 0.661272 | 0.49256 | 0.971834 | 1.0 | 1 | 2744f6a8-6caf-45f9-b4ab-b4f84da3f50f | TEST |
| 4 | 68207 | -13.192671 | 12.785971 | -9.90665 | 3.320337 | -4.801176 | 5.760059 | -18.750889 | -37.353443 | -0.39154 | ... | 5.303607 | -0.639435 | 0.263203 | -0.108877 | 1.269566 | 0.939407 | 1.0 | 1 | 26968001-1e83-4720-835e-3b5a39fb5194 | TEST |


Shape the records as instances: dictionaries of key:value pairs for only the features used in the model.

In [36]:
newobs = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET,'splits'])]].to_dict(orient='records')
#newobs[0]

In [37]:
len(newobs)

10

In [38]:
newobs[0]

{'Time': 5043,
 'V1': -0.610352895751295,
 'V2': 0.8762678436336991,
 'V3': 3.13457187529889,
 'V4': 2.26016851428487,
 'V5': 0.00118499266188011,
 'V6': 0.268439122726156,
 'V7': 0.12709401840541,
 'V8': -0.0086801339830761,
 'V9': 0.95280234430319,
 'V10': -0.144996781876633,
 'V11': -0.18135310744176503,
 'V12': -3.07117304114555,
 'V13': 1.00476719678711,
 'V14': 0.841752604462658,
 'V15': -1.14101969201816,
 'V16': 0.397014218420624,
 'V17': 0.290108744949502,
 'V18': 0.303356168060438,
 'V19': -0.8192219202785891,
 'V20': -0.0553143112003825,
 'V21': -0.20228040102074302,
 'V22': -0.122893247243976,
 'V23': -0.18313204469056302,
 'V24': 0.295979318446426,
 'V25': -0.159988797716441,
 'V26': -0.130196187091873,
 'V27': -0.0761391830787327,
 'V28': -0.10907594107838602,
 'Amount': 0.0}
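
The column filtering used to build `newobs` can also be sketched with plain dictionaries, which may help when records do not come from a DataFrame. This is an illustrative sketch only: the helper name `to_instance` is hypothetical, and the sample row is shortened and made up (the real table has `Time`, `V1` through `V28`, and `Amount` as features).

```python
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id'  # space-delimited string, as in the setup cell


def to_instance(row, omit):
    """Return a copy of row without the omitted keys, preserving key order."""
    return {key: value for key, value in row.items() if key not in omit}


# Shortened, made-up record standing in for one row of the query result.
row = {'Time': 5043, 'V1': -0.61, 'Amount': 0.0,
       'Class': 1, 'transaction_id': 'abc', 'splits': 'TEST'}

# Same exclusion list as the pandas version: omitted variables, target, split column.
omit = set(VAR_OMIT.split() + [VAR_TARGET, 'splits'])
instance = to_instance(row, omit)
print(instance)
```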

### Get The Model Signature Name (if TensorFlow)

In [39]:
if model.labels['framework'] == 'tensorflow':
    import tensorflow as tf
    reloaded_model = tf.saved_model.load(model.uri)
    print(list(reloaded_model.signatures.keys())[0])

### Get The Feature Order (if XGBoost)

In [40]:
if model.labels['framework'] == 'xgboost':
    import gcsfs
    import tensorflow as tf
    # the export may include a metadata file that lists the feature names in order
    file = f'{model.uri}/assets/model_metadata.json'
    if tf.io.gfile.exists(file):
        gcs = gcsfs.GCSFileSystem(project = PROJECT_ID)
        with gcs.open(file) as fp:
            features = json.load(fp)['feature_names']
    else:
        # fall back to the column order of the retrieved records
        features = list(newobs[0].keys())
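
Why the feature order matters: an XGBoost instance is sent as a bare list of values, so a value in the wrong position is silently read as a different feature. The sketch below illustrates the lookup and fallback logic without any cloud access. The helper name `feature_order` and the metadata content are hypothetical; only the `feature_names` key is taken from the cell above.

```python
import json


def feature_order(metadata_text, instance):
    """Use 'feature_names' from the metadata JSON when available,
    otherwise fall back to the key order of the instance."""
    if metadata_text is not None:
        return json.loads(metadata_text)['feature_names']
    return list(instance.keys())


# Made-up instance whose key order differs from the assumed training order.
instance = {'Amount': 0.0, 'Time': 5043, 'V1': -0.61}
# Made-up stand-in for the content of assets/model_metadata.json.
metadata_text = json.dumps({'feature_names': ['Time', 'V1', 'Amount']})

features = feature_order(metadata_text, instance)
row = [instance[f] for f in features]  # values arranged in model order
print(features, row)
```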

### Prepare Instance For Prediction

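The instance format depends on the framework of the exported model.  In this notebook, a TensorFlow model receives each instance as a dictionary of feature names and values along with a signature name, while an XGBoost model receives each instance as a list of values in feature order.  More information can be found [here](https://cloud.google.com/vertex-ai/docs/predictions/online-predictions-custom-models#request-body-details).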

Instances:

In [54]:
print(model.labels)

{'experiment': '03b', 'framework': 'xgboost', 'series': '03', 'timestamp': '20220929061623', 'run_name': 'run-20220929061623'}


In [41]:
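from google.api import httpbody_pb2

if model.labels['framework'] == 'tensorflow':
    # TensorFlow exports take named features plus the serving signature
    instances = {"instances": [newobs[0]], "signature_name": list(reloaded_model.signatures.keys())[0]}
elif model.labels['framework'] == 'xgboost':
    # XGBoost exports take a plain list of values in feature order
    instances = {"instances": [[newobs[0][f] for f in features]]}
else:
    raise ValueError(f"Unsupported framework: {model.labels['framework']}")
# raw prediction requests carry the JSON payload as bytes
http_body = httpbody_pb2.HttpBody(data = json.dumps(instances).encode("utf-8"), content_type = "application/json")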

In [42]:
print(instances)

{'instances': [[5043, -0.610352895751295, 0.8762678436336991, 3.13457187529889, 2.26016851428487, 0.00118499266188011, 0.268439122726156, 0.12709401840541, -0.0086801339830761, 0.95280234430319, -0.144996781876633, -0.18135310744176503, -3.07117304114555, 1.00476719678711, 0.841752604462658, -1.14101969201816, 0.397014218420624, 0.290108744949502, 0.303356168060438, -0.8192219202785891, -0.0553143112003825, -0.20228040102074302, -0.122893247243976, -0.18313204469056302, 0.295979318446426, -0.159988797716441, -0.130196187091873, -0.0761391830787327, -0.10907594107838602, 0.0]]}
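
The two payload shapes can be compared side by side with a small helper. This is a sketch under stated assumptions rather than the notebook's method: the function name `build_request` is hypothetical, the sample values are shortened, and the signature name `serving_default` is assumed for illustration (the notebook reads the actual name from the loaded model).

```python
import json


def build_request(framework, instance, features=None, signature_name=None):
    """Return the JSON request body (bytes) for one instance."""
    if framework == 'tensorflow':
        # named features, optionally with the signature to call
        body = {'instances': [instance]}
        if signature_name is not None:
            body['signature_name'] = signature_name
    elif framework == 'xgboost':
        # positional values arranged in the given feature order
        body = {'instances': [[instance[f] for f in features]]}
    else:
        raise ValueError(f'Unsupported framework: {framework}')
    return json.dumps(body).encode('utf-8')


instance = {'Time': 5043, 'V1': -0.61, 'Amount': 0.0}  # shortened, made-up record
xgb_body = build_request('xgboost', instance, features=['Time', 'V1', 'Amount'])
tf_body = build_request('tensorflow', instance, signature_name='serving_default')
print(xgb_body)
print(tf_body)
```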


### Get Predictions: Python Client

Raw prediction is used here because the export formats from BigQuery ML have different model signatures, and the raw prediction client for Vertex AI allows the signature to be included in the request.

Client:

In [43]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
predictor = aiplatform.gapic.PredictionServiceClient(client_options = client_options)

Prediction:

In [58]:
print(f'Resource Name: {endpoint.resource_name}\n')
print(f'HTTP Body: {http_body}')

Resource Name: projects/169853841455/locations/us-central1/endpoints/2680209126285377536

HTTP Body: content_type: "application/json"
data: "{\"instances\": [[5043, -0.610352895751295, 0.8762678436336991, 3.13457187529889, 2.26016851428487, 0.00118499266188011, 0.268439122726156, 0.12709401840541, -0.0086801339830761, 0.95280234430319, -0.144996781876633, -0.18135310744176503, -3.07117304114555, 1.00476719678711, 0.841752604462658, -1.14101969201816, 0.397014218420624, 0.290108744949502, 0.303356168060438, -0.8192219202785891, -0.0553143112003825, -0.20228040102074302, -0.122893247243976, -0.18313204469056302, 0.295979318446426, -0.159988797716441, -0.130196187091873, -0.0761391830787327, -0.10907594107838602, 0.0]]}"



In [44]:
prediction = predictor.raw_predict(
    endpoint = endpoint.resource_name,
    http_body = http_body
)
prediction

content_type: "application/json"
data: "{\"predictions\": [[0.09640675783157349, 0.9035932421684265]]}"

Format raw prediction response using JSON:

In [45]:
prediction = json.loads(prediction.data)
prediction

{'predictions': [[0.09640675783157349, 0.9035932421684265]]}
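
The parsed response holds one list of scores per instance. A minimal sketch of picking the highest-scoring position follows. It assumes, without confirmation from the source, that the two values are per-class scores ordered as class 0 then class 1; check the exported model's output description before relying on that ordering. The helper name `top_class` is hypothetical.

```python
import json

# Response text copied from the raw prediction output above.
raw = '{"predictions": [[0.09640675783157349, 0.9035932421684265]]}'


def top_class(scores):
    """Return (index, score) of the largest value in a list of scores."""
    index = max(range(len(scores)), key=lambda i: scores[i])
    return index, scores[index]


scores = json.loads(raw)['predictions'][0]  # scores for the first instance
index, score = top_class(scores)
print(index, score)
```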

### Get Predictions: REST

Prepare request:

In [46]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps(instances))

Prediction:

In [61]:
prediction = !curl -s -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:rawPredict

prediction

['{"predictions": [[0.09640675783157349, 0.9035932421684265]]}']

Format raw prediction response using JSON:

In [48]:
prediction = json.loads(''.join([p.strip() for p in prediction]))
prediction

{'predictions': [[0.09640675783157349, 0.9035932421684265]]}
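
The parts of the REST call (URL, headers, body) can be made explicit in Python. The sketch below only builds a `urllib.request.Request` object and does not send it. The function name `build_raw_predict_request`, the token placeholder, and the one-value instance are hypothetical; the URL pattern and headers follow the `curl` command above.

```python
import json
import urllib.request


def build_raw_predict_request(region, endpoint_resource_name, body, token):
    """Build (but do not send) a rawPredict POST request."""
    url = f'https://{region}-aiplatform.googleapis.com/v1/{endpoint_resource_name}:rawPredict'
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode('utf-8'),
        headers={
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json; charset=utf-8',
        },
        method='POST',
    )


request = build_raw_predict_request(
    region='us-central1',
    endpoint_resource_name='projects/169853841455/locations/us-central1/endpoints/2680209126285377536',
    body={'instances': [[0.0]]},   # made-up instance for illustration
    token='PLACEHOLDER_TOKEN',     # a real access token would be needed to send this
)
print(request.get_method(), request.full_url)
```

Sending the request would take a valid access token and a call such as `urllib.request.urlopen(request)`, which is left out here so the sketch runs offline.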

### Get Predictions: gcloud (CLI)

Prepare request:

In [49]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps(instances))

Prediction:

In [50]:
prediction = !gcloud beta ai endpoints raw-predict \
{endpoint.name.rsplit('/',1)[-1]} \
--region={REGION} --format="json" --request=@{DIR}/request.json

prediction

['Using endpoint [https://us-central1-aiplatform.googleapis.com/]',
 '{',
 '  "predictions": [',
 '    [',
 '      0.09640675783157349,',
 '      0.9035932421684265',
 '    ]',
 '  ]',
 '}']

Format raw prediction response using JSON:

In [51]:
prediction = json.loads("".join(prediction[1:]))
prediction

{'predictions': [[0.09640675783157349, 0.9035932421684265]]}
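
Slicing with `prediction[1:]` relies on exactly one status line preceding the JSON. A slightly more defensive sketch, with the hypothetical helper name `parse_cli_json`, starts at the first line that opens a JSON object instead. The sample lines are copied from the CLI output above.

```python
import json


def parse_cli_json(lines):
    """Parse JSON from captured CLI output, skipping leading non-JSON lines."""
    for start, line in enumerate(lines):
        if line.lstrip().startswith('{'):
            return json.loads(''.join(lines[start:]))
    raise ValueError('No JSON object found in CLI output')


lines = ['Using endpoint [https://us-central1-aiplatform.googleapis.com/]',
         '{',
         '  "predictions": [',
         '    [',
         '      0.09640675783157349,',
         '      0.9035932421684265',
         '    ]',
         '  ]',
         '}']

parsed = parse_cli_json(lines)
print(parsed)
```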

---
## Remove Resources
See notebook "99 - Cleanup".