# 03Tools - Predictions

>**Note:** Formerly named `03b - Vertex AI + BQML - Online Predictions with BQML Models.ipynb`.  This [link](https://github.com/statmike/vertex-ai-mlops/blob/fd442b458c710a0a7afdc41bae690d2a3282e93c/03b%20-%20Vertex%20AI%20%2B%20BQML%20-%20Online%20Predictions%20with%20BQML%20Models.ipynb) goes to the previous version featured in the video.

Models built with BigQuery ML (BQML), like the one in `03a`, can also be exported for use and deployment outside of BigQuery.  Registering the model in the Vertex AI Model Registry makes it easy to use with Vertex AI Endpoints for online predictions and with Vertex AI Batch Prediction jobs.  The model also remains in BigQuery for further use, such as batch prediction directly with `ML.PREDICT`, as shown previously in `03a` through `03f`.

**Video Walkthrough of this notebook:**

Includes a conversational walkthrough and more explanatory information than the notebook:

<p align="center" width="100%"><center><a href="https://youtu.be/7y_t_bW0LHQ" target="_blank" rel="noopener noreferrer"><img src="../architectures/thumbnails/playbutton/03tools_pred.png" width="40%"></a></center></p>

Notes Since Video:
- updated notebook on 9/8/2022:
    - exporting and registering the model in Vertex AI Model Registry is now part of the model training notebooks `03a` through `03f`
    - reworked this example to work with any BQML export: TensorFlow or XGBoost
    - switched to raw prediction clients to allow specification of the signature for TensorFlow models


**Prerequisites:**
-  03a - BQML Logistic Regression

**Resources:**
-  [Export formats for BigQuery ML models](https://cloud.google.com/bigquery-ml/docs/exporting-models)
-  [Python Client for Vertex AI](https://googleapis.dev/python/aiplatform/latest/aiplatform.html)
- [BigQuery](https://cloud.google.com/bigquery)
    - [Documentation:](https://cloud.google.com/bigquery/docs/query-overview)
    - [API:](https://cloud.google.com/bigquery/docs/reference/libraries-overview)
        - [Clients](https://cloud.google.com/bigquery/docs/reference/libraries)
            - [Python SDK:](https://github.com/googleapis/python-bigquery)
            - [Python Library Reference:](https://cloud.google.com/python/docs/reference/bigquery/latest)
- [Vertex AI](https://cloud.google.com/vertex-ai)
    - [Documentation:](https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform)
    - [API:](https://cloud.google.com/vertex-ai/docs/reference)
        - [Clients:](https://cloud.google.com/vertex-ai/docs/start/client-libraries)
            - [Python SDK:](https://github.com/googleapis/python-aiplatform)
            - [Python Library Reference:](https://cloud.google.com/python/docs/reference/aiplatform/latest)
            
**Conceptual Flow & Workflow**
<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/03tools_pred_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/03tools_pred_console.png" width="45%">
</p>

---
## Setup

inputs:

In [154]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [155]:
REGION = 'us-central1'
EXPERIMENT = '03a' # pick the 03 series model you want to use
SERIES = '03'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'
BQ_MODEL = f'{SERIES}_{EXPERIMENT}'

# Resources for serving BigQuery Model Exports
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

>**Notes For Resources**
This series uses BigQuery ML (BQML) models.  Depending on the model type, the export files match the underlying framework (TensorFlow, XGBoost, ...).  These export formats are [specified here](https://cloud.google.com/bigquery-ml/docs/exporting-models).<p>When registering the model in the Vertex AI Model Registry, a URI for a serving container is specified.  Pre-built serving containers are available for specific frameworks and versions, as [specified here](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers).</p>

packages:

In [156]:
from google.cloud import aiplatform
from datetime import datetime

from google.cloud import bigquery
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np

from google.api import httpbody_pb2

clients:

In [157]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()

parameters:

In [158]:
DIR = f"temp/{EXPERIMENT}"

environment:

In [159]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Serving With Vertex AI Endpoints

### Retrieve The Model From Vertex AI Model Registry
In each of the model training notebooks `03a` through `03f`, the final model trained in BigQuery ML (BQML) was exported and registered in the Vertex AI Model Registry.  The first step here is retrieving the model resource representation:

In [160]:
model = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}@default')

In [161]:
model.display_name

'03_03a'

In [162]:
model.resource_name

'projects/1026793852137/locations/us-central1/models/model_03_03a'

In [163]:
model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/model_03_03a@1'

In [164]:
model.version_aliases

['run-20221003121446', 'default']

In [165]:
model.labels

{'timestamp': '20221003121446',
 'run_name': 'run-20221003121446',
 'series': '03',
 'experiment': '03a',
 'framework': 'tensorflow'}

In [166]:
model.uri

'gs://statmike-mlops-349915/03/03a/models/20221003121446/model'

### Create An Endpoint
References:
- Python SDK for [`aiplatform.Endpoint`](https://googleapis.dev/python/aiplatform/latest/aiplatform/services.html#google.cloud.aiplatform.Endpoint)
- Python Client for [`aiplatform.Endpoint`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Endpoint)

In [167]:
endpoints = aiplatform.Endpoint.list(filter = f"display_name={SERIES} AND labels.series={SERIES}")
if endpoints:
    endpoint = endpoints[0]
    print(f"Endpoint Exists: {endpoints[0].resource_name}")
else:
    endpoint = aiplatform.Endpoint.create(
        display_name = f"{SERIES}",
        labels = {'series' : f"{SERIES}"}    
    )
    print(f"Endpoint Created: {endpoint.resource_name}")

Endpoint Exists: projects/1026793852137/locations/us-central1/endpoints/7506942026919706624


In [168]:
endpoint.display_name

'03'

In [169]:
endpoint.traffic_split

{'8331997960216772608': 100}

In [170]:
deployed_models = endpoint.list_models()
deployed_models

[id: "8331997960216772608"
 model: "projects/1026793852137/locations/us-central1/models/model_03_autoencoder"
 display_name: "03_autoencoder"
 create_time {
   seconds: 1665512499
   nanos: 441260000
 }
 dedicated_resources {
   machine_spec {
     machine_type: "n1-standard-4"
   }
   min_replica_count: 1
   max_replica_count: 1
 }
 model_version_id: "1"]

In [171]:
print(f"View the endpoint in the Vertex AI Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint.resource_name.split('/')[-1]}?project={PROJECT_ID}")

View the endpoint in the Vertex AI Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/7506942026919706624?project=statmike-mlops-349915
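
The console link above is assembled from pieces of the endpoint's resource name. As a minimal sketch, assuming resource names always follow the `projects/<number>/locations/<region>/endpoints/<id>` pattern seen in the outputs here, the pieces can be pulled out by pairing each key segment with the value that follows it. The helper name `endpoint_console_url` is hypothetical and not part of the Vertex AI SDK:

```python
def endpoint_console_url(resource_name, project_id):
    """Build the console URL from an endpoint resource name (illustrative helper)."""
    parts = resource_name.split('/')
    # Pair even-positioned keys with odd-positioned values:
    # ['projects', '123', 'locations', 'us-central1', ...] -> {'projects': '123', ...}
    fields = dict(zip(parts[0::2], parts[1::2]))
    return (
        "https://console.cloud.google.com/vertex-ai/"
        f"locations/{fields['locations']}/endpoints/{fields['endpoints']}"
        f"?project={project_id}"
    )

# Values copied from the outputs shown in this notebook
url = endpoint_console_url(
    'projects/1026793852137/locations/us-central1/endpoints/7506942026919706624',
    'statmike-mlops-349915',
)
print(url)
```

Reading the region from the resource name, rather than from the `REGION` variable, keeps the link correct even if the endpoint was retrieved from a different location than the notebook default.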


### Deploy Model To Endpoint

In [174]:
if (model.display_name, model.version_id) not in [(deployed_model.display_name, deployed_model.model_version_id) for deployed_model in endpoint.list_models()]:
    print(f'Deploying model with 100% of traffic...')
    endpoint.deploy(
        model = model,
        deployed_model_display_name = model.display_name,
        traffic_percentage = 100,
        machine_type = DEPLOY_COMPUTE,
        min_replica_count = 1,
        max_replica_count = 1
    )
else: 
    print(f'The current model/version is already deployed.')

Deploying model with 100% of traffic...
Deploying Model projects/1026793852137/locations/us-central1/models/model_03_03a to Endpoint : projects/1026793852137/locations/us-central1/endpoints/7506942026919706624
Deploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/7506942026919706624/operations/2732398888806776832
Endpoint model deployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/7506942026919706624


### Remove Deployed Models without Traffic

In [175]:
for deployed_model in endpoint.list_models():
    if deployed_model.id in endpoint.traffic_split:
        print(f"Model {deployed_model.display_name} with version {deployed_model.model_version_id} has traffic = {endpoint.traffic_split[deployed_model.id]}")
    else:
        endpoint.undeploy(deployed_model_id = deployed_model.id)
        print(f"Undeploying {deployed_model.display_name} with version {deployed_model.model_version_id} because it has no traffic.")

Model 03_03a with version 1 has traffic = 100
Undeploying Endpoint model: projects/1026793852137/locations/us-central1/endpoints/7506942026919706624
Undeploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/7506942026919706624/operations/8454222215380992000
Endpoint model undeployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/7506942026919706624
Undeploying 03_autoencoder with version 1 because it has no traffic.
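
The cleanup rule above can be isolated from the SDK calls to make it easier to reason about. The sketch below uses plain Python structures in place of the endpoint objects; `models_without_traffic` is a hypothetical helper, and the IDs are the ones shown in the outputs of this notebook. One deliberate difference is flagged in the comments: it also treats an explicit 0% entry as "no traffic", which the cell above does not.

```python
def models_without_traffic(deployed_ids, traffic_split):
    """Return deployed model IDs that receive no traffic (illustrative helper).

    Note: unlike the notebook cell, an ID present in traffic_split
    with a value of 0 is also reported here.
    """
    return [m for m in deployed_ids if traffic_split.get(m, 0) == 0]

# IDs taken from the endpoint outputs in this notebook
deployed_ids = ['8331997960216772608', '994789742300102656']
traffic_split = {'994789742300102656': 100}

to_undeploy = models_without_traffic(deployed_ids, traffic_split)
print(to_undeploy)
```

Each ID returned by such a helper would then be passed to `endpoint.undeploy(deployed_model_id = ...)` as in the cell above.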


In [176]:
endpoint.traffic_split

{'994789742300102656': 100}

In [177]:
endpoint.list_models()

[id: "994789742300102656"
 model: "projects/1026793852137/locations/us-central1/models/model_03_03a"
 display_name: "03_03a"
 create_time {
   seconds: 1665513062
   nanos: 928609000
 }
 dedicated_resources {
   machine_spec {
     machine_type: "n1-standard-4"
   }
   min_replica_count: 1
   max_replica_count: 1
 }
 model_version_id: "1"]

### Retrieve The Deployed Model
The deployed model is retrieved here so that its `framework` label can be used later to shape the instances for prediction correctly.

In [178]:
model = aiplatform.Model(model_name = f'{endpoint.list_models()[0].model}@{endpoint.list_models()[0].model_version_id}')

In [179]:
model.uri

'gs://statmike-mlops-349915/03/03a/models/20221003121446/model'

---
## Prediction

### Retrieve Records For Prediction

In [180]:
n = 10
pred = bq.query(query = f"SELECT * FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE} WHERE splits='TEST' LIMIT {n}").to_dataframe()

In [181]:
pred.head()

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class | transaction_id | splits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35337 | 1.092844 | -0.01323 | 1.359829 | 2.731537 | -0.707357 | 0.873837 | -0.79613 | 0.437707 | 0.39677 | ... | -0.167647 | 0.027557 | 0.592115 | 0.219695 | 0.03697 | 0.010984 | 0.0 | 0 | a1b10547-d270-48c0-b902-7a0f735dadc7 | TEST |
| 1 | 60481 | 1.238973 | 0.035226 | 0.063003 | 0.641406 | -0.260893 | -0.580097 | 0.049938 | -0.034733 | 0.405932 | ... | -0.057718 | 0.104983 | 0.537987 | 0.589563 | -0.046207 | -0.006212 | 0.0 | 0 | 814c62c8-ade4-47d5-bf83-313b0aafdee5 | TEST |
| 2 | 139587 | 1.870539 | 0.211079 | 0.224457 | 3.889486 | -0.380177 | 0.249799 | -0.577133 | 0.179189 | -0.120462 | ... | 0.180776 | -0.060226 | -0.228979 | 0.080827 | 0.009868 | -0.036997 | 0.0 | 0 | d08a1bfa-85c5-4f1b-9537-1c5a93e6afd0 | TEST |
| 3 | 162908 | -3.368339 | -1.980442 | 0.153645 | -0.159795 | 3.847169 | -3.516873 | -1.209398 | -0.292122 | 0.760543 | ... | -1.171627 | 0.214333 | -0.159652 | -0.060883 | 1.294977 | 0.120503 | 0.0 | 0 | 802f3307-8e5a-4475-b795-5d5d8d7d0120 | TEST |
| 4 | 165236 | 2.180149 | 0.218732 | -2.637726 | 0.348776 | 1.063546 | -1.249197 | 0.942021 | -0.547652 | -0.087823 | ... | -0.176957 | 0.563779 | 0.730183 | 0.707494 | -0.131066 | -0.090428 | 0.0 | 0 | c8a5b93a-1598-4689-80be-4f9f5df0b8ce | TEST |


Shape as instances: dictionaries of key:value pairs containing only the features used in the model

In [182]:
newobs = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET,'splits'])]].to_dict(orient='records')
#newobs[0]

In [183]:
len(newobs)

10

In [184]:
newobs[0]

{'Time': 35337,
 'V1': 1.0928441854981998,
 'V2': -0.0132303486713432,
 'V3': 1.35982868199426,
 'V4': 2.7315370965921004,
 'V5': -0.707357349219652,
 'V6': 0.8738370029866129,
 'V7': -0.7961301510622031,
 'V8': 0.437706509544851,
 'V9': 0.39676985012996396,
 'V10': 0.587438102569443,
 'V11': -0.14979756231827498,
 'V12': 0.29514781622888103,
 'V13': -1.30382621882143,
 'V14': -0.31782283120234495,
 'V15': -2.03673231037199,
 'V16': 0.376090905274179,
 'V17': -0.30040350116459497,
 'V18': 0.433799615590844,
 'V19': -0.145082264348681,
 'V20': -0.240427548108996,
 'V21': 0.0376030733329398,
 'V22': 0.38002620963091405,
 'V23': -0.16764742731151097,
 'V24': 0.0275573495476881,
 'V25': 0.59211469704354,
 'V26': 0.219695164116351,
 'V27': 0.0369695108704894,
 'V28': 0.010984441006191,
 'Amount': 0.0}
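
The column selection used to build `newobs` can be restated on a plain dictionary to make the rule explicit: drop the target, the `splits` indicator, and every name in the space-delimited `VAR_OMIT` string. This is only a sketch; `to_instance` is a hypothetical helper, and the sample row is shortened and illustrative rather than a real record from the table.

```python
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id'  # space-delimited list of columns to drop

def to_instance(row, omit=VAR_OMIT, target=VAR_TARGET):
    """Keep only the feature key:value pairs of a row (illustrative helper)."""
    excluded = set(omit.split()) | {target, 'splits'}
    return {k: v for k, v in row.items() if k not in excluded}

# Illustrative row with only a few of the columns
row = {'Time': 35337, 'V1': 1.09, 'Amount': 0.0,
       'Class': 0, 'transaction_id': 'abc', 'splits': 'TEST'}

instance = to_instance(row)
print(instance)
```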

### Get The Model Signature Name (if TensorFlow)

In [185]:
if model.labels['framework'] == 'tensorflow':
    import tensorflow as tf
    reloaded_model = tf.saved_model.load(model.uri)
    print(list(reloaded_model.signatures.keys())[0])

serving_default


### Get The Feature Order (if XGBoost)

In [186]:
if model.labels['framework'] == 'xgboost':
    import gcsfs
    import tensorflow as tf
    file = f'{model.uri}/assets/model_metadata.json'
    if tf.io.gfile.exists(file):
        gcs = gcsfs.GCSFileSystem(project = PROJECT_ID)
        with gcs.open(file) as fp:
            features = json.load(fp)['feature_names']
    else:
        features = list(newobs[0].keys())

### Prepare Instance For Prediction

Depending on the framework the model was trained with, the instance format may differ.  More information can be found [here](https://cloud.google.com/vertex-ai/docs/predictions/online-predictions-custom-models#request-body-details).

Instances:

In [187]:
if model.labels['framework'] == 'tensorflow':
    instances = {"instances": [newobs[0]], "signature_name": list(reloaded_model.signatures.keys())[0]}
elif model.labels['framework'] == 'xgboost':
    instances = {"instances": [[newobs[0][f] for f in features]]}
    
http_body = httpbody_pb2.HttpBody(
    data = json.dumps(instances).encode("utf-8"),
    content_type = "application/json"
)
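
The two request shapes can be compared side by side with a small, SDK-free sketch. It assumes, as in the cell above, that TensorFlow exports accept a dictionary per instance plus an optional `signature_name`, while XGBoost exports expect a list of values in feature order. The function `build_payload` is hypothetical, and the record is shortened to three fields for readability:

```python
import json

def build_payload(framework, record, signature_name=None, features=None):
    """Build the JSON-serializable request body (illustrative helper)."""
    if framework == 'tensorflow':
        body = {"instances": [record]}
        if signature_name:
            body["signature_name"] = signature_name
        return body
    if framework == 'xgboost':
        # Fall back to the record's own key order when no feature list is given
        order = features if features is not None else list(record.keys())
        return {"instances": [[record[f] for f in order]]}
    raise ValueError(f"Unsupported framework: {framework}")

# Shortened record for illustration
record = {'Time': 35337, 'V1': 1.0928441854981998, 'Amount': 0.0}

tf_body = build_payload('tensorflow', record, signature_name='serving_default')
xgb_body = build_payload('xgboost', record, features=['Amount', 'Time', 'V1'])

# Bytes in the form passed as the `data` field of the HTTP body
encoded = json.dumps(xgb_body).encode('utf-8')
print(tf_body)
print(xgb_body)
```

Raising an error for an unrecognized framework is a design choice of this sketch; in the cell above, an unexpected `framework` label would instead leave `instances` undefined.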

In [188]:
print(instances)

{'instances': [{'Time': 35337, 'V1': 1.0928441854981998, 'V2': -0.0132303486713432, 'V3': 1.35982868199426, 'V4': 2.7315370965921004, 'V5': -0.707357349219652, 'V6': 0.8738370029866129, 'V7': -0.7961301510622031, 'V8': 0.437706509544851, 'V9': 0.39676985012996396, 'V10': 0.587438102569443, 'V11': -0.14979756231827498, 'V12': 0.29514781622888103, 'V13': -1.30382621882143, 'V14': -0.31782283120234495, 'V15': -2.03673231037199, 'V16': 0.376090905274179, 'V17': -0.30040350116459497, 'V18': 0.433799615590844, 'V19': -0.145082264348681, 'V20': -0.240427548108996, 'V21': 0.0376030733329398, 'V22': 0.38002620963091405, 'V23': -0.16764742731151097, 'V24': 0.0275573495476881, 'V25': 0.59211469704354, 'V26': 0.219695164116351, 'V27': 0.0369695108704894, 'V28': 0.010984441006191, 'Amount': 0.0}], 'signature_name': 'serving_default'}


### Get Predictions: Python Client

Raw prediction is used here because the different export formats from BigQuery ML have different model signatures, and the raw prediction client for Vertex AI allows the signature to be included in the request.

Client:

In [189]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
predictor = aiplatform.gapic.PredictionServiceClient(client_options = client_options)

Prediction:

In [190]:
prediction = predictor.raw_predict(
    endpoint = endpoint.resource_name,
    http_body = http_body
)
prediction

content_type: "application/json"
data: "{\n    \"predictions\": [\n        {\n            \"Class_probs\": [0.23520476382469818, 0.76479523617530187],\n            \"Class_values\": [\"1\", \"0\"],\n            \"predicted_Class\": [\"0\"]\n        }\n    ]\n}"

Format raw prediction response using JSON:

In [191]:
prediction = json.loads(prediction.data)
prediction

{'predictions': [{'Class_probs': [0.23520476382469818, 0.7647952361753019],
   'Class_values': ['1', '0'],
   'predicted_Class': ['0']}]}
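
To read the response, the class labels can be paired with their probabilities. The sketch below assumes the `Class_values` and `Class_probs` lists are positionally aligned, which the output above suggests (the predicted class `'0'` lines up with the larger probability) but which is not confirmed here for every model type. The helper `class_probabilities` is hypothetical:

```python
# Response copied from the output above
response = {'predictions': [{'Class_probs': [0.23520476382469818, 0.7647952361753019],
                             'Class_values': ['1', '0'],
                             'predicted_Class': ['0']}]}

def class_probabilities(pred, target='Class'):
    """Map each class label to its probability (assumes aligned lists)."""
    return dict(zip(pred[f'{target}_values'], pred[f'{target}_probs']))

probs = class_probabilities(response['predictions'][0])
top = max(probs, key=probs.get)  # label with the highest probability
print(probs, top)
```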

### Get Predictions: REST

Prepare request:

In [192]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps(instances))

Prediction:

In [193]:
prediction = !curl -s -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:rawPredict

prediction

['{',
 '    "predictions": [',
 '        {',
 '            "predicted_Class": ["0"],',
 '            "Class_probs": [0.23520476382469818, 0.76479523617530187],',
 '            "Class_values": ["1", "0"]',
 '        }',
 '    ]',
 '}']

Format raw prediction response using JSON:

In [194]:
prediction = json.loads(''.join([p.strip() for p in prediction]))
prediction

{'predictions': [{'predicted_Class': ['0'],
   'Class_probs': [0.23520476382469818, 0.7647952361753019],
   'Class_values': ['1', '0']}]}

### Get Predictions: gcloud (CLI)

Prepare request:

In [195]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps(instances))

Prediction:

In [196]:
prediction = !gcloud beta ai endpoints raw-predict \
{endpoint.name.rsplit('/',1)[-1]} \
--region={REGION} --format="json" --request=@{DIR}/request.json

prediction

['Using endpoint [https://us-central1-aiplatform.googleapis.com/]',
 '{',
 '  "predictions": [',
 '    {',
 '      "Class_probs": [',
 '        0.23520476382469818,',
 '        0.7647952361753019',
 '      ],',
 '      "Class_values": [',
 '        "1",',
 '        "0"',
 '      ],',
 '      "predicted_Class": [',
 '        "0"',
 '      ]',
 '    }',
 '  ]',
 '}']

Format raw prediction response using JSON:

In [197]:
prediction = json.loads("".join(prediction[1:]))
prediction

{'predictions': [{'Class_probs': [0.23520476382469818, 0.7647952361753019],
   'Class_values': ['1', '0'],
   'predicted_Class': ['0']}]}
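
The slice `prediction[1:]` relies on the CLI printing exactly one status line before the JSON. As a slightly more defensive sketch, the parser below starts at the first line that opens a JSON object instead. `parse_cli_json` is a hypothetical helper, and the sample output is a shortened version of the lines shown above:

```python
import json

def parse_cli_json(lines):
    """Parse JSON from captured CLI output, skipping leading status lines."""
    start = next(
        (i for i, line in enumerate(lines) if line.lstrip().startswith('{')),
        None,
    )
    if start is None:
        raise ValueError("No JSON object found in output")
    return json.loads(''.join(lines[start:]))

# Shortened form of the captured gcloud output
output = ['Using endpoint [https://us-central1-aiplatform.googleapis.com/]',
          '{',
          '  "predictions": [',
          '    {',
          '      "predicted_Class": [',
          '        "0"',
          '      ]',
          '    }',
          '  ]',
          '}']

parsed = parse_cli_json(output)
print(parsed)
```

The same helper also handles output with no leading status line, such as the lines captured from the `curl` call earlier.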

---
## Remove Resources
See notebook "99 - Cleanup".