# 05Tools: Prediction
Predictions from models created in the 05 series of notebooks.

This notebook is a collection of examples that showcase many ways to serve models:
- Online:
    - Vertex AI Endpoints: Python, REST, CLI (gcloud)
    - Local with TensorFlow ModelServer
    - Remote on Cloud Run with TensorFlow ModelServer
- Batch:
    - BigQuery ML Model Import
    - Vertex AI Batch Prediction Jobs

### Prerequisites:
- At least one of the notebooks in this series [05, 05a-05i]

### Conceptual Flow & Workflow
<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/05tools_pred_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/05tools_pred_console.png" width="45%">
</p>

---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [6]:
REGION = 'us-central1'
EXPERIMENT = '05_predictions'
SERIES = '05'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Resources
DEPLOY_COMPUTE = 'n1-standard-4'
DEPLOY_IMAGE='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [7]:
from google.cloud import aiplatform
from google.cloud import bigquery

import tensorflow as tf

from datetime import datetime
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np

import asyncio
import time
import multiprocessing

clients:

In [8]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()

parameters:

In [9]:
BUCKET = PROJECT_ID
DIR = f"temp/{EXPERIMENT}"

In [10]:
# Give service account roles/storage.objectAdmin permissions
# Console > IAM > Select Account <projectnumber>-compute@developer.gserviceaccount.com > edit - give role
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

environment:

In [11]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Get Endpoint

[Endpoint Properties and Methods](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Endpoint):

```python
endpoint
endpoint.display_name
endpoint.resource_name
endpoint.traffic_split
endpoint.list_models()
```

In [12]:
endpoints = aiplatform.Endpoint.list(filter = f"labels.series={SERIES}")
endpoint = endpoints[0]

---
## Retrieve Records For Prediction

In [13]:
n = 1000
pred = bq.query(query = f"SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}` WHERE splits='TEST' LIMIT {n}").to_dataframe()

In [14]:
pred.head(4)

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V23,V24,V25,V26,V27,V28,Amount,Class,transaction_id,splits
0,35337,1.092844,-0.01323,1.359829,2.731537,-0.707357,0.873837,-0.79613,0.437707,0.39677,...,-0.167647,0.027557,0.592115,0.219695,0.03697,0.010984,0.0,0,a1b10547-d270-48c0-b902-7a0f735dadc7,TEST
1,60481,1.238973,0.035226,0.063003,0.641406,-0.260893,-0.580097,0.049938,-0.034733,0.405932,...,-0.057718,0.104983,0.537987,0.589563,-0.046207,-0.006212,0.0,0,814c62c8-ade4-47d5-bf83-313b0aafdee5,TEST
2,139587,1.870539,0.211079,0.224457,3.889486,-0.380177,0.249799,-0.577133,0.179189,-0.120462,...,0.180776,-0.060226,-0.228979,0.080827,0.009868,-0.036997,0.0,0,d08a1bfa-85c5-4f1b-9537-1c5a93e6afd0,TEST
3,162908,-3.368339,-1.980442,0.153645,-0.159795,3.847169,-3.516873,-1.209398,-0.292122,0.760543,...,-1.171627,0.214333,-0.159652,-0.060883,1.294977,0.120503,0.0,0,802f3307-8e5a-4475-b795-5d5d8d7d0120,TEST


Remove columns not included as features in the model:

In [15]:
newobs = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET, 'splits'])]].to_dict(orient='records')
#newobs[0]

In [16]:
len(newobs)

1000
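The column filter above can be sketched on plain dictionaries, without pandas. The single row below is a hypothetical stand-in for the BigQuery result, with shortened values; only the names `VAR_TARGET`, `VAR_OMIT`, and `splits` come from this notebook.

```python
# Hypothetical row standing in for the BigQuery result; values are illustrative only.
rows = [
    {"Time": 35337, "V1": 1.09, "Amount": 0.0, "Class": 0,
     "transaction_id": "a1b1", "splits": "TEST"},
]

VAR_TARGET = "Class"
VAR_OMIT = "transaction_id"  # space-delimited string, as in the setup cell

# Columns that were not used as model features.
drop = set(VAR_OMIT.split()) | {VAR_TARGET, "splits"}

# Keep only the feature keys in each record.
features = [{k: v for k, v in row.items() if k not in drop} for row in rows]
print(features[0])
```

Each resulting dictionary has the same shape as one element of `newobs`: feature names mapped to values, ready to send as an instance.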

---
## Online Predictions: Methods for Vertex AI Endpoints

For each of the methods below, the `predict` part of the request can be exchanged for `explain` if the endpoint has a model deployed with explanations set up. Note: this does not work for the raw prediction methods. See the explainability notebook in this series for more about setting up explainability.
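For the REST-based methods, the swap amounts to changing the suffix after the colon in the request URL. The helper below is a minimal sketch of that idea; the function name `endpoint_url` is hypothetical and not part of any Google library, and the resource name is the one printed by this notebook.

```python
ALLOWED_METHODS = {"predict", "explain", "rawPredict"}

def endpoint_url(region, resource_name, method="predict"):
    """Build the v1 REST URL for an endpoint method (hypothetical helper)."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    return f"https://{region}-aiplatform.googleapis.com/v1/{resource_name}:{method}"

# Endpoint resource name as printed elsewhere in this notebook.
resource = "projects/1026793852137/locations/us-central1/endpoints/4573537362990071808"

predict_url = endpoint_url("us-central1", resource)
explain_url = endpoint_url("us-central1", resource, "explain")
print(predict_url)
print(explain_url)
```

The request body for `explain` has the same `instances` key as `predict`; only the URL suffix differs in this sketch.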

### Get Prediction: Python Client

In [17]:
instances = [json_format.ParseDict(newobs[0], Value())]

In [18]:
prediction = endpoint.predict(instances=instances)
prediction

Prediction(predictions=[[0.999786437, 0.00021360183]], deployed_model_id='7361102804611497984', model_version_id='1', model_resource_name='projects/1026793852137/locations/us-central1/models/model_05c_fraud', explanations=None)

In [19]:
prediction.predictions[0]

[0.999786437, 0.00021360183]

In [20]:
np.argmax(prediction.predictions[0])

0

### Get Prediction: Python Client (gapic)

In [21]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [22]:
predictor = aiplatform.gapic.PredictionServiceClient(client_options = client_options)

In [23]:
instances = [json_format.ParseDict(newobs[0], Value())]

In [24]:
prediction = predictor.predict(
    endpoint = endpoint.resource_name,
    instances = instances
)
prediction

predictions {
  list_value {
    values {
      number_value: 0.999786437
    }
    values {
      number_value: 0.00021360183
    }
  }
}
deployed_model_id: "7361102804611497984"
model: "projects/1026793852137/locations/us-central1/models/model_05c_fraud"
model_display_name: "05c_fraud"
model_version_id: "1"

In [25]:
prediction.predictions[0]

[0.999786437, 0.00021360183]

In [26]:
np.argmax(prediction.predictions[0])

0

#### Use gapic for Raw Predictions

In [27]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [28]:
predictor = aiplatform.gapic.PredictionServiceClient(client_options = client_options)

In [29]:
from google.api import httpbody_pb2
instances = {"instances": [newobs[0]], "signature_name": "serving_default"}
http_body = httpbody_pb2.HttpBody(data = json.dumps(instances).encode("utf-8"), content_type = "application/json")

In [30]:
prediction = predictor.raw_predict(
    endpoint = endpoint.resource_name,
    http_body = http_body
)
prediction

content_type: "application/json"
data: "{\n    \"predictions\": [[0.999786437, 0.00021360183]\n    ]\n}"

In [31]:
prediction = json.loads(prediction.data)
prediction

{'predictions': [[0.999786437, 0.00021360183]]}

In [32]:
prediction['predictions'][0]

[0.999786437, 0.00021360183]

In [33]:
np.argmax(prediction['predictions'][0])

0
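Because a raw prediction returns the model server's JSON as bytes, the decode and argmax steps above can be wrapped in one small function. This is a sketch only: `predicted_classes` is a hypothetical helper, and `raw_data` reproduces the response body shown above rather than calling the endpoint.

```python
import json

# Response body copied from the raw_predict output above.
raw_data = b'{\n    "predictions": [[0.999786437, 0.00021360183]\n    ]\n}'

def predicted_classes(data):
    """Decode a raw prediction body and return the argmax index per instance."""
    payload = json.loads(data)
    return [max(range(len(p)), key=p.__getitem__) for p in payload["predictions"]]

classes = predicted_classes(raw_data)
print(classes)
```

Using plain `max` avoids a NumPy dependency; `np.argmax` gives the same index for these lists.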

### Get Prediction: Python Client V1

In [34]:
from google.cloud import aiplatform_v1

client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

v1_client = aiplatform_v1.PredictionServiceClient(client_options = client_options)

In [35]:
instances = [json_format.ParseDict(newobs[0], Value())]

In [36]:
prediction = v1_client.predict(
    endpoint = endpoint.resource_name,
    instances = instances
)
prediction

predictions {
  list_value {
    values {
      number_value: 0.999786437
    }
    values {
      number_value: 0.00021360183
    }
  }
}
deployed_model_id: "7361102804611497984"
model: "projects/1026793852137/locations/us-central1/models/model_05c_fraud"
model_display_name: "05c_fraud"
model_version_id: "1"

In [37]:
prediction.predictions[0]

[0.999786437, 0.00021360183]

In [38]:
np.argmax(prediction.predictions[0])

0

#### Use Python Client V1 For Raw Predictions

In [39]:
from google.cloud import aiplatform_v1

client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

v1_client = aiplatform_v1.PredictionServiceClient(client_options = client_options)

In [40]:
from google.api import httpbody_pb2
instances = {"instances": [newobs[0]], "signature_name": "serving_default"}
http_body = httpbody_pb2.HttpBody(data = json.dumps(instances).encode("utf-8"), content_type = "application/json")

In [41]:
prediction = v1_client.raw_predict(
    endpoint = endpoint.resource_name,
    http_body = http_body
)
prediction

content_type: "application/json"
data: "{\n    \"predictions\": [[0.999786437, 0.00021360183]\n    ]\n}"

In [42]:
prediction = json.loads(prediction.data)
prediction

{'predictions': [[0.999786437, 0.00021360183]]}

In [43]:
prediction['predictions'][0]

[0.999786437, 0.00021360183]

In [44]:
np.argmax(prediction['predictions'][0])

0

### Get Prediction: Python Client V1 beta 1

In [45]:
from google.cloud import aiplatform_v1beta1

client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

v1beta1_client = aiplatform_v1beta1.PredictionServiceClient(client_options = client_options)

In [46]:
instances = [json_format.ParseDict(newobs[0], Value())]

In [47]:
prediction = v1beta1_client.predict(
    endpoint = endpoint.resource_name,
    instances = instances
)
prediction

predictions {
  list_value {
    values {
      number_value: 0.999786437
    }
    values {
      number_value: 0.00021360183
    }
  }
}
deployed_model_id: "7361102804611497984"
model: "projects/1026793852137/locations/us-central1/models/model_05c_fraud"
model_display_name: "05c_fraud"
model_version_id: "1"

In [48]:
prediction.predictions[0]

[0.999786437, 0.00021360183]

In [49]:
np.argmax(prediction.predictions[0])

0

#### Use Python Client V1 beta 1 For Raw Predictions

In [50]:
from google.cloud import aiplatform_v1beta1

client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

v1beta1_client = aiplatform_v1beta1.PredictionServiceClient(client_options = client_options)

In [51]:
from google.api import httpbody_pb2
instances = {"instances": [newobs[0]], "signature_name": "serving_default"}
http_body = httpbody_pb2.HttpBody(data = json.dumps(instances).encode("utf-8"), content_type = "application/json")

In [52]:
prediction = v1beta1_client.raw_predict(
    endpoint = endpoint.resource_name,
    http_body = http_body
)
prediction

content_type: "application/json"
data: "{\n    \"predictions\": [[0.999786437, 0.00021360183]\n    ]\n}"

In [53]:
prediction = json.loads(prediction.data)
prediction

{'predictions': [[0.999786437, 0.00021360183]]}

In [54]:
prediction['predictions'][0]

[0.999786437, 0.00021360183]

In [55]:
np.argmax(prediction['predictions'][0])

0

### Get Prediction: REST

#### Method 1: Command Line CURL

In [56]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": [newobs[0]]}))

In [57]:
prediction = !curl -s -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict

prediction = json.loads(''.join([p.strip() for p in prediction]))
prediction

{'predictions': [[0.999786437, 0.00021360183]],
 'deployedModelId': '7361102804611497984',
 'model': 'projects/1026793852137/locations/us-central1/models/model_05c_fraud',
 'modelDisplayName': '05c_fraud',
 'modelVersionId': '1'}

In [58]:
prediction['predictions'][0]

[0.999786437, 0.00021360183]

In [59]:
np.argmax(prediction['predictions'][0])

0

##### Use CURL for Raw Predictions

In [60]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"signature_name": "serving_default", "instances": [newobs[0]]}))

In [61]:
prediction = !curl -s -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:rawPredict

prediction = json.loads(''.join([p.strip() for p in prediction]))
prediction

{'predictions': [[0.999786437, 0.00021360183]]}

In [62]:
prediction['predictions'][0]

[0.999786437, 0.00021360183]

In [63]:
np.argmax(prediction['predictions'][0])

0

#### Method 2: Python with requests

In [64]:
import requests

In [65]:
token = !gcloud auth application-default print-access-token
headers = {"content-type": "application/json; charset=utf-8", "Authorization": f'Bearer {token[0]}'}
json_response = requests.post(f'https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict', data=json.dumps({"instances": [newobs[0]]}), headers=headers)

In [66]:
print(json_response.text)

{
  "predictions": [
    [
      0.999786437,
      0.00021360183
    ]
  ],
  "deployedModelId": "7361102804611497984",
  "model": "projects/1026793852137/locations/us-central1/models/model_05c_fraud",
  "modelDisplayName": "05c_fraud",
  "modelVersionId": "1"
}



In [67]:
predictions = json.loads(json_response.text)['predictions']
predictions

[[0.999786437, 0.00021360183]]

In [68]:
np.argmax(predictions[0])

0

##### Use Requests for Raw Predictions

In [69]:
import requests

In [70]:
token = !gcloud auth application-default print-access-token
headers = {"content-type": "application/json; charset=utf-8", "Authorization": f'Bearer {token[0]}'}
json_response = requests.post(f'https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:rawPredict', data=json.dumps({"signature_name": "serving_default", "instances": [newobs[0]]}), headers=headers)

In [71]:
print(json_response.text)

{
    "predictions": [[0.999786437, 0.00021360183]
    ]
}


In [72]:
predictions = json.loads(json_response.text)['predictions']
predictions

[[0.999786437, 0.00021360183]]

In [73]:
np.argmax(predictions[0])

0

### Get Prediction: gcloud (CLI)

In [74]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": [newobs[0]]}))

In [75]:
prediction = !gcloud ai endpoints predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --json-request={DIR}/request.json
prediction

['Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]',
 '[[0.999786437, 0.00021360183]]']

In [76]:
import ast
prediction = ast.literal_eval(prediction[1])
prediction[0]

[0.999786437, 0.00021360183]

In [77]:
np.argmax(prediction[0])

0

#### Use gcloud (CLI) For Raw Predictions

In [78]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"signature_name": "serving_default", "instances": [newobs[0]]}))

In [79]:
prediction = !gcloud ai endpoints raw-predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --format="json" --request=@{DIR}/request.json
prediction

['Using endpoint [https://us-central1-aiplatform.googleapis.com/]',
 '{',
 '  "predictions": [',
 '    [',
 '      0.999786437,',
 '      0.00021360183',
 '    ]',
 '  ]',
 '}']

In [80]:
prediction = json.loads("".join(prediction[1:]))
prediction

{'predictions': [[0.999786437, 0.00021360183]]}

In [81]:
prediction['predictions'][0]

[0.999786437, 0.00021360183]

In [82]:
np.argmax(prediction['predictions'][0])

0
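Both gcloud cells above rely on the JSON starting at the second output line (`prediction[1]` and `prediction[1:]`). A slightly more defensive sketch is to skip every line before the first JSON bracket. `parse_cli_json` is a hypothetical helper, and the sample lines are copied from the outputs above.

```python
import json

def parse_cli_json(lines):
    """Join CLI output lines, skipping anything before the first JSON bracket."""
    for i, line in enumerate(lines):
        if line.lstrip().startswith(("{", "[")):
            return json.loads("".join(lines[i:]))
    raise ValueError("no JSON payload found in output")

# Output of `gcloud ai endpoints predict`, as captured above.
predict_lines = [
    "Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]",
    "[[0.999786437, 0.00021360183]]",
]

# Output of `gcloud ai endpoints raw-predict --format=json`, as captured above.
raw_lines = [
    "Using endpoint [https://us-central1-aiplatform.googleapis.com/]",
    "{", '  "predictions": [', "    [", "      0.999786437,",
    "      0.00021360183", "    ]", "  ]", "}",
]

predict_result = parse_cli_json(predict_lines)
raw_result = parse_cli_json(raw_lines)
print(predict_result, raw_result)
```

This assumes the informational lines never begin with `{` or `[`, which holds for the outputs shown here but is not guaranteed by gcloud.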

---
## Online Predictions: Synchronous Examples
Synchronous calls to the Vertex AI Endpoint with different batch sizes of instances:

In [83]:
def syncPredictions(batch_size = 1):
    predictions = []
    start = time.perf_counter()
    for p in range(0, len(newobs), batch_size):
        instances = [json_format.ParseDict(example, Value()) for example in newobs[p:p+batch_size]]
        preds = endpoint.predict(instances = instances)
        predictions.extend(np.argmax(pred) for pred in preds.predictions)
    elapsed = time.perf_counter() - start
    print(f'{elapsed:0.5f} seconds')
    return predictions

In [84]:
predictions = syncPredictions()

14.41081 seconds


In [85]:
predictions = syncPredictions(1)

13.90619 seconds


In [86]:
predictions = syncPredictions(batch_size = 1)

13.85071 seconds


In [87]:
predictions = syncPredictions(batch_size = 2)

7.12938 seconds


In [88]:
predictions = syncPredictions(batch_size = 10)

1.71058 seconds
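The timings above suggest that total time is driven mostly by the number of requests rather than the number of instances. The arithmetic below is a rough sketch using the elapsed times printed above; it assumes each run scored all 1000 records and ignores run-to-run variance.

```python
import math

n = 1000  # records scored in each run above
timings = {1: 13.85071, 2: 7.12938, 10: 1.71058}  # batch_size -> seconds, from the outputs above

summary = {}
for batch_size, seconds in timings.items():
    requests_made = math.ceil(n / batch_size)        # one request per batch
    per_request_ms = 1000 * seconds / requests_made  # average wall time per request
    summary[batch_size] = (requests_made, per_request_ms)
    print(f"batch_size={batch_size}: {requests_made} requests, {per_request_ms:.1f} ms each")
```

Per-request time grows only modestly as the batch gets larger, so sending ten instances per request cuts the total time by roughly a factor of eight in these runs.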


In [89]:
from collections import Counter
c = Counter(predictions)
c

Counter({0: 998, 1: 2})

In [90]:
[i for i, j in enumerate(predictions) if j == 1]

[53, 576]

In [91]:
pred.iloc[[i for i, j in enumerate(predictions) if j == 1]]

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V23,V24,V25,V26,V27,V28,Amount,Class,transaction_id,splits
53,85285,-7.030308,3.421991,-9.525072,5.270891,-4.02463,-2.865682,-6.989195,3.791551,-4.62273,...,0.036943,-0.355519,0.353634,1.042458,1.359516,-0.272188,0.0,1,0a3b566f-e662-4cd0-b702-99299890cb0f,TEST
576,56887,-0.075483,1.812355,-2.566981,4.127549,-1.628532,-0.805895,-3.390135,1.019353,-2.451251,...,-0.143624,0.013566,0.634203,0.213693,0.773625,0.387434,5.0,1,e17f6ee4-8dd8-4a38-9f51-e1bbf1e1aa2a,TEST


---
## Online Predictions: Asynchronous Examples
Use asyncio for concurrent requests with different batch sizes:

In [92]:
from google.cloud import aiplatform_v1

client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
parent = f"projects/{PROJECT_ID}/locations/{REGION}"

client = aiplatform_v1.PredictionServiceAsyncClient(client_options = client_options)

In [93]:
endpoint.resource_name

'projects/1026793852137/locations/us-central1/endpoints/4573537362990071808'

In [94]:
instances = [json_format.ParseDict(newobs[0], Value())]

In [95]:
await client.predict(endpoint = endpoint.resource_name, instances = instances)

predictions {
  list_value {
    values {
      number_value: 0.999786437
    }
    values {
      number_value: 0.00021360183
    }
  }
}
deployed_model_id: "7361102804611497984"
model: "projects/1026793852137/locations/us-central1/models/model_05c_fraud"
model_display_name: "05c_fraud"
model_version_id: "1"

In [96]:
async def asyncPredictions(batch_size = 1, concur_requests = 100):
    limit = asyncio.Semaphore(concur_requests)

    predictions = [None] * len(newobs)

    async def predictor(p, newob):
        instances = [json_format.ParseDict(example, Value()) for example in newob]
        async with limit:
            prediction = await client.predict(endpoint = endpoint.resource_name, instances = instances)
            if limit.locked():
                await asyncio.sleep(.01)

        predictions[p:p+batch_size] = [np.argmax(pred) for pred in prediction.predictions]

    async def runner(newobs):
        tasks = []
        for p in range(0, len(newobs), batch_size):
            task = asyncio.create_task(predictor(p, newobs[p:p+batch_size]))
            tasks.append(task)

        results = await asyncio.gather(*tasks)

    start = time.perf_counter()
    await runner(newobs)
    elapsed = time.perf_counter() - start
    print(f'{elapsed:0.5f} seconds')
    
    return predictions

In [99]:
# may need to run a second time - the initial ramp can throw an error
predictions = await asyncPredictions()

0.89793 seconds


In [100]:
predictions = await asyncPredictions(1, 100)

0.90359 seconds


In [101]:
predictions = await asyncPredictions(batch_size = 1, concur_requests = 100)

1.06988 seconds


In [102]:
predictions = await asyncPredictions(batch_size = 2, concur_requests = 100)

0.56022 seconds


In [103]:
predictions = await asyncPredictions(batch_size = 10, concur_requests = 100)

0.27755 seconds
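The concurrency pattern in `asyncPredictions` can be tried offline by replacing the endpoint call with a stub. In the sketch below `fake_predict` is a hypothetical stand-in for `client.predict`, and the semaphore caps the number of in-flight requests just as `concur_requests` does above.

```python
import asyncio

async def fake_predict(batch):
    # Hypothetical stand-in for client.predict: one label per instance.
    await asyncio.sleep(0.001)
    return [x % 2 for x in batch]

async def predict_all(items, batch_size=10, concur_requests=5):
    limit = asyncio.Semaphore(concur_requests)  # caps concurrent requests
    results = [None] * len(items)

    async def worker(start):
        batch = items[start:start + batch_size]
        async with limit:
            labels = await fake_predict(batch)
        # Write results back by position so output order matches input order.
        results[start:start + len(batch)] = labels

    await asyncio.gather(*(worker(s) for s in range(0, len(items), batch_size)))
    return results

# In a notebook cell, use `await predict_all(...)` instead of asyncio.run.
labels = asyncio.run(predict_all(list(range(25))))
print(labels)
```

Writing into a preallocated list by offset keeps predictions aligned with their inputs even though requests complete in arbitrary order.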


In [104]:
len(predictions)

1000

In [105]:
from collections import Counter
c = Counter(predictions)
c

Counter({0: 998, 1: 2})

In [106]:
[i for i, j in enumerate(predictions) if j == 1]

[53, 576]

In [107]:
pred.iloc[[i for i, j in enumerate(predictions) if j == 1]]

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V23,V24,V25,V26,V27,V28,Amount,Class,transaction_id,splits
53,85285,-7.030308,3.421991,-9.525072,5.270891,-4.02463,-2.865682,-6.989195,3.791551,-4.62273,...,0.036943,-0.355519,0.353634,1.042458,1.359516,-0.272188,0.0,1,0a3b566f-e662-4cd0-b702-99299890cb0f,TEST
576,56887,-0.075483,1.812355,-2.566981,4.127549,-1.628532,-0.805895,-3.390135,1.019353,-2.451251,...,-0.143624,0.013566,0.634203,0.213693,0.773625,0.387434,5.0,1,e17f6ee4-8dd8-4a38-9f51-e1bbf1e1aa2a,TEST


---
## Vertex AI Batch Prediction Jobs

Create a [Vertex AI Batch Predictions](https://cloud.google.com/vertex-ai/docs/predictions/batch-predictions#aiplatform_batch_predict_custom_trained-python) Job in one of these ways:
- Directly with [aiplatform.BatchPredictionJob.create()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.BatchPredictionJob)
- From a Model with [aiplatform.Model.batch_predict()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model#google_cloud_aiplatform_Model_batch_predict)
- From JobServiceClient with [aiplatform_v1.JobServiceClient.create_batch_prediction_job()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.job_service.JobServiceClient#google_cloud_aiplatform_v1_services_job_service_JobServiceClient_create_batch_prediction_job)
- From JobServiceClient with [aiplatform_v1beta1.JobServiceClient.create_batch_prediction_job()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1beta1.services.job_service.JobServiceClient#google_cloud_aiplatform_v1beta1_services_job_service_JobServiceClient_create_batch_prediction_job)

### Model Information
Using the model on the endpoint selected at the top of this notebook:

In [108]:
endpoint

<google.cloud.aiplatform.models.Endpoint object at 0x7f7cc87c64d0> 
resource name: projects/1026793852137/locations/us-central1/endpoints/4573537362990071808

In [109]:
endpoint.list_models()[0]

id: "7361102804611497984"
model: "projects/1026793852137/locations/us-central1/models/model_05c_fraud"
display_name: "05c_fraud"
create_time {
  seconds: 1661532514
  nanos: 298407000
}
dedicated_resources {
  machine_spec {
    machine_type: "n1-standard-4"
  }
  min_replica_count: 1
  max_replica_count: 1
}
model_version_id: "1"

In [110]:
model = aiplatform.Model(
    model_name = endpoint.list_models()[0].model+f'@{endpoint.list_models()[0].model_version_id}'
)

In [111]:
model.display_name

'05c_fraud'

In [112]:
model.resource_name

'projects/1026793852137/locations/us-central1/models/model_05c_fraud'

In [113]:
model.version_id

'1'

In [114]:
model.version_description

'run-20220826163231'

In [115]:
model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/model_05c_fraud@1'
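The versioned resource name is the model resource name followed by `@` and the version ID, which is how the `model_name` argument was assembled a few cells earlier. A minimal sketch for taking such a name apart again is shown below; `split_versioned_name` is a hypothetical helper.

```python
def split_versioned_name(name):
    """Split 'resource@version' into (resource, version); version is None if absent."""
    resource, sep, version = name.partition("@")
    return resource, (version if sep else None)

versioned = "projects/1026793852137/locations/us-central1/models/model_05c_fraud@1"
resource, version = split_versioned_name(versioned)
print(resource, version)
```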

In [116]:
model.supported_input_storage_formats

['jsonl', 'bigquery', 'csv', 'tf-record', 'tf-record-gzip', 'file-list']

#### Using aiplatform_v1 Model Client
It may also be helpful to try the [ModelServiceClient](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.services.model_service.ModelServiceClient) to review the model attributes.  Here is example code for trying this:

In [117]:
from google.cloud import aiplatform_v1

client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
v1_clients = {}
v1_clients['Model'] = aiplatform_v1.ModelServiceClient(client_options = client_options)

model_v1 = v1_clients['Model'].get_model(
    name = model.versioned_resource_name
)
model_v1

name: "projects/1026793852137/locations/us-central1/models/model_05c_fraud@1"
display_name: "05c_fraud"
predict_schemata {
}
metadata {
}
container_spec {
  image_uri: "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest"
}
supported_deployment_resources_types: DEDICATED_RESOURCES
supported_deployment_resources_types: 3
supported_input_storage_formats: "jsonl"
supported_input_storage_formats: "bigquery"
supported_input_storage_formats: "csv"
supported_input_storage_formats: "tf-record"
supported_input_storage_formats: "tf-record-gzip"
supported_input_storage_formats: "file-list"
supported_output_storage_formats: "jsonl"
supported_output_storage_formats: "bigquery"
create_time {
  seconds: 1661532461
  nanos: 38868000
}
update_time {
  seconds: 1661532464
  nanos: 330883000
}
deployed_models {
  endpoint: "projects/1026793852137/locations/us-central1/endpoints/4573537362990071808"
  deployed_model_id: "7361102804611497984"
}
etag: "AMEw9yNxmAyJ8myw53JShZyTpxgjVBU6QS2k0s_qZrePJ7rDs

### Batch Prediction Job With JSONL
This process exports a BigQuery table to JSONL, uses a Vertex AI Batch Prediction Job to create predictions, then imports the result into BigQuery.

#### BigQuery Extract To JSONL in GCS
The Batch Prediction job validates the provided JSONL against the model's expected inputs and will not accept additional columns. It is also important to note that when BigQuery exports to JSONL it wraps INT64 values in quotes, which makes them appear as strings. For this reason, I first cast the INT64 columns to FLOAT64 before exporting.
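The effect of the quoting can be seen by parsing one JSONL line with and without the cast. Both lines below are illustrative and hand-written, not real export output: the first mimics an INT64 `Time` column, the second the same column after a cast to FLOAT64.

```python
import json

# Illustrative lines only (not actual export output).
int64_line = '{"Time":"35337","V1":1.092844}'  # INT64 exported as a quoted string
float_line = '{"Time":35337.0,"V1":1.092844}'  # same value after a FLOAT64 cast

def value_types(line):
    """Return the Python type name of each value in one JSONL record."""
    return {k: type(v).__name__ for k, v in json.loads(line).items()}

print(value_types(int64_line))
print(value_types(float_line))
```

A model expecting a numeric `Time` feature would receive a string in the first case, which is why the cast is applied before the export.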

In [120]:
# Exports to JSON appear to wrap INT64 types in quotes
job = bq.query(
    query = f"""
        CREATE OR REPLACE TABLE `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_json` AS
            SELECT * EXCEPT(Time, Class, transaction_id, splits), CAST(Time AS FLOAT64) AS Time
            FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
    """
)
job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7f7c3eb39450>

In [122]:
ds = bigquery.DatasetReference(BQ_PROJECT, BQ_DATASET)
tb = ds.table(f'{BQ_TABLE}_json')
extractJob = bq.extract_table(
    source = tb,
    destination_uris = [f'gs://{PROJECT_ID}/{BQ_DATASET}/data/jsonl/{BQ_TABLE}.json'],
    job_config = bigquery.job.ExtractJobConfig(destination_format = bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON)
)

In [123]:
extractJob.result()

ExtractJob<project=statmike-mlops-349915, location=us-central1, id=a2a9e0ea-9876-4c5c-ac35-36bd25a15a06>

#### Batch Prediction Job

In [124]:
model.labels

{'run_name': 'run-20220826163231',
 'experiment': '05c',
 'series': '05',
 'experiment_name': 'experiment-05-05c-tf-classification-dnn'}

In [125]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
batchJob = aiplatform.BatchPredictionJob.create(
    job_display_name = f'{SERIES}_{BQ_DATASET}_{TIMESTAMP}',
    model_name = model.versioned_resource_name,
    labels = model.labels,
    instances_format = 'jsonl',
    predictions_format = 'jsonl',
    gcs_source = f'gs://{PROJECT_ID}/{BQ_DATASET}/data/jsonl/{BQ_TABLE}.json',
    gcs_destination_prefix = f'gs://{PROJECT_ID}/{BQ_DATASET}/data/jsonl/predictions_{TIMESTAMP}',
    machine_type = 'n1-standard-8', #DEPLOY_COMPUTE,
    accelerator_count = 0,
    starting_replica_count = 1,
    max_replica_count = 10,
    batch_size = 1000,
    sync = False #if True the call will wait for the job to complete
)

Creating BatchPredictionJob
BatchPredictionJob created. Resource name: projects/1026793852137/locations/us-central1/batchPredictionJobs/6048971191668965376
To use this BatchPredictionJob in another session:
bpj = aiplatform.BatchPredictionJob('projects/1026793852137/locations/us-central1/batchPredictionJobs/6048971191668965376')
View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/6048971191668965376?project=1026793852137

In [126]:
batchJob.wait()

BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/6048971191668965376 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/6048971191668965376 current state:
JobState.JOB_STATE_SUCCEEDED
BatchPredictionJob run completed. Resource name: projects/1026793852137/locations/us-central1/batchPredictionJobs/6048971191668965376

#### Predictions > BigQuery

In [127]:
batchJob.output_info.gcs_output_directory

'gs://statmike-mlops-349915/fraud/data/jsonl/predictions_20220826203142/prediction-05c_fraud-2022_08_26T13_31_42_421Z'
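The output directory is a `gs://` URI, and the load step that follows appends `/*` to pick up every file under it. If the bucket and object prefix are needed separately, for example to list the files, a sketch with the standard library looks like this; `split_gcs_uri` is a hypothetical helper and the URI is the one printed above.

```python
from urllib.parse import urlparse

def split_gcs_uri(uri):
    """Split a gs:// URI into (bucket, object prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"not a GCS URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")

output_dir = ("gs://statmike-mlops-349915/fraud/data/jsonl/"
              "predictions_20220826203142/prediction-05c_fraud-2022_08_26T13_31_42_421Z")

bucket, prefix = split_gcs_uri(output_dir)
load_pattern = f"{output_dir.rstrip('/')}/*"  # wildcard form used for the BigQuery load
print(bucket)
print(prefix)
print(load_pattern)
```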

In [128]:
job_config = bigquery.LoadJobConfig(
    source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE, #.WRITE_APPEND, #.WRITE_TRUNCATE,
    create_disposition = bigquery.CreateDisposition.CREATE_IF_NEEDED,
    autodetect = True
)

In [129]:
tb = ds.table(f'{BQ_TABLE}_predictions_{TIMESTAMP}')
load_job = bq.load_table_from_uri(
    source_uris = f"{batchJob.output_info.gcs_output_directory}/*",
    destination = tb,
    job_config = job_config
)
load_job.result()

LoadJob<project=statmike-mlops-349915, location=us-central1, id=94ce2692-3e14-473f-9590-8b6b69b0f1c9>

In [130]:
job = bq.query(
    query = f"""
        CREATE OR REPLACE TABLE `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_predictions_{TIMESTAMP}` AS
            SELECT instance.*, confidence, predicted_Class
            FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_predictions_{TIMESTAMP}`
            CROSS JOIN
            UNNEST(prediction) as confidence WITH OFFSET predicted_Class
            WHERE confidence > 0.5
    """
)
job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7f7c3c3fcbd0>

In [132]:
bq.query(query = f"SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_predictions_{TIMESTAMP}` LIMIT 5").to_dataframe()

Unnamed: 0,Time,Amount,V28,V27,V25,V23,V22,V13,V20,V18,...,V14,V8,V19,V11,V6,V21,V4,V1,confidence,predicted_Class
0,145155,0.0,-0.22641,-0.231012,0.28388,-0.447339,0.911485,-3.390399,-0.33802,-0.296251,...,1.134348,-0.616646,-0.241413,-1.383635,-1.737663,0.266611,-0.425378,0.271642,0.999989,0
1,30507,0.0,0.025494,0.065072,0.60536,-0.182292,0.346758,-0.184433,-0.145886,0.524842,...,-0.228832,0.145367,-0.165017,-1.534203,0.200168,0.060312,1.199715,1.235995,0.999728,0
2,64585,0.0,0.063238,0.027043,0.4003,-0.053692,-0.057036,-1.191306,-0.172659,1.026317,...,-1.880275,0.17617,-1.57707,2.140057,-0.320778,-0.008996,2.743318,1.080433,0.941189,0
3,32799,0.0,0.035332,0.060115,0.417245,-0.025964,0.480049,-1.365634,-0.245686,0.265599,...,0.035306,0.128115,-0.59702,-0.659895,-0.48169,0.125514,1.48062,1.153477,0.999822,0
4,145014,0.0,0.071346,0.194401,0.319797,-0.208977,0.855684,0.88379,0.243722,1.008516,...,-0.754661,1.636713,1.926909,-0.745086,2.356677,0.143207,4.663185,-1.708039,0.999976,0
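
The `UNNEST(prediction) AS confidence WITH OFFSET predicted_Class` step in the query above turns each prediction array into one row per class, with the array position acting as the class label. The `WHERE confidence > 0.5` filter then keeps the winning row. A minimal pure-Python sketch of the same logic, assuming a two-element probability array like the ones this model returns (the function name is illustrative and not part of any library):

```python
def unnest_with_offset(prediction, threshold=0.5):
    """Mimic UNNEST(prediction) AS confidence WITH OFFSET predicted_Class.

    Returns (confidence, predicted_class) pairs whose confidence exceeds
    the threshold, in array order.
    """
    return [
        (confidence, offset)
        for offset, confidence in enumerate(prediction)
        if confidence > threshold
    ]

rows = unnest_with_offset([0.999989, 0.000011])
print(rows)  # [(0.999989, 0)]
```

With two classes, at most one value can exceed 0.5, so each instance yields at most one row. An exact 0.5/0.5 split would yield none, which means such an instance would be missing from the final table.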


### Example With aiplatform_v1.JobServiceClient.create_batch_prediction_job()
This shows how to use the aiplatform_v1 Python client to create a batch prediction job.

```python
from google.cloud import aiplatform_v1

client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
v1_client = aiplatform_v1.JobServiceClient(client_options = client_options)
parent = f"projects/{PROJECT_ID}/locations/{REGION}"  # project and location that will own the job

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
batchJobSpec = aiplatform_v1.BatchPredictionJob()
batchJobSpec.display_name = f'{SERIES}_{BQ_DATASET}_{TIMESTAMP}'
batchJobSpec.model = model.versioned_resource_name
batchJobSpec.input_config.instances_format = 'bigquery'
batchJobSpec.input_config.bigquery_source.input_uri = f'bq://{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}'
batchJobSpec.output_config.predictions_format = 'bigquery'
batchJobSpec.output_config.bigquery_destination.output_uri = f'bq://{BQ_PROJECT}.{BQ_DATASET}' #.predictions_{TIMESTAMP}
batchJobSpec.dedicated_resources.machine_spec.machine_type = DEPLOY_COMPUTE
batchJobSpec.dedicated_resources.machine_spec.accelerator_count = 0
batchJobSpec.dedicated_resources.starting_replica_count = 1
batchJobSpec.dedicated_resources.max_replica_count = 10
batchJobSpec.manual_batch_tuning_parameters.batch_size = 100
batchJobSpec

batchJob = v1_client.create_batch_prediction_job(
    parent = parent,
    batch_prediction_job = batchJobSpec
)
batchJob
```

### Example With aiplatform.BatchPredictionJob.create()

```python
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
batchJob = aiplatform.BatchPredictionJob.create(
    job_display_name = f'{SERIES}_{BQ_DATASET}_{TIMESTAMP}',
    model_name = model.versioned_resource_name,
    bigquery_source= f'bq://{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}',
    bigquery_destination_prefix= f'bq://{BQ_PROJECT}.{BQ_DATASET}',
    machine_type = DEPLOY_COMPUTE,
    accelerator_count = 0,
    starting_replica_count = 1,
    max_replica_count = 2,
    batch_size = 100,
    sync = False #if True the call will wait for the job to complete
)
batchJob.wait()
```

### Example With aiplatform.Model.batch_predict()

```python
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
batchJob = model.batch_predict(
    job_display_name = f'{SERIES}_{BQ_DATASET}_{TIMESTAMP}',
    bigquery_source= f'bq://{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}',
    instances_format = "bigquery",
    bigquery_destination_prefix= f'bq://{BQ_PROJECT}.{BQ_DATASET}',
    predictions_format = "bigquery",
    machine_type = DEPLOY_COMPUTE,
    accelerator_count = 0,
    starting_replica_count = 1,
    max_replica_count = 10,
    batch_size = 100,
    sync = False #if True the call will wait for the job to complete
)
batchJob.wait()
```
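
`batchJob.wait()` blocks until the job reaches a terminal state. If you would rather poll on your own schedule, the loop below is one way to sketch it. `poll_until_done` and its `get_state` argument are hypothetical names for this illustration; in the notebook `get_state` might be something like `lambda: batchJob.state.name`, which is an assumption about how you choose to read the state.

```python
import time

# Terminal states from the Vertex AI JobState enum that this sketch checks
# for. The list is not exhaustive.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
}

def poll_until_done(get_state, interval_seconds=30.0, max_polls=120):
    """Call get_state() until it returns a terminal state name.

    Raises TimeoutError if no terminal state is seen within max_polls calls.
    """
    for attempt in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if attempt < max_polls - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"job still not finished after {max_polls} polls")

# Stand-in for a real job: two RUNNING reports, then SUCCEEDED.
fake_states = iter(["JOB_STATE_RUNNING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
final_state = poll_until_done(lambda: next(fake_states), interval_seconds=0)
print(final_state)  # JOB_STATE_SUCCEEDED
```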

---
## Batch Predictions: BigQuery ML

Import the TensorFlow model into BigQuery and use BigQuery ML (BQML) to create predictions.  This requires the Cloud Storage URI of the TensorFlow SavedModel.

### Model Information
Using the model on the endpoint selected at the top of this notebook:

In [133]:
endpoint.list_models()[0]

id: "261177992061911040"
model: "projects/1026793852137/locations/us-central1/models/model_05h_fraud"
display_name: "05h_fraud"
create_time {
  seconds: 1661546709
  nanos: 108896000
}
dedicated_resources {
  machine_spec {
    machine_type: "n1-standard-4"
  }
  min_replica_count: 1
  max_replica_count: 1
}
model_version_id: "1"

In [134]:
model = aiplatform.Model(
    model_name = endpoint.list_models()[0].model+f'@{endpoint.list_models()[0].model_version_id}'
)

In [135]:
model.display_name

'05h_fraud'

In [136]:
model.resource_name

'projects/1026793852137/locations/us-central1/models/model_05h_fraud'

In [137]:
model.version_id

'1'

In [138]:
model.version_description

'run-20220826194057-3'

In [139]:
model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/model_05h_fraud@1'

In [140]:
model.uri

'gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model'

In [141]:
model.name

'model_05h_fraud'

### Import Model Into BigQuery

In [142]:
query = f'''
CREATE OR REPLACE MODEL `{BQ_PROJECT}.{BQ_DATASET}.{model.name}`
    OPTIONS(
        MODEL_TYPE = 'TENSORFLOW',
        MODEL_PATH = '{model.uri}/*')
'''

In [143]:
print(query)


CREATE OR REPLACE MODEL `statmike-mlops-349915.fraud.model_05h_fraud`
    OPTIONS(
        MODEL_TYPE = 'TENSORFLOW',
        MODEL_PATH = 'gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model/*')



In [144]:
job = bq.query(query = query)
job.result()
(job.ended-job.started).total_seconds()

3.252

### Generate Predictions With BigQuery (ML.Predict)

In [145]:
query = f'''
SELECT *
FROM ML.PREDICT(
    MODEL `{BQ_PROJECT}.{BQ_DATASET}.{model.name}`, (
        SELECT * 
        FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
        WHERE splits='TEST' AND Class = 1
        LIMIT 10
    )
)
'''
results = bq.query(query = query).to_dataframe()
results

Unnamed: 0,prediction_layer,Time,V1,V2,V3,V4,V5,V6,V7,V8,...,V23,V24,V25,V26,V27,V28,Amount,Class,transaction_id,splits
0,"[5.436737865238683e-06, 0.9999944567680359]",85285,-7.030308,3.421991,-9.525072,5.270891,-4.02463,-2.865682,-6.989195,3.791551,...,0.036943,-0.355519,0.353634,1.042458,1.359516,-0.272188,0.0,1,0a3b566f-e662-4cd0-b702-99299890cb0f,TEST
1,"[0.017139235511422157, 0.9828606843948364]",56887,-0.075483,1.812355,-2.566981,4.127549,-1.628532,-0.805895,-3.390135,1.019353,...,-0.143624,0.013566,0.634203,0.213693,0.773625,0.387434,5.0,1,e17f6ee4-8dd8-4a38-9f51-e1bbf1e1aa2a,TEST
2,"[0.0003428384952712804, 0.9996570944786072]",43369,-3.365319,2.426503,-3.752227,0.276017,-2.30587,-1.961578,-3.029283,-1.674462,...,-0.248502,0.12655,0.104166,-1.055997,-1.200165,-1.012066,88.0,1,125e7617-a79a-468f-af1b-f184544347f4,TEST
3,"[0.0043426500633358955, 0.9956573247909546]",143354,1.118331,2.074439,-3.837518,5.44806,0.071816,-1.020509,-1.808574,0.521744,...,-0.02191,-0.37656,0.192817,0.114107,0.500996,0.259533,1.0,1,17e4d066-124e-4a22-84fb-40b8b3456adf,TEST
4,"[5.201393287279643e-05, 0.9999479651451111]",93888,-10.040631,6.139183,-12.972972,7.740555,-8.684705,-3.837429,-11.907702,5.833273,...,-0.567343,0.843012,0.549938,0.113892,-0.307375,0.061631,1.0,1,0a4e27c9-89e7-4588-86fd-70790cb3d45f,TEST
5,"[1.7808358165893878e-07, 0.9999998211860657]",20332,-15.271362,8.326581,-22.338591,11.885313,-8.721334,-2.324307,-16.196419,0.512882,...,1.085617,-1.039797,-0.182006,0.649921,2.149247,-1.406811,1.0,1,5810b45a-602e-4113-b675-a0dcccaed0d5,TEST
6,"[0.0026680349837988615, 0.9973319172859192]",7551,0.316459,3.809076,-5.615159,6.047445,1.554026,-2.651353,-0.746579,0.055586,...,-0.583813,-0.219845,1.474753,0.491192,0.518868,0.402528,1.0,1,2e70bcc4-4a04-44c5-82c1-5ee6061196c5,TEST
7,"[2.383835408181767e-06, 0.9999975562095642]",13126,-2.880042,5.225442,-11.06333,6.689951,-5.759924,-2.244031,-11.199975,4.014722,...,0.795255,-0.778379,-1.646815,0.487539,1.427713,0.583172,1.0,1,ab20db30-3b79-4ebb-ac94-2f98e1eabee9,TEST
8,"[3.500327238725731e-06, 0.999996542930603]",152036,-4.320609,3.199939,-5.799736,6.50233,0.378479,-1.948246,-2.16786,-0.728207,...,-0.13694,-0.620072,0.642531,0.280717,-2.649107,0.533641,1.0,1,1803e0ce-b531-4919-a2d3-cf0d07172166,TEST
9,"[2.7075780053564813e-06, 0.9999972581863403]",100298,-22.341889,15.536133,-22.865228,7.043374,-14.183129,-0.463145,-28.215112,-14.607791,...,1.412928,0.382801,0.447154,-0.632816,-4.380154,-0.467863,1.0,1,7742da54-3f4c-4e8b-bccc-2d55ac0edd7e,TEST
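
`ML.PREDICT` returns the raw model output in the `prediction_layer` column, with one probability per class. A small sketch of turning those arrays into a predicted class and its probability, using two rows copied from the output above (the helper name `top_class` is illustrative):

```python
def top_class(probabilities):
    """Return (class_index, probability) for the largest probability."""
    class_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_index, probabilities[class_index]

prediction_layer = [
    [5.436737865238683e-06, 0.9999944567680359],
    [0.017139235511422157, 0.9828606843948364],
]
top_classes = [top_class(p) for p in prediction_layer]
print(top_classes)
```

Applied to the notebook's dataframe, this could look like `results['prediction_layer'].apply(top_class)`, assuming the column holds list-like values.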


---
## Notebook Predictions: Load Keras Model

Note: The TensorFlow version used by the training job that created the model may differ from the version running in this notebook.  A mismatch can cause `tf.keras.models.load_model` to fail, so make sure the versions match before loading.
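
A quick guard before loading is to compare the notebook's TensorFlow version with the one used for training. The sketch below assumes you know the training version; the `tf2-cpu.2-7` serving image configured in this notebook suggests 2.7, but that is an inference and not something recorded with the model. Comparing only the major and minor components is a judgment call, and `same_major_minor` is a name made up for this example.

```python
def same_major_minor(version_a, version_b):
    """Compare two version strings on their major.minor components only."""
    return version_a.split(".")[:2] == version_b.split(".")[:2]

print(same_major_minor("2.7.0", "2.7.4"))  # True
print(same_major_minor("2.7.0", "2.9.1"))  # False
```

In the notebook this could be called as `same_major_minor(tf.__version__, "2.7.0")`, with the second argument replaced by the actual training version.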

In [146]:
model = aiplatform.Model(
    model_name = endpoint.list_models()[0].model+f'@{endpoint.list_models()[0].model_version_id}'
)

In [147]:
model.uri

'gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model'

In [None]:
keras_model = tf.keras.models.load_model(model.uri)

In [273]:
predictions = keras_model.predict({key: tf.constant([value], dtype=tf.float32, name = key) for key, value in newobs[0].items()})
predictions

array([[9.9986863e-01, 1.3134126e-04]], dtype=float32)

In [275]:
np.argmax(predictions[0])

0

---
## Local Predictions: With TensorFlow ModelServer
Locally run [TensorFlow Serving with Docker](https://www.tensorflow.org/tfx/serving/docker#serving_example)

### Model Information
Using the model on the endpoint selected at the top of this notebook:

In [149]:
endpoint.list_models()[0]

id: "261177992061911040"
model: "projects/1026793852137/locations/us-central1/models/model_05h_fraud"
display_name: "05h_fraud"
create_time {
  seconds: 1661546709
  nanos: 108896000
}
dedicated_resources {
  machine_spec {
    machine_type: "n1-standard-4"
  }
  min_replica_count: 1
  max_replica_count: 1
}
model_version_id: "1"

In [150]:
model = aiplatform.Model(
    model_name = endpoint.list_models()[0].model+f'@{endpoint.list_models()[0].model_version_id}'
)

In [151]:
model.display_name

'05h_fraud'

In [152]:
model.resource_name

'projects/1026793852137/locations/us-central1/models/model_05h_fraud'

In [153]:
model.version_id

'1'

In [154]:
model.version_description

'run-20220826194057-3'

In [155]:
model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/model_05h_fraud@1'

In [156]:
model.uri

'gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model'

In [157]:
model.name

'model_05h_fraud'

Locate the model files:

In [158]:
model.uri

'gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model'

Review the local directory for this notebook (created above):

In [159]:
DIR

'temp/05_predictions'

In [160]:
!ls {DIR}

request.json


Copy the model files to the local directory for this notebook:

In [161]:
!gsutil cp -R {model.uri} {DIR}

Copying gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model/keras_metadata.pb...
Copying gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model/saved_model.pb...
/ [2 files][513.3 KiB/513.3 KiB]                                                
==> NOTE: You are performing a sequence of gsutil operations that may
run significantly faster if you instead use gsutil -m cp ... Please
see the -m section under "gsutil help options" for further information
about when gsutil -m can be advantageous.

Copying gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model/variables/variables.data-00000-of-00001...
Copying gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model/variables/variables.index...
/ [4 files][558.2 KiB/558.2 KiB]                                                
Operation completed over 4 objects/558.2 KiB.                                    


In [162]:
!ls {DIR}

model  request.json


In [163]:
!ls {DIR}/model

keras_metadata.pb  saved_model.pb  variables


### Load the Model and Review

In [164]:
reloaded_model = tf.saved_model.load(f'{DIR}/model')

In [165]:
reloaded_model.signatures

_SignatureMap({'serving_default': <ConcreteFunction signature_wrapper(Amount, Time, V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2, V20, V21, V22, V23, V24, V25, V26, V27, V28, V3, V4, V5, V6, V7, V8, V9) at 0x7F7C185999D0>})

In [166]:
reloaded_model.signatures['serving_default']

<ConcreteFunction signature_wrapper(Amount, Time, V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2, V20, V21, V22, V23, V24, V25, V26, V27, V28, V3, V4, V5, V6, V7, V8, V9) at 0x7F7C185999D0>

In [167]:
reloaded_model.signatures['serving_default'].structured_input_signature

((),
 {'Time': TensorSpec(shape=(None, 1), dtype=tf.float32, name='Time'),
  'V26': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V26'),
  'V15': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V15'),
  'V21': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V21'),
  'V23': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V23'),
  'V28': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V28'),
  'V1': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V1'),
  'V3': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V3'),
  'V24': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V24'),
  'V17': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V17'),
  'V19': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V19'),
  'Amount': TensorSpec(shape=(None, 1), dtype=tf.float32, name='Amount'),
  'V5': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V5'),
  'V4': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V4'),
  'V12': TensorSpec(shape=(None, 1), dtype=

In [168]:
#!saved_model_cli show --dir {DIR}/model --all

### Download Docker Image and Start Serving Container

In [169]:
!docker pull tensorflow/serving

Using default tag: latest
latest: Pulling from tensorflow/serving
Digest: sha256:6651f4839e1124dbde75ee531825112af0a6b8ef082c88ab14ca53eb69a2e4bb
Status: Image is up to date for tensorflow/serving:latest
docker.io/tensorflow/serving:latest


In [170]:
command = f'''docker run -t -p 8501:8501 \
-v "/$(pwd)/{DIR}/model:/models/{SERIES}/1" \
-e MODEL_NAME={SERIES} \
tensorflow/serving'''
print(command)

docker run -t -p 8501:8501 -v "/$(pwd)/temp/05_predictions/model:/models/05/1" -e MODEL_NAME=05 tensorflow/serving


**Run the command above in a subprocess from the local folder of this notebook using `multiprocessing.Process()`:**

In [171]:
!pwd

/home/jupyter/vertex-ai-mlops


In [172]:
import multiprocessing

def docker_runner():
    !{command}
    #!docker run -t -p 8501:8501 -v "/$(pwd)/temp/05tools_1/model:/models/fraud/1" -e MODEL_NAME=fraud tensorflow/serving

def main():
    p = multiprocessing.Process(target=docker_runner)
    p.start()
    return p
    
p = main()

2022-08-26 21:03:12.441978: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config:  model_name: 05 model_base_path: /models/05
2022-08-26 21:03:12.442822: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-08-26 21:03:12.442867: I tensorflow_serving/model_servers/server_core.cc:591]  (Re-)adding model: 05
2022-08-26 21:03:12.543596: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: 05 version: 1}
2022-08-26 21:03:12.543645: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: 05 version: 1}
2022-08-26 21:03:12.543661: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: 05 version: 1}
2022-08-26 21:03:12.543840: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /models/05/1
2022-08-26 21:03:12.559290: I external/org_tensorflow/tensorflow/cc/saved_model/read

### Get Predictions on Exposed Port

In [173]:
import requests

In [174]:
headers = {"content-type": "application/json"}
json_response = requests.post(f'http://localhost:8501/v1/models/{SERIES}:predict', data=json.dumps({"instances": [newobs[0]]}), headers=headers)

In [175]:
print(json_response.text)

{
    "predictions": [[0.999176681, 0.000823272683]
    ]
}


In [176]:
predictions = json.loads(json_response.text)['predictions']
predictions

[[0.999176681, 0.000823272683]]

In [177]:
np.argmax(predictions[0])

0
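
The request and response handling above follows the TensorFlow Serving REST row format: a JSON body with an `instances` list goes in, and a JSON body with a `predictions` list comes out. A minimal sketch of both sides that runs without a server; the helper names and the shortened example instance are illustrative and not taken from the notebook:

```python
import json

def build_predict_body(instances):
    """Serialize instances into the body expected by the :predict endpoint."""
    return json.dumps({"instances": list(instances)})

def parse_predicted_classes(response_text):
    """Return the index of the largest score for each prediction row."""
    predictions = json.loads(response_text)["predictions"]
    return [row.index(max(row)) for row in predictions]

# Made-up, truncated instance; the real model expects all 30 input features.
body = build_predict_body([{"Time": 0.0, "Amount": 1.5}])
response_text = '{"predictions": [[0.999176681, 0.000823272683]]}'
print(parse_predicted_classes(response_text))  # [0]
```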

### Shutdown TensorFlow Serving Container
There are two entities running: a subprocess called `p` and a Docker container started by that subprocess.  Stopping `p` alone is not enough because the container keeps running.  Stopping the container may be enough, since the subprocess then terminates on its own.  The commands below stop the subprocess `p` and then stop and remove the container.
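
One way to avoid managing both a subprocess and a container is to start the container detached with a known name, so that a single `docker stop` is enough. The sketch below only builds the argument list. `build_serving_command` and the container name `tf-serving-05` are made up for this illustration, while `-d`, `--rm`, `--name`, `-p`, `-v`, and `-e` are standard `docker run` options.

```python
import os

def build_serving_command(model_dir, model_name, container_name, rest_port=8501):
    """Build a `docker run` argument list for a detached TensorFlow Serving container."""
    host_path = os.path.abspath(model_dir)
    return [
        "docker", "run", "-d", "--rm",
        "--name", container_name,
        "-p", f"{rest_port}:8501",
        "-v", f"{host_path}:/models/{model_name}/1",
        "-e", f"MODEL_NAME={model_name}",
        "tensorflow/serving",
    ]

command_args = build_serving_command("temp/05_predictions/model", "05", "tf-serving-05")
print(" ".join(command_args))
# To launch and later stop the container (requires Docker):
#   import subprocess
#   subprocess.run(command_args, check=True)
#   subprocess.run(["docker", "stop", "tf-serving-05"], check=True)
```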

In [178]:
p.terminate()

In [179]:
p.is_alive()

False

In [180]:
docker = !docker ps -a
docker

['CONTAINER ID   IMAGE                          COMMAND                  CREATED          STATUS          PORTS                              NAMES',
 'f588bbe9f53b   tensorflow/serving             "/usr/bin/tf_serving…"   42 seconds ago   Up 41 seconds   8500/tcp, 0.0.0.0:8501->8501/tcp   reverent_zhukovsky',
 'cfc6fa1ae606   gcr.io/inverting-proxy/agent   "/bin/sh -c \'/opt/bi…"   13 days ago      Up 13 days                                         proxy-agent']

In [181]:
for d in docker:
    if 'tensorflow/serving' in d:
        print(d.split()[-1])
        !docker stop {d.split()[-1]}
        !docker rm {d.split()[0]}

reverent_zhukovsky
reverent_zhukovsky
f588bbe9f53b


In [182]:
!docker ps -a

CONTAINER ID   IMAGE                          COMMAND                  CREATED       STATUS       PORTS     NAMES
cfc6fa1ae606   gcr.io/inverting-proxy/agent   "/bin/sh -c '/opt/bi…"   13 days ago   Up 13 days             proxy-agent


---
## Serving With Cloud Run: TensorFlow ModelServer

### Model Information
Using the model on the endpoint selected at the top of this notebook:

In [183]:
endpoint.list_models()[0]

id: "261177992061911040"
model: "projects/1026793852137/locations/us-central1/models/model_05h_fraud"
display_name: "05h_fraud"
create_time {
  seconds: 1661546709
  nanos: 108896000
}
dedicated_resources {
  machine_spec {
    machine_type: "n1-standard-4"
  }
  min_replica_count: 1
  max_replica_count: 1
}
model_version_id: "1"

In [184]:
model = aiplatform.Model(
    model_name = endpoint.list_models()[0].model+f'@{endpoint.list_models()[0].model_version_id}'
)

In [185]:
model.display_name

'05h_fraud'

In [186]:
model.resource_name

'projects/1026793852137/locations/us-central1/models/model_05h_fraud'

In [187]:
model.version_id

'1'

In [188]:
model.version_description

'run-20220826194057-3'

In [189]:
model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/model_05h_fraud@1'

In [190]:
model.uri

'gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model'

In [191]:
model.name

'model_05h_fraud'

Locate the model files:

In [192]:
model.uri

'gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model'

Review the local directory for this notebook (created above):

In [193]:
DIR

'temp/05_predictions'

In [194]:
!ls {DIR}

model  request.json


Copy the model files to the local directory for this notebook:

In [195]:
!gsutil cp -R {model.uri} {DIR}

Copying gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model/keras_metadata.pb...
Copying gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model/saved_model.pb...
/ [2 files][513.3 KiB/513.3 KiB]                                                
==> NOTE: You are performing a sequence of gsutil operations that may
run significantly faster if you instead use gsutil -m cp ... Please
see the -m section under "gsutil help options" for further information
about when gsutil -m can be advantageous.

Copying gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model/variables/variables.data-00000-of-00001...
Copying gs://statmike-mlops-349915/fraud/models/05/05h/20220826194057/3/model/variables/variables.index...
/ [4 files][558.2 KiB/558.2 KiB]                                                
Operation completed over 4 objects/558.2 KiB.                                    


In [196]:
!ls {DIR}

model  request.json


In [197]:
!ls {DIR}/model

keras_metadata.pb  saved_model.pb  variables


### Load the Model (local) and Review

In [198]:
reloaded_model = tf.saved_model.load(f'{DIR}/model')

In [199]:
reloaded_model.signatures

_SignatureMap({'serving_default': <ConcreteFunction signature_wrapper(Amount, Time, V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2, V20, V21, V22, V23, V24, V25, V26, V27, V28, V3, V4, V5, V6, V7, V8, V9) at 0x7F7C08748090>})

In [200]:
reloaded_model.signatures['serving_default']

<ConcreteFunction signature_wrapper(Amount, Time, V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2, V20, V21, V22, V23, V24, V25, V26, V27, V28, V3, V4, V5, V6, V7, V8, V9) at 0x7F7C08748090>

In [201]:
reloaded_model.signatures['serving_default'].structured_input_signature

((),
 {'V18': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V18'),
  'V16': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V16'),
  'V23': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V23'),
  'V21': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V21'),
  'V6': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V6'),
  'V10': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V10'),
  'V25': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V25'),
  'V13': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V13'),
  'V28': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V28'),
  'V4': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V4'),
  'V9': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V9'),
  'V15': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V15'),
  'V5': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V5'),
  'V3': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V3'),
  'V27': TensorSpec(shape=(None, 1), dtype=tf.float32

In [202]:
#!saved_model_cli show --dir {DIR}/model --all

### Build Docker Container
This build is local to the notebook.  It could be done on a service like Cloud Build.

In [203]:
dockerfile = f"""
FROM tensorflow/serving
#ENTRYPOINT ["/usr/bin/env"]
ENV MODEL_NAME={SERIES}
ENV PORT=8501
COPY . /models/{SERIES}/1
#RUN ls -la /models/{SERIES}
CMD tensorflow_model_server --port=8500 --rest_api_port=$PORT --model_base_path=/models/{SERIES} --model_name=$MODEL_NAME
"""
with open(f'{DIR}/model/Dockerfile', 'w') as f:
    f.write(dockerfile)

Create an Image Tag for Artifact Registry - the repository name:

In [204]:
IMAGE_URI=f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{PROJECT_ID}/{EXPERIMENT}:latest"
IMAGE_URI

'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05_predictions:latest'
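
Artifact Registry Docker image names follow the pattern `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. In this notebook the repository happens to share its name with the project, which is why the project ID appears twice. A short sketch that makes the parts explicit (the function name is illustrative):

```python
def artifact_registry_image_uri(region, project_id, repository, image, tag="latest"):
    """Assemble a Docker image URI for Artifact Registry from its parts."""
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"

image_uri = artifact_registry_image_uri(
    region="us-central1",
    project_id="statmike-mlops-349915",
    repository="statmike-mlops-349915",
    image="05_predictions",
)
print(image_uri)
```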

Docker build - local:

In [205]:
!docker build -t $IMAGE_URI {DIR}/model/.

Sending build context to Docker daemon    577kB
Step 1/5 : FROM tensorflow/serving
 ---> e874bf5e4700
Step 2/5 : ENV MODEL_NAME=05
 ---> Running in 19af52825bb6
Removing intermediate container 19af52825bb6
 ---> 61a96ae77269
Step 3/5 : ENV PORT=8501
 ---> Running in 861e386ea73a
Removing intermediate container 861e386ea73a
 ---> 4683d88f4b02
Step 4/5 : COPY . /models/05/1
 ---> 5ac9b9a4e441
Step 5/5 : CMD tensorflow_model_server --port8500 --rest_api_port=$PORT --model_base_path=/models/05 --model_name=$MODEL_NAME
 ---> Running in dce6ff89bebb
Removing intermediate container dce6ff89bebb
 ---> 822c2c21e86b
Successfully built 822c2c21e86b
Successfully tagged us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05_predictions:latest


### Test Docker Container Locally (in a subprocess)
Use the `--rm` flag to indicate the container should be automatically removed once stopped.

In [206]:
import multiprocessing

def docker_runner():
    !docker run -t --rm -i -p 8501:8501 $IMAGE_URI

def main():
    p = multiprocessing.Process(target=docker_runner)
    p.start()
    return p
    
p = main()

unknown argument: /bin/sh
usage: tensorflow_model_server
Flags:
	--port=8500                      	int32	TCP port to listen on for gRPC/HTTP API. Disabled if port set to zero.
	--grpc_socket_path=""            	string	If non-empty, listen to a UNIX socket for gRPC API on the given path. Can be either relative or absolute path.
	--rest_api_port=0                	int32	Port to listen on for HTTP/REST API. If set to zero HTTP/REST API will not be exported. This port must be different than the one specified in --port.
	--rest_api_num_threads=16        	int32	Number of threads for HTTP/REST API processing. If not set, will be auto set based on number of CPUs.
	--rest_api_timeout_in_ms=30000   	int32	Timeout for HTTP/REST API calls.
	--rest_api_enable_cors_support=false	bool	Enable CORS headers in response
	--enable_batching=false          	bool	enable batching
	--allow_version_labels_for_unavailable_models=false	bool	If true, allows assigning unused version labels to models that are not ava

#### Get Predictions on Exposed Port

In [207]:
import requests

In [208]:
headers = {"content-type": "application/json"}
json_response = requests.post(f'http://localhost:8501/v1/models/{SERIES}:predict', data=json.dumps({"instances": [newobs[0]]}), headers=headers)

In [209]:
print(json_response.text)

{
    "predictions": [[0.999176681, 0.000823272683]
    ]
}


In [210]:
predictions = json.loads(json_response.text)['predictions']
predictions

[[0.999176681, 0.000823272683]]

In [211]:
np.argmax(predictions[0])

0

#### Shutdown TensorFlow Serving Container
There are two entities running: a subprocess called `p` and a Docker container started by that subprocess.  Stopping `p` alone is not enough because the container keeps running.  Stopping the container may be enough, since the subprocess then terminates on its own.  The commands below stop the subprocess `p` and then stop the container, which is removed automatically because it was run with the `--rm` flag.

In [212]:
p.terminate()

In [213]:
p.is_alive()

False

In [214]:
docker = !docker ps -a
docker

['CONTAINER ID   IMAGE                                                                                          COMMAND                  CREATED          STATUS          PORTS                              NAMES',
 'b705fef88ede   us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05_predictions:latest   "/usr/bin/tf_serving…"   39 seconds ago   Up 38 seconds   8500/tcp, 0.0.0.0:8501->8501/tcp   funny_agnesi',
 'cfc6fa1ae606   gcr.io/inverting-proxy/agent                                                                   "/bin/sh -c \'/opt/bi…"   13 days ago      Up 13 days                                         proxy-agent']

In [215]:
for d in docker:
    if f'{IMAGE_URI}' in d:
        print(d.split()[-1])
        !docker stop {d.split()[-1]}

funny_agnesi
funny_agnesi


In [216]:
!docker ps -a

CONTAINER ID   IMAGE                          COMMAND                  CREATED       STATUS       PORTS     NAMES
cfc6fa1ae606   gcr.io/inverting-proxy/agent   "/bin/sh -c '/opt/bi…"   13 days ago   Up 13 days             proxy-agent


### Push the Docker Container to Artifact Registry

#### Enable Artifact Registry API:
Check whether the API is enabled and, if not, enable it:

In [217]:
services = !gcloud services list --format="json" --available --filter=name:artifactregistry.googleapis.com
services = json.loads("".join(services))

if (services[0]['config']['name'] == 'artifactregistry.googleapis.com') and (services[0]['state'] == 'ENABLED'):
    print(f"Artifact Registry is Enabled for This Project: {PROJECT_ID}")
else:
    print(f"Enabling Artifact Registry for this Project: {PROJECT_ID}")
    !gcloud services enable artifactregistry.googleapis.com

Artifact Registry is Enabled for This Project: statmike-mlops-349915
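
The check above indexes `services[0]`, which raises an `IndexError` if the filter returns an empty list. A slightly more defensive sketch of the same test is shown below. `is_service_enabled` is a hypothetical helper, and the sample JSON is trimmed to the two fields the notebook reads (`config.name` and `state`).

```python
import json

def is_service_enabled(gcloud_json, service_name):
    """Return True if the gcloud services listing reports service_name as ENABLED."""
    services = json.loads(gcloud_json)
    return any(
        s.get("config", {}).get("name") == service_name and s.get("state") == "ENABLED"
        for s in services
    )

sample = '[{"config": {"name": "artifactregistry.googleapis.com"}, "state": "ENABLED"}]'
print(is_service_enabled(sample, "artifactregistry.googleapis.com"))  # True
print(is_service_enabled("[]", "artifactregistry.googleapis.com"))    # False
```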


#### Create A Repository
Check whether the repository already exists and, if not, create it:

In [218]:
check_for_repo = !gcloud artifacts repositories describe {PROJECT_ID} --location={REGION}

if check_for_repo[0].startswith('ERROR'):
    print(f'Creating a repository named {PROJECT_ID}')
    !gcloud  artifacts repositories create {PROJECT_ID} --repository-format=docker --location={REGION} --description="Vertex AI Training Custom Containers"
else:
    print(f'There is already a repository named {PROJECT_ID}')

There is already a repository named statmike-mlops-349915


#### Configure Local Docker to Use GCLOUD CLI

In [219]:
!gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet


{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
gcloud credential helpers already registered correctly.


#### Push The Container to The Repository

In [220]:
!docker push $IMAGE_URI

The push refers to repository [us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05_predictions]

9c0f0fb7: Preparing
23850a27: Preparing
a33781cd: Preparing
89523b17: Preparing
f28d5f3c: Preparing
c6d2db45: Preparing
bacb0351: Preparing
9c0f0fb7: Pushed
latest: digest: sha256:0cb8dacb33652b932190b7192a630742395948db07db06ee53f43042f7ff4ad5 size: 1989


### Deploy as Cloud Run Service
This demonstration creates an open service that allows all traffic.  Review the documentation for [Cloud Run](https://cloud.google.com/run/docs/overview/what-is-cloud-run) and the [Cloud SDK CLI reference](https://cloud.google.com/sdk/gcloud/reference/run) for `gcloud run`.


If you have a 'Domain Restricted Sharing' policy enforced, it may need adjusting for the project to allow this.  Do this with care; you may wish to accept only authenticated or internal traffic.  Review the options for authentication [here](https://cloud.google.com/run/docs/authenticating/overview).

Updated Org Policy:
- Logged in as Admin
- IAM > Organization Policies
    - Changed to Project (not org level)
    - Filter 'Domain Restricted Sharing'
    - Select and Edit
        - Applies to = Customize
        - Policy enforcement = Replace
        - Rules = Allow all
    - Save
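
If you keep the service private instead of deploying with `--allow-unauthenticated`, callers need to send an identity token with each request. The sketch below assumes the `google-auth` package is installed and that the environment has credentials able to mint ID tokens. `fetch_identity_token` and `authorized_headers` are names invented here, and the service URL would be used as the token audience.

```python
def fetch_identity_token(audience):
    """Fetch an ID token for the audience (needs google-auth and credentials)."""
    # Imported lazily so the rest of the sketch works without google-auth.
    import google.auth.transport.requests
    import google.oauth2.id_token

    auth_request = google.auth.transport.requests.Request()
    return google.oauth2.id_token.fetch_id_token(auth_request, audience)

def authorized_headers(token):
    """Build request headers carrying the ID token as a bearer credential."""
    return {
        "content-type": "application/json",
        "Authorization": f"Bearer {token}",
    }

print(authorized_headers("example-token"))
```

A request could then pass `headers=authorized_headers(fetch_identity_token(service_url))` to `requests.post`, where `service_url` is a placeholder for the deployed service's URL.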

View the Cloud Run Console for this project:

In [221]:
print(f'https://console.cloud.google.com/run?project={PROJECT_ID}')

https://console.cloud.google.com/run?project=statmike-mlops-349915


In [222]:
!gcloud run deploy endpoint-$SERIES-$BQ_DATASET --image=$IMAGE_URI --port=8501 --region=$REGION --platform=managed --allow-unauthenticated --no-user-output-enabled

Deploying new service...                                                       
  . Creating Revision...                                                       
  . Routing traffic...                                                         
  . Setting IAM Policy...                                                      


In [223]:
!gcloud run services list

   SERVICE            REGION       URL                                                LAST DEPLOYED BY                                     LAST DEPLOYED AT
[32m✔[39;0m  endpoint-05-fraud  us-central1  https://endpoint-05-fraud-urlxi72dpa-uc.a.run.app  1026793852137-compute@developer.gserviceaccount.com  2022-08-26T21:08:34.408166Z


In [224]:
services = !gcloud run services list --format="json" --filter=SERVICE:endpoint-$SERIES-$BQ_DATASET
services = json.loads("".join(services))[0]
services['status']['url']

'https://endpoint-05-fraud-urlxi72dpa-uc.a.run.app'

If you had to adjust a `Domain Restricted Sharing` policy after deployment then this command can update the service to allow all traffic:

In [225]:
#!gcloud run services add-iam-policy-binding --region=$REGION --member='allUsers' --role=roles/run.invoker endpoint-$SERIES-$BQ_DATASET

### Get Predictions Using Cloud Run Service

In [226]:
import requests

In [227]:
headers = {"content-type": "application/json"}
json_response = requests.post(f"{services['status']['url']}/v1/models/{SERIES}:predict", data=json.dumps({"instances": [newobs[0]]}), headers=headers)

In [228]:
print(json_response.text)

{
    "predictions": [[0.999176681, 0.000823272683]
    ]
}


In [229]:
predictions = json.loads(json_response.text)['predictions']
predictions

[[0.999176681, 0.000823272683]]

In [230]:
np.argmax(predictions[0])

0

### Remove Service
Alternatively, you could adjust the service to stop accepting traffic.  Cloud Run scales down to zero and only charges while CPU is in use (startup, shutdown, and request handling) unless `--no-cpu-throttling` is set ([documentation](https://cloud.google.com/run/docs/configuring/cpu-allocation#setting)).

In [231]:
!gcloud run services delete --region=us-central1 --quiet endpoint-$SERIES-$BQ_DATASET

Deleting [endpoint-05-fraud]...done.                                           
Deleted service [endpoint-05-fraud].


---