# 05Tools: Prediction - Cloud Run
## BROKEN - Currently Being Worked On - Dependency in tensorflow/serving container

Predictions from models created in the 05 series of notebooks.

This notebook is part of a collection of examples that showcase many ways to serve models:
- Online:
    - Vertex AI Endpoints: Python, REST, CLI (gcloud): [05Tools - Prediction - Online.ipynb](./05Tools%20-%20Prediction%20-%20Online.ipynb)
    - Local with TensorFlow ModelServer: [05Tools - Prediction - Local.ipynb](./05Tools%20-%20Prediction%20-%20Local.ipynb)
    - (**THIS NOTEBOOK**) Remote with Cloud Run with TensorFlow ModelServer: [05Tools - Prediction - Cloud Run.ipynb](./05Tools%20-%20Prediction%20-%20Cloud%20Run.ipynb)
- Batch: [05Tools - Prediction - Batch.ipynb](./05Tools%20-%20Prediction%20-%20Batch.ipynb)
    - BigQuery ML Model Import
    - Vertex AI Batch Prediction Jobs

### Prerequisites:
-  At least one of the notebooks in this series [05, 05a-05i]

### Conceptual Flow & Workflow
<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/05tools_pred_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/05tools_pred_console.png" width="45%">
</p>

---
## Setup

inputs:

In [7]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [8]:
REGION = 'us-central1'
EXPERIMENT = '05_predictions'
SERIES = '05'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Resources
DEPLOY_COMPUTE = 'n1-standard-4'
DEPLOY_IMAGE='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [9]:
from google.cloud import aiplatform
from google.cloud import bigquery

import tensorflow as tf

from datetime import datetime
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np

import asyncio
import time
import multiprocessing

clients:

In [10]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()

parameters:

In [11]:
BUCKET = PROJECT_ID
DIR = f"temp/{EXPERIMENT}"

environment:

In [12]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Get Endpoint

[Endpoint Properties and Methods](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Endpoint):

```python
endpoint
endpoint.display_name
endpoint.resource_name
endpoint.traffic_split
endpoint.list_models()
```

In [13]:
endpoints = aiplatform.Endpoint.list(filter = f"labels.series={SERIES}")
endpoint = endpoints[0]

In [14]:
print(f'Review the Endpoint in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint.name}?project={PROJECT_ID}')

Review the Endpoint in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/1961322035766362112?project=statmike-mlops-349915


### Model Information
Using the model on the endpoint for the current series:

In [15]:
endpoint

<google.cloud.aiplatform.models.Endpoint object at 0x7f539f277d10> 
resource name: projects/1026793852137/locations/us-central1/endpoints/1961322035766362112

In [16]:
#endpoint.list_models()[0]

In [17]:
model = aiplatform.Model(
    model_name = endpoint.list_models()[0].model+f'@{endpoint.list_models()[0].model_version_id}'
)

In [18]:
model.display_name

'05_05h'

In [19]:
model.resource_name

'projects/1026793852137/locations/us-central1/models/model_05_05h'

In [20]:
model.version_id

'1'

In [21]:
model.version_description

'run-20220927230247-6'

In [22]:
model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/model_05_05h@1'

In [23]:
model.supported_input_storage_formats

['jsonl', 'bigquery', 'csv', 'tf-record', 'tf-record-gzip', 'file-list']

In [24]:
model.name

'model_05_05h'

In [25]:
model.uri

'gs://statmike-mlops-349915/05/05h/models/20220927230247/6/model'

In [26]:
print(f'Review the model in the Vertex AI Model Registry:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/models/{model.name}/versions/{model.version_id}/properties?project={PROJECT_ID}')

Review the model in the Vertex AI Model Registry:
https://console.cloud.google.com/vertex-ai/locations/us-central1/models/model_05_05h/versions/1/properties?project=statmike-mlops-349915


---
## Retrieve Records For Prediction

In [27]:
n = 1000
pred = bq.query(query = f"SELECT * FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE} WHERE splits='TEST' LIMIT {n}").to_dataframe()

In [28]:
pred.head(4)

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V23,V24,V25,V26,V27,V28,Amount,Class,transaction_id,splits
0,35337,1.092844,-0.01323,1.359829,2.731537,-0.707357,0.873837,-0.79613,0.437707,0.39677,...,-0.167647,0.027557,0.592115,0.219695,0.03697,0.010984,0.0,0,a1b10547-d270-48c0-b902-7a0f735dadc7,TEST
1,60481,1.238973,0.035226,0.063003,0.641406,-0.260893,-0.580097,0.049938,-0.034733,0.405932,...,-0.057718,0.104983,0.537987,0.589563,-0.046207,-0.006212,0.0,0,814c62c8-ade4-47d5-bf83-313b0aafdee5,TEST
2,139587,1.870539,0.211079,0.224457,3.889486,-0.380177,0.249799,-0.577133,0.179189,-0.120462,...,0.180776,-0.060226,-0.228979,0.080827,0.009868,-0.036997,0.0,0,d08a1bfa-85c5-4f1b-9537-1c5a93e6afd0,TEST
3,162908,-3.368339,-1.980442,0.153645,-0.159795,3.847169,-3.516873,-1.209398,-0.292122,0.760543,...,-1.171627,0.214333,-0.159652,-0.060883,1.294977,0.120503,0.0,0,802f3307-8e5a-4475-b795-5d5d8d7d0120,TEST


Remove columns not included as features in the model:

In [29]:
newobs = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET, 'splits'])]].to_dict(orient='records')
#newobs[0]

In [30]:
len(newobs)

1000
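
The same column filtering can be sketched without pandas, which may be useful when records already arrive as plain dictionaries. The helper name `keep_features` and the sample record are hypothetical; the dropped columns mirror `VAR_OMIT`, `VAR_TARGET`, and `splits` from the cells above.

```python
def keep_features(records, omit, target, extra=("splits",)):
    """Return copies of the records without omitted, target, or bookkeeping columns."""
    # omit is a space-delimited string, matching how VAR_OMIT is defined above
    drop = set(omit.split()) | {target} | set(extra)
    return [{k: v for k, v in rec.items() if k not in drop} for rec in records]


# Hypothetical, shortened record (the real table has Time, V1-V28, Amount, ...)
sample = [{
    "Time": 35337, "V1": 1.092844, "Amount": 0.0, "Class": 0,
    "transaction_id": "a1b10547-d270-48c0-b902-7a0f735dadc7", "splits": "TEST",
}]
features = keep_features(sample, omit="transaction_id", target="Class")
print(features)
```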

---
## Serving With Cloud Run: TensorFlow ModelServer

Review the local directory for this notebook (created above):

In [31]:
DIR

'temp/05_predictions'

In [32]:
!ls {DIR}

Copy the model files to the local directory for this notebook:

In [33]:
!gsutil cp -R {model.uri} {DIR}

Copying gs://statmike-mlops-349915/05/05h/models/20220927230247/6/model/keras_metadata.pb...
Copying gs://statmike-mlops-349915/05/05h/models/20220927230247/6/model/saved_model.pb...
/ [2 files][513.3 KiB/513.3 KiB]                                                
==> NOTE: You are performing a sequence of gsutil operations that may
run significantly faster if you instead use gsutil -m cp ... Please
see the -m section under "gsutil help options" for further information
about when gsutil -m can be advantageous.

Copying gs://statmike-mlops-349915/05/05h/models/20220927230247/6/model/variables/variables.data-00000-of-00001...
Copying gs://statmike-mlops-349915/05/05h/models/20220927230247/6/model/variables/variables.index...
/ [4 files][558.2 KiB/558.2 KiB]                                                
Operation completed over 4 objects/558.2 KiB.                                    


In [34]:
!ls {DIR}

model


In [35]:
!ls {DIR}/model

keras_metadata.pb  saved_model.pb  variables


### Load the Model (local) and Review

In [36]:
reloaded_model = tf.saved_model.load(f'{DIR}/model')

2022-09-28 23:46:47.911736: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2299995000 Hz
2022-09-28 23:46:47.912646: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c77f95c580 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-09-28 23:46:47.912679: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-09-28 23:46:47.912878: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.


In [37]:
reloaded_model.signatures

_SignatureMap({'serving_default': <ConcreteFunction signature_wrapper(Amount, Time, V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2, V20, V21, V22, V23, V24, V25, V26, V27, V28, V3, V4, V5, V6, V7, V8, V9) at 0x7F532ECB3150>})

In [38]:
reloaded_model.signatures['serving_default']

<ConcreteFunction signature_wrapper(Amount, Time, V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2, V20, V21, V22, V23, V24, V25, V26, V27, V28, V3, V4, V5, V6, V7, V8, V9) at 0x7F532ECB3150>

In [39]:
reloaded_model.signatures['serving_default'].structured_input_signature

((),
 {'V12': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V12'),
  'V13': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V13'),
  'Time': TensorSpec(shape=(None, 1), dtype=tf.float32, name='Time'),
  'V24': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V24'),
  'V16': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V16'),
  'V10': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V10'),
  'V4': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V4'),
  'V26': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V26'),
  'V14': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V14'),
  'V23': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V23'),
  'V22': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V22'),
  'V2': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V2'),
  'V8': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V8'),
  'V7': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V7'),
  'V9': TensorSpec(shape=(None, 1), dtype=tf.floa

In [40]:
#!saved_model_cli show --dir {DIR}/model --all
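
Because the serving signature expects one named input per feature, it can help to compare a record's keys with the signature before sending a request. The sketch below assumes the input names listed in the `serving_default` signature above (`Amount`, `Time`, `V1` through `V28`); the helper `check_record` is hypothetical.

```python
# Input names taken from the serving_default signature shown above
EXPECTED_INPUTS = {"Amount", "Time"} | {f"V{i}" for i in range(1, 29)}


def check_record(record, expected=EXPECTED_INPUTS):
    """Return (missing, unexpected) key lists for one prediction record."""
    keys = set(record)
    return sorted(expected - keys), sorted(keys - expected)


# A record with every expected input reports nothing missing or unexpected
complete = {name: 0.0 for name in EXPECTED_INPUTS}
print(check_record(complete))
```

An empty pair of lists suggests the record matches the signature by name; it does not check dtypes or shapes.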

### Build Docker Container
This build is local to the notebook.  It could be done on a service like Cloud Build.

In [41]:
# Write a Dockerfile next to the model files: gRPC on port 8500, REST on $PORT
dockerfile = f"""
FROM tensorflow/serving
#ENTRYPOINT ["/usr/bin/env"]
ENV MODEL_NAME={SERIES}
ENV PORT=8501
COPY . /models/{SERIES}/1
#RUN ls -la /models/{SERIES}
CMD tensorflow_model_server --port=8500 --rest_api_port=$PORT --model_base_path=/models/{SERIES} --model_name=$MODEL_NAME
"""
with open(f'{DIR}/model/Dockerfile', 'w') as f:
    f.write(dockerfile)

Create an Image Tag for Artifact Registry - the repository name:

In [42]:
IMAGE_URI=f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{PROJECT_ID}/{EXPERIMENT}:latest"
IMAGE_URI

'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05_predictions:latest'
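
The tag follows the pattern `REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. A small helper makes the parts explicit; the function name `artifact_registry_uri` is hypothetical, and the values are the ones used in this notebook.

```python
def artifact_registry_uri(region, project, repository, image, tag="latest"):
    """Assemble a Docker image URI for a regional Artifact Registry repository."""
    return f"{region}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


uri = artifact_registry_uri(
    region="us-central1",
    project="statmike-mlops-349915",
    repository="statmike-mlops-349915",  # this notebook names the repository after the project
    image="05_predictions",
)
print(uri)
```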

Docker build - local:

In [43]:
!docker build -t $IMAGE_URI {DIR}/model/.

Sending build context to Docker daemon    577kB
Step 1/5 : FROM tensorflow/serving
 ---> 296dbc78ab3b
Step 2/5 : ENV MODEL_NAME=05
 ---> Running in fac71b126a3f
Removing intermediate container fac71b126a3f
 ---> de740d578537
Step 3/5 : ENV PORT=8501
 ---> Running in 43f2c4dbb6ef
Removing intermediate container 43f2c4dbb6ef
 ---> 9d0b93b8887f
Step 4/5 : COPY . /models/05/1
 ---> d1b00a6d0c16
Step 5/5 : CMD tensorflow_model_server --port8500 --rest_api_port=$PORT --model_base_path=/models/05 --model_name=$MODEL_NAME
 ---> Running in 0e15c5979e6c
Removing intermediate container 0e15c5979e6c
 ---> 7dad0e109de0
Successfully built 7dad0e109de0
Successfully tagged us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05_predictions:latest


### Test Docker Container Locally (in a subprocess)
Use the `--rm` flag to indicate the container should be automatically removed once stopped.

In [44]:
import multiprocessing

def docker_runner():
    !docker run -t --rm -i -p 8501:8501 $IMAGE_URI

def main():
    p = multiprocessing.Process(target=docker_runner)
    p.start()
    return p
    
p = main()

unknown argument: /bin/sh
usage: tensorflow_model_server
Flags:
	--port=8500                      	int32	TCP port to listen on for gRPC/HTTP API. Disabled if port set to zero.
	--grpc_socket_path=""            	string	If non-empty, listen to a UNIX socket for gRPC API on the given path. Can be either relative or absolute path.
	--rest_api_port=0                	int32	Port to listen on for HTTP/REST API. If set to zero HTTP/REST API will not be exported. This port must be different than the one specified in --port.
	--rest_api_num_threads=16        	int32	Number of threads for HTTP/REST API processing. If not set, will be auto set based on number of CPUs.
	--rest_api_timeout_in_ms=30000   	int32	Timeout for HTTP/REST API calls.
	--rest_api_enable_cors_support=false	bool	Enable CORS headers in response
	--enable_batching=false          	bool	enable batching
	--allow_version_labels_for_unavailable_models=false	bool	If true, allows assigning unused version labels to models that are not ava
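
The `unknown argument: /bin/sh` message is consistent with the base image defining its own `ENTRYPOINT` that forwards extra arguments to `tensorflow_model_server`. In that case a shell-form `CMD` is handed over as `/bin/sh -c ...` arguments instead of being executed. This is an assumption about the `tensorflow/serving` image and has not been verified in this notebook. Under that assumption, one possible fix is to replace the entrypoint with an exec-form one that runs the server through a shell, so `$PORT` and `$MODEL_NAME` are expanded at container start. The helper `render_dockerfile` is hypothetical.

```python
def render_dockerfile(series, rest_port=8501, grpc_port=8500):
    """Build Dockerfile text that overrides the base image ENTRYPOINT (a sketch)."""
    # The shell expands $PORT and $MODEL_NAME when the container starts
    server_cmd = (
        f"exec tensorflow_model_server --port={grpc_port} "
        "--rest_api_port=$PORT --model_name=$MODEL_NAME "
        f"--model_base_path=/models/{series}"
    )
    return "\n".join([
        "FROM tensorflow/serving",
        f"ENV MODEL_NAME={series}",
        f"ENV PORT={rest_port}",
        f"COPY . /models/{series}/1",
        # Exec form: replaces the inherited entrypoint instead of appending to it
        f'ENTRYPOINT ["/bin/sh", "-c", "{server_cmd}"]',
        "",
    ])


dockerfile_text = render_dockerfile("05")
print(dockerfile_text)
```

Writing `dockerfile_text` to `{DIR}/model/Dockerfile` and rebuilding would be the way to test whether this resolves the startup error.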

#### Get Predictions on Exposed Port

In [45]:
import requests

In [46]:
headers = {"content-type": "application/json"}
json_response = requests.post(f'http://localhost:8501/v1/models/{SERIES}:predict', data=json.dumps({"instances": [newobs[0]]}), headers=headers)

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
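
A `ConnectionError` like this also appears when a request is sent before the server is ready (here the container had exited with the usage message above). A minimal polling sketch, assuming the TensorFlow Serving model status route `GET /v1/models/<name>` is reachable on the REST port; the helper names `wait_until` and `model_is_available` are hypothetical.

```python
import time
import urllib.request


def wait_until(check, timeout=30.0, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def model_is_available(base_url, model_name):
    """True if the model status route answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models/{model_name}", timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # URLError, refused connections, and dropped connections are all OSError subclasses
        return False
```

In this notebook the call might look like `wait_until(lambda: model_is_available("http://localhost:8501", SERIES))` before posting to the `:predict` route.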

In [209]:
print(json_response.text)

{
    "predictions": [[0.999176681, 0.000823272683]
    ]
}


In [210]:
predictions = json.loads(json_response.text)['predictions']
predictions

[[0.999176681, 0.000823272683]]

In [211]:
np.argmax(predictions[0])

0
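
The `argmax` step can be applied to every row of a response at once. This sketch assumes each prediction row lists one score per class in index order, as the `np.argmax` call above implies; the helper name `top_class` is hypothetical, and the sample text repeats the response shown above.

```python
import json


def top_class(response_text):
    """Return one (class_index, score) pair per prediction row."""
    rows = json.loads(response_text)["predictions"]
    return [max(enumerate(row), key=lambda pair: pair[1]) for row in rows]


text = '{"predictions": [[0.999176681, 0.000823272683]]}'
print(top_class(text))
```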

#### Shutdown TensorFlow Serving Container
Two things are running: the subprocess `p` and the Docker container it started. Stopping `p` alone is not enough; stopping the container may be, since the subprocess then finishes on its own. The commands below stop the subprocess `p`, then stop the container, which is removed automatically because it was run with the `--rm` flag.

In [47]:
p.terminate()

In [48]:
p.is_alive()

False

In [49]:
docker = !docker ps -a
docker

['CONTAINER ID   IMAGE                          COMMAND                  CREATED       STATUS      PORTS     NAMES',
 'cfc6fa1ae606   gcr.io/inverting-proxy/agent   "/bin/sh -c \'/opt/bi…"   6 weeks ago   Up 2 days             proxy-agent']

In [50]:
for d in docker:
    if f'{IMAGE_URI}' in d:
        print(d.split()[-1])
        !docker stop {d.split()[-1]}

In [51]:
!docker ps -a

CONTAINER ID   IMAGE                          COMMAND                  CREATED       STATUS      PORTS     NAMES
cfc6fa1ae606   gcr.io/inverting-proxy/agent   "/bin/sh -c '/opt/bi…"   6 weeks ago   Up 2 days             proxy-agent
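
Matching the image name anywhere in a `docker ps` line can be fragile because the table columns vary in width and content. A sketch of a stricter lookup, assuming Docker's `--format` option with the `{{.ID}}` and `{{.Image}}` fields; both helper names are hypothetical, and `list_containers` needs a running Docker daemon, so it is defined but not called here.

```python
import subprocess


def list_containers():
    """Return '<id> <image>' lines for all containers (requires Docker)."""
    result = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.ID}} {{.Image}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def container_ids_for_image(ps_lines, image):
    """Pick container IDs whose image column equals the given image exactly."""
    ids = []
    for line in ps_lines:
        parts = line.split()
        if len(parts) == 2 and parts[1] == image:
            ids.append(parts[0])
    return ids


# Sample line shaped like the --format output, using the container listed above
lines = ["cfc6fa1ae606 gcr.io/inverting-proxy/agent"]
print(container_ids_for_image(lines, "gcr.io/inverting-proxy/agent"))
```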


### Push the Docker Container to Artifact Registry

#### Enable Artifact Registry API:
Check whether the API is enabled and, if not, enable it:

In [217]:
services = !gcloud services list --format="json" --available --filter=name:artifactregistry.googleapis.com
services = json.loads("".join(services))

if services[0]['config']['name'] == 'artifactregistry.googleapis.com' and services[0]['state'] == 'ENABLED':
    print(f"Artifact Registry is Enabled for This Project: {PROJECT_ID}")
else:
    print(f"Enabling Artifact Registry for this Project: {PROJECT_ID}")
    !gcloud services enable artifactregistry.googleapis.com

Artifact Registry is Enabled for This Project: statmike-mlops-349915
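
Indexing `services[0]` raises an `IndexError` if the filter returns an empty list. A slightly more defensive check, assuming the same JSON shape the cell above reads (`config.name` and `state`); the helper name `is_enabled` and the sample strings are hypothetical.

```python
import json


def is_enabled(services_json, api_name):
    """True if the JSON list contains the API with state ENABLED."""
    services = json.loads(services_json)
    return any(
        s.get("config", {}).get("name") == api_name and s.get("state") == "ENABLED"
        for s in services
    )


# Hypothetical sample shaped like the fields used in the cell above
sample = '[{"config": {"name": "artifactregistry.googleapis.com"}, "state": "ENABLED"}]'
print(is_enabled(sample, "artifactregistry.googleapis.com"))
```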


#### Create A Repository
Check whether the repository already exists and, if not, create it:

In [218]:
check_for_repo = !gcloud artifacts repositories describe {PROJECT_ID} --location={REGION}

if check_for_repo[0].startswith('ERROR'):
    print(f'Creating a repository named {PROJECT_ID}')
    !gcloud artifacts repositories create {PROJECT_ID} --repository-format=docker --location={REGION} --description="Vertex AI Training Custom Containers"
else:
    print(f'There is already a repository named {PROJECT_ID}')

There is already a repository named statmike-mlops-349915


#### Configure Local Docker to Use GCLOUD CLI

In [219]:
!gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet


{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
gcloud credential helpers already registered correctly.


#### Push The Container to The Repository

In [220]:
!docker push $IMAGE_URI

The push refers to repository [us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05_predictions]

9c0f0fb7: Preparing
23850a27: Preparing
a33781cd: Preparing
89523b17: Preparing
f28d5f3c: Preparing
c6d2db45: Preparing
bacb0351: Preparing
9c0f0fb7: Pushed
latest: digest: sha256:0cb8dacb33652b932190b7192a630742395948db07db06ee53f43042f7ff4ad5 size: 1989


### Deploy as Cloud Run Service
This demonstration creates an open service that allows all traffic.  Review the documentation for [Cloud Run](https://cloud.google.com/run/docs/overview/what-is-cloud-run) and the [Cloud SDK CLI reference](https://cloud.google.com/sdk/gcloud/reference/run) for `gcloud run`.


If a 'Domain Restricted Sharing' policy is enforced, it may need adjusting for the project to allow this.  Do this with care; you may prefer to accept only authenticated or internal traffic.  Review the authentication options [here](https://cloud.google.com/run/docs/authenticating/overview).

Updated Org Policy:
- Logged in as Admin
- IAM > Organization Policies
    - Changed to Project (not org level)
    - Filter 'Domain Restricted Sharing'
    - Select and Edit
        - Applies to = Customize
        - Policy enforcement = Replace
        - Rules = Allow all
    - Save
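
If the service is instead deployed without `--allow-unauthenticated`, callers typically send an identity token as a bearer token. The sketch below only builds the request headers; the helper name `invoker_headers` and the placeholder token are hypothetical, and a real token could be obtained with `gcloud auth print-identity-token` for an account that holds `roles/run.invoker` on the service.

```python
def invoker_headers(identity_token=None):
    """Build JSON request headers, adding a bearer token when one is supplied."""
    headers = {"content-type": "application/json"}
    if identity_token:
        headers["Authorization"] = f"Bearer {identity_token}"
    return headers


# Placeholder value only; never hard-code a real token
print(invoker_headers("EXAMPLE_TOKEN"))
```

The returned dictionary can be passed as `headers=` to `requests.post` in the prediction cells that follow.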

View the Cloud Run Console for this project:

In [221]:
print(f'https://console.cloud.google.com/run?project={PROJECT_ID}')

https://console.cloud.google.com/run?project=statmike-mlops-349915


In [222]:
!gcloud run deploy endpoint-$SERIES-$BQ_DATASET --image=$IMAGE_URI --port=8501 --region=$REGION --platform=managed --allow-unauthenticated --no-user-output-enabled

Deploying new service...                                                       
  . Creating Revision...                                                       
  . Routing traffic...                                                         
  . Setting IAM Policy...                                                      


In [223]:
!gcloud run services list

   SERVICE            REGION       URL                                                LAST DEPLOYED BY                                     LAST DEPLOYED AT
✔  endpoint-05-fraud  us-central1  https://endpoint-05-fraud-urlxi72dpa-uc.a.run.app  1026793852137-compute@developer.gserviceaccount.com  2022-08-26T21:08:34.408166Z


In [224]:
services = !gcloud run services list --format="json" --filter=SERVICE:endpoint-$SERIES-$BQ_DATASET
services = json.loads("".join(services))[0]
services['status']['url']

'https://endpoint-05-fraud-urlxi72dpa-uc.a.run.app'

If you had to adjust a `Domain Restricted Sharing` policy after deployment then this command can update the service to allow all traffic:

In [225]:
#!gcloud run services add-iam-policy-binding --region=$REGION --member='allUsers' --role=roles/run.invoker endpoint-$SERIES-$BQ_DATASET

### Get Predictions Using Cloud Run Service

In [226]:
import requests

In [227]:
headers = {"content-type": "application/json"}
json_response = requests.post(f"{services['status']['url']}/v1/models/{SERIES}:predict", data=json.dumps({"instances": [newobs[0]]}), headers=headers)

In [228]:
print(json_response.text)

{
    "predictions": [[0.999176681, 0.000823272683]
    ]
}


In [229]:
predictions = json.loads(json_response.text)['predictions']
predictions

[[0.999176681, 0.000823272683]]

In [230]:
np.argmax(predictions[0])

0
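
Only the first record is sent above. To score all of `newobs`, the records can be split into several request bodies. This sketch assumes the same `{"instances": [...]}` body used in the request above; the helper name `predict_payloads` and the batch size are hypothetical choices, not tuned values.

```python
import json


def predict_payloads(records, batch_size=100):
    """Yield JSON request bodies holding at most batch_size instances each."""
    for start in range(0, len(records), batch_size):
        yield json.dumps({"instances": records[start:start + batch_size]})


# Hypothetical toy records standing in for newobs
records = [{"x": i} for i in range(5)]
payloads = list(predict_payloads(records, batch_size=2))
print(len(payloads))
```

Each payload can then be posted to the `:predict` URL in a loop, collecting the `predictions` list from every response.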

### Remove Service
Alternatively, you could adjust the service to stop accepting traffic.  An idle Cloud Run service scales down to zero, and CPU is charged only while in use (startup, shutdown, and request handling) unless `--no-cpu-throttling` is set ([documentation](https://cloud.google.com/run/docs/configuring/cpu-allocation#setting)).

In [231]:
!gcloud run services delete --region=$REGION --quiet endpoint-$SERIES-$BQ_DATASET

Deleting [endpoint-05-fraud]...done.                                           
Deleted service [endpoint-05-fraud].
