# 05Tools - Prediction - NVIDIA Triton

## WORK IN PROGRESS

Host multiple models on a single Vertex AI Endpoint using the NVIDIA Triton Inference Server.

Workflow:
- Register models from 05a-05f in a Triton server instance on a Vertex AI Endpoint
- Show prediction calls to individual models
- Add a model
- Replace a model

Resources:
- Vertex AI Model Registry
- GCS
- Vertex AI Endpoints

Prerequisites:
- Several of the notebooks in [05, 05a-05i]

References:
- https://cloud.google.com/vertex-ai/docs/predictions/using-nvidia-triton
- https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/vertex_endpoints/nvidia-triton/nvidia-triton-custom-container-prediction.ipynb
- https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver
- Main Concepts from [user guide](https://github.com/triton-inference-server/server/tree/main/docs/user_guide):
    - [model repository](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md)
    - [model configuration](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md)

---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [2]:
REGION = 'us-central1'
EXPERIMENT = 'triton'
SERIES = '05'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Resources
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [3]:
from google.cloud import aiplatform
from google.cloud import bigquery

import json
from google.api import httpbody_pb2

clients:

In [4]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()

parameters:

In [5]:
BUCKET = PROJECT_ID
DIR = f"temp/{EXPERIMENT}"

environment:

In [6]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## List Models
This series, `05`, has multiple workflows that each create a model to predict the `Class` of a transaction from a fraud dataset. This section lists all models in the series as well as all versions of each model.
- [Reference](https://cloud.google.com/vertex-ai/docs/model-registry/versioning#list-all-api)

In [7]:
model_list = aiplatform.Model.list(filter = f"labels.series={SERIES}")
for m in model_list: print(m.name)

model_05_05f
model_05_05a
model_05_05i
model_05_05h
model_05_05g
model_05_05e
model_05_05d
model_05_05c
model_05_05b
model_05_05
model_05f_fraud
model_05e_fraud


In [10]:
for model in model_list[0:1]:
    print('Model Name:', model.name)
    versions = model.versioning_registry.list_versions()
    for v in versions:
        print('Model Version:', v.version_id, aiplatform.Model(model_name = f'{v.model_resource_name}@{v.version_id}').uri)

Model Name: model_05_05f
Getting versions for projects/1026793852137/locations/us-central1/models/model_05_05f
Model Version: 1 gs://statmike-mlops-349915/05/05f/models/20220927190441/model
Model Version: 2 gs://statmike-mlops-349915/05/05f/models/20221024130131/model
Model Version: 3 gs://statmike-mlops-349915/05/05f/models/20221101224649/model
Model Version: 4 gs://statmike-mlops-349915/05/05f/models/20221102030007/model
Model Version: 5 gs://statmike-mlops-349915/05/05f/models/20221109040010/model
Model Version: 6 gs://statmike-mlops-349915/05/05f/models/20221116040015/model
Model Version: 8 gs://statmike-mlops-349915/05/05f/models/20221130040015/model
Model Version: 9 gs://statmike-mlops-349915/05/05f/models/20221207040014/model
Model Version: 10 gs://statmike-mlops-349915/05/05f/models/20221214040012/model
Model Version: 11 gs://statmike-mlops-349915/05/05f/models/20221221040013/model
Model Version: 12 gs://statmike-mlops-349915/05/05f/models/20221228040012/model


---
## Create A Triton Inference Server Model Repository

The Triton inference server expects a prepared model repository in the specific layout listed below.  This section will prepare this folder in the GCS bucket for this project.

```
  <model-repository-path>/
    <model-name>/
      [config.pbtxt]
      [<output-labels-file> ...]
      <version>/
        <model-definition-file>
      <version>/
        <model-definition-file>
      ...
    <model-name>/
      [config.pbtxt]
      [<output-labels-file> ...]
      <version>/
        <model-definition-file>
      <version>/
        <model-definition-file>
      ...
    ...
```

- [Reference](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md)
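
The layout above maps directly onto GCS object paths. As a minimal sketch (the helper name `triton_repo_path` is hypothetical and not part of any library), the destination for each model version could be built as follows. It assumes TensorFlow SavedModel artifacts, which Triton expects in a directory named `model.savedmodel` inside a numerically named version folder:

```python
def triton_repo_path(bucket: str, series: str, experiment: str,
                     model_name: str, version_id: str) -> str:
    """Return the GCS destination for one model version in the Triton layout.

    Hypothetical helper: it mirrors the path convention used in this notebook,
    <repo>/<model-name>/<version>/model.savedmodel.
    """
    repo_root = f"gs://{bucket}/{series}/{experiment}/model_repo"
    return f"{repo_root}/{model_name}/{version_id}/model.savedmodel"


# Example with placeholder values (the bucket name is an assumption)
print(triton_repo_path("my-bucket", "05", "triton", "model_05_05f", "1"))
```

Keeping the path logic in one place makes it easier to add or replace a model later without retyping the folder structure.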

In [11]:
for model in model_list[0:1]:
    print('Model Name:', model.name)
    versions = model.versioning_registry.list_versions()
    for v in versions:
        version = aiplatform.Model(model_name = f'{v.model_resource_name}@{v.version_id}')
        print('Model Version:', v.version_id, version.uri)
        !gsutil -m cp -r $version.uri gs://$PROJECT_ID/$SERIES/$EXPERIMENT/model_repo/$model.name/$v.version_id/model.savedmodel

Model Name: model_05_05f
Getting versions for projects/1026793852137/locations/us-central1/models/model_05_05f
Model Version: 1 gs://statmike-mlops-349915/05/05f/models/20220927190441/model
Copying gs://statmike-mlops-349915/05/05f/models/20220927190441/model/saved_model.pb...
Copying gs://statmike-mlops-349915/05/05f/models/20220927190441/model/variables/variables.index...
Copying gs://statmike-mlops-349915/05/05f/models/20220927190441/model/variables/variables.data-00000-of-00001...
Model Version: 2 gs://statmike-mlops-349915/05/05f/models/20221024130131/model  
Copying gs://statmike-mlops-349915/05/05f/models/20221024130131/model/saved_model.pb...
Copying gs://statmike-mlops-349915/05/05f/models/20221024130131/model/variables/variables.data-00000-of-00001...
Copying gs://statmike-mlops-349915/05/05f/models/20221024130131/model/variables/variables.index...
Model Version: 3 gs://statmike-mlops-349915/05/05f/models/20221101224649/model  
Copying gs://statmike-mlops-349915/05/05f/models

In [12]:
print(f'Review the model repository path here:\nhttps://console.cloud.google.com/storage/browser/{PROJECT_ID}/{SERIES}/{EXPERIMENT}?project={PROJECT_ID}')

Review the model repository path here:
https://console.cloud.google.com/storage/browser/statmike-mlops-349915/05/triton?project=statmike-mlops-349915


---
## Copy Triton Image to Artifact Registry

Make a copy of a pre-built [Triton Inference Server](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver) container by using Docker to pull the image, tag it, and push it to Artifact Registry.

**Note:** Vertex AI prediction containers need to be in either Artifact Registry or Container Registry as documented [here](https://cloud.google.com/vertex-ai/docs/predictions/use-custom-container#push_the_container_image_to_or).


In [13]:
TRITON_IMAGE = "nvcr.io/nvidia/tritonserver:22.01-py3"

REPOSITORY = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{PROJECT_ID}"

AR_IMAGE = f"{REPOSITORY}/{SERIES}_{EXPERIMENT}:22.01"

In [14]:
TRITON_IMAGE, REPOSITORY, AR_IMAGE

('nvcr.io/nvidia/tritonserver:22.01-py3',
 'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915',
 'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05_triton:22.01')

In [15]:
!docker pull $TRITON_IMAGE

22.01-py3: Pulling from nvidia/tritonserver
Digest: sha256:bb4c71b62bf206c8d6b0db57b66c18e86b471f6549676849508de2afe9f435c0
Status: Image is up to date for nvcr.io/nvidia/tritonserver:22.01-py3
nvcr.io/nvidia/tritonserver:22.01-py3


In [16]:
#!docker rmi $(docker images --filter "dangling=true" -q)

In [17]:
#!docker rmi $(docker images "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/*" -q)

In [18]:
!docker tag $TRITON_IMAGE $AR_IMAGE

In [19]:
!gcloud auth configure-docker $REGION-docker.pkg.dev --quiet


{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
gcloud credential helpers already registered correctly.


In [20]:
!docker push $AR_IMAGE

The push refers to repository [us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05_triton]

6d186137: Layer already exists
22.01: digest: sha256:4aac6b9b4b8865a5edd5b83cd72

---
## Upload The Triton Server As A Vertex AI Model Resource

Upload the Triton server as a Vertex AI Model where the model repository is the `artifact_uri` and the `serving_container_image_uri` is the pre-built [Triton Inference Server](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver) container that was copied to Artifact Registry above.  With `--strict-model-config=false`, as passed in `serving_container_args` below, the Triton Inference Server attempts to generate any missing model configuration automatically.  This covers all required settings for some model types, as documented [here](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#auto-generated-model-configuration).

The logic here checks whether the model already exists and, if it does, whether its serving container is the same.  If the model exists with the same serving container, the existing model is reused; otherwise a new model is created or a new version of the existing model is uploaded.
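
The three possible outcomes can be isolated in a small pure function, which is easier to reason about than the nested branches. This is only a sketch of the decision described above: the function name `plan_upload` and the action labels are hypothetical, and it compares container image URIs only.

```python
from typing import Optional


def plan_upload(existing_image_uri: Optional[str], new_image_uri: str) -> str:
    """Decide what to do with the Triton model resource.

    existing_image_uri is None when no matching model is in the registry.
    Returns one of the (hypothetical) labels: 'create', 'reuse', 'new_version'.
    """
    if existing_image_uri is None:
        return "create"          # nothing registered yet: upload a new model
    if existing_image_uri == new_image_uri:
        return "reuse"           # same serving container: keep the existing model
    return "new_version"         # different container: upload as a new version


print(plan_upload(None, "repo/05_triton:22.01"))
print(plan_upload("repo/05_triton:22.01", "repo/05_triton:22.01"))
```

Note that a change to the contents of the model repository alone would not trigger a new upload under this rule, since only the image URI is compared.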

In [21]:
modelmatch = aiplatform.Model.list(filter = f'display_name={SERIES}_{EXPERIMENT} AND labels.series={SERIES} AND labels.experiment={EXPERIMENT}')

upload_model = True
if modelmatch:
    print("Model Already in Registry:")
    if modelmatch[0].container_spec.image_uri == AR_IMAGE:
        print("This version already loaded, no action taken.")
        upload_model = False
        model = modelmatch[0]  # keep the Model object so model.name and model.version_id work below
    else:
        print('Loading model as new default version.')
        parent_model = modelmatch[0].resource_name
else:
    print('This is a new model, creating in model registry')
    parent_model = ''
if upload_model:    
    model = aiplatform.Model.upload(
        display_name = f'{SERIES}_{EXPERIMENT}',
        model_id = f'model_{SERIES}_{EXPERIMENT}',
        parent_model = parent_model,
        serving_container_image_uri = AR_IMAGE,
        serving_container_args = ['--strict-model-config=false'],
        artifact_uri = f"gs://{PROJECT_ID}/{SERIES}/{EXPERIMENT}/model_repo",
        is_default_version = True,
        labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}'}        
    )

This is a new model, creating in model registry
Creating Model
Create Model backing LRO: projects/1026793852137/locations/us-central1/models/model_05_triton/operations/3801582687353831424
Model created. Resource name: projects/1026793852137/locations/us-central1/models/model_05_triton@1
To use this Model in another session:
model = aiplatform.Model('projects/1026793852137/locations/us-central1/models/model_05_triton@1')


In [22]:
model.name

'model_05_triton'

In [23]:
model.version_id

'1'

In [24]:
print(f'Review the model in the Vertex AI Model Registry:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/models/{model.name}?project={PROJECT_ID}')

Review the model in the Vertex AI Model Registry:
https://console.cloud.google.com/vertex-ai/locations/us-central1/models/model_05_triton?project=statmike-mlops-349915


---
## Create/Retrieve The Endpoint For This Series

In [25]:
endpoints = aiplatform.Endpoint.list(filter = f"labels.series={SERIES}")
if endpoints:
    endpoint = endpoints[0]
    print(f"Endpoint Exists: {endpoints[0].resource_name}")
else:
    endpoint = aiplatform.Endpoint.create(
        display_name = f"{SERIES}",
        labels = {'series' : f"{SERIES}"}    
    )
    print(f"Endpoint Created: {endpoint.resource_name}")
    
print(f'Review the Endpoint in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint.name}?project={PROJECT_ID}')

Creating Endpoint
Create Endpoint backing LRO: projects/1026793852137/locations/us-central1/endpoints/319355351310794752/operations/795429936084025344
Endpoint created. Resource name: projects/1026793852137/locations/us-central1/endpoints/319355351310794752
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/1026793852137/locations/us-central1/endpoints/319355351310794752')
Endpoint Created: projects/1026793852137/locations/us-central1/endpoints/319355351310794752
Review the Endpoint in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/319355351310794752?project=statmike-mlops-349915


In [26]:
endpoint.display_name

'05'

In [27]:
endpoint.traffic_split

{}

In [28]:
deployed_models = endpoint.list_models()
[(d.display_name, d.model_version_id) for d in deployed_models]

[]

---
## Deploy Model To Endpoint

In [29]:
if (model.display_name, model.version_id) not in [(d.display_name, d.model_version_id) for d in deployed_models]:
    print(f'Deploying model with 100% of traffic...')
    endpoint.deploy(
        model = model,
        deployed_model_display_name = model.display_name,
        traffic_percentage = 100,
        machine_type = DEPLOY_COMPUTE,
        min_replica_count = 1,
        max_replica_count = 2,
        #service_account = SERVICE_ACCOUNT
    )
else:
    print(f'Not deploying because model = {model.display_name} with version {model.version_id} is already on endpoint = {endpoint.display_name}')

Deploying model with 100% of traffic...
Deploying Model projects/1026793852137/locations/us-central1/models/model_05_triton to Endpoint : projects/1026793852137/locations/us-central1/endpoints/319355351310794752
Deploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/319355351310794752/operations/8384276783180021760
Endpoint model deployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/319355351310794752


### Remove Deployed Models without Traffic

In [30]:
for deployed_model in endpoint.list_models():
    if deployed_model.id in endpoint.traffic_split:
        print(f"Model {deployed_model.display_name} with version {deployed_model.model_version_id} has traffic = {endpoint.traffic_split[deployed_model.id]}")
    else:
        endpoint.undeploy(deployed_model_id = deployed_model.id)
        print(f"Undeploying {deployed_model.display_name} with version {deployed_model.model_version_id} because it has no traffic.")

Model 05_triton with version 1 has traffic = 100


In [31]:
endpoint.traffic_split

{'7897814012547563520': 100}

In [32]:
[d.display_name for d in endpoint.list_models()]

['05_triton']

---
## Prepare a record for prediction: instance and parameters lists

In [33]:
n = 10
pred = bq.query(
    query = f"""
        SELECT * EXCEPT(splits, {VAR_TARGET}, {', '.join(VAR_OMIT.split())})
        FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}
        WHERE splits='TEST'
        LIMIT {n}
        """
).to_dataframe()

In [34]:
pred

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,35337,1.092844,-0.01323,1.359829,2.731537,-0.707357,0.873837,-0.79613,0.437707,0.39677,...,-0.240428,0.037603,0.380026,-0.167647,0.027557,0.592115,0.219695,0.03697,0.010984,0.0
1,60481,1.238973,0.035226,0.063003,0.641406,-0.260893,-0.580097,0.049938,-0.034733,0.405932,...,-0.26508,-0.060003,-0.053585,-0.057718,0.104983,0.537987,0.589563,-0.046207,-0.006212,0.0
2,139587,1.870539,0.211079,0.224457,3.889486,-0.380177,0.249799,-0.577133,0.179189,-0.120462,...,-0.374356,0.196006,0.656552,0.180776,-0.060226,-0.228979,0.080827,0.009868,-0.036997,0.0
3,162908,-3.368339,-1.980442,0.153645,-0.159795,3.847169,-3.516873,-1.209398,-0.292122,0.760543,...,-0.923275,-0.545992,-0.252324,-1.171627,0.214333,-0.159652,-0.060883,1.294977,0.120503,0.0
4,165236,2.180149,0.218732,-2.637726,0.348776,1.063546,-1.249197,0.942021,-0.547652,-0.087823,...,-0.250653,0.234502,0.825237,-0.176957,0.563779,0.730183,0.707494,-0.131066,-0.090428,0.0
5,62606,1.199408,0.352007,0.379645,1.372017,0.291347,0.524919,-0.117555,0.132907,-0.935169,...,-0.042979,-0.050291,-0.126609,-0.022218,-0.599026,0.258188,0.928721,-0.058988,-0.008856,0.0
6,90719,1.937447,0.337882,-0.00063,3.816486,0.276515,1.079842,-0.730626,0.197353,1.137566,...,-0.315667,-0.038376,0.208914,0.160189,-0.015145,-0.162678,-0.000843,-0.018178,-0.039339,0.0
7,113350,1.8919,0.401086,-0.119983,4.0475,0.049952,0.192793,-0.108512,-0.0404,-0.390391,...,-0.267639,0.094177,0.613712,0.070986,0.079543,0.135219,0.128961,0.003667,-0.045079,0.0
8,156499,0.060003,1.461355,0.378915,2.835455,1.626526,-0.164732,1.551858,-0.412927,-1.735264,...,-0.175275,0.042293,0.277536,-0.123379,1.081552,-0.053079,-0.149809,-0.314438,-0.216539,0.0
9,73902,-1.85926,2.158799,1.085671,2.615483,0.24666,2.133925,-1.569015,-2.612353,-1.312509,...,0.590142,-0.867178,-0.700479,0.231972,-1.374527,0.140285,0.128806,0.153606,0.092042,0.0


In [35]:
newobs = pred.to_dict(orient='records')
#newobs[0]

In [36]:
newobs[0]

{'Time': 35337,
 'V1': 1.0928441854981998,
 'V2': -0.0132303486713432,
 'V3': 1.35982868199426,
 'V4': 2.7315370965921004,
 'V5': -0.707357349219652,
 'V6': 0.8738370029866129,
 'V7': -0.7961301510622031,
 'V8': 0.437706509544851,
 'V9': 0.39676985012996396,
 'V10': 0.587438102569443,
 'V11': -0.14979756231827498,
 'V12': 0.29514781622888103,
 'V13': -1.30382621882143,
 'V14': -0.31782283120234495,
 'V15': -2.03673231037199,
 'V16': 0.376090905274179,
 'V17': -0.30040350116459497,
 'V18': 0.433799615590844,
 'V19': -0.145082264348681,
 'V20': -0.240427548108996,
 'V21': 0.0376030733329398,
 'V22': 0.38002620963091405,
 'V23': -0.16764742731151097,
 'V24': 0.0275573495476881,
 'V25': 0.59211469704354,
 'V26': 0.219695164116351,
 'V27': 0.0369695108704894,
 'V28': 0.010984441006191,
 'Amount': 0.0}

In [37]:
[{"name": key, "data": newobs[0][key], "datatype": "FP32", "shape": [1,1]} for key in newobs[0]]

[{'name': 'Time', 'data': 35337, 'datatype': 'FP32', 'shape': [1, 1]},
 {'name': 'V1',
  'data': 1.0928441854981998,
  'datatype': 'FP32',
  'shape': [1, 1]},
 {'name': 'V2',
  'data': -0.0132303486713432,
  'datatype': 'FP32',
  'shape': [1, 1]},
 {'name': 'V3', 'data': 1.35982868199426, 'datatype': 'FP32', 'shape': [1, 1]},
 {'name': 'V4',
  'data': 2.7315370965921004,
  'datatype': 'FP32',
  'shape': [1, 1]},
 {'name': 'V5',
  'data': -0.707357349219652,
  'datatype': 'FP32',
  'shape': [1, 1]},
 {'name': 'V6',
  'data': 0.8738370029866129,
  'datatype': 'FP32',
  'shape': [1, 1]},
 {'name': 'V7',
  'data': -0.7961301510622031,
  'datatype': 'FP32',
  'shape': [1, 1]},
 {'name': 'V8',
  'data': 0.437706509544851,
  'datatype': 'FP32',
  'shape': [1, 1]},
 {'name': 'V9',
  'data': 0.39676985012996396,
  'datatype': 'FP32',
  'shape': [1, 1]},
 {'name': 'V10',
  'data': 0.587438102569443,
  'datatype': 'FP32',
  'shape': [1, 1]},
 {'name': 'V11',
  'data': -0.14979756231827498,
  'dat

---
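
The request body sent to Triton lists one input tensor per feature. A minimal sketch of a helper that builds such a body from a single record follows; the name `build_infer_request` is hypothetical, and it assumes every feature is a scalar `FP32` input of shape `[1, 1]`, as in the cells of this notebook. The `data` field is wrapped in a list so that it matches the declared shape.

```python
import json


def build_infer_request(record: dict, request_id: str = "1") -> dict:
    """Build an inference request body with one FP32 input per feature.

    Assumes each feature is a single numeric value (shape [1, 1]).
    """
    inputs = [
        {"name": name, "shape": [1, 1], "datatype": "FP32", "data": [float(value)]}
        for name, value in record.items()
    ]
    return {"id": request_id, "inputs": inputs}


# Two features taken from the first record shown above
body = build_infer_request({"Time": 35337, "Amount": 0.0})
print(json.dumps(body, indent=2))
```

In the notebook this would be called as `build_infer_request(newobs[0])`.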

In [64]:
client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}

In [65]:
predictor = aiplatform.gapic.PredictionServiceClient(client_options = client_options)

In [66]:
endpoint.resource_name

'projects/1026793852137/locations/us-central1/endpoints/319355351310794752'

In [67]:
instances = {"id": "1", "model_name": "model_05_05f", "model_version": "1", "inputs": [{"name": key, "data": [newobs[0][key]], "datatype": "FP32", "shape": [1,1]} for key in newobs[0]]}
http_body = httpbody_pb2.HttpBody(
        data = json.dumps(instances).encode("utf-8"),
        content_type = "application/json"
    )
request = aiplatform.gapic.RawPredictRequest(
    endpoint = endpoint.resource_name,
    http_body = http_body
)

In [68]:
response = predictor.raw_predict(
    request = request
)

In [69]:
json.loads(response.data)

{'id': '1',
 'model_name': 'model_05_05f',
 'model_version': '12',
 'outputs': [{'name': 'logistic',
   'datatype': 'FP32',
   'shape': [1, 2],
   'data': [0.9997530579566956, 0.00024691224098205566]}]}
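
The response lists each output tensor with its name, shape, and flattened data. As a small sketch (the helper name `outputs_by_name` is hypothetical), the outputs can be collected into a dictionary keyed by tensor name, using the response shown above as sample input:

```python
def outputs_by_name(response: dict) -> dict:
    """Map each output tensor name to its flattened data list."""
    return {out["name"]: out["data"] for out in response.get("outputs", [])}


# Sample copied from the response shown above
sample = {
    "id": "1",
    "model_name": "model_05_05f",
    "model_version": "12",
    "outputs": [
        {
            "name": "logistic",
            "datatype": "FP32",
            "shape": [1, 2],
            "data": [0.9997530579566956, 0.00024691224098205566],
        }
    ],
}

scores = outputs_by_name(sample)["logistic"]
print(scores)
```

With shape `[1, 2]` there is one row of two values; which position corresponds to which `Class` value depends on how the model was trained and is not assumed here.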

---

In [55]:
import requests

In [71]:
token = !gcloud auth application-default print-access-token
headers = {
    "content-type": "application/json; charset=utf-8",
    "Authorization": f'Bearer {token[0]}'
}

In [72]:
json_response = requests.post(
    f'https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:rawPredict',
    data=json.dumps(instances),
    headers=headers
)

In [73]:
json_response

<Response [200]>

In [74]:
json.loads(json_response.text)

{'id': '1',
 'model_name': 'model_05_05f',
 'model_version': '12',
 'outputs': [{'name': 'logistic',
   'datatype': 'FP32',
   'shape': [1, 2],
   'data': [0.9997530579566956, 0.00024691224098205566]}]}
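
When several models share one Triton server, a request can be routed to a specific model, and optionally a specific version, through the `X-Vertex-Ai-Triton-Redirect` header. The sketch below builds that header value; the helper names are hypothetical, and the route format follows the `v2/models/<name>/versions/<version>/infer` pattern used in this notebook.

```python
from typing import Optional


def triton_infer_route(model_name: str, version: Optional[str] = None) -> str:
    """Return the Triton inference route for a model and optional version."""
    if version is None:
        return f"v2/models/{model_name}/infer"
    return f"v2/models/{model_name}/versions/{version}/infer"


def redirect_headers(token: str, model_name: str,
                     version: Optional[str] = None) -> dict:
    """Build request headers that route a rawPredict call to one model."""
    return {
        "content-type": "application/json; charset=utf-8",
        "X-Vertex-Ai-Triton-Redirect": triton_infer_route(model_name, version),
        "Authorization": f"Bearer {token}",
    }


print(triton_infer_route("model_05_05f"))
print(triton_infer_route("model_05_05f", "12"))
```

The resulting dictionary can be passed as `headers` to `requests.post` in the same way as the cells in this section.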

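Change the version with the `X-Vertex-Ai-Triton-Redirect` header (the request below fails because version 11 is not being served):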

In [75]:
headers = {
    "content-type": "application/json; charset=utf-8",
    "X-Vertex-Ai-Triton-Redirect": "v2/models/model_05_05f/versions/11/infer",
    "Authorization": f'Bearer {token[0]}'
}
json_response = requests.post(
    f'https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:rawPredict',
    data=json.dumps(instances),
    headers=headers
)
json.loads(json_response.text)

{'error': "Request for unknown model: 'model_05_05f' version 11 is not found"}
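
Triton's default version policy makes only the latest version of each model available, which is consistent with the error above and with the `model_version` of `12` in the earlier responses. To make older versions addressable, a `config.pbtxt` with a different `version_policy` could be placed in the model's folder. The following is a minimal sketch that was not run as part of this notebook: the helper name is hypothetical, and it assumes TensorFlow SavedModel artifacts and that a partial configuration is completed automatically when the server runs with `--strict-model-config=false`.

```python
def version_policy_config(model_name: str,
                          platform: str = "tensorflow_savedmodel") -> str:
    """Return the text of a partial config.pbtxt that serves all versions.

    Assumption: inputs and outputs are left for Triton to auto-complete.
    """
    return (
        f'name: "{model_name}"\n'
        f'platform: "{platform}"\n'
        "version_policy: { all { } }\n"
    )


config_text = version_policy_config("model_05_05f")
print(config_text)
```

The text would be written to `<model-repository-path>/model_05_05f/config.pbtxt`, after which the server needs to reload the repository (for example by redeploying) before other versions can be requested.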

In [76]:
endpoint.delete(force = True)

Undeploying Endpoint model: projects/1026793852137/locations/us-central1/endpoints/319355351310794752
Undeploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/319355351310794752/operations/974729496248713216
Endpoint model undeployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/319355351310794752
Deleting Endpoint : projects/1026793852137/locations/us-central1/endpoints/319355351310794752
Delete Endpoint  backing LRO: projects/1026793852137/locations/us-central1/operations/1304336693976891392
Endpoint deleted. . Resource name: projects/1026793852137/locations/us-central1/endpoints/319355351310794752


In [77]:
model.delete()

Deleting Model : projects/1026793852137/locations/us-central1/models/model_05_triton
Delete Model  backing LRO: projects/1026793852137/locations/us-central1/operations/8982974058643587072
Model deleted. . Resource name: projects/1026793852137/locations/us-central1/models/model_05_triton
