# Deploying NVIDIA Triton Inference Server in AI Platform Prediction Custom Container (REST API)

In this notebook, we walk through deploying NVIDIA's Triton Inference Server to the AI Platform Prediction Custom Container service in Direct Model Server mode:

![](img/caip_triton_container_diagram_direct.jpg)


In [None]:
PROJECT_ID='[Enter project name - REQUIRED]'
REPOSITORY='caipcustom'
REGION='us-central1'
TRITON_VERSION='20.06'

In [None]:
import os
import random
import requests
import json

MODEL_BUCKET='gs://{}-{}'.format(PROJECT_ID,random.randint(10000,99999))
ENDPOINT='https://{}-ml.googleapis.com/v1'.format(REGION)
TRITON_IMAGE='tritonserver:{}-py3'.format(TRITON_VERSION)
CAIP_IMAGE='{}-docker.pkg.dev/{}/{}/{}'.format(REGION,PROJECT_ID,REPOSITORY,TRITON_IMAGE)

In [1]:
PROJECT_ID='tsaikevin-1238'
REPOSITORY='caipcustom'
REGION='us-central1'
TRITON_VERSION='20.06'

import os
import random
import requests
import json

MODEL_BUCKET='gs://{}-{}'.format(PROJECT_ID,random.randint(10000,99999))
ENDPOINT='https://{}-ml.googleapis.com/v1'.format(REGION)
TRITON_IMAGE='tritonserver:{}-py3'.format(TRITON_VERSION)
CAIP_IMAGE='{}-docker.pkg.dev/{}/{}/{}'.format(REGION,PROJECT_ID,REPOSITORY,TRITON_IMAGE)

In [2]:
!gcloud config set project $PROJECT_ID

Updated property [core/project].


In [3]:
MODEL_BUCKET='gs://tsaikevin-1238-80838'

In [42]:
print(MODEL_BUCKET)
print(ENDPOINT)
print(TRITON_IMAGE)
print(CAIP_IMAGE)


gs://tsaikevin-1238-80838
https://us-central1-ml.googleapis.com/v1
tritonserver:20.06-py3
us-central1-docker.pkg.dev/tsaikevin-1238/caipcustom/tritonserver:20.06-py3
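
The image URI printed above follows the Artifact Registry pattern `{region}-docker.pkg.dev/{project}/{repository}/{image}`. As a minimal sketch, the helper below composes that URI from the same values used in this notebook. The function name `artifact_registry_image` is hypothetical and is not part of the notebook or of any Google Cloud library.

```python
def artifact_registry_image(region, project_id, repository, image):
    """Compose a Docker image URI for Artifact Registry (illustrative helper)."""
    return "{}-docker.pkg.dev/{}/{}/{}".format(region, project_id, repository, image)


# Same values as the configuration cells above.
triton_image = "tritonserver:{}-py3".format("20.06")
caip_image = artifact_registry_image(
    "us-central1", "tsaikevin-1238", "caipcustom", triton_image)
print(caip_image)
```

Keeping the composition in one place makes it harder for the `docker tag` target and the Model Version specification to drift apart.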


In [4]:
os.environ["PROJECT_ID"]=PROJECT_ID
os.environ["MODEL_BUCKET"]=MODEL_BUCKET
os.environ["ENDPOINT"]=ENDPOINT
os.environ["CAIP_IMAGE"]=CAIP_IMAGE

### Create the Artifact Registry
This will be used to store the container image for the model server Triton.

In [5]:
!gcloud beta artifacts repositories create $REPOSITORY --repository-format=docker --location=$REGION

ERROR: (gcloud.beta.artifacts.repositories.create) ALREADY_EXISTS: the repository already exists


In [6]:
!gcloud beta auth configure-docker $REGION-docker.pkg.dev --quiet


{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
gcloud credential helpers already registered correctly.


### Prepare the container
We will copy the Triton container image into Artifact Registry, because AI Platform Prediction Custom Container pulls images only from there during Model Version setup. The following steps download the NVIDIA Triton Inference Server container to your VM, then upload it to your repository.

In [7]:
!docker pull nvcr.io/nvidia/$TRITON_IMAGE && \
 docker tag nvcr.io/nvidia/$TRITON_IMAGE $CAIP_IMAGE && \
 docker push $CAIP_IMAGE

20.06-py3: Pulling from nvidia/tritonserver
Digest: sha256:36f94c39221c4e19921d44296690991057bbebbb15f59dacd88e25ff331bd307
Status: Image is up to date for nvcr.io/nvidia/tritonserver:20.06-py3
nvcr.io/nvidia/tritonserver:20.06-py3
The push refers to repository [us-central1-docker.pkg.dev/tsaikevin-1238/caipcustom/tritonserver]

### Prepare model Artifacts

Clone the NVIDIA Triton Inference Server repo.

In [8]:
!git clone https://github.com/NVIDIA/triton-inference-server.git

Cloning into 'triton-inference-server'...
remote: Enumerating objects: 285, done.
remote: Counting objects: 100% (285/285), done.
remote: Compressing objects: 100% (190/190), done.
remote: Total 25484 (delta 149), reused 161 (delta 89), pack-reused 25199
Receiving objects: 100% (25484/25484), 14.34 MiB | 23.99 MiB/s, done.
Resolving deltas: 100% (18800/18800), done.


Create the GCS bucket to which the model artifacts will be copied.

In [9]:
!gsutil mb $MODEL_BUCKET

Creating gs://tsaikevin-1238-80838/...


Stage model artifacts and copy to bucket.

In [10]:
!mkdir model_repository

In [11]:
!cp -R triton-inference-server/docs/examples/model_repository/* model_repository/

In [12]:
%cd triton-inference-server
!git checkout r$TRITON_VERSION
%cd ..
%ls -l

/home/jupyter/caip-triton/v2/simple_setup/triton-inference-server
Branch r20.06 set up to track remote branch r20.06 from origin.
Switched to a new branch 'r20.06'
/home/jupyter/caip-triton/v2/simple_setup
total 88
-rw-r--r--  1 jupyter jupyter 13652 Oct 19 08:48 get_request_body_simple.py
drwxr-xr-x  2 jupyter jupyter  4096 Oct 19 08:48 img/
drwxr-xr-x  6 jupyter jupyter  4096 Oct 26 07:24 model_repository/
-rw-r--r--  1 jupyter jupyter  1605 Oct 22 18:27 README.md
drwxr-xr-x 10 jupyter jupyter  4096 Oct 26 07:24 triton-inference-server/
-rw-r--r--  1 jupyter jupyter 34769 Oct 26 07:23 triton-simple-setup-rest.ipynb
-rw-r--r--  1 jupyter jupyter 20384 Oct 26 07:07 triton-simple-setup-sdk.ipynb


In [14]:
!./triton-inference-server/docs/examples/fetch_models.sh

+ mkdir -p model_repository/resnet50_netdef/1
+ wget -O model_repository/resnet50_netdef/1/model.netdef http://download.caffe2.ai.s3.amazonaws.com/models/resnet50/predict_net.pb
--2020-10-26 07:25:15--  http://download.caffe2.ai.s3.amazonaws.com/models/resnet50/predict_net.pb
Resolving download.caffe2.ai.s3.amazonaws.com (download.caffe2.ai.s3.amazonaws.com)... 52.216.147.11
Connecting to download.caffe2.ai.s3.amazonaws.com (download.caffe2.ai.s3.amazonaws.com)|52.216.147.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31649 (31K) [binary/octet-stream]
Saving to: ‘model_repository/resnet50_netdef/1/model.netdef’


2020-10-26 07:25:15 (1.05 MB/s) - ‘model_repository/resnet50_netdef/1/model.netdef’ saved [31649/31649]

+ wget -O model_repository/resnet50_netdef/1/init_model.netdef http://download.caffe2.ai.s3.amazonaws.com/models/resnet50/init_net.pb
--2020-10-26 07:25:15--  http://download.caffe2.ai.s3.amazonaws.com/models/resnet50/init_net.pb
Resolving downl

In [15]:
!gsutil -m cp -R model_repository/ $MODEL_BUCKET

Copying file://model_repository/simple/config.pbtxt [Content-Type=application/octet-stream]...
Copying file://model_repository/simple/1/model.graphdef [Content-Type=application/octet-stream]...
Copying file://model_repository/inception_graphdef/inception_labels.txt [Content-Type=text/plain]...
Copying file://model_repository/inception_graphdef/1/model.graphdef [Content-Type=application/octet-stream]...
Copying file://model_repository/resnet50_netdef/1/init_model.netdef [Content-Type=application/octet-stream]...
Copying file://model_repository/resnet50_netdef/1/model.netdef [Content-Type=application/octet-stream]...
Copying file://model_repository/inception_graphdef/config.pbtxt [Content-Type=application/octet-stream]...
Copying file://model_repository/densenet_onnx/config.pbtxt [Content-Type=application/octet-stream]...
Copying file://model_repository/densenet_onnx/densenet_labels.txt [Content-Type=text/plain]...
Copying file://model_repository/densenet_onnx/1/model.onnx [Content-Type=

In [16]:
!gsutil ls $MODEL_BUCKET/model_repository

gs://tsaikevin-1238-80838/model_repository/densenet_onnx/
gs://tsaikevin-1238-80838/model_repository/inception_graphdef/
gs://tsaikevin-1238-80838/model_repository/resnet50_netdef/
gs://tsaikevin-1238-80838/model_repository/simple/
gs://tsaikevin-1238-80838/model_repository/simple_string/
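
The files copied above suggest the layout Triton reads: one directory per model, holding a `config.pbtxt` and at least one numbered version directory with the model file. A minimal sketch of a local check before uploading is shown below. The helper name `check_model_repository` is hypothetical, and the rule that every model must ship a `config.pbtxt` is an assumption based on the example models in this notebook.

```python
import os
import tempfile


def check_model_repository(root):
    """Map each model directory under root to a list of layout problems."""
    problems = {}
    for model in sorted(os.listdir(root)):
        model_dir = os.path.join(root, model)
        if not os.path.isdir(model_dir):
            continue
        issues = []
        # Assumption: every model ships its own config.pbtxt.
        if not os.path.isfile(os.path.join(model_dir, "config.pbtxt")):
            issues.append("missing config.pbtxt")
        # Version directories are named with digits only, e.g. "1".
        versions = [d for d in os.listdir(model_dir)
                    if d.isdigit() and os.path.isdir(os.path.join(model_dir, d))]
        if not versions:
            issues.append("no numeric version directory")
        problems[model] = issues
    return problems


# Demonstration on a throwaway directory rather than the real repository.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "simple", "1"))
    open(os.path.join(root, "simple", "config.pbtxt"), "w").close()
    os.makedirs(os.path.join(root, "broken"))
    report = check_model_repository(root)

print(report)
```

Running the same function on `model_repository` before the `gsutil cp` step would flag an incomplete model directory locally, instead of surfacing it later as a failed Model Version.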


### Prepare request payload

To prepare the payload, we have included a utility, `get_request_body_simple.py`. It requires the following library:

In [17]:
!pip3 install geventhttpclient



#### Prepare non-binary request payload

The first model will illustrate a non-binary payload.  The following command will create a KF Serving v2 format non-binary payload to be used with the "simple" model:

In [18]:
!python3 get_request_body_simple.py -m simple
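
For orientation, a non-binary request body carries each tensor's name, shape, datatype, and values directly in JSON. The sketch below builds such a body with the standard library, using the tensor names and values that appear in the `curl` prediction request later in this notebook. The helper `make_input` is hypothetical, and the file written by `get_request_body_simple.py` may differ in detail.

```python
import json


def make_input(name, values, shape, datatype="INT32"):
    """Describe one input tensor with its data inline in the JSON body."""
    return {
        "name": name,
        "shape": list(shape),
        "datatype": datatype,
        "parameters": {},
        "data": list(values),
    }


request_body = {
    "id": "0",
    "inputs": [
        make_input("INPUT0", range(16), (1, 16)),
        make_input("INPUT1", [-1] * 16, (1, 16)),
    ],
}

# Serialize to the string that would be sent as the HTTP request body.
payload = json.dumps(request_body)
print(payload)
```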

#### Prepare binary request payload

Triton's implementation of KF Serving v2 protocol for binary data appends the binary data after the json body.  Triton requires an additional header for offset:

`Inference-Header-Content-Length: [offset]`

We have provided a script that resizes the image to the input size ResNet-50 expects, [224, 224, 3], and calculates the offset.  The following command takes an image file and outputs the data structure to be used with the "resnet50_netdef" model.  Note this offset, as it will be used later.

In [19]:
!python3 get_request_body_simple.py -m image -f triton-inference-server/qa/images/mug.jpg

(3, 224, 224)
Add Header: Inference-Header-Content-Length: 138
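
To make the offset concrete: in the binary format, the request body is the JSON header followed immediately by the raw tensor bytes, and `Inference-Header-Content-Length` is the byte length of the JSON part. The sketch below builds such a body for a small FP32 tensor. The helper name `build_binary_request` and the tensor name `INPUT_TENSOR` are hypothetical, and the output is not expected to reproduce the value 138 printed above, because the utility's exact JSON is not shown here.

```python
import json
import struct


def build_binary_request(input_name, shape, values):
    """Return (body, header_length) for one FP32 input sent as binary data."""
    # Raw little-endian float32 bytes for the tensor values.
    raw = struct.pack("<{}f".format(len(values)), *values)
    header = {
        "inputs": [{
            "name": input_name,
            "shape": list(shape),
            "datatype": "FP32",
            # Tells the server how many bytes after the JSON belong to this input.
            "parameters": {"binary_data_size": len(raw)},
        }]
    }
    header_bytes = json.dumps(header).encode("utf-8")
    return header_bytes + raw, len(header_bytes)


body, header_length = build_binary_request("INPUT_TENSOR", (1, 4), [0.0, 1.0, 2.0, 3.0])
print("Inference-Header-Content-Length:", header_length)
print("total body bytes:", len(body))
```

The server uses the header length to split the body: everything before the offset is parsed as JSON, and everything after is read as tensor bytes.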


## Create and deploy Model and Model Version

In this section, we will deploy two models:
1. Simple model with non-binary data.  The KF Serving v2 protocol specifies a JSON format with the tensor data in the JSON body itself.
2. ResNet-50 model with binary data.  This uses Triton's binary data implementation of the KF Serving v2 protocol.

### Simple model (non-binary data)

#### Create Model

AI Platform Prediction uses a Model/Model Version Hierarchy, where the Model is a logical grouping of Model Versions.  We will first create the Model.

Because the MODEL_NAME variable is used later to specify the predict route, and Triton uses that route to run prediction on a specific model, its value must be the name of a model in the model repository.  For this section, we will use the "simple" model.

In [56]:
%env MODEL_NAME=simple

env: MODEL_NAME=simple


In [57]:
!curl -X \
    POST -v -k -H "Content-Type: application/json" \
    -d "{'name': '"$MODEL_NAME"'}" \
    -H "Authorization: Bearer `gcloud auth print-access-token`" \
    "${ENDPOINT}/projects/${PROJECT_ID}/models/"

Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 172.253.114.95:443...
* Connected to us-central1-ml.googleapis.com (172.253.114.95) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /opt/conda/ssl/cacert.pem
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=upload.video.google.com
*  start date: Oct  6 06:40:00 2020 GMT
*  expire date: Dec 29 06

#### Create Model Version

After the Model is created, we can now create a Model Version under this Model.  Each Model Version will need a name that is unique within the Model.  In AI Platform Prediction Custom Container, a {Project}/{Model}/{ModelVersion} uniquely identifies the specific container and model artifact used for inference.

In [58]:
%env VERSION_NAME=v01

env: VERSION_NAME=v01


The following specifications tell AI Platform how to create the Model Version.

In [59]:
import json
import os

triton_simple_version = {
  "name": os.getenv("VERSION_NAME"),
  "deployment_uri": os.getenv("MODEL_BUCKET")+"/model_repository",
  "container": {
    "image": os.getenv("CAIP_IMAGE"),
    "args": ["tritonserver",
             "--model-repository=$(AIP_STORAGE_URI)"
    ],
    "env": [
    ], 
    "ports": [
      { "containerPort": 8000 }
    ]
  },
  "routes": {
    "predict": "/v2/models/"+os.getenv("MODEL_NAME")+"/infer",
    "health": "/v2/models/"+os.getenv("MODEL_NAME")
  },
  "machine_type": "n1-standard-4",
  "acceleratorConfig": {
    "count":1,
    "type":"nvidia-tesla-t4"
  },
  "autoScaling": {
    "minNodes": 1
  }
}

with open("triton_simple_version.json", "w") as f: 
  json.dump(triton_simple_version, f)
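
The specification above depends on only four values: the version name, the model name, the bucket, and the image. As a sketch, a hypothetical helper `build_version_spec` (not part of the notebook) could generate it for any model in the repository. The bucket and image arguments in the demonstration call are placeholders.

```python
import json


def build_version_spec(version_name, model_name, model_bucket, image):
    """Build a Model Version specification that serves one Triton model."""
    return {
        "name": version_name,
        "deployment_uri": model_bucket + "/model_repository",
        "container": {
            "image": image,
            "args": ["tritonserver", "--model-repository=$(AIP_STORAGE_URI)"],
            "env": [],
            "ports": [{"containerPort": 8000}],
        },
        # Both routes embed the model name, so it must match a model in the repository.
        "routes": {
            "predict": "/v2/models/" + model_name + "/infer",
            "health": "/v2/models/" + model_name,
        },
        "machine_type": "n1-standard-4",
        "acceleratorConfig": {"count": 1, "type": "nvidia-tesla-t4"},
        "autoScaling": {"minNodes": 1},
    }


# Placeholder bucket and image, for illustration only.
spec = build_version_spec("v01", "simple", "gs://example-bucket", "example-image:tag")
print(json.dumps(spec["routes"], indent=2))
```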

In [60]:
!curl -X \
    POST -v -k -H "Content-Type: application/json" \
    -d @triton_simple_version.json \
    -H "Authorization: Bearer `gcloud auth print-access-token`" \
    "${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions"

Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 74.125.124.95:443...
* Connected to us-central1-ml.googleapis.com (74.125.124.95) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /opt/conda/ssl/cacert.pem
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=upload.video.google.com
*  start date: Oct  6 06:40:00 2020 GMT
*  expire date: Dec 29 06:4

#### Check the status of Model Version creation

Creating a Model Version may take several minutes.  You can check the status of this specific Model Version with the following request; a successful deployment will show:

`"state": "READY"`

In [65]:
!curl -X GET -k -H "Content-Type: application/json" \
    -H "Authorization: Bearer `gcloud auth print-access-token`" \
    "${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions/${VERSION_NAME}" 

{
  "name": "projects/tsaikevin-1238/models/simple/versions/v01",
  "deploymentUri": "gs://tsaikevin-1238-80838/model_repository",
  "createTime": "2020-10-26T08:06:33Z",
  "autoScaling": {
    "minNodes": 1
  },
  "state": "FAILED",
  "errorMessage": "Model server terminated: model server container terminated: exit_code: 1\nreason: \"Error\"\nstarted_at {\n  seconds: 1603701184\n}\nfinished_at {\n  seconds: 1603701195\n}\n",
  "etag": "epzDdKe/dRw=",
  "machineType": "n1-standard-4",
  "acceleratorConfig": {
    "count": "1",
    "type": "NVIDIA_TESLA_T4"
  },
  "container": {
    "image": "us-central1-docker.pkg.dev/tsaikevin-1238/caipcustom/tritonserver:20.06-py3",
    "args": [
      "tritonserver",
      "--model-repository=$(AIP_STORAGE_URI)"
    ],
    "ports": [
      {
        "containerPort": 8000
      }
    ]
  },
  "routes": {
    "predict": "/v2/models/simple/infer",
    "health": "/v2/models/simple"
  }
}
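
Rather than re-running the status request by hand, the check can be wrapped in a polling loop. The sketch below is one way to do that. The function `wait_for_version` is hypothetical, and it treats every state other than `CREATING` as final, which is an assumption based on the states seen in this notebook (`CREATING`, `READY`, `FAILED`). In real use, `fetch_state` would issue the GET request above and return the `state` field; here it is replaced by canned values so the example runs offline.

```python
import time


def wait_for_version(fetch_state, interval_seconds=30, timeout_seconds=1800,
                     sleep=time.sleep):
    """Poll fetch_state() until it returns something other than CREATING."""
    waited = 0
    while True:
        state = fetch_state()
        if state != "CREATING":
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(
                "Model Version still CREATING after {} seconds".format(waited))
        sleep(interval_seconds)
        waited += interval_seconds


# Offline demonstration with canned states and no real sleeping.
canned_states = iter(["CREATING", "CREATING", "READY"])
final_state = wait_for_version(lambda: next(canned_states),
                               interval_seconds=0, sleep=lambda seconds: None)
print(final_state)
```

A caller should still inspect the returned state: `FAILED`, as in the output above, ends the loop just as `READY` does.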


#### To list all Model Versions and their states in this Model:

In [38]:
!curl -X GET -k -H "Content-Type: application/json" \
    -H "Authorization: Bearer `gcloud auth print-access-token`" \
    "${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions/" 

{
  "versions": [
    {
      "name": "projects/tsaikevin-1238/models/simple/versions/v01",
      "deploymentUri": "gs://tsaikevin-1238-80838/model_repository",
      "createTime": "2020-10-26T07:26:01Z",
      "autoScaling": {
        "minNodes": 1
      },
      "state": "CREATING",
      "etag": "gGWWjmXn/Os=",
      "machineType": "n1-standard-4",
      "acceleratorConfig": {
        "count": "1",
        "type": "NVIDIA_TESLA_T4"
      },
      "container": {
        "image": "us-central1-docker.pkg.dev/tsaikevin-1238/caipcustom/tritonserver:20.06-py3",
        "args": [
          "tritonserver",
          "--model-repository=$(AIP_STORAGE_URI)"
        ],
        "ports": [
          {
            "containerPort": 8000
          }
        ]
      },
      "routes": {
        "predict": "/v2/models/simple/infer",
        "health": "/v2/models/simple"
      }
    }
  ]
}


#### Run prediction using `curl`

The "simple" model takes two tensors with shape [1,16] and performs two basic arithmetic operations on them.

In [28]:
!curl -X POST ${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions/${VERSION_NAME}:predict \
    -k -H "Content-Type: application/json" \
    -H "Authorization: Bearer `gcloud auth print-access-token`" \
    -d '{ \
            "id": "0", \
            "inputs": [ \
                { \
                    "name": "INPUT0", \
                    "shape": [1, 16], \
                    "datatype": "INT32", \
                    "parameters": {}, \
                    "data": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] \
                }, \
                { \
                    "name": "INPUT1", \
                    "shape": [1, 16], \
                    "datatype": "INT32", \
                    "parameters": {}, \
                    "data": [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1] \
                } \
            ] \
        }'

{
  "error": {
    "code": 404,
    "message": "Field: name Error: Online prediction is unavailable for this version. Please verify that CreateVersion has completed successfully.",
    "status": "NOT_FOUND",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "field": "name",
            "description": "Online prediction is unavailable for this version. Please verify that CreateVersion has completed successfully."
          }
        ]
      }
    ]
  }
}
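
Once the Model Version is READY, the same request should return output tensors instead of an error. In the Triton example repository, the "simple" model is understood to return the element-wise sum and difference of its two inputs; assuming that behavior, the values to expect for the request above can be computed locally as a sanity check. The variable names below are illustrative.

```python
# Inputs copied from the request body above.
input0 = list(range(16))
input1 = [-1] * 16

# Assumed model behavior: element-wise sum and element-wise difference.
expected_sum = [a + b for a, b in zip(input0, input1)]
expected_diff = [a - b for a, b in zip(input0, input1)]

print(expected_sum)
print(expected_diff)
```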


#### Run prediction using the `requests` library

In [29]:
# Read the JSON request body for the "simple" model.
with open('simple.json', 'r') as s:
    data = s.read()

PREDICT_URL = "{}/projects/{}/models/{}/versions/{}:predict".format(
    ENDPOINT, PROJECT_ID, os.getenv('MODEL_NAME'), os.getenv('VERSION_NAME'))
HEADERS = {
  'Content-Type': 'application/octet-stream',
  # Obtain an OAuth access token from the gcloud CLI.
  'Authorization': 'Bearer {}'.format(os.popen('gcloud auth application-default print-access-token').read().rstrip())
}

# Send the request and decode the response body.
response = requests.post(PREDICT_URL, headers=HEADERS, data=data).content.decode()

json.loads(response)

{'error': {'code': 404,
  'message': 'Field: name Error: Online prediction is unavailable for this version. Please verify that CreateVersion has completed successfully.',
  'status': 'NOT_FOUND',
  'details': [{'@type': 'type.googleapis.com/google.rpc.BadRequest',
    'fieldViolations': [{'field': 'name',
      'description': 'Online prediction is unavailable for this version. Please verify that CreateVersion has completed successfully.'}]}]}}

### ResNet-50 model (binary data)

#### Create Model

In [30]:
%env BINARY_MODEL_NAME=resnet50_netdef

env: BINARY_MODEL_NAME=resnet50_netdef


In [31]:
!curl -X POST -v -k -H "Content-Type: application/json" \
  -d "{'name': '"$BINARY_MODEL_NAME"'}" \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  "${ENDPOINT}/projects/${PROJECT_ID}/models/"

Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 74.125.201.95:443...
* Connected to us-central1-ml.googleapis.com (74.125.201.95) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /opt/conda/ssl/cacert.pem
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=upload.video.google.com
*  start date: Oct  6 06:40:00 2020 GMT
*  expire date: Dec 29 06:4

#### Create Model Version

In [32]:
%env BINARY_VERSION_NAME=v1

env: BINARY_VERSION_NAME=v1


In [33]:
triton_binary_version = {
  "name": os.getenv("BINARY_VERSION_NAME"),
  "deployment_uri": os.getenv("MODEL_BUCKET")+"/model_repository",
  "container": {
    "image": os.getenv("CAIP_IMAGE"),
    "args": ["tritonserver",
             "--model-repository=$(AIP_STORAGE_URI)"
    ],
    "env": [
    ], 
    "ports": [
      { "containerPort": 8000 }
    ]
  },
  "routes": {
    "predict": "/v2/models/"+os.getenv("BINARY_MODEL_NAME")+"/infer",
    "health": "/v2/models/"+os.getenv("BINARY_MODEL_NAME")
  },
  "machine_type": "n1-standard-4",
  "acceleratorConfig": {
    "count":1,
    "type":"nvidia-tesla-t4"
  },
  "autoScaling": {
    "minNodes": 1
  }
}

with open("triton_binary_version.json", "w") as f: 
  json.dump(triton_binary_version, f)

In [34]:
!curl --request POST -v -k -H "Content-Type: application/json" \
  -d @triton_binary_version.json \
  -H "Authorization: Bearer `gcloud auth print-access-token`" \
  ${ENDPOINT}/projects/${PROJECT_ID}/models/${BINARY_MODEL_NAME}/versions

Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 172.217.214.95:443...
* Connected to us-central1-ml.googleapis.com (172.217.214.95) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /opt/conda/ssl/cacert.pem
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=Mountain View; O=Google LLC; CN=upload.video.google.com
*  start date: Oct  6 06:40:00 2020 GMT
*  expire date: Dec 29 06

#### Check Model Version status

In [41]:
!curl --request GET -k -H "Content-Type: application/json" \
    -H "Authorization: Bearer `gcloud auth print-access-token`" \
    "${ENDPOINT}/projects/${PROJECT_ID}/models/${BINARY_MODEL_NAME}/versions/${BINARY_VERSION_NAME}" 

{
  "name": "projects/tsaikevin-1238/models/resnet50_netdef/versions/v1",
  "deploymentUri": "gs://tsaikevin-1238-80838/model_repository",
  "createTime": "2020-10-26T07:26:31Z",
  "autoScaling": {
    "minNodes": 1
  },
  "state": "FAILED",
  "errorMessage": "Model server terminated: model server container terminated: exit_code: 1\nreason: \"Error\"\nstarted_at {\n  seconds: 1603698840\n}\nfinished_at {\n  seconds: 1603698849\n}\n",
  "etag": "sq2JF8yPNkU=",
  "machineType": "n1-standard-4",
  "acceleratorConfig": {
    "count": "1",
    "type": "NVIDIA_TESLA_T4"
  },
  "container": {
    "image": "us-central1-docker.pkg.dev/tsaikevin-1238/caipcustom/tritonserver:20.06-py3",
    "args": [
      "tritonserver",
      "--model-repository=$(AIP_STORAGE_URI)"
    ],
    "ports": [
      {
        "containerPort": 8000
      }
    ]
  },
  "routes": {
    "predict": "/v2/models/resnet50_netdef/infer",
    "health": "/v2/models/resnet50_netdef"
  }
}


#### Run prediction using `curl`

Recall the offset value calculated above.  The binary case requires an additional header:

`Inference-Header-Content-Length: [offset]`

In [None]:
!curl --request POST ${ENDPOINT}/projects/${PROJECT_ID}/models/${BINARY_MODEL_NAME}/versions/${BINARY_VERSION_NAME}:predict \
    -k -H "Content-Type: application/octet-stream" \
    -H "Authorization: Bearer `gcloud auth print-access-token`" \
    -H "Inference-Header-Content-Length: 138" \
    --data-binary @payload.dat
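
Hardcoding the offset is fragile if the payload is regenerated. As a sketch, the offset can be recovered from the payload bytes themselves by parsing the leading JSON object and noting where it ends. The helper `infer_header_length` is hypothetical, and it assumes the payload starts directly with the JSON header, followed by the binary data, with no leading whitespace. The demonstration uses a synthetic payload rather than `payload.dat`.

```python
import json


def infer_header_length(payload):
    """Return the byte length of the JSON header at the start of payload."""
    # latin-1 maps each byte to exactly one character, so the character
    # index where the JSON object ends equals the byte offset.
    text = payload.decode("latin-1")
    _, end = json.JSONDecoder().raw_decode(text)
    return end


# Synthetic payload: a JSON header followed by two raw bytes.
header = json.dumps({
    "inputs": [{
        "name": "INPUT_TENSOR",  # hypothetical tensor name
        "shape": [2],
        "datatype": "UINT8",
        "parameters": {"binary_data_size": 2},
    }]
}).encode("utf-8")
payload = header + b"\x00\xff"

header_length = infer_header_length(payload)
request_headers = {
    "Content-Type": "application/octet-stream",
    "Inference-Header-Content-Length": str(header_length),
}
print(request_headers)
```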

#### Run prediction using the `requests` library

In [None]:
# Read the binary request body (JSON header followed by raw image data).
with open('payload.dat', 'rb') as s:
    data = s.read()

PREDICT_URL = "{}/projects/{}/models/{}/versions/{}:predict".format(
    ENDPOINT, PROJECT_ID, os.getenv('BINARY_MODEL_NAME'), os.getenv('BINARY_VERSION_NAME'))
HEADERS = {
  'Content-Type': 'application/octet-stream',
  # Offset printed by get_request_body_simple.py for this payload.
  'Inference-Header-Content-Length': '138',
  # Obtain an OAuth access token from the gcloud CLI.
  'Authorization': 'Bearer {}'.format(os.popen('gcloud auth application-default print-access-token').read().rstrip())
}

# Send the request and decode the response body.
response = requests.post(PREDICT_URL, headers=HEADERS, data=data).content.decode()

json.loads(response)

## Clean up

In [52]:
!curl --request DELETE -k -H "Content-Type: application/json" \
    -H "Authorization: Bearer `gcloud auth print-access-token`" \
    "${ENDPOINT}/projects/${PROJECT_ID}/models/${BINARY_MODEL_NAME}/versions/${BINARY_VERSION_NAME}" 

{
  "error": {
    "code": 404,
    "message": "Field: name Error: The specified model version was not found.",
    "status": "NOT_FOUND",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "field": "name",
            "description": "The specified model version was not found."
          }
        ]
      }
    ]
  }
}


In [53]:
!curl --request DELETE -k -H "Content-Type: application/json" \
    -H "Authorization: Bearer `gcloud auth print-access-token`" \
    "${ENDPOINT}/projects/${PROJECT_ID}/models/${BINARY_MODEL_NAME}" 

{
  "name": "projects/tsaikevin-1238/operations/delete_model_resnet50_netdef-1603699560",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.ml.v1.OperationMetadata",
    "createTime": "2020-10-26T08:06:00Z",
    "operationType": "DELETE_MODEL",
    "modelName": "projects/tsaikevin-1238/models/resnet50_netdef"
  }
}


In [54]:
!curl --request DELETE -k -H "Content-Type: application/json" \
    -H "Authorization: Bearer `gcloud auth print-access-token`" \
    "${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}/versions/${VERSION_NAME}" 

{
  "error": {
    "code": 404,
    "message": "Field: name Error: The specified model version was not found.",
    "status": "NOT_FOUND",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "field": "name",
            "description": "The specified model version was not found."
          }
        ]
      }
    ]
  }
}


In [55]:
!curl --request DELETE -k -H "Content-Type: application/json" \
    -H "Authorization: Bearer `gcloud auth print-access-token`" \
    "${ENDPOINT}/projects/${PROJECT_ID}/models/${MODEL_NAME}" 

{
  "name": "projects/tsaikevin-1238/operations/delete_model_simple-1603699566",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.ml.v1.OperationMetadata",
    "createTime": "2020-10-26T08:06:06Z",
    "operationType": "DELETE_MODEL",
    "modelName": "projects/tsaikevin-1238/models/simple"
  }
}


In [13]:
!gsutil -m rm -r -f $MODEL_BUCKET

Removing gs://tsaikevin-1238-80838/...


In [14]:
!rm -rf model_repository triton-inference-server *.dat *.json