# OpenVINO Model Server in OpenShift demo

This notebook demonstrate how to deploy and use OpenVINO Model Server.
That will include the use case with BERT model and a pipeline performing face detection operation and also age, gender and emotion recognition for each detected face.

Requirements:
- OpenShift cluster with the API access to a project
- installed OpenVINO Model Server Operator
- Jupyter session with python3 deployed in the cluster

## Creating Minio storage

OpenVINO Model Server can expose over gRPC and REST interface the models stored in the local or cloud storage like AWS S3, google storage or Azure blobs. In OpenShift and Kubernetes every Persistent Storage Claim could be used as well. In this demo will be employed Minio service which is an equivalent of AWS S3.

First login to OpenShift cluster API using `oc` tool. In the commands below change the cluster DNS name and the user token.

In [1]:
!curl -s https://downloads-openshift-console.apps.<cluster DNS name>/amd64/linux/oc.tar | tar x

In [2]:
!oc login --token=<user token> --server=https://api.<cluster DNS name>:6443

Logged into "https://api.openvino5.3q12.p1.openshiftapps.com:6443" as "dtrawins" using the token provided.

You have access to 100 projects, the list has been suppressed. You can list all projects with 'oc projects'

Using project "default".


Change the project context where you would like to deploy your services.

In [3]:
!oc project ovms

Now using project "ovms" on server "https://api.openvino5.3q12.p1.openshiftapps.com:6443".


Now deploy Minio service. Note that the configuration below creates Minio server with emphemeral storage which will be deleted each time the pod is restarted. It includes also the default credentials. All in all, it is only a demonstrative purpose.

In [4]:
!oc apply -f https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/ovms-demo/notebooks/202-model-server/minio.yaml

deployment.apps/minio created
service/minio-service created


Next step is to connect to the Minio service and create models repository for the OpenVINO Model Server

In [5]:
!wget https://dl.min.io/client/mc/release/linux-amd64/mc

--2021-04-21 15:01:26--  https://dl.min.io/client/mc/release/linux-amd64/mc
Resolving dl.min.io (dl.min.io)... 178.128.69.202
Connecting to dl.min.io (dl.min.io)|178.128.69.202|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20426752 (19M) [application/octet-stream]
Saving to: ‘mc’


2021-04-21 15:01:28 (15.0 MB/s) - ‘mc’ saved [20426752/20426752]



In [6]:
!chmod 755 mc

In the command below make sure you have the correct project name. Replace `ovms` with your project name, where minio got deployed.

In [10]:
!./mc alias set minio http://minio-service.ovms:9000 minio minio123

[m[32mAdded `minio` successfully.[0m
[0m

In [13]:
!./mc mb minio/models

[m[32;1mBucket created successfully `minio/models`.[0m
[0m

## Creating models repository

While the Minio is available, we can upload the models for serving in the OpenVINO Model Server. In the demos below will be needed 4 models:
- [resnet](https://github.com/onnx/models/tree/master/vision/classification/resnet)
- [face detection](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/intel/face-detection-retail-0004/description/face-detection-retail-0004.md)
- [age-gender recognition](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/intel/age-gender-recognition-retail-0013/description/age-gender-recognition-retail-0013.md)
- [emotion recognition](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/intel/emotions-recognition-retail-0003/description/emotions-recognition-retail-0003.md)

In [16]:
!curl --create-dirs https://storage.openvinotoolkit.org/repositories/open_model_zoo/2021.3/models_bin/2/age-gender-recognition-retail-0013/FP32/age-gender-recognition-retail-0013.xml -o age-gender/1/age-gender-recognition-retail-0013.xml 
!curl --create-dirs https://storage.openvinotoolkit.org/repositories/open_model_zoo/2021.3/models_bin/2/age-gender-recognition-retail-0013/FP32/age-gender-recognition-retail-0013.bin -o age-gender/1/age-gender-recognition-retail-0013.bin
!curl --create-dirs https://storage.openvinotoolkit.org/repositories/open_model_zoo/2021.3/models_bin/2/face-detection-retail-0004/FP32/face-detection-retail-0004.xml -o face-detection/1/face-detection-retail-0004.xml
!curl --create-dirs https://storage.openvinotoolkit.org/repositories/open_model_zoo/2021.3/models_bin/2/face-detection-retail-0004/FP32/face-detection-retail-0004.bin -o face-detection/1/face-detection-retail-0004.bin
!curl --create-dirs https://storage.openvinotoolkit.org/repositories/open_model_zoo/2021.3/models_bin/2/emotions-recognition-retail-0003/FP32/emotions-recognition-retail-0003.xml -o emotions/1/emotions-recognition-retail-0003.xml
!curl --create-dirs https://storage.openvinotoolkit.org/repositories/open_model_zoo/2021.3/models_bin/2/emotions-recognition-retail-0003/FP32/emotions-recognition-retail-0003.bin -o emotions/1/emotions-recognition-retail-0003.xml
!curl -L --create-dir https://github.com/onnx/models/raw/master/vision/classification/resnet/model/resnet50-caffe2-v1-9.onnx -o resnet/1/resnet50-caffe2-v1-9.onnx

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 30901  100 30901    0     0  48662      0 --:--:-- --:--:-- --:--:-- 48586
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 8351k  100 8351k    0     0  8539k      0 --:--:-- --:--:-- --:--:-- 8530k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  101k  100  101k    0     0   188k      0 --:--:-- --:--:-- --:--:--  188k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2297k  100 2297k    0     0  2429k      0 --:--:-- --:--:-- --:--:-- 2426k
  % Total    % Received % Xferd  Average Speed   Tim

In [20]:
!./mc cp --recursive age-gender minio/models/
!./mc cp --recursive face-detection minio/models/
!./mc cp --recursive emotion minio/models/
!./mc cp --recursive resnet minio/models/

...-0003.xml:  18.98 MiB / 18.98 MiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 223.99 MiB/s 0s[0m[0m[m[32;1m[m[32;1m[m[32;1m[m[32;1m[m[32;1m

In [34]:
!./mc ls -r minio/models/

[m[32m[2021-04-21 15:58:35 UTC][0m[33m 8.2MiB[0m[1m age-gender/1/age-gender-recognition-retail-0013.bin[0m
[0m[m[32m[2021-04-21 15:58:35 UTC][0m[33m  30KiB[0m[1m age-gender/1/age-gender-recognition-retail-0013.xml[0m
[0m[m[32m[2021-04-21 15:58:36 UTC][0m[33m 9.5MiB[0m[1m emotions/1/emotions-recognition-retail-0003.xml[0m
[0m[m[32m[2021-04-21 15:58:36 UTC][0m[33m 2.2MiB[0m[1m face-detection/1/face-detection-retail-0004.bin[0m
[0m[m[32m[2021-04-21 15:58:36 UTC][0m[33m 102KiB[0m[1m face-detection/1/face-detection-retail-0004.xml[0m
[0m[m[32m[2021-04-21 20:37:26 UTC][0m[33m 101KiB[0m[1m resnet/1/resnet50-caffe2-v1-9.onnx[0m
[0m

With the model repository created, we can move on the deploying OpenVINO Model Server in the cluster.

## OpenVINO Model Server deployment with a single model

The first scenario will be with a serving a single model. In the demo, there will be performed image classification using ResNet50 model in ONNX format.

While the operator in place, starting the inference service is easy:

In [76]:
!curl -s https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/ovms-demo/notebooks/202-model-server/ovms-resnet.yaml

apiVersion: intel.com/v1alpha1
kind: Ovms
metadata:
  name: ovms-resnet
spec:
  aws_access_key_id: "minio"
  aws_region: "us-east-1"
  aws_secret_access_key: "minio123"
  grpc_port: 8080
  image_name: registry.connect.redhat.com/intel/openvino-model-server:latest
  log_level: INFO
  model_name: "resnet"
  model_path: "s3://minio-service:9000/models/resnet"
  plugin_config: '{\"CPU_THROUGHPUT_STREAMS\":\"1\"}'
  replicas: 1
  resources:
    limits:
      cpu: 4
      memory: 500Mi
  rest_port: 8081
  service_type: ClusterIP


In [77]:
!oc apply -f https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/ovms-demo/notebooks/202-model-server/ovms-resnet.yaml

ovms.intel.com/ovms-resnet configured


In [80]:
!oc get pod
!oc get service

NAME                           READY     STATUS    RESTARTS   AGE
minio-5c57f888dd-9q7k8         1/1       Running   0          7h20m
ovms-resnet-7cdb696f7b-jb6lf   1/1       Running   0          56s
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
minio-service   ClusterIP   172.30.116.144   <none>        9000/TCP            7h20m
ovms-resnet     ClusterIP   172.30.95.150    <none>        8080/TCP,8081/TCP   33m


With those steps, OpenVINO Model Server is running and is ready to accept inference requests. The status of models can be queries with a simple REST API calls:

In [87]:
!curl http://ovms-resnet.ovms.svc:8081/v1/models/resnet

{
 "model_version_status": [
  {
   "version": "1",
   "state": "AVAILABLE",
   "status": {
    "error_code": "OK",
    "error_message": "OK"
   }
  }
 ]
}


In [88]:
!curl http://ovms-resnet.ovms.svc:8081/v1/models/resnet/metadata

{
 "modelSpec": {
  "name": "resnet",
  "signatureName": "",
  "version": "1"
 },
 "metadata": {
  "signature_def": {
   "@type": "type.googleapis.com/tensorflow.serving.SignatureDefMap",
   "signatureDef": {
    "serving_default": {
     "inputs": {
      "gpu_0/data_0": {
       "dtype": "DT_FLOAT",
       "tensorShape": {
        "dim": [
         {
          "size": "1",
          "name": ""
         },
         {
          "size": "3",
          "name": ""
         },
         {
          "size": "224",
          "name": ""
         },
         {
          "size": "224",
          "name": ""
         }
        ],
        "unknownRank": false
       },
       "name": "gpu_0/data_0"
      }
     },
     "outputs": {
      "gpu_0/softmax_1": {
       "dtype": "DT_FLOAT",
       "tensorShape": {
        "dim": [
         {
          "size": "1",
          "name": ""
         },
         {
          "size": "1000",
          "name": ""
         }
        ],
        "unknownRank": false

## Running predition requests

In [91]:
!git clone --depth=1 https://github.com/openvinotoolkit/model_server

Cloning into 'model_server'...
remote: Enumerating objects: 571, done.[K
remote: Counting objects: 100% (571/571), done.[K
remote: Compressing objects: 100% (492/492), done.[K
remote: Total 571 (delta 147), reused 262 (delta 57), pack-reused 0[K
Receiving objects: 100% (571/571), 3.75 MiB | 38.45 MiB/s, done.
Resolving deltas: 100% (147/147), done.
