# 3. Model Translation and Deployment: Serve an ONNX model via GraphPipe

See the [GraphPipe User Guide](https://oracle.github.io/graphpipe/#/guide/user-guide/overview)

We deploy an ONNX model and send test queries to it from Python.

In [1]:
import json

import numpy as np
import onnx
from onnx_tf.backend import prepare

In [2]:
model_path = '../models/dnn_model_pt.onnx'
dnn_model_onnx = onnx.load(model_path)

Create a JSON file describing the network inputs, as required by `graphpipe-onnx`.

The documentation on how to set up `value_inputs.json` is sparse, so we follow the structure of the example [Squeezenet input](https://oracle.github.io/graphpipe/models/squeezenet.value_inputs.json), assuming that the outer list gives the number of examples per request and the inner list the dimensions of the input:

`{"data_0": [1, [1, 3, 227, 227]]}`

In [3]:
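# name of the tensor consumed by the first node of the graph
input_name = dnn_model_onnx.graph.node[0].input[0]
graphpipe_value_inputs = {input_name: [1, [1, 28*28]]}
# use a context manager so the file is flushed and closed
with open('../models/dnn_model_pt.value_inputs.json', 'w') as f:
    json.dump(graphpipe_value_inputs, f)
print(json.dumps(graphpipe_value_inputs))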

{"flattened_rescaled_img_28x28": [1, [1, 784]]}
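
To make the assumed layout explicit, the following minimal sketch builds the same structure with a small helper. The helper name `make_value_inputs` and its parameters are hypothetical and not part of GraphPipe; the meaning of the leading `1` follows the assumption stated above, not a documented specification.

```python
import json


def make_value_inputs(input_name, feature_dims, leading_value=1, batch_size=1):
    """Build a dict in the layout of the Squeezenet example.

    Assumption (not confirmed by the GraphPipe docs): the result has the form
    {name: [leading_value, [batch_size, *feature_dims]]}.
    """
    shape = [batch_size, *feature_dims]
    return {input_name: [leading_value, shape]}


# Reproduce the two structures shown above.
squeezenet = make_value_inputs("data_0", [3, 227, 227])
emnist = make_value_inputs("flattened_rescaled_img_28x28", [28 * 28])

print(json.dumps(squeezenet))
print(json.dumps(emnist))
```

Keeping the shape in one place makes it easier to try alternative layouts if the server rejects the file.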


## GraphPipe with ONNX

```
docker run -it --rm \
    -v "$PWD:/models/" \
    -p 9000:9000 \
    sleepsonthefloor/graphpipe-onnx:cpu \
    --value-inputs=../models/dnn_model_pt.value_inputs.json \
    --model=../models/dnn_model_pt.onnx \
    --listen=0.0.0.0:9000
```

**Unfortunately, this sometimes failed with:**

```
INFO[0000] Setting MKL_NUM_THREADS=4.  You can override this variable in your environment. 
INFO[0000] Starting graphpipe-caffe2 version 1.0.0.4.0a1675f.dev (built from sha 0a1675f) 
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0908 12:45:33.273241     1 c2_api.cc:309] Binary compiled without cuda support.  Using cpu backend.
INFO[0000] Loading file %!(EXTRA string=../models/dnn_model_pt.value_inputs.json) 
INFO[0000] Loading file %!(EXTRA string=../models/dnn_model_pt.onnx) 
E0908 12:45:33.287909     1 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0908 12:45:33.288249     1 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0908 12:45:33.288272     1 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
  what():  [enforce fail at tensor.h:147] values.size() == size_. 784 vs 1229312 
*** Aborted at 1536410733 (unix time) try "date -d @1536410733" if you are using GNU date ***
PC: @     0x7ff4d3c2b428 gsignal
*** SIGABRT (@0x1) received by PID 1 (TID 0x7ff4d5c08b40) from PID 1; stack trace: ***
    @     0x7ff4d4569390 (unknown)
    @     0x7ff4d3c2b428 gsignal
    @     0x7ff4d3c2d02a abort
    @     0x7ff4d426584d __gnu_cxx::__verbose_terminate_handler()
    @     0x7ff4d42636b6 (unknown)
    @     0x7ff4d4263701 std::terminate()
    @     0x7ff4d4263919 __cxa_throw
    @           0x73e86e caffe2::Tensor<>::Tensor<>()
    @           0x737ada _initialize()
    @           0x738b9f c2_engine_initialize_onnx
    @           0x733a8f _cgo_e12a854003a1_Cfunc_c2_engine_initialize_onnx
    @           0x45f340 runtime.asmcgocall
```

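**At other times it did not even find the files, although they are present on the host:**

A likely explanation is that the relative path `../models/...` is resolved inside the container, where the current host directory is mounted at `/models/`. Whether the files are found then depends on the directory the command is run from.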

```
INFO[0000] Setting MKL_NUM_THREADS=4.  You can override this variable in your environment. 
INFO[0000] Starting graphpipe-caffe2 version 1.0.0.4.0a1675f.dev (built from sha 0a1675f) 
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0908 17:39:21.048094     1 c2_api.cc:309] Binary compiled without cuda support.  Using cpu backend.
INFO[0000] Loading file %!(EXTRA string=../models/dnn_model_pt.value_inputs.json) 
FATA[0000] Could not load value_input: open ../models/dnn_model_pt.value_inputs.json: no such file or directory 
```
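One thing to check with this kind of error is how host paths map into the container: with `-v "$PWD:/models/"`, only files below the current host directory are visible, and they appear under `/models/`. The sketch below illustrates that mapping. The helper `container_path` and the directory names are hypothetical, and it is an assumption that this mapping explains the failure above.

```python
from pathlib import PurePosixPath


def container_path(host_file, host_dir, mount_point):
    """Translate a host path into the path seen inside the container.

    Raises ValueError if host_file is not located below host_dir, i.e. if the
    file is not covered by the volume mount at all.
    """
    relative = PurePosixPath(host_file).relative_to(PurePosixPath(host_dir))
    return str(PurePosixPath(mount_point) / relative)


# Hypothetical host layout, for illustration only.
models_dir = "/home/user/project/models"
model_file = models_dir + "/dnn_model_pt.onnx"

# Mounting the models directory makes the file visible under /models/.
print(container_path(model_file, models_dir, "/models"))

# Mounting a sibling directory (e.g. notebooks/) does not cover the file.
try:
    container_path(model_file, "/home/user/project/notebooks", "/models")
except ValueError as err:
    print("not visible in container:", err)
```

Under this reading, passing absolute container paths such as `/models/dnn_model_pt.onnx` would remove the dependence on the working directory.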

We then made a second attempt, loading both files from online resources instead:

```
docker run -it --rm \
      -e https_proxy=${https_proxy} \
      -p 9000:9000 \
      sleepsonthefloor/graphpipe-onnx:cpu \
      --value-inputs=https://raw.githubusercontent.com/squall-1002/test_graphpipe/master/dnn_model_pt.value_inputs.json \
      --model=https://github.com/squall-1002/test_graphpipe/blob/master/dnn_model_pt.onnx \
      --listen=0.0.0.0:9000
```

```
INFO[0000] Setting MKL_NUM_THREADS=4.  You can override this variable in your environment. 
INFO[0000] Starting graphpipe-caffe2 version 1.0.0.4.0a1675f.dev (built from sha 0a1675f) 
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0908 17:42:27.937944     1 c2_api.cc:309] Binary compiled without cuda support.  Using cpu backend.
INFO[0000] Loading file %!(EXTRA string=https://raw.githubusercontent.com/squall-1002/test_graphpipe/master/dnn_model_pt.value_inputs.json) 
INFO[0000] Loading file %!(EXTRA string=https://github.com/squall-1002/test_graphpipe/blob/master/dnn_model_pt.onnx) 
E0908 17:42:28.951375    14 init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0908 17:42:28.954814    14 init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
E0908 17:42:28.954874    14 init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
terminate called after throwing an instance of 'onnx_c2::checker::ValidationError'
  what():  The model does not have an ir_version set properly.
*** Aborted at 1536428548 (unix time) try "date -d @1536428548" if you are using GNU date ***
PC: @     0x7f5a50bc6428 gsignal
*** SIGABRT (@0x1) received by PID 1 (TID 0x7f5a2cb92700) from PID 1; stack trace: ***
    @     0x7f5a51504390 (unknown)
    @     0x7f5a50bc6428 gsignal
    @     0x7f5a50bc802a abort
    @     0x7f5a5120084d __gnu_cxx::__verbose_terminate_handler()
    @     0x7f5a511fe6b6 (unknown)
    @     0x7f5a511fe701 std::terminate()
    @     0x7f5a511fe919 __cxa_throw
    @     0x7f5a5236945a onnx_c2::checker::check_model()
    @     0x7f5a51d90f7f caffe2::onnx::Caffe2Backend::Prepare()
    @           0x738b4a c2_engine_initialize_onnx
    @           0x733a8f _cgo_e12a854003a1_Cfunc_c2_engine_initialize_onnx
    @           0x45f340 runtime.asmcgocall
```
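Note that the `--model` URL in the command above points to a GitHub `blob` page, which returns an HTML page instead of the file itself, whereas the `--value-inputs` URL uses `raw.githubusercontent.com`. This may be why the checker rejects the downloaded content, although that is an assumption and was not verified here. A minimal sketch of the rewrite, using the hypothetical helper `to_raw_github_url`:

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_github_url(url):
    """Rewrite a github.com 'blob' URL to its raw.githubusercontent.com form.

    URLs that do not match the /<user>/<repo>/blob/<ref>/<path> pattern are
    returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    user, repo, _, *rest = segments
    raw_path = "/" + "/".join([user, repo, *rest])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, "", ""))


model_url = ("https://github.com/squall-1002/test_graphpipe/"
             "blob/master/dnn_model_pt.onnx")
print(to_raw_github_url(model_url))
```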

## GraphPipe with TensorFlow

In [4]:
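from onnx_tf.backend import prepare
# convert the ONNX model to a TensorFlow representation
dnn_model_tf = prepare(dnn_model_onnx, device='cpu')
# export the TensorFlow graph so that graphpipe-tf can serve it
dnn_model_tf.export_graph('../models/dnn_model_tf.pb')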

```
docker run -it --rm \
      -v "$PWD:/models/" \
      -p 9000:9000 \
      sleepsonthefloor/graphpipe-tf:cpu \
      --model=../models/dnn_model_tf.pb \
      --listen=0.0.0.0:9000
```

```
INFO[0000] Starting graphpipe-tf version 1.0.0.10.f235920 (built from sha f235920) 
2018-09-08 18:31:12.419289: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-08 18:31:12.436736: I tensorflow/core/common_runtime/process_util.cc:63] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
INFO[0000] Model hash is '59b5b3ca8f39507a996debacb6589c7c3604cc276ffb87f197377371deeb0836' 
INFO[0000] Using default inputs [flattened_rescaled_img_28x28:0] 
INFO[0000] Using default outputs [Softmax:0]            
INFO[0000] Listening on '0.0.0.0:9000' 
```

### Check the Prediction Accuracy by sending some test queries

In [5]:
from emnist_dl2prod.utils import load_emnist, get_emnist_mapping
from graphpipe import remote

In [6]:
x_train, y_train, x_test, y_test, _ = load_emnist('emnist_data/')
mapping = get_emnist_mapping()

[2018-09-08 20:33:02] INFO:emnist_dl2prod.utils:Loading train and test data from emnist_data/emnist-byclass.mat


In [7]:
n_test_instances = 10
n_test = x_test.shape[0]
for _ in range(n_test_instances):
    idx = np.random.randint(n_test)
    # flatten the test image and scale pixel values to [0, 1]
    x = x_test[idx].reshape(1, -1)/255
    y = y_test[idx][0]
    # query the served model and pick the most probable class
    softmax_pred = remote.execute("http://127.0.0.1:9000", x)
    pred_class = mapping[np.argmax(softmax_pred)]
    true_class = mapping[y]
    print("Predicted Label / True Label: {} == {} ? - {} !".format(
        pred_class, true_class, (pred_class==true_class)))

Predicted Label / True Label: O == O ? - True !
Predicted Label / True Label: 6 == 6 ? - True !
Predicted Label / True Label: 0 == O ? - False !
Predicted Label / True Label: 8 == 8 ? - True !
Predicted Label / True Label: 1 == l ? - False !
Predicted Label / True Label: t == t ? - True !
Predicted Label / True Label: 2 == 2 ? - True !
Predicted Label / True Label: 8 == 8 ? - True !
Predicted Label / True Label: 5 == 5 ? - True !
Predicted Label / True Label: 6 == S ? - False !
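
The per-sample output above can be condensed into a single accuracy figure. The following sketch does this for the ten pairs printed above; with so few samples the number is only a rough sanity check, not an evaluation of the model.

```python
# (predicted, true) label pairs copied from the output above
results = [
    ("O", "O"), ("6", "6"), ("0", "O"), ("8", "8"), ("1", "l"),
    ("t", "t"), ("2", "2"), ("8", "8"), ("5", "5"), ("6", "S"),
]

n_correct = sum(pred == true for pred, true in results)
accuracy = n_correct / len(results)
confusions = [(pred, true) for pred, true in results if pred != true]

print("accuracy: {:.0%} ({}/{})".format(accuracy, n_correct, len(results)))
print("confusions (predicted, true):", confusions)
```

All three misses in this sample involve visually similar characters (`0`/`O`, `1`/`l`, `6`/`S`).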


The backend reports the following request latencies:

```
INFO[0113] Request for / took 212.855085ms              
INFO[0113] Request for / took 773.621µs                 
INFO[0113] Request for / took 859.584µs                 
INFO[0113] Request for / took 810.67µs                  
INFO[0113] Request for / took 639.304µs                 
INFO[0113] Request for / took 710.167µs                 
INFO[0113] Request for / took 813.688µs                 
INFO[0113] Request for / took 700.67µs                  
INFO[0113] Request for / took 747.003µs                 
INFO[0113] Request for / took 751.787µs  
```
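The first request takes roughly 213 ms, while the following ones complete in under a millisecond. This suggests a one-time warm-up cost on the first call, which is an interpretation and not something the log states. The sketch below parses the lines above to separate the two; the helper `parse_latency_us` is hypothetical.

```python
import re
import statistics

# Log lines copied from the backend output above
log_lines = """\
INFO[0113] Request for / took 212.855085ms
INFO[0113] Request for / took 773.621µs
INFO[0113] Request for / took 859.584µs
INFO[0113] Request for / took 810.67µs
INFO[0113] Request for / took 639.304µs
INFO[0113] Request for / took 710.167µs
INFO[0113] Request for / took 813.688µs
INFO[0113] Request for / took 700.67µs
INFO[0113] Request for / took 747.003µs
INFO[0113] Request for / took 751.787µs
""".splitlines()

# Conversion factors to microseconds; both spellings of the micro sign are accepted
TO_MICROSECONDS = {"s": 1e6, "ms": 1e3, "\u00b5s": 1.0, "\u03bcs": 1.0}
DURATION = re.compile(r"took ([\d.]+)(ms|[\u00b5\u03bc]s|s)\s*$")


def parse_latency_us(line):
    """Return the request latency of one log line in microseconds."""
    value, unit = DURATION.search(line).groups()
    return float(value) * TO_MICROSECONDS[unit]


latencies = [parse_latency_us(line) for line in log_lines]
first, rest = latencies[0], latencies[1:]

print("first request: {:.1f} ms".format(first / 1e3))
print("median of remaining requests: {:.1f} µs".format(statistics.median(rest)))
```

Reporting the median of the remaining requests keeps the single slow first call from dominating the summary.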