This notebook shows how to use the `check_conversion` function to verify that a model was converted to ONNX successfully. The function wraps the `check_model` and `create_input` modules from the source code of the [OLive (ONNX Go Live)](https://github.com/microsoft/OLive) service.

We will demonstrate step-by-step how to use the model trained in the [Sklearn tutorial notebook](https://github.com/factoryofthesun/IVaps/blob/master/examples/Sklearn_Iris_Conversion_Simulation_and_Estimation.ipynb) to (1) generate test data, (2) save the data as .pb files in the proper directory structure, and (3) confirm successful conversion.

### Preparing test data files

#### Expected folder structure

First, prepare your model and test data files. Supported model frameworks are cntk, coreml, keras, scikit-learn, tensorflow, and pytorch.

Test data are used to quickly verify the correctness of the converted model, and providing them is strongly encouraged. However, if no input files are provided, the `check_conversion` function will randomly generate dummy inputs where possible.

You can put your test data in one of the following locations:

  1) Your input folder, alongside your model from another framework.

  2) Your output folder, created in advance to hold your converted ONNX model.

  3) Any other location. In this case, specify the path with the `test_input_path` parameter of `check_conversion`.

The recommended option is **2)**: keep your input model file(s) in the input folder and your (optional) test data in the output folder. Placing the `test_data_set_*` folders in the output folder rather than the input folder avoids copying files in the backend. The suggested folder structure is:

    - your_input_folder
       - model_file(s)
    - your_output_folder_to_hold_onnx_file
       - test_data_set_0
           - input_0.pb
           - ...
           - output_0.pb
           - ...
       - test_data_set_1
           - ...
       ...
       - (your .onnx file after running "convert_to_onnx")
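
The layout above can be created ahead of time with a few lines of standard-library Python. This is only a sketch: the helper name `make_layout` is hypothetical (it is not part of `IVaps` or OLive), the folder names are the placeholders from the listing, and the demo runs in a temporary directory so no real files are touched.

```python
import tempfile
from pathlib import Path


def make_layout(root, n_sets=1):
    """Create the suggested input/output folders under `root` (hypothetical helper)."""
    root = Path(root)
    input_dir = root / "your_input_folder"
    output_dir = root / "your_output_folder_to_hold_onnx_file"
    input_dir.mkdir(parents=True, exist_ok=True)

    # One test_data_set_<i> folder per test case, next to the future .onnx file.
    test_dirs = []
    for i in range(n_sets):
        test_dir = output_dir / f"test_data_set_{i}"
        test_dir.mkdir(parents=True, exist_ok=True)
        test_dirs.append(test_dir)
    return input_dir, output_dir, test_dirs


# Demonstrate in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    input_dir, output_dir, test_dirs = make_layout(tmp, n_sets=2)
    created = sorted(p.name for p in output_dir.iterdir())
    print(created)
```

The `input_*.pb` and `output_*.pb` files are then written into each `test_data_set_*` folder, as shown later in this notebook.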

#### Convert Test Data to ONNX pb 

Test data are expected as ONNX .pb files. If your data are pickled instead, `IVaps` provides a wrapper function, `convert_data_to_pb`, that converts pickle data to .pb. Dump your input data to a single pickle file in the following dict format:

    {
        "input_name_0": input_data_0,
        "input_name_1": input_data_1, 
        ...
    }
    
or, if dumping output data:

    {
        "output_name_0": output_data_0,
        "output_name_1": output_data_1, 
        ...
    }

Then use `convert_data_to_pb` to convert your pickle file to .pb files. It reads the pickle file and writes the data to a folder named "test_data_set_0" by default. Note that the ONNX naming convention for test data folders is "test_data_*", so make sure to pass an `output_folder` whose name starts with `test_data_`.

If `is_input=True`, the data will be written to `input_*.pb` files. Set `is_input=False` to generate output files instead, in which case the data will be written to `output_*.pb` files.
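
Before running a conversion check, it can help to confirm that a test data folder follows these naming conventions. The sketch below uses only the standard library; `describe_test_folder` is a hypothetical helper written for illustration, not an `IVaps` function, and the demo folder contains empty placeholder files rather than real tensors.

```python
import tempfile
from pathlib import Path


def describe_test_folder(folder):
    """Report whether a folder follows the test_data_* / input_*.pb / output_*.pb naming."""
    folder = Path(folder)
    return {
        "name_ok": folder.name.startswith("test_data_"),
        "inputs": sorted(p.name for p in folder.glob("input_*.pb")),
        "outputs": sorted(p.name for p in folder.glob("output_*.pb")),
    }


# Build a small placeholder folder and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "test_data_set_0"
    demo.mkdir()
    (demo / "input_0.pb").touch()   # empty stand-in for a serialized input tensor
    (demo / "output_0.pb").touch()  # empty stand-in for a serialized output tensor
    report = describe_test_folder(demo)
    print(report)
```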

### Load Sklearn model and generate test data

In [1]:
import pickle 

# Load the trained model; the context manager closes the file afterwards
with open("models/iris_logreg.pickle", "rb") as f:
    model = pickle.load(f)
model.get_params()



{'C': 1.0,
 'class_weight': None,
 'dual': False,
 'fit_intercept': True,
 'intercept_scaling': 1,
 'l1_ratio': None,
 'max_iter': 100,
 'multi_class': 'auto',
 'n_jobs': None,
 'penalty': 'l2',
 'random_state': None,
 'solver': 'lbfgs',
 'tol': 0.0001,
 'verbose': 0,
 'warm_start': False}

We will run inference on the raw iris data and use the outputs as the target test data.

In [2]:
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

In [3]:
test_out = model.predict_proba(X)
test_out[:5]

array([[0.97964346, 0.02035654],
       [0.97171913, 0.02828087],
       [0.9822502 , 0.0177498 ],
       [0.97178867, 0.02821133],
       [0.98207923, 0.01792077]])

Now we can save the input and output data as test files. Below is an example of saving the arrays directly to .pb files using the `numpy_helper` module from onnx. Note that the name of our output tensor must match the output label of the ONNX model. This is easy to check by starting an `InferenceSession` with onnxruntime and calling `get_outputs()`. The input labels can be arbitrary.

In [4]:
from onnx import numpy_helper
import os

# Create the test data folder if it does not already exist
os.makedirs("models/test_data_set_0", exist_ok=True)

# Check output labels
from onnxruntime import InferenceSession
sess = InferenceSession("models/iris_logreg.onnx")
output_labels = [output.name for output in sess.get_outputs()]
print(output_labels)

# Convert arrays to TensorProto, then serialize
inp_tensor = numpy_helper.from_array(X)
inp_tensor.name = "input_0"
out_tensor = numpy_helper.from_array(test_out)
out_tensor.name = "output_probability"

with open('models/test_data_set_0/input_0.pb', 'wb') as f:
    f.write(inp_tensor.SerializeToString())

with open('models/test_data_set_0/output_0.pb', 'wb') as f:
    f.write(out_tensor.SerializeToString())

['output_label', 'output_probability']


Alternatively, we can pickle the arrays first and then convert them to .pb files with the `convert_data_to_pb` function.

In [5]:
# Prepare inputs and outputs as dictionaries 
inputs = {"input_0":X}
outputs = {"output_probability":test_out}

# Pickle 
with open('data/iris_inputs.pickle', 'wb') as f:
    pickle.dump(inputs, f)

with open('data/iris_outputs.pickle', 'wb') as f:
    pickle.dump(outputs, f)

If `output_folder` is not specified for pickle conversion, the output files will be saved in `test_data_set_0` in the current working directory. In this example, we want to save the .pb files in the same directory as the ONNX model, the `models` subdirectory.

In [6]:
import os 
from IVaps import convert_data_to_pb

# Convert inputs
convert_data_to_pb("data/iris_inputs.pickle", output_folder = "models/test_data_set_0", is_input = True)
# Convert outputs
convert_data_to_pb("data/iris_outputs.pickle", output_folder = "models/test_data_set_0", is_input = False)

Successfully stored input input_0 in models/test_data_set_0/input_0.pb
Successfully stored input output_probability in models/test_data_set_0/output_0.pb


Now we can check our converted model using the `check_conversion` function. If we don't specify `test_input_path`, the function uses any test data it finds alongside the model (as it does below) and otherwise attempts to randomly generate inputs. We can also pass `log_path` to save a JSON file of the conversion testing log.

In [1]:
from IVaps import check_conversion

check_conversion(model_path = "models/iris_logreg.pickle", onnx_model_path = "models/iris_logreg.onnx", framework="sklearn")

Test data .pb files found under /Users/rl874/Documents/Tobin/IVaps/examples/models/test_data_set_0. 
Test data .pb files already exist. Skipping dummy input generation. 

-------------
MODEL CORRECTNESS VERIFICATION


Check the ONNX model for validity 
The ONNX model is valid.

Check ONNX model for correctness. 
Running inference on original model with specified or random inputs. 
...
/Users/rl874/Documents/Tobin/IVaps/examples/models/test_data_set_0
Running inference on the converted model with the same inputs
...

Comparing the outputs from two models. 
The converted model achieves 5-decimal precision compared to the original model.
MODEL CONVERSION SUCCESS. 

-------------
MODEL CONVERSION SUMMARY

{'output_onnx_path': 'models/iris_logreg.onnx', 'correctness_verified': 'SUCCESS', 'input_folder': '/Users/rl874/Documents/Tobin/IVaps/examples/models/test_data_set_0', 'error_message': ''}


True
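
The log above reports agreement to 5-decimal precision. How `check_conversion` performs this comparison internally is not shown in this notebook, so the following is only an illustrative sketch of one common way to test such agreement, using `np.testing.assert_almost_equal`. The helper name `agrees_to_decimals` is hypothetical, and the perturbed arrays are synthetic rather than real model outputs.

```python
import numpy as np


def agrees_to_decimals(expected, actual, decimals=5):
    """Return True if two arrays agree to the given number of decimals (hypothetical helper)."""
    try:
        np.testing.assert_almost_equal(actual, expected, decimal=decimals)
    except AssertionError:
        return False
    return True


# Reference row taken from the predict_proba output shown earlier.
reference = np.array([[0.97964346, 0.02035654]])

# A tiny perturbation stays within 5-decimal precision; a larger one does not.
close_enough = agrees_to_decimals(reference, reference + 1e-7)
too_far = agrees_to_decimals(reference, reference + 1e-3)
print(close_enough, too_far)
```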