![alt text](egm-logo.png "Title")

## Use BentoML with the ONNX Model Zoo (YOLOv4)

**BentoML makes moving trained ML models to production easy:**

* Package models trained with **any ML framework** and reproduce them for model serving in production
* **Deploy anywhere** for online API serving or offline batch serving
* High-Performance API model server with *adaptive micro-batching* support
* Central hub for managing models and deployment process via Web UI and APIs
* Modular and flexible design making it *adaptable to your infrastructure*

BentoML is a framework for serving, managing, and deploying machine learning models. It aims to bridge the gap between Data Science and DevOps, and to enable teams to deliver prediction services in a fast, repeatable, and scalable way.

Before reading this example project, be sure to check out the [Getting started guide](https://github.com/bentoml/BentoML/blob/master/guides/quick-start/bentoml-quick-start-guide.ipynb) to learn about the basic concepts in BentoML.

This example notebook demonstrates how to use a model from the ONNX Model Zoo with BentoML. It defines a BentoService with a `Yolov4` model and deploys it to AWS SageMaker as an API endpoint.

Original notebook: https://github.com/onnx/onnx-docker/blob/master/onnx-ecosystem/inference_demos/resnet50_modelzoo_onnxruntime_inference.ipynb



In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
import numpy as np    # used to process input and output data
import onnxruntime    # ONNX Runtime runs inference on ONNX models
import onnx
from onnx import numpy_helper
import urllib.request
import json
import time
import cv2

# Display images in the notebook
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw, ImageFont

### Load sample outputs and inputs

In [None]:
# Read class names (one per line) from a label file such as coco.names
def load_labels(path):
    with open(path, "r") as f:
        return [cname.strip() for cname in f]

labels = load_labels('coco.names')
labels
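
The label list is packed into the service in the next cell, presumably so that predicted class indices can be reported as readable names. A minimal sketch of that lookup follows. The helper `label_for`, the three-entry `sample_labels` list, and the `(class_id, confidence)` pairs are hypothetical and only illustrate the idea; they are not taken from the `onnx_yolov4` module.

```python
def label_for(class_id, labels):
    """Return the class name for a predicted class index, or a fallback string."""
    if 0 <= class_id < len(labels):
        return labels[class_id]
    return f"unknown({class_id})"


# Assumption: a shortened label list standing in for the full contents of coco.names.
sample_labels = ["person", "bicycle", "car"]

# Hypothetical detections as (class_id, confidence) pairs; real values come from the model.
detections = [(2, 0.91), (0, 0.47), (7, 0.30)]

# Index 7 is outside the shortened list, so it falls back to "unknown(7)".
named = [(label_for(class_id, sample_labels), score) for class_id, score in detections]
print(named)
```

Returning a fallback string instead of raising keeps a single out-of-range index from failing the whole response, which is one reasonable choice for a prediction API.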

In [None]:
from onnx_yolov4 import OnnxYolov4
svc = OnnxYolov4()
svc.pack('labels', labels)
svc.pack('model', 'yolov4.onnx')
saved_path = svc.save()

## REST API Model Serving


To start a REST API model server with the BentoService saved above, use the `bentoml serve` command:

In [None]:
!bentoml serve OnnxYolov4:latest

If you are running this notebook from Google Colab, you can start the dev server with the `--run-with-ngrok` option to gain access to the API through a public endpoint managed by [ngrok](https://ngrok.com/).

Sending a POST request from the terminal:
```bash
curl -X POST "http://127.0.0.1:5000/predict" -F image=@dog.jpg
```

```bash
curl -X POST "http://127.0.0.1:5000/predict" -H "Content-Type: image/jpeg" --data-binary @dog.jpg
```
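
The raw-body request from the second curl command can also be sent from Python using only the standard library. The sketch below is illustrative: the helper names `build_predict_request` and `predict` are hypothetical, the URL is the one used in the curl commands, and decoding the response as JSON is an assumption about what the service returns.

```python
import json
import mimetypes
import urllib.request

# Assumption: the dev server started by `bentoml serve` listens on this address.
PREDICT_URL = "http://127.0.0.1:5000/predict"


def build_predict_request(image_bytes, filename, url=PREDICT_URL):
    """Build a POST request that carries the raw image bytes as its body."""
    # Guess the Content-Type from the file extension, e.g. image/jpeg for dog.jpg.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return urllib.request.Request(
        url,
        data=image_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )


def predict(image_path, url=PREDICT_URL, timeout=30):
    """Send an image file to the endpoint and return the decoded response."""
    with open(image_path, "rb") as f:
        request = build_predict_request(f.read(), image_path, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the service replies with a JSON document.
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict("dog.jpg")` would send the same image as the curl commands. Building the request separately from sending it makes the headers easy to inspect without a live server.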

Alternatively, open http://127.0.0.1:5000/ in your browser and click `/predict` -> `Try it out` -> `Choose File` -> `Execute` to submit an image from your computer.

## Containerize model server with Docker


One common way of distributing this model API server for production deployment is via Docker containers, and BentoML provides a convenient way to do that.

Note that Docker is **not available in Google Colab**. You will need to download and run this notebook locally to try out the containerization feature.

If you already have Docker configured, simply run the following command to produce a Docker container serving the OnnxYolov4 prediction service created above:

In [None]:
!bentoml containerize OnnxYolov4:latest --debug

In [None]:
!docker run --rm -p 5000:5000 onnxyolov4:20210505100006_868839
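
After `docker run`, the server may need a moment before it accepts connections. A minimal sketch for waiting until the published port is reachable is shown below. The helper `wait_for_port` is hypothetical, and the host and port are assumptions matching the `-p 5000:5000` mapping above. It only checks that a TCP connection can be opened, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=5000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connection means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and try again.
            time.sleep(interval)
    return False
```

From a separate notebook cell or terminal session, `wait_for_port()` can be called before sending the first prediction request to the container.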