# Predict on an InferenceService using BentoML


This notebook shows how to use BentoML to deploy an InferenceService with a custom model.


[BentoML](https://bentoml.org) is an open-source platform for high-performance ML model serving. It supports all major machine learning frameworks, including Keras, TensorFlow, PyTorch, Fast.ai, and XGBoost.


## Setup

* Your ~/.kube/config should point to a cluster with KFServing installed.
* Your cluster's Istio Ingress gateway must be network-accessible.
* Docker and Docker Hub must be properly configured.

In [None]:
!pip install bentoml
!pip install scikit-learn

## Train and Save model

In [None]:
from sklearn import svm
from sklearn import datasets


# Load training data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Model Training
clf = svm.SVC(gamma='scale')
clf.fit(X, y)

**Define ML service with BentoML**

This code defines a prediction service that requires a scikit-learn model and asks BentoML to work out the required PyPI packages automatically. It also defines an API, which is the entry point for accessing the prediction service. The API expects a pandas.DataFrame object as its input data.

In [None]:
%%writefile iris_classifier.py

from bentoml import env, artifacts, api, BentoService
from bentoml.handlers import DataframeHandler
from bentoml.artifact import SklearnModelArtifact


@env(auto_pip_dependencies=True)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):

    @api(DataframeHandler)
    def predict(self, df):
        return self.artifacts.model.predict(df)

**Save the trained model to local disk with BentoML**

In [None]:
from iris_classifier import IrisClassifier

# Create an iris classifier service instance
iris_classifier_service = IrisClassifier()

# Pack the newly trained model artifact
iris_classifier_service.pack('model', clf)

# Save the prediction service to disk for model serving
saved_path = iris_classifier_service.save()

**Use bentoml CLI to test prediction with sample data**

In [None]:
!bentoml run IrisClassifier:latest predict --input '[[5.1, 3.5, 1.4, 0.2]]'
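
Before sending data to the service, it can help to check the payload shape locally. The sketch below is a minimal, standard-library illustration: `validate_iris_payload` is a hypothetical helper, not part of BentoML, and it assumes each row carries the four iris measurements the model was trained on.

```python
import json

N_FEATURES = 4  # the iris dataset has four measurements per sample


def validate_iris_payload(raw: str) -> list:
    """Parse a JSON payload and check that it is a list of 4-number rows.

    Hypothetical helper for illustration only; not a BentoML API.
    """
    rows = json.loads(raw)
    if not isinstance(rows, list) or not rows:
        raise ValueError("payload must be a non-empty list of rows")
    for row in rows:
        if not isinstance(row, list) or len(row) != N_FEATURES:
            raise ValueError(f"each row needs {N_FEATURES} values, got {row!r}")
        for value in row:
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"row contains a non-numeric value: {row!r}")
    return rows


rows = validate_iris_payload('[[5.1, 3.5, 1.4, 0.2]]')
print(rows)
```

A row with the wrong number of values raises `ValueError` instead of reaching the model.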

## Deploy a custom InferenceService with BentoML using the command line


This example includes additional files for the KFServing V1 prediction protocol.

*Better support for KFServing and its V2 prediction protocol is coming to BentoML.*

In [None]:
%%writefile {saved_path}/app.py

import os
from flask import Flask, request

from bentoml import load


app = Flask(__name__)


bento_service = load('.')
api = bento_service.get_service_api('predict')


@app.route('/v1/models/iris-classifier:predict', methods=['POST'])
def predict():
    return api.handle_request(request)


if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

In [None]:
%%writefile {saved_path}/Dockerfile-kfserving

FROM continuumio/miniconda3:4.7.12

ENTRYPOINT [ "/bin/bash", "-c" ]

EXPOSE 8080

RUN set -x \
     && apt-get update \
     && apt-get install --no-install-recommends --no-install-suggests -y libpq-dev build-essential \
     && rm -rf /var/lib/apt/lists/*

# pre-install BentoML base dependencies
RUN conda install pip numpy scipy \
      && pip install gunicorn

# copy over model files
COPY . /bento
WORKDIR /bento

# run user defined setup script
RUN if [ -f /bento/setup.sh ]; then /bin/bash -c /bento/setup.sh; fi

# update conda base env
RUN conda env update -n base -f /bento/environment.yml
ARG PIP_TRUSTED_HOST
ARG PIP_INDEX_URL
RUN pip install -r /bento/requirements.txt

# Install additional pip dependencies inside bundled_pip_dependencies dir
RUN if [ -f /bento/bentoml_init.sh ]; then /bin/bash -c /bento/bentoml_init.sh; fi

# Run Gunicorn server with path to module.
CMD ["exec gunicorn --bind :$PORT --workers 1 --threads 8 app:app"]

In [None]:
%%bash
# Ensure docker_username has correct value
docker_username=DOCKER_USERNAME
model_path=$(bentoml get IrisClassifier:latest -q | jq -r ".uri.uri")

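# Pass the Dockerfile with -f and use the saved bundle directory as the build context
docker build -t $docker_username/kfserving-iris-classifier -f $model_path/Dockerfile-kfserving $model_path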

docker push $docker_username/kfserving-iris-classifier

*Update the Docker image tag inside the InferenceService YAML definition and apply it to the cluster*

In [None]:
%%bash
# Ensure docker_username has correct value
docker_username=DOCKER_USERNAME
# sed writes the substituted manifest to stdout, so pipe it straight to kubectl
sed 's/{docker_username}/'"$docker_username"'/g' custom.yaml | kubectl apply -f -
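
The placeholder substitution performed by the `sed` call can also be sketched in Python. `render_manifest` is a hypothetical helper, and the template line is an assumed excerpt, since `custom.yaml` is not reproduced in this notebook. Only the `{docker_username}` placeholder and the image name are taken from the commands above.

```python
def render_manifest(template: str, docker_username: str) -> str:
    """Replace every {docker_username} placeholder, as the sed call does."""
    return template.replace("{docker_username}", docker_username)


# Assumed excerpt; the real custom.yaml is not reproduced here.
template = "image: {docker_username}/kfserving-iris-classifier\n"
print(render_manifest(template, "DOCKER_USERNAME"), end="")
```

The function returns the rendered text rather than editing the file, so the original template stays reusable.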

## Run prediction

*Note: Use kfserving-ingressgateway as your INGRESS_GATEWAY if you are deploying KFServing as part of Kubeflow install, and not independently.*

In [None]:
%%bash

MODEL_NAME=iris-classifier
INGRESS_GATEWAY=istio-ingressgateway
CLUSTER_IP=$(kubectl -n istio-system get service $INGRESS_GATEWAY -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
SERVICE_HOSTNAME=$(kubectl get inferenceservice ${MODEL_NAME} -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v -H "Host: ${SERVICE_HOSTNAME}" \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \
  http://$CLUSTER_IP/v1/models/${MODEL_NAME}:predict
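
The same request can be assembled in Python with only the standard library. This is a sketch rather than a definitive client: `build_predict_request` is a hypothetical helper, and the IP address and hostname below are placeholders for the values returned by the `kubectl` commands above.

```python
import json
from urllib.parse import urlparse
from urllib.request import Request


def build_predict_request(cluster_ip, service_url, model_name, rows):
    """Build (but do not send) a V1 predict request routed by Host header."""
    # Equivalent of `cut -d "/" -f 3`: keep only the host part of status.url.
    service_hostname = urlparse(service_url).netloc
    return Request(
        url=f"http://{cluster_ip}/v1/models/{model_name}:predict",
        data=json.dumps(rows).encode("utf-8"),
        headers={"Host": service_hostname, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; substitute the outputs of the kubectl commands above.
req = build_predict_request(
    "203.0.113.10",
    "http://iris-classifier.default.example.com",
    "iris-classifier",
    [[5.1, 3.5, 1.4, 0.2]],
)
print(req.get_method(), req.full_url, req.get_header("Host"))
```

Passing `req` to `urllib.request.urlopen` would send it. The `Host` header lets the ingress gateway route the call to the InferenceService.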

## Delete deployment

In [None]:
!kubectl delete -f custom.yaml