Feature: Add torchserve custom server #1156

Merged
74 changes: 74 additions & 0 deletions docs/samples/custom/torchserve/README.md
@@ -0,0 +1,74 @@
# Predict on an InferenceService using a Custom TorchServe Image

## Setup

1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).
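Before continuing, it can help to confirm that both prerequisites hold. A minimal sanity check, assuming a default install where the KFServing controller runs in `kfserving-system` and the ingress gateway lives in `istio-system` (names may differ in your environment):

```
# Cluster from ~/.kube/config is reachable
kubectl cluster-info

# KFServing controller and Istio ingress gateway are present
kubectl get pods -n kfserving-system
kubectl get svc istio-ingressgateway -n istio-system
```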

## Build and push the sample Docker Image

This example packages the model inside a custom TorchServe container image and serves it with KFServing.

In this example we use Docker to build a TorchServe image that bundles the model archive (.mar) file and config.properties. To build and push to Docker Hub, run these commands, replacing {username} with your Docker Hub username:

```
# Build the container on your local machine
docker build -t {username}/torchserve:latest .

# Push the container to docker registry
docker push {username}/torchserve:latest
```
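The build expects the model archive, `config.properties`, and the entrypoint script to sit next to the Dockerfile, matching the `COPY` statements in the Dockerfile later in this sample. A sketch of the build context (the `mnist.mar` name is illustrative):

```
.
├── Dockerfile
├── dockerd-entrypoint.sh
├── config.properties
└── model-store/
    └── mnist.mar
```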

## Create the InferenceService

In the `torchserve-custom.yaml` file, edit the container image and replace {username} with your Docker Hub username.

Apply the CRD

```
kubectl apply -f torchserve-custom.yaml
```

Expected Output

```
inferenceservice.serving.kubeflow.org/torchserve-custom created
```
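Optionally, wait for the service to report readiness before sending traffic. A hedged example, assuming the InferenceService was created in your current namespace:

```
kubectl get inferenceservice torchserve-custom
kubectl wait --for=condition=Ready inferenceservice/torchserve-custom --timeout=300s
```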

## Run a prediction

The first step is to [determine the ingress IP and ports](../../../../README.md#determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.
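On clusters where the Istio ingress gateway is exposed through a LoadBalancer, one common way to set these variables is shown below; other setups (NodePort, port-forwarding) are covered in the linked document:

```
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```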

```
MODEL_NAME=torchserve-custom
SERVICE_HOSTNAME=$(kubectl get route ${MODEL_NAME}-predictor-default -n <namespace> -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/predictions/mnist
```

Expected Output

```
* Trying 52.89.19.61...
* Connected to a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com (52.89.19.61) port 80 (#0)
> PUT /predictions/mnist HTTP/1.1
> Host: torchserve-custom.kfserving-test.example.com
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 272
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< cache-control: no-cache; no-store, must-revalidate, private
< content-length: 1
< date: Fri, 23 Oct 2020 13:01:09 GMT
< expires: Thu, 01 Jan 1970 00:00:00 UTC
< pragma: no-cache
< x-request-id: 8881f2b9-462e-4e2d-972f-90b4eb083e53
< x-envoy-upstream-service-time: 5018
< server: istio-envoy
<
* Connection #0 to host a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com left intact
0
```
13 changes: 13 additions & 0 deletions docs/samples/custom/torchserve/autoscale.yaml
@@ -0,0 +1,13 @@
apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
name: torchserve-custom-autoscaling
annotations:
autoscaling.knative.dev/target: "5"
spec:
predictor:
containers:
- image: {username}/torchserve:latest
name: torchserve-container
ports:
- containerPort: 8080
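The `autoscaling.knative.dev/target: "5"` annotation asks the Knative autoscaler to keep roughly five in-flight requests per pod. A minimal sketch of exercising it, assuming the `hey` load generator is installed and `INGRESS_HOST`/`INGRESS_PORT` are set as in the README:

```
kubectl apply -f autoscale.yaml

# Resolve the route host for this service (namespace placeholder as in the README)
SERVICE_HOSTNAME=$(kubectl get route torchserve-custom-autoscaling-predictor-default -n <namespace> -o jsonpath='{.status.url}' | cut -d "/" -f 3)

# Drive concurrent traffic and watch the predictor pods scale out
hey -z 30s -c 20 -host "${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/predictions/mnist
kubectl get pods -w
```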
12 changes: 12 additions & 0 deletions docs/samples/custom/torchserve/canary.yaml
@@ -0,0 +1,12 @@
apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
name: torchserve-custom
spec:
predictor:
canaryTrafficPercent: 10
containers:
- image: {username}/torchserve:latest
name: torchserve-container
ports:
- containerPort: 8080
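Setting `canaryTrafficPercent: 10` routes roughly 10% of requests to this revision and the remainder to the previously promoted one. A minimal sketch for applying it and inspecting the split (the exact status field layout may differ between KFServing versions):

```
kubectl apply -f canary.yaml

# Inspect the revision traffic split reported in the InferenceService status
kubectl get inferenceservice torchserve-custom -o yaml | grep -A 5 traffic
```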
15 changes: 15 additions & 0 deletions docs/samples/custom/torchserve/gpu.yaml
@@ -0,0 +1,15 @@
apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
name: torchserve-custom-gpu
spec:
predictor:
containers:
- image: {username}/torchserve:latest-gpu
name: torchserve-container
ports:
- containerPort: 8080
resources:
limits:
nvidia.com/gpu: 1

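This variant requests one GPU through the `nvidia.com/gpu` resource limit, so the predictor only schedules on a cluster with GPU nodes and the NVIDIA device plugin. A quick, hedged check that the limit was applied (the label selector below is the one KFServing typically sets on predictor pods; adjust it if your version differs):

```
kubectl apply -f gpu.yaml

# Verify the predictor pod actually carries the GPU limit
kubectl get pods -l serving.kubeflow.org/inferenceservice=torchserve-custom-gpu -o yaml | grep "nvidia.com/gpu"
```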
12 changes: 12 additions & 0 deletions docs/samples/custom/torchserve/torchserve-custom.yaml
@@ -0,0 +1,12 @@
apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
name: torchserve-custom
spec:
predictor:
containers:
- image: {username}/torchserve:latest
name: torchserve-container
ports:
- containerPort: 8080

88 changes: 88 additions & 0 deletions docs/samples/custom/torchserve/torchserve-image/Dockerfile
@@ -0,0 +1,88 @@
# syntax = docker/dockerfile:experimental
#
# This file can build images for cpu and gpu env. By default it builds image for CPU.
# Use following option to build image for cuda/GPU: --build-arg BASE_IMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
# Here is complete command for GPU/cuda -
# $ DOCKER_BUILDKIT=1 docker build --file Dockerfile --build-arg BASE_IMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 -t torchserve:latest .
#
# Following comments have been shamelessly copied from https://github.com/pytorch/pytorch/blob/master/Dockerfile
#
# NOTE: To build this you will need a docker version > 18.06 with
# experimental enabled and DOCKER_BUILDKIT=1
#
# If you do not use buildkit you are not going to have a good time
#
# For reference:
# https://docs.docker.com/develop/develop-images/build_enhancements/


ARG BASE_IMAGE=ubuntu:18.04

FROM ${BASE_IMAGE} AS compile-image

ENV PYTHONUNBUFFERED TRUE

RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \
apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
ca-certificates \
g++ \
python3-dev \
python3-distutils \
python3-venv \
openjdk-11-jre-headless \
curl \
&& rm -rf /var/lib/apt/lists/* \
&& cd /tmp \
&& curl -O https://bootstrap.pypa.io/get-pip.py \
&& python3 get-pip.py

RUN python3 -m venv /home/venv

ENV PATH="/home/venv/bin:$PATH"

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

# This is only useful for cuda env
RUN export USE_CUDA=1

RUN pip install --no-cache-dir torch torchvision torchtext torchserve torch-model-archiver transformers

# Final image for production
FROM ${BASE_IMAGE} AS runtime-image

ENV PYTHONUNBUFFERED TRUE

RUN --mount=type=cache,target=/var/cache/apt \
apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
python3 \
openjdk-11-jre-headless \
&& rm -rf /var/lib/apt/lists/* \
&& cd /tmp

COPY --from=compile-image /home/venv /home/venv

ENV PATH="/home/venv/bin:$PATH"

RUN useradd -m model-server \
&& mkdir -p /home/model-server/tmp

COPY dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh

RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \
&& chown -R model-server /home/model-server

COPY config.properties /home/model-server/config.properties
RUN mkdir /home/model-server/model-store
COPY model-store/* /home/model-server/model-store/
RUN chown -R model-server /home/model-server/model-store

EXPOSE 8080 8081 8082

USER model-server
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"]
CMD ["serve"]
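Before pushing the image, it can be smoke-tested locally. A hedged example, assuming an `mnist.mar` was placed in `model-store/` before the build so the snapshot in `config.properties` can load it:

```
docker run --rm -p 8080:8080 -p 8081:8081 torchserve:latest

# In a second terminal: TorchServe health check and model listing
curl http://localhost:8080/ping
curl http://localhost:8081/models
```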
11 changes: 11 additions & 0 deletions docs/samples/custom/torchserve/torchserve-image/README.md
@@ -0,0 +1,11 @@
## Docker image building

**Steps:**
1. Copy your model archive (.mar) files into the `model-store` folder (see the archiver sketch after these steps).
2. Edit `config.properties` as required.
3. Run the docker build.

For CPU:
`DOCKER_BUILDKIT=1 docker build --file Dockerfile -t torchserve:latest .`

For GPU:
`DOCKER_BUILDKIT=1 docker build --file Dockerfile --build-arg BASE_IMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 -t torchserve:latest-gpu .`
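If you do not yet have a `.mar` file, one can be produced with `torch-model-archiver`, which the compile stage of the Dockerfile installs. The file names and handler below are illustrative; substitute your own model artifacts:

```
# Package the model into mnist.mar (example inputs; replace with your own)
torch-model-archiver --model-name mnist \
  --version 1.0 \
  --model-file model.py \
  --serialized-file mnist_cnn.pt \
  --handler image_classifier

mkdir -p model-store
mv mnist.mar model-store/
```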

6 changes: 6 additions & 0 deletions docs/samples/custom/torchserve/torchserve-image/config.properties
@@ -0,0 +1,6 @@
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
number_of_netty_threads=4
job_queue_size=10
model_store=/home/model-server/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":3,"maxWorkers":5,"batchSize":4,"maxBatchDelay":5000,"responseTimeout":120}}}}
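The `model_snapshot` entry above preloads the `mnist` model from `mnist.mar` with 3 to 5 workers. Once the container is running, the registration can be confirmed through the management API configured on port 8081 (illustrative; run against a local container or from inside the cluster):

```
curl http://localhost:8081/models
curl http://localhost:8081/models/mnist
```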
12 changes: 12 additions & 0 deletions docs/samples/custom/torchserve/torchserve-image/dockerd-entrypoint.sh
@@ -0,0 +1,12 @@
#!/bin/bash
set -e

if [[ "$1" = "serve" ]]; then
    shift 1
    torchserve --start --ts-config /home/model-server/config.properties
else
    eval "$@"
fi

# prevent docker exit
tail -f /dev/null