Feature: Add torchserve custom server #1156

Merged
74 changes: 74 additions & 0 deletions docs/samples/custom/torchserve/README.md
@@ -0,0 +1,74 @@
# Predict on an InferenceService using a Custom TorchServe Image

## Setup

1. Your ~/.kube/config should point to a cluster with [KFServing installed](https://github.com/kubeflow/kfserving/#install-kfserving).
2. Your cluster's Istio Ingress gateway must be [network accessible](https://istio.io/latest/docs/tasks/traffic-management/ingress/ingress-control/).
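Before continuing, it can help to confirm that both prerequisites hold. A minimal sanity check, assuming a default install where the KFServing controller runs in `kfserving-system` and the ingress gateway lives in `istio-system` (names may differ in your environment):

```
# Cluster from ~/.kube/config is reachable
kubectl cluster-info

# KFServing controller and Istio ingress gateway are present
kubectl get pods -n kfserving-system
kubectl get svc istio-ingressgateway -n istio-system
```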

## Build and push the sample Docker Image

This example packages the model inside a custom TorchServe container image and serves it with KFServing.

In this example we use Docker to build a TorchServe image that bundles the model archive (.mar) file and config.properties. To build and push to Docker Hub, run these commands, replacing {username} with your Docker Hub username:

```
# Build the container on your local machine
docker build -t {username}/torchserve:latest .

# Push the container to docker registry
docker push {username}/torchserve:latest
```
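The build expects the model archive, `config.properties`, and the entrypoint script to sit next to the Dockerfile, matching the `COPY` statements in the Dockerfile later in this sample. A sketch of the build context (the `mnist.mar` name is illustrative):

```
.
├── Dockerfile
├── dockerd-entrypoint.sh
├── config.properties
└── model-store/
    └── mnist.mar
```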

## Create the InferenceService

In the `torchserve-custom.yaml` file, edit the container image and replace {username} with your Docker Hub username.

Apply the CRD

```
kubectl apply -f torchserve-custom.yaml
```

Expected Output

```
inferenceservice.serving.kubeflow.org/torchserve-custom created
```
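Optionally, wait for the service to report readiness before sending traffic. A hedged example, assuming the InferenceService was created in your current namespace:

```
kubectl get inferenceservice torchserve-custom
kubectl wait --for=condition=Ready inferenceservice/torchserve-custom --timeout=300s
```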

## Run a prediction

The first step is to [determine the ingress IP and ports](../../../../README.md#determine-the-ingress-ip-and-ports) and set `INGRESS_HOST` and `INGRESS_PORT`.
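On clusters where the Istio ingress gateway is exposed through a LoadBalancer, one common way to set these variables is shown below; other setups (NodePort, port-forwarding) are covered in the linked document:

```
INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```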

```
MODEL_NAME=torchserve-custom
SERVICE_HOSTNAME=$(kubectl get route ${MODEL_NAME}-predictor-default -n <namespace> -o jsonpath='{.status.url}' | cut -d "/" -f 3)

curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/predictions/mnist
```

Expected Output

```
* Trying 52.89.19.61...
* Connected to a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com (52.89.19.61) port 80 (#0)
> PUT /predictions/mnist HTTP/1.1
> Host: torchserve-custom.kfserving-test.example.com
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 272
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< cache-control: no-cache; no-store, must-revalidate, private
< content-length: 1
< date: Fri, 23 Oct 2020 13:01:09 GMT
< expires: Thu, 01 Jan 1970 00:00:00 UTC
< pragma: no-cache
< x-request-id: 8881f2b9-462e-4e2d-972f-90b4eb083e53
< x-envoy-upstream-service-time: 5018
< server: istio-envoy
<
* Connection #0 to host a881f5a8c676a41edbccdb0a394a80d6-2069247558.us-west-2.elb.amazonaws.com left intact
0
```
13 changes: 13 additions & 0 deletions docs/samples/custom/torchserve/autoscale.yaml
@@ -0,0 +1,13 @@
apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
name: torchserve-custom-autoscaling
annotations:
autoscaling.knative.dev/target: "5"
spec:
predictor:
containers:
- image: {username}/torchserve:latest
name: torchserve-container
ports:
- containerPort: 8080
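The `autoscaling.knative.dev/target: "5"` annotation asks the Knative autoscaler to keep roughly five in-flight requests per pod. A minimal sketch of exercising it, assuming the `hey` load generator is installed and `INGRESS_HOST`/`INGRESS_PORT` are set as in the README:

```
kubectl apply -f autoscale.yaml

# Resolve the route host for this service (namespace placeholder as in the README)
SERVICE_HOSTNAME=$(kubectl get route torchserve-custom-autoscaling-predictor-default -n <namespace> -o jsonpath='{.status.url}' | cut -d "/" -f 3)

# Drive concurrent traffic and watch the predictor pods scale out
hey -z 30s -c 20 -host "${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/predictions/mnist
kubectl get pods -w
```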
12 changes: 12 additions & 0 deletions docs/samples/custom/torchserve/canary.yaml
@@ -0,0 +1,12 @@
apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
name: torchserve-custom
spec:
predictor:
canaryTrafficPercent: 10
containers:
- image: {username}/torchserve:latest
name: torchserve-container
ports:
- containerPort: 8080
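Setting `canaryTrafficPercent: 10` routes roughly 10% of requests to this revision and the remainder to the previously promoted one. A minimal sketch for applying it and inspecting the split (the exact status field layout may differ between KFServing versions):

```
kubectl apply -f canary.yaml

# Inspect the revision traffic split reported in the InferenceService status
kubectl get inferenceservice torchserve-custom -o yaml | grep -A 5 traffic
```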
15 changes: 15 additions & 0 deletions docs/samples/custom/torchserve/gpu.yaml
@@ -0,0 +1,15 @@
apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
name: torchserve-custom-gpu
spec:
predictor:
containers:
- image: {username}/torchserve:latest-gpu
name: torchserve-container
ports:
- containerPort: 8080
resources:
limits:
nvidia.com/gpu: 1

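This variant requests one GPU through the `nvidia.com/gpu` resource limit, so the predictor only schedules on a cluster with GPU nodes and the NVIDIA device plugin. A quick, hedged check that the limit was applied (the label selector below is the one KFServing typically sets on predictor pods; adjust it if your version differs):

```
kubectl apply -f gpu.yaml

# Verify the predictor pod actually carries the GPU limit
kubectl get pods -l serving.kubeflow.org/inferenceservice=torchserve-custom-gpu -o yaml | grep "nvidia.com/gpu"
```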
12 changes: 12 additions & 0 deletions docs/samples/custom/torchserve/torchserve-custom.yaml
@@ -0,0 +1,12 @@
apiVersion: "serving.kubeflow.org/v1beta1"
kind: "InferenceService"
metadata:
name: torchserve-custom
spec:
predictor:
containers:
- image: {username}/torchserve:latest
name: torchserve-container
ports:
- containerPort: 8080

88 changes: 88 additions & 0 deletions docs/samples/custom/torchserve/torchserve-image/Dockerfile
@@ -0,0 +1,88 @@
# syntax = docker/dockerfile:experimental
#
# This file can build images for cpu and gpu env. By default it builds image for CPU.
# Use following option to build image for cuda/GPU: --build-arg BASE_IMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
# Here is complete command for GPU/cuda -
# $ DOCKER_BUILDKIT=1 docker build --file Dockerfile --build-arg BASE_IMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 -t torchserve:latest .
#
# Following comments have been shamelessly copied from https://github.com/pytorch/pytorch/blob/master/Dockerfile
#
# NOTE: To build this you will need a docker version > 18.06 with
# experimental enabled and DOCKER_BUILDKIT=1
#
# If you do not use buildkit you are not going to have a good time
#
# For reference:
# https://docs.docker.com/develop/develop-images/build_enhancements/


ARG BASE_IMAGE=ubuntu:18.04

FROM ${BASE_IMAGE} AS compile-image

ENV PYTHONUNBUFFERED TRUE

RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \
apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
ca-certificates \
g++ \
python3-dev \
python3-distutils \
python3-venv \
openjdk-11-jre-headless \
curl \
&& rm -rf /var/lib/apt/lists/* \
&& cd /tmp \
&& curl -O https://bootstrap.pypa.io/get-pip.py \
&& python3 get-pip.py

RUN python3 -m venv /home/venv

ENV PATH="/home/venv/bin:$PATH"

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1

# This is only useful for cuda env
RUN export USE_CUDA=1

RUN pip install --no-cache-dir torch torchvision torchtext torchserve torch-model-archiver transformers

# Final image for production
FROM ${BASE_IMAGE} AS runtime-image

ENV PYTHONUNBUFFERED TRUE

RUN --mount=type=cache,target=/var/cache/apt \
apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
python3 \
openjdk-11-jre-headless \
&& rm -rf /var/lib/apt/lists/* \
&& cd /tmp

COPY --from=compile-image /home/venv /home/venv

ENV PATH="/home/venv/bin:$PATH"

RUN useradd -m model-server \
&& mkdir -p /home/model-server/tmp

COPY dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh

RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh \
&& chown -R model-server /home/model-server

COPY config.properties /home/model-server/config.properties
RUN mkdir /home/model-server/model-store
COPY model-store/* /home/model-server/model-store/
RUN chown -R model-server /home/model-server/model-store

EXPOSE 8080 8081 8082

USER model-server
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENTRYPOINT ["/usr/local/bin/dockerd-entrypoint.sh"]
CMD ["serve"]
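Before pushing the image, it can be smoke-tested locally. A hedged example, assuming an `mnist.mar` was placed in `model-store/` before the build so the snapshot in `config.properties` can load it:

```
docker run --rm -p 8080:8080 -p 8081:8081 torchserve:latest

# In a second terminal: TorchServe health check and model listing
curl http://localhost:8080/ping
curl http://localhost:8081/models
```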
11 changes: 11 additions & 0 deletions docs/samples/custom/torchserve/torchserve-image/README.md
@@ -0,0 +1,11 @@
## Docker image building

**Steps:**
1. Copy your model archive (.mar) files into the `model-store` folder (see the archiver sketch after these steps).
2. Edit `config.properties` as required.
3. Run the docker build.

For CPU:
`DOCKER_BUILDKIT=1 docker build --file Dockerfile -t torchserve:latest .`

For GPU:
`DOCKER_BUILDKIT=1 docker build --file Dockerfile --build-arg BASE_IMAGE=nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 -t torchserve:latest-gpu .`
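If you do not yet have a `.mar` file, one can be produced with `torch-model-archiver`, which the compile stage of the Dockerfile installs. The file names and handler below are illustrative; substitute your own model artifacts:

```
# Package the model into mnist.mar (example inputs; replace with your own)
torch-model-archiver --model-name mnist \
  --version 1.0 \
  --model-file model.py \
  --serialized-file mnist_cnn.pt \
  --handler image_classifier

mkdir -p model-store
mv mnist.mar model-store/
```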

6 changes: 6 additions & 0 deletions docs/samples/custom/torchserve/torchserve-image/config.properties
@@ -0,0 +1,6 @@
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
number_of_netty_threads=4
job_queue_size=10
model_store=/home/model-server/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"mnist":{"1.0":{"defaultVersion":true,"marName":"mnist.mar","minWorkers":3,"maxWorkers":5,"batchSize":4,"maxBatchDelay":5000,"responseTimeout":120}}}}
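The `model_snapshot` entry above preloads the `mnist` model from `mnist.mar` with 3 to 5 workers. Once the container is running, the registration can be confirmed through the management API configured on port 8081 (illustrative; run against a local container or from inside the cluster):

```
curl http://localhost:8081/models
curl http://localhost:8081/models/mnist
```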
12 changes: 12 additions & 0 deletions docs/samples/custom/torchserve/torchserve-image/dockerd-entrypoint.sh
@@ -0,0 +1,12 @@
#!/bin/bash
set -e

if [[ "$1" = "serve" ]]; then
    shift 1
    torchserve --start --ts-config /home/model-server/config.properties
else
    eval "$@"
fi

# prevent docker exit
tail -f /dev/null