# AMD Inference Server

The [AMD Inference Server](https://xilinx.github.io/inference-server/main/index.html) is an easy-to-use inferencing solution specially designed for AMD CPUs, GPUs, and FPGAs.
It can be deployed as a standalone executable, run on a Kubernetes cluster with KServe, or used to create custom applications by linking to its C++ API.
This example demonstrates how to deploy a TensorFlow GraphDef model on KServe with the AMD Inference Server to run inference on AMD EPYC CPUs.

## Prerequisites

This example was tested on an Ubuntu 18.04 host machine using the Bash shell.

These instructions assume:
- You have a machine with a modern version of Docker (>=18.09) and sufficient disk space to build the image
- You have a Kubernetes cluster set up
- KServe has been installed on the Kubernetes cluster
- You have some familiarity with Kubernetes and KServe

Refer to the installation instructions for these tools to install them if needed.
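
If you want to confirm the prerequisites are in place, a few quick checks such as the following can help. This is a minimal sketch; the KServe namespace shown here is an assumption and depends on how KServe was installed.

```bash
# check the Docker version (18.09 or newer is required)
docker --version

# confirm the cluster is reachable
kubectl cluster-info

# confirm the KServe controller is running (the namespace may differ in your install)
kubectl get pods -n kserve
```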

## Set up the image

This example uses the [AMD ZenDNN](https://developer.amd.com/zendnn/) backend to run inference on TensorFlow models on AMD EPYC CPUs.
To build a Docker image for the AMD Inference Server that uses this backend, download the `TF_v2.9_ZenDNN_v3.3_C++_API.zip` package from ZenDNN.
You must agree to the EULA to download this package.
As noted in the prerequisites, Docker 18.09 or newer is required to build the image.

```bash
# clone the inference server repository
git clone https://github.com/Xilinx/inference-server.git

# place the downloaded ZenDNN zip in the repository
mv *.zip ./inference-server/

# build the image
cd inference-server
./proteus dockerize --production --tfzendnn=./TF_v2.9_ZenDNN_v3.3_C++_API.zip
```

This builds an image on your host: `<username>/proteus:latest`.
This image can now be uploaded to a [local Docker registry server](https://docs.docker.com/registry/deploying/) to use with KServe.
You will need to update the YAML files in this example to use this image.
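
For example, assuming a local registry is reachable at `localhost:5000` (this address is an assumption; substitute your own registry), the image could be tagged and pushed like this:

```bash
# tag the freshly built image for the local registry and push it
docker tag $(whoami)/proteus:latest localhost:5000/proteus:latest
docker push localhost:5000/proteus:latest
```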

More documentation on building a ZenDNN image and using it with KServe is available at [ZenDNN + Inference Server](https://xilinx.github.io/inference-server/main/zendnn.html) and [KServe + Inference Server](https://xilinx.github.io/inference-server/main/kserve.html).

## Set up the model

In this example, you will use an [MNIST TensorFlow model](https://github.com/Xilinx/inference-server/blob/main/tests/assets/mnist.zip).

To use your own models, you first need a GraphDef model to serve.
One way to obtain one is to [freeze a checkpoint](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py).
Once you have the model, you also need a `config.pbtxt` file, which defines metadata about your model such as its name and input/output tensors.
The `config.pbtxt` for the example MNIST model is shown below:

```
name: "mnist"
platform: "tensorflow_graphdef"
inputs [
  {
    name: "images_in"
    datatype: "FP32"
    shape: [28,28,1]
  }
]
outputs [
  {
    name: "flatten/Reshape"
    datatype: "FP32"
    shape: [10]
  }
]
```

The name, platform, inputs and outputs must all be defined.
Some notes about the acceptable values:
- the name must uniquely identify a model for the server
- the platform must be one of `tensorflow_graphdef` or `vitis_xmodel`
- for each input/output, the name, datatype, and shape must be defined
    - the name corresponds to the name of the input/output tensor(s). Multiple input/output tensors aren't currently supported
    - the shape must be a flat array describing the shape of the input/output tensor

The final model directory structure looks like:

```
/
├─ model_a/
│ ├─ 1/
│ │ ├─ saved_model.pb
│ ├─ config.pbtxt
```

The file names (`saved_model.<extension>` and `config.pbtxt`) must match the structure above.
The file extension for `tensorflow_graphdef` and `vitis_xmodel` models should be `.pb` and `.xmodel`, respectively.
This model directory can now be zipped and stored on a web server or made available on a cloud storage platform compatible with KServe.
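
As a rough sketch, the directory above could be packaged and served over HTTP for testing as follows; the archive name and port are illustrative:

```bash
# zip the model directory, preserving its structure
zip -r mnist.zip model_a/

# serve the archive over HTTP, e.g. with Python's built-in web server
python3 -m http.server 8000
```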

## Make an inference

The AMD Inference Server can be used in single or multi-model serving mode in KServe.
The code snippets below use the environment variables `INGRESS_HOST` and `INGRESS_PORT` to make requests to the cluster.
[Find the ingress host and port](https://kserve.github.io/website/master/get_started/first_isvc/#4-determine-the-ingress-ip-and-ports) for making requests to your cluster and set these values appropriately.
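
For reference, on a cluster that uses an Istio ingress gateway with an external load balancer, the values can typically be looked up as shown below; this is a sketch based on the linked KServe documentation, and your ingress setup may differ:

```bash
# look up the external address and HTTP port of the Istio ingress gateway
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
```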

### Single model serving

```bash
# download the inference service file and input data
curl -O https://raw.githubusercontent.com/kserve/kserve/master/docs/samples/v1beta1/amd/single_model.yaml
curl -O https://raw.githubusercontent.com/kserve/kserve/master/docs/samples/v1beta1/amd/input.json

# update the single_model.yaml to use your custom image
sed -i "s/<image>/$(whoami)\/proteus:latest/" single_model.yaml

# create the inference service
kubectl apply -f single_model.yaml

# wait for service to be ready
kubectl wait --for=condition=ready isvc -l app=example-amdserver-single-isvc

export SERVICE_HOSTNAME=$(kubectl get inferenceservice example-amdserver-single-isvc -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```
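
Before sending inference requests, you can optionally confirm that the MNIST model endpoint is live; this assumes `INGRESS_HOST` and `INGRESS_PORT` have been set as described above:

```bash
# query the v2 readiness endpoint for the mnist model
curl --fail -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mnist/ready
```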

### Multi model serving

```bash
# download the inference service file and input data
curl -O https://raw.githubusercontent.com/kserve/kserve/master/docs/samples/v1beta1/amd/multi_model.yaml
curl -O https://raw.githubusercontent.com/kserve/kserve/master/docs/samples/v1beta1/amd/input.json

# update the multi_model.yaml to use the custom image
sed -i "s/<image>/$(whoami)\/proteus:latest/" multi_model.yaml

# create the inference service
kubectl apply -f multi_model.yaml

# wait for service to be ready
kubectl wait --for=condition=ready isvc -l app=example-amdserver-multi-isvc

# add a model to serve: our MNIST model
kubectl apply -f trained_model.yaml

export SERVICE_HOSTNAME=$(kubectl get inferenceservice example-amdserver-multi-isvc -o jsonpath='{.status.url}' | cut -d "/" -f 3)

# wait until the model is ready
until curl --fail -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mnist/ready &> /dev/null; do echo "Waiting for model..."; sleep 2; done
```

You can add additional models to the same `InferenceService` by applying more `TrainedModel` YAML files, as sketched below.
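
As an illustrative sketch only (not the exact contents of `trained_model.yaml`), a `TrainedModel` that attaches another model to this service might look roughly like the following; the model name, framework value, storage URI, and memory request are all placeholders:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: serving.kserve.io/v1alpha1
kind: TrainedModel
metadata:
  name: another-model            # placeholder model name
spec:
  inferenceService: example-amdserver-multi-isvc
  model:
    framework: tensorflow        # placeholder; must match what the serving runtime supports
    storageUri: https://example.com/another-model.zip   # placeholder storage location
    memory: 1Gi                  # placeholder memory request
EOF
```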

### Make a request with REST

Once the service is ready, you can make requests to it.
Assuming that `INGRESS_HOST`, `INGRESS_PORT`, and `SERVICE_HOSTNAME` have been defined as above, the following command runs an inference over REST to the example MNIST model.

```bash
export MODEL_NAME=mnist
export INPUT_DATA=@./input.json
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/${MODEL_NAME}/infer -d ${INPUT_DATA}
```

This shows the response from the server in KServe's v2 API format.
For this example, it will be similar to:

```{ .bash .no-copy }
{
  "id": "",
  "model_name": "TFModel",
  "outputs": [
    {
      "data": [
        0.11987821012735367,
        0.18648317456245422,
        -0.83796119689941406,
        -0.088459312915802002,
        0.030454874038696289,
        0.074872657656669617,
        -1.1334009170532227,
        -0.046301722526550293,
        -0.31683838367462158,
        0.32014602422714233
      ],
      "datatype": "FP32",
      "name": "input-0",
      "parameters": {},
      "shape": [10]
    }
  ]
}
```

For MNIST, the data contains the model's score for each of the ten digit classes; the input image in this example depicts the number 9.
The index with the highest value in this response is the last one (index 9), indicating that the image was correctly classified as a nine.
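
If you have `jq` installed, one convenient way to extract the predicted digit is to take the index of the largest output value; this is a small convenience sketch rather than part of the original example:

```bash
# rerun the request quietly and report the index of the highest score
curl -s -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/${MODEL_NAME}/infer -d ${INPUT_DATA} \
  | jq '.outputs[0].data | index(max)'
```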