
Commit 4f7fc39

Add kubernetes support for VisualQnA (#578)
* Add kubernetes support for VisualQnA
* update gmc file
* update pic

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
1 parent 80e3e2a commit 4f7fc39

File tree: 9 files changed, +784 −7 lines


VisualQnA/docker/gaudi/README.md

Lines changed: 1 addition & 1 deletion

@@ -116,7 +116,7 @@ curl http://${host_ip}:8888/v1/visualqna -H "Content-Type: application/json" -d
         {
           "type": "image_url",
           "image_url": {
-            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
+            "url": "https://www.ilankelman.org/stopsigns/australia.jpg"
           }
         }
       ]

VisualQnA/docker/xeon/README.md

Lines changed: 9 additions & 4 deletions

@@ -68,15 +68,20 @@ docker build --no-cache -t opea/visualqna-ui:latest --build-arg https_proxy=$htt
 cd ../../../..
 ```
 
-### 4. Pull TGI image
+### 4. Build TGI Xeon Image
+
+Since the official TGI image does not yet support llava-next on CPU, we need to build it from Dockerfile_intel.
 
 ```bash
-docker pull ghcr.io/huggingface/text-generation-inference:2.2.0
+git clone https://github.com/huggingface/text-generation-inference
+cd text-generation-inference/
+docker build -t opea/llava-tgi-xeon:latest --build-arg PLATFORM=cpu --build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} . -f Dockerfile_intel
+cd ../
 ```
 
 Then run the command `docker images`; you should see the following 4 Docker images:
 
-1. `ghcr.io/huggingface/text-generation-inference:2.2.0`
+1. `opea/llava-tgi-xeon:latest`
 2. `opea/lvm-tgi:latest`
 3. `opea/visualqna:latest`
 4. `opea/visualqna-ui:latest`

@@ -152,7 +157,7 @@ curl http://${host_ip}:8888/v1/visualqna -H "Content-Type: application/json" -d
         {
           "type": "image_url",
           "image_url": {
-            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
+            "url": "https://www.ilankelman.org/stopsigns/australia.jpg"
           }
         }
       ]
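
With the Xeon guide now building the TGI image locally instead of pulling it, a quick sanity check before `docker compose up` avoids a confusing startup failure later. A minimal sketch, assuming the build in step 4 above completed on the same host:

```bash
# Confirm the locally built image exists; an empty table means the build
# failed or was tagged under a different name.
docker images opea/llava-tgi-xeon:latest

# Optionally confirm the image's architecture (expect "amd64" for Xeon).
docker inspect --format '{{.Architecture}}' opea/llava-tgi-xeon:latest
```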

VisualQnA/docker/xeon/compose.yaml

Lines changed: 2 additions & 2 deletions

@@ -6,7 +6,7 @@ version: "3.8"
 
 services:
   llava-tgi-service:
-    image: ghcr.io/huggingface/text-generation-inference:2.2.0
+    image: opea/llava-tgi-xeon:latest
     container_name: tgi-llava-xeon-server
     ports:
       - "9399:80"
@@ -19,7 +19,7 @@ services:
       https_proxy: ${https_proxy}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
-    command: --model-id ${LVM_MODEL_ID}
+    command: --model-id ${LVM_MODEL_ID} --max-input-length 4096 --max-total-tokens 8192 --cuda-graphs 0
   lvm-tgi:
     image: opea/lvm-tgi:latest
     container_name: lvm-tgi-server
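
Because the updated `command` interpolates `${LVM_MODEL_ID}` alongside the new fixed token limits, it can help to render the merged configuration before starting the stack. A minimal sketch, assuming it runs from the directory containing `compose.yaml`; the model ID shown is only an illustrative assumption, not a value from this commit:

```bash
# Hypothetical example value -- substitute the llava-next model you serve.
export LVM_MODEL_ID="llava-hf/llava-v1.6-mistral-7b-hf"

# Render the fully interpolated compose file so the new image tag and the
# --max-input-length/--max-total-tokens/--cuda-graphs flags can be checked.
docker compose -f compose.yaml config
```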

VisualQnA/kubernetes/README.md

Lines changed: 57 additions & 0 deletions

@@ -0,0 +1,57 @@
+# Deploy VisualQnA in a Kubernetes Cluster
+
+This document outlines the deployment process for a Visual Question Answering (VisualQnA) application that utilizes the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice components on Intel Xeon servers and Gaudi machines.
+
+Please install GMC in your Kubernetes cluster, if you have not already done so, by following the steps in Section "Getting Started" at [GMC Install](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector#readme). We will soon publish images to Docker Hub, at which point no builds will be required, further simplifying installation.
+
+If you have only Intel Xeon machines, use the visualqna_xeon.yaml file; if you have a Gaudi cluster, use visualqna_gaudi.yaml.
+The example below illustrates deployment on Xeon.
+
+## Deploy the VisualQnA application
+
+1. Create the desired namespace if it does not already exist and deploy the application
+   ```bash
+   export APP_NAMESPACE=CT
+   kubectl create ns $APP_NAMESPACE
+   sed -i "s|namespace: visualqna|namespace: $APP_NAMESPACE|g" ./visualqna_xeon.yaml
+   kubectl apply -f ./visualqna_xeon.yaml
+   ```
+
+2. Check if the application is up and ready
+   ```bash
+   kubectl get pods -n $APP_NAMESPACE
+   ```
+
+3. Deploy a client pod for testing
+   ```bash
+   kubectl create deployment client-test -n $APP_NAMESPACE --image=python:3.8.13 -- sleep infinity
+   ```
+
+4. Check that the client pod is ready
+   ```bash
+   kubectl get pods -n $APP_NAMESPACE
+   ```
+
+5. Send a request to the application
+   ```bash
+   export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
+   export accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='visualqna')].status.accessUrl}")
+   kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -X POST -d '{"messages": [
+     {
+       "role": "user",
+       "content": [
+         {
+           "type": "text",
+           "text": "What'\''s in this image?"
+         },
+         {
+           "type": "image_url",
+           "image_url": {
+             "url": "https://www.ilankelman.org/stopsigns/australia.jpg"
+           }
+         }
+       ]
+     }
+   ],
+   "max_tokens": 128}' -H 'Content-Type: application/json' > $LOG_PATH/gmc_visualqna.log
+   ```
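
One caveat about step 5 above: the response is redirected to `$LOG_PATH/gmc_visualqna.log`, but `LOG_PATH` is never set in this README. A minimal sketch of defining it and reading the answer back, assuming any writable directory on the machine running `kubectl` is acceptable (`/tmp` is an assumption, not part of the commit):

```bash
# LOG_PATH is referenced but not exported by the README; /tmp is an assumption.
export LOG_PATH=/tmp
mkdir -p "$LOG_PATH"

# After running the request in step 5, inspect the model's answer.
cat "$LOG_PATH/gmc_visualqna.log"
```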

VisualQnA/kubernetes/manifests/README.md

Lines changed: 51 additions & 0 deletions

@@ -0,0 +1,51 @@
+# Deploy VisualQnA in Kubernetes Cluster
+
+> [!NOTE]
+> You can also customize the "LVM_MODEL_ID" if needed.
+
+> You need to make sure you have created the directory `/mnt/opea-models` to save the cached model on the node where the visualqna workload is running. Otherwise, you need to modify the `visualqna.yaml` file to change the `model-volume` to a directory that exists on the node.
+
+## Deploy On Xeon
+
+```bash
+cd GenAIExamples/visualqna/kubernetes/manifests/xeon
+kubectl apply -f visualqna.yaml
+```
+
+## Deploy On Gaudi
+
+```bash
+cd GenAIExamples/visualqna/kubernetes/manifests/gaudi
+kubectl apply -f visualqna.yaml
+```
+
+## Verify Services
+
+To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
+
+Then run the command `kubectl port-forward svc/visualqna 8888:8888` to expose the visualqna service for access.
+
+Open another terminal and run the following command to verify that the service is working:
+
+```console
+curl http://localhost:8888/v1/visualqna \
+  -H 'Content-Type: application/json' \
+  -d '{"messages": [
+    {
+      "role": "user",
+      "content": [
+        {
+          "type": "text",
+          "text": "What'\''s in this image?"
+        },
+        {
+          "type": "image_url",
+          "image_url": {
+            "url": "https://www.ilankelman.org/stopsigns/australia.jpg"
+          }
+        }
+      ]
+    }
+  ],
+  "max_tokens": 128}'
+```
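
For completeness, tearing the deployment back down mirrors the apply step. A minimal sketch, assuming the Xeon manifest was applied from its directory as shown above:

```bash
# Stop the port-forward (Ctrl+C in its terminal), then delete the resources
# created by the manifest.
cd GenAIExamples/visualqna/kubernetes/manifests/xeon
kubectl delete -f visualqna.yaml

# Confirm the pods have terminated.
kubectl get pod
```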
