Commit 80e3e2a (1 parent: 8c384e0)

Update manifest for FaqGen (#582)

* update tgi version
* add k8s for faq
* add benchmark for faq
* refine k8s for faq
* add tuning for faq
* add prompts with different length for faq
* add tgi docker for llama3.1
* remove useless code
* remove nodeselector
* remove hf token
* refine code structure
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* fix readme

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

File tree: 6 files changed (+308, -330 lines)


FaqGen/docker/gaudi/README.md (2 additions, 2 deletions)

````diff
@@ -16,7 +16,7 @@ cd GenAIComps
 As TGI Gaudi has been officially published as a Docker image, we simply need to pull it:
 
 ```bash
-docker pull ghcr.io/huggingface/tgi-gaudi:1.2.1
+docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
 ```
 
 ### 2. Build LLM Image
@@ -56,7 +56,7 @@ docker build -t opea/faqgen-react-ui:latest --build-arg https_proxy=$https_proxy
 
 Then run the command `docker images`, you will have the following Docker Images:
 
-1. `ghcr.io/huggingface/tgi-gaudi:1.2.1`
+1. `ghcr.io/huggingface/tgi-gaudi:2.0.1`
 2. `opea/llm-faqgen-tgi:latest`
 3. `opea/faqgen:latest`
 4. `opea/faqgen-ui:latest`
````

FaqGen/docker/gaudi/compose.yaml (4 additions, 2 deletions)

```diff
@@ -17,12 +17,14 @@ services:
       https_proxy: ${https_proxy}
       HABANA_VISIBLE_DEVICES: all
       OMPI_MCA_btl_vader_single_copy_mechanism: none
-      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+      PREFILL_BATCH_BUCKET_SIZE: 1
+      BATCH_BUCKET_SIZE: 8
     runtime: habana
     cap_add:
       - SYS_NICE
     ipc: host
-    command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
+    command: --model-id ${LLM_MODEL_ID} --max-input-length 2048 --max-total-tokens 4096 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 4096
   llm_faqgen:
     image: opea/llm-faqgen-tgi:latest
     container_name: llm-faqgen-server
```
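The retuned launcher flags in this diff are related: `--max-total-tokens` caps a single request (input plus output), while `--max-batch-total-tokens` caps the whole running batch. A minimal sketch of the worst-case concurrency this implies (the variable names are ours, not from the commit):

```shell
# Values taken from the new `command:` line above.
max_total_tokens=4096          # per-request cap (input + output tokens)
max_batch_total_tokens=65536   # cap across the whole running batch

# Worst case: every request uses its full per-request token budget.
max_concurrent=$(( max_batch_total_tokens / max_total_tokens ))
echo "$max_concurrent"
```

Under these settings, at most 16 full-length requests fit in one batch; shorter requests allow more to run concurrently.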

FaqGen/kubernetes/manifests/README.md (12 additions, 1 deletion)

````diff
@@ -23,13 +23,24 @@ sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" faqg
 kubectl apply -f faqgen.yaml
 ```
 
+## Deploy UI
+
+```
+cd GenAIExamples/FaqGen/kubernetes/manifests/
+kubectl get svc # get ip address
+ip_address="" # according to your svc address
+sed -i "s/insert_your_ip_here/${ip_address}/g" ui.yaml
+kubectl apply -f ui.yaml
+```
+
 ## Verify Services
 
 Make sure all the pods are running, and restart the faqgen-xxxx pod if necessary.
 
 ```
 kubectl get pods
-curl http://${host_ip}:8888/v1/faqgen -H "Content-Type: application/json" -d '{
+port=7779 # 7779 for gaudi, 7778 for xeon
+curl http://${host_ip}:7779/v1/faqgen -H "Content-Type: application/json" -d '{
   "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."
 }'
 ```
````
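The port comment added in this diff (7779 for Gaudi, 7778 for Xeon) can be captured in a small hypothetical helper; the `platform` variable is an assumption for illustration, not part of the commit:

```shell
platform="gaudi"   # assumption: set to "gaudi" or "xeon" for your deployment

# Pick the FaqGen mega-service port per the README's note.
if [ "$platform" = "gaudi" ]; then
  port=7779
else
  port=7778
fi
echo "$port"
```

The chosen `port` can then replace the hard-coded `7779` in the `curl` verification command above.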
