Update manifest for FaqGen (#582)
* update tgi version

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* add k8s for faq

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* add benchmark for faq

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* refine k8s for faq

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* add tuning for faq

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* add prompts with different length for faq

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* add tgi docker for llama3.1

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* remove useless code

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* remove nodeselector

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* remove hg token

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* refine code structure

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix readme

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>

---------

Signed-off-by: Xinyao Wang <xinyao.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
XinyaoWa and pre-commit-ci[bot] authored Aug 13, 2024
1 parent 8c384e0 commit 80e3e2a
Showing 6 changed files with 308 additions and 330 deletions.
4 changes: 2 additions & 2 deletions FaqGen/docker/gaudi/README.md
@@ -16,7 +16,7 @@ cd GenAIComps
As TGI Gaudi has been officially published as a Docker image, we simply need to pull it:

```bash
- docker pull ghcr.io/huggingface/tgi-gaudi:1.2.1
+ docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1
```

### 2. Build LLM Image
@@ -56,7 +56,7 @@ docker build -t opea/faqgen-react-ui:latest --build-arg https_proxy=$https_proxy

Then run the command `docker images`, you will have the following Docker Images:

- 1. `ghcr.io/huggingface/tgi-gaudi:1.2.1`
+ 1. `ghcr.io/huggingface/tgi-gaudi:2.0.1`
2. `opea/llm-faqgen-tgi:latest`
3. `opea/faqgen:latest`
4. `opea/faqgen-ui:latest`
6 changes: 4 additions & 2 deletions FaqGen/docker/gaudi/compose.yaml
@@ -17,12 +17,14 @@ services:
https_proxy: ${https_proxy}
HABANA_VISIBLE_DEVICES: all
OMPI_MCA_btl_vader_single_copy_mechanism: none
- HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+ HUGGING_FACE_HUB_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
+ PREFILL_BATCH_BUCKET_SIZE: 1
+ BATCH_BUCKET_SIZE: 8
runtime: habana
cap_add:
- SYS_NICE
ipc: host
- command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
+ command: --model-id ${LLM_MODEL_ID} --max-input-length 2048 --max-total-tokens 4096 --max-batch-total-tokens 65536 --max-batch-prefill-tokens 4096
llm_faqgen:
image: opea/llm-faqgen-tgi:latest
container_name: llm-faqgen-server
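
The new TGI launch flags need to be mutually consistent. As a rough sanity check, the values from the updated `command` line can be verified with shell arithmetic. This is only a sketch: it assumes the usual TGI constraints (the input limit is below the total limit, the prefill budget covers at least one maximum-length prompt, and the batch budget covers at least one full sequence), and the variable and function names are illustrative, not part of the compose file.

```shell
# Values copied from the updated `command` line in compose.yaml.
max_input_length=2048
max_total_tokens=4096
max_batch_total_tokens=65536
max_batch_prefill_tokens=4096

# Hypothetical helper: returns 0 only if the limits are mutually consistent.
check_tgi_limits() {
  # The prompt must leave room for at least one generated token.
  [ "$max_input_length" -lt "$max_total_tokens" ] || return 1
  # One maximum-length prompt must fit into a single prefill batch.
  [ "$max_batch_prefill_tokens" -ge "$max_input_length" ] || return 1
  # One full-length sequence must fit into the batch token budget.
  [ "$max_batch_total_tokens" -ge "$max_total_tokens" ] || return 1
}

check_tgi_limits && echo "limits are consistent"

# Number of full-length sequences that fit into one batch budget.
echo $(( max_batch_total_tokens / max_total_tokens ))
```

With the values in this commit, the batch budget holds 16 sequences at the full 4096-token length.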
13 changes: 12 additions & 1 deletion FaqGen/kubernetes/manifests/README.md
@@ -23,13 +24,24 @@ sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" faqg
kubectl apply -f faqgen.yaml
```

## Deploy UI

```
cd GenAIExamples/FaqGen/kubernetes/manifests/
kubectl get svc # get ip address
ip_address="" # according to your svc address
sed -i "s/insert_your_ip_here/${ip_address}/g" ui.yaml
kubectl apply -f ui.yaml
```
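
The Deploy UI steps leave `ip_address` empty for the reader to fill in by hand, and an empty value would silently write a broken URL into `ui.yaml`. One possible way to guard against that is a small wrapper around the same `sed` substitution. This is a sketch: the function name `fill_ip` is hypothetical, and the service name in the usage comment is a placeholder to be replaced with whatever `kubectl get svc` reports.

```shell
# Hypothetical helper: substitute the placeholder in a manifest,
# refusing to run when no address was supplied.
fill_ip() {
  fill_ip_addr="$1"
  fill_ip_manifest="$2"
  if [ -z "$fill_ip_addr" ]; then
    echo "ip address is empty; run 'kubectl get svc' first" >&2
    return 1
  fi
  # Same substitution as in the README, applied to the given file.
  sed -i "s/insert_your_ip_here/${fill_ip_addr}/g" "$fill_ip_manifest"
}

# Usage (assumes a service named <your-service>; adjust to your cluster):
#   ip_address="$(kubectl get svc <your-service> -o jsonpath='{.spec.clusterIP}')"
#   fill_ip "$ip_address" ui.yaml && kubectl apply -f ui.yaml
```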

## Verify Services

Make sure all the pods are running, and restart the faqgen-xxxx pod if necessary.

```
kubectl get pods
- curl http://${host_ip}:8888/v1/faqgen -H "Content-Type: application/json" -d '{
+ port=7779 # 7779 for gaudi, 7778 for xeon
+ curl http://${host_ip}:7779/v1/faqgen -H "Content-Type: application/json" -d '{
"messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."
}'
```
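
The verification snippet defines `port` but the `curl` line still hardcodes 7779, so the Xeon case needs a manual edit. A minimal sketch of deriving the URL from the platform instead is shown here. The function name `faqgen_url` and the `platform` argument are illustrative assumptions; the port numbers are the ones given in the README comment.

```shell
# Hypothetical helper: build the FaqGen endpoint URL for a host and platform.
faqgen_url() {
  faqgen_host="$1"
  faqgen_platform="$2"
  case "$faqgen_platform" in
    gaudi) faqgen_port=7779 ;;
    xeon)  faqgen_port=7778 ;;
    *)     echo "unknown platform: $faqgen_platform" >&2; return 1 ;;
  esac
  echo "http://${faqgen_host}:${faqgen_port}/v1/faqgen"
}

# Usage (requires a running deployment and host_ip to be set):
#   curl "$(faqgen_url "$host_ip" gaudi)" -H "Content-Type: application/json" \
#     -d '{"messages": "Text to generate FAQs from."}'
```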