Update v0.9 RAG release data (#747)
* run both xeon and gaudi when both hardware detect

Signed-off-by: chensuyue <suyue.chen@intel.com>

* add v0.9 RAG release data

Signed-off-by: chensuyue <suyue.chen@intel.com>

* update system summary

Signed-off-by: chensuyue <suyue.chen@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
chensuyue and pre-commit-ci[bot] committed Sep 6, 2024
1 parent 4b0bc26 commit 947936e
Showing 3 changed files with 62 additions and 9 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/_get-test-matrix.yml
@@ -67,7 +67,7 @@ jobs:
run_hardware=""
if [ $(printf '%s\n' "${changed_files[@]}" | grep ${example} | grep -c gaudi) != 0 ]; then run_hardware="gaudi"; fi
if [ $(printf '%s\n' "${changed_files[@]}" | grep ${example} | grep -c xeon) != 0 ]; then run_hardware="xeon ${run_hardware}"; fi
- if [ "$run_hardware" == "" ]; then run_hardware="gaudi"; fi
+ if [ "$run_hardware" == "" ]; then run_hardware="xeon gaudi"; fi
for hw in ${run_hardware}; do
if [ "$hw" == "gaudi" ] && [ "${{ inputs.gaudi_server_label }}" != "" ]; then
run_matrix="${run_matrix}{\"example\":\"${example}\",\"hardware\":\"${{ inputs.gaudi_server_label }}\"},"
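The selection logic in this hunk can be tried outside the workflow. The following is a minimal sketch, not part of the commit: `select_hardware` is a hypothetical function name, the changed files are passed as arguments instead of being read from `changed_files`, and the `grep -F` flag and the trailing-space trim are additions made for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): select_hardware EXAMPLE [FILE...]
# Prints the hardware list the matrix loop would iterate over.
select_hardware() {
  local example="$1"
  shift
  local run_hardware=""
  # grep -c prints the number of matching lines; -F matches the example name literally.
  if [ "$(printf '%s\n' "$@" | grep -F -- "$example" | grep -c gaudi)" != 0 ]; then
    run_hardware="gaudi"
  fi
  if [ "$(printf '%s\n' "$@" | grep -F -- "$example" | grep -c xeon)" != 0 ]; then
    run_hardware="xeon ${run_hardware}"
  fi
  # Fallback changed by this commit: no hardware-specific path was touched,
  # so run on both platforms instead of gaudi only.
  if [ "$run_hardware" == "" ]; then
    run_hardware="xeon gaudi"
  fi
  # Drop the trailing space left behind when only xeon matched.
  echo "${run_hardware% }"
}

# A change that touches neither a gaudi nor a xeon path selects both platforms.
select_hardware ChatQnA "ChatQnA/benchmark/README.md"
```

With the previous fallback, the same call would have selected `gaudi` only.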
20 changes: 12 additions & 8 deletions ChatQnA/benchmark/README.md
@@ -133,11 +133,11 @@ kubectl label nodes k8s-worker1 node-type=chatqna-opea

#### 2. Install ChatQnA

- Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/single_gaudi) and apply to K8s.
+ Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/single_gaudi) and apply to K8s.

```bash
# on k8s-master node
- cd GenAIExamples/ChatQnA/benchmark/single_gaudi
+ cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/single_gaudi
kubectl apply -f .
```

@@ -199,7 +199,7 @@ All the test results will come to this folder `/home/sdp/benchmark_output/node_1

```bash
# on k8s-master node
- cd GenAIExamples/ChatQnA/benchmark/single_gaudi
+ cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/single_gaudi
kubectl delete -f .
kubectl label nodes k8s-worker1 node-type-
```
@@ -216,11 +216,11 @@ kubectl label nodes k8s-worker1 k8s-worker2 node-type=chatqna-opea

#### 2. Install ChatQnA

- Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/two_gaudi) and apply to K8s.
+ Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/two_gaudi) and apply to K8s.

```bash
# on k8s-master node
- cd GenAIExamples/ChatQnA/benchmark/two_gaudi
+ cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/two_gaudi
kubectl apply -f .
```

@@ -265,11 +265,11 @@ kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type=cha

#### 2. Install ChatQnA

- Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/four_gaudi) and apply to K8s.
+ Go to [BKC manifest](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/benchmark/tuned/with_rerank/four_gaudi) and apply to K8s.

```bash
# on k8s-master node
- cd GenAIExamples/ChatQnA/benchmark/four_gaudi
+ cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/four_gaudi
kubectl apply -f .
```

@@ -298,7 +298,11 @@ All the test results will come to this folder `/home/sdp/benchmark_output/node_4

```bash
# on k8s-master node
- cd GenAIExamples/ChatQnA/benchmark/single_gaudi
+ cd GenAIExamples/ChatQnA/benchmark/tuned/with_rerank/single_gaudi
kubectl delete -f .
kubectl label nodes k8s-master k8s-worker1 k8s-worker2 k8s-worker3 node-type-
```

#### 6. Results

OOB performance data is available [here](/opea_release_data.md#chatqna). Tuned performance data will be released soon.
49 changes: 49 additions & 0 deletions opea_release_data.md
@@ -0,0 +1,49 @@
# OPEA Release Data

This page shows benchmark data for GenAIExamples. Data for more examples will be added in future releases.

## ChatQnA

| **Docker Images for Test** |
| ----------------------------------------------------- |
| opea/embedding-tei:v0.9 |
| ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 |
| opea/llm-tgi:v0.9 |
| ghcr.io/huggingface/tgi-gaudi:2.0.1 |
| opea/dataprep-redis:v0.9 |
| redis/redis-stack:7.2.0-v9 |
| opea/reranking-tei:v0.9 |
| opea/tei-gaudi:v0.9 |
| opea/retriever-redis:v0.9 |
| opea/chatqna:v0.9 |

System Summary:
1-node, 2x Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz, 40 cores, 270W TDP, HT On, Turbo On, NUMA 2, Integrated Accelerators Available [used]: DLB 0 [0], DSA 0 [0], IAA 0 [0], QAT 0 [0], Total Memory 1024GB (32x32GB DDR4 3200 MT/s [3200 MT/s]), BIOS ETM02, microcode 0xd0003b9, 8x Habana Labs Ltd., 4x MT28800 Family [ConnectX-5 Ex], 4x 7T INTEL SSDPF2KX076TZ, 2x 894.3G SAMSUNG MZ1L2960HCJR-00A07, Ubuntu 22.04.3 LTS, 5.15.0-92-generic. Software: WORKLOAD+VERSION, COMPILER, LIBRARIES, OTHER_SW. Test by Intel as of 08/20/24.

### Performance Data

| 1Node E2E Performance (Sec) | Gaudi nodes | Concurrency | Input | Output | Average Latency | P90 Total Latency |
| :-------------------------: | :---------: | :---------: | :---: | :----: | :-------------: | :---------------: |
| OOB w/o Reranking | 1 | 128 | 128 | 128 | 5.597 | 7.59 |
| OOB w/ Reranking | 1 | 128 | 128 | 128 | 6.003 | 8.123 |

| 2Nodes E2E Performance (Sec) | Gaudi nodes | Concurrency | Input | Output | Average Latency | P90 Total Latency |
| :--------------------------: | :---------: | :---------: | :---: | :----: | :-------------: | :---------------: |
| OOB w/o Reranking | 2 | 256 | 128 | 128 | 7.05 | 9.122 |
| OOB w/ Reranking | 2 | 256 | 128 | 128 | 7.26 | 9.239 |

| 4Nodes E2E Performance (Sec) | Gaudi nodes | Concurrency | Input | Output | Average Latency | P90 Total Latency |
| :--------------------------: | :---------: | :---------: | :---: | :----: | :-------------: | :---------------: |
| OOB w/o Reranking | 4 | 512 | 128 | 128 | 16.293 | 21.169 |
| OOB w/ Reranking | 4 | 512 | 128 | 128 | 17.22 | 21.942 |

See the Benchmark [README](./ChatQnA/benchmark/README.md) for steps to reproduce these results. Tuned performance data will be released soon.
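
The added latency from reranking can be read off the tables above. The following is a rough sketch, not part of the commit: `rerank_overhead` is a hypothetical helper name, and its only inputs are the Average Latency values copied from the three tables.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository).
# Input rows: <gaudi nodes> <avg latency w/o reranking> <avg latency w/ reranking>,
# in seconds, copied from the performance tables above.
rerank_overhead() {
  awk '{ printf "%d node(s): +%.3f s (%.2f%%)\n", $1, $3 - $2, ($3 - $2) / $2 * 100 }' <<'EOF'
1 5.597 6.003
2 7.05 7.26
4 16.293 17.22
EOF
}

rerank_overhead
```

For these inputs the script prints `+0.406 s (7.25%)` for 1 node, `+0.210 s (2.98%)` for 2 nodes, and `+0.927 s (5.69%)` for 4 nodes.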

### Accuracy Data

| Test Case | Hits@10 | Hits@4 | MAP@10 | MRR@10 |
| :---------------------: | :-----: | :----: | :----: | :----: |
| Retrieval w/o Reranking | 66.16% | 49.80% | 17.62% | 39.75% |
| Retrieval w/ Reranking | 72.28% | 63.24% | 24.97% | 56.79% |

See the Accuracy [README](https://github.com/opea-project/GenAIEval/tree/main/evals/evaluation/rag_eval#multihop-english-dataset) for steps to reproduce these results.
