Commit 4c01e14

[ChatQnA] Remove enforce-eager to enable HPU graphs for better vLLM perf (#1210)

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>

1 parent 6f9f6f0, commit 4c01e14

File tree

3 files changed (+2, -3 lines)


ChatQnA/docker_compose/intel/hpu/gaudi/compose_vllm.yaml

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ services:
     cap_add:
       - SYS_NICE
     ipc: host
-    command: --enforce-eager --model $LLM_MODEL_ID --tensor-parallel-size 1 --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
+    command: --model $LLM_MODEL_ID --tensor-parallel-size 1 --host 0.0.0.0 --port 80 --block-size 128 --max-num-seqs 256 --max-seq_len-to-capture 2048
   chatqna-gaudi-backend-server:
     image: ${REGISTRY:-opea}/chatqna:${TAG:-latest}
     container_name: chatqna-gaudi-backend-server
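For context, vLLM's `--enforce-eager` flag forces eager-mode execution and disables graph capture; dropping it lets the Gaudi backend warm up and replay HPU graphs, which is the perf win this commit targets. A standalone sketch of the same server invocation outside compose follows; the device-visibility variable, the `HF_TOKEN` pass-through, and the `8007:80` port mapping are illustrative assumptions, not part of this diff:

```shell
# Hypothetical standalone launch mirroring the updated compose command.
# HABANA_VISIBLE_DEVICES, HF_TOKEN, and the host port are assumptions.
docker run --runtime=habana --cap-add=SYS_NICE --ipc=host \
  -e HABANA_VISIBLE_DEVICES=all -e HF_TOKEN=$HF_TOKEN \
  -p 8007:80 \
  opea/vllm-gaudi:latest \
  --model $LLM_MODEL_ID --tensor-parallel-size 1 \
  --host 0.0.0.0 --port 80 --block-size 128 \
  --max-num-seqs 256 --max-seq_len-to-capture 2048
```

With graph capture enabled, the first startup spends longer in warmup (graphs are compiled per bucket), which is why the test below also raises its readiness timeout.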

ChatQnA/kubernetes/intel/hpu/gaudi/manifest/chatqna-vllm.yaml

Lines changed: 0 additions & 1 deletion
@@ -1286,7 +1286,6 @@ spec:
             type: RuntimeDefault
         image: "opea/vllm-gaudi:latest"
         args:
-        - "--enforce-eager"
         - "--model"
         - "$(MODEL_ID)"
         - "--tensor-parallel-size"

ChatQnA/tests/test_compose_vllm_on_gaudi.sh

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ function start_services() {
     # Start Docker Containers
     docker compose -f compose_vllm.yaml up -d > ${LOG_PATH}/start_services_with_compose.log
     n=0
-    until [[ "$n" -ge 100 ]]; do
+    until [[ "$n" -ge 160 ]]; do
         echo "n=$n"
         docker logs vllm-gaudi-server > vllm_service_start.log
         if grep -q "Warmup finished" vllm_service_start.log; then
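The bumped retry count (100 → 160) follows a common poll-for-log-marker pattern: loop until a readiness string appears in the service log or a retry budget is exhausted. A minimal standalone sketch of that pattern; `wait_for_marker`, the demo log path, and the retry count here are illustrative, not taken from the test script:

```shell
# Poll a log file for a readiness marker, up to max_tries attempts.
wait_for_marker() {
    local logfile=$1 marker=$2 max_tries=$3
    local n=0
    until [ "$n" -ge "$max_tries" ]; do
        if grep -q "$marker" "$logfile" 2>/dev/null; then
            echo "ready after $n checks"
            return 0
        fi
        n=$((n+1))
        sleep 1   # the real test re-dumps `docker logs` between checks
    done
    echo "timed out waiting for: $marker"
    return 1
}

# Demo: the marker is already present, so the wait succeeds immediately.
echo "Warmup finished" > /tmp/vllm_demo.log
wait_for_marker /tmp/vllm_demo.log "Warmup finished" 5
```

Raising the bound to 160 simply buys the vLLM container more polling iterations before the test gives up, accommodating the longer HPU-graph warmup.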

0 commit comments
