diff --git a/ChatQnA/docker_compose/intel/cpu/xeon/README.md b/ChatQnA/docker_compose/intel/cpu/xeon/README.md
index 49a7bf168e..8396df454f 100644
--- a/ChatQnA/docker_compose/intel/cpu/xeon/README.md
+++ b/ChatQnA/docker_compose/intel/cpu/xeon/README.md
@@ -432,57 +432,66 @@ curl -X POST "http://${host_ip}:6007/v1/dataprep/delete_file" \
   -H "Content-Type: application/json"
 ```
 
-
 ### Profile Microservices
-To further analyze MicroService Performance, users could follow the instructions to profile MicroServices.
+To further analyze microservice performance, users can follow the instructions below to profile the microservices.
 
 #### 1. vLLM backend Service
-  Users could follow previous section to testing vLLM microservice or ChatQnA MegaService.
-  By default, vLLM profiling is not enabled. Users could start and stop profiling by following commands.
-  ##### Start vLLM profiling
+Users can follow the previous section to test the vLLM microservice or the ChatQnA MegaService.
+By default, vLLM profiling is not enabled. Users can start and stop profiling with the following commands.
-  ```bash
-  curl http://${host_ip}:9009/start_profile \
-    -H "Content-Type: application/json" \
-    -d '{"model": "Intel/neural-chat-7b-v3-3"}'
-  ```
-  Users would see below docker logs from vllm-service if profiling is started correctly.
-  ```bash
-  INFO api_server.py:361] Starting profiler...
-  INFO api_server.py:363] Profiler started.
-  INFO: x.x.x.x:35940 - "POST /start_profile HTTP/1.1" 200 OK
-  ```
-  After vLLM profiling is started, users could start asking questions and get responses from vLLM MicroService
-  or ChatQnA MicroService.
-
-  ##### Stop vLLM profiling
-  By following command, users could stop vLLM profliing and generate a *.pt.trace.json.gz file as profiling result
-  under /mnt folder in vllm-service docker instance.
-  ```bash
-  # vLLM Service
-  curl http://${host_ip}:9009/stop_profile \
-    -H "Content-Type: application/json" \
-    -d '{"model": "Intel/neural-chat-7b-v3-3"}'
-  ```
-  Users would see below docker logs from vllm-service if profiling is stopped correctly.
-  ```bash
-  INFO api_server.py:368] Stopping profiler...
-  INFO api_server.py:370] Profiler stopped.
-  INFO: x.x.x.x:41614 - "POST /stop_profile HTTP/1.1" 200 OK
-  ```
-  After vllm profiling is stopped, users could use below command to get the *.pt.trace.json.gz file under /mnt folder.
-  ```bash
-  docker cp vllm-service:/mnt/ .
-  ```
+##### Start vLLM profiling
+
+```bash
+curl http://${host_ip}:9009/start_profile \
+  -H "Content-Type: application/json" \
+  -d '{"model": "Intel/neural-chat-7b-v3-3"}'
+```
+
+If profiling started correctly, the vllm-service docker logs show the following.
+
+```bash
+INFO api_server.py:361] Starting profiler...
+INFO api_server.py:363] Profiler started.
+INFO: x.x.x.x:35940 - "POST /start_profile HTTP/1.1" 200 OK
+```
+
+After vLLM profiling is started, users can send questions to the vLLM microservice or the ChatQnA MegaService
+and collect the responses.
+
+##### Stop vLLM profiling
+
+The following command stops vLLM profiling and generates a \*.pt.trace.json.gz file as the profiling result
+under the /mnt folder inside the vllm-service docker instance.
+
+```bash
+# vLLM Service
+curl http://${host_ip}:9009/stop_profile \
+  -H "Content-Type: application/json" \
+  -d '{"model": "Intel/neural-chat-7b-v3-3"}'
+```
+
+If profiling stopped correctly, the vllm-service docker logs show the following.
+
+```bash
+INFO api_server.py:368] Stopping profiler...
+INFO api_server.py:370] Profiler stopped.
+INFO: x.x.x.x:41614 - "POST /stop_profile HTTP/1.1" 200 OK
+```
+
+After vLLM profiling is stopped, use the command below to copy the \*.pt.trace.json.gz file from the /mnt folder.
+
+```bash
+docker cp vllm-service:/mnt/ .
+```
+
+##### Check profiling result
-  ##### Check profiling result
-  Open a web browser and type "chrome://tracing" or "ui.perfetto.dev", and then load the json.gz file, you should be able
-  to see the vLLM profiling result as below diagram.
+Open a web browser, go to "chrome://tracing" or "ui.perfetto.dev", and load the json.gz file; the vLLM profiling result
+appears as in the diagram below.
 
 ![image](https://github.com/user-attachments/assets/55c7097e-5574-41dc-97a7-5e87c31bc286)
 
-
 ## 🚀 Launch the UI
 
 ### Launch with origin port
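Taken together, the profiling steps added above chain into a single pass. The sketch below is illustrative only: it reuses the start/stop and `docker cp` commands from the README, and the load-generation request assumes the ChatQnA MegaService example query shown earlier in that README (port 8888, a `messages` payload); adjust it to whatever request you normally send.

```bash
# Sketch of one profiling pass, assuming the ChatQnA Xeon stack from this README is running.
curl http://${host_ip}:9009/start_profile \
  -H "Content-Type: application/json" \
  -d '{"model": "Intel/neural-chat-7b-v3-3"}'

# Generate some load while the profiler is active; any vLLM or ChatQnA request works.
# This query is the example from earlier in the README and is only a placeholder.
curl http://${host_ip}:8888/v1/chatqna \
  -H "Content-Type: application/json" \
  -d '{"messages": "What is the revenue of Nike in 2023?"}'

curl http://${host_ip}:9009/stop_profile \
  -H "Content-Type: application/json" \
  -d '{"model": "Intel/neural-chat-7b-v3-3"}'

# Copy the generated *.pt.trace.json.gz out of the container for chrome://tracing or ui.perfetto.dev.
docker cp vllm-service:/mnt/ .
```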
diff --git a/FaqGen/docker_compose/intel/cpu/xeon/README.md b/FaqGen/docker_compose/intel/cpu/xeon/README.md
index c512621b04..2ed343e2ef 100644
--- a/FaqGen/docker_compose/intel/cpu/xeon/README.md
+++ b/FaqGen/docker_compose/intel/cpu/xeon/README.md
@@ -79,6 +79,7 @@ export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
 export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
 export MEGA_SERVICE_HOST_IP=${host_ip}
 export LLM_SERVICE_HOST_IP=${host_ip}
+export LLM_SERVICE_PORT=9000
 export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/faqgen"
 ```
 
diff --git a/FaqGen/docker_compose/intel/cpu/xeon/compose.yaml b/FaqGen/docker_compose/intel/cpu/xeon/compose.yaml
index 59df3093e9..18a6a7ec35 100644
--- a/FaqGen/docker_compose/intel/cpu/xeon/compose.yaml
+++ b/FaqGen/docker_compose/intel/cpu/xeon/compose.yaml
@@ -46,6 +46,7 @@ services:
       - http_proxy=${http_proxy}
       - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
       - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
+      - LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
     ipc: host
     restart: always
   faqgen-xeon-ui-server:
@@ -59,7 +60,7 @@ services:
       - no_proxy=${no_proxy}
       - https_proxy=${https_proxy}
       - http_proxy=${http_proxy}
-      - DOC_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
+      - FAQ_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
     ipc: host
     restart: always
 networks:
diff --git a/FaqGen/docker_compose/intel/hpu/gaudi/README.md b/FaqGen/docker_compose/intel/hpu/gaudi/README.md
index b157106bf2..81473e49c2 100644
--- a/FaqGen/docker_compose/intel/hpu/gaudi/README.md
+++ b/FaqGen/docker_compose/intel/hpu/gaudi/README.md
@@ -80,6 +80,7 @@ export TGI_LLM_ENDPOINT="http://${your_ip}:8008"
 export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
 export MEGA_SERVICE_HOST_IP=${host_ip}
 export LLM_SERVICE_HOST_IP=${host_ip}
+export LLM_SERVICE_PORT=9000
 export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/faqgen"
 ```
 
diff --git a/FaqGen/docker_compose/intel/hpu/gaudi/compose.yaml b/FaqGen/docker_compose/intel/hpu/gaudi/compose.yaml
index 1416019b12..f810319f0e 100644
--- a/FaqGen/docker_compose/intel/hpu/gaudi/compose.yaml
+++ b/FaqGen/docker_compose/intel/hpu/gaudi/compose.yaml
@@ -56,6 +56,7 @@ services:
       - http_proxy=${http_proxy}
       - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
       - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
+      - LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
     ipc: host
     restart: always
   faqgen-gaudi-ui-server:
@@ -69,7 +70,7 @@ services:
       - no_proxy=${no_proxy}
       - https_proxy=${https_proxy}
       - http_proxy=${http_proxy}
-      - DOC_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
+      - FAQ_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
     ipc: host
     restart: always
 
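The compose changes above only take effect if `LLM_SERVICE_PORT` is exported in the shell that runs `docker compose` (as the updated READMEs and test scripts now do); otherwise Compose typically warns that the variable is not set and substitutes an empty string. A quick way to confirm the interpolated values is sketched below, using the variable and service names from this diff and assuming `host_ip` is already set per the README.

```bash
# Run from FaqGen/docker_compose/intel/cpu/xeon (or .../hpu/gaudi) after exporting the
# variables from the README; `docker compose config` prints the fully interpolated file.
export LLM_SERVICE_HOST_IP=${host_ip}
export LLM_SERVICE_PORT=9000
export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/faqgen"

docker compose config | grep -E "LLM_SERVICE_PORT|FAQ_BASE_URL"
# Expect LLM_SERVICE_PORT to resolve to 9000 and FAQ_BASE_URL to the /v1/faqgen endpoint.
```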
diff --git a/FaqGen/kubernetes/intel/cpu/xeon/manifest/faqgen_react_ui.yaml b/FaqGen/kubernetes/intel/cpu/xeon/manifest/faqgen_react_ui.yaml
index 53b2d541f3..4577372495 100644
--- a/FaqGen/kubernetes/intel/cpu/xeon/manifest/faqgen_react_ui.yaml
+++ b/FaqGen/kubernetes/intel/cpu/xeon/manifest/faqgen_react_ui.yaml
@@ -264,7 +264,7 @@ spec:
       containers:
         - name: faqgen-react-ui
           env:
-            - name: DOC_BASE_URL
+            - name: FAQ_BASE_URL
              value: "http://faqgen:8888/v1/faqgen"
            - name: http_proxy
              value:
diff --git a/FaqGen/kubernetes/intel/cpu/xeon/manifest/faqgen_ui.yaml b/FaqGen/kubernetes/intel/cpu/xeon/manifest/faqgen_ui.yaml
index f74299a094..6b531a0c78 100644
--- a/FaqGen/kubernetes/intel/cpu/xeon/manifest/faqgen_ui.yaml
+++ b/FaqGen/kubernetes/intel/cpu/xeon/manifest/faqgen_ui.yaml
@@ -22,7 +22,7 @@ spec:
       containers:
         - name: faq-mega-ui-deploy
           env:
-            - name: DOC_BASE_URL
+            - name: FAQ_BASE_URL
              value: http://{insert_your_ip_here}:7779/v1/faqgen
          image: opea/faqgen-ui:latest
          imagePullPolicy: IfNotPresent
diff --git a/FaqGen/kubernetes/intel/hpu/gaudi/manifest/faqgen_ui.yaml b/FaqGen/kubernetes/intel/hpu/gaudi/manifest/faqgen_ui.yaml
index f74299a094..6b531a0c78 100644
--- a/FaqGen/kubernetes/intel/hpu/gaudi/manifest/faqgen_ui.yaml
+++ b/FaqGen/kubernetes/intel/hpu/gaudi/manifest/faqgen_ui.yaml
@@ -22,7 +22,7 @@ spec:
       containers:
         - name: faq-mega-ui-deploy
           env:
-            - name: DOC_BASE_URL
+            - name: FAQ_BASE_URL
              value: http://{insert_your_ip_here}:7779/v1/faqgen
          image: opea/faqgen-ui:latest
          imagePullPolicy: IfNotPresent
diff --git a/FaqGen/tests/test_compose_on_gaudi.sh b/FaqGen/tests/test_compose_on_gaudi.sh
index 6eb229ca72..dc12dfde8a 100644
--- a/FaqGen/tests/test_compose_on_gaudi.sh
+++ b/FaqGen/tests/test_compose_on_gaudi.sh
@@ -34,6 +34,7 @@ function start_services() {
     export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
     export MEGA_SERVICE_HOST_IP=${ip_address}
     export LLM_SERVICE_HOST_IP=${ip_address}
+    export LLM_SERVICE_PORT=9000
     export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:8888/v1/faqgen"
 
     sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
diff --git a/FaqGen/tests/test_compose_on_xeon.sh b/FaqGen/tests/test_compose_on_xeon.sh
index e9ed4bf1e5..3dbde68283 100755
--- a/FaqGen/tests/test_compose_on_xeon.sh
+++ b/FaqGen/tests/test_compose_on_xeon.sh
@@ -34,6 +34,7 @@ function start_services() {
     export HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
     export MEGA_SERVICE_HOST_IP=${ip_address}
     export LLM_SERVICE_HOST_IP=${ip_address}
+    export LLM_SERVICE_PORT=9000
     export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:8888/v1/faqgen"
 
     sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
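Once the stack is up with the variables exported by the test scripts above, a minimal smoke test of the FaqGen backend can mirror what those scripts exercise. The endpoint is the `BACKEND_SERVICE_ENDPOINT` exported above; the `messages` payload below is an assumption and may need adjusting to the request schema of your FaqGen release.

```bash
# Hypothetical smoke test against the FaqGen MegaService started by the test scripts.
# ${ip_address} is the host address the scripts use; the payload format is assumed.
curl "http://${ip_address}:8888/v1/faqgen" \
  -H "Content-Type: application/json" \
  -d '{"messages": "Text Embeddings are numerical representations of text semantics."}'
```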