Commit 00d9bb6

Enable vLLM Profiling for ChatQnA on Gaudi (#1128)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
1 parent 59b624c commit 00d9bb6

File tree

2 files changed: +63 -0 lines changed

ChatQnA/docker_compose/intel/hpu/gaudi/README.md

Lines changed: 62 additions & 0 deletions
@@ -434,6 +434,68 @@ curl http://${host_ip}:9090/v1/guardrails\

### Profile Microservices

To further analyze MicroService performance, users can follow the instructions below to profile the MicroServices.

#### 1. vLLM backend Service

Users can follow the previous section to test the vLLM MicroService or the ChatQnA MegaService.
By default, vLLM profiling is not enabled. Users can start and stop profiling with the commands below.
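
Note that the vLLM torch profiler is only active when the container is started with VLLM_TORCH_PROFILER_DIR set; the compose_vllm.yaml change further below takes care of that. A minimal sketch of bringing the stack up with that file, assuming the repository path used throughout this guide:

```bash
# Start (or restart) the ChatQnA stack with the vLLM compose file so that
# VLLM_TORCH_PROFILER_DIR from compose_vllm.yaml is applied to vllm-service.
cd ChatQnA/docker_compose/intel/hpu/gaudi
docker compose -f compose_vllm.yaml up -d
```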

##### Start vLLM profiling

```bash
curl http://${host_ip}:9009/start_profile \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"${LLM_MODEL_ID}\"}"
```

Users should see the following docker logs from vllm-service if profiling started correctly.

```bash
INFO api_server.py:361] Starting profiler...
INFO api_server.py:363] Profiler started.
INFO: x.x.x.x:35940 - "POST /start_profile HTTP/1.1" 200 OK
```
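
If needed, the same messages can be checked from the container logs; a quick sketch using the container name from the compose file:

```bash
# Show recent profiler-related log lines from the vLLM container.
docker logs vllm-service 2>&1 | grep -i profil
```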

After vLLM profiling is started, users can start asking questions and get responses from the vLLM MicroService
or the ChatQnA MegaService, for example with a request like the one sketched below.
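
A minimal example request for generating load while the profiler is recording; it reuses the host, port, and model id from the earlier verification steps, and the prompt and max_tokens values are placeholders to adjust as needed.

```bash
# Send one completion request to the vLLM OpenAI-compatible endpoint so the
# profiler has activity to capture (prompt and max_tokens are illustrative).
curl http://${host_ip}:9009/v1/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"${LLM_MODEL_ID}\", \"prompt\": \"What is Deep Learning?\", \"max_tokens\": 32}"
```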

##### Stop vLLM profiling

With the following command, users can stop vLLM profiling and generate a \*.pt.trace.json.gz file as the profiling result
under the /mnt folder in the vllm-service docker instance.

```bash
# vLLM Service
curl http://${host_ip}:9009/stop_profile \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"${LLM_MODEL_ID}\"}"
```

Users should see the following docker logs from vllm-service if profiling stopped correctly.

```bash
INFO api_server.py:368] Stopping profiler...
INFO api_server.py:370] Profiler stopped.
INFO: x.x.x.x:41614 - "POST /stop_profile HTTP/1.1" 200 OK
```

After vLLM profiling is stopped, users can use the command below to copy the \*.pt.trace.json.gz file out of the /mnt folder.

```bash
docker cp vllm-service:/mnt/ .
```
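
The copied folder lands in ./mnt on the host; listing it is a simple way to locate the trace (the exact file name contains a timestamp and differs per run):

```bash
# List the exported vLLM torch profiler traces copied from the container.
ls ./mnt/*.pt.trace.json.gz
```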

##### Check profiling result

Open a web browser and go to "chrome://tracing" or "ui.perfetto.dev", then load the json.gz file; you should be able
to see the vLLM profiling result as in the diagrams below.

![image](https://github.com/user-attachments/assets/487c52c8-d187-46dc-ab3a-43f21d657d41)

![image](https://github.com/user-attachments/assets/e3c51ce5-d704-4eb7-805e-0d88b0c158e3)

## 🚀 Launch the UI

### Launch with origin port

ChatQnA/docker_compose/intel/hpu/gaudi/compose_vllm.yaml

Lines changed: 1 addition & 0 deletions

@@ -92,6 +92,7 @@ services:
       HABANA_VISIBLE_DEVICES: all
       OMPI_MCA_btl_vader_single_copy_mechanism: none
       LLM_MODEL_ID: ${LLM_MODEL_ID}
+      VLLM_TORCH_PROFILER_DIR: "/mnt"
     runtime: habana
     cap_add:
       - SYS_NICE
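
To double-check that the profiler directory is configured in the running container, a quick sanity check (container name taken from the compose file):

```bash
# Should print /mnt when the environment variable from compose_vllm.yaml is applied.
docker exec vllm-service printenv VLLM_TORCH_PROFILER_DIR
```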
