@@ -434,6 +434,68 @@ curl http://${host_ip}:9090/v1/guardrails\
-H 'Content-Type: application/json'
```
+ ### Profile Microservices
+
+ To further analyze microservice performance, users can follow the instructions below to profile the microservices.
+
+ #### 1. vLLM Backend Service
+
+ Users can follow the previous section to test the vLLM microservice or the ChatQnA MegaService.
+ By default, vLLM profiling is not enabled. Users can start and stop profiling with the commands below.
+
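+ Note that vLLM honors these endpoints only when its torch profiler is enabled via the `VLLM_TORCH_PROFILER_DIR`
+ environment variable. As a minimal sketch (a hypothetical standalone launch, not this project's compose file),
+ assuming the stock `vllm/vllm-openai` image, the variable could be set like this so traces land in /mnt:
+
+ ```bash
+ # Hypothetical standalone launch; the actual deployment's compose file may differ.
+ # VLLM_TORCH_PROFILER_DIR enables the torch profiler and sets the trace output dir.
+ docker run --rm -p 9009:8000 \
+   -e VLLM_TORCH_PROFILER_DIR=/mnt \
+   vllm/vllm-openai:latest \
+   --model ${LLM_MODEL_ID}
+ ```
+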
+ ##### Start vLLM profiling
+
+ ```bash
+ curl http://${host_ip}:9009/start_profile \
+   -H "Content-Type: application/json" \
+   -d "{\"model\": \"${LLM_MODEL_ID}\"}"
+ ```
+
+ Users should see the following logs from the vllm-service Docker container if profiling started correctly.
+
+ ```bash
+ INFO api_server.py:361] Starting profiler...
+ INFO api_server.py:363] Profiler started.
+ INFO: x.x.x.x:35940 - "POST /start_profile HTTP/1.1" 200 OK
+ ```
+
+ After vLLM profiling is started, users can ask questions and get responses from the vLLM microservice
+ or the ChatQnA MegaService.
+
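+ For example, assuming the vLLM service exposes the OpenAI-compatible chat completions API on port 9009,
+ a request like the following generates activity for the profiler to capture:
+
+ ```bash
+ # Illustrative request; any query through the ChatQnA pipeline works as well.
+ curl http://${host_ip}:9009/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -d "{\"model\": \"${LLM_MODEL_ID}\", \"messages\": [{\"role\": \"user\", \"content\": \"What is deep learning?\"}], \"max_tokens\": 32}"
+ ```
+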
+ ##### Stop vLLM profiling
+
+ With the following command, users can stop vLLM profiling and generate a `*.pt.trace.json.gz` file as the profiling result
+ under the /mnt folder in the vllm-service Docker instance.
+
+ ```bash
+ # vLLM Service
+ curl http://${host_ip}:9009/stop_profile \
+   -H "Content-Type: application/json" \
+   -d "{\"model\": \"${LLM_MODEL_ID}\"}"
+ ```
+
+ Users should see the following logs from the vllm-service Docker container if profiling stopped correctly.
+
+ ```bash
+ INFO api_server.py:368] Stopping profiler...
+ INFO api_server.py:370] Profiler stopped.
+ INFO: x.x.x.x:41614 - "POST /stop_profile HTTP/1.1" 200 OK
+ ```
+
+ After vLLM profiling is stopped, users can use the command below to copy the `*.pt.trace.json.gz` file from the /mnt folder.
+
+ ```bash
+ docker cp vllm-service:/mnt/ .
+ ```
+
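+ To confirm the copy succeeded, list the trace files (the exact file name is generated by the profiler,
+ so the glob below is illustrative):
+
+ ```bash
+ # docker cp above copies the container's /mnt directory to ./mnt on the host.
+ ls -lh mnt/*.pt.trace.json.gz
+ ```
+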
+ ##### Check profiling result
+
+ Open a web browser and go to `chrome://tracing` or `https://ui.perfetto.dev`, then load the json.gz file. You should be able
+ to see the vLLM profiling result as in the diagram below.
+ 
+
+ 
+
## 🚀 Launch the UI
### Launch with origin port