@ehhuang (Contributor) commented Sep 11, 2025

What does this PR do?

Test Plan

See updated README.md

@meta-cla bot added the CLA Signed label Sep 11, 2025
@ehhuang force-pushed the pr3421 branch 3 times, most recently from 859ac24 to dc6acf8 on September 12, 2025 at 23:08
@ehhuang changed the title from "guidellm" to "chore(perf): run guidellm benchmarks" Sep 12, 2025
@ehhuang force-pushed the pr3421 branch 4 times, most recently from 9ec5404 to 9bcad5d on September 15, 2025 at 17:20
@ehhuang marked this pull request as ready for review September 15, 2025 17:27
@ehhuang force-pushed the pr3421 branch 3 times, most recently from 2d0c298 to 06739b3 on September 15, 2025 at 20:22
@ehhuang mentioned this pull request Sep 17, 2025
@ehhuang force-pushed the pr3421 branch 2 times, most recently from 4ddc5e8 to 5662e9e on September 18, 2025 at 16:28
ehhuang added a commit that referenced this pull request Sep 19, 2025
# What does this PR do?
As shown in #3421, we can scale the stack to handle more RPS with k8s
replicas. This PR enables a multi-process stack via uvicorn --workers, so
that we can achieve the same scaling without being in k8s.

To achieve that, we refactor main to split out the app construction
logic; this method needs to be non-async. We created a new `Stack` class
to house impls, with a `start()` method that is called in lifespan to
start background tasks, instead of starting them in the old
`construct_stack`. This way we avoid having to manage an event loop
manually.


## Test Plan
CI

> uv run --with llama-stack python -m llama_stack.core.server.server benchmarking/k8s-benchmark/stack_run_config.yaml

works.

> LLAMA_STACK_CONFIG=benchmarking/k8s-benchmark/stack_run_config.yaml uv run uvicorn llama_stack.core.server.server:create_app --port 8321 --workers 4

works.
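The refactor described in the commit message could be sketched roughly as follows. The names `Stack`, `start()`, and `create_app` come from the PR itself; everything inside them here (the heartbeat task, the config argument, the lifespan wiring) is an illustrative assumption, kept dependency-free rather than using the real FastAPI app:

```python
import asyncio
from contextlib import asynccontextmanager

class Stack:
    """Houses resolved impls; background work begins only in start()."""

    def __init__(self, config_path: str):
        self.config_path = config_path
        self._tasks: list[asyncio.Task] = []
        self.started = False

    async def start(self) -> None:
        # Called from the app's lifespan, where an event loop is already
        # running -- no manual loop management needed.
        self._tasks.append(asyncio.create_task(self._heartbeat()))
        self.started = True

    async def shutdown(self) -> None:
        for t in self._tasks:
            t.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)
        self.started = False

    async def _heartbeat(self) -> None:
        # Stand-in for the real background tasks.
        while True:
            await asyncio.sleep(3600)

def create_app(config_path: str = "run.yaml"):
    """Non-async factory: safe for uvicorn to call once per worker process."""
    stack = Stack(config_path)

    @asynccontextmanager
    async def lifespan(app):
        await stack.start()      # background tasks begin here, not in the factory
        try:
            yield
        finally:
            await stack.shutdown()

    # The real server would build FastAPI(lifespan=lifespan) here; returning
    # the pieces keeps this sketch runnable without FastAPI installed.
    return stack, lifespan
```

Because the factory itself is synchronous, `uvicorn app_module:create_app --workers N` can import and call it independently in each worker, and per-process startup work happens inside the lifespan.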
```python
# "matplotlib",
# ]
# ///
"""
```
Contributor:

nit: clean up above

@ehhuang (Author):

this is used by uv run python scripts/generated_charts.py

```shell
# "stack 4 2"
"stack 8 2"
# "vllm 1 2"
)
```
Contributor:

nit: do we need to keep these, are these various configs we use for running the benchmarks?

@ehhuang (Author):

updated to correspond to the checked in results/

## Benchmark Results

We use [GuideLLM](https://github.com/neuralmagic/guidellm) against our k8s deployment for comprehensive performance testing.
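A GuideLLM run against the deployed endpoint might look like the sketch below. The target URL, model workload shape, and rate values are assumptions, and GuideLLM's flags have changed between releases, so treat this as illustrative and check `guidellm benchmark --help` for your installed version:

```shell
# Sketch: drive a constant request rate at an OpenAI-compatible endpoint.
# Flags and data keys vary by GuideLLM version; verify before running.
guidellm benchmark \
  --target "http://localhost:8321/v1" \
  --rate-type constant \
  --rate 8 \
  --max-seconds 60 \
  --data "prompt_tokens=512,output_tokens=128"
```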

Contributor:

Are we still using the scripts run-all-benchmarks.sh and run-benchmark.sh?

@ehhuang (Author):

updated.

@ehhuang (Author) commented Sep 19, 2025

Will update this with the new workers support on stack server

@ehhuang marked this pull request as draft September 19, 2025 21:00
@ehhuang force-pushed the pr3421 branch 2 times, most recently from 4a81a4e to 6678849 on September 23, 2025 at 23:06
@ehhuang marked this pull request as ready for review September 23, 2025 23:12
@slekkala1 (Contributor):

lgtm

@ehhuang merged commit 48a551e into llamastack:main Sep 24, 2025
46 checks passed
iamemilio pushed a commit to iamemilio/llama-stack that referenced this pull request Sep 24, 2025
