Release v0.6.0 · llm-d/llm-d-benchmark

What's Changed

Full conversion to Python, with a new CLI and a new declarative specification language for describing experiments
- The plugin architecture makes it straightforward to add new stages to the life cycle and scales to future features.
- The user experience is improved by much more meaningful logging and message display.
- Extensive health checking during and at the end of the deployment.
New standup method available: "Fast Model Actuation" (FMA)
- Fast Model Actuation (FMA) is a Kubernetes-native system for efficiently managing LLM inference servers; it reduces model startup latency from minutes to seconds. FMA uses two techniques: (1) vLLM sleep/wake, where model instances move tensors from GPU to CPU memory, freeing accelerator resources while keeping the process alive for rapid wake-up; and (2) model swapping, where a persistent launcher process handles initialization upfront so instances can be swapped without full cold starts.
Significant improvements to performance data collection, including related changes to the benchmark report
- "Time-series" metrics in version 0.2 of the benchmark report now include both a statistical summary and a link to the raw collected data in CSV format.
Tighter integration with Workload Variant Autoscaler (WVA), including the ability to deploy multiple models in the same namespace, as defined within a scenario. In the same vein, one or more stacks in a scenario can be deployed and torn down based on user preference.
Ability to provide different parameters to the vLLM process on different pods (by using the LeaderWorkerSet (LWS) Kubernetes API).
- Allow filling in stack details from a YAML file from the harness pod.
- Assorted corrections and robustness improvements.
The "capacity planner" and "configuration explorer" are now part of a new project: https://github.com/llm-d-incubation/llm-d-planner
Substantially enhanced development tooling, including pre-commit and CI/CD checks that safeguard existing library patterns and functionality.
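
To illustrate the kind of statistical summary mentioned above for "time-series" metrics, here is a minimal Python sketch that reduces a raw CSV of samples to a few summary statistics. It is not the project's implementation: the function name `summarize_series`, the column names `timestamp` and `value`, and the sample data are all assumptions made for this example, and the actual report schema and CSV layout may differ.

```python
import csv
import io
import statistics


def summarize_series(csv_text, value_column="value"):
    """Summarize one numeric column of a CSV time series.

    The column name is a hypothetical default, not the benchmark's schema.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the column is missing or empty.
    values = sorted(
        float(row[value_column]) for row in reader if row.get(value_column)
    )
    if not values:
        raise ValueError(f"no samples found in column {value_column!r}")
    return {
        "count": len(values),
        "min": values[0],
        "max": values[-1],
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two points.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


# Made-up sample data, standing in for a raw collected-metrics file.
SAMPLE = """timestamp,value
0,10.0
1,20.0
2,30.0
3,40.0
"""

summary = summarize_series(SAMPLE)
print(summary)
```

Keeping the raw CSV alongside the summary, as the report now does, means any statistic not included in the summary can still be recomputed later from the original samples.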

Regular Contributors to this release

New Contributors

@DolevAdas made their first contribution in #742
@michael-desmond made their first contribution in #853
@jia-gao made their first contribution in #859
@ruocco made their first contribution in #867
@adinilfeld made their first contribution in #874
@Luka-D made their first contribution in #917
@forfreedomforrich-eng made their first contribution in #899
@Copilot made their first contribution in #951
@aavarghese made their first contribution in #995

What's Changed

🌱 Remove per-repo gh-aw typo/link/upstream workflows by @clubanderson in #778
⬆️ Bump yq from v4.45.4 to v4.45.5 by @github-actions[bot] in #748
Fix logs for new vllm on nop harness by @manoelmarques in #781
Add memory and cache metrics #2 by @DolevAdas in #742
[Experimental] Add a new production trace replay for real-world multi-turn chat workflow by @achandrasekar in #761
Update GAIE InferencePool v1.3.0 to v1.3.1 by @diegocastanibm in #830
Fix partial metrics by @mengmeiye in #834
update istio by @diegocastanibm in #840
update vllm by @diegocastanibm in #837
update yq by @diegocastanibm in #836
update inferecemax by @diegocastanibm in #835
update kgateway by @diegocastanibm in #839
update helmfile to v1.4.1 by @diegocastanibm in #832
update wva by @diegocastanibm in #838
update inference-perf by @diegocastanibm in #833
v0.5.3 tagged release by @diegocastanibm in #831
[Standup] Add the ability to use initContainers. by @maugustosilva in #851
[Standup] Additional fixes (accelerator automatic selection) by @maugustosilva in #852
🌱 Add missing governance files per CNCF audit by @clubanderson in #783
Feat/small cluster config by @michael-desmond in #853
[Standup] Consolidate all sim scenarios (with small gateway pod) by @maugustosilva in #856
Fix metrics scrape by @mengmeiye in #854
Fix standalone preprocess env. variable by @manoelmarques in #860
Epp log scrape by @mengmeiye in #855
[Run] Add --repeat flag to repeat experiments N times with aggregation by @jia-gao in #859
remove accessLogging for helm chart schema validation error by @mengmeiye in #861
workload, inference-perf: increase tokens in sanity check. by @ruocco in #867
Stack discovery tool by @namasl in #762
AI generated scenarios POC by @kalantar in #674
[Standup] Fix for GKE with new v0.5.1 llm-d-cuda image by @maugustosilva in #868
[Run] Add pre/post workload hooks to run_only.sh by @jia-gao in #873
Declarative Python Package by @Vezio in #848
Use --serviceaccount value when creating model verification pod by @adinilfeld in #874
[Docs] Add Note for Previous Library by @Vezio in #875
feat: introduce harness namespace step in run sequence by @adinilfeld in #877
fix pd-disaggregation by @mengmeiye in #878
feat: Fix secret for monitoring epp by @Vezio in #879
fix: Extract IP values through standard status.addresses object lookups by @adinilfeld in #883
fix template for pd-disaggregation by @mengmeiye in #884
fix: Configuration file concatenation bugs and Crane resolution fallback by @adinilfeld in #888
docs: Add inline comments providing recommended storage classes by @adinilfeld in #887
Fix: Re-Enable CICD via Kind Deployment for PRs by @Vezio in #885
Remove unneeded config explorer components, consolidate analysis notebook by @namasl in #886
feat: Split CI benchmark into parallel standalone and modelservice jobs by @Vezio in #889
Fix harness metadata loss from subshell variable scoping by @Vezio in #880
[Run] Updated trivy scanner version by @maugustosilva in #890
Remove redundant metrics by @mengmeiye in #891
Add GCS results metadata injection skill and standardize skills directory by @adinilfeld in #895
Remove public IP address from Gemini skill by @adinilfeld in #896
Auto-provision RBAC and enable pod-native auth for run-only mode by @adinilfeld in #894
add metrics stat to benchmark report v0.2 by @mengmeiye in #897
Enhance Smoketest Cleanup and Document ModelService Protocols by @Vezio in #901
feature: Allows ModelService K8 Manifests to be Rendered BEFORE being Applied by @Vezio in #902
fix: Removes Redundant Version References by @Vezio in #903
fix: Render K8 Manifests for MS Early in Plan Phase by @Vezio in #907
fix(standup): skip accelerator validators for CPU-only scenarios by @Vezio in #911
fix the detection of whether a cluster is an OpenShift one by @mengmeiye in #913
fix: Update CPU Scenario by @Vezio in #912
fix: Standalone Rendering by @Vezio in #916
fix: Brings Back Pre Commit and Updates Getting Started by @Vezio in #919
fix capacity_validator on the number of accelerators by @mengmeiye in #920
fix: helm-diff install uses verify=false flag by @Luka-D in #917
add parser to replace model.name automatically by @mengmeiye in #918
fix: Quickstart Guide Fix by @Vezio in #921
Add description and keywords metadata to experiment config by @jia-gao in #898
Update production-trace-replay-qwen.py by @forfreedomforrich-eng in #899
fix issue 922 by @mengmeiye in #924
auto render host and add VLLM_INFERENCE_PORT in default template by @mengmeiye in #929
Add Fast Model Actuation Mode and Fix Standalone mode with nop harness by @manoelmarques in #900
Feature: Added tuneable cli arguments for timeouts during standup and run by @Luka-D in #934
Remove reference to custom image in fma and launcher specs by @manoelmarques in #933
Phase config-explorer out of llm-d-benchmark and import llm-d-planner by @jgchn in #930
deps(actions): bump actions/github-script from 7 to 9 by @dependabot[bot] in #935
deps(actions): bump actions/upload-artifact from 7.0.0 to 7.0.1 by @dependabot[bot] in #936
feat: Pull in AgentGateway and Re-Enable all Scenarios and LWS by @maugustosilva in #937
fix: Fix Documentation for QuickStart by @Vezio in #939
chore: bump llm-d-infra chart version v1.3.8 → v1.4.0 by @Copilot in #951
chore: bump kgateway v2.1.1 → v2.2.3 by @Copilot in #949
update curl image path by @mengmeiye in #954
add support for context length aware router by @mengmeiye in #955
[Standup] Ensure simulated-accelerators is in sync with guides by @maugustosilva in #956
feat: Generate SBOM Automatically and Add Precommit to Installer by @Vezio in #957
Monitor replicas and standup time by @mengmeiye in #958
update metrics documentation by @mengmeiye in #959
fix: Remove some Pre-Push Requirement and Fix Python Version by @Vezio in #960
[Standup] Update istio and gaie by @maugustosilva in #961
fix the bugs for configMap under sidecar and preprocess script by @mengmeiye in #964
Fix spyre smoketest by @mengmeiye in #965
fix bug when harness waitTimeout is defined in scenario by @mengmeiye in #986
[Optional] Uv install by @mengmeiye in #963
Upgrade FMA to next release v0.5.1-alpha.7 by @aavarghese in #995
ignore pycache in dockerbuild by @mengmeiye in #994
feat: Workload Variant Autoscaler Scenario and Infra Imeplementation by @Vezio in #999
Additional nop harness metrics by @manoelmarques in #990
feat: Add Check for PVC Creation Instead of Timing Out Only by @Vezio in #1001
teardown leaderworkerset and statefulset by @mengmeiye in #1003
Monitor startup time and replicas by @mengmeiye in #1002
Add FMA mode to pull request CI by @manoelmarques in #1000
fix spyre standalone by @mengmeiye in #1004
deployment: add chat-template parameter by @ruocco in #1017
Re-enabled all CI/CD workflows by @maugustosilva in #1018
improve replica monitoring by @mengmeiye in #1019
fix: Delay Metric Validation via Prom. Adapt. to Post Install by @Vezio in #1020
fix: Removes Final Duplicate Reference to Benchmark Report by @Vezio in #1021
fix: Bump Pckg Version for llmdbenchmark by @Vezio in #1024
feat: Enable Flow Control for Inference-Scheduling + WVA Scenario by @Vezio in #1025
add epp pool monitoring by @mengmeiye in #1026
update spyre scenario by @mengmeiye in #1027
⚠️ Switch OCP nightly benchmark runners from platform-eval to pokprod01 by @clubanderson in #1028
[cicd] Multiple updates to restore cicd (nightly) by @maugustosilva in #1030
feat: llm-d Multi Model + WVA Enablement per Namespace by @Vezio in #1029
[Standup] Consolidate versioning information by @maugustosilva in #1034
make monitoring enabled as default for standup by @mengmeiye in #904
feat: Persist WVA Resources on per-stack Teardown by @Vezio in #1035
Remove -f in llmdbenchmark run by @mengmeiye in #1037
Add FMA to nigthly CI by @manoelmarques in #1036
skip monitoring check for dry run by @mengmeiye in #1038
Release v0.6.0 by @maugustosilva in #1039

Full Changelog: v0.5.0...v0.6.0
