Release v0.8.1 · llm-d/llm-d

LLM-D v0.8.0 Component Summary

Themes: solidify CI coverage & project operations, expand accelerator coverage, graduate multimodal/batch/flow-control to production, introduce initial RL support.

| Component | Version | Previous Version | Type | Notes |
| --- | --- | --- | --- | --- |
| llm-d/llm-d-router-endpoint-picker | `v0.9.0` | `v0.8.0` | Image + Helm Chart | Core EPP image (renamed from llm-d-inference-scheduler) |
| llm-d/llm-d-router-disagg-sidecar | `v0.9.0` | `v0.8.0` | Image | P/D routing sidecar (renamed from llm-d-routing-sidecar) |
| llm-d/llm-d-uds-tokenizer | `vllm-v0.23.0` | `vllm-v0.19.1` | Image | Tokenizer sidecar aligned with vLLM version |
| llm-d/llm-d-kv-cache | `v0.9.0` | `v0.8.0` | Library | HMA support, storage events, multi-tier offloading |
| llm-d/llm-d-inference-sim | `v0.9.2` | `v0.8.2` | Image | Multimodal support, Mooncake bootstrap, configurable latency |
| llm-d/llm-d-cuda | `v0.8.0` | `v0.7.0` | Image | vLLM v0.23.0, CUDA 13.0.2 |
| llm-d/llm-d-aws (EFA) | `v0.8.0` | `v0.7.0` | Image | |
| llm-d/llm-d-hpu | `v0.8.0` | `v0.7.0` | Image | |
| llm-d/llm-d-kv-cache/llmd_fs_backend_connector | `v0.23` | `v0.19.1` | Wheel installed in `llm-d` | Migrated to vLLM 0.23.0 offload API |
| llm-d/llm-d-benchmark | `v0.7.0` | `v0.6.8.1` | Image | Benchmark workload launcher |
| llm-d/llm-d-workload-variant-autoscaler | `v0.8.0` | `v0.7.0` | Helm Chart + Image | CRD migration to llm-d.ai API group, improved observability |
| vllm-project/vllm | `v0.23.0` | `v0.19.1` | Wheel installed in `llm-d` | Confirmed by @Gregory-Pereira and @tessapham |
| kubernetes-sigs/gateway-api-inference-extension | `v1.5.0` | `v1.5.0` | Helm Chart | Charts now published from llm-d-router OCI registry |

Upstream vLLM Images (replacing llm-d-built images)

Per PR #1791, the following platforms now use upstream vLLM images directly instead of llm-d-built custom images:

| Platform | New Image | Tag | Previous llm-d Image |
| --- | --- | --- | --- |
| GPU (CUDA) | `vllm/vllm-openai` | `v0.23.0` | `ghcr.io/llm-d/llm-d-cuda` (still available for advanced builds) |
| ROCm (AMD) | `vllm/vllm-openai-rocm` | | `ghcr.io/llm-d/llm-d-rocm` |
| CPU | `ghcr.io/llm-d/llm-d-cpu` | `v0.7.0` | Same image, version bump |
| XPU (Intel) | `vllm/vllm-openai` | `v0.23.0` | `ghcr.io/llm-d/llm-d-xpu` |

Infrastructure Changes

| Component | Version | Previous Version | Notes |
| --- | --- | --- | --- |
| Gateway API | `v1.5.1` | `v1.5.1` | No change |
| Istio | `1.29.4` | `1.29.1` | Patch update |
| kgateway (agentgateway) | `v2.3.3` | `v2.2.1` | |

Deprecated / Removed Components

| Component | Status | Replaced By |
| --- | --- | --- |
| llm-d/llm-d-inference-scheduler | Renamed | `ghcr.io/llm-d/llm-d-router-endpoint-picker` |
| llm-d/llm-d-routing-sidecar | Renamed | `ghcr.io/llm-d/llm-d-router-disagg-sidecar` |
| llm-d/llm-d-cuda (debug) | Removed | N/A |
| llm-d/llm-d-cuda-gb200 | Removed | N/A |
| llm-d/llm-d-xpu | Removed | `vllm/vllm-openai` |
| llm-d/llm-d-rocm | Removed | `vllm/vllm-openai-rocm` |
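
When auditing existing manifests for the renamed and removed images listed above, a small lookup can flag which image references need attention. The following is a minimal sketch, not project tooling: the helper names `split_image_ref` and `replacement_for` are hypothetical, and it assumes each old image's name (the last path segment of its repository) matches the component name in the table. The debug variant of `llm-d-cuda` is left out because the table does not give its image name. Only the repository is returned for a replacement, since tags under the new names are versioned separately and must be chosen from the component tables.

```python
from typing import Optional, Tuple

# Old image name (last path segment) -> replacement repository, taken from the
# "Deprecated / Removed Components" table. None means removed with no
# replacement listed (N/A). ASSUMPTION: image names match component names.
REPLACED_BY = {
    "llm-d-inference-scheduler": "ghcr.io/llm-d/llm-d-router-endpoint-picker",
    "llm-d-routing-sidecar": "ghcr.io/llm-d/llm-d-router-disagg-sidecar",
    "llm-d-xpu": "vllm/vllm-openai",
    "llm-d-rocm": "vllm/vllm-openai-rocm",
    "llm-d-cuda-gb200": None,
}


def split_image_ref(ref: str) -> Tuple[str, Optional[str]]:
    """Split 'repo[:tag][@digest]' into (repo, tag); tag is None when absent."""
    ref = ref.split("@", 1)[0]  # drop any digest suffix
    slash = ref.rfind("/")
    colon = ref.rfind(":")
    # A colon marks a tag only if it follows the last '/'; an earlier colon
    # belongs to a registry host:port.
    if colon > slash:
        return ref[:colon], ref[colon + 1:]
    return ref, None


def replacement_for(ref: str) -> Optional[str]:
    """Return the ref unchanged if unaffected, the replacement repository
    (without a tag) if one is listed, or None if removed with no replacement."""
    repo, _tag = split_image_ref(ref)
    name = repo.rsplit("/", 1)[-1]
    if name not in REPLACED_BY:
        return ref
    return REPLACED_BY[name]
```

For example, `replacement_for("ghcr.io/llm-d/llm-d-rocm")` yields `vllm/vllm-openai-rocm`, while a reference to `llm-d-cuda-gb200` yields `None`, signalling that the manifest needs a manual decision.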

New Capabilities in This Release

| Capability | Component(s) | Status |
| --- | --- | --- |
| Multi-modal serving (production) | llm-d-router, llm-d-inference-sim | Graduated |
| Batch gateway (production) | llm-d-router | Graduated |
| Flow control (production) | llm-d-router | Graduated |
| Non-Kubernetes mode (RL/Slurm) | llm-d-router (FileDiscovery plugin) | New |
| Responses API support | llm-d-router | New |
| Multi-tier KV offloading (CPU → storage) | llm-d-kv-cache | New |
| HMA (Heterogeneous Memory Allocation) support | llm-d-kv-cache, fs-connector | New |
| DP-Aware scheduling | llm-d-router | Graduating |
| Mooncake connector | llm-d-kv-cache | New |
| Predicted latency scheduling | llm-d-router | New |
| Agentic workload routing | llm-d-router | New |
| TPU nightly tests | CI/CD | New |

What's Changed

Point to patch v0.8.1 by @maugustosilva in commit d6ffb5c
Updated the branch to clone in the guides to release-0.8 branch by @ahg-g in #1958
Pin inference-perf version by @diegocastanibm in #1946

Full Changelog: v0.8.0...v0.8.1
