Release v0.8.1

@diegocastanibm diegocastanibm released this 26 Jun 19:13
v0.8.1
d6ffb5c

LLM-D v0.8.0 Component Summary

Themes: solidify CI coverage & project operations, expand accelerator coverage, graduate multimodal/batch/flow-control to production, introduce initial RL support.

| Component | Version | Previous Version | Type | Notes |
|---|---|---|---|---|
| llm-d/llm-d-router-endpoint-picker | v0.9.0 | v0.8.0 | Image + Helm Chart | Core EPP image (renamed from llm-d-inference-scheduler) |
| llm-d/llm-d-router-disagg-sidecar | v0.9.0 | v0.8.0 | Image | P/D routing sidecar (renamed from llm-d-routing-sidecar) |
| llm-d/llm-d-uds-tokenizer | vllm-v0.23.0 | vllm-v0.19.1 | Image | Tokenizer sidecar aligned with vLLM version |
| llm-d/llm-d-kv-cache | v0.9.0 | v0.8.0 | Library | HMA support, storage events, multi-tier offloading |
| llm-d/llm-d-inference-sim | v0.9.2 | v0.8.2 | Image | Multimodal support, Mooncake bootstrap, configurable latency |
| llm-d/llm-d-cuda | v0.8.0 | v0.7.0 | Image | vLLM v0.23.0, CUDA 13.0.2 |
| llm-d/llm-d-aws (EFA) | v0.8.0 | v0.7.0 | Image | |
| llm-d/llm-d-hpu | v0.8.0 | v0.7.0 | Image | |
| llm-d/llm-d-kv-cache/llmd_fs_backend_connector | v0.23 | v0.19.1 | Wheel installed in llm-d | Migrated to vLLM 0.23.0 offload API |
| llm-d/llm-d-benchmark | v0.7.0 | v0.6.8.1 | Image | Benchmark workload launcher |
| llm-d/llm-d-workload-variant-autoscaler | v0.8.0 | v0.7.0 | Helm Chart + Image | CRD migration to llm-d.ai API group, improved observability |
| vllm-project/vllm | v0.23.0 | v0.19.1 | Wheel installed in llm-d | Confirmed by @Gregory-Pereira and @tessapham |
| kubernetes-sigs/gateway-api-inference-extension | v1.5.0 | v1.5.0 | Helm Chart | Charts now published from llm-d-router OCI registry |

Upstream vLLM Images (replacing llm-d-built images)

Per PR #1791, the following platforms now use upstream vLLM images directly instead of llm-d-built custom images:

| Platform | New Image | Tag | Previous llm-d Image |
|---|---|---|---|
| GPU (CUDA) | vllm/vllm-openai | v0.23.0 | ghcr.io/llm-d/llm-d-cuda (still available for advanced builds) |
| ROCm (AMD) | vllm/vllm-openai-rocm | | ghcr.io/llm-d/llm-d-rocm |
| CPU | ghcr.io/llm-d/llm-d-cpu | v0.7.0 | Same image, version bump |
| XPU (Intel) | vllm/vllm-openai | v0.23.0 | ghcr.io/llm-d/llm-d-xpu |

Infrastructure Changes

| Component | Version | Previous Version | Notes |
|---|---|---|---|
| Gateway API | v1.5.1 | v1.5.1 | No change |
| Istio | 1.29.4 | 1.29.1 | Patch update |
| kgateway (agentgateway) | v2.3.3 | v2.2.1 | |

Deprecated / Removed Components

| Component | Status | Replaced By |
|---|---|---|
| llm-d/llm-d-inference-scheduler | Renamed | ghcr.io/llm-d/llm-d-router-endpoint-picker |
| llm-d/llm-d-routing-sidecar | Renamed | ghcr.io/llm-d/llm-d-router-disagg-sidecar |
| llm-d/llm-d-cuda (debug) | Removed | N/A |
| llm-d/llm-d-cuda-gb200 | Removed | N/A |
| llm-d/llm-d-xpu | Removed | vllm/vllm-openai |
| llm-d/llm-d-rocm | Removed | vllm/vllm-openai-rocm |
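
When auditing existing manifests for the renamed and removed images listed above, a small lookup can flag references that need updating. The following is a minimal sketch, not an official migration tool. It assumes the old images were pulled from `ghcr.io/llm-d/<name>`, and the helper names `split_image` and `suggest_replacement` are hypothetical. It only suggests a replacement repository; the tag still has to be chosen by hand, because the old and new images use different version schemes.

```python
# Repository replacements taken from the "Deprecated / Removed Components" table.
# Assumption: the old images were referenced under the ghcr.io/llm-d/ prefix.
REPLACEMENTS = {
    "ghcr.io/llm-d/llm-d-inference-scheduler": "ghcr.io/llm-d/llm-d-router-endpoint-picker",
    "ghcr.io/llm-d/llm-d-routing-sidecar": "ghcr.io/llm-d/llm-d-router-disagg-sidecar",
    "ghcr.io/llm-d/llm-d-xpu": "vllm/vllm-openai",
    "ghcr.io/llm-d/llm-d-rocm": "vllm/vllm-openai-rocm",
}


def split_image(ref):
    """Split an image reference into (repository, tag); tag is None if absent."""
    # Drop a digest suffix such as "@sha256:..." before looking for a tag.
    name = ref.split("@", 1)[0]
    slash = name.rfind("/")
    colon = name.rfind(":")
    # A colon counts as a tag separator only after the last slash,
    # so a registry port like "localhost:5000/foo" is left intact.
    if colon > slash:
        return name[:colon], name[colon + 1:]
    return name, None


def suggest_replacement(ref):
    """Return the replacement repository for a renamed/removed image, else None."""
    repo, _tag = split_image(ref)
    return REPLACEMENTS.get(repo)


if __name__ == "__main__":
    for image in ("ghcr.io/llm-d/llm-d-rocm:v0.7.0", "vllm/vllm-openai:v0.23.0"):
        print(image, "->", suggest_replacement(image))
```

Images marked Removed with no replacement (the debug and gb200 CUDA variants) are deliberately left out of the mapping, since the table lists no successor for them.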

New Capabilities in This Release

| Capability | Component(s) | Status |
|---|---|---|
| Multi-modal serving (production) | llm-d-router, llm-d-inference-sim | Graduated |
| Batch gateway (production) | llm-d-router | Graduated |
| Flow control (production) | llm-d-router | Graduated |
| Non-Kubernetes mode (RL/Slurm) | llm-d-router (FileDiscovery plugin) | New |
| Responses API support | llm-d-router | New |
| Multi-tier KV offloading (CPU → storage) | llm-d-kv-cache | New |
| HMA (Hybrid Memory Allocator) support | llm-d-kv-cache, fs-connector | New |
| DP-Aware scheduling | llm-d-router | Graduating |
| Mooncake connector | llm-d-kv-cache | New |
| Predicted latency scheduling | llm-d-router | New |
| Agentic workload routing | llm-d-router | New |
| TPU nightly tests | CI/CD | New |

What's Changed

Full Changelog: v0.8.0...v0.8.1