v0.9.0-rc.2
Pre-release
Here is a summary of the release notes for v0.9.0-rc.2:
Highlights
- Performance Wins: Optimizations were made to the sidecar proxy hot path for high-concurrency P/D routing (#746) and allocations in `Scheduler.Schedule` were reduced by ~90% on large fleets (#1171). OpenAI parser response usage parsing allocations were also minimized (#1583).
- Key Features: Added support for multi-modal image content in approximate prefix cache matching and an encoder cache affinity scorer. Also added support for the Anthropic `v1/messages` API (#1088), Mooncake as a new KV-connector in the routing sidecar (#1193), and a chunked decode feature in the sidecar (#822).
- New Metrics: Introduced inter-token latency (ITL) (#1591), stream-aware TPOT, and core TTFT metrics (#1499).
- Major Bug Fixes: Resolved a panic in the SGLang proxy handling concurrent requests (#632), fixed dropped multimodal content in prefix cache scoring/tokenizer rendering (#781), and patched a ZMQ port range validation issue to prevent multi-rank pod failures (#1242).
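For readers unfamiliar with the latency metrics named above, the following is a minimal sketch of how TTFT, ITL, and TPOT are commonly derived from per-token arrival timestamps. It uses the conventional definitions, which may not match exactly what #1499 and #1591 implement; the function name `latency_metrics` and its parameters are hypothetical and are not part of the project.

```python
from statistics import mean


def latency_metrics(request_start: float, token_times: list[float]) -> dict:
    """Derive common streaming latency metrics from token arrival times (seconds).

    Assumed, conventional definitions (the router's own may differ):
      TTFT = time from request start to the first token
      ITL  = gaps between consecutive tokens
      TPOT = mean time per output token after the first
    """
    if not token_times:
        raise ValueError("at least one token timestamp is required")
    ttft = token_times[0] - request_start
    # Pair each token with its successor to get the inter-token gaps.
    itl = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    # With a single token there are no gaps, so TPOT is reported as 0.0.
    tpot = mean(itl) if itl else 0.0
    return {"ttft": ttft, "itl": itl, "tpot": tpot}


# Illustrative timestamps only, not measured data.
metrics = latency_metrics(request_start=0.0, token_times=[0.5, 0.75, 1.0, 1.5])
```

Keeping ITL as the full list of gaps, rather than only its mean, preserves the tail behaviour that a single TPOT average hides.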
Upgrade Steps & Deprecations
- Repository & Binary Renaming: The project has officially transitioned from `llm-d-inference-scheduler` to `llm-d-router`. Reflecting this change, Docker images have been renamed to `router-endpoint-picker` and `router-disagg-sidecar` (#1098), and configuration pseudo CRD groups are migrating to `llm-d.ai` (#972).
- Removed Sidecar Flags: The deprecated `--inference-pool-name` and `--inference-pool-namespace` flags have been permanently removed from the sidecar (#1416).
- Metric Changes: Legacy metrics feature gates and their associated CLI flags have been removed from the EPP template (#1418, #1466). The metrics subsystem has also been unified and renamed to `llm_d_epp` (#1661).
- Component Deprecations:
  - Istio & Gateway API: Istio has been upgraded to 1.29.2 (#1052) and the Gateway API has been updated to v1.5.1 (#780). The previous workaround for vLLM Data Parallel on Istio 1.28 is now deprecated (#727).
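Because the two sidecar flags listed above are removed outright, it can help to scan existing deployment arguments before upgrading. The sketch below is one possible pre-upgrade check, not a tool shipped with the project: the helper `find_removed_flags` is hypothetical, and the `--port` flag and all values in the example list are invented for illustration.

```python
# Flags removed from the sidecar in this release (#1416).
REMOVED_FLAGS = ("--inference-pool-name", "--inference-pool-namespace")


def find_removed_flags(args: list[str]) -> list[str]:
    """Return the removed sidecar flags that still appear in an argument list.

    Handles both the "--flag value" and "--flag=value" spellings.
    """
    found = []
    for arg in args:
        name = arg.split("=", 1)[0]
        if name in REMOVED_FLAGS and name not in found:
            found.append(name)
    return found


# Hypothetical argument list, for illustration only.
example_args = [
    "--port=8000",
    "--inference-pool-name=my-pool",
    "--inference-pool-namespace",
    "default",
]
leftovers = find_removed_flags(example_args)
```

A non-empty `leftovers` list means the container arguments still reference flags that this release no longer provides.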
What's Changed
- Fix for flaky sites during lychee md link checker by @pierDipi in #485
- deps(actions): bump actions/checkout from 5 to 6 by @dependabot[bot] in #487
- deps(go): bump google.golang.org/grpc from 1.76.0 to 1.77.0 in the go-dependencies group by @dependabot[bot] in #486
- Use kv-cache-manager based on Go mod version instead of hardcoded by @pierDipi in #484
- update llm-d-kv-cache version to v0.4.0 by @vMaroon in #492
- fix: github action missing Trivy scan on sidecar image by @zdtsw in #481
- [Fix] Enhance macOS Makefile to Support Non-Homebrew Python Installations by @hyeongyun0916 in #489
- feat(allowlist): support both v1 and v1alpha2 InferencePool APIs with flag by @googs1025 in #474
- fix: make 'install-dependencies' and 'build' target by @zdtsw in #493
- skip lint and test when only docs change by @setsunakute in #494
- fix: Fixes for Data Parallel support when also running with Prefix Disaggregation by @shmuelk in #498
- sync gie to v1.2.0 by @nirrozenbaum in #499
- Error if PYTHON_CONFIG is empty by @elevran in #497
- Add GH action to check for signed and verified commits in PR by @elevran in #500
- deps(actions): bump crate-ci/typos from 1.39.2 to 1.40.0 by @dependabot[bot] in #501
- build: make build should use CGO_CFLAGS, CGO_LDFLAGS by @evacchi in #503
- chore: bump gie to v1.2.1 by @nirrozenbaum in #504
- deps(go): bump sigs.k8s.io/gateway-api from 1.4.0 to 1.4.1 in the kubernetes group by @dependabot[bot] in #508
- deps(go): bump the go-dependencies group with 3 updates by @dependabot[bot] in #507
- Miscellaneous dependency updates by @shmuelk in #510
- deps(go): bump the kubernetes group with 5 updates by @dependabot[bot] in #513
- Fix running `make env-dev-kind` by @acardace in #512
- test: add precise_prefix_cache_test by @evacchi in #505
- test: reuse upstream data store and enable logr in unit tests by @MregXN in #518
- feat: allow pd_profile_handler to handle diverse plugin types by @hyeongyun0916 in #516
- deps(actions): bump crate-ci/typos from 1.40.0 to 1.40.1 by @dependabot[bot] in #526
- deps(go): bump google.golang.org/grpc from 1.77.0 to 1.78.0 in the go-dependencies group by @dependabot[bot] in #527
- feat(metrics): add model_name label to PD decision metric by @googs1025 in #528
- deps(actions): bump crate-ci/typos from 1.40.1 to 1.41.0 by @dependabot[bot] in #532
- Configure dependabot ignores Go version updates by @elevran in #533
- Updates the architecture description by @davidbreitgand in #525
- Dependabot: exert finer control over package updates by @elevran in #542
- port auto-assign action from llm-d-kv-cache by @vMaroon in #551
- refactor: set python version and pin docker image with tag by @zdtsw in #543
- chore(test): update API version for nixl test by @zdtsw in #555
- deps(go): bump the go-dependencies group with 2 updates by @dependabot[bot] in #558
- deps(actions): bump crate-ci/typos from 1.41.0 to 1.42.0 by @dependabot[bot] in #557
- deps(actions): bump actions/checkout from 4 to 6 by @dependabot[bot] in #556
- Update auto-assign logic by @elevran in #560
- Remove newline in unsigned commit message by @elevran in #561
- bump gie to v1.3.0 rc2 by @nirrozenbaum in #562
- Update OWNERS by @elevran in #559
- refactor: Makefile, update docs by @zdtsw in #463
- feat: add metrics validation in e2e test by @googs1025 in #529
- feat: make no-hit-lru P/D-aware by @evacchi in #522
- Update disaggregated Prefill/Decode inference serving documentation by @mayabar in #571
- deps(actions): bump crate-ci/typos from 1.42.0 to 1.42.1 by @dependabot[bot] in #572
- deps(go): bump github.com/onsi/ginkgo/v2 from 2.27.4 to 2.27.5 in the go-dependencies group by @dependabot[bot] in #573
- fix reviewers auto assign minor bug by @nirrozenbaum in #575
- fix(scorer): make active request pd aware by @kyanokashi in #569
- test(e2e): cleanup kind cluster by @zdtsw in #563
- refactor: add early validation in DP profile handler by @zdtsw in #554
- deps(go): bump the kubernetes group with 2 updates by @dependabot[bot] in #574
- refactor: kv cache manager repo by @sagearc in #570
- bumping IGW version to the full released version by @kfswain in #583
- Enable prefix-cache awareness in active-active multi-replica scheduler deployments by @vMaroon in #578
- Switch to pre-built vLLM wheels for CPU builds by @sagearc in #582
- update llm-d-kv-cache import to v0.5.0-RC1 by @vMaroon in #584
- Use 1.3.0 CRDs by @shmuelk in #586
- free disk space on ci-release by @vMaroon in #587
- feat: use Tinyllama as the "model" for kind test and switch to use precise-prefix-cache-score in config by @zdtsw in #581
- Update linter configuration by @elevran in #588
- fix: config should use new precise-prefix-cache-scorer by @zdtsw in #576
- deps(actions): bump crate-ci/typos from 1.42.1 to 1.42.2 by @dependabot[bot] in #589
- Updated to more recent GIE by @shmuelk in #592
- pull kvc v0.5.0 libs by @vMaroon in #595
- deps(actions): bump crate-ci/typos from 1.42.2 to 1.43.0 by @dependabot[bot] in #596
- Address return nil,nil linter error in test mock by @elevran in #598
- deps(go): bump the go-dependencies group with 2 updates by @dependabot[bot] in #597
- Models extractor by @irar2 in #553
- feat(lmcache): implement decode first flow on lmcache connector when cache_hit_threshold field is present by @kyanokashi in #509
- Extend support for different ways to decide if disaggregated PD is required by @mayabar in #531
- chore: fix wrong port for NIXL by @zdtsw in #593
- fix: resolve JSON serialization error in active-request-scorer debug logs by @albertoperdomo2 in #602
- Match documentation with default model in scripts by @shmuelk in #615
- Cleanup End to End test framework by @shmuelk in #625
- deps(go): bump the kubernetes group with 5 updates by @dependabot[bot] in #629
- deps(go): bump google.golang.org/grpc from 1.78.0 to 1.79.1 in the go-dependencies group by @dependabot[bot] in #628
- deps(actions): bump crate-ci/typos from 1.43.0 to 1.43.5 by @dependabot[bot] in #627
- Update Go to Go 1 25 by @shmuelk in #624
- feat: update to use new kv-cache UDS tokenizer by @zdtsw in #609
- fix: Add kustomization file for rbac by @albertoperdomo2 in #601
- Fix panic in SGLang proxy handling of concurrent requests by @yangligt2 in #632
- Add otel tracing instrumentation by @sallyom in #506
- bump kvc import to v0.5.1-rc2 by @vMaroon in #657
- pull in v0.6.0 of kvcache by @Gregory-Pereira in #660
- deps(go): bump go.opentelemetry.io/otel/sdk from 1.39.0 to 1.40.0 by @dependabot[bot] in #661
- deps(go): bump the go-dependencies group across 1 directory with 2 updates by @dependabot[bot] in #662
- fix(docs): Updated development guide by @gyliu513 in #666
- Optimized request prefill error messages by @learner0810 in #652
- fix(makefile): use shell variable for kv-cache path in tokenizer build by @gyliu513 in #664
- fix: remove kustomize dependency by @gyliu513 in #665
- deps(go): bump the kubernetes group with 5 updates by @dependabot[bot] in #673
- deps(go): bump the go-dependencies group across 1 directory with 5 updates by @dependabot[bot] in #674
- deps(actions): bump lycheeverse/lychee-action from 2.7.0 to 2.8.0 by @dependabot[bot] in #671
- ci: add dev image workflow for main and release branches by @pierDipi in #668
- deps(actions): bump crate-ci/typos from 1.43.5 to 1.44.0 by @dependabot[bot] in #670
- fix(ci): update Trivy to v0.69.2 by @pierDipi in #675
- Allow sidecar server to reload TLS certificates by @pierDipi in #607
- Use trivy action for image scanning by @elevran in #688
- deps(go): bump the go-dependencies group with 7 updates by @dependabot[bot] in #692
- feat(sidecar): simplify TLS command line options with StringSlice flags by @gyliu513 in #682
- Fix terminology and add links to docs/archiecture.md by @elevran in #695
- replace map[string]bool with map[string]struct{} by @roytman in #696
- Add presubmit makefile target for local verification before pushing a PR by @elevran in #687
- Update Trivy scan to run newer version with explicit auth tokens by @elevran in #698
- Remove extra trivy params by @elevran in #702
- fix: simplify InferencePool flag to namespace/name format by @gyliu513 in #685
- Trivy complains of user-password mismatch by @elevran in #704
- fix(test): Add unit test for pd_prerequest.go by @gyliu513 in #706
- Remove trivy cache and enable manual workflow dispatch by @elevran in #713
- initial E/PD extension of the sidecar by @roytman in #643
- Check for uniqueness of media URLs by @roytman in #717
- Move typo checking from tools makefile to main, under lint by @elevran in #719
- Rename EncoderPodsHeader according to other constants by @roytman in #721
- Implement Options pattern for sidecar proxy by @Mohamedma96 in #697
- Rename common constants by @roytman in #722
- deps(actions): bump dorny/paths-filter from 3 to 4 by @dependabot[bot] in #723
- Enable major version updates to gh actions by @elevran in #714
- NonCachedTokens defines the minimum number of non-cached tokens required by @modassarrana89-new in #691
- Add external tokenizer PrepareData plugin and TokenizedPrompt scorer support by @acardace in #694
- Deprecate the workaround used to support vLLM Data Parallel on Istio 1.28 by @shmuelk in #727
- build: remove CGO dependency by migrating to pure-Go ZMQ by @elevran in #728
- deps(go): bump google.golang.org/grpc from 1.79.2 to 1.79.3 by @dependabot[bot] in #737
- Use kubectl kustomize instead of standalone kustomize by @elevran in #741
- Add idle pod config to active-request-scorer by @dagrayvid in #669
- [build] Optimize docker build (local and CICD) by @elevran in #740
- feat: speculative indexing for PrecisePrefixCacheScorer by @bongwoobak in #659
- Remove obsolete hashBlockSize by @roytman in #748
- feat: add optional Prometheus monitoring to Kind dev environment by @hexfusion in #742
- fix: support podman-docker in e2e tests and Makefile by @hexfusion in #730
- Remove UDS tokenizer image build from inference scheduler repo by @elevran in #739
- deps(go): bump the kubernetes group with 5 updates by @dependabot[bot] in #754
- test: add disruption e2e tests for scheduler failure scenarios by @hexfusion in #735
- Unified Disaggregate Handler by @roytman in #732
- [build] Add test coverage reporting by @elevran in #749
- [refactor] Unify sidecar `Config` and `Options` by @elevran in #751
- [cicd] Make coverage comparison optional by @elevran in https://github.com//pull/762
- A temporary fix to use the previous simulator image by @roytman in #766
- Combine EncodeHeaderHandler and PrefillHeaderHandler into a single DisaggHeadersHandler by @roytman in #758
- Prevent mismatch between new and deprecated APIs by @roytman in #756
- Revert IGW import to 1.4.0 and update tokenizer plugin accordingly by @vMaroon in #770
- fix(test): Increase test coverage for prefix_based_pd_decider.go by @gyliu513 in #715
- Run unit, integration tests and builds in a container by @acardace in #521
- update Makefile to run e2e tests locally with Podman by @roytman in #775
- implement context-length-aware plugin (scorer/filter) by @vMaroon in #550
- Build images use Go builtin cross compilation by @shmuelk in #776
- feat: bump llm-d-kv-cache for MM-aware prefix-cache routing by @vMaroon in #772
- change simulator version to v0.8.1 by @mayabar in #773
- [fix] [cicd] Add missing newline at end of OWNERS, make script more robust for missing newlines by @elevran in #779
- [fix] Added custom LevelEncoder for sidecar by @elevran in #778
- Update GW API to v1.5.1 by @elevran in #780
- Add E2E_KEEP_CLUSTER_ON_FAILURE env variable by @roytman in #782
- [bug] fix multimodal content dropped in prefix cache scoring and tokenizer chat rendering by @vMaroon in #781
- [Docs] Restructure development.md by @elevran in #769
- deps(actions): bump actions/cache from 4 to 5 in the github-actions group by @dependabot[bot] in #788
- Print Pod statuses and logs if e2e tests fail by @roytman in #783
- Propagates relevant environment variables into test container by @roytman in #785
- bump kvc import to v0.7.0 by @vMaroon in #789
- import kvc-v0.7.1 by @vMaroon in #796
- Add tests for disaggregated Encoder. by @revit13 in #777
- (1/2) import igw v1.5.0-rc1 by @zetxqx in #806
- deps(go): bump go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp from 0.67.0 to 0.68.0 in the go-dependencies group across 1 directory by @dependabot[bot] in #809
- deps(actions): bump crate-ci/typos from 1.44.0 to 1.45.0 in the github-actions group across 1 directory by @dependabot[bot] in #801
- perf: optimize sidecar proxy hot path for high-concurrency P/D routing by @tlrmchlsmth in #746
- [feat] [cicd] Use distroless/static as default runtime base image by @elevran in #795
- Support image build with different base via Makefile and document runtime debugging by @elevran in #821
- [cicd] Compare coverage against `main` and latest `release-*` by @elevran in #757
- deps(actions): bump the github-actions group across 1 directory with 2 updates by @dependabot[bot] in #824
- Script to automate migration for GAIE code into llm-d by @elevran in #804
- feat: Add support for Responses API by @asharkhan3101 in #599
- update image-build-builder rule by @roytman in #830
- [cicd] Add release notes action by @elevran in #791
- switch active-request scorer's logging from debug to trace by @vMaroon in #825
- Add ahg-g as an owner by @ahg-g in #835
- Refactoring plugins according to GAIE structure by @roytman in #827
- Add lint exclusions from GAIE by @elevran in #834
- fix: broken link after refactoring by @zdtsw in #839
- deps(go): bump github.com/moby/spdystream from 0.5.0 to 0.5.1 by @dependabot[bot] in #838
- migration scripts: improve tooling robustness by @elevran in #843
- Update to GAIE v1.5 by @elevran in #844
- deps(go): bump the kubernetes group with 5 updates by @dependabot[bot] in #853
- [migration] Clear GAIE destination paths of conflicting Go files by @elevran in #855
- lint: simplify golangci config for GAIE migration by @elevran in #859
- move scheduler pd unittests to the new locations by @roytman in #849
- migrate: sigs.k8s.io/gateway-api-inference-extension @ a048a796 by @elevran in #870
- Added datalayer plugins docs by @ahg-g in #875
- fix image build with Podman by @roytman in #869
- use endpoints extractor for kvevents subscription mgmt in `precise-prefix-cache-scorer` by @vMaroon in #862
- fix: CVE-2025-61729 - pin toolchain to go1.25.8 by @andresllh in #861
- fix: align pod label selector to kebab-case inference-serving by @yankay in #854
- move sidecar/latencypredictorasync into predictedlatency by @kaushikmitr in #881
- feat(epp): allow Parser.ParseRequest to signal skip and fallback to random endpoint by @zetxqx in #882
- fix(test): eliminate data race in SGLang connector test by @elevran in #886
- Fix: dispatch endpoint events when no PollingDataSource is configured by @elevran in #879
- Exclude envoyproxy.io from markdown link checker by @roytman in #865
- Add FULL_DUPLEX_STREAMED requirement by @roytman in #892
- updates to go.mod changes not included by latencypredictor client by @elevran in #897
- deps(actions): bump the github-actions group with 2 updates by @dependabot[bot] in #903
- migrate: Correct some of the lint issues with the migrated IGW code by @shmuelk in #909
- add disk cleanup before e2e tests by @roytman in #915
- Refactoring and unification of e2e tests and deployment yaml files by @roytman in #811
- refactor: introduce ExtractorBase to remove forced no-op Extract from event-driven extractors by @elevran in #880
- refactor: migrate GAIE runtime strings to llm-d.ai by @evacchi in #922
- refactor: align eviction plugin Type names with naming conventions by @jia-gao in #894
- fix make test-unit on arm64 by @roytman in #940
- Enabled more lint rules and corrected related code issues by @shmuelk in #938
- feat: Add Vertex AI parser support by @zetxqx in #884
- Add documentation to the plugins. by @revit13 in #888
- feat(datalayer): additive ensureDataLayer with InjectDefaults opt-out by @elevran in #941
- migration: Migrate portions of the IGW Makefile by @shmuelk in #887
- [Migration]: Move GAIE deployment assets by @danehans in #950
- [Migration]: sigs.k8s.io/gateway-api-inference-extension @ 8ed5a0cd by @hexfusion in #927
- improve metrics logging by @andreyod in #966
- fix: make eviction runtime a builtin, not a plugin by @abdallahsamabd in #898
- feat(datalayer): plugin-driven source/extractor registration via Registrant interface by @elevran in #942
- feat: Add pluggable DiscoveryPlugin interface by @ezrasilvera in #908
- refactor(disagg): add PreRequest to disagg-profile-handler, deprecate disagg-headers-handler by @noalimoy in #905
- deps(actions): bump crate-ci/typos from 1.45.2 to 1.46.0 in the github-actions group by @dependabot[bot] in #985
- refactor(test): drop igw/ prefix on integration test paths by @hexfusion in #984
- tokenizer: consolidate to a single PrepareData plugin by @vMaroon in #863
- ci: implement Phase 1 security and reliability hardening by @Jwrede in #960
- [migrate:] phase 1 of e2e tests migration from sigs.k8s.io/gateway-api-inference-extension @ ed1af5d8 by @roytman in #987
- Enabled more lint rules and corrected related code issues - part three by @shmuelk in #990
- Refactor label-based filters: rename, deprecate, and add runtime warn… by @szedan-rh in #842
- [flake] Testing: wait for manager shutdown before releasing test cleanup by @elevran in #992
- migration: Change the Group Name in the configuration pseudo CRD to llm-d.ai by @shmuelk in #972
- Updated the main readme file by @ahg-g in #978
- ci(deps): allow only patch bumps by dependabot by @elevran in #991
- refactor(datalayer): simplify Collector lifecycle, add error metrics by @hexfusion in #899
- deps(go): bump the go-dependencies group across 1 directory with 3 updates by @dependabot[bot] in #993
- switch to `on: pull_request_target` as GH unconditionally restricts GITHUB_TOKEN to read-only for pull_request by @elevran in #994
- refactor(tokenizer): return errors from PrepareRequestData instead of swallowing them by @acardace in #998
- Add PR size labeler workflow by @yehuditkerido in #997
- [sidecar] strip disaggregation headers before forwarding to model server by @andreyod in #999
- fix(e2e/epp): treat non-HTTP-200 as failure in generateTraffic by @revit13 in #1013
- Added a terminology section to clarify how everything fits under llm-d Router by @ahg-g in #1016
- refactor: align all import aliases by @evacchi in #921
- Re-enable two more lint rules that were disabled for the IGW migration by @shmuelk in #1000
- Correct issues related to pulling images in tests by @shmuelk in #1001
- docs: fix prefix scorer README by @learner0810 in #1014
- Migrate active request scorer to inflight load by @wenhug in #931
- Updated owners file as per the code migration alignment by @ahg-g in #946
- migrate: sigs.k8s.io/gateway-api-inference-extension @ ed1af5d8 by @zhouyou9505 in #989
- update release notes for PR #931 by @github-actions[bot] in #1028
- fix(e2e): waitForDeploymentReady races on terminating pods after DeleteAllOf. by @revit13 in #1025
- refactor(requestcontrol): remove ConsumerPlugin from DataProducer by @yehuditkerido in #1039
- fix: use int32 for InferenceObjective priority by @zhouyou9505 in #1037
- fix: allow standalone sidecar resources to be configured by @zhouyou9505 in #1040
- reduce logging level of envoy sidecar to warn instead of trace by @ahg-g in #1044
- refactor(plugins): rename PrepareRequestData and related refactor by @rahulgurnani in #1026
- deps(actions): bump crate-ci/typos from 1.46.0 to 1.46.1 in the github-actions group by @dependabot[bot] in #1055
- fix nonCachedTokens zero-value behavior description in docs by @mayabar in #871
- tokenizer: add vLLM HTTP /render backend as default by @vMaroon in #890
- feat: add probabilistic-admitter plugin by @jgchn in #959
- feat: add x-llm-d-request-dropped-reason header to flow control responses by @lioraron in #1027
- helm: split monitoring provider from gateway provider by @zhouyou9505 in #1049
- update release notes for PR #890 by @github-actions[bot] in #1078
- update release notes for PR #959 by @github-actions[bot] in #1084
- Second step in refactoring the FlowControl component by @shmuelk in #891
- docs: update sig channel name and meeting links by @nilig in #1097
- doc: update the google meet link by @varad-ahirwadkar in #1087
- filediscovery-2: add DiscoveryBackendStore interface, NewDiscoveryNotifier … by @ezrasilvera in #1081
- Adds configuration file for sidecar proxy by @DhritiShikhar in #683
- fix: prefill cached tokens for NIXL decode usage by @kebe7jun in #916
- [docs] expand submitting changes to include scope and deprecation by @elevran in #1114
- Cleanup migrated GAIE imports by @yankay in #1125
- update: change after rename git repo to "llm-d-router" by @zdtsw in #1105
- fix: resolve Trivy CVEs and CI action deprecation warnings by @elevran in #1108
- build: pre-install govulncheck in builder image by @gkneighb in #1132
- docs: link out llm-d org CONTRIBUTING.md from README and DEVELOPMENT by @hexfusion in #1133
- file-discovery-1: add BackendUpsert/Delete interface methods and extract upertEndpoint helper by @ezrasilvera in #1080
- [docs] reference llm-d CONTRIBUTING.md by @elevran in #1137
- Upgraded Istio to 1.29.2 by @shmuelk in #1052
- [1/3] feat(precise-scorer): switch to absolute matched/total normalization by @vMaroon in #1107
- fix 404 link in docs by @ahg-g in #1150
- fix: prevent EPP response body queue panic by @wenhug in #1147
- rename BackendUpsert/Delete and DiscoveryBackendStore to us… by @ezrasilvera in #1154
- chore(GHA): rename internal variable for workflow by @zdtsw in #1100
- rename docker images to router-endpoint-picker and router-disagg-sidecar by @tessapham in #1098
- ci: trigger check workflows on pull_request_target for bot-created PRs by @elevran in #1136
- chore: remove call to fetch GIE CRD from remote, use local ones by @zdtsw in #1170
- chore: update Helm deploy for renaming llm-d-inference-scheduler to llm-d-router by @zdtsw in #1166
- chore: rename go model from llm-d-inference-scheduler to llm-d-router by @zdtsw in #1164
- generic read attribute function by @nirrozenbaum in #1163
- feat(sidecar): add chunked decode by @andreyod in #822
- ci: use GITHUB_TOKEN instead of PAT for GHCR authentication by @elevran in #1175
- docs: update filter tutorial for lable-selector-filter plugin by @zdtsw in #1151
- fix(mocks): give mock extractors distinct Types by @hexfusion in #1143
- Sidecar references old repo name by @elevran in #1195
- update release notes for PR #916 by @github-actions[bot] in #1129
- update release notes for PR #822 by @github-actions[bot] in #1176
- update release notes for PR #683 by @github-actions[bot] in #1117
- chore: more cleanup in docs + tracing for renaming to llm-d-router by @zdtsw in #1196
- perf(epp/scheduling): cut Scheduler.Schedule allocations ~90% on large fleets by @gkneighb in #1171
- update release notes for PR #1171 by @github-actions[bot] in #1201
- Fix InferenceObjective poolRef group by @learner0810 in #1194
- Add wellknown EPP config test by @liu-cong in #1180
- Add producer name (default to the producer type) to producer/consumer datakey by @liu-cong in #1183
- deps(actions): bump crate-ci/typos from 1.46.1 to 1.46.2 in the github-actions group across 1 directory by @dependabot[bot] in #1209
- deps(go): bump the go-dependencies group across 1 directory with 6 updates by @dependabot[bot] in #1210
- refactor(metrics): replace prometheus server parser with regex by @elevran in #1138
- Remove InferencePool and InferencePoolImport CRD definitions by @notpad in #1204
- feat(helm): auto-select passthrough-parser for triton model server. This prevents the EPP from crashing when attempting to route standard ML payloads (like KServe v2) using the default OpenAI parser. by @andrewdoatgoogle in #1208
- file-discovery (3/4): add FileDiscovery plugin with YAML/JSON file loading and fsnotify watch by @ezrasilvera in #1135
- feat: add native metrics mapping for standard triton. This allows the EPP to monitor standard NVIDIA Triton Inference Server workloads without requiring custom Data Layer configurations. by @andrewdoatgoogle in #1207
- Fix Triton Routing Failure in Legacy Metrics Mode by @weizhoublue in #1230
- ci: type prerelease workflow input as boolean by @elevran in #1211
- Rename EPP-managed headers to x-llm-d by @wenhug in #1161
- docs: rename `inference scheduler` to `llm-d Router` and fix typos by @Iceber in #1233
- docs: replace relative paths with root-relative links in plugin READMEs by @Iceber in #1232
- helm: Set required validations for EPP CPU and Mem requests by @mayuka-c in #1229
- feat: expose metrics registration through plugin handle by @wenhug in #1173
- fix(test): replace fmt.Println with proper test logging by @Iceber in #1228
- refactor(datalayer): wrap sync.Maps in typed managers by @hexfusion in #1057
- refactor: rename runProtocal to handler by @zdtsw in #1215
- feat(precise-prefix-cache): DP support in discovery mode by @vMaroon in #1106
- test(framework): add unit tests for plugin and scheduling interfaces by @kube-gopher in #1226
- Inflightload tokens accounting by @kaushikmitr in #1029
- prefixcacheaffinity: add MaxTokensInFlightPenalty load gate by @kaushikmitr in #1024
- Add gRPC transport controls by @LukeAVanDrie in #1155
- feat(apix): migrate v1alpha2 objective/rewrite defaults to llm-d.ai by @zhouyou9505 in #1169
- WIP: added debug test by @capri-xiyue in #1225
- fix(epp/parsers/openai): degrade gracefully on malformed usage fields by @gkneighb in #1240
- feat(flowcontrol): allow separate default band config for negative priorities by @Jwrede in #1099
- test(epp): add unit regression test for DynamicMetadata encode path by @adelsam in #1237
- helm: move shared inferenceExtension defaults into epplib by @varad-ahirwadkar in #1063
- [2/3] feat(precise-prefix-cache): add precise-prefix-cache-producer by @vMaroon in #1112
- change EPP health check log level to error by @zetxqx in #1258
- Update helm charts names by @ahg-g in #1254
- Revert "WIP: added debug test" by @capri-xiyue in #1251
- fix: make PR kind label workflow work with pull_request_target by @varad-ahirwadkar in #1259
- Refactored the two helm charts to use common parameters defined in the library by @ahg-g in #1260
- Helm refactor by @ahg-g in #1262
- add liu-cong to reviewers and auto-assign by @liu-cong in #1263
- Refactored epp container specific parameters by @ahg-g in #1264
- [Helm] Refactored sidecar parameter to become proxy specific by @ahg-g in #1265
- Add new metrics with single subsystem by @rlakhtakia in #1071
- feat: added tokenplaceholder guess algorithm with image content in chat completion api for approximate prefix cache aware plugin by @capri-xiyue in #1018
- Add multimodal encoder-cache affinity scorer by @guygir in #901
- Fix(helm): router.epp.pluginsCustomConfig not rendered in ConfigMap by @weizhoublue in #1272
- fix: validate ZMQ port range to prevent KV-events subscriber startup failures on multi-rank pods by @weizhoublue in #1242
- [Helm] Added build and push github actions to ci-dev workflow by @ahg-g in #1277
- [Helm] Adding a dedicated Tokenizer option to the helm charts by @ahg-g in #1266
- Fixing the chart build to distinguish between release and dev by @ahg-g in #1279
- fix(helm): align verify-helm.sh --set paths with router.* schema by @weizhoublue in #1282
- update release notes for PR #1155 by @github-actions[bot] in #1241
- Stabilize EPP integration tests and improve sync diagnostics by @ErikJiang in #1286
- Fix : correct targetPorts format in verify-helm-charts and add CI validation job by @weizhoublue in #1296
- Third step in refactoring the FlowControl component by @shmuelk in #1128
- Add CPU and other missing pprof endpoints by @rsevilla87 in #1235
- docs(plugins): standardize loadaware scorer README by @revit13 in #1076
- deprecate UDS-backend in `token-producer` and suggest configuring the vllm render instead by @vMaroon in #1079
- fix: allow unstale workflow to remove labels from PRs by @varad-ahirwadkar in #1281
- feat(epp/plugins): strict plugin-config parsing via *json.Decoder by @gkneighb in #1134
- ci: stage release notes as per-PR fragments by @elevran in #1301
- update: pull GIE CRD from upstream instead of locally by @zdtsw in #1043
- Fix: config drift of vllmConfig.URL after field rename by @weizhoublue in #1305
- [Helm] inferenceObjective available for both modes and graduate the httproute option by @ahg-g in #1275
- feat(docs): agentic context engineering by @vMaroon in #1295
- refactor: a bunch of clean up and refactor in routing sidecar by @zdtsw in #1200
- docs: consolidate pending release-notes entries by @elevran in #1309
- Inflightload tokens ttl by @kaushikmitr in #1030
- fix(openai): parse prompt tokens details usage by @zhouyou9505 in #1276
- optimize input token estimation in getUserInputLenInTokens by @learner0810 in #1011
- Consolidate flow control and request handler parameters in the config api by @ahg-g in #1247
- FileDiscovery plugin (4/4) - wiring into the runner by @ezrasilvera in #1218
- chore(deps): bump llm-d-kv-cache to v0.8.1 by @vMaroon in #1323
- docs: add release-note fragment for PR #1247 by @llm-d-router-release-notes[bot] in #1325
- deps(actions): bump the github-actions group with 3 updates by @dependabot[bot] in #1324
- Pre admission processor by @D-Sai-Venkatesh in #1244
- docs: add release-note fragment for PR #1244 by @llm-d-router-release-notes[bot] in #1329
- refactor: updated v1/models to implement ProducerPlugin, moved its data definition to attribute by @irar2 in #1231
- docs(discovery): correct binary name, module path, Factory signature,… by @ezrasilvera in #1338
- docs: update release template image links to new image names by @elevran in #1337
- Helm deprecation checks by @ahg-g in #1302
- ci: drop pull_request_target from PR checks workflow by @elevran in #1326
- remove scheduling.CycleState by @elevran in #1335
- docs: add release-note fragment for PR #1302 by @llm-d-router-release-notes[bot] in #1339
- Fixed a bug in the chart rendering when adding a suffix by @ahg-g in #1352
- docs: backfill release-note fragments for PRs #1030, #1218, #1276 by @elevran in #1327
- add per-request attribute store to InferenceRequest by @elevran in #1351
- Feat/add currequest tokens by @kaushikmitr in #1319
- Fix typos in comments by @jtechapps in #1357
- Log a warning when infObj or infModelRewrite objects are not installed on the cluster and not reconciled by @ahg-g in #1358
- ci: add standalone lint workflow by @sagearc in #1313
- fix: resolve gRPC size parameter omission in runWithFileDiscovery runner initialization by @weizhoublue in #1362
- fix(test): only reuse same kind cluster if kube current-context is set to it by @zdtsw in #1322
- Update simulator version to v0.9 by @mayabar in #1234
- fix(ci): go module cache never saved due to permission error by @sagearc in #1314
- feat(sessionid): add session-id producer plugin by @elevran in #1372
- docs: add release-note fragment for PR #1372 by @llm-d-router-release-notes[bot] in #1379
- feat(epp): add /inference/v1/generate endpoint support by @dmitripikus in #1248
- docs: add release-note fragment for PR #1248 by @llm-d-router-release-notes[bot] in #1390
- fix(scorer): treat unset maxBusyScore as default 1.0 in active-reques… by @ophirazulai in #919
- Bump vllm render image to v0.21.0 by @revit13 in #1377
- refactor(datalayer): typed Extractor[T] and source-as-dispatcher polling by @hexfusion in #1073
- docs: add release-note fragment for PR #1234 by @llm-d-router-release-notes[bot] in #1369
- Fourth step in refactoring of the Flow Control layer by @shmuelk in #1311
- vertexai parser add streamRawPredictServiceMethod and rawPredict support by @zetxqx in #1261
- fix: isolate HTTP clients to prevent config pollution and secure TLS defaults by @weizhoublue in #1395
- epp: wire FlowControl into file-discovery mode by @ezrasilvera in #1368
- feat(parsers): support Anthropic v1/messages API by @tessapham in #1088
- chore(refactor): move shared const to pkg/common/routing by @zdtsw in #1321
- fix: disagg multi-modal input content by @namgyu-youn in #1402
- Bug: LRUCapacityPerServer config is not respected by @liu-cong in #1399
- epp: improve configuration logging formatting by @liu-cong in #1415
- fix(epd): add missing video_url to sidecar mmTypes after EPP encode routing by @weizhoublue in #1414
- ci: rename lint-and-test job to test by @elevran in #1393
- remove pkg/epp/backend/metrics and enableLegacyMetrics feature gate by @elevran in #1418
- docs: add release-note fragment for PR #1418 by @llm-d-router-release-notes[bot] in #1422
- ci: drop redundant pull_request trigger from auto-assign workflow by @elevran in #1423
- add MayConsumePlugin interface for optional data key consumption by @sudoalok in #1249
- ci: skip unused render sidecar in e2e by @sagearc in #1411
- ci: add e2e labels and filter support by @sagearc in #1409
- ci: use build-push-action for image builds by @sagearc in #1412
- docs: add release-note fragment for PR #1402 by @llm-d-router-release-notes[bot] in #1407
- fix(datalayer): dispatch endpoint update notifications by @zhouyou9505 in #1404
- refactor: flow control, move plugin resolution into the config loader by @evacchi in #1333
- fix(build): run test via Make target on Fedora by @zdtsw in #1406
- ci: cache image compile layers by @sagearc in #1410
- ci: implement Phase 2 quality gates and reliability improvements by @Jwrede in #1424
- Unified the metrics subsystem under llm_d_router_epp by @ahg-g in #1425
- refactor(unifiedInferenceReqeustBody)[1/n]: unify request body prompt and token representations by @zetxqx in #1380
- Moved metrics out of pkg/metrics to its canonical place by @ahg-g in #1427
- fix: detect legacy inference CRDs at startup by @richl9 in #1434
- feat: replace UDS support in Helm chart with new tokenizer render by @zdtsw in #1428
- feat(epp): add conditional-decode support (Prefer: if-available) - Depends on PR #1248 by @dmitripikus in #1288
- Revert "refactor(unifiedInferenceReqeustBody)[1/n]: unify request body prompt and token representations" by @ahg-g in #1440
- fix: release images embed CommitSHA=unknown, breaking version traceability in logs, metrics, and traces by @weizhoublue in #1437
- [3/3] feat(precise-prefix-cache): scorer deprecation + maximal implicit use of the new producer by @vMaroon in #1121
- docs: add release-note fragment for PR #1121 by @llm-d-router-release-notes[bot] in #1447
- docs: add release-note fragment for PR #1288 by @llm-d-router-release-notes[bot] in #1442
- feat(epp/prefix-cache): clamp autotuned blockSizeTokens to bound EPP indexer memory by @gkneighb in #1160
- fix(epp): support legacy inference API group in cache configuration by @ahg-g in #1449
- docs: add release-note fragment for PR #1449 by @llm-d-router-release-notes[bot] in #1452
- ci: split test workflow and e2e shards by @sagearc in #1315
- docs: add release-note fragment for PR #1160 by @llm-d-router-release-notes[bot] in #1451
- update(GHA): to scan image before push out by @zdtsw in #1441
- metrics: Add plugin name and type labels to EPP metrics by @liu-cong in #1436
- docs: add release-note fragment for PR #1436 by @llm-d-router-release-notes[bot] in #1453
- Allow skipped requests to go through scheduling profiles by @zetxqx in #1387
- feat(sidecar): remove deprecated --inference-pool-name and --inference-pool-namespace flags by @elevran in #1416
- deps(actions): bump the github-actions group with 3 updates by @dependabot[bot] in #1464
- feat(docs): tighten operating rules by @roytman in #1446
- fix vulnerabilities on main by @rahulgurnani in #1465
- fix(helm): remove deprecated legacy metric CLI flags from EPP template by @weizhoublue in #1466
- fix(ci): exclude release-notes/* branches from release baseline lookup by @elevran in #1460
- feat: update encoder cache structure and add encoder-cache query and hit metrics by @rahulgurnani in #1385
- Fixes for CI after latest vulnerability fix by @shmuelk in #1468
- feat: support multiple parsers in EPP config by @tessapham in #1448
- deps(actions): bump crate-ci/typos from 1.47.0 to 1.47.2 in the github-actions group by @dependabot[bot] in #1479
- Add check-all-tools target to verify environment setup by @omerap12 in #1469
- fix(epp/metrics): treat absent vllm:lora_requests_info as no LoRA loaded by @gkneighb in https://github.com/llm-d/llm-d-router/pull/1467
- Allow multiple Parsers by adding match interface and building routing based on it by @zetxqx in https://github.com/llm-d/llm-d-router/pull/1475
- refactor: contain prompt/token handling in the token-producer by @vMaroon in https://github.com/llm-d/llm-d-router/pull/1445
- docs: add release-note fragment for PR #1475 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1490
- Enable all HTTP parsers by default in EPP by @zetxqx in https://github.com/llm-d/llm-d-router/pull/1488
- docs: add release-note fragment for PR #1488 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1491
- fix(apix): correct InferenceModelRewrite field metadata by @richl9 in https://github.com/llm-d/llm-d-router/pull/1417
- Fix/inflight token dependency by @vMaroon in https://github.com/llm-d/llm-d-router/pull/1493
- docs: add release-note fragment for PR #1493 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1496
- fix: preserve assistant tool_calls through tokenizer render path by @yankay in https://github.com/llm-d/llm-d-router/pull/1046
- fix(requestcontrol): let data producers override the execution timeout by @vMaroon in https://github.com/llm-d/llm-d-router/pull/1504
- feat(parser) add messages/count_tokens support in anthropic parser by @zetxqx in https://github.com/llm-d/llm-d-router/pull/1509
- Add dynamic attributes for EPP AttributeMap by @LukeAVanDrie in https://github.com/llm-d/llm-d-router/pull/1478
- docs: add release-note fragment for PR #1509 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1510
- fix: allow model rewrite rules to fill missing model field by @Jwrede in https://github.com/llm-d/llm-d-router/pull/1426
- increase lychee link-checker timeout to 60s by @roytman in https://github.com/llm-d/llm-d-router/pull/1523
- ci(e2e): pull kindest/node from mirror.gcr.io instead of docker.io by @hexfusion in https://github.com/llm-d/llm-d-router/pull/1526
- pkg/common/observability/tracing: consolidate tracing initialization by @LukeAVanDrie in https://github.com/llm-d/llm-d-router/pull/1513
- docs: add release-note fragment for PR #1426 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1520
- docs: add release-note fragment for PR #1513 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1530
- fix(epp): route chat-completions sub-paths through the OpenAI parser by @hexfusion in https://github.com/llm-d/llm-d-router/pull/1515
- docs: add release-note fragment for PR #1515 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1537
- feat(approximate prefix): add Anthropic /v1/messages prefix-cache estimation by @zetxqx in https://github.com/llm-d/llm-d-router/pull/1508
- feat(metrics): extract configured custom metrics by @richl9 in https://github.com/llm-d/llm-d-router/pull/1462
- fix(tokenizer): forward raw payload to vLLM render endpoint by @sagearc in https://github.com/llm-d/llm-d-router/pull/1115
- fix(epp): extract upstream traceparent and re-parent server span by @gyliu513 in https://github.com/llm-d/llm-d-router/pull/1514
- test(flowcontrol): verify per-band stats propagation in registry by @RishabhSaini in https://github.com/llm-d/llm-d-router/pull/1544
- feat: add request-format agnostic EPP configuration by @sudoalok in https://github.com/llm-d/llm-d-router/pull/1486
- fix(tracing): unify tracer scope and version metadata by @gyliu513 in https://github.com/llm-d/llm-d-router/pull/1539
- docs: add release-note fragment for PR #1539 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1555
- Use unweighted cached-block count for prefix PD decider by @jia-gao in https://github.com/llm-d/llm-d-router/pull/1413
- feat(sidecar): disaggregate /inference/v1/generate (token-in) requests by @S1ro1 in https://github.com/llm-d/llm-d-router/pull/1458
- test: remove stale legacy selector case from verify-helm by @ErikJiang in https://github.com/llm-d/llm-d-router/pull/1558
- test: Continued work on merging the IGW End to End tests with those of the llm-d-router by @shmuelk in https://github.com/llm-d/llm-d-router/pull/1557
- feat(epp): add modality label to multimodal encoder-cache metrics by @hexfusion in https://github.com/llm-d/llm-d-router/pull/1536
- Codeowners based ownership by @vMaroon in https://github.com/llm-d/llm-d-router/pull/1512
- fix(preciseprefixcache): honor cache_salt in block hashing by @vMaroon in https://github.com/llm-d/llm-d-router/pull/1494
- Fix username in CODEOWNERS for precise prefix cache by @vMaroon in https://github.com/llm-d/llm-d-router/pull/1560
- Add support to the EC Nixl connector. by @revit13 in https://github.com/llm-d/llm-d-router/pull/1444
- Add script and Makefile target to check simulator image does not use :latest by @apollofps in https://github.com/llm-d/llm-d-router/pull/1457
- disagg: add test coverage for epd test by @namgyu-youn in https://github.com/llm-d/llm-d-router/pull/1405
- docs: add release-note fragment for PR #1444 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1563
- clean up references to inference-extension by @ahg-g in https://github.com/llm-d/llm-d-router/pull/1561
- fix(approximate prefix): include tools in approximate prefix-cache estimator for chatCompletion and messages by @zetxqx in https://github.com/llm-d/llm-d-router/pull/1554
- docs: add release-note fragment for PR #1554 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1569
- Add EPP model server protocol to docs by @BenjaminBraunDev in https://github.com/llm-d/llm-d-router/pull/1267
- Add plugin state debug endpoint by @wenhug in https://github.com/llm-d/llm-d-router/pull/1148
- feat: add support for mooncake as new kv-connector in routing sidecar by @zdtsw in #1193
- metrics: use centralized LLMDRouterEndpointPickerSubsystem in mm cache producer by @hexfusion in https://github.com/llm-d/llm-d-router/pull/1576
- bug: Added back podman client to builder image by @shmuelk in https://github.com/llm-d/llm-d-router/pull/1577
- Fix race condition in logger test by @nicolexin in https://github.com/llm-d/llm-d-router/pull/1567
- test(flowcontrol): add full-path stress benchmark by @RishabhSaini in https://github.com/llm-d/llm-d-router/pull/1543
- docs: add release-note fragment for PR #1536 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1559
- fix(ci): lint by @tessapham in https://github.com/llm-d/llm-d-router/pull/1586
- feat(metrics): add core TTFT and stream-aware TPOT metrics by @tessapham in #1499
- docs: add README for requestcontrol/dataproducer directory by @rahulgurnani in https://github.com/llm-d/llm-d-router/pull/1455
- fix(sidecar): take effect mooncake bootstrap port configured in yaml by @weizhoublue in https://github.com/llm-d/llm-d-router/pull/1589
- flowcontrol: move priority band provisioning off request hot path by @gyliu513 in https://github.com/llm-d/llm-d-router/pull/1354
- fix: strip query parameters from request path before parser resolution by @Gregory-Pereira in https://github.com/llm-d/llm-d-router/pull/1585
- standalone deployment with proxy as a separate service by @satyamg1620 in https://github.com/llm-d/llm-d-router/pull/1575
- docs: port logging guide to DEVELOPMENT.md by @liu-cong in https://github.com/llm-d/llm-d-router/pull/1506
- Reduce allocations in response usage parsing in openai parser by @rsevilla87 in #1583
- feat(approximateprefix): default maxPrefixTokensToMatch to 131072 by @kaushikmitr in https://github.com/llm-d/llm-d-router/pull/1548
- feat(prefixcacheaffinity): derive TTFT gate from peak prefill throughput by @kaushikmitr in https://github.com/llm-d/llm-d-router/pull/1547
- docs: add release-note fragment for PR #1548 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1611
- docs: add release-note fragment for PR #1575 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1605
- observability: standardize OTel service, scope, and span naming by @gyliu513 in https://github.com/llm-d/llm-d-router/pull/1565
- docs: update deprecated parser config examples in readme and helm templates by @liu-cong in https://github.com/llm-d/llm-d-router/pull/1614
- Set session affinity token in response headers by @learner0810 in https://github.com/llm-d/llm-d-router/pull/1597
- Fix/flowcontrol band gc desired by @LukeAVanDrie in https://github.com/llm-d/llm-d-router/pull/1608
- [sidecar] Add retry option for prefill queries by @ilmarkov in https://github.com/llm-d/llm-d-router/pull/1206
- docs: add release-note fragment for PR #1206 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1622
- docs: add release-note fragment for PR #1608 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1621
- fix(concurrency-detector): report zero saturation for endpoints with nil metadata by @Jwrede in https://github.com/llm-d/llm-d-router/pull/1580
- fix: Handle string array prompts in /v1/completions scoring by @albertoperdomo2 in https://github.com/llm-d/llm-d-router/pull/860
- routing sidecar support /v1/messages for pd by @Gregory-Pereira in https://github.com/llm-d/llm-d-router/pull/1587
- docs: add logr logging conventions to AGENTS.md by @Jwrede in https://github.com/llm-d/llm-d-router/pull/1571
- Add tenant labels to EPP SLO metrics by @googs1025 in https://github.com/llm-d/llm-d-router/pull/1518
- epp: add opt-in ext_proc gRPC stream metrics by @hexfusion in https://github.com/llm-d/llm-d-router/pull/1603
- fix(tokenizer): label estimate multimodal features by modality by @hexfusion in https://github.com/llm-d/llm-d-router/pull/1618
- docs: add release-note fragment for PR #1603 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1624
- feat(metrics): add inter-token latency (ITL) metric by @tessapham in #1591
- make EPP gRPC healthCheck configurable in helm by @zetxqx in https://github.com/llm-d/llm-d-router/pull/1607
- fix(epp): record error status on request and orchestration spans by @gyliu513 in https://github.com/llm-d/llm-d-router/pull/1626
- refactor(flowcontrol): improve observability logging by @RishabhSaini in https://github.com/llm-d/llm-d-router/pull/1609
- docs: add release-note fragment for PR #1626 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1629
- docs: add release-note fragment for PR #1607 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1628
- fix(flowcontrol): map shutdown-drained requests to 503 by @RishabhSaini in https://github.com/llm-d/llm-d-router/pull/1612
- Add session-affinity filter and configurable header for session affinity by @roytman in https://github.com/llm-d/llm-d-router/pull/1631
- test(flowcontrol): Add flow control integration tests by @RishabhSaini in https://github.com/llm-d/llm-d-router/pull/1545
- docs: add release-note fragment for PR #1631 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1637
- feat: switch latency predictor to llm-d dev-latest images by @madhugoutham in https://github.com/llm-d/llm-d-router/pull/1570
- Align chart docs with updated default EPP CPU request by @Copilot in https://github.com/llm-d/llm-d-router/pull/1642
- feat(prefixcacheaffinity): raise MaxTTFTPenaltyMs default to 18000 by @kaushikmitr in https://github.com/llm-d/llm-d-router/pull/1639
- Migrate grafana dashboard by @LukeAVanDrie in https://github.com/llm-d/llm-d-router/pull/1550
- docs: add release-note fragment for PR #1550 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1643
- mori : WRITE-mode + Wide-EP Internode sidecar support (2P2D DP=EP=16 TP=1) by @shikamd123 in https://github.com/llm-d/llm-d-router/pull/1564
- Add program aware fairness plugin by @D-Sai-Venkatesh in https://github.com/llm-d/llm-d-router/pull/1534
- docs: update precise-prefix-cache plugin and default plugins by @zdtsw in https://github.com/llm-d/llm-d-router/pull/1649
- Refactor request metrics by @ahg-g in https://github.com/llm-d/llm-d-router/pull/1644
- chore(deps): bump llm-d-kv-cache to v0.9.0-rc.1 by @vMaroon in https://github.com/llm-d/llm-d-router/pull/1648
- Export cache hit metrics for encoder cache data producer by @Jwrede in https://github.com/llm-d/llm-d-router/pull/1429
- docs: add release-note fragment for PR #1429 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1652
- Extend llm_d_router_epp_plugin_duration_seconds to all extension points by @liu-cong in https://github.com/llm-d/llm-d-router/pull/1651
- docs: add release-note fragment for PR #1651 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1655
- Tag container images with commit SHA by @pierDipi in https://github.com/llm-d/llm-d-router/pull/1657
- deps(go): bump the kubernetes group with 6 updates by @dependabot[bot] in https://github.com/llm-d/llm-d-router/pull/1659
- Session affinity filter/scorer can optionally pick the scheduling profile to inject the routed endpoint from. This enables P/D disaggregation support. by @liu-cong in https://github.com/llm-d/llm-d-router/pull/1653
- docs: add release-note fragment for PR #1653 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1662
- fix: correct shell bugs, broken links, and gaps in release issue template by @liu-cong in https://github.com/llm-d/llm-d-router/pull/1663
- fix(chart): qualify latency predictor uvicorn module path by @kaushikmitr in https://github.com/llm-d/llm-d-router/pull/1666
- feat(prefixcacheaffinity): default explorationProbability to 0 by @kaushikmitr in https://github.com/llm-d/llm-d-router/pull/1664
- changed subsystem metrics name from llm_d_router_epp to llm_d_epp by @ahg-g in #1661
- docs: add release-note fragment for PR #1661 by @llm-d-router-release-notes[bot] in https://github.com/llm-d/llm-d-router/pull/1669
- Normalize EPP span attributes under the llm_d.epp.* namespace by @gyliu513 in https://github.com/llm-d/llm-d-router/pull/1670
- Pin :latest image tags and enable strict presubmit check by @Jwrede in https://github.com/llm-d/llm-d-router/pull/1579
New Contributors
- @setsunakute made their first contribution in #494
- @evacchi made their first contribution in #503
- @acardace made their first contribution in #512
- @MregXN made their first contribution in #518
- @davidbreitgand made their first contribution in #525
- @kyanokashi made their first contribution in #569
- @sagearc made their first contribution in #570
- @albertoperdomo2 made their first contribution in #602
- @yangligt2 made their first contribution in #632
- @sallyom made their first contribution in #506
- @gyliu513 made their first contribution in #666
- @roytman made their first contribution in #696
- @Mohamedma96 made their first contribution in #697
- @modassarrana89-new made their first contribution in #691
- @dagrayvid made their first contribution in #669
- @hexfusion made their first contribution in #742
- @revit13 made their first contribution in #777
- @zetxqx made their first contribution in #806
- @tlrmchlsmth made their first contribution in #746
- @asharkhan3101 made their first contribution in #599
- @ahg-g made their first contribution in #835
- @andresllh made their first contribution in #861
- @kaushikmitr made their first contribution in #881
- @jia-gao made their first contribution in #894
- @danehans made their first contribution in #950
- @abdallahsamabd made their first contribution in #898
- @ezrasilvera made their first contribution in #908
- @noalimoy made their first contribution in #905
- @Jwrede made their first contribution in #960
- @szedan-rh made their first contribution in #842
- @yehuditkerido made their first contribution in #997
- @wenhug made their first contribution in #931
- @zhouyou9505 made their first contribution in #989
- @github-actions[bot] made their first contribution in #1028
- @rahulgurnani made their first contribution in #1026
- @jgchn made their first contribution in #959
- @lioraron made their first contribution in #1027
- @varad-ahirwadkar made their first contribution in #1087
- @DhritiShikhar made their first contribution in #683
- @kebe7jun made their first contribution in #916
- @gkneighb made their first contribution in #1132
- @tessapham made their first contribution in #1098
- @liu-cong made their first contribution in #1180
- @notpad made their first contribution in #1204
- @andrewdoatgoogle made their first contribution in #1208
- @weizhoublue made their first contribution in #1230
- @Iceber made their first contribution in #1233
- @mayuka-c made their first contribution in #1229
- @kube-gopher made their first contribution in #1226
- @LukeAVanDrie made their first contribution in #1155
- @capri-xiyue made their first contribution in #1225
- @adelsam made their first contribution in #1237
- @rlakhtakia made their first contribution in #1071
- @ErikJiang made their first contribution in #1286
- @rsevilla87 made their first contribution in #1235
- @D-Sai-Venkatesh made their first contribution in #1244
- @jtechapps made their first contribution in #1357
- @ophirazulai made their first contribution in #919
- @namgyu-youn made their first contribution in #1402
- @sudoalok made their first contribution in #1249
- @richl9 made their first contribution in #1434
- @omerap12 made their first contribution in #1469
- @RishabhSaini made their first contribution in https://github.com/llm-d/llm-d-router/pull/1544
- @S1ro1 made their first contribution in https://github.com/llm-d/llm-d-router/pull/1458
- @apollofps made their first contribution in https://github.com/llm-d/llm-d-router/pull/1457
- @BenjaminBraunDev made their first contribution in https://github.com/llm-d/llm-d-router/pull/1267
- @nicolexin made their first contribution in https://github.com/llm-d/llm-d-router/pull/1567
- @satyamg1620 made their first contribution in https://github.com/llm-d/llm-d-router/pull/1575
- @ilmarkov made their first contribution in https://github.com/llm-d/llm-d-router/pull/1206
- @madhugoutham made their first contribution in https://github.com/llm-d/llm-d-router/pull/1570
- @Copilot made their first contribution in https://github.com/llm-d/llm-d-router/pull/1642
- @shikamd123 made their first contribution in https://github.com/llm-d/llm-d-router/pull/1564
Full Changelog: v0.4.0-rc.1...v0.9.0-rc.2