Conversation

@liu-cong (Contributor) commented Nov 20, 2025

Previously, the prefix cache scorer implicitly overrode BlockSize and LRUCapacityPerServer based on model server metrics. This is the default behavior we want, but it can be confusing when users manually configure these parameters.

This PR adds an autoTune flag to the prefix cache scorer config so users can explicitly choose whether auto-tuning is applied; it defaults to true.

This also allows us to manually set the CPU cache capacity, since the corresponding metric isn't available in vLLM yet.
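
To illustrate the intended semantics, here is a minimal, self-contained Go sketch of how an autoTune flag like this could gate the metric-based overrides. This is not the actual plugin code: the Config struct, serverMetrics type, and applyAutoTune function are hypothetical names used only for illustration; only the parameter names (autoTune, blockSize, lruCapacityPerServer) and the example values come from the config and log in the Test section below.

package main

import "fmt"

// Config mirrors the prefix-cache-scorer parameters discussed above.
// The struct and field names are hypothetical; only the parameter names
// (autoTune, blockSize, lruCapacityPerServer) appear in this PR.
type Config struct {
	AutoTune             bool
	BlockSize            int
	LRUCapacityPerServer int
}

// serverMetrics stands in for values scraped from the model server.
type serverMetrics struct {
	blockSize   int
	lruCapacity int
}

// applyAutoTune returns the effective config: metric-derived values are
// applied only when autoTune is enabled; otherwise the user's manual
// settings are kept untouched.
func applyAutoTune(cfg Config, m serverMetrics) Config {
	if !cfg.AutoTune {
		// autoTune: false -> e.g. a manually set CPU cache capacity wins.
		return cfg
	}
	if m.blockSize > 0 {
		cfg.BlockSize = m.blockSize
	}
	if m.lruCapacity > 0 {
		cfg.LRUCapacityPerServer = m.lruCapacity
	}
	return cfg
}

func main() {
	metrics := serverMetrics{blockSize: 64, lruCapacity: 31250}

	gpu := Config{AutoTune: true, BlockSize: 64, LRUCapacityPerServer: 10000}
	cpu := Config{AutoTune: false, BlockSize: 64, LRUCapacityPerServer: 41000}

	fmt.Printf("gpu scorer: %+v\n", applyAutoTune(gpu, metrics)) // metric-derived capacity (31250) wins
	fmt.Printf("cpu scorer: %+v\n", applyAutoTune(cpu, metrics)) // manual capacity (41000) kept
}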

Test

Tried the following config:

  pluginsConfigFile: "default-plugins.yaml"
  # This is the plugins configuration file.
  pluginsCustomConfig:
    default-plugins.yaml: |
      apiVersion: inference.networking.x-k8s.io/v1alpha1
      kind: EndpointPickerConfig
      plugins:
      - type: queue-scorer
      - type: kv-cache-utilization-scorer
      - type: prefix-cache-scorer
        name: gpu-prefix-cache-scorer
        parameters:
          autoTune: true
      - type: prefix-cache-scorer
        name: cpu-prefix-cache-scorer
        parameters:
          autoTune: false
          lruCapacityPerServer: 41000
      schedulingProfiles:
      - name: default
        plugins:
        - pluginRef: queue-scorer
          weight: 2
        - pluginRef: kv-cache-utilization-scorer
          weight: 2
        - pluginRef: gpu-prefix-cache-scorer
          weight: 2
        - pluginRef: cpu-prefix-cache-scorer
          weight: 1

And got the following log. Note that the gpu-prefix-cache-scorer (autoTune: true) picked up the metric-derived lruCapacityPerServer of 31250, while the cpu-prefix-cache-scorer (autoTune: false) kept the manually configured 41000:

{"level":"Level(-2)","ts":"2025-11-20T22:58:56Z","caller":"prefix/plugin.go:190","msg":"PrefixCachePlugin initialized","config":{"autoTune":true,"blockSize":64,"maxPrefixBlocksToMatch":256,"lruCapacityPerServer":31250}}                       │
│ {"level":"Level(-2)","ts":"2025-11-20T22:58:56Z","caller":"prefix/plugin.go:190","msg":"PrefixCachePlugin initialized","config":{"autoTune":false,"blockSize":64,"maxPrefixBlocksToMatch":256,"lruCapacityPerServer":41000}}

@k8s-ci-robot (Contributor) commented:

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 20, 2025
netlify bot commented Nov 20, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 23d4d96
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/691f99ad09f8320008d73c90
😎 Deploy Preview: https://deploy-preview-1888--gateway-api-inference-extension.netlify.app

@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liu-cong

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 20, 2025
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 20, 2025
@zetxqx (Contributor) commented Nov 20, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 20, 2025
@liu-cong liu-cong force-pushed the prefix-plugin-manual-config branch from 68f5cf0 to 7144e95 Compare November 20, 2025 22:13
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 20, 2025
@liu-cong (Contributor, Author) commented:

/hold

Running some tests.

@liu-cong liu-cong marked this pull request as ready for review November 20, 2025 22:16
@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Nov 20, 2025
@liu-cong liu-cong force-pushed the prefix-plugin-manual-config branch from 7144e95 to 23d4d96 Compare November 20, 2025 22:43
@liu-cong liu-cong mentioned this pull request Nov 20, 2025
@zetxqx (Contributor) commented Nov 20, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 20, 2025
@liu-cong (Contributor, Author) commented:

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 20, 2025
@k8s-ci-robot k8s-ci-robot merged commit f357ece into kubernetes-sigs:main Nov 20, 2025
17 checks passed