Feature request: Cost reporting via dynamic metadata in EPP #2019

FYI, the design is also available in a globally readable and commentable Google Doc, if that is an easier place to leave comments: https://docs.google.com/document/d/10_u1Pvb3MD2Wii6NB50OVIGQVN3p_YEuV0j7PBdIOiY/edit?tab=t.0

## Overview

The Gateway API Inference Extension project aims to optimize self-hosting of generative models on GKE. A key component of this system is the Endpoint Picker (EPP), which intelligently routes requests to appropriate model server backends. For advanced routing and load balancing, particularly in features like prefix sharding, the data plane needs to know the "cost" of processing each request.

This proposal defines a flexible plugin for the EPP server configuration that lets users declaratively configure how this cost is calculated and reported, without requiring code changes to the EPP binary itself. The plugin will be configurable via the existing `--config-file` or `--config-text` command-line flags used by the EPP server.

## Proposed API

```yaml
apiVersion: config.apix.gateway-api-inference-extension.sigs.k8s.io/v1alpha1
kind: EndpointPickerConfig
plugins:
  - name: input-tokens-cost-reporter
    type: cost-reporter
    parameters:
      # Defines where in dynamic metadata to return the data
      metric:
        namespace: envoy.lb  # Defaults to envoy.lb if omitted
        # What key to use in the provided namespace for the value from expression
        name: x-gateway-inference-request-cost      
        # Specifies the source of data for the CEL expression.
        dataSource: responseBody
        # The CEL expression to calculate the cost.
        expression: |
          (has(responseBody.usage.prompt_tokens) ? responseBody.usage.prompt_tokens : 0) +
          (has(responseBody.usage.completion_tokens) ? responseBody.usage.completion_tokens : 0)
        # Optional: CEL expression to determine if this metric should be calculated/reported
        condition: "has(responseBody.usage)"
```

## Detailed design
### Data plane
The initial implementation will only support parsing the response body. The cost reporting logic will be triggered within the response processing path, specifically when handling the response body.
The plugin will implement the `ResponseStreaming` and `ResponseComplete` interfaces from `pkg/epp/requestcontrol/plugins.go`.

For each configured metric, the `condition` is evaluated first if one is provided. If the condition is met (or absent), the `expression` is evaluated against the `dataSource`. The result is expected to be an integer. If it is not, or if evaluation fails, the `defaultValue` is used if provided. An evaluation failure will not fail the request; instead, a warning log will be emitted.
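The evaluation order described above can be sketched in plain Go. This is an illustration only: `evalFunc` is a hypothetical stand-in for a compiled CEL program, the `metric` struct and its field names are assumptions, and treating a failed condition as "skip the metric" is also an assumption, since the proposal only specifies the behavior for expression failures.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// evalFunc is a hypothetical stand-in for a compiled CEL program.
type evalFunc func(body map[string]any) (any, error)

// metric is a hypothetical in-memory form of one configured metric.
type metric struct {
	condition    evalFunc // optional; nil means "always evaluate"
	expression   evalFunc
	defaultValue *int64 // optional fallback; nil means "no fallback"
}

// evaluate returns the cost and whether anything should be reported.
// It never returns an error, so a failure cannot fail the request.
func (m metric) evaluate(body map[string]any) (int64, bool) {
	if m.condition != nil {
		out, err := m.condition(body)
		if err != nil {
			// Assumption: a failing condition is logged and the metric is skipped.
			log.Printf("warning: cost condition failed: %v", err)
			return 0, false
		}
		if met, isBool := out.(bool); !isBool || !met {
			return 0, false
		}
	}
	out, err := m.expression(body)
	if err == nil {
		if v, isInt := out.(int64); isInt {
			return v, true
		}
		err = fmt.Errorf("expression returned %T, want int64", out)
	}
	log.Printf("warning: cost expression failed: %v", err)
	if m.defaultValue != nil {
		return *m.defaultValue, true
	}
	return 0, false
}

func main() {
	fallback := int64(0)
	m := metric{
		condition: func(b map[string]any) (any, error) {
			_, ok := b["usage"]
			return ok, nil
		},
		expression: func(b map[string]any) (any, error) {
			usage, ok := b["usage"].(map[string]any)
			if !ok {
				return nil, errors.New("usage is not an object")
			}
			total, _ := usage["total"].(int64)
			return total, nil
		},
		defaultValue: &fallback,
	}
	body := map[string]any{"usage": map[string]any{"total": int64(42)}}
	cost, report := m.evaluate(body)
	fmt.Println(cost, report) // prints "42 true"
}
```

Returning a `(value, report)` pair rather than an error keeps the "never fail the request" rule in one place: callers can only choose whether to attach metadata.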

We will avoid buffering the entire response. The CEL expression will be evaluated individually on each chunk of the response body in the streaming case. The first successful evaluation will stop subsequent evaluation of the response body.
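A minimal sketch of the per-chunk behavior, assuming OpenAI-style server-sent events (`data: {...}` lines) in the streaming case. `costFromChunk` is a hypothetical stand-in for evaluating the configured CEL expression against one chunk, and `streamReporter` is an illustrative name, not part of the proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// costFromChunk stands in for evaluating the configured expression on one chunk.
// Assumption: each chunk carries whole "data: {...}" SSE lines.
func costFromChunk(chunk string) (int64, bool) {
	for _, line := range strings.Split(chunk, "\n") {
		payload := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(line), "data:"))
		if payload == "" || payload == "[DONE]" {
			continue
		}
		var body struct {
			Usage *struct {
				PromptTokens     int64 `json:"prompt_tokens"`
				CompletionTokens int64 `json:"completion_tokens"`
			} `json:"usage"`
		}
		if err := json.Unmarshal([]byte(payload), &body); err != nil || body.Usage == nil {
			continue // not parseable on its own, or no usage reported yet
		}
		return body.Usage.PromptTokens + body.Usage.CompletionTokens, true
	}
	return 0, false
}

// streamReporter remembers whether a cost was already found for this response.
type streamReporter struct {
	cost int64
	done bool
}

// onChunk evaluates a chunk unless an earlier chunk already produced a cost.
func (r *streamReporter) onChunk(chunk string) {
	if r.done {
		return
	}
	if c, ok := costFromChunk(chunk); ok {
		r.cost, r.done = c, true
	}
}

func main() {
	chunks := []string{
		"data: {\"choices\":[{\"delta\":{\"content\":\"Hi\"}}]}\n\n",
		"data: {\"choices\":[],\"usage\":{\"prompt_tokens\":12,\"completion_tokens\":30}}\n\n",
		"data: [DONE]\n\n",
	}
	var r streamReporter
	for _, c := range chunks {
		r.onChunk(c)
	}
	fmt.Println(r.cost, r.done) // prints "42 true"
}
```

Because each chunk is evaluated in isolation, a JSON event that happens to be split across two chunks would not parse in either one; in this sketch such a chunk is simply skipped.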

The calculated cost value is added to the ext-proc response to instruct Envoy to set the dynamic metadata in the specified namespace with the specified name.
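To make the resulting layout concrete, the sketch below builds the namespace and key nesting with plain Go maps and prints it as JSON. This is an assumption-laden illustration: a real implementation would presumably populate the dynamic metadata field of the ext-proc response with a protobuf struct rather than a map, and `setCost` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// setCost records value under namespace/name in a nested map that mirrors
// the namespace -> key -> value layout of dynamic metadata.
func setCost(md map[string]map[string]any, namespace, name string, value int64) {
	if md[namespace] == nil {
		md[namespace] = map[string]any{}
	}
	md[namespace][name] = value
}

func main() {
	md := map[string]map[string]any{}
	setCost(md, "envoy.lb", "x-gateway-inference-request-cost", 42)

	out, err := json.Marshal(md)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// prints {"envoy.lb":{"x-gateway-inference-request-cost":42}}
}
```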

### Configuration Loading

The EPP server will parse the `cost-reporter` plugin configuration from the YAML provided via `--config-file` or `--config-text`.
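As a rough sketch of the loading step, the code below decodes the plugin's `parameters` block and applies the `envoy.lb` default. It assumes the YAML has already been converted to JSON bytes before reaching the plugin. The type names, the `parseParams` helper, and the choice of which fields are required are all hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// defaultNamespace mirrors the default stated in the proposed API.
const defaultNamespace = "envoy.lb"

// metricConfig is a hypothetical Go mirror of the `metric` block.
type metricConfig struct {
	Namespace  string `json:"namespace"`
	Name       string `json:"name"`
	DataSource string `json:"dataSource"`
	Expression string `json:"expression"`
	Condition  string `json:"condition"` // optional
}

// costReporterParams is a hypothetical mirror of the plugin's `parameters`.
type costReporterParams struct {
	Metric metricConfig `json:"metric"`
}

// parseParams decodes raw parameters, applies defaults, and validates them.
func parseParams(raw []byte) (*costReporterParams, error) {
	var p costReporterParams
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, fmt.Errorf("decoding cost-reporter parameters: %w", err)
	}
	if p.Metric.Namespace == "" {
		p.Metric.Namespace = defaultNamespace
	}
	// Assumption: name and expression are mandatory.
	if p.Metric.Name == "" {
		return nil, errors.New("metric.name is required")
	}
	if p.Metric.Expression == "" {
		return nil, errors.New("metric.expression is required")
	}
	// Only responseBody is supported in the initial implementation.
	if p.Metric.DataSource != "responseBody" {
		return nil, fmt.Errorf("unsupported dataSource %q", p.Metric.DataSource)
	}
	return &p, nil
}

func main() {
	raw := []byte(`{"metric":{"name":"x-gateway-inference-request-cost",` +
		`"dataSource":"responseBody","expression":"responseBody.usage.prompt_tokens"}}`)
	p, err := parseParams(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Metric.Namespace, p.Metric.Name)
	// prints "envoy.lb x-gateway-inference-request-cost"
}
```

Rejecting unknown `dataSource` values at load time, rather than at request time, would surface configuration mistakes when the EPP starts instead of as per-request warnings.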

### CEL Environment

The EPP binary would take a dependency on the [github.com/google/cel-go](https://github.com/google/cel-go) library. When the plugin is enabled, it will initialize a CEL environment on a per-request basis. For each metric defined in the configuration, it will compile the `expression` and `condition` strings. The environment will be configured to understand each of the supported `dataSource` values; the initial implementation will only support `responseBody`. The expression is assumed to produce an integer.
