Implement Inference Extension #4091
Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #4091      +/-   ##
==========================================
- Coverage   86.71%   86.59%   -0.12%
==========================================
  Files         128      131       +3
  Lines       16758    17788    +1030
  Branches       62       74      +12
==========================================
+ Hits        14531    15404     +873
- Misses       2042     2185     +143
- Partials      185      199      +14
```

☔ View full report in Codecov by Sentry.
Did some manual testing of the inference extension along with some other path-based headers and query params, and it seemed to be fine. It did not work with the URLRewrite filter; is that expected?
Looks like the helm chart test is failing. Also not sure what the pipeline wants to do here: do we just test by installing with the minimum args needed, or should this have the inference flag enabled to test with?
lgtm if we can just get those helm tests working
Problem: To support the full Gateway API Inference Extension, we need to be able to extract the model name from the client request body in certain situations. Solution: Add a basic NJS module to extract the model name. This module will be enhanced (I've added notes) to be included in the full solution. On its own, it is not yet used.
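The PR implements this extraction in NJS, but as a rough illustration of the parsing involved, here is a Go sketch that pulls a `model` field out of an OpenAI-style JSON body. The field name and body shape are assumptions for illustration, not taken from the PR.

```go
// Sketch only: the PR does this in an NJS module; this Go snippet just
// illustrates the kind of parsing involved. The "model" field follows
// OpenAI-style request bodies (an assumption, not the PR's code).
package main

import (
	"encoding/json"
	"fmt"
)

// extractModelName pulls the "model" field out of a JSON request body,
// returning an empty string if the field is absent or the body is not JSON.
func extractModelName(body []byte) string {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return ""
	}
	return req.Model
}

func main() {
	body := []byte(`{"model": "llama-3-8b", "prompt": "hello"}`)
	fmt.Println(extractModelName(body)) // llama-3-8b
}
```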
This commit adds support for the control plane to watch InferencePools. A feature flag has been added to enable/disable processing these resources. By default, it is disabled. When an HTTPRoute references an InferencePool, we will create a headless Service associated with that InferencePool, and reference it internally in the graph config for that Route. This allows us to use all of our existing logic to get the endpoints and build the proper nginx config for those endpoints. In a future commit, the nginx config will be updated to handle the proper load balancing for the AI workloads, but for now we just use our default methods by proxy_passing to the upstream.
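For illustration only, a minimal Go sketch of the shape such a headless Service object might take. The naming convention, selector handling, and port are placeholders, not the PR's actual implementation.

```go
// Rough sketch, not the PR's code: a headless Service the control plane
// might create for an InferencePool so existing endpoint collection works.
package inference

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func buildHeadlessService(poolName, namespace string, selector map[string]string, port int32) *corev1.Service {
	return &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			// Hypothetical naming convention for the shadow Service.
			Name:      poolName + "-pool",
			Namespace: namespace,
		},
		Spec: corev1.ServiceSpec{
			ClusterIP: corev1.ClusterIPNone, // headless: endpoints resolve to pod IPs
			Selector:  selector,             // copied from the InferencePool's pod selector
			Ports: []corev1.ServicePort{
				{Name: "inference", Port: port},
			},
		},
	}
}
```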
Problem: In order for NGINX to get the endpoint of the AI workload from the EndpointPicker, it needs to send a gRPC request using the proper protobuf protocol. Solution: When the inference extension feature is enabled, a simple Go server is injected as an additional container. It listens for requests from our (upcoming) NJS module and forwards them to the configured EPP, returning the chosen endpoint in a response header.
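A heavily simplified Go sketch of that request path is below. The `pickEndpoint` helper and the header names are hypothetical stand-ins; the real shim speaks the gRPC protocol to the EndpointPicker rather than returning a hard-coded address.

```go
// Illustrative sketch of the shim's request path, not the PR's implementation.
package main

import (
	"log"
	"net/http"
)

// pickEndpoint is a hypothetical stand-in for the gRPC exchange with the EPP.
// It would send the request metadata (e.g. the extracted model name) and
// return the "ip:port" of the chosen inference workload.
func pickEndpoint(modelName string) (string, error) {
	// The gRPC call to the configured EndpointPicker would go here.
	return "10.0.0.12:8000", nil
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// The NJS module forwards the model name it extracted from the body.
		model := r.Header.Get("X-Model-Name") // header name is an assumption
		endpoint, err := pickEndpoint(model)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		// Return the chosen endpoint in a response header for NJS to consume.
		w.Header().Set("X-Inference-Endpoint", endpoint) // header name is an assumption
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe("127.0.0.1:9004", nil)) // port is a placeholder
}
```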
Problem: We need to connect NGINX to the Golang shim that talks to the EndpointPicker, and then pass client traffic to the proper inference workload. Solution: Write an NJS module that queries the local Go server to get the AI endpoint to route traffic to, then redirects the original client request to an internal location that proxies the traffic to the chosen endpoint. The location building gets a bit complicated, especially when using both HTTP matching conditions and inference workloads: it requires 2 layers of internal redirects. I added lots of comments to hopefully clear up how we build these locations to perform all the routing steps.
Update the inference extension design doc to specify the different statuses that need to be set on InferencePools so users can understand their state.
…4006) Update gateway inference extension proposal on the inability to provide a secure TLS connection to the EPP.
Add status to InferencePools. Problem: Users want to see the current status of their InferencePools. Solution: Add status for InferencePools.
Proposed changes Problem: We want to collect the number of referenced InferencePools in the cluster. Solution: Collect the count of referenced InferencePools. Testing: Unit tests, and manually verified collection via debug logs.
When setting InferencePool statuses, we loop through all the InferencePools and check whether any of our nginx gateways have parentRefs in the statuses. If they do AND the InferencePool is not referenced (not connected to the graph), we remove that parentRef.
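As a rough sketch of that pruning logic, using hypothetical, simplified types rather than the real InferencePool status API:

```go
// Simplified sketch of the parentRef pruning described above; the types here
// are hypothetical and do not mirror the actual InferencePool status schema.
package sketch

type parentStatus struct {
	GatewayName string
}

type inferencePool struct {
	Name    string
	Parents []parentStatus
}

// pruneStaleParents removes parent statuses written by our gateways when the
// pool is no longer referenced by any route in the graph.
func pruneStaleParents(pools []inferencePool, ourGateways map[string]bool, referenced map[string]bool) {
	for i := range pools {
		if referenced[pools[i].Name] {
			continue // still connected to the graph; leave its status alone
		}
		kept := pools[i].Parents[:0]
		for _, p := range pools[i].Parents {
			if !ourGateways[p.GatewayName] {
				kept = append(kept, p) // keep statuses written by other controllers
			}
		}
		pools[i].Parents = kept
	}
}
```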
Enable conformance tests for Inference Extension.
Co-author: Ciara Stacker <c.stacke@f5.com>
Proposed changes
Problem: As a cluster operator managing traffic for generative models
I want to route prompt traffic within my cluster based on generative model request criteria
So that I can build a system to host multiple generative models.
Solution: Add Gateway API Inference Extension support
Ref: https://gateway-api-inference-extension.sigs.k8s.io/
Testing: Extensive manual testing; conformance testing is enabled in the pipeline, and everything is passing.
Closes #3644
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.