
Conversation

sjberman
Collaborator

Problem: We need to connect NGINX to the Golang shim that talks to the EndpointPicker, and then pass client traffic to the proper inference workload.

Solution: Write an NJS module that will query the local Go server to get the AI endpoint to route traffic to. Then redirect the original client request to an internal location that proxies the traffic to the chosen endpoint.

The location building gets a bit complicated, especially when using both HTTP matching conditions and inference workloads: it requires two layers of internal redirects. I added lots of comments to hopefully clarify how we build these locations to perform all the routing steps.
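For illustration, the generated locations end up shaped roughly like the sketch below. This is not the actual generated config: the location names, variable name, and NJS function names are all hypothetical, and real directives will differ.

```nginx
# Illustrative sketch only; hypothetical names throughout.
location /chat {
    # Layer 1: evaluate HTTP matching conditions in NJS, then redirect
    # internally to the location for the matched rule.
    js_content httpmatches.redirect;
}

location /_rule0-match {
    internal;
    # Layer 2: ask the local Go shim (which talks to the EndpointPicker)
    # for the inference endpoint, then redirect once more.
    js_content epp.route;
}

location /_rule0-inference {
    internal;
    # Proxy the original client request to the chosen endpoint.
    proxy_pass http://$inference_workload_endpoint;
}
```

Without HTTP matching conditions, the first layer drops out and the external location can hand off to the endpoint-picking layer directly.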

Testing: Manual e2e traffic testing, both with and without HTTP matching conditions. Also tried Routes that referenced multiple InferencePool rules.

Closes #3838

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.


@sjberman sjberman marked this pull request as ready for review September 22, 2025 15:54
@sjberman sjberman requested a review from a team as a code owner September 22, 2025 15:54
@github-project-automation github-project-automation bot moved this from 🆕 New to 🏗 In Progress in NGINX Gateway Fabric Sep 22, 2025
Contributor

@bjee19 bjee19 left a comment

nice job, lgtm

@sjberman sjberman requested a review from Copilot September 22, 2025 20:59

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds support for querying the EndpointPicker Proxy (EPP) and routing AI traffic to inference workloads. It introduces an NJS module that queries a local Go server to get the appropriate AI endpoint and redirects client traffic to the chosen endpoint through internal NGINX locations.

  • Adds new location types and routing logic for inference backends with optional HTTP matching conditions
  • Implements a Go shim server that communicates with the EndpointPicker via gRPC
  • Creates an NJS module for endpoint discovery and traffic redirection with proper argument preservation
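As a rough illustration of the shim's role in the flow above (not the actual implementation: the real shim speaks gRPC to the EndpointPicker, and the handler, header names, and endpoint address below are made up), the request/response shape might look like:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// pickEndpoint stands in for the gRPC call to the EndpointPicker;
// a real picker would choose based on load, queue depth, etc.
func pickEndpoint(model string) string {
	return "10.0.0.5:8000" // placeholder endpoint
}

// shimHandler receives the NJS subrequest and returns the chosen
// endpoint in a response header for NJS to proxy the client to.
func shimHandler(w http.ResponseWriter, r *http.Request) {
	endpoint := pickEndpoint(r.Header.Get("X-Model-Name")) // hypothetical header
	w.Header().Set("X-Endpoint", endpoint)
	w.WriteHeader(http.StatusOK)
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(shimHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Header.Get("X-Endpoint"))
}
```

NJS then issues an internal redirect so that the original request body and arguments are preserved when proxying to the returned endpoint.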

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

Summary per file:

  • internal/framework/types/types.go: Defines HTTP headers and port constants for EPP communication
  • internal/controller/state/graph/: Adds EndpointPickerConfig field to backend references and enforces a single InferencePool per rule
  • internal/controller/state/dataplane/: Adds inference backend tracking and updates backend group creation
  • internal/controller/nginx/modules/: Replaces model extraction with endpoint discovery functionality in the NJS module
  • internal/controller/nginx/config/: Implements inference location generation and proxy pass logic with variable mapping
  • cmd/gateway/endpoint_picker.go: Updates the EPP handler to use shared constants and adds request body validation
  • deploy/inference-nginx-plus/deploy.yaml: Enables initial usage report enforcement
Comments suppressed due to low confidence (2)

internal/controller/nginx/config/servers.go:1

  • Using contains for checking location types could lead to false positives. For example, a type like 'external-internal' would match this condition. Consider using exact string comparison or enum-based checks instead of substring matching for better type safety.
package config

internal/controller/nginx/config/servers.go:1

  • Similar to the previous issue, using contains for type checking could match unintended types. Consider checking for specific inference location types (InferenceExternalLocationType or InferenceInternalLocationType) explicitly rather than using substring matching.
package config
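To illustrate the alternative these two comments suggest, an exact enum-style check (with illustrative type names standing in for the real location types) could look like:

```go
package main

import "fmt"

// LocationType is an illustrative stand-in for the real location types.
type LocationType string

const (
	ExternalLocationType          LocationType = "external"
	InternalLocationType          LocationType = "internal"
	InferenceExternalLocationType LocationType = "inference-external"
	InferenceInternalLocationType LocationType = "inference-internal"
)

// isInference uses exact comparison, so a hypothetical
// "external-internal" type can never match by accident.
func isInference(t LocationType) bool {
	switch t {
	case InferenceExternalLocationType, InferenceInternalLocationType:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isInference(InferenceExternalLocationType)) // true
	fmt.Println(isInference("external-internal"))           // false
}
```

A switch over named constants keeps the compiler and grep on your side, where substring matching silently accepts any type whose name happens to contain the fragment.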


Contributor

@salonichf5 salonichf5 left a comment


🚀

@sjberman sjberman merged commit 6995f2f into feat/inference-extension Sep 24, 2025
38 checks passed
@sjberman sjberman deleted the feat/query-epp branch September 24, 2025 14:07
@github-project-automation github-project-automation bot moved this from 🏗 In Progress to ✅ Done in NGINX Gateway Fabric Sep 24, 2025
salonichf5 pushed a commit that referenced this pull request Oct 2, 2025
Labels: enhancement (New feature or request)
Project status: Done
4 participants