
Conversation

sjberman
Collaborator

Problem: We need to connect NGINX to the Golang shim that talks to the EndpointPicker, and then pass client traffic to the proper inference workload.

Solution: Write an NJS module that will query the local Go server to get the AI endpoint to route traffic to. Then redirect the original client request to an internal location that proxies the traffic to the chosen endpoint.

The location building gets a bit complicated, especially when using both HTTP matching conditions and inference workloads: it requires two layers of internal redirects. I added lots of comments to hopefully clarify how we build these locations to perform all the routing steps.
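For illustration, the generated locations end up shaped roughly like the sketch below. This is not the actual generated config: the location names, variable name, and NJS function names are all hypothetical, and real directives will differ.

```nginx
# Illustrative sketch only; hypothetical names throughout.
location /chat {
    # Layer 1: evaluate HTTP matching conditions in NJS, then redirect
    # internally to the location for the matched rule.
    js_content httpmatches.redirect;
}

location /_rule0-match {
    internal;
    # Layer 2: ask the local Go shim (which talks to the EndpointPicker)
    # for the inference endpoint, then redirect once more.
    js_content epp.route;
}

location /_rule0-inference {
    internal;
    # Proxy the original client request to the chosen endpoint.
    proxy_pass http://$inference_workload_endpoint;
}
```

Without HTTP matching conditions, the first layer drops out and the external location can hand off to the endpoint-picking layer directly.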

Testing: Manual e2e traffic testing, both with and without HTTP matching conditions. Also tried Routes that referenced multiple InferencePool rules.

Closes #3838

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.


@sjberman sjberman marked this pull request as ready for review September 22, 2025 15:54
@sjberman sjberman requested a review from a team as a code owner September 22, 2025 15:54
@github-project-automation github-project-automation bot moved this from 🆕 New to 🏗 In Progress in NGINX Gateway Fabric Sep 22, 2025
Contributor

@bjee19 bjee19 left a comment

nice job, lgtm

@sjberman sjberman requested a review from Copilot September 22, 2025 20:59

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds support for querying the EndpointPicker Proxy (EPP) and routing AI traffic to inference workloads. It introduces an NJS module that queries a local Go server to get the appropriate AI endpoint and redirects client traffic to the chosen endpoint through internal NGINX locations.

  • Adds new location types and routing logic for inference backends with optional HTTP matching conditions
  • Implements a Go shim server that communicates with the EndpointPicker via gRPC
  • Creates an NJS module for endpoint discovery and traffic redirection with proper argument preservation
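As a rough illustration of the shim's role in the flow above (not the actual implementation: the real shim speaks gRPC to the EndpointPicker, and the handler, header names, and endpoint address below are made up), the request/response shape might look like:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// pickEndpoint stands in for the gRPC call to the EndpointPicker;
// a real picker would choose based on load, queue depth, etc.
func pickEndpoint(model string) string {
	return "10.0.0.5:8000" // placeholder endpoint
}

// shimHandler receives the NJS subrequest and returns the chosen
// endpoint in a response header for NJS to proxy the client to.
func shimHandler(w http.ResponseWriter, r *http.Request) {
	endpoint := pickEndpoint(r.Header.Get("X-Model-Name")) // hypothetical header
	w.Header().Set("X-Endpoint", endpoint)
	w.WriteHeader(http.StatusOK)
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(shimHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Header.Get("X-Endpoint"))
}
```

NJS then issues an internal redirect so that the original request body and arguments are preserved when proxying to the returned endpoint.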

Reviewed Changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

Summary per file:

  • internal/framework/types/types.go: Defines HTTP headers and port constants for EPP communication
  • internal/controller/state/graph/: Adds EndpointPickerConfig field to backend references and enforces a single InferencePool per rule
  • internal/controller/state/dataplane/: Adds inference backend tracking and updates backend group creation
  • internal/controller/nginx/modules/: Replaces model extraction with endpoint discovery functionality in the NJS module
  • internal/controller/nginx/config/: Implements inference location generation and proxy pass logic with variable mapping
  • cmd/gateway/endpoint_picker.go: Updates the EPP handler to use shared constants and adds request body validation
  • deploy/inference-nginx-plus/deploy.yaml: Enables initial usage report enforcement
Comments suppressed due to low confidence (2)

internal/controller/nginx/config/servers.go:1

  • Using contains for checking location types could lead to false positives. For example, a type like 'external-internal' would match this condition. Consider using exact string comparison or enum-based checks instead of substring matching for better type safety.
package config

internal/controller/nginx/config/servers.go:1

  • Similar to the previous issue, using contains for type checking could match unintended types. Consider checking for specific inference location types (InferenceExternalLocationType or InferenceInternalLocationType) explicitly rather than using substring matching.
package config
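To illustrate the alternative these two comments suggest, an exact enum-style check (with illustrative type names standing in for the real location types) could look like:

```go
package main

import "fmt"

// LocationType is an illustrative stand-in for the real location types.
type LocationType string

const (
	ExternalLocationType          LocationType = "external"
	InternalLocationType          LocationType = "internal"
	InferenceExternalLocationType LocationType = "inference-external"
	InferenceInternalLocationType LocationType = "inference-internal"
)

// isInference uses exact comparison, so a hypothetical
// "external-internal" type can never match by accident.
func isInference(t LocationType) bool {
	switch t {
	case InferenceExternalLocationType, InferenceInternalLocationType:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isInference(InferenceExternalLocationType)) // true
	fmt.Println(isInference("external-internal"))           // false
}
```

A switch over named constants keeps the compiler and grep on your side, where substring matching silently accepts any type whose name happens to contain the fragment.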


Contributor

@salonichf5 salonichf5 left a comment


🚀

@sjberman sjberman merged commit 6995f2f into feat/inference-extension Sep 24, 2025
38 checks passed
@sjberman sjberman deleted the feat/query-epp branch September 24, 2025 14:07
@github-project-automation github-project-automation bot moved this from 🏗 In Progress to ✅ Done in NGINX Gateway Fabric Sep 24, 2025
salonichf5 pushed a commit that referenced this pull request Oct 2, 2025
Labels: enhancement (New feature or request)
Project status: Done
4 participants