Query EPP and proxy AI traffic #3942
Conversation
Problem: We need to connect NGINX to the Golang shim that talks to the EndpointPicker, and then pass client traffic to the proper inference workload.

Solution: Write an NJS module that queries the local Go server to get the AI endpoint to route traffic to, then redirects the original client request to an internal location that proxies the traffic to the chosen endpoint.

The location building gets a bit complicated, especially when using both HTTP matching conditions and inference workloads: it requires 2 layers of internal redirects. I added lots of comments to hopefully clear up how we build these locations to perform all the routing steps.
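As a rough illustration of the first hop, here is a minimal sketch of the kind of local Go shim the NJS module could query. The `/pick` path, the `X-Inference-Endpoint` header, the listen port, and the `pickEndpoint` helper are all hypothetical names for this sketch; the real shim's request/response contract, and its gRPC conversation with the EndpointPicker, are defined in the PR itself.

```go
// Hypothetical sketch of the local Go shim the NJS module queries.
// All names (path, header, port, pickEndpoint) are illustrative only.
package main

import (
	"io"
	"log"
	"net/http"
)

// pickEndpoint stands in for the real EPP call: the actual shim forwards the
// request details to the EndpointPicker over gRPC and returns the chosen
// inference endpoint ("ip:port").
func pickEndpoint(body []byte) (string, error) {
	return "10.0.0.5:8000", nil // placeholder result
}

func main() {
	http.HandleFunc("/pick", func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "failed to read request body", http.StatusBadRequest)
			return
		}
		endpoint, err := pickEndpoint(body)
		if err != nil {
			http.Error(w, "endpoint selection failed", http.StatusBadGateway)
			return
		}
		// The NJS module reads this value and redirects the client request
		// to an internal location that proxies to the chosen endpoint.
		w.Header().Set("X-Inference-Endpoint", endpoint)
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe("127.0.0.1:9004", nil))
}
```

In the actual data path, NGINX invokes this lookup from the NJS module and then performs an internal redirect, so the original URI and query arguments are preserved for the proxied request.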
07e4709 to fa1ab61
nice job, lgtm
Pull Request Overview
This PR adds support for querying the EndpointPicker Proxy (EPP) and routing AI traffic to inference workloads. It introduces an NJS module that queries a local Go server to get the appropriate AI endpoint and redirects client traffic to the chosen endpoint through internal NGINX locations.
- Adds new location types and routing logic for inference backends with optional HTTP matching conditions
- Implements a Go shim server that communicates with the EndpointPicker via gRPC
- Creates an NJS module for endpoint discovery and traffic redirection with proper argument preservation
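To make the routing chain easier to picture, here is a hedged sketch of how a generator might decide which locations to emit for a rule. The `InferenceExternalLocationType` and `InferenceInternalLocationType` identifiers appear in the review comments below; the other names, the string values, and the exact ordering of hops are assumptions for illustration, not the PR's actual servers.go logic.

```go
// Illustrative sketch only: the two Inference*LocationType names come from the
// review comments; every other name, value, and the chain ordering below are
// assumptions, not the PR's actual code.
package config

const (
	ExternalLocationType          = "external"
	InternalLocationType          = "internal"
	InferenceExternalLocationType = "inference-external"
	InferenceInternalLocationType = "inference-internal"
)

// locationChain sketches the hops a client request could take through the
// generated config:
//
//	plain backend, no matches:    external -> proxy_pass to upstream
//	inference, no matches:        inference-external (NJS queries the shim)
//	                              -> inference-internal (proxy_pass to picked endpoint)
//	inference with HTTP matches:  external (evaluates match conditions)
//	                              -> inference-external -> inference-internal
//	                              (the two layers of internal redirects)
func locationChain(hasMatches, isInference bool) []string {
	switch {
	case isInference && hasMatches:
		return []string{ExternalLocationType, InferenceExternalLocationType, InferenceInternalLocationType}
	case isInference:
		return []string{InferenceExternalLocationType, InferenceInternalLocationType}
	case hasMatches:
		return []string{ExternalLocationType, InternalLocationType}
	default:
		return []string{ExternalLocationType}
	}
}
```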
Reviewed Changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| internal/framework/types/types.go | Defines HTTP headers and port constants for EPP communication |
| internal/controller/state/graph/ | Adds EndpointPickerConfig field to backend references and enforces single InferencePool per rule |
| internal/controller/state/dataplane/ | Adds inference backend tracking and updates backend group creation |
| internal/controller/nginx/modules/ | Replaces model extraction with endpoint discovery functionality in NJS module |
| internal/controller/nginx/config/ | Implements inference location generation and proxy pass logic with variable mapping |
| cmd/gateway/endpoint_picker.go | Updates EPP handler to use shared constants and adds request body validation |
| deploy/inference-nginx-plus/deploy.yaml | Enables initial usage report enforcement |
Comments suppressed due to low confidence (2)
internal/controller/nginx/config/servers.go:1
- Using `contains` for checking location types could lead to false positives. For example, a type like 'external-internal' would match this condition. Consider using exact string comparison or enum-based checks instead of substring matching for better type safety.
internal/controller/nginx/config/servers.go:1
- Similar to the previous issue, using `contains` for type checking could match unintended types. Consider checking for specific inference location types (`InferenceExternalLocationType` or `InferenceInternalLocationType`) explicitly rather than using substring matching.
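Concretely, the suggestion amounts to something like the following sketch, assuming the location type is a plain string and reusing the constant names from the comment; the string values and helper functions are illustrative, not the PR's code.

```go
// Sketch of the exact-match check the review suggests; constant values and
// helpers are assumptions, shown only to contrast with substring matching.
package config

import "strings"

const (
	InferenceExternalLocationType = "inference-external"
	InferenceInternalLocationType = "inference-internal"
)

// Exact comparison against the specific types: no substring false positives.
func isInferenceLocation(locType string) bool {
	return locType == InferenceExternalLocationType ||
		locType == InferenceInternalLocationType
}

// The flagged pattern: substring matching. Any type whose name happens to
// contain the substring would also match, which is the false-positive risk the
// review points out (its example: "external-internal" matching a check for "internal").
func isInferenceLocationLoose(locType string) bool {
	return strings.Contains(locType, "inference")
}
```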
🚀
Testing: Manual e2e traffic testing, both with and without HTTP matching conditions. Also tried Routes that referenced multiple InferencePool rules.
Closes #3838
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.