Add support for multiple InferencePool backends #4439
base: main
Conversation
Still doing some testing; just wanted to run the pipeline. Will promote to a ready-to-review PR when cleaned up.
Force-pushed from e611f14 to a8cbd36.
Codecov Report

@@ Coverage Diff @@
##             main    #4439      +/-   ##
==========================================
+ Coverage   86.07%   86.23%   +0.15%
==========================================
  Files         132      132
  Lines       14389    14562     +173
  Branches       35       35
==========================================
+ Hits        12386    12557     +171
+ Misses       1793     1791       -2
- Partials      210      214       +4
Pull request overview
This PR adds support for multiple InferencePool backends on a Route, enabling weighted traffic distribution across inference backends. Previously, routes were limited to a single InferencePool backend per rule.
Key Changes:
- Removed restriction preventing multiple InferencePool backends in a single rule
- Added validation to prevent mixing InferencePool and non-InferencePool backends (see the sketch after this list)
- Implemented deduplication of inference maps to handle multiple backends efficiently
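As a rough sketch of the mixed-backend validation from the second bullet above (the surrounding types are stand-ins; only the checkForMixedBackendTypes name comes from the PR itself):

```go
package graph

import "errors"

// backendRef is a stand-in for the graph's backend reference type; the real
// type in httproute.go carries more fields.
type backendRef struct {
	isInferencePool bool
}

// checkForMixedBackendTypes returns an error when a single rule references
// both InferencePool and non-InferencePool backends, which this PR disallows.
func checkForMixedBackendTypes(backends []backendRef) error {
	var inference, other int
	for _, b := range backends {
		if b.isInferencePool {
			inference++
		} else {
			other++
		}
	}
	if inference > 0 && other > 0 {
		return errors.New("cannot mix InferencePool and non-InferencePool backends in a single rule")
	}
	return nil
}
```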
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/Makefile | Enabled the GatewayWeightedAcrossTwoInferencePools conformance test and added --ignore-not-found flags to cleanup commands |
| internal/controller/state/graph/httproute_test.go | Added comprehensive test cases for multiple weighted InferencePool backends with and without HTTP matches |
| internal/controller/state/graph/httproute.go | Replaced the single-backend restriction with validation for mixed backend types and added the checkForMixedBackendTypes function |
| internal/controller/nginx/config/split_clients_test.go | Added test cases for inference backends with endpoint picker configs and split-client value generation |
| internal/controller/nginx/config/split_clients.go | Updated split-client generation to support inference backend groups with specialized variable naming |
| internal/controller/nginx/config/servers_test.go | Added extensive test coverage for multiple inference backend scenarios with various match conditions |
| internal/controller/nginx/config/servers.go | Refactored location generation to support multiple inference backends with proper EPP and proxy pass locations |
| internal/controller/nginx/config/maps_test.go | Added test cases for unique backend deduplication and failure mode verification |
| internal/controller/nginx/config/maps.go | Implemented deduplication logic using a map to prevent duplicate inference backend entries (see the sketch below the table) |
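The maps.go deduplication described in the last row can be pictured roughly like this (a minimal sketch; the inferenceMap type and its fields are assumptions, only the map-based dedup idea comes from the PR):

```go
package config

// inferenceMap is a stand-in for a generated NGINX map block for an inference
// backend; Source is assumed to uniquely identify the backend.
type inferenceMap struct {
	Source   string
	Variable string
}

// dedupeInferenceMaps keeps the first map seen per Source, so a backend that
// appears in several rules yields only one map block in the rendered config.
func dedupeInferenceMaps(maps []inferenceMap) []inferenceMap {
	seen := make(map[string]struct{}, len(maps))
	out := make([]inferenceMap, 0, len(maps))
	for _, m := range maps {
		if _, ok := seen[m.Source]; ok {
			continue // duplicate backend entry; skip it
		}
		seen[m.Source] = struct{}{}
		out = append(out, m)
	}
	return out
}
```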
sjberman left a comment:
Great work on this. It really is a complex mess to build all of these locations, and I'm hopeful that in the future we can improve it, potentially with NGINX improvements so we don't need the NJS matching module, and potentially with the inference Rust module so we can skip the nested inference locations.
Can you verify that if a ClientSettingsPolicy with maxSize is set, it gets propagated into every location down the chain?
    ruleIdx int,
) http.Location {
    return http.Location{
        Path: inferencePath(pathruleIdx, matchRuleIdx),
We're losing pathruleIdx here; couldn't that mean that two different pathrules could collide in naming?
Good catch, I need to update a comment somewhere about the naming. The ruleIdx used here is the backendGroup ruleIdx, which is the index of the rule on a specific HTTPRoute.
That's different from the pathRuleIdx, which comes from a collapsed data structure spanning all rules on this path, which is really confusing.
But TL;DR, I don't think so, because we attach both the UpstreamName and the HTTPRoute NsName, along with the rule index within that HTTPRoute. With the httprouteNsName and the HTTPRoute's ruleIdx, alongside the unique UpstreamName, I think that guarantees unique naming. The only scenario I could see breaking this is something like a backendGroup with multiple copies of the same backend, which doesn't make sense to do.
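To make the uniqueness argument concrete, the composed name looks roughly like this (purely illustrative; the prefix and helper name are hypothetical, not the PR's actual implementation):

```go
package config

import "fmt"

// inferenceLocationName combines the HTTPRoute's namespaced name, the rule
// index within that route, and the upstream name. If that triple is unique
// per backend reference, the generated location names cannot collide.
func inferenceLocationName(routeNsName string, ruleIdx int, upstreamName string) string {
	return fmt.Sprintf("@inference_%s_rule%d_%s", routeNsName, ruleIdx, upstreamName)
}
```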
And we can't keep pathruleIdx because in split_clients.go I couldn't find a way to access that field from the backendGroup.
Hm... couldn't you have a route with multiple rules, each with different matching conditions, that split across the same InferencePool backends?
It's weird, but I'm trying to think whether there's a scenario where a collision could actually happen.
Hm, yeah, you're right: a route with even a single rule but multiple matching conditions will run into this case. Let me see what I can do.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.
}
internalLocations = append(internalLocations, intInfLocation)

// skip adding match and creating split clients location if its a duplicate intEPPLocation.Path
Suggested change:
- // skip adding match and creating split clients location if its a duplicate intEPPLocation.Path
+ // skip adding match and creating split clients location if it's a duplicate intEPPLocation.Path
Backends []Backend
// RuleIdx is the index of the corresponding rule in the HTTPRoute.
RuleIdx int
// PathRuleIdx is the index of the corresponding path rule in the HTTPRoute.
Technically a PathRule spans all HTTPRoutes that share a path, so this comment may need to be clearer about that instead of saying "in the HTTPRoute".
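For example, the field comment could be reworded along these lines (wording illustrative only; the struct is abbreviated from the diff above):

```go
package graph

// Backend is a stand-in for the real backend type referenced by the group.
type Backend struct{}

// BackendGroup (abbreviated), with a PathRuleIdx comment that spells out the
// cross-route scope of a PathRule.
type BackendGroup struct {
	Backends []Backend
	// RuleIdx is the index of the corresponding rule in the HTTPRoute.
	RuleIdx int
	// PathRuleIdx is the index of the corresponding PathRule, which spans all
	// HTTPRoutes that share this path, not just the rule's own HTTPRoute.
	PathRuleIdx int
}
```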
Proposed changes
Add support for multiple InferencePool backends on a Route.
Problem: A route should be able to have multiple InferencePools in its backendRefs.
Solution: Add support for multiple InferencePool backends, and add logic to remove duplicated inference maps.
Testing: Added unit tests and enabled the corresponding GatewayWeightedAcrossTwoInferencePools conformance test. Manually tested scenarios with multiple InferencePool backends, both with and without HTTP matches.
Closes #4192