Add support for multiple InferencePool backends #4439
base: main
Conversation
Still doing some testing; just wanted to run the pipeline. Will promote to a ready-to-review PR when cleaned up.
Force-pushed from e611f14 to a8cbd36.
Codecov Report

@@ Coverage Diff @@
##             main    #4439      +/-   ##
==========================================
+ Coverage   86.07%   86.23%   +0.15%
==========================================
  Files         132      132
  Lines       14389    14562     +173
  Branches       35       35
==========================================
+ Hits        12386    12557     +171
+ Misses       1793     1791       -2
- Partials      210      214       +4
Pull request overview
This PR adds support for multiple InferencePool backends on a Route, enabling weighted traffic distribution across inference backends. Previously, routes were limited to a single InferencePool backend per rule.
Key Changes:
- Removed restriction preventing multiple InferencePool backends in a single rule
- Added validation to prevent mixing InferencePool and non-InferencePool backends (see the sketch after this list)
- Implemented deduplication of inference maps to handle multiple backends efficiently
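As a rough sketch of the mixed-backend validation from the second bullet above (the surrounding types are stand-ins; only the checkForMixedBackendTypes name comes from the PR itself):

```go
package graph

import "errors"

// backendRef is a stand-in for the graph's backend reference type; the real
// type in httproute.go carries more fields.
type backendRef struct {
	isInferencePool bool
}

// checkForMixedBackendTypes returns an error when a single rule references
// both InferencePool and non-InferencePool backends, which this PR disallows.
func checkForMixedBackendTypes(backends []backendRef) error {
	var inference, other int
	for _, b := range backends {
		if b.isInferencePool {
			inference++
		} else {
			other++
		}
	}
	if inference > 0 && other > 0 {
		return errors.New("cannot mix InferencePool and non-InferencePool backends in a single rule")
	}
	return nil
}
```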
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/Makefile | Enabled the GatewayWeightedAcrossTwoInferencePools conformance test and added --ignore-not-found flags to cleanup commands |
| internal/controller/state/graph/httproute_test.go | Added comprehensive test cases for multiple weighted InferencePool backends with and without HTTP matches |
| internal/controller/state/graph/httproute.go | Replaced the single-backend restriction with validation for mixed backend types and added the checkForMixedBackendTypes function |
| internal/controller/nginx/config/split_clients_test.go | Added test cases for inference backends with endpoint picker configs and split-client value generation |
| internal/controller/nginx/config/split_clients.go | Updated split-client generation to support inference backend groups with specialized variable naming |
| internal/controller/nginx/config/servers_test.go | Added extensive test coverage for multiple inference backend scenarios with various match conditions |
| internal/controller/nginx/config/servers.go | Refactored location generation to support multiple inference backends with proper EPP and proxy pass locations |
| internal/controller/nginx/config/maps_test.go | Added test cases for unique backend deduplication and failure mode verification |
| internal/controller/nginx/config/maps.go | Implemented deduplication logic using a map to prevent duplicate inference backend entries (see the sketch below the table) |
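The maps.go deduplication described in the last row can be pictured roughly like this (a minimal sketch; the inferenceMap type and its fields are assumptions, only the map-based dedup idea comes from the PR):

```go
package config

// inferenceMap is a stand-in for a generated NGINX map block for an inference
// backend; Source is assumed to uniquely identify the backend.
type inferenceMap struct {
	Source   string
	Variable string
}

// dedupeInferenceMaps keeps the first map seen per Source, so a backend that
// appears in several rules yields only one map block in the rendered config.
func dedupeInferenceMaps(maps []inferenceMap) []inferenceMap {
	seen := make(map[string]struct{}, len(maps))
	out := make([]inferenceMap, 0, len(maps))
	for _, m := range maps {
		if _, ok := seen[m.Source]; ok {
			continue // duplicate backend entry; skip it
		}
		seen[m.Source] = struct{}{}
		out = append(out, m)
	}
	return out
}
```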
sjberman left a comment:
Great work on this. It really is a complex mess to build all of these locations, and I'm hopeful that in the future we can improve it, potentially with NGINX improvements so we don't need the NJS matching module, and potentially with the inference Rust module so we can skip the nested inference locations.
Can you verify that if a ClientSettingsPolicy with maxSize is set, it gets propagated into every location down the chain?
    ruleIdx int,
) http.Location {
    return http.Location{
        Path: inferencePath(pathruleIdx, matchRuleIdx),
We're losing pathruleIdx here; couldn't that mean that two different pathrules could collide in naming?
Good catch, I need to update a comment somewhere about the naming. The ruleIdx used here is the backendGroup ruleIdx, which is the index of the rule on a specific HTTPRoute.
That's different from the pathRuleIdx, which comes from a collapsed data structure spanning all rules on this path, which is really confusing.
But TL;DR, I don't think so, because we attach both the UpstreamName and the HTTPRoute NsName, along with the rule index within that HTTPRoute. With the httprouteNsName and the HTTPRoute's ruleIdx, alongside the unique UpstreamName, I think that guarantees unique naming. The only scenario I could see breaking this is something like a backendGroup with multiple copies of the same backend, which doesn't make sense to do.
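To make the uniqueness argument concrete, the composed name looks roughly like this (purely illustrative; the prefix and helper name are hypothetical, not the PR's actual implementation):

```go
package config

import "fmt"

// inferenceLocationName combines the HTTPRoute's namespaced name, the rule
// index within that route, and the upstream name. If that triple is unique
// per backend reference, the generated location names cannot collide.
func inferenceLocationName(routeNsName string, ruleIdx int, upstreamName string) string {
	return fmt.Sprintf("@inference_%s_rule%d_%s", routeNsName, ruleIdx, upstreamName)
}
```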
And we can't keep pathruleIdx because in split_clients.go I couldn't find a way to access that field from the backendGroup.
Hm... couldn't you have a route with multiple rules, each with different matching conditions, that split across the same InferencePool backends?
It's weird, but I'm trying to think whether there's a scenario where a collision could actually happen.
Hm, yeah, you're right: a route with even a single rule but multiple matching conditions will run into this case. Let me see what I can do.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.
}
internalLocations = append(internalLocations, intInfLocation)

// skip adding match and creating split clients location if its a duplicate intEPPLocation.Path
Suggested change:
- // skip adding match and creating split clients location if its a duplicate intEPPLocation.Path
+ // skip adding match and creating split clients location if it's a duplicate intEPPLocation.Path
Backends []Backend
// RuleIdx is the index of the corresponding rule in the HTTPRoute.
RuleIdx int
// PathRuleIdx is the index of the corresponding path rule in the HTTPRoute.
Technically a PathRule spans all HTTPRoutes that share a path, so this comment may need to be clearer about that instead of saying "in the HTTPRoute".
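For example, the field comment could be reworded along these lines (wording illustrative only; the struct is abbreviated from the diff above):

```go
package graph

// Backend is a stand-in for the real backend type referenced by the group.
type Backend struct{}

// BackendGroup (abbreviated), with a PathRuleIdx comment that spells out the
// cross-route scope of a PathRule.
type BackendGroup struct {
	Backends []Backend
	// RuleIdx is the index of the corresponding rule in the HTTPRoute.
	RuleIdx int
	// PathRuleIdx is the index of the corresponding PathRule, which spans all
	// HTTPRoutes that share this path, not just the rule's own HTTPRoute.
	PathRuleIdx int
}
```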
Proposed changes
Add support for multiple InferencePool backends on a Route.
Problem: A route should be able to have multiple InferencePools in its backendRefs.
Solution: Add support for multiple InferencePool backends, and add logic to remove duplicated inference maps.
Testing: Added unit tests and enabled the corresponding GatewayWeightedAcrossTwoInferencePools conformance test. Manually tested scenarios with multiple InferencePool backends, both with and without HTTP matches.
Closes #4192