
Conversation

sjberman
Collaborator

This commit adds support for the control plane to watch InferencePools. A feature flag has been added to enable/disable processing these resources. By default, it is disabled.
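For reference, enabling the feature would look roughly like the values override below. The actual flag and value names aren't spelled out in this description, so the keys here are purely illustrative.

```yaml
# Hypothetical Helm values override -- the real key for this feature flag lives in
# the chart and may be named differently. This only illustrates that InferencePool
# processing is opt-in and disabled by default.
nginxGateway:
  gwAPIInferenceExtension:
    enable: true
```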

When an HTTPRoute references an InferencePool, we will create a headless Service associated with that InferencePool, and reference it internally in the graph config for that Route. This allows us to use all of our existing logic to get the endpoints and build the proper nginx config for those endpoints.
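As a rough sketch (resource names, ports, and the InferencePool API group/version below come from the Gateway API Inference Extension and are illustrative; the generated Service is only an approximation of what the control plane creates):

```yaml
# HTTPRoute whose backendRef points at an InferencePool instead of a Service.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io   # Gateway API Inference Extension group
      kind: InferencePool
      name: vllm-llama-pool
---
# Approximate shape of the headless Service the control plane creates for the pool
# so the existing endpoint-resolution logic can be reused. The name, selector, and
# port are placeholders; the real object is derived from the InferencePool's spec.
apiVersion: v1
kind: Service
metadata:
  name: vllm-llama-pool-headless
spec:
  clusterIP: None        # headless: we only need the endpoints, not a virtual IP
  selector:
    app: vllm-llama
  ports:
  - port: 8000
```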

In a future commit, the nginx config will be updated to handle the proper load balancing for the AI workloads; for now, we fall back to our default behavior of proxy_passing to the upstream.
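In other words, the interim generated config amounts to a plain upstream plus a proxy_pass, along the lines of the simplified sketch below (upstream name, addresses, and location are made up; the real output contains many more directives):

```nginx
# Simplified sketch: endpoints resolved via the headless Service become ordinary
# upstream servers, and requests are proxied with the default (round-robin) method.
upstream default_vllm-llama-pool_8000 {
    server 10.0.1.12:8000;
    server 10.0.1.13:8000;
}

server {
    listen 80;

    location / {
        proxy_pass http://default_vllm-llama-pool_8000;
    }
}
```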

Testing: Manually verified

  • a single InferencePool results in a headless Service and the proper nginx config
  • multiple InferencePools result in multiple Services and the proper nginx config
  • Services are cleaned up when their InferencePool is deleted
  • backendRef conditions are set properly if an InferencePool doesn't exist
  • ReferenceGrants are used properly for InferencePools in different namespaces (see the sketch below)
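For the cross-namespace case, the grant looks roughly like this (names and namespaces are placeholders):

```yaml
# Illustrative ReferenceGrant allowing HTTPRoutes in "default" to reference
# InferencePools in the "inference" namespace. Without it, the backendRef is rejected.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-routes-to-inference-pools
  namespace: inference            # namespace of the referenced InferencePool
spec:
  from:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    namespace: default            # namespace of the referring HTTPRoute
  to:
  - group: inference.networking.x-k8s.io
    kind: InferencePool
```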

Closes #3835

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.


@github-actions github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request dependencies Pull requests that update a dependency file helm-chart Relates to helm chart labels Sep 11, 2025
@sjberman sjberman marked this pull request as ready for review September 11, 2025 19:26
@sjberman sjberman requested a review from a team as a code owner September 11, 2025 19:26
@sjberman sjberman force-pushed the feat/inference-pools branch from dc5e130 to 7ca30f8 Compare September 11, 2025 20:20
@sjberman sjberman force-pushed the feat/inference-pools branch from 7ca30f8 to dcde38c Compare September 11, 2025 20:28
@sjberman sjberman requested a review from salonichf5 September 15, 2025 15:01
@sjberman sjberman force-pushed the feat/inference-pools branch from e5b2167 to 53872cf Compare September 15, 2025 15:02
@sjberman sjberman force-pushed the feat/inference-pools branch from 26f2966 to 5162110 Compare September 15, 2025 19:16
@sjberman sjberman force-pushed the feat/inference-pools branch from 5162110 to 2dc7e03 Compare September 15, 2025 19:17
Contributor

@salonichf5 salonichf5 left a comment


just had one last comment but it looks good to me otherwise

@github-project-automation github-project-automation bot moved this from 🆕 New to 🏗 In Progress in NGINX Gateway Fabric Sep 15, 2025
@sjberman sjberman requested a review from bjee19 September 16, 2025 15:02
Contributor

@bjee19 bjee19 left a comment


great job!

@sjberman sjberman merged commit e9a3568 into feat/inference-extension Sep 16, 2025
45 checks passed
@sjberman sjberman deleted the feat/inference-pools branch September 16, 2025 18:07
@github-project-automation github-project-automation bot moved this from 🏗 In Progress to ✅ Done in NGINX Gateway Fabric Sep 16, 2025
sjberman added a commit that referenced this pull request Sep 18, 2025
salonichf5 pushed a commit that referenced this pull request Oct 2, 2025
salonichf5 pushed a commit that referenced this pull request Oct 15, 2025
salonichf5 pushed a commit that referenced this pull request Oct 15, 2025
ciarams87 pushed a commit that referenced this pull request Oct 16, 2025
ciarams87 pushed a commit that referenced this pull request Oct 17, 2025
ciarams87 pushed a commit that referenced this pull request Oct 17, 2025