Make endpoint picker connection flags configurable #4105

salonichf5 · 2025-10-17T23:35:59Z

Proposed changes

Problem: Users should be able to configure the endpoint picker connection flags.

Solution: Add command-line support for the endpoint picker flags.

Testing: Manual testing

Binary testing for flag

./gateway controller --gateway-ctlr-name="gateway.nginx.org/nginx-gateway-controller" --gatewayclass=nginx                                                                           
epp flags in controller command: false true
epp flags in epp command: false true

    Args:
      controller
      --gateway-ctlr-name=gateway.nginx.org/nginx-gateway-controller
      --gatewayclass=nginx
      --config=nginx-gateway-config
      --service=nginx-gateway-nginx-gateway-fabric
      --agent-tls-secret=agent-tls
      --metrics-port=9113
      --health-port=8081
      --leader-election-lock-name=nginx-gateway-nginx-gateway-fabric-leader-election
      --endpoint-picker-tls-skip-verify=true

default flags for NGF container

Endpoint picker default

   endpoint-picker-shim:
    Image:           nginx-gateway-fabric:sa.choudhary
    Port:            <none>
    Host Port:       <none>
    SeccompProfile:  RuntimeDefault
    Command:
      /usr/bin/gateway
      endpoint-picker
      --endpoint-picker-tls-skip-verify

Update adding/updating flags

      --gateway-api-experimental-features
      --gateway-api-inference-extension
      --endpoint-picker-tls-skip-verify=false
      --endpoint-picker-disable-tls
      --snippets-filters


sa.choudhary@N9939CQ4P0 nginx-gateway-fabric % k logs gateway-nginx-68754cc8df-h5mkv endpoint-picker-shim
epp flags in epp command: false true

Closes #4090

Checklist

Before creating a PR, run through this checklist and mark each as complete.

I have read the CONTRIBUTING doc
I have added tests that prove my fix is effective or that my feature works
I have checked that all unit tests pass after adding my changes
I have updated necessary documentation
I have rebased my branch onto main
I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.

NONE

codecov · 2025-10-17T23:40:22Z

Codecov Report

❌ Patch coverage is 72.50000% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.00%. Comparing base (3305254) to head (a6be2b9).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
cmd/gateway/commands.go	77.77%	6 Missing ⚠️
cmd/gateway/endpoint_picker.go	0.00%	3 Missing ⚠️
internal/controller/manager.go	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4105      +/-   ##
==========================================
+ Coverage   85.97%   86.00%   +0.02%     
==========================================
  Files         131      131              
  Lines       14063    14093      +30     
  Branches       35       35              
==========================================
+ Hits        12090    12120      +30     
+ Misses       1772     1771       -1     
- Partials      201      202       +1

salonichf5 · 2025-10-17T23:41:59Z

Currently, configurable flags can only be set via NGF command-line flags. Each NGF command (for example, controller and endpoint-picker) maintains its own flag values independently. This means that even if the same flag is defined for both commands, changing it in one does not automatically propagate to the other.

For example:

./gateway endpoint-picker --endpoint-picker-enable-tls=false --endpoint-picker-skip-secure-verify=false
endpointPickerEnableTLS: false
endpointPickerSkipSecureVerify: false
Starting the endpoint-picker server...

./gateway controller --gateway-ctlr-name="gateway.nginx.org/nginx-gateway-controller" --gatewayclass=nginx --endpoint-picker-enable-tls=false --endpoint-picker-skip-secure-verify=false
{"level":"info","ts":"2025-10-17T15:22:11-06:00","msg":"Starting the NGINX Gateway Fabric control plane"}
endpointPickerEnableTLS: false
endpointPickerSkipSecureVerify: false

If these flags are set differently between the two commands, the values will not stay in sync — each process will only respect its own flag values. However, the value of the controller’s flag determines the container args passed to the endpoint picker.

I also tried supporting these flags through values.yaml and updating the NGF Helm deployment to render them as container args. However, this only propagates the values to the controller command — not to the endpoint picker, which reads its own flag values from its command invocation.

So effectively, to configure these flags correctly, we need to provide the same flag values to both the controller and endpoint-picker commands.
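To illustrate why the values do not stay in sync, here is a minimal Go sketch. It uses the standard library `flag` package rather than whatever CLI framework the gateway binary actually uses, and `eppFlags` and `parseCommand` are hypothetical names made up for this example. Each subcommand owns its own flag set, so a value passed to one invocation never reaches the other.

```go
package main

import (
	"flag"
	"fmt"
)

// eppFlags holds the two endpoint picker settings discussed above.
type eppFlags struct {
	enableTLS        bool
	skipSecureVerify bool
}

// parseCommand is a hypothetical helper: it builds a fresh FlagSet for one
// subcommand and parses only the args given to that invocation.
func parseCommand(name string, args []string) (eppFlags, error) {
	var f eppFlags
	fs := flag.NewFlagSet(name, flag.ContinueOnError)
	// Defaults follow this comment: TLS enabled, skip secure verification on.
	fs.BoolVar(&f.enableTLS, "endpoint-picker-enable-tls", true, "enable TLS to the endpoint picker")
	fs.BoolVar(&f.skipSecureVerify, "endpoint-picker-skip-secure-verify", true, "skip TLS certificate verification")
	err := fs.Parse(args)
	return f, err
}

func main() {
	// The controller is started with TLS disabled...
	controller, err := parseCommand("controller", []string{"--endpoint-picker-enable-tls=false"})
	if err != nil {
		fmt.Println("controller flags:", err)
		return
	}
	// ...but the endpoint-picker process is started without that flag,
	// so it falls back to its own default.
	picker, err := parseCommand("endpoint-picker", nil)
	if err != nil {
		fmt.Println("endpoint-picker flags:", err)
		return
	}

	fmt.Println("controller enableTLS:", controller.enableTLS)
	fmt.Println("endpoint-picker enableTLS:", picker.enableTLS)
}
```

Running it shows the two processes disagreeing on `enableTLS` unless the same flag is passed to both.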

For now, by default we have TLS enabled and skip secure verification on.

@ciarams87 is this feasible for users to do? Regardless, they are configurable now, just separately. Do you have any better ideas? I am curious!

ChatGPT recommended just using both commands to set the values:

#!/bin/bash

# Run the controller command with its flags
/usr/bin/gateway controller \
  --telemetry-report-period=24h \
  --telemetry-endpoint=http://example.com

# Run the endpoint-picker command with its flags
/usr/bin/gateway endpoint-picker \
  --endpoint-picker-enable-tls=true \
  --endpoint-picker-skip-secure-verify=false

salonichf5 · 2025-10-17T23:42:58Z

I'll add more test coverage on Monday.

cmd/gateway/commands.go

ciarams87

Overall, this looks like the correct approach! Main binary flags -> provisioner -> epp shim flags.

The user cannot configure the EPP shim flags directly, so exposing it in this way makes sense as this is the same way we allow the user to enable the inference extension shim in the first place.

I left a few comments around deduplicating the commands, inverting the enableTLS -> disableTLS, and false flag evaluation.

We should also add this to the Helm chart e.g.:

  gwAPIInferenceExtension:
    # -- Enable Gateway API Inference Extension support. Allows for configuring InferencePools to route traffic to AI workloads.
    enable: false
    
    # -- Endpoint picker TLS configuration.
    endpointPicker:
      # -- Disable TLS for endpoint picker communication. By default, TLS is enabled.
      # Set to true only for development/testing or when using a service mesh for encryption.
      disableTLS: false
      
      # -- Skip TLS certificate verification for endpoint picker.
      # REQUIRED: Must be true until Gateway API Inference Extension EPP supports mounting certificates.
      # See: https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/1556
      skipSecureVerify: true

Then update charts/nginx-gateway-fabric/templates/deployment.yaml:

{{- if .Values.nginxGateway.gwAPIInferenceExtension.enable }}
- --gateway-api-inference-extension
{{- if .Values.nginxGateway.gwAPIInferenceExtension.endpointPicker.disableTLS }}
- --endpoint-picker-disable-tls
{{- end }}
{{- if not .Values.nginxGateway.gwAPIInferenceExtension.endpointPicker.skipSecureVerify }}
- --endpoint-picker-skip-secure-verify=false
{{- else }}
- --endpoint-picker-skip-secure-verify=true
{{- end }}
{{- end }}

salonichf5 · 2025-10-20T15:36:28Z

I did add this to the Helm chart earlier and did some testing, but these values don't really change the mode in the EPP; they only get added or updated as flags in the container. If a setting doesn't actually change the mode in the EPP, don't you think it would be a little misleading?

I tested this exact scenario:

Flags enabled in the NGF container, set based on the Helm value. Updating the flags there only changed the controller command, not the EPP command. So I don't think we should set them there until we provide a mechanism that does what the setting says it will do.

I am okay with keeping the minimal flag support. Once we finalise this conversation, I can invert the flags and update the documentation as you wanted.

ciarams87 · 2025-10-20T15:57:15Z

I also tried supporting these flags through values.yaml and updating the NGF Helm deployment to render them as container args. However, this only propagates the values to the controller command — not to the endpoint picker, which reads its own flag values from its command invocation.

Yes exactly, but this is correct:

How It Should Work (and does work):
Step 1: User sets Helm values:

nginxGateway:
  gwAPIInferenceExtension:
    endpointPicker:
      disableTLS: true

Step 2: Helm template renders controller deployment with these as controller flags:

# deployment.yaml
args:
  - controller
  - --endpoint-picker-disable-tls=true  # ← From Helm values

Step 3: Controller reads its flags and uses them to build the sidecar container command in objects.go:

command := []string{
    "/usr/bin/gateway",
    "endpoint-picker",
    "--endpoint-picker-disable-tls",  // ← Controller ADDS this to sidecar command
}

Step 4: The sidecar container starts with those flags and reads them.

You don't need to (and can't) put Helm values directly into the sidecar command because:

The sidecar container is created dynamically by the provisioner, not by Helm
The controller acts as the "bridge" - it reads Helm values from its own flags, then passes them to the sidecar via the container command it builds

Once the flags are configured through Helm, the controller will automatically propagate them to the sidecar (via the code in objects.go)

The user can't directly set the epp shim commands - the only option for setting them is exposing them through the controller flags (which is what you have already implemented). Creating the Helm options simply exposes configuring these controller flags through Helm.

The values flow: Helm → Controller flags → Controller reads → Provisioner builds sidecar command → Sidecar reads. Does this make sense?
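As a rough illustration of Step 3, the provisioner side could look something like the Go sketch below. This is not the actual code in objects.go: `eppConfig` and `buildEndpointPickerCommand` are hypothetical names, and passing the skip-verify flag with an explicit value is just one possible way to make sure a false setting also reaches the sidecar.

```go
package main

import "fmt"

// eppConfig is a hypothetical struct carrying the values the controller
// parsed from its own command-line flags.
type eppConfig struct {
	disableTLS       bool
	skipSecureVerify bool
}

// buildEndpointPickerCommand is a hypothetical stand-in for the provisioner
// logic: it turns the controller's flag values into the sidecar command.
func buildEndpointPickerCommand(cfg eppConfig) []string {
	command := []string{"/usr/bin/gateway", "endpoint-picker"}
	if cfg.disableTLS {
		// Only added when the controller itself was started with the flag.
		command = append(command, "--endpoint-picker-disable-tls")
	}
	// Written with an explicit value so that false is passed down too,
	// instead of silently falling back to the sidecar's default.
	command = append(command, fmt.Sprintf("--endpoint-picker-skip-secure-verify=%t", cfg.skipSecureVerify))
	return command
}

func main() {
	// Values as they would arrive from Helm via the controller flags.
	cfg := eppConfig{disableTLS: true, skipSecureVerify: true}
	fmt.Println(buildEndpointPickerCommand(cfg))
}
```

The point of the sketch is only that the sidecar command is derived from the controller's own flag values, so nothing in Helm needs to know about the sidecar.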

salonichf5 · 2025-10-20T18:50:24Z

@ciarams87 I do understand this workflow, but these settings can only be applied at startup. I guess I was thinking of runtime reconciliation of the flag values. In my testing using deployment updates, they did not get reconciled when updated at runtime, so I thought maybe that would not be the expectation.

I have updated the flags to default to false at startup, and if the settings are updated, the flags are now reflected at the container level for both the EPP and the NGF deployment. Let me know if all looks well to you.

Flag settings now

Default:
    endpointPicker:
      disableTLS: false
      enableSecureVerify: false


no args in EPP container and NGF deployment

    endpointPicker:
      disableTLS: true
      enableSecureVerify: true


flags added to EPP container and NGF deployment

--endpoint-picker-disable-tls
--endpoint-picker-enable-secure-verify

cmd/gateway/commands.go

ciarams87 · 2025-10-20T19:39:56Z

these settings can only be applied at startup, i guess I was thinking of runtime reconciliation of the flag values. In my testing using deployment updates, they did not get reconciled when updated during run time so I thought maybe that would not be the expectation.

@salonichf5 If the flags are updated using a helm upgrade, they should be propagated all the way down, as the provisioner will detect that the pod spec has changed and will redeploy the dataplane deployments. So while runtime reconciliation isn't the expectation, the flags should still work as intended.

salonichf5 · 2025-10-20T20:15:24Z

Sorry for the back and forth. I guess I wanted to have both settings show up since they are correlated, but I have added all the changes as you asked, @ciarams87.

gateway

bjee19

lgtm, though could you describe any manual testing you did in the pr description?

salonichf5 · 2025-10-21T17:57:43Z

lgtm, though could you describe any manual testing you did in the pr description?

I added some testing details here to discuss some things I was seeing. I will update the PR description

bjee19

Could you also do a manual sanity check that the default values still work with the Inference Extension? Just deploy the getting started guide and such.

deploy/default/deploy.yaml

salonichf5 · 2025-10-21T20:38:21Z

Could you also do a manual sanity check that the default values still work with the Inference Extension? Just deploy the getting started guide and such.

Yes, I ran the inference extension example with it. These settings also determine whether the GatewayFollowingEPPRouting conformance test passes, since that is when we connect to the EPP, and it runs in the pipeline along with the other Inference conformance tests.

cmd/gateway/endpoint_picker.go

salonichf5 · 2025-10-21T22:13:57Z

Updating flag values

    endpointPicker:
      # -- Disable TLS for EndpointPicker communication. By default, TLS is enabled.
      # Set to true only for development/testing or when using a service mesh for encryption.
      disableTLS: true

      # -- Disables TLS certificate verification when connecting to the EndpointPicker.
      # By default, certificate verification is disabled.
      # REQUIRED: Must be true until Gateway API Inference Extension EndpointPicker supports mounting certificates.
      # See: https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/1556
      skipVerify: true

    Args:
      controller
      --gateway-ctlr-name=gateway.nginx.org/nginx-gateway-controller
      --gatewayclass=nginx
      --config=nginx-gateway-config
      --service=nginx-gateway
      --agent-tls-secret=agent-tls
      --metrics-port=9113
      --health-port=8081
      --leader-election-lock-name=nginx-gateway-leader-election
      --gateway-api-inference-extension
      --endpoint-picker-disable-tls
      --endpoint-picker-tls-skip-verify=true
      --snippets-filters

      /var/run/secrets/ngf/serviceaccount from token (rw)
   endpoint-picker-shim:
    Image:           nginx-gateway-fabric:sa.choudhary
    Port:            <none>
    Host Port:       <none>
    SeccompProfile:  RuntimeDefault
    Command:
      /usr/bin/gateway
      endpoint-picker
      --endpoint-picker-disable-tls
      --endpoint-picker-tls-skip-verify

Objects.go value

commands field for EPP configuration: [/usr/bin/gateway endpoint-picker --endpoint-picker-disable-tls --endpoint-picker-tls-skip-verify]

Endpoint picker flag values in controller command:
  disable TLS: true
  skip verify: true

{"level":"debug","ts":"2025-10-21T22:03:14Z","logger":"eventLoop.eventHandler","msg":"Finished handling the batch","batchID":16}
sa.choudhary@N9939CQ4P0 nginx-gateway-fabric % k logs gateway-nginx-6dd8f4ff-sgd4x endpoint-picker-shim
Endpoint picker flag values in endpoint picker command:
  disable TLS: true
  skip verify: true

Updating NGF deployment args

    Args:
      controller
      --gateway-ctlr-name=gateway.nginx.org/nginx-gateway-controller
      --gatewayclass=nginx
      --config=nginx-gateway-config
      --service=nginx-gateway
      --agent-tls-secret=agent-tls
      --metrics-port=9113
      --health-port=8081
      --leader-election-lock-name=nginx-gateway-leader-election
      --gateway-api-inference-extension
      --endpoint-picker-tls-skip-verify=false
      --snippets-filters

EPP container args

  endpoint-picker-shim:
    Image:           nginx-gateway-fabric:sa.choudhary
    Port:            <none>
    Host Port:       <none>
    SeccompProfile:  RuntimeDefault
    Command:
      /usr/bin/gateway
      endpoint-picker

Flags get updated

sa.choudhary@N9939CQ4P0 nginx-gateway-fabric % k logs gateway-nginx-68f597f55f-s29gb endpoint-picker-shim
Endpoint picker flag values in endpoint picker command:
  disable TLS: false
  skip verify: true

ciarams87 · 2025-10-24T10:59:52Z

charts/nginx-gateway-fabric/templates/deployment.yaml

+        {{- if (and .Values.nginxGateway.gwAPIInferenceExtension.enable .Values.nginxGateway.gwAPIInferenceExtension.endpointPicker.disableTLS) }}
+        - --endpoint-picker-disable-tls
+        {{- end }}
+         {{- if .Values.nginxGateway.gwAPIInferenceExtension.enable }}


Good call on putting both these flags behind the gwAPIInferenceExtension.enable flag!

You only need to specify it once for both flags, though, so you can remove the first {{- end }} and the second {{- if<...

internal/controller/provisioner/objects.go

bjee19

nice job, just gotta remove that last print statement then lgtm

salonichf5 requested a review from a team as a code owner October 17, 2025 23:36

github-project-automation bot added this to NGINX Gateway Fabric Oct 17, 2025

github-project-automation bot moved this to 🆕 New in NGINX Gateway Fabric Oct 17, 2025

github-actions bot added the chore Pull requests for routine tasks label Oct 17, 2025

tataruty reviewed Oct 20, 2025

ciarams87 requested changes Oct 20, 2025

github-project-automation bot moved this from 🆕 New to 🏗 In Progress in NGINX Gateway Fabric Oct 20, 2025

github-actions bot added documentation Improvements or additions to documentation helm-chart Relates to helm chart labels Oct 20, 2025

salonichf5 force-pushed the chore/epp-flags branch from d84bb72 to 4840163 Compare October 20, 2025 18:47

salonichf5 force-pushed the chore/epp-flags branch from 4840163 to a883fe6 Compare October 20, 2025 19:00

salonichf5 requested review from ciarams87 October 20, 2025 19:00

ciarams87 reviewed Oct 20, 2025

salonichf5 commented Oct 20, 2025

salonichf5 force-pushed the chore/epp-flags branch from 28422ec to b3f5327 Compare October 20, 2025 20:12

salonichf5 force-pushed the chore/epp-flags branch 3 times, most recently from 1d3cbde to 50e5814 Compare October 21, 2025 14:45

bjee19 reviewed Oct 21, 2025

bjee19 reviewed Oct 21, 2025

salonichf5 force-pushed the chore/epp-flags branch 2 times, most recently from 78fc43d to c542ea3 Compare October 21, 2025 19:21

salonichf5 requested a review from bjee19 October 21, 2025 19:22

bjee19 reviewed Oct 21, 2025

salonichf5 force-pushed the chore/epp-flags branch from 3cfabd1 to ccf17a8 Compare October 21, 2025 20:49

salonichf5 commented Oct 21, 2025

ciarams87 reviewed Oct 24, 2025

salonichf5 added 5 commits October 24, 2025 11:27

add configurable epp flags

ab64909

update deployment to only render flags if inference extension is enabled

67097f8

print statement

d67198c

pass flag by pointer for value to persist

924ed90

remove print statement

271a647

salonichf5 force-pushed the chore/epp-flags branch from b0c3b10 to 1b4f80e Compare October 24, 2025 17:27

salonichf5 requested review from bjee19 and ciarams87 October 24, 2025 17:29

bjee19 reviewed Oct 24, 2025

bjee19 approved these changes Oct 24, 2025

simplify the deployment.yaml

a6be2b9

salonichf5 force-pushed the chore/epp-flags branch from 1b4f80e to a6be2b9 Compare October 24, 2025 18:43
