feat: add IPv6/dual-stack support for agent bind host and UI probes#1673
Conversation
d991c4f to
e47f2bc
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds Helm- and controller-level configurability to support IPv6-first / dual-stack Kubernetes clusters by making the agent bind host and UI probes customizable.
Changes:
- Introduces
controller.agentDeployment.hostHelm value that flows through controller ConfigMap/env → controller flag--default-agent-bind-host→ agent deployment args. - Adds
ui.startupProbe/ui.readinessProbeHelm values to allow overriding the UI container probes (while keeping currenthttpGetprobes as the default). - Adds Helm unit tests validating ConfigMap rendering and default/overridden probe behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
go/core/internal/controller/translator/agent/adk_api_translator.go |
Adds global DefaultAgentBindHost defaulting to 0.0.0.0. |
go/core/internal/controller/translator/agent/deployments.go |
Uses DefaultAgentBindHost instead of hardcoded "0.0.0.0" for --host. |
go/core/pkg/app/app.go |
Registers --default-agent-bind-host so env/ConfigMap can override it. |
helm/kagent/templates/controller-configmap.yaml |
Conditionally emits DEFAULT_AGENT_BIND_HOST when Helm value is non-empty. |
helm/kagent/templates/ui-deployment.yaml |
Renders overridable startup/readiness probes with defaults preserved. |
helm/kagent/values.yaml |
Documents/adds new Helm values for agent bind host and UI probe overrides. |
helm/kagent/tests/controller-deployment_test.yaml |
Tests ConfigMap emission behavior for DEFAULT_AGENT_BIND_HOST. |
helm/kagent/tests/ui-deployment_test.yaml |
Tests default httpGet probes and custom exec probe overrides. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2c9a439 to
db1356e
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
db1356e to
0b47233
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
TOMOFUMI-KONDO
left a comment
There was a problem hiding this comment.
@go-faustino
As I was assigned the issue, I have reviewed the PR overall.
I noted several things below:
- Is overriding probes necessary?
When the ui pod has an ipv6 primary address, the httpGet probe targets that ipv6 address, so the existing probes seem to work.
I don't think it's a bad idea to allow customisation of probe settings. However, rather than intending them specifically for the ipv6 pod, it feels that it's for general flexibility.
- supervisord.conf has the hard-coded
HOSTNAME="0.0.0.0. Does it need to be fixed?
|
@TOMOFUMI-KONDO — thanks for taking the time to review this, really appreciate it. Both great points. 1. Probe overrides You're absolutely right — once nginx is properly listening on IPv6 (via the I've updated the framing accordingly: they're there for general flexibility (e.g., custom health check paths, different thresholds, exec-based probes for edge cases), not specifically for IPv6. It's a small, non-invasive addition, so I'd lean toward keeping it — but happy to drop it if you think it adds unnecessary surface area. 2. Nice catch! You're right that it's hardcoded to IPv4. That said, looking at the architecture, it shouldn't actually cause IPv6 issues in practice:
So it's not strictly needed for IPv6 support, but it's cleaner to fix for consistency. I'll make I'll push the updates shortly. Also — I want to acknowledge that you were already assigned to #1642 and had offered to work on it. I jumped ahead because of some urgency on our side, but I genuinely appreciate your review and input. If you'd like to be included as a co-author on the final commit, I'd be happy to add you — just let me know. And of course, feel free to take any of these changes in a different direction if you prefer. |
af2e69a to
f9be1ff
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@go-faustino
I just wanted to confirm the necessity of changing the probes. As you said, it's a small and non-invasive addition, so I think it's fine to keep it.
Thank you for the update to make it configurable. It would be better to be able to override the supervisord.conf and its It's no problem that you worked on the issue. I'm glad to see such a great PR. |
62a28cd to
b5b1e44
Compare
|
Hey there, thanks so much for this awesome change, going to take a look today :) |
b5b1e44 to
2e690d4
Compare
EItanya
left a comment
There was a problem hiding this comment.
Just out of curiosity, why did you add full overwrites of the config files, rather than just fix the individual binds in them with template variables?
@EItanya Thanks for your review! You can see the rationale for the overwrites on the design note on the PR body. Basically, it aims to be fully backwards compatible with clusters that have IPv6 disabled at the kernel level. But, as mentioned in the note, we can simplify this if it is not considered an issue. Let me know how you'd prefer the implementation to look like. EDIT: Re-reading your question, I think I misunderstood — you're not asking why opt-in vs baked-in, but rather why a full config blob override instead of targeted Helm values. If that's the case: we could replace ui.nginxConfig / ui.supervisordConfig (which take the entire file as a string) with small, focused values like: ui:
nginx:
ipv6Enabled: false # when true, adds `listen [::]:8080;` to the default config
supervisord:
hostname: "0.0.0.0" # set to "::" for dual-stackThe nginx.conf and supervisord.conf would become Helm-templated ConfigMaps (always rendered, always mounted), with just the bind-related lines conditionally injected. No full-file override needed. This would be much cleaner for the IPv6 use case — users just flip ui.nginx.ipv6Enabled: true instead of copy-pasting an entire nginx.conf. The trade-off is less flexibility for unrelated customizations, but that could be a separate feature if needed. Is that more along the lines of what you had in mind? |
Yes that's much more along the lines of what I was thinking. I was even thinking about a top-level ipv6 enabled flag |
bcf2517 to
93fd1dd
Compare
|
@EItanya Refactored as discussed — pushed a new commit that replaces the full config overrides with targeted Helm templating behind a single top-level flag: ipv6:
enabled: trueWhen enabled, this controls all three IPv6 touch points:
The Explicit All 155 helm unit tests pass, including new tests for the flag behavior. |
Awesome, can you move the original |
|
@EItanya Done — moved the config content into Much cleaner to read and maintain this way, agreed. |
… nginx config On IPv6-first Kubernetes clusters (e.g., EKS with IPv6 pod networking), kagent deployments fail health checks because: 1. Agent pods bind to 0.0.0.0 (IPv4 only), so kubelet probes sent to the pod's IPv6 address get "connection refused" 2. The UI nginx config only has "listen 8080" (IPv4), so it is unreachable on its IPv6 pod address 3. The UI httpGet probes are hardcoded, preventing workarounds 4. The A2A server constructs listen addresses via string concatenation (host + ":" + port), producing invalid addresses for IPv6 literals This commit fixes all four: - A2A server: use net.JoinHostPort for proper IPv6 address formatting (e.g. "[::]:8080" instead of ":::8080") - controller: add configurable agent bind host via controller.agentDeployment.host Helm value (default "0.0.0.0", set to "::" for dual-stack) - UI deployment: make startupProbe and readinessProbe overridable via ui.startupProbe and ui.readinessProbe Helm values - UI deployment: add ui.nginxConfig for optional custom nginx.conf mounted via ConfigMap (for dual-stack listen directives) Fixes kagent-dev#1642 Signed-off-by: Gonçalo Faustino <goncalo.santos@wellhub.com> Made-with: Cursor Signed-off-by: Gonçalo Faustino <goncalo.santos@wellhub.com> Made-with: Cursor
The configurable startupProbe and readinessProbe values are useful for general customization (thresholds, health paths, probe types), not specifically for IPv6. Update the comments accordingly. Signed-off-by: Gonçalo Faustino <goncalo.santos@wellhub.com> Made-with: Cursor
Allow overriding the supervisord.conf via ui.supervisordConfig Helm value. When set, a ConfigMap is created and mounted over the baked-in config. The default supervisord.conf sets HOSTNAME="0.0.0.0" for the Next.js server. While this only affects the internal proxy target (nginx connects via 127.0.0.1:8001, not the pod IP), making it configurable allows operators to adjust it for consistency on IPv6-first clusters. Signed-off-by: Gonçalo Faustino <goncalo.santos@wellhub.com> Made-with: Cursor
- Add Helm unit tests for ui-nginx-configmap.yaml (rendered/not rendered) - Add Helm unit tests for ui-supervisord-configmap.yaml (rendered/not rendered) - Add Go unit test for --default-agent-bind-host flag registration and env var loading via DEFAULT_AGENT_BIND_HOST Signed-off-by: Gonçalo Faustino <goncalo.santos@wellhub.com> Made-with: Cursor
Replace the full-blob ui.nginxConfig and ui.supervisordConfig values with a single top-level ipv6.enabled flag that controls all IPv6/ dual-stack configuration through targeted Helm templating. When ipv6.enabled is true: - nginx adds `listen [::]:8080;` for dual-stack - supervisord sets HOSTNAME="::" for Next.js - controller sets DEFAULT_AGENT_BIND_HOST="::" for agent pods The nginx and supervisord ConfigMaps are now always rendered as Helm templates (instead of conditionally from user-provided blobs), with the IPv6-specific lines injected only when the flag is set. This is cleaner, safer, and impossible to misconfigure compared to requiring users to copy-paste entire config files. Signed-off-by: Goncalo Santos <goncalo.santos@gympass.com> Signed-off-by: Gonçalo Faustino <goncalo.santos@wellhub.com> Made-with: Cursor
Move the inline nginx.conf and supervisord.conf content from the ConfigMap templates into helm/kagent/files/ and load them via tpl(.Files.Get ...). This keeps the config files readable and maintainable as standalone files while still supporting the ipv6.enabled templating. Signed-off-by: Goncalo Santos <goncalo.santos@gympass.com> Signed-off-by: Gonçalo Faustino <goncalo.santos@wellhub.com> Made-with: Cursor
2ab56d2 to
ae2f306
Compare
EItanya
left a comment
There was a problem hiding this comment.
Can you please delete the old .conf files, and address the one nit, then we'll be good to go!
There was a problem hiding this comment.
Can you combine the 2 configmaps, I don't think we need 2
There was a problem hiding this comment.
Done — merged both into a single ui-config ConfigMap with nginx.conf and supervisord.conf as data keys. The deployment now references one volume source with two subPath mounts.
…figMap Combine the two separate ConfigMaps (ui-nginx and ui-supervisord) into a single ui-config ConfigMap with both nginx.conf and supervisord.conf as data keys. This simplifies the deployment by reducing the number of ConfigMap resources while keeping the same volume mount behavior. Signed-off-by: Gonçalo Faustino <goncalo.santos@wellhub.com> Made-with: Cursor
Remove ui/conf/nginx.conf and ui/conf/supervisord.conf and their corresponding COPY lines from the Dockerfile. These configs are now always provided at runtime via the Helm-managed ui-config ConfigMap, making the baked-in copies redundant. Note: the container now requires nginx.conf and supervisord.conf to be mounted externally (the Helm chart handles this automatically). Running the image standalone without mounting these files will require providing them via volume mounts. Signed-off-by: Gonçalo Faustino <goncalo.santos@wellhub.com> Made-with: Cursor
|
@EItanya Both addressed — pushed two commits:
All 158 helm unit tests pass. |
Summary
On IPv6-first Kubernetes clusters (e.g., EKS with IPv6 pod networking), kagent deployments fail health checks because:
0.0.0.0(IPv4 only), so probes sent to the pod's IPv6 address getconnection refusedlisten 8080(IPv4), so it is unreachable on its IPv6 pod addresshttpGetwhich kubelet sends to the pod IP — on IPv6-first clusters this failshost + ":" + port, producing invalid addresses for IPv6 literals (e.g.,:::8080instead of[::]:8080)Changes
1. Fix A2A server address construction (
go/adk/pkg/a2a/server/server.go):config.Host + ":" + config.Portwithnet.JoinHostPort(config.Host, config.Port)[::]:8080), which is required fornet.Listento parse the address correctly--host ::on agent pods would produce:::8080and fail to start2. Configurable agent bind host (
controller.agentDeployment.host):DEFAULT_AGENT_BIND_HOST) → CLI flag (--default-agent-bind-host) →resolveInlineDeployment()"0.0.0.0"— fully backward compatible"::"for dual-stack (IPv4 + IPv6) support on agent deployments managed by the controller3. Overridable nginx config (
ui.nginxConfig):/etc/nginx/nginx.conflisten [::]:8080;for dual-stack without modifying the container image4. Configurable UI probes (
ui.startupProbe/ui.readinessProbe):httpGetprobes are rendered — no change for existing usersexec-based probes if neededDesign note: why
ui.nginxConfiginstead of modifyingui/conf/nginx.confdirectlyThe most obvious fix for the UI's IPv6 issue would be to add
listen [::]:8080;directly to the baked-innginx.conf. However, this would break clusters where IPv6 is disabled at the kernel level (ipv6.disable=1boot param ornet.ipv6.conf.all.disable_ipv6=1sysctl). In those environments, nginx would fail to start with:Since we can't know at image build time whether the target cluster supports IPv6, the safe approach is to keep the default
nginx.confIPv4-only and let users opt in to dual-stack viaui.nginxConfig.That said, if the project considers IPv6-disabled kernels out of scope (they're increasingly rare in modern Kubernetes), adding
listen [::]:8080;directly to the default config would be simpler. Happy to adjust based on project preference.Files changed
go/adk/pkg/a2a/server/server.gonet.JoinHostPortfor proper IPv6 address formattinggo/core/internal/controller/translator/agent/adk_api_translator.goDefaultAgentBindHostvariablego/core/internal/controller/translator/agent/deployments.goDefaultAgentBindHostinstead of hardcoded"0.0.0.0"go/core/pkg/app/app.go--default-agent-bind-hostCLI flaghelm/kagent/templates/controller-configmap.yamlDEFAULT_AGENT_BIND_HOSThelm/kagent/templates/ui-deployment.yamlhelm/kagent/templates/ui-nginx-configmap.yamlhelm/kagent/values.yamlcontroller.agentDeployment.host,ui.startupProbe,ui.readinessProbe,ui.nginxConfighelm/kagent/tests/controller-deployment_test.yamlhelm/kagent/tests/ui-deployment_test.yamlTesting
go test ./internal/controller/translator/agent/...,go test ./pkg/app/...)go build ./...passes for bothgo/coreandgo/adkUsage example
Fixes #1642