feat: add per-agent A2A client timeout configuration#1615
feat: add per-agent A2A client timeout configuration#1615ko3n1g wants to merge 2 commits intokagent-dev:mainfrom
Conversation
Agents performing expensive LLM work (e.g. log analysis with multi-chunk reads) can exceed the default 600s A2A client read timeout, causing cascading A2AClientTimeoutError → JSON-RPC -32603 Internal Error. Add an optional `timeout` field (seconds, float64) on TypedReference so operators can set longer timeouts for slow agent-to-agent calls. The value is propagated through RemoteAgentConfig to the ADK runtime. The CRD schema is updated manually; re-run `make generate manifests` to regenerate it from the kubebuilder markers after this lands. Signed-off-by: oliver könig <okoenig@nvidia.com>
There was a problem hiding this comment.
Pull request overview
This PR adds per-agent A2A client timeout configuration, allowing operators to specify a custom timeout for agent-to-agent calls directly in the Agent CRD. The timeout is meant to flow from the CRD through the translator into RemoteAgentConfig for use by the ADK runtime.
Changes:
- Add
Timeout *float64field toTypedReferencestruct in go/api/v1alpha2/agent_types.go - Add
Timeout *float64field toRemoteAgentConfigstruct in go/api/adk/types.go - Propagate timeout from tool.Agent to RemoteAgentConfig in the translator
- Add timeout field to CRD schema in helm/kagent-crds/templates/kagent.dev_agents.yaml
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| go/api/v1alpha2/agent_types.go | Adds Timeout field to TypedReference with proper documentation |
| go/api/adk/types.go | Adds Timeout field to RemoteAgentConfig as *float64 |
| go/core/internal/controller/translator/agent/adk_api_translator.go | Correctly propagates timeout from TypedReference to RemoteAgentConfig |
| helm/kagent-crds/templates/kagent.dev_agents.yaml | Updates CRD schema with timeout field (though schema is incomplete - missing core TypedReference fields) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| timeout: | ||
| description: A2A client read timeout in seconds for calls | ||
| to this agent. Defaults to 600s if unset. | ||
| format: double | ||
| type: number |
There was a problem hiding this comment.
The CRD schema for the agent tool reference does not match the Go struct definition. The CRD shows only "ref" and "timeout" fields, but the Go TypedReference struct has "kind", "apiGroup", "name", "namespace", and "timeout" fields. The code at line 723 uses tool.Agent.NamespacedName() which expects "name" and "namespace" fields separately. Users following the CRD schema would specify only a "ref" string, which would be incompatible with the Go struct deserialization. The CRD should be regenerated with make generate manifests to match the actual struct definition.
| if tool.Agent.Timeout != nil { | ||
| remoteAgent.Timeout = tool.Agent.Timeout | ||
| } | ||
| cfg.RemoteAgents = append(cfg.RemoteAgents, remoteAgent) |
There was a problem hiding this comment.
The new timeout field for agent tools lacks test coverage. There are existing tests for agent tool translation (e.g., Test_AdkApiTranslator_CrossNamespaceAgentTool), but none verify that the timeout value is correctly propagated from the Tool.Agent reference to the RemoteAgentConfig. Consider adding test cases to verify timeout handling, including when timeout is set and when it's nil.
| // Timeout specifies the A2A client read timeout in seconds for calls to this agent. | ||
| // Defaults to 600s if unset. | ||
| // +optional | ||
| Timeout *float64 `json:"timeout,omitempty"` |
There was a problem hiding this comment.
The TypedReference struct has been modified with a new Timeout field, but the generated deepcopy code in zz_generated.deepcopy.go was not regenerated. While the current simple *out = *in assignment works correctly for this case (all fields are scalar types), it's important that make generate is run to properly update all generated code. The PR description mentions controller-gen wasn't available during development, but this must be done before merging.
| if tool.Agent.Timeout != nil { | ||
| remoteAgent.Timeout = tool.Agent.Timeout | ||
| } | ||
| cfg.RemoteAgents = append(cfg.RemoteAgents, remoteAgent) |
There was a problem hiding this comment.
The timeout field is being correctly propagated to RemoteAgentConfig by the translator (line 755). However, note that this timeout value must still be consumed by the ADK runtime when creating the HTTP client for the remote agent tool (in go/adk/pkg/agent/agent.go around line 60, where NewKAgentRemoteA2ATool is called with nil for httpClient). If the timeout is not consumed by the runtime, the configuration will have no effect.
- Create a custom http.Client with Timeout set when RemoteAgentConfig.Timeout is non-nil, so the per-agent A2A read timeout actually takes effect at the transport layer (previously the value was stored but never used). - Add Test_AdkApiTranslator_AgentToolTimeout covering both the set and nil timeout cases to verify propagation from TypedReference → RemoteAgentConfig. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: oliver könig <okoenig@nvidia.com>
ko3n1g
left a comment
There was a problem hiding this comment.
Thank you for the thorough review. Here is the response to each comment, with two real fixes:
Comment 1 — CRD schema mismatch (ref vs name/namespace/kind/apiGroup)
The ref field in the CRD is pre-existing — it was present before this PR (see HEAD~1). This PR only adds timeout to the already-existing agent tool schema block. The ref-vs-struct mismatch predates this change and is out of scope here. make generate manifests (which CI will run) is the correct fix for the full schema; the PR description already calls this out explicitly.
Comment 2 — Missing test coverage
Fixed. Added Test_AdkApiTranslator_AgentToolTimeout with two sub-cases:
timeoutset →RemoteAgentConfig.Timeoutis non-nil and equals the input valuetimeoutnil →RemoteAgentConfig.Timeoutis nil
Comment 3 — zz_generated.deepcopy.go not regenerated
The current *out = *in in the generated deepcopy is correct for *float64 (a pointer to a scalar — the pointer itself is copied by value, which is the right shallow-copy semantics for immutable scalars). No functional issue exists. make generate must still run before merge to keep the generated file in sync; this is already called out in the PR description.
Comment 4 — Timeout not consumed by ADK runtime
Fixed in go/adk/pkg/agent/agent.go. When remoteAgent.Timeout is non-nil, a custom http.Client is now constructed with Timeout set accordingly and passed to NewKAgentRemoteA2ATool instead of nil. This was a real no-op bug.
var httpClient *http.Client
if remoteAgent.Timeout != nil {
httpClient = &http.Client{Timeout: time.Duration(*remoteAgent.Timeout * float64(time.Second))}
}
remoteTool, sessionID, err := tools.NewKAgentRemoteA2ATool(remoteAgent.Name, remoteAgent.Description, remoteAgent.Url, httpClient, remoteAgent.Headers)
Problem
Agents that perform expensive LLM work — such as log analysis with multi-chunk reads — can exceed the default 600 s A2A client read timeout. When this happens, the call site receives an
A2AClientTimeoutErrorwhich the JSON-RPC layer surfaces as a-32603 Internal Error, making the root cause hard to diagnose and impossible to work around without code changes.Solution
Add an optional
timeoutfield (float64, seconds) toTypedReferenceso operators can configure a per-agent A2A read timeout directly in the Agent CRD:The value flows from the CRD through the translator into
RemoteAgentConfig, where the ADK runtime uses it as the HTTP client read deadline for that specific agent-to-agent call. Agents without an explicittimeoutcontinue to use the existing default (600 s).Changed files
go/api/v1alpha2/agent_types.goTimeout *float64toTypedReferencego/api/adk/types.goTimeout *float64toRemoteAgentConfiggo/core/internal/controller/translator/agent/adk_api_translator.gotool.Agent.Timeoutinto the builtRemoteAgentConfighelm/kagent-crds/templates/kagent.dev_agents.yamltimeoutin CRD schema (manual update; re-runmake generate manifeststo regenerate from kubebuilder markers)Testing
cd go && go build ./...passes cleanly (verified viagolang:1.26-alpineDocker image).🤖 Generated with Claude Code