Skip to content

feat: add per-agent A2A client timeout configuration#1615

Open
ko3n1g wants to merge 2 commits intokagent-dev:mainfrom
ko3n1g:feat/a2a-agent-timeout
Open

feat: add per-agent A2A client timeout configuration#1615
ko3n1g wants to merge 2 commits intokagent-dev:mainfrom
ko3n1g:feat/a2a-agent-timeout

Conversation

@ko3n1g
Copy link
Copy Markdown

@ko3n1g ko3n1g commented Apr 2, 2026

Problem

Agents that perform expensive LLM work — such as log analysis with multi-chunk reads — can exceed the default 600 s A2A client read timeout. When this happens, the call site receives an A2AClientTimeoutError which the JSON-RPC layer surfaces as a -32603 Internal Error, making the root cause hard to diagnose and impossible to work around without code changes.

Solution

Add an optional timeout field (float64, seconds) to TypedReference so operators can configure a per-agent A2A read timeout directly in the Agent CRD:

tools:
  - type: Agent
    agent:
      ref: my-slow-analyst-agent
      timeout: 1800   # 30 minutes

The value flows from the CRD through the translator into RemoteAgentConfig, where the ADK runtime uses it as the HTTP client read deadline for that specific agent-to-agent call. Agents without an explicit timeout continue to use the existing default (600 s).

Changed files

File Change
go/api/v1alpha2/agent_types.go Add Timeout *float64 to TypedReference
go/api/adk/types.go Add Timeout *float64 to RemoteAgentConfig
go/core/internal/controller/translator/agent/adk_api_translator.go Propagate tool.Agent.Timeout into the built RemoteAgentConfig
helm/kagent-crds/templates/kagent.dev_agents.yaml Expose timeout in CRD schema (manual update; re-run make generate manifests to regenerate from kubebuilder markers)

Note: make generate manifests was not run as part of this PR because controller-gen is not available in the contributor environment. The CRD YAML has been updated manually to match the new field; CI's generate/manifests job should produce an identical result.

Testing

  • cd go && go build ./... passes cleanly (verified via golang:1.26-alpine Docker image).

🤖 Generated with Claude Code

Agents performing expensive LLM work (e.g. log analysis with multi-chunk
reads) can exceed the default 600s A2A client read timeout, causing
cascading A2AClientTimeoutError → JSON-RPC -32603 Internal Error.

Add an optional `timeout` field (seconds, float64) on TypedReference so
operators can set longer timeouts for slow agent-to-agent calls. The
value is propagated through RemoteAgentConfig to the ADK runtime.

The CRD schema is updated manually; re-run `make generate manifests` to
regenerate it from the kubebuilder markers after this lands.

Signed-off-by: oliver könig <okoenig@nvidia.com>
Copilot AI review requested due to automatic review settings April 2, 2026 14:14
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds per-agent A2A client timeout configuration, allowing operators to specify a custom timeout for agent-to-agent calls directly in the Agent CRD. The timeout is meant to flow from the CRD through the translator into RemoteAgentConfig for use by the ADK runtime.

Changes:

  • Add Timeout *float64 field to TypedReference struct in go/api/v1alpha2/agent_types.go
  • Add Timeout *float64 field to RemoteAgentConfig struct in go/api/adk/types.go
  • Propagate timeout from tool.Agent to RemoteAgentConfig in the translator
  • Add timeout field to CRD schema in helm/kagent-crds/templates/kagent.dev_agents.yaml

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
go/api/v1alpha2/agent_types.go Adds Timeout field to TypedReference with proper documentation
go/api/adk/types.go Adds Timeout field to RemoteAgentConfig as *float64
go/core/internal/controller/translator/agent/adk_api_translator.go Correctly propagates timeout from TypedReference to RemoteAgentConfig
helm/kagent-crds/templates/kagent.dev_agents.yaml Updates CRD schema with timeout field (though schema is incomplete - missing core TypedReference fields)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +2241 to +2245
timeout:
description: A2A client read timeout in seconds for calls
to this agent. Defaults to 600s if unset.
format: double
type: number
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CRD schema for the agent tool reference does not match the Go struct definition. The CRD shows only "ref" and "timeout" fields, but the Go TypedReference struct has "kind", "apiGroup", "name", "namespace", and "timeout" fields. The code at line 723 uses tool.Agent.NamespacedName() which expects "name" and "namespace" fields separately. Users following the CRD schema would specify only a "ref" string, which would be incompatible with the Go struct deserialization. The CRD should be regenerated with make generate manifests to match the actual struct definition.

Copilot uses AI. Check for mistakes.
Comment on lines +754 to +757
if tool.Agent.Timeout != nil {
remoteAgent.Timeout = tool.Agent.Timeout
}
cfg.RemoteAgents = append(cfg.RemoteAgents, remoteAgent)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new timeout field for agent tools lacks test coverage. There are existing tests for agent tool translation (e.g., Test_AdkApiTranslator_CrossNamespaceAgentTool), but none verify that the timeout value is correctly propagated from the Tool.Agent reference to the RemoteAgentConfig. Consider adding test cases to verify timeout handling, including when timeout is set and when it's nil.

Copilot uses AI. Check for mistakes.
Comment on lines +448 to +451
// Timeout specifies the A2A client read timeout in seconds for calls to this agent.
// Defaults to 600s if unset.
// +optional
Timeout *float64 `json:"timeout,omitempty"`
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TypedReference struct has been modified with a new Timeout field, but the generated deepcopy code in zz_generated.deepcopy.go was not regenerated. While the current simple *out = *in assignment works correctly for this case (all fields are scalar types), it's important that make generate is run to properly update all generated code. The PR description mentions controller-gen wasn't available during development, but this must be done before merging.

Copilot uses AI. Check for mistakes.
Comment on lines +754 to +757
if tool.Agent.Timeout != nil {
remoteAgent.Timeout = tool.Agent.Timeout
}
cfg.RemoteAgents = append(cfg.RemoteAgents, remoteAgent)
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout field is being correctly propagated to RemoteAgentConfig by the translator (line 755). However, note that this timeout value must still be consumed by the ADK runtime when creating the HTTP client for the remote agent tool (in go/adk/pkg/agent/agent.go around line 60, where NewKAgentRemoteA2ATool is called with nil for httpClient). If the timeout is not consumed by the runtime, the configuration will have no effect.

Copilot uses AI. Check for mistakes.
- Create a custom http.Client with Timeout set when RemoteAgentConfig.Timeout
  is non-nil, so the per-agent A2A read timeout actually takes effect at the
  transport layer (previously the value was stored but never used).
- Add Test_AdkApiTranslator_AgentToolTimeout covering both the set and nil
  timeout cases to verify propagation from TypedReference → RemoteAgentConfig.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Copy link
Copy Markdown
Author

@ko3n1g ko3n1g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the thorough review. Here is the response to each comment, with two real fixes:


Comment 1 — CRD schema mismatch (ref vs name/namespace/kind/apiGroup)

The ref field in the CRD is pre-existing — it was present before this PR (see HEAD~1). This PR only adds timeout to the already-existing agent tool schema block. The ref-vs-struct mismatch predates this change and is out of scope here. make generate manifests (which CI will run) is the correct fix for the full schema; the PR description already calls this out explicitly.


Comment 2 — Missing test coverage

Fixed. Added Test_AdkApiTranslator_AgentToolTimeout with two sub-cases:

  • timeout set → RemoteAgentConfig.Timeout is non-nil and equals the input value
  • timeout nil → RemoteAgentConfig.Timeout is nil

Comment 3 — zz_generated.deepcopy.go not regenerated

The current *out = *in in the generated deepcopy is correct for *float64 (a pointer to a scalar — the pointer itself is copied by value, which is the right shallow-copy semantics for immutable scalars). No functional issue exists. make generate must still run before merge to keep the generated file in sync; this is already called out in the PR description.


Comment 4 — Timeout not consumed by ADK runtime

Fixed in go/adk/pkg/agent/agent.go. When remoteAgent.Timeout is non-nil, a custom http.Client is now constructed with Timeout set accordingly and passed to NewKAgentRemoteA2ATool instead of nil. This was a real no-op bug.

var httpClient *http.Client
if remoteAgent.Timeout != nil {
    httpClient = &http.Client{Timeout: time.Duration(*remoteAgent.Timeout * float64(time.Second))}
}
remoteTool, sessionID, err := tools.NewKAgentRemoteA2ATool(remoteAgent.Name, remoteAgent.Description, remoteAgent.Url, httpClient, remoteAgent.Headers)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants