Surfacing 5 bugs we hit while running agents-cli inside the benchmark suite at ivanmkc/agent-generator (full triage at issue #574). Each block has the verbatim error / observed behaviour, the case that hit it, and a suggested fix direction.
Filing as one issue because they share an environment (CI / sandboxed CLI invocation) and a team. Happy to split into individual issues if that's easier to triage.
1. agents-cli deploy doesn't auto-detect GCP project from ADC
Verbatim error (captured in claude_opus:CLI-DEV-INCIDENT-RESPONSE-001-EXP transcript, event 177):
Error: Could not determine GCP project. Pass --project <PROJECT_ID> or set a default: gcloud config set project <PROJECT_ID>
Setup: ADC was valid (gcloud auth application-default print-access-token succeeded), GOOGLE_CLOUD_PROJECT env var was set, but agents-cli deploy failed without an explicit --project flag.
Suggested fix: fall back to google.auth.default() to read the project from ADC (or os.environ['GOOGLE_CLOUD_PROJECT']) when --project is missing.
2. agents-cli infra cicd doesn't auto-detect $GH_TOKEN from env
Observed: when running agents-cli infra cicd without --github-pat, the agent prompts for input. Even with GH_TOKEN (and GITHUB_TOKEN) set in the environment and gh auth status succeeding, the CLI doesn't pick those up.
Setup: cases that drive infra cicd from a non-interactive runner can't satisfy the prompt. We worked around this in benchmark prompts by passing --github-pat "$GH_TOKEN" --github-app-installation-id "$GITHUB_APP_INSTALLATION_ID" explicitly, but auto-detection would be more ergonomic.
Suggested fix: when --github-pat is not supplied, read GH_TOKEN then GITHUB_TOKEN from env (matching the gh CLI's own precedence).
3. agents-cli deploy to agent_runtime doesn't poll long-running ops
Observed (gemini_pro:CLI-DEV-SELF-TUNING-SUPPORT-001-EXP, event 144):
agents-cli deploy --project adk-coding-agents --region global
# returns 0 after a few seconds
Workspace deployment_metadata.json after the call:
{
"remote_agent_runtime_id": "None",
"pending_operation": {
"operation_name": "projects/.../reasoningEngines/.../operations/...",
"started_at": "..."
}
}
The deploy started the Vertex op but didn't wait for it to complete. remote_agent_runtime_id is the string "None" (which downstream code then can't use).
Suggested fix: expose a --wait flag (or make polling-to-completion the default) that polls the operation, blocks until done, and writes the real remote_agent_runtime_id.
4. agents-cli deploy to cloud_run requires gcloud beta but doesn't check
Observed (gemini_pro:CLI-DEV-INCIDENT-RESPONSE-001-EXP, agent reasoning at event 86):
"I'm encountering an issue where the deployment failed because gcloud beta isn't installed or requires sudo."
The deploy succeeded for everyone with gcloud beta pre-installed but silently failed in our sandbox where it wasn't installed by default.
Suggested fix: at the start of deploy --target cloud_run, run gcloud components list --filter='id:beta' and emit an actionable error like "agents-cli deploy --target cloud_run requires gcloud beta. Install via gcloud components install beta."
5. agents-cli eval run records empty actual_tools_called even when tools were called
Observed (both INCIDENT-RESPONSE cases, claude_opus + gemini_pro):
The eval framework records actual_tools_called: [] for all eval cases even though the agent transcripts clearly show the agent calling the relevant tools during the original conversation. Both cases worked around this by removing tool_trajectory_avg_score from eval_config.json to make the eval "pass" — defeating the metric.
agent-project/app/.adk/eval_history/*.json contains the empty arrays.
Suggested fix: trace through where actual_tools_called gets populated in agents-cli eval (might be an adk eval issue upstream). Likely either replay-vs-live mode mismatch or schema-name drift between the evalset format and the runner's expectation.
6. agents-cli data-ingestion: uv venv excludes pip by default, breaks KFP/swifter
Observed (gemini_pro:CLI-DEV-INSTITUTIONAL-MEMORY-001-EXP, events 61-82):
agents-cli data-ingestion triggers a uv venv setup that doesn't include pip. Downstream code (KFP / swifter / dask) tries to import pip and crashes. Agent had to patch pyproject.toml and inject a sitecustomize.py to silence psutil.NoSuchProcess errors.
Suggested fix: either include pip in the uv venv invocation (uv venv --seed or uv venv --python-preference managed --seed), or document the dependency requirement clearly so callers know to opt in.
Context
All 6 bugs surfaced via the agents-cli-golden benchmark suite in ivanmkc/agent-generator, which we use to track agent CLI evaluation. We've worked around each one in case prompts / sandbox config, but the workarounds are brittle and a fix in agents-cli would benefit any non-interactive runner. Happy to share full trace transcripts if useful (each is a gzipped JSON in our viewer).
Surfacing 5 bugs we hit while running
agents-cliinside the benchmark suite ativanmkc/agent-generator(full triage at issue #574). Each block has the verbatim error / observed behaviour, the case that hit it, and a suggested fix direction.Filing as one issue because they share an environment (CI / sandboxed CLI invocation) and a team. Happy to split into individual issues if that's easier to triage.
1.
agents-cli deploydoesn't auto-detect GCP project from ADCVerbatim error (captured in
claude_opus:CLI-DEV-INCIDENT-RESPONSE-001-EXPtranscript, event 177):Setup: ADC was valid (
gcloud auth application-default print-access-tokensucceeded),GOOGLE_CLOUD_PROJECTenv var was set, butagents-cli deployfailed without an explicit--projectflag.Suggested fix: fall back to
google.auth.default()to read the project from ADC (oros.environ['GOOGLE_CLOUD_PROJECT']) when--projectis missing.2.
agents-cli infra cicddoesn't auto-detect$GH_TOKENfrom envObserved: when running
agents-cli infra cicdwithout--github-pat, the agent prompts for input. Even withGH_TOKEN(andGITHUB_TOKEN) set in the environment andgh auth statussucceeding, the CLI doesn't pick those up.Setup: cases that drive
infra cicdfrom a non-interactive runner can't satisfy the prompt. We worked around this in benchmark prompts by passing--github-pat "$GH_TOKEN" --github-app-installation-id "$GITHUB_APP_INSTALLATION_ID"explicitly, but auto-detection would be more ergonomic.Suggested fix: when
--github-patis not supplied, readGH_TOKENthenGITHUB_TOKENfrom env (matching theghCLI's own precedence).3.
agents-cli deploytoagent_runtimedoesn't poll long-running opsObserved (gemini_pro:
CLI-DEV-SELF-TUNING-SUPPORT-001-EXP, event 144):Workspace
deployment_metadata.jsonafter the call:{ "remote_agent_runtime_id": "None", "pending_operation": { "operation_name": "projects/.../reasoningEngines/.../operations/...", "started_at": "..." } }The deploy started the Vertex op but didn't wait for it to complete.
remote_agent_runtime_idis the string"None"(which downstream code then can't use).Suggested fix: expose a
--waitflag (or make polling-to-completion the default) that polls the operation, blocks until done, and writes the realremote_agent_runtime_id.4.
agents-cli deploytocloud_runrequiresgcloud betabut doesn't checkObserved (gemini_pro:
CLI-DEV-INCIDENT-RESPONSE-001-EXP, agent reasoning at event 86):The deploy succeeded for everyone with
gcloud betapre-installed but silently failed in our sandbox where it wasn't installed by default.Suggested fix: at the start of
deploy --target cloud_run, rungcloud components list --filter='id:beta'and emit an actionable error like "agents-cli deploy --target cloud_run requiresgcloud beta. Install viagcloud components install beta."5.
agents-cli eval runrecords emptyactual_tools_calledeven when tools were calledObserved (both INCIDENT-RESPONSE cases, claude_opus + gemini_pro):
The eval framework records
actual_tools_called: []for all eval cases even though the agent transcripts clearly show the agent calling the relevant tools during the original conversation. Both cases worked around this by removingtool_trajectory_avg_scorefromeval_config.jsonto make the eval "pass" — defeating the metric.agent-project/app/.adk/eval_history/*.jsoncontains the empty arrays.Suggested fix: trace through where
actual_tools_calledgets populated inagents-cli eval(might be anadk evalissue upstream). Likely either replay-vs-live mode mismatch or schema-name drift between the evalset format and the runner's expectation.6.
agents-cli data-ingestion:uv venvexcludespipby default, breaks KFP/swifterObserved (gemini_pro:
CLI-DEV-INSTITUTIONAL-MEMORY-001-EXP, events 61-82):agents-cli data-ingestiontriggers auv venvsetup that doesn't includepip. Downstream code (KFP /swifter/dask) tries to importpipand crashes. Agent had to patchpyproject.tomland inject asitecustomize.pyto silencepsutil.NoSuchProcesserrors.Suggested fix: either include
pipin theuv venvinvocation (uv venv --seedoruv venv --python-preference managed --seed), or document the dependency requirement clearly so callers know to opt in.Context
All 6 bugs surfaced via the
agents-cli-goldenbenchmark suite in ivanmkc/agent-generator, which we use to track agent CLI evaluation. We've worked around each one in case prompts / sandbox config, but the workarounds are brittle and a fix inagents-cliwould benefit any non-interactive runner. Happy to share full trace transcripts if useful (each is a gzipped JSON in our viewer).