[cli-tools-test] MCP CLI bridge passes integer parameters as strings, breaking count and max_tokens in logs tool #26824

@github-actions

Description

Daily exploratory testing (run §24549261520) found multiple issues across the audit, logs, and compile MCP tools. The most critical is in the CLI bridge integer type handling.

Critical: Integer params passed as strings (logs tool)

The parseToolArgs function in mcp_cli_bridge.cjs stores every CLI --key value pair as a string. The MCP server validates arguments against a JSON schema that expects count and max_tokens to be integers, so every call that passes either parameter fails validation.

Reproduction:

agenticworkflows logs --start_date "-1d" --count 3
# → validating "arguments": validating root: validating /properties/count: type: 3 has type "string", want "integer"

agenticworkflows logs --start_date "-1d" --count 3 --max_tokens 3000
# → validating "arguments": validating root: validating /properties/max_tokens: type: 3000 has type "string", want "integer"

Root cause: parseToolArgs in mcp_cli_bridge.cjs (line ~461):

result[raw] = args[i + 1]; // always string, no type coercion

Fix needed: Use the tool's JSON schema (when available) to coerce numeric string values to integers/numbers before calling mcpToolsCall. When the tools file is empty (as it currently is for agenticworkflows), fall back to trying parseInt/parseFloat for values that look numeric.
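The coercion described above could be sketched roughly as follows. This is a minimal sketch, not the actual bridge code: coerceToolArgs and coerceValue are hypothetical helper names, and the shape of inputSchema (a JSON Schema object with a properties map) is an assumption about what the tools file would provide when it is populated.

```javascript
// Hypothetical helpers; names and schema shape are assumptions, not bridge code.

// Coerce a single CLI value. schemaType is the JSON Schema "type" for the
// parameter, or undefined when no schema is available.
function coerceValue(value, schemaType) {
  if (typeof value !== "string") return value;
  const looksInt = /^-?\d+$/.test(value);
  const looksNum = /^-?\d+(\.\d+)?$/.test(value);

  if (schemaType === "integer") return looksInt ? Number.parseInt(value, 10) : value;
  if (schemaType === "number") return looksNum ? Number.parseFloat(value) : value;
  if (schemaType) return value; // schema asks for another type: leave the string alone

  // No schema for this key: fall back to "looks numeric" detection.
  if (looksInt) return Number.parseInt(value, 10);
  if (looksNum) return Number.parseFloat(value);
  return value;
}

// Apply coercion to the whole args object produced by the CLI parser.
function coerceToolArgs(rawArgs, inputSchema) {
  const props = (inputSchema && inputSchema.properties) || {};
  const out = {};
  for (const [key, value] of Object.entries(rawArgs)) {
    out[key] = coerceValue(value, props[key] && props[key].type);
  }
  return out;
}
```

One caveat with the schema-less fallback: it also converts all-digit values that a tool may expect as strings (an all-digit run ID, for example), so the schema-driven path is the safer one whenever a schema is available.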

Impact: The count and max_tokens parameters are completely unusable from the CLI, making it impossible to limit result counts or token consumption. This is a workflow testing/automation blocker.


Additional Issues Found During Testing

Major: `/tmp/gh-aw/logs-cache/` files not readable (permission denied)

The logs tool writes results to /tmp/gh-aw/logs-cache/*.json but those files are created with permissions that prevent the agent from reading them.

ls -la /tmp/gh-aw/logs-cache/
# → ls: cannot open directory '/tmp/gh-aw/logs-cache/': Permission denied

The tool returns a file_path pointing into this cache, but the caller cannot read it.

Separately, a query with a future, out-of-range date (2030-01-01) and a query with a far-past date range (2020-01-01 to 2020-12-31) both return the same cache file hash. This suggests a cache key collision when both queries produce empty results.
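How the logs tool derives its cache file hash was not inspected in this run. As an illustration only, a key built from the query parameters themselves would keep the two empty-result queries apart. The function name cacheKeyFor and the parameter names in the usage note are hypothetical.

```javascript
const crypto = require("node:crypto");

// Hypothetical sketch: derive a cache key from the query parameters rather
// than from the result, so distinct queries never share a cache file.
function cacheKeyFor(params) {
  // Sort keys so that parameter order does not change the key.
  const canonical = JSON.stringify(
    Object.keys(params).sort().map((key) => [key, params[key]])
  );
  return crypto.createHash("sha256").update(canonical).digest("hex").slice(0, 16);
}
```

With this approach, cacheKeyFor({ start_date: "2030-01-01" }) and cacheKeyFor({ start_date: "2020-01-01", end_date: "2020-12-31" }) yield different keys even though both queries return no runs.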

Minor: `compile` tool rejects workflow-specific filtering params

The compile tool does not accept any parameter that restricts compilation to a single workflow. Both workflow_name and workflow are rejected as "unexpected additional properties":

agenticworkflows compile --workflow_name "agent-performance-analyzer"
# → validating "arguments": validating root: unexpected additional properties ["workflow_name"]

agenticworkflows compile --workflow "agent-performance-analyzer"
# → validating "arguments": validating root: unexpected additional properties ["workflow"]

Only bulk compilation of all 194 workflows is possible. A workflow_name (or workflow) filter param would significantly speed up targeted testing.

Minor: `audit` does not set `isError: true` on failure responses

When auditing an invalid or non-existent run ID, the response contains an error key in the JSON body but isError is not set to true in the MCP result envelope. Clients that rely on isError for error detection will silently treat errors as successful results.

agenticworkflows audit --run_id_or_url "999999999"
# Response: isError=None, body: {"error":"failed to audit workflow run: ✗ failed to fetch run metadata","run_id_or_url":"999999999"}

Expected: "isError": true should be set in the MCP result when an error occurs.
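Until the server sets isError, a client could guard against this defensively. The sketch below assumes the audit body arrives as JSON inside a text content block of the MCP result; that was not verified in this run, and isToolFailure is a hypothetical helper name.

```javascript
// Hypothetical client-side guard: treat a result as failed if the envelope
// says so, or if any text content block parses to a JSON object with an
// "error" key.
function isToolFailure(result) {
  if (result && result.isError === true) return true;
  const blocks = result && Array.isArray(result.content) ? result.content : [];
  for (const block of blocks) {
    if (block.type !== "text" || typeof block.text !== "string") continue;
    try {
      const body = JSON.parse(block.text);
      if (body && typeof body === "object" && !Array.isArray(body) && "error" in body) {
        return true;
      }
    } catch {
      // Text block is not JSON; nothing to inspect.
    }
  }
  return false;
}
```

This is a workaround rather than a fix: setting isError in the server response remains the expected behavior, since body-sniffing can misfire on successful results that legitimately contain an error field.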

Minor: `audit` `error_count` inconsistency between summary.json and report

For run 24549148853 (Design Decision Gate, conclusion: failure):

  • summary.json reports error_count: 1
  • The audit report metrics.error_count shows 0

The audit report still correctly identifies the failure via key_findings, but the metrics.error_count field is inconsistent with the raw summary data.

What Worked Well

  • status tool returns all 194 workflows with correct metadata
  • logs tool works with string params (engine, workflow_name, start_date, end_date)
  • compile (no params) compiles all 194 workflows: 0 errors, 0 warnings
  • audit returns rich structured reports (overview, metrics, key_findings, recommendations, jobs, behavior_fingerprint)
  • ✅ Non-existent workflow in logs returns graceful empty result (total_runs: 0)
  • ✅ Failed run audit correctly identifies failure with key findings and recommendations
  • audit performance on cached runs: ~150ms

Environment

  • Repository: github/gh-aw
  • Run ID: §24549261520
  • Date: 2026-04-17
  • Workflows tested: 194 compiled, 4 audited, 6 log filter variants

Generated by Daily CLI Tools Exploratory Tester · ● 2.6M ·

  • expires on Apr 24, 2026, 5:39 AM UTC
