Daily exploratory testing (run §24549261520) found multiple issues across the audit, logs, and compile MCP tools. The most critical is in the CLI bridge's handling of integer parameters.
Critical: Integer params passed as strings (`logs` tool)
The mcp_cli_bridge.cjs parseToolArgs function stores all CLI --key value pairs as strings. The MCP server validates against a JSON schema that expects count and max_tokens to be integers, causing every call with these params to fail.
Reproduction:
agenticworkflows logs --start_date "-1d" --count 3
# → validating "arguments": validating root: validating /properties/count: type: 3 has type "string", want "integer"
agenticworkflows logs --start_date "-1d" --count 3 --max_tokens 3000
# → validating "arguments": validating root: validating /properties/max_tokens: type: 3000 has type "string", want "integer"
Root cause: parseToolArgs in mcp_cli_bridge.cjs (line ~461):
result[raw] = args[i + 1]; // always string, no type coercion
Fix needed: Use the tool's JSON schema (when available) to coerce numeric string values to integers/numbers before calling mcpToolsCall. When the tools file is empty (as it currently is for agenticworkflows), fall back to trying parseInt/parseFloat for values that look numeric.
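A minimal sketch of the suggested coercion, assuming a hypothetical coerceArgs helper and a standard JSON-schema `properties` shape (neither is the bridge's actual code):

```javascript
// Hypothetical helper: coerce string CLI values using the tool's JSON schema.
// When no schema is available, fall back to best-effort numeric parsing.
function coerceArgs(rawArgs, schema) {
  const props = (schema && schema.properties) || {};
  const out = {};
  for (const [key, value] of Object.entries(rawArgs)) {
    const type = props[key] && props[key].type;
    if (type === 'integer') {
      const n = parseInt(value, 10);
      out[key] = Number.isNaN(n) ? value : n;
    } else if (type === 'number') {
      const n = parseFloat(value);
      out[key] = Number.isNaN(n) ? value : n;
    } else if (!type && /^-?\d+(\.\d+)?$/.test(value)) {
      // No schema entry: only coerce values that look strictly numeric,
      // so date shorthands like "-1d" stay strings.
      out[key] = Number(value);
    } else {
      out[key] = value;
    }
  }
  return out;
}
```

With this in place, `--count 3` would reach mcpToolsCall as the integer 3 whenever the schema marks count as an integer, while values like "-1d" pass through untouched.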
Impact: The count and max_tokens parameters are completely unusable from the CLI, making it impossible to limit result counts or token consumption. This is a workflow testing/automation blocker.
Additional Issues Found During Testing
Major: `/tmp/gh-aw/logs-cache/` files not readable (permission denied)
The logs tool writes results to /tmp/gh-aw/logs-cache/*.json but those files are created with permissions that prevent the agent from reading them.
ls -la /tmp/gh-aw/logs-cache/
# → ls: cannot open directory '/tmp/gh-aw/logs-cache/': Permission denied
The tool returns a file_path pointing to this cache, but users cannot read it. Additionally, a future/out-of-range date (2030-01-01) and a far-past date range (2020-01-01 to 2020-12-31) both return the same cache file hash, suggesting a cache key collision when both queries produce empty results.
Minor: `compile` tool rejects workflow-specific filtering params
The compile tool does not accept any parameter to filter compilation to a single workflow — both workflow_name and workflow are rejected as "unexpected additional properties":
agenticworkflows compile --workflow_name "agent-performance-analyzer"
# → validating "arguments": validating root: unexpected additional properties ["workflow_name"]
agenticworkflows compile --workflow "agent-performance-analyzer"
# → validating "arguments": validating root: unexpected additional properties ["workflow"]
Only bulk compilation of all 194 workflows is possible. A workflow_name (or workflow) filter param would significantly speed up targeted testing.
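One possible shape for such a filter, sketched as a hypothetical extension of the compile tool's input schema (the real schema may differ):

```javascript
// Hypothetical: add an optional workflow_name property to the compile tool's
// input schema so a single workflow can be targeted without triggering the
// "unexpected additional properties" validation error.
const compileSchema = {
  type: 'object',
  properties: {
    workflow_name: {
      type: 'string',
      description: 'If set, compile only the named workflow',
    },
  },
  additionalProperties: false,
};
```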
Minor: `audit` does not set `isError: true` on failure responses
When auditing an invalid or non-existent run ID, the response contains an error key in the JSON body but isError is not set to true in the MCP result envelope. Clients that rely on isError for error detection will silently treat errors as successful results.
agenticworkflows audit --run_id_or_url "999999999"
# Response: isError=None, body: {"error":"failed to audit workflow run: ✗ failed to fetch run metadata","run_id_or_url":"999999999"}
Expected: "isError": true should be set in the MCP result when an error occurs.
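A sketch of the expected behavior, assuming a hypothetical toMcpResult wrapper that inspects the JSON body before building the MCP result envelope:

```javascript
// Hypothetical wrapper: mark the MCP result as an error whenever the tool's
// JSON body carries an "error" key, so clients relying on isError see it.
function toMcpResult(body) {
  const isError = typeof body === 'object' && body !== null && 'error' in body;
  return {
    isError,
    content: [{ type: 'text', text: JSON.stringify(body) }],
  };
}
```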
Minor: `audit` `error_count` inconsistency between summary.json and report
For run 24549148853 (Design Decision Gate, conclusion: failure):
- summary.json reports error_count: 1
- The audit report metrics.error_count shows 0
The audit report still correctly identifies the failure via key_findings, but the metrics.error_count field is inconsistent with the raw summary data.
What Worked Well
- ✅ status tool returns all 194 workflows with correct metadata
- ✅ logs tool works with string params (engine, workflow_name, start_date, end_date)
- ✅ compile (no params) compiles all 194 workflows: 0 errors, 0 warnings
- ✅ audit returns rich structured reports (overview, metrics, key_findings, recommendations, jobs, behavior_fingerprint)
- ✅ Non-existent workflow in logs returns graceful empty result (total_runs: 0)
- ✅ Failed run audit correctly identifies failure with key findings and recommendations
- ✅ audit performance on cached runs: ~150ms
Environment
- Repository: github/gh-aw
- Run ID: §24549261520
- Date: 2026-04-17
- Workflows tested: 194 compiled, 4 audited, 6 log filter variants
Generated by Daily CLI Tools Exploratory Tester