[audit-workflows] Agentic Workflow Audit — 2026-05-15 (last 24h) #32485

2026-05-15T21:46:10Z

github-actions[bot]
Bot May 15, 2026

Overview

This is the first run of the Agentic Workflow Audit Agent. It covers the 24-hour window ending 2026-05-15 21:33 UTC and seeded repo memory. The fleet executed 79 runs across 35 distinct workflows. Of the 70 runs that had concluded, 65 succeeded, 1 failed, and 4 were cancelled, for a 92.9% success rate; 2 more were still in progress. Cost was modest at $25.38 across 281.7M effective tokens, but firewall block pressure on the Visual Regression Checker and Linter Miner warrants attention. Because repo memory was bootstrapped from this audit, the trend charts will gain shape with each subsequent cycle.

Summary

Metric	Value
Total runs	79
Success rate	92.9% (65 / 70 with conclusion)
Failures	1 (Test Quality Sentinel)
Cancelled	4 (all Smoke CI — concurrency cancellation on PR pushes)
In-progress at snapshot	2
Total duration	8.5 h
Total action minutes	546
Total cost (USD)	$25.38
Total tokens / effective tokens	46.6M / 281.7M
Total errors / warnings	23 / 0
Missing tools / missing data	0 / 0
GitHub API calls	373
Firewall blocked / total	516 / 2,326 (22.2%)
DIFC integrity-filtered events	7
Engine mix	copilot 53, claude 11, codex 8

Status: ✅ Overall healthy fleet.
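The success rate above is computed over concluded runs (65 of 70), not over all 79 runs, and the block rate is blocked requests over total requests. Below is a minimal Python sketch of that arithmetic. It assumes a hypothetical `FleetCounts` record; the audit agent's own data model is not shown in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetCounts:
    """Run outcomes for one audit window (field names are illustrative)."""

    success: int
    failure: int
    cancelled: int
    in_progress: int


def success_rate(c: FleetCounts) -> float:
    """Share of successes among runs that reached a conclusion.

    In-progress runs have no conclusion yet, so they are excluded from the
    denominator instead of being counted as failures.
    """
    concluded = c.success + c.failure + c.cancelled
    return c.success / concluded if concluded else 0.0


def block_rate(blocked: int, total: int) -> float:
    """Fraction of egress requests that the firewall blocked."""
    return blocked / total if total else 0.0


counts = FleetCounts(success=65, failure=1, cancelled=4, in_progress=2)
print(f"success rate: {success_rate(counts):.1%}")  # 65 / 70 concluded runs
print(f"firewall block rate: {block_rate(516, 2326):.1%}")
```

Leaving in-progress runs out of the denominator keeps a snapshot taken mid-run from understating the success rate.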

Critical Issues

Test Quality Sentinel failure (§25934885936): the agent first called safeoutputs.add_comment with pull_request_number instead of the required body field, then retried with the correct schema and posted the report successfully. The workflow still concluded with failure, likely because the OTLP telemetry export received HTTP 401 Unauthorized and 404 Not Found responses during the run. It is worth checking whether the GH_AW_OTEL_* credentials are stale, and whether the prompt template should pre-format the comment body.
Visual Regression Checker has the highest block rate in the fleet: 21 of 33 requests blocked (64%), including accounts.google.com, www.google.com, clients2.google.com, and safebrowsingohttpgateway.googleapis.com. These look like Chromium safe-browsing and sign-in beacons rather than traffic the tests need, but they add noise to the firewall log.
Linter Miner generated 127 firewall blocks against (unknown) destinations in a single run, by far the largest absolute block count today. It is worth checking whether the linter is fetching cached binaries from hosts missing from the egress allowlist; proxy.golang.org, sum.golang.org, and storage.googleapis.com already appear in allowed traffic, so the blocked destinations are something else.
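One way to quiet the Visual Regression Checker beacons is to launch Chromium with its background services switched off. The sketch below assumes the test browser accepts extra command-line arguments (for example through Playwright's `chromium.launch(args=...)`); the workflow's actual launcher is not shown in this audit. The switches are standard Chromium flags, but whether they suppress every host listed above has not been verified here.

```python
from __future__ import annotations

# Standard Chromium command-line switches that turn off background services
# such as safe-browsing list updates, sync, and component updates. Which of
# these the Visual Regression Checker already passes is unknown.
QUIET_CHROMIUM_SWITCHES = (
    "--disable-background-networking",
    "--safebrowsing-disable-auto-update",
    "--disable-sync",
    "--disable-component-update",
    "--no-first-run",
    "--no-default-browser-check",
)


def quiet_launch_args(existing: list[str] | None = None) -> list[str]:
    """Merge the quiet switches into an existing argument list without duplicates."""
    args = list(existing or [])
    for switch in QUIET_CHROMIUM_SWITCHES:
        if switch not in args:
            args.append(switch)
    return args


if __name__ == "__main__":
    # Hypothetical usage with Playwright:
    #   browser = playwright.chromium.launch(args=quiet_launch_args())
    print(quiet_launch_args(["--headless"]))
```

Disabling the services at the browser keeps the egress allowlist narrow, which is the alternative to allowlisting the Google endpoints.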

Workflow Health Trend

This is the first data point; subsequent runs will populate the trend line. Today the fleet recorded 65 successful, 1 failed, and 4 cancelled runs, for a 92.9% success rate. Future audits will make it possible to spot drift in this rate and flag regression episodes.

Token & Cost Trend

Today: 46.6M raw tokens (281.7M effective with cache amplification) at $25.38 — averaging about $0.32 per run. The 7-day moving average becomes meaningful after a week of data; for now this is the baseline.
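The per-run figure is total cost divided by total runs. The sketch below reproduces that arithmetic and the ratio of effective to raw tokens. How the platform defines "effective" tokens is not stated in this report, so the ratio is only descriptive.

```python
def per_run_cost(total_usd: float, runs: int) -> float:
    """Average spend per run in the audit window."""
    return total_usd / runs if runs else 0.0


def effective_to_raw_ratio(raw_tokens: float, effective_tokens: float) -> float:
    """Effective tokens counted per raw token (the weighting itself is not shown here)."""
    return effective_tokens / raw_tokens if raw_tokens else 0.0


print(f"${per_run_cost(25.38, 79):.2f} per run")
print(f"{effective_to_raw_ratio(46.6e6, 281.7e6):.1f}x effective per raw token")
```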

Top duration consumers

Workflow	Runs	Total min	Avg min
AI Moderator	7	89.7	12.8
Smoke CI	22	63.9	2.9
PR Sous Chef	7	54.6	7.8
Linter Miner	1	23.5	23.5
Release	1	19.2	19.2
Documentation Unbloat	1	18.7	18.7
Smoke OTEL Backends	3	17.7	5.9
Chaos PR Bundle Fuzzer	2	15.1	7.5
Contribution Check	2	13.7	6.8
Daily Caveman Optimizer	1	13.1	13.1

AI Moderator dominates wall-clock time. PR Sous Chef ranged from 8 to 45 turns per run (average 20), which the platform flagged as execution drift; if this persists, prompt stability is worth reviewing.
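The platform's drift classifier is not described in this report. As a stand-in, the sketch below uses a simple dispersion ratio: the range of turn counts divided by the mean. The `TurnStats` record and the 1.0 threshold are hypothetical. With PR Sous Chef's figures, (45 − 8) / 20 gives 1.85.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnStats:
    """Per-workflow turn-count summary for one audit window (hypothetical record)."""

    min_turns: int
    max_turns: int
    mean_turns: float


def relative_spread(s: TurnStats) -> float:
    """Range of turn counts divided by the mean turn count."""
    return (s.max_turns - s.min_turns) / s.mean_turns if s.mean_turns else 0.0


def looks_drifty(s: TurnStats, threshold: float = 1.0) -> bool:
    """Flag a workflow whose turn range exceeds `threshold` times its mean.

    The 1.0 default is an illustrative cut-off, not the platform's rule.
    """
    return relative_spread(s) > threshold


sous_chef = TurnStats(min_turns=8, max_turns=45, mean_turns=20.0)
print(f"{relative_spread(sous_chef):.2f}", looks_drifty(sous_chef))
```

A range-over-mean ratio needs only the summary the audit already reports. With per-run turn counts, a standard deviation would be less sensitive to a single outlier run.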

Top firewall-blocked workflows

Workflow	Blocked	Total	Block rate
Linter Miner	127	326	39%
Contribution Check	26	69	38%
Visual Regression Checker	21	33	64%
Chaos PR Bundle Fuzzer	20	41	49%
Daily Testify Uber Super Expert	20	45	44%
Daily Project Performance Summary	19	45	42%
Outcome Collector	13	32	41%
Bot Detection	12	22	55%
GEO Optimizer Daily Audit	11	25	44%
Matt Pocock Skills Reviewer	11	23	48%
Smoke OTEL Backends	11	24	46%

Most blocked traffic resolves as (unknown) (502 of 516 blocks), which suggests these connections did not progress far enough (DNS resolution or TLS SNI) for the egress proxy to record a hostname. The remaining 14 blocks all target Google sign-in and safe-browsing domains, and all of them come from Visual Regression Checker.
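To track this split across audits, each blocked request can be bucketed by the destination the proxy recorded. The sketch below assumes blocked hosts arrive as strings, with `"(unknown)"` or `None` for unlabeled connections. The real firewall log format is not shown here, and the sample input is illustrative.

```python
from collections import Counter
from typing import Iterable, Optional

# Hosts named in this audit as Google sign-in / safe-browsing traffic
# from Visual Regression Checker.
CHROMIUM_BEACON_HOSTS = frozenset(
    {
        "accounts.google.com",
        "www.google.com",
        "clients2.google.com",
        "safebrowsingohttpgateway.googleapis.com",
    }
)


def classify_block(host: Optional[str]) -> str:
    """Bucket one blocked request by what the proxy recorded as its destination."""
    if not host or host == "(unknown)":
        return "unknown"
    if host.lower() in CHROMIUM_BEACON_HOSTS:
        return "chromium-beacon"
    return "other-named"


def summarize(hosts: Iterable[Optional[str]]) -> Counter:
    """Count blocked requests per bucket."""
    return Counter(classify_block(h) for h in hosts)


# Illustrative input only; not today's log.
sample = ["(unknown)", None, "clients2.google.com", "example.invalid"]
print(summarize(sample))
print(f"unknown share today: {502 / 516:.1%}")
```

Exact host matching is used instead of a `google.com` suffix match because storage.googleapis.com is legitimate allowed traffic for Linter Miner.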

Top MCP tool calls

MCP servers in use: safeoutputs (92 calls), github (48), grafana (20), sentry (13), agenticworkflows (4).

Server / Tool	Calls
github / issue_read	24
safeoutputs / add_comment	21
safeoutputs / noop	19
grafana / tempo_traceql-search	13
safeoutputs / create_pull_request	13
github / pull_request_read	9
github / search_repositories	8
safeoutputs / create_discussion	8
safeoutputs / add_labels	7
sentry / find_projects	7
safeoutputs / create_issue	6
safeoutputs / upload_asset	6

No MCP failures, no missing tools, no missing data signals — a clean MCP day.

DIFC integrity-filtered events

7 GitHub MCP calls (list_issues, search_issues) had results filtered because the target issues (#32467, #32459, #32446, #32413) had an integrity level below the agent's approved threshold. This is the integrity guard working as designed: the underlying issues were created through low-trust pathways and were correctly hidden from agent reads.
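The filtering described above amounts to comparing each result's integrity level against a threshold. The sketch below shows that comparison under stated assumptions: the `Integrity` levels, their names, and the issue numbers are hypothetical, and the guard's real label model is not documented in this report.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Tuple


class Integrity(IntEnum):
    """Hypothetical ordered integrity levels; higher means more trusted."""

    UNTRUSTED = 0
    LOW = 1
    APPROVED = 2
    MAINTAINER = 3


@dataclass(frozen=True)
class IssueResult:
    number: int
    integrity: Integrity


def filter_by_integrity(
    results: List[IssueResult], threshold: Integrity
) -> Tuple[List[IssueResult], List[IssueResult]]:
    """Split tool results into those the agent may read and those withheld.

    Withheld items are returned rather than silently dropped, so the caller
    can log them as integrity-filtered events.
    """
    visible = [r for r in results if r.integrity >= threshold]
    withheld = [r for r in results if r.integrity < threshold]
    return visible, withheld


# Illustrative results with made-up issue numbers.
results = [IssueResult(101, Integrity.MAINTAINER), IssueResult(102, Integrity.LOW)]
visible, withheld = filter_by_integrity(results, threshold=Integrity.APPROVED)
print([r.number for r in visible], [r.number for r in withheld])
```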

Concentrated-risk episodes

The platform flagged 4 episodes with risk_distribution=concentrated:

PR Sous Chef §25933434561 — risky=1, resource_heavy=1, 19 blocked requests
PR Sous Chef §25937457945 — risky=1, resource_heavy=1, 26 blocked requests
Smoke OTEL Backends §25936110696 — risky=1, resource_heavy=1, 14 blocked requests
Issue Monster §25939977409 — risky=1, 5 blocked requests

None of these are escalation-eligible per the platform classifier, but PR Sous Chef shows up twice today and also drove the execution-drift insight — worth a closer look on the next audit.

Recommendations

Investigate Test Quality Sentinel OTLP 401/404: the agent's retry succeeded, but the workflow still concluded failure. The run also set the inference_access_error, mcp_policy_error, and model_not_supported_error outputs, so first determine whether the failure was inference-access-related. If the OTLP export caused it instead, confirm the OTEL credentials are current; if telemetry export errors alone fail the job, the workflow may be reporting false-positive failures.
Tighten the egress allowlist or suppress Chromium beacons for Visual Regression Checker: a 64% block rate is noisy and can mask real signals. Either add the safe-browsing and Google endpoints to a Chromium-specific allowlist, or disable safe browsing and sign-in in the test browser profile.
Investigate Linter Miner's 127 (unknown) blocks: the magnitude suggests a tooling step is reaching outside the allowlist. Compare today's run with earlier runs to see whether a dependency or its download source changed.
Watch PR Sous Chef for drift: a spread of 8 to 45 turns is large for a triage-style workflow. Consider pinning more of the prompt or trimming branching in the agent's task description.
Suppress Smoke CI cancellation noise: 4 of the 23 reported errors come from concurrency-cancelled Smoke CI runs (correct CI behavior on rapid PR pushes). Consider tagging cancelled runs separately in the audit pipeline so they do not inflate the error counter.
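Tagging cancelled runs separately could be as simple as bucketing error counts by run conclusion before totalling them. The sketch below assumes a hypothetical `RunRecord` shape; the audit pipeline's actual schema is not shown, and the records are illustrative rather than today's data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RunRecord:
    """Minimal per-run record; field names are illustrative."""

    workflow: str
    conclusion: Optional[str]  # "success", "failure", "cancelled", or None while running
    errors: int


def error_buckets(runs: Iterable[RunRecord]) -> Counter:
    """Count errors from cancelled runs separately from all other errors."""
    buckets: Counter = Counter()
    for run in runs:
        key = "cancelled" if run.conclusion == "cancelled" else "actionable"
        buckets[key] += run.errors
    return buckets


# Illustrative records, not today's data.
runs = [
    RunRecord("Smoke CI", "cancelled", 1),
    RunRecord("Smoke CI", "success", 0),
    RunRecord("Hypothetical Workflow", "failure", 3),
]
buckets = error_buckets(runs)
print(dict(buckets))
```

Reporting only the "actionable" bucket as the headline error count would keep expected concurrency cancellations from inflating it.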

Repo Memory Updated

Bootstrapped under /tmp/gh-aw/repo-memory/default/:

audits/2026-05-15.json — full audit snapshot
audits/2026-05-15-anomalies.json — 7 anomaly entries
audits/index.json — chronological index seeded
metrics/daily.jsonl — day-0 metrics row
patterns/errors.json — Test Quality Sentinel failure signature opened
patterns/missing-tools.json — zero-state seeded
patterns/mcp-failures.json — zero-state seeded
trending/workflow_health/history.jsonl — first data point
trending/token_cost/history.jsonl — first data point
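The trending files are JSONL histories, so each audit appends one row and a 7-day moving average only becomes available once seven rows exist. The sketch below illustrates this under stated assumptions. The row fields (`date`, `cost_usd`, `runs`) are hypothetical, and the example writes to a temporary directory rather than the real repo-memory tree.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def append_row(path: Path, row: dict) -> None:
    """Append one JSON object as a single line to a JSONL history file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row, sort_keys=True) + "\n")


def moving_average(path: Path, field: str, window: int = 7) -> Optional[float]:
    """Mean of `field` over the last `window` rows, or None until a full window exists."""
    if not path.exists():
        return None
    lines = path.read_text(encoding="utf-8").splitlines()
    rows = [json.loads(line) for line in lines if line.strip()]
    if len(rows) < window:
        return None
    return sum(row[field] for row in rows[-window:]) / window


# Day-0 illustration against a temporary directory, not the real
# /tmp/gh-aw/repo-memory/default/ tree. Field names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    history = Path(tmp) / "trending" / "token_cost" / "history.jsonl"
    append_row(history, {"date": "2026-05-15", "cost_usd": 25.38, "runs": 79})
    day0_average = moving_average(history, "cost_usd")

print(day0_average)  # None: a single row is not yet a 7-day window
```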

References:

Audit run: §25942489497
Test Quality Sentinel failure: §25934885936
Concentrated-risk PR Sous Chef: §25937457945

Generated by 🔍 Agentic Workflow Audit Agent · ● 13M · ◷

expires on May 16, 2026, 9:46 PM UTC

2026-05-16T21:43:13Z

github-actions[bot]
Bot May 16, 2026
Author

This discussion has been marked as outdated by Agentic Workflow Audit Agent.

A newer discussion is available at Discussion #32712.
