[audit-workflows] Agentic Workflow Audit — 2026-05-15 (last 24h) #32485
Closed
Replies: 1 comment
This discussion has been marked as outdated by Agentic Workflow Audit Agent. A newer discussion is available at Discussion #32712.
Overview
First seeded run of the Agentic Workflow Audit Agent for the 24-hour window ending 2026-05-15 21:33 UTC. The fleet executed 79 runs across 35 distinct workflows with a 92.9% success rate (65 success / 1 failure / 4 cancelled, i.e. 65 of 70 concluded runs; 2 in-progress). Cost was modest at $25.38 across 281.7M effective tokens, but per-workflow firewall block pressure on the Visual Regression Checker and Linter Miner warrants attention. Repo memory was bootstrapped from this audit, so trend charts will gain shape with each subsequent cycle.
Summary
Status: ✅ Overall healthy fleet.
Critical Issues
Test Quality Sentinel: the agent first called safeoutputs.add_comment with pull_request_number instead of the required body field; it retried with the correct schema and posted the report successfully. The workflow still concluded failure, likely due to OTLP telemetry export receiving HTTP 401 Unauthorized and 404 Not Found responses during the run. Worth investigating whether GH_AW_OTEL_* credentials are stale or whether the prompt template should pre-format the comment body.
Visual Regression Checker: blocked requests to accounts.google.com, www.google.com, clients2.google.com, safebrowsingohttpgateway.googleapis.com. These look like Chromium safe-browsing / sign-in beacons rather than required functionality, but they generate noise in the firewall log.
Linter Miner: 127 blocks to (unknown) destinations in a single run, by far the largest block count today. Worth inspecting whether the linter is reaching for cached binaries the egress allowlist hasn't been told about (proxy.golang.org, sum.golang.org, and storage.googleapis.com are present in allowed traffic, so this is something else).
Workflow Health Trend
First data point only — subsequent runs will populate the trend line. Today the fleet ran 65 successful, 1 failed, and 4 cancelled runs for a 92.9% success rate. Future audits will let us spot drift in this rate and flag regression episodes.
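The quoted 92.9% is consistent with dividing successes by concluded runs (success + failure + cancelled) and leaving in-progress runs out of the denominator. A minimal Python sketch of that arithmetic follows, assuming this is how the platform counts; the function name and the `count_cancelled` switch are illustrative and not part of the audit tooling.

```python
def success_rate(success: int, failure: int, cancelled: int,
                 count_cancelled: bool = True) -> float:
    """Return the success rate as a percentage of concluded runs.

    Assumption: in-progress runs are excluded from the denominator.
    """
    concluded = success + failure + (cancelled if count_cancelled else 0)
    if concluded == 0:
        raise ValueError("no concluded runs")
    return 100.0 * success / concluded


# Figures from this audit: 65 success, 1 failure, 4 cancelled.
print(f"{success_rate(65, 1, 4):.1f}%")                         # 92.9%
print(f"{success_rate(65, 1, 4, count_cancelled=False):.1f}%")  # 98.5%
```

The second figure shows how much the rate moves if concurrency-cancelled runs are tracked separately instead of counted against the fleet.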
Token & Cost Trend
Today: 46.6M raw tokens (281.7M effective with cache amplification) at $25.38 — averaging about $0.32 per run. The 7-day moving average becomes meaningful after a week of data; for now this is the baseline.
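The baseline figures above can be reproduced with a few divisions. The sketch below uses only numbers from this audit; the variable names and the `moving_average` helper are hypothetical, and returning `None` until seven points exist is an assumption about how a trend chart might treat the warm-up period.

```python
RAW_TOKENS_M = 46.6         # raw tokens, in millions
EFFECTIVE_TOKENS_M = 281.7  # effective tokens (with cache amplification), in millions
COST_USD = 25.38
RUNS = 79

cost_per_run = COST_USD / RUNS
cache_amplification = EFFECTIVE_TOKENS_M / RAW_TOKENS_M
cost_per_m_effective = COST_USD / EFFECTIVE_TOKENS_M


def moving_average(values, window=7):
    """Mean of the last `window` values, or None until enough data exists."""
    if len(values) < window:
        return None
    return sum(values[-window:]) / window


print(f"cost per run: ${cost_per_run:.2f}")                          # $0.32
print(f"effective/raw ratio: {cache_amplification:.1f}x")            # 6.0x
print(f"cost per 1M effective tokens: ${cost_per_m_effective:.3f}")  # $0.090
print(moving_average([COST_USD]))                                    # None (day 0 only)
```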
Top duration consumers
AI Moderator dominates wall-clock time; PR Sous Chef varied 8–45 turns across runs (avg 20), which the platform flagged as execution drift — worth a look at prompt stability if it persists.
Top firewall-blocked workflows
Most blocked traffic resolves as (unknown) (502 of 516 blocks), which suggests connections that never completed SNI/DNS far enough for the egress proxy to label them. The named blocked domains all belong to Google sign-in / safe-browsing, exclusively from Visual Regression Checker.
Top MCP tool calls
MCP servers in use: safeoutputs (92 calls), github (48), grafana (20), sentry (13), agenticworkflows (4).
No MCP failures, no missing tools, no missing data signals: a clean MCP day.
DIFC integrity-filtered events
7 GitHub MCP calls (list_issues, search_issues) had results filtered because target issues (#32467, #32459, #32446, #32413) carried integrity below the agent's approved threshold. This is the integrity guard working as designed: the underlying issues were created via low-trust pathways and were correctly hidden from agent reads.
Concentrated-risk episodes
The platform flagged 4 episodes with risk_distribution=concentrated.
None of these are escalation-eligible per the platform classifier, but PR Sous Chef shows up twice today and also drove the execution-drift insight, so it is worth a closer look on the next audit.
Recommendations
Test Quality Sentinel OTLP 401/404: the agent retry succeeded but the workflow concluded failure. If the failure was inference-access-related (the run set inference_access_error, mcp_policy_error, and model_not_supported_error outputs), confirm the OTEL credentials are current; otherwise the workflow may be marking false-positive failures.
Visual Regression Checker: a 64% block rate is noisy and currently swallows real signals. Either add the safe-browsing/Google endpoints to a Chromium-specific allowlist, or disable safe-browsing/sign-in inside the test browser profile.
Linter Miner 127 (unknown) blocks: the magnitude suggests a tooling step is reaching outside the allowlist. Compare today's run to past runs to see if a dependency moved.
PR Sous Chef for drift: 8–45 turn variance is large for a triage-style workflow. Worth pinning more of the prompt or trimming branching in the agent's task description.
Smoke CI cancellation noise: 4 of the 23 reported errors come from concurrency-cancelled Smoke CI runs (correct CI behavior on rapid PR pushes). Consider tagging cancelled runs separately in the audit pipeline so they do not inflate the error counter.
Repo Memory Updated
Bootstrapped under /tmp/gh-aw/repo-memory/default/:
audits/2026-05-15.json: full audit snapshot
audits/2026-05-15-anomalies.json: 7 anomaly entries
audits/index.json: chronological index seeded
metrics/daily.jsonl: day-0 metrics row
patterns/errors.json: Test Quality Sentinel failure signature opened
patterns/missing-tools.json: zero-state seeded
patterns/mcp-failures.json: zero-state seeded
trending/workflow_health/history.jsonl: first data point
trending/token_cost/history.jsonl: first data point