Executive Summary
The 14-day window (Mar 16–30, 2026) shows a healthy, active repository with 105 runs across 46 workflows. No episodes are marked escalation_eligible by the deterministic model, and no runs carry a risky classification label. The dominant concern is that two smoke-test workflows, Smoke Claude and Smoke Copilot, are consistently assessed as resource-heavy and poorly controlled across every run in the period, crossing the escalation thresholds for those categories. Seven runs across five workflows failed, most of them in known fragile workflows (Changeset Generator, Smoke Codex). One missing-tool event was recorded for Smoke Copilot (Serena MCP server). No MCP server failures were recorded.
Key Metrics
| Metric | Value |
| --- | --- |
| Date range | Mar 16–30, 2026 (14 days) |
| Workflows analyzed | 46 |
| Runs analyzed | 105 |
| Episodes analyzed | 104 |
| High-confidence episodes | 103 (99%) |
| Runs with risky classification | 0 |
| Medium/high severity assessments | 9 workflows |
| Escalation-eligible episodes | 0 |
| Total token usage | 9.4 M |
| Total estimated cost | $2.72 |
| Total Actions minutes | 223 min |
| Total errors | 7 |
| MCP failures | 0 |
| Missing-tool events | 1 |
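For a rough sense of scale, the totals above can be turned into per-run averages. The snippet below is only a sketch of that arithmetic: the variable names are illustrative, and the averages include skipped runs, so they understate the cost of the runs that actually invoked an agent.

```python
# Totals copied from the Key Metrics table.
metrics = {
    "runs": 105,
    "episodes": 104,
    "high_confidence_episodes": 103,
    "tokens": 9_400_000,
    "cost_usd": 2.72,
    "actions_minutes": 223,
}

# Simple averages over all runs, including skipped ones.
cost_per_run = metrics["cost_usd"] / metrics["runs"]
minutes_per_run = metrics["actions_minutes"] / metrics["runs"]
tokens_per_run = metrics["tokens"] / metrics["runs"]
high_conf_share = metrics["high_confidence_episodes"] / metrics["episodes"]

print(f"cost per run:    ${cost_per_run:.3f}")
print(f"minutes per run: {minutes_per_run:.2f}")
print(f"tokens per run:  {tokens_per_run:,.0f}")
print(f"high-confidence share: {high_conf_share:.1%}")
```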
Highest Risk Episodes
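No episodes carry escalation_eligible = true in the deterministic model. The two highest-concern standalone episodes are flagged by repeated threshold crossings at the workflow level:
Smoke Claude: resource_heavy_for_domain:high, poor_agentic_control:medium; classified changed (turns_increase)
Smoke Copilot: resource_heavy_for_domain:medium, poor_agentic_control:high; classified stable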
Both workflows appear in the triage domain. Both show write_heavy actuation and a heavy resource profile; the execution style is exploratory for Smoke Claude and directed for Smoke Copilot. For smoke tests, these signals are structurally suspicious: smoke tests should typically be lean, directed, and read-capable.
Smoke Copilot additionally had a missing-tool event (Serena MCP server: activate_project, find_symbol) in run §23730326985. The tool was unavailable in the environment.
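The idea of a workflow-level threshold crossing can be sketched as below. This is a minimal illustration, not the deterministic model itself: the crosses_threshold helper, the MIN_RUNS = 2 cutoff, and the record layout are all assumptions, inferred only from the report's remarks that single-run workflows and workflows with one flagged run do not cross the threshold.

```python
from collections import Counter

# Assumption: a category escalates once at least this many runs of the same
# workflow carry it at medium or high severity. The report does not state
# the real cutoff.
MIN_RUNS = 2
ESCALATING_SEVERITIES = {"medium", "high"}


def crosses_threshold(run_assessments, min_runs=MIN_RUNS):
    """Return the sorted categories flagged in at least `min_runs` runs.

    `run_assessments` holds one entry per run; each entry is a list of
    "category:severity" strings such as "poor_agentic_control:medium".
    """
    counts = Counter()
    for assessments in run_assessments:
        flagged = set()  # count each category at most once per run
        for label in assessments:
            category, _, severity = label.partition(":")
            if severity in ESCALATING_SEVERITIES:
                flagged.add(category)
        counts.update(flagged)
    return sorted(c for c, n in counts.items() if n >= min_runs)


# Smoke Claude: both runs carried the same assessments.
smoke_claude = [
    ["resource_heavy_for_domain:high", "poor_agentic_control:medium"],
    ["resource_heavy_for_domain:high", "poor_agentic_control:medium"],
]
# Issue Monster: one of three runs carried poor_agentic_control:medium
# (which run is not stated in the report; the position here is illustrative).
issue_monster = [[], ["poor_agentic_control:medium"], []]

print(crosses_threshold(smoke_claude))
print(crosses_threshold(issue_monster))
```

Under these assumptions Smoke Claude is flagged in both categories, while Issue Monster is not flagged at all, which matches how the report treats them.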
Episode Regressions
Issue Monster's turn count rose and then partly fell back across its 3 runs (4 → 11 → 5 turns); two runs are classified as changed with a turns_increase reason code. The most recent run (§23735859039) also shows a posture_changed code alongside turns_increase, indicating the write posture shifted. Only one run carries poor_agentic_control:medium, so the workflow does not yet cross the escalation threshold, but it is trending in the wrong direction.
One low-confidence workflow_run episode was detected for Dev Hawk (2 runs, confidence=low, all skipped/non-agentic). No agent activity; the episode reflects the workflow_run trigger chain between Dev and Dev Hawk, both of which skipped. No risk.
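To make the turns_increase signal on Issue Monster concrete, here is a small sketch that labels each later run against a baseline turn count. The comparison rule (any run with more turns than the baseline is changed) and the choice of the first run as baseline are assumptions for illustration; the report does not describe how the model actually derives the reason code.

```python
def classify_turns(turns, baseline):
    """Label a run `changed` when it used more turns than the baseline.

    Hypothetical rule: the real model's comparison is not documented here.
    """
    if turns > baseline:
        return ("changed", ["turns_increase"])
    return ("stable", [])


# Turn counts for the three Issue Monster runs in the window.
issue_monster_turns = [4, 11, 5]

# Assumption: the first run in the window acts as the baseline.
baseline = issue_monster_turns[0]
labels = [classify_turns(t, baseline) for t in issue_monster_turns[1:]]

for turns, (status, reasons) in zip(issue_monster_turns[1:], labels):
    print(turns, status, reasons)
```

With this rule both later runs come out as changed, which is consistent with the two changed classifications reported, even though the last run (5 turns) is much closer to the baseline than the middle one.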
Recommended Actions
Smoke Claude & Smoke Copilot (escalated): Review why these smoke tests are classified in the triage domain with write_heavy / heavy resource profiles. Smoke tests should complete in lean, directed mode. Check whether the prompt or task scope is broader than intended. See linked escalation issue.
Issue Monster (watch): The posture_changed signal on the latest in-progress run is new. If the next run also shows posture_changed, reconsider the prompt or activation criteria before it crosses the escalation threshold.
Serena MCP / Smoke Copilot: The missing activate_project/find_symbol tools signal that the Serena MCP server is not always available in the Actions environment. Consider adding an availability guard or skipping the capability gracefully when the server is absent.
Failed workflow cleanup (informational): Changeset Generator and Smoke Codex each failed in 2 of 2 runs during the period, a 100% failure rate. Schema Feature Coverage Checker and Auto-Triage Issues each failed in their only run, and The Great Escapi failed in 1 of 3 runs. These may need maintenance attention independent of agentic control concerns.
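The availability guard suggested for the Serena MCP tools could look roughly like the sketch below. It is not tied to any real MCP client API: missing_tools, run_symbol_check, and the way the list of available tools is obtained are all hypothetical, and only the two tool names come from the report.

```python
# Tool names taken from the missing-tool event in the report.
REQUIRED_SERENA_TOOLS = {"activate_project", "find_symbol"}


def missing_tools(available_tools, required=REQUIRED_SERENA_TOOLS):
    """Return the required tools that are absent, in sorted order."""
    return sorted(set(required) - set(available_tools))


def run_symbol_check(available_tools, check):
    """Run `check` only when every required tool is present.

    `check` is a hypothetical zero-argument callable that performs the
    Serena-dependent step. When tools are missing, the step is skipped and
    the gap is reported instead of raising.
    """
    missing = missing_tools(available_tools)
    if missing:
        return {"status": "skipped", "missing_tools": missing}
    return {"status": "ran", "result": check()}


# Example: the server exposes no Serena tools in this environment.
print(run_symbol_check([], lambda: "symbols found"))
# Example: both tools are available, so the check runs.
print(run_symbol_check(["activate_project", "find_symbol"], lambda: "symbols found"))
```

Returning a skipped status rather than failing keeps the smoke test green when the server is absent while still leaving a record of which tools were missing.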
Per-workflow detail: Smoke Claude
Both runs completed successfully in ~9 minutes with 28–33 turns and ~1.09 M tokens each. Both were assessed resource_heavy:high, poor_control:medium, and partially_reducible:medium.
The second run matched a cohort baseline on all 7 dimensions (event, task_domain, execution_style, resource_profile, actuation_style, dispatch_mode, tool_breadth) and was still classified as changed with a turns_increase reason. The resource profile is heavy and the actuation style is write_heavy. The partially_reducible:medium assessment suggests that parts of this workflow could be offloaded to deterministic automation.
Per-workflow detail: Smoke Copilot
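Both runs completed successfully, showing 0 tokens (Copilot engine token usage is not tracked via this mechanism). Duration was 6.8–7.0 minutes. The first run was assessed resource_heavy_for_domain:medium; the second was assessed resource_heavy_for_domain:medium and poor_agentic_control:high, and it recorded the missing Serena MCP tools.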
Both runs are classified in the triage domain with directed execution style, narrow tool breadth, and write_heavy actuation. agentic_fraction=0 confirms the AI engine contributed 0 tracked turns, yet the workflow still carries a heavy resource and write-heavy profile. The poor_agentic_control:high on the second run is the highest severity control assessment in the dataset.
Per-workflow detail: Issue Monster
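Three runs over the period, all in the issue_response domain.
The posture_changed code on the latest run indicates the write posture differs from the baseline. If this persists, it may indicate prompt drift.
Per-workflow detail: Single-run high-resource workflows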
These workflows each ran once and carried resource_heavy_for_domain:high assessments. They do not cross escalation thresholds (single run), but are listed here for portfolio awareness.
Go Fan and Layout Specification Maintainer are the heaviest single-run consumers by turn count (44 and 43 turns respectively). Go Fan is also the second-costliest workflow at $1.06. The partially_reducible:medium on all of these suggests that the orchestration overhead could be trimmed.
Failed runs summary
Seven runs ended in failure, each with 1 error. Failures alone do not cross any escalation threshold.
| Workflow | Runs | Failure count | Notes |
| --- | --- | --- | --- |
| Changeset Generator | 2 | 2 (100%) | Failed in both runs this period |
| Smoke Codex | 2 | 2 (100%) | Failed in both runs this period |
| The Great Escapi | 3 | 1 | One run failed, two succeeded |
| Auto-Triage Issues | 1 | 1 | Only run this period |
| Schema Feature Coverage Checker | 1 | 1 | Only run this period |
Changeset Generator and Smoke Codex have a 100% failure rate for the window. This warrants investigation separately from agentic control concerns.
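The per-workflow failure rates can be recomputed from the run and failure counts in the summary above. The snippet is a sketch of that arithmetic only; the dictionary layout and the failure_rate helper are illustrative.

```python
# (runs, failures) per workflow, copied from the failed runs summary.
failed_runs = {
    "Changeset Generator": (2, 2),
    "Smoke Codex": (2, 2),
    "The Great Escapi": (3, 1),
    "Auto-Triage Issues": (1, 1),
    "Schema Feature Coverage Checker": (1, 1),
}


def failure_rate(runs, failures):
    """Fraction of runs that failed."""
    return failures / runs


total_failures = sum(failures for _, failures in failed_runs.values())
rates = {name: failure_rate(*counts) for name, counts in failed_runs.items()}

# Workflows that failed every run but ran more than once, which is a
# stronger signal than a single failed run.
always_failing = sorted(
    name for name, (runs, failures) in failed_runs.items()
    if runs > 1 and failures == runs
)

print(total_failures)
print(always_failing)
```

Filtering on more than one run separates the two repeatedly failing workflows from the single-run failures, whose 100% rate rests on only one data point.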
Optimization candidates (portfolio cleanup)
These workflows appear consistently lean, directed, and narrow, suggesting they may be candidates for deterministic automation or model downgrade. They do not require immediate action.
/cloclo — 5 runs, all skipped (0 tokens), always cohort-matched as stable
Dev / Dev Hawk — all skipped; low-confidence workflow_run chain, no agent activity