Problem
Two smoke-test workflows crossed the escalation thresholds today (2026-04-06):
- Smoke Copilot (3 runs): all triggered resource_heavy_for_domain (2 high, 1 medium) and poor_agentic_control (1 high, 2 medium)
- Smoke Claude (3 runs): all triggered resource_heavy_for_domain (high), and 2 runs also triggered poor_agentic_control (medium)
Smoke tests are designed to be lightweight validation probes. Consuming 675K–1.7M tokens per run signals the agent is doing substantive exploratory work rather than a targeted smoke check. The poor_agentic_control signal (especially one high-severity reading) suggests the agent is looping, backtracking, or making redundant tool calls.
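To put the token figures in proportion, here is a small sketch that compares the observed range against the <100K-token target stated under Recommended Actions. The function name and constant are illustrative, not part of any existing tooling.

```python
# Token target for a smoke run, taken from the "<100K tokens" guidance in this report.
SMOKE_TOKEN_BUDGET = 100_000


def budget_overage(tokens_used: int, budget: int = SMOKE_TOKEN_BUDGET) -> float:
    """Return how many times the budget a run consumed (1.0 means exactly on budget)."""
    return tokens_used / budget


# Low and high ends of the per-run range reported above.
low = budget_overage(675_000)
high = budget_overage(1_700_000)
print(f"Observed runs used {low:.2f}x to {high:.1f}x the smoke-test budget")
```

Even the cheapest run is several multiples over the target, which is why the signal fires on every run rather than intermittently.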
Evidence
Smoke Copilot (3 runs)
Smoke Claude (3 runs)
Thresholds Crossed
- ✅ ≥2 runs with resource_heavy_for_domain: high/medium (both workflows)
- ✅ ≥2 runs with poor_agentic_control: medium/high (both workflows)
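The escalation rule above can be sketched as a simple count of runs per signal at medium or high severity. This is a minimal illustration, not the kit's actual implementation. The data layout (one dict per run) is an assumption, and so is the pairing of severities within each run, since the report gives only per-signal totals.

```python
from collections import Counter

ESCALATING_SEVERITIES = {"medium", "high"}
MIN_RUNS = 2  # "≥2 runs" from the thresholds listed above


def crossed_thresholds(runs, min_runs=MIN_RUNS):
    """Return the set of signals seen at medium/high severity in at least min_runs runs.

    runs: list of dicts mapping signal name -> severity, one dict per run
    (hypothetical layout).
    """
    counts = Counter()
    for run in runs:
        for signal, severity in run.items():
            if severity in ESCALATING_SEVERITIES:
                counts[signal] += 1
    return {signal for signal, n in counts.items() if n >= min_runs}


# Totals match the report; which severities share a run is assumed.
smoke_copilot = [
    {"resource_heavy_for_domain": "high", "poor_agentic_control": "high"},
    {"resource_heavy_for_domain": "high", "poor_agentic_control": "medium"},
    {"resource_heavy_for_domain": "medium", "poor_agentic_control": "medium"},
]
smoke_claude = [
    {"resource_heavy_for_domain": "high", "poor_agentic_control": "medium"},
    {"resource_heavy_for_domain": "high", "poor_agentic_control": "medium"},
    {"resource_heavy_for_domain": "high"},
]

for name, runs in [("Smoke Copilot", smoke_copilot), ("Smoke Claude", smoke_claude)]:
    print(name, sorted(crossed_thresholds(runs)))
```

Under these assumptions both workflows cross both thresholds, matching the checkmarks above.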
Suggested Route
workflow:Smoke Copilot, workflow:Smoke Claude
Recommended Actions
- Audit the smoke workflow prompts — determine whether the prompt is accidentally scoping the agent to do more than a lightweight smoke check. Smoke tests should complete in <100K tokens.
- Add tool breadth or turn limits to the smoke workflows to constrain agent behavior.
- Review agent loop patterns: the poor_agentic_control signal points to redundant tool calls. Enable debug logging (DEBUG=workflow:*) on the next smoke run to trace the tool call sequence.
- Consider downgrading the model for smoke tests: both workflows also carry model_downgrade_available: low assessments, suggesting a smaller model would be sufficient.
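Once a tool call sequence has been traced, redundant calls can be surfaced by counting identical (tool, arguments) pairs. The sketch below assumes the trace has already been parsed into a list of dicts with "tool" and "args" keys. That shape, and the tool names in the example, are hypothetical; the actual format of the debug output is not specified in this report.

```python
import json
from collections import Counter


def redundant_calls(trace):
    """Return {(tool, canonical_args_json): count} for calls made more than once.

    trace: list of {"tool": str, "args": dict} entries (hypothetical format).
    Arguments are serialized with sorted keys so key order does not matter.
    """
    keys = [
        (call["tool"], json.dumps(call["args"], sort_keys=True))
        for call in trace
    ]
    return {key: n for key, n in Counter(keys).items() if n > 1}


# Illustrative trace; tool names are made up for the example.
trace = [
    {"tool": "read_file", "args": {"path": "README.md"}},
    {"tool": "list_dir", "args": {"path": "."}},
    {"tool": "read_file", "args": {"path": "README.md"}},
]

for (tool, args), count in redundant_calls(trace).items():
    print(f"{tool} {args} called {count} times")
```

Exact-match counting only catches literal repeats. Near-duplicate calls (for example, the same file read with different line ranges) would need a looser comparison.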
Also Flagged
- GitHub Remote MCP Authentication Test — 100% failure rate (2/2 runs). Zero-token failure on second run suggests a pre-agent config/auth problem. Not a regression threshold breach, but warrants immediate investigation.
References: §24016631769 · §24016762959 · §24018427871
Generated by Agentic Observability Kit · ● 1.8M