
[observability escalation] Smoke Copilot and Smoke Claude repeatedly resource-heavy and poorly controlled #24844

@github-actions


Problem

Two smoke-test workflows crossed the escalation thresholds today (2026-04-06):

  • Smoke Copilot — 3 runs: all triggered resource_heavy_for_domain (2 high, 1 medium) and poor_agentic_control (1 high, 2 medium)
  • Smoke Claude — 3 runs: all triggered resource_heavy_for_domain (high) and 2 runs also triggered poor_agentic_control (medium)

Smoke tests are meant to be lightweight validation probes. Consuming 675K to 1.7M tokens per run indicates that the agent is doing substantive exploratory work rather than a targeted smoke check. The poor_agentic_control signal, especially at high severity, suggests the agent is looping, backtracking, or making redundant tool calls.

Evidence

Smoke Copilot (3 runs)

| Run | Tokens | resource_heavy | poor_control |
|---|---|---|---|
| §24016631769 | 987,748 | high | high |
| §24016762986 | 1,321,781 | high | medium |
| §24018427871 | 675,373 | medium | high |

Smoke Claude (3 runs)

| Run | Tokens | resource_heavy | poor_control |
|---|---|---|---|
| §24016631773 | 1,077,182 | high | not flagged |
| §24016762959 | 1,733,152 | high | medium |
| §24016851157 | 1,344,274 | high | medium |

Thresholds Crossed

  • ✅ ≥2 runs with resource_heavy_for_domain: high/medium — both workflows
  • ✅ ≥2 runs with poor_agentic_control: medium/high — both workflows
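To make the two thresholds concrete, here is a minimal sketch that re-applies them to the runs in the Evidence tables. It assumes that a threshold is crossed when at least two runs in a workflow report the signal at medium or high severity, exactly as the list above states. The `Run` type and the `thresholds_crossed` function are hypothetical and are not part of the Agentic Observability Kit.

```python
from dataclasses import dataclass
from typing import Optional

# Severities that count toward a threshold, per the "Thresholds Crossed" list.
COUNTED = {"medium", "high"}
MIN_RUNS = 2  # "≥2 runs"


@dataclass
class Run:
    run_id: str
    tokens: int
    resource_heavy: Optional[str]  # None when the signal was not raised
    poor_control: Optional[str]


def thresholds_crossed(runs: list[Run]) -> dict[str, bool]:
    """Hypothetical re-implementation of the two escalation thresholds."""
    heavy = sum(1 for r in runs if r.resource_heavy in COUNTED)
    control = sum(1 for r in runs if r.poor_control in COUNTED)
    return {
        "resource_heavy_for_domain": heavy >= MIN_RUNS,
        "poor_agentic_control": control >= MIN_RUNS,
    }


# Data copied from the Evidence tables above.
smoke_copilot = [
    Run("24016631769", 987_748, "high", "high"),
    Run("24016762986", 1_321_781, "high", "medium"),
    Run("24018427871", 675_373, "medium", "high"),
]
smoke_claude = [
    Run("24016631773", 1_077_182, "high", None),
    Run("24016762959", 1_733_152, "high", "medium"),
    Run("24016851157", 1_344_274, "high", "medium"),
]

for name, runs in [("Smoke Copilot", smoke_copilot), ("Smoke Claude", smoke_claude)]:
    print(name, thresholds_crossed(runs))
```

Both workflows cross both thresholds. Smoke Claude crosses poor_agentic_control with exactly two qualifying runs, so a single additional clean run in that window would not have been enough to avoid escalation.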

Suggested Route

workflow:Smoke Copilot, workflow:Smoke Claude

Recommended Actions

  1. Audit the smoke workflow prompts — determine whether the prompt is accidentally scoping the agent to do more than a lightweight smoke check. Smoke tests should complete in <100K tokens.
  2. Limit tool breadth or turn count in the smoke workflows to constrain agent behavior.
  3. Review agent loop patterns. The poor_agentic_control signal points to redundant tool calls. Enable debug logging (DEBUG=workflow:*) on the next smoke run to trace the tool call sequence.
  4. Consider downgrading the model for smoke tests — both workflows also carry model_downgrade_available: low assessments, suggesting a smaller model would be sufficient.
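For scale, the following sketch compares each observed run against the <100K-token target from action 1. The token counts come from the Evidence tables. The `overshoot` helper is hypothetical and simply divides a run's token count by the budget.

```python
# Target taken from action 1: "Smoke tests should complete in <100K tokens."
SMOKE_TOKEN_BUDGET = 100_000

# Observed token counts per run, copied from the Evidence tables.
observed = {
    "Smoke Copilot": [987_748, 1_321_781, 675_373],
    "Smoke Claude": [1_077_182, 1_733_152, 1_344_274],
}


def overshoot(tokens: int, budget: int = SMOKE_TOKEN_BUDGET) -> float:
    """Return how many times over budget a run was (1.0 means exactly on budget)."""
    return tokens / budget


for workflow, runs in observed.items():
    lo, hi = overshoot(min(runs)), overshoot(max(runs))
    print(f"{workflow}: {lo:.1f}x to {hi:.1f}x the smoke-test budget")
```

Every run exceeds the target by roughly 6.8x (Smoke Copilot, §24018427871) to 17.3x (Smoke Claude, §24016762959). Overruns of this size point toward prompt scoping and turn limits (actions 1 and 2) rather than toward model choice alone.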

Also Flagged

  • GitHub Remote MCP Authentication Test: 100% failure rate (2/2 runs). The zero-token failure on the second run suggests a configuration or authentication problem that occurs before the agent starts. This is not a regression threshold breach, but it warrants immediate investigation.

References: §24016631769 · §24016762959 · §24018427871

Generated by Agentic Observability Kit · ● 1.8M
