[observability escalation] Smoke Copilot and Smoke Claude repeatedly resource-heavy and poorly controlled #24844

### Problem

Two smoke-test workflows crossed the escalation thresholds today (2026-04-06):

- **Smoke Copilot** (3 runs): all triggered `resource_heavy_for_domain` (2 high, 1 medium) and `poor_agentic_control` (2 high, 1 medium, per the Evidence table)
- **Smoke Claude** (3 runs): all triggered `resource_heavy_for_domain` (high), and 2 runs also triggered `poor_agentic_control` (medium)

Smoke tests are meant to be lightweight validation probes. At 675K–1.7M tokens per run, the agent is doing substantive exploratory work rather than a targeted smoke check. The `poor_agentic_control` signal, which reached high severity in two Smoke Copilot runs, suggests the agent is looping, backtracking, or making redundant tool calls.
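As a rough illustration of what "redundant tool calls" could look like in a trace, the sketch below counts calls that repeat with identical arguments. The trace format (a list of `(tool_name, args)` pairs) and the function name `redundant_calls` are assumptions made for this example; they are not the format the observability kit or the workflow logs actually use.

```python
import json
from collections import Counter


def redundant_calls(trace):
    """Return {(tool, canonical_args_json): count} for calls seen more than once.

    `trace` is a hypothetical list of (tool_name, args_dict) pairs.
    Arguments are serialized with sorted keys so that key order does not
    hide an otherwise identical call.
    """
    counts = Counter(
        (tool, json.dumps(args, sort_keys=True)) for tool, args in trace
    )
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical trace: the same file is read three times.
example_trace = [
    ("read_file", {"path": "README.md"}),
    ("list_dir", {"path": "."}),
    ("read_file", {"path": "README.md"}),
    ("read_file", {"path": "README.md"}),
]
repeats = redundant_calls(example_trace)
```

A non-empty result on a smoke run would be one concrete indicator of the looping behavior described above, though a real check would probably also need to catch near-duplicate calls.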

### Evidence

#### Smoke Copilot (3 runs)

| Run | Tokens | resource_heavy | poor_control |
|-----|--------|---------------|-------------|
| [§24016631769](https://github.com/github/gh-aw/actions/runs/24016631769) | 987,748 | **high** | **high** |
| [§24016762986](https://github.com/github/gh-aw/actions/runs/24016762986) | 1,321,781 | **high** | medium |
| [§24018427871](https://github.com/github/gh-aw/actions/runs/24018427871) | 675,373 | medium | **high** |

#### Smoke Claude (3 runs)

| Run | Tokens | resource_heavy | poor_control |
|-----|--------|---------------|-------------|
| [§24016631773](https://github.com/github/gh-aw/actions/runs/24016631773) | 1,077,182 | **high** | — |
| [§24016762959](https://github.com/github/gh-aw/actions/runs/24016762959) | 1,733,152 | **high** | medium |
| [§24016851157](https://github.com/github/gh-aw/actions/runs/24016851157) | 1,344,274 | **high** | medium |

### Thresholds Crossed

- ✅ ≥2 runs with `resource_heavy_for_domain: high/medium` — both workflows
- ✅ ≥2 runs with `poor_agentic_control: medium/high` — both workflows
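The threshold rule above can be sketched as follows, using the severities from the Evidence tables. The data layout and the names `RUNS`, `crossed`, and `evaluate` are illustrative assumptions, not the observability kit's actual implementation.

```python
ESCALATING = {"medium", "high"}  # severities that count toward a threshold
MIN_RUNS = 2                     # ">=2 runs" rule from the list above

# (tokens, resource_heavy_for_domain, poor_agentic_control) per run,
# copied from the Evidence tables; None means the signal did not fire.
RUNS = {
    "Smoke Copilot": [
        (987_748, "high", "high"),
        (1_321_781, "high", "medium"),
        (675_373, "medium", "high"),
    ],
    "Smoke Claude": [
        (1_077_182, "high", None),
        (1_733_152, "high", "medium"),
        (1_344_274, "high", "medium"),
    ],
}


def crossed(severities, min_runs=MIN_RUNS):
    """True when at least `min_runs` readings are medium or high."""
    return sum(1 for s in severities if s in ESCALATING) >= min_runs


def evaluate(runs):
    """Map each workflow to which of the two thresholds it crossed."""
    return {
        workflow: {
            "resource_heavy_for_domain": crossed(r[1] for r in rows),
            "poor_agentic_control": crossed(r[2] for r in rows),
        }
        for workflow, rows in runs.items()
    }


result = evaluate(RUNS)
```

With the table data, both signals evaluate to `True` for both workflows, which matches the two checked items above.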

### Suggested Route

`workflow:Smoke Copilot`, `workflow:Smoke Claude`

### Recommended Actions

1. **Audit the smoke workflow prompts** — determine whether the prompt is accidentally scoping the agent to do more than a lightweight smoke check. Smoke tests should complete in <100K tokens.
2. **Add tool breadth or turn limits** to the smoke workflows to constrain agent behavior.
3. **Review agent loop patterns** — the `poor_agentic_control` signal points to redundant tool calls. Enable debug logging (`DEBUG=workflow:*`) on next smoke run to trace the tool call sequence.
4. **Consider downgrading the model** for smoke tests — both workflows also carry `model_downgrade_available: low` assessments, suggesting a smaller model would be sufficient.
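To put the <100K-token target from action 1 in perspective, this small sketch computes how far each observed run exceeded it. The token counts come from the Evidence tables; the constant and function names are hypothetical, and the budget is the target suggested in this issue rather than an enforced limit.

```python
SMOKE_BUDGET_TOKENS = 100_000  # target from action 1, not an enforced limit

# Run ID -> tokens consumed, copied from the Evidence tables.
TOKENS = {
    "24016631769": 987_748,    # Smoke Copilot
    "24016762986": 1_321_781,  # Smoke Copilot
    "24018427871": 675_373,    # Smoke Copilot
    "24016631773": 1_077_182,  # Smoke Claude
    "24016762959": 1_733_152,  # Smoke Claude
    "24016851157": 1_344_274,  # Smoke Claude
}


def over_budget(tokens_by_run, budget=SMOKE_BUDGET_TOKENS):
    """Return {run_id: multiple_of_budget} for runs that exceeded the budget."""
    return {
        run_id: round(tokens / budget, 1)
        for run_id, tokens in tokens_by_run.items()
        if tokens > budget
    }


ratios = over_budget(TOKENS)
```

All six runs exceed the target, by roughly 7x at the low end and 17x at the high end, so a turn or tool limit would need to cut usage by about an order of magnitude to meet it.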

### Also Flagged

- **GitHub Remote MCP Authentication Test** — 100% failure rate (2/2 runs). Zero-token failure on second run suggests a pre-agent config/auth problem. Not a regression threshold breach, but warrants immediate investigation.

**References:** [§24016631769](https://github.com/github/gh-aw/actions/runs/24016631769) · [§24016762959](https://github.com/github/gh-aw/actions/runs/24016762959) · [§24018427871](https://github.com/github/gh-aw/actions/runs/24018427871)







> Generated by [Agentic Observability Kit](https://github.com/github/gh-aw/actions/runs/24025271024/agentic_workflow) · ● 1.8M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fagentic-observability-kit%22&type=issues)
