[Outcome Report] Workflow Health Report — 2026-06-04 #36783

### Workflow Health — 2026-06-04

**Executive read:** 10 items are pending, 7 of them in Smoke Claude; Changeset Generator, Smoke Codex, and Smoke Gemini hold one each. All 7 accepted outcomes rest on medium evidence (commented on or reviewed). The 78% acceptance rate is solid, but the 0% zero-touch rate means every accepted output required human engagement. Five pending items have no trackable URL (`detail: "no url"`).

| Workflow | Status | Lifecycle health | References |
|---|---|---|---|
| Smoke Claude | 🟨🟨🟥🟩🟨🟨🟨⬜🟩⬜🟨🟨 | 🟡 in flight | 🟨 [36748](https://github.com/github/gh-aw/pull/36748) · 🟥 [36748](https://github.com/github/gh-aw/pull/36748) · 🟩 [36748](https://github.com/github/gh-aw/pull/36748) · ⬜ [36773](https://github.com/github/gh-aw/issues/36773) · ⬜ [36766](https://github.com/github/gh-aw/discussions/36766) · 🟨 [36748](https://github.com/github/gh-aw/pull/36748) |
| Smoke Gemini | 🟨🟩⬜🟩🟩 | 🟢 resolving | 🟨 [36774](https://github.com/github/gh-aw/issues/36774) · 🟩 [36769](https://github.com/github/gh-aw/pull/36769) · ⬜ [36770](https://github.com/github/gh-aw/issues/36770) · 🟩 [36748](https://github.com/github/gh-aw/pull/36748) · 🟩 [36748](https://github.com/github/gh-aw/pull/36748) |
| Smoke Codex | ⬜🟨 | 🟡 in flight | ⬜ [36771](https://github.com/github/gh-aw/issues/36771) · 🟨 [36748](https://github.com/github/gh-aw/pull/36748) |
| Changeset Generator | 🟨🟥 | 🟡 in flight | 🟨 [36769](https://github.com/github/gh-aw/pull/36769) · 🟥 [36769](https://github.com/github/gh-aw/pull/36769) |
| Agent Container Smoke Test | 🟩🟩 | 🟢 resolving | 🟩 [36769](https://github.com/github/gh-aw/pull/36769) · 🟩 [36748](https://github.com/github/gh-aw/pull/36748) |

**Legend:**
- **Status:** 🟩 accepted · 🟥 rejected · 🟨 pending · ⬜ unknown
- **Lifecycle health:** 🟢 resolving · 🟡 in flight · 🟠 aging · 🔴 stuck · ⚪ underdefined
- **References:** one linked item per status emoji that has a URL, in the same order as the Status column; outcomes without a URL are omitted (Smoke Claude lists 6 links for 12 statuses)
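
As a quick cross-check of the Status column against the per-workflow counts, a status strip can be tallied per state. This is a minimal sketch: the emoji mapping comes from the legend above, while the function name `count_statuses` is hypothetical and not part of the Outcome Collector.

```python
from collections import Counter

# Emoji-to-state mapping taken from the Status legend.
STATUS_BY_EMOJI = {
    "🟩": "accepted",
    "🟥": "rejected",
    "🟨": "pending",
    "⬜": "unknown",
}


def count_statuses(strip: str) -> Counter:
    """Count outcomes per state in a status strip, skipping unrecognized characters."""
    return Counter(STATUS_BY_EMOJI[ch] for ch in strip if ch in STATUS_BY_EMOJI)


# Status strip copied from the Smoke Claude row.
smoke_claude = count_statuses("🟨🟨🟥🟩🟨🟨🟨⬜🟩⬜🟨🟨")
print(dict(smoke_claude))
```

For the Smoke Claude row this yields 7 pending, 2 accepted, 1 rejected, and 2 unknown, matching the Per-Workflow Breakdown.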

### 🔴 Action Items

1. **Smoke Claude: 7 pending items and 2 unknown.** This workflow has the highest pending volume. Five of the pending items (`create_pull_request_review_comment` twice, `post_slack_message`, `create_code_scanning_alert`, and `create_check_run`) are marked "no url" and cannot be evaluated without tracking URLs. Verify whether these tools succeeded but returned no URL, or failed silently.

2. **Items without trackable URLs.** 5 outcomes have `detail: "no url"`, all in Smoke Claude and all pending, so none can be verified. Either the safe-output tools are not returning URLs or the workflow is not capturing them; this needs investigation.

3. **Zero-touch rate at 0%.** All 7 accepted items required human engagement (comments, reviews, or edits). Either the agents' outputs are routinely incomplete or the workflows require human review by design. Open question: is this expected, or should agents produce more self-contained outputs?

4. **Acceptance rate at 78%.** Solid, but still in the 60-80% band. Review the 2 rejected items (`update_pull_request` in Changeset Generator, `add_reviewer` in Smoke Claude) to understand why the outputs were not retained.

<details>
<summary>Detailed metrics, evidence quality, workflow counts, and trends</summary>

### Outcome Scorecard — 2026-06-04

| Metric | Value | Status |
|--------|-------|--------|
| **Acceptance rate** | **77.8%** | 🟡 60-80% |
| **Zero-touch rate** | **0%** | 🔴 <25% |
| **Waste rate** | **8.7%** | 🟢 <10% |
| **Median time to resolution** | 6 minutes 27 seconds | — |
| Accepted | 7 / 23 | — |
| — strong evidence | 0 | (none) |
| — medium evidence | 7 | acted on, state retained/replaced |
| — weak evidence | 0 | (none) |
| Rejected | 2 | — |
| Ignored | 0 | — |
| Zero-touch | 0 / 7 | — |
| Pending | 10 | — |
| Runs checked | 7 | — |
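
The headline rates can be reproduced from the counts in the scorecard. The sketch below uses formulas inferred from the reported numbers rather than confirmed by the Outcome Collector: acceptance as accepted over resolved (accepted + rejected), waste as rejected plus ignored over all outcomes, and zero-touch as zero-touch over accepted. Treating ignored items as waste is an assumption, since no items were ignored in this run.

```python
from typing import Optional


def rate(numerator: int, denominator: int) -> Optional[float]:
    """Return a percentage rounded to one decimal place, or None when undefined."""
    if denominator == 0:
        return None
    return round(100 * numerator / denominator, 1)


# Counts copied from the scorecard table.
accepted, rejected, ignored, total = 7, 2, 0, 23
zero_touch = 0

# Assumed definitions, inferred from the reported values.
acceptance_rate = rate(accepted, accepted + rejected)
waste_rate = rate(rejected + ignored, total)
zero_touch_rate = rate(zero_touch, accepted)

print(acceptance_rate, waste_rate, zero_touch_rate)
```

Under these assumptions the results agree with the table: 77.8% acceptance, 8.7% waste, and 0% zero-touch.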

### Per-Workflow Breakdown

| Workflow | Accepted | Rejected | Ignored | Pending | Unknown | Total | Acceptance | Waste |
|----------|----------|----------|---------|---------|---------|-------|------------|-------|
| Changeset Generator | 0 | 1 | 0 | 1 | 0 | 2 | 0% | 50.0% |
| Smoke Claude | 2 | 1 | 0 | 7 | 2 | 12 | 66.7% | 8.3% |
| Smoke Codex | 0 | 0 | 0 | 1 | 1 | 2 | — | 0% |
| Smoke Gemini | 3 | 0 | 0 | 1 | 1 | 5 | 100% | 0% |
| Agent Container Smoke Test | 2 | 0 | 0 | 0 | 0 | 2 | 100% | 0% |

**Sort note:** Sorted by waste rate descending (Changeset Generator worst first).
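
The per-workflow columns and the sort order can be recomputed from the raw counts. This is a sketch under the same inferred definitions as the scorecard (acceptance over resolved items, waste over all items); the `WorkflowCounts` class is hypothetical and only illustrates the arithmetic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkflowCounts:
    name: str
    accepted: int
    rejected: int
    ignored: int
    pending: int
    unknown: int

    @property
    def total(self) -> int:
        return self.accepted + self.rejected + self.ignored + self.pending + self.unknown

    @property
    def acceptance(self) -> Optional[float]:
        # Undefined (shown as a dash in the table) when nothing has resolved yet.
        resolved = self.accepted + self.rejected
        return round(100 * self.accepted / resolved, 1) if resolved else None

    @property
    def waste(self) -> float:
        # Assumption: rejected and ignored outcomes both count as waste.
        return round(100 * (self.rejected + self.ignored) / self.total, 1) if self.total else 0.0


# Counts copied from the Per-Workflow Breakdown table.
rows = [
    WorkflowCounts("Smoke Claude", 2, 1, 0, 7, 2),
    WorkflowCounts("Changeset Generator", 0, 1, 0, 1, 0),
    WorkflowCounts("Smoke Codex", 0, 0, 0, 1, 1),
    WorkflowCounts("Smoke Gemini", 3, 0, 0, 1, 1),
    WorkflowCounts("Agent Container Smoke Test", 2, 0, 0, 0, 0),
]

# Worst waste rate first; ties keep their input order because sorted() is stable.
by_waste = sorted(rows, key=lambda w: w.waste, reverse=True)
for w in by_waste:
    print(w.name, w.total, w.acceptance, w.waste)
```

Sorting this way puts Changeset Generator (50.0%) first and Smoke Claude (8.3%) second, with the 0% workflows after them.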

### Evidence Quality

No items were evaluated on weak, existence-only signals (`fallback_exists_only_count = 0`). All 7 accepted items have medium evidence (acted on, state retained or replaced). None reached strong evidence (merged PRs, closed issues, or completed tasks with state verification).

### Tracking & Data Quality Issues

⚠️ **Items with missing URLs.** The report counts 5 pending outcomes with `url: ""` and `detail: "no url"`, but the list below names 6, so the count and the list need reconciling:
- `create_pull_request_review_comment` (Smoke Claude, 2x)
- `post_slack_message` (Smoke Claude)
- `create_code_scanning_alert` (Smoke Claude)
- `create_check_run` (Smoke Claude)
- `add_labels` (Smoke Claude)

These cannot be verified as accepted or rejected. **Recommendation:** Check whether the safe-output tools are designed to return URLs for all output types, or whether these tool types are not expected to produce trackable artifacts.
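
One way to start the investigation is to group untracked outcomes by tool type. The sketch below assumes a record shape that the report does not confirm: only the `url` and `detail` fields appear above, while the `workflow` and `tool` keys and the sample records are illustrative.

```python
from collections import Counter

# Illustrative sample only; not the collector's actual data or schema.
outcomes = [
    {"workflow": "Smoke Claude", "tool": "create_pull_request_review_comment", "url": "", "detail": "no url"},
    {"workflow": "Smoke Claude", "tool": "create_pull_request_review_comment", "url": "", "detail": "no url"},
    {"workflow": "Smoke Claude", "tool": "post_slack_message", "url": "", "detail": "no url"},
    {"workflow": "Smoke Claude", "tool": "add_reviewer",
     "url": "https://github.com/github/gh-aw/pull/36748", "detail": ""},
]


def is_untracked(outcome: dict) -> bool:
    """An outcome is untracked when it has no URL or is flagged 'no url'."""
    return not outcome.get("url") or outcome.get("detail") == "no url"


untracked = [o for o in outcomes if is_untracked(o)]
missing_by_tool = Counter(o["tool"] for o in untracked)

for tool, count in missing_by_tool.most_common():
    print(f"{tool}: {count}")
```

If the same tool types recur across runs, that points to tools that never return URLs rather than to intermittent capture failures.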

### Trend Comparison (vs. 2026-06-01)

**Previous (2026-06-01):**
- Acceptance rate: 90.9%
- Waste rate: 3.7%
- Zero-touch rate: 0%

**Current (2026-06-04):**
- Acceptance rate: 77.8% (⬇️ down 13.1pp)
- Waste rate: 8.7% (⬆️ up 5.0pp)
- Zero-touch rate: 0% (➡️ stable)

⚠️ **Regressing:** Acceptance rate dropped 13.1pp and waste rate rose 5.0pp. This suggests workflows are producing less polished outputs or targeting less amenable items. Investigate whether the prompt changed or the sample shifted to harder problems.
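
The trend deltas are simple percentage-point differences, and the direction of "good" differs per metric. A minimal sketch follows, assuming that higher is better for acceptance and zero-touch and lower is better for waste; the function names are hypothetical.

```python
def delta_pp(previous: float, current: float) -> float:
    """Change in percentage points, rounded to one decimal place."""
    return round(current - previous, 1)


def classify(previous: float, current: float, higher_is_better: bool) -> str:
    """Label a metric as improving, regressing, or stable."""
    change = delta_pp(previous, current)
    if change == 0:
        return "stable"
    improved = (change > 0) == higher_is_better
    return "improving" if improved else "regressing"


# (previous, current, higher_is_better), values copied from the two snapshots.
metrics = {
    "acceptance_rate": (90.9, 77.8, True),
    "waste_rate": (3.7, 8.7, False),
    "zero_touch_rate": (0.0, 0.0, True),
}

for name, (prev, cur, higher_is_better) in metrics.items():
    print(name, delta_pp(prev, cur), classify(prev, cur, higher_is_better))
```

With the values above, acceptance and waste both classify as regressing and zero-touch as stable, which matches the summary.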

</details>







> 📊 *Measured by [Outcome Collector](https://github.com/github/gh-aw/actions/runs/26923329051)* · haiku45 41.3K
> - [x] expires on Jun 11, 2026, 1:09 AM UTC
