📡 OTel Instrumentation Improvement: surface agent output metrics in conclusion spans for all runs
Analysis Date: 2026-04-20
Priority: Medium
Effort: Small (< 2h)
Problem
sendJobConclusionSpan in actions/setup/js/send_otlp_span.cjs reads agent_output.json only when the agent job failed or timed out. For successful runs the file is never opened, so the conclusion span carries no information about what the agent actually produced. This means an engineer looking at traces in Grafana, Honeycomb, or Datadog cannot answer:
- "How many items did this run create?"
- "Was this a read-only run or a write-capable run?"
- "Is there a correlation between item count and token cost?"
- "Why did a 'successful' run produce zero outputs when we expected changes?"
The gap is visible by comparing two files in the same directory:
- generate_observability_summary.cjs always reads agent_output.json and surfaces createdItemCount in the job step summary.
- send_otlp_span.cjs skips the read entirely for success, so the same metric never reaches the OTLP backend.
Why This Matters (DevOps Perspective)
The step summary is ephemeral and not queryable. OTLP spans are indexed, filterable, and alertable. Without gh-aw.output.item_count in the span:
- You cannot build a Grafana panel grouping runs by output volume.
- You cannot set an alert when a workflow that normally creates N items suddenly creates 0.
- You cannot segment cost (token counts) by productivity (items created) in a single query.
- An on-call engineer investigating "why did the bot do nothing?" has no span attribute to filter on — they must hunt through step summary HTML.
Adding this single numeric attribute unblocks all of the above.
Current Behavior
```js
// actions/setup/js/send_otlp_span.cjs — lines 708–715
// When the agent failed, read agent_output.json to surface structured error details.
// Lazy-read: skip I/O entirely when the job succeeded or was cancelled.
const agentOutput = isAgentFailure ? readJSONIfExists("/tmp/gh-aw/agent_output.json") || {} : {};
const outputErrors = Array.isArray(agentOutput.errors) ? agentOutput.errors : [];
```
For a successful run isAgentFailure === false, so agentOutput is {}, items is never accessed, and no output-count attribute is set on the span.
Compare with the summary generator which always reads the file:
```js
// actions/setup/js/generate_observability_summary.cjs
const agentOutput = readJSONIfExists(AGENT_OUTPUT_PATH) || { items: [], errors: [] };
const items = Array.isArray(agentOutput.items) ? agentOutput.items : [];
// → shown as "created items: N" in step summary but never in OTLP
```
Proposed Change
```js
// actions/setup/js/send_otlp_span.cjs — replace lines 708–715
// Always read agent_output.json so output metrics appear in spans for all runs,
// not just failures. readJSONIfExists returns null when the file is absent (e.g.
// cancelled jobs), which the fallback {} handles safely.
const agentOutput = readJSONIfExists("/tmp/gh-aw/agent_output.json") || {};
const outputErrors = Array.isArray(agentOutput.errors) ? agentOutput.errors : [];
const outputItems = Array.isArray(agentOutput.items) ? agentOutput.items : [];

// --- further down, after the existing error-message attributes ---
// Always include item count so success-path runs are queryable by output volume.
attributes.push(buildAttr("gh-aw.output.item_count", outputItems.length));
// Include unique item types as a comma-separated string for easy grouping.
const itemTypes = [...new Set(outputItems.map(i => (i && typeof i.type === "string" ? i.type : "")).filter(Boolean))].sort();
if (itemTypes.length > 0) {
  attributes.push(buildAttr("gh-aw.output.item_types", itemTypes.join(",")));
}
```
The same two attributes should also be added to the gh-aw.agent.agent sub-span (built a few lines below using the same attributes array — they'll propagate automatically once added above).
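Assuming buildAttr emits OTLP-style KeyValue pairs (an assumption for illustration; the repo's helper may use a different shape), the two new attributes would serialize along these lines:

```javascript
// Hypothetical buildAttr sketch producing OTLP KeyValue objects.
// The real helper in send_otlp_span.cjs may differ.
function buildAttr(key, value) {
  return Number.isInteger(value)
    ? { key, value: { intValue: String(value) } } // OTLP/JSON encodes ints as strings
    : { key, value: { stringValue: String(value) } };
}

const outputItems = [{ type: "issue" }, { type: "comment" }, { type: "issue" }];
const itemTypes = [...new Set(outputItems.map(i => i.type))].sort();

const countAttr = buildAttr("gh-aw.output.item_count", outputItems.length);
const typesAttr = buildAttr("gh-aw.output.item_types", itemTypes.join(","));
// countAttr carries intValue "3"; typesAttr carries stringValue "comment,issue"
```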
Expected Outcome
After this change:
- In Grafana / Honeycomb / Datadog: filter spans by gh-aw.output.item_count = 0 to find read-only or zero-output runs; group by gh-aw.output.item_types to see which kinds of items each engine creates; build a scatter plot of gh-aw.tokens.input vs gh-aw.output.item_count to spot cost-to-productivity anomalies.
- In the JSONL mirror: every conclusion span line in /tmp/gh-aw/otel.jsonl will include the item count, making artifact-based post-mortems immediately actionable without querying the backend.
- For on-call engineers: a single attribute query (gh-aw.output.item_count = 0 AND gh-aw.agent.conclusion = "success") surfaces runs where the agent ran to completion but produced nothing — a common symptom of silent permission or routing failures.
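For illustration, a conclusion span line in the JSONL mirror would then carry the new attributes roughly as follows (the span name and surrounding fields are illustrative, not taken from the actual file):

```json
{
  "name": "gh-aw.agent.conclusion",
  "attributes": [
    { "key": "gh-aw.agent.conclusion", "value": { "stringValue": "success" } },
    { "key": "gh-aw.output.item_count", "value": { "intValue": "2" } },
    { "key": "gh-aw.output.item_types", "value": { "stringValue": "comment,issue" } }
  ]
}
```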
Implementation Steps
1. In actions/setup/js/send_otlp_span.cjs, change the ternary on line 710 to always call readJSONIfExists("/tmp/gh-aw/agent_output.json") regardless of isAgentFailure.
2. Derive outputItems from the parsed result (same pattern as outputErrors).
3. Add gh-aw.output.item_count (outputItems.length) to attributes unconditionally — place it near the existing gh-aw.effective_tokens attribute for logical grouping.
4. Add gh-aw.output.item_types (comma-separated sorted unique types) to attributes when itemTypes.length > 0.
5. Extend the tests (send_otlp_span.test.cjs) to assert gh-aw.output.item_count is present on success-path conclusion spans and equals the number of items in the mock agent_output.json.
6. Run cd actions/setup/js && npx vitest run to confirm tests pass, then make fmt to ensure formatting.
Evidence from Live Sentry Data
⚠️ Note: The Sentry MCP server returned an empty tool list during this analysis run (sentry --help showed 0 tools). Live span sampling was not possible. The gap above is confirmed purely by static code analysis — specifically the asymmetry between generate_observability_summary.cjs (which always reads agent_output.json) and sendJobConclusionSpan (which skips the read for non-failure runs). No conflicting live-data evidence is available to deprioritize this recommendation.
Related Files
- actions/setup/js/send_otlp_span.cjs — primary change site (lines ~708–760)
- actions/setup/js/generate_observability_summary.cjs — reference implementation for reading agent_output.json
- actions/setup/js/action_conclusion_otlp.cjs — calls sendJobConclusionSpan; no change needed
- actions/setup/js/send_otlp_span.test.cjs — test file (or equivalent)
Generated by the Daily OTel Instrumentation Advisor workflow