[grafana-otel-advisor] OTel improvement: emit gh-aw.engine.id and gen_ai.system on the setup span

### OTel Instrumentation Improvement: emit `gh-aw.engine.id` and `gen_ai.system` on the setup span

**Analysis Date**: 2026-05-16
**Priority**: Medium
**Effort**: Small (< 2h)

### Problem

The `gh-aw.<jobName>.setup` span (built in `actions/setup/js/send_otlp_span.cjs:962` by `sendJobSetupSpan`) is emitted **without** the `gh-aw.engine.id` and `gen_ai.system` attributes, even though the workflow declares an engine. This happens because the `Setup Scripts` step runs **before** the `Generate agentic run info` step in the compiled lock file (see `pkg/workflow/compiler_yaml_step_generation.go:130-199`). At setup time:

1. `/tmp/gh-aw/aw_info.json` does not yet exist, so `awInfo.engine_id` and `awInfo.context.engine_id` resolve to empty strings.
2. `GH_AW_INFO_ENGINE_ID` is injected only into the env block of the `generate_aw_info` step (`pkg/workflow/compiler_yaml.go:801`) — it is **not** injected into the env block of the `Setup Scripts` step.

The result: `resolveEngineId(awInfo)` returns `""` at setup time, and the `if (engineId)` guard at `actions/setup/js/send_otlp_span.cjs:1048-1052` skips pushing both `gen_ai.system` and `gh-aw.engine.id`. A DevOps engineer querying Tempo/Grafana for **"p95 setup latency by engine"** cannot answer the question from the setup span alone. They must join setup spans to the conclusion span by trace ID, which adds a second lookup per trace and fails outright when only the setup span survives (e.g., when the agent step is cancelled before the conclusion span is emitted).
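
The failure mode can be modeled in a few lines. The sketch below is written in Go for consistency with the compiler-side change, although the real logic lives in `send_otlp_span.cjs`. The `awInfo` struct, the lookup order between the two `aw_info.json` fields, and the omission of `ENGINE_TO_SYSTEM_MAP` are simplifying assumptions, not a copy of the runtime code.

```go
package main

import "fmt"

// awInfo mirrors only the two fields named above; the real aw_info.json has more.
type awInfo struct {
	EngineID string
	Context  struct{ EngineID string }
}

// resolveEngineID models the lookup described above. The order of the two
// aw_info.json fields is an assumption; the env var is the last fallback.
func resolveEngineID(info awInfo, env map[string]string) string {
	if info.EngineID != "" {
		return info.EngineID
	}
	if info.Context.EngineID != "" {
		return info.Context.EngineID
	}
	return env["GH_AW_INFO_ENGINE_ID"]
}

// setupSpanAttrs returns the engine attributes a setup span would carry.
func setupSpanAttrs(info awInfo, env map[string]string) map[string]string {
	attrs := map[string]string{}
	if id := resolveEngineID(info, env); id != "" {
		// The runtime maps id through ENGINE_TO_SYSTEM_MAP first; omitted here.
		attrs["gen_ai.system"] = id
		attrs["gh-aw.engine.id"] = id
	}
	return attrs
}

func main() {
	// Setup time today: no aw_info.json and no env var, so no engine attributes.
	fmt.Println(len(setupSpanAttrs(awInfo{}, map[string]string{}))) // prints 0

	// Same moment with the env var visible to the setup step.
	env := map[string]string{"GH_AW_INFO_ENGINE_ID": "claude"}
	fmt.Println(setupSpanAttrs(awInfo{}, env)["gh-aw.engine.id"]) // prints claude
}
```

With both sources empty, the guard drops the two attributes together, which is why neither appears on the setup span.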

<details>
<summary>Why This Matters (DevOps Perspective)</summary>

- **Unblocks engine-segmented setup latency dashboards.** Grafana's GenAI / Application Observability panels filter on `gen_ai.system` — without it on setup spans, the panel under-counts cold-start time.
- **Improves cancelled-run diagnostics.** When a job is cancelled during setup, the conclusion span never fires. The setup span is the *only* surviving signal for that run, and right now it carries no engine identity at all.
- **Reduces MTTR for noisy-neighbor incidents.** "Is the claude setup phase slow today or is it all engines?" requires `gh-aw.engine.id` on the setup span itself; joining via trace ID adds latency and fails on partial traces.

</details>

<details>
<summary>Current Behavior</summary>

**Live evidence from this workflow run** (trace `d945112102984b62d8c85d2bf1dc6ba3`, span `gh-aw.agent.setup`, workflow uses `claude` engine):

Resource attributes present: ✅ `service.name`, `service.version`, `github.repository`, `github.run_id`, `github.event_name`, `deployment.environment`, etc.

Span attributes present on `gh-aw.agent.setup`:
```
gh-aw.episode.id
gh-aw.episode.kind
gh-aw.event_name
gh-aw.hop.id
gh-aw.job.name
gh-aw.repository
gh-aw.run.actor
gh-aw.run.attempt
gh-aw.run.id
gh-aw.staged
gh-aw.workflow.name
gh-aw.workflow_call.id
```

**Missing from the setup span**: `gh-aw.engine.id`, `gen_ai.system`.

The code path that should set them (`actions/setup/js/send_otlp_span.cjs:1048-1052`):

```javascript
const engineId = resolveEngineId(awInfo); // returns "" at setup time
// ...
if (engineId) {
  const genAiSystem = ENGINE_TO_SYSTEM_MAP[engineId] || engineId;
  attributes.push(buildAttr("gen_ai.system", genAiSystem));
  attributes.push(buildAttr("gh-aw.engine.id", engineId));
}
```

The compiler-side env block that omits it (`pkg/workflow/compiler_yaml_step_generation.go:185-198`):

```go
lines = append(lines,
	" env:\n",
	fmt.Sprintf(" GH_AW_SETUP_WORKFLOW_NAME: %q\n", data.Name),
	fmt.Sprintf(" GH_AW_CURRENT_WORKFLOW_REF: %s\n", buildSetupWorkflowRefExpr(data)),
)
if v := getVersionForSetup(data); v != "" {
	lines = append(lines, fmt.Sprintf(" GH_AW_INFO_VERSION: %q\n", v))
}
// no GH_AW_INFO_ENGINE_ID here
```

</details>

<details>
<summary>Proposed Change</summary>

Inject `GH_AW_INFO_ENGINE_ID` into the `Setup Scripts` step's env block in `generateSetupStep`. The engine ID is already in scope on `data.EngineConfig.ID` / `data.AI` (see `pkg/workflow/compiler_yaml.go:721-725`), so this is a small, mechanical addition with no new lookups.

```go
// pkg/workflow/compiler_yaml_step_generation.go (both script-mode branch and dev/release branch)
if data != nil {
	// existing GH_AW_SETUP_WORKFLOW_NAME / GH_AW_CURRENT_WORKFLOW_REF lines ...

	// NEW: propagate engine ID so the setup span carries gh-aw.engine.id and gen_ai.system.
	engineID := ""
	if data.EngineConfig != nil && data.EngineConfig.ID != "" {
		engineID = data.EngineConfig.ID
	} else if data.AI != "" {
		engineID = data.AI
	}
	if engineID != "" {
		lines = append(lines, fmt.Sprintf(" GH_AW_INFO_ENGINE_ID: %q\n", engineID))
	}
}
```

No runtime JS change is needed — `resolveEngineId(awInfo)` already falls back to `process.env.GH_AW_INFO_ENGINE_ID` at `actions/setup/js/send_otlp_span.cjs:178`. This fix just makes the env var visible to the setup step.
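
Because the same lines are needed in two branches, one option is to factor them into a small helper. The following is a minimal sketch under stated assumptions: `WorkflowData` and `EngineConfig` are hypothetical stand-ins carrying only the fields named above, and `setupEngineID`, `appendEngineEnv`, and the `indent` parameter are illustrative names, not existing compiler functions.

```go
package main

import "fmt"

// Hypothetical stand-ins for the compiler's types; only the fields used here.
type EngineConfig struct{ ID string }

type WorkflowData struct {
	EngineConfig *EngineConfig
	AI           string
}

// setupEngineID prefers EngineConfig.ID and falls back to AI.
func setupEngineID(data *WorkflowData) string {
	if data == nil {
		return ""
	}
	if data.EngineConfig != nil && data.EngineConfig.ID != "" {
		return data.EngineConfig.ID
	}
	return data.AI
}

// appendEngineEnv appends the env entry only when an engine ID is known.
// indent is the caller's YAML indentation for env entries (assumed to be
// supplied by each branch).
func appendEngineEnv(lines []string, data *WorkflowData, indent string) []string {
	if id := setupEngineID(data); id != "" {
		lines = append(lines, fmt.Sprintf("%sGH_AW_INFO_ENGINE_ID: %q\n", indent, id))
	}
	return lines
}

func main() {
	data := &WorkflowData{EngineConfig: &EngineConfig{ID: "claude"}, AI: "copilot"}
	fmt.Print(appendEngineEnv(nil, data, "  "))                  // one entry, engine "claude"
	fmt.Println()
	fmt.Println(len(appendEngineEnv(nil, &WorkflowData{}, "  "))) // prints 0
}
```

Emitting nothing when no engine is declared keeps existing lock files for engine-less workflows unchanged.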

</details>

<details>
<summary>Expected Outcome</summary>

After this change, every `gh-aw.<jobName>.setup` span (agent, activation, safe-outputs, conclusion, threat-detection, etc.) carries `gh-aw.engine.id` and `gen_ai.system` from the moment it is created:

- **In Grafana / Tempo**: TraceQL `{ span.gh-aw.engine.id = "claude" && name =~ ".*\\.setup" }` returns just claude setup spans. Span-metrics generators can now break out p95 setup latency per engine.
- **In Honeycomb / Datadog**: `gen_ai.system` populates the native GenAI service panels for setup spans, not only for conclusion/agent spans.
- **In the JSONL mirror**: `/tmp/gh-aw/otel.jsonl` shows the engine on the first span of every job — useful when the conclusion span never gets written (cancelled / timed-out runs).
- **For on-call**: a single span search by engine answers "is this slow for everyone or just one engine?" without a trace-ID join.

</details>

<details>
<summary>Implementation Steps</summary>

- [ ] Edit `pkg/workflow/compiler_yaml_step_generation.go`: add `GH_AW_INFO_ENGINE_ID` env injection in both the script-mode branch (around line 142) and the dev/release-mode branch (around line 185) of `generateSetupStep`.
- [ ] Update `pkg/workflow/setup_step_version_test.go` (and `pkg/workflow/observability_otlp_test.go` if it asserts on setup-step env) to expect the new env line for engines `copilot`, `claude`, `codex`, `gemini`.
- [ ] Verify `actions/setup/js/action_setup_otlp.test.cjs` covers the path where `process.env.GH_AW_INFO_ENGINE_ID` is set and asserts that the resulting span attributes include `gh-aw.engine.id` and `gen_ai.system`. If not, add the assertion.
- [ ] Recompile golden fixtures: `make recompile` (or equivalent) to regenerate `pkg/workflow/testdata/**.golden` so the new env line is present.
- [ ] Run `make test-unit` and `cd actions/setup/js && npx vitest run`.
- [ ] Run `make fmt`.
- [ ] Open a PR referencing this issue.
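
The test update in the second step could be built around an indentation-insensitive check like the one sketched here. `hasEngineEnv` is a hypothetical helper, and the rendered step text is a hand-written stand-in; a real test would obtain it from the compiler.

```go
package main

import (
	"fmt"
	"strings"
)

// hasEngineEnv reports whether a rendered step contains the engine env entry
// for the given engine, ignoring leading and trailing whitespace on each line.
func hasEngineEnv(step, engine string) bool {
	want := fmt.Sprintf("GH_AW_INFO_ENGINE_ID: %q", engine)
	for _, line := range strings.Split(step, "\n") {
		if strings.TrimSpace(line) == want {
			return true
		}
	}
	return false
}

func main() {
	for _, engine := range []string{"copilot", "claude", "codex", "gemini"} {
		// Hand-written stand-in for the compiled Setup Scripts step.
		step := "- name: Setup Scripts\n" +
			"  env:\n" +
			"    GH_AW_SETUP_WORKFLOW_NAME: \"demo\"\n" +
			fmt.Sprintf("    GH_AW_INFO_ENGINE_ID: %q\n", engine)
		fmt.Println(engine, hasEngineEnv(step, engine))
	}
}
```

Matching on the trimmed line rather than a raw substring avoids false positives from a differently quoted or prefixed value.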

</details>

<details>
<summary>Evidence from Live Grafana Data</summary>

**Tempo backend status**: `tempo_traceql-search` against `grafanacloud-traces` returned 0 traces over the last 7 days for both `{}` and `{resource.service.name="gh-aw"}`. The Grafana Cloud Tempo instance bound to this MCP is therefore not the production OTLP destination for this repository, and the live tracing-backend playbook was not directly usable. Falling back to telemetry-source priority #2 in the otel-queries skill (`/tmp/gh-aw/otel.jsonl`) yielded a current, real span produced by this very workflow run.

**JSONL evidence (this run)**:

- `traceId`: `d945112102984b62d8c85d2bf1dc6ba3`
- `spanId`: `d43a7a6b0171531b`
- `name`: `gh-aw.agent.setup`
- workflow uses the `claude` engine (confirmed by `GH_AW_INFO_ENGINE_ID: "claude"` at `.github/workflows/daily-grafana-otel-instrumentation-advisor.lock.yml:132`)
- Span attribute keys (12 total): `gh-aw.episode.id`, `gh-aw.episode.kind`, `gh-aw.event_name`, `gh-aw.hop.id`, `gh-aw.job.name`, `gh-aw.repository`, `gh-aw.run.actor`, `gh-aw.run.attempt`, `gh-aw.run.id`, `gh-aw.staged`, `gh-aw.workflow.name`, `gh-aw.workflow_call.id`
- **Not present**: `gh-aw.engine.id`, `gen_ai.system`

This is reproducible on every gh-aw run — the gap is structural, not an outlier.

</details>

<details>
<summary>Related Files</summary>

- `pkg/workflow/compiler_yaml_step_generation.go` (primary change — inject env in `generateSetupStep`)
- `pkg/workflow/compiler_yaml.go` (reference for how `engineID` is resolved at compile time, lines 721–725)
- `actions/setup/js/send_otlp_span.cjs` (runtime — `resolveEngineId` at line 177, attribute push at 1048)
- `actions/setup/js/action_setup_otlp.cjs` (entry point that triggers `sendJobSetupSpan`)
- `actions/setup/js/action_setup_otlp.test.cjs` (add coverage)
- `pkg/workflow/setup_step_version_test.go` (golden assertions for setup-step env)
- `pkg/workflow/testdata/**.golden` (regenerated lock-file fixtures)

</details>

---

*Generated by the [Daily Grafana OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/25953882846) workflow*

> Generated by [📊 Daily Grafana OTel Instrumentation Advisor](https://github.com/github/gh-aw/actions/runs/25953882846) · ● 26M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-grafana-otel-instrumentation-advisor%22&type=issues)
> - [x] expires on May 23, 2026, 5:37 AM UTC