OTel Instrumentation Improvement: emit gh-aw.engine.id and gen_ai.system on the setup span
Analysis Date: 2026-05-16
Priority: Medium
Effort: Small (< 2h)
Problem
The gh-aw.<jobName>.setup span (built in actions/setup/js/send_otlp_span.cjs:962 by sendJobSetupSpan) is emitted without the gh-aw.engine.id and gen_ai.system attributes, even though the workflow declares an engine. This happens because the Setup Scripts step runs before the Generate agentic run info step in the compiled lock file (see pkg/workflow/compiler_yaml_step_generation.go:130-199). At setup time:
- /tmp/gh-aw/aw_info.json does not yet exist, so awInfo.engine_id and awInfo.context.engine_id resolve to empty strings.
- GH_AW_INFO_ENGINE_ID is injected only into the env block of the generate_aw_info step (pkg/workflow/compiler_yaml.go:801); it is not injected into the env block of the Setup Scripts step.
The result: resolveEngineId(awInfo) returns "" at setup time, and the if (engineId) guard at actions/setup/js/send_otlp_span.cjs:1048-1052 skips pushing both gen_ai.system and gh-aw.engine.id. A DevOps engineer querying Tempo/Grafana for "p95 setup latency by engine" cannot answer the question from setup spans alone. They must join each setup span to its conclusion span by trace ID, which adds a second lookup per trace and fails when only the setup span survives (e.g., when the agent step is cancelled before conclusion).
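The lookup order described above can be sketched as follows. This is a hypothetical re-creation for illustration, not the actual resolveEngineId source: the precedence of the aw_info.json fields over the env var, and the explicit env parameter, are assumptions.

```javascript
// Hypothetical re-creation of the lookup order; the real resolveEngineId in
// send_otlp_span.cjs may differ in detail.
function resolveEngineId(awInfo, env = process.env) {
  const info = awInfo || {};
  const context = info.context || {};
  // 1. Fields from aw_info.json (absent while the Setup Scripts step runs).
  // 2. GH_AW_INFO_ENGINE_ID from the step's env block.
  return info.engine_id || context.engine_id || env.GH_AW_INFO_ENGINE_ID || "";
}

// Setup step today: no aw_info.json and no env var, so the guard is skipped.
const today = resolveEngineId({}, {});

// Setup step once the env var is visible to it.
const proposed = resolveEngineId({}, { GH_AW_INFO_ENGINE_ID: "claude" });

// Later steps: aw_info.json exists and is consulted first (assumed order).
const later = resolveEngineId({ engine_id: "claude" }, {});

console.log({ today, proposed, later });
```

Only the first case yields an empty string, which is the case the setup span hits today.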
Why This Matters (DevOps Perspective)
- Unblocks engine-segmented setup latency dashboards. Grafana's GenAI / Application Observability panels filter on gen_ai.system; without it on setup spans, the panel under-counts cold-start time.
- Improves cancelled-run diagnostics. When a job is cancelled during setup, the conclusion span never fires. The setup span is the only surviving signal for that run, and right now it carries no engine identity at all.
- Reduces MTTR for noisy-neighbor incidents. "Is the claude setup phase slow today or is it all engines?" requires gh-aw.engine.id on the setup span itself; joining via trace ID adds latency and fails on partial traces.
Current Behavior
Live evidence from this workflow run (trace d945112102984b62d8c85d2bf1dc6ba3, span gh-aw.agent.setup, workflow uses claude engine):
Resource attributes present: ✅ service.name, service.version, github.repository, github.run_id, github.event_name, deployment.environment, etc.
Span attributes present on gh-aw.agent.setup:
gh-aw.episode.id
gh-aw.episode.kind
gh-aw.event_name
gh-aw.hop.id
gh-aw.job.name
gh-aw.repository
gh-aw.run.actor
gh-aw.run.attempt
gh-aw.run.id
gh-aw.staged
gh-aw.workflow.name
gh-aw.workflow_call.id
Missing from the setup span: gh-aw.engine.id, gen_ai.system.
The code path that should set them (actions/setup/js/send_otlp_span.cjs:1048-1052):
const engineId = resolveEngineId(awInfo); // returns "" at setup time
// ...
if (engineId) {
  const genAiSystem = ENGINE_TO_SYSTEM_MAP[engineId] || engineId;
  attributes.push(buildAttr("gen_ai.system", genAiSystem));
  attributes.push(buildAttr("gh-aw.engine.id", engineId));
}
The compiler-side env block that omits it (pkg/workflow/compiler_yaml_step_generation.go:185-198):
lines = append(lines,
	" env:\n",
	fmt.Sprintf(" GH_AW_SETUP_WORKFLOW_NAME: %q\n", data.Name),
	fmt.Sprintf(" GH_AW_CURRENT_WORKFLOW_REF: %s\n", buildSetupWorkflowRefExpr(data)),
)
if v := getVersionForSetup(data); v != "" {
	lines = append(lines, fmt.Sprintf(" GH_AW_INFO_VERSION: %q\n", v))
}
// no GH_AW_INFO_ENGINE_ID here
Proposed Change
Inject GH_AW_INFO_ENGINE_ID into the Setup Scripts step's env block in generateSetupStep. The engine ID is already in scope on data.EngineConfig.ID / data.AI (see pkg/workflow/compiler_yaml.go:721-725), so this is a small, mechanical addition with no new lookups.
// pkg/workflow/compiler_yaml_step_generation.go (both script-mode branch and dev/release branch)
if data != nil {
	// existing GH_AW_SETUP_WORKFLOW_NAME / GH_AW_CURRENT_WORKFLOW_REF lines ...
	// NEW: propagate engine ID so the setup span carries gh-aw.engine.id and gen_ai.system.
	engineID := ""
	if data.EngineConfig != nil && data.EngineConfig.ID != "" {
		engineID = data.EngineConfig.ID
	} else if data.AI != "" {
		engineID = data.AI
	}
	if engineID != "" {
		lines = append(lines, fmt.Sprintf(" GH_AW_INFO_ENGINE_ID: %q\n", engineID))
	}
}
No runtime JS change is needed — resolveEngineId(awInfo) already falls back to process.env.GH_AW_INFO_ENGINE_ID at actions/setup/js/send_otlp_span.cjs:178. This fix just makes the env var visible to the setup step.
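A regression check for the env fallback might look roughly like this sketch. The ENGINE_TO_SYSTEM_MAP entries, the OTLP/JSON shape returned by buildAttr, and the engineAttributes helper are stand-ins assumed for illustration; the real module's contents are not shown in this issue.

```javascript
// Illustrative mapping only; the real ENGINE_TO_SYSTEM_MAP may hold other values.
const ENGINE_TO_SYSTEM_MAP = { claude: "anthropic", codex: "openai" };

// Assumed OTLP/JSON string attribute shape.
function buildAttr(key, value) {
  return { key, value: { stringValue: String(value) } };
}

// Hypothetical helper mirroring the guarded push shown earlier, driven only
// by the env var that the setup step would receive after the compiler change.
function engineAttributes(env = process.env) {
  const attributes = [];
  const engineId = env.GH_AW_INFO_ENGINE_ID || "";
  if (engineId) {
    const genAiSystem = ENGINE_TO_SYSTEM_MAP[engineId] || engineId;
    attributes.push(buildAttr("gen_ai.system", genAiSystem));
    attributes.push(buildAttr("gh-aw.engine.id", engineId));
  }
  return attributes;
}

// With the env var set, both attributes appear; without it, neither does.
const withEngine = engineAttributes({ GH_AW_INFO_ENGINE_ID: "claude" });
const withoutEngine = engineAttributes({});
console.log(withEngine.map((a) => a.key), withoutEngine.length);
```

In the actual test file the same two cases would be expressed as vitest expectations against the span built by sendJobSetupSpan.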
Expected Outcome
After this change, every gh-aw.<jobName>.setup span (agent, activation, safe-outputs, conclusion, threat-detection, etc.) carries gh-aw.engine.id and gen_ai.system from the moment it is created:
- In Grafana / Tempo: TraceQL { span.gh-aw.engine.id = "claude" && name =~ ".*\\.setup" } returns just claude setup spans. Span-metrics generators can now break out p95 setup latency per engine.
- In Honeycomb / Datadog: gen_ai.system populates the native GenAI service panels for setup spans, not only for conclusion/agent spans.
- In the JSONL mirror: /tmp/gh-aw/otel.jsonl shows the engine on the first span of every job, which is useful when the conclusion span never gets written (cancelled / timed-out runs).
- For on-call: a single span search by engine answers "is this slow for everyone or just one engine?" without a trace-ID join.
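To illustrate what the per-engine breakout enables, the sketch below groups setup spans by gh-aw.engine.id and computes a nearest-rank p95 of span duration. It assumes OTLP/JSON span objects with string-encoded startTimeUnixNano and endTimeUnixNano; percentile and p95SetupLatencyByEngine are hypothetical helper names, and a real dashboard would rely on the backend's own span metrics instead.

```javascript
// Nearest-rank percentile over a non-empty array of numbers.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Returns { engineId: p95DurationMs } for spans whose name ends in ".setup".
function p95SetupLatencyByEngine(spans) {
  const byEngine = new Map();
  for (const span of spans) {
    if (!String(span.name || "").endsWith(".setup")) continue;
    const attr = (span.attributes || []).find((a) => a.key === "gh-aw.engine.id");
    // Spans emitted before the fix carry no engine and land in "(unknown)".
    const engine = attr ? attr.value.stringValue : "(unknown)";
    const nanos = BigInt(span.endTimeUnixNano) - BigInt(span.startTimeUnixNano);
    const ms = Number(nanos) / 1e6;
    if (!byEngine.has(engine)) byEngine.set(engine, []);
    byEngine.get(engine).push(ms);
  }
  const result = {};
  for (const [engine, durations] of byEngine) {
    result[engine] = percentile(durations, 95);
  }
  return result;
}
```

Without the attribute, every setup span falls into the "(unknown)" bucket, which is the situation the issue describes.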
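Implementation Steps
- In pkg/workflow/compiler_yaml_step_generation.go, add GH_AW_INFO_ENGINE_ID env injection in both the script-mode branch (around line 142) and the dev/release-mode branch (around line 185) of generateSetupStep.
- Update pkg/workflow/setup_step_version_test.go (and pkg/workflow/observability_otlp_test.go if it asserts on setup-step env) to expect the new env line for engines copilot, claude, codex, gemini.
- Verify that actions/setup/js/action_setup_otlp.test.cjs covers the path where process.env.GH_AW_INFO_ENGINE_ID is set and asserts that the resulting span attributes include gh-aw.engine.id and gen_ai.system. If not, add the assertion.
- Run make recompile (or equivalent) to regenerate pkg/workflow/testdata/**.golden so the new env line is present.
- Run make test-unit and cd actions/setup/js && npx vitest run.
- Run make fmt.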
Evidence from Live Grafana Data
Tempo backend status: tempo_traceql-search against grafanacloud-traces returned 0 traces over the last 7 days for {} and {resource.service.name="gh-aw"} — the Grafana Cloud Tempo instance bound to this MCP is not the production OTLP destination for this repository, so the live tracing-backend playbook was not directly usable. Falling back to telemetry-source priority #2 in the otel-queries skill (/tmp/gh-aw/otel.jsonl) gave a current, real span produced by this very workflow run.
JSONL evidence (this run):
- traceId: d945112102984b62d8c85d2bf1dc6ba3
- spanId: d43a7a6b0171531b
- name: gh-aw.agent.setup
- workflow uses claude engine (confirmed in .github/workflows/daily-grafana-otel-instrumentation-advisor.lock.yml:132 GH_AW_INFO_ENGINE_ID: "claude")
- Span attribute keys (12 total): gh-aw.episode.id, gh-aw.episode.kind, gh-aw.event_name, gh-aw.hop.id, gh-aw.job.name, gh-aw.repository, gh-aw.run.actor, gh-aw.run.attempt, gh-aw.run.id, gh-aw.staged, gh-aw.workflow.name, gh-aw.workflow_call.id
- Not present: gh-aw.engine.id, gen_ai.system
This is reproducible on every gh-aw run — the gap is structural, not an outlier.
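The same check can be repeated on any run with a small scan of the JSONL mirror, sketched below. It assumes each line of /tmp/gh-aw/otel.jsonl is an OTLP/JSON export payload (resourceSpans, then scopeSpans, then spans); the mirror's actual layout is not shown in this issue, and findSetupSpansMissingEngine is a hypothetical helper name.

```javascript
const REQUIRED_KEYS = ["gh-aw.engine.id", "gen_ai.system"];

// Scans JSONL text and reports setup spans that lack any required key.
// Assumed line shape: { resourceSpans: [{ scopeSpans: [{ spans: [...] }] }] }
function findSetupSpansMissingEngine(jsonlText) {
  const findings = [];
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    let payload;
    try {
      payload = JSON.parse(line);
    } catch {
      continue; // skip lines that are not valid JSON (e.g. partial writes)
    }
    for (const rs of payload.resourceSpans || []) {
      for (const ss of rs.scopeSpans || []) {
        for (const span of ss.spans || []) {
          if (!/\.setup$/.test(span.name || "")) continue;
          const present = new Set((span.attributes || []).map((a) => a.key));
          const missing = REQUIRED_KEYS.filter((k) => !present.has(k));
          if (missing.length > 0) {
            findings.push({
              traceId: span.traceId,
              spanId: span.spanId,
              name: span.name,
              missing,
            });
          }
        }
      }
    }
  }
  return findings;
}
```

On a runner this could be fed with require("node:fs").readFileSync("/tmp/gh-aw/otel.jsonl", "utf8"); an empty result after the fix would indicate that every setup span carries both attributes.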
Related Files
pkg/workflow/compiler_yaml_step_generation.go (primary change — inject env in generateSetupStep)
pkg/workflow/compiler_yaml.go (reference for how engineID is resolved at compile time, lines 721–725)
actions/setup/js/send_otlp_span.cjs (runtime — resolveEngineId at line 177, attribute push at 1048)
actions/setup/js/action_setup_otlp.cjs (entry point that triggers sendJobSetupSpan)
actions/setup/js/action_setup_otlp.test.cjs (add coverage)
pkg/workflow/setup_step_version_test.go (golden assertions for setup-step env)
pkg/workflow/testdata/**.golden (regenerated lock-file fixtures)
Generated by the Daily Grafana OTel Instrumentation Advisor workflow