
[otel-advisor] OTel improvement: add OTel GenAI semantic convention attributes to the agent span #28503

@github-actions

Description

📡 OTel Instrumentation Improvement: Add gen_ai.* attributes and SPAN_KIND_CLIENT to the agent execution span

Analysis Date: 2026-04-25
Priority: High
Effort: Small (< 2h)

Problem

The gh-aw.agent.agent span, the dedicated span that measures AI agent execution latency, uses custom gh-aw.* token attributes and defaults to SPAN_KIND_INTERNAL. It does not emit the standard [OTel GenAI semantic convention](opentelemetry.io/redacted) attributes (gen_ai.*) that observability backends now use for out-of-the-box LLM dashboards.

All the required data is already collected (model name from awInfo.model, token counts from agent_usage.json) but is only stored under private gh-aw.* keys. As a result, a DevOps engineer cannot answer the following questions using standard Grafana/Datadog/Honeycomb GenAI dashboards without writing custom attribute mappings:

  • What is the average AI token cost per workflow run?
  • How does prompt cache hit rate trend over time?
  • Which model version are we running and is it changing?

Why This Matters (DevOps Perspective)

The OTel GenAI semantic conventions were stabilized in late 2024 and are now the standard way to instrument LLM API calls. Grafana Cloud, Datadog, and Honeycomb all ship out-of-the-box dashboards keyed on gen_ai.* attributes. Without these attributes on the agent span, engineers must write custom queries or dashboard panels, a maintenance burden that grows over time.

The gh-aw.agent.agent span is also the only span in the trace typed as SPAN_KIND_INTERNAL despite representing a call to an external AI service. This causes service-map tools to render it as an in-process operation rather than an outbound dependency, hiding the AI provider from topology views. Setting SPAN_KIND_CLIENT corrects this and enables client-latency percentile statistics in Datadog APM and similar tools.
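
For reference, the OTLP trace protobuf assigns these numeric values to SpanKind (the constants below restate the protocol, not code from the gh-aw repo):

```javascript
// OTLP SpanKind numeric values, per the OTLP trace protobuf definition.
// The agent span currently defaults to SPAN_KIND_INTERNAL (1); this issue
// proposes SPAN_KIND_CLIENT (3).
const SpanKind = {
  SPAN_KIND_UNSPECIFIED: 0,
  SPAN_KIND_INTERNAL: 1,
  SPAN_KIND_SERVER: 2,
  SPAN_KIND_CLIENT: 3,
  SPAN_KIND_PRODUCER: 4,
  SPAN_KIND_CONSUMER: 5,
};

console.log(SpanKind.SPAN_KIND_CLIENT); // 3
```

This is why the test update below checks `agentSpan.kind === 3`.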

Concretely: after this change, the same Grafana LLM Observability dashboard that works for OpenAI/Bedrock spans will work for gh-aw traces, reducing MTTR for "the agent is slow or expensive" investigations from hours (custom query) to seconds (open dashboard).

Current Behavior

actions/setup/js/send_otlp_span.cjs, lines 878–898: the agent span is built with no kind parameter (defaults to SPAN_KIND_INTERNAL) and inherits the shared conclusion attributes array, which uses custom gh-aw.* keys:

// Current: actions/setup/js/send_otlp_span.cjs (lines 878–898)
const agentPayload = buildOTLPPayload({
  traceId,
  spanId: generateSpanId(),
  parentSpanId: conclusionSpanId,
  spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
  startMs: agentStartMs,
  endMs: agentEndMs,
  serviceName,
  scopeVersion: version,
  attributes,           // shared with conclusion span; only gh-aw.* custom attrs
  resourceAttributes,
  statusCode,
  statusMessage,
  events: agentSpanEvents,
  // no 'kind' -> defaults to SPAN_KIND_INTERNAL (1)
});

Token attributes currently on the span (lines 757–768, shared array):

// gh-aw.* custom attributes: not recognized by standard GenAI dashboards
attributes.push(buildAttr("gh-aw.tokens.input",       agentUsage.input_tokens));
attributes.push(buildAttr("gh-aw.tokens.output",      agentUsage.output_tokens));
attributes.push(buildAttr("gh-aw.tokens.cache_read",  agentUsage.cache_read_tokens));
attributes.push(buildAttr("gh-aw.tokens.cache_write", agentUsage.cache_write_tokens));
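
For context, buildAttr presumably wraps each key/value pair in the OTLP/JSON KeyValue encoding. The sketch below is an assumption about that shape for illustration only; the real helper in send_otlp_span.cjs may differ:

```javascript
// Hypothetical sketch of a buildAttr-style helper (assumption: the actual
// implementation in send_otlp_span.cjs may differ). OTLP/JSON represents each
// attribute as a KeyValue whose value is wrapped in a typed field, with int64
// values serialized as strings.
function buildAttr(key, value) {
  if (typeof value === "number" && Number.isInteger(value)) {
    return { key, value: { intValue: String(value) } };
  }
  return { key, value: { stringValue: String(value) } };
}

console.log(JSON.stringify(buildAttr("gh-aw.tokens.input", 1234)));
// {"key":"gh-aw.tokens.input","value":{"intValue":"1234"}}
```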

Proposed Change

In sendJobConclusionSpan (inside the if (jobName === "agent" && ...) block), build a dedicated agentAttributes array that extends the shared attributes with gen_ai.* standard attributes, and pass kind: SPAN_KIND_CLIENT:

// Proposed addition: actions/setup/js/send_otlp_span.cjs (agent span block)

// Infer AI provider from model name for gen_ai.system.
const genAiSystem = model && model.toLowerCase().startsWith("claude") ? "anthropic" : "";

// Build gen_ai semantic convention attributes for the agent span only.
// These complement the existing gh-aw.* attributes and unlock out-of-the-box
// GenAI dashboards in Grafana, Datadog, and Honeycomb.
const agentAttributes = [...attributes];
if (genAiSystem) {
  agentAttributes.push(buildAttr("gen_ai.system", genAiSystem));
}
if (model) {
  agentAttributes.push(buildAttr("gen_ai.request.model", model));
}
if (typeof agentUsage.input_tokens === "number" && agentUsage.input_tokens > 0) {
  agentAttributes.push(buildAttr("gen_ai.usage.input_tokens", agentUsage.input_tokens));
}
if (typeof agentUsage.output_tokens === "number" && agentUsage.output_tokens > 0) {
  agentAttributes.push(buildAttr("gen_ai.usage.output_tokens", agentUsage.output_tokens));
}
if (typeof agentUsage.cache_read_tokens === "number" && agentUsage.cache_read_tokens > 0) {
  agentAttributes.push(buildAttr("gen_ai.usage.cache_read_input_tokens", agentUsage.cache_read_tokens));
}
if (typeof agentUsage.cache_write_tokens === "number" && agentUsage.cache_write_tokens > 0) {
  agentAttributes.push(buildAttr("gen_ai.usage.cache_creation_input_tokens", agentUsage.cache_write_tokens));
}

const agentPayload = buildOTLPPayload({
  traceId,
  spanId: generateSpanId(),
  parentSpanId: conclusionSpanId,
  spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
  startMs: agentStartMs,
  endMs: agentEndMs,
  serviceName,
  scopeVersion: version,
  attributes: agentAttributes,   // enriched with gen_ai.* attrs
  resourceAttributes,
  statusCode,
  statusMessage,
  events: agentSpanEvents,
  kind: SPAN_KIND_CLIENT,        // LLM invocation is an outbound client call
});

Note: The existing gh-aw.* token attributes on the shared attributes array are preserved on both the conclusion span and the agent span for backwards compatibility. The gen_ai.* attributes are additive on the agent span only.

The gen_ai.system inference heuristic (model.toLowerCase().startsWith("claude") -> "anthropic") is intentionally simple. A future PR can extend it with a lookup table if other providers are added.
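
If that lookup table lands, it might look like the sketch below. The prefix-to-system pairs other than claude -> anthropic are assumptions for illustration and should be checked against the semconv registry before use:

```javascript
// Illustrative lookup-table extension of the single-provider heuristic.
// Only the claude -> anthropic mapping is used today; the other entries
// are assumed examples, not a vetted provider list.
const GEN_AI_SYSTEM_BY_PREFIX = [
  ["claude", "anthropic"],
  ["gpt", "openai"],
  ["gemini", "gcp.gemini"],
];

function inferGenAiSystem(model) {
  if (!model) return "";
  const normalized = model.toLowerCase();
  for (const [prefix, system] of GEN_AI_SYSTEM_BY_PREFIX) {
    if (normalized.startsWith(prefix)) return system;
  }
  return ""; // unknown provider: omit gen_ai.system rather than guess
}

console.log(inferGenAiSystem("Claude-Sonnet-4")); // anthropic
```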

Expected Outcome

After this change:

  • In Grafana / Honeycomb / Datadog: The gh-aw.agent.agent span will appear in out-of-the-box GenAI / LLM dashboards keyed on gen_ai.system and gen_ai.request.model. Token cost panels based on gen_ai.usage.* attributes will populate without custom attribute mappings.
  • In service maps: The agent span will appear as an outbound CLIENT call to the anthropic service rather than an internal operation, giving an accurate dependency picture.
  • In the JSONL mirror (/tmp/gh-aw/otel.jsonl): The agent span lines will include both gh-aw.* and gen_ai.* attributes, making artifact-based debugging richer.
  • For on-call engineers: Cache hit rate (gen_ai.usage.cache_read_input_tokens / gen_ai.usage.input_tokens) becomes instantly queryable with standard attribute names, enabling threshold alerts without custom queries.
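
The cache hit rate from the last bullet can be sketched against the OTLP/JSON attribute encoding. cacheHitRate here is a hypothetical helper for artifact-based debugging of the JSONL mirror, not code in the repo:

```javascript
// Hypothetical helper (not in the repo): compute prompt cache hit rate from a
// span's gen_ai.usage.* attributes in the OTLP/JSON KeyValue encoding, e.g.
// when inspecting lines from the /tmp/gh-aw/otel.jsonl mirror.
function cacheHitRate(attributes) {
  const intAttr = key => {
    const attr = attributes.find(a => a.key === key);
    return attr ? Number(attr.value.intValue) : 0;
  };
  const cacheRead = intAttr("gen_ai.usage.cache_read_input_tokens");
  const input = intAttr("gen_ai.usage.input_tokens");
  return input > 0 ? cacheRead / input : 0;
}

// Assumed example span attributes, following the issue's formula.
const spanAttrs = [
  { key: "gen_ai.usage.input_tokens", value: { intValue: "1000" } },
  { key: "gen_ai.usage.cache_read_input_tokens", value: { intValue: "750" } },
];
console.log(cacheHitRate(spanAttrs)); // 0.75
```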

Implementation Steps
  • In sendJobConclusionSpan (actions/setup/js/send_otlp_span.cjs), build agentAttributes = [...attributes] inside the if (jobName === "agent" && ...) block
  • Append gen_ai.* attributes to agentAttributes (using the snippets above)
  • Pass attributes: agentAttributes and kind: SPAN_KIND_CLIENT to the buildOTLPPayload call for the agent span
  • Update actions/setup/js/send_otlp_span.test.cjs (test at line 1646 "emits a dedicated gh-aw.(job).agent span...") to assert:
    • agentSpan.kind === 3 (SPAN_KIND_CLIENT)
    • agentSpan.attributes contains { key: "gen_ai.system", value: { stringValue: "anthropic" } } when model starts with "claude"
    • agentSpan.attributes contains { key: "gen_ai.request.model", ... }
    • agentSpan.attributes contains { key: "gen_ai.usage.input_tokens", ... } when tokens > 0
  • Run cd actions/setup/js && npx vitest run to confirm tests pass
  • Run make fmt to ensure formatting
  • Open a PR referencing this issue

Evidence from Live Sentry Data

⚠️ The Sentry MCP server was inaccessible during this analysis run (the tool returned an empty tool list). The recommendation is based entirely on static code analysis.

The gap is confirmed by the source: send_otlp_span.cjs lines 878–898 show no kind parameter and no gen_ai.* attributes on the buildOTLPPayload call for the agent span. The test at send_otlp_span.test.cjs:1673 likewise does not assert on agentSpan.kind or any gen_ai.* attribute, confirming the attributes are not currently set.

Related Files
  • actions/setup/js/send_otlp_span.cjs: primary change site (agent span builder, lines 878–898)
  • actions/setup/js/send_otlp_span.test.cjs: tests to update (agent span test, line 1646)
  • actions/setup/js/action_conclusion_otlp.cjs: no change needed (delegates to sendJobConclusionSpan)
  • actions/setup/js/action_setup_otlp.cjs: no change needed

Generated by the Daily OTel Instrumentation Advisor workflow

