💡 OTel Instrumentation Improvement: Add `gen_ai.*` attributes and `SPAN_KIND_CLIENT` to the agent execution span
Analysis Date: 2026-04-25
Priority: High
Effort: Small (< 2h)
Problem
The `gh-aw.agent.agent` span, the dedicated span that measures AI agent execution latency, uses custom `gh-aw.*` token attributes and defaults to `SPAN_KIND_INTERNAL`. It does not emit the standard [OTel GenAI semantic convention](opentelemetry.io/redacted) attributes (`gen_ai.*`) that observability backends now use for out-of-the-box LLM dashboards.
All the required data is already collected (model name from `awInfo.model`, token counts from `agent_usage.json`) but is only stored under private `gh-aw.*` keys. As a result, a DevOps engineer cannot answer the following questions using standard Grafana/Datadog/Honeycomb GenAI dashboards without writing custom attribute mappings:
- What is the average AI token cost per workflow run?
- How does prompt cache hit rate trend over time?
- Which model version are we running and is it changing?
Why This Matters (DevOps Perspective)
The OTel GenAI semantic conventions were stabilized in late 2024 and are now the standard way to instrument LLM API calls. Grafana Cloud, Datadog, and Honeycomb all ship out-of-the-box dashboards keyed on `gen_ai.*` attributes. Without these attributes on the agent span, engineers must write custom queries or dashboard panels, a maintenance burden that grows over time.
The `gh-aw.agent.agent` span is also the only span in the trace typed as `SPAN_KIND_INTERNAL` despite representing a call to an external AI service. This causes service-map tools to render it as an in-process operation rather than an outbound dependency, hiding the AI provider from topology views. Setting `SPAN_KIND_CLIENT` corrects this and enables client-latency percentile statistics in Datadog APM and similar tools.
Concretely: after this change, the same Grafana LLM Observability dashboard that works for OpenAI/Bedrock spans will work for gh-aw traces, reducing MTTR for "the agent is slow or expensive" investigations from hours (custom query) to seconds (open dashboard).
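For reference, the span-kind distinction above maps to these OTLP enum values (a minimal sketch; the constant names mirror the `SpanKind` enum in the OpenTelemetry trace protobuf):

```javascript
// OTLP SpanKind enum values as defined in opentelemetry-proto trace.proto.
const SpanKind = {
  UNSPECIFIED: 0,
  INTERNAL: 1, // current default for the gh-aw.agent.agent span
  SERVER: 2,
  CLIENT: 3, // proposed kind: outbound call to the AI provider
  PRODUCER: 4,
  CONSUMER: 5,
};

console.log(SpanKind.INTERNAL, SpanKind.CLIENT);
```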
Current Behavior
`actions/setup/js/send_otlp_span.cjs`, lines 878–898: the agent span is built with no `kind` parameter (defaults to `SPAN_KIND_INTERNAL`) and inherits the shared conclusion `attributes` array, which uses custom `gh-aw.*` keys:
```js
// Current: actions/setup/js/send_otlp_span.cjs (lines 878-898)
const agentPayload = buildOTLPPayload({
  traceId,
  spanId: generateSpanId(),
  parentSpanId: conclusionSpanId,
  spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
  startMs: agentStartMs,
  endMs: agentEndMs,
  serviceName,
  scopeVersion: version,
  attributes, // shared with conclusion span; only gh-aw.* custom attrs
  resourceAttributes,
  statusCode,
  statusMessage,
  events: agentSpanEvents,
  // no 'kind': defaults to SPAN_KIND_INTERNAL (1)
});
```
Token attributes currently on the span (lines 757–768, shared array):
```js
// gh-aw.* custom attributes: not recognized by standard GenAI dashboards
attributes.push(buildAttr("gh-aw.tokens.input", agentUsage.input_tokens));
attributes.push(buildAttr("gh-aw.tokens.output", agentUsage.output_tokens));
attributes.push(buildAttr("gh-aw.tokens.cache_read", agentUsage.cache_read_tokens));
attributes.push(buildAttr("gh-aw.tokens.cache_write", agentUsage.cache_write_tokens));
```
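For context on the attribute shape, `buildAttr` is assumed to produce OTLP JSON `KeyValue` objects; a minimal hypothetical sketch consistent with the shapes asserted later in this document (the real helper in `send_otlp_span.cjs` may differ in detail):

```javascript
// Hypothetical sketch of buildAttr: wraps a key/value pair in the OTLP JSON
// KeyValue shape, choosing stringValue or intValue based on the JS type.
// Note: OTLP JSON encodes int64 values as strings.
function buildAttr(key, value) {
  if (typeof value === "number" && Number.isInteger(value)) {
    return { key, value: { intValue: String(value) } };
  }
  return { key, value: { stringValue: String(value) } };
}

// Example: a token-count attribute as it would appear on the span
const tokenAttr = buildAttr("gh-aw.tokens.input", 1234);
console.log(JSON.stringify(tokenAttr));
```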
Proposed Change
In `sendJobConclusionSpan` (inside the `if (jobName === "agent" && ...)` block), build a dedicated `agentAttributes` array that extends the shared attributes with `gen_ai.*` standard attributes, and pass `kind: SPAN_KIND_CLIENT`:
```js
// Proposed addition: actions/setup/js/send_otlp_span.cjs (agent span block)
// Infer AI provider from model name for gen_ai.system.
const genAiSystem = model && model.toLowerCase().startsWith("claude") ? "anthropic" : "";

// Build gen_ai semantic convention attributes for the agent span only.
// These complement the existing gh-aw.* attributes and unlock out-of-the-box
// GenAI dashboards in Grafana, Datadog, and Honeycomb.
const agentAttributes = [...attributes];
if (genAiSystem) {
  agentAttributes.push(buildAttr("gen_ai.system", genAiSystem));
}
if (model) {
  agentAttributes.push(buildAttr("gen_ai.request.model", model));
}
if (typeof agentUsage.input_tokens === "number" && agentUsage.input_tokens > 0) {
  agentAttributes.push(buildAttr("gen_ai.usage.input_tokens", agentUsage.input_tokens));
}
if (typeof agentUsage.output_tokens === "number" && agentUsage.output_tokens > 0) {
  agentAttributes.push(buildAttr("gen_ai.usage.output_tokens", agentUsage.output_tokens));
}
if (typeof agentUsage.cache_read_tokens === "number" && agentUsage.cache_read_tokens > 0) {
  agentAttributes.push(buildAttr("gen_ai.usage.cache_read_input_tokens", agentUsage.cache_read_tokens));
}
if (typeof agentUsage.cache_write_tokens === "number" && agentUsage.cache_write_tokens > 0) {
  agentAttributes.push(buildAttr("gen_ai.usage.cache_creation_input_tokens", agentUsage.cache_write_tokens));
}

const agentPayload = buildOTLPPayload({
  traceId,
  spanId: generateSpanId(),
  parentSpanId: conclusionSpanId,
  spanName: jobName ? `gh-aw.${jobName}.agent` : "gh-aw.job.agent",
  startMs: agentStartMs,
  endMs: agentEndMs,
  serviceName,
  scopeVersion: version,
  attributes: agentAttributes, // enriched with gen_ai.* attrs
  resourceAttributes,
  statusCode,
  statusMessage,
  events: agentSpanEvents,
  kind: SPAN_KIND_CLIENT, // LLM invocation is an outbound client call
});
```
Note: The existing `gh-aw.*` token attributes on the shared `attributes` array are preserved on both the conclusion span and the agent span for backwards compatibility. The `gen_ai.*` attributes are additive and apply to the agent span only.
The `gen_ai.system` inference heuristic (`model.startsWith("claude")` → `"anthropic"`) is intentionally simple. A future PR can extend it with a lookup table if other providers are added.
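One possible shape for that future lookup table (a hypothetical sketch: only the `"claude"` entry reflects today's behavior, and the other prefixes and `gen_ai.system` values shown are illustrative assumptions, not providers currently handled by the code):

```javascript
// Hypothetical prefix table extending the current "claude" -> "anthropic"
// heuristic. Entries other than "claude" are illustrative examples only.
const GEN_AI_SYSTEM_BY_PREFIX = [
  ["claude", "anthropic"],
  ["gpt", "openai"],
  ["gemini", "gemini"],
];

function inferGenAiSystem(model) {
  if (!model) return "";
  const lower = model.toLowerCase();
  for (const [prefix, system] of GEN_AI_SYSTEM_BY_PREFIX) {
    if (lower.startsWith(prefix)) return system;
  }
  return ""; // unknown provider: omit gen_ai.system rather than guess
}

console.log(inferGenAiSystem("claude-sonnet-4"));
```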
Expected Outcome
After this change:
- In Grafana / Honeycomb / Datadog: The `gh-aw.agent.agent` span will appear in out-of-the-box GenAI / LLM dashboards keyed on `gen_ai.system` and `gen_ai.request.model`. Token cost panels based on `gen_ai.usage.*` attributes will populate without custom attribute mappings.
- In service maps: The agent span will appear as an outbound CLIENT call to the `anthropic` service rather than an internal operation, giving an accurate dependency picture.
- In the JSONL mirror (`/tmp/gh-aw/otel.jsonl`): The agent span lines will include both `gh-aw.*` and `gen_ai.*` attributes, making artifact-based debugging richer.
- For on-call engineers: Cache hit rate (`gen_ai.usage.cache_read_input_tokens / gen_ai.usage.input_tokens`) becomes instantly queryable with standard attribute names, enabling threshold alerts without custom queries.
Implementation Steps
1. In `sendJobConclusionSpan` (`actions/setup/js/send_otlp_span.cjs`), build `agentAttributes = [...attributes]` before the `if (jobName === "agent" && ...)` block
2. Push the `gen_ai.*` attributes onto `agentAttributes` (using the snippets above)
3. Pass `attributes: agentAttributes` and `kind: SPAN_KIND_CLIENT` to the `buildOTLPPayload` call for the agent span
4. Update `actions/setup/js/send_otlp_span.test.cjs` (test at line 1646, "emits a dedicated gh-aw.(job).agent span...") to assert:
   - `agentSpan.kind === 3` (SPAN_KIND_CLIENT)
   - `agentSpan.attributes` contains `{ key: "gen_ai.system", value: { stringValue: "anthropic" } }` when the model starts with "claude"
   - `agentSpan.attributes` contains `{ key: "gen_ai.request.model", ... }`
   - `agentSpan.attributes` contains `{ key: "gen_ai.usage.input_tokens", ... }` when tokens > 0
5. Run `cd actions/setup/js && npx vitest run` to confirm tests pass
6. Run `make fmt` to ensure formatting
Evidence from Live Sentry Data
⚠️ The Sentry MCP server was inaccessible during this analysis run (the tool returned an empty tool list). The recommendation is based entirely on static code analysis.
The gap is confirmed by the source: `send_otlp_span.cjs` lines 878–898 show no `kind` parameter and no `gen_ai.*` attributes on the `buildOTLPPayload` call for the agent span. The test at `send_otlp_span.test.cjs:1673` likewise does not assert on `agentSpan.kind` or any `gen_ai.*` attribute, confirming the attributes are not currently set.
Related Files
- `actions/setup/js/send_otlp_span.cjs`: primary change site (agent span builder, lines 878–898)
- `actions/setup/js/send_otlp_span.test.cjs`: tests to update (agent span test, line 1646)
- `actions/setup/js/action_conclusion_otlp.cjs`: no change needed (delegates to `sendJobConclusionSpan`)
- `actions/setup/js/action_setup_otlp.cjs`: no change needed
Generated by the Daily OTel Instrumentation Advisor workflow