Skip to content

Mark agent spans as Langfuse "generation" observations and (opt-in) emit prompt/completion content #33644

@jaroslawgajewski

Description

@jaroslawgajewski

Problem

The agent's prompt and completion text never leave the runner. gh-aw emits token counts and a model name but not the prompt body, the response body, or the OTel langfuse.observation.type = "generation" marker. The consequence in Langfuse:

  • Agent spans render as plain "span" rows rather than the rich generation rows (no model + cost + usage in-table, no prompt-link UI).
  • Trace detail view has empty Input / Output panels — useless for human review, evals, or annotation queues.
  • LLM-as-a-Judge evaluators can't run because they need input/output content.

Proposal

Two changes:

A. Always (no opt-in): add langfuse.observation.type = "generation" to the dedicated agent span (and to the conclusion span when it carries the token usage because no dedicated agent span exists). In send_otlp_span.cjs around line ~1997 (the agentAttributes = [...attributes, ...usageAttrs] line):

agentAttributes.push(buildAttr("langfuse.observation.type", "generation"));

And when usage attributes fall through to the conclusion span (line ~1985 region), similarly add langfuse.observation.type = "generation" to the conclusion span attributes.

B. Opt-in via frontmatter: new field

observability:
  otlp:
    endpoint: ...
  langfuse:
    capture-content: true            # default false
    redact-patterns:                 # optional
      - 'ghp_[A-Za-z0-9]{36,}'
      - 'sk-[A-Za-z0-9]{20,}'

When capture-content: true:

  1. Add a post-agent step that reads /tmp/gh-aw/agent_output.json and the resolved prompt body, applies sanitizeOTLPPayload-style redaction plus user-supplied redact-patterns, and emits a child span carrying langfuse.observation.input and langfuse.observation.output attributes.
  2. The default for capture-content must be false because prompts can carry repo paths, issue bodies, or secrets.

Acceptance criteria

  • (A) Agent / agent-conclusion spans always carry langfuse.observation.type = "generation". Langfuse trace detail renders model name, tokens, cost without configuration. Tests cover both the dedicated-agent-span path and the fallback (conclusion-span-carries-usage) path.
  • (B) When capture-content: true, post-agent step writes a child span with langfuse.observation.input / output populated. Redaction is applied first.
  • When capture-content is unset or false, no content attributes are emitted (regression test).
  • Documentation in docs/ describes both options and warns about content sensitivity.

References

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions