
LangGraph hosted agent: gen_ai.output.messages text part content is a stringified Python list, not GenAI-spec text #189

@ninghu

Description


Summary

When building a Foundry hosted agent with LangGraph, the invoke_agent <name> wrapper span emitted by microsoft-opentelemetry 1.3.2 puts a stringified Python list into the content field of an assistant text part on gen_ai.output.messages, instead of the plain text string the GenAI semantic conventions require. The value is the Python repr() of LangGraph's internal multi-part content list (note the single quotes: it isn't even valid JSON), which breaks any downstream consumer that parses the span per spec.

Spec gap

Per docs/gen-ai/gen-ai-agent-spans.md in semantic-conventions-genai and the Output messages JSON schema:

Instrumentations MUST follow Output messages JSON schema.

A TextPart is defined as { "type": "text", "content": "<string>" }, where content is the plain text of the message. It is not a place to dump a serialized provider-internal structure.
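For readers working in Python, the part shapes can be written down as TypedDicts. This is a minimal sketch based on my reading of the output-messages schema, not an official type definition: the class names mirror the schema's part names, optional fields and the other part types are omitted, and the tool-call field names (id, name, arguments) are an assumption worth checking against the schema itself.

```python
from typing import Any, List, Literal, Optional, TypedDict, Union


class TextPart(TypedDict):
    type: Literal["text"]
    content: str  # the plain message text, never str()/repr() of a structure


class ToolCallRequestPart(TypedDict):
    type: Literal["tool_call"]
    id: Optional[str]
    name: str
    arguments: Any  # tool arguments as structured data, not a Python repr


class OutputMessage(TypedDict):
    role: str
    parts: List[Union[TextPart, ToolCallRequestPart]]
    finish_reason: str


# The expected payload from this issue, expressed with these types.
expected: List[OutputMessage] = [
    {
        "role": "assistant",
        "parts": [
            {"type": "text", "content": "# One-Day Food Walk in Vancouver\n\n## Assumptions\n- ..."}
        ],
        "finish_reason": "stop",
    }
]
```

Note that a type checker would accept the buggy payload too, because the blob is still a str; these types document the intent of TextPart.content rather than enforce it.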

The instrumentation must either:

  1. Flatten LangGraph's multi-part content list into a single text part whose content is just the concatenated text, or
  2. Emit one spec-compliant part per item in the list (one TextPart per text item, one ToolCallRequestPart per tool call, etc.).

Observed behavior

Wrapper span on a LangGraph hosted agent (gen_ai.agent.name: travel-planner-langgraph, gen_ai.request.model: gpt-5.4-mini):

"gen_ai.output.messages": [
  {
    "role": "assistant",
    "parts": [
      {
        "type": "text",
        "content": "[{'type': 'text', 'text': '# One-Day Food Walk in Vancouver\\n\\n## Assumptions\\n- ...', 'phase': 'final_answer', 'index': 0, 'id': 'msg_045afd7960c6c21a006a21f209696c81949c03a57e7e986631'}]"
      }
    ],
    "finish_reason": "stop"
  }
]

Diagnostic giveaways that this is str(list_of_dicts) rather than spec-compliant content:

  • Single quotes ('type', 'text'): Python repr(), not JSON.
  • A list wrapper ([{...}]) inside what should be a single string.
  • Extra non-spec keys leaking through: phase, index, and id are keys of the LangChain content blocks in AIMessage.content, not GenAI semconv fields.
  • Token accounting (gen_ai.usage.output_tokens: 715) is consistent with a single assistant text reply, and the list holds only one text block, so there is no reason for the content to be a list-shaped blob.
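When auditing telemetry that has already been exported, these giveaways can be turned into a rough check. The helper below is hypothetical (not part of microsoft-opentelemetry or any SDK): it flags a content string that parses as a Python literal list of dicts carrying a "type" key. It relies on ast.literal_eval, which evaluates literals only and does not run arbitrary code. It is a heuristic, so a genuine answer that happens to be a Python list literal would also be flagged.

```python
import ast
import json


def looks_like_stringified_blocks(content: str) -> bool:
    """Heuristic: True if `content` looks like str(list_of_content_blocks)."""
    text = content.strip()
    if not text.startswith("[{"):
        return False  # ordinary prose never gets past this check
    try:
        # literal_eval accepts Python literals only; single quotes are fine here
        value = ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError):
        return False
    return isinstance(value, list) and all(
        isinstance(block, dict) and "type" in block for block in value
    )


observed = (
    "[{'type': 'text', 'text': '# One-Day Food Walk in Vancouver\\n\\n## Assumptions\\n- ...', "
    "'phase': 'final_answer', 'index': 0, 'id': 'msg_045afd7960c6c21a006a21f209696c81949c03a57e7e986631'}]"
)
print(looks_like_stringified_blocks(observed))  # True
print(looks_like_stringified_blocks("# One-Day Food Walk in Vancouver"))  # False

# The single quotes are also why a JSON parser rejects the value outright.
try:
    json.loads(observed)
except json.JSONDecodeError as exc:
    print("not JSON:", exc.msg)
```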

Expected behavior

Per the GenAI output-messages schema, the same payload should be either:

"gen_ai.output.messages": [
  {
    "role": "assistant",
    "parts": [
      { "type": "text", "content": "# One-Day Food Walk in Vancouver\n\n## Assumptions\n- ..." }
    ],
    "finish_reason": "stop"
  }
]

…or, if the instrumentation wants to preserve multi-part structure, one spec-typed part per element (still with content as a plain string on each TextPart).

Reproduction

  1. Build a Foundry hosted agent using LangGraph (create_react_agent or a custom graph that returns an AIMessage whose .content is a list of dicts — the common LangChain "content blocks" shape).
  2. Instrument with microsoft-opentelemetry==1.3.2 and export to Azure Monitor / Application Insights.
  3. Invoke the agent with any prompt that produces a single final assistant text reply, e.g. "Give me a concise one day food walk in Vancouver."
  4. Query the wrapper span:
    dependencies
    | where name startswith "invoke_agent "
    | project customDimensions
  5. Inspect gen_ai.output.messages: the assistant part's content is a Python-repr string of a list of dicts instead of the plain assistant text.
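The string shape itself can be reproduced without LangGraph or Azure Monitor. This sketch assumes the instrumentation effectively calls str() on a list-valued AIMessage.content when it builds the text part; that is an inference from the diagnostic giveaways in this issue, not something confirmed from the microsoft-opentelemetry source, and buggy_text_part is a hypothetical stand-in for that code path.

```python
import json

# A LangChain-style content-block list like the one in the observed span.
content_blocks = [
    {
        "type": "text",
        "text": "# One-Day Food Walk in Vancouver\n\n## Assumptions\n- ...",
        "phase": "final_answer",
        "index": 0,
        "id": "msg_045afd7960c6c21a006a21f209696c81949c03a57e7e986631",
    }
]


def buggy_text_part(content):
    """Hypothetical stand-in for the suspected bug: stringify whatever content is."""
    # str() on a list renders each element with repr(): single quotes, escaped newlines.
    return {"type": "text", "content": str(content)}


part = buggy_text_part(content_blocks)
# json.dumps escapes the backslashes again, giving the \\n seen in the exported span.
print(json.dumps(part))
```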

Full offending span (trimmed)

{
  "name": "invoke_agent LangGraph",
  "attributes": {
    "gen_ai.operation.name": "invoke_agent",
    "gen_ai.request.model": "gpt-5.4-mini",
    "gen_ai.provider.name": "openai",
    "gen_ai.agent.name": "travel-planner-langgraph",
    "gen_ai.agent.version": 4,
    "gen_ai.agent.id": "eaef4509-fcb4-4cb0-8b63-c2019fe50fbe",
    "gen_ai.usage.input_tokens": 2638,
    "gen_ai.usage.output_tokens": 715,
    "gen_ai.input.messages": [
      { "role": "user", "parts": [ { "type": "text", "content": "Give me a concise one day food walk in Vancouver." } ] }
    ],
    "gen_ai.output.messages": [
      {
        "role": "assistant",
        "parts": [
          {
            "type": "text",
            "content": "[{'type': 'text', 'text': '# One-Day Food Walk in Vancouver ...', 'phase': 'final_answer', 'index': 0, 'id': 'msg_045afd7960c6c21a006a21f209696c81949c03a57e7e986631'}]"
          }
        ],
        "finish_reason": "stop"
      }
    ]
  }
}

Why this matters

  • Violates the GenAI semconv contract that TextPart.content holds the plain message text. Any spec-conformant consumer (evaluators, trace viewers) will mis-parse it, and because the value is still a string, schema validation alone will not flag it.
  • Concretely, it breaks Azure AI Foundry's cloud trace-evaluation pipeline: graders such as builtin.coherence, builtin.fluency, builtin.relevance, and builtin.response_completeness read the final assistant text out of gen_ai.output.messages[*].parts[*].content, so they receive the Python-repr blob (newlines escaped as a literal \n, single quotes, a surrounding [{...}]) instead of the actual answer and produce junk scores.
  • Because of the single quotes, the string is not even valid JSON, so consumers can't safely "re-parse" their way out of it either.
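To see why graders end up scoring the blob, consider a consumer that follows the spec and simply joins the assistant's text parts. extract_final_text is hypothetical and is not the Foundry evaluators' actual code; it only illustrates that a spec-following reader has no way to tell the blob apart from a real answer.

```python
from typing import Any, Dict, List


def extract_final_text(output_messages: List[Dict[str, Any]]) -> str:
    """Join the content of every text part in assistant messages, per the spec."""
    return "".join(
        part["content"]
        for message in output_messages
        if message.get("role") == "assistant"
        for part in message.get("parts", [])
        if part.get("type") == "text"
    )


# Trimmed versions of the observed and expected payloads from this issue.
observed = [{"role": "assistant", "finish_reason": "stop", "parts": [
    {"type": "text", "content": "[{'type': 'text', 'text': '# One-Day Food Walk in Vancouver ...'}]"}]}]
expected = [{"role": "assistant", "finish_reason": "stop", "parts": [
    {"type": "text", "content": "# One-Day Food Walk in Vancouver ..."}]}]

print(extract_final_text(observed))  # -> the repr blob, brackets and quotes included
print(extract_final_text(expected))  # -> the answer text itself
```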

Expected fix

In the LangGraph/LangChain instrumentation path, when an AIMessage.content (or equivalent) is a list[dict] of LangChain content blocks, normalize it before writing to gen_ai.output.messages:

  • Concatenate all {"type": "text", "text": "..."} blocks into the content of a single TextPart (or emit one TextPart per text block), and
  • Map {"type": "tool_use", ...} / tool-call blocks to spec ToolCallRequestParts rather than letting them serialize as part of an opaque string.

Never call str(...) / repr(...) on a Python list/dict and assign the result to a TextPart.content.
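A minimal sketch of that normalization, covering both options listed under Spec gap: flatten=True concatenates the text blocks into one TextPart, and flatten=False emits one part per block. It assumes LangChain-style blocks, namely {"type": "text", "text": ...} plus tool-call blocks shaped either like {"type": "tool_use", "id", "name", "input"} or {"type": "tool_call", "id", "name", "args"}. to_genai_parts is a hypothetical name, not a microsoft-opentelemetry API.

```python
from typing import Any, Dict, List, Union

Part = Dict[str, Any]


def to_genai_parts(content: Union[str, List[Any]], flatten: bool = True) -> List[Part]:
    """Hypothetical normalizer from AIMessage.content to GenAI output-message parts."""
    if isinstance(content, str):
        return [{"type": "text", "content": content}]

    parts: List[Part] = []
    for block in content:
        if isinstance(block, str):
            parts.append({"type": "text", "content": block})
        elif isinstance(block, dict) and block.get("type") == "text":
            # Keep only the text; provider keys such as phase/index/id are dropped.
            parts.append({"type": "text", "content": block.get("text", "")})
        elif isinstance(block, dict) and block.get("type") in ("tool_use", "tool_call"):
            parts.append({
                "type": "tool_call",
                "id": block.get("id"),
                "name": block.get("name"),
                # tool_use blocks carry "input"; tool_call blocks carry "args".
                "arguments": block.get("input", block.get("args")),
            })
        # Any other block type is skipped; it is never str()'d into a TextPart.

    if flatten:
        texts = [p["content"] for p in parts if p["type"] == "text"]
        others = [p for p in parts if p["type"] != "text"]
        merged = [{"type": "text", "content": "".join(texts)}] if texts else []
        parts = merged + others
    return parts


blocks = [{"type": "text", "text": "# One-Day Food Walk in Vancouver",
           "phase": "final_answer", "index": 0, "id": "msg_1"}]
print(to_genai_parts(blocks))
```

Two design choices here are judgment calls rather than spec requirements: flattened text is joined with no separator and placed before any tool calls, and unknown block types are dropped instead of being mapped to other spec part types.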

Environment

  • microsoft-opentelemetry 1.3.2
  • Foundry hosted agent, framework: LangGraph
  • Model: gpt-5.4-mini, provider: openai
  • Backend exporter: Azure Monitor / Application Insights
  • Verified against semantic-conventions-genai main



Labels: bug (Something isn't working)
