Summary
When building a Foundry hosted agent with LangGraph, the invoke_agent <name> wrapper span emitted by microsoft-opentelemetry 1.3.2 puts a stringified Python list into the content field of an assistant text part on gen_ai.output.messages, instead of a plain text string as required by the GenAI semantic conventions. The value is the Python repr() of LangGraph's internal multi-part content list (note the single quotes: it isn't even valid JSON), which breaks any downstream consumer that parses the span per spec.
Spec gap
Per semantic-conventions-genai — docs/gen-ai/gen-ai-agent-spans.md and the Output messages JSON schema:
Instrumentations MUST follow Output messages JSON schema.
A TextPart is defined as { "type": "text", "content": "<string>" }, where content is the plain text of the message. It is not a place to dump a serialized provider-internal structure.
The instrumentation must either:
- Flatten LangGraph's multi-part content list into a single text part whose content is just the concatenated text, or
- Emit one spec-compliant part per item in the list (one TextPart per text item, one ToolCallRequestPart per tool call, etc.).
Observed behavior
Wrapper span on a LangGraph hosted agent (gen_ai.agent.name: travel-planner-langgraph, gen_ai.request.model: gpt-5.4-mini):
"gen_ai.output.messages": [
  {
    "role": "assistant",
    "parts": [
      {
        "type": "text",
        "content": "[{'type': 'text', 'text': '# One-Day Food Walk in Vancouver ...', 'phase': 'final_answer', 'index': 0, 'id': 'msg_045afd7960c6c21a006a21f209696c81949c03a57e7e986631'}]"
      }
    ],
    "finish_reason": "stop"
  }
]
Diagnostic giveaways that this is str(list_of_dicts) rather than spec-compliant content:
- Single quotes ('type', 'text'): Python repr(), not JSON.
- A list wrapper ([{...}]) inside what should be a single string.
- Extra non-spec keys leaking through: phase, index, id. These are LangGraph/LangChain AIMessage.content fields, not GenAI semconv fields.
- Token accounting (gen_ai.usage.output_tokens: 715) is consistent with there being just one assistant text reply, so there is no reason for the content to be a list-shaped blob.
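The giveaways above can be turned into a small check. The following is a minimal sketch, not part of any library: the function names are hypothetical, and the heuristic only flags content that parses as a Python literal list of dicts. It uses the trimmed content string from the observed span, including the report's own "..." elision.

```python
import ast
import json


def looks_like_python_repr_of_blocks(content: str) -> bool:
    """Heuristic: True if `content` parses as a Python literal list of dicts."""
    stripped = content.strip()
    if not (stripped.startswith("[") and stripped.endswith("]")):
        return False
    try:
        value = ast.literal_eval(stripped)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        return False
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(item, dict) for item in value)
    )


def is_valid_json(content: str) -> bool:
    """True if `content` can be parsed by a standard JSON parser."""
    try:
        json.loads(content)
    except json.JSONDecodeError:
        return False
    return True


# Content string copied from the trimmed span in this report.
observed = (
    "[{'type': 'text', 'text': '# One-Day Food Walk in Vancouver ...', "
    "'phase': 'final_answer', 'index': 0, "
    "'id': 'msg_045afd7960c6c21a006a21f209696c81949c03a57e7e986631'}]"
)

print(looks_like_python_repr_of_blocks(observed))  # Python literal list of dicts
print(is_valid_json(observed))                     # single quotes break JSON parsing
```

A check like this is only useful for spotting affected spans. It is Python-specific and should not be treated as a way for consumers to recover the text.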
Expected behavior
Per the GenAI output-messages schema, the same payload should be either:
…or, if the instrumentation wants to preserve multi-part structure, one spec-typed part per element (still with content as a plain string on each TextPart).
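As an illustration of the first option, the sketch below builds the output message with a single TextPart whose content is a plain string. The answer text is the trimmed text from the observed span (the "..." is the report's own elision), and the variable names are purely illustrative.

```python
import json

# Plain assistant text, as it appears (trimmed) inside the observed blob.
answer_text = "# One-Day Food Walk in Vancouver ..."

# One assistant message with a single TextPart; content is a plain string.
expected_output_messages = [
    {
        "role": "assistant",
        "parts": [{"type": "text", "content": answer_text}],
        "finish_reason": "stop",
    }
]

print(json.dumps(expected_output_messages, indent=2))
```

Unlike the observed value, this structure survives a JSON round trip and carries no LangChain-specific keys such as phase, index, or id.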
Reproduction
- Build a Foundry hosted agent using LangGraph (create_react_agent or a custom graph that returns an AIMessage whose .content is a list of dicts, the common LangChain "content blocks" shape).
- Instrument with microsoft-opentelemetry==1.3.2 and export to Azure Monitor / Application Insights.
- Invoke the agent with any prompt that produces a single final assistant text reply, e.g. "Give me a concise one day food walk in Vancouver."
- Query the wrapper span:
  dependencies
  | where name startswith "invoke_agent "
  | project customDimensions
- Inspect gen_ai.output.messages → the assistant part's content is a Python-repr string of a list of dicts instead of the plain assistant text.
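The shape of the bad value can be reproduced without Foundry or LangGraph installed. The sketch below assumes, based only on what the output looks like, that the instrumentation stringifies the content list somewhere. That is an inference, not something confirmed from the microsoft-opentelemetry source. The content_blocks list is a hypothetical stand-in for AIMessage.content.

```python
# Hypothetical stand-in for AIMessage.content in the "content blocks" shape.
content_blocks = [
    {
        "type": "text",
        "text": "# One-Day Food Walk in Vancouver ...",
        "phase": "final_answer",
        "index": 0,
    }
]

# Suspected (unconfirmed) failure mode: stringify the list instead of extracting text.
stringified = str(content_blocks)

print(stringified)
# [{'type': 'text', 'text': '# One-Day Food Walk in Vancouver ...', 'phase': 'final_answer', 'index': 0}]
```

The printed value has the same single quotes, list wrapper, and leaked keys as the content field on the wrapper span.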
Why this matters
- Violates the GenAI semconv TextPart.content: string contract, so any spec-conformant consumer (evaluators, trace viewers, schema validators) will reject or mis-parse it.
- Concretely breaks Azure AI Foundry's cloud trace-evaluation pipeline: graders like builtin.coherence, builtin.fluency, builtin.relevance, builtin.response_completeness read the final assistant text out of gen_ai.output.messages[*].parts[*].content; they get the Python-repr blob (with \n literally escaped, single quotes, surrounding [{...}]) instead of the actual answer, producing junk scores.
- The single quotes mean the string is not even valid JSON, so consumers can't safely "re-parse" their way out of it either.
Expected fix
In the LangGraph/LangChain instrumentation path, when an AIMessage.content (or equivalent) is a list[dict] of LangChain content blocks, normalize it before writing to gen_ai.output.messages:
- Concatenate all {"type": "text", "text": "..."} blocks into the content of a single TextPart (or emit one TextPart per text block), and
- Map {"type": "tool_use", ...} / tool-call blocks to spec ToolCallRequestParts rather than letting them serialize as part of an opaque string.
Never call str(...) / repr(...) on a Python list/dict and assign the result to a TextPart.content.
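A minimal sketch of the normalization described above follows, under stated assumptions. The function name normalize_content is hypothetical. The tool-call part fields (type "tool_call", id, name, arguments) are my reading of the output-messages schema and should be checked against it. The tool_use block keys (id, name, input) are likewise an assumption about the content-block shape. Block types other than text and tool_use are skipped here, and a real fix would need to decide how to handle them.

```python
from typing import Any


def normalize_content(content: Any) -> list[dict[str, Any]]:
    """Convert message content (str or list of blocks) into spec-shaped parts."""
    if isinstance(content, str):
        return [{"type": "text", "content": content}]

    parts: list[dict[str, Any]] = []
    text_chunks: list[str] = []

    def flush_text() -> None:
        # Emit accumulated text as one TextPart with a plain-string content.
        if text_chunks:
            parts.append({"type": "text", "content": "".join(text_chunks)})
            text_chunks.clear()

    for block in content:
        if isinstance(block, str):
            text_chunks.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            # Keep only the text; drop provider-internal keys (phase, index, id).
            text_chunks.append(str(block.get("text", "")))
        elif isinstance(block, dict) and block.get("type") == "tool_use":
            flush_text()
            parts.append(
                {
                    "type": "tool_call",  # assumed ToolCallRequestPart shape
                    "id": block.get("id"),
                    "name": block.get("name"),
                    "arguments": block.get("input"),
                }
            )
        # Other block types are intentionally ignored in this sketch.

    flush_text()
    return parts


blocks = [{"type": "text", "text": "# One-Day Food Walk in Vancouver ...",
           "phase": "final_answer", "index": 0}]
print(normalize_content(blocks))
```

Adjacent text blocks are merged into one TextPart, and a tool call flushes the pending text first so the relative order of text and tool calls is preserved.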
Environment
- microsoft-opentelemetry 1.3.2
- Foundry hosted agent, framework: LangGraph
- Model: gpt-5.4-mini, provider: openai
- Backend exporter: Azure Monitor / Application Insights
- Verified against semantic-conventions-genai main
Full offending span (trimmed)
{
  "name": "invoke_agent LangGraph",
  "attributes": {
    "gen_ai.operation.name": "invoke_agent",
    "gen_ai.request.model": "gpt-5.4-mini",
    "gen_ai.provider.name": "openai",
    "gen_ai.agent.name": "travel-planner-langgraph",
    "gen_ai.agent.version": 4,
    "gen_ai.agent.id": "eaef4509-fcb4-4cb0-8b63-c2019fe50fbe",
    "gen_ai.usage.input_tokens": 2638,
    "gen_ai.usage.output_tokens": 715,
    "gen_ai.input.messages": [
      {
        "role": "user",
        "parts": [
          { "type": "text", "content": "Give me a concise one day food walk in Vancouver." }
        ]
      }
    ],
    "gen_ai.output.messages": [
      {
        "role": "assistant",
        "parts": [
          {
            "type": "text",
            "content": "[{'type': 'text', 'text': '# One-Day Food Walk in Vancouver ...', 'phase': 'final_answer', 'index': 0, 'id': 'msg_045afd7960c6c21a006a21f209696c81949c03a57e7e986631'}]"
          }
        ],
        "finish_reason": "stop"
      }
    ]
  }
}
Related
gen_ai.tool.definitions). This one is about the encoding of the assistant text on gen_ai.output.messages.