LangGraph hosted agent: gen_ai.output.messages text part content is a stringified Python list, not GenAI-spec text

## Summary
When building a Foundry hosted agent with **LangGraph**, the `invoke_agent <name>` wrapper span emitted by `microsoft-opentelemetry` 1.3.2 puts a **stringified Python list** into the `content` field of an `assistant` text part on `gen_ai.output.messages`, instead of a plain text string as required by the GenAI semantic conventions. The value is the Python `repr()` of LangGraph's internal multi-part content list (note the single quotes: it isn't even valid JSON), which breaks any downstream consumer that parses the span per spec.

## Spec gap
Per [`semantic-conventions-genai`](https://github.com/open-telemetry/semantic-conventions-genai) — [`docs/gen-ai/gen-ai-agent-spans.md`](https://github.com/open-telemetry/semantic-conventions-genai/blob/main/docs/gen-ai/gen-ai-agent-spans.md) and the [Output messages JSON schema](https://github.com/open-telemetry/semantic-conventions-genai/blob/main/docs/gen-ai/gen-ai-output-messages.json):

> Instrumentations **MUST** follow [Output messages JSON schema](https://github.com/open-telemetry/semantic-conventions-genai/blob/main/docs/gen-ai/gen-ai-output-messages.json).

A `TextPart` is defined as `{ "type": "text", "content": "<string>" }`, where `content` is the **plain text** of the message. It is **not** a place to dump a serialized provider-internal structure.

The instrumentation must either:
1. Flatten LangGraph's multi-part content list into a single `text` part whose `content` is just the concatenated text, **or**
2. Emit one spec-compliant part per item in the list (one `TextPart` per text item, one `ToolCallRequestPart` per tool call, etc.).

## Observed behavior
Wrapper span on a LangGraph hosted agent (`gen_ai.agent.name: travel-planner-langgraph`, `gen_ai.request.model: gpt-5.4-mini`):

```jsonc
"gen_ai.output.messages": [
  {
    "role": "assistant",
    "parts": [
      {
        "type": "text",
        "content": "[{'type': 'text', 'text': '# One-Day Food Walk in Vancouver\\n\\n## Assumptions\\n- ...', 'phase': 'final_answer', 'index': 0, 'id': 'msg_045afd7960c6c21a006a21f209696c81949c03a57e7e986631'}]"
      }
    ],
    "finish_reason": "stop"
  }
]
```

Diagnostic giveaways that this is `str(list_of_dicts)` rather than spec-compliant content:
- Single quotes (`'type'`, `'text'`): Python `repr()`, not JSON.
- A list wrapper (`[{...}]`) inside what should be a single string.
- Extra non-spec keys leaking through: `phase`, `index`, `id`. These are LangGraph/LangChain `AIMessage.content` fields, not GenAI semconv fields.
- Token accounting (`gen_ai.usage.output_tokens: 715`) is consistent with a single assistant text reply, so there is no reason for the content to be a list-shaped blob.
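
The first two giveaways can be checked mechanically. The following is a minimal sketch of such a heuristic, not part of any library: the helper name `looks_like_python_repr` and the sample strings are hypothetical. It flags a `content` value that fails to parse as JSON but does parse as a Python list or dict literal.

```python
import ast
import json


def looks_like_python_repr(content: str) -> bool:
    """Heuristic: True if `content` is a Python list/dict literal but not valid JSON."""
    stripped = content.strip()
    if not stripped.startswith(("[", "{")):
        return False  # ordinary prose or markdown
    try:
        json.loads(stripped)
        return False  # valid JSON, so not a repr() blob
    except json.JSONDecodeError:
        pass
    try:
        value = ast.literal_eval(stripped)  # parses literals only, never executes code
    except (ValueError, SyntaxError):
        return False
    return isinstance(value, (list, dict))


# Hypothetical sample values modelled on the span shown above.
observed = "[{'type': 'text', 'text': '# One-Day Food Walk', 'phase': 'final_answer', 'index': 0}]"
expected = "# One-Day Food Walk in Vancouver\n\n## Assumptions\n- ..."

print(looks_like_python_repr(observed))
print(looks_like_python_repr(expected))
```

A check like this is only a detector for triage or tests. It does not make the blob safe to consume, and it will miss a list that happens to be valid JSON.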

## Expected behavior
Per the GenAI output-messages schema, the same payload should be either:

```jsonc
"gen_ai.output.messages": [
  {
    "role": "assistant",
    "parts": [
      { "type": "text", "content": "# One-Day Food Walk in Vancouver\n\n## Assumptions\n- ..." }
    ],
    "finish_reason": "stop"
  }
]
```

…or, if the instrumentation wants to preserve multi-part structure, one spec-typed part per element (still with `content` as a plain string on each `TextPart`).

## Reproduction
1. Build a Foundry hosted agent using LangGraph (`create_react_agent` or a custom graph that returns an `AIMessage` whose `.content` is a list of dicts — the common LangChain "content blocks" shape).
2. Instrument with `microsoft-opentelemetry==1.3.2` and export to Azure Monitor / Application Insights.
3. Invoke the agent with any prompt that produces a single final assistant text reply, e.g. `"Give me a concise one day food walk in Vancouver."`
4. Query the wrapper span:
   ```kql
   dependencies
   | where name startswith "invoke_agent "
   | project customDimensions
   ```
5. Inspect `gen_ai.output.messages` → the `assistant` part's `content` is a Python-`repr` string of a list of dicts instead of the plain assistant text.
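
The shape of the defect can also be illustrated locally without Foundry or the exporter. This sketch does not exercise `microsoft-opentelemetry` at all; it assumes the instrumentation coerces the content list with `str(...)`, which is inferred from the observed output rather than confirmed in the library source. The `content_blocks` value is a hypothetical stand-in for `AIMessage.content`.

```python
import json

# Hypothetical stand-in for AIMessage.content in the "content blocks" shape.
content_blocks = [
    {
        "type": "text",
        "text": "# One-Day Food Walk in Vancouver\n\n## Assumptions\n- ...",
        "phase": "final_answer",
        "index": 0,
    },
]

# Assumed bug shape: the whole list is coerced to a string before being stored.
buggy_part = {"type": "text", "content": str(content_blocks)}

# Spec shape: only the text of each text block is kept.
fixed_part = {
    "type": "text",
    "content": "".join(
        block["text"] for block in content_blocks if block.get("type") == "text"
    ),
}

print(json.dumps(buggy_part))
print(json.dumps(fixed_part))
```

The first printed part carries the single-quoted `[{...}]` wrapper and the extra `phase` / `index` keys seen in the span; the second carries only the assistant text.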

## Full offending span (trimmed)
```jsonc
{
  "name": "invoke_agent LangGraph",
  "attributes": {
    "gen_ai.operation.name": "invoke_agent",
    "gen_ai.request.model": "gpt-5.4-mini",
    "gen_ai.provider.name": "openai",
    "gen_ai.agent.name": "travel-planner-langgraph",
    "gen_ai.agent.version": 4,
    "gen_ai.agent.id": "eaef4509-fcb4-4cb0-8b63-c2019fe50fbe",
    "gen_ai.usage.input_tokens": 2638,
    "gen_ai.usage.output_tokens": 715,
    "gen_ai.input.messages": [
      { "role": "user", "parts": [ { "type": "text", "content": "Give me a concise one day food walk in Vancouver." } ] }
    ],
    "gen_ai.output.messages": [
      {
        "role": "assistant",
        "parts": [
          {
            "type": "text",
            "content": "[{'type': 'text', 'text': '# One-Day Food Walk in Vancouver ...', 'phase': 'final_answer', 'index': 0, 'id': 'msg_045afd7960c6c21a006a21f209696c81949c03a57e7e986631'}]"
          }
        ],
        "finish_reason": "stop"
      }
    ]
  }
}
```

## Why this matters
- Violates the GenAI semconv `TextPart.content: string` contract — any spec-conformant consumer (evaluators, trace viewers, schema validators) will reject or mis-parse it.
- Concretely breaks Azure AI Foundry's cloud trace-evaluation pipeline: graders like `builtin.coherence`, `builtin.fluency`, `builtin.relevance`, `builtin.response_completeness` read the final assistant text out of `gen_ai.output.messages[*].parts[*].content`. They get the Python-`repr` blob (literal `\n` escapes, single quotes, surrounding `[{...}]`) instead of the actual answer, which produces junk scores.
- The single quotes mean the string is **not even valid JSON**, so consumers can't safely "re-parse" their way out of it either.

## Expected fix
In the LangGraph/LangChain instrumentation path, when `AIMessage.content` (or its equivalent) is a `list[dict]` of LangChain content blocks, normalize it before writing to `gen_ai.output.messages`:
- Concatenate all `{"type": "text", "text": "..."}` blocks into the `content` of a single `TextPart` (or emit one `TextPart` per text block), and
- Map `{"type": "tool_use", ...}` / tool-call blocks to spec `ToolCallRequestPart`s rather than letting them serialize as part of an opaque string.

Never call `str(...)` / `repr(...)` on a Python list/dict and assign the result to a `TextPart.content`.
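
A minimal sketch of that normalization follows, under stated assumptions: `normalize_content` is a hypothetical helper, not an existing function in `microsoft-opentelemetry`; the tool-call block is assumed to use the `tool_use` shape with `id`, `name`, and `input` keys; and the emitted `tool_call` part fields (`id`, `name`, `arguments`) should be verified against the output messages JSON schema before use. It emits one `TextPart` per text block.

```python
import json
from typing import Any


def normalize_content(content: Any) -> list[dict[str, Any]]:
    """Map LangChain-style message content to a list of semconv-shaped parts."""
    if isinstance(content, str):
        return [{"type": "text", "content": content}]

    parts: list[dict[str, Any]] = []
    for block in content:
        if isinstance(block, str):
            parts.append({"type": "text", "content": block})
        elif not isinstance(block, dict):
            continue  # unknown item; skipped in this sketch
        elif block.get("type") == "text":
            # Keep only the text; drop provider keys such as phase/index/id.
            parts.append({"type": "text", "content": block.get("text", "")})
        elif block.get("type") == "tool_use":
            # Assumed field mapping to a tool-call request part.
            parts.append(
                {
                    "type": "tool_call",
                    "id": block.get("id"),
                    "name": block.get("name"),
                    "arguments": block.get("input"),
                }
            )
    return parts


# Hypothetical content blocks for illustration.
blocks = [
    {"type": "text", "text": "Checking the weather.", "index": 0},
    {"type": "tool_use", "id": "call_1", "name": "get_weather", "input": {"city": "Vancouver"}},
]
print(json.dumps(normalize_content(blocks), indent=2))
```

Emitting one part per block preserves ordering between text and tool calls. An instrumentation that prefers a single `TextPart` could join the `content` of the text parts afterwards.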

## Environment
- `microsoft-opentelemetry` **1.3.2**
- Foundry hosted agent, framework: **LangGraph**
- Model: `gpt-5.4-mini`, provider: `openai`
- Backend exporter: Azure Monitor / Application Insights
- Verified against `semantic-conventions-genai` `main`

## Related
- #172 — same span, different defect (missing tool turns / `gen_ai.tool.definitions`). This one is about the *encoding* of the assistant text on `gen_ai.output.messages`.
- #159 — original message-shape fix on the wrapper span.

