Python: fix reasoning model workflow handoff and history serialization#4083

Merged
TaoChenOSU merged 11 commits into microsoft:main from eavanvalkenburg:fix_4047
Feb 19, 2026
Conversation

@eavanvalkenburg (Member) commented Feb 19, 2026

Summary

Fixes multiple related failures when using reasoning models (gpt-5-mini, gpt-5.2) in multi-agent workflows. The root causes all concern how reasoning items from the Responses API are emitted, serialized, and carried into subsequent agent runs.

Closes #4047


Problems Fixed

1. "reasoning was provided without its required following item"

The Responses API accepts a reasoning item in the input only when it directly precedes a function_call. Sending a reasoning item that preceded a text response (no tool call) causes an API error.

Fix: _prepare_message_for_openai now checks whether the message contains a function_call. text_reasoning content is only serialized as a reasoning input item when a function_call is also present in the same message.
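The guard can be sketched in isolation. This is a minimal illustration, not the actual `_prepare_message_for_openai` code; the content dicts and type strings are assumptions based on the PR description:

```python
def serializable_input_items(contents: list[dict]) -> list[dict]:
    """Keep text_reasoning items only when the same message also carries a function_call.

    Illustrative sketch: a reasoning item sent without a following
    function_call is rejected by the Responses API with
    "reasoning was provided without its required following item".
    """
    has_function_call = any(c["type"] == "function_call" for c in contents)
    items = []
    for c in contents:
        if c["type"] == "text_reasoning" and not has_function_call:
            # No accompanying tool call in this message: skip the
            # reasoning item instead of triggering the API error.
            continue
        items.append(c)
    return items
```

With a text-only message the reasoning item is dropped; with a tool-calling message it is preserved unchanged.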

2. Reasoning items never emitted for encrypted/hidden reasoning

When a reasoning model produces encrypted or hidden reasoning, the output_item.added event fires with an empty content list and no reasoning_text.delta events follow. Previously, no text_reasoning Content was emitted — making it invisible to downstream serialization logic.

Fix: Both _parse_response_from_openai (non-streaming) and the output_item.added handler (streaming) now always emit at least one text_reasoning Content, even when the text is empty. The reasoning_id and encrypted_content (if present) are stored in additional_properties.
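A sketch of the always-emit behavior (not the framework's actual parser; the item/content shapes are assumptions inferred from the description):

```python
def parse_reasoning_item(item: dict) -> list[dict]:
    """Always emit at least one text_reasoning content, even for hidden reasoning."""
    contents = [
        {"type": "text_reasoning", "text": part.get("text", "")}
        for part in item.get("content", [])
    ]
    if not contents:
        # Encrypted/hidden reasoning: the content list is empty and no
        # reasoning_text.delta events follow, but downstream serialization
        # still needs a marker carrying the reasoning_id.
        contents = [{"type": "text_reasoning", "text": ""}]
    for c in contents:
        props = {"reasoning_id": item["id"]}
        if item.get("encrypted_content"):
            props["encrypted_content"] = item["encrypted_content"]
        c["additional_properties"] = props
    return contents
```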

3. summary field must be an array, not an object

The summary field on a reasoning input item must be an array of objects ([{"type": "summary_text", "text": ...}]), not a single object. This caused a 400 invalid_type error.

Fix: _prepare_content_for_openai now wraps summary in a list. summary is omitted entirely when there is no visible text (e.g. encrypted reasoning, where only encrypted_content is sent).


Files Changed

packages/core/agent_framework/openai/_responses_client.py: Always emit text_reasoning on reasoning output items; fix summary to be an array; skip reasoning serialization when there is no function_call in the same message.
packages/core/agent_framework/_workflows/_agent_executor.py: Clear service_session_id in run and from_response handlers; remove the no-op _prepare_handoff_messages.
packages/core/tests/workflow/test_full_conversation.py: Add test_run_request_with_full_history_clears_service_session_id and test_from_response_clears_service_session_id (TDD: fail without the fix, pass with it).

Copilot AI review requested due to automatic review settings February 19, 2026 13:04
@github-actions bot changed the title from "fix(python): reasoning model workflow handoff and history serialization" to "Python: fix(python): reasoning model workflow handoff and history serialization" Feb 19, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes workflow + function-calling failures when using reasoning-capable models with the OpenAI/Azure Responses API by tightening how reasoning items are emitted/serialized and by preventing duplicate history replay across agent handoffs.

Changes:

  • Adjusts Responses API parsing/serialization to (a) only include reasoning input items when paired with a function_call, (b) always emit a text_reasoning marker (even empty) for hidden/encrypted reasoning, and (c) serialize summary as an array.
  • Updates workflow execution to clear service_session_id when explicitly replaying full history to avoid “Duplicate item found” errors.
  • Improves function-invocation behavior across multi-message responses and adds/expands tests (unit + integration) covering these scenarios.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

python/packages/core/agent_framework/openai/_responses_client.py: Updates reasoning item parsing and input serialization rules for the Responses API.
python/packages/core/agent_framework/_workflows/_agent_executor.py: Clears service_session_id when replaying explicit history into an executor.
python/packages/core/agent_framework/_tools.py: Improves function-call extraction across multiple messages and adjusts stop-path handling.
python/packages/core/tests/workflow/test_full_conversation.py: Adds workflow tests for handoff history and service_session_id clearing.
python/packages/core/tests/core/test_function_invocation_logic.py: Adds tests for multi-message function calls and stop-path conversation_id behavior.
python/packages/core/tests/azure/test_azure_responses_client.py: Adds an integration test that validates minimal workflow handoff across reasoning and non-reasoning deployments.
python/samples/05-end-to-end/workflow_evaluation/run_evaluation.py: Updates the default workflow deployment name to a reasoning model for the evaluation sample.
python/samples/02-agents/conversations/redis_chat_message_store_session.py: Makes the Redis URL configurable via the REDIS_URL env var and updates sample messaging.

@markwallace-microsoft (Member) commented Feb 19, 2026

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| packages/core/agent_framework/_tools.py | 846 | 91 | 89% | 165–166, 303, 305, 323–325, 332, 350, 364, 371, 378, 394, 396, 403, 440, 465, 469, 486–488, 535–537, 600, 622, 685–691, 727, 738–749, 771–773, 778, 782, 796–798, 837, 906, 916, 926, 982, 1013, 1032, 1294, 1351, 1371, 1442–1446, 1568, 1572, 1596, 1622, 1624, 1640, 1642, 1727, 1757, 1777, 1779, 1830, 1893, 2077–2078, 2115, 2128, 2138–2139, 2174–2175, 2235 |
| packages/core/agent_framework/_types.py | 998 | 87 | 91% | 49, 58–59, 113, 118, 137, 139, 143, 147, 149, 151, 153, 171, 175, 201, 223, 228, 233, 237, 263, 267, 615–616, 987, 1049, 1066, 1084, 1089, 1107, 1117, 1134–1135, 1137, 1155–1156, 1158, 1165–1166, 1168, 1203, 1214–1215, 1217, 1255, 1482, 1534, 1625–1630, 1652, 1657, 1823, 1835, 2078, 2087, 2108, 2203, 2428, 2635, 2705, 2717, 2735, 2933–2935, 2938–2940, 2944, 2949, 2953, 3037–3039, 3068, 3122, 3141–3142, 3145–3149, 3155 |
| packages/core/agent_framework/_workflows/_agent_executor.py | 200 | 26 | 87% | 97, 113, 168–169, 221–222, 224–225, 255–257, 265–267, 275–277, 279, 283, 287, 291–292, 391–392, 438, 456 |
| packages/core/agent_framework/openai/_responses_client.py | 639 | 87 | 86% | 290–293, 297–298, 301–302, 308–309, 314, 327–333, 354, 362, 385, 548, 551, 606, 610, 612, 614, 616, 692, 702, 707, 750, 829, 846, 859, 920, 1011, 1016, 1020–1022, 1026–1027, 1050, 1119, 1141–1142, 1157–1158, 1176–1177, 1218–1221, 1330–1331, 1347, 1349, 1428–1436, 1555, 1610, 1625, 1668–1671, 1679–1680, 1682–1684, 1698–1700, 1710–1711, 1717, 1732 |
| TOTAL | 21261 | 3314 | 84% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 4189 | 240 💤 | 0 ❌ | 0 🔥 | 1m 14s ⏱️ |

@markwallace-microsoft markwallace-microsoft added the lab Agent Framework Lab label Feb 19, 2026
@eavanvalkenburg changed the title from "Python: fix(python): reasoning model workflow handoff and history serialization" to "Python: fix reasoning model workflow handoff and history serialization" Feb 19, 2026
giles17 and others added 8 commits February 19, 2026 20:23
… handoff

When a reasoning model (e.g. gpt-5-mini) runs as Agent 1 in a workflow, its
response includes text_reasoning items (with server-scoped IDs like rs_XXXX)
and function_call items. Forwarding these to Agent 2 in a fresh conversation
caused API errors because the reasoning/call IDs are scoped to the original
stored response context.

Changes:
- Strip 'function_call', 'text_reasoning', 'function_approval_request', and
  'function_approval_response' from handoff messages in _agent_executor.py
- Keep 'function_result' so the actual tool output content is preserved for
  the next agent's context
- Update unit tests to reflect that function_result messages survive handoff
  (messages grow from 2→3: user, tool(result), assistant(summary))
- Fix incorrect test assertions in test_function_invocation_stop_clears_*
  that assumed the client layer updates session.service_session_id
- Also fixed _extract_function_calls to search all messages with call_id
  deduplication, and the error-limit stop path to submit function_call_output
  items before halting (via tool_choice=none cleanup call)
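The call_id-deduplicating extraction described above can be sketched as follows (a simplified stand-in for `_extract_function_calls`, not the actual code; the message/content shapes are assumptions):

```python
def extract_function_calls(messages: list[dict]) -> list[dict]:
    """Collect function_call contents across ALL messages, deduplicating by call_id.

    Previously only a single message was searched; a response spanning
    multiple messages could drop or duplicate tool calls.
    """
    seen: set[str] = set()
    calls = []
    for msg in messages:
        for content in msg.get("contents", []):
            if content.get("type") != "function_call":
                continue
            call_id = content["call_id"]
            if call_id in seen:
                # Same call surfaced twice (e.g. replayed history): keep one.
                continue
            seen.add(call_id)
            calls.append(content)
    return calls
```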

Relates to: microsoft#4047

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Fixes multiple related issues when using reasoning models (gpt-5-mini,
gpt-5.2) in multi-agent workflows that chain agents via from_response
or replay full conversation history via AgentExecutorRequest.

## Reasoning items always emitted on output_item.added

When a reasoning model produces encrypted or hidden reasoning (no
visible text), the Responses API still fires a reasoning output item
without any reasoning_text.delta events. Previously no text_reasoning
Content was emitted in that case, making it invisible to downstream
logic. Both the non-streaming (_parse_response_from_openai) and
streaming (output_item.added) paths now always emit at least one
text_reasoning Content — with empty text if no content is available —
so co-occurrence detection and serialization guards work reliably.

## Reasoning items only serialized when paired with a function_call

The Responses API only accepts reasoning items in input when they
directly preceded a function_call in the original response. Sending a
reasoning item that preceded a text response (no tool call) causes:
  "reasoning was provided without its required following item"
_prepare_message_for_openai now checks has_function_call per message
and skips text_reasoning serialization when there is no accompanying
function_call.

## summary field is an array, not an object

The reasoning item summary field sent to the Responses API must be an
array of objects ([{"type": "summary_text", "text": ...}]), not a
single object. Fixed _prepare_content_for_openai accordingly.

## service_session_id cleared when explicit history is provided

When a workflow coordinator replays a full conversation (including
function calls from a previous agent run) back to an executor via
AgentExecutorRequest or from_response, the executor's session still
held a service_session_id (previous_response_id) from the prior run.
The API then received the same function-call items twice — once from
previous_response_id (server-stored) and once from the explicit input —
causing: "Duplicate item found with id fc_...".

AgentExecutor.run (when should_respond=True) and from_response now
reset self._session.service_session_id = None before running so that
explicit input is the sole source of conversation context.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…cit history replay

Replace the implicit 'always clear service_session_id when should_respond=True'
with an explicit opt-in field on AgentExecutorRequest.

The old approach used should_respond=True as a proxy for 'full history replay',
but that conflates two distinct intents:
- Orchestrations group chat sends should_respond=True with an empty/single-message
  list (not a full replay) — unnecessarily clearing service_session_id.
- HITL / feedback coordinators send the full prior conversation and truly need
  a fresh service session ID to avoid duplicate-item API errors.

Changes:
- Add AgentExecutorRequest.reset_service_session: bool = False
- AgentExecutor.run only clears service_session_id when this flag is True
- AgentExecutor.from_response unchanged (always clears; always full conversation)
- Set reset_service_session=True in all full-history-replay call sites:
  agents_with_HITL.py, azure_chat_agents_tool_calls_with_feedback.py,
  autogen-migration round-robin coordinator, tau2 runner
- Update _FullHistoryReplayCoordinator test helper to pass the flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
eavanvalkenburg and others added 3 commits February 19, 2026 20:25
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Feb 19, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 19, 2026
@TaoChenOSU TaoChenOSU added this pull request to the merge queue Feb 19, 2026
Merged via the queue into microsoft:main with commit 67ce1ba Feb 19, 2026
25 checks passed

Labels

lab (Agent Framework Lab), python

Development

Successfully merging this pull request may close these issues.

Python: Workflow evaluation fails with reasoning models but succeeds with non-reasoning models

5 participants