Python: Fix per-service-call history persistence with server-storing clients #6310
When an Agent set require_per_service_call_history_persistence=True together with a HistoryProvider, and the chat client stored history server-side by default (e.g. OpenAIChatClient, STORES_BY_DEFAULT=True), the external history provider was silently never persisted.

Unify persistence on the per-service-call middleware: when the flag is set and a HistoryProvider exists, the middleware is always installed and owns persistence. service_stores_history now only selects middleware behavior:

- service does not store: load providers and drive the function loop with a local sentinel conversation id, or
- service stores: skip loading (the service owns history) and persist each service call while the real conversation id flows through.

Also rationalize chat-options handling in _prepare_run_context:

- _merge_options now skips None overrides and strips remaining None values, so an unset `store` is never forwarded and the service decides its own default.
- Resolve `store` and `conversation_id` once from a single combined view (effective_options) instead of probing both default and runtime dicts; the auto-injection and per-service-call resolution now agree on conversation_id.

Fixes microsoft#5798

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR fixes a Python SDK bug where require_per_service_call_history_persistence=True combined with an external HistoryProvider could silently skip external persistence when the underlying chat client stores history server-side by default (e.g., OpenAI clients with STORES_BY_DEFAULT=True). The fix centralizes history persistence responsibility in the per-service-call middleware and rationalizes option-merging so unset values (notably store=None) are not forwarded to clients.
Changes:
- Always installs `PerServiceCallHistoryPersistingMiddleware` when per-service-call persistence is required and a `HistoryProvider` is present; service-side storage now only changes how the middleware behaves (load+persist vs persist-only).
- Updates `_prepare_run_context` to resolve `store`/`conversation_id` from a single merged options view and to treat `None` as “unset” (not forwarded).
- Adds a scenario-matrix test suite validating persistence timing across storing/non-storing clients, streaming/non-streaming runs, and `store` overrides.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/packages/core/agent_framework/_agents.py | Unifies option resolution and ensures per-service-call persistence is owned by middleware in both local and service-managed cases, with warning logging when load is bypassed. |
| python/packages/core/agent_framework/_sessions.py | Extends per-service-call middleware to support a “service stores history” mode (persist-only; no provider load; no local sentinel behavior). |
| python/packages/core/agent_framework/_clients.py | Updates as_agent() docstring to describe the per-service-call persistence behavior (note: one doc line currently contradicts implementation). |
| python/packages/core/tests/core/test_agents.py | Adds regression tests and a scenario matrix asserting per-service-call persistence timing and store=None non-forwarding. |
…ce per run

Address PR review: when the client stores history server-side, the per-service-call middleware still persists after each model call; only provider loading is skipped. The previous "persist once per run()" wording contradicted the implementation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
require_per_service_call_history_persistence: When True (and a HistoryProvider is
present), the provider always persists history via per-service-call middleware,
regardless of whether the client stores history server-side. If the client does
not store history, the middleware also loads providers around each model call and
drives the function loop with a local conversation; if it does, loading is skipped
(the service-managed conversation is the source of truth) and the middleware only
persists. A warning is logged for providers with ``load_messages=True`` when
loading is skipped because service-side storage is active.
nit: this becomes sort of confusing. How about when this is true and there isn't a HistoryProvider?
"HistoryProvider '%s' has load_messages=True but the chat client stores history "
"server-side; skipping local history load and relying on the service-managed "
"conversation. Set store=False to load from the provider, or load_messages=False "
"to silence this warning.",
provider.source_id,
Why not throw an exception at agent initialization?
) -> ChatResponse:
    """Persist a model response and apply the local follow-up sentinel when needed."""
    if response.conversation_id is not None and not is_local_history_conversation_id(response.conversation_id):
    if (
What happens when service_stores_history is True but the client returns conversation_id=None? This guard is skipped, provider loading was already skipped (_sessions.py _prepare_service_call_context), and session.service_session_id only gets set when conversation_id is truthy (_agents.py:1400). So we persist this turn to the provider but the next run skips loading again with no service id to resume from, dropping cross-turn history silently with no error and no warning. Is a storing client guaranteed to always echo a conversation_id, or should we assert/warn in the storing branch when it comes back empty so the assumption can't fail quietly?
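One way the suggested warning could look is sketched below. This is only an illustration under stated assumptions: `warn_if_missing_conversation_id` and the `history_sketch` logger are hypothetical names, not part of `agent_framework`, and the sketch does not answer whether a storing client is guaranteed to echo a `conversation_id`.

```python
import logging
from typing import Optional

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("history_sketch")


def warn_if_missing_conversation_id(
    conversation_id: Optional[str], service_stores_history: bool
) -> bool:
    """Return True when a warning was emitted.

    Hypothetical helper: in storing mode provider loading is skipped, so a
    response without a conversation id leaves the next run nothing to resume from.
    """
    if service_stores_history and not conversation_id:
        logger.warning(
            "Chat client is expected to store history server-side but returned no "
            "conversation_id; the next run cannot resume the service-managed conversation."
        )
        return True
    return False
```

The helper returns a flag so that a test can assert on the storing branch without inspecting log output.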
# Without service-side storage the middleware persists locally and drives the function
# loop with a local sentinel, which cannot be reconciled with an existing service-managed
# conversation. When the service stores history, an existing conversation id is expected.
if conversation_id is not None and not service_stores_history:
Are we covering the allow side of this boundary? The PR adds and not service_stores_history, so flag-on + storing client + an existing conversation_id now resumes instead of raising. Tests cover the raise side (store=False + conversation_id), but I couldn't find one for storing + conversation_id asserting it does NOT raise and that the id propagates to service_session_id. A regression dropping the and not service_stores_history qualifier would wrongly raise on the resume-a-stored-conversation path and nothing would catch it. Worth a test?
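A minimal sketch of the boundary this comment describes, covering both the raise side and the allow side. The function names and the `ValueError` are assumptions for illustration; the diff does not show which exception the real guard raises.

```python
from typing import Optional


def check_existing_conversation_id(
    conversation_id: Optional[str], service_stores_history: bool
) -> Optional[str]:
    """Reject a pre-existing conversation id only when the service does not store history."""
    if conversation_id is not None and not service_stores_history:
        # Hypothetical error type; the source does not show what is actually raised.
        raise ValueError(
            "An existing conversation_id cannot be combined with local per-service-call persistence."
        )
    return conversation_id


def resume_is_allowed(conversation_id: Optional[str], service_stores_history: bool) -> bool:
    """Report whether the guard lets the run proceed."""
    try:
        check_existing_conversation_id(conversation_id, service_stores_history)
    except ValueError:
        return False
    return True
```

A regression that drops the `and not service_stores_history` qualifier would flip the storing case to a raise, which is exactly what an allow-side assertion would catch.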
# persists each service call while the real conversation id flows through.
# In the service-managed case loading is skipped, so warn for providers that expect to load.
history_providers = self._get_history_providers()
if self.require_per_service_call_history_persistence and history_providers and service_stores_history:
This sits in _prepare_run_context, which runs once per run(), so a long-lived agent looping turns against a storing client with a load_messages=True provider re-logs the same WARNING every turn. Could that train users to tune it out? Wondering if we should gate it to once per session/agent (or first run only) so it stays a signal rather than per-turn noise.
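Gating the warning could be as simple as remembering which providers have already been warned about. The sketch below assumes a hypothetical `LoadSkipWarner` held by the agent or the session; neither the class nor the logger name exists in the PR.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("history_sketch")


class LoadSkipWarner:
    """Hypothetical helper that logs the load-skipped warning once per provider."""

    def __init__(self) -> None:
        self._warned: set[str] = set()

    def warn_once(self, source_id: str) -> bool:
        """Log the warning the first time a provider is seen; return True if it was logged."""
        if source_id in self._warned:
            return False
        self._warned.add(source_id)
        logger.warning(
            "HistoryProvider '%s' has load_messages=True but the chat client stores "
            "history server-side; skipping local history load.",
            source_id,
        )
        return True
```

Whether the set should live on the agent or on the session decides if the warning repeats for each new session, which is a design choice the comment leaves open.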
if conversation_id is not None:
    # A live service-managed session id takes precedence over the resolved conversation id.
    if session and session.service_session_id:
Are we covering a second run on the same session in storing mode? This precedence branch only fires once service_session_id is populated (run 2+), and every per-service-call test does a single run. A two-run test asserting persistence keeps happening, service_session_id stays stable, and loading stays skipped would lock down this path.
Motivation and Context
When an `Agent` was configured with `require_per_service_call_history_persistence=True` together with a `HistoryProvider`, and the underlying chat client stored history server-side by default (e.g. `OpenAIChatClient`, where `STORES_BY_DEFAULT=True`), the external history provider was silently never persisted. The per-service-call middleware was skipped because the service was assumed to own history, and the once-per-run path also skipped the provider, so neither persisted.

Fixes #5798
Description
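Unify persistence on the per-service-call middleware. When `require_per_service_call_history_persistence=True` and a `HistoryProvider` exists, the `PerServiceCallHistoryPersistingMiddleware` is now always installed and owns persistence. `service_stores_history` only selects how the middleware behaves, never whether it persists:

- Service does not store: the middleware loads providers and drives the function loop with a local sentinel conversation id.
- Service stores: loading is skipped (the service owns history) and each service call is persisted while the real conversation id flows through. A warning is logged for providers with `load_messages=True` whose load is bypassed.

The observable contract: with the flag on, persistence happens per service call. In a function-call → final-completion run, the function-call turn is persisted before the second call starts.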
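The two modes and the timing contract can be illustrated with a toy simulation. Everything here is a hypothetical stand-in: `RecordingProvider` and `run_with_per_call_persistence` are not the real `HistoryProvider` or middleware, and the event names exist only to make the ordering visible.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingProvider:
    """Hypothetical history provider that records load and save events in order."""

    events: list = field(default_factory=list)

    def load(self) -> None:
        self.events.append("load")

    def save(self, message: str) -> None:
        self.events.append(f"save:{message}")


def run_with_per_call_persistence(provider, responses, service_stores_history):
    """Persist after every service call; load only when the service does not store."""
    for response in responses:
        if not service_stores_history:
            provider.load()  # local mode: the provider is the source of truth
        provider.events.append("call")  # stands in for one model/service call
        provider.save(response)  # both modes: persist before the next call starts
    return provider.events


local_events = run_with_per_call_persistence(
    RecordingProvider(), ["function_call", "final"], service_stores_history=False
)
stored_events = run_with_per_call_persistence(
    RecordingProvider(), ["function_call", "final"], service_stores_history=True
)
```

In both runs `save:function_call` is recorded before the second `call`, which is the ordering the scenario matrix is described as asserting. Only the `load` events differ between the modes.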
Rationalize chat-options handling in `_prepare_run_context`:

- `_merge_options` now skips `None` overrides and strips remaining `None` values in a single pass, so an unset `store` is never forwarded to the client and the service decides its own default (`STORES_BY_DEFAULT` is only an internal behavior hint).
- `store` and `conversation_id` are resolved once from a single combined view (`effective_options`) instead of probing both the agent-default and runtime dicts separately. The `InMemoryHistoryProvider` auto-injection and the per-service-call resolution now agree on `conversation_id` (an agent-level default is honored consistently).
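The merge rule described above can be sketched as follows. `merge_options` is a hypothetical stand-in written from the description, not the actual `_merge_options` implementation, and the option values are made up for the example.

```python
from typing import Any


def merge_options(defaults: dict, overrides: dict) -> dict:
    """Hypothetical stand-in for the described merge: None always means "unset"."""
    merged: dict[str, Any] = {}
    for source in (defaults, overrides):
        for key, value in source.items():
            if value is not None:  # None never overrides and is never kept
                merged[key] = value
    return merged


# Agent-level defaults leave `store` unset; the runtime call overrides only temperature.
effective_options = merge_options(
    {"store": None, "conversation_id": "conv-default", "temperature": 0.2},
    {"conversation_id": None, "temperature": 0.7},
)
```

Here `effective_options` keeps the agent-level `conversation_id`, takes the runtime `temperature`, and contains no `store` key at all, so nothing about storage is forwarded and the service applies its own default.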
Tests are added to `test_agents.py` as a scenario matrix (sync + streaming) that asserts the per-service-call persistence timing across storing/non-storing clients and `store` overrides.

Contribution Checklist