Python: Add allowed_checkpoint_types support to CosmosCheckpointStorage for parity with FileCheckpointStorage #5200

@moonbox3

Summary

CosmosCheckpointStorage (in agent-framework-azure-cosmos) should reach API and behavior parity with FileCheckpointStorage (in agent-framework) for checkpoint deserialization. Two related changes should land together:

  1. Accept an allowed_checkpoint_types constructor parameter on CosmosCheckpointStorage, with the same "module:qualname" format and semantics as FileCheckpointStorage.
  2. Align the Cosmos load path with File so it flows through the same restricted-type loading behavior that File already uses by default.

Today, the two providers differ in how they call into decode_checkpoint_value, which in turn decides whether checkpoint documents are loaded via RestrictedUnpickler or via plain pickle.loads. Bringing them into alignment makes the two providers interchangeable from a user's perspective and removes a surprising behavior difference that is easy to miss when swapping providers.

Background and code paths

FileCheckpointStorage stores an _allowed_types frozenset on the instance and forwards it on every load:

  • python/packages/core/agent_framework/_workflows/_checkpoint.py:
    • L279 — self._allowed_types: frozenset[str] = frozenset(allowed_checkpoint_types or [])
    • L355 — decode_checkpoint_value(encoded_checkpoint, allowed_types=self._allowed_types)
    • L381 — same pattern in the second load path

Inside the encoding module, the loading mode is decided by a simple is not None check:

  • python/packages/core/agent_framework/_workflows/_checkpoint_encoding.py:
    • _base64_to_unpickle: if allowed_types is not None, it uses _RestrictedUnpickler; otherwise it falls through to pickle.loads.

Because FileCheckpointStorage always passes a frozenset (even an empty one), it is always on the restricted-type path. The empty frozenset still enforces the built-in safe set plus agent_framework.* framework types. allowed_checkpoint_types is additive: it layers user types on top of that floor.
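To make the None versus empty-frozenset distinction concrete, here is a minimal, self-contained sketch of the branching described above. It is not the framework's code: `RestrictedUnpickler`, `base64_to_unpickle`, `_SAFE_FLOOR`, and `loads_ok` are hypothetical stand-ins, the floor is reduced to a single type, and `fractions.Fraction` plays the role of an application-defined type.

```python
from __future__ import annotations

import base64
import io
import pickle
from decimal import Decimal
from fractions import Fraction

# Hypothetical stand-in for the built-in safe set; the real list lives in the framework.
_SAFE_FLOOR = frozenset({"decimal:Decimal"})


class RestrictedUnpickler(pickle.Unpickler):
    """Refuses to resolve any global that is not in the floor or the caller's allow list."""

    def __init__(self, raw: bytes, allowed_types: frozenset[str]) -> None:
        super().__init__(io.BytesIO(raw))
        self._allowed = _SAFE_FLOOR | allowed_types

    def find_class(self, module: str, name: str):
        key = f"{module}:{name}"
        if key not in self._allowed:
            raise pickle.UnpicklingError(f"type {key!r} is not allowed")
        return super().find_class(module, name)


def base64_to_unpickle(encoded: str, allowed_types: frozenset[str] | None = None):
    raw = base64.b64decode(encoded)
    if allowed_types is not None:
        # Any frozenset, including an empty one, selects the restricted path.
        return RestrictedUnpickler(raw, allowed_types).load()
    # Only None falls through to plain pickle.loads.
    return pickle.loads(raw)


def loads_ok(encoded: str, allowed_types: frozenset[str] | None) -> bool:
    """Report whether a blob loads under the given allow list."""
    try:
        base64_to_unpickle(encoded, allowed_types)
        return True
    except pickle.UnpicklingError:
        return False


safe_blob = base64.b64encode(pickle.dumps({"price": Decimal("1.5")})).decode()
app_blob = base64.b64encode(pickle.dumps(Fraction(1, 3))).decode()
```

In this sketch, `loads_ok(app_blob, None)` succeeds while `loads_ok(app_blob, frozenset())` does not, which is the same split the two providers exhibit today.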

CosmosCheckpointStorage does not pass allowed_types at all:

  • python/packages/azure-cosmos/agent_framework_azure_cosmos/_checkpoint_storage.py:
    • L416 — decoded = decode_checkpoint_value(cleaned)

The two providers therefore take different branches of _base64_to_unpickle for the same inputs, which is surprising given they implement the same CheckpointStorage protocol and share the same hybrid JSON + pickle encoding.

Expected behavior

  • CosmosCheckpointStorage.__init__ should accept allowed_checkpoint_types: list[str] | None = None as a keyword-only argument, with the same format and semantics as FileCheckpointStorage.
  • CosmosCheckpointStorage should forward the stored allowed-types frozenset to decode_checkpoint_value on every load path, matching FileCheckpointStorage. A user who swaps FileCheckpointStorage(...) for CosmosCheckpointStorage(...) with the same allowed_checkpoint_types should observe the same load behavior for the same checkpoint contents.
  • Behavior should be consistent across all load code paths on CosmosCheckpointStorage (load, list_checkpoints, get_latest, and any internal helpers that call _document_to_checkpoint).
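For callers assembling the allow list, a small helper can build the "module:qualname" strings from the classes themselves. This is a sketch only: `checkpoint_type_key` is a hypothetical name, and it assumes the format corresponds to a class's `__module__` and `__qualname__` attributes, which the issue implies but does not spell out.

```python
def checkpoint_type_key(cls: type) -> str:
    """Build a "module:qualname" string for a class (assumed format, see lead-in)."""
    return f"{cls.__module__}:{cls.__qualname__}"


def checkpoint_type_keys(*classes: type) -> list[str]:
    """Convenience wrapper returning a list suitable for allowed_checkpoint_types."""
    return [checkpoint_type_key(cls) for cls in classes]
```

The resulting list could be passed unchanged to either provider, which is the interchangeability this issue is asking for.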

Suggested implementation

Mirror the File pattern:

  1. In CosmosCheckpointStorage.__init__ (python/packages/azure-cosmos/agent_framework_azure_cosmos/_checkpoint_storage.py), add a keyword-only allowed_checkpoint_types: list[str] | None = None parameter and store it as self._allowed_types: frozenset[str] = frozenset(allowed_checkpoint_types or []).
  2. Change _document_to_checkpoint from a @staticmethod to an instance method and pass allowed_types=self._allowed_types into decode_checkpoint_value. Update all call sites accordingly.
  3. Update the class docstring to describe allowed_checkpoint_types alongside the existing authentication and container setup notes, pointing at the same Learn docs section as FileCheckpointStorage.
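The first two steps could look roughly like the following. This is a sketch under stated assumptions, not the actual patch: `CosmosCheckpointStorageSketch` omits every real constructor argument (endpoint, credentials, container), `decode_checkpoint_value` is a local stand-in that only reports which mode was requested, and the underscore-prefix cleaning step is a hypothetical placeholder for whatever produces `cleaned` today.

```python
from __future__ import annotations

from typing import Any


def decode_checkpoint_value(value: Any, *, allowed_types: frozenset[str] | None = None) -> Any:
    """Stand-in for the framework decoder: records the requested loading mode."""
    mode = "restricted" if allowed_types is not None else "unrestricted"
    return {"value": value, "mode": mode, "allowed": allowed_types}


class CosmosCheckpointStorageSketch:
    def __init__(self, *, allowed_checkpoint_types: list[str] | None = None) -> None:
        # Step 1: always store a frozenset, so the restricted path is taken even when empty.
        self._allowed_types: frozenset[str] = frozenset(allowed_checkpoint_types or [])

    # Step 2: an instance method (previously a @staticmethod) so it can read self._allowed_types.
    def _document_to_checkpoint(self, document: dict[str, Any]) -> Any:
        # Hypothetical cleaning: drop underscore-prefixed system fields such as _etag.
        cleaned = {k: v for k, v in document.items() if not k.startswith("_")}
        return decode_checkpoint_value(cleaned, allowed_types=self._allowed_types)


storage = CosmosCheckpointStorageSketch()
result = storage._document_to_checkpoint({"id": "cp1", "_etag": "abc"})
```

Because every load path funnels through `_document_to_checkpoint`, threading the frozenset through this one method covers all of them, provided the call sites are updated from the static form to `self._document_to_checkpoint(...)`.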

Note: the error message raised by _RestrictedUnpickler currently names FileCheckpointStorage.allowed_checkpoint_types as the canonical example. Once this change lands, consider either generalizing that message to reference the user's actual storage class or leaving it as-is and accepting File as the documented example.

Tests to add

Under the azure-cosmos package tests, add coverage that parallels the existing File tests in packages/core/tests/workflow/test_checkpoint_unrestricted_pickle.py and test_checkpoint.py:

  • Built-in safe set still loads without opt-in. A checkpoint whose state uses only JSON-native types, datetime, uuid, Decimal, common collections, and agent_framework.* types loads cleanly with no allowed_checkpoint_types configured.
  • Application types require opt-in. Save a checkpoint whose state contains an application-defined class; loading it without allowed_checkpoint_types raises WorkflowCheckpointException. Passing that type's "module:qualname" via allowed_checkpoint_types makes the load succeed and return an instance of the expected class.
  • All load paths are covered. Exercise load, list_checkpoints, and get_latest so a future regression in any single path is caught.
  • No regressions in existing Cosmos checkpoint tests.
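The shape of the opt-in and load-path tests might resemble the sketch below. Everything here is a stand-in so the example runs without Cosmos: `FakeStorage` is a hypothetical synchronous in-memory class (the real methods may be async and take different arguments), `WorkflowCheckpointException` is redefined locally, the restricted unpickler has no built-in floor, and `fractions.Fraction` substitutes for an application-defined class.

```python
from __future__ import annotations

import io
import pickle
from fractions import Fraction


class WorkflowCheckpointException(Exception):
    """Local stand-in for the framework exception of the same name."""


class _Restricted(pickle.Unpickler):
    def __init__(self, raw: bytes, allowed: frozenset[str]) -> None:
        super().__init__(io.BytesIO(raw))
        self._allowed = allowed

    def find_class(self, module: str, name: str):
        if f"{module}:{name}" not in self._allowed:
            raise WorkflowCheckpointException(f"type '{module}:{name}' is not allowed")
        return super().find_class(module, name)


class FakeStorage:
    """Hypothetical in-memory storage exposing the three load paths named in the issue."""

    def __init__(self, *, allowed_checkpoint_types: list[str] | None = None) -> None:
        self._allowed_types = frozenset(allowed_checkpoint_types or [])
        self._docs: dict[str, bytes] = {}

    def save(self, checkpoint_id: str, state: object) -> None:
        self._docs[checkpoint_id] = pickle.dumps(state)

    def _document_to_checkpoint(self, raw: bytes) -> object:
        return _Restricted(raw, self._allowed_types).load()

    def load(self, checkpoint_id: str) -> object:
        return self._document_to_checkpoint(self._docs[checkpoint_id])

    def list_checkpoints(self) -> list[object]:
        return [self._document_to_checkpoint(raw) for raw in self._docs.values()]

    def get_latest(self) -> object:
        return self._document_to_checkpoint(list(self._docs.values())[-1])


def test_application_type_requires_opt_in() -> None:
    storage = FakeStorage()
    storage.save("cp1", Fraction(1, 3))
    # Every load path must reject the type, not just load().
    for path in (lambda: storage.load("cp1"), storage.list_checkpoints, storage.get_latest):
        try:
            path()
        except WorkflowCheckpointException:
            continue
        raise AssertionError("expected WorkflowCheckpointException")


def test_opt_in_allows_application_type() -> None:
    storage = FakeStorage(allowed_checkpoint_types=["fractions:Fraction"])
    storage.save("cp1", Fraction(1, 3))
    assert storage.load("cp1") == Fraction(1, 3)
    assert storage.list_checkpoints() == [Fraction(1, 3)]
    assert storage.get_latest() == Fraction(1, 3)
```

Looping over the load paths in a single test keeps the "all paths covered" requirement visible, so a path added later is easy to append to the tuple.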

Docs follow-up

Once this ships, the Learn docs page at agent-framework/workflows/checkpoints.md in MicrosoftDocs/semantic-kernel-pr should be updated so the "Pickle serialization" subsection describes allowed_checkpoint_types as available on both providers. A separate docs PR is already in flight to add CosmosCheckpointStorage to that page; this follow-up can be rolled into a subsequent docs update once the Python change lands.

Acceptance criteria

  • CosmosCheckpointStorage.__init__ accepts allowed_checkpoint_types: list[str] | None = None with the same "module:qualname" format as FileCheckpointStorage.
  • All CosmosCheckpointStorage load paths forward the stored allowed-types frozenset into decode_checkpoint_value.
  • New tests cover the built-in-safe-set case, the application-type opt-in case, and all load paths.
  • No regressions in existing Cosmos checkpoint tests.
  • Class docstring updated to describe allowed_checkpoint_types.
  • Changelog entry noting the parity change so existing Cosmos checkpoint users are aware they may need to pass application types via allowed_checkpoint_types when upgrading.

Migration note

This is a behavior change for existing CosmosCheckpointStorage users who store application-defined types in their checkpoints. After this change, those loads will raise WorkflowCheckpointException until the application passes its types via allowed_checkpoint_types. The exception message from _RestrictedUnpickler tells users exactly which key to add, which should keep the migration straightforward. The changelog entry should call this out explicitly with a one-liner example.

Labels: python, workflows (Related to Workflows in agent-framework)
