Resumed workflow checkpoints dropping ancestry #4592
Closed
davidahmann started this conversation in General
Replies: 1 comment
We've got an issue that you filed here: #4588. Closing as a duplicate.
Problem observed
When a workflow resumed from a stored checkpoint, the next newly written checkpoint did not point back to the restored checkpoint. The runner restored state correctly, but it seeded previous_checkpoint_id as None for the resumed execution path.
Why it matters operationally
Checkpoint lineage is part of the audit trail for long-running or restarted workflows. If the first resumed checkpoint loses its parent link, operators can no longer explain the exact resume boundary from persisted state alone.
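To make the audit concern concrete, here is a minimal sketch of walking lineage backwards from the latest checkpoint. The record shape (a mapping from checkpoint ID to previous_checkpoint_id), the IDs, and the helper name walk_lineage are assumptions for illustration, not the framework's storage format.

```python
from typing import Dict, List, Optional

# Hypothetical persisted records: checkpoint_id -> previous_checkpoint_id.
# "cp-3" is the first checkpoint written after resuming from "cp-2".
broken_store: Dict[str, Optional[str]] = {
    "cp-1": None,
    "cp-2": "cp-1",
    "cp-3": None,  # parent link lost at the resume boundary
    "cp-4": "cp-3",
}
# Same history, but the first resumed checkpoint keeps its parent.
intact_store: Dict[str, Optional[str]] = {**broken_store, "cp-3": "cp-2"}


def walk_lineage(store: Dict[str, Optional[str]], start: str) -> List[str]:
    """Follow previous_checkpoint_id links from start back to the root."""
    chain: List[str] = []
    current: Optional[str] = start
    while current is not None:
        chain.append(current)
        current = store[current]
    return chain


print(walk_lineage(broken_store, "cp-4"))  # ['cp-4', 'cp-3']
print(walk_lineage(intact_store, "cp-4"))  # ['cp-4', 'cp-3', 'cp-2', 'cp-1']
```

With the broken link, the walk ends at the first resumed checkpoint and looks like an independent run, so nothing in the persisted records ties it to the checkpoint it was restored from.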
Minimal repro
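A toy model of the behavior, offered as a hedged sketch rather than the project's code: ToyRunner, Checkpoint, write_checkpoint(), finish(), the keep_restored_parent flag, and the in-memory store are all hypothetical names. Only previous_checkpoint_id and restore_from_checkpoint() are taken from this report. Setting the flag to False models the reported behavior; True models the ancestry contract the report expects.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Checkpoint:
    checkpoint_id: str
    previous_checkpoint_id: Optional[str]
    state: dict


@dataclass
class ToyRunner:
    """Hypothetical stand-in for a workflow runner; not the project's class."""

    store: Dict[str, Checkpoint]
    keep_restored_parent: bool  # False models the reported behavior
    state: dict = field(default_factory=dict)
    _parent_id: Optional[str] = None

    def write_checkpoint(self) -> Checkpoint:
        # Each new checkpoint points at whatever the runner holds as parent.
        cp = Checkpoint(
            checkpoint_id=f"cp-{len(self.store) + 1}",
            previous_checkpoint_id=self._parent_id,
            state=dict(self.state),
        )
        self.store[cp.checkpoint_id] = cp
        self._parent_id = cp.checkpoint_id
        return cp

    def restore_from_checkpoint(self, checkpoint_id: str) -> None:
        self.state = dict(self.store[checkpoint_id].state)
        # Reported behavior seeds None here; the expected contract keeps
        # the restored checkpoint ID as the parent of the next checkpoint.
        self._parent_id = checkpoint_id if self.keep_restored_parent else None

    def finish(self) -> None:
        # Clear the parent so a later fresh run starts a new chain.
        self._parent_id = None


def first_resumed_parent(keep_restored_parent: bool) -> Optional[str]:
    store: Dict[str, Checkpoint] = {}
    first = ToyRunner(store, keep_restored_parent)
    first.state["step"] = 1
    saved = first.write_checkpoint()  # cp-1, no parent

    resumed = ToyRunner(store, keep_restored_parent)  # simulated restart
    resumed.restore_from_checkpoint(saved.checkpoint_id)
    resumed.state["step"] = 2
    return resumed.write_checkpoint().previous_checkpoint_id


print(first_resumed_parent(False))  # None
print(first_resumed_parent(True))   # cp-1
```

In both modes the restored state is identical; only the parent pointer of the first checkpoint written after the resume differs.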
Fix approach
The runner now remembers the restored checkpoint ID when restore_from_checkpoint() succeeds and uses it as the parent for the first resumed checkpoint. That parent pointer is cleared again after successful completion so later fresh runs are unaffected.
Validation evidence
uv run pytest packages/core/tests/workflow/test_checkpoint.py -k 'test_workflow_checkpoint_chaining_via_previous_checkpoint_id or test_resumed_workflow_keeps_previous_checkpoint_id_chain' passed
Open follow-up question for maintainers
Is there any other resume entry point besides restore_from_checkpoint() that should preserve the same ancestry contract so we can keep the Python and .NET behavior aligned?
Inspired by research context: CAISI publishes independent, reproducible AI agent governance research: https://caisi.dev