feat(temporal): opt-in continue-as-new for long-lived agent workflows #447
Merged
Long-lived chat/session agents run as a single Temporal workflow that stays open indefinitely. Two things killed them: event history grew past Temporal's ~50k-event / 50MB limit, and the 24h execution-timeout default terminated the whole chain. This adds an opt-in continue-as-new pattern on BaseWorkflow and defaults the execution timeout to infinite.

Opt-in by adoption: an agent gets recycling only by calling run_until_complete from its @workflow.run instead of a bare wait_condition(timeout=None). No flag, no patch gate (no in-flight long-running workflows to preserve; Worker Versioning + upgrade-on-continue-as-new is the path for evolving these later).

BaseWorkflow helpers:
- run_until_complete(*args, is_complete, timeout=None): keep the workflow open and recycle history when Temporal suggests it. Optional timeout caps the wait (default None = wait indefinitely).
- should_continue_as_new(): recycle when workflow.info().is_continue_as_new_suggested().
- drain_and_continue_as_new(): drain all_handlers_finished and re-check completion before workflow.continue_as_new.
- is_continued_run(): gate one-time @workflow.run prologue (e.g. a welcome message, state rehydration) so it doesn't repeat on each recycle.

Execution timeout: WORKFLOW_EXECUTION_TIMEOUT_SECONDS now defaults to None (no execution timeout; None/0/negative -> execution_timeout=None). This is the workflow-level lifetime cap, configurable per deployment. It is chain-wide (continue-as-new does not reset it), so leaving it unset is required for a forever-chat.

State restoration after a recycle is framework-specific and left to follow-up PRs, one per integration.

000_hello_acp adopts the pattern and gates its one-time welcome behind is_continued_run() so it isn't re-emitted on recycle.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
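The ordering these helpers describe (wait, drain, re-check, then recycle) can be modeled without a Temporal server. The following is a minimal sketch, not the PR's implementation: FakeRuntime, ContinueAsNew, and _wait_until are hypothetical stand-ins for workflow.info().is_continue_as_new_suggested(), workflow.all_handlers_finished(), workflow.continue_as_new, and workflow.wait_condition, and this simplified run_until_complete omits the *args and timeout parameters.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable


class ContinueAsNew(Exception):
    """Hypothetical stand-in for the exit that workflow.continue_as_new performs."""


@dataclass
class FakeRuntime:
    """Hypothetical stand-in for the workflow runtime; not a Temporal class."""

    suggested: bool = False      # models workflow.info().is_continue_as_new_suggested()
    handlers_in_flight: int = 0  # non-zero models "handlers not finished yet"

    def is_continue_as_new_suggested(self) -> bool:
        return self.suggested

    def all_handlers_finished(self) -> bool:
        return self.handlers_in_flight == 0


async def _wait_until(predicate: Callable[[], bool], poll: float = 0.001) -> None:
    # Polling is an assumption of this model; workflow.wait_condition is event-driven.
    while not predicate():
        await asyncio.sleep(poll)


async def run_until_complete(rt: FakeRuntime, is_complete: Callable[[], bool]) -> str:
    # Stay open until the session is done or a recycle is suggested.
    await _wait_until(lambda: is_complete() or rt.is_continue_as_new_suggested())
    if is_complete():
        return "completed"
    # Drain: let in-flight handlers finish so a turn is not lost.
    await _wait_until(rt.all_handlers_finished)
    # Re-check: a handler may have completed the session while draining.
    if is_complete():
        return "completed"
    raise ContinueAsNew()


async def demo_drain_completes() -> str:
    """A handler that is in flight at recycle time finishes the session."""
    rt = FakeRuntime(suggested=True, handlers_in_flight=1)
    state = {"done": False}

    async def handler() -> None:
        await asyncio.sleep(0.01)
        state["done"] = True
        rt.handlers_in_flight -= 1

    task = asyncio.create_task(handler())
    result = await run_until_complete(rt, lambda: state["done"])
    await task
    return result
```

In this model the re-check after draining is what keeps a session from being recycled when its last in-flight handler has already finished it; without it, demo_drain_completes would raise ContinueAsNew instead of returning.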
Why
Long-lived chat/session agents run as a single Temporal workflow that stays open indefinitely. Two things killed them:
- Event history grew past Temporal's ~50k-event / 50MB limit.
- The 24h execution-timeout default terminated the whole chain.
This adds an opt-in continue-as-new pattern to BaseWorkflow so a session can stay open forever, and defaults the execution timeout to infinite so a long-lived chat isn't capped at 24h.

Design
Opt-in by adoption, no flag, no patch gate. An agent gets recycling only by calling run_until_complete from its @workflow.run instead of a bare wait_condition(timeout=None). Agents that keep the old wait are untouched. (No workflow.patched() gate: we have no in-flight long-running workflows to preserve, and per Temporal guidance the right tool for evolving these "pure entity" workflows later is Worker Versioning + upgrade-on-continue-as-new, tracked in AGX1-420.)
BaseWorkflow helpers (src/agentex/lib/core/temporal/workflows/workflow.py):
- run_until_complete(*args, is_complete, timeout=None): keeps the workflow open and recycles history when Temporal suggests it. Optional timeout caps the wait (default None = wait forever). The workflow-level lifetime cap is the execution timeout (WORKFLOW_EXECUTION_TIMEOUT_SECONDS, infinite by default).
- should_continue_as_new(): recycles when workflow.info().is_continue_as_new_suggested() is true (Temporal owns the threshold).
- drain_and_continue_as_new(): waits for all_handlers_finished (so an in-flight turn isn't lost) and re-checks completion before workflow.continue_as_new.
- is_continued_run(): gates the one-time @workflow.run prologue (welcome message, state rehydration) so it doesn't repeat on each recycle.
Execution timeout (environment_variables.py + temporal_task_service.py + temporal_client.py): WORKFLOW_EXECUTION_TIMEOUT_SECONDS now defaults to None = no execution timeout (None/0/negative → execution_timeout=None). It's chain-wide (continue-as-new does NOT reset it), so capping it would still kill a forever-chat.

Scope (deliberately concise)
Just the pattern + the timeout default. Restoring state after a recycle is framework-specific (rebuild from adk.messages, an adk.state snapshot, or a framework's own memory like a LangGraph checkpointer / Pydantic AI history) and is left to follow-up PRs, one per integration. The only example touched is 000_hello_acp, which keeps no cross-turn state: it adopts the pattern and gates its one-time welcome behind is_continued_run() so it isn't re-emitted on recycle.

Follow-ups (Temporal-team feedback)
Verification
- Unit tests (should_continue_as_new, is_continued_run): tests/lib/core/temporal/test_base_workflow_continue_as_new.py.
- py_compile + ruff + pyright clean.
- drain_and_continue_as_new against a Temporal test server.

🤖 Generated with Claude Code
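The None/0/negative rule for WORKFLOW_EXECUTION_TIMEOUT_SECONDS can be sketched as a small pure function. This is an illustration under assumptions, not the code in environment_variables.py: the names to_execution_timeout and timeout_from_env are hypothetical, and treating an unset or empty variable as "no timeout" is assumed here. The result is an optional timedelta, which is the type the Temporal Python client accepts for execution_timeout.

```python
from datetime import timedelta
from typing import Mapping, Optional


def to_execution_timeout(seconds: Optional[int]) -> Optional[timedelta]:
    """Map configured seconds to an execution timeout.

    None, 0, and negative values all mean "no execution timeout".
    """
    if seconds is None or seconds <= 0:
        return None
    return timedelta(seconds=seconds)


def timeout_from_env(env: Mapping[str, str]) -> Optional[timedelta]:
    """Read the setting from an environment-like mapping (hypothetical helper)."""
    raw = env.get("WORKFLOW_EXECUTION_TIMEOUT_SECONDS")
    # Assumption: an unset or empty variable is treated like None.
    seconds = int(raw) if raw not in (None, "") else None
    return to_execution_timeout(seconds)
```

Because the execution timeout is chain-wide, a deployment that sets a positive value here caps the whole continue-as-new chain, not a single run.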
Greptile Summary
This PR adds opt-in Temporal continue-as-new support for long-lived workflows. The main changes are:
- BaseWorkflow helpers for recycle decisions, handler draining, continued-run detection, and long waits.
- 000_hello_acp Temporal example now opts into run_until_complete and skips its one-time welcome on recycled runs.

Confidence Score: 5/5
The changes are narrowly scoped to opt-in workflow recycling helpers and timeout configuration behavior, with no reported correctness or security issues.
The implementation is covered by focused unit tests for the pure helper behavior, and the touched example adopts the new pattern without adding persistent cross-turn state requirements.
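As a rough illustration of the one-time welcome gate, the sketch below simulates a chain of recycled runs in plain Python. It assumes is_continued_run() is true exactly when a run was started by continue-as-new (the Temporal Python SDK exposes that as workflow.info().continued_run_id); how the PR's helper is actually implemented is not shown on this page. RunInfo, run_prologue, and simulate_chain are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunInfo:
    """Hypothetical stand-in for the per-run info a workflow can inspect."""

    run_id: str
    continued_run_id: Optional[str] = None  # id of the previous run in the chain, if any


def is_continued_run(info: RunInfo) -> bool:
    # Assumption: a run is "continued" when it has a predecessor in the chain.
    return info.continued_run_id is not None


def run_prologue(info: RunInfo, outbox: List[str]) -> None:
    # One-time work (here, a welcome message) runs only on the first run.
    if not is_continued_run(info):
        outbox.append("welcome")


def simulate_chain(recycles: int) -> List[str]:
    """Run the prologue once per run across `recycles` continue-as-new hops."""
    outbox: List[str] = []
    previous: Optional[str] = None
    for i in range(recycles + 1):
        info = RunInfo(run_id=f"run-{i}", continued_run_id=previous)
        run_prologue(info, outbox)
        outbox.append(f"run-{i} open")
        previous = info.run_id
    return outbox
```

However many times the chain recycles, the welcome entry appears once, which is the behavior the example needs since it keeps no cross-turn state.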
Reviews (18): Last reviewed commit: "feat(temporal): opt-in continue-as-new f..."