
Merge upstream cloudflare/agents PR #1559: pre-stream chat recovery #2

Closed
rwdaigle wants to merge 7 commits into main from claude/merge-upstream-pr-1559-ckfEz


@rwdaigle
Owner

Summary

Brings in the changes from upstream cloudflare/agents#1559 (feat/pre-stream-recovery).

The upstream PR adds early chat-turn recovery snapshots so that turns interrupted before any stream metadata or chunks exist can still be recovered. It also introduces a ChatRecoveryOptions.retry field that lets recovery retry the latest unanswered user message instead of continuing a partial assistant response; this is what a pre-stream interruption needs, since no partial assistant message exists to continue. The behavior is applied consistently across @cloudflare/think and @cloudflare/ai-chat.
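To make the retry-versus-continue distinction concrete, here is a minimal sketch of how a recovery pass might choose an action from a recovered snapshot. The `Message`, `RecoverySnapshot`, and `RecoveryOptionsSketch` types and the `chooseRecoveryAction` helper are hypothetical illustrations, not the @cloudflare/think or @cloudflare/ai-chat API; only the `retry` option name comes from this PR. The precedence shown (continue when streamed chunks exist, otherwise retry if enabled) follows the commit note that `retry` applies when no partial assistant message exists, but is still an assumption about the real code path.

```typescript
// Hypothetical shapes for illustration only; not the real ai-chat/think types.
type Role = "user" | "assistant";

interface Message {
  role: Role;
  text: string;
}

interface RecoverySnapshot {
  messages: Message[]; // conversation persisted before the interruption
  streamedChunks: string[]; // assistant chunks captured before the interruption
}

// Only the `retry` name is taken from the PR; these semantics are assumed.
interface RecoveryOptionsSketch {
  retry?: boolean;
}

type RecoveryAction =
  | { kind: "continue"; partial: string }
  | { kind: "retry"; userMessage: Message }
  | { kind: "none" };

function chooseRecoveryAction(
  snapshot: RecoverySnapshot,
  options: RecoveryOptionsSketch,
): RecoveryAction {
  // A partial assistant response exists: resume it rather than starting over.
  if (snapshot.streamedChunks.length > 0) {
    return { kind: "continue", partial: snapshot.streamedChunks.join("") };
  }
  // Pre-stream interruption: nothing was streamed. With retry enabled,
  // re-run the turn against the latest user message if it is still unanswered.
  const last = snapshot.messages[snapshot.messages.length - 1];
  if (options.retry && last !== undefined && last.role === "user") {
    return { kind: "retry", userMessage: last };
  }
  // Without retry, a pre-stream turn has nothing to continue.
  return { kind: "none" };
}
```

Under this sketch, a turn that crashed before its first chunk is re-run only when `retry` is set, while a turn with streamed chunks is always continued.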

PR commits included

  • b2c347a feat: Stash chat metadata for recovery before inference starts
  • bcdee9c CI merge fixes
  • b464554 Update packages/think/src/think.ts

The merged upstream branch also carries the following changes, which landed on cloudflare/agents:main after our fork point and were pulled into the PR branch when it merged from main:

  • cloudflare/agents#1561
  • cloudflare/agents#1563

Diff scope

64 files changed, ~6190 insertions / ~189 deletions. The merge applied cleanly with no conflicts.

Test plan

  • Install deps and run the workspace build
  • Run package test suites (@cloudflare/think, @cloudflare/ai-chat, agents)
  • Spot-check the new packages/agents/src/chat/recovery.ts and its integration into the recovery flow
  • Sanity-check the new examples/chat-sdk-messenger example (came in via the merged main commits)

Generated by Claude Code

cjol and others added 7 commits May 19, 2026 17:06
- Add early stashing of chat fiber snapshots before model inference begins, allowing interrupted pre-stream turns to be reconciled by chat recovery.
- Introduce `retry: true` as a ChatRecoveryOptions field for retrying an interrupted turn against the existing unanswered user message when no partial assistant message exists.
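The ordering that makes pre-stream recovery possible can be sketched as follows: the turn's snapshot is stashed before the model is called, so an interruption before the first chunk still leaves a record for recovery to find. This is a hypothetical sketch: `SnapshotStore`, `runTurn`, `TurnSnapshot`, and the `Model` function type are illustrative stand-ins, durable storage is modeled as an in-memory `Map`, and a thrown error stands in for an eviction.

```typescript
// Hypothetical sketch; a Map stands in for durable storage.
interface TurnSnapshot {
  turnId: string;
  userMessage: string;
  chunks: string[]; // stays empty until the model starts streaming
}

type Model = (prompt: string) => AsyncIterable<string>;

class SnapshotStore {
  private snapshots = new Map<string, TurnSnapshot>();
  stash(s: TurnSnapshot): void {
    this.snapshots.set(s.turnId, s);
  }
  get(turnId: string): TurnSnapshot | undefined {
    return this.snapshots.get(turnId);
  }
  clear(turnId: string): void {
    this.snapshots.delete(turnId);
  }
}

async function runTurn(
  store: SnapshotStore,
  turnId: string,
  userMessage: string,
  model: Model,
): Promise<string> {
  // 1. Stash BEFORE inference, so a failure here is still recoverable.
  store.stash({ turnId, userMessage, chunks: [] });
  // 2. Stream, appending each chunk to the stashed snapshot.
  for await (const chunk of model(userMessage)) {
    store.get(turnId)?.chunks.push(chunk);
  }
  // 3. The turn finished normally, so its recovery record can be dropped.
  const text = store.get(turnId)?.chunks.join("") ?? "";
  store.clear(turnId);
  return text;
}
```

If the model fails before yielding anything, the store still holds a snapshot with the user message and zero chunks, which is exactly the pre-stream case that `retry` is meant to handle.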
…are#1563)

* Add Chat SDK messenger example

Demonstrates Chat SDK ingress on Agents with subagent-backed state and Think-owned conversation replies.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Stream Chat SDK messenger replies

Adds Think chat streaming with RPC-safe cancellation so messenger delivery failures can stop the corresponding sub-agent turn.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Add managed fiber jobs

Introduce managed fiber jobs on top of runFiber so agents can durably accept idempotent background work, inspect retained status, cancel running jobs, explicitly resolve interrupted jobs, and record recovery policy decisions. This adds the cf_agents_fibers ledger, schema v8 migration, status/list/delete/resolve APIs, cooperative cancellation signals, and waitForCompletion support that waits on terminal ledger state instead of only the callback promise.

Tighten crash recovery semantics for managed work by reconciling stale run rows, recovering ledger-only pending/running rows, skipping recovery for already-terminal fibers, settling setup failures, and letting onFiberRecovered return a FiberRecoveryResult to move interrupted fibers to completed, error, aborted, or intentionally interrupted. The implementation also tracks active managed executions and terminal waiters so duplicate requests can join in-memory work when possible while post-restart retries drive the same recovery path.

Use the new managed fiber API in the Chat SDK messenger example for AI replies. Telegram messages now get a stable per-message idempotency boundary, completion waiting preserves Chat SDK per-thread visible reply serialization, and recovery policy is explicit: accepted replies are replayed while mid-stream interruptions post a concise apology and settle the retained job.

Expand coverage across unit, sub-agent, schema, and real eviction tests. The E2E harness now starts wrangler dev with persisted SQLite state, kills it mid-managed-fiber, restarts it, and verifies interrupted retention, recovery-result settlement, duplicate waitForCompletion retries after restart, and sub-agent managed fiber recovery through the parent alarm.

Document the new durable job surface in the Agent and durable execution docs, including waitForCompletion, cancellation behavior, retained terminal records, explicit recovery outcomes, and how this differs from Think message admission.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Polish managed fiber cleanup API

Rename the public managed-fiber terminal timestamp from completedAt to settledAt, and rename the cleanup filter from completedBefore to settledBefore. These names better describe terminal rows across completed, error, aborted, and interrupted states while keeping the existing SQLite completed_at column internal.

Make default deleteFibers() cleanup preserve interrupted rows. Interrupted managed fibers often need inspection or explicit application-level resolution, so callers must now opt in to deleting them by passing status: "interrupted".

Clarify FiberContext.snapshot documentation so it does not imply callbacks are automatically re-entered with recovered snapshots; recovery snapshots are delivered through onFiberRecovered(). Add a regression test that default cleanup deletes completed rows while preserving interrupted rows, then verifies explicit interrupted cleanup still works.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Document managed fiber adoption patterns

Add practical guidance for using managed fibers around webhook-style application jobs, including retained cleanup with settledBefore, interrupted recovery, resolveFiber, and waitForCompletion behavior.

Clarify the boundary between Think submissions and managed fibers across the Think docs, package README, server-driven messaging docs, webhook docs, and examples so users can distinguish durable Think turn admission from app-owned side-effect jobs.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fix PR install after main package bumps

Use the workspace dependency for the Chat SDK messenger example's Think package so npm ci can resolve the merged branch after main's version-package release.

Always run npm ci in the shared GitHub install action while relying on setup-node's npm package cache, avoiding stale node_modules cache hits that can mask lockfile drift.

Co-authored-by: Cursor <cursoragent@cursor.com>

* Fix managed fiber review issues

Correct the malformed Think changeset frontmatter so Changesets can parse the release metadata.

Ensure waitForCompletion waits for a terminal managed fiber status even when duplicate calls race with an already-running recovery pass, and cover the race with a regression test. Also document and test the Chat SDK state adapter's list-level TTL behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
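The managed-fiber semantics described in this commit (idempotent acceptance, settled statuses, restart recovery that skips already-settled rows, a settledAt timestamp, default cleanup that keeps interrupted rows, and waiting on ledger state) can be modeled with a small in-memory ledger. This is a hypothetical sketch: `FiberLedger` and all of its method names and signatures are illustrative, not the agents package's runFiber, deleteFibers, or waitForCompletion API, and the commit describes the real ledger as the cf_agents_fibers SQLite table. The `decide` callback stands in for onFiberRecovered() returning a FiberRecoveryResult, simplified here to a bare status string.

```typescript
// Hypothetical in-memory model of a managed-fiber ledger.
// Not the agents package API: names and signatures are illustrative only.
type SettledStatus = "completed" | "error" | "aborted" | "interrupted";
type FiberStatus = "pending" | "running" | SettledStatus;

const SETTLED: ReadonlySet<FiberStatus> = new Set<FiberStatus>([
  "completed", "error", "aborted", "interrupted",
]);

interface FiberRow {
  id: string; // idempotency key, e.g. one per inbound chat message
  status: FiberStatus;
  settledAt?: number; // set once the row reaches a settled status
}

class FiberLedger {
  private rows = new Map<string, FiberRow>();
  private waiters = new Map<string, Array<(row: FiberRow) => void>>();

  // Idempotent accept: a duplicate id joins the existing row instead of starting new work.
  accept(id: string): { row: FiberRow; created: boolean } {
    const existing = this.rows.get(id);
    if (existing) return { row: existing, created: false };
    const row: FiberRow = { id, status: "pending" };
    this.rows.set(id, row);
    return { row, created: true };
  }

  markRunning(id: string): void {
    const row = this.rows.get(id);
    if (row && !SETTLED.has(row.status)) row.status = "running";
  }

  // Move a row to a settled status and wake every caller waiting on it.
  settle(id: string, status: SettledStatus, now: number): void {
    const row = this.rows.get(id);
    if (!row) return;
    row.status = status;
    row.settledAt = now;
    for (const wake of this.waiters.get(id) ?? []) wake(row);
    this.waiters.delete(id);
  }

  // After a restart, unsettled rows were interrupted mid-flight. Settled rows
  // are skipped; `decide` picks each interrupted row's final outcome.
  recover(now: number, decide: (row: FiberRow) => SettledStatus): string[] {
    const recovered: string[] = [];
    for (const row of this.rows.values()) {
      if (SETTLED.has(row.status)) continue;
      this.settle(row.id, decide(row), now);
      recovered.push(row.id);
    }
    return recovered;
  }

  // Resolve on settled ledger state rather than on the work's own promise,
  // so every duplicate waiter observes the same outcome.
  waitForCompletion(id: string): Promise<FiberRow> {
    const row = this.rows.get(id);
    if (!row) return Promise.reject(new Error(`unknown fiber ${id}`));
    if (SETTLED.has(row.status)) return Promise.resolve(row);
    return new Promise<FiberRow>((resolve) => {
      const list = this.waiters.get(id) ?? [];
      list.push(resolve);
      this.waiters.set(id, list);
    });
  }

  // Default cleanup keeps interrupted rows; callers must opt in to deleting them.
  deleteFibers(opts: { settledBefore: number; status?: SettledStatus }): number {
    let deleted = 0;
    for (const [id, row] of this.rows) {
      if (row.settledAt === undefined || row.settledAt >= opts.settledBefore) continue;
      const wanted = opts.status !== undefined
        ? row.status === opts.status
        : row.status !== "interrupted";
      if (wanted) {
        this.rows.delete(id);
        deleted++;
      }
    }
    return deleted;
  }
}
```

Calling `recover()` on the same instance only simulates a restart; in a real eviction the in-memory waiters would be lost, which is why the commit describes waiting on terminal ledger state instead of only the callback promise.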
…ata for recovery before inference starts

Merges the changes from cloudflare#1559 (feat/pre-stream-recovery)
which adds early chat-turn recovery snapshots so interrupted turns can be recovered before
any stream metadata or chunks exist. Introduces ChatRecoveryOptions.retry to retry the latest
unanswered user message instead of continuing a partial assistant response.

PR commits included:
- b2c347a feat: Stash chat metadata for recovery before inference starts
- bcdee9c CI merge fixes
- b464554 Update packages/think/src/think.ts

Also includes upstream main commits merged into the PR branch (cloudflare#1561, cloudflare#1563).
rwdaigle closed this May 20, 2026


5 participants