
feat(runner): HITL via long-running interrupts with history rehydra…#960

Merged: wolo-lab merged 4 commits into v2 from wolo/cl-workflow on Jun 8, 2026

Conversation

@wolo-lab wolo-lab commented Jun 3, 2026

Workflow-engine support for human-in-the-loop (HITL), built on a single
mechanism — history rehydration — matching adk-python. Paused run state is
reconstructed from session events each turn; there is no persisted run-state
blob and no PendingRequest field.

What changed

  • scheduler
    • Per-event back-pressure handshake: a non-partial function-response is
      persisted before the node's flow rebuilds the next model request, fixing a
      non-deterministic race where the model re-issued the same tool call.
      Mirrors adk-python's enqueue_event/processed_signal.
    • Pause a node on accumulated Event.LongRunningToolIDs (a RequestInput
      pause rides on them) — no synthetic pause event.
    • Stamp NodeInfo.Path = node name on static node events so rehydration can
      attribute interrupts back to their node (dynamic children fold into their
      static ancestor).
  • persistence: ReconstructRunState ports adk-python's
    _reconstruct_node_states + _infer_node_state: per-node scan (interrupts,
    resolved user responses, schemas, output), status inference
    (WAITING / PENDING+ResumedInputs re-entry / COMPLETED+Output
    handoff), backward-edge predecessor input, and schema validation on the
    surviving (last-wins) response. Removes LoadRunState / NewRunStateEvent /
    RunStateSessionKey.
  • resume: a single path over the rehydrated state, gated on the current
    turn's responses for idempotency; already-run handoff successors are skipped
    (RunState.completed). Removes the separate PendingRequest loop.
  • state: NodeState.Interrupts + unexported interruptSchemas;
    RunState.completed; HasWaiting. No PendingRequest, no persisted blob.
  • workflowagent: detectResume uses ReconstructRunState and surfaces
    reconstruction (schema-validation) errors instead of silently falling
    through to a fresh Run.

A node may now raise multiple interrupts per activation.
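
For readers new to the scheduler's back-pressure handshake, here is a minimal sketch of the idea, not the code in this PR. The producer blocks after each event until the consumer has persisted it, so the next model request cannot be built from stale history. The names `queuedEvent`, `store`, `emit`, `consume` and `runHandshake` are hypothetical, and the event payload is reduced to a string.

```go
package main

import "fmt"

// queuedEvent pairs an event with a channel that the consumer closes once
// the event has been persisted. (Hypothetical type, for illustration only.)
type queuedEvent struct {
	payload   string
	processed chan struct{}
}

// store is a stand-in for the session's event log.
type store struct{ events []string }

func (s *store) append(p string) { s.events = append(s.events, p) }

// emit enqueues an event and blocks until the consumer acknowledges it.
func emit(q chan<- queuedEvent, payload string) {
	ev := queuedEvent{payload: payload, processed: make(chan struct{})}
	q <- ev
	<-ev.processed // back-pressure: wait for persistence
}

// consume persists each event first, then signals the producer.
func consume(q <-chan queuedEvent, s *store, done chan<- struct{}) {
	for ev := range q {
		s.append(ev.payload)
		close(ev.processed)
	}
	close(done)
}

// runHandshake returns how many events were persisted at the moment the
// producer resumed after each emit.
func runHandshake(payloads []string) []int {
	q := make(chan queuedEvent)
	s := &store{}
	done := make(chan struct{})
	go consume(q, s, done)

	seen := make([]int, 0, len(payloads))
	for _, p := range payloads {
		emit(q, p)
		// Safe to read: the append happened before close(processed).
		seen = append(seen, len(s.events))
	}
	close(q)
	<-done
	return seen
}

func main() {
	fmt.Println(runHandshake([]string{"function_call", "function_response"})) // [1 2]
}
```

Because the producer only continues after `processed` is closed, each event is already visible in the store when the next request would be rebuilt.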

@wolo-lab wolo-lab requested a review from hanorik June 3, 2026 17:03
@wolo-lab wolo-lab marked this pull request as ready for review June 3, 2026 17:04
@wolo-lab wolo-lab self-assigned this Jun 3, 2026
@wolo-lab wolo-lab force-pushed the wolo/cl-workflow branch from 418382f to 293628e Compare June 4, 2026 20:18
…tion

Workflow-engine support for human-in-the-loop, unified on a single
mechanism — history rehydration — matching adk-python (no persisted
run-state event, no PendingRequest field).

- scheduler: per-event back-pressure handshake (a non-partial
  function-response is persisted before the node's flow rebuilds the
  next model request, fixing a non-deterministic re-issue race); pause
  a node on accumulated Event.LongRunningToolIDs (RequestInput rides on
  them); stamp NodeInfo.Path = node name on static node events so
  rehydration can attribute interrupts (dynamic children fold into
  their static ancestor).
- persistence: ReconstructRunState ports adk-python's
  _reconstruct_node_states + _infer_node_state — per-node scan
  (interrupts, resolved user responses, schemas, output), status
  inference (WAITING / PENDING+ResumedInputs re-entry /
  COMPLETED+Output handoff), backward-edge predecessor input, and
  schema validation on the surviving (last-wins) response.
- resume: single path over the rehydrated state, gated on the current
  turn's responses for idempotency; already-run handoff successors are
  skipped (RunState.completed).
- state: NodeState.Interrupts + unexported interruptSchemas;
  RunState.completed; HasWaiting. No PendingRequest, no persisted
  run-state blob.
- workflowagent: detectResume uses ReconstructRunState and surfaces
  reconstruction (schema-validation) errors.

A node may raise multiple interrupts per activation. workflow and
workflowagent suites pass with -race.
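
The status inference listed under persistence can be sketched roughly as follows. This is one simplified reading of the WAITING / PENDING / COMPLETED rules, not the logic of ReconstructRunState or adk-python's _infer_node_state. The flattened `event` type, its field names and `inferNodeState` are assumptions made for the example.

```go
package main

import "fmt"

type status string

const (
	statusWaiting   status = "WAITING"
	statusPending   status = "PENDING"
	statusCompleted status = "COMPLETED"
)

// event is a hypothetical, flattened stand-in for a session event.
type event struct {
	nodePath    string // node that emitted the event, if any
	interruptID string // set when the node raised an interrupt
	responseTo  string // set when a user response answers an interrupt
	response    string
	output      string // set when the node produced output
}

type nodeState struct {
	status        status
	resumedInputs map[string]string // interrupt ID -> surviving response
	output        string
}

// inferNodeState scans history for one node and infers its status.
// The bool result is false when the node never appears in history.
func inferNodeState(nodePath string, history []event) (nodeState, bool) {
	raised := map[string]bool{}
	resolved := map[string]string{}
	output, seen := "", false
	for _, ev := range history {
		if ev.nodePath == nodePath && ev.interruptID != "" {
			raised[ev.interruptID] = true
			seen = true
		}
		if ev.responseTo != "" && raised[ev.responseTo] {
			resolved[ev.responseTo] = ev.response // last response wins
		}
		if ev.nodePath == nodePath && ev.output != "" {
			output = ev.output
			seen = true
		}
	}
	switch {
	case !seen:
		return nodeState{}, false
	case output != "":
		return nodeState{status: statusCompleted, output: output}, true
	case len(resolved) < len(raised):
		return nodeState{status: statusWaiting}, true
	default:
		return nodeState{status: statusPending, resumedInputs: resolved}, true
	}
}

func main() {
	history := []event{
		{nodePath: "approve", interruptID: "i1"},
		{nodePath: "approve", interruptID: "i2"},
		{responseTo: "i1", response: "yes"},
	}
	st, _ := inferNodeState("approve", history)
	fmt.Println(st.status) // WAITING

	history = append(history,
		event{responseTo: "i2", response: "no"},
		event{responseTo: "i2", response: "later"})
	st, _ = inferNodeState("approve", history)
	fmt.Println(st.status, st.resumedInputs["i2"]) // PENDING later
}
```

The sketch leaves out schema validation and backward-edge predecessor input, which the real reconstruction also handles.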
…, Routes)

AppendEvent (in-memory) and the database storage layer dropped Event's
workflow fields when persisting: the in-memory copy omitted NodeInfo,
RequestedInput and Routes, and the database layer never serialized
NodeInfo or RequestedInput. History-based resume attributes interrupts
by NodeInfo.Path, so losing it broke HITL resume — a RequestInput
workflow (e.g. examples/workflow/hitl_simple) would re-prompt instead of
continuing after the reply.

Persist all three fields in both backends and add round-trip regression
tests for each.
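
A round-trip regression check of this kind could look like the sketch below. It assumes a JSON-serializing storage layer and hypothetical field types: the real Event, NodeInfo, RequestedInput and Routes definitions are not shown in this PR text, so the struct shapes here are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// NodeInfo and Event are simplified placeholders; the field types are
// assumptions, not the project's real definitions.
type NodeInfo struct {
	Path string `json:"path"`
}

type Event struct {
	ID             string            `json:"id"`
	NodeInfo       *NodeInfo         `json:"node_info,omitempty"`
	RequestedInput map[string]string `json:"requested_input,omitempty"`
	Routes         []string          `json:"routes,omitempty"`
}

// roundTrip serializes and deserializes an event, as a storage layer would.
func roundTrip(ev Event) (Event, error) {
	raw, err := json.Marshal(ev)
	if err != nil {
		return Event{}, err
	}
	var out Event
	if err := json.Unmarshal(raw, &out); err != nil {
		return Event{}, err
	}
	return out, nil
}

// sameWorkflowFields reports whether the three workflow fields survived.
func sameWorkflowFields(a, b Event) bool {
	return reflect.DeepEqual(a.NodeInfo, b.NodeInfo) &&
		reflect.DeepEqual(a.RequestedInput, b.RequestedInput) &&
		reflect.DeepEqual(a.Routes, b.Routes)
}

func main() {
	in := Event{
		ID:             "e1",
		NodeInfo:       &NodeInfo{Path: "ask_user"},
		RequestedInput: map[string]string{"prompt": "Approve?"},
		Routes:         []string{"approved"},
	}
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(sameWorkflowFields(in, out)) // true
}
```

A storage layer that omits any of the three fields makes `sameWorkflowFields` return false, which is the regression the round-trip tests guard against.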
wolo-lab added a commit that referenced this pull request Jun 5, 2026
…ution

Add NodeInfo.OutputFor: the node paths an event's Output counts for —
the emitter plus any WithUseAsOutput delegating ancestors. A delegating
child's single event is stamped OutputFor=[child, parent, ...] and flows
up, and the parent no longer re-emits a duplicate terminal output event
(full suppression, matching adk-python's _output_delegated + output_for).
Resume attributes a descendant's output to its delegating ancestors via
OutputFor. Every output event records OutputFor (own path minimum),
mirroring adk-python _enrich_event.

Built on the temp integration branch (#960 + #920 + #966); rebase onto v2
once those merge.
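
To make the OutputFor stamping concrete, here is a small sketch under stated assumptions: a hypothetical `node` type with a parent pointer, and a `useAsOutput` flag standing in for a child that was run with WithUseAsOutput. The path strings are illustrative, not the project's path format.

```go
package main

import "fmt"

// node is a hypothetical view of a running node. useAsOutput marks a child
// whose output stands in for its parent's output.
type node struct {
	path        string
	parent      *node
	useAsOutput bool
}

// outputFor lists the emitter's own path followed by every ancestor that
// delegates its output down the chain to the emitter.
func outputFor(emitter *node) []string {
	paths := []string{emitter.path}
	for cur := emitter; cur.useAsOutput && cur.parent != nil; cur = cur.parent {
		paths = append(paths, cur.parent.path)
	}
	return paths
}

func main() {
	root := &node{path: "wf"}
	parent := &node{path: "wf/parent", parent: root}
	child := &node{path: "wf/parent/child", parent: parent, useAsOutput: true}
	fmt.Println(outputFor(child)) // [wf/parent/child wf/parent]
}
```

The walk stops at the first ancestor that does not delegate, so a non-delegating node always gets just its own path.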
Two resume-correctness fixes for dynamic orchestrators and HITL.

1. Cross-resume dedup. A dynamic node body re-runs from the top on
   resume, so every RunNode before the pause point would re-execute its
   child. rehydrateCache rebuilds the sub-scheduler's resultByPath from
   session events (child terminal events carry NodeInfo.Path + Output),
   so completed children with a stable WithRunID are served from cache.
   Mirrors adk-python's _rehydrate_from_events / DynamicNodeScheduler.

2. Terminal handoff asker now resumes. Resume only bumped its scheduled
   counter per scheduled successor, so a single-asker workflow (no
   successors) wrongly returned ErrNothingToResume. A matched handoff
   asker now counts as an effective resume itself, gated on
   answeredThisTurn (from a per-interrupt resolvedCount during
   rehydration) so a duplicate resume stays an idempotent no-op.
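
The cross-resume dedup in item 1 can be sketched as follows, as an illustration rather than the actual rehydrateCache. The flattened `event` type, the `terminal` flag, the `subScheduler` shape and the path strings with a run ID suffix are all assumptions.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a child's session event.
type event struct {
	path     string // node path, including a stable run ID
	output   string
	terminal bool
}

// rehydrateCache rebuilds a path -> output map from terminal child events.
func rehydrateCache(history []event) map[string]string {
	cache := map[string]string{}
	for _, ev := range history {
		if ev.terminal && ev.path != "" {
			cache[ev.path] = ev.output
		}
	}
	return cache
}

// subScheduler serves completed children from the cache and runs the rest.
type subScheduler struct {
	resultByPath map[string]string
	executed     []string // paths that actually ran on this turn
}

func (s *subScheduler) runNode(path string, body func() string) string {
	if out, ok := s.resultByPath[path]; ok {
		return out // completed before the pause: do not re-execute
	}
	out := body()
	s.resultByPath[path] = out
	s.executed = append(s.executed, path)
	return out
}

func main() {
	history := []event{{path: "orchestrator/first@run-1", output: "draft", terminal: true}}
	s := &subScheduler{resultByPath: rehydrateCache(history)}

	// The dynamic body re-runs from the top on resume.
	a := s.runNode("orchestrator/first@run-1", func() string { return "recomputed" })
	b := s.runNode("orchestrator/second@run-2", func() string { return "approved" })
	fmt.Println(a, b, s.executed) // draft approved [orchestrator/second@run-2]
}
```

The cache only helps when the path is stable across turns, which is why the commit message ties it to a stable WithRunID.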
@wolo-lab wolo-lab force-pushed the wolo/cl-workflow branch from 17aa0ce to 6861671 Compare June 8, 2026 08:13
wolo-lab added a commit that referenced this pull request Jun 8, 2026
End-to-end HITL coverage through a real Runner, kept separate from the
feature PR (#960) to keep it focused. Covers handoff round-trip,
re-entry resume, FunctionResponse routing by ID, and the dynamic
orchestrator dedup+HITL scenario (b/515644762): two children run
sequentially via RunNode; on resume the first is served from cache and
the second delivers the human response.
@wolo-lab wolo-lab merged commit 9a413a2 into v2 Jun 8, 2026
3 checks passed
@wolo-lab wolo-lab deleted the wolo/cl-workflow branch June 8, 2026 10:27