fix: wait for the Runner to persist a step's events before the ADK flow's next step (sequential-tool-execution race)#1234

copybara-service[bot] merged 1 commit into main from test_921989444 on Jun 3, 2026
@copybara-service copybara-service Bot commented May 28, 2026

fix: wait for the Runner to persist a step's events before the ADK flow's next step (sequential-tool-execution race)

`BaseLlmFlow.run()` builds each step's request from the session, but the
`Runner` persists events asynchronously downstream of the flow. A slow
`appendEvent` could let the next step start from a stale session missing the
prior step's events (e.g. a tool's function response), making the model re-call
the tool or hallucinate its result.
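
The race described above can be reproduced in miniature. This is only a sketch under stated assumptions: `StaleSessionSketch`, `buildRequest`, and the string "events" are hypothetical stand-ins, and `java.util.concurrent.CompletableFuture` models a slow asynchronous append; none of it is ADK code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical illustration of a step reading the session before an append lands. */
final class StaleSessionSketch {

  /** Stand-in for building a step's request: a snapshot of the session so far. */
  static List<String> buildRequest(List<String> session) {
    return List.copyOf(session);
  }

  /** Returns two snapshots: one taken before the append finished, one after. */
  static List<List<String>> run() {
    List<String> session =
        new ArrayList<>(List.of("user: weather?", "model: call get_weather"));

    // Models slow storage: the append runs only once storageReady completes.
    CompletableFuture<Void> storageReady = new CompletableFuture<>();
    CompletableFuture<Void> append =
        storageReady.thenRun(() -> session.add("tool: get_weather -> sunny"));

    // The next step does not wait, so its request lacks the tool response.
    List<String> stale = buildRequest(session);

    // Waiting for the append first yields a request that includes it.
    storageReady.complete(null);
    append.join();
    List<String> fresh = buildRequest(session);

    return List.of(stale, fresh);
  }

  public static void main(String[] args) {
    List<List<String>> snapshots = run();
    System.out.println("without waiting: " + snapshots.get(0));
    System.out.println("after waiting:   " + snapshots.get(1));
  }
}
```

The first snapshot is the kind of request that could lead a model to call the tool again, since the tool's response is not yet visible in the session.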

The `Runner` remains the sole `appendEvent` caller, and the flow waits for it.
The `Runner` calls `PersistBarrier.markPersisted(id)` after each successful
append (or `markFailed(id, error)` if the append fails), and the flow calls
`PersistBarrier.awaitPersisted(stepEvents)` between steps. The barrier is a
reactive per-event signal stored in the shared
`InvocationContext.callbackContextData` and never blocks a thread; `Contents` is
unchanged. Because only the `Runner` calls `PersistBarrier.enable()`,
`awaitPersisted` stays a no-op when the flow runs without a `Runner`.
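
A rough sketch of that handshake, for readers who want to see its shape. Assumptions: `StepBarrierSketch` is a hypothetical class, event ids are plain strings, `CompletableFuture` is swapped in for the reactive signal, and the enabled flag is a field rather than an entry in `callbackContextData`. The real `PersistBarrier` signatures are not shown in this description and may differ.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the barrier; not the ADK class. */
final class StepBarrierSketch {
  private final Map<String, CompletableFuture<Void>> signals = new ConcurrentHashMap<>();
  private volatile boolean enabled = false;

  /** Called by the side that persists events (the Runner, in the PR's design). */
  void enable() {
    enabled = true;
  }

  /** Called by the persisting side after an append succeeds. */
  void markPersisted(String eventId) {
    signalFor(eventId).complete(null);
  }

  /**
   * Called by the flow between steps. Returns a future that completes once every
   * listed event is persisted, or an already-completed future if nobody enabled
   * the barrier. No thread is parked while the future is pending.
   */
  CompletableFuture<Void> awaitPersisted(List<String> stepEventIds) {
    if (!enabled) {
      return CompletableFuture.completedFuture(null);
    }
    CompletableFuture<?>[] pending =
        stepEventIds.stream().map(this::signalFor).toArray(CompletableFuture[]::new);
    return CompletableFuture.allOf(pending);
  }

  private CompletableFuture<Void> signalFor(String eventId) {
    return signals.computeIfAbsent(eventId, id -> new CompletableFuture<>());
  }

  public static void main(String[] args) {
    StepBarrierSketch barrier = new StepBarrierSketch();
    barrier.enable();
    CompletableFuture<Void> nextStep =
        barrier.awaitPersisted(List.of("e1", "e2")).thenRun(() -> System.out.println("next step"));
    barrier.markPersisted("e1"); // still waiting on e2
    barrier.markPersisted("e2"); // the continuation runs now
    nextStep.join();
  }
}
```

In this sketch the flow chains its next step onto the returned future instead of blocking, which is one way to read "never blocks a thread".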

Each event id maps to a `CompletableSubject`: pending until its append finishes,
then terminally completed or failed. The subject retains its terminal state, so
`awaitPersisted` and `mark*` may happen in either order, and a late await (e.g.
at a higher flow level across an agent transfer) resolves immediately. If an
append fails, the matching await fails rather than waiting forever. The barrier
is thread-safe and lock-free: `markPersisted`/`markFailed` may run off-thread
when an async `appendEvent` completes, and `ConcurrentHashMap.computeIfAbsent`
hands both sides the same subject.
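
The per-event bookkeeping can be illustrated with a small stand-in. Assumptions: `EventSignalsSketch` is hypothetical, and `CompletableFuture` replaces RxJava's `CompletableSubject` so the example needs only the JDK; both keep their terminal state once completed, which is the property the paragraph above relies on.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-event signal map; CompletableFuture stands in for CompletableSubject. */
final class EventSignalsSketch {
  private final Map<String, CompletableFuture<Void>> byEventId = new ConcurrentHashMap<>();

  /** Whichever side arrives first creates the signal; the other side gets the same one. */
  private CompletableFuture<Void> signalFor(String eventId) {
    return byEventId.computeIfAbsent(eventId, id -> new CompletableFuture<>());
  }

  void markPersisted(String eventId) {
    signalFor(eventId).complete(null);
  }

  void markFailed(String eventId, Throwable error) {
    signalFor(eventId).completeExceptionally(error);
  }

  /** A production version would likely hide the future so waiters cannot complete it. */
  CompletableFuture<Void> awaitPersisted(String eventId) {
    return signalFor(eventId);
  }

  public static void main(String[] args) {
    EventSignalsSketch signals = new EventSignalsSketch();

    // Mark first, await later: the late await sees the retained terminal state.
    signals.markPersisted("event-1");
    System.out.println("late await done: " + signals.awaitPersisted("event-1").isDone());

    // Await first, mark later from another thread.
    CompletableFuture<Void> waiting = signals.awaitPersisted("event-2");
    CompletableFuture.runAsync(() -> signals.markPersisted("event-2")).join();
    waiting.join();

    // A failed append fails the matching await instead of leaving it pending.
    signals.markFailed("event-3", new IllegalStateException("append failed"));
    System.out.println(
        "failed await: " + signals.awaitPersisted("event-3").isCompletedExceptionally());
  }
}
```

Because `computeIfAbsent` is atomic per key, the marking side and the awaiting side cannot end up holding two different signals for the same event id, whichever of them gets there first.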

@copybara-service copybara-service Bot force-pushed the test_921989444 branch 4 times, most recently from eff8f8d to f5536cb Compare June 2, 2026 11:33
@copybara-service copybara-service Bot changed the title from "fix: Fix ADK Runner race condition for sequential tool execution" to "fix: wait for the Runner to persist a step's events before the ADK flow's next step (sequential-tool-execution race)" Jun 2, 2026
@copybara-service copybara-service Bot force-pushed the test_921989444 branch 6 times, most recently from c869602 to 1e004b6 Compare June 3, 2026 08:25
PiperOrigin-RevId: 925858188
@copybara-service copybara-service Bot merged commit 0a40557 into main Jun 3, 2026
@copybara-service copybara-service Bot deleted the test_921989444 branch June 3, 2026 08:31