Summary
When df.wait_for_signal is wrapped inside a df.race branch, an external signal sent via df.signal(instance_id, name, data) is recorded on the parent instance but is never propagated to the child sub-orchestration that hosts the waiter. The waiter never observes the signal, so the signal branch loses the race to any sibling that completes (timeout, sleep, sql). This breaks the canonical "approve / reject within N seconds" pattern.
Originally filed against v0.1.1 during bug bash; confirmed still present on HEAD of main (v0.2.1 development).
Repro
-- Race: signal branch (waits up to 20s) vs sleep branch (15s)
SELECT df.start_workflow(
  df.race(
    df.seq(df.wait_for_signal('approve', 20), df.sql('INSERT INTO audit VALUES (...);')),
    df.seq(df.sleep(15), df.sql('INSERT INTO audit VALUES (...);'))
  ),
  'some-instance-id'
);
-- Send signal 4s later
SELECT df.signal('some-instance-id', 'approve', '{}');
Expected: signal branch wins at T+4s, audit row written.
Actual: sleep branch wins at T+15s; the signal branch is cancelled with "parent dropped sub-orchestration future".
Control (works correctly)
df.seq(df.wait_for_signal('go', 60), df.sql(...))
The wait_for_signal mechanism itself is fine; only the race / sub-orchestration routing path is broken.
Evidence (duroxide.history of failing instance)
| event | detail |
|---|---|
| 6 | SubOrchestrationScheduled: signal branch as sub::6 |
| 7 | SubOrchestrationScheduled: sleep branch as sub::7 |
| 8 (T+4s) | ExternalEvent name='approve' data='{}' on the parent |
| 9 (T+15s) | SubOrchestrationCompleted source=7 (sleep wins) |
| 11 | SubOrchestrationCancelRequested source=6, reason=dropped_future |
| 12 | SubOrchestrationFailed kind=Cancelled |
The parent's ExternalEvent is never routed to sub::6 where wait_for_signal('approve', 20) is blocked.
Root cause
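Signal routing does not traverse the parent → child sub-orchestration boundary:
- src/client.rs: raise_external_event() only delivers to the target instance; there is no child fan-out.
- src/orchestrations/execute_function_graph.rs: df.race schedules each branch as its own sub-orchestration with a distinct duroxide instance ID via ctx.schedule_sub_orchestration(...).
- src/orchestrations/execute_function_graph.rs: wait_for_signal calls ctx.schedule_wait(signal_name) inside the sub-orch context, so it only sees signals raised against that sub-orch's instance ID.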
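The routing gap can be illustrated with a toy model. This is only a sketch: the Runtime type, its methods, and the use of "sub::6" as the child's instance ID are hypothetical stand-ins chosen for illustration, not duroxide's actual types or ID format. It assumes events are stored per instance ID and that a waiter only looks at the inbox of the instance it runs in.

```rust
use std::collections::HashMap;

/// Toy model of per-instance event inboxes. All names are hypothetical
/// and do not mirror duroxide's real types.
#[derive(Default)]
struct Runtime {
    /// External events recorded per instance ID.
    inbox: HashMap<String, Vec<String>>,
}

impl Runtime {
    /// Models a raise that delivers only to the addressed instance.
    fn raise(&mut self, instance_id: &str, signal: &str) {
        self.inbox
            .entry(instance_id.to_string())
            .or_default()
            .push(signal.to_string());
    }

    /// Models a waiter: it only inspects the inbox of the instance it runs in.
    fn waiter_sees(&self, instance_id: &str, signal: &str) -> bool {
        self.inbox
            .get(instance_id)
            .map_or(false, |events| events.iter().any(|e| e == signal))
    }
}

/// Replays the repro: the signal targets the parent, the waiter lives in the
/// child. Returns (parent_has_event, child_waiter_sees_event).
fn replay_repro() -> (bool, bool) {
    let mut rt = Runtime::default();
    rt.raise("some-instance-id", "approve");
    (
        rt.waiter_sees("some-instance-id", "approve"),
        // Assumed child ID; the real ID format is not shown in this report.
        rt.waiter_sees("sub::6", "approve"),
    )
}
```

Under these assumptions, replay_repro() reports that the parent holds the event while the child's waiter does not see it, which matches the history above.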
Impact
Breaks the canonical "approve / reject within N seconds" workflow. Any composition that places wait_for_signal inside race (or any other operator that lowers branches into sub-orchestrations) is silently broken. Workarounds (hoisting wait_for_signal to the top level and emulating race via timers) defeat the purpose of df.race.
Suggested fix
When df.signal writes an ExternalEvent for instance I, fan the event out to all live descendant sub-orchestration sessions of I whose pending WaitForSignal node has a matching signal_name. Candidate locations:
- raise_external_event in src/client.rs: query duroxide for live children and matching waiters, then raise to each.
- Alternatively, push the fan-out into the duroxide-pg-opt provider, where parent/child relationships are already tracked.
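One possible shape for the fan-out selection is sketched below. It is not the proposed patch: the Instance struct, fan_out_targets, and repro_tree are hypothetical names, and the in-memory map stands in for whatever query would return live children and their pending waiters. It also assumes the parent/child links form a tree (no cycles).

```rust
use std::collections::HashMap;

/// Hypothetical view of one orchestration instance; not duroxide's real schema.
struct Instance {
    parent: Option<String>,
    live: bool,
    /// Signal name this instance is currently blocked on, if any.
    waiting_for: Option<String>,
}

/// Returns the IDs that should receive `signal` when it is raised on `target`:
/// the target itself plus every live descendant with a matching pending waiter.
/// The result is sorted so that callers get a deterministic order.
fn fan_out_targets(
    instances: &HashMap<String, Instance>,
    target: &str,
    signal: &str,
) -> Vec<String> {
    let mut out = vec![target.to_string()];
    let mut stack = vec![target.to_string()];
    while let Some(current) = stack.pop() {
        for (id, inst) in instances {
            // Only follow live, direct children of the instance being visited;
            // a dead child's subtree is skipped entirely.
            if inst.parent.as_deref() != Some(current.as_str()) || !inst.live {
                continue;
            }
            if inst.waiting_for.as_deref() == Some(signal) {
                out.push(id.clone());
            }
            // Keep walking: a waiter may sit several levels below the target.
            stack.push(id.clone());
        }
    }
    out.sort();
    out
}

/// Builds the instance tree from the repro: a parent with a signal branch
/// (assumed ID "sub::6") and a sleep branch (assumed ID "sub::7").
fn repro_tree() -> HashMap<String, Instance> {
    let parent = "some-instance-id".to_string();
    let mut m = HashMap::new();
    m.insert(
        parent.clone(),
        Instance { parent: None, live: true, waiting_for: None },
    );
    m.insert(
        "sub::6".to_string(),
        Instance {
            parent: Some(parent.clone()),
            live: true,
            waiting_for: Some("approve".to_string()),
        },
    );
    m.insert(
        "sub::7".to_string(),
        Instance { parent: Some(parent), live: true, waiting_for: None },
    );
    m
}
```

In this model, raising 'approve' on the parent selects the parent and the signal branch, while the sleep branch is left alone because it has no matching waiter. Filtering on the pending signal name, rather than delivering to every descendant, follows the matching-signal_name condition stated above.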
Test coverage gap
Signal tests and race tests exist independently, but no E2E test combines wait_for_signal inside a race branch:
- Signal-only: tests/e2e/sql/07_signals.sql, tests/e2e/sql/11_cross_connection.sql
- Race-only: tests/e2e/sql/01_core_primitives.sql, tests/e2e/sql/19_vars_in_join_race.sql, tests/e2e/sql/22_break_in_join_race.sql
A new E2E exercising the repro above should land alongside the fix.