Description
When a pod running the Restate Python SDK receives SIGTERM during an active BidiStream invocation, the worker container enters a CPU-burning hot loop (~82% CPU) and never exits. We observed a pod stuck in the Terminating state for more than 3.5 hours.
Root Cause
The hang comes from two interacting bugs in the receive channel and the polling loop:
- `ReceiveChannel.__call__()` blocks forever after disconnect. Once the `http.disconnect` event has been consumed from the queue, every subsequent call to `__call__()` blocks forever on `self._queue.get()`, because no more events will ever arrive.
- `notify_input(b'')` creates a tight loop. ASGI servers send empty body frames (`b''`) during teardown. `create_poll_or_cancel_coroutine()` passes these to `vm.notify_input(b'')`, which has no useful work to do and immediately returns `DoProgressReadFromInput`. Because the queue already holds items, `await self.receive()` returns instantly without suspending, so the loop spins synchronously and never yields to the event loop.
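The first bug can be reproduced in isolation with a minimal sketch. `BuggyReceiveChannel` and `demo()` are hypothetical names: the class is a simplified model of the queue-based design described above, not the SDK's actual `ReceiveChannel`, and `asyncio.wait_for` is used only to detect the hang instead of waiting forever.

```python
import asyncio


class BuggyReceiveChannel:
    """Hypothetical, simplified model of the pre-fix behaviour.

    Not the SDK's real class: it only mirrors the queue-based design in the
    issue (a reader pushes ASGI events, __call__() pops them).
    """

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._disconnected = False

    def put(self, event: dict) -> None:
        # Fed by a (hypothetical) task that wraps the ASGI receive().
        if event["type"] == "http.disconnect":
            self._disconnected = True
        self._queue.put_nowait(event)

    async def __call__(self) -> dict:
        # Bug: after the single http.disconnect has been consumed, nothing
        # else is ever enqueued, so this get() never returns.
        return await self._queue.get()


async def demo() -> list:
    chan = BuggyReceiveChannel()
    chan.put({"type": "http.disconnect"})
    results = [(await chan())["type"]]  # first call sees the disconnect
    try:
        # The second call hangs; the timeout only makes the hang observable.
        await asyncio.wait_for(chan(), timeout=0.1)
        results.append("returned")
    except asyncio.TimeoutError:
        results.append("blocked")
    return results


print(asyncio.run(demo()))  # ['http.disconnect', 'blocked']
```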
Fix
- `ReceiveChannel.__call__()`: return a synthetic `http.disconnect` event when the queue is empty and `_disconnected` is set.
- `create_poll_or_cancel_coroutine()`: skip `notify_input()` for empty body frames.
- `leave()`: add a 30 s timeout to `block_until_http_input_closed()` as a safety net.
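A minimal sketch of the three fixes, under the same assumptions: `FixedReceiveChannel`, `RecordingVM`, `feed_body()` and the stand-in `block_until_http_input_closed()` are hypothetical simplifications, not the SDK's real code. Only the behaviour (synthetic disconnect, skipping `b''`, bounded wait in `leave()`) follows the fix described above.

```python
import asyncio


class FixedReceiveChannel:
    """Hypothetical, simplified model of the patched ReceiveChannel."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._disconnected = False

    def put(self, event: dict) -> None:
        # Fed by a (hypothetical) task that wraps the ASGI receive().
        if event["type"] == "http.disconnect":
            self._disconnected = True
        self._queue.put_nowait(event)

    async def __call__(self) -> dict:
        # Fix 1: once the real disconnect has been consumed, answer with a
        # synthetic one instead of blocking forever on the empty queue.
        if self._queue.empty() and self._disconnected:
            return {"type": "http.disconnect"}
        return await self._queue.get()

    async def block_until_http_input_closed(self) -> None:
        # Hypothetical stand-in: polls until a disconnect has been observed.
        while not self._disconnected:
            await asyncio.sleep(0.01)


class RecordingVM:
    """Hypothetical stand-in for the SDK's VM; it only records notify_input()."""

    def __init__(self) -> None:
        self.inputs: list = []

    def notify_input(self, chunk: bytes) -> None:
        self.inputs.append(chunk)


def feed_body(vm: RecordingVM, event: dict) -> None:
    # Fix 2: ignore the empty body frames (b"") ASGI servers send during
    # teardown, so they cannot drive a no-op read-and-repeat cycle.
    body = event.get("body", b"")
    if body:
        vm.notify_input(body)


async def leave(channel: FixedReceiveChannel, timeout: float = 30.0) -> bool:
    # Fix 3: bound the wait so shutdown cannot hang indefinitely.
    # Returns True if input closed in time, False if the timeout fired.
    try:
        await asyncio.wait_for(channel.block_until_http_input_closed(), timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def demo() -> tuple:
    chan = FixedReceiveChannel()
    chan.put({"type": "http.disconnect"})
    first = (await chan())["type"]
    second = (await chan())["type"]  # no longer blocks

    vm = RecordingVM()
    for frame in (b"abc", b"", b""):
        feed_body(vm, {"type": "http.request", "body": frame})

    closed = await leave(chan, timeout=0.5)
    stuck = await leave(FixedReceiveChannel(), timeout=0.05)
    return first, second, vm.inputs, closed, stuck


print(asyncio.run(demo()))
# ('http.disconnect', 'http.disconnect', [b'abc'], True, False)
```

Returning a synthetic event rather than raising presumably lets existing callers keep their normal `http.disconnect` handling path; the timeout in `leave()` is only a backstop for any other code path that might still wait on input that will never close.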
Affected Version
restate-sdk-python/0.15.0