bug: BidiStream SIGTERM causes CPU hot loop and pod stuck in Terminating #175

@brightsparc

Description

When a pod running the Restate Python SDK receives SIGTERM during an active BidiStream invocation, the worker container enters a CPU-burning hot loop (~82% CPU) and never exits. We observed a pod stuck in Terminating state for 3.5+ hours.

Root Cause

Two interacting bugs in the receive channel and polling loop:

  1. ReceiveChannel.__call__() blocks forever after disconnect: once the http.disconnect event has been consumed from the queue, every later call to __call__() waits on self._queue.get() for an event that will never arrive.

  2. notify_input(b'') creates a tight loop: ASGI servers send empty body frames (b'') during teardown. create_poll_or_cancel_coroutine() passes these to vm.notify_input(b''), which has no useful work to do and immediately returns DoProgressReadFromInput. Because the queue already holds items, await self.receive() returns without suspending, so the loop spins without ever yielding to the event loop.
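
The second bug relies on a general asyncio behavior: awaiting asyncio.Queue.get() on a non-empty queue returns without suspending the task. The sketch below shows only that behavior and is not SDK code; drain() and demo() are hypothetical names made up for illustration, and the real polling loop has a different body.

```python
import asyncio


async def drain(queue, yield_each_iteration):
    """Drain `queue`; report whether a sibling task got to run meanwhile."""
    sibling_ran = False

    async def sibling():
        nonlocal sibling_ran
        sibling_ran = True

    # The sibling is scheduled now but can only run once this task suspends.
    task = asyncio.create_task(sibling())

    while not queue.empty():
        await queue.get()  # queue is non-empty, so this await does not suspend
        if yield_each_iteration:
            await asyncio.sleep(0)  # explicit yield to the event loop

    ran_during_drain = sibling_ran
    await task  # let the sibling finish before returning
    return ran_during_drain


async def demo():
    results = []
    for yield_each in (False, True):
        queue = asyncio.Queue()
        # Simulate a burst of empty body frames, as described for teardown.
        for _ in range(1000):
            queue.put_nowait({"type": "http.request", "body": b"", "more_body": True})
        results.append(await drain(queue, yield_each))
    return results
```

Here asyncio.run(demo()) returns [False, True]: without an explicit yield, the sibling task never runs while the queue is being drained. That matches the description above of a loop with no real await points.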

Fix

  • ReceiveChannel.__call__(): Return synthetic http.disconnect when queue is empty and _disconnected is set
  • create_poll_or_cancel_coroutine(): Skip notify_input() for empty body frames
  • leave(): Add 30s timeout to block_until_http_input_closed() as a safety net
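
A minimal sketch of the three changes, assuming a queue-backed channel with a disconnect flag. ReceiveChannelSketch, should_notify_input() and wait_closed_with_timeout() are hypothetical stand-ins for illustration only, not the SDK's actual classes or signatures.

```python
import asyncio


class ReceiveChannelSketch:
    """Hypothetical stand-in for the SDK's ReceiveChannel."""

    def __init__(self):
        self._queue = asyncio.Queue()
        self._disconnected = asyncio.Event()

    def push(self, event):
        self._queue.put_nowait(event)

    async def __call__(self):
        # Fix 1: after disconnect, an empty queue will never receive more
        # events, so return a synthetic http.disconnect instead of blocking.
        if self._disconnected.is_set() and self._queue.empty():
            return {"type": "http.disconnect"}
        event = await self._queue.get()
        if event.get("type") == "http.disconnect":
            self._disconnected.set()
        return event


def should_notify_input(event):
    # Fix 2: only forward frames that carry bytes; empty body frames
    # (b'') are skipped rather than handed to notify_input().
    return event.get("type") == "http.request" and bool(event.get("body"))


async def wait_closed_with_timeout(wait_closed, timeout=30.0):
    # Fix 3: bound the wait so shutdown cannot hang indefinitely.
    # Returns True if the wait completed, False if it timed out.
    try:
        await asyncio.wait_for(wait_closed(), timeout)
        return True
    except asyncio.TimeoutError:
        return False
```

The timeout is a safety net and not the primary fix: with the first two changes in place, the wait should normally finish well before it expires.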

Affected Version

restate-sdk-python/0.15.0
