Skip to content

Support external replication slots in pw.io.postgres.read; add persistence support #222

@zxqfd555

Description

@zxqfd555

Is your feature request related to a problem? Please describe.

pw.io.postgres.read in "streaming" mode currently always creates a temporary replication slot internally. The slot is automatically dropped when the connection is closed. As stated in the documentation, this is an intentional guardrail against WAL accumulation. However, this design makes it impossible to use Pathway's persistence mechanism with the Postgres connector: since the slot is gone after restart, the connector cannot resume from where it left off and must perform a full re-snapshot on every startup.

Describe the solution you'd like

Add support for an externally managed (pre-created) replication slot in pw.io.postgres.read. When the user provides a slot name via a new parameter (e.g. replication_slot_name), the connector should:

  1. Use that slot as-is instead of creating a temporary one.
  2. Integrate with Pathway's persistence layer by storing the LSN of the last acknowledged WAL record as the connector's persistent offset. On restart, the connector should resume reading from that LSN rather than re-snapshotting.
  3. Emit a warning at startup that the lifecycle of the replication slot (creation, monitoring, and deletion) is the user's responsibility, and that an inactive or abandoned slot can cause unbounded WAL retention on the PostgreSQL server.

The existing behavior (temporary slot, no persistence) should remain the default when no slot name is provided.

Describe alternatives you've considered

  • Using pw.io.debezium.read with Kafka as a persistence-capable alternative — works, but introduces significant operational overhead (Kafka + Debezium) for use cases where a direct Postgres connection would suffice.
  • Absorbing the re-snapshot cost: acceptable for small tables but not for large ones where a full re-read on every restart is prohibitive.

Additional context

The documentation for pw.io.postgres.read currently contains the following note:

"There is no need to create a replication slot manually, and doing so is strongly discouraged."

This feature request proposes making the externally-managed slot an opt-in mode with explicit user acknowledgment (via the warning), rather than removing the guardrail from the default path. The LSN-based offset aligns naturally with how PostgreSQL logical replication tracks progress via confirmed_flush_lsn in pg_replication_slots.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions