[Postgres] Use primary key for REPLICA IDENTITY FULL tables #397

To replicate data from Postgres to bucket storage, we keep track of the "replica identity" of each row to identify unique rows. This may differ from the synced id, which may in theory contain duplicates. Essentially, we need a set of columns for each table that together form a unique identifier for each row in that table.

Currently, we use the Postgres [replica identity](https://www.postgresql.org/docs/current/logical-replication-publication.html#LOGICAL-REPLICATION-PUBLICATION-REPLICA-IDENTITY) columns pretty much as-is. The replica identity can be one of:
1. Primary key (single column or compound primary key).
2. Unique index (single column or compound index).
3. Replica identity full (uses all columns).
4. Nothing (only supports inserts).
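
As a rough illustration of the four cases above, the mapping from a table's replica identity setting to its identity columns could be sketched as follows. The `TableDescription` and `ReplicaIdentity` types and the `describeReplicaIdentity` function are hypothetical names used only for this example, not the service's actual code; the single-letter values are those stored in `pg_class.relreplident`.

```typescript
// Hypothetical types for illustration only; not the service's actual API.
type ReplicaIdentityMode = 'default' | 'index' | 'full' | 'nothing';

interface TableDescription {
  /** All column names of the table. */
  columns: string[];
  /** Primary key columns; empty if the table has no primary key. */
  primaryKey: string[];
  /** Columns of the index selected with REPLICA IDENTITY USING INDEX, if any. */
  replicaIdentityIndex: string[];
  /** pg_class.relreplident: d = default, i = index, f = full, n = nothing. */
  relreplident: 'd' | 'i' | 'f' | 'n';
}

interface ReplicaIdentity {
  mode: ReplicaIdentityMode;
  columns: string[];
}

function describeReplicaIdentity(table: TableDescription): ReplicaIdentity {
  switch (table.relreplident) {
    case 'd':
      // Default uses the primary key. Without one, no old-row identity is
      // available, so this sketch treats the table like "nothing".
      return table.primaryKey.length > 0
        ? { mode: 'default', columns: table.primaryKey }
        : { mode: 'nothing', columns: [] };
    case 'i':
      return { mode: 'index', columns: table.replicaIdentityIndex };
    case 'f':
      // Full: every column is part of the identity.
      return { mode: 'full', columns: table.columns };
    case 'n':
      return { mode: 'nothing', columns: [] };
  }
}
```

Note that in the `full` case the table may still have a primary key; the sketch ignores it, which mirrors the "pretty much as-is" behavior described above.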

Note that the replica identity of a row may change, in which case PowerSync treats it as a delete + insert.

The current logic is here: https://github.com/powersync-ja/powersync-service/blob/b457d92e491b11e1bed20b2fb801994b15b7f97d/modules/module-postgres/src/replication/replication-utils.ts#L59

The specific case of _replica identity full_ is not optimal at the moment:
1. Generated columns may cause consistency issues (see #379).
2. _Every_ update is treated as a delete+insert, which doubles the number of sync operations.
3. The replica identity is large, which may make the processing less efficient.
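
To make point 2 concrete, here is a minimal sketch of why an all-column identity turns every update into a delete + insert. The `identityKey` and `opsForUpdate` helpers and the `PUT`/`REMOVE` labels are assumptions made for this example; the real processing is more involved.

```typescript
// Hypothetical helpers for illustration only.
type Row = Record<string, unknown>;

interface SyncOp {
  op: 'PUT' | 'REMOVE';
  key: string;
}

/** Serialize the values of the identity columns into a comparable key. */
function identityKey(row: Row, idColumns: string[]): string {
  return JSON.stringify(idColumns.map((column) => row[column] ?? null));
}

/** Operations produced for a single UPDATE, given the identity columns. */
function opsForUpdate(before: Row, after: Row, idColumns: string[]): SyncOp[] {
  const beforeKey = identityKey(before, idColumns);
  const afterKey = identityKey(after, idColumns);
  if (beforeKey === afterKey) {
    // Identity unchanged: a single operation is enough.
    return [{ op: 'PUT', key: afterKey }];
  }
  // Identity changed: treated as a delete of the old row plus an insert.
  return [
    { op: 'REMOVE', key: beforeKey },
    { op: 'PUT', key: afterKey }
  ];
}

const before: Row = { id: 1, name: 'a' };
const after: Row = { id: 1, name: 'b' };

// Identity = all columns (replica identity full): two operations.
const fullOps = opsForUpdate(before, after, ['id', 'name']);
// Identity = primary key only: one operation.
const pkOps = opsForUpdate(before, after, ['id']);
```

With all columns in the identity, any changed value changes the key, so `fullOps` always contains two operations for a real update, while `pkOps` contains one as long as the primary key is unchanged.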

It is common to use _replica identity full_ for other logical replication consumers, because it avoids all issues around TOAST columns.

## Proposal

If a table is configured with _replica identity full_ but has a primary key defined, use the primary key for PowerSync's replica identity instead.

We need to take care not to break replication for existing data when the replica identity changes. There is already logic that handles replica identity changes by re-replicating the table, but that should not be triggered merely by a service upgrade. We could either:
1. Keep the existing replica identity columns if they are already defined.
2. Add a "replication logic version" to control the behavior, only using the new version for new sync rule deploys.
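
The proposal and the two compatibility options could be sketched roughly as below. Everything here is hypothetical: the `TableIdentityInfo` type, the function names, and the version number `2` are illustrative assumptions, and the way option 1 decides that a stored identity is "already defined" (it matches the legacy computation) is one possible interpretation rather than a settled design.

```typescript
// Hypothetical sketch of the proposal; names and version numbers are assumptions.
interface TableIdentityInfo {
  mode: 'default' | 'index' | 'full' | 'nothing';
  /** Primary key columns; empty if the table has no primary key. */
  primaryKey: string[];
  /** Identity columns as computed by the current (legacy) logic. */
  legacyColumns: string[];
}

function sameColumns(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((column, i) => column === b[i]);
}

/** Proposed rule: prefer the primary key for replica identity full tables. */
function proposedColumns(table: TableIdentityInfo): string[] {
  if (table.mode === 'full' && table.primaryKey.length > 0) {
    return table.primaryKey;
  }
  return table.legacyColumns;
}

/** Option 1: keep stored identity columns when they match the legacy result. */
function resolveKeepingExisting(table: TableIdentityInfo, stored?: string[]): string[] {
  if (stored !== undefined && sameColumns(stored, table.legacyColumns)) {
    // Already replicated with the legacy identity: an upgrade changes nothing.
    return stored;
  }
  // New table, or the identity genuinely changed in Postgres.
  return proposedColumns(table);
}

/** Option 2: gate the new rule on a replication logic version. */
function resolveByVersion(table: TableIdentityInfo, replicationLogicVersion: number): string[] {
  // Assumption: version 2 is the first version that uses the new rule.
  return replicationLogicVersion >= 2 ? proposedColumns(table) : table.legacyColumns;
}
```

Under either option, a table that was already replicated with all columns as its identity keeps that identity across a service upgrade, so the existing re-replication logic is only triggered by a real change or a new sync rule deploy.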

