Description
When replicating data from Postgres to bucket storage, we track the "replica identity" of each row to identify unique rows. This may differ from the synced id (which may, in theory, contain duplicates). Essentially, we need a set of columns for each table that together form a unique identifier for each row in that table.
Currently, we use the Postgres replica identity columns pretty much as-is, which can be one of:
- Primary key (single column or compound primary key).
- Unique index (single column or compound index).
- Replica identity full (uses all columns).
- Nothing (only supports inserts).
Note that the replica identity of a row may change, in which case PowerSync treats it as a delete + insert.
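To make the delete + insert behavior concrete, here is a minimal sketch of how an update could be turned into sync operations based on the replica identity columns. The names `Row`, `SyncOp`, `replicaKey` and `planUpdate` are hypothetical and used for illustration only; this is not the actual PowerSync implementation.

```typescript
type Row = Record<string, unknown>;

type SyncOp =
  | { op: "insert"; key: string; row: Row }
  | { op: "update"; key: string; row: Row }
  | { op: "delete"; key: string };

// Serialize the values of the identity columns into a comparable key.
// (Hypothetical encoding; a real implementation would likely use a binary format.)
function replicaKey(row: Row, identityColumns: string[]): string {
  return JSON.stringify(identityColumns.map((c) => row[c] ?? null));
}

// Plan the sync operations for a single row update.
function planUpdate(before: Row, after: Row, identityColumns: string[]): SyncOp[] {
  const oldKey = replicaKey(before, identityColumns);
  const newKey = replicaKey(after, identityColumns);
  if (oldKey === newKey) {
    // Same replica identity: a plain update.
    return [{ op: "update", key: newKey, row: after }];
  }
  // Replica identity changed: treat as delete of the old row + insert of the new one.
  return [
    { op: "delete", key: oldKey },
    { op: "insert", key: newKey, row: after },
  ];
}

const before: Row = { id: 1, name: "a" };
const after: Row = { id: 1, name: "b" };

// Identity = primary key: the key is unchanged, so one operation.
const withPrimaryKey = planUpdate(before, after, ["id"]);
// Identity = all columns: any changed column changes the key, so two operations.
const withAllColumns = planUpdate(before, after, ["id", "name"]);
```

In this sketch, the same change to `name` yields a single update when only `id` is the identity, but a delete followed by an insert when every column is part of the identity.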
The current logic is here:
export async function getReplicationIdentityColumns(
The specific case of replica identity full is not optimal at the moment:
- Generated columns may cause consistency issues (see [Postgres] Generated columns #379).
- Every update is treated as a delete+insert, which doubles the number of sync operations.
- The replica identity is large, which may make the processing less efficient.
It is common to use replica identity full for other logical replication consumers, because it avoids all issues around TOAST columns.
Proposal
If a table is configured with replica identity full but has a primary key defined, use the primary key for PowerSync's replica identity instead.
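The proposed rule could be sketched roughly as follows. The `TableInfo` shape and both function names are assumptions made for this example and do not reflect the existing `getReplicationIdentityColumns` signature; the single-letter codes are the values Postgres stores in `pg_class.relreplident`.

```typescript
// Values of pg_class.relreplident:
// d = default (primary key), i = index, f = full, n = nothing.
type ReplIdent = "d" | "i" | "f" | "n";

// Hypothetical table metadata, assumed to be loaded from the Postgres catalog.
interface TableInfo {
  replIdent: ReplIdent;
  primaryKeyColumns: string[]; // empty if the table has no primary key
  identityIndexColumns: string[]; // columns of the index used as replica identity, if any
  allColumns: string[];
}

// Current behavior: use the Postgres replica identity columns as-is.
function currentIdentityColumns(t: TableInfo): string[] {
  switch (t.replIdent) {
    case "d":
      return t.primaryKeyColumns;
    case "i":
      return t.identityIndexColumns;
    case "f":
      return t.allColumns;
    case "n":
      return []; // no identity: only inserts are supported
  }
}

// Proposed behavior: with replica identity full, prefer the primary key when one exists.
function proposedIdentityColumns(t: TableInfo): string[] {
  if (t.replIdent === "f" && t.primaryKeyColumns.length > 0) {
    return t.primaryKeyColumns;
  }
  return currentIdentityColumns(t);
}

const todos: TableInfo = {
  replIdent: "f",
  primaryKeyColumns: ["id"],
  identityIndexColumns: [],
  allColumns: ["id", "description", "completed"],
};

const currentColumns = currentIdentityColumns(todos); // all three columns
const proposedColumns = proposedIdentityColumns(todos); // only "id"
```

Tables with replica identity full but no primary key fall through to the existing behavior, so the change only affects the case described above.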
We need to take care not to break replication due to changes to the replica identity for existing data. There is already logic to handle replica identity changes by re-replicating the table, but that should not be triggered just by a service upgrade. We could either:
- Keep the existing replica identity columns if they have already been computed.
- Add a "replication logic version" to control the behavior, only using the new version for new sync rule deploys.
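The two options could look roughly like this. Everything here is hypothetical: `StoredTableState`, the version constant and both resolver functions are invented for illustration, and how the persisted state or version would actually be stored is left open.

```typescript
// Hypothetical persisted state for a table from an earlier replication run.
interface StoredTableState {
  identityColumns?: string[];
}

// Hypothetical version number at which "full prefers primary key" takes effect.
const FULL_PREFERS_PRIMARY_KEY_VERSION = 2;

// Option 1: keep previously computed identity columns; only compute for new tables.
function resolveKeepingExisting(
  stored: StoredTableState | undefined,
  computeNew: () => string[]
): string[] {
  return stored?.identityColumns ?? computeNew();
}

// Option 2: select the logic by a version fixed at sync rule deploy time.
function resolveByVersion(
  logicVersion: number,
  legacy: () => string[],
  updated: () => string[]
): string[] {
  return logicVersion >= FULL_PREFERS_PRIMARY_KEY_VERSION ? updated() : legacy();
}

const legacyColumns = () => ["id", "description"]; // all columns (replica identity full)
const updatedColumns = () => ["id"]; // primary key only

// Existing table: the stored columns win, so an upgrade does not change the identity.
const existingTable = resolveKeepingExisting(
  { identityColumns: ["id", "description"] },
  updatedColumns
);
// New table: nothing stored yet, so the new logic applies.
const newTable = resolveKeepingExisting(undefined, updatedColumns);

// Existing deploy stays on version 1; a new sync rule deploy gets version 2.
const existingDeploy = resolveByVersion(1, legacyColumns, updatedColumns);
const newDeploy = resolveByVersion(2, legacyColumns, updatedColumns);
```

In both variants an upgrade alone leaves existing identities untouched. Option 1 decides per table, while option 2 decides per sync rule deploy.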