Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

no REPLICA IDENTITY error #537

Closed
rsquaretrade opened this issue Jun 23, 2023 · 20 comments
Closed

no REPLICA IDENTITY error #537

rsquaretrade opened this issue Jun 23, 2023 · 20 comments
Assignees
Labels

Comments

@rsquaretrade
Copy link

Hello. This might be a duplicate issue, but I'd like to post it. Today, I migrated a partitioned table from inheritance to native, the set up is this: the table in OLTP is NOT partitioned, it is replicated to the warehouse where the table is partitioned. I added a non-partitioned primary key in a template table, the control is a column_date, the child tables appear to be inheriting it. After the migration, error logs shows the following:

"the table "logical replication target relation ""public.table_name"" has neither REPLICA IDENTITY index nor PRIMARY KEY and published relation does not have REPLICA IDENTITY FULL"

Is there a workaround? Thank you in advance.

@keithf4
Copy link
Collaborator

keithf4 commented Jun 23, 2023

Can you double-check that all child tables have the primary key created? If you add the primary key to the template table after you create the partition set, it does not go back and create that primary key on any existing children, only new ones.

@rsquaretrade
Copy link
Author

rsquaretrade commented Jun 23, 2023

It is there, premake was only 1

warehouse=> \d+ public.audit_log_p2025
Table "public.audit_log_p2025"
Column │ Type │ Collation │ Nullable │ Default │ Storage │ Compression │ Stats target │ Description
─────────────────────┼──────────────────────────┼───────────┼──────────┼─────────┼──────────┼─────────────┼──────────────┼─────────────
id │ text │ │ not null │ │ extended │ │ │
audit_obj_id │ text │ │ not null │ │ extended │ │ │
change_time │ timestamp with time zone │ │ not null │ │ plain │ │ │
change_type_id │ text │ │ not null │ │ extended │ │ │
changed_by │ text │ │ │ │ extended │ │ │
changed_from_str │ text │ │ │ │ extended │ │ │
changed_to_str │ text │ │ │ │ extended │ │ │
changed_from_char │ character(1) │ │ │ │ extended │ │ │
changed_to_char │ character(1) │ │ │ │ extended │ │ │
changed_from_date │ timestamp with time zone │ │ │ │ plain │ │ │
changed_to_date │ timestamp with time zone │ │ │ │ plain │ │ │
comments │ text │ │ │ │ extended │ │ │
changed_from_number │ numeric(15,2) │ │ │ │ main │ │ │
changed_to_number │ numeric(15,2) │ │ │ │ main │ │ │
Partition of: audit_log_native_rollback FOR VALUES FROM ('2025-01-01 00:00:00-08') TO ('2026-01-01 00:00:00-08')
Partition constraint: ((change_time IS NOT NULL) AND (change_time >= '2025-01-01 00:00:00-08'::timestamp with time zone) AND (change_time < '2026-01-01 00:00:00-08'::timestamp with time zone))
Indexes:
"audit_log_p2025_pkey" PRIMARY KEY, btree (id)
"audit_log_p2025_audit_obj_id_change_time_idx" btree (audit_obj_id, change_time)
"audit_log_p2025_audit_obj_id_idx" btree (audit_obj_id)
"audit_log_p2025_change_time_idx" btree (change_time)
"audit_log_p2025_change_type_id_idx" btree (change_type_id)
Access method: heap

I found this website regarding this specific error with 3 solutions which is to : https://mydbanotebook.org/post/replication-key/

  1. Add a unique index, as a test, I added with 2 columns (id, change_time). Then alter table withuniqueindex replica identity using index withuniqueindex_c_key; (I will set up replication and see if this works).
  2. alter table badmodel replica identity full; (but I'm not sure if there are performance implications using this).
  3. add a primary key (i already added in the template table).

I appreciate any suggestions as well from you.

@keithf4
Copy link
Collaborator

keithf4 commented Jun 28, 2023

Any chance you could share either the publication/subscription commands you're using to create the logical set or, if it's already set up, the configuration settings for it? Trying to set this up the same way that you are so I can try to recreate the situation and see what the problem is.

Also, what version of PostgreSQL are you running?

@keithf4 keithf4 self-assigned this Jun 28, 2023
@rsquaretrade
Copy link
Author

rsquaretrade commented Jun 28, 2023

Replication has been set up awhile back, in postgresql.conf wal_level=replica, asynchronous. Basic set up is the table on the OLTP side is NOT partitioned, but the replicated table in the warehouse is partitioned. DB is on PG14

Publisher/Subscriber Set-up:
(1) Create a publication on Publisher.

SCHEMA=generic_schema
psql service=${SCHEMA}_prod -X <<DDL

CREATE PUBLICATION ${SCHEMA}_pub
FOR TABLE $SCHEMA.wm_item_file
, $SCHEMA.table1
, $SCHEMA.table2
, $SCHEMA.table3;

ALTER PUBLICATION ${SCHEMA}_pub OWNER TO replication;
DDL

database_name=> \dRp+

(2) Create subscription on Subscriber.

SCHEMA=schema_name
psql service=warehouse_prod -X <<DDL
CREATE SUBSCRIPTION ${SCHEMA}_sub
CONNECTION 'postgresql://replication@10.000.00.000:5432/database_name'
PUBLICATION ${SCHEMA}_pub;
DDL

(3) Create tables on subscriber using pg_dump

  • Add tables to publication

SCHEMA=schema_name
psql service=dbname_prod -X <<DDL

ALTER PUBLICATION ${SCHEMA}_pub
ADD TABLE $SCHEMA.table1
, $SCHEMA.table2
, $SCHEMA.table3;
DDL

(4) Alter subscription on subscriber database.

SCHEMA=schema_name
psql service=warehouse_prod -X <<DDL
ALTER SUBSCRIPTION ${SCHEMA}_sub REFRESH PUBLICATION ;
DDL

(5) Apply post-data section via pg_restore

@keithf4
Copy link
Collaborator

keithf4 commented Jun 28, 2023

So, I'm not able to get the same error that you're getting. I'm able to reproduce a similar one, but only if the published table is missing the primary key. And reading back on your error again, it does seem to indicate something similar. Do the tables on the publisher side have primary/unique keys?

SUBSCRIBER:

keith=# create table audit_log (id text, change_time timestamp with time zone) partition by range (change_time);

create table audit_log_template (like audit_log);

alter table audit_log_template add primary key (id);

select partman.create_parent('public.audit_log', 'change_time', 'native', '1 year', p_template_table := 'public.audit_log_template');
CREATE TABLE
CREATE TABLE
ALTER TABLE
 create_parent 
---------------
 t
(1 row)

keith=# \d+ audit_log
                                             Partitioned table "public.audit_log"
   Column    |           Type           | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
-------------+--------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 id          | text                     |           |          |         | extended |             |              | 
 change_time | timestamp with time zone |           |          |         | plain    |             |              | 
Partition key: RANGE (change_time)
Partitions: audit_log_p2019 FOR VALUES FROM ('2019-01-01 00:00:00-05') TO ('2020-01-01 00:00:00-05'),
            audit_log_p2020 FOR VALUES FROM ('2020-01-01 00:00:00-05') TO ('2021-01-01 00:00:00-05'),
            audit_log_p2021 FOR VALUES FROM ('2021-01-01 00:00:00-05') TO ('2022-01-01 00:00:00-05'),
            audit_log_p2022 FOR VALUES FROM ('2022-01-01 00:00:00-05') TO ('2023-01-01 00:00:00-05'),
            audit_log_p2023 FOR VALUES FROM ('2023-01-01 00:00:00-05') TO ('2024-01-01 00:00:00-05'),
            audit_log_p2024 FOR VALUES FROM ('2024-01-01 00:00:00-05') TO ('2025-01-01 00:00:00-05'),
            audit_log_p2025 FOR VALUES FROM ('2025-01-01 00:00:00-05') TO ('2026-01-01 00:00:00-05'),
            audit_log_p2026 FOR VALUES FROM ('2026-01-01 00:00:00-05') TO ('2027-01-01 00:00:00-05'),
            audit_log_p2027 FOR VALUES FROM ('2027-01-01 00:00:00-05') TO ('2028-01-01 00:00:00-05'),

PUBLISHER:

keith_pub=# create table audit_log (id text, change_time timestamp with time zone);
CREATE TABLE
keith_pub=# create publication audit_pub for table public.audit_log;
CREATE PUBLICATION

SUBSCRIBER:

keith=# create subscription audit_sub connection 'host=localhost user=keith port=5433 dbname=keith_pub' publication audit_pub;
NOTICE:  created replication slot "audit_sub" on publisher
CREATE SUBSCRIPTION

PUBLISHER:

keith_pub=# insert into audit_log values (22, now());
insert into audit_log values (44, now());
INSERT 0 1
INSERT 0 1
keith_pub=# delete from audit_log;
ERROR:  cannot delete from table "audit_log" because it does not have a replica identity and publishes deletes
HINT:  To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.

SUBSCRIBER:

keith=# select * from audit_log;
 id |          change_time          
----+-------------------------------
 22 | 2023-06-28 17:54:17.676579-04
 44 | 2023-06-28 17:54:17.679962-04
(2 rows)

So you can see the setup worked, but I got an error if I tried to delete anything on the publication side. However, when I add a primary key there, then it works ok

keith_pub=# alter table audit_log add primary key (id);
ALTER TABLE
keith_pub=# delete from audit_log;
DELETE 2

However, this brings up another issue that I think you will have as well. Inserts seem to replicate ok, but deletes do not. Running select again, you can see they're still there
SUBSCRIBER:

keith=# select * from audit_log;
 id |          change_time          
----+-------------------------------
 22 | 2023-06-28 17:54:17.676579-04
 44 | 2023-06-28 17:54:17.679962-04

And trying to reinsert them again on the publisher works, but if you look in the subscriber logs, it's throwing a conflict error
PUBLISHER:

keith_pub=# insert into audit_log values (22, now());
INSERT 0 1

SUBSCRIBER logs:

2023-06-28 18:01:30.846 EDT [2023-06-28 18:01:30 EDT]  [716489] user=,db=,app=,err=00000,sess=649cadba.aeec9: LOG:  logical replication apply worker for subscription     "audit_sub" has started
2023-06-28 18:01:30.846 EDT [2023-06-28 18:01:30 EDT]  [716489] user=,db=,app=,err=00000,sess=649cadba.aeec9: DEBUG:  connecting to publisher using connection string     "host=localhost user=keith port=5433 dbname=keith_pub" 
2023-06-28 18:01:30.853 EDT [2023-06-28 18:01:30 EDT]  [716489] user=,db=,app=,err=23505,sess=649cadba.aeec9: ERROR:  duplicate key value violates unique constraint      "audit_log_p2023_pkey"
2023-06-28 18:01:30.853 EDT [2023-06-28 18:01:30 EDT]  [716489] user=,db=,app=,err=23505,sess=649cadba.aeec9: DETAIL:  Key (id)=(22) already exists.

Not quite sure how to fix this and not sure if it's a bug or not. I'm actually surprised the inserts even work. My understanding is that logical pub/sub replication required fully matching schemas on both sides for the intended tables. If you set this logical replication up with a matching partition set on the publication side, I imagine this would work fine.

My suggestion would be to make matching partition sets on both the publisher and subscriber. However, you can set the retention on the publisher side to be lower so you don't have to keep quite as much data around there.

@rsquaretrade
Copy link
Author

rsquaretrade commented Jun 28, 2023

Thank you for taking the time and effort. On the publisher end, there is definitely a primary key(id). The schema is public.
On the subscriber end which is partitioned, the primary key is in the template table. The schema is also public. The partition sets are identical with the exception of the primary key(id) being in the template table. And yes, the "REPLICA IDENTITY" error appears to be triggered by deletes and also updates.

I'll have to re-read your results as a second reading (or multiple) will shed some more light. Thanks again for your time.

@rsquaretrade
Copy link
Author

Hello. I neglected to tell you an important replication setting. Any chance you can test replication on your end?

wal_level=logical on OLTP is what is used for the logical extraction.
wal_level=replica is on DW, the subscriber.

@keithf4
Copy link
Collaborator

keithf4 commented Jul 3, 2023

This doesn't make any difference in this case. That controls the output of the WAL stream on the relevant system.

@rsquaretrade
Copy link
Author

rsquaretrade commented Jul 5, 2023

When you created the table in the subscriber, do you have to start with a fresh table (in which you did) or can I use 'LIKE' [the original table inheritance table] ?

CREATE TABLE IF NOT EXISTS public.audit_log_native (LIKE public.audit_log)
PARTITION BY RANGE (change_time);

public.audit_log is set up as table Inheritance and I'm migrating to native. I would not think it would matter but I'm not 100% sure.

@keithf4
Copy link
Collaborator

keithf4 commented Jul 6, 2023

I don't see why you couldn't use a LIKE clause. The result of a table made like that is still a normal table. It's just copying the schema from the original table.

@rsquaretrade
Copy link
Author

rsquaretrade commented Jul 7, 2023

We have tested twice and everything works well in sandbox, we are still troubleshooting why the error shows up post- migration in prod (we've since rollback due to it).

Separate question, do you have documentation on migrating OLTP from declarative/table inheritance to native and using publish_via_partition_root=true to replicate to the warehoue?

@keithf4
Copy link
Collaborator

keithf4 commented Jul 7, 2023

I don't have anything specific for that, no.

@rsquaretrade
Copy link
Author

rsquaretrade commented Jul 20, 2023

Hi. Question, I have a table that has a composite primary key that includes the partition key. I created a native partitioned table and also added the composite primary key in the parent and it worked, so I did not need a template table. I also used create_parent and it added a template table as described in pg_partman.md doc. Since I don't need the template table, does it hurt to UPDATE part_config to point the p_template to the parent? Or could I delete the template table? Just wondering.

@keithf4
Copy link
Collaborator

keithf4 commented Jul 20, 2023

Hi. Question, I have a table that has a composite primary key that includes the partition key. I created a native partitioned table and also added the composite primary key in the parent and it worked, so I did not need a template table. I also used create_parent and it added a template table as described in pg_partman.md doc. Since I don't need the template table, does it hurt to UPDATE part_config to point the p_template to the parent? Or could I delete the template table? Just wondering.

I don't think I have things fully tested in 4.x without having the template table at least existing there, even if it doesn't use it. Version 5.x actually makes it fully optional, so I'd recommend just leaving it there for now until that new version is out.

@rsquaretrade
Copy link
Author

Hi. FYI, I took your advise in above post and it's working well.

Different question. I'm migrating another table to native but the partition key is an integer (currently, we have an internal function that creates the monthly child tables like so: finance_p202301, for Jan, etc..) but we'd rather use pg_partman for maintenance. Could we use LIST or RANGE partitioning but how do we configure pg_partman? Thanks in advance.

@keithf4
Copy link
Collaborator

keithf4 commented Jul 21, 2023

The only method pg_partman supports with an integer for the column but time-based is epoch partitioning. See the flags in the functions related to that for how to do that. If your integer column is epoch based (seconds, milliseconds, or nanoseconds), you could convert it. If it's something else, it does not at this time.

Also, it only supports RANGE at this time.

And for future reference, please make a new entry in Discussions if you just have a question instead of replying to existing issues. Thank you!

@rsquaretrade
Copy link
Author

Hi, will do. Is it possible to keep this ticket open? We are still testing the original problem, we should know the results next week and this ticket can be closed. I've appreciated all your input, and thank you very much.

@keithf4
Copy link
Collaborator

keithf4 commented Jul 22, 2023

Sure it can be left open

@rsquaretrade
Copy link
Author

rsquaretrade commented Jul 25, 2023

Hi. We found the root-cause/bug: "logical replication's checking of replica identity when the target table is partitioned". The fix is in v14.5 but the Warehouse is currently at PostgreSQL 14.4 (Ubuntu 14.4-1.pgdg20.04+1). We will proceed after we've upgraded to 14.8.

@keithf4
Copy link
Collaborator

keithf4 commented Jul 25, 2023

Oh wow! Thanks for letting me know on that one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants