Provide a primary key flag for columns #43

jpechane · 2017-11-24T09:40:37Z

When the event is processed downstream it is important to know what columns forms the primary key. Could you please provide a flag for columns that says that it is a part of primary key?

eulerto · 2017-11-25T01:59:20Z

oldkeys. However, the concept of row identification is wider than primary key (ref REPLICA IDENTITY). We can use the full row, an unique index, or primary key to identity a row.

jpechane · 2017-11-25T12:35:56Z

I can use FULL IDENTITY replica to get the orgiginal content of the row that is updated in oldkeys. But in this case I am losing the primary key information. So providing a PK flag per field would allow me eat a cake and have it too.

eulerto · 2017-11-25T12:56:29Z

I'm not following you. Why would you use FULL? If you don't, primary key is used as replica identity, hence, oldkeys is the primary key.

jpechane · 2017-11-25T13:02:23Z

Please look at http://debezium.io/docs/connectors/postgresql/, section Update events.
I'd like to have an event that would contain previoud values of the row before it was change and the new values. To get this I need to set REPLICA IDENTITY FULL but in this case I will not be able to identify which colmuns forms primary key.

eulerto · 2017-11-25T13:28:26Z

I read the docs but I'm not sure what kind of problem you are trying to solve. For updates, we provide key and new tuple. Do you want the old tuple too?

jpechane · 2017-11-25T16:56:14Z

Yes, exactly!

eulerto · 2018-05-23T14:26:15Z

@jpechane is there any interest in this feature?

jpechane · 2018-05-23T19:24:14Z

@eulerto Hi, we still have a strong interest in this feature. Unfortunately our skillset blocks us from implementing and providing a PR

rkrage · 2018-10-12T15:39:35Z

@eulerto, my company (Braintree) has actually started implementing this feature in an internal fork.

Our idea was to change the definition of oldkeys slightly and introduce a new field called oldvalues.

Here's an outline of the scenarios:

Insert

If replica identity is DEFAULT, FULL, or an index:

oldkeys will be the primary key columns from the new tuple if a pk index exists
oldvalues will not be printed

Update

If replica identity is DEFAULT:

oldkeys will be the primary key columns from the old tuple if a pk index exists
oldvalues will not be printed

If replica identity is FULL:

oldkeys will be the primary key columns from the old tuple if a pk index exists
oldvalues will be the entire old tuple with null values omitted

If replica identity is an index:

oldkeys will not be printed, as the old tuple may not contain the old pk values
oldvalues will be the columns covered by the replica identity index from the old tuple with null values omitted

Delete

If replica identity is DEFAULT:

oldkeys will be the primary key columns from the old tuple (note that if a pk index doesn't exist, we'll bail out early in the sanity checks and nothing will be outputted)
oldvalues will not be printed

If replica identity is FULL:

oldkeys will be the primary key columns from the old tuple if a pk index exists
oldvalues will be the entire old tuple with null values omitted

If replica identity is an index:

oldkeys will not be printed, as the old tuple may not contain the old pk values
oldvalues will be the columns covered by the replica identity index from the old tuple with null values omitted

I'm sure there are some weird edge cases that I didn't mention, but in summary: oldkeys will print primary key information and oldvalues will print replica identity for updates / deletes iff the value differs from oldkeys (mainly due to performance concerns).

To transition from DEFAULT to FULL, this approach makes a lot of sense for our use case, as it simply introduces a new field while oldkeys remains unchanged.

I'd love to hear your thoughts on this proposal!

Thanks!

gunnarmorling · 2019-02-15T11:07:16Z

Hey @rkrage, did you ever end up publishing this work somewhere?

rkrage · 2019-02-15T15:41:13Z

We've had some discussions about publishing our internal fork, but I'm not sure where we landed on it. I'll bring it up again and get back to you!

Also, we've been running this feature in production for the past several months with no issues 😃

deanrabinowitz4 · 2020-07-02T19:06:32Z

Hey @rkrage, did you ever decide to publish your work? Also, does the feature work with format-version 2? I'm specifically interested in how you were able to get the entire tuple for DELETE records.

rkrage · 2020-07-02T19:33:39Z

@deanrabinowitz4, we haven't been able to publish our work, unfortunately. But I didn't think we did anything special to get the entire tuple for DELETEs. I thought you just needed to set replica identity to full. As far as determining the PK when replica identity is full, I believe this is the code that inspired our work: https://github.com/postgres/postgres/blob/7551d9bc408c2402a8558367ee950ca403e25b37/contrib/tcn/tcn.c#L121-L139

Our fork doesn't currently support format version 2. We were hoping the open-source version would evolve to suit our needs, and it looks like it's getting close.

@eulerto, we had previously discussed exposing the PK in the version 2 proposal: #110 (comment)

Is that still in the works? Thanks!

eulerto · 2020-07-02T19:42:10Z

Since this is a popular request, I developed a patch to add primary key information (both formats). If I have some spare time to improve test coverage for this feature, I will commit it this week.

eulerto · 2020-07-26T14:13:46Z

Done on commit 3511384

jpechane mentioned this issue Nov 30, 2017

WIP: DBZ-379 Postgres connector minimizes use of JDBC metadata debezium/debezium#364

Closed

TimoStolz mentioned this issue Sep 23, 2018

print primary key information for newly inserted rows #81

Closed

eulerto closed this as completed Jul 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide a primary key flag for columns #43

Provide a primary key flag for columns #43

jpechane commented Nov 24, 2017

eulerto commented Nov 25, 2017

jpechane commented Nov 25, 2017

eulerto commented Nov 25, 2017

jpechane commented Nov 25, 2017

eulerto commented Nov 25, 2017

jpechane commented Nov 25, 2017

eulerto commented May 23, 2018

jpechane commented May 23, 2018

rkrage commented Oct 12, 2018

gunnarmorling commented Feb 15, 2019

rkrage commented Feb 15, 2019

deanrabinowitz4 commented Jul 2, 2020 •

edited

rkrage commented Jul 2, 2020 •

edited

eulerto commented Jul 2, 2020

eulerto commented Jul 26, 2020

Provide a primary key flag for columns #43

Provide a primary key flag for columns #43

Comments

jpechane commented Nov 24, 2017

eulerto commented Nov 25, 2017

jpechane commented Nov 25, 2017

eulerto commented Nov 25, 2017

jpechane commented Nov 25, 2017

eulerto commented Nov 25, 2017

jpechane commented Nov 25, 2017

eulerto commented May 23, 2018

jpechane commented May 23, 2018

rkrage commented Oct 12, 2018

Insert

Update

Delete

gunnarmorling commented Feb 15, 2019

rkrage commented Feb 15, 2019

deanrabinowitz4 commented Jul 2, 2020 • edited

rkrage commented Jul 2, 2020 • edited

eulerto commented Jul 2, 2020

eulerto commented Jul 26, 2020

deanrabinowitz4 commented Jul 2, 2020 •

edited

rkrage commented Jul 2, 2020 •

edited