Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a primary key flag for columns #43

Closed
jpechane opened this issue Nov 24, 2017 · 15 comments
Closed

Provide a primary key flag for columns #43

jpechane opened this issue Nov 24, 2017 · 15 comments

Comments

@jpechane
Copy link

When the event is processed downstream it is important to know what columns forms the primary key. Could you please provide a flag for columns that says that it is a part of primary key?

@eulerto
Copy link
Owner

eulerto commented Nov 25, 2017

oldkeys. However, the concept of row identification is wider than primary key (ref REPLICA IDENTITY). We can use the full row, an unique index, or primary key to identity a row.

@jpechane
Copy link
Author

I can use FULL IDENTITY replica to get the orgiginal content of the row that is updated in oldkeys. But in this case I am losing the primary key information. So providing a PK flag per field would allow me eat a cake and have it too.

@eulerto
Copy link
Owner

eulerto commented Nov 25, 2017

I'm not following you. Why would you use FULL? If you don't, primary key is used as replica identity, hence, oldkeys is the primary key.

@jpechane
Copy link
Author

Please look at http://debezium.io/docs/connectors/postgresql/, section Update events.
I'd like to have an event that would contain previoud values of the row before it was change and the new values. To get this I need to set REPLICA IDENTITY FULL but in this case I will not be able to identify which colmuns forms primary key.

@eulerto
Copy link
Owner

eulerto commented Nov 25, 2017

I read the docs but I'm not sure what kind of problem you are trying to solve. For updates, we provide key and new tuple. Do you want the old tuple too?

@jpechane
Copy link
Author

Yes, exactly!

@eulerto
Copy link
Owner

eulerto commented May 23, 2018

@jpechane is there any interest in this feature?

@jpechane
Copy link
Author

@eulerto Hi, we still have a strong interest in this feature. Unfortunately our skillset blocks us from implementing and providing a PR

@rkrage
Copy link
Contributor

rkrage commented Oct 12, 2018

@eulerto, my company (Braintree) has actually started implementing this feature in an internal fork.

Our idea was to change the definition of oldkeys slightly and introduce a new field called oldvalues.

Here's an outline of the scenarios:

Insert

If replica identity is DEFAULT, FULL, or an index:

  • oldkeys will be the primary key columns from the new tuple if a pk index exists
  • oldvalues will not be printed

Update

If replica identity is DEFAULT:

  • oldkeys will be the primary key columns from the old tuple if a pk index exists
  • oldvalues will not be printed

If replica identity is FULL:

  • oldkeys will be the primary key columns from the old tuple if a pk index exists
  • oldvalues will be the entire old tuple with null values omitted

If replica identity is an index:

  • oldkeys will not be printed, as the old tuple may not contain the old pk values
  • oldvalues will be the columns covered by the replica identity index from the old tuple with null values omitted

Delete

If replica identity is DEFAULT:

  • oldkeys will be the primary key columns from the old tuple (note that if a pk index doesn't exist, we'll bail out early in the sanity checks and nothing will be outputted)
  • oldvalues will not be printed

If replica identity is FULL:

  • oldkeys will be the primary key columns from the old tuple if a pk index exists
  • oldvalues will be the entire old tuple with null values omitted

If replica identity is an index:

  • oldkeys will not be printed, as the old tuple may not contain the old pk values
  • oldvalues will be the columns covered by the replica identity index from the old tuple with null values omitted

I'm sure there are some weird edge cases that I didn't mention, but in summary: oldkeys will print primary key information and oldvalues will print replica identity for updates / deletes iff the value differs from oldkeys (mainly due to performance concerns).

To transition from DEFAULT to FULL, this approach makes a lot of sense for our use case, as it simply introduces a new field while oldkeys remains unchanged.

I'd love to hear your thoughts on this proposal!

Thanks!

@gunnarmorling
Copy link

Hey @rkrage, did you ever end up publishing this work somewhere?

@rkrage
Copy link
Contributor

rkrage commented Feb 15, 2019

We've had some discussions about publishing our internal fork, but I'm not sure where we landed on it. I'll bring it up again and get back to you!

Also, we've been running this feature in production for the past several months with no issues 😃

@deanrabinowitz4
Copy link

deanrabinowitz4 commented Jul 2, 2020

Hey @rkrage, did you ever decide to publish your work? Also, does the feature work with format-version 2? I'm specifically interested in how you were able to get the entire tuple for DELETE records.

@rkrage
Copy link
Contributor

rkrage commented Jul 2, 2020

@deanrabinowitz4, we haven't been able to publish our work, unfortunately. But I didn't think we did anything special to get the entire tuple for DELETEs. I thought you just needed to set replica identity to full. As far as determining the PK when replica identity is full, I believe this is the code that inspired our work: https://github.com/postgres/postgres/blob/7551d9bc408c2402a8558367ee950ca403e25b37/contrib/tcn/tcn.c#L121-L139

Our fork doesn't currently support format version 2. We were hoping the open-source version would evolve to suit our needs, and it looks like it's getting close.

@eulerto, we had previously discussed exposing the PK in the version 2 proposal: #110 (comment)

Is that still in the works? Thanks!

@eulerto
Copy link
Owner

eulerto commented Jul 2, 2020

Since this is a popular request, I developed a patch to add primary key information (both formats). If I have some spare time to improve test coverage for this feature, I will commit it this week.

@eulerto
Copy link
Owner

eulerto commented Jul 26, 2020

Done on commit 3511384

@eulerto eulerto closed this as completed Jul 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants