osmdbt and pgoutput #38

Open
mmd-osm opened this issue Sep 24, 2023 · 2 comments

Comments

@mmd-osm
Contributor

mmd-osm commented Sep 24, 2023

I'm moving my comment to a new issue, as requested by @joto

Since compiling and deploying a custom plugin seemed a bit cumbersome, I've been exploring the option of using pgoutput, a fast plugin with a binary output format that's built into PostgreSQL.

"pgoutput is the standard logical decoding output plug-in in PostgreSQL 10+. It is maintained by the PostgreSQL community, and used by PostgreSQL itself for logical replication. This plug-in is always present so no additional libraries need to be installed." [1]

osmdbt-pgoutput interprets the raw replication event stream directly and translates it into the same text representation that the osm-logical plugin produces today. Most of what the osm-logical plugin did has moved to osmdbt-get-log.cpp and pgoutput.[ch]pp. All command line tools should work as before. Configuration-wise, a new database parameter, publication, was added to the osmdbt.yaml file.
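To illustrate what "interpreting the raw event stream" involves, here is a minimal Python sketch that decodes two pgoutput message types, Begin and Commit, into text lines. It is not taken from osmdbt-pgoutput (which is written in C++). The field layout follows the PostgreSQL "Logical Replication Message Formats" documentation for protocol version 1, while the function names and the output text format are made up for this example and do not reproduce the osm-logical format.

```python
import struct
from datetime import datetime, timedelta, timezone

# PostgreSQL timestamps count microseconds from 2000-01-01 UTC.
PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def format_lsn(lsn: int) -> str:
    """Render a 64-bit LSN in PostgreSQL's usual XXX/XXX hex notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def decode_message(payload: bytes) -> str:
    """Turn one pgoutput Begin or Commit message into a line of text.

    The text format is hypothetical, chosen only for this illustration.
    """
    kind = payload[:1]
    if kind == b"B":
        # Begin: Int64 final LSN, Int64 commit timestamp, Int32 xid
        final_lsn, ts, xid = struct.unpack_from(">QqI", payload, 1)
        when = PG_EPOCH + timedelta(microseconds=ts)
        return f"BEGIN xid={xid} lsn={format_lsn(final_lsn)} at={when.isoformat()}"
    if kind == b"C":
        # Commit: Int8 flags, Int64 commit LSN, Int64 end LSN, Int64 timestamp
        _flags, commit_lsn, end_lsn, _ts = struct.unpack_from(">BQQq", payload, 1)
        return f"COMMIT lsn={format_lsn(commit_lsn)} end={format_lsn(end_lsn)}"
    # Relation, Insert, Update, Delete etc. are left out of this sketch.
    return f"UNHANDLED type={kind!r}"


# Hand-built sample message, not captured from a real server.
sample_begin = b"B" + struct.pack(">QqI", 0x16B3748, 0, 42)
print(decode_message(sample_begin))
```

A real decoder would also have to track Relation messages, because Insert, Update and Delete messages refer to tables only by OID.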

In the long term, this approach could simplify our setup, or make it easier to use osmdbt in cloud environments with limited options for deploying custom plugins.

Link: https://github.com/mmd-osm/osmdbt-pgoutput

Footnotes

  1. Quoting https://debezium.io/documentation/reference/stable/connectors/postgresql.html

@joto
Collaborator

joto commented Sep 24, 2023

I couldn't find any real documentation on the pgoutput plugin. I am all for using something that's already there instead of our own implementation, but is it intended as something that "the public" can use, or as something internal to PostgreSQL? We don't want to switch and then have our code break because they changed their internal representation or something.

@mmd-osm
Contributor Author

mmd-osm commented Sep 25, 2023

Debezium seems to be one of the more prominent external consumers interfacing directly with pgoutput. Its use case matches ours, except that its destination is Apache Kafka rather than text files.

The binary format itself is documented on postgresql.org, on the page "Logical Replication Message Formats".

Since pgoutput typically supports multiple versions of its binary protocol, clients can explicitly request one particular version when connecting to the database. In the case of osmdbt-pgoutput, that's version 1. As long as future PostgreSQL versions still support this version, we're good.
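As a rough sketch of how a client pins the protocol version: pgoutput takes `proto_version` and `publication_names` as plugin options, and these can be passed to PostgreSQL's `pg_logical_slot_peek_binary_changes()` function. Whether osmdbt-pgoutput uses this function or a streaming replication connection is not stated here, so treat the query below as an assumption. The helper name and the `%s` placeholder style (as used by psycopg) are also just illustrative choices.

```python
def peek_changes_query(slot: str, publication: str, proto_version: int = 1):
    """Build a parameterized query that peeks at a slot's pgoutput changes.

    Returns (sql, params); the plugin options are passed as alternating
    name/value text arguments after the two NULL limits
    (upto_lsn and upto_nchanges).
    """
    sql = (
        "SELECT lsn, xid, data "
        "FROM pg_logical_slot_peek_binary_changes("
        "%s, NULL, NULL, "
        "'proto_version', %s, "
        "'publication_names', %s)"
    )
    # Option values are text, so the version number is sent as a string.
    return sql, (slot, str(proto_version), publication)


# Hypothetical slot and publication names, for illustration only.
sql, params = peek_changes_query("example_slot", "example_publication")
print(sql)
print(params)
```

Requesting version 1 explicitly means a newer server should keep emitting the message layout the client was written against, for as long as the server continues to accept that version.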

I've noticed some minor differences across PostgreSQL versions, such as an empty BEGIN / COMMIT pair being omitted. From a functional point of view, this has no impact. However, unit tests that rely on the number of rows might see different results here. I've already accounted for this in the test cases.
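One way to make row counts comparable across server versions is to drop empty transactions before counting. The following is a minimal sketch of that idea, not the approach actually taken in the osmdbt-pgoutput tests. It assumes each message is available as raw bytes whose first byte is the pgoutput message type ('B' for Begin, 'C' for Commit).

```python
def drop_empty_transactions(messages: list[bytes]) -> list[bytes]:
    """Remove every Begin message that is directly followed by a Commit."""
    result: list[bytes] = []
    for msg in messages:
        if msg[:1] == b"C" and result and result[-1][:1] == b"B":
            # Begin immediately followed by Commit: the transaction is empty,
            # so discard both messages.
            result.pop()
            continue
        result.append(msg)
    return result


# Placeholder payloads; only the first byte matters for this filter.
stream = [b"B-1", b"C-1", b"B-2", b"I-2", b"C-2"]
print(len(drop_empty_transactions(stream)))  # prints 3
```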

Some links on how different projects are using pgoutput:

There's probably much more out there. If I find more interesting links, I will add them to the list.
