postgres-cdc

Idea

Since version 9.4, PostgreSQL offers logical replication for efficiently and safely replicating data between different PostgreSQL instances on possibly different physical machines. It’s a write ahead log on disk, which holds all events that change the data of database => allow clients to capture all row level changes.

Advantages of using PostgreSQL’s logical replication for implementing CDC:

Log-based CDC enables the event-driven capturing of data changes in real-time. Downstream applications have always access to the latest data from PostgreSQL.
Can detect all change event types in PostgreSQL: INSERTs, UPDATEs, and DELETES.
Consuming events via logical replication boils down to directly accessing the file system, which does not impact the performance of the PostgreSQL database

Disadvantage:

Logical replication is not supported by very old versions of PostgreSQL (older than 9.4)

Architecture

Kafka: event streaming platform that is capable of processing massive amounts of real-time data
Zookeeper: another piece of software from Apache that Kafka uses to store and coordinate configuration
Debezium: an open source platform for change data capture(CDC). It plugs into Postgres, captures all row level changes and sends it to kafka topic
Kafka Connect: acts as a bridge for streaming data in and out of Kafka
Flink: framework and distributed processing engine for stateful computations over unbounded and bounded data streams (in this case used to process real-time data from kafka source and sink)

Build project

Prerequisites

Java 11
docker-compose

Procedure

Run docker-compose up -d

Set configurarion in Postgres wal_level = logical (need to restart DB)

Assume the source table is app_log_events and has 1 column payload

CREATE TABLE app_log_events (
  payload jsonb
);

Send this configuration with a POST command to running Kafka Connect service

{
    "name": "demo-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.user": "admin",
        "database.password": "123456",
        "database.dbname": "postgres",
        "table.include.list": "public.app_log_events",
        "database.server.name": "dbpostgres",
        "topic.prefix":"dbpostgres",
        "plugin.name": "pgoutput"
    }
}

The service records the configuration and starts one connector task that performs the following actions:

Connects to the PostgreSQL database.
Reads the transaction log.
Streams change event records to Kafka topics.

A topic name dbpostgres.public.app_log_events which holds all change events for every row-level, is created

Data is streamed, tranformed, then write to files when rolling policy is met

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
conf/local		conf/local
pictures		pictures
src		src
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
pom.xml		pom.xml
postgresql.conf		postgresql.conf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

postgres-cdc

Idea

Architecture

Build project

Prerequisites

Procedure

Ref

About

Releases

Packages

Languages

minhloannguyen/postgres-cdc

Folders and files

Latest commit

History

Repository files navigation

postgres-cdc

Idea

Architecture

Build project

Prerequisites

Procedure

Ref

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages