Data stream processing with Apache Kafka connectors, sinks and PostgreSQL
An ecommerce store generates transactional data and wishes to process the event data its transactions produce. The stack consists of:
- Apache Kafka for stream data handling
- Zookeeper for managing Kafka
- Kafka Connect for "transporting data into and out of Kafka
- Postgres for working with Kafka connectors to handle data going into and out of Kafka
- KSQL server for create real-time processing
- Kafka's schema registry for imposing the AVRO format.
Two connector configurations describe the pipeline's endpoints:
- source.json (source): defines the connection between the source database (Postgres) and Kafka as the destination.
- sink.json (sink): defines the connection between Kafka as the source and a Postgres DB as the destination.
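source.json is not reproduced in full here, but the dbserver1.public.transactions topic name used later suggests a Debezium Postgres source connector, so a sketch might look like the following (the connector name and credentials are placeholders):

```json
{
  "name": "transactions-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname": "ecommerce",
    "database.server.name": "dbserver1",
    "table.whitelist": "public.transactions"
  }
}
```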
Spin up the services with the command docker-compose up, using the -d flag to run in detached mode.
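The repository's docker-compose.yml is authoritative; as a rough sketch under assumed image tags, ports, and environment variables, the stack it wires together looks something like this:

```yaml
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.3.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:5.3.1
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  schema-registry:
    image: confluentinc/cp-schema-registry:5.3.1
    depends_on: [kafka]
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://kafka:9092
  connect:
    image: debezium/connect:1.0   # assumed; the JDBC sink plugin must also be on the plugin path
    depends_on: [kafka]
    ports:
      - "8083:8083"
    environment:
      BOOTSTRAP_SERVERS: kafka:9092
      GROUP_ID: 1
      CONFIG_STORAGE_TOPIC: connect_configs
      OFFSET_STORAGE_TOPIC: connect_offsets
      STATUS_STORAGE_TOPIC: connect_statuses
  ksqldb-server:
    image: confluentinc/ksqldb-server:0.6.0
    depends_on: [kafka]
    ports:
      - "8088:8088"
    environment:
      KSQL_BOOTSTRAP_SERVERS: kafka:9092
      KSQL_LISTENERS: http://0.0.0.0:8088
      KSQL_KSQL_SCHEMA_REGISTRY_URL: http://schema-registry:8081
  ksqldb-cli:
    image: confluentinc/ksqldb-cli:0.6.0
    entrypoint: /bin/sh
    tty: true
  postgres:
    image: postgres:11.0
    environment:
      POSTGRES_PASSWORD: postgres
```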
In a separate terminal, connect to the running Postgres service with a throwaway psql client container:
docker run -it --rm --network=kafka_postgres_default \
-v $PWD:/home/data/ \
postgres:11.0 psql -h postgres -U postgres
In the psql interface, run the commands defined in the tables.sql file.
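The tables.sql file itself is not shown here; a plausible DDL sketch, inferred from the columns the KSQL statements below expect, would be:

```sql
-- Hypothetical schema; the repository's tables.sql is authoritative
CREATE TABLE transactions (
    id             SERIAL PRIMARY KEY,
    account_action VARCHAR(50),
    user_id        INTEGER,
    tx_id          VARCHAR(50),
    amount         DOUBLE PRECISION
);
```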
Submit the source.json file to the Connect service via curl:
curl -X POST -H "Accept:application/json" -H "Content-Type: application/json" --data @source.json http://localhost:8083/connectors
Query the Connect service to confirm the connector was registered: curl -H "Accept:application/json" localhost:8083/connectors/
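If registration succeeded, the response is a JSON array of connector names matching the "name" field in source.json, e.g. ["transactions-source"].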
If successful, the transactions table should show up as a Kafka topic. Run the command
docker exec -it <kafka-container-id> /bin/bash
to access the Kafka container's shell. In this case "kafka" is the container name; you can also use the container ID.
Once in the container shell, run the command
/usr/bin/kafka-topics --list --zookeeper zookeeper:2181
to view topics.
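Alongside Kafka Connect's internal topics, you should see the change-capture topic for the transactions table, dbserver1.public.transactions, following Debezium's <server-name>.<schema>.<table> naming convention.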
Since the KSQL CLI container is running courtesy of Docker Compose, attach it to the KSQL server with docker exec -it <ksqldb-cli-container-id> /bin/ksql http://ksqldb-server:8088
Once in the CLI, enter the command SHOW TOPICS; to confirm the topic is visible to KSQL.
In the KSQL interface, create a stream and a table mirroring the transactions table in the Postgres DB (KSQL identifiers cannot contain hyphens, so the column is user_id):
CREATE STREAM transaction_src (id INTEGER, account_action VARCHAR, user_id INTEGER, tx_id VARCHAR, amount DOUBLE)
WITH (KAFKA_TOPIC='dbserver1.public.transactions', VALUE_FORMAT='AVRO');
CREATE STREAM transactions_rekey WITH (PARTITIONS=1) AS
SELECT * FROM transaction_src PARTITION BY user_id;
SHOW STREAMS;
CREATE TABLE transactions (id INTEGER, account_action VARCHAR, user_id INTEGER, tx_id VARCHAR, amount DOUBLE)
WITH (KAFKA_TOPIC='TRANSACTIONS_REKEY', VALUE_FORMAT='AVRO', KEY='user_id');
SHOW TABLES;
Assuming that transactions above 8,000,000 are considered suspicious, create a table of suspicious transactions with the SQL statement below. In KSQL the WITH clause precedes AS SELECT, and ROWTIME (the record's timestamp) serves as the timestamp column:
CREATE TABLE TRANSACTIONS_SUSPECT
WITH (KAFKA_TOPIC='TRANSACTIONS_SUSPECT', VALUE_FORMAT='DELIMITED') AS
SELECT AMOUNT, ROWTIME AS TX_TIMESTAMP, USER_ID, TX_ID, ACCOUNT_ACTION
FROM TRANSACTIONS
WHERE AMOUNT > 8000000;
Run the curl command to submit the sink.json file to the Connect service:
curl -X POST -H "Accept:application/json" -H "Content-Type: application/json" --data @sink.json http://localhost:8083/connectors
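sink.json is likewise only sketched here; a JDBC sink connector draining the TRANSACTIONS_SUSPECT topic into Postgres might look roughly like this (connector name and credentials are placeholders, and the converter settings must match the value format chosen in the KSQL step):

```json
{
  "name": "transactions-suspect-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://postgres:5432/ecommerce",
    "connection.user": "postgres",
    "connection.password": "postgres",
    "topics": "TRANSACTIONS_SUSPECT",
    "auto.create": "true"
  }
}
```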
Access the running Postgres container defined in the docker-compose file by running
docker exec -it <postgres-container-id> psql -U postgres -W ecommerce
View the data brought in by Kafka Connect by running the SQL commands below. If the sink auto-created the table from the uppercase topic name, Postgres needs the identifiers quoted:
SELECT * FROM "TRANSACTIONS_SUSPECT";
SELECT COUNT("TX_ID") FROM "TRANSACTIONS_SUSPECT";
When you add new data to the source database, for example by copying CSV data into the transactions table, it flows through the whole pipeline and is processed by the KSQL query above; re-running the queries will then return updated results.
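For instance, from the psql client started earlier (which mounts the working directory at /home/data/), a hypothetical CSV file could be loaded like this:

```sql
-- Hypothetical file name; the column list must match the transactions table
\copy transactions(account_action, user_id, tx_id, amount) FROM '/home/data/new_transactions.csv' WITH (FORMAT csv, HEADER);
```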