This repository contains an Apache Flink application for real-time transaction analytics. Docker Compose orchestrates the infrastructure components: Apache Flink, Postgres, Elasticsearch, and Kibana. The application consumes financial transaction data from Kafka (running in KRaft mode), performs aggregations, and stores the results in both Postgres and Elasticsearch for further analysis.
- Docker
- Confluent Kafka
- Apache Flink
- PostgreSQL
- Elasticsearch
- Kibana
- Clone this repository.
- Navigate to ~/TransactionsGenerator
- If the following components are not installed, run:

  ```bash
  pip install faker
  pip install confluent_kafka
  pip install simplejson
  ```
- Run `docker compose up -d` to start the required services (Apache Flink, Postgres, Elasticsearch, and Kibana).
- Run the Sales Transaction Generator `main.py` to produce sales transactions into Kafka.
- Navigate into the Kafka KRaft container:

  ```bash
  docker exec -it kafka-kraft /bin/bash
  ```

- Consume the transactions generated by `main.py` from the `financialTransactions` topic to verify they arrive:

  ```bash
  kafka-console-consumer --topic financialTransactions --bootstrap-server kafka-kraft:29092 --from-beginning
  ```

- Navigate to the location of your Flink folder.
- Start your Flink cluster:

  ```bash
  ./bin/start-cluster.sh
  ```

- Navigate back to ~/FlinkTransactions
- Compile your Maven package using:
  - `mvn clean`
  - `mvn compile`
  - `mvn package`
- Run the DataStreamJob using:

  ```bash
  [FLINK FOLDER LOCATION]/bin/flink run -c FlinkTransactions.DataStreamJob target/FlinkTransactions-1.0-SNAPSHOT.jar
  ```
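Before wiring up the pipeline, it can help to picture the shape of the messages `main.py` sends to Kafka. The real generator uses Faker and confluent_kafka; the sketch below is a stdlib-only approximation, and the field names are illustrative assumptions rather than the exact schema in `main.py`:

```python
import json
import random
import uuid
from datetime import datetime, timezone

# Illustrative categories; the real main.py draws values with Faker.
CATEGORIES = ["electronics", "grocery", "clothing", "sports"]

def generate_transaction():
    """Build one fake sales transaction as a dict.

    Field names are assumptions for illustration; the actual
    generator's schema may differ.
    """
    price = round(random.uniform(1, 500), 2)
    quantity = random.randint(1, 5)
    return {
        "transactionId": str(uuid.uuid4()),
        "productCategory": random.choice(CATEGORIES),
        "productPrice": price,
        "productQuantity": quantity,
        "totalAmount": round(price * quantity, 2),
        "transactionDate": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    # Serialize to JSON, as a Kafka producer would before sending
    # to the financialTransactions topic.
    print(json.dumps(generate_transaction()))
```

In the real setup, each such dict would be JSON-serialized and produced to the `financialTransactions` topic.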
The DataStreamJob class within the FlinkTransactions package serves as the main entry point for the Flink application. The application consumes financial transaction data from Kafka, performs transformations, and stores aggregated results in both Postgres and Elasticsearch.
- Sets up the Flink execution environment.
- Connects to Kafka as a source for financial transaction data.
- Processes, transforms, and performs aggregations on transaction data streams.
- Stores transaction data and aggregated results in the `transactions`, `sales_per_category`, `sales_per_day`, and `sales_per_month` tables.
- Stores transaction data for further analysis.
- Visualizes transaction data from Elasticsearch.
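The aggregation step can be pictured as grouping transactions by category, day, and month and summing amounts, mirroring the `sales_per_category`, `sales_per_day`, and `sales_per_month` tables. A minimal pure-Python sketch (the actual job does this with Flink stream operators; field names are assumptions):

```python
from collections import defaultdict
from datetime import datetime

def aggregate_sales(transactions):
    """Sum totalAmount per category, per day, and per month.

    Each transaction is a dict with (assumed) keys
    productCategory, totalAmount, and transactionDate (ISO 8601).
    """
    per_category = defaultdict(float)
    per_day = defaultdict(float)
    per_month = defaultdict(float)
    for tx in transactions:
        ts = datetime.fromisoformat(tx["transactionDate"])
        per_category[tx["productCategory"]] += tx["totalAmount"]
        per_day[ts.date().isoformat()] += tx["totalAmount"]
        per_month[(ts.year, ts.month)] += tx["totalAmount"]
    return dict(per_category), dict(per_day), dict(per_month)
```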
- `DataStreamJob.java`: Contains the Flink application logic, including Kafka source setup, stream processing, transformations, and sinks for Postgres and Elasticsearch.
- `Deserializer`, `Dto`, and `utils` packages: Include necessary classes and utilities for deserialization, data transfer objects, and JSON conversion.
- Kafka settings are defined within the Kafka docker source setup using KRaft.
- Postgres connection details (username, password, database) are defined within the Postgres Docker service setup under `POSTGRES_USER`, `POSTGRES_PASSWORD`, and `POSTGRES_DB`.
- The application includes sink operations for Postgres using JDBC to create the `transactions`, `sales_per_category`, `sales_per_day`, and `sales_per_month` tables and perform insert/update operations.
- Additionally, it includes an Elasticsearch sink to index transaction data for further analysis.
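The insert/update behavior above is typically implemented as an upsert. The sketch below uses Python's built-in sqlite3 in place of Postgres purely so it can run standalone; Postgres accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` statement shape, though the actual table and column names in this project are assumptions here:

```python
import sqlite3

# In-memory stand-in for the Postgres database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_per_category ("
    " transaction_date TEXT,"
    " category TEXT,"
    " total_sales REAL,"
    " PRIMARY KEY (transaction_date, category))"
)

def upsert_sales(conn, day, category, amount):
    # On a key collision, overwrite the running total -- the same
    # statement shape a JDBC sink could issue against Postgres.
    conn.execute(
        "INSERT INTO sales_per_category (transaction_date, category, total_sales) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT (transaction_date, category) "
        "DO UPDATE SET total_sales = excluded.total_sales",
        (day, category, amount),
    )

upsert_sales(conn, "2024-01-01", "electronics", 100.0)
upsert_sales(conn, "2024-01-01", "electronics", 250.0)  # updates the existing row
```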
