A streaming data pipeline typically involves data transformation, wrangling, and time-based windowed aggregation. On top of that, we must also guarantee data integrity. One might reach for Kafka Streams to solve all of these challenges, and it is definitely a good choice. However, in many cases ksqlDB queries are simpler, faster to implement, and just as effective.
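As a rough illustration of how compact such a pipeline can be in ksqlDB, here is a hypothetical windowed aggregation (the stream name, topic, and schema below are illustrative, not taken from this repo):

```sql
-- Hypothetical source stream of orders (schema and topic name are illustrative)
CREATE STREAM orders (order_id VARCHAR, customer VARCHAR, amount DOUBLE)
  WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON', PARTITIONS=1);

-- Time-based windowed aggregation: revenue per customer per one-minute window
CREATE TABLE revenue_per_minute AS
  SELECT customer,
         SUM(amount) AS total_amount
  FROM orders
  WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY customer
  EMIT CHANGES;
```

A handful of statements like these can replace what would otherwise be a small Kafka Streams application.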
This repository was used in a Confluent meetup. You can watch the recording in the Community Forum.
Since then, this repo has become a playground for different deployment options as well as for exploring features such as Cluster Linking or enabling metrics. You can find them under different branches. Currently, this pipeline can be deployed:
- locally with Docker
  - with C3 metrics
- locally secured with Docker
  - SSL & SASL_SSL
- locally setting up Health+
- locally setting up with reduced infra mode for C3
- with Confluent for Kubernetes using Minikube
  - with Health+
- on Confluent Cloud
  - with Cluster Linking and Schema Linking
  - with RBAC
  - exporting Metrics to Grafana Cloud
  - exploring Audit Logs
- on Confluent Cloud with Stream Designer
- hybrid (locally but using Cluster Linking to transfer data to CC)