Confluent Cloud is a fully managed platform for Apache Kafka, designed to simplify real-time data streaming and processing. It integrates Kafka for data ingestion, Flink for stream processing, and Tableflow for converting streaming data into analytics-ready Apache Iceberg tables. DuckDB, a lightweight analytical database, supports querying these Iceberg tables, making it an ideal tool for the workshop’s analytics component. The workshop is designed for developers with basic programming knowledge, potentially new to Kafka, Flink, or Tableflow, and aims to provide hands-on experience within a condensed time frame.
> **Important:** 🚨 CRITICAL - COST PREVENTION: After completing this workshop, immediately follow the teardown guide to prevent unexpected charges from Flink compute pools and Tableflow catalog integrations.
>
> 📋 Teardown Guide: Execute teardown steps within 15 minutes of workshop completion to avoid ongoing billing for cloud resources.
This 2-hour hands-on workshop introduces developers to building real-time data pipelines using Confluent Cloud. You'll learn to stream data with Apache Kafka, process it in real time with Apache Flink, and convert it into Apache Iceberg tables using Tableflow. The workshop assumes basic familiarity with programming and provides step-by-step guidance.
- Set up a Kafka cluster and manage topics in Confluent Cloud.
- Write and run a Flink job to process streaming data.
- Use Tableflow to materialize Kafka topics as Iceberg tables and query them with DuckDB.
- GitHub Account: Required for accessing GitHub Codespaces or cloning the workshop repository.
  - Create a free GitHub account if you don't have one
- VSCode with Confluent Extension: For managing Confluent Cloud resources.
- Confluent CLI: To interact with Kafka clusters and topics.
- DuckDB: For querying Tableflow Iceberg tables.
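Once Tableflow has materialized a topic as an Iceberg table, DuckDB can read it through its `iceberg` extension (`INSTALL iceberg; LOAD iceberg;`), which provides the `iceberg_scan()` table function. As a minimal sketch, the helper below composes such a query; the S3 URI and table name are placeholders, not real workshop resources:

```python
# Hypothetical example: build the DuckDB SQL used to query a
# Tableflow-materialized Iceberg table. The bucket path below is a
# placeholder -- substitute the table location Tableflow reports.

def iceberg_query(table_uri: str, limit: int = 10) -> str:
    """Return a DuckDB SQL statement that scans an Iceberg table.

    Requires DuckDB's iceberg extension to be loaded first:
        INSTALL iceberg; LOAD iceberg;
    """
    return f"SELECT * FROM iceberg_scan('{table_uri}') LIMIT {limit};"

sql = iceberg_query("s3://example-bucket/warehouse/orders")
print(sql)
# SELECT * FROM iceberg_scan('s3://example-bucket/warehouse/orders') LIMIT 10;
```

Running the printed statement inside a DuckDB session (with valid object-store credentials configured) returns the first rows of the materialized table.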
> **Note:** 🖥️ Platform Compatibility: All workshop scripts are designed for Linux and macOS (with zsh shell).
>
> 💻 Windows Users: We recommend using GitHub Codespaces or VS Code Dev Containers for the best experience. The devcontainer configuration provides a consistent Linux environment with all tools pre-installed.
>
> 🚀 Quick Start Options:
> * GitHub Codespaces (Recommended for Windows): Click "Code" → "Codespaces" → "Create codespace"
> * Dev Containers: Open in VS Code → Ctrl+Shift+P → "Dev Containers: Reopen in Container"
> * Local Setup: Linux/macOS users can run scripts directly
| Segment | Duration | Features Covered | Objective |
|---|---|---|---|
| Introduction | 15 min | Kafka, Flink, Tableflow Overview | Understand event-driven architecture |
| Setting Up Confluent Cloud | 15 min | Kafka Cluster Creation | Set up a managed Kafka cluster |
| Kafka Hands-On | 30 min | Kafka Topics, Producers, Consumers | Stream data with Kafka |
| Flink Hands-On | 45 min | Flink Stream Processing | Process data in real time |
| Tableflow Hands-On | 30 min | Tableflow, Iceberg, DuckDB | Materialize and query analytics-ready data |
| Wrap-Up and Q&A | 15 min | All features | Summarize and address questions |