GlassFlow for ClickHouse Streaming ETL

GlassFlow is an open-source ETL tool that enables real-time data processing from Kafka to ClickHouse with features like deduplication and temporal joins.

⚡️ Quick Start

This guide walks you through a local installation using Docker Compose, which suits development, testing, or simply trying out GlassFlow on your machine.

  1. Clone the repository:
     git clone https://github.com/glassflow/clickhouse-etl.git
     cd clickhouse-etl
  2. Start the services (add -d to run them in the background):
     docker compose up
  3. Open the web interface at http://localhost:8080 to configure your pipeline.
  4. View the logs:

     # Follow the logs of all containers in real time
     docker compose logs -f

     # Logs for the backend app only
     docker compose logs -f app

     # Logs for the UI only
     docker compose logs -f ui
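Containers can take a few seconds to become ready after `docker compose up`. The sketch below is a hypothetical helper, not part of GlassFlow, that checks whether the web interface answers, assuming the default `http://localhost:8080` address from the Quick Start. The names `ui_is_up` and `wait_for_ui` are illustrative; any HTTP response (even an error status) counts as "up", and only connection failures or timeouts count as "down".

```python
import time
import urllib.error
import urllib.request

# Assumption: the UI listens on the default Quick Start address.
DEFAULT_UI_URL = "http://localhost:8080"

# Opener with proxies disabled, so a local check never goes through an
# HTTP(S)_PROXY set in the environment.
_direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def ui_is_up(url: str = DEFAULT_UI_URL, timeout: float = 3.0) -> bool:
    """Return True if anything answers HTTP at `url`, False otherwise."""
    try:
        with _direct.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A 4xx/5xx status still means a server is listening.
        return True
    except OSError:
        # Connection refused, DNS failure, or timeout (URLError is an OSError).
        return False


def wait_for_ui(url: str = DEFAULT_UI_URL, attempts: int = 30,
                delay: float = 2.0) -> bool:
    """Poll `url` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if ui_is_up(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    print("GlassFlow UI reachable:", ui_is_up())
```

`wait_for_ui()` is handy in scripts that start the stack with `docker compose up -d` and need to block until the UI responds before continuing.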

🧭 Installation Options

GlassFlow can be installed in a variety of environments depending on your use case. Below is a quick overview:

  • 🐳 Local with Docker Compose: quick evaluation and local testing (see the Local Docker Guide)
  • ☁️ AWS EC2 with Docker Compose: lightweight cloud deployment for testing (see the AWS EC2 Guide)
  • ☸️ Kubernetes with Helm: Kubernetes deployment (see the Kubernetes Helm Guide)

ℹ️ Note: The current GlassFlow deployment is not horizontally scalable yet. A new Kubernetes-native, scalable deployment is in development and expected by end of July.

🎥 Demo

Live Demo

See a working demo of GlassFlow in action at demo.glassflow.dev.

[Figure: GlassFlow pipeline data flow, showing real-time streaming from Kafka through GlassFlow to ClickHouse]

Demo Video

[Video: GlassFlow overview]

📚 Documentation

For detailed documentation, visit docs.glassflow.dev.

🗺️ Roadmap

Check out our public roadmap to see what's coming next in GlassFlow. We're actively working on new features and improvements based on community feedback.

Want to suggest a feature? We'd love to hear from you! Please use our GitHub Discussions to share your ideas and help shape the future of GlassFlow.

✨ Features

  • Real-time data processing from Kafka to ClickHouse
  • Deduplication with configurable time windows
  • Temporal joins between multiple Kafka topics
  • Web-based UI for pipeline management
  • Docker-based deployment
  • Local development environment
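To make the deduplication and temporal-join features more concrete, here is a simplified, in-memory sketch of the two ideas. It is not GlassFlow's implementation or API: the class and function names, the event shape (dicts with a `ts` field in seconds), the `user_id` join key, and the window semantics (dedup window measured from the first kept event; join matches events whose timestamps differ by at most `window`) are all assumptions for illustration. See docs.glassflow.dev for the semantics GlassFlow actually uses.

```python
from collections import defaultdict
from typing import Any

Event = dict[str, Any]  # hypothetical event shape: must carry a "ts" field


class WindowDeduplicator:
    """Keep the first event per key; drop repeats that arrive within `window` seconds.

    Assumes events arrive in (roughly) timestamp order. Keys are never evicted
    here, so memory grows with the number of distinct keys.
    """

    def __init__(self, window: float) -> None:
        self.window = window
        self._first_seen: dict[str, float] = {}

    def accept(self, key: str, ts: float) -> bool:
        first = self._first_seen.get(key)
        if first is not None and ts - first < self.window:
            return False  # same key already kept inside the current window
        self._first_seen[key] = ts  # unseen key, or its previous window expired
        return True


def temporal_join(left: list[Event], right: list[Event], key: str,
                  window: float) -> list[tuple[Event, Event]]:
    """Pair left and right events that share `key` and whose `ts` differ by <= window."""
    # Index the right-hand stream by join key so each lookup is cheap.
    right_by_key: dict[Any, list[Event]] = defaultdict(list)
    for rhs in right:
        right_by_key[rhs[key]].append(rhs)

    pairs = []
    for lhs in left:
        for rhs in right_by_key.get(lhs[key], []):
            if abs(lhs["ts"] - rhs["ts"]) <= window:
                pairs.append((lhs, rhs))
    return pairs


if __name__ == "__main__":
    dedup = WindowDeduplicator(window=60)
    events = [("order-1", 0), ("order-1", 30), ("order-2", 45), ("order-1", 90)]
    print([e for e in events if dedup.accept(*e)])
    # [('order-1', 0), ('order-2', 45), ('order-1', 90)]

    orders = [{"user_id": "u1", "ts": 100}, {"user_id": "u2", "ts": 200}]
    users = [{"user_id": "u1", "ts": 95, "name": "Ada"},
             {"user_id": "u2", "ts": 10, "name": "Bo"}]
    print(temporal_join(orders, users, key="user_id", window=30))
    # [({'user_id': 'u1', 'ts': 100}, {'user_id': 'u1', 'ts': 95, 'name': 'Ada'})]
```

In the dedup example, the repeat of `order-1` at 30 s falls inside the 60 s window and is dropped, while the one at 90 s starts a new window. A real streaming version would also evict keys and buffered events once they fall outside the window, so memory stays bounded on unbounded Kafka topics.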

🆘 Support

⚖️ License

This project is licensed under the Apache License 2.0.
