<br />
<p align="center">
<a href="https://supabase.io">
<picture>
<img alt="Supabase Logo" width="100%" src="docs/assets/etl-logo-extended.png">
<a href="https://supabase.com">
<picture>
<img alt="ETL by Supabase" width="100%" src="docs/assets/etl-logo-extended.png">
</picture>
</a>

</p>
</p>

**ETL** is a Rust framework by [Supabase](https://supabase.com) for building high-performance, real-time data replication applications on PostgreSQL. It sits on top of Postgres [logical replication](https://www.postgresql.org/docs/current/protocol-logical-replication.html) and gives you a clean, Rust-native API for streaming changes to your own destinations.

## Highlights

- 🚀 Real‑time replication: stream changes as they happen
- ⚡ High performance: batching and parallel workers
- 🛡️ Fault tolerant: retries and recovery built in
- 🔧 Extensible: implement custom stores and destinations
- 🧭 Typed, ergonomic Rust API

## Get Started

Install via Git while we prepare for a crates.io release:

```toml
[dependencies]
etl = { git = "https://github.com/supabase/etl" }
```

Quick example using the in‑memory destination:

```rust
use etl::{
    config::{BatchConfig, PgConnectionConfig, PipelineConfig, TlsConfig},
    destination::memory::MemoryDestination,
    pipeline::Pipeline,
    store::both::memory::MemoryStore,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure the PostgreSQL connection.
    let pg = PgConnectionConfig {
        host: "localhost".into(),
        port: 5432,
        name: "mydb".into(),
        username: "postgres".into(),
        password: Some("password".into()),
        tls: TlsConfig { enabled: false, trusted_root_certs: String::new() },
    };

    // In-memory store and destination, handy for local testing.
    let store = MemoryStore::new();
    let destination = MemoryDestination::new();

    // Configure the pipeline.
    let config = PipelineConfig {
        id: 1,
        publication_name: "my_publication".into(),
        pg_connection: pg,
        batch: BatchConfig { max_size: 1000, max_fill_ms: 5000 },
        table_error_retry_delay_ms: 10_000,
        max_table_sync_workers: 4,
    };

    // Create and start the pipeline.
    let mut pipeline = Pipeline::new(config, store, destination);
    pipeline.start().await?;
    // pipeline.wait().await?; // Optional: block until the pipeline finishes.

    Ok(())
}
```
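
The example assumes that logical replication is enabled on the source database and that a publication named `my_publication` already exists. A minimal setup might look like this (the `orders` table is illustrative; replace it with the tables you want to replicate):

```bash
# Logical replication requires wal_level = logical (changing it needs a Postgres restart).
psql -d mydb -c "ALTER SYSTEM SET wal_level = 'logical';"

# Create a publication covering the tables you want to replicate.
psql -d mydb -c "CREATE PUBLICATION my_publication FOR TABLE orders;"
```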

For tutorials and deeper guidance, see the [Documentation](https://supabase.github.io/etl) or jump into the [examples](etl-examples/README.md).

## Destinations

ETL is designed to be extensible: you can implement your own destinations to send data anywhere you like (a rough sketch follows below). It also ships with a few built-in destinations:

- BigQuery

Support for Apache Iceberg and DuckDB is planned.

Out-of-the-box destinations are available in the `etl-destinations` crate. Enable the features you need:

```toml
[dependencies]
etl = { git = "https://github.com/supabase/etl" }
etl-destinations = { git = "https://github.com/supabase/etl", features = ["bigquery"] }
```
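
If none of the built-in destinations fit, you can write your own. The sketch below is only illustrative: the trait name, method signature, and types are assumptions made for the sake of a self-contained example, not the actual API of the `etl` crate, which may look quite different. It shows the general shape of a destination as a batch-write sink.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for a replicated change; real events carry typed column data.
#[derive(Debug, Clone)]
struct RowChange {
    table: String,
    payload: String,
}

// Hypothetical stand-in for the destination abstraction in the `etl` crate.
trait EventDestination {
    fn write_batch(&mut self, batch: Vec<RowChange>) -> Result<(), String>;
}

// Toy destination that logs changes and buffers them in memory, similar in
// spirit to `MemoryDestination`, but purely illustrative.
struct LoggingDestination {
    received: VecDeque<RowChange>,
}

impl EventDestination for LoggingDestination {
    fn write_batch(&mut self, batch: Vec<RowChange>) -> Result<(), String> {
        for change in batch {
            println!("replicated {}: {}", change.table, change.payload);
            self.received.push_back(change);
        }
        Ok(())
    }
}

fn main() {
    let mut destination = LoggingDestination { received: VecDeque::new() };
    destination
        .write_batch(vec![RowChange {
            table: "orders".into(),
            payload: "INSERT id=1".into(),
        }])
        .expect("write_batch failed");
}
```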

## Database Setup

Before running the examples, tests, or the API and replicator components, you'll need to set up a PostgreSQL database.
We provide a convenient script to help with this setup; for detailed instructions, see the [Database Setup Guide](docs/guides/database-setup.md).

## Running Tests

To run the test suite:

```bash
cargo test --all-features
```

## Docker

The repository includes Docker support for both the `replicator` and `api` components:

```bash
# Build replicator image
docker build -f ./etl-replicator/Dockerfile .

# Build api image
docker build -f ./etl-api/Dockerfile .
```

## Architecture

For a detailed explanation of the ETL architecture and design decisions, please refer to our [Design Document](docs/design/etl-crate-design.md).

## Troubleshooting

### Too Many Open Files Error

If you see the following error when running tests on macOS:

```
called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
```

Raise the limit of open files per process with:

```bash
ulimit -n 10000
```

### Performance Considerations

Currently, the system parallelizes the copying of different tables, but each individual table is still copied in sequential batches.
This limits performance for large tables. We plan to address this once the ETL system reaches greater stability.

## License

Distributed under the Apache-2.0 License. See `LICENSE` for more information.

---
