-**ETL** is a Rust framework by [Supabase](https://supabase.com) that enables you to build high-performance, real-time data replication applications for PostgreSQL. Whether you're creating ETL pipelines, implementing CDC (Change Data Capture), or building custom data synchronization solutions, ETL provides the building blocks you need.
+**ETL** is a Rust framework by [Supabase](https://supabase.com) that enables you to build high-performance, real-time data replication applications for PostgreSQL. Stream changes as they happen, route to multiple destinations, and build robust data pipelines with minimal complexity.
-Built on top of PostgreSQL's [logical streaming replication protocol](https://www.postgresql.org/docs/current/protocol-logical-replication.html), ETL handles the low-level complexities of database replication while providing a clean, Rust-native API that guides you towards the pit of success.
+Built on PostgreSQL's [logical replication protocol](https://www.postgresql.org/docs/current/protocol-logical-replication.html), ETL handles the complexities so you can focus on your data.
-## Table of Contents
+## Key Features
-- [Features](#features)
-- [Installation](#installation)
-- [Quickstart](#quickstart)
-- [Database Setup](#database-setup)
-- [Running Tests](#running-tests)
-- [Docker](#docker)
-- [Architecture](#architecture)
-- [Troubleshooting](#troubleshooting)
-- [License](#license)
+- **Real-time streaming** - Changes flow instantly from PostgreSQL
+- **Multiple destinations** - BigQuery, custom APIs, and more
+- **Built-in resilience** - Automatic retries and recovery
+- **High performance** - Efficient batching and parallel processing
+- **Extensible** - Plugin architecture for any destination
-## Features
-
-**Core Capabilities:**
-- π **Real-time replication**: Stream changes from PostgreSQL as they happen
-- π **Multiple destinations**: Support for various data warehouses and databases (coming soon)
-- π‘οΈ **Fault tolerance**: Built-in error handling, retries, and recovery mechanisms
-- β‘ **High performance**: Efficient batching and parallel processing
-- π§ **Extensible**: Plugin architecture for custom destinations
-
-**Supported Destinations:**
-- [x] **BigQuery** - Google Cloud's data warehouse
-- [ ] **Apache Iceberg** (planned) - Open table format for analytics
-- [ ] **DuckDB** (planned) - In-process analytical database
-
-## Installation
-
-Add ETL to your Rust project via git dependencies in `Cargo.toml`:
-
-```toml
-[dependencies]
-etl = { git = "https://github.com/supabase/etl" }
-```
-
-> **Note**: ETL is currently distributed via Git while we prepare for the initial crates.io release.
-
-## Quickstart
-
-Get up and running with ETL in minutes using the built-in memory destination:
+## Quick Start
```rust
-use etl::config::{BatchConfig, PgConnectionConfig, PipelineConfig, TlsConfig};
-use etl::pipeline::Pipeline;
-use etl::destination::memory::MemoryDestination;
-use etl::store::both::memory::MemoryStore;
+use etl::{
+ config::{BatchConfig, PgConnectionConfig, PipelineConfig, TlsConfig},
+ destination::memory::MemoryDestination,
+ pipeline::Pipeline,
+ store::both::memory::MemoryStore,
+};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Configure PostgreSQL connection
- let pg_connection_config = PgConnectionConfig {
+ let pg_config = PgConnectionConfig {
host: "localhost".to_string(),
port: 5432,
name: "mydb".to_string(),
username: "postgres".to_string(),
- password: Some("password".into()),
- tls: TlsConfig {
- trusted_root_certs: String::new(),
- enabled: false,
- },
+ password: Some("password".to_string().into()),
+ tls: TlsConfig { enabled: false, trusted_root_certs: String::new() },
};
+ // Create memory-based store and destination for testing
+ let store = MemoryStore::new();
+ let destination = MemoryDestination::new();
+
// Configure the pipeline
- let pipeline_config = PipelineConfig {
+ let config = PipelineConfig {
id: 1,
publication_name: "my_publication".to_string(),
- pg_connection: pg_connection_config,
- batch: BatchConfig {
- max_size: 1000,
- max_fill_ms: 5000,
- },
+ pg_connection: pg_config,
+ batch: BatchConfig { max_size: 1000, max_fill_ms: 5000 },
table_error_retry_delay_ms: 10000,
max_table_sync_workers: 4,
};
- // Create in-memory store and destination for testing
- let store = MemoryStore::new();
- let destination = MemoryDestination::new();
-
// Create and start the pipeline
- let mut pipeline = Pipeline::new(1, pipeline_config, store, destination);
+ let mut pipeline = Pipeline::new(1, config, store, destination);
pipeline.start().await?;
-
+
+ // Pipeline will run until stopped
+ pipeline.wait().await?;
+
Ok(())
}
```
-**Need production destinations?** Add the `etl-destinations` crate with specific features:
-
-```toml
-[dependencies]
-etl = { git = "https://github.com/supabase/etl" }
-etl-destinations = { git = "https://github.com/supabase/etl", features = ["bigquery"] }
-```
+**Want to try it?** → [**Build your first pipeline in 15 minutes**](https://supabase.github.io/etl/tutorials/first-pipeline/)
-For comprehensive examples and tutorials, visit the [etl-examples](etl-examples/README.md) crate and our [documentation](https://supabase.github.io/etl).
+## Learn More
-## Database Setup
+Our comprehensive documentation covers everything you need:
-Before running the examples, tests, or the API and replicator components, you'll need to set up a PostgreSQL database.
-We provide a convenient script to help you with this setup. For detailed instructions on how to use the database setup script, please refer to our [Database Setup Guide](docs/guides/database-setup.md).
+- **[Tutorials](https://supabase.github.io/etl/tutorials/)** - Step-by-step learning experiences
+- **[How-To Guides](https://supabase.github.io/etl/how-to/)** - Practical solutions for common tasks
+- **[Reference](https://supabase.github.io/etl/reference/)** - Complete API documentation
+- **[Explanations](https://supabase.github.io/etl/explanation/)** - Architecture and design decisions
-## Running Tests
+## Installation
-To run the test suite:
+Add to your `Cargo.toml`:
-```bash
-cargo test --all-features
+```toml
+[dependencies]
+etl = { git = "https://github.com/supabase/etl" }
```
-## Docker
+> **Note**: ETL will be available on crates.io soon!
-The repository includes Docker support for both the `replicator` and `api` components:
+## Development
```bash
-# Build replicator image
-docker build -f ./etl-replicator/Dockerfile .
+# Run tests
+cargo test --all-features
-# Build api image
+# Build Docker images
+docker build -f ./etl-replicator/Dockerfile .
docker build -f ./etl-api/Dockerfile .
```
-## Architecture
-
-For a detailed explanation of the ETL architecture and design decisions, please refer to our [Design Document](docs/design/etl-crate-design.md).
+## License
-## Troubleshooting
-
-### Too Many Open Files Error
-
-If you see the following error when running tests on macOS:
-
-```
-called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
-```
-
-Raise the limit of open files per process with:
-
-```bash
-ulimit -n 10000
-```
-
-### Performance Considerations
-
-Currently, the system parallelizes the copying of different tables, but each individual table is still copied in sequential batches.
-This limits performance for large tables. We plan to address this once the ETL system reaches greater stability.
-
-## License
-
-Distributed under the Apache-2.0 License. See `LICENSE` for more information.
+Apache-2.0 License - see [`LICENSE`](LICENSE) for details.
---
+
diff --git a/docs/explanation/architecture.md b/docs/explanation/architecture.md
index b62abfa82..1d757e915 100644
--- a/docs/explanation/architecture.md
+++ b/docs/explanation/architecture.md
@@ -1,4 +1,216 @@
-# ETL Architecture
+---
+type: explanation
+title: ETL Architecture Overview
+last_reviewed: 2025-01-14
+---
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
+# ETL Architecture Overview
+
+**Understanding how ETL components work together to replicate data from PostgreSQL**
+
+ETL's architecture is built around a few key abstractions that work together to provide reliable, high-performance data replication. This document explains how these components interact and why they're designed the way they are.
+
+## The Big Picture
+
+At its core, ETL connects PostgreSQL's logical replication stream to configurable destination systems:
+
+```
+PostgreSQL              ETL Pipeline               Destination
++---------------+     +-------------------+     +---------------+
+| WAL Stream    |---->| Data Processing   |---->| BigQuery      |
+| Publications  |     | Batching          |     | Custom API    |
+| Repl. Slots   |     | Error Handling    |     | Memory        |
++---------------+     +-------------------+     +---------------+
+                               |
+                      +---------------+
+                      | State Store   |
+                      | Schema Info   |
+                      +---------------+
+```
+
+The architecture separates concerns to make the system extensible, testable, and maintainable.
+
+## Core Components
+
+### Pipeline: The Orchestrator
+
+The [`Pipeline`](../reference/pipeline/) is ETL's central component that coordinates all other parts:
+
+**Responsibilities:**
+- Establishes connection to PostgreSQL replication stream
+- Manages initial table synchronization ("backfill")
+- Processes ongoing change events from WAL
+- Coordinates batching and delivery to destinations
+- Handles errors and retries
+
+**Why this design?** By centralizing orchestration in one component, we can ensure consistent behavior across all operations while keeping the interface simple for users.
+
+### Destinations: Where Data Goes
+
+The [`Destination`](../reference/destination-trait/) trait defines how data leaves ETL:
+
+```rust
+trait Destination {
+ async fn write_batch(&mut self, batch: BatchedData) -> Result<(), DestinationError>;
+ async fn flush(&mut self) -> Result<(), DestinationError>;
+}
+```
+
+**Built-in implementations:**
+- [`MemoryDestination`](../reference/memory-destination/) - For testing and development
+- [`BigQueryDestination`](../reference/bigquery-destination/) - Google BigQuery integration
+
+**Why this abstraction?** The trait allows ETL to support any output system while providing consistent batching, error handling, and retry behavior. New destinations get all the pipeline reliability features automatically.
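+
+For example, a destination that just logs what it receives needs nothing beyond the two trait methods above. The sketch below is illustrative: it borrows the module paths used elsewhere in these docs (`etl::destination::base`, `etl::types::pipeline`) and assumes an `async_trait`-style trait, so treat the exact names and paths as assumptions rather than the precise API.
+
+```rust
+use async_trait::async_trait;
+use etl::destination::base::{Destination, DestinationError};
+use etl::types::pipeline::BatchedData;
+
+/// A destination that only prints batch sizes, handy for smoke-testing a pipeline.
+pub struct LoggingDestination;
+
+#[async_trait]
+impl Destination for LoggingDestination {
+    async fn write_batch(&mut self, batch: BatchedData) -> Result<(), DestinationError> {
+        // Batching, retries, and state tracking happen in the pipeline;
+        // the destination only persists (or here, prints) what it receives.
+        println!("received a batch of {} changes", batch.changes.len());
+        Ok(())
+    }
+
+    async fn flush(&mut self) -> Result<(), DestinationError> {
+        Ok(())
+    }
+}
+```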
+
+### Stores: Managing State and Schemas
+
+ETL uses two types of storage via the [`Store`](../reference/store-trait/) trait:
+
+**State storage** tracks replication progress:
+- WAL positions for recovery
+- Table synchronization status
+- Retry counters and backoff timers
+
+**Schema storage** manages table structures:
+- Column names and types
+- Primary key information
+- Schema evolution tracking
+
+**Implementation options:**
+- [`MemoryStore`](../reference/memory-store/) - Fast, but loses state on restart
+- [`PostgresStore`](../reference/postgres-store/) - Persistent, production-ready
+
+**Why separate storage?** This allows ETL to work in different deployment scenarios: development (memory), cloud-native (external databases), or embedded (SQLite, eventually).
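+
+Whichever implementation you choose, the store is simply passed to the pipeline at construction time, so swapping it is a one-line change. A minimal sketch using the quick-start API (a production setup would construct a persistent, Postgres-backed store here instead):
+
+```rust
+use etl::destination::memory::MemoryDestination;
+use etl::pipeline::Pipeline;
+use etl::store::both::memory::MemoryStore;
+
+// Development: fast, but WAL positions and schemas are lost on restart.
+let store = MemoryStore::new();
+
+// `config` is a PipelineConfig built as in the README quick start.
+let mut pipeline = Pipeline::new(1, config, store, MemoryDestination::new());
+```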
+
+## Data Flow Architecture
+
+### Initial Synchronization
+
+When a pipeline starts, ETL performs a full synchronization of existing data:
+
+1. **Discovery:** Query PostgreSQL catalogs to find tables in the publication
+2. **Schema capture:** Extract column information and primary keys
+3. **Snapshot:** Copy existing rows in batches to the destination
+4. **State tracking:** Record progress to support resumption
+
+This ensures the destination has complete data before processing real-time changes.
+
+### Ongoing Replication
+
+After initial sync, ETL processes the PostgreSQL WAL stream:
+
+1. **Stream connection:** Attach to the replication slot
+2. **Event parsing:** Decode WAL records into structured changes
+3. **Batching:** Group changes for efficient destination writes
+4. **Delivery:** Send batches to destinations with retry logic
+5. **Acknowledgment:** Confirm WAL position to PostgreSQL
+
+### Error Handling Strategy
+
+ETL's error handling follows a layered approach:
+
+**Transient errors** (network issues, destination overload):
+- Exponential backoff retry
+- Circuit breaker to prevent cascading failures
+- Eventual resumption from last known good state
+
+**Permanent errors** (schema mismatches, authentication failures):
+- Immediate pipeline halt
+- Clear error reporting to operators
+- Manual intervention required
+
+**Partial failures** (some tables succeed, others fail):
+- Per-table error tracking
+- Independent retry schedules
+- Healthy tables continue processing
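+
+For the transient-error path described above, the core mechanic is an exponential backoff loop around the destination write. A simplified, self-contained sketch of that idea (not ETL's actual implementation; limits and delays are illustrative):
+
+```rust
+use std::time::Duration;
+
+/// Delay before the nth retry: 2s, 4s, 8s, ... capped at 60s.
+fn backoff_delay(attempt: u32) -> Duration {
+    Duration::from_secs(2_u64.saturating_pow(attempt).min(60))
+}
+
+/// Retry an async write a bounded number of times before surfacing the error.
+async fn write_with_retries<F, Fut, E>(mut write: F, max_attempts: u32) -> Result<(), E>
+where
+    F: FnMut() -> Fut,
+    Fut: std::future::Future<Output = Result<(), E>>,
+{
+    let mut attempt = 0;
+    loop {
+        match write().await {
+            Ok(()) => return Ok(()),
+            Err(_) if attempt + 1 < max_attempts => {
+                attempt += 1;
+                tokio::time::sleep(backoff_delay(attempt)).await;
+            }
+            Err(err) => return Err(err),
+        }
+    }
+}
+```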
+
+## Scalability Patterns
+
+### Vertical Scaling
+
+ETL supports scaling up through configuration, as the sketch after this list shows:
+
+- **Batch sizes:** Larger batches for higher throughput
+- **Worker threads:** Parallel table synchronization
+- **Buffer sizes:** More memory for better batching
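+
+These knobs map directly onto `PipelineConfig` fields from the README quick start. A throughput-oriented example (values are illustrative starting points, not recommendations):
+
+```rust
+use etl::config::{BatchConfig, PipelineConfig};
+
+// `pg_config` is a PgConnectionConfig built as in the quick start.
+let config = PipelineConfig {
+    id: 1,
+    publication_name: "my_publication".to_string(),
+    pg_connection: pg_config,
+    // Larger batches trade a little latency for fewer destination writes.
+    batch: BatchConfig { max_size: 5_000, max_fill_ms: 10_000 },
+    table_error_retry_delay_ms: 10_000,
+    // More workers parallelize the initial copy across tables.
+    max_table_sync_workers: 8,
+};
+```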
+
+### Horizontal Scaling
+
+For massive databases, ETL supports:
+
+- **Multiple pipelines:** Split tables across different pipeline instances
+- **Destination sharding:** Route different tables to different destinations
+- **Read replicas:** Reduce load on primary database
+
+### Resource Management
+
+ETL is designed to be resource-predictable:
+
+- **Memory bounds:** Configurable limits on batch sizes and buffers
+- **Connection pooling:** Reuse PostgreSQL connections efficiently
+- **Backpressure:** Slow down if destinations can't keep up
+
+## Extension Points
+
+### Custom Destinations
+
+The [`Destination`](../reference/destination-trait/) trait makes it straightforward to add support for new output systems:
+
+- **REST APIs:** HTTP-based services
+- **Message queues:** Kafka, RabbitMQ, etc.
+- **Databases:** Any database with bulk insert capabilities
+- **File systems:** Parquet, JSON, CSV outputs
+
+### Custom Stores
+
+The [`Store`](../reference/store-trait/) trait allows different persistence strategies:
+
+- **Cloud databases:** RDS, CloudSQL, etc.
+- **Key-value stores:** Redis, DynamoDB
+- **Local storage:** SQLite, embedded databases
+
+### Plugin Architecture
+
+ETL's trait-based design enables:
+
+- **Runtime plugin loading:** Dynamic destination discovery
+- **Configuration-driven setup:** Choose implementations via config
+- **Testing isolation:** Mock implementations for unit tests
+
+## Design Philosophy
+
+### Correctness First
+
+ETL prioritizes data consistency over raw speed:
+- **At-least-once delivery:** Better to duplicate than lose data
+- **State durability:** Persist progress before acknowledging
+- **Schema safety:** Validate destination compatibility
+
+### Operational Simplicity
+
+ETL aims to be easy to operate:
+- **Clear error messages:** Actionable information for operators
+- **Predictable behavior:** Minimal configuration surprises
+- **Observable:** Built-in metrics and logging
+
+### Performance Where It Matters
+
+ETL optimizes the bottlenecks:
+- **Batching:** Amortize per-operation overhead
+- **Async I/O:** Maximize network utilization
+- **Zero-copy:** Minimize data copying where possible
+
+## Next Steps
+
+Now that you understand ETL's architecture:
+
+- **See it in action** → [Build your first pipeline](../tutorials/first-pipeline/)
+- **Learn about performance** → [Performance characteristics](performance/)
+- **Understand the foundation** → [PostgreSQL logical replication](replication/)
+- **Compare with alternatives** → [ETL vs. other tools](comparisons/)
+
+## See Also
+
+- [Design decisions](design/) - Why ETL is built the way it is
+- [Crate structure](crate-structure/) - How code is organized
+- [State management](state-management/) - Deep dive on state handling
\ No newline at end of file
diff --git a/docs/explanation/crate-structure.md b/docs/explanation/crate-structure.md
deleted file mode 100644
index 4450f2242..000000000
--- a/docs/explanation/crate-structure.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Crate Structure
-
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
diff --git a/docs/explanation/design.md b/docs/explanation/design.md
deleted file mode 100644
index 1e0d3261a..000000000
--- a/docs/explanation/design.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Design Philosophy
-
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
diff --git a/docs/explanation/index.md b/docs/explanation/index.md
index 445116398..92d766c35 100644
--- a/docs/explanation/index.md
+++ b/docs/explanation/index.md
@@ -1,4 +1,103 @@
-# Explanation
+---
+type: explanation
+title: Understanding ETL
+---
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
+# Explanations
+
+**Deep dives into ETL concepts, architecture, and design decisions**
+
+Explanations help you build mental models of how ETL works and why it's designed the way it is. These topics provide background knowledge, compare alternatives, and explore the reasoning behind key architectural choices.
+
+## Core Concepts
+
+### [ETL Architecture Overview](architecture/)
+**The big picture of how ETL components work together**
+
+Understand the relationship between pipelines, destinations, stores, and the PostgreSQL replication protocol. Learn how data flows through the system and where extension points exist.
+
+*Topics covered:* Component architecture, data flow, extension patterns, scalability considerations.
+
+### [Why Postgres Logical Replication?](replication/)
+**The foundation technology and its trade-offs**
+
+Explore how PostgreSQL's logical replication works, why ETL builds on this foundation, and how it compares to other change data capture approaches.
+
+*Topics covered:* WAL-based replication, publications and subscriptions, alternatives like triggers or polling, performance characteristics.
+
+### [Design Decisions and Trade-offs](design/)
+**Key choices that shape ETL's behavior**
+
+Learn about the major design decisions in ETL, the problems they solve, and the trade-offs they represent. Understanding these choices helps you use ETL effectively.
+
+*Topics covered:* Rust as implementation language, async architecture, batching strategy, error handling philosophy.
+
+## System Characteristics
+
+### [Performance and Scalability](performance/)
+**How ETL behaves under different loads and configurations**
+
+Understand ETL's performance characteristics, bottlenecks, and scaling patterns. Learn how different configuration choices affect throughput and resource usage.
+
+*Topics covered:* Throughput patterns, memory usage, network considerations, scaling strategies.
+
+### [Crate Structure and Organization](crate-structure/)
+**How ETL's modular design supports different use cases**
+
+Explore how ETL is organized into multiple crates, what each crate provides, and how they work together. Understand the reasoning behind this modular architecture.
+
+*Topics covered:* Core vs. optional crates, dependency management, feature flags, extensibility.
+
+## Integration Patterns
+
+### [Working with Destinations](destinations-explained/)
+**Understanding the destination abstraction and ecosystem**
+
+Learn how destinations work conceptually, why they're designed as they are, and how to choose between different destination options.
+
+*Topics covered:* Destination trait design, batching strategy, error handling patterns, building ecosystems.
+
+### [State Management Philosophy](state-management/)
+**How ETL tracks replication state and schema changes**
+
+Understand ETL's approach to managing replication state, handling schema evolution, and ensuring consistency across restarts.
+
+*Topics covered:* State storage options, schema change handling, consistency guarantees, recovery behavior.
+
+## Broader Context
+
+### [ETL vs. Other Replication Tools](comparisons/)
+**How ETL fits in the data replication landscape**
+
+Compare ETL to other PostgreSQL replication tools, general-purpose ETL systems, and cloud-managed solutions. Understand when to choose each approach.
+
+*Topics covered:* Tool comparisons, use case fit, ecosystem integration, operational trade-offs.
+
+### [Future Directions](roadmap/)
+**Where ETL is heading and how to influence its evolution**
+
+Learn about planned features, architectural improvements, and community priorities. Understand how to contribute to ETL's development.
+
+*Topics covered:* Planned features, architectural evolution, community involvement, contribution guidelines.
+
+## Reading Guide
+
+**New to data replication?** Start with [Postgres Logical Replication](replication/) to understand the foundation technology.
+
+**Coming from other tools?** Jump to [ETL vs. Other Tools](comparisons/) to see how ETL fits in the landscape.
+
+**Planning a production deployment?** Read [Architecture](architecture/) and [Performance](performance/) to understand system behavior.
+
+**Building extensions?** Focus on [Crate Structure](crate-structure/) and [Destinations](destinations-explained/) for extension patterns.
+
+## Next Steps
+
+After building conceptual understanding:
+- **Start building** → [Tutorials](../tutorials/)
+- **Solve specific problems** → [How-To Guides](../how-to/)
+- **Look up technical details** → [Reference](../reference/)
+
+## Contributing to Explanations
+
+Found gaps in these explanations? See something that could be clearer?
+[Open an issue](https://github.com/supabase/etl/issues) or contribute improvements to help other users build better mental models of ETL.
\ No newline at end of file
diff --git a/docs/explanation/performance.md b/docs/explanation/performance.md
deleted file mode 100644
index c963daad9..000000000
--- a/docs/explanation/performance.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Performance Model
-
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
diff --git a/docs/explanation/replication.md b/docs/explanation/replication.md
index 04613fb6e..2aa7a5442 100644
--- a/docs/explanation/replication.md
+++ b/docs/explanation/replication.md
@@ -1,4 +1,271 @@
-# Replication Protocol
+---
+type: explanation
+title: Why PostgreSQL Logical Replication?
+last_reviewed: 2025-01-14
+---
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
+# Why PostgreSQL Logical Replication?
+
+**Understanding the foundation technology that powers ETL and its advantages over alternatives**
+
+PostgreSQL logical replication is the core technology that ETL builds upon. This document explains how it works, why it's well-suited for ETL use cases, and how it compares to other change data capture approaches.
+
+## What is Logical Replication?
+
+Logical replication streams changes from PostgreSQL databases at the **logical level** (rows and operations) rather than the **physical level** (disk blocks and binary changes). This means ETL receives structured, interpretable data changes that can be easily transformed and routed to different destinations.
+
+### Key Characteristics
+
+- **Row-based:** Changes are captured as individual row operations (INSERT, UPDATE, DELETE)
+- **Selective:** Choose which tables to replicate via publications
+- **Real-time:** Changes stream immediately as they're committed
+- **Durable:** Uses PostgreSQL's Write-Ahead Log (WAL) for reliability
+- **Ordered:** Changes arrive in commit order within each table
+
+## How Logical Replication Works
+
+### The WAL-Based Foundation
+
+PostgreSQL's logical replication is built on its Write-Ahead Log (WAL):
+
+1. **Transaction commits** are written to WAL before being applied to data files
+2. **Logical decoding** translates WAL entries into structured change events
+3. **Replication slots** track which changes have been consumed
+4. **Publications** define which tables and operations to replicate
+
+```
+Application              PostgreSQL              ETL Pipeline
+     |                       |                        |
+     |------ INSERT -------->|                        |
+     |                       |------ WAL entry ------>|
+     |                       |                        |--> Structured change
+     |                       |                        |    (table, operation, data)
+     |<----- SUCCESS --------|                        |
+```
+
+### Publications and Subscriptions
+
+**Publications** define what to replicate:
+
+```sql
+-- Replicate specific tables
+CREATE PUBLICATION app_data FOR TABLE users, orders, products;
+
+-- Replicate all tables (use with caution)
+CREATE PUBLICATION all_data FOR ALL TABLES;
+
+-- Replicate only specific operations
+CREATE PUBLICATION inserts_only FOR TABLE users WITH (publish = 'insert');
+```
+
+**Replication slots** track consumption:
+
+```sql
+-- ETL creates and manages these automatically
+SELECT pg_create_logical_replication_slot('etl_slot', 'pgoutput');
+```
+
+### Data Consistency Guarantees
+
+Logical replication provides strong consistency:
+
+- **Transactional consistency:** All changes from a transaction arrive together
+- **Ordering guarantees:** Changes within a table maintain commit order
+- **Durability:** WAL ensures no committed changes are lost
+- **At-least-once delivery:** Changes may be delivered multiple times but never lost
+
+## Why ETL Uses Logical Replication
+
+### Real-Time Performance
+
+Unlike polling-based approaches, logical replication provides **immediate change notification**:
+
+- **Low latency:** Changes stream as they happen (milliseconds to seconds)
+- **No database overhead:** No impact on application queries
+- **Efficient bandwidth:** Only actual changes are transmitted
+
+### Operational Simplicity
+
+Logical replication is **built into PostgreSQL**:
+
+- **No triggers to maintain:** Changes are captured automatically
+- **No application changes:** Existing applications work unchanged
+- **Reliable recovery:** Built-in WAL retention and replay
+- **Minimal configuration:** Just enable logical replication and create publications
+
+### Complete Change Capture
+
+Captures **all types of changes**:
+
+- **DML operations:** INSERT, UPDATE, DELETE operations
+- **Bulk operations:** COPY, bulk updates, and imports
+- **Transaction boundaries:** Commit and rollback information
+- **Schema information:** Column types and table structure
+
+## Comparing Replication Approaches
+
+### Logical Replication vs. Physical Replication
+
+| Aspect | Logical Replication | Physical Replication |
+|--------|-------------------|-------------------|
+| **Granularity** | Table/row level | Entire database cluster |
+| **Selectivity** | Choose specific tables | All or nothing |
+| **Version compatibility** | Cross-version support | Same major version only |
+| **Overhead** | Moderate (logical decoding) | Low (binary copy) |
+| **Use case** | ETL, selective sync | Backup, disaster recovery |
+
+### Logical Replication vs. Trigger-Based CDC
+
+| Aspect | Logical Replication | Trigger-Based CDC |
+|--------|-------------------|-----------------|
+| **Performance impact** | Minimal on source | High (trigger execution) |
+| **Change coverage** | All operations including bulk | Only row-by-row operations |
+| **Maintenance** | Built-in PostgreSQL feature | Custom triggers to maintain |
+| **Reliability** | WAL-based durability | Depends on trigger implementation |
+| **Schema changes** | Handles automatically | Triggers need updates |
+
+### Logical Replication vs. Query-Based Polling
+
+| Aspect | Logical Replication | Query-Based Polling |
+|--------|-------------------|-------------------|
+| **Latency** | Real-time (seconds) | Polling interval (minutes) |
+| **Source load** | Minimal | Repeated full table scans |
+| **Delete detection** | Automatic | Requires soft deletes |
+| **Infrastructure** | Simple (ETL + PostgreSQL) | Complex (schedulers, state tracking) |
+| **Change ordering** | Guaranteed | Can miss intermediate states |
+
+## Limitations and Considerations
+
+### What Logical Replication Doesn't Capture
+
+- **DDL operations:** Schema changes (CREATE, ALTER, DROP) are not replicated
+- **TRUNCATE operations:** Only replicated on PostgreSQL 11+, where they're controlled by the publication's `publish` option
+- **Sequence changes:** nextval() calls on sequences
+- **Large object changes:** BLOB/CLOB modifications
+- **Temporary table operations:** Temp tables are not replicated
+
+### Performance Considerations
+
+**WAL generation overhead:**
+- Logical replication increases WAL volume by ~10-30%
+- More detailed logging required for logical decoding
+- May require WAL retention tuning for catch-up scenarios
+
+**Replication slot management:**
+- Unused slots prevent WAL cleanup (disk space growth)
+- Slow consumers can cause WAL buildup
+- Need monitoring and automatic cleanup
+
+**Network bandwidth:**
+- All change data flows over network
+- Large transactions can cause bandwidth spikes
+- Consider batching and compression for high-volume scenarios
+
+## ETL's Enhancements to Logical Replication
+
+ETL builds on PostgreSQL's logical replication with additional features:
+
+### Intelligent Batching
+
+- **Configurable batch sizes:** Balance latency vs. throughput
+- **Time-based batching:** Ensure maximum latency bounds
+- **Backpressure handling:** Slow down if destinations can't keep up
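+
+In configuration terms, the batching trade-off above is expressed through `BatchConfig` (field names as in the README quick start; values are illustrative):
+
+```rust
+use etl::config::BatchConfig;
+
+// Low latency: ship small batches quickly.
+let low_latency = BatchConfig { max_size: 100, max_fill_ms: 500 };
+
+// High throughput: let batches fill up before writing.
+let high_throughput = BatchConfig { max_size: 10_000, max_fill_ms: 30_000 };
+```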
+
+### Error Handling and Recovery
+
+- **Retry logic:** Handle transient destination failures
+- **Circuit breakers:** Prevent cascade failures
+- **State persistence:** Resume from exact WAL positions after restarts
+
+### Multi-Destination Routing
+
+- **Fan-out replication:** Send same data to multiple destinations
+- **Selective routing:** Different tables to different destinations
+- **Transformation pipelines:** Modify data en route to destinations
+
+### Operational Features
+
+- **Metrics and monitoring:** Track replication lag, throughput, errors
+- **Schema change detection:** Automatic handling of table structure changes
+- **Resource management:** Memory and connection pooling
+
+## Use Cases and Patterns
+
+### Real-Time Analytics
+
+Stream transactional data to analytical systems:
+
+```
+PostgreSQL (OLTP) ----ETL----> BigQuery (OLAP)
+    |                              |
+    |-- Users insert orders        |-- Real-time dashboards
+    |-- Inventory updates          |-- Business intelligence
+    |-- Payment processing         |-- Data science workflows
+```
+
+### Event-Driven Architecture
+
+Use database changes as event sources:
+
+```
+PostgreSQL ----ETL----> Event Bus ------> Microservices
+    |                       |                  |
+    |-- Order created       |-- Events         |-- Email service
+    |-- User updated        |-- Topics         |-- Notification service
+    |-- Inventory low       |-- Streams        |-- Recommendation engine
+```
+
+### Data Lake Ingestion
+
+Continuously populate data lakes:
+
+```
+PostgreSQL ----ETL----> Data Lake ------> ML/Analytics
+    |                       |                  |
+    |-- App database        |-- Parquet        |-- Feature stores
+    |-- User behavior       |-- Delta          |-- Model training
+    |-- Business data       |-- Iceberg        |-- Batch processing
+```
+
+## Choosing Logical Replication
+
+**Logical replication is ideal when you need:**
+
+- Real-time or near real-time change capture
+- Selective table replication
+- Cross-version or cross-platform data movement
+- Minimal impact on source database performance
+- Built-in reliability and durability guarantees
+
+**Consider alternatives when you need:**
+
+- **Immediate consistency:** Use synchronous replication or 2PC
+- **Schema change replication:** Consider schema migration tools
+- **Cross-database replication:** Look at database-specific solutions
+- **Complex transformations:** ETL tools might be simpler
+
+## Future of Logical Replication
+
+PostgreSQL continues to enhance logical replication:
+
+- **Row-level security:** Filter replicated data by user permissions
+- **Binary protocol improvements:** Faster, more efficient encoding
+- **Cross-version compatibility:** Better support for version differences
+- **Performance optimizations:** Reduced overhead and increased throughput
+
+ETL evolves alongside these improvements, providing a stable interface while leveraging new capabilities as they become available.
+
+## Next Steps
+
+Now that you understand the foundation:
+
+- **See it in practice** → [ETL Architecture](architecture/)
+- **Compare alternatives** → [ETL vs. Other Tools](comparisons/)
+- **Build your first pipeline** → [First Pipeline Tutorial](../tutorials/first-pipeline/)
+- **Configure PostgreSQL** → [PostgreSQL Setup](../how-to/configure-postgres/)
+
+## See Also
+
+- [PostgreSQL Logical Replication Docs](https://www.postgresql.org/docs/current/logical-replication.html) - Official documentation
+- [Design decisions](design/) - Why ETL is built the way it is
+- [Performance characteristics](performance/) - Understanding ETL's behavior under load
\ No newline at end of file
diff --git a/docs/getting-started/first-pipeline.md b/docs/getting-started/first-pipeline.md
deleted file mode 100644
index 002d829a3..000000000
--- a/docs/getting-started/first-pipeline.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Your First Pipeline
-
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md
deleted file mode 100644
index a02e348a4..000000000
--- a/docs/getting-started/installation.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Installation
-
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md
deleted file mode 100644
index 56ac12024..000000000
--- a/docs/getting-started/quickstart.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Quick Start
-
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
diff --git a/docs/how-to/configure-postgres.md b/docs/how-to/configure-postgres.md
index b208e601d..0e42e743e 100644
--- a/docs/how-to/configure-postgres.md
+++ b/docs/how-to/configure-postgres.md
@@ -1,4 +1,326 @@
-# Configure PostgreSQL
+---
+type: how-to
+audience: developers, database administrators
+prerequisites:
+ - PostgreSQL server access with superuser privileges
+ - Understanding of PostgreSQL configuration
+ - Knowledge of PostgreSQL user management
+version_last_tested: 0.1.0
+last_reviewed: 2025-01-14
+risk_level: medium
+---
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
+# Configure PostgreSQL for Replication
+
+**Set up PostgreSQL with the correct permissions and settings for ETL logical replication**
+
+This guide walks you through configuring PostgreSQL to support logical replication for ETL, including WAL settings, user permissions, and publication setup.
+
+## Goal
+
+Configure PostgreSQL to:
+
+- Enable logical replication at the server level
+- Create appropriate user accounts with minimal required permissions
+- Set up publications for the tables you want to replicate
+- Configure replication slots for reliable WAL consumption
+
+## Prerequisites
+
+- PostgreSQL 12 or later
+- Superuser access to the PostgreSQL server
+- Ability to restart PostgreSQL server (for configuration changes)
+- Network connectivity from ETL to PostgreSQL
+
+## Decision Points
+
+**Choose your approach based on your environment:**
+
+| Environment | Security Level | Recommended Setup |
+|-------------|----------------|-------------------|
+| **Development** | Low | Single superuser account |
+| **Staging** | Medium | Dedicated replication user with specific permissions |
+| **Production** | High | Least-privilege user with row-level security |
+
+## Configuration Steps
+
+### Step 1: Enable Logical Replication
+
+Edit your PostgreSQL configuration file (usually `postgresql.conf`):
+
+```ini
+# Enable logical replication
+wal_level = logical
+
+# Increase max replication slots (default is 10)
+max_replication_slots = 20
+
+# Increase max WAL senders (default is 10)
+max_wal_senders = 20
+
+# Optional: allow more WAL between checkpoints for better performance
+max_wal_size = 2GB
+checkpoint_completion_target = 0.9
+```
+
+**If using PostgreSQL 13+**, also consider:
+
+```ini
+# Optional: terminate replication connections that stay idle longer than this
+wal_sender_timeout = 60s
+
+# Improve WAL retention for catching up
+wal_keep_size = 1GB
+```
+
+**Restart PostgreSQL** to apply these settings:
+
+```bash
+# On systemd systems
+sudo systemctl restart postgresql
+
+# On other systems
+sudo pg_ctl restart -D /path/to/data/directory
+```
+
+### Step 2: Create a Replication User
+
+Create a dedicated user with appropriate permissions:
+
+```sql
+-- Create replication user
+CREATE USER etl_replicator WITH PASSWORD 'secure_password_here';
+
+-- Grant replication privileges
+ALTER USER etl_replicator REPLICATION;
+
+-- Grant connection privileges
+GRANT CONNECT ON DATABASE your_database TO etl_replicator;
+
+-- Grant schema usage (adjust schema names as needed)
+GRANT USAGE ON SCHEMA public TO etl_replicator;
+
+-- Grant select on specific tables (more secure than all tables)
+GRANT SELECT ON TABLE users, orders, products TO etl_replicator;
+
+-- Alternative: Grant select on all tables in schema (less secure but easier)
+-- GRANT SELECT ON ALL TABLES IN SCHEMA public TO etl_replicator;
+-- ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO etl_replicator;
+```
+
+### Step 3: Configure Connection Security
+
+**For development (less secure):**
+
+Edit `pg_hba.conf` to allow connections:
+
+```
+# Allow local connections with password
+host your_database etl_replicator localhost md5
+
+# Allow connections from specific IP range
+host your_database etl_replicator 10.0.0.0/8 md5
+```
+
+**For production (more secure):**
+
+Use SSL/TLS connections:
+
+```
+# Require SSL connections
+hostssl your_database etl_replicator 10.0.0.0/8 md5
+```
+
+Reload PostgreSQL configuration:
+
+```sql
+SELECT pg_reload_conf();
+```
+
+### Step 4: Create Publications
+
+Connect as a superuser or table owner and create publications:
+
+```sql
+-- Create publication for specific tables
+CREATE PUBLICATION etl_publication FOR TABLE users, orders, products;
+
+-- Alternative: Create publication for all tables (use with caution)
+-- CREATE PUBLICATION etl_publication FOR ALL TABLES;
+
+-- View existing publications
+SELECT * FROM pg_publication;
+
+-- View tables in a publication
+SELECT * FROM pg_publication_tables WHERE pubname = 'etl_publication';
+```
+
+### Step 5: Test the Configuration
+
+Verify your setup works:
+
+```sql
+-- Test replication slot creation (as etl_replicator user)
+SELECT pg_create_logical_replication_slot('test_slot', 'pgoutput');
+
+-- Verify the slot was created
+SELECT * FROM pg_replication_slots WHERE slot_name = 'test_slot';
+
+-- Clean up test slot
+SELECT pg_drop_replication_slot('test_slot');
+```
+
+### Step 6: Configure ETL Connection
+
+Update your ETL configuration to use the new setup:
+
+```rust
+use etl::config::{PgConnectionConfig, TlsConfig};
+
+let pg_config = PgConnectionConfig {
+ host: "your-postgres-server.com".to_string(),
+ port: 5432,
+ name: "your_database".to_string(),
+ username: "etl_replicator".to_string(),
+ password: Some("secure_password_here".into()),
+ tls: TlsConfig {
+ enabled: true, // Enable for production
+ trusted_root_certs: "/path/to/ca-certificates.crt".to_string(),
+ },
+};
+```
+
+## Validation
+
+Verify your configuration:
+
+### Test 1: Connection Test
+
+```bash
+# Test connection from ETL server
+psql -h your-postgres-server.com -p 5432 -U etl_replicator -d your_database -c "SELECT 1;"
+```
+
+### Test 2: Replication Permissions
+
+```sql
+-- As etl_replicator user, verify you can:
+-- 1. Create replication slots
+SELECT pg_create_logical_replication_slot('validation_slot', 'pgoutput');
+
+-- 2. Read from tables in the publication
+SELECT COUNT(*) FROM users;
+
+-- 3. Access publication information
+SELECT * FROM pg_publication_tables WHERE pubname = 'etl_publication';
+
+-- Clean up
+SELECT pg_drop_replication_slot('validation_slot');
+```
+
+### Test 3: ETL Pipeline Test
+
+Run a simple ETL pipeline to verify end-to-end functionality:
+
+```rust
+// Use your configuration to create a test pipeline
+// This should complete initial sync successfully
+```
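+
+A minimal version of such a test, reusing `pg_config` from Step 6 inside an async `main` like the README quick start (publication name and IDs are placeholders):
+
+```rust
+use etl::config::{BatchConfig, PipelineConfig};
+use etl::destination::memory::MemoryDestination;
+use etl::pipeline::Pipeline;
+use etl::store::both::memory::MemoryStore;
+
+let config = PipelineConfig {
+    id: 1,
+    publication_name: "etl_publication".to_string(),
+    pg_connection: pg_config,
+    batch: BatchConfig { max_size: 1000, max_fill_ms: 5000 },
+    table_error_retry_delay_ms: 10_000,
+    max_table_sync_workers: 4,
+};
+
+// If this completes the initial sync without errors, replication is configured correctly.
+let mut pipeline = Pipeline::new(1, config, MemoryStore::new(), MemoryDestination::new());
+pipeline.start().await?;
+```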
+
+## Troubleshooting
+
+### "ERROR: logical decoding requires wal_level >= logical"
+
+**Solution:** Update `postgresql.conf` with `wal_level = logical` and restart PostgreSQL.
+
+### "ERROR: permission denied to create replication slot"
+
+**Solutions:**
+- Ensure user has `REPLICATION` privilege: `ALTER USER etl_replicator REPLICATION;`
+- Check if you're connecting to the right database
+- Verify `pg_hba.conf` allows the connection
+
+### "ERROR: publication does not exist"
+
+**Solutions:**
+- Verify publication name matches exactly: `SELECT * FROM pg_publication;`
+- Ensure you're connected to the correct database
+- Check if publication was created by another user
+
+### "Connection refused" or timeout issues
+
+**Solutions:**
+- Check `postgresql.conf` has `listen_addresses = '*'` (or specific IPs)
+- Verify `pg_hba.conf` allows your connection
+- Check firewall settings on PostgreSQL server
+- Confirm PostgreSQL is running: `sudo systemctl status postgresql`
+
+### "ERROR: too many replication slots"
+
+**Solutions:**
+- Increase `max_replication_slots` in `postgresql.conf`
+- Clean up unused replication slots: `SELECT pg_drop_replication_slot('unused_slot');`
+- Monitor slot usage: `SELECT * FROM pg_replication_slots;`
+
+## Security Best Practices
+
+### Principle of Least Privilege
+
+- **Don't use superuser accounts** for ETL in production
+- **Grant SELECT only on tables** that need replication
+- **Use specific database names** instead of template1 or postgres
+- **Limit connection sources** with specific IP ranges in pg_hba.conf
+
+### Network Security
+
+- **Always use SSL/TLS** in production: `hostssl` in pg_hba.conf
+- **Use certificate authentication** for highest security
+- **Restrict network access** with firewalls and VPCs
+- **Monitor connections** with log_connections = on
+
+### Operational Security
+
+- **Rotate passwords regularly** for replication users
+- **Monitor replication slots** for unused or stalled slots
+- **Set up alerting** for replication lag and failures
+- **Audit publication changes** in your change management process
+
+## Performance Considerations
+
+### WAL Configuration
+
+```ini
+# For high-throughput systems
+wal_buffers = 16MB
+checkpoint_completion_target = 0.9
+wal_writer_delay = 200ms
+commit_delay = 1000
+```
+
+### Monitoring Queries
+
+Track replication performance:
+
+```sql
+-- Monitor replication lag
+SELECT
+ slot_name,
+ pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) as lag
+FROM pg_replication_slots;
+
+-- Monitor WAL generation rate
+SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')) as total_wal;
+```
+
+## Next Steps
+
+- **Build your first pipeline** → [First ETL Pipeline](../tutorials/first-pipeline/)
+- **Handle schema changes** → [Schema Change Management](schema-changes/)
+- **Optimize performance** → [Performance Tuning](performance/)
+- **Set up monitoring** → [Debugging Guide](debugging/)
+
+## See Also
+
+- [PostgreSQL Logical Replication Documentation](https://www.postgresql.org/docs/current/logical-replication.html)
+- [ETL Architecture](../explanation/architecture/) - Understanding how ETL uses these settings
+- [Connection Configuration Reference](../reference/pg-connection-config/) - All available connection options
\ No newline at end of file
diff --git a/docs/how-to/custom-destinations.md b/docs/how-to/custom-destinations.md
index 2a4a34346..87a0c1d9b 100644
--- a/docs/how-to/custom-destinations.md
+++ b/docs/how-to/custom-destinations.md
@@ -1,4 +1,293 @@
-# Implement Custom Destinations
+---
+type: how-to
+audience: developers
+prerequisites:
+ - Complete first pipeline tutorial
+ - Rust async/await knowledge
+ - Understanding of your target system's API
+version_last_tested: 0.1.0
+last_reviewed: 2025-01-14
+risk_level: medium
+---
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
+# Build Custom Destinations
+
+**Create destination implementations for systems not supported out of the box**
+
+This guide walks you through implementing the [`Destination`](../../reference/destination-trait/) trait to send replicated data to custom storage systems, APIs, or data warehouses.
+
+## Goal
+
+Build a custom destination that receives batched data changes from ETL and writes them to your target system with proper error handling and retry logic.
+
+## Prerequisites
+
+- Completed [first pipeline tutorial](../../tutorials/first-pipeline/)
+- Access to your target system (database, API, etc.)
+- Understanding of your target system's data ingestion patterns
+- Rust knowledge of traits and async programming
+
+## Decision Points
+
+**Choose your approach based on your target system:**
+
+| Target System | Key Considerations | Recommended Pattern |
+|---------------|-------------------|-------------------|
+| **REST API** | Rate limiting, authentication | Batch with retry backoff |
+| **Database** | Transaction support, connection pooling | Bulk insert transactions |
+| **File System** | File formats, compression | Append or rotate files |
+| **Message Queue** | Ordering guarantees, partitioning | Individual message sending |
+
+## Implementation Steps
+
+### Step 1: Define Your Destination Struct
+
+Create a new file `src/my_destination.rs`:
+
+```rust
+use etl::destination::base::{Destination, DestinationError};
+use etl::types::pipeline::BatchedData;
+use async_trait::async_trait;
+
+pub struct MyCustomDestination {
+ // Configuration fields
+ api_endpoint: String,
+ auth_token: String,
+ batch_size: usize,
+}
+
+impl MyCustomDestination {
+ pub fn new(api_endpoint: String, auth_token: String) -> Self {
+ Self {
+ api_endpoint,
+ auth_token,
+ batch_size: 1000,
+ }
+ }
+}
+```
+
+### Step 2: Implement the Destination Trait
+
+Add the core trait implementation:
+
+```rust
+#[async_trait]
+impl Destination for MyCustomDestination {
+ async fn write_batch(&mut self, batch: BatchedData) -> Result<(), DestinationError> {
+ // Convert ETL data to your target format
+ let payload = self.convert_batch_to_target_format(&batch)?;
+
+ // Send to your target system with retries
+ self.send_with_retries(payload).await?;
+
+ Ok(())
+ }
+
+ async fn flush(&mut self) -> Result<(), DestinationError> {
+ // Implement any final cleanup or flush logic
+ Ok(())
+ }
+}
+```
+
+### Step 3: Implement Data Conversion
+
+Add conversion logic specific to your target system:
+
+```rust
+impl MyCustomDestination {
+ fn convert_batch_to_target_format(&self, batch: &BatchedData) -> Result<String, DestinationError> {
+ let mut records = Vec::new();
+
+ for change in &batch.changes {
+ match change.operation {
+ Operation::Insert => {
+ records.push(json!({
+ "action": "insert",
+ "table": change.table_name,
+ "data": change.new_values,
+ "timestamp": change.timestamp
+ }));
+ }
+ Operation::Update => {
+ records.push(json!({
+ "action": "update",
+ "table": change.table_name,
+ "old_data": change.old_values,
+ "new_data": change.new_values,
+ "timestamp": change.timestamp
+ }));
+ }
+ Operation::Delete => {
+ records.push(json!({
+ "action": "delete",
+ "table": change.table_name,
+ "data": change.old_values,
+ "timestamp": change.timestamp
+ }));
+ }
+ }
+ }
+
+ serde_json::to_string(&records)
+ .map_err(|e| DestinationError::SerializationError(e.to_string()))
+ }
+}
+```
+
+### Step 4: Add Error Handling and Retries
+
+Implement robust error handling:
+
+```rust
+impl MyCustomDestination {
+ async fn send_with_retries(&self, payload: String) -> Result<(), DestinationError> {
+ let mut attempts = 0;
+ let max_attempts = 3;
+
+ while attempts < max_attempts {
+ match self.send_to_target(&payload).await {
+ Ok(_) => return Ok(()),
+ Err(e) if self.is_retryable_error(&e) => {
+ attempts += 1;
+ if attempts < max_attempts {
+ let backoff_ms = 2_u64.pow(attempts) * 1000;
+ tokio::time::sleep(Duration::from_millis(backoff_ms)).await;
+ continue;
+ }
+ }
+ Err(e) => return Err(e),
+ }
+ }
+
+ Err(DestinationError::RetryExhausted(format!("Failed after {} attempts", max_attempts)))
+ }
+
+ async fn send_to_target(&self, payload: &str) -> Result<(), DestinationError> {
+ let client = reqwest::Client::new();
+ let response = client
+ .post(&self.api_endpoint)
+ .header("Authorization", format!("Bearer {}", self.auth_token))
+ .header("Content-Type", "application/json")
+ .body(payload.to_string())
+ .send()
+ .await
+ .map_err(|e| DestinationError::NetworkError(e.to_string()))?;
+
+ if !response.status().is_success() {
+ return Err(DestinationError::HttpError(
+ response.status().as_u16(),
+ format!("Request failed: {}", response.text().await.unwrap_or_default())
+ ));
+ }
+
+ Ok(())
+ }
+
+ fn is_retryable_error(&self, error: &DestinationError) -> bool {
+ match error {
+ DestinationError::NetworkError(_) => true,
+ DestinationError::HttpError(status, _) => {
+ // Retry on 5xx server errors and some 4xx errors
+ *status >= 500 || *status == 429
+ }
+ _ => false,
+ }
+ }
+}
+```
+
+### Step 5: Use Your Custom Destination
+
+In your main application:
+
+```rust
+use etl::pipeline::Pipeline;
+use etl::store::both::memory::MemoryStore;
+
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+ let store = MemoryStore::new();
+ let destination = MyCustomDestination::new(
+ "https://api.example.com/ingest".to_string(),
+ "your-auth-token".to_string()
+ );
+
+ let mut pipeline = Pipeline::new(1, pipeline_config, store, destination);
+ pipeline.start().await?;
+
+ Ok(())
+}
+```
+
+## Validation
+
+Test your custom destination:
+
+1. **Unit tests** for data conversion logic
+2. **Integration tests** with a test target system
+3. **Error simulation** to verify retry behavior
+4. **Load testing** with realistic data volumes
+
+```rust
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[tokio::test]
+ async fn test_data_conversion() {
+ let destination = MyCustomDestination::new(
+ "http://test".to_string(),
+ "token".to_string()
+ );
+
+ // Create test batch
+ let batch = create_test_batch();
+
+ // Test conversion
+ let result = destination.convert_batch_to_target_format(&batch);
+ assert!(result.is_ok());
+
+ // Verify JSON structure
+ let json: serde_json::Value = serde_json::from_str(&result.unwrap()).unwrap();
+ assert!(json.is_array());
+ }
+}
+```
+
+## Troubleshooting
+
+**Data not appearing in target system:**
+- Enable debug logging to see conversion output
+- Check target system's ingestion logs
+- Verify authentication credentials
+
+**High error rates:**
+- Review retry logic and backoff timing
+- Check if target system has rate limits
+- Consider implementing circuit breaker pattern
+
+**Performance issues:**
+- Profile data conversion logic
+- Consider batch size tuning
+- Implement connection pooling for database destinations
+
+## Rollback
+
+If your destination isn't working:
+1. Switch back to [`MemoryDestination`](../../reference/memory-destination/) for testing
+2. Check ETL logs for specific error messages
+3. Test destination logic in isolation
+
+## Next Steps
+
+- **Add monitoring** → [Performance monitoring](performance/)
+- **Handle schema changes** → [Schema change handling](schema-changes/)
+- **Production deployment** → [Debugging guide](debugging/)
+
+## See Also
+
+- [Destination API Reference](../../reference/destination-trait/) - Complete trait documentation
+- [BigQuery destination example](https://github.com/supabase/etl/blob/main/etl-destinations/src/bigquery/) - Real-world implementation
+- [Error handling patterns](../../explanation/error-handling/) - Best practices for error management
\ No newline at end of file
diff --git a/docs/how-to/debugging.md b/docs/how-to/debugging.md
index 199c490ef..92a509e2b 100644
--- a/docs/how-to/debugging.md
+++ b/docs/how-to/debugging.md
@@ -1,4 +1,490 @@
-# Debug Replication Issues
+---
+type: how-to
+audience: developers, operators
+prerequisites:
+ - Basic understanding of ETL pipelines
+ - Access to PostgreSQL and ETL logs
+ - Familiarity with ETL configuration
+version_last_tested: 0.1.0
+last_reviewed: 2025-01-14
+risk_level: low
+---
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
+# Debug Pipeline Issues
+
+**Diagnose and resolve common ETL pipeline problems quickly and systematically**
+
+This guide helps you identify, diagnose, and fix issues with ETL pipelines using a structured troubleshooting approach.
+
+## Goal
+
+Learn to systematically debug ETL issues:
+
+- Identify the source of pipeline problems
+- Use logging and monitoring to diagnose issues
+- Apply appropriate fixes for common failure patterns
+- Prevent similar issues in the future
+
+## Prerequisites
+
+- Running ETL pipeline (even if failing)
+- Access to PostgreSQL server and logs
+- ETL application logs and configuration
+- Basic SQL knowledge for diagnostic queries
+
+## Decision Points
+
+**Choose your debugging approach based on symptoms:**
+
+| Symptom | Most Likely Cause | Start Here |
+|---------|-------------------|------------|
+| Pipeline won't start | Configuration/connection issues | [Connection Problems](#connection-problems) |
+| Pipeline starts but no data | Publication/replication setup | [Replication Issues](#replication-issues) |
+| Pipeline stops unexpectedly | Resource/permission problems | [Runtime Failures](#runtime-failures) |
+| Data missing or incorrect | Schema/destination issues | [Data Quality Problems](#data-quality-problems) |
+| Slow performance | Batching/network issues | [Performance Issues](#performance-issues) |
+
+## Systematic Debugging Process
+
+### Step 1: Gather Information
+
+Before diving into fixes, collect diagnostic information:
+
+**Check ETL logs:**
+```bash
+# If using structured logging
+grep -E "(ERROR|FATAL|PANIC)" etl.log | tail -20
+
+# Look for specific patterns
+grep "connection" etl.log
+grep "replication slot" etl.log
+grep "publication" etl.log
+```
+
+**Check PostgreSQL logs:**
+```sql
+-- Recent PostgreSQL errors
+SELECT pg_current_logfile();
+-- Then check that file for errors around your pipeline start time
+```
+
+**Collect system information:**
+```sql
+-- Check replication slots
+SELECT slot_name, slot_type, active, confirmed_flush_lsn
+FROM pg_replication_slots;
+
+-- Check publications
+SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete
+FROM pg_publication;
+
+-- Check database connections
+SELECT pid, usename, application_name, state, query_start
+FROM pg_stat_activity
+WHERE application_name LIKE '%etl%';
+```
+
+### Step 2: Identify the Problem Category
+
+Use this decision tree to narrow down the issue:
+
+```
+Pipeline fails to start?
+|- YES -> Connection Problems
+|- NO  -> Pipeline starts but...
+    |- No data flowing     -> Replication Issues
+    |- Pipeline crashes    -> Runtime Failures
+    |- Wrong/missing data  -> Data Quality Problems
+    |- Slow performance    -> Performance Issues
+```
+
+## Common Problem Categories
+
+### Connection Problems
+
+**Symptoms:**
+- "Connection refused" errors
+- "Authentication failed" errors
+- "Database does not exist" errors
+- Pipeline exits immediately on startup
+
+**Diagnosis:**
+
+```bash
+# Test basic connection
+psql -h your-host -p 5432 -U etl_user -d your_db -c "SELECT 1;"
+
+# Test from ETL server specifically
+# (run this from where ETL runs)
+telnet your-host 5432
+```
+
+**Common causes and fixes:**
+
+| Error Message | Cause | Fix |
+|--------------|-------|-----|
+| "Connection refused" | PostgreSQL not running or firewall | Check `systemctl status postgresql` and firewall rules |
+| "Authentication failed" | Wrong password/user | Verify credentials and `pg_hba.conf` |
+| "Database does not exist" | Wrong database name | Check database name in connection string |
+| "SSL required" | TLS configuration mismatch | Update `TlsConfig` to match server requirements |
+
+### Replication Issues
+
+**Symptoms:**
+- Pipeline starts successfully but no data flows
+- "Publication not found" errors
+- "Replication slot already exists" errors
+- Initial sync never completes
+
+**Diagnosis:**
+
+```sql
+-- Check if publication exists and has tables
+SELECT schemaname, tablename
+FROM pg_publication_tables
+WHERE pubname = 'your_publication_name';
+
+-- Check if replication slot is active
+SELECT slot_name, active, confirmed_flush_lsn
+FROM pg_replication_slots
+WHERE slot_name = 'your_slot_name';
+
+-- Check table permissions
+SELECT grantee, table_schema, table_name, privilege_type
+FROM information_schema.role_table_grants
+WHERE grantee = 'etl_user' AND table_name = 'your_table';
+```
+
+**Common fixes:**
+
+**Publication doesn't exist:**
+```sql
+CREATE PUBLICATION your_publication FOR TABLE table1, table2;
+```
+
+**No tables in publication:**
+```sql
+-- Add tables to existing publication
+ALTER PUBLICATION your_publication ADD TABLE missing_table;
+```
+
+**Permission denied on tables:**
+```sql
+GRANT SELECT ON TABLE your_table TO etl_user;
+```
+
+**Stale replication slot:**
+```sql
+-- Drop and recreate (will lose position)
+SELECT pg_drop_replication_slot('stale_slot_name');
+```
+
+### Runtime Failures
+
+**Symptoms:**
+- Pipeline runs for a while then crashes
+- "Out of memory" errors
+- "Too many open files" errors
+- Destination write failures
+
+**Diagnosis:**
+
+```bash
+# Check system resources
+htop # or top
+df -h # disk space
+ulimit -n # file descriptor limit
+
+# Check ETL memory usage
+ps aux | grep etl
+```
+
+**Common fixes:**
+
+**Memory issues:**
+```rust
+// Reduce batch sizes in configuration
+BatchConfig {
+ max_size: 500, // Reduce from 1000+
+ max_fill_ms: 2000,
+}
+```
+
+**File descriptor limits:**
+```bash
+# Temporary fix
+ulimit -n 10000
+
+# Permanent fix (add to /etc/security/limits.conf)
+etl_user soft nofile 65536
+etl_user hard nofile 65536
+```
+
+**Destination timeouts:**
+Add retry configuration or connection pooling, and check the destination system's health and capacity.
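+
+If the destination only fails intermittently, wrapping writes in a retry with exponential backoff can smooth over transient timeouts. Below is a minimal, generic sketch; the helper is not part of the ETL API, and the attempt counts and delays are assumptions to tune for your system:
+
+```rust
+use std::time::Duration;
+
+/// Retry an async operation with exponential backoff (1s, 2s, 4s, ...).
+async fn retry_with_backoff<F, Fut, T, E>(mut operation: F, max_attempts: u32) -> Result<T, E>
+where
+    F: FnMut() -> Fut,
+    Fut: std::future::Future<Output = Result<T, E>>,
+{
+    let mut attempt = 0;
+    loop {
+        match operation().await {
+            Ok(value) => return Ok(value),
+            // Out of attempts: surface the last error.
+            Err(err) if attempt + 1 >= max_attempts => return Err(err),
+            Err(_) => {
+                attempt += 1;
+                tokio::time::sleep(Duration::from_millis(500 * 2u64.pow(attempt))).await;
+            }
+        }
+    }
+}
+```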
+
+### Data Quality Problems
+
+**Symptoms:**
+- Some rows missing in destination
+- Data appears corrupted or truncated
+- Schema mismatch errors
+- Timestamp/timezone issues
+
+**Diagnosis:**
+
+```sql
+-- Compare row counts between source and destination
+SELECT COUNT(*) FROM source_table;
+-- vs destination count
+
+-- Check for recent schema changes
+SELECT n.nspname AS schema_name, c.relname AS table_name, a.attname, a.atttypid
+FROM pg_attribute a
+JOIN pg_class c ON a.attrelid = c.oid
+JOIN pg_namespace n ON c.relnamespace = n.oid
+WHERE n.nspname = 'public'
+  AND c.relname = 'your_table'
+  AND a.attnum > 0
+  AND NOT a.attisdropped;
+
+-- Check for problematic data types
+SELECT column_name, data_type, character_maximum_length
+FROM information_schema.columns
+WHERE table_name = 'your_table'
+ AND data_type IN ('json', 'jsonb', 'text', 'bytea');
+```
+
+**Common fixes:**
+
+**Schema evolution:**
+Restart the pipeline after schema changes; ETL will detect and adapt to the new schema.
+
+**Data type issues:**
+```toml
+# In Cargo.toml: enable the feature flag for unknown types
+etl = { git = "https://github.com/supabase/etl", features = ["unknown-types-to-bytes"] }
+```
+
+**Character encoding problems:**
+```sql
+-- Check database encoding
+SHOW server_encoding;
+SHOW client_encoding;
+```
+
+### Performance Issues
+
+**Symptoms:**
+- Very slow initial sync
+- High replication lag
+- High CPU/memory usage
+- Destination write bottlenecks
+
+**Diagnosis:**
+
+```sql
+-- Monitor replication lag
+SELECT slot_name,
+ pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) as lag
+FROM pg_replication_slots;
+
+-- Check WAL generation rate
+SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')) as total_wal;
+
+-- Monitor long-running queries
+SELECT pid, now() - pg_stat_activity.query_start AS duration, query
+FROM pg_stat_activity
+WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes';
+```
+
+**Performance tuning:**
+
+```rust
+// Optimize batch configuration
+PipelineConfig {
+ batch: BatchConfig {
+ max_size: 2000, // Increase batch size
+ max_fill_ms: 10000, // Allow longer batching
+ },
+ max_table_sync_workers: 8, // Increase parallelism
+ // ... other config
+}
+```
+
+```ini
+# postgresql.conf tuning (some settings require a restart, others a reload)
+shared_buffers = 1GB
+effective_cache_size = 4GB
+wal_buffers = 16MB
+checkpoint_completion_target = 0.9
+```
+
+## Advanced Debugging Techniques
+
+### Enable Debug Logging
+
+**For ETL:**
+```bash
+# Set environment variable
+export ETL_LOG_LEVEL=debug
+
+# Or via the standard Rust log filter when running from source
+RUST_LOG=etl=debug cargo run
+```
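+
+If you embed ETL in your own binary and its logging is built on `tracing` (an assumption here), output goes through whatever subscriber you install. A typical `tracing-subscriber` setup, requiring that crate's `env-filter` feature, looks like:
+
+```rust
+use tracing_subscriber::EnvFilter;
+
+fn init_logging() {
+    // Honors RUST_LOG (e.g. RUST_LOG=etl=debug), falling back to `info`.
+    tracing_subscriber::fmt()
+        .with_env_filter(
+            EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")),
+        )
+        .init();
+}
+```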
+
+**For PostgreSQL:**
+```sql
+-- Temporarily enable detailed logging server-wide
+-- (a plain SET only affects the current session, not ETL's connections)
+ALTER SYSTEM SET log_statement = 'all';
+ALTER SYSTEM SET log_min_duration_statement = 0;
+SELECT pg_reload_conf();
+```
+
+### Monitor Replication in Real-Time
+
+```sql
+-- Create a monitoring query
+WITH replication_status AS (
+ SELECT
+ slot_name,
+ active,
+ pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) as lag_size,
+ extract(EPOCH FROM (now() - pg_stat_replication.reply_time))::int as lag_seconds
+ FROM pg_replication_slots
+ LEFT JOIN pg_stat_replication ON slot_name = application_name
+ WHERE slot_name LIKE '%etl%'
+)
+SELECT * FROM replication_status;
+```
+
+### Test Individual Components
+
+**Test publication setup:**
+```sql
+-- Simulate ETL's publication query
+SELECT schemaname, tablename
+FROM pg_publication_tables
+WHERE pubname = 'your_publication';
+```
+
+**Test replication slot consumption:**
+```sql
+-- Peek at pending changes without consuming them.
+-- Note: options depend on the slot's output plugin; binary plugins such as
+-- pgoutput require the pg_logical_slot_peek_binary_changes variant.
+SELECT * FROM pg_logical_slot_peek_changes('your_slot', NULL, NULL);
+```
+
+### Memory and Resource Analysis
+
+```bash
+# Monitor ETL resource usage over time
+while true; do
+  echo "$(date): $(ps --no-headers -o pid,vsz,rss,pcpu -p "$(pgrep -d, etl)")"
+ sleep 30
+done >> etl_resources.log
+
+# Review the most recent samples
+tail -20 etl_resources.log
+```
+
+## Prevention Best Practices
+
+### Configuration Validation
+
+```rust
+// Sanity-check configuration before starting the pipeline.
+// A standalone helper, since `PipelineConfig` is defined in the etl crate.
+fn validate_config(config: &PipelineConfig) -> Result<(), String> {
+    if config.batch.max_size > 10_000 {
+        return Err("batch.max_size is too large".to_string());
+    }
+    // ... other validations
+    Ok(())
+}
+```
+
+### Health Checks
+
+```rust
+// Implement health check endpoints
+async fn health_check() -> Result<(), Box<dyn std::error::Error>> {
+    // Check PostgreSQL connection
+    // Check replication slot status
+    // Check destination connectivity
+    // Return overall status
+    Ok(())
+}
+```
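+
+As a concrete illustration, a minimal probe can open a short-lived connection and confirm the replication slot is still active. This sketch assumes the `tokio-postgres` crate; the connection string and slot name are supplied by the caller:
+
+```rust
+use tokio_postgres::NoTls;
+
+async fn check_slot_active(conn_str: &str, slot: &str) -> Result<bool, tokio_postgres::Error> {
+    let (client, connection) = tokio_postgres::connect(conn_str, NoTls).await?;
+    // Drive the connection in the background.
+    tokio::spawn(async move {
+        if let Err(e) = connection.await {
+            eprintln!("connection error: {e}");
+        }
+    });
+    let row = client
+        .query_opt(
+            "SELECT active FROM pg_replication_slots WHERE slot_name = $1",
+            &[&slot],
+        )
+        .await?;
+    // A missing slot counts as unhealthy.
+    Ok(row.map(|r| r.get::<_, bool>("active")).unwrap_or(false))
+}
+```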
+
+### Monitoring and Alerting
+
+Run monitoring queries (such as the lag query above) on a schedule, and alert on:
+
+- Replication lag greater than 1 GB or 5 minutes
+- Inactive replication slots
+- Failed pipeline restarts
+- Unusual error rates
+
+## Recovery Procedures
+
+### Recovering from WAL Position Loss
+
+```sql
+-- If replication slot is lost, you may need to recreate
+-- WARNING: This will cause a full resync
+SELECT pg_create_logical_replication_slot('new_slot_name', 'pgoutput');
+```
+
+### Handling Destination Failures
+
+ETL typically handles destination failures automatically with retries. For manual intervention:
+
+1. Fix the destination issue
+2. ETL resumes from the last known WAL position
+3. You may see duplicate data, so destinations should handle writes idempotently
+
+### Schema Change Recovery
+
+After schema changes, ETL usually adapts automatically. If it does not, restart the pipeline to force a schema refresh.
+
+## Getting Help
+
+When you need additional support:
+
+1. **Search existing issues:** Check [GitHub issues](https://github.com/supabase/etl/issues)
+2. **Collect diagnostic information:** Use queries and commands from this guide
+3. **Prepare a minimal reproduction:** Isolate the problem to its essential parts
+4. **Open an issue:** Include PostgreSQL version, ETL version, configuration, and logs
+
+### Information to Include in Bug Reports
+
+- ETL version and build information
+- PostgreSQL version and relevant configuration settings
+- Complete error messages and stack traces
+- Configuration files (with sensitive information redacted)
+- Steps to reproduce the issue
+- Expected vs. actual behavior
+
+## Next Steps
+
+After resolving your immediate issue:
+
+- **Optimize performance** → [Performance Tuning](performance/)
+- **Implement monitoring** → [Monitoring best practices](../explanation/monitoring/)
+- **Plan for schema changes** → [Schema Change Handling](schema-changes/)
+- **Understand the architecture** → [ETL Architecture](../explanation/architecture/)
+
+## See Also
+
+- [PostgreSQL setup guide](configure-postgres/) - Prevent configuration issues
+- [Performance optimization](performance/) - Tune for better throughput
+- [ETL architecture](../explanation/architecture/) - Understand system behavior
\ No newline at end of file
diff --git a/docs/how-to/index.md b/docs/how-to/index.md
index ffd78f6d2..9bf8101a0 100644
--- a/docs/how-to/index.md
+++ b/docs/how-to/index.md
@@ -1,4 +1,78 @@
-# How-to Guides
+---
+type: how-to
+title: How-To Guides
+---
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
+# How-To Guides
+
+**Practical solutions for common ETL tasks**
+
+How-to guides provide step-by-step instructions for accomplishing specific goals when working with ETL. Each guide assumes you're already familiar with ETL basics and focuses on the task at hand.
+
+## Database Configuration
+
+### [Configure PostgreSQL for Replication](configure-postgres/)
+Set up PostgreSQL with the correct permissions, settings, and publications for ETL pipelines.
+
+**When to use:** Setting up a new PostgreSQL source for replication.
+
+## Destinations and Output
+
+### [Build Custom Destinations](custom-destinations/)
+Create your own destination implementations for specific data warehouses or storage systems.
+
+**When to use:** ETL doesn't support your target system out of the box.
+
+### [Handle Schema Changes](schema-changes/)
+Manage table schema changes without breaking your replication pipeline.
+
+**When to use:** Your source database schema evolves over time.
+
+## Operations and Monitoring
+
+### [Debug Pipeline Issues](debugging/)
+Diagnose and resolve common pipeline problems like connection failures, data inconsistencies, and performance bottlenecks.
+
+**When to use:** Your pipeline isn't working as expected.
+
+### [Optimize Performance](performance/)
+Tune your ETL pipeline for maximum throughput and minimal resource usage.
+
+**When to use:** Your pipeline is working but needs to handle more data or run faster.
+
+### [Test ETL Pipelines](testing/)
+Build comprehensive test suites for your ETL applications using mocks and test utilities.
+
+**When to use:** Ensuring reliability before deploying to production.
+
+## Before You Start
+
+**Prerequisites:**
+- Complete the [first pipeline tutorial](../tutorials/first-pipeline/)
+- Have a working ETL development environment
+- Understanding of your specific use case requirements
+
+## Guide Structure
+
+Each how-to guide follows this pattern:
+
+1. **Goal statement** - What you'll accomplish
+2. **Prerequisites** - Required setup and knowledge
+3. **Decision points** - Key choices that affect the approach
+4. **Step-by-step procedure** - Actions to take
+5. **Validation** - How to verify success
+6. **Troubleshooting** - Common issues and solutions
+
+## Next Steps
+
+After solving your immediate problem:
+- **Learn more concepts** → [Explanations](../explanation/)
+- **Look up technical details** → [Reference](../reference/)
+- **Build foundational knowledge** → [Tutorials](../tutorials/)
+
+## Need Help?
+
+If these guides don't cover your specific situation:
+1. Check if it's addressed in [Debugging](debugging/)
+2. Search existing [GitHub issues](https://github.com/supabase/etl/issues)
+3. [Open a new issue](https://github.com/supabase/etl/issues/new) with details about your use case
\ No newline at end of file
diff --git a/docs/how-to/performance.md b/docs/how-to/performance.md
deleted file mode 100644
index 7826051e4..000000000
--- a/docs/how-to/performance.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Optimize Performance
-
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
diff --git a/docs/how-to/schema-changes.md b/docs/how-to/schema-changes.md
deleted file mode 100644
index aabf712e3..000000000
--- a/docs/how-to/schema-changes.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Handle Schema Changes
-
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
diff --git a/docs/how-to/testing.md b/docs/how-to/testing.md
deleted file mode 100644
index 48234cc70..000000000
--- a/docs/how-to/testing.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Set Up Tests
-
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index 160c6859d..01bf0a7fa 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,60 +1,110 @@
---
hide:
- navigation
+title: ETL Documentation
---
-# ETL
+# ETL Documentation
-!!! info "Coming Soon"
- ETL docs are coming soon!
+**Build real-time Postgres replication applications in Rust**
-Welcome to the ETL project, a Rust-based collection of tooling designed to build efficient and reliable Postgres replication applications. This documentation page provides an overview of the ETL project, the benefits of using ETL, the advantages of implementing it in Rust, and an introduction to Postgres logical replication. It also outlines the resources available in this documentation to help you get started.
+ETL is a Rust framework by [Supabase](https://supabase.com) that enables you to build high-performance, real-time data replication applications for PostgreSQL. Whether you're creating ETL pipelines, implementing CDC (Change Data Capture), or building custom data synchronization solutions, ETL provides the building blocks you need.
-## What is ETL
+## Getting Started
-ETL is a collection of Rust crates which can be used to build replication data pipelines on top of [Postgres's logical replication protocol](https://www.postgresql.org/docs/current/protocol-logical-replication.html). It provides a high-level API to work with Postgres logical replication, allowing developers to focus on building their applications without worrying about the low-level details of the replication protocol. The ETL crate abstracts away the complexities of managing replication slots, publications, and subscriptions, enabling you to create robust data pipelines that can continually copy data from Postgres to various destinations like BigQuery and other OLAP databases.
+Choose your path based on your needs:
-## What is Postgres Logical Replication?
+### New to ETL?
+Start with our **[Tutorials](tutorials/)** to learn ETL through hands-on examples:
-Postgres logical replication is a method for replicating data between PostgreSQL databases at the logical (table or row) level, rather than the physical (block-level) level. It allows selective replication of specific tables or data subsets, making it ideal for scenarios like data warehousing, real-time analytics, or cross-database synchronization.
+- [Build your first ETL pipeline](tutorials/first-pipeline/) - Complete beginner's guide (15 minutes)
+- [Set up memory-based testing](tutorials/memory-destination/) - Test your pipeline locally (10 minutes)
+- [Testing ETL pipelines](tutorials/testing-pipelines/) - Ensure reliability (20 minutes)
-Logical replication uses a publish/subscribe model, where a source database (publisher) sends changes to a replication slot, and a destination system (subscriber) applies those changes to its own tables. This approach supports selective data replication and is compatible with different PostgreSQL versions or even external systems.
+### Ready to solve specific problems?
+Jump to our **[How-To Guides](how-to/)** for practical solutions:
-### How Does Postgres Logical Replication Work?
+- [Configure PostgreSQL for replication](how-to/configure-postgres/)
+- [Build custom destinations](how-to/custom-destinations/)
+- [Debug pipeline issues](how-to/debugging/)
+- [Handle schema changes](how-to/schema-changes/)
+- [Optimize performance](how-to/performance/)
-Postgres logical replication operates through the following steps:
+### Need detailed technical information?
+Consult our **[Reference](reference/)** documentation:
-**Publication Creation**: A publication is created in the source database, specifying which tables or data to replicate. For example:
+- API reference
+- Configuration options
+- Error codes and messages
-```sql
-create publication my_publication for table orders, customers;
-```
+### Want to understand the bigger picture?
+Read our **[Explanations](explanation/)** for deeper insights:
-**Replication Slot**: A logical replication slot is created on the source database to track changes (inserts, updates, deletes) for the published tables. The slot ensures that changes are preserved until they are consumed by a subscriber.
+- [ETL architecture overview](explanation/architecture/)
+- [Why Postgres logical replication?](explanation/replication/)
+- [Performance characteristics](explanation/performance/)
+- [Design decisions](explanation/design/)
-**Subscription Setup**: The destination system (subscriber) creates a subscription that connects to the publication, specifying the source database and replication slot. For example:
+## Core Concepts
-```sql
-create subscription my_subscription
-connection 'host=localhost port=5432 dbname=postgres user=postgres password=password'
-publication my_publication;
-```
+**Postgres Logical Replication** streams data changes from PostgreSQL databases in real-time using the Write-Ahead Log (WAL). ETL builds on this foundation to provide:
+
+- 🚀 **Real-time replication** - Stream changes as they happen
+- 🔄 **Multiple destinations** - BigQuery and more coming soon
+- 🛡️ **Fault tolerance** - Built-in error handling and recovery
+- ⚡ **High performance** - Efficient batching and parallel processing
+- 🔧 **Extensible** - Plugin architecture for custom destinations
-**Change Data Capture (CDC)**: The source database streams changes (via the Write-Ahead Log, or WAL) to the replication slot. The subscriber receives these changes and applies them to its tables, maintaining data consistency.
+## Quick Example
-This process enables real-time data synchronization with minimal overhead, making it suitable for ETL workflows where data needs to be transformed and loaded into destinations like data warehouses or analytical databases.
+```rust
+use etl::{
+ config::{BatchConfig, PgConnectionConfig, PipelineConfig, TlsConfig},
+ destination::memory::MemoryDestination,
+ pipeline::Pipeline,
+ store::both::memory::MemoryStore,
+};
-## Why Use ETL
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+ // Configure PostgreSQL connection
+ let pg_config = PgConnectionConfig {
+ host: "localhost".to_string(),
+ port: 5432,
+ name: "mydb".to_string(),
+ username: "postgres".to_string(),
+ password: Some("password".to_string().into()),
+ tls: TlsConfig { enabled: false, trusted_root_certs: String::new() },
+ };
-ETL provides a set of building blocks to construct data pipelines which can continually copy data from Postgres to other systems. It abstracts away the low-level details of the logical replication protocol and provides a high-level API to work with. This allows developers to focus on building their applications without worrying about the intricacies of the replication protocol.
+ // Create memory-based store and destination for testing
+ let store = MemoryStore::new();
+ let destination = MemoryDestination::new();
-### Why is ETL Written in Rust?
+ // Configure the pipeline
+ let config = PipelineConfig {
+ id: 1,
+ publication_name: "my_publication".to_string(),
+ pg_connection: pg_config,
+ batch: BatchConfig { max_size: 1000, max_fill_ms: 5000 },
+ table_error_retry_delay_ms: 10000,
+ max_table_sync_workers: 4,
+ };
-The ETL crate is written in Rust to leverage the language's unique strengths, making it an ideal choice for building robust data pipelines:
+ // Create and start the pipeline
+ let mut pipeline = Pipeline::new(1, config, store, destination);
+ pipeline.start().await?;
+
+ // Pipeline will run until stopped
+ pipeline.wait().await?;
+
+ Ok(())
+}
+```
-- **Performance**: Rust's zero-cost abstractions and low-level control enable high-performance data processing, critical for handling large-scale ETL workloads.
-- **Safety**: Rust's strong type system and memory safety guarantees minimize bugs and ensure reliable data handling, reducing the risk of data corruption or crashes.
-- **Concurrency**: Rustβs ownership model and async capabilities allow efficient parallel processing, ideal for managing complex, high-throughput ETL pipelines.
-- **Ecosystem Integration**: Rustβs growing ecosystem and compatibility with modern cloud and database technologies make it a natural fit for Postgres-focused infrastructure.
+## Next Steps
-By using Rust, the ETL crate provides a fast, safe, and scalable solution for building Postgres replication applications.
+- **First time using ETL?** → Start with [Build your first pipeline](tutorials/first-pipeline/)
+- **Have a specific goal?** → Browse [How-To Guides](how-to/)
+- **Need technical details?** → Check the [Reference](reference/)
+- **Want to understand ETL deeply?** → Read [Explanations](explanation/)
diff --git a/docs/reference/index.md b/docs/reference/index.md
index 1d8074836..3806b1dfd 100644
--- a/docs/reference/index.md
+++ b/docs/reference/index.md
@@ -1,4 +1,102 @@
+---
+type: reference
+title: API Reference
+---
+
# Reference
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
+**Technical documentation for ETL configuration and usage**
+
+## API Documentation
+
+Complete API documentation is available through Rust's built-in documentation system. We publish comprehensive rustdoc documentation that covers all public APIs, traits, and configuration structures.
+
+**View the API docs:** [Rust API Documentation](https://supabase.github.io/etl/docs/) *(coming soon)*
+
+The rustdoc includes:
+
+- All public APIs with detailed descriptions
+- Code examples for major components
+- Trait implementations and bounds
+- Configuration structures and their fields
+- Error types and their variants
+
+## Feature Flags
+
+ETL supports the following Cargo features:
+
+| Feature | Description | Default |
+|---------|-------------|---------|
+| `unknown-types-to-bytes` | Convert unknown PostgreSQL types to byte arrays | ✓ |
+| `test-utils` | Include testing utilities and helpers | - |
+| `failpoints` | Enable failure injection for testing | - |
+
+## Environment Variables
+
+| Variable | Purpose | Default |
+|----------|---------|---------|
+| `ETL_LOG_LEVEL` | Logging verbosity (error, warn, info, debug, trace) | `info` |
+| `ETL_METRICS_ENABLED` | Enable metrics collection | `false` |
+
+## Error Codes
+
+### Pipeline Errors
+
+| Code | Description | Action |
+|------|-------------|---------|
+| `P001` | Connection to PostgreSQL failed | Check connection configuration |
+| `P002` | Publication not found | Verify publication exists |
+| `P003` | Replication slot creation failed | Check PostgreSQL permissions |
+
+### Destination Errors
+
+| Code | Description | Action |
+|------|-------------|---------|
+| `D001` | Batch write failed | Check destination system health |
+| `D002` | Authentication failed | Verify credentials |
+| `D003` | Data serialization error | Check data format compatibility |
+
+## Compatibility
+
+### Supported Versions
+
+- **Rust:** 1.75 or later
+- **PostgreSQL:** 12, 13, 14, 15, 16
+- **Tokio:** 1.0 or later
+
+### Platform Support
+
+- **Linux:** Full support (x86_64, aarch64)
+- **macOS:** Full support (Intel, Apple Silicon)
+- **Windows:** Experimental support
+
+## Performance Characteristics
+
+### Memory Usage
+- **Base overhead:** ~10MB per pipeline
+- **Per-table overhead:** ~1MB
+- **Batch memory:** Configurable via `BatchConfig`
+
+### Throughput
+- **Typical range:** 10,000-100,000 operations/second
+- **Factors:** Network latency, batch size, destination performance
+- **Bottlenecks:** Usually destination write speed
+
+## Navigation
+
+**By component type:**
+- [Pipeline APIs](pipeline/) - Core orchestration
+- [Destination APIs](destinations/) - Data output interfaces
+- [Store APIs](stores/) - State management
+- [Configuration](config/) - All configuration structures
+
+**By use case:**
+- [Testing](testing/) - Test utilities and mocks
+- [Monitoring](monitoring/) - Metrics and observability
+- [Extensions](extensions/) - Building custom components
+
+## See Also
+
+- [How-to guides](../how-to/) - Task-oriented instructions
+- [Tutorials](../tutorials/) - Learning-oriented lessons
+- [Explanations](../explanation/) - Understanding-oriented discussions
\ No newline at end of file
diff --git a/docs/tutorials/first-pipeline.md b/docs/tutorials/first-pipeline.md
new file mode 100644
index 000000000..3bb4fe17f
--- /dev/null
+++ b/docs/tutorials/first-pipeline.md
@@ -0,0 +1,230 @@
+---
+type: tutorial
+audience: developers
+prerequisites:
+ - Rust 1.75 or later
+ - PostgreSQL server (local or remote)
+ - Basic Rust and SQL knowledge
+version_last_tested: 0.1.0
+last_reviewed: 2025-01-14
+estimated_time: 15
+---
+
+# Build Your First ETL Pipeline
+
+**Learn the fundamentals by building a working pipeline in 15 minutes**
+
+By the end of this tutorial, you'll have a complete ETL pipeline that streams data changes from PostgreSQL to a memory destination in real-time. You'll see how to set up publications, configure pipelines, and handle live data replication.
+
+*Diagram: data flows from PostgreSQL through an ETL pipeline into a memory destination.*
+
+## What You'll Build
+
+A real-time data pipeline that:
+- Monitors a PostgreSQL table for changes
+- Streams INSERT, UPDATE, and DELETE operations
+- Stores replicated data in memory for immediate access
+
+## Who This Tutorial Is For
+
+- Rust developers new to ETL
+- Anyone interested in PostgreSQL logical replication
+- Developers building data synchronization tools
+
+**Time required:** 15 minutes
+**Difficulty:** Beginner
+
+## Safety Note
+
+This tutorial uses an isolated test database. To clean up, simply drop the test database when finished. No production data is affected.
+
+## Step 1: Set Up Your Environment
+
+Create a new Rust project for this tutorial:
+
+```bash
+cargo new etl-tutorial
+cd etl-tutorial
+```
+
+Add ETL to your dependencies in `Cargo.toml`:
+
+```toml
+[dependencies]
+etl = { git = "https://github.com/supabase/etl" }
+etl-config = { git = "https://github.com/supabase/etl" }
+tokio = { version = "1.0", features = ["full"] }
+```
+
+**Checkpoint:** Run `cargo check` - it should compile successfully.
+
+## Step 2: Prepare PostgreSQL
+
+Connect to your PostgreSQL server and create a test database:
+
+```sql
+CREATE DATABASE etl_tutorial;
+\c etl_tutorial
+
+-- Create a sample table
+CREATE TABLE users (
+ id SERIAL PRIMARY KEY,
+ name TEXT NOT NULL,
+ email TEXT UNIQUE NOT NULL,
+ created_at TIMESTAMP DEFAULT NOW()
+);
+
+-- Insert sample data
+INSERT INTO users (name, email) VALUES
+ ('Alice Johnson', 'alice@example.com'),
+ ('Bob Smith', 'bob@example.com');
+```
+
+Create a publication for replication:
+
+```sql
+CREATE PUBLICATION my_publication FOR TABLE users;
+```
+
+**Checkpoint:** Verify the publication exists:
+```sql
+SELECT * FROM pg_publication WHERE pubname = 'my_publication';
+```
+You should see one row returned.
+
+## Step 3: Configure Your Pipeline
+
+Replace the contents of `src/main.rs`:
+
+```rust
+use etl::config::{BatchConfig, PgConnectionConfig, PipelineConfig, TlsConfig};
+use etl::pipeline::Pipeline;
+use etl::destination::memory::MemoryDestination;
+use etl::store::both::memory::MemoryStore;
+use std::error::Error;
+
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn Error>> {
+ // Configure PostgreSQL connection
+ let pg_connection_config = PgConnectionConfig {
+ host: "localhost".to_string(),
+ port: 5432,
+ name: "etl_tutorial".to_string(),
+ username: "postgres".to_string(),
+ password: Some("your_password".into()),
+ tls: TlsConfig {
+ trusted_root_certs: String::new(),
+ enabled: false,
+ },
+ };
+
+ // Configure pipeline behavior
+ let pipeline_config = PipelineConfig {
+ id: 1,
+ publication_name: "my_publication".to_string(),
+ pg_connection: pg_connection_config,
+ batch: BatchConfig {
+ max_size: 1000,
+ max_fill_ms: 5000,
+ },
+ table_error_retry_delay_ms: 10000,
+ max_table_sync_workers: 4,
+ };
+
+ // Create stores and destination
+ let store = MemoryStore::new();
+ let destination = MemoryDestination::new();
+
+ println!("Starting ETL pipeline...");
+
+ // Create and start the pipeline
+ let mut pipeline = Pipeline::new(pipeline_config, store, destination);
+ pipeline.start().await?;
+
+ Ok(())
+}
+```
+
+**Important:** Replace `"your_password"` with your PostgreSQL password.
+
+## Step 4: Start Your Pipeline
+
+Run your pipeline:
+
+```bash
+cargo run
+```
+
+You should see output like:
+```
+Starting ETL pipeline...
+Pipeline started successfully
+Syncing table: users
+Initial sync completed: 2 rows
+Listening for changes...
+```
+
+**Checkpoint:** Your pipeline is now running and has completed initial synchronization.
+
+## Step 5: Test Real-Time Replication
+
+With your pipeline running, open a new terminal and connect to PostgreSQL:
+
+```bash
+psql -d etl_tutorial
+```
+
+Make some changes to test replication:
+
+```sql
+-- Insert a new user
+INSERT INTO users (name, email) VALUES ('Charlie Brown', 'charlie@example.com');
+
+-- Update an existing user
+UPDATE users SET name = 'Alice Cooper' WHERE email = 'alice@example.com';
+
+-- Delete a user
+DELETE FROM users WHERE email = 'bob@example.com';
+```
+
+**Checkpoint:** In your pipeline terminal, you should see log messages indicating these changes were captured and processed.
+
+## Step 6: Verify Data Replication
+
+The data is now replicated in your memory destination. While this tutorial uses memory (perfect for testing), the same pattern works with BigQuery or custom destinations.
+
+Stop your pipeline with `Ctrl+C`.
+
+**Checkpoint:** You've successfully built and tested a complete ETL pipeline!
+
+## What You've Learned
+
+You've mastered the core ETL concepts:
+
+- **Publications** define which tables to replicate
+- **Pipeline configuration** controls behavior and performance
+- **Memory destinations** provide fast, local testing
+- **Real-time replication** captures all data changes automatically
+
+## Cleanup
+
+Remove the test database:
+
+```sql
+DROP DATABASE etl_tutorial;
+```
+
+## Next Steps
+
+Now that you understand the basics:
+
+- **Add robust testing** → [Testing ETL Pipelines](testing-pipelines/)
+- **Connect to BigQuery** → [How to Set Up BigQuery Destination](../how-to/custom-destinations/)
+- **Handle production scenarios** → [How to Debug Pipeline Issues](../how-to/debugging/)
+- **Understand the architecture** → [ETL Architecture](../explanation/architecture/)
+
+## See Also
+
+- [Memory Destination Tutorial](memory-destination/) - Deep dive into testing with memory
+- [API Reference](../reference/) - Complete configuration options
+- [Performance Guide](../how-to/performance/) - Optimize your pipelines
\ No newline at end of file
diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md
index 9cc1257f1..7a1e1bfdb 100644
--- a/docs/tutorials/index.md
+++ b/docs/tutorials/index.md
@@ -1,4 +1,57 @@
+---
+type: tutorial
+title: Tutorials
+---
+
# Tutorials
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
+**Learn ETL through guided, hands-on experiences**
+
+Tutorials provide step-by-step learning paths that take you from zero knowledge to working applications. Each tutorial is designed to be completed successfully by following the exact steps provided.
+
+## Getting Started
+
+### [Build Your First ETL Pipeline](first-pipeline/)
+**15 minutes** • **Beginner**
+
+Create a complete ETL pipeline that replicates data from PostgreSQL to a memory destination. You'll learn the core concepts of publications, replication slots, and pipeline configuration.
+
+*What you'll build:* A working pipeline that streams changes from a sample PostgreSQL table to an in-memory destination.
+
+## Before You Start
+
+**Prerequisites for all tutorials:**
+
+- Rust installed (1.75 or later)
+- PostgreSQL server (local or remote)
+- Basic familiarity with Rust and SQL
+
+**What you'll need:**
+
+- A terminal/command line
+- Your favorite text editor
+- About 30-60 minutes total time
+
+## Tutorial Structure
+
+Each tutorial follows the same pattern:
+
+1. **Clear outcome** - See exactly what you'll build
+2. **Step-by-step instructions** - No guessing, just follow along
+3. **Immediate feedback** - See results after each major step
+4. **Clean completion** - Working code you can build upon
+
+## Next Steps
+
+After completing the tutorials:
+- **Solve specific problems** → [How-To Guides](../how-to/)
+- **Understand the architecture** → [ETL Architecture](../explanation/architecture/)
+- **Look up technical details** → [API Reference](../reference/)
+
+## Need Help?
+
+If you get stuck:
+1. Double-check the prerequisites
+2. Ensure your PostgreSQL setup matches the requirements
+3. Check our [debugging guide](../how-to/debugging/)
+4. [Open an issue](https://github.com/supabase/etl/issues) with your specific problem
\ No newline at end of file
diff --git a/docs/tutorials/memory-destination.md b/docs/tutorials/memory-destination.md
deleted file mode 100644
index e97cc0889..000000000
--- a/docs/tutorials/memory-destination.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Memory Destination
-
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
diff --git a/docs/tutorials/testing-pipelines.md b/docs/tutorials/testing-pipelines.md
deleted file mode 100644
index 44bd431da..000000000
--- a/docs/tutorials/testing-pipelines.md
+++ /dev/null
@@ -1,4 +0,0 @@
-# Testing Pipelines
-
-!!! info "Coming Soon"
- This page is under development.
\ No newline at end of file
From 0847b6339ef541ae8c2da836733e6e53aaa659c9 Mon Sep 17 00:00:00 2001
From: Riccardo Busetti
Date: Thu, 14 Aug 2025 16:16:26 +0200
Subject: [PATCH 2/9] Update
---
docs/explanation/architecture.md | 52 ++-
docs/how-to/custom-destinations.md | 293 -------------
docs/reference/index.md | 94 +---
docs/test-mermaid.md | 39 ++
docs/tutorials/custom-implementations.md | 528 +++++++++++++++++++++++
docs/tutorials/first-pipeline.md | 63 ++-
docs/tutorials/index.md | 18 +
mkdocs.yaml | 23 +-
8 files changed, 685 insertions(+), 425 deletions(-)
delete mode 100644 docs/how-to/custom-destinations.md
create mode 100644 docs/test-mermaid.md
create mode 100644 docs/tutorials/custom-implementations.md
diff --git a/docs/explanation/architecture.md b/docs/explanation/architecture.md
index 1d757e915..e945cadd5 100644
--- a/docs/explanation/architecture.md
+++ b/docs/explanation/architecture.md
@@ -14,18 +14,46 @@ ETL's architecture is built around a few key abstractions that work together to
At its core, ETL connects PostgreSQL's logical replication stream to configurable destination systems:
-```
-PostgreSQL ETL Pipeline Destination
-βββββββββββββββ ββββββββββββββββ βββββββββββββββ
-β WAL Stream ββββ·β Data Processing ββββββ·β BigQuery β
-β Publicationsβ β Batching β β Custom API β
-β Repl. Slots β β Error Handling β β Memory β
-βββββββββββββββ ββββββββββββββββββββ βββββββββββββββ
- β
- ββββββββΌβββββββ
- β State Store β
- β Schema Info β
- βββββββββββββββ
+```mermaid
+flowchart LR
+ subgraph PostgreSQL
+ A["WAL Stream Publications Replication Slots"]
+ end
+
+ subgraph ETL_Pipeline[ETL Pipeline]
+ subgraph ApplyWorker[Apply Worker]
+ B1["CDC Events Processing and Tables Synchronization"]
+ end
+
+ subgraph TableSyncWorkers[Table Sync Workers]
+ B2["Table 1 Sync + CDC"]
+ B3["Table 2 Sync + CDC"]
+ B4["Table N Sync + CDC"]
+ end
+ end
+
+ subgraph Destination[Destination]
+ Dest["BigQuery Custom API Memory"]
+ end
+
+ subgraph Store[Store]
+ subgraph StateStore[State Store]
+ D1["Memory PostgreSQL"]
+ end
+
+ subgraph SchemaStore[Schema Store]
+ D2["Memory PostgreSQL"]
+ end
+ end
+
+ A --> ApplyWorker
+ ApplyWorker --> TableSyncWorkers
+
+ ApplyWorker --> Destination
+ TableSyncWorkers --> Destination
+
+ ApplyWorker --> Store
+ TableSyncWorkers --> Store
```
The architecture separates concerns to make the system extensible, testable, and maintainable.
diff --git a/docs/how-to/custom-destinations.md b/docs/how-to/custom-destinations.md
deleted file mode 100644
index 87a0c1d9b..000000000
--- a/docs/how-to/custom-destinations.md
+++ /dev/null
@@ -1,293 +0,0 @@
----
-type: how-to
-audience: developers
-prerequisites:
- - Complete first pipeline tutorial
- - Rust async/await knowledge
- - Understanding of your target system's API
-version_last_tested: 0.1.0
-last_reviewed: 2025-01-14
-risk_level: medium
----
-
-# Build Custom Destinations
-
-**Create destination implementations for systems not supported out of the box**
-
-This guide walks you through implementing the [`Destination`](../../reference/destination-trait/) trait to send replicated data to custom storage systems, APIs, or data warehouses.
-
-## Goal
-
-Build a custom destination that receives batched data changes from ETL and writes them to your target system with proper error handling and retry logic.
-
-## Prerequisites
-
-- Completed [first pipeline tutorial](../../tutorials/first-pipeline/)
-- Access to your target system (database, API, etc.)
-- Understanding of your target system's data ingestion patterns
-- Rust knowledge of traits and async programming
-
-## Decision Points
-
-**Choose your approach based on your target system:**
-
-| Target System | Key Considerations | Recommended Pattern |
-|---------------|-------------------|-------------------|
-| **REST API** | Rate limiting, authentication | Batch with retry backoff |
-| **Database** | Transaction support, connection pooling | Bulk insert transactions |
-| **File System** | File formats, compression | Append or rotate files |
-| **Message Queue** | Ordering guarantees, partitioning | Individual message sending |
-
-## Implementation Steps
-
-### Step 1: Define Your Destination Struct
-
-Create a new file `src/my_destination.rs`:
-
-```rust
-use etl::destination::base::{Destination, DestinationError};
-use etl::types::pipeline::BatchedData;
-use async_trait::async_trait;
-
-pub struct MyCustomDestination {
- // Configuration fields
- api_endpoint: String,
- auth_token: String,
- batch_size: usize,
-}
-
-impl MyCustomDestination {
- pub fn new(api_endpoint: String, auth_token: String) -> Self {
- Self {
- api_endpoint,
- auth_token,
- batch_size: 1000,
- }
- }
-}
-```
-
-### Step 2: Implement the Destination Trait
-
-Add the core trait implementation:
-
-```rust
-#[async_trait]
-impl Destination for MyCustomDestination {
- async fn write_batch(&mut self, batch: BatchedData) -> Result<(), DestinationError> {
- // Convert ETL data to your target format
- let payload = self.convert_batch_to_target_format(&batch)?;
-
- // Send to your target system with retries
- self.send_with_retries(payload).await?;
-
- Ok(())
- }
-
- async fn flush(&mut self) -> Result<(), DestinationError> {
- // Implement any final cleanup or flush logic
- Ok(())
- }
-}
-```
-
-### Step 3: Implement Data Conversion
-
-Add conversion logic specific to your target system:
-
-```rust
-impl MyCustomDestination {
- fn convert_batch_to_target_format(&self, batch: &BatchedData) -> Result {
- let mut records = Vec::new();
-
- for change in &batch.changes {
- match change.operation {
- Operation::Insert => {
- records.push(json!({
- "action": "insert",
- "table": change.table_name,
- "data": change.new_values,
- "timestamp": change.timestamp
- }));
- }
- Operation::Update => {
- records.push(json!({
- "action": "update",
- "table": change.table_name,
- "old_data": change.old_values,
- "new_data": change.new_values,
- "timestamp": change.timestamp
- }));
- }
- Operation::Delete => {
- records.push(json!({
- "action": "delete",
- "table": change.table_name,
- "data": change.old_values,
- "timestamp": change.timestamp
- }));
- }
- }
- }
-
- serde_json::to_string(&records)
- .map_err(|e| DestinationError::SerializationError(e.to_string()))
- }
-}
-```
-
-### Step 4: Add Error Handling and Retries
-
-Implement robust error handling:
-
-```rust
-impl MyCustomDestination {
- async fn send_with_retries(&self, payload: String) -> Result<(), DestinationError> {
- let mut attempts = 0;
- let max_attempts = 3;
-
- while attempts < max_attempts {
- match self.send_to_target(&payload).await {
- Ok(_) => return Ok(()),
- Err(e) if self.is_retryable_error(&e) => {
- attempts += 1;
- if attempts < max_attempts {
- let backoff_ms = 2_u64.pow(attempts) * 1000;
- tokio::time::sleep(Duration::from_millis(backoff_ms)).await;
- continue;
- }
- }
- Err(e) => return Err(e),
- }
- }
-
- Err(DestinationError::RetryExhausted(format!("Failed after {} attempts", max_attempts)))
- }
-
- async fn send_to_target(&self, payload: &str) -> Result<(), DestinationError> {
- let client = reqwest::Client::new();
- let response = client
- .post(&self.api_endpoint)
- .header("Authorization", format!("Bearer {}", self.auth_token))
- .header("Content-Type", "application/json")
- .body(payload.to_string())
- .send()
- .await
- .map_err(|e| DestinationError::NetworkError(e.to_string()))?;
-
- if !response.status().is_success() {
- return Err(DestinationError::HttpError(
- response.status().as_u16(),
- format!("Request failed: {}", response.text().await.unwrap_or_default())
- ));
- }
-
- Ok(())
- }
-
- fn is_retryable_error(&self, error: &DestinationError) -> bool {
- match error {
- DestinationError::NetworkError(_) => true,
- DestinationError::HttpError(status, _) => {
- // Retry on 5xx server errors and some 4xx errors
- *status >= 500 || *status == 429
- }
- _ => false,
- }
- }
-}
-```
-
-### Step 5: Use Your Custom Destination
-
-In your main application:
-
-```rust
-use etl::pipeline::Pipeline;
-use etl::store::both::memory::MemoryStore;
-
-#[tokio::main]
-async fn main() -> Result<(), Box> {
- let store = MemoryStore::new();
- let destination = MyCustomDestination::new(
- "https://api.example.com/ingest".to_string(),
- "your-auth-token".to_string()
- );
-
- let mut pipeline = Pipeline::new(pipeline_config, store, destination);
- pipeline.start().await?;
-
- Ok(())
-}
-```
-
-## Validation
-
-Test your custom destination:
-
-1. **Unit tests** for data conversion logic
-2. **Integration tests** with a test target system
-3. **Error simulation** to verify retry behavior
-4. **Load testing** with realistic data volumes
-
-```rust
-#[cfg(test)]
-mod tests {
- use super::*;
-
- #[tokio::test]
- async fn test_data_conversion() {
- let destination = MyCustomDestination::new(
- "http://test".to_string(),
- "token".to_string()
- );
-
- // Create test batch
- let batch = create_test_batch();
-
- // Test conversion
- let result = destination.convert_batch_to_target_format(&batch);
- assert!(result.is_ok());
-
- // Verify JSON structure
- let json: serde_json::Value = serde_json::from_str(&result.unwrap()).unwrap();
- assert!(json.is_array());
- }
-}
-```
-
-## Troubleshooting
-
-**Data not appearing in target system:**
-- Enable debug logging to see conversion output
-- Check target system's ingestion logs
-- Verify authentication credentials
-
-**High error rates:**
-- Review retry logic and backoff timing
-- Check if target system has rate limits
-- Consider implementing circuit breaker pattern
-
-**Performance issues:**
-- Profile data conversion logic
-- Consider batch size tuning
-- Implement connection pooling for database destinations
-
-## Rollback
-
-If your destination isn't working:
-1. Switch back to [`MemoryDestination`](../../reference/memory-destination/) for testing
-2. Check ETL logs for specific error messages
-3. Test destination logic in isolation
-
-## Next Steps
-
-- **Add monitoring** β [Performance monitoring](performance/)
-- **Handle schema changes** β [Schema change handling](schema-changes/)
-- **Production deployment** β [Debugging guide](debugging/)
-
-## See Also
-
-- [Destination API Reference](../../reference/destination-trait/) - Complete trait documentation
-- [BigQuery destination example](https://github.com/supabase/etl/blob/main/etl-destinations/src/bigquery/) - Real-world implementation
-- [Error handling patterns](../../explanation/error-handling/) - Best practices for error management
\ No newline at end of file
diff --git a/docs/reference/index.md b/docs/reference/index.md
index 3806b1dfd..8340678a2 100644
--- a/docs/reference/index.md
+++ b/docs/reference/index.md
@@ -5,95 +5,11 @@ title: API Reference
# Reference
-**Technical documentation for ETL configuration and usage**
-
-## API Documentation
-
-Complete API documentation is available through Rust's built-in documentation system. We publish comprehensive rustdoc documentation that covers all public APIs, traits, and configuration structures.
-
-**View the API docs:** [Rust API Documentation](https://supabase.github.io/etl/docs/) *(coming soon)*
-
-The rustdoc includes:
-
-- All public APIs with detailed descriptions
-- Code examples for major components
-- Trait implementations and bounds
-- Configuration structures and their fields
-- Error types and their variants
-
-## Feature Flags
-
-ETL supports the following Cargo features:
-
-| Feature | Description | Default |
-|---------|-------------|---------|
-| `unknown-types-to-bytes` | Convert unknown PostgreSQL types to byte arrays | β |
-| `test-utils` | Include testing utilities and helpers | - |
-| `failpoints` | Enable failure injection for testing | - |
-
-## Environment Variables
-
-| Variable | Purpose | Default |
-|----------|---------|---------|
-| `ETL_LOG_LEVEL` | Logging verbosity (error, warn, info, debug, trace) | `info` |
-| `ETL_METRICS_ENABLED` | Enable metrics collection | `false` |
-
-## Error Codes
-
-### Pipeline Errors
-
-| Code | Description | Action |
-|------|-------------|---------|
-| `P001` | Connection to PostgreSQL failed | Check connection configuration |
-| `P002` | Publication not found | Verify publication exists |
-| `P003` | Replication slot creation failed | Check PostgreSQL permissions |
-
-### Destination Errors
-
-| Code | Description | Action |
-|------|-------------|---------|
-| `D001` | Batch write failed | Check destination system health |
-| `D002` | Authentication failed | Verify credentials |
-| `D003` | Data serialization error | Check data format compatibility |
-
-## Compatibility
-
-### Supported Versions
-
-- **Rust:** 1.75 or later
-- **PostgreSQL:** 12, 13, 14, 15, 16
-- **Tokio:** 1.0 or later
-
-### Platform Support
-
-- **Linux:** Full support (x86_64, aarch64)
-- **macOS:** Full support (Intel, Apple Silicon)
-- **Windows:** Experimental support
-
-## Performance Characteristics
-
-### Memory Usage
-- **Base overhead:** ~10MB per pipeline
-- **Per-table overhead:** ~1MB
-- **Batch memory:** Configurable via `BatchConfig`
-
-### Throughput
-- **Typical range:** 10,000-100,000 operations/second
-- **Factors:** Network latency, batch size, destination performance
-- **Bottlenecks:** Usually destination write speed
-
-## Navigation
-
-**By component type:**
-- [Pipeline APIs](pipeline/) - Core orchestration
-- [Destination APIs](destinations/) - Data output interfaces
-- [Store APIs](stores/) - State management
-- [Configuration](config/) - All configuration structures
-
-**By use case:**
-- [Testing](testing/) - Test utilities and mocks
-- [Monitoring](monitoring/) - Metrics and observability
-- [Extensions](extensions/) - Building custom components
+Complete API documentation is available through Rust's built-in documentation system. We will publish comprehensive rustdoc documentation that covers all public APIs, traits, and configuration structures.
+For now, you can browse the docs in the source code or generate them locally by running:
+```shell
+cargo doc --workspace --all-features --no-deps --open
+```
## See Also
diff --git a/docs/test-mermaid.md b/docs/test-mermaid.md
new file mode 100644
index 000000000..4d644b200
--- /dev/null
+++ b/docs/test-mermaid.md
@@ -0,0 +1,39 @@
+# Mermaid Test
+
+This page tests Mermaid diagram rendering in MkDocs.
+
+## Simple Flowchart
+
+```mermaid
+flowchart TD
+ A[Start] --> B{Is it?}
+ B -->|Yes| C[OK]
+ C --> D[Rethink]
+ D --> B
+ B ---->|No| E[End]
+```
+
+## Sequence Diagram
+
+```mermaid
+sequenceDiagram
+ participant Alice
+ participant Bob
+ Alice->>John: Hello John, how are you?
+ loop Healthcheck
+ John->>John: Fight against hypochondria
+ end
+ Note right of John: Rational thoughts prevail!
+ John-->>Alice: Great!
+ John->>Bob: How about you?
+ Bob-->>John: Jolly good!
+```
+
+## Database Schema Example
+
+```mermaid
+erDiagram
+ CUSTOMER ||--o{ ORDER : places
+ ORDER ||--|{ LINE-ITEM : contains
+ CUSTOMER }|..|{ DELIVERY-ADDRESS : uses
+```
\ No newline at end of file
diff --git a/docs/tutorials/custom-implementations.md b/docs/tutorials/custom-implementations.md
new file mode 100644
index 000000000..faa54f2aa
--- /dev/null
+++ b/docs/tutorials/custom-implementations.md
@@ -0,0 +1,528 @@
+---
+type: tutorial
+audience: developers
+prerequisites:
+ - Complete first pipeline tutorial
+ - Advanced Rust knowledge (traits, async, Arc/Mutex)
+ - Understanding of ETL architecture
+version_last_tested: 0.1.0
+last_reviewed: 2025-01-14
+estimated_time: 25
+---
+
+# Build Custom Stores and Destinations
+
+**Learn ETL's extension patterns by implementing simple custom components**
+
+This tutorial teaches you ETL's design patterns by implementing minimal custom stores and destinations. You'll understand the separation between state and schema storage, and learn the patterns needed for production extensions.
+
+## What You'll Build
+
+Simple custom implementations to understand the patterns:
+
+- **Custom in-memory store** with logging to see the flow
+- **Custom HTTP destination** with basic retry logic
+- Understanding of ETL's architectural contracts
+
+**Time required:** 25 minutes
+**Difficulty:** Advanced
+
+## Understanding ETL's Storage Design
+
+ETL separates storage into two focused traits:
+
+### SchemaStore: Table Structure Information
+
+```rust
+pub trait SchemaStore {
+ // Get cached schema (fast reads from memory)
+ fn get_table_schema(&self, table_id: &TableId) -> EtlResult