Skip to content

serenorg/database-replicator

Repository files navigation

database-replicator

CI Crates.io License: Apache 2.0 Rust Version Latest Release

Universal database-to-PostgreSQL replication for AI agents

Replicate any database to PostgreSQL with zero downtime. Supports PostgreSQL, SQLite, MongoDB, and MySQL/MariaDB.


SerenAI Cloud Replication

New to SerenAI? Sign up at console.serendb.com to get started with managed cloud replication.

SerenAI provides managed PostgreSQL databases optimized for AI workloads. When replicating to SerenDB targets, this tool can run your replication jobs on SerenAI's cloud infrastructure - no local resources required.

Benefits of SerenAI Cloud Execution:

  • No local compute resources needed
  • Automatic retry and error handling
  • Job monitoring and logging
  • Optimized for large database transfers

To replicate to SerenDB, simply run:

export SEREN_API_KEY="your-api-key"  # Get from console.serendb.com
database-replicator init \
  --source "postgresql://user:pass@source:5432/db" \
  --target "postgresql://user:pass@your-db.serendb.com:5432/db"

For local execution (non-SerenDB targets), use the --local flag. See Remote Execution for details.


Overview

database-replicator is a command-line tool that replicates databases from multiple sources to PostgreSQL (including Seren Cloud). It automatically detects your source database type and handles the replication accordingly:

  • PostgreSQL: Zero-downtime replication with continuous sync via logical replication
  • SQLite: One-time replication using JSONB storage
  • MongoDB: One-time replication with JSONB storage and periodic refresh support
  • MySQL/MariaDB: One-time replication with JSONB storage and periodic refresh support

Why This Tool?

  • Multi-database support: Single tool for all your database replications
  • AI-friendly storage: Non-PostgreSQL sources use JSONB for flexible querying
  • Zero downtime: PostgreSQL-to-PostgreSQL replication with continuous sync
  • Remote execution: Run replications on SerenAI cloud infrastructure
  • Production-ready: Data integrity verification, checkpointing, and error handling

Supported Databases

Source Database Replication Type Continuous Sync Periodic Refresh Remote Execution
PostgreSQL Native replication ✅ Logical replication N/A ✅ Yes
SQLite JSONB storage ❌ One-time ❌ No ❌ Local only
MongoDB JSONB storage ❌ One-time ✅ 24hr default ✅ Yes
MySQL/MariaDB JSONB storage ❌ One-time ✅ 24hr default ✅ Yes

Quick Start

Choose your source database to get started:

PostgreSQL → PostgreSQL

Zero-downtime replication with continuous sync:

database-replicator init \
  --source "postgresql://user:pass@source-host:5432/db" \
  --target "postgresql://user:pass@target-host:5432/db"

📖 Full PostgreSQL Guide →


SQLite → PostgreSQL

One-time replication to JSONB storage:

database-replicator init \
  --source /path/to/database.db \
  --target "postgresql://user:pass@host:5432/db"

📖 Full SQLite Guide →


MongoDB → PostgreSQL

One-time replication with periodic refresh support:

database-replicator init \
  --source "mongodb://user:pass@host:27017/db" \
  --target "postgresql://user:pass@host:5432/db"

📖 Full MongoDB Guide →


MySQL/MariaDB → PostgreSQL

One-time replication with periodic refresh support:

database-replicator init \
  --source "mysql://user:pass@host:3306/db" \
  --target "postgresql://user:pass@host:5432/db"

📖 Full MySQL Guide →


Features

PostgreSQL-to-PostgreSQL

  • Zero-downtime replication using PostgreSQL logical replication
  • Continuous sync keeps databases in sync in real-time
  • Selective replication with database and table-level filtering
  • Interactive mode for selecting databases and tables
  • Remote execution on SerenAI cloud infrastructure
  • Data integrity verification with checksums

Non-PostgreSQL Sources (SQLite, MongoDB, MySQL)

  • JSONB storage preserves data fidelity for querying in PostgreSQL
  • Type preservation with special encoding for complex types
  • One-time replication for initial data transfer
  • Periodic refresh (MongoDB, MySQL) for keeping data up to date
  • Schema-aware filtering for precise table targeting
  • Remote execution (MongoDB, MySQL) on cloud infrastructure

Universal Features

  • Multi-provider support: Works with any PostgreSQL provider (Neon, AWS RDS, Hetzner, self-hosted)
  • Size estimation: Analyze database sizes before replication
  • High performance: Parallel operations with automatic CPU detection
  • Checkpointing: Resume interrupted replications automatically
  • Security: Credentials passed via .pgpass files, never in command output

Installation

Download Pre-built Binaries

Download the latest release for your platform from GitHub Releases:

  • Linux (x64): database-replicator-linux-x64-binary
  • macOS (Intel): database-replicator-macos-x64-binary
  • macOS (Apple Silicon): database-replicator-macos-arm64-binary

Make the binary executable:

chmod +x database-replicator-*-binary
./database-replicator-*-binary --help

Install from crates.io

cargo install database-replicator

Build from Source

Requires Rust 1.70 or later:

git clone https://github.com/serenorg/database-replicator.git
cd database-replicator
cargo build --release

The binary will be available at target/release/database-replicator.

Prerequisites

  • PostgreSQL client tools (pg_dump, pg_dumpall, psql) - Required for all database types
  • Source database access: Connection credentials and appropriate permissions
  • Target database access: PostgreSQL connection with write permissions

Documentation

Database-Specific Guides


PostgreSQL-to-PostgreSQL Replication

For comprehensive PostgreSQL replication documentation, see README-PostgreSQL.md.

Quick Overview

PostgreSQL-to-PostgreSQL replication uses logical replication for zero-downtime replication:

  1. Validate - Check prerequisites and permissions
  2. Init - Perform initial snapshot (schema + data)
  3. Sync - Set up continuous logical replication
  4. Status - Monitor replication lag and health
  5. Verify - Validate data integrity with checksums

Example:

# Validate prerequisites
database-replicator validate \
  --source "postgresql://user:pass@source:5432/db" \
  --target "postgresql://user:pass@target:5432/db"

# Initial snapshot
database-replicator init \
  --source "postgresql://user:pass@source:5432/db" \
  --target "postgresql://user:pass@target:5432/db"

# Continuous sync
database-replicator sync \
  --source "postgresql://user:pass@source:5432/db" \
  --target "postgresql://user:pass@target:5432/db"

See README-PostgreSQL.md for:

  • Prerequisites and permission setup
  • Detailed command documentation
  • Selective replication (filtering databases/tables)
  • Interactive mode
  • Remote execution on cloud infrastructure
  • Multi-provider support (Neon, AWS RDS, Hetzner, etc.)
  • Schema-aware filtering
  • Performance optimizations
  • Troubleshooting guide
  • Complete examples and FAQ

Remote Execution (AWS)

By default, the init command uses SerenAI's managed cloud service to execute replication jobs. This means your replication runs on AWS infrastructure managed by SerenAI, with no AWS account or setup required on your part.

Important: Remote execution is restricted to SerenDB targets only. To replicate to other PostgreSQL databases (AWS RDS, Neon, Hetzner, self-hosted), use the --local flag to run on your own hardware.

Benefits of Remote Execution

  • No network interruptions: Your replication continues even if your laptop loses connectivity
  • No laptop sleep: Your computer can sleep or shut down without affecting the job
  • Faster performance: Replication runs on dedicated cloud infrastructure closer to your databases
  • No local resource usage: Your machine's CPU, memory, and disk are not consumed
  • Automatic monitoring: Built-in observability with CloudWatch logs and metrics
  • Cost-free: SerenAI covers all AWS infrastructure costs

How It Works

When you run init without the --local flag, the tool:

  1. Submits your job to SerenDB's managed API with encrypted credentials
  2. Provisions an EC2 worker sized appropriately for your database
  3. Executes replication on the cloud worker
  4. Monitors progress and shows you real-time status updates
  5. Self-terminates when complete to minimize costs

Your database credentials are encrypted with AWS KMS and never logged or stored in plaintext.

Authentication

Remote execution requires a SerenDB API key for authentication. The tool obtains the API key in one of two ways:

Option 1: Environment Variable (Recommended for scripts)

export SEREN_API_KEY="your-api-key-here"
./database-replicator init --source "..." --target "..."

Option 2: Interactive Prompt

If SEREN_API_KEY is not set, the tool will prompt you to enter your API key:

Remote execution requires a SerenDB API key for authentication.

You can generate an API key at:
  https://console.serendb.com/api-keys

Enter your SerenDB API key: [input]

Getting Your API Key:

  1. Sign up for SerenDB at console.serendb.com/signup
  2. Navigate to console.serendb.com/api-keys
  3. Generate a new API key
  4. Copy and save it securely (you won't be able to see it again)

Security Note: Never commit API keys to version control. Use environment variables or secure credential management.

Usage Example

Remote execution is the default - just run init as normal:

# Runs on SerenDB's managed cloud infrastructure (default)
./database-replicator init \
  --source "postgresql://user:pass@source-host:5432/db" \
  --target "postgresql://user:pass@seren-host:5432/db"

The tool will:

  • Submit the job to SerenDB's managed API
  • Show you the job ID and trace ID for monitoring
  • Poll for status updates and display progress
  • Report success or failure when complete

Example output:

Submitting replication job...
✓ Job submitted
Job ID: 550e8400-e29b-41d4-a716-446655440000
Trace ID: 660e8400-e29b-41d4-a716-446655440000

Polling for status...
Status: provisioning EC2 instance...
Status: running (1/2): myapp
Status: running (2/2): analytics

✓ Replication completed successfully

Local Execution

To run replication on your local machine instead of SerenAI's cloud infrastructure, use the --local flag:

# Runs on your local machine
./database-replicator init \
  --source "postgresql://user:pass@source-host:5432/db" \
  --target "postgresql://user:pass@target-host:5432/db" \
  --local

Local execution is required when:

  • Replicating to non-SerenDB targets (AWS RDS, Neon, Hetzner, self-hosted PostgreSQL)
  • Your databases are not accessible from the internet
  • You're testing or developing
  • You need full control over the execution environment

Advanced Configuration

Custom API endpoint (for testing or development)

# Override the default API endpoint if needed
export SEREN_REMOTE_API="https://your-custom-endpoint.example.com"
./database-replicator init \
  --source "..." \
  --target "..."

Job timeout (default: 8 hours)

# Set 12-hour timeout for very large databases
./database-replicator init \
  --source "..." \
  --target "..." \
  --job-timeout 43200

Remote Execution Troubleshooting

"Failed to submit job to remote service"

  • Check your internet connection
  • Verify you can reach SerenDB's API endpoint
  • Try with --local as a fallback

Job stuck in "provisioning" state

  • AWS may be experiencing capacity issues in the region
  • Wait a few minutes and check status again
  • Contact SerenAI support if it persists for > 10 minutes

Job failed with error

  • Check the error message in the status response
  • Verify your source and target database credentials
  • Ensure databases are accessible from the internet
  • Try running with --local to validate locally first

For more details on the AWS infrastructure and architecture, see the AWS Setup Guide.


Requirements

Source Database

  • PostgreSQL 12 or later (for PostgreSQL sources)
  • SQLite 3.x (for SQLite sources)
  • MongoDB 4.0+ (for MongoDB sources)
  • MySQL 5.7+ or MariaDB 10.2+ (for MySQL/MariaDB sources)
  • Appropriate privileges for source database type

Target Database

  • SerenDB: Agentic-data access database for AI Agent queries. Signup at console.serendb.com/signup
  • PostgreSQL 12 or later
  • Database owner or superuser privileges
  • Ability to create tables and schemas
  • Network connectivity to source database (for continuous replication)

Architecture

  • src/commands/ - CLI command implementations
  • src/postgres/ - PostgreSQL connection and utilities
  • src/migration/ - Schema introspection, dump/restore, checksums
  • src/replication/ - Logical replication management
  • src/sqlite/ - SQLite reader and JSONB conversion
  • src/mongodb/ - MongoDB reader and BSON to JSONB conversion
  • src/mysql/ - MySQL reader and JSONB conversion
  • tests/ - Integration tests

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Reporting Issues

Please report bugs and feature requests on the GitHub Issues page.

About SerenAI

SerenAI is building infrastructure for AI agent data access. Agents are hungry for data and they will pay to access the data in your database. We're creating the layer that powers secure, compliant enterprise data commerce and data delivery for AI agents. SerenAI includes agent identity verification, persistent memory via SerenDB, data access control, tiered data-access pricing, SOC2-ready compliance systems, as well as micropayments and settlement.

Our team brings decades of experience building enterprise databases and security systems. We believe AI agents need to pay to access your data.

Get in touch: hello@serendb.com | serendb.com

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

About

Universal database-to-PostgreSQL replication CLI. Supports PostgreSQL, SQLite, MongoDB, and MySQL.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •  

Languages